<!DOCTYPE html> <html class="writer-html5" lang="en" data-content_root="../"> <head> <meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>Spock Quick-Start Guide — OLCF User Documentation</title> <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" /> <link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" /> <link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" /> <link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" /> <link rel="stylesheet" type="text/css" href="../_static/css/theme_overrides.css?v=b668e930" /> <link rel="shortcut icon" href="../_static/favicon.ico"/> <link rel="canonical" href="https://docs.olcf.ornl.gov/systems/spock_quick_start_guide.html"/> <script src="../_static/jquery.js?v=5d32c60e"></script> <script src="../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script> <script src="../_static/documentation_options.js?v=5929fcd5"></script> <script src="../_static/doctools.js?v=9a2dae69"></script> <script src="../_static/sphinx_highlight.js?v=dc90522c"></script> <script src="../_static/clipboard.min.js?v=a7894cd8"></script> <script src="../_static/copybutton.js?v=99362dbf"></script> <script src="../_static/design-tabs.js?v=f930bc37"></script> <script src="../_static/js/custom.js?v=1617c1c8"></script> <script src="../_static/js/theme.js"></script> <link rel="index" title="Index" href="../genindex.html" /> <link rel="search" title="Search" href="../search.html" /> <link rel="next" title="Crusher Quick-Start Guide" href="crusher_quick_start_guide.html" /> <link rel="prev" title="Odo" href="odo_user_guide.html" /> </head> <body class="wy-body-for-nav"> <div class="wy-grid-for-nav"> <nav data-toggle="wy-nav-shift" class="wy-nav-side"> <div class="wy-side-scroll"> <div class="wy-side-nav-search" 
style="background: #efefef" > <a href="../index.html"> <img src="../_static/olcf_logo.png" class="logo" alt="Logo"/> </a> <div role="search"> <form id="rtd-search-form" class="wy-form" action="../search.html" method="get"> <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </div> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> <ul> <li class="toctree-l1"><a class="reference internal" href="../support/index.html">Contact & Support</a><ul> <li class="toctree-l2"><a class="reference internal" href="../support/index.html#olcf-user-assistance-center">OLCF User Assistance Center</a></li> <li class="toctree-l2"><a class="reference internal" href="../support/index.html#authentication-support">Authentication Support</a></li> <li class="toctree-l2"><a class="reference internal" href="../support/index.html#olcf-office-hours">OLCF Office Hours</a></li> <li class="toctree-l2"><a class="reference internal" href="../support/index.html#communication-to-users">Communication to Users</a></li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../accounts/index.html">Accounts and Projects</a><ul> <li class="toctree-l2"><a class="reference internal" href="../accounts/accounts_and_projects.html">Request a New Allocation</a><ul> <li class="toctree-l3"><a class="reference internal" href="../accounts/accounts_and_projects.html#what-are-the-differences-between-project-types">What are the differences between project types?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/accounts_and_projects.html#what-happens-after-a-project-request-is-approved">What happens after a project request is approved?</a></li> <li class="toctree-l3"><a class="reference internal" 
href="../accounts/accounts_and_projects.html#guidance-on-frontier-allocation-requests">Guidance on Frontier Allocation Requests</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../accounts/accounts_and_projects.html#applying-for-a-user-account">Applying for a user account</a></li> <li class="toctree-l2"><a class="reference internal" href="../accounts/accounts_and_projects.html#checking-the-status-of-your-application">Checking the status of your application</a></li> <li class="toctree-l2"><a class="reference internal" href="../accounts/accounts_and_projects.html#get-access-to-additional-projects">Get access to additional projects</a></li> <li class="toctree-l2"><a class="reference internal" href="../accounts/frequently_asked_questions.html">Frequently Asked Questions</a><ul> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#how-do-i-apply-for-an-account">How do I apply for an account?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#what-is-the-status-of-my-application">What is the status of my application?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#how-should-i-acknowledge-the-olcf-in-my-publications-and-presentations">How should I acknowledge the OLCF in my publications and presentations?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#what-is-a-subproject">What is a subproject?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#i-no-longer-need-my-account-who-should-i-inform-and-what-should-i-do-with-my-olcf-issued-rsa-securid-token">I no longer need my account. 
Who should I inform and what should I do with my OLCF issued RSA SecurID token?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#my-securid-token-is-broken-expired-what-should-i-do">My SecurID token is broken/expired. What should I do?</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#getting-help">Getting Help</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/frequently_asked_questions.html#additional-resources">Additional Resources</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../accounts/documents_and_forms.html">Documents and Forms</a><ul> <li class="toctree-l3"><a class="reference internal" href="../accounts/documents_and_forms.html#forms-for-requesting-a-project-allocation">Forms for Requesting a Project Allocation</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/documents_and_forms.html#forms-for-requesting-an-account">Forms for Requesting an Account</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/documents_and_forms.html#forms-to-request-changes-to-computers-jobs-or-accounts">Forms to Request Changes to Computers, Jobs, or Accounts</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/documents_and_forms.html#report-templates">Report Templates</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/documents_and_forms.html#miscellaneous-forms">Miscellaneous Forms</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../accounts/olcf_policy_guide.html">OLCF Policy Guides</a><ul> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#olcf-acknowledgement">OLCF Acknowledgement</a></li> <li class="toctree-l3"><a class="reference internal" 
href="../accounts/olcf_policy_guide.html#software-requests">Software Requests</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#special-requests-and-policy-exemptions">Special Requests and Policy Exemptions</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#computing-policy">Computing Policy</a><ul> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#computer-use">Computer Use</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#data-use">Data Use</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#software-use">Software Use</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#user-accountability">User Accountability</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#data-management-policy">Data Management Policy</a><ul> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#introduction">Introduction</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#data-retention-purge-quotas">Data Retention, Purge, & Quotas</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#data-prohibitions-safeguards">Data Prohibitions & Safeguards</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#software">Software</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#security-policy">Security Policy</a><ul> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#scope">Scope</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../accounts/olcf_policy_guide.html#personal-use">Personal Use</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#accessing-olcf-computational-resources">Accessing OLCF Computational Resources</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#data-management">Data Management</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#sensitive-data">Sensitive Data</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#data-transfer">Data Transfer</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#incite-allocation-under-utilization-policy">INCITE Allocation Under-utilization Policy</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#project-reporting-policy">Project Reporting Policy</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#non-proprietary-institutional-user-agreement-policy">Non-proprietary Institutional User Agreement Policy</a><ul> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#access">Access</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#rules-and-regulations">Rules and Regulations</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#safety-and-health">Safety and Health</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#intent-to-publish">Intent to Publish</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#export-control">Export Control</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../accounts/olcf_policy_guide.html#intellectual-property">Intellectual Property</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#hipaa-itar-project-rules-of-behavior-policy">HIPAA/ITAR Project Rules of Behavior Policy</a></li> <li class="toctree-l3"><a class="reference internal" href="../accounts/olcf_policy_guide.html#user-managed-software-ums-policy">User-Managed Software (UMS) Policy</a><ul> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#purpose">Purpose</a></li> <li class="toctree-l4"><a class="reference internal" href="../accounts/olcf_policy_guide.html#policies">Policies</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../accounts/glossary.html">Glossary</a></li> <li class="toctree-l2"><a class="reference internal" href="../accounts/index.html#additional-resources">Additional Resources</a></li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../connecting/index.html">Connecting</a><ul> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#connecting-for-the-first-time">Connecting for the first time</a></li> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#activating-a-new-securid-fob">Activating a new SecurID fob</a></li> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#pins-passcodes-and-tokencodes">PINs, Passcodes, and Tokencodes</a></li> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#x11-forwarding">X11 Forwarding</a></li> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#id2">Systems Available to All Projects</a></li> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#olcf-system-hostnames">OLCF System Hostnames</a></li> <li class="toctree-l2"><a class="reference 
internal" href="../connecting/index.html#starting-a-tmux-session">Starting a Tmux Session</a></li> <li class="toctree-l2"><a class="reference internal" href="../connecting/index.html#checking-system-availability">Checking System Availability</a></li> </ul> </li> </ul> <ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="index.html">Systems</a><ul class="current"> <li class="toctree-l2"><a class="reference internal" href="2024_olcf_system_changes.html">2024 Notable System Changes</a><ul> <li class="toctree-l3"><a class="reference internal" href="2024_olcf_system_changes.html#hpss-decommission-and-kronos-availability">HPSS Decommission and Kronos Availability</a><ul> <li class="toctree-l4"><a class="reference internal" href="2024_olcf_system_changes.html#late-july-2024-kronos-available">Late July 2024 - Kronos available</a></li> <li class="toctree-l4"><a class="reference internal" href="2024_olcf_system_changes.html#august-30-2024-hpss-becomes-read-only">August 30, 2024 - HPSS becomes read-only</a></li> <li class="toctree-l4"><a class="reference internal" href="2024_olcf_system_changes.html#january-31-2025-hpss-decommissioned">January 31, 2025 - HPSS decommissioned</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="2024_olcf_system_changes.html#summit-and-alpine2-decommissions">Summit and Alpine2 Decommissions</a><ul> <li class="toctree-l4"><a class="reference internal" href="2024_olcf_system_changes.html#november-15-2024-summit-decommissioned">November 15, 2024 - Summit decommissioned</a></li> <li class="toctree-l4"><a class="reference internal" href="2024_olcf_system_changes.html#november-19-2024-alpine2-read-only">November 19, 2024 - Alpine2 read-only</a></li> <li class="toctree-l4"><a class="reference internal" href="2024_olcf_system_changes.html#january-31-2025-alpine2-decommissioned">January 31, 2025 - Alpine2 decommissioned</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a 
class="reference internal" href="frontier_user_guide.html">Frontier User Guide</a><ul> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#system-overview">System Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#frontier-compute-nodes">Frontier Compute Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#node-types">Node Types</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#system-interconnect">System Interconnect</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#file-systems">File Systems</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#operating-system">Operating System</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#gpus">GPUs</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#connecting">Connecting</a></li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#data-and-storage">Data and Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#transition-from-alpine-to-orion">Transition from Alpine to Orion</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#lfs-setstripe-wrapper">LFS setstripe wrapper</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#nfs-filesystem">NFS Filesystem</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#lustre-filesystem">Lustre Filesystem</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#kronos-archival-storage">Kronos Archival Storage</a></li> <li class="toctree-l4"><a class="reference internal" 
href="frontier_user_guide.html#nvme">NVMe</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#nvme-usage">NVMe Usage</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#using-globus-to-move-data-to-and-from-orion">Using Globus to Move Data to and from Orion</a></li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#amd-gpus">AMD GPUs</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#amd-vs-nvidia-terminology">AMD vs NVIDIA Terminology</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#blocks-workgroups-threads-work-items-grids-wavefronts">Blocks (workgroups), Threads (work items), Grids, Wavefronts</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#the-compute-unit">The Compute Unit</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#hip">HIP</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#things-to-remember-when-programming-for-amd-gpus">Things To Remember When Programming for AMD GPUs</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#programming-environment">Programming Environment</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#environment-modules-lmod">Environment Modules (Lmod)</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#compilers">Compilers</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#mpi">MPI</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#compiling">Compiling</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id4">Compilers</a></li> 
<li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id7">MPI</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#openmp">OpenMP</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#openmp-gpu-offload">OpenMP GPU Offload</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#openacc">OpenACC</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id9">HIP</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#hip-openmp-cpu-threading">HIP + OpenMP CPU Threading</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#sycl">SYCL</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#running-jobs">Running Jobs</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#login-vs-compute-nodes">Login vs Compute Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#simplified-node-layout">Simplified Node Layout</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#slurm">Slurm</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#batch-scripts">Batch Scripts</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#interactive-jobs">Interactive Jobs</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#common-slurm-options">Common Slurm Options</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#slurm-environment-variables">Slurm Environment Variables</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#job-states">Job States</a></li> <li class="toctree-l4"><a 
class="reference internal" href="frontier_user_guide.html#job-reason-codes">Job Reason Codes</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#scheduling-policy">Scheduling Policy</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#job-dependencies">Job Dependencies</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#monitoring-and-modifying-batch-jobs">Monitoring and Modifying Batch Jobs</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#srun">Srun</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#process-and-thread-mapping-examples">Process and Thread Mapping Examples</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#ensemble-jobs">Ensemble Jobs</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#tips-for-launching-at-scale">Tips for Launching at Scale</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#software">Software</a></li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#debugging">Debugging</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#linaro-ddt">Linaro DDT</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#gdb">GDB</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#valgrind4hpc">Valgrind4hpc</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#omnitrace">Omnitrace</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#profiling-applications">Profiling Applications</a><ul> <li class="toctree-l4"><a class="reference internal" 
href="frontier_user_guide.html#getting-started-with-the-hpe-performance-analysis-tools-pat">Getting Started with the HPE Performance Analysis Tools (PAT)</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#getting-started-with-hpctoolkit">Getting Started with HPCToolkit</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#getting-started-with-the-rocm-profiler">Getting Started with the ROCm Profiler</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#roofline-profiling-with-the-rocm-profiler">Roofline Profiling with the ROCm Profiler</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#omniperf">Omniperf</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#tips-and-tricks">Tips and Tricks</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#using-reduced-precision-fp16-and-bf16-datatypes">Using reduced precision (FP16 and BF16 datatypes)</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#enabling-gpu-page-migration">Enabling GPU Page Migration</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#floating-point-fp-atomic-operations-and-coarse-fine-grained-memory-allocations">Floating-Point (FP) Atomic Operations and Coarse/Fine Grained Memory Allocations</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#performance-considerations-for-lds-fp-atomicadd">Performance considerations for LDS FP atomicAdd()</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#library-considerations-with-atomic-operations">Library considerations with atomic operations</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" 
href="frontier_user_guide.html#system-updates">System Updates</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id17">2025-02-18</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id18">2025-01-14</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id19">2024-11-12</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id20">2024-09-03</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id21">2024-08-20</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id22">2024-07-16</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id23">2024-04-17</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id24">2024-03-19</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id25">2024-01-23</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id26">2023-12-05</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id27">2023-10-03</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id28">2023-09-19</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id29">2023-07-18</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#id30">2023-05-09</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="frontier_user_guide.html#known-issues">Known Issues</a><ul> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#open-issues">Open Issues</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#open-issues-w-workaround">Open Issues 
w/Workaround</a></li> <li class="toctree-l4"><a class="reference internal" href="frontier_user_guide.html#resolved-issues">Resolved Issues</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="citadel_user_guide.html">Citadel User Guide</a><ul> <li class="toctree-l3"><a class="reference internal" href="citadel_user_guide.html#what-is-citadel">What is Citadel</a><ul> <li class="toctree-l4"><a class="reference internal" href="citadel_user_guide.html#citadel-spi-documentation">Citadel (SPI) Documentation</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="andes_user_guide.html">Andes User Guide</a><ul> <li class="toctree-l3"><a class="reference internal" href="andes_user_guide.html#system-overview">System Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#compute-nodes">Compute Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#login-nodes">Login Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#file-systems">File Systems</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#lfs-setstripe-wrapper">LFS setstripe Wrapper</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="andes_user_guide.html#shell-and-programming-environments">Shell and Programming Environments</a><ul> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#default-shell">Default shell</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#environment-management-with-lmod">Environment Management with lmod</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#installed-software">Installed Software</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="andes_user_guide.html#compiling">Compiling</a><ul> <li 
class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#available-compilers">Available Compilers</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#changing-compilers">Changing Compilers</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#compiler-wrappers">Compiler Wrappers</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#compiling-threaded-codes">Compiling Threaded Codes</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="andes_user_guide.html#running-jobs">Running Jobs</a><ul> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#login-vs-compute-nodes-on-commodity-clusters">Login vs Compute Nodes on Commodity Clusters</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#slurm">Slurm</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#interactive-batch-jobs-on-commodity-clusters">Interactive Batch Jobs on Commodity Clusters</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#common-batch-options-to-slurm">Common Batch Options to Slurm</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#batch-environment-variables">Batch Environment Variables</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#modifying-batch-jobs">Modifying Batch Jobs</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#monitoring-batch-jobs">Monitoring Batch Jobs</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#job-execution">Job Execution</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#batch-queues-on-andes">Batch Queues on Andes</a></li> <li class="toctree-l4"><a class="reference internal" 
href="andes_user_guide.html#job-accounting-on-andes">Job Accounting on Andes</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="andes_user_guide.html#andes-debugging">Debugging</a><ul> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#linaro-ddt">Linaro DDT</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#gdb">GDB</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#valgrind">Valgrind</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="andes_user_guide.html#visualization-tools">Visualization tools</a><ul> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#paraview">ParaView</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#visit">VisIt</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#remote-visualization-using-vnc-non-gpu">Remote Visualization using VNC (non-GPU)</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#remote-visualization-using-vnc-gpu-nodes">Remote Visualization using VNC (GPU nodes)</a></li> <li class="toctree-l4"><a class="reference internal" href="andes_user_guide.html#remote-visualization-using-nice-dcv-gpu-nodes-only">Remote Visualization using Nice DCV (GPU nodes only)</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="home_user_guide.html">Home</a><ul> <li class="toctree-l3"><a class="reference internal" href="home_user_guide.html#system-overview">System Overview</a></li> <li class="toctree-l3"><a class="reference internal" href="home_user_guide.html#access-connecting">Access & Connecting</a></li> <li class="toctree-l3"><a class="reference internal" href="home_user_guide.html#usage">Usage</a><ul> <li class="toctree-l4"><a class="reference internal" 
href="home_user_guide.html#acceptable-tasks">Acceptable Tasks</a></li> <li class="toctree-l4"><a class="reference internal" href="home_user_guide.html#unacceptable-tasks">Unacceptable Tasks</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="dtn_user_guide.html">Data Transfer Nodes (DTNs)</a><ul> <li class="toctree-l3"><a class="reference internal" href="dtn_user_guide.html#system-overview">System Overview</a></li> <li class="toctree-l3"><a class="reference internal" href="dtn_user_guide.html#interactive-access">Interactive Access</a></li> <li class="toctree-l3"><a class="reference internal" href="dtn_user_guide.html#access-from-globus-online">Access From Globus Online</a></li> <li class="toctree-l3"><a class="reference internal" href="dtn_user_guide.html#batch-queue-slurm">Batch Queue (Slurm)</a><ul> <li class="toctree-l4"><a class="reference internal" href="dtn_user_guide.html#queue-policy">Queue Policy</a></li> <li class="toctree-l4"><a class="reference internal" href="dtn_user_guide.html#submitting-jobs-to-frontier">Submitting jobs to Frontier</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="hpss_user_guide.html">High Performance Storage System</a><ul> <li class="toctree-l3"><a class="reference internal" href="hpss_user_guide.html#system-overview">System Overview</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="odo_user_guide.html">Odo</a><ul> <li class="toctree-l3"><a class="reference internal" href="odo_user_guide.html#system-overview">System Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="odo_user_guide.html#file-systems">File Systems</a></li> <li class="toctree-l4"><a class="reference internal" href="odo_user_guide.html#obtaining-access-to-odo">Obtaining Access to Odo</a></li> <li class="toctree-l4"><a class="reference internal" href="odo_user_guide.html#logging-in-to-odo">Logging In to Odo</a></li> </ul> </li> </ul> 
</li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Spock Quick-Start Guide</a><ul> <li class="toctree-l3"><a class="reference internal" href="#system-overview">System Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="#spock-compute-nodes">Spock Compute Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="#system-interconnect">System Interconnect</a></li> <li class="toctree-l4"><a class="reference internal" href="#file-systems">File Systems</a></li> <li class="toctree-l4"><a class="reference internal" href="#gpus">GPUs</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="#connecting">Connecting</a></li> <li class="toctree-l3"><a class="reference internal" href="#data-and-storage">Data and Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="#nfs">NFS</a></li> <li class="toctree-l4"><a class="reference internal" href="#gpfs">GPFS</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="#programming-environment">Programming Environment</a><ul> <li class="toctree-l4"><a class="reference internal" href="#environment-modules-lmod">Environment Modules (Lmod)</a></li> <li class="toctree-l4"><a class="reference internal" href="#compilers">Compilers</a></li> <li class="toctree-l4"><a class="reference internal" href="#mpi">MPI</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="#compiling">Compiling</a><ul> <li class="toctree-l4"><a class="reference internal" href="#id3">MPI</a></li> <li class="toctree-l4"><a class="reference internal" href="#openmp">OpenMP</a></li> <li class="toctree-l4"><a class="reference internal" href="#openmp-gpu-offload">OpenMP GPU Offload</a></li> <li class="toctree-l4"><a class="reference internal" href="#hip">HIP</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="#running-jobs">Running Jobs</a><ul> <li class="toctree-l4"><a class="reference 
internal" href="#slurm-workload-manager">Slurm Workload Manager</a></li> <li class="toctree-l4"><a class="reference internal" href="#slurm-compute-node-partitions">Slurm Compute Node Partitions</a></li> <li class="toctree-l4"><a class="reference internal" href="#process-and-thread-mapping">Process and Thread Mapping</a></li> <li class="toctree-l4"><a class="reference internal" href="#nvme-usage">NVMe Usage</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="#getting-help">Getting Help</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="crusher_quick_start_guide.html">Crusher Quick-Start Guide</a><ul> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#system-overview">System Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#crusher-compute-nodes">Crusher Compute Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#system-interconnect">System Interconnect</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#file-systems">File Systems</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#gpus">GPUs</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#connecting">Connecting</a></li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#data-and-storage">Data and Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#nfs-filesystem">NFS Filesystem</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#lustre-filesystem">Lustre Filesystem</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#programming-environment">Programming 
Environment</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#environment-modules-lmod">Environment Modules (Lmod)</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#compilers">Compilers</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#mpi">MPI</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#compiling">Compiling</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id3">MPI</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#openmp">OpenMP</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#openmp-gpu-offload">OpenMP GPU Offload</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#hip">HIP</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#hip-openmp-cpu-threading">HIP + OpenMP CPU Threading</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#sycl">SYCL</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#running-jobs">Running Jobs</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#slurm-workload-manager">Slurm Workload Manager</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#slurm-compute-node-partitions">Slurm Compute Node Partitions</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#process-and-thread-mapping">Process and Thread Mapping</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#nvme-usage">NVMe Usage</a></li> <li 
class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#tips-for-launching-at-scale">Tips for Launching at Scale</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#profiling-applications">Profiling Applications</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#getting-started-with-the-hpe-performance-analysis-tools-pat">Getting Started with the HPE Performance Analysis Tools (PAT)</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#getting-started-with-hpctoolkit">Getting Started with HPCToolkit</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#getting-started-with-the-rocm-profiler">Getting Started with the ROCm Profiler</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#roofline-profiling-with-the-rocm-profiler">Roofline Profiling with the ROCm Profiler</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#omnitrace">Omnitrace</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#omniperf">Omniperf</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#notable-differences-between-summit-and-crusher">Notable Differences between Summit and Crusher</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#using-reduced-precision-fp16-and-bf16-datatypes">Using reduced precision (FP16 and BF16 datatypes)</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#enabling-gpu-page-migration">Enabling GPU Page Migration</a></li> <li class="toctree-l4"><a class="reference internal" 
href="crusher_quick_start_guide.html#floating-point-fp-atomic-operations-and-coarse-fine-grained-memory-allocations">Floating-Point (FP) Atomic Operations and Coarse/Fine Grained Memory Allocations</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#performance-considerations-for-lds-fp-atomicadd">Performance considerations for LDS FP atomicAdd()</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#system-updates">System Updates</a><ul> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id6">2024-03-19</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id7">2024-01-23</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id8">2023-12-05</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id9">2023-10-03</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id10">2023-09-19</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id11">2023-07-18</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id12">2023-04-05</a></li> <li class="toctree-l4"><a class="reference internal" href="crusher_quick_start_guide.html#id13">2022-12-29</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="crusher_quick_start_guide.html#getting-help">Getting Help</a></li> </ul> </li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../services_and_applications/index.html">Services and Applications</a><ul> <li class="toctree-l2"><a class="reference internal" href="../services_and_applications/slate/index.html">Slate</a><ul> <li class="toctree-l3"><a class="reference internal" 
href="../services_and_applications/slate/overview.html">Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/overview.html#what-is-slate">What is Slate?</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/overview.html#what-is-kubernetes">What is Kubernetes?</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/overview.html#what-is-openshift">What is OpenShift?</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/getting_started.html">Getting Started</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/getting_started.html#requesting-a-slate-project-allocation">Requesting A Slate Project Allocation</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/getting_started.html#logging-in">Logging in</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/getting_started.html#slate-namespaces">Slate Namespaces</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/getting_started.html#install-the-oc-tool">Install the OC tool</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/getting_started.html#test-login-with-oc-tool">Test login with OC Tool</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/guided_tutorial.html">Guided Tutorial</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/guided_tutorial.html#creating-your-project">Creating your project</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/guided_tutorial.html#guided-web-gui-tutorial">Guided 
Web GUI Tutorial</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/guided_tutorial_cli.html">Guided Tutorial: CLI</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/guided_tutorial_cli.html#adding-a-pod-to-your-project">Adding a Pod to your Project</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/image_building.html">Image Building</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/image_building.html#build-types">Build Types</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/image_building.html#examples">Examples</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/image_building.html#logging-into-the-registry-externally">Logging into the Registry Externally</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/workloads/index.html">Workloads</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/workloads/pod.html">Pods</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/workloads/deployment.html">Deployments</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/networking/index.html">Networking</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/networking/services.html">Services</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/networking/nodeport.html">NodePorts</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/networking/route.html">Routes</a></li> <li 
class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/networking/networkpolicy.html">Network Policies</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/networking/port_forwarding.html">Quick Access from Outside Slate</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/storage.html">Persistent Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/storage.html#creating-a-persistent-volume-claim">Creating A Persistent Volume Claim</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/storage.html#adding-pvc-to-pod">Adding PVC To Pod</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/storage.html#backups">Backups</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/workflows/index.html">Workflows</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/workflows/overview.html">Workflows Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/workflows/openshift_gitops.html">OpenShift GitOps</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/workflows/openshift_pipelines.html">OpenShift Pipelines</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/use_cases/index.html">Application Deployment Examples</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/use_cases/simple_website.html">Build and Deploy Simple Website</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../services_and_applications/slate/use_cases/mongodb_service.html">Deploy MongoDB</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/use_cases/nginx_hello_world.html">Deploy NGINX with Hello World</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/use_cases/gitlab_runner.html">GitLab Runners</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/use_cases/helm_example.html">Deploy Packages with Helm</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/use_cases/helm_prerequisite.html">Helm Prerequisites</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/use_cases/minio.html">MinIO Object Store (On an NCCS Filesystem)</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/access_olcf_resources/index.html">Access OLCF Resources From Containers</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/access_olcf_resources/job_submit.html">Batch Job Submission</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/access_olcf_resources/mount_fs.html">Mount OLCF Filesystems</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/other_resources.html">Schedule Other Slate Resources</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/other_resources.html#gpus">GPUs</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/olcf_provided_applications/index.html">OLCF-Provided Applications on Slate</a><ul class="simple"> </ul> </li> <li class="toctree-l3"><a class="reference internal" 
href="../services_and_applications/slate/troubleshooting/index.html">Troubleshooting</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/troubleshooting/fix-writable-directories.html">Fix Container Image Permissions</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/troubleshooting/debugging.html">Debugging</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/examples.html">YAML Object Quick Reference</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/examples.html#cronjobs">CronJobs</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/examples.html#deployments-and-stateful-sets">Deployments and Stateful Sets</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/examples.html#pods">Pods</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/examples.html#roles-and-rolebindings">Roles and Rolebindings</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/examples.html#routes-services-and-nodeports">Routes, Services and Nodeports</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/slate/examples.html#persistent-volume-claims">Persistent Volume Claims</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/slate/glossary.html">Glossary</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../services_and_applications/myolcf/index.html">myOLCF</a><ul> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/myolcf/overview.html">Overview</a><ul> <li class="toctree-l4"><a class="reference internal" 
href="../services_and_applications/myolcf/overview.html#what-is-myolcf">What is myOLCF?</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/overview.html#what-can-it-do">What can it do?</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/overview.html#can-i-suggest-a-feature">Can I suggest a feature?</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/myolcf/authenticating.html">Authenticating</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/authenticating.html#olcf-moderate-accounts">OLCF Moderate Accounts</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/authenticating.html#olcf-open-accounts">OLCF Open Accounts</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/myolcf/project_pages/project_pages.html">Project Pages</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/project_pages/project_pages.html#project-context">Project Context</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/project_pages/project_pages.html#switching-project-contexts">Switching Project Contexts</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/project_pages/project_pages.html#available-pages">Available Pages</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/myolcf/account_pages/account_pages.html">Account Pages</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/myolcf/account_pages/account_pages.html#account-context">Account Context</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../services_and_applications/myolcf/account_pages/account_pages.html#available-pages">Available Pages</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/myolcf/account_pages/processing_membership_requests.html">Processing Project Membership Requests</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../services_and_applications/jupyter/index.html">Jupyter</a><ul> <li class="toctree-l3"><a class="reference internal" href="../services_and_applications/jupyter/overview.html">Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#jupyter-at-olcf">Jupyter at OLCF</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#access">Access</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#cpu-vs-gpu-jupyterlab-available-resources">CPU vs. 
GPU JupyterLab (Available Resources)</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#working-within-lustre-and-nfs-launching-a-notebook">Working within Lustre and NFS (Launching a Notebook)</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#conda-environments-and-custom-notebooks">Conda Environments and Custom Notebooks</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#manually-stopping-your-jupyterlab-session">Manually Stopping Your JupyterLab Session</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#things-to-be-aware-of">Things to Be Aware Of</a></li> <li class="toctree-l4"><a class="reference internal" href="../services_and_applications/jupyter/overview.html#example-jupyter-notebooks">Example Jupyter Notebooks</a></li> </ul> </li> </ul> </li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../data/index.html">Data Storage and Transfers</a><ul> <li class="toctree-l2"><a class="reference internal" href="../data/index.html#summary-of-storage-areas">Summary of Storage Areas</a><ul> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#notes-on-user-centric-data-storage">Notes on User-Centric Data Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#user-home-directories-nfs">User Home Directories (NFS)</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#notes-on-project-centric-data-storage">Notes on Project-Centric Data Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#project-home-directories-nfs">Project Home Directories (NFS)</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../data/index.html#project-work-areas">Project Work Areas</a></li> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#project-archive-directories">Project Archive Directories</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../data/index.html#data-policies">Data Policies</a><ul> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#information">Information</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#special-requests">Special Requests</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#data-retention">Data Retention</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../data/index.html#orion-lustre-hpe-clusterstor-filesystem">Orion Lustre HPE ClusterStor Filesystem</a><ul> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#orion-performance-tiers-and-file-striping-policy">Orion Performance Tiers and File Striping Policy</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#i-o-patterns-that-benefit-from-file-striping">I/O Patterns that Benefit from File Striping</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#lfs-setstripe-wrapper">LFS setstripe wrapper</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#lustre-file-locking-tips">Lustre File Locking Tips</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#darshan-runtime-and-i-o-profiling">Darshan-runtime and I/O Profiling</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#purge">Purge</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#major-difference-between-lustre-hpe-clusterstor-and-ibm-spectrum-scale">Major difference between Lustre HPE ClusterStor and IBM Spectrum 
Scale</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../data/index.html#hpss-data-archival-system">HPSS Data Archival System</a></li> <li class="toctree-l2"><a class="reference internal" href="../data/index.html#kronos-nearline-archival-storage-system">Kronos Nearline Archival Storage System</a><ul> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#access-data-transfer">Access / Data Transfer</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#directory-structure">Directory Structure</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#project-quotas">Project Quotas</a></li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#kronos-and-hpss-comparison">Kronos and HPSS Comparison</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../data/index.html#transferring-data">Transferring Data</a><ul> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#data-transferring-data-globus">Globus</a><ul> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#using-globus-to-move-data-between-collections">Using Globus to Move Data Between Collections</a></li> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#using-globus-from-your-local-workstation">Using Globus From Your Local Workstation</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#hsi">HSI</a><ul> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#additional-hsi-documentation">Additional HSI Documentation</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#htar">HTAR</a><ul> <li class="toctree-l4"><a class="reference internal" href="../data/index.html#htar-limitations">HTAR Limitations</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../data/index.html#additional-htar-documentation">Additional HTAR Documentation</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../data/index.html#command-line-terminal-tools">Command-Line/Terminal Tools</a></li> </ul> </li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../software/index.html">Software</a><ul> <li class="toctree-l2"><a class="reference internal" href="../software/software-news.html">Software News</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/software-news.html#frontier-system-software-update-february-18-2025">Frontier: System Software Update (February 18, 2025)</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/software-news.html#frontier-updated-modules-for-cpe-23-12-october-16-2024">Frontier: Updated Modules for cpe/23.12 (October 16 2024)</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/software-news.html#frontier-core-module-october-15-2024">Frontier: Core Module (October 15, 2024)</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/software-news.html#frontier-system-software-update-july-16-2024">Frontier: System Software Update (July 16, 2024)</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/software-news.html#frontier-user-environment-changes-july-9-2024">Frontier: User Environment Changes (July 9, 2024)</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/analytics/index.html">ML/DL & Data Analytics</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/analytics/apache-spark.html">Apache Spark</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/apache-spark.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/apache-spark.html#getting-started">Getting 
Started</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/analytics/pytorch_frontier.html">PyTorch on Frontier</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html#table-of-contents">Table of Contents:</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html#installing-pytorch">Installing PyTorch</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html#best-practices">Best Practices</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html#pytorch-geometric">PyTorch Geometric</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html#troubleshooting">Troubleshooting</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html#additional-resources">Additional Resources</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/analytics/jax.html">JAX</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/jax.html#setting-up-the-environment">Setting up the environment</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/jax.html#installing-jax">Installing JAX</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/jax.html#testing-jax">Testing JAX</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/jax.html#additional-resources">Additional Resources</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/python/index.html">Python on OLCF Systems</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/python/index.html#overview">Overview</a></li> <li 
class="toctree-l3"><a class="reference internal" href="../software/python/index.html#olcf-python-guides">OLCF Python Guides</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/python/conda_basics.html">Conda Basics</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/python/parallel_h5py.html">Installing mpi4py and h5py</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/python/cupy.html">Installing CuPy</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/python/sbcast_conda.html">Sbcast Conda Environments</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/python/jupyter_envs.html">Jupyter Visibility</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/pytorch_frontier.html">PyTorch on Frontier</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/analytics/jax.html">JAX</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/python/index.html#module-usage">Module Usage</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/python/index.html#base-environment">Base Environment</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/python/index.html#custom-environments">Custom Environments</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/python/index.html#how-to-run">How to Run</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/python/index.html#frontier-andes">Frontier / Andes</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/python/index.html#best-practices">Best Practices</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/python/index.html#additional-resources">Additional Resources</a></li> </ul> </li> <li class="toctree-l2"><a 
class="reference internal" href="../software/profiling/index.html">Profiling Tools</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/profiling/Scorep.html">Score-P</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#usage">Usage</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#instrumentation">Instrumentation</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#measurement">Measurement</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#profiling">Profiling</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#tracing">Tracing</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#manual-instrumentation">Manual Instrumentation</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/Scorep.html#score-p-demo-video">Score-P Demo Video</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/profiling/TAU.html">Tuning and Analysis Utilities (TAU)</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#enabling-tau-at-olcf">Enabling TAU at OLCF</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#profile-and-trace-using-tau-exec-exe">Profile and trace using “tau_exec exe”</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#automatic-source-instrumentation-using-compiler-wrappers">Automatic source instrumentation using compiler wrappers</a></li> <li class="toctree-l4"><a class="reference internal" 
href="../software/profiling/TAU.html#selective-instrumentation">Selective Instrumentation</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#manual-source-instrumentation">Manual source instrumentation</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#run-time-environment-variables">Run-Time Environment Variables</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#compile-time-environment-variables">Compile-Time Environment Variables</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/profiling/TAU.html#references">References</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/UMS/index.html">User-Managed Software</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/UMS/index.html#introduction">Introduction</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/UMS/index.html#currently-available-user-managed-software">Currently Available User-Managed Software</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/UMS/index.html#usage">Usage</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/UMS/index.html#policies">Policies</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/UMS/index.html#writing-ums-modulefiles">Writing UMS Modulefiles</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/workflows/index.html">Workflows</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/workflows/index.html#running-workflows-on-olcf-resources">Running Workflows on OLCF Resources</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/workflows/index.html#workflow-systems">Workflow Systems</a><ul> <li class="toctree-l4"><a class="reference 
internal" href="../software/workflows/entk.html">Ensemble Toolkit (EnTK)</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/workflows/parsl.html">Parsl</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/workflows/swift_t.html">Swift/T</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/spack_environments.html">Spack Environments</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/spack_environments.html#purpose">Purpose</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/spack_environments.html#getting-started">Getting Started</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/spack_environments.html#add-dependencies-to-the-environment">Add Dependencies to the environment</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/spack_environments.html#adding-olcf-modulefiles-as-external-packages">Adding OLCF Modulefiles as External Packages</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/spack_environments.html#adding-user-defined-dependencies-to-the-environment">Adding User-Defined Dependencies to the environment</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/spack_environments.html#installing-the-environment">Installing the Environment</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/spack_environments.html#more-details">More Details</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/spack_environments.html#references">References</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/viz_tools/index.html">Visualization Tools</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/viz_tools/visit.html">VisIt</a><ul> <li class="toctree-l4"><a class="reference 
internal" href="../software/viz_tools/visit.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/visit.html#installing-and-setting-up-visit">Installing and Setting Up Visit</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/visit.html#remote-gui-usage">Remote GUI Usage</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/visit.html#command-line-example">Command Line Example</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/visit.html#troubleshooting">Troubleshooting</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/visit.html#additional-resources">Additional Resources</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/viz_tools/paraview.html">ParaView</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/paraview.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/paraview.html#installing-and-setting-up-paraview">Installing and Setting Up ParaView</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/paraview.html#remote-gui-usage">Remote GUI Usage</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/paraview.html#command-line-example">Command Line Example</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/paraview.html#troubleshooting">Troubleshooting</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/viz_tools/paraview.html#additional-resources">Additional Resources</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/containers_on_frontier.html">Containers on Frontier</a><ul> <li class="toctree-l3"><a class="reference 
internal" href="../software/containers_on_frontier.html#examples-for-building-and-running-containers">Examples for Building and Running Containers</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/containers_on_frontier.html#building-and-running-a-container-image-from-a-base-linux-distribution-for-mpi">Building and running a container image from a base Linux distribution for MPI</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/containers_on_frontier.html#pushing-your-apptainer-image-to-an-oci-registry-supporting-oras-e-g-dockerhub">Pushing your Apptainer image to an OCI Registry supporting ORAS (e.g. DockerHub)</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/containers_on_frontier.html#building-an-image-on-top-of-an-existing-image-local-docker-image-oci-artifact">Building an image on top of an existing image (local, docker image, OCI artifact)</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/containers_on_frontier.html#olcf-base-images-apptainer-modules">OLCF Base Images & Apptainer Modules</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/containers_on_frontier.html#base-images">Base Images</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/containers_on_frontier.html#apptainer-modules">Apptainer Modules</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/containers_on_frontier.html#example-workflow">Example Workflow</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/containers_on_frontier.html#some-restrictions-and-tips">Some Restrictions and Tips</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../software/debugging/index.html">Debugging</a><ul> <li class="toctree-l3"><a class="reference internal" href="../software/debugging/index.html#linaro-forge-ddt">Linaro Forge 
DDT</a><ul> <li class="toctree-l4"><a class="reference internal" href="../software/debugging/index.html#client-setup-and-usage">Client Setup and Usage</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/debugging/index.html#download">Download</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/debugging/index.html#installation">Installation</a></li> <li class="toctree-l4"><a class="reference internal" href="../software/debugging/index.html#configuration">Configuration</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../software/debugging/index.html#gnu-gdb">GNU GDB</a></li> <li class="toctree-l3"><a class="reference internal" href="../software/debugging/index.html#valgrind">Valgrind</a></li> </ul> </li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../training/index.html">Training</a><ul> <li class="toctree-l2"><a class="reference external" href="https://www.olcf.ornl.gov/for-users/training/training-calendar" target="_blank">OLCF Training Calendar</a></li> <li class="toctree-l2"><a class="reference external" href="https://github.com/olcf-tutorials" target="_blank">OLCF Tutorials</a></li> <li class="toctree-l2"><a class="reference internal" href="../training/training_archive.html">OLCF Training Archive</a></li> <li class="toctree-l2"><a class="reference internal" href="../training/olcf_gpu_hackathons.html">OLCF GPU Hackathons</a></li> <li class="toctree-l2"><a class="reference internal" href="../training/olcf_gpu_hackathons.html#facultyhack">FacultyHack</a></li> <li class="toctree-l2"><a class="reference external" href="https://vimeo.com/channels/olcftraining" target="_blank">OLCF Vimeo Channel</a></li> <li class="toctree-l2"><a class="reference internal" href="../training/index.html#new-user-quick-start">New User Quick Start</a></li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" 
href="../quantum/index.html">Quantum</a><ul> <li class="toctree-l2"><a class="reference internal" href="../quantum/quantum_access.html">Quantum Computing User Program (QCUP) Access</a><ul> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_access.html#qcup-priorities">QCUP Priorities</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_access.html#project-allocations">Project Allocations</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#what-happens-after-a-project-request-is-approved">What happens after a project request is approved?</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#project-renewals">Project Renewals</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#closeout-and-quarterly-reports">Closeout and Quarterly Reports</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_access.html#user-accounts">User Accounts</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#checking-the-status-of-your-application">Checking the status of your application</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_access.html#accessing-quantum-resources">Accessing Quantum Resources</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#ibm-quantum-computing">IBM Quantum Computing</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#quantinuum">Quantinuum</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#ionq">IonQ</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_access.html#iqm">IQM</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" 
href="../quantum/quantum_access.html#publication-citations">Publication Citations</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../quantum/quantum_systems/index.html">Quantum Systems</a><ul> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html">IBM Quantum</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html#connecting">Connecting</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html#running-jobs-queue-policies">Running Jobs & Queue Policies</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html#checking-system-availability-capability">Checking System Availability & Capability</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html#software">Software</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ibm_quantum.html#additional-resources">Additional Resources</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html">Quantinuum</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html#connecting">Connecting</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html#running-jobs-queue-policies">Running Jobs & Queue Policies</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html#default-quotas">Default Quotas</a></li> <li 
class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html#allocations-credit-usage">Allocations & Credit Usage</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/quantinuum.html#software">Software</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_systems/ionq.html">IonQ</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#ionq-systems">IonQ systems</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#connecting">Connecting</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#running-jobs-queue-policies">Running Jobs & Queue Policies</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#allocations-credit-usage">Allocations & Credit Usage</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#checking-system-availability-capability">Checking System Availability & Capability</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/ionq.html#additional-resources">Additional Resources</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_systems/iqm.html">IQM</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/iqm.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/iqm.html#connecting">Connecting</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/iqm.html#running-jobs-queue-policies">Running Jobs & Queue Policies</a></li> <li 
class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/iqm.html#checking-system-availability">Checking System Availability</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/iqm.html#software">Software</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_systems/iqm.html#additional-resources">Additional Resources</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../quantum/quantum_software/index.html">Quantum Software</a><ul> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html">Quantum Software on HPC Systems</a><ul> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#overview">Overview</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#qiskit">Qiskit</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#pyquil-forest-sdk-rigetti">PyQuil/Forest SDK (Rigetti)</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#pennylane">PennyLane</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#pytket">Pytket</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#cuda-q">CUDA-Q</a></li> <li class="toctree-l4"><a class="reference internal" href="../quantum/quantum_software/hybrid_hpc.html#batch-jobs">Batch Jobs</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../quantum/hello_qcup.html">Hello QCUP Scripts</a><ul> <li class="toctree-l3"><a class="reference internal" href="../quantum/hello_qcup.html#overview">Overview</a></li> <li class="toctree-l3"><a class="reference internal" 
href="../quantum/hello_qcup.html#ibm-quantum">IBM Quantum</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/hello_qcup.html#quantinuum">Quantinuum</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/hello_qcup.html#ionq">IonQ</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/hello_qcup.html#iqm">IQM</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../quantum/quantum_faq.html">Frequently Asked Questions</a><ul> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_faq.html#how-do-quantum-computers-differ-from-classical-computers">How do quantum computers differ from classical computers?</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_faq.html#what-is-a-qubit">What is a qubit?</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_faq.html#how-do-i-access-the-olcf-quantum-computing-resources">How do I access the OLCF quantum computing resources?</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_faq.html#what-happens-after-i-apply-for-access-to-qcup">What happens after I apply for access to QCUP?</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_faq.html#i-formerly-had-access-to-quantum-resources-but-my-backends-lattices-etc-have-disappeared-what-do-i-do">I formerly had access to quantum resources, but my backends/lattices/etc. 
have disappeared, what do I do?</a></li> <li class="toctree-l3"><a class="reference internal" href="../quantum/quantum_faq.html#i-applied-to-a-quantum-computing-resource-via-the-vendor-website-but-dont-have-access-what-do-i-do">I applied to a quantum computing resource via the vendor website, but don’t have access; what do I do?</a></li> </ul> </li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../spi/index.html">Scalable Protected Infrastructure (SPI)</a><ul> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#what-is-spi">What is SPI</a><ul> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#what-is-citadel">What is Citadel</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#new-user-quickstart">New User QuickStart</a><ul> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#notable-differences">Notable Differences</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#allocations-and-user-accounts">Allocations and User Accounts</a><ul> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#allocations-projects">Allocations (Projects)</a><ul> <li class="toctree-l4"><a class="reference internal" href="../spi/index.html#requesting-a-new-allocation-project">Requesting a New Allocation (Project)</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#user-accounts">User Accounts</a><ul> <li class="toctree-l4"><a class="reference internal" href="../spi/index.html#requesting-a-new-user-account">Requesting a New User Account</a></li> </ul> </li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#available-resources">Available Resources</a><ul> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#compute"><span class="xref std 
std-ref">Compute</span></a></li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#file-systems"><span class="xref std std-ref">File Systems</span></a></li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#data-transfer"><span class="xref std std-ref">Data Transfer</span></a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#ip-whitelisting">IP Whitelisting</a><ul> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#whitelisting-an-ip-or-range">Whitelisting an IP or range</a></li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#finding-your-ip">Finding your IP</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#citadel">Citadel</a><ul> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#login-nodes">Login Nodes</a></li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#connecting">Connecting</a></li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#building-software">Building Software</a><ul> <li class="toctree-l4"><a class="reference internal" href="../spi/index.html#external-repositories">External Repositories</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../spi/index.html#running-batch-jobs">Running Batch Jobs</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#spi-file-systems">File Systems</a></li> <li class="toctree-l2"><a class="reference internal" href="../spi/index.html#spi-data-transfer">Data Transfer</a></li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../ace_testbed/index.html">Advanced Computing Ecosystem Testbed (ACE)</a><ul> <li class="toctree-l2"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html">Defiant Quick-Start 
Guide</a><ul> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#system-overview">System Overview</a><ul> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#defiant-compute-nodes">Defiant Compute Nodes</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#system-interconnect">System Interconnect</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#file-systems">File Systems</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#gpus">GPUs</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#connecting">Connecting</a></li> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#data-and-storage">Data and Storage</a><ul> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#nfs-filesystem">NFS Filesystem</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#lustre-filesystem-polis">Lustre Filesystem (Polis)</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#programming-environment">Programming Environment</a><ul> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#environment-modules-lmod">Environment Modules (Lmod)</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#compilers">Compilers</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#mpi">MPI</a></li> </ul> </li> <li class="toctree-l3"><a 
class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#compiling">Compiling</a><ul> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#id3">MPI</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#openmp">OpenMP</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#openmp-gpu-offload">OpenMP GPU Offload</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#hip">HIP</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#running-jobs">Running Jobs</a><ul> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#slurm-workload-manager">Slurm Workload Manager</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#slurm-compute-node-partitions">Slurm Compute Node Partitions</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#process-and-thread-mapping">Process and Thread Mapping</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#nvme-usage">NVMe Usage</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#container-usage">Container Usage</a><ul> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#setup-before-building">Setup before Building</a></li> <li class="toctree-l4"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#build-and-run-workflow">Build and Run Workflow</a></li> </ul> </li> <li class="toctree-l3"><a class="reference internal" 
href="../ace_testbed/defiant_quick_start_guide.html#getting-help">Getting Help</a></li> <li class="toctree-l3"><a class="reference internal" href="../ace_testbed/defiant_quick_start_guide.html#known-issues">Known Issues</a></li> </ul> </li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing to these docs</a><ul> <li class="toctree-l2"><a class="reference internal" href="../contributing/index.html#submitting-suggestions">Submitting suggestions</a></li> <li class="toctree-l2"><a class="reference internal" href="../contributing/index.html#authoring-content">Authoring content</a><ul> <li class="toctree-l3"><a class="reference internal" href="../contributing/index.html#setup-authoring-environment">Setup authoring environment</a></li> <li class="toctree-l3"><a class="reference internal" href="../contributing/index.html#edit-the-docs">Edit the docs</a></li> <li class="toctree-l3"><a class="reference internal" href="../contributing/index.html#resources">Resources</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../contributing/index.html#github-guidelines">GitHub Guidelines</a></li> </ul> </li> </ul> </div> </div> </nav> <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" style="background: #efefef" > <i data-toggle="wy-nav-top" class="fa fa-bars"></i> <a href="../index.html">OLCF User Documentation</a> </nav> <div class="wy-nav-content"> <div class="rst-content style-external-links"> <div role="navigation" aria-label="Page navigation"> <ul class="wy-breadcrumbs"> <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li> <li class="breadcrumb-item"><a href="index.html">Systems</a></li> <li class="breadcrumb-item active">Spock Quick-Start Guide</li> <li class="wy-breadcrumbs-aside"> <a href="https://github.com/olcf/olcf-user-docs/blob/master/systems/spock_quick_start_guide.rst" 
class="fa fa-github"> Edit on GitHub</a> </li> </ul> <hr/> </div> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div itemprop="articleBody"> <section id="spock-quick-start-guide"> <span id="id1"></span><h1>Spock Quick-Start Guide<a class="headerlink" href="#spock-quick-start-guide" title="Link to this heading"></a></h1> <div class="admonition warning"> <p class="admonition-title">Warning</p> <p><strong>The Spock Early Access System was decommissioned on March 15, 2023.</strong> The file systems that were available on Spock are still accessible from the Home server and the Data Transfer Nodes (DTN), so all your data will remain accessible. If you do not have access to other OLCF systems, your project will move to data-only for 30 days. If you have any questions, please contact <a class="reference external" href="mailto:help%40olcf.ornl.gov" target="_blank">help<span>@</span>olcf<span>.</span>ornl<span>.</span>gov</a>.</p> </div> <section id="system-overview"> <span id="spock-system-overview"></span><h2>System Overview<a class="headerlink" href="#system-overview" title="Link to this heading"></a></h2> <p>Spock is an NCCS moderate-security system that contains hardware and software similar to those of the upcoming Frontier system. It is used as an early-access testbed for Center for Accelerated Application Readiness (CAAR) and Exascale Computing Project (ECP) teams as well as NCCS staff and our vendor partners. The system has 3 cabinets, each containing 12 compute nodes, for a total of 36 compute nodes.</p> <section id="spock-compute-nodes"> <span id="id2"></span><h3>Spock Compute Nodes<a class="headerlink" href="#spock-compute-nodes" title="Link to this heading"></a></h3> <p>Each Spock compute node consists of [1x] 64-core AMD EPYC 7662 “Rome” CPU (with 2 hardware threads per physical core) with access to 256 GB of DDR4 memory and connected to [4x] AMD MI100 GPUs. 
The CPU is connected to all GPUs via PCIe Gen4, allowing peak host-to-device (H2D) and device-to-host (D2H) data transfers of 32+32 GB/s. The GPUs are connected in an all-to-all arrangement via Infinity Fabric (xGMI), allowing for a peak device-to-device bandwidth of 46+46 GB/s. Each compute node also has [2x] 3.2 TB NVMe devices (SSDs) with sequential read and write speeds of 6900 MB/s and 4200 MB/s, respectively.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>The X+X GB/s bandwidth values above are bi-directional. For example, the Infinity Fabric connecting any two GPUs allows peak data transfers of 46 GB/s <em>in both directions simultaneously</em>.</p> </div> <a class="reference internal image-reference" href="../_images/Spock_Node.jpg"><img alt="Spock node architecture diagram" class="align-center" src="../_images/Spock_Node.jpg" style="width: 100%;" /> </a> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Each node has 4 NUMA domains, defined as follows:</p> <ul class="simple"> <li><p>NUMA 0: hardware threads 000-015, 064-079 | GPU 0</p></li> <li><p>NUMA 1: hardware threads 016-031, 080-095 | GPU 1</p></li> <li><p>NUMA 2: hardware threads 032-047, 096-111 | GPU 2</p></li> <li><p>NUMA 3: hardware threads 048-063, 112-127 | GPU 3</p></li> </ul> </div> </section> <section id="system-interconnect"> <h3>System Interconnect<a class="headerlink" href="#system-interconnect" title="Link to this heading"></a></h3> <p>The Spock nodes are connected with Slingshot-10, providing a node injection bandwidth of 12.5 GB/s.</p> </section> <section id="file-systems"> <h3>File Systems<a class="headerlink" href="#file-systems" title="Link to this heading"></a></h3> <p>Spock is connected to an IBM Spectrum Scale™ filesystem providing 250 PB of storage capacity with a peak write speed of 2.5 TB/s. 
Spock also has access to the center-wide NFS-based filesystem (which provides user and project home areas). While Spock does not have <em>direct</em> access to the center’s Nearline archival storage system (Kronos), which provides user and project archival storage, users can log in to the moderate DTNs to move data to/from Kronos or use the “OLCF Kronos” Globus collection. For more information on using Kronos, see the <a class="reference internal" href="../data/index.html#kronos"><span class="std std-ref">Kronos Nearline Archival Storage System</span></a> section.</p> </section> <section id="gpus"> <h3>GPUs<a class="headerlink" href="#gpus" title="Link to this heading"></a></h3> <p>Spock contains a total of 144 AMD MI100 GPUs. The AMD MI100 GPU has a peak performance of up to 11.5 TFLOPS in double-precision for modeling &amp; simulation and up to 184.6 TFLOPS in half-precision for machine learning and data analytics. Each GPU contains 120 compute units (7680 stream processors) and 32 GB of high-bandwidth memory (HBM2), which can be accessed at speeds of up to 1.2 TB/s.</p> </section> </section> <hr class="docutils" /> <section id="connecting"> <h2>Connecting<a class="headerlink" href="#connecting" title="Link to this heading"></a></h2> <p>To connect to Spock, <code class="docutils literal notranslate"><span class="pre">ssh</span></code> to <code class="docutils literal notranslate"><span class="pre">spock.olcf.ornl.gov</span></code>. 
For example:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ssh<span class="w"> </span>username@spock.olcf.ornl.gov </pre></div> </div> <p>For more information on connecting to OLCF resources, see <a class="reference internal" href="../connecting/index.html#connecting-to-olcf"><span class="std std-ref">Connecting for the first time</span></a>.</p> </section> <hr class="docutils" /> <section id="data-and-storage"> <h2>Data and Storage<a class="headerlink" href="#data-and-storage" title="Link to this heading"></a></h2> <p>For detailed information about the center-wide file systems and data archiving available on Spock, refer to the pages on <a class="reference internal" href="../data/index.html#data-storage-and-transfers"><span class="std std-ref">Data Storage and Transfers</span></a>. The two subsections below give a quick overview of the NFS and GPFS storage spaces.</p> <section id="nfs"> <h3>NFS<a class="headerlink" href="#nfs" title="Link to this heading"></a></h3> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Area</p></th> <th class="head"><p>Path</p></th> <th class="head"><p>Type</p></th> <th class="head"><p>Permissions</p></th> <th class="head"><p>Quota</p></th> <th class="head"><p>Backups</p></th> <th class="head"><p>Purged</p></th> <th class="head"><p>Retention</p></th> <th class="head"><p>On Compute Nodes</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p>User Home</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">/ccs/home/[userid]</span></code></p></td> <td><p>NFS</p></td> <td><p>User set</p></td> <td><p>50 GB</p></td> <td><p>Yes</p></td> <td><p>No</p></td> <td><p>90 days</p></td> <td><p>Read-only</p></td> </tr> <tr class="row-odd"><td><p>Project Home</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">/ccs/proj/[projid]</span></code></p></td> <td><p>NFS</p></td> <td><p>770</p></td> 
<td><p>50 GB</p></td> <td><p>Yes</p></td> <td><p>No</p></td> <td><p>90 days</p></td> <td><p>Read-only</p></td> </tr> </tbody> </table> </section> <section id="gpfs"> <h3>GPFS<a class="headerlink" href="#gpfs" title="Link to this heading"></a></h3> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Area</p></th> <th class="head"><p>Path</p></th> <th class="head"><p>Type</p></th> <th class="head"><p>Permissions</p></th> <th class="head"><p>Quota</p></th> <th class="head"><p>Backups</p></th> <th class="head"><p>Purged</p></th> <th class="head"><p>Retention</p></th> <th class="head"><p>On Compute Nodes</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p>Member Work</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">/gpfs/alpine/[projid]/scratch/[userid]</span></code></p></td> <td><p>Spectrum Scale</p></td> <td><p>700</p></td> <td><p>50 TB</p></td> <td><p>No</p></td> <td><p>90 days</p></td> <td><p>N/A</p></td> <td><p>Yes</p></td> </tr> <tr class="row-odd"><td><p>Project Work</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">/gpfs/alpine/[projid]/proj-shared</span></code></p></td> <td><p>Spectrum Scale</p></td> <td><p>770</p></td> <td><p>50 TB</p></td> <td><p>No</p></td> <td><p>90 days</p></td> <td><p>N/A</p></td> <td><p>Yes</p></td> </tr> <tr class="row-even"><td><p>World Work</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">/gpfs/alpine/[projid]/world-shared</span></code></p></td> <td><p>Spectrum Scale</p></td> <td><p>775</p></td> <td><p>50 TB</p></td> <td><p>No</p></td> <td><p>90 days</p></td> <td><p>N/A</p></td> <td><p>Yes</p></td> </tr> </tbody> </table> </section> </section> <hr class="docutils" /> <section id="programming-environment"> <h2>Programming Environment<a class="headerlink" href="#programming-environment" title="Link to this heading"></a></h2> <p>OLCF provides Spock users many pre-installed software packages and scientific 
libraries. To facilitate this, environment management tools are used to handle necessary changes to the shell.</p> <section id="environment-modules-lmod"> <h3>Environment Modules (Lmod)<a class="headerlink" href="#environment-modules-lmod" title="Link to this heading"></a></h3> <p>Environment modules are provided through <a class="reference external" href="https://lmod.readthedocs.io/en/latest/" target="_blank">Lmod</a>, a Lua-based module system for dynamically altering shell environments. By managing changes to the shell’s environment variables (such as <code class="docutils literal notranslate"><span class="pre">PATH</span></code>, <code class="docutils literal notranslate"><span class="pre">LD_LIBRARY_PATH</span></code>, and <code class="docutils literal notranslate"><span class="pre">PKG_CONFIG_PATH</span></code>), Lmod allows you to alter the software available in your shell environment without the risk of creating package and version combinations that cannot coexist in a single environment.</p> <section id="general-usage"> <h4>General Usage<a class="headerlink" href="#general-usage" title="Link to this heading"></a></h4> <p>The interface to Lmod is provided by the <code class="docutils literal notranslate"><span class="pre">module</span></code> command:</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Command</p></th> <th class="head"><p>Description</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">-t</span> <span class="pre">list</span></code></p></td> <td><p>Shows a terse list of the currently loaded modules</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">avail</span></code></p></td> <td><p>Shows a table of the currently available modules</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">module</span> <span class="pre">help</span> <span class="pre">&lt;modulename&gt;</span></code></p></td> <td><p>Shows help information about <code class="docutils literal notranslate"><span class="pre">&lt;modulename&gt;</span></code></p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">show</span> <span class="pre">&lt;modulename&gt;</span></code></p></td> <td><p>Shows the environment changes made by the <code class="docutils literal notranslate"><span class="pre">&lt;modulename&gt;</span></code> modulefile</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">spider</span> <span class="pre">&lt;string&gt;</span></code></p></td> <td><p>Searches all possible modules according to <code class="docutils literal notranslate"><span class="pre">&lt;string&gt;</span></code></p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">load</span> <span class="pre">&lt;modulename&gt;</span> <span class="pre">[...]</span></code></p></td> <td><p>Loads the given <code class="docutils literal notranslate"><span class="pre">&lt;modulename&gt;</span></code>(s) into the current environment</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">use</span> <span class="pre">&lt;path&gt;</span></code></p></td> <td><p>Adds <code class="docutils literal notranslate"><span class="pre">&lt;path&gt;</span></code> to the modulefile search cache and <code class="docutils literal notranslate"><span class="pre">MODULEPATH</span></code></p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">unuse</span> <span class="pre">&lt;path&gt;</span></code></p></td> <td><p>Removes <code class="docutils literal notranslate"><span class="pre">&lt;path&gt;</span></code> 
from the modulefile search cache and <code class="docutils literal notranslate"><span class="pre">MODULEPATH</span></code></p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">purge</span></code></p></td> <td><p>Unloads all modules</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">reset</span></code></p></td> <td><p>Resets loaded modules to system defaults</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">update</span></code></p></td> <td><p>Reloads all currently loaded modules</p></td> </tr> </tbody> </table> </section> <section id="searching-for-modules"> <h4>Searching for Modules<a class="headerlink" href="#searching-for-modules" title="Link to this heading"></a></h4> <p>Modules with dependencies are only available when the underlying dependencies, such as compiler families, are loaded. Thus, <code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">avail</span></code> only displays modules that are compatible with the current state of the environment. 
To search the entire hierarchy across all possible dependencies, the <code class="docutils literal notranslate"><span class="pre">spider</span></code> sub-command can be used as summarized in the following table.</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Command</p></th> <th class="head"><p>Description</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">spider</span></code></p></td> <td><p>Shows the entire possible graph of modules</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">spider</span> <span class="pre">&lt;modulename&gt;</span></code></p></td> <td><p>Searches for modules named <code class="docutils literal notranslate"><span class="pre">&lt;modulename&gt;</span></code> in the graph of possible modules</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">spider</span> <span class="pre">&lt;modulename&gt;/&lt;version&gt;</span></code></p></td> <td><p>Searches for a specific version of <code class="docutils literal notranslate"><span class="pre">&lt;modulename&gt;</span></code> in the graph of possible modules</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">module</span> <span class="pre">spider</span> <span class="pre">&lt;string&gt;</span></code></p></td> <td><p>Searches for modulefiles containing <code class="docutils literal notranslate"><span class="pre">&lt;string&gt;</span></code></p></td> </tr> </tbody> </table> </section> </section> <section id="compilers"> <h3>Compilers<a class="headerlink" href="#compilers" title="Link to this heading"></a></h3> <p>Cray, AMD, and GCC compilers are provided through modules on Spock. The Cray and AMD compilers are both based on LLVM/Clang. 
There are also system/OS versions of both Clang and GCC available in <code class="docutils literal notranslate"><span class="pre">/usr/bin</span></code>. The table below lists details about each of the module-provided compilers.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>It is highly recommended to use the Cray compiler wrappers (<code class="docutils literal notranslate"><span class="pre">cc</span></code>, <code class="docutils literal notranslate"><span class="pre">CC</span></code>, and <code class="docutils literal notranslate"><span class="pre">ftn</span></code>) whenever possible. See the next section for more details.</p> </div> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Vendor</p></th> <th class="head"><p>Programming Environment</p></th> <th class="head"><p>Compiler Module</p></th> <th class="head"><p>Language</p></th> <th class="head"><p>Compiler Wrapper</p></th> <th class="head"><p>Compiler</p></th> </tr> </thead> <tbody> <tr class="row-even"><td rowspan="3"><p>Cray</p></td> <td rowspan="3"><p><code class="docutils literal notranslate"><span class="pre">PrgEnv-cray</span></code></p></td> <td rowspan="3"><p><code class="docutils literal notranslate"><span class="pre">cce</span></code></p></td> <td><p>C</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">cc</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">craycc</span></code></p></td> </tr> <tr class="row-odd"><td><p>C++</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">CC</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">craycxx</span></code> or <code class="docutils literal notranslate"><span class="pre">crayCC</span></code></p></td> </tr> <tr class="row-even"><td><p>Fortran</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">ftn</span></code></p></td> <td><p><code 
class="docutils literal notranslate"><span class="pre">crayftn</span></code></p></td> </tr> <tr class="row-odd"><td rowspan="3"><p>AMD</p></td> <td rowspan="3"><p><code class="docutils literal notranslate"><span class="pre">PrgEnv-amd</span></code></p></td> <td rowspan="3"><p><code class="docutils literal notranslate"><span class="pre">rocm</span></code></p></td> <td><p>C</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">cc</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/clang</span></code></p></td> </tr> <tr class="row-even"><td><p>C++</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">CC</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/clang++</span></code></p></td> </tr> <tr class="row-odd"><td><p>Fortran</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">ftn</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/flang</span></code></p></td> </tr> <tr class="row-even"><td rowspan="3"><p>GCC</p></td> <td rowspan="3"><p><code class="docutils literal notranslate"><span class="pre">PrgEnv-gnu</span></code></p></td> <td rowspan="3"><p><code class="docutils literal notranslate"><span class="pre">gcc</span></code></p></td> <td><p>C</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">cc</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">$GCC_PATH/bin/gcc</span></code></p></td> </tr> <tr class="row-odd"><td><p>C++</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">CC</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">$GCC_PATH/bin/g++</span></code></p></td> </tr> <tr class="row-even"><td><p>Fortran</p></td> <td><p><code class="docutils literal notranslate"><span 
class="pre">ftn</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">$GCC_PATH/bin/gfortran</span></code></p></td> </tr> </tbody> </table> <section id="cray-programming-environment-and-compiler-wrappers"> <h4>Cray Programming Environment and Compiler Wrappers<a class="headerlink" href="#cray-programming-environment-and-compiler-wrappers" title="Link to this heading"></a></h4> <p>Cray provides <code class="docutils literal notranslate"><span class="pre">PrgEnv-&lt;compiler&gt;</span></code> modules (e.g., <code class="docutils literal notranslate"><span class="pre">PrgEnv-cray</span></code>) that load compatible components of a specific compiler toolchain. The components include the specified compiler as well as MPI, LibSci, and other libraries. Loading the <code class="docutils literal notranslate"><span class="pre">PrgEnv-&lt;compiler&gt;</span></code> modules also defines a set of compiler wrappers for that compiler toolchain that automatically add include paths and link in libraries for Cray software. Compiler wrappers are provided for C (<code class="docutils literal notranslate"><span class="pre">cc</span></code>), C++ (<code class="docutils literal notranslate"><span class="pre">CC</span></code>), and Fortran (<code class="docutils literal notranslate"><span class="pre">ftn</span></code>).</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Use the <code class="docutils literal notranslate"><span class="pre">-craype-verbose</span></code> flag to display the full include and link information used by the Cray compiler wrappers. 
The wrapper must be invoked on a source file to see the full output (e.g., <code class="docutils literal notranslate"><span class="pre">CC</span> <span class="pre">-craype-verbose</span> <span class="pre">test.cpp</span></code>).</p> </div> </section> </section> <section id="mpi"> <h3>MPI<a class="headerlink" href="#mpi" title="Link to this heading"></a></h3> <p>The MPI implementation available on Spock is Cray’s MPICH, which is “GPU-aware”, so GPU buffers can be passed directly to MPI calls.</p> </section> </section> <hr class="docutils" /> <section id="compiling"> <h2>Compiling<a class="headerlink" href="#compiling" title="Link to this heading"></a></h2> <p>This section covers how to compile for different programming models using the different compilers covered in the previous section.</p> <section id="id3"> <h3>MPI<a class="headerlink" href="#id3" title="Link to this heading"></a></h3> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Implementation</p></th> <th class="head"><p>Module</p></th> <th class="head"><p>Compiler</p></th> <th class="head"><p>Header Files &amp; Linking</p></th> </tr> </thead> <tbody> <tr class="row-even"><td rowspan="2"><p>Cray MPICH</p></td> <td rowspan="2"><p><code class="docutils literal notranslate"><span class="pre">cray-mpich</span></code></p></td> <td><p><code class="docutils literal notranslate"><span class="pre">cc</span></code>, <code class="docutils literal notranslate"><span class="pre">CC</span></code>, <code class="docutils literal notranslate"><span class="pre">ftn</span></code> (Cray compiler wrappers)</p></td> <td><p>MPI header files and linking are built into the Cray compiler wrappers</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">hipcc</span></code></p></td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">-L${MPICH_DIR}/lib</span> <span class="pre">-lmpi</span></code></div> <div 
class="line"><code class="docutils literal notranslate"><span class="pre">-I${MPICH_DIR}/include</span></code></div> </div> </td> </tr> </tbody> </table> <section id="gpu-aware-mpi"> <h4>GPU-Aware MPI<a class="headerlink" href="#gpu-aware-mpi" title="Link to this heading"></a></h4> <p>To use GPU-aware Cray MPICH, there are currently some extra steps needed in addition to the table above, which depend on the compiler that is used.</p> <section id="compiling-with-the-cray-compiler-wrappers-cc-or-cc"> <h5>1. Compiling with the Cray compiler wrappers, <code class="docutils literal notranslate"><span class="pre">cc</span></code> or <code class="docutils literal notranslate"><span class="pre">CC</span></code><a class="headerlink" href="#compiling-with-the-cray-compiler-wrappers-cc-or-cc" title="Link to this heading"></a></h5> <p>To use GPU-aware Cray MPICH with the Cray compiler wrappers, users must load specific modules, set some environment variables, and include appropriate headers and libraries. The following modules must be loaded and environment variables set:</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Setting <code class="docutils literal notranslate"><span class="pre">MPICH_SMP_SINGLE_COPY_MODE=CMA</span></code> is required as a temporary workaround due to a <a class="reference external" href="https://docs.olcf.ornl.gov/systems/spock_quick_start_guide.html#olcfdev-138-gpu-aware-cray-mpich-can-cause-hang-in-some-codes" target="_blank">known issue</a>. 
Users should make a note of where they set this environment variable (e.g., if it is set in a script), since it should NOT be set once the known issue has been resolved.</p> </div> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>module<span class="w"> </span>load<span class="w"> </span>craype-accel-amd-gfx908
module<span class="w"> </span>load<span class="w"> </span>PrgEnv-cray
module<span class="w"> </span>load<span class="w"> </span>rocm
<span class="c1">## These must be set before running</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">MPIR_CVAR_GPU_EAGER_DEVICE_MEM</span><span class="o">=</span><span class="m">0</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">MPICH_GPU_SUPPORT_ENABLED</span><span class="o">=</span><span class="m">1</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">MPICH_SMP_SINGLE_COPY_MODE</span><span class="o">=</span>CMA
</pre></div> </div> <p>In addition, the following header files and libraries must be included:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>-I<span class="si">${</span><span class="nv">ROCM_PATH</span><span class="si">}</span>/include
-L<span class="si">${</span><span class="nv">ROCM_PATH</span><span class="si">}</span>/lib<span class="w"> </span>-lamdhip64<span class="w"> </span>-lhsa-runtime64
</pre></div> </div> <p>where the include path implies that <code class="docutils literal notranslate"><span class="pre">#include</span> <span class="pre">&lt;hip/hip_runtime.h&gt;</span></code> is included in the source file.</p> </section> <section id="compiling-with-hipcc"> <h5>2. 
Compiling with <code class="docutils literal notranslate"><span class="pre">hipcc</span></code><a class="headerlink" href="#compiling-with-hipcc" title="Link to this heading"></a></h5> <p>To use GPU-aware Cray MPICH with <code class="docutils literal notranslate"><span class="pre">hipcc</span></code>, users must load specific modules, set some environment variables, and include appropriate headers and libraries. The following modules must be loaded and environment variables set:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>module<span class="w"> </span>load<span class="w"> </span>craype-accel-amd-gfx908
module<span class="w"> </span>load<span class="w"> </span>PrgEnv-cray
module<span class="w"> </span>load<span class="w"> </span>rocm
<span class="c1">## These must be set before running</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">MPIR_CVAR_GPU_EAGER_DEVICE_MEM</span><span class="o">=</span><span class="m">0</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">MPICH_GPU_SUPPORT_ENABLED</span><span class="o">=</span><span class="m">1</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">MPICH_SMP_SINGLE_COPY_MODE</span><span class="o">=</span>CMA
</pre></div> </div> <p>In addition, the following header files and libraries must be included:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>-I<span class="si">${</span><span class="nv">MPICH_DIR</span><span class="si">}</span>/include
-L<span class="si">${</span><span class="nv">MPICH_DIR</span><span class="si">}</span>/lib<span class="w"> </span>-lmpi<span class="w"> </span>-L<span class="si">${</span><span class="nv">CRAY_MPICH_ROOTDIR</span><span class="si">}</span>/gtl/lib<span class="w"> </span>-lmpi_gtl_hsa
</pre></div> </div> </section> </section> </section> <section id="openmp"> <h3>OpenMP<a class="headerlink" href="#openmp" title="Link to this heading"></a></h3> 
<p>This section shows how to compile with OpenMP using the different compilers covered above.</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Vendor</p></th> <th class="head"><p>Module</p></th> <th class="head"><p>Language</p></th> <th class="head"><p>Compiler</p></th> <th class="head"><p>OpenMP flag (CPU thread)</p></th> </tr> </thead> <tbody> <tr class="row-even"><td rowspan="2"><p>Cray</p></td> <td rowspan="2"><p><code class="docutils literal notranslate"><span class="pre">cce</span></code></p></td> <td><p>C, C++</p></td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">cc</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">CC</span></code></div> </div> </td> <td><p><code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code></p></td> </tr> <tr class="row-odd"><td><p>Fortran</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">ftn</span></code></p></td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">-homp</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code> (alias)</div> </div> </td> </tr> <tr class="row-even"><td><p>AMD</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">rocm</span></code></p></td> <td><div class="line-block"> <div class="line">C</div> <div class="line">C++</div> <div class="line">Fortran</div> </div> </td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/clang</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/clang++</span></code></div> <div class="line"><code class="docutils literal notranslate"><span 
class="pre">$ROCM_PATH/llvm/bin/flang</span></code></div> </div> </td> <td><p><code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code></p></td> </tr> <tr class="row-odd"><td><p>GCC</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">gcc</span></code></p></td> <td><div class="line-block"> <div class="line">C</div> <div class="line">C++</div> <div class="line">Fortran</div> </div> </td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">$GCC_PATH/bin/gcc</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">$GCC_PATH/bin/g++</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">$GCC_PATH/bin/gfortran</span></code></div> </div> </td> <td><p><code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code></p></td> </tr> </tbody> </table> </section> <section id="openmp-gpu-offload"> <h3>OpenMP GPU Offload<a class="headerlink" href="#openmp-gpu-offload" title="Link to this heading"></a></h3> <p>This section shows how to compile with OpenMP Offload using the different compilers covered above.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Make sure the <code class="docutils literal notranslate"><span class="pre">craype-accel-amd-gfx908</span></code> module is loaded when using OpenMP offload.</p> </div> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Vendor</p></th> <th class="head"><p>Module</p></th> <th class="head"><p>Language</p></th> <th class="head"><p>Compiler</p></th> <th class="head"><p>OpenMP flag (GPU)</p></th> </tr> </thead> <tbody> <tr class="row-even"><td rowspan="2"><p>Cray</p></td> <td rowspan="2"><p><code class="docutils literal notranslate"><span class="pre">cce</span></code></p></td> <td><p>C, C++</p></td> <td><div class="line-block"> <div class="line"><code class="docutils 
literal notranslate"><span class="pre">cc</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">CC</span></code></div> </div> </td> <td><p><code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code></p></td> </tr> <tr class="row-odd"><td><p>Fortran</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">ftn</span></code></p></td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">-homp</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">-fopenmp</span></code> (alias)</div> </div> </td> </tr> <tr class="row-even"><td><p>AMD</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">rocm</span></code></p></td> <td><div class="line-block"> <div class="line">C</div> <div class="line">C++</div> <div class="line">Fortran</div> </div> </td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/clang</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/clang++</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">$ROCM_PATH/llvm/bin/flang</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">hipcc</span></code></div> </div> </td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">-fopenmp</span> <span class="pre">-target</span> <span class="pre">x86_64-pc-linux-gnu</span> <span class="pre">\</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">-fopenmp-targets=amdgcn-amd-amdhsa</span>   <span class="pre">\</span></code></div> <div class="line"><code class="docutils literal notranslate"><span 
class="pre">-Xopenmp-target=amdgcn-amd-amdhsa</span>    <span class="pre">\</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">-march=gfx908</span></code></div> </div> </td> </tr> </tbody> </table> </section> <section id="hip"> <h3>HIP<a class="headerlink" href="#hip" title="Link to this heading"></a></h3> <p>This section shows how to compile HIP codes using the Cray compiler wrappers and <code class="docutils literal notranslate"><span class="pre">hipcc</span></code> compiler driver.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Make sure the <code class="docutils literal notranslate"><span class="pre">craype-accel-amd-gfx908</span></code> module is loaded when using HIP.</p> </div> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Compiler</p></th> <th class="head"><p>Compile/Link Flags, Header Files, and Libraries</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">CC</span></code></p></td> <td><div class="line-block"> <div class="line"><code class="docutils literal notranslate"><span class="pre">CFLAGS</span> <span class="pre">=</span> <span class="pre">-std=c++11</span> <span class="pre">-D__HIP_ROCclr__</span> <span class="pre">-D__HIP_ARCH_GFX908__=1</span> <span class="pre">--rocm-path=${ROCM_PATH}</span> <span class="pre">--offload-arch=gfx908</span> <span class="pre">-x</span> <span class="pre">hip</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">LFLAGS</span> <span class="pre">=</span> <span class="pre">-std=c++11</span> <span class="pre">-D__HIP_ROCclr__</span> <span class="pre">--rocm-path=${ROCM_PATH}</span></code></div> <div class="line"><code class="docutils literal notranslate"><span class="pre">-I${HIP_PATH}/include</span></code></div> <div class="line"><code class="docutils literal notranslate"><span 
class="pre">-L${HIP_PATH}/lib</span> <span class="pre">-lamdhip64</span></code></div> </div> </td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">hipcc</span></code></p></td> <td><div class="line-block"> <div class="line">Can be used directly to compile HIP source files.</div> <div class="line">To see what is being invoked within this compiler driver, issue the command, <code class="docutils literal notranslate"><span class="pre">hipcc</span> <span class="pre">--verbose</span></code></div> </div> </td> </tr> </tbody> </table> </section> </section> <hr class="docutils" /> <section id="running-jobs"> <h2>Running Jobs<a class="headerlink" href="#running-jobs" title="Link to this heading"></a></h2> <p>This section describes how to run programs on the Spock compute nodes, including a brief overview of Slurm and also how to map processes and threads to CPU cores and GPUs.</p> <section id="slurm-workload-manager"> <h3>Slurm Workload Manager<a class="headerlink" href="#slurm-workload-manager" title="Link to this heading"></a></h3> <p><a class="reference external" href="https://slurm.schedmd.com/" target="_blank">Slurm</a> is the workload manager used to interact with the compute nodes on Spock. In the following subsections, the most commonly used Slurm commands for submitting, running, and monitoring jobs will be covered, but users are encouraged to visit the official documentation and man pages for more information.</p> <section id="batch-scheduler-and-job-launcher"> <h4>Batch Scheduler and Job Launcher<a class="headerlink" href="#batch-scheduler-and-job-launcher" title="Link to this heading"></a></h4> <p>Slurm provides 3 ways of submitting and launching jobs on Spock’s compute nodes: batch scripts, interactive, and single-command. 
The Slurm commands associated with these methods are shown in the table below and examples of their use can be found in the related subsections.</p> <table class="docutils align-default"> <tbody> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">sbatch</span></code></p></td> <td><div class="line-block"> <div class="line">Used to submit a batch script to Slurm, which creates a job allocation for it. The script contains options preceded with <code class="docutils literal notranslate"><span class="pre">#SBATCH</span></code>.</div> <div class="line">(see Batch Scripts section below)</div> </div> </td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">salloc</span></code></p></td> <td><div class="line-block"> <div class="line">Used to request an interactive Slurm job allocation, where one or more job steps (i.e., <code class="docutils literal notranslate"><span class="pre">srun</span></code> commands) can then be launched on the allocated resources (i.e., nodes).</div> <div class="line">(see Interactive Jobs section below)</div> </div> </td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">srun</span></code></p></td> <td><div class="line-block"> <div class="line">Used to run a parallel job (job step) on the resources allocated with <code class="docutils literal notranslate"><span class="pre">sbatch</span></code> or <code class="docutils literal notranslate"><span class="pre">salloc</span></code>.</div> <div class="line">If necessary, <code class="docutils literal notranslate"><span class="pre">srun</span></code> will first create a resource allocation in which to run the parallel job(s).</div> <div class="line">(see Single Command section below)</div> </div> </td> </tr> </tbody> </table> <section id="batch-scripts"> <h5>Batch Scripts<a class="headerlink" href="#batch-scripts" title="Link to this heading"></a></h5> <p>A batch script can be used to submit a job to run on the compute nodes at a later time. 
In this case, stdout and stderr will be written to one or more files that can be opened after the job completes. Here is an example of a simple batch script:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="linenos">1</span><span class="ch">#!/bin/bash</span> <span class="linenos">2</span><span class="c1">#SBATCH -A <project_id></span> <span class="linenos">3</span><span class="c1">#SBATCH -J <job_name></span> <span class="linenos">4</span><span class="c1">#SBATCH -o %x-%j.out</span> <span class="linenos">5</span><span class="c1">#SBATCH -t 00:05:00</span> <span class="linenos">6</span><span class="c1">#SBATCH -p <partition></span> <span class="linenos">7</span><span class="c1">#SBATCH -N 2</span> <span class="linenos">8</span> <span class="linenos">9</span>srun<span class="w"> </span>-n4<span class="w"> </span>--ntasks-per-node<span class="o">=</span><span class="m">2</span><span class="w"> </span>./a.out </pre></div> </div> <p>The Slurm submission options are preceded by <code class="docutils literal notranslate"><span class="pre">#SBATCH</span></code>, making them appear as comments to a shell (since comments begin with <code class="docutils literal notranslate"><span class="pre">#</span></code>). Slurm will look for submission options from the first line through the first non-comment line. Options encountered after the first non-comment line will not be read by Slurm. 
In the example script, the lines are:</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Line</p></th> <th class="head"><p>Description</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p>1</p></td> <td><p>[Optional] shell interpreter line</p></td> </tr> <tr class="row-odd"><td><p>2</p></td> <td><p>OLCF project to charge</p></td> </tr> <tr class="row-even"><td><p>3</p></td> <td><p>Job name</p></td> </tr> <tr class="row-odd"><td><p>4</p></td> <td><p>stdout file name ( <code class="docutils literal notranslate"><span class="pre">%x</span></code> represents job name, <code class="docutils literal notranslate"><span class="pre">%j</span></code> represents job id)</p></td> </tr> <tr class="row-even"><td><p>5</p></td> <td><p>Walltime requested (<code class="docutils literal notranslate"><span class="pre">HH:MM:SS</span></code>)</p></td> </tr> <tr class="row-odd"><td><p>6</p></td> <td><p>Batch queue</p></td> </tr> <tr class="row-even"><td><p>7</p></td> <td><p>Number of compute nodes requested</p></td> </tr> <tr class="row-odd"><td><p>8</p></td> <td><p>Blank line</p></td> </tr> <tr class="row-even"><td><p>9</p></td> <td><p><code class="docutils literal notranslate"><span class="pre">srun</span></code> command to launch parallel job (requesting 4 processes - 2 per node)</p></td> </tr> </tbody> </table> </section> <section id="interactive-jobs"> <span id="interactive-spock"></span><h5>Interactive Jobs<a class="headerlink" href="#interactive-jobs" title="Link to this heading"></a></h5> <p>To request an interactive job where multiple job steps (i.e., multiple srun commands) can be launched on the allocated compute node(s), the <code class="docutils literal notranslate"><span class="pre">salloc</span></code> command can be used:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>salloc<span class="w"> </span>-A<span class="w"> </span><project_id><span class="w"> 
</span>-J<span class="w"> </span><job_name><span class="w"> </span>-t<span class="w"> </span><span class="m">00</span>:05:00<span class="w"> </span>-p<span class="w"> </span><partition><span class="w"> </span>-N<span class="w"> </span><span class="m">2</span> salloc:<span class="w"> </span>Granted<span class="w"> </span>job<span class="w"> </span>allocation<span class="w"> </span><span class="m">4258</span> salloc:<span class="w"> </span>Waiting<span class="w"> </span><span class="k">for</span><span class="w"> </span>resource<span class="w"> </span>configuration salloc:<span class="w"> </span>Nodes<span class="w"> </span>spock<span class="o">[</span><span class="m">10</span>-11<span class="o">]</span><span class="w"> </span>are<span class="w"> </span>ready<span class="w"> </span><span class="k">for</span><span class="w"> </span>job $<span class="w"> </span>srun<span class="w"> </span>-n<span class="w"> </span><span class="m">4</span><span class="w"> </span>--ntasks-per-node<span class="o">=</span><span class="m">2</span><span class="w"> </span>./a.out <output<span class="w"> </span>printed<span class="w"> </span>to<span class="w"> </span>terminal> $<span class="w"> </span>srun<span class="w"> </span>-n<span class="w"> </span><span class="m">2</span><span class="w"> </span>--ntasks-per-node<span class="o">=</span><span class="m">1</span><span class="w"> </span>./a.out <output<span class="w"> </span>printed<span class="w"> </span>to<span class="w"> </span>terminal> </pre></div> </div> <p>Here, <code class="docutils literal notranslate"><span class="pre">salloc</span></code> is used to request an allocation of 2 MI100 compute nodes for 5 minutes. 
Once the resources become available, the user is granted access to the compute nodes (<code class="docutils literal notranslate"><span class="pre">spock10</span></code> and <code class="docutils literal notranslate"><span class="pre">spock11</span></code> in this case) and can launch job steps on them using srun.</p> </section> <section id="single-command-non-interactive"> <span id="single-command-spock"></span><h5>Single Command (non-interactive)<a class="headerlink" href="#single-command-non-interactive" title="Link to this heading"></a></h5> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>srun<span class="w"> </span>-A<span class="w"> </span><project_id><span class="w"> </span>-t<span class="w"> </span><span class="m">00</span>:05:00<span class="w"> </span>-p<span class="w"> </span><partition><span class="w"> </span>-N<span class="w"> </span><span class="m">2</span><span class="w"> </span>-n<span class="w"> </span><span class="m">4</span><span class="w"> </span>--ntasks-per-node<span class="o">=</span><span class="m">2</span><span class="w"> </span>./a.out <output<span class="w"> </span>printed<span class="w"> </span>to<span class="w"> </span>terminal> </pre></div> </div> <p>The job name and output options have been removed since stdout/stderr are typically desired in the terminal window in this usage mode.</p> </section> </section> <section id="common-slurm-submission-options"> <h4>Common Slurm Submission Options<a class="headerlink" href="#common-slurm-submission-options" title="Link to this heading"></a></h4> <p>The table below summarizes commonly-used Slurm job submission options:</p> <table class="docutils align-default"> <tbody> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-A</span> <span class="pre"><project_id></span></code></p></td> <td><p>Project ID to charge</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">-J</span> <span class="pre"><job_name></span></code></p></td> <td><p>Name of job</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-p</span> <span class="pre"><partition></span></code></p></td> <td><p>Partition / batch queue</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-t</span> <span class="pre"><time></span></code></p></td> <td><p>Wall clock time <<code class="docutils literal notranslate"><span class="pre">HH:MM:SS</span></code>></p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-N</span> <span class="pre"><number_of_nodes></span></code></p></td> <td><p>Number of compute nodes</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-o</span> <span class="pre"><file_name></span></code></p></td> <td><p>Standard output file name</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-e</span> <span class="pre"><file_name></span></code></p></td> <td><p>Standard error file name</p></td> </tr> </tbody> </table> <p>For more information about these and/or other options, please see the <code class="docutils literal notranslate"><span class="pre">sbatch</span></code> man page.</p> </section> <section id="other-common-slurm-commands"> <h4>Other Common Slurm Commands<a class="headerlink" href="#other-common-slurm-commands" title="Link to this heading"></a></h4> <p>The table below summarizes commonly-used Slurm commands:</p> <table class="docutils align-default"> <tbody> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">sinfo</span></code></p></td> <td><div class="line-block"> <div class="line">Used to view partition and node information.</div> <div class="line">E.g., to view user-defined details about the caar queue:</div> <div class="line"><code class="docutils 
literal notranslate"><span class="pre">sinfo</span> <span class="pre">-p</span> <span class="pre">caar</span> <span class="pre">-o</span> <span class="pre">"%15N</span> <span class="pre">%10D</span> <span class="pre">%10P</span> <span class="pre">%10a</span> <span class="pre">%10c</span> <span class="pre">%10z"</span></code></div> </div> </td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">squeue</span></code></p></td> <td><div class="line-block"> <div class="line">Used to view job and job step information for jobs in the scheduling queue.</div> <div class="line">E.g., to see all jobs from a specific user:</div> <div class="line"><code class="docutils literal notranslate"><span class="pre">squeue</span> <span class="pre">-l</span> <span class="pre">-u</span> <span class="pre"><user_id></span></code></div> </div> </td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">sacct</span></code></p></td> <td><div class="line-block"> <div class="line">Used to view accounting data for jobs and job steps in the job accounting log (currently in the queue or recently completed).</div> <div class="line">E.g., to see a list of specified information about all jobs submitted/run by a user since 1 PM on January 4, 2021:</div> <div class="line"><code class="docutils literal notranslate"><span class="pre">sacct</span> <span class="pre">-u</span> <span class="pre"><username></span> <span class="pre">-S</span> <span class="pre">2021-01-04T13:00:00</span> <span class="pre">-o</span> <span class="pre">"jobid%5,jobname%25,user%15,nodelist%20"</span> <span class="pre">-X</span></code></div> </div> </td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">scancel</span></code></p></td> <td><div class="line-block"> <div class="line">Used to signal or cancel jobs or job steps.</div> <div class="line">E.g., to cancel a job:</div> <div class="line"><code 
class="docutils literal notranslate"><span class="pre">scancel</span> <span class="pre"><jobid></span></code></div> </div> </td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">scontrol</span></code></p></td> <td><div class="line-block"> <div class="line">Used to view or modify job configuration.</div> <div class="line">E.g., to place a job on hold:</div> <div class="line"><code class="docutils literal notranslate"><span class="pre">scontrol</span> <span class="pre">hold</span> <span class="pre"><jobid></span></code></div> </div> </td> </tr> </tbody> </table> </section> </section> <hr class="docutils" /> <section id="slurm-compute-node-partitions"> <h3>Slurm Compute Node Partitions<a class="headerlink" href="#slurm-compute-node-partitions" title="Link to this heading"></a></h3> <p>Spock’s compute nodes are separated into 2 Slurm partitions (queues): 1 for CAAR projects and 1 for ECP projects. Please see the tables below for details.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>If CAAR or ECP teams require a temporary exception to this policy, please email <a class="reference external" href="mailto:help%40olcf.ornl.gov" target="_blank">help<span>@</span>olcf<span>.</span>ornl<span>.</span>gov</a> with your request and it will be given to the OLCF Resource Utilization Council (RUC) for review.</p> </div> <section id="caar-partition"> <h4>CAAR Partition<a class="headerlink" href="#caar-partition" title="Link to this heading"></a></h4> <p>The CAAR partition consists of 24 total compute nodes. 
On a per-project basis, each user can have 1 running and 1 eligible job at a time, with no limit on the number of jobs submitted.</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Number of Nodes</p></th> <th class="head"><p>Max Walltime</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p>1 - 4</p></td> <td><p>3 hours</p></td> </tr> <tr class="row-odd"><td><p>5 - 16</p></td> <td><p>1 hour</p></td> </tr> </tbody> </table> </section> <section id="ecp-partition"> <h4>ECP Partition<a class="headerlink" href="#ecp-partition" title="Link to this heading"></a></h4> <p>The ECP partition consists of 12 total compute nodes. On a per-project basis, each user can have 1 running and 1 eligible job at a time, with up to 5 jobs submitted.</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head"><p>Number of Nodes</p></th> <th class="head"><p>Max Walltime</p></th> </tr> </thead> <tbody> <tr class="row-even"><td><p>1 - 4</p></td> <td><p>3 hours</p></td> </tr> </tbody> </table> </section> </section> <section id="process-and-thread-mapping"> <h3>Process and Thread Mapping<a class="headerlink" href="#process-and-thread-mapping" title="Link to this heading"></a></h3> <p>This section describes how to map processes (e.g., MPI ranks) and process threads (e.g., OpenMP threads) to the CPUs and GPUs on Spock. The <a class="reference internal" href="#spock-compute-nodes"><span class="std std-ref">Spock Compute Nodes</span></a> diagram will be helpful when reading this section to understand which hardware threads your processes and threads run on.</p> <section id="cpu-mapping"> <h4>CPU Mapping<a class="headerlink" href="#cpu-mapping" title="Link to this heading"></a></h4> <p>In this sub-section, a simple MPI+OpenMP “Hello, World” program (<a class="reference external" href="https://code.ornl.gov/olcf/hello_mpi_omp" target="_blank">hello_mpi_omp</a>) will be used to clarify the mappings. 
Slurm’s <a class="reference internal" href="#interactive-spock"><span class="std std-ref">Interactive Jobs</span></a> method was used to request an allocation of 1 compute node for these examples: <code class="docutils literal notranslate"><span class="pre">salloc</span> <span class="pre">-A</span> <span class="pre"><project_id></span> <span class="pre">-t</span> <span class="pre">30</span> <span class="pre">-p</span> <span class="pre"><partition></span> <span class="pre">-N</span> <span class="pre">1</span></code></p> <p>The <code class="docutils literal notranslate"><span class="pre">srun</span></code> options used in this section are (see <code class="docutils literal notranslate"><span class="pre">man</span> <span class="pre">srun</span></code> for more information):</p> <table class="docutils align-default"> <tbody> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-c,</span> <span class="pre">--cpus-per-task=<ncpus></span></code></p></td> <td><div class="line-block"> <div class="line">Request that <code class="docutils literal notranslate"><span class="pre">ncpus</span></code> be allocated per process (default is 1).</div> <div class="line">(<code class="docutils literal notranslate"><span class="pre">ncpus</span></code> refers to hardware threads)</div> </div> </td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">--threads-per-core=<threads></span></code></p></td> <td><div class="line-block"> <div class="line">In task layout, use the specified maximum number of threads per core</div> <div class="line">(default is 1; there are 2 hardware threads per physical CPU core).</div> </div> </td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">--cpu-bind=threads</span></code></p></td> <td><div class="line-block"> <div class="line">Bind tasks to CPUs.</div> <div class="line"><code class="docutils literal notranslate"><span 
class="pre">threads</span></code> - Automatically generate masks binding tasks to threads.</div> <div class="line">(Although this option is not explicitly used in these examples, it is the default CPU binding.)</div> </div> </td> </tr> </tbody> </table> <div class="admonition note"> <p class="admonition-title">Note</p> <p>In the <code class="docutils literal notranslate"><span class="pre">srun</span></code> man page (and so the table above), threads refers to hardware threads.</p> </div> <section id="mpi-ranks-each-with-2-openmp-threads"> <h5>2 MPI ranks - each with 2 OpenMP threads<a class="headerlink" href="#mpi-ranks-each-with-2-openmp-threads" title="Link to this heading"></a></h5> <p>In this example, the intent is to launch 2 MPI ranks, each of which spawns 2 OpenMP threads, and have all 4 OpenMP threads run on different physical CPU cores.</p> <p><strong>First (INCORRECT) attempt</strong></p> <p>To set the number of OpenMP threads spawned per MPI rank, the <code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS</span></code> environment variable can be used. 
To set the number of MPI ranks launched, the <code class="docutils literal notranslate"><span class="pre">srun</span></code> flag <code class="docutils literal notranslate"><span class="pre">-n</span></code> can be used.</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span> $<span class="w"> </span>srun<span class="w"> </span>-n2<span class="w"> </span>./hello_mpi_omp<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort WARNING:<span class="w"> </span>Requested<span class="w"> </span>total<span class="w"> </span>thread<span class="w"> </span>count<span class="w"> </span>and/or<span class="w"> </span>thread<span class="w"> </span>affinity<span class="w"> </span>may<span class="w"> </span>result<span class="w"> </span><span class="k">in</span> oversubscription<span class="w"> </span>of<span class="w"> </span>available<span class="w"> </span>CPU<span class="w"> </span>resources!<span class="w"> </span>Performance<span class="w"> </span>may<span class="w"> </span>be<span class="w"> </span>degraded. Explicitly<span class="w"> </span><span class="nb">set</span><span class="w"> </span><span class="nv">OMP_WAIT_POLICY</span><span class="o">=</span>PASSIVE<span class="w"> </span>or<span class="w"> </span>ACTIVE<span class="w"> </span>to<span class="w"> </span>suppress<span class="w"> </span>this<span class="w"> </span>message. Set<span class="w"> </span><span class="nv">CRAY_OMP_CHECK_AFFINITY</span><span class="o">=</span>TRUE<span class="w"> </span>to<span class="w"> </span>print<span class="w"> </span>detailed<span class="w"> </span>thread-affinity<span class="w"> </span>messages. 
WARNING:<span class="w"> </span>Requested<span class="w"> </span>total<span class="w"> </span>thread<span class="w"> </span>count<span class="w"> </span>and/or<span class="w"> </span>thread<span class="w"> </span>affinity<span class="w"> </span>may<span class="w"> </span>result<span class="w"> </span><span class="k">in</span> oversubscription<span class="w"> </span>of<span class="w"> </span>available<span class="w"> </span>CPU<span class="w"> </span>resources!<span class="w"> </span>Performance<span class="w"> </span>may<span class="w"> </span>be<span class="w"> </span>degraded. Explicitly<span class="w"> </span><span class="nb">set</span><span class="w"> </span><span class="nv">OMP_WAIT_POLICY</span><span class="o">=</span>PASSIVE<span class="w"> </span>or<span class="w"> </span>ACTIVE<span class="w"> </span>to<span class="w"> </span>suppress<span class="w"> </span>this<span class="w"> </span>message. Set<span class="w"> </span><span class="nv">CRAY_OMP_CHECK_AFFINITY</span><span class="o">=</span>TRUE<span class="w"> </span>to<span class="w"> </span>print<span class="w"> </span>detailed<span class="w"> </span>thread-affinity<span class="w"> </span>messages. 
MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock01 MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock01 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock01 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock01 </pre></div> </div> <p>The first thing to notice here is the <code class="docutils literal notranslate"><span class="pre">WARNING</span></code> about oversubscribing the available CPU cores. Also, the output shows each MPI rank did spawn 2 OpenMP threads, but both OpenMP threads ran on the same hardware thread (for a given MPI rank). 
This was not the intended behavior; each OpenMP thread was meant to run on its own physical CPU core.</p> <p><strong>Second (CORRECT) attempt</strong></p> <p>By default, each MPI rank is allocated only 1 hardware thread, so both OpenMP threads only have that 1 hardware thread to run on - hence the WARNING and undesired behavior. In order for each OpenMP thread to run on its own physical CPU core, each MPI rank should be given 2 hardware threads (<code class="docutils literal notranslate"><span class="pre">-c</span> <span class="pre">2</span></code>) - since, by default, only 1 hardware thread per physical CPU core is enabled (this would need to be <code class="docutils literal notranslate"><span class="pre">-c</span> <span class="pre">4</span></code> if <code class="docutils literal notranslate"><span class="pre">--threads-per-core=2</span></code> were used instead of the default of <code class="docutils literal notranslate"><span class="pre">1</span></code>). The OpenMP threads will be mapped to unique physical CPU cores unless there are not enough physical CPU cores available, in which case the remaining OpenMP threads will share hardware threads and a WARNING will be issued as shown in the previous example.</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span>
$<span class="w"> </span>srun<span class="w"> </span>-n2<span class="w"> </span>-c2<span class="w"> </span>./hello_mpi_omp<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort
MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13
MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13
MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13
MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13
</pre></div> </div> <p>Now the output shows that each OpenMP thread ran on (one of the hardware threads of) its own physical CPU core. 
More specifically (see the Spock Compute Node diagram), OpenMP thread 000 of MPI rank 000 ran on hardware thread 000 (i.e., physical CPU core 00), OpenMP thread 001 of MPI rank 000 ran on hardware thread 001 (i.e., physical CPU core 01), OpenMP thread 000 of MPI rank 001 ran on hardware thread 016 (i.e., physical CPU core 16), and OpenMP thread 001 of MPI rank 001 ran on hardware thread 017 (i.e., physical CPU core 17) - as expected.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>There are many different ways users might choose to perform these mappings, so users are encouraged to clone the <code class="docutils literal notranslate"><span class="pre">hello_mpi_omp</span></code> program and test whether or not processes and threads are running where intended.</p> </div> </section> </section> <section id="gpu-mapping"> <h4>GPU Mapping<a class="headerlink" href="#gpu-mapping" title="Link to this heading"></a></h4> <p>In this sub-section, an MPI+OpenMP+HIP “Hello, World” program (<a class="reference external" href="https://code.ornl.gov/olcf/hello_jobstep" target="_blank">hello_jobstep</a>) will be used to clarify the GPU mappings. Again, Slurm’s <a class="reference internal" href="#interactive-spock"><span class="std std-ref">Interactive Jobs</span></a> method was used to request an allocation of 2 compute nodes for these examples: <code class="docutils literal notranslate"><span class="pre">salloc</span> <span class="pre">-A</span> <span class="pre"><project_id></span> <span class="pre">-t</span> <span class="pre">30</span> <span class="pre">-p</span> <span class="pre"><partition></span> <span class="pre">-N</span> <span class="pre">2</span></code>. 
The CPU mapping part of this example is very similar to the example used above in the CPU Mapping sub-section, so the focus here will be on the GPU mapping part.</p> <p>The following <code class="docutils literal notranslate"><span class="pre">srun</span></code> options will be used in the examples below. See <code class="docutils literal notranslate"><span class="pre">man</span> <span class="pre">srun</span></code> for a complete list of options and more information.</p> <table class="docutils align-default"> <tbody> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">--gpus-per-task</span></code></p></td> <td><p>Specifies the number of GPUs required for each task to be spawned in the job’s resource allocation.</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">--gpu-bind=closest</span></code></p></td> <td><p>Binds each task to the GPU which is on the same NUMA domain as the CPU core the MPI rank is running on.</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">--gpu-bind=map_gpu:<list></span></code></p></td> <td><p>Binds tasks to specific GPUs by setting GPU masks on tasks (or ranks) as specified, where <code class="docutils literal notranslate"><span class="pre"><list></span></code> is <code class="docutils literal notranslate"><span class="pre"><gpu_id_for_task_0>,<gpu_id_for_task_1>,...</span></code>. If the number of tasks (or ranks) exceeds the number of elements in this list, elements in the list will be reused as needed starting from the beginning of the list. To simplify support for large task counts, an entry in the list may be followed by an asterisk and a repetition count. 
(For example, <code class="docutils literal notranslate"><span class="pre">map_gpu:0*4,1*4</span></code> assigns GPU 0 to the first four tasks and GPU 1 to the next four.)</p></td> </tr> <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">--ntasks-per-gpu=<ntasks></span></code></p></td> <td><p>Requests that <code class="docutils literal notranslate"><span class="pre">ntasks</span></code> tasks be invoked for every GPU.</p></td> </tr> <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">--distribution=<value>[:<value>][:<value>]</span></code></p></td> <td><p>Specifies the distribution of MPI ranks across compute nodes, sockets (NUMA domains on Spock), and cores, respectively. The default values are <code class="docutils literal notranslate"><span class="pre">block:cyclic:cyclic</span></code>.</p></td> </tr> </tbody> </table> <div class="admonition note"> <p class="admonition-title">Note</p> <p>In general, GPU mapping can be accomplished in different ways. For example, an application might map MPI ranks to GPUs programmatically within the code using, say, <code class="docutils literal notranslate"><span class="pre">hipSetDevice</span></code>. In this case, since all GPUs on a node are available to all MPI ranks on that node by default, there might not be a need to map to GPUs using Slurm (just do it in the code). However, in another application, there might be a reason to make only a subset of GPUs available to the MPI ranks on a node. 
It is this latter case that the following examples refer to.</p> </div> <section id="mapping-1-task-per-gpu"> <h5>Mapping 1 task per GPU<a class="headerlink" href="#mapping-1-task-per-gpu" title="Link to this heading"></a></h5> <p>In the following examples, each MPI rank (and its OpenMP threads) will be mapped to a single GPU.</p> <p><strong>Example 1: 4 MPI ranks - each with 2 OpenMP threads and 1 GPU (single-node)</strong></p> <p>This example launches 4 MPI ranks (<code class="docutils literal notranslate"><span class="pre">-n4</span></code>), each with 2 physical CPU cores (<code class="docutils literal notranslate"><span class="pre">-c2</span></code>) on which to run 2 OpenMP threads (<code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS=2</span></code>). In addition, each MPI rank (and its 2 OpenMP threads) should have access to only 1 GPU. To accomplish the GPU mapping, two new <code class="docutils literal notranslate"><span class="pre">srun</span></code> options will be used:</p> <ul class="simple"> <li><p><code class="docutils literal notranslate"><span class="pre">--gpus-per-task</span></code> specifies the number of GPUs required for each task.</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">--gpu-bind=closest</span></code> binds each task to the closest GPU.</p></li> </ul> <div class="admonition note"> <p class="admonition-title">Note</p> <p>To further clarify, <code class="docutils literal notranslate"><span class="pre">--gpus-per-task</span></code> does not actually bind GPUs to MPI ranks. It allocates GPUs to the job step. 
The <code class="docutils literal notranslate"><span class="pre">--gpu-bind=closest</span></code> is what actually maps a specific GPU to each rank; namely, the “closest” one, which is the GPU on the same NUMA domain as the CPU core the MPI rank is running on (see the <a class="reference internal" href="#spock-compute-nodes"><span class="std std-ref">Spock Compute Nodes</span></a> section).</p> </div> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Without these additional flags, all MPI ranks would have access to all GPUs (which is the default behavior).</p> </div> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span> $<span class="w"> </span>srun<span class="w"> </span>-N1<span class="w"> </span>-n4<span class="w"> </span>-c2<span class="w"> </span>--gpus-per-task<span class="o">=</span><span class="m">1</span><span class="w"> </span>--gpu-bind<span class="o">=</span>closest<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span 
class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span 
class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span 
class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> <p>The output contains different IDs associated with the GPUs so it is important to first describe these IDs before moving on. <code class="docutils literal notranslate"><span class="pre">GPU_ID</span></code> is the node-level (or global) GPU ID, which is labeled as one might expect from looking at a node diagram: 0, 1, 2, 3. <code class="docutils literal notranslate"><span class="pre">RT_GPU_ID</span></code> is the HIP runtime GPU ID, which can be thought of as each MPI rank’s local GPU ID numbering (with zero-based indexing). So in the output above, each MPI rank has access to 1 unique GPU - where MPI 000 has access to GPU 0, MPI 001 has access to GPU 1, etc., but all MPI ranks show a HIP runtime GPU ID of 0. The reason is that each MPI rank only “sees” one GPU and so the HIP runtime labels it as “0”, even though it might be global GPU ID 0, 1, 2, or 3. The GPU’s bus ID is included to definitively show that different GPUs are being used.</p> <p>Here is a summary of the different GPU IDs reported by the example program:</p> <ul class="simple"> <li><p><code class="docutils literal notranslate"><span class="pre">GPU_ID</span></code> is the node-level (or global) GPU ID read from <code class="docutils literal notranslate"><span class="pre">ROCR_VISIBLE_DEVICES</span></code>. 
If this environment variable is not set (either by the user or by Slurm), the value of <code class="docutils literal notranslate"><span class="pre">GPU_ID</span></code> will be set to <code class="docutils literal notranslate"><span class="pre">N/A</span></code>.</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">RT_GPU_ID</span></code> is the HIP runtime GPU ID (as reported by, say, <code class="docutils literal notranslate"><span class="pre">hipGetDevice</span></code>).</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">Bus_ID</span></code> is the physical bus ID associated with the GPU. Comparing the bus IDs is meant to definitively show that different GPUs are being used.</p></li> </ul> <p>So the job step (i.e., <code class="docutils literal notranslate"><span class="pre">srun</span></code> command) used above gave the desired output. Each MPI rank spawned 2 OpenMP threads and had access to a unique GPU. The <code class="docutils literal notranslate"><span class="pre">--gpus-per-task=1</span></code> option allocated 1 GPU for each MPI rank and the <code class="docutils literal notranslate"><span class="pre">--gpu-bind=closest</span></code> option ensured that the closest GPU to each rank was the one used.</p> <p><strong>Example 2: 8 MPI ranks - each with 2 OpenMP threads and 1 GPU (multi-node)</strong></p> <p>This example extends Example 1 to run on 2 nodes. 
As the output shows, it is a very straightforward exercise of changing the number of nodes to 2 (<code class="docutils literal notranslate"><span class="pre">-N2</span></code>) and the number of MPI ranks to 8 (<code class="docutils literal notranslate"><span class="pre">-n8</span></code>).</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span> $<span class="w"> </span>srun<span class="w"> </span>-N2<span class="w"> </span>-n8<span class="w"> </span>-c2<span class="w"> </span>--gpus-per-task<span class="o">=</span><span class="m">1</span><span class="w"> </span>--gpu-bind<span class="o">=</span>closest<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span 
class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span 
class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span 
class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span 
class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span 
class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> <p><strong>Example 3: 4 MPI ranks - each with 2 OpenMP threads and 1 <em>specific</em> GPU (single-node)</strong></p> <p>This example will be very similar to Example 1, but instead of using <code class="docutils literal notranslate"><span class="pre">--gpu-bind=closest</span></code> to map each MPI rank to the closest GPU, <code class="docutils literal notranslate"><span class="pre">--gpu-bind=map_gpu</span></code> will be used to map each MPI rank to a <em>specific</em> GPU. 
The <code class="docutils literal notranslate"><span class="pre">map_gpu</span></code> option takes a comma-separated list of GPU IDs to specify how the MPI ranks are mapped to GPUs, where the form of the comma-separated list is <code class="docutils literal notranslate"><span class="pre"><gpu_id_for_task_0>,</span> <span class="pre"><gpu_id_for_task_1>,...</span></code>.</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span> $<span class="w"> </span>srun<span class="w"> </span>-N1<span class="w"> </span>-n4<span class="w"> </span>-c2<span class="w"> </span>--gpus-per-task<span class="o">=</span><span class="m">1</span><span class="w"> </span>--gpu-bind<span class="o">=</span>map_gpu:0,1,2,3<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span 
class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span 
class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span 
class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> <p>Here, the output is the same as the results from Example 1. This is because the 4 GPU IDs in the comma-separated list happen to specify the GPUs within the same NUMA domains that the MPI ranks are in. So MPI 000 is mapped to GPU 0, MPI 001 is mapped to GPU 1, etc.</p> <p>While this level of control over mapping MPI ranks to GPUs might be useful for some applications, it is always important to consider the implication of the mapping. For example, if the order of the GPU IDs in the <code class="docutils literal notranslate"><span class="pre">map_gpu</span></code> option is reversed, the MPI ranks and the GPUs they are mapped to would be in different NUMA domains, which could potentially lead to poorer performance.</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span> $<span class="w"> </span>srun<span class="w"> </span>-N1<span class="w"> </span>-n4<span class="w"> </span>-c2<span class="w"> </span>--gpus-per-task<span class="o">=</span><span class="m">1</span><span class="w"> </span>--gpu-bind<span class="o">=</span>map_gpu:3,2,1,0<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> 
</span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> 
</span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span 
class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 </pre></div> </div> <p>Here, MPI 000 now maps to GPU 3, MPI 001 maps to GPU 2, etc., so the MPI ranks are not in the same NUMA domains as the GPUs they are mapped to.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Again, this particular example is NOT a good mapping of GPUs to MPI ranks. For example, MPI rank 000 runs on NUMA node 0, whereas GPU 3 is on NUMA node 3. 
See the <a class="reference internal" href="#spock-compute-nodes"><span class="std std-ref">Spock Compute Nodes</span></a> section for descriptions of the NUMA domains.</p> </div> <p><strong>Example 4: 8 MPI ranks - each with 2 OpenMP threads and 1 *specific* GPU (multi-node)</strong></p> <p>Examples 2 and 3 can be extended to run on 2 nodes by changing the number of nodes to 2 (<code class="docutils literal notranslate"><span class="pre">-N2</span></code>) and the number of MPI ranks to 8 (<code class="docutils literal notranslate"><span class="pre">-n8</span></code>). As the output below shows, the same GPU list is applied on each node, so MPI 004-007 on the second node are also mapped to GPUs 0-3.</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">2</span> $<span class="w"> </span>srun<span class="w"> </span>-N2<span class="w"> </span>-n8<span class="w"> </span>-c2<span class="w"> </span>--gpus-per-task<span class="o">=</span><span class="m">1</span><span class="w"> </span>--gpu-bind<span class="o">=</span>map_gpu:0,1,2,3<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> 
</span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> 
</span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> 
</span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> 
</span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> 
</span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> </section> <section id="mapping-multiple-mpi-ranks-to-a-single-gpu"> <h5>Mapping multiple MPI ranks to a single GPU<a class="headerlink" href="#mapping-multiple-mpi-ranks-to-a-single-gpu" title="Link to this heading"></a></h5> <p>In the following examples, 2 MPI ranks will be mapped to 1 GPU. 
For the sake of brevity, <code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS</span></code> will be set to <code class="docutils literal notranslate"><span class="pre">1</span></code>, so <code class="docutils literal notranslate"><span class="pre">-c1</span></code> will be used unless otherwise specified.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>On AMD’s MI100 GPUs, a multi-process service (MPS) is not needed because sharing a GPU among multiple MPI ranks is supported natively.</p> </div> <p><strong>Example 5: 8 MPI ranks - where 2 ranks share a GPU (round-robin, single-node)</strong></p> <p>This example launches 8 MPI ranks (<code class="docutils literal notranslate"><span class="pre">-n8</span></code>), each with 1 physical CPU core (<code class="docutils literal notranslate"><span class="pre">-c1</span></code>) on which to launch 1 OpenMP thread (<code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS=1</span></code>). The MPI ranks will be assigned to GPUs in a round-robin fashion so that each of the 4 GPUs on the node is shared by 2 MPI ranks. 
To accomplish this GPU mapping, a new <code class="docutils literal notranslate"><span class="pre">srun</span></code> option will be used:</p> <ul class="simple"> <li><p><code class="docutils literal notranslate"><span class="pre">--ntasks-per-gpu</span></code> specifies the number of MPI ranks that will share access to a GPU.</p></li> </ul> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">1</span> $<span class="w"> </span>srun<span class="w"> </span>-N1<span class="w"> </span>-n8<span class="w"> </span>-c1<span class="w"> </span>--ntasks-per-gpu<span class="o">=</span><span class="m">2</span><span class="w"> </span>--gpu-bind<span class="o">=</span>closest<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> 
</span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> 
</span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> 
</span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> <p>The output shows the round-robin (<code class="docutils literal notranslate"><span class="pre">cyclic</span></code>) distribution of MPI ranks to GPUs. In fact, it is a round-robin distribution of MPI ranks <em>to NUMA domains</em> (the default distribution). The GPU mapping is a consequence of where the MPI ranks are distributed; <code class="docutils literal notranslate"><span class="pre">--gpu-bind=closest</span></code> simply maps the GPU in a NUMA domain to the MPI ranks in the same NUMA domain.</p> <p><strong>Example 6: 16 MPI ranks - where 2 ranks share a GPU (round-robin, multi-node)</strong></p> <p>This example is an extension of Example 5 to run on 2 nodes.</p> <div class="admonition warning"> <p class="admonition-title">Warning</p> <p>This example requires a workaround to run as expected. 
<code class="docutils literal notranslate"><span class="pre">--ntasks-per-gpu=2</span></code> does not by itself force MPI ranks 008-015 to run on the second node, so the number of physical CPU cores per MPI rank is increased to 8 (<code class="docutils literal notranslate"><span class="pre">-c8</span></code>). Because a node has only 64 physical CPU cores, at most 8 ranks with 8 cores each fit on one node, which pushes the remaining 8 ranks onto the second node.</p> </div> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">1</span> $<span class="w"> </span>srun<span class="w"> </span>-N2<span class="w"> </span>-n16<span class="w"> </span>-c8<span class="w"> </span>--ntasks-per-gpu<span class="o">=</span><span class="m">2</span><span class="w"> </span>--gpu-bind<span class="o">=</span>closest<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">018</span><span class="w"> </span>-<span class="w"> </span>Node<span 
class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">050</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">010</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span 
class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">026</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">040</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">059</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span 
class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">008</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">009</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">010</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">032</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span 
class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">011</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">012</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">008</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">013</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">024</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span 
class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">014</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">042</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">015</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">056</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> <p><strong>Example 7: 8 MPI ranks - where 2 ranks share a GPU (packed, single-node)</strong></p> <p>This example launches 8 MPI ranks (<code class="docutils literal notranslate"><span class="pre">-n8</span></code>), each with 8 physical CPU cores (<code class="docutils literal notranslate"><span class="pre">-c8</span></code>) on which to launch 1 OpenMP thread (<code class="docutils literal notranslate"><span class="pre">OMP_NUM_THREADS=1</span></code>). 
The MPI ranks will be assigned to GPUs in a packed fashion so that each of the 4 GPUs on the node is shared by 2 MPI ranks. As in Example 5, <code class="docutils literal notranslate"><span class="pre">--ntasks-per-gpu=2</span></code> is used, but a new <code class="docutils literal notranslate"><span class="pre">srun</span></code> flag changes the default round-robin (<code class="docutils literal notranslate"><span class="pre">cyclic</span></code>) distribution of MPI ranks across NUMA domains:</p> <ul class="simple"> <li><p><code class="docutils literal notranslate"><span class="pre">--distribution=<value>:[<value>]:[<value>]</span></code> specifies the distribution of MPI ranks across compute nodes, sockets (NUMA domains on Spock), and cores, respectively. The default values are <code class="docutils literal notranslate"><span class="pre">block:cyclic:cyclic</span></code>, which is where the <code class="docutils literal notranslate"><span class="pre">cyclic</span></code> assignment comes from in the previous examples.</p></li> </ul> <div class="admonition note"> <p class="admonition-title">Note</p> <p>In the job step for this example, <code class="docutils literal notranslate"><span class="pre">--distribution=*:block</span></code> is used, where <code class="docutils literal notranslate"><span class="pre">*</span></code> represents the default value of <code class="docutils literal notranslate"><span class="pre">block</span></code> for the distribution of MPI ranks across compute nodes, and the distribution of MPI ranks across NUMA domains has been changed to <code class="docutils literal notranslate"><span class="pre">block</span></code> from its default value of <code class="docutils literal notranslate"><span class="pre">cyclic</span></code>.</p> </div> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Because the distribution across NUMA domains has been changed to a “packed” (<code class="docutils literal 
notranslate"><span class="pre">block</span></code>) configuration, care must be taken to ensure MPI ranks end up in the NUMA domains where the GPUs they are meant to be mapped to are located. To accomplish this, the number of physical CPU cores assigned to an MPI rank was increased - in this case to 8. Doing so ensures that only 2 MPI ranks can fit into a single NUMA domain. If the value of <code class="docutils literal notranslate"><span class="pre">-c</span></code> were left at <code class="docutils literal notranslate"><span class="pre">1</span></code>, all 8 MPI ranks would be “packed” into the first NUMA domain, where the “closest” GPU would be GPU 0 - the only GPU in that NUMA domain.</p> <p>Notice that this is not a workaround like in Example 6, but a requirement due to the <code class="docutils literal notranslate"><span class="pre">block</span></code> distribution of MPI ranks across NUMA domains.</p> </div> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">1</span> $<span class="w"> </span>srun<span class="w"> </span>-N1<span class="w"> </span>-n8<span class="w"> </span>-c8<span class="w"> </span>--ntasks-per-gpu<span class="o">=</span><span class="m">2</span><span class="w"> </span>--gpu-bind<span class="o">=</span>closest<span class="w"> </span>--distribution<span class="o">=</span>*:block<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span 
class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">008</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">024</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span 
class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">035</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">043</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">049</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span 
class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">057</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> <p>The overall effect of using <code class="docutils literal notranslate"><span class="pre">--distribution=*:block</span></code> and increasing the number of physical CPU cores available to each MPI rank is to place the first two MPI ranks in NUMA 0 with GPU 0, the next two MPI ranks in NUMA 1 with GPU 1, and so on.</p> <p><strong>Example 8: 16 MPI ranks - where 2 ranks share a GPU (packed, multi-node)</strong></p> <p>This example is an extension of Example 7 to use 2 compute nodes. 
With the appropriate changes put in place in Example 7, it is a straightforward exercise to change to using 2 nodes (<code class="docutils literal notranslate"><span class="pre">-N2</span></code>) and 16 MPI ranks (<code class="docutils literal notranslate"><span class="pre">-n16</span></code>).</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">OMP_NUM_THREADS</span><span class="o">=</span><span class="m">1</span> $<span class="w"> </span>srun<span class="w"> </span>-N2<span class="w"> </span>-n16<span class="w"> </span>-c8<span class="w"> </span>--ntasks-per-gpu<span class="o">=</span><span class="m">2</span><span class="w"> </span>--gpu-bind<span class="o">=</span>closest<span class="w"> </span>--distribution<span class="o">=</span>*:block<span class="w"> </span>./hello_jobstep<span class="w"> </span><span class="p">|</span><span class="w"> </span>sort MPI<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">001</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">008</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> 
</span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">017</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">003</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">026</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">004</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> 
</span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">005</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">041</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">006</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">048</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">007</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">057</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock13<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span 
class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">008</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">002</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">009</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">011</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span>c9 MPI<span class="w"> </span><span class="m">010</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">016</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span 
class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">011</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">026</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">1</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">87</span> MPI<span class="w"> </span><span class="m">012</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">033</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">013</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">041</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span 
class="m">2</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">48</span> MPI<span class="w"> </span><span class="m">014</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">054</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> MPI<span class="w"> </span><span class="m">015</span><span class="w"> </span>-<span class="w"> </span>OMP<span class="w"> </span><span class="m">000</span><span class="w"> </span>-<span class="w"> </span>HWT<span class="w"> </span><span class="m">063</span><span class="w"> </span>-<span class="w"> </span>Node<span class="w"> </span>spock14<span class="w"> </span>-<span class="w"> </span>RT_GPU_ID<span class="w"> </span><span class="m">0</span><span class="w"> </span>-<span class="w"> </span>GPU_ID<span class="w"> </span><span class="m">3</span><span class="w"> </span>-<span class="w"> </span>Bus_ID<span class="w"> </span><span class="m">09</span> </pre></div> </div> </section> <section id="multiple-gpus-per-mpi-rank"> <h5>Multiple GPUs per MPI rank<a class="headerlink" href="#multiple-gpus-per-mpi-rank" title="Link to this heading"></a></h5> <p>As mentioned previously, all GPUs are accessible by all MPI ranks by default, so it is possible to <em>programmatically</em> map any combination of MPI ranks to GPUs. However, there is currently no way to use Slurm to map multiple GPUs to a single MPI rank. 
If this functionality is needed for an application, please submit a ticket by emailing <a class="reference external" href="mailto:help%40olcf.ornl.gov" target="_blank">help<span>@</span>olcf<span>.</span>ornl<span>.</span>gov</a>.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>There are many different ways users might choose to perform these mappings, so users are encouraged to clone the <code class="docutils literal notranslate"><span class="pre">hello_jobstep</span></code> program and test whether or not processes and threads are running where intended.</p> </div> </section> </section> </section> <section id="nvme-usage"> <h3>NVMe Usage<a class="headerlink" href="#nvme-usage" title="Link to this heading"></a></h3> <p>Each Spock compute node has [2x] 3.2 TB NVMe devices (SSDs) with a peak sequential performance of 6900 MB/s (read) and 4200 MB/s (write). To use the NVMe, users must request access during job allocation using the <code class="docutils literal notranslate"><span class="pre">-C</span> <span class="pre">nvme</span></code> option to <code class="docutils literal notranslate"><span class="pre">sbatch</span></code>, <code class="docutils literal notranslate"><span class="pre">salloc</span></code>, or <code class="docutils literal notranslate"><span class="pre">srun</span></code>. Once the devices have been granted to a job, users can access them at <code class="docutils literal notranslate"><span class="pre">/mnt/bb/<userid></span></code>. Users are responsible for moving data to/from the NVMe before/after their jobs. 
Here is a simple example script:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="ch">#!/bin/bash</span>
<span class="c1">#SBATCH -A &lt;projid&gt;</span>
<span class="c1">#SBATCH -J nvme_test</span>
<span class="c1">#SBATCH -o %x-%j.out</span>
<span class="c1">#SBATCH -t 00:05:00</span>
<span class="c1">#SBATCH -p batch</span>
<span class="c1">#SBATCH -N 1</span>
<span class="c1">#SBATCH -C nvme</span>

date

<span class="c1"># Change directory to user scratch space (GPFS)</span>
<span class="nb">cd</span><span class="w"> </span>/gpfs/alpine/&lt;projid&gt;/scratch/&lt;userid&gt;

<span class="nb">echo</span><span class="w"> </span><span class="s2">" "</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"*****ORIGINAL FILE*****"</span>
cat<span class="w"> </span>test.txt
<span class="nb">echo</span><span class="w"> </span><span class="s2">"***********************"</span>

<span class="c1"># Move file from GPFS to SSD</span>
mv<span class="w"> </span>test.txt<span class="w"> </span>/mnt/bb/&lt;userid&gt;

<span class="c1"># Edit file from compute node</span>
srun<span class="w"> </span>-n1<span class="w"> </span>hostname<span class="w"> </span>&gt;&gt;<span class="w"> </span>/mnt/bb/&lt;userid&gt;/test.txt

<span class="c1"># Move file from SSD back to GPFS</span>
mv<span class="w"> </span>/mnt/bb/&lt;userid&gt;/test.txt<span class="w"> </span>.

<span class="nb">echo</span><span class="w"> </span><span class="s2">" "</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"*****UPDATED FILE******"</span>
cat<span class="w"> </span>test.txt
<span class="nb">echo</span><span class="w"> </span><span class="s2">"***********************"</span>
</pre></div> </div> <p>And here is the output from the script:</p> <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>nvme_test-&lt;jobid&gt;.out
Mon<span class="w"> </span>May<span class="w"> </span><span class="m">17</span><span class="w"> </span><span class="m">12</span>:28:18<span class="w"> </span>EDT<span class="w"> </span><span class="m">2021</span>

*****ORIGINAL<span class="w"> </span>FILE*****
This<span class="w"> </span>is<span class="w"> </span>my<span class="w"> </span>file.<span class="w"> </span>There<span class="w"> </span>are<span class="w"> </span>many<span class="w"> </span>like<span class="w"> </span>it<span class="w"> </span>but<span class="w"> </span>this<span class="w"> </span>one<span class="w"> </span>is<span class="w"> </span>mine.
***********************

*****UPDATED<span class="w"> </span>FILE******
This<span class="w"> </span>is<span class="w"> </span>my<span class="w"> </span>file.<span class="w"> </span>There<span class="w"> </span>are<span class="w"> </span>many<span class="w"> </span>like<span class="w"> </span>it<span class="w"> </span>but<span class="w"> </span>this<span class="w"> </span>one<span class="w"> </span>is<span class="w"> </span>mine.
spock25
***********************
</pre></div> </div> </section> </section> <hr class="docutils" /> <section id="getting-help"> <h2>Getting Help<a class="headerlink" href="#getting-help" title="Link to this heading"></a></h2> <p>If you have problems or need help running on Spock, please submit a ticket by emailing <a class="reference external" href="mailto:help%40olcf.ornl.gov" target="_blank">help<span>@</span>olcf<span>.</span>ornl<span>.</span>gov</a>.</p> </section> </section> </div> </div> <footer> <div role="contentinfo"> <p>© Copyright 2025, OLCF.</p> </div> </footer> </div> </div> </section> </div> <script> jQuery(function () { SphinxRtdTheme.Navigation.enable(true); }); </script> </body> </html>