<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><link rel="canonical" href="https://www.shoreline.io/runbooks/spark/spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o"/><title>Spark tasks experiencing shuffle spills and high disk I/O. | Shoreline Runbooks | Shoreline.io</title><meta name="robots" content="index,follow"/><meta name="description" content="This incident type typically occurs in distributed computing systems, where Spark tasks are experiencing high disk I/O and shuffle spills. Spark is a popular distributed computing engine that uses shuffle operations to move data between nodes in a cluster, which can sometimes result in performance issues due to spills. The spills occur when the data being shuffled exceeds the memory capacity allocated for the shuffle operations. This incident requires optimization of the shuffle operations to reduce spills and improve overall performance. "/><meta property="og:title" content="Spark tasks experiencing shuffle spills and high disk I/O. | Shoreline Runbooks"/><meta property="og:description" content="This incident type typically occurs in distributed computing systems, where Spark tasks are experiencing high disk I/O and shuffle spills. Spark is a popular distributed computing engine that uses shuffle operations to move data between nodes in a cluster, which can sometimes result in performance issues due to spills. The spills occur when the data being shuffled exceeds the memory capacity allocated for the shuffle operations. This incident requires optimization of the shuffle operations to reduce spills and improve overall performance. 
"/><meta property="og:url" content="https://www.shoreline.io/runbooks/spark/spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o"/><meta property="og:type" content="website"/><meta property="og:image" content="https://www.shoreline.io/assets/images/shoreline-open-graph.png"/><meta property="og:image:alt" content="Spark tasks experiencing shuffle spills and high disk I/O. | Shoreline Runbooks"/><meta property="og:image:width" content="1200"/><meta property="og:image:height" content="627"/><meta property="og:locale" content="en-US"/><link rel="preload" as="image" imageSrcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fwave-bg-2.808cc056.svg&w=1080&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fwave-bg-2.808cc056.svg&w=3840&q=75 2x" fetchpriority="high"/><meta name="next-head-count" content="16"/><link rel="icon" href="/favicon.ico" type="image/x-icon"/><script type="text/javascript" id="termly-script-loader" async="" src="//app.termly.io/resource-blocker/cdf7744e-0560-4585-8fb6-8d7b87357122?autoBlock=on"></script><meta name="facebook-domain-verification" content="d7p4qv4qsjb7eqk1snoylxxacgbia3"/><link rel="preload" href="/_next/static/media/83001f47a8fdbd0d-s.p.woff2" as="font" type="font/woff2" crossorigin="anonymous" data-next-font="size-adjust"/><link rel="preload" href="/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2" as="font" type="font/woff2" crossorigin="anonymous" data-next-font="size-adjust"/><link rel="preload" href="/_next/static/media/21ed5661b47f7f6d-s.p.woff2" as="font" type="font/woff2" crossorigin="anonymous" data-next-font="size-adjust"/><link rel="preload" href="/_next/static/media/12d86e8d7e1c2769-s.p.woff2" as="font" type="font/woff2" crossorigin="anonymous" data-next-font="size-adjust"/><link rel="preload" href="/_next/static/media/3d9ea938b6afa941-s.p.woff2" as="font" type="font/woff2" crossorigin="anonymous" data-next-font="size-adjust"/><link rel="preload" href="/_next/static/css/049a1d310f147fc2.css" as="style" 
crossorigin=""/><link rel="stylesheet" href="/_next/static/css/049a1d310f147fc2.css" crossorigin="" data-n-g=""/><link rel="preload" href="/_next/static/css/52746fcecb9d7efc.css" as="style" crossorigin=""/><link rel="stylesheet" href="/_next/static/css/52746fcecb9d7efc.css" crossorigin="" data-n-p=""/><noscript data-n-css=""></noscript><script defer="" crossorigin="" nomodule="" src="/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js"></script><script src="/_next/static/chunks/webpack-63e88e8d1caad2b3.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/framework-d02714ac045baec6.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/main-1786e072a617f17b.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/pages/_app-366368a814f2f1f8.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/57fa3e58-45366f8a6d2496b0.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/6554-79c6520169065b52.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/4655-f45ddcf3a05b1718.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/1880-e663948b33992957.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/1360-0364a12471c552c8.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/5349-f99472b39a37c3d6.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/3869-08d0da7f2fa3bf04.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/pages/runbooks/%5B...slug%5D-953dfa063a59a607.js" defer="" crossorigin=""></script><script src="/_next/static/Op4Q3dRyk20CRi25yi4wh/_buildManifest.js" defer="" crossorigin=""></script><script src="/_next/static/Op4Q3dRyk20CRi25yi4wh/_ssgManifest.js" defer="" crossorigin=""></script></head><body><noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5Q86KVJ" height="0" width="0" class="invisible hidden"></iframe></noscript><div id="__next"><main 
class="__variable_e53e8d __variable_af7b73 __variable_14f33d __variable_aaf875 __variable_b72822 absolute min-h-screen min-w-full max-w-full bg-gray-50 font-sans"><header style="top:0" class="fixed z-30 w-full py-4 transition duration-300 ease-in-out md:bg-opacity-90"><div class="mx-auto max-w-7xl px-4 sm:px-6 lg:px-8"><nav class="relative z-50 flex justify-between"><div class="flex items-center md:gap-x-6"><a aria-label="Home" href="/"><img alt="Shoreline Logo" loading="lazy" width="150" height="30" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/image?url=%2Fassets%2Fimages%2Flogos%2Fshoreline-logo.svg&w=256&q=100 1x, /_next/image?url=%2Fassets%2Fimages%2Flogos%2Fshoreline-logo.svg&w=384&q=100 2x" src="/_next/image?url=%2Fassets%2Fimages%2Flogos%2Fshoreline-logo.svg&w=384&q=100"/></a><div class="hidden gap-x-1 md:gap-x-2 lg:flex xl:gap-x-4"></div></div><div class="flex items-center gap-x-4"><div class="hidden items-center gap-x-2 sm:flex"></div><div class="-mr-1 lg:hidden"><div data-headlessui-state=""><button class="relative z-10 flex h-8 w-8 items-center justify-center [&:not(:focus-visible)]:focus:outline-none" aria-label="Toggle Navigation" type="button" aria-expanded="false" data-headlessui-state=""><svg aria-hidden="true" class="h-3.5 w-3.5 overflow-visible stroke-slate-700" fill="none" stroke-width="2" stroke-linecap="round"><path d="M0 1H14M0 7H14M0 13H14" class="origin-center transition"></path><path d="M2 2L12 12M12 2L2 12" class="origin-center transition scale-90 opacity-0"></path></svg></button></div><div style="position:fixed;top:1px;left:1px;width:1px;height:0;padding:0;margin:-1px;overflow:hidden;clip:rect(0, 0, 0, 0);white-space:nowrap;border-width:0;display:none"></div></div></div></nav></div></header><div class="z-1 pointer-events-none absolute right-0 top-0 flex w-full justify-end" aria-hidden="true"><img alt="bg-wave-2" fetchpriority="high" width="1029" height="266" decoding="async" data-nimg="1" class="w-[60%]" 
style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fwave-bg-2.808cc056.svg&w=1080&q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fwave-bg-2.808cc056.svg&w=3840&q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fwave-bg-2.808cc056.svg&w=3840&q=75"/></div><section class="mx-auto max-w-7xl px-2 sm:px-6 lg:px-8 relative z-10 overflow-hidden"><div class="mx-auto max-w-7xl px-6 pt-24 text-center sm:pt-32 lg:px-8"><div class="mx-auto max-w-4xl"><span class="text-base font-semibold leading-7 text-primary-500">Runbook</span><h1 class="mt-2 font-display text-4xl font-medium tracking-tight sm:text-5xl">Spark tasks experiencing shuffle spills and high disk I/O.</h1></div></div></section><div class="my-12"><section class="relative mx-auto max-w-7xl px-2 sm:px-6 lg:px-8"><div class="flex"><div class="flex-none hidden md:flex md:flex-none"><div class="sticky top-24 mx-4 flex h-fit max-w-[200px] select-none flex-col gap-y-1 rounded bg-gray-100 p-4 shadow"><a class="flex items-center text-sm font-medium text-primary-500 hover:text-primary-600" href="/runbooks"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true" class="mr-2 inline-block h-5 w-5"><path fill-rule="evenodd" d="M17 10a.75.75 0 01-.75.75H5.612l4.158 3.96a.75.75 0 11-1.04 1.08l-5.5-5.25a.75.75 0 010-1.08l5.5-5.25a.75.75 0 111.04 1.08L5.612 9.25H16.25A.75.75 0 0117 10z" clip-rule="evenodd"></path></svg><span>Back to Runbooks</span></a></div></div><div class="flex-1"><div class="mx-auto max-w-[calc(100vw_-_3rem)] space-y-4 sm:max-w-2xl md:max-w-2xl lg:max-w-3xl"><a class="flex items-center text-sm font-medium text-primary-500 hover:text-primary-600 md:hidden" href="/runbooks"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor" aria-hidden="true" class="mr-2 inline-block h-5 w-5"><path fill-rule="evenodd" d="M17 10a.75.75 0 01-.75.75H5.612l4.158 3.96a.75.75 0 11-1.04 1.08l-5.5-5.25a.75.75 0 
010-1.08l5.5-5.25a.75.75 0 111.04 1.08L5.612 9.25H16.25A.75.75 0 0117 10z" clip-rule="evenodd"></path></svg><span>Back to Runbooks</span></a><div class="flex items-center justify-between"><h3 id="overview" class="prose text-2xl font-bold tracking-tight text-gray-900 sm:text-3xl">Overview</h3></div><div class="mt-3 flex items-center justify-between gap-1"><div><a href="/runbooks/tag/category-spark"><span class="inline-flex min-w-fit items-center rounded-full px-2 py-1 text-xs font-medium inline-flex cursor-pointer items-center justify-center rounded-md font-medium leading-5 duration-150 ease-in-out border bg-white text-black hover:border-gray-300 hover:bg-gray-200 hover:text-gray-700 ">Spark</span></a></div><div class="flex shrink flex-row gap-2"><div class="flex w-full justify-end"><a target="_blank" href="https://registry.terraform.io/modules/terraform-shoreline-modules/spark-spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o"><button type="button" class="inline-flex min-w-[11rem] items-center gap-x-1.5 rounded-md bg-white px-2.5 py-1.5 text-sm font-semibold text-gray-900 shadow-sm ring-1 ring-inset ring-gray-300 hover:bg-gray-50"><svg class="-ml-0.5 h-5 w-5" fill="#844FBA" stroke="none" focusable="false" viewBox="61 21 122 144" aria-hidden="true"><polygon points="102.58 46.91 141.41 69.33 141.41 114.16 102.58 91.75 102.58 46.91"></polygon><polygon points="145.67 69.33 145.67 114.16 184.5 91.75 184.5 46.91 145.67 69.33"></polygon><polygon points="59.5 21.88 59.5 66.72 98.33 89.14 98.33 44.3 59.5 21.88"></polygon><polygon points="102.58 141.49 141.41 163.91 141.41 119.38 141.41 119.08 102.58 96.66 102.58 141.49"></polygon></svg>Get the Terraform</button></a></div></div></div><div class="md:prose-md prose prose-slate !max-w-none font-light dark:prose-invert dark:text-slate-400 prose-headings:scroll-mt-28 prose-headings:font-display prose-headings:font-normal lg:prose-headings:scroll-mt-[8.5rem] prose-lead:text-slate-500 dark:prose-lead:text-slate-400 
prose-a:font-semibold dark:prose-a:text-sun-400 prose-a:no-underline prose-a:shadow-[inset_0_-2px_0_0_var(--tw-prose-background,#fff),inset_0_calc(-1*(var(--tw-prose-underline-size,4px)+1px))_0_0_var(--tw-prose-underline,theme(colors.sun.300))] hover:prose-a:[--tw-prose-underline-size:6px] dark:[--tw-prose-background:theme(colors.slate.900)] dark:prose-a:shadow-[inset_0_calc(-1*var(--tw-prose-underline-size,2px))_0_0_var(--tw-prose-underline,theme(colors.sun.800))] dark:hover:prose-a:[--tw-prose-underline-size:6px] prose-pre:rounded-xl prose-pre:bg-slate-900 prose-pre:shadow-lg dark:prose-pre:bg-slate-800/60 dark:prose-pre:shadow-none dark:prose-pre:ring-1 dark:prose-pre:ring-slate-300/10 dark:prose-hr:border-slate-800">This incident occurs when Spark tasks spill shuffle data to disk and drive up disk I/O. Spark is a distributed computing engine that uses shuffle operations to move data between nodes in a cluster, and spills during a shuffle can degrade performance. A spill occurs when the data being shuffled exceeds the memory available for the shuffle, so Spark writes the excess to local disk. 
This incident requires optimization of the shuffle operations to reduce spills and improve overall performance.</div><h3 id="parameter" class="prose text-2xl font-bold tracking-tight text-gray-900 sm:text-3xl">Parameters</h3><div class="md:prose-md prose prose-slate !max-w-none font-light dark:prose-invert dark:text-slate-400 prose-headings:scroll-mt-28 prose-headings:font-display prose-headings:font-normal lg:prose-headings:scroll-mt-[8.5rem] prose-lead:text-slate-500 dark:prose-lead:text-slate-400 prose-a:font-semibold dark:prose-a:text-sun-400 prose-a:no-underline prose-a:shadow-[inset_0_-2px_0_0_var(--tw-prose-background,#fff),inset_0_calc(-1*(var(--tw-prose-underline-size,4px)+1px))_0_0_var(--tw-prose-underline,theme(colors.sun.300))] hover:prose-a:[--tw-prose-underline-size:6px] dark:[--tw-prose-background:theme(colors.slate.900)] dark:prose-a:shadow-[inset_0_calc(-1*var(--tw-prose-underline-size,2px))_0_0_var(--tw-prose-underline,theme(colors.sun.800))] dark:hover:prose-a:[--tw-prose-underline-size:6px] prose-pre:rounded-xl prose-pre:bg-slate-900 prose-pre:shadow-lg dark:prose-pre:bg-slate-800/60 dark:prose-pre:shadow-none dark:prose-pre:ring-1 dark:prose-pre:ring-slate-300/10 dark:prose-hr:border-slate-800"><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 
!absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div></div><h3 id="debug" class="prose text-2xl font-bold tracking-tight text-gray-900 sm:text-3xl">Debug</h3><div class="md:prose-md prose prose-slate !max-w-none font-light dark:prose-invert dark:text-slate-400 prose-headings:scroll-mt-28 prose-headings:font-display prose-headings:font-normal lg:prose-headings:scroll-mt-[8.5rem] prose-lead:text-slate-500 dark:prose-lead:text-slate-400 prose-a:font-semibold dark:prose-a:text-sun-400 prose-a:no-underline prose-a:shadow-[inset_0_-2px_0_0_var(--tw-prose-background,#fff),inset_0_calc(-1*(var(--tw-prose-underline-size,4px)+1px))_0_0_var(--tw-prose-underline,theme(colors.sun.300))] hover:prose-a:[--tw-prose-underline-size:6px] dark:[--tw-prose-background:theme(colors.slate.900)] 
dark:prose-a:shadow-[inset_0_calc(-1*var(--tw-prose-underline-size,2px))_0_0_var(--tw-prose-underline,theme(colors.sun.800))] dark:hover:prose-a:[--tw-prose-underline-size:6px] prose-pre:rounded-xl prose-pre:bg-slate-900 prose-pre:shadow-lg dark:prose-pre:bg-slate-800/60 dark:prose-pre:shadow-none dark:prose-pre:ring-1 dark:prose-pre:ring-slate-300/10 dark:prose-hr:border-slate-800"><h3 id="check-the-disk-io-usage" class="group flex whitespace-pre-wrap" data-toc-depth="3" data-toc-id="check-the-disk-io-usage" data-toc-text="Check the disk I/O usage" data-toc-hidden="false"><span>Check the disk I/O usage</span><a href="#check-the-disk-io-usage" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" 
fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div><h3 id="check-the-network-bandwidth-usage" class="group flex whitespace-pre-wrap" data-toc-depth="3" data-toc-id="check-the-network-bandwidth-usage" data-toc-text="Check the network bandwidth usage" data-toc-hidden="false"><span>Check the network bandwidth usage</span><a href="#check-the-network-bandwidth-usage" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg 
class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div><h3 id="check-the-spark-task-metrics" class="group flex whitespace-pre-wrap" data-toc-depth="3" data-toc-id="check-the-spark-task-metrics" data-toc-text="Check the Spark task metrics" data-toc-hidden="false"><span>Check the Spark task metrics</span><a href="#check-the-spark-task-metrics" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" 
xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div><h3 id="check-the-shuffle-size-and-spill-metrics" class="group flex whitespace-pre-wrap" data-toc-depth="3" data-toc-id="check-the-shuffle-size-and-spill-metrics" data-toc-text="Check the shuffle size and spill metrics" data-toc-hidden="false"><span>Check the shuffle size and spill metrics</span><a href="#check-the-shuffle-size-and-spill-metrics" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre 
style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div><h3 id="check-the-system-resource-usage" class="group flex whitespace-pre-wrap" data-toc-depth="3" 
data-toc-id="check-the-system-resource-usage" data-toc-text="Check the system resource usage" data-toc-hidden="false"><span>Check the system resource usage</span><a href="#check-the-system-resource-usage" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 
1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div><h3 id="insufficient-memory-allocated-for-shuffle-operations" class="group flex whitespace-pre-wrap" data-toc-depth="3" data-toc-id="insufficient-memory-allocated-for-shuffle-operations" data-toc-text="Insufficient memory allocated for shuffle operations." data-toc-hidden="false"><span>Insufficient memory allocated for shuffle operations.</span><a href="#insufficient-memory-allocated-for-shuffle-operations" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 
7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div></div><h3 id="repair" class="prose text-2xl font-bold tracking-tight text-gray-900 sm:text-3xl">Repair</h3><div class="md:prose-md prose prose-slate !max-w-none font-light dark:prose-invert dark:text-slate-400 prose-headings:scroll-mt-28 prose-headings:font-display prose-headings:font-normal lg:prose-headings:scroll-mt-[8.5rem] prose-lead:text-slate-500 dark:prose-lead:text-slate-400 prose-a:font-semibold dark:prose-a:text-sun-400 prose-a:no-underline prose-a:shadow-[inset_0_-2px_0_0_var(--tw-prose-background,#fff),inset_0_calc(-1*(var(--tw-prose-underline-size,4px)+1px))_0_0_var(--tw-prose-underline,theme(colors.sun.300))] hover:prose-a:[--tw-prose-underline-size:6px] dark:[--tw-prose-background:theme(colors.slate.900)] dark:prose-a:shadow-[inset_0_calc(-1*var(--tw-prose-underline-size,2px))_0_0_var(--tw-prose-underline,theme(colors.sun.800))] dark:hover:prose-a:[--tw-prose-underline-size:6px] prose-pre:rounded-xl prose-pre:bg-slate-900 prose-pre:shadow-lg dark:prose-pre:bg-slate-800/60 dark:prose-pre:shadow-none dark:prose-pre:ring-1 dark:prose-pre:ring-slate-300/10 dark:prose-hr:border-slate-800"><h3 
id="increase-the-memory-allocation-for-shuffle-operations-to-avoid-spills-this-can-be-done-by-increasing-the-sparkshufflememoryfraction-or-sparkmemoryfraction-configuration-parameters" class="group flex whitespace-pre-wrap" data-toc-depth="3" data-toc-id="increase-the-memory-allocation-for-shuffle-operations-to-avoid-spills-this-can-be-done-by-increasing-the-sparkshufflememoryfraction-or-sparkmemoryfraction-configuration-parameters" data-toc-text="Increase the memory allocation for shuffle operations to avoid spills. This can be done by increasing the spark.shuffle.memoryFraction or spark.memory.fraction configuration parameters." data-toc-hidden="false"><span>Increase the memory allocation for shuffle operations to avoid spills. This can be done by increasing the <code>spark.shuffle.memoryFraction</code> or <code>spark.memory.fraction</code> configuration parameters.</span><a href="#increase-the-memory-allocation-for-shuffle-operations-to-avoid-spills-this-can-be-done-by-increasing-the-sparkshufflememoryfraction-or-sparkmemoryfraction-configuration-parameters" class="ml-1 text-gray-400 opacity-0 group-hover:opacity-100 relative after:content-['#']" aria-label="Anchor"></a></h3><div class="code relative mx-auto" aria-live="polite"><pre style="color:#9CDCFE;background-color:#1E1E1E"><div class="token-line" style="color:#9CDCFE"><span class="" style="display:inline-block"> </span></div></pre><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-8 top-1.5 ml-2 mt-1" role="alert" style="outline:none"><svg class="text-gray-100" height="24" width="24" viewBox="0 0 24 24" fill="currentcolor" xmlns="http://www.w3.org/2000/svg"><path d="M4 19h6v-2H4v2zM20 5H4v2h16V5zm-3 6H4v2h13.25c1.1 0 2 .9 2 2s-.9 2-2 2H15v-2l-3 3 3 3v-2h2c2.21 0 4-1.79 4-4s-1.79-4-4-4z"></path></svg></button><button class="sticky right-0 top-0 inline-block items-center font-mono leading-none text-gray-100 !absolute right-2 top-2 ml-2 mt-1" 
role="alert" style="outline:none"><svg class="" height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M13 7H7V5H13V7Z" fill="currentColor"></path><path d="M13 11H7V9H13V11Z" fill="currentColor"></path><path d="M7 15H13V13H7V15Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M3 19V1H17V5H21V23H7V19H3ZM15 17V3H5V17H15ZM17 7V19H9V21H19V7H17Z" fill="currentColor"></path></svg><div class="absolute bottom-0 left-0 right-0 top-0 flex items-center justify-center text-green-300 opacity-0"><svg height="20" width="20" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M10.2426 16.3137L6 12.071L7.41421 10.6568L10.2426 13.4853L15.8995 7.8284L17.3137 9.24262L10.2426 16.3137Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1 5C1 2.79086 2.79086 1 5 1H19C21.2091 1 23 2.79086 23 5V19C23 21.2091 21.2091 23 19 23H5C2.79086 23 1 21.2091 1 19V5ZM5 3H19C20.1046 3 21 3.89543 21 5V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V5C3 3.89543 3.89543 3 5 3Z" fill="currentColor"></path></svg></div></button></div></div></div></div><div class="flex-none hidden lg:flex lg:flex-none"><aside class="sticky top-24 ml-4 self-start"><nav class="py-4 lg:py-8 lg:mt-0 lg:pt-0"><p class="mb-3 text-sm font-bold uppercase tracking-wider text-gray-500 lg:mb-2 lg:text-xs">ON THIS PAGE</p><ul><li class="border-l-2 py-1 pl-3 text-gray-700"><a href="#overview" class="block cursor-pointer text-sm transition-colors duration-200 hover:text-primary-600 font-normal">Overview</a></li><li class="border-l-2 py-1 pl-3 text-gray-700"><a href="#parameter" class="block cursor-pointer text-sm transition-colors duration-200 hover:text-primary-600 font-normal">Parameters</a></li><li class="border-l-2 py-1 pl-3 text-gray-700"><a href="#debug" class="block cursor-pointer text-sm transition-colors duration-200 hover:text-primary-600 font-normal">Debug</a></li><li class="border-l-2 py-1 pl-3 
text-gray-700"><a href="#repair" class="block cursor-pointer text-sm transition-colors duration-200 hover:text-primary-600 font-normal">Repair</a></li></ul></nav></aside></div></div></section><div class="relative scroll-mt-14 pt-8 sm:scroll-mt-32 sm:pt-10 lg:pt-16"><div class="absolute inset-x-0 bottom-0 top-1/2 text-slate-900/10 [mask-image:linear-gradient(transparent,white)]"><svg aria-hidden="true" class="absolute inset-0 h-full w-full"><defs><pattern id=":R1db6:" width="128" height="128" patternUnits="userSpaceOnUse" x="50%" y="100%"><path d="M0 128V.5H128" fill="none" stroke="currentColor"></path></pattern></defs><rect width="100%" height="100%" fill="url(#:R1db6:)"></rect></svg><div class="ocean -z-50 overflow-x-clip opacity-40 z-0"><div class="wave"></div><div class="wave"></div></div></div><section class="mx-auto max-w-7xl px-2 sm:px-6 lg:px-8 relative my-4 flex flex-col justify-between gap-x-4 md:flex-row"><div class="flex w-full flex-col items-center justify-center"><div class="mx-auto max-w-2xl px-6 lg:px-0 lg:pr-4 lg:pt-4 lg:text-center text-balance"><div class="mx-auto max-w-2xl lg:text-center"><h3 class="text-base font-semibold leading-7 text-primary-600">Learn more</h3><h2 class="mt-2 text-3xl font-bold tracking-tight text-gray-900 sm:text-4xl">Related Runbooks</h2><p class="text-balance mt-6 text-lg leading-8 text-gray-600" style="text-wrap:balance">Check out these related runbooks to help you debug and resolve similar issues.</p></div></div><div class="mx-auto grid h-full w-full max-w-2xl auto-rows-fr grid-cols-1 gap-4 pt-4 lg:mx-0 lg:max-w-none lg:grid-cols-4"><article class="group relative isolate flex flex-col justify-end overflow-hidden rounded-2xl transition-shadow duration-200 ease-in-out hover:shadow-xl p-4"><img alt="Spark tasks failing due to out of memory errors." 
loading="lazy" width="4195" height="2802" decoding="async" data-nimg="1" class="absolute inset-0 -z-10 h-full w-full object-cover" style="color:transparent" sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw" srcSet="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=256&q=75 256w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=384&q=75 384w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=640&q=75 640w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=750&q=75 750w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=828&q=75 828w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=1080&q=75 1080w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=1200&q=75 1200w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=1920&q=75 1920w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=2048&q=75 2048w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=3840&q=75 3840w" src="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2Fd94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg&w=3840&q=75"/><div class="absolute inset-0 -z-10 bg-gradient-to-t from-gray-900 from-10% via-gray-900/60 via-50%"></div><div class="absolute inset-0 -z-10 rounded-2xl ring-1 ring-inset ring-gray-900/10"></div><h3 class="mb-3 text-lg font-semibold leading-6 text-white 
group-hover:underline"><a href="/runbooks/spark/spark-tasks-failing-due-to-out-of-memory-errors"><span class="absolute inset-0"></span>Spark tasks failing due to out of memory errors.</a></h3><div class="flex flex-wrap items-center gap-y-1 overflow-hidden text-sm leading-6 text-gray-300"><div class="flex items-center gap-x-4"><div class="flex gap-x-2.5"><span class="inline-flex min-w-fit items-center rounded-full px-2 py-1 text-xs font-medium border border-indigo-200 bg-indigo-100 text-indigo-800 inline-flex cursor-pointer items-center justify-center rounded-md font-medium leading-5 duration-150 ease-in-out border bg-white text-black hover:border-gray-300 hover:bg-gray-200 hover:text-gray-700 self-end !text-xs">Spark</span></div></div></div></article><article class="group relative isolate flex flex-col justify-end overflow-hidden rounded-2xl transition-shadow duration-200 ease-in-out hover:shadow-xl p-4"><img alt="Spark job failures due to cluster resource contentions." loading="lazy" width="8750" height="12667" decoding="async" data-nimg="1" class="absolute inset-0 -z-10 h-full w-full object-cover" style="color:transparent" sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw" srcSet="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=256&q=75 256w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=384&q=75 384w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=640&q=75 640w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=750&q=75 750w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=828&q=75 828w, 
/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=1080&q=75 1080w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=1200&q=75 1200w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=1920&q=75 1920w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=2048&q=75 2048w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=3840&q=75 3840w" src="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg&w=3840&q=75"/><div class="absolute inset-0 -z-10 bg-gradient-to-t from-gray-900 from-10% via-gray-900/60 via-50%"></div><div class="absolute inset-0 -z-10 rounded-2xl ring-1 ring-inset ring-gray-900/10"></div><h3 class="mb-3 text-lg font-semibold leading-6 text-white group-hover:underline"><a href="/runbooks/spark/spark-job-failures-due-to-cluster-resource-contentions"><span class="absolute inset-0"></span>Spark job failures due to cluster resource contentions.</a></h3><div class="flex flex-wrap items-center gap-y-1 overflow-hidden text-sm leading-6 text-gray-300"><div class="flex items-center gap-x-4"><div class="flex gap-x-2.5"><span class="inline-flex min-w-fit items-center rounded-full px-2 py-1 text-xs font-medium border border-indigo-200 bg-indigo-100 text-indigo-800 inline-flex cursor-pointer items-center justify-center rounded-md font-medium leading-5 duration-150 ease-in-out border bg-white text-black hover:border-gray-300 hover:bg-gray-200 hover:text-gray-700 self-end !text-xs">Spark</span></div></div></div></article><article class="group relative isolate flex flex-col justify-end overflow-hidden rounded-2xl transition-shadow duration-200 ease-in-out 
hover:shadow-xl p-4"><img alt="Spark executor failure during job execution." loading="lazy" width="6000" height="4000" decoding="async" data-nimg="1" class="absolute inset-0 -z-10 h-full w-full object-cover" style="color:transparent" sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw" srcSet="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=256&q=75 256w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=384&q=75 384w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=640&q=75 640w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=750&q=75 750w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=828&q=75 828w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=1080&q=75 1080w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=1200&q=75 1200w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=1920&q=75 1920w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=2048&q=75 2048w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=3840&q=75 3840w" src="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg&w=3840&q=75"/><div class="absolute inset-0 -z-10 bg-gradient-to-t from-gray-900 from-10% via-gray-900/60 via-50%"></div><div class="absolute inset-0 -z-10 rounded-2xl ring-1 ring-inset 
ring-gray-900/10"></div><h3 class="mb-3 text-lg font-semibold leading-6 text-white group-hover:underline"><a href="/runbooks/spark/spark-executor-failure-during-job-execution"><span class="absolute inset-0"></span>Spark executor failure during job execution.</a></h3><div class="flex flex-wrap items-center gap-y-1 overflow-hidden text-sm leading-6 text-gray-300"><div class="flex items-center gap-x-4"><div class="flex gap-x-2.5"><span class="inline-flex min-w-fit items-center rounded-full px-2 py-1 text-xs font-medium border border-indigo-200 bg-indigo-100 text-indigo-800 inline-flex cursor-pointer items-center justify-center rounded-md font-medium leading-5 duration-150 ease-in-out border bg-white text-black hover:border-gray-300 hover:bg-gray-200 hover:text-gray-700 self-end !text-xs">Spark</span></div></div></div></article><article class="group relative isolate flex flex-col justify-end overflow-hidden rounded-2xl transition-shadow duration-200 ease-in-out hover:shadow-xl p-4"><img alt="Spark driver program crash during job runtime." 
loading="lazy" width="7680" height="4320" decoding="async" data-nimg="1" class="absolute inset-0 -z-10 h-full w-full object-cover" style="color:transparent" sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw" srcSet="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=256&q=75 256w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=384&q=75 384w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=640&q=75 640w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=750&q=75 750w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=828&q=75 828w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=1080&q=75 1080w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=1200&q=75 1200w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=1920&q=75 1920w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=2048&q=75 2048w, /_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=3840&q=75 3840w" src="/_next/image?url=%2Fassets%2Fimages%2Frunbooks%2F4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg&w=3840&q=75"/><div class="absolute inset-0 -z-10 bg-gradient-to-t from-gray-900 from-10% via-gray-900/60 via-50%"></div><div class="absolute inset-0 -z-10 rounded-2xl ring-1 ring-inset ring-gray-900/10"></div><h3 class="mb-3 text-lg font-semibold leading-6 text-white 
group-hover:underline"><a href="/runbooks/spark/spark-driver-program-crash-during-job-runtime"><span class="absolute inset-0"></span>Spark driver program crash during job runtime.</a></h3><div class="flex flex-wrap items-center gap-y-1 overflow-hidden text-sm leading-6 text-gray-300"><div class="flex items-center gap-x-4"><div class="flex gap-x-2.5"><span class="inline-flex min-w-fit items-center rounded-full px-2 py-1 text-xs font-medium border border-indigo-200 bg-indigo-100 text-indigo-800 inline-flex cursor-pointer items-center justify-center rounded-md font-medium leading-5 duration-150 ease-in-out border bg-white text-black hover:border-gray-300 hover:bg-gray-200 hover:text-gray-700 self-end !text-xs">Spark</span></div></div></div></article></div></div></section></div></div><footer class="bg-secondary-100" aria-labelledby="footer-heading"><section class="relative mx-auto max-w-7xl px-2 sm:px-6 lg:px-8 py-8"><h2 id="footer-heading" class="sr-only">Footer</h2><div><div class="grid grid-cols-1 gap-4 sm:grid-cols-3 md:grid-cols-4 xl:gap-8"><div class="space-y-8"><img alt="Shoreline Logo" loading="lazy" width="300" height="76" decoding="async" data-nimg="1" class="inline-block w-auto max-w-[200px] border-0 align-middle" style="color:transparent" srcSet="/_next/image?url=%2Fassets%2Fimages%2Flogos%2Fshoreline-logo.svg&w=384&q=75 1x, /_next/image?url=%2Fassets%2Fimages%2Flogos%2Fshoreline-logo.svg&w=640&q=75 2x" src="/_next/image?url=%2Fassets%2Fimages%2Flogos%2Fshoreline-logo.svg&w=640&q=75"/><div class="flex space-x-6"><a href="https://www.linkedin.com/company/shoreline-software/" class="text-gray-400 hover:text-gray-500" target="_blank"><span class="sr-only">LinkedIn</span><svg width="24" height="25" viewBox="0 0 24 25" fill="none" xmlns="http://www.w3.org/2000/svg" class="h-6 w-6" aria-hidden="true"><g clip-path="url(#clip0_468_100884)"><path d="M22.2234 0.5H1.77187C0.792187 0.5 0 1.27344 0 2.22969V22.7656C0 23.7219 0.792187 24.5 1.77187 24.5H22.2234C23.2031 
24.5 24 23.7219 24 22.7703V2.22969C24 1.27344 23.2031 0.5 22.2234 0.5ZM7.12031 20.9516H3.55781V9.49531H7.12031V20.9516ZM5.33906 7.93438C4.19531 7.93438 3.27188 7.01094 3.27188 5.87187C3.27188 4.73281 4.19531 3.80937 5.33906 3.80937C6.47813 3.80937 7.40156 4.73281 7.40156 5.87187C7.40156 7.00625 6.47813 7.93438 5.33906 7.93438ZM20.4516 20.9516H16.8937V15.3828C16.8937 14.0562 16.8703 12.3453 15.0422 12.3453C13.1906 12.3453 12.9094 13.7937 12.9094 15.2891V20.9516H9.35625V9.49531H12.7687V11.0609H12.8156C13.2891 10.1609 14.4516 9.20938 16.1813 9.20938C19.7859 9.20938 20.4516 11.5813 20.4516 14.6656V20.9516Z" fill="#98A2B3"></path></g><defs><clipPath id="clip0_468_100884"><rect width="24" height="24" fill="white" transform="translate(0 0.5)"></rect></clipPath></defs></svg></a><a href="https://twitter.com/shoreline_io" class="text-gray-400 hover:text-gray-500" target="_blank"><span class="sr-only">Twitter</span><svg fill="currentColor" viewBox="0 0 24 24" class="h-6 w-6" aria-hidden="true"><path d="M8.29 20.251c7.547 0 11.675-6.253 11.675-11.675 0-.178 0-.355-.012-.53A8.348 8.348 0 0022 5.92a8.19 8.19 0 01-2.357.646 4.118 4.118 0 001.804-2.27 8.224 8.224 0 01-2.605.996 4.107 4.107 0 00-6.993 3.743 11.65 11.65 0 01-8.457-4.287 4.106 4.106 0 001.27 5.477A4.072 4.072 0 012.8 9.713v.052a4.105 4.105 0 003.292 4.022 4.095 4.095 0 01-1.853.07 4.108 4.108 0 003.834 2.85A8.233 8.233 0 012 18.407a11.616 11.616 0 006.29 1.84"></path></svg></a><a href="https://github.com/shorelinesoftware" class="text-gray-400 hover:text-gray-500" target="_blank"><span class="sr-only">GitHub</span><svg fill="currentColor" viewBox="0 0 24 24" class="h-6 w-6" aria-hidden="true"><path fill-rule="evenodd" d="M12 2C6.477 2 2 6.484 2 12.017c0 4.425 2.865 8.18 6.839 9.504.5.092.682-.217.682-.483 0-.237-.008-.868-.013-1.703-2.782.605-3.369-1.343-3.369-1.343-.454-1.158-1.11-1.466-1.11-1.466-.908-.62.069-.608.069-.608 1.003.07 1.531 1.032 1.531 1.032.892 1.53 2.341 1.088 
2.91.832.092-.647.35-1.088.636-1.338-2.22-.253-4.555-1.113-4.555-4.951 0-1.093.39-1.988 1.029-2.688-.103-.253-.446-1.272.098-2.65 0 0 .84-.27 2.75 1.026A9.564 9.564 0 0112 6.844c.85.004 1.705.115 2.504.337 1.909-1.296 2.747-1.027 2.747-1.027.546 1.379.202 2.398.1 2.651.64.7 1.028 1.595 1.028 2.688 0 3.848-2.339 4.695-4.566 4.943.359.309.678.92.678 1.855 0 1.338-.012 2.419-.012 2.747 0 .268.18.58.688.482A10.019 10.019 0 0022 12.017C22 6.484 17.522 2 12 2z" clip-rule="evenodd"></path></svg></a><a href="https://www.youtube.com/channel/UCV8jBJHfH7yXDZZl9KjfQAA" class="text-gray-400 hover:text-gray-500" target="_blank"><span class="sr-only">YouTube</span><svg fill="currentColor" viewBox="0 0 24 24" class="h-6 w-6" aria-hidden="true"><path fill-rule="evenodd" d="M19.812 5.418c.861.23 1.538.907 1.768 1.768C21.998 8.746 22 12 22 12s0 3.255-.418 4.814a2.504 2.504 0 0 1-1.768 1.768c-1.56.419-7.814.419-7.814.419s-6.255 0-7.814-.419a2.505 2.505 0 0 1-1.768-1.768C2 15.255 2 12 2 12s0-3.255.417-4.814a2.507 2.507 0 0 1 1.768-1.768C5.744 5 11.998 5 11.998 5s6.255 0 7.814.418ZM15.194 12 10 15V9l5.194 3Z" clip-rule="evenodd"></path></svg></a></div><p class="text-xs leading-5 text-gray-500">Questions?<!-- --> <a class="text-primary-600 dark:text-primary-500 hover:underline" href="mailto:hello@shoreline.io">hello@shoreline.io</a></p></div><div class="grid grid-cols-2 gap-8 sm:col-span-2 md:col-span-3"><div class="md:grid md:grid-cols-2 md:gap-8"><div><h3 class="text-sm font-semibold leading-6 text-gray-900">Support</h3><ul role="list" class="mt-2 space-y-2"><li><a href="https://docs.shoreline.io" target="_blank" class="text-sm leading-6 text-gray-600 hover:text-gray-900 hover:underline">Documentation<svg class="inline-block h-6 w-6 pb-1 text-primary-500" fill="currentColor" stroke="none" focusable="false" viewBox="0 0 20 20" aria-hidden="true"><path d="M11 3a1 1 0 100 2h2.586l-6.293 6.293a1 1 0 101.414 1.414L15 6.414V9a1 1 0 102 0V4a1 1 0 00-1-1h-5z"></path><path d="M5 5a2 2 0 00-2 
2v8a2 2 0 002 2h8a2 2 0 002-2v-3a1 1 0 10-2 0v3H5V7h3a1 1 0 000-2H5z"></path></svg></a></li></ul></div></div></div></div><div class="mt-4 flex flex-col justify-between gap-y-2 border-t border-gray-900/10 pt-4 text-center lg:flex-row lg:text-left"><p class="text-xs leading-5 text-gray-500">© <!-- -->2024<!-- --> Shoreline Software™. All rights reserved.</p><ul class="flex list-none divide-x divide-none self-center text-xs leading-5 text-gray-500 md:divide-solid"><li class="px-2"><a href="https://www.nvidia.com/en-us/about-nvidia/cookie-policy/" target="_blank" class="text-xs leading-5 text-gray-500/75 hover:underline">Cookies</a></li><li class="px-2"><a href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" target="_blank" class="text-xs leading-5 text-gray-500/75 hover:underline">Privacy</a></li><li class="px-2"><a href="https://www.nvidia.com/en-us/about-nvidia/terms-of-service/" target="_blank" class="text-xs leading-5 text-gray-500/75 hover:underline">Terms</a></li><li class="px-2"><a href="https://www.nvidia.com/en-us/preferences/start/" target="_blank" class="text-xs leading-5 text-gray-500/75 hover:underline">DSAR</a></li><li class="px-0 sm:px-1 md:px-2"><a href="/runbooks/spark/spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o#" class="text-xs leading-5 text-gray-500/75 hover:text-gray-900 termly-display-preferences hover:underline">Preferences</a></li></ul><span class="text-xs leading-5 text-gray-500/75"><span>Shoreline.io HQ</span> - 255 Shoreline Dr., Ste 315, Redwood City, CA 94065</span></div></div></section></footer></main></div><script id="__NEXT_DATA__" type="application/json" crossorigin="">{"props":{"pageProps":{"runbook":{"id":"956f46e3-7f56-442d-938b-43f0c1281ad9","authorId":null,"content":"---\nid: 956f46e3-7f56-442d-938b-43f0c1281ad9\n---\n\n# Spark tasks experiencing shuffle spills and high disk I/O.\n---\n\nThis incident type typically occurs in distributed computing systems, where Spark tasks are experiencing high disk I/O 
and shuffle spills. Spark is a popular distributed computing engine that uses shuffle operations to move data between nodes in a cluster, which can sometimes result in performance issues due to spills. The spills occur when the data being shuffled exceeds the memory capacity allocated for the shuffle operations. This incident requires optimization of the shuffle operations to reduce spills and improve overall performance.\n\n### Parameters\n```shell\nexport COUNT=\"PLACEHOLDER\"\n\nexport INTERVAL=\"PLACEHOLDER\"\n\nexport APPLICATION_ID=\"PLACEHOLDER\"\n\nexport NEW_MEMORY_FRACTION_VALUE=\"PLACEHOLDER\"\n\nexport CONFIGURATION_PARAMETER_TO_UPDATE=\"PLACEHOLDER\"\n\nexport PATH_TO_SPARK_CONF=\"PLACEHOLDER\"\n```\n\n## Debug\n\n### Check the disk I/O usage\n```shell\niostat -x ${INTERVAL} ${COUNT}\n```\n\n### Check the network bandwidth usage\n```shell\nsar -n DEV ${INTERVAL} ${COUNT}\n```\n\n### Check the Spark task metrics\n```shell\nyarn logs -applicationId ${APPLICATION_ID} | grep \"TaskMetrics\" | grep \"ExecutorRunTime\"\n```\n\n### Check the shuffle size and spill metrics\n```shell\nyarn logs -applicationId ${APPLICATION_ID} | grep \"ShuffleMetrics\" | grep \"spilled\"\n```\n\n### Check the system resource usage\n```shell\ntop\n```\n\n### Insufficient memory allocated for shuffle operations.\n```shell\n\n\n#!/bin/bash\n\n\n\n# set the Spark configuration file path\n\nSPARK_CONF=\"${PATH_TO_SPARK_CONF}\"\n\n\n\n# get the current value of spark.memory.fraction\n# (handles both the \"key value\" and the \"key=value\" line formats)\n\nMEMORY_FRACTION=$(awk '{gsub(/=/,\" \")} $1 == \"spark.memory.fraction\" {print $2}' \"$SPARK_CONF\")\n\n\n\n# get the current value of spark.shuffle.memoryFraction\n# (legacy setting: since Spark 1.6 it is only read when spark.memory.useLegacyMode=true)\n\nCURRENT_SHUFFLE_MEMORY_FRACTION=$(awk '{gsub(/=/,\" \")} $1 == \"spark.shuffle.memoryFraction\" {print $2}' \"$SPARK_CONF\")\n\n\n\n# stop if either value is missing, because the comparison below needs both\n\nif [ -z \"$MEMORY_FRACTION\" ] || [ -z \"$CURRENT_SHUFFLE_MEMORY_FRACTION\" ]; then\n\n echo \"spark.memory.fraction or spark.shuffle.memoryFraction is not set in $SPARK_CONF\"\n\n exit 1\n\nfi\n\n\n\n# calculate the target value of spark.shuffle.memoryFraction (20% of spark.memory.fraction)\n\nSHUFFLE_MEMORY_FRACTION=$(echo \"scale=2; $MEMORY_FRACTION * 0.2\" | bc)\n\n\n\n# compare the current value of spark.shuffle.memoryFraction with the calculated value\n\nif (( $(echo 
\"$CURRENT_SHUFFLE_MEMORY_FRACTION \u003c $SHUFFLE_MEMORY_FRACTION\" | bc -l) )); then\n\n # if the current value is less than the calculated value, increase the memory fraction for shuffle operations\n\n sed -i \"s/^[[:space:]]*spark\\.shuffle\\.memoryFraction[=[:space:]].*/spark.shuffle.memoryFraction $SHUFFLE_MEMORY_FRACTION/\" \"$SPARK_CONF\"\n\n echo \"Increased memory fraction for shuffle operations to $SHUFFLE_MEMORY_FRACTION\"\n\nelse\n\n echo \"Memory fraction for shuffle operations is already sufficient\"\n\nfi\n\n\n```\n\n## Repair\n\n### Increase the memory allocation for shuffle operations to avoid spills. This can be done by increasing the `spark.shuffle.memoryFraction` or `spark.memory.fraction` configuration parameters.\n```shell\n\n#!/bin/bash\n\n\n\n# Set the path to the Spark configuration file\n\nSPARK_CONF=\"${PATH_TO_SPARK_CONF}\"\n\n\n\n# Set the new memory fraction value\n\nNEW_MEM_FRACTION=\"${NEW_MEMORY_FRACTION_VALUE}\"\n\n\n\n# Set the configuration parameter to be updated\n\nCONF_PARAM=\"${CONFIGURATION_PARAMETER_TO_UPDATE}\"\n\n\n\n# Update the configuration parameter in the Spark configuration file\n# (matches both the \"key value\" and the \"key=value\" line formats)\n\nsed -i \"s/^[[:space:]]*${CONF_PARAM}[=[:space:]].*/${CONF_PARAM} ${NEW_MEM_FRACTION}/\" \"$SPARK_CONF\"\n\n\n\n# Restart the Spark service to apply the configuration changes\n# (the unit name \"spark\" depends on the deployment; adjust it as needed)\n\nsudo systemctl restart spark\n\n\n```","description":"This incident type typically occurs in distributed computing systems, where Spark tasks are experiencing high disk I/O and shuffle spills. Spark is a popular distributed computing engine that uses shuffle operations to move data between nodes in a cluster, which can sometimes result in performance issues due to spills. The spills occur when the data being shuffled exceeds the memory capacity allocated for the shuffle operations. 
This incident requires optimization of the shuffle operations to reduce spills and improve overall performance.\n","environment":"production","repoUrl":"https://github.com/shorelinesoftware/generated-runbooks/blob/main/runbooks/Spark/Spark%20tasks%20experiencing%20shuffle%20spills%20and%20high%20disk%20I:O.md","terraformUrl":"https://registry.terraform.io/modules/terraform-shoreline-modules/spark-spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o","s3Uri":null,"s3Url":null,"hash":"e854d2015e620cf4e2a9c93c290b701df0ed1c10cbffb1ef5ad23699bb1500a9","info":null,"incidentId":null,"metadata":{"seo":{"image":{"alt":"Spark tasks experiencing shuffle spills and high disk I/O.","src":"/assets/images/runbooks/3acc619184650dac675cfc81698cc86e4f6eab8dae440c4d97ab7531b24be147.jpg","width":2968,"height":2969}},"tags":[{"key":"category","slug":"category-spark","value":"Spark"}]},"published":true,"slug":"spark/spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o","source":"shoreline","title":"Spark tasks experiencing shuffle spills and high disk I/O.","createdAt":"2024-01-24T17:06:28.866Z","updatedAt":"2024-01-24T17:06:28.866Z","relations":[{"id":31606,"parentId":"956f46e3-7f56-442d-938b-43f0c1281ad9","relationId":"02c7010b-81e5-46c9-b274-a25a96c788dd","createdAt":"2023-10-15T08:02:58.251Z","updatedAt":"2023-10-15T08:02:58.251Z","relation":{"createdAt":"2024-01-24T17:06:22.204Z","description":"This incident type refers to a situation where Spark tasks are failing due to out of memory errors. Spark is a distributed computing system used for big data processing. When the data volume exceeds the allocated memory, the Spark tasks fail, and the system generates an out of memory error. 
This type of incident can cause data processing delays or even system downtime, which can impact the overall performance of the application.\n","id":"02c7010b-81e5-46c9-b274-a25a96c788dd","metadata":{"seo":{"image":{"alt":"Spark tasks failing due to out of memory errors.","src":"/assets/images/runbooks/d94b2918e79e3f2370be219b62dd274077d28e45627e670826238ea09c032bb0.jpg","width":4195,"height":2802}},"tags":[{"key":"category","slug":"category-spark","value":"Spark"}]},"slug":"spark/spark-tasks-failing-due-to-out-of-memory-errors","title":"Spark tasks failing due to out of memory errors.","updatedAt":"2024-01-24T17:06:22.204Z"}},{"id":31607,"parentId":"956f46e3-7f56-442d-938b-43f0c1281ad9","relationId":"33315d35-d99e-4702-9c1c-962386a48dc4","createdAt":"2023-10-15T08:02:58.251Z","updatedAt":"2023-10-15T08:02:58.251Z","relation":{"createdAt":"2024-01-24T17:05:25.546Z","description":"This incident type refers to a situation where Spark jobs are failing due to resource contentions in the cluster. When multiple Spark jobs are trying to access the same resources or data at the same time, it can cause a bottleneck that leads to job failures. This can happen when the resources in the cluster are not properly allocated or when the number of jobs running simultaneously exceeds the cluster's capacity to handle them. 
The result is that Spark jobs fail, leading to disruptions in data processing and analysis.\n","id":"33315d35-d99e-4702-9c1c-962386a48dc4","metadata":{"seo":{"image":{"alt":"Spark job failures due to cluster resource contentions.","src":"/assets/images/runbooks/2769e1c28c7050e4639d3d3d83bdf889dc7bb84e4f4bc3283b7a5dd2bb181092.jpg","width":8750,"height":12667}},"tags":[{"key":"category","slug":"category-spark","value":"Spark"}]},"slug":"spark/spark-job-failures-due-to-cluster-resource-contentions","title":"Spark job failures due to cluster resource contentions.","updatedAt":"2024-01-24T17:05:25.546Z"}},{"id":31608,"parentId":"956f46e3-7f56-442d-938b-43f0c1281ad9","relationId":"68c83dc1-e9a0-42bd-b575-0ff12badb55f","createdAt":"2023-10-15T08:02:58.251Z","updatedAt":"2023-10-15T08:02:58.251Z","relation":{"createdAt":"2024-01-24T17:05:32.477Z","description":"This incident type refers to a failure in one or more Spark executors during the execution of a job. Spark executors are worker processes that run computations and store data in memory or on disk. When an executor fails, it can cause the entire job to fail or result in degraded performance. 
This type of incident can occur for a variety of reasons, such as hardware or network issues, memory errors, or software bugs.\n","id":"68c83dc1-e9a0-42bd-b575-0ff12badb55f","metadata":{"seo":{"image":{"alt":"Spark executor failure during job execution.","src":"/assets/images/runbooks/7025773fc2a3d6cfe775c843317d188fc5cfb44cc7a8d1897489d0d4a6b0666c.jpg","width":6000,"height":4000}},"tags":[{"key":"category","slug":"category-spark","value":"Spark"}]},"slug":"spark/spark-executor-failure-during-job-execution","title":"Spark executor failure during job execution.","updatedAt":"2024-01-24T17:05:32.477Z"}},{"id":31609,"parentId":"956f46e3-7f56-442d-938b-43f0c1281ad9","relationId":"117fdc27-6127-405b-8f43-4610b4a64624","createdAt":"2023-10-15T08:02:58.251Z","updatedAt":"2023-10-15T08:02:58.251Z","relation":{"createdAt":"2024-01-24T17:05:38.937Z","description":"This incident type refers to an unexpected termination of the Spark driver program during the runtime of a job. The driver program is responsible for coordinating the execution of a Spark job and if it crashes, the entire job is affected. 
This can result in data loss and downtime, and requires investigation and troubleshooting to identify the root cause of the issue.\n","id":"117fdc27-6127-405b-8f43-4610b4a64624","metadata":{"seo":{"image":{"alt":"Spark driver program crash during job runtime.","src":"/assets/images/runbooks/4ef581171cfd4c7cd1538537936871ff8a649de513bedfcfe4ba49ffe444c1bb.jpg","width":7680,"height":4320}},"tags":[{"key":"category","slug":"category-spark","value":"Spark"}]},"slug":"spark/spark-driver-program-crash-during-job-runtime","title":"Spark driver program crash during job runtime.","updatedAt":"2024-01-24T17:05:38.937Z"}},{"id":31610,"parentId":"956f46e3-7f56-442d-938b-43f0c1281ad9","relationId":"5c60cd57-96c4-46b3-a678-1c1ef208f532","createdAt":"2023-10-15T08:02:58.251Z","updatedAt":"2023-10-15T08:02:58.251Z","relation":{"createdAt":"2024-01-24T17:05:45.445Z","description":"Apache Spark driver failure refers to an incident where the driver program in an Apache Spark cluster fails to execute or crashes during runtime. This can happen due to a variety of reasons such as hardware failure, software bugs, resource constraints, or programming errors. As the driver program is responsible for coordinating the execution of tasks across the cluster, any failure in the driver can result in the entire Spark job failing. 
This can lead to data loss, processing delays, and impact the overall performance of the Spark cluster.\n","id":"5c60cd57-96c4-46b3-a678-1c1ef208f532","metadata":{"seo":{"image":{"alt":"Spark driver failure incident.","src":"/assets/images/runbooks/252703a3e88f0403b25876685752089d700fbd961d006645aa7c1c966b41a53c.jpg","width":3840,"height":2160}},"tags":[{"key":"category","slug":"category-spark","value":"Spark"}]},"slug":"spark/spark-driver-failure-incident","title":"Spark driver failure incident.","updatedAt":"2024-01-24T17:05:45.445Z"}}],"sections":[{"id":1210,"content":"```shell\nexport COUNT=\"PLACEHOLDER\"\n\nexport INTERVAL=\"PLACEHOLDER\"\n\nexport APPLICATION_ID=\"PLACEHOLDER\"\n\nexport NEW_MEMORY_FRACTION_VALUE=\"PLACEHOLDER\"\n\nexport CONFIGURATION_PARAMETER_TO_UPDATE=\"PLACEHOLDER\"\n\nexport PATH_TO_SPARK_CONF=\"PLACEHOLDER\"\n```\n","createdAt":"2024-01-24T17:06:30.247Z","runbookId":"956f46e3-7f56-442d-938b-43f0c1281ad9","type":"PARAMETER","updatedAt":"2024-01-24T17:06:30.247Z"},{"id":1211,"content":"### Check the disk I/O usage\n\n```shell\niostat -x ${INTERVAL} ${COUNT}\n```\n\n### Check the network bandwidth usage\n\n```shell\nsar -n DEV ${INTERVAL} ${COUNT}\n```\n\n### Check the Spark task metrics\n\n```shell\nyarn logs -applicationId ${APPLICATION_ID} | grep \"TaskMetrics\" | grep \"ExecutorRunTime\"\n```\n\n### Check the shuffle size and spill metrics\n\n```shell\nyarn logs -applicationId ${APPLICATION_ID} | grep \"ShuffleMetrics\" | grep \"spilled\"\n```\n\n### Check the system resource usage\n\n```shell\ntop\n```\n\n### Insufficient memory allocated for shuffle operations.\n\n```shell\n\n\n#!/bin/bash\n\n\n\n# set the Spark configuration file path\n\nSPARK_CONF=${PATH_TO_SPARK_CONF}\n\n\n\n# get the current value of spark.memory.fraction\n\nMEMORY_FRACTION=$(grep \"spark.memory.fraction\" $SPARK_CONF | awk '{print $3}')\n\n\n\n# calculate the current value of spark.shuffle.memoryFraction\n\nSHUFFLE_MEMORY_FRACTION=$(echo \"scale=2; 
$MEMORY_FRACTION * 0.2\" | bc)\n\n\n\n# get the current value of spark.shuffle.memoryFraction (accepts \"key value\" and \"key=value\" lines)\n\nCURRENT_SHUFFLE_MEMORY_FRACTION=$(grep \"^spark.shuffle.memoryFraction\" \"$SPARK_CONF\" | awk -F'[ =]+' '{print $2}')\n\n\n\n# compare the current value of spark.shuffle.memoryFraction with the calculated value\n\nif [ -z \"$CURRENT_SHUFFLE_MEMORY_FRACTION\" ]; then\n\n # nothing to compare or rewrite if the parameter is not set in the file\n\n echo \"spark.shuffle.memoryFraction is not set in $SPARK_CONF\"\n\nelif (( $(echo \"$CURRENT_SHUFFLE_MEMORY_FRACTION \u003c $SHUFFLE_MEMORY_FRACTION\" | bc -l) )); then\n\n # if the current value is less than the calculated value, increase the memory fraction for shuffle operations\n\n sed -i \"s/^spark.shuffle.memoryFraction[ =].*/spark.shuffle.memoryFraction=$SHUFFLE_MEMORY_FRACTION/\" \"$SPARK_CONF\"\n\n echo \"Increased memory fraction for shuffle operations to $SHUFFLE_MEMORY_FRACTION\"\n\nelse\n\n echo \"Memory fraction for shuffle operations is already sufficient\"\n\nfi\n\n\n```\n","createdAt":"2024-01-24T17:06:30.984Z","runbookId":"956f46e3-7f56-442d-938b-43f0c1281ad9","type":"DEBUG","updatedAt":"2024-01-24T17:06:30.984Z"},{"id":1212,"content":"### Increase the memory allocation for shuffle operations to avoid spills. 
This can be done by increasing `spark.memory.fraction`, or the legacy `spark.shuffle.memoryFraction` on clusters that still use legacy memory management.\n\n```shell\n#!/bin/bash\n\n\n\n# Set the path to the Spark configuration file\n\nSPARK_CONF=${PATH_TO_SPARK_CONF}\n\n\n\n# Set the new memory fraction value\n\nNEW_MEM_FRACTION=${NEW_MEMORY_FRACTION_VALUE}\n\n\n\n# Set the configuration parameter to be updated\n\nCONF_PARAM=${CONFIGURATION_PARAMETER_TO_UPDATE}\n\n\n\n# Update the configuration parameter in the Spark configuration file (matches \"key value\" and \"key=value\" lines)\n\nsed -i \"s/^${CONF_PARAM}[ =].*/${CONF_PARAM}=${NEW_MEM_FRACTION}/\" \"$SPARK_CONF\"\n\n\n\n# Restart the Spark cluster to apply the configuration changes (assumes a systemd unit named spark)\n\nsudo systemctl restart spark\n\n\n```\n","createdAt":"2024-01-24T17:06:31.225Z","runbookId":"956f46e3-7f56-442d-938b-43f0c1281ad9","type":"REPAIR","updatedAt":"2024-01-24T17:06:31.225Z"}],"variables":[],"seo":{"description":"This incident type typically occurs in distributed computing systems, where Spark tasks are experiencing high disk I/O and shuffle spills. Spark is a popular distributed computing engine that uses shuffle operations to move data between nodes in a cluster, which can sometimes result in performance issues due to spills. The spills occur when the data being shuffled exceeds the memory capacity allocated for the shuffle operations. This incident requires optimization of the shuffle operations to reduce spills and improve overall performance.\n","title":"Spark tasks experiencing shuffle spills and high disk I/O. 
| Shoreline Runbooks"},"_id":"wT7XZSRCo945zrGbocTpMK","_updatedAt":"2024-01-08T06:55:35Z","environnment":null,"path":"/runbooks/spark/spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o"},"draftMode":false,"token":""},"__N_SSG":true},"page":"/runbooks/[...slug]","query":{"slug":["spark","spark-tasks-experiencing-shuffle-spills-and-high-disk-i-o"]},"buildId":"Op4Q3dRyk20CRi25yi4wh","isFallback":false,"gsp":true,"locale":"en","locales":["en"],"defaultLocale":"en","scriptLoader":[{"id":"termly-mover","strategy":"afterInteractive","children":"!function(){var e=document.getElementById('termly-script-loader');if(e){e.parentNode.removeChild(e);var t=document.getElementsByTagName('head')[0];t.firstChild?t.insertBefore(e,t.firstChild):t.appendChild(e);}}();"},{"id":"google-analytics-gtag-script-loader","src":"https://www.googletagmanager.com/gtag/js?id=G-TJ5GZM0EYJ","strategy":"afterInteractive"},{"id":"google-tag-manager-script-loader","strategy":"afterInteractive","children":"\n (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'\u0026l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-5Q86KVJ');\n "},{"id":"google-analytics-script-loader","strategy":"afterInteractive","children":"\n window.dataLayer = window.dataLayer || [];\n function gtag(){window.dataLayer.push(arguments);}\n gtag('js', new Date());\n gtag('config', 'G-TJ5GZM0EYJ');\n gtag('config', 'GTM-5Q86KVJ');\n "},{"id":"posthog-script-loader","strategy":"afterInteractive","children":"!function(t,e){var o,n,p,r;e.__SV||(window.posthog=e,e._i=[],e.init=function(i,s,a){function g(t,e){var 
o=e.split(\".\");2==o.length\u0026\u0026(t=t[o[0]],e=o[1]),t[e]=function(){t.push([e].concat(Array.prototype.slice.call(arguments,0)))}}(p=t.createElement(\"script\")).type=\"text/javascript\",p.async=!0,p.src=s.api_host+\"/static/array.js\",(r=t.getElementsByTagName(\"script\")[0]).parentNode.insertBefore(p,r);var u=e;for(void 0!==a?u=e[a]=[]:a=\"posthog\",u.people=u.people||[],u.toString=function(t){var e=\"posthog\";return\"posthog\"!==a\u0026\u0026(e+=\".\"+a),t||(e+=\" (stub)\"),e},u.people.toString=function(){return u.toString(1)+\".people (stub)\"},o=\"capture identify alias people.set people.set_once set_config register register_once unregister opt_out_capturing has_opted_out_capturing opt_in_capturing reset isFeatureEnabled onFeatureFlags\".split(\" \"),n=0;n\u003co.length;n++)g(u,o[n]);e._i.push([i,s,a])},e.__SV=1)}(document,window.posthog||[]);\n posthog.init('phc_kcxvhx5JkclTKWXaKsrqCCKksG5GjYsL3NIksZStgK7',{api_host:'https://app.posthog.com'})"},{"type":"text/javascript","strategy":"afterInteractive","id":"hs-script-loader","src":"//js-na1.hs-scripts.com/8855569.js"}]}</script></body></html>