Apache Spark™ - Unified Engine for large-scale data analytics

<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title> Apache Spark&trade; - Unified Engine for large-scale data analytics </title> <meta name="description" content="Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters."> <meta name="twitter:card" content="summary_large_image"> <meta name="twitter:site" content="@ApacheSpark"> <meta name="twitter:title" content=" Apache Spark&trade; - Unified Engine for large-scale data analytics"> <meta name="twitter:description" content="Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters."> <meta name="twitter:image" content="https://spark.apache.org/images/spark-twitter-card-large.jpg"> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous"> <link rel="preconnect" href="https://fonts.googleapis.com"> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> <link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700;1,400;1,500;1,700&Courier+Prime:wght@400;700&display=swap" rel="stylesheet"> <link href="/css/custom.css" rel="stylesheet"> <link href="/css/pygments-default.css" rel="stylesheet"> <link rel="icon" href="favicon.ico" type="image/x-icon"> <!-- Matomo --> <script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="https://analytics.apache.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', 
'40']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script> <!-- End Matomo Code --> </head> <body> <nav class="navbar navbar-expand-lg navbar-dark p-0 px-4" style="background: #1D6890;"> <a class="navbar-brand" href="/"> <img src="/images/spark-logo-rev.svg" alt="" width="141" height="72"> </a> <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarContent" aria-controls="navbarContent" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse col-md-12 col-lg-auto pt-4" id="navbarContent"> <ul class="navbar-nav me-auto"> <li class="nav-item"> <a class="nav-link active" aria-current="page" href="/downloads.html">Download</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="libraries" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Libraries </a> <ul class="dropdown-menu" aria-labelledby="libraries"> <li><a class="dropdown-item" href="/sql/">SQL and DataFrames</a></li> <li><a class="dropdown-item" href="/spark-connect/">Spark Connect</a></li> <li><a class="dropdown-item" href="/streaming/">Spark Streaming</a></li> <li><a class="dropdown-item" href="/pandas-on-spark/">pandas on Spark</a></li> <li><a class="dropdown-item" href="/mllib/">MLlib (machine learning)</a></li> <li><a class="dropdown-item" href="/graphx/">GraphX (graph)</a></li> <li> <hr class="dropdown-divider"> </li> <li><a class="dropdown-item" href="/third-party-projects.html">Third-Party Projects</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="documentation" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Documentation </a> <ul class="dropdown-menu" aria-labelledby="documentation"> <li><a class="dropdown-item" href="/docs/latest/">Latest 
Release</a></li> <li><a class="dropdown-item" href="/documentation.html">Older Versions and Other Resources</a></li> <li><a class="dropdown-item" href="/faq.html">Frequently Asked Questions</a></li> </ul> </li> <li class="nav-item"> <a class="nav-link active" aria-current="page" href="/examples.html">Examples</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="community" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Community </a> <ul class="dropdown-menu" aria-labelledby="community"> <li><a class="dropdown-item" href="/community.html">Mailing Lists &amp; Resources</a></li> <li><a class="dropdown-item" href="/contributing.html">Contributing to Spark</a></li> <li><a class="dropdown-item" href="/improvement-proposals.html">Improvement Proposals (SPIP)</a> </li> <li><a class="dropdown-item" href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a> </li> <li><a class="dropdown-item" href="/powered-by.html">Powered By</a></li> <li><a class="dropdown-item" href="/committers.html">Project Committers</a></li> <li><a class="dropdown-item" href="/history.html">Project History</a></li> <li><a class="dropdown-item" href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="developers" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Developers </a> <ul class="dropdown-menu" aria-labelledby="developers"> <li><a class="dropdown-item" href="/developer-tools.html">Useful Developer Tools</a></li> <li><a class="dropdown-item" href="/versioning-policy.html">Versioning Policy</a></li> <li><a class="dropdown-item" href="/release-process.html">Release Process</a></li> <li><a class="dropdown-item" href="/security.html">Security</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="github" role="button" data-bs-toggle="dropdown" 
aria-expanded="false"> GitHub </a> <ul class="dropdown-menu" aria-labelledby="github"> <li><a class="dropdown-item" href="https://github.com/apache/spark">spark</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-connect-go">spark-connect-go</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-docker">spark-docker</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-kubernetes-operator">spark-kubernetes-operator</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-website">spark-website</a></li> </ul> </li> </ul> <ul class="navbar-nav ml-auto"> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="apacheFoundation" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Apache Software Foundation </a> <ul class="dropdown-menu" aria-labelledby="apacheFoundation"> <li><a class="dropdown-item" href="https://www.apache.org/">Apache Homepage</a></li> <li><a class="dropdown-item" href="https://www.apache.org/licenses/">License</a></li> <li><a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> <li><a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a></li> <li><a class="dropdown-item" href="https://www.apache.org/security/">Security</a></li> <li><a class="dropdown-item" href="https://www.apache.org/events/current-event">Event</a></li> </ul> </li> </ul> </div> </nav> <section class="hero-banner position-relative"> <div class="bg"></div> <div class="container position-relative"> <div class="container pt-5 pb-5"> <div class="row"> <div class="col-12 col-md-12 col-lg-7"> <h1 style="max-width: 680px;">Unified engine for large-scale data analytics</h1> <a href="/docs/latest/quick-start.html" class="btn btn-cta">Get Started</a> </div> </div> <div class="row mt-5"> <div class="col-12 col-lg-6"> <h2>What is Apache Spark<span class="tm" style="bottom: 
14px;">&trade;</span>?</h2> <div class="what-is-spark">Apache Spark<span class="tm" style="bottom: 7px;">&trade;</span> is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. </div> </div> </div> </div> </div> </section> <section style="min-height:810px; background-color: #FFF6ED;"> <div class="container pt-4 pb-4"> <div class="row"> <div class="col-12 col-md-7 col-lg-5" style="position: relative;"> <div class="spark-star-bg"></div> <div class="apache-spark-motto">Simple.<br />Fast.<br />Scalable.<br />Unified.</div> </div> <div class="col-12 col-md-5 col-lg-7 mt-4 p-md-0"> <div class="row"> <div class="col-12 mb-5" style="font-style: normal;font-weight: bold;font-size: 32px;line-height: 42px;"> Key features </div> <div class="row"> <div class="col-12 col-lg-6 features"> <img class="icon" src="/images/batch-sstreaming-data-icon.svg" width="75" height="75" alt="Batch/streaming data" /> <div class="title">Batch/streaming data</div> <div class="details">Unify the processing of your data in batches and real-time streaming, using your preferred language: Python, SQL, Scala, Java or R. </div> </div> <div class="col-12 col-lg-6 mt-5 mt-lg-0 ms-auto features"> <img class="icon" src="/images/sql-analytics-icon.svg" width="75" height="75" alt="SQL analytics" /> <div class="title">SQL analytics</div> <div class="details">Execute fast, distributed ANSI SQL queries for dashboarding and ad-hoc reporting. Runs faster than most data warehouses. 
</div> </div> </div> <div class="row mt-lg-5"> <div class="col-12 col-lg-6 mt-5 mt-lg-0 features"> <img class="icon" src="/images/data-science-scale-icon.svg" width="75" height="75" alt="Data science at scale" /> <div class="title">Data science at scale</div> <div class="details">Perform Exploratory Data Analysis (EDA) on petabyte-scale data without having to resort to downsampling </div> </div> <div class="col-12 col-lg-6 mt-5 mt-lg-0 ms-auto features"> <img class="icon" src="/images/machine-learning-icon.svg" width="75" height="75" alt="Machine Learning" /> <div class="title">Machine learning</div> <div class="details">Train machine learning algorithms on a laptop and use the same code to scale to fault-tolerant clusters of thousands of machines. </div> </div> </div> </div> </div> </div> </div> </section> <section class="spark-run-now"> <nav> <div class="container-md"> <div class="row nav nav-tabs" id="nav-tab" role="tablist"> <button class="col-12 col-md-3 col-lg-2 nav-link border-end active" id="nav-python-tab" data-bs-toggle="tab" data-bs-target="#nav-python" type="button" role="tab" aria-controls="nav-python" aria-selected="true">Python </button> <button class="col-12 col-md-2 nav-link border-end" id="nav-sql-tab" data-bs-toggle="tab" data-bs-target="#nav-sql" type="button" role="tab" aria-controls="nav-sql" aria-selected="false">SQL </button> <button class="col-12 col-md-2 nav-link border-end" id="nav-scala-tab" data-bs-toggle="tab" data-bs-target="#nav-scala" type="button" role="tab" aria-controls="nav-scala" aria-selected="false">Scala </button> <button class="col-12 col-md-2 nav-link border-end" id="nav-java-tab" data-bs-toggle="tab" data-bs-target="#nav-java" type="button" role="tab" aria-controls="nav-java" aria-selected="false">Java </button> <button class="col-12 col-md-2 nav-link" id="nav-r-tab" data-bs-toggle="tab" data-bs-target="#nav-r" type="button" role="tab" aria-controls="nav-r" aria-selected="false">R </button> </div> </div> </nav> <div 
class="container"> <div class="tab-content py-5 spark-install" id="nav-tabContent"> <div class="tab-pane fade show active" id="nav-python" role="tabpanel" aria-labelledby="nav-python-tab"> <div class="mb-2 title">Run now</div> <div style="font-size: 16px;">Install with 'pip' </div> <div class="code"> <p>$ pip install pyspark</p> <p>$ pyspark</p> </div> <div style="font-size: 16px;">Use the official Docker image </div> <div class="code"> <p>$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark</p> </div> <div class="examples mt-5"> <div class="window"><span class="circle red"></span><span class="circle yellow"></span><span class="circle green"></span></div> <nav class="container"> <div class="row nav nav-tabs" id="nav-exampleTab" role="tablist"> <button class="col-12 col-md-4 nav-link active" id="nav-quick_start-tab" data-bs-toggle="tab" data-bs-target="#nav-quick_start" type="button" role="tab" aria-controls="nav-quick_start" aria-selected="true">QuickStart </button> <button class="col-12 col-md-4 nav-link" id="nav-machine_learning-tab" data-bs-toggle="tab" data-bs-target="#nav-machine_learning" type="button" role="tab" aria-controls="nav-machine_learning" aria-selected="false">Machine Learning </button> <button class="col-12 col-md-4 nav-link" id="nav-ad-tab" data-bs-toggle="tab" data-bs-target="#nav-analytics" type="button" role="tab" aria-controls="nav-analytics" aria-selected="false">Analytics &amp; Data Science </button> </div> </nav> <div class="tab-content spark-install" id="nav-exampleContent"> <div class="tab-pane show active" id="nav-quick_start" role="tabpanel" aria-labelledby="nav-quick_start-tab"> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">json</span><span class="p">(</span><span class="s">"logs.json"</span><span class="p">)</span> <span 
class="n">df</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="s">"age &gt; 21"</span><span class="p">).</span><span class="n">select</span><span class="p">(</span><span class="s">"name.first"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span></code></pre></figure> </div> <div class="tab-pane" id="nav-machine_learning" role="tabpanel" aria-labelledby="nav-machine_learning-tab"> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Every record contains a label and feature vector </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="p">[</span><span class="s">"label"</span><span class="p">,</span> <span class="s">"features"</span><span class="p">])</span> <span class="c1"># Split the data into train/test datasets </span><span class="n">train_df</span><span class="p">,</span> <span class="n">test_df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">randomSplit</span><span class="p">([.</span><span class="mi">80</span><span class="p">,</span> <span class="p">.</span><span class="mi">20</span><span class="p">],</span> <span class="n">seed</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span> <span class="c1"># Set hyperparameters for the algorithm </span><span class="n">rf</span> <span class="o">=</span> <span class="n">RandomForestRegressor</span><span class="p">(</span><span class="n">numTrees</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span> <span class="c1"># Fit the model to the training data </span><span class="n">model</span> <span class="o">=</span> <span class="n">rf</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span 
class="n">train_df</span><span class="p">)</span> <span class="c1"># Generate predictions on the test dataset. </span><span class="n">model</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">test_df</span><span class="p">).</span><span class="n">show</span><span class="p">()</span></code></pre></figure> </div> <div class="tab-pane" id="nav-analytics" role="tabpanel" aria-labelledby="nav-ad-tab"> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">csv</span><span class="p">(</span><span class="s">"accounts.csv"</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c1"># Select subset of features and filter for balance &gt; 0 </span><span class="n">filtered_df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"AccountBalance"</span><span class="p">,</span> <span class="s">"CountOfDependents"</span><span class="p">).</span><span class="nb">filter</span><span class="p">(</span><span class="s">"AccountBalance &gt; 0"</span><span class="p">)</span> <span class="c1"># Generate summary statistics </span><span class="n">filtered_df</span><span class="p">.</span><span class="n">summary</span><span class="p">().</span><span class="n">show</span><span class="p">()</span></code></pre></figure> </div> </div> </div> </div> <div class="tab-pane fade" id="nav-sql" role="tabpanel" aria-labelledby="nav-sql-tab"> <div class="mb-2 title">Run now</div> <div class="code"> <p>$ docker run -it --rm spark /opt/spark/bin/spark-sql</p> <p>spark-sql&gt;</p> </div> <div class="examples mt-5"> <div class="window"><span class="circle red"></span><span 
class="circle yellow"></span><span class="circle green"></span></div> <div class="spark-code"> <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span> <span class="n">name</span><span class="p">.</span><span class="k">first</span> <span class="k">AS</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">name</span><span class="p">.</span><span class="k">last</span> <span class="k">AS</span> <span class="n">last_name</span><span class="p">,</span> <span class="n">age</span> <span class="k">FROM</span> <span class="n">json</span><span class="p">.</span><span class="nv">`logs.json`</span> <span class="k">WHERE</span> <span class="n">age</span> <span class="o">&gt;</span> <span class="mi">21</span><span class="p">;</span></code></pre></figure> </div> </div> </div> <div class="tab-pane fade" id="nav-scala" role="tabpanel" aria-labelledby="nav-scala-tab"> <div class="mb-2 title">Run now</div> <div class="code"> <p>$ docker run -it --rm spark /opt/spark/bin/spark-shell</p> <p>scala&gt;</p> </div> <div class="examples mt-5"> <div class="window"><span class="circle red"></span><span class="circle yellow"></span><span class="circle green"></span></div> <div class="spark-code"> <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="nv">df</span> <span class="k">=</span> <span class="nv">spark</span><span class="o">.</span><span class="py">read</span><span class="o">.</span><span class="py">json</span><span class="o">(</span><span class="s">"logs.json"</span><span class="o">)</span> <span class="nv">df</span><span class="o">.</span><span class="py">where</span><span class="o">(</span><span class="s">"age &gt; 21"</span><span class="o">)</span> <span class="o">.</span><span class="py">select</span><span class="o">(</span><span class="s">"name.first"</span><span class="o">).</span><span class="py">show</span><span 
class="o">()</span></code></pre></figure> </div> </div> </div> <div class="tab-pane fade" id="nav-java" role="tabpanel" aria-labelledby="nav-java-tab"> <div class="mb-2 title">Run now</div> <div class="code"> <p>$ docker run -it --rm spark /opt/spark/bin/spark-shell</p> <p>scala&gt;</p> </div> <div class="examples mt-5"> <div class="window"><span class="circle red"></span><span class="circle yellow"></span><span class="circle green"></span></div> <div class="spark-code"> <figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">Dataset</span> <span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">json</span><span class="o">(</span><span class="s">"logs.json"</span><span class="o">);</span> <span class="n">df</span><span class="o">.</span><span class="na">where</span><span class="o">(</span><span class="s">"age &gt; 21"</span><span class="o">)</span> <span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">"name.first"</span><span class="o">).</span><span class="na">show</span><span class="o">();</span></code></pre></figure> </div> </div> </div> <div class="tab-pane fade" id="nav-r" role="tabpanel" aria-labelledby="nav-r-tab"> <div class="mb-2 title">Run now</div> <div class="code"> <p>$ docker run -it --rm spark:r /opt/spark/bin/sparkR</p> <p>&gt;</p> </div> <div class="examples mt-5"> <div class="window"><span class="circle red"></span><span class="circle yellow"></span><span class="circle green"></span></div> <div class="spark-code"> <figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">df</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read.json</span><span class="p">(</span><span class="n">path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="s2">"logs.json"</span><span class="p">)</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">df</span><span class="p">,</span><span class="w"> </span><span class="n">df</span><span class="o">$</span><span class="n">age</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">21</span><span class="p">)</span><span class="w"> </span><span class="n">head</span><span class="p">(</span><span class="n">select</span><span class="p">(</span><span class="n">df</span><span class="p">,</span><span class="w"> </span><span class="n">df</span><span class="o">$</span><span class="n">name.first</span><span class="p">))</span></code></pre></figure> </div> </div> </div> </div> </div> </section> <section class="py-5" style="background: rgba(186, 231, 253, 0.17);"> <div class="container text-center"> <div class="row" style="line-height: 42px;"> <div style="font-style: normal;font-weight: bold;font-size: 32px;line-height: 42px;text-align: center;">The most widely-used engine for scalable computing </div> <div style="font-style: normal;font-weight: normal;font-size: 24px;line-height: 30px;text-align: center;">Thousands of companies, including 80% of the Fortune 500, use Apache Spark<span class="tm">&trade;</span>.<br />Over 2,000 contributors to the open source project from industry and academia. </div> </div> </div> </section> <section class="py-5 text-center"> <div class="container"> <div class="row col-12 col-md-12 col-lg-7 mx-auto"> <div style="font-style: normal;font-weight: bold;font-size: 32px;line-height: 42px;text-align: center;">Ecosystem</div> <div style="font-style: normal;font-weight: normal;font-size: 24px;line-height: 30px;text-align: center;">Apache Spark<span class="tm">&trade;</span> integrates with your favorite frameworks, helping to scale them to thousands of machines. 
</div> </div> <div class="row" style="margin-top: 100px"> <div class="col-12 col-md-12 col-lg-5 mx-auto"> <div class="text-center ecosystem-title">Data science and Machine learning </div> <div class="row d-flex align-items-center"> <div class="col-12 col-md-4"><img src="/images/scikit-learn.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/pandas.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/tf_logo_social.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/pytorch.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/mlflow-logo.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/r_logo.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/numpy.png" height="90" width="166" /></div> </div> </div> <div class="col-12 col-md-12 col-lg-5 mt-5 mt-md-5 mt-lg-0 mx-auto"> <div class="text-center ecosystem-title">SQL analytics and BI</div> <div class="row d-flex align-items-center"> <div class="col-12 col-md-4"><img src="/images/superset.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/PowerBI-Logo-Square-Insight-Platforms.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/looker_logo.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/redash.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/tableau-logo-tableau-software.png" height="90" width="166" /></div> <div class="col-12 col-md-4"><img src="/images/dbt.png" height="90" width="166" /></div> </div> </div> <div class="col-12 col-md-12 col-lg-10 mt-5 pt-5 mx-auto"> <div class="text-center ecosystem-title">Storage and Infrastructure</div> <div class="row d-flex align-items-center"> <div class="col"><img src="/images/Elasticsearch.png" height="90" width="166" /></div> <div class="col"><img 
src="/images/mongo.png" height="90" width="166" /></div> <div class="col"><img src="/images/kafka.png" height="90" width="166" /></div> <div class="col"><img src="/images/delta-lake-logo.png" height="90" width="166" /></div> <div class="col"><img src="/images/kubernetes-horizontal-color.png" height="90" width="166" /></div> <div class="col"><img src="/images/AirflowLogo.png" height="90" width="166" /></div> <div class="col"><img src="/images/Apache_Parquet_logo.png" height="90" width="166" /></div> <div class="col"><img src="/images/sqlserver.png" height="90" width="166" /></div> <div class="col"><img src="/images/1280px-Cassandra_logo.png" height="90" width="166" /></div> <div class="col"><img src="/images/Apache_Orc_logo.png" height="90" width="166" /></div> </div> </div> </div> </div> </section> <section class="py-5 text-center"> <div class="container"> <div class="row"> <div style="font-style: normal;font-weight: bold;font-size: 32px;line-height: 54px;text-align: center;">Spark SQL engine: under the hood</div> <div style="font-style: normal;font-weight: normal;font-size: 24px;line-height: 42px;text-align: center;">Apache Spark<span class="tm">&trade;</span> is built on an advanced distributed SQL engine for large-scale data </div> </div> </div> </section> <section class="py-5"> <div class="container"> <div class="row"> <div class="col-12 col-lg-5 order-2 order-lg-1 mt-5 mt-lg-0" style="font-size: 19px;line-height: 33px;"> <div class="scalable-data-science"> <a href="/docs/latest/sql-performance-tuning.html#adaptive-query-execution" alt="Adaptive Query Execution">Adaptive Query Execution</a> <p>Spark SQL adapts the execution plan at runtime, for example by automatically setting the number of reducers and choosing join algorithms.</p> </div> <div class="mt-5 scalable-data-science"> <a href="/docs/latest/sql-ref-ansi-compliance.html" alt="Support for ANSI SQL">Support for ANSI SQL</a> <p>Use the same SQL you&rsquo;re already comfortable with.</p> </div> <div class="mt-5 
scalable-data-science"> <a href="/docs/latest/sql-data-sources-json.html" alt="Structured and unstructured data">Structured and unstructured data</a> <p>Spark SQL works on structured tables and unstructured data such as JSON or images.</p> </div> </div> <div class="col-12 col-lg-6 order-1 order-lg-21 text-center ms-auto"> <div class="fw-bold mb-2">TPC-DS 1TB No-Stats With vs. Without Adaptive Query Execution</div> <img src="/images/AQE-compersion.png" width="100%" /> <div class="fw-bold mt-2" style="font-size: 18px;">Accelerates TPC-DS queries up to <span style="color: #f55b15;">8x</span></div> </div> </div> </div> </section> <section class="py-5"> <div class="container"> <div class="row text-center"> <div style="font-style: normal;font-weight: bold;font-size: 32px;line-height: 42px;text-align: center;">Join the community</div> <div style="font-style: normal;font-weight: normal;font-size: 24px;line-height: 30px;text-align: center;" class="col-9 mx-auto">Spark has a thriving open source community, with contributors from around the globe building features, writing documentation, and assisting other users. 
</div> </div> <div class="row mt-5"> <div class="col-12 col-sm-4 p-3"> <div class="card"> <a href="/community.html"> <div class="card-body text-center text-xl-start"> <img class="d-block d-xl-inline-block m-auto" src="/images/icon-orange-mailing-list.svg" width="96" height="96" alt="Mailing list" /> Mailing list </div> </a> </div> </div> <div class="col-12 col-sm-4 p-3"> <div class="card"> <a href="https://github.com/apache/spark"> <div class="card-body text-center text-xl-start"> <img class="d-block d-xl-inline-block mx-auto" src="/images/icon-orange-built-in-functions.svg" width="96" height="96" alt="Source code" /> Source code </div> </a> </div> </div> <div class="col-12 col-sm-4 p-3"> <div class="card"> <a href="/news/"> <div class="card-body text-center text-xl-start"> <img class="d-block d-xl-inline-block mx-auto" src="/images/icon-orange-Delta-Table.svg" width="96" height="96" alt="News and events" /> News and events </div> </a> </div> </div> <div class="col-12 col-sm-4 p-3"> <div class="card"> <a href="/contributing.html"> <div class="card-body text-center text-xl-start"> <img class="d-block d-xl-inline-block mx-auto" src="/images/icon-orange-Collaborative.svg" width="96" height="96" alt="How to contribute" /> How to contribute </div> </a> </div> </div> <div class="col-12 col-sm-4 p-3"> <div class="card"> <a href="https://issues.apache.org/jira/projects/SPARK/issues"> <div class="card-body text-center text-xl-start"> <img class="d-block d-xl-inline-block mx-auto" src="/images/icon-orange-Scheduled-Jobs.svg" width="96" height="96" alt="Issue tracking" /> Issue tracking </div> </a> </div> </div> <div class="col-12 col-sm-4 p-3"> <div class="card"> <a href="/committers.html"> <div class="card-body text-center text-xl-start"> <img class="d-block d-xl-inline-block mx-auto" src="/images/icon-orange-data-engineer-persona.svg" width="96" height="96" alt="Committers" /> Committers </div> </a> </div> </div> </div> </div> </section> <footer class="container row 
mx-auto"> <hr> Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. See guidance on use of Apache Spark <a href="/trademarks.html">trademarks</a>. All other marks mentioned may be trademarks or registered trademarks of their respective owners. Copyright &copy; 2018 The Apache Software Foundation, Licensed under the <a href="https://www.apache.org/licenses/">Apache License, Version 2.0</a>. </footer> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js" integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM" crossorigin="anonymous"></script> </body> </html>
