<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title> Spark Structured Streaming | Apache Spark </title> <meta name="description" content="Spark Structured Streaming makes it easy to build streaming applications and pipelines with the same familiar Spark APIs."> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous"> <link rel="preconnect" href="https://fonts.googleapis.com"> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> <link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700;1,400;1,500;1,700&family=Courier+Prime:wght@400;700&display=swap" rel="stylesheet"> <link href="/css/custom.css" rel="stylesheet"> <!-- Code highlighter CSS --> <link href="/css/pygments-default.css" rel="stylesheet"> <link rel="icon" href="/favicon.ico" type="image/x-icon"> <!-- Matomo --> <script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="https://analytics.apache.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '40']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script> <!-- End Matomo Code --> </head> <body class="global"> <nav class="navbar navbar-expand-lg navbar-dark p-0 px-4" style="background: #1D6890;"> <a class="navbar-brand" href="/"> <img src="/images/spark-logo-rev.svg" alt="Apache Spark" width="141" height="72"> </a> <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarContent" aria-controls="navbarContent" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse col-md-12 col-lg-auto pt-4" id="navbarContent"> <ul class="navbar-nav me-auto"> <li class="nav-item"> <a class="nav-link active" aria-current="page" href="/downloads.html">Download</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="libraries" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Libraries </a> <ul class="dropdown-menu" aria-labelledby="libraries"> <li><a class="dropdown-item" href="/sql/">SQL and DataFrames</a></li> <li><a class="dropdown-item" href="/spark-connect/">Spark Connect</a></li> <li><a class="dropdown-item" href="/streaming/">Spark Streaming</a></li> <li><a class="dropdown-item" href="/pandas-on-spark/">pandas on Spark</a></li> <li><a class="dropdown-item" href="/mllib/">MLlib (machine learning)</a></li> <li><a class="dropdown-item" href="/graphx/">GraphX (graph)</a></li> <li> <hr class="dropdown-divider"> </li> <li><a class="dropdown-item" href="/third-party-projects.html">Third-Party Projects</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="documentation" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Documentation </a> <ul class="dropdown-menu" aria-labelledby="documentation"> <li><a class="dropdown-item" href="/docs/latest/">Latest Release</a></li> <li><a class="dropdown-item" href="/documentation.html">Older Versions and Other Resources</a></li> <li><a class="dropdown-item" href="/faq.html">Frequently Asked Questions</a></li> </ul> </li> <li class="nav-item"> <a class="nav-link active" aria-current="page" href="/examples.html">Examples</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="community" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Community </a> <ul 
class="dropdown-menu" aria-labelledby="community"> <li><a class="dropdown-item" href="/community.html">Mailing Lists &amp; Resources</a></li> <li><a class="dropdown-item" href="/contributing.html">Contributing to Spark</a></li> <li><a class="dropdown-item" href="/improvement-proposals.html">Improvement Proposals (SPIP)</a> </li> <li><a class="dropdown-item" href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a> </li> <li><a class="dropdown-item" href="/powered-by.html">Powered By</a></li> <li><a class="dropdown-item" href="/committers.html">Project Committers</a></li> <li><a class="dropdown-item" href="/history.html">Project History</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="developers" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Developers </a> <ul class="dropdown-menu" aria-labelledby="developers"> <li><a class="dropdown-item" href="/developer-tools.html">Useful Developer Tools</a></li> <li><a class="dropdown-item" href="/versioning-policy.html">Versioning Policy</a></li> <li><a class="dropdown-item" href="/release-process.html">Release Process</a></li> <li><a class="dropdown-item" href="/security.html">Security</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="github" role="button" data-bs-toggle="dropdown" aria-expanded="false"> GitHub </a> <ul class="dropdown-menu" aria-labelledby="github"> <li><a class="dropdown-item" href="https://github.com/apache/spark">spark</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-connect-go">spark-connect-go</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-docker">spark-docker</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-kubernetes-operator">spark-kubernetes-operator</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-website">spark-website</a></li> </ul> </li> </ul> <ul 
class="navbar-nav ml-auto"> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="apacheFoundation" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Apache Software Foundation </a> <ul class="dropdown-menu" aria-labelledby="apacheFoundation"> <li><a class="dropdown-item" href="https://www.apache.org/">Apache Homepage</a></li> <li><a class="dropdown-item" href="https://www.apache.org/licenses/">License</a></li> <li><a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> <li><a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a></li> <li><a class="dropdown-item" href="https://www.apache.org/security/">Security</a></li> <li><a class="dropdown-item" href="https://www.apache.org/events/current-event">Event</a></li> </ul> </li> </ul> </div> </nav> <div class="container"> <div class="row mt-4"> <div class="col-12 col-md-9"> <div class="jumbotron"> <a href="/docs/latest/structured-streaming-programming-guide.html">Spark Structured Streaming</a> makes it easy to build streaming applications and pipelines with the same familiar Spark APIs. </div> <div class="row row-padded"> <div class="col-md-7 col-sm-7"> <h2>Easy to use</h2> <p> Spark Structured Streaming abstracts away complex streaming concerns such as incremental processing, checkpointing, and watermarks, so you can build streaming applications and pipelines with the Spark concepts and tools you already know. 
</p> </div> <div class="col-md-5 col-sm-5 col-padded-top col-center"> <div style="margin-top: 15px; text-align: left; display: inline-block;"> <div class="code"> spark<br /> &nbsp;&nbsp;.<span class="sparkop">readStream</span><br /> &nbsp;&nbsp;.<span class="sparkop">format</span>(<span class="closure">"kafka"</span>)<br /> &nbsp;&nbsp;.<span class="sparkop">option</span>(<span class="closure">"kafka.bootstrap.servers"</span>, <span class="closure">"&lt;host:port&gt;"</span>)<br /> &nbsp;&nbsp;.<span class="sparkop">option</span>(<span class="closure">"subscribe"</span>, <span class="closure">"&lt;topic&gt;"</span>)<br /> &nbsp;&nbsp;.<span class="sparkop">load</span>()<br /> &nbsp;&nbsp;.<span class="sparkop">select</span>(<span class="closure">$"value"</span>.cast(<span class="closure">"string"</span>).alias(<span class="closure">"jsonData"</span>))<br /> &nbsp;&nbsp;.<span class="sparkop">select</span>(from_json(<span class="closure">$"jsonData"</span>, jsonSchema).alias(<span class="closure">"payload"</span>))<br /> &nbsp;&nbsp;.<span class="sparkop">writeStream</span><br /> &nbsp;&nbsp;.<span class="sparkop">format</span>(<span class="closure">"console"</span>)<br /> &nbsp;&nbsp;.<span class="sparkop">trigger</span>(Trigger.ProcessingTime(<span class="closure">"1 second"</span>))<br /> &nbsp;&nbsp;.<span class="sparkop">start</span>() </div> </div> </div> </div> <div class="row row-padded"> <div class="col-md-7 col-sm-7"> <h2>Unified batch and streaming APIs</h2> <p> Spark Structured Streaming provides the same structured APIs (DataFrames and Datasets) as Spark, so you don't need to develop and maintain two different technology stacks for batch and streaming. Unified APIs also make it easy to migrate your existing batch Spark jobs to streaming jobs. </p> </div> <div class="col-md-5 col-sm-5 col-padded-top col-center"> <div style="width: 100%; max-width: 300px; display: inline-block;"> <img src="/images/spark-unified-batch-and-streaming.png" alt="Unified batch and streaming APIs" style="width: 100%; max-width: 300px;" /> </div> </div> </div> <div class="row row-padded"> <div class="col-md-7 col-sm-7"> <h2>Low latency and cost-effective</h2> <p> Spark Structured Streaming runs on the same underlying engine as Spark, so you can take advantage of all the performance and cost optimizations built into it. With Spark Structured Streaming, you can build low-latency streaming applications and pipelines cost-effectively. 
</p> </div> <div class="col-md-5 col-sm-5 col-padded-top col-center"> <div style="width: 100%; max-width: 300px; display: inline-block;"> <img src="/images/spark-structured-streaming-incremental-execution.png" alt="Structured Streaming incremental execution" style="width: 100%; max-width: 300px;" /> </div> </div> </div> <div class="row"> <div class="col-md-4 col-padded"> <h3>Getting started</h3> <p> To get started with Spark Structured Streaming: </p> <ul class="list-narrow"> <li><a href="/downloads.html">Download Spark</a>. It includes Structured Streaming as a module.</li> <li>Read the <a href="/docs/latest/structured-streaming-programming-guide.html">Spark Structured Streaming programming guide</a>, which covers the programming model, tutorials, configuration, and more.</li> </ul> </div> <div class="col-md-4 col-padded"> <h3>Community</h3> <p> Spark Structured Streaming is developed as part of Apache Spark, so it is tested and updated with each Spark release. </p> <p> If you have questions about the system, ask on the <a href="/community.html#mailing-lists">Spark mailing lists</a>. </p> <p> The Spark Structured Streaming developers welcome contributions. If you'd like to help out, read <a href="/contributing.html">how to contribute to Spark</a>, and send us a patch! 
</p> </div> </div> <div class="row"> <div class="col-sm-12 col-center"> <a href="/downloads.html" class="btn btn-cta btn-lg btn-multiline"> Download Apache Spark<br /><span class="small">Includes Spark Structured Streaming</span> </a> </div> </div> </div> <div class="col-12 col-md-3"> <div class="news" style="margin-bottom: 20px;"> <h5>Latest News</h5> <ul class="list-unstyled"> <li><a href="/news/spark-3-4-4-released.html">Spark 3.4.4 released</a> <span class="small">(Oct 27, 2024)</span></li> <li><a href="/news/spark-4.0.0-preview2.html">Preview release of Spark 4.0</a> <span class="small">(Sep 26, 2024)</span></li> <li><a href="/news/spark-3-5-3-released.html">Spark 3.5.3 released</a> <span class="small">(Sep 24, 2024)</span></li> <li><a href="/news/spark-3-5-2-released.html">Spark 3.5.2 released</a> <span class="small">(Aug 10, 2024)</span></li> </ul> <p class="small" style="text-align: right;"><a href="/news/index.html">Archive</a></p> </div> <div style="text-align:center; margin-bottom: 20px;"> <a href="https://www.apache.org/events/current-event.html"> <img src="https://www.apache.org/events/current-event-234x60.png" style="max-width: 100%;"/> </a> </div> <div class="hidden-xs hidden-sm"> <a href="/downloads.html" class="btn btn-cta btn-lg d-grid" style="margin-bottom: 30px;"> Download Spark </a> <p style="font-size: 16px; font-weight: 500; color: #555;"> Built-in Libraries: </p> <ul class="list-none"> <li><a href="/sql/">SQL and DataFrames</a></li> <li><a href="/streaming/">Spark Streaming</a></li> <li><a href="/mllib/">MLlib (machine learning)</a></li> <li><a href="/graphx/">GraphX (graph)</a></li> </ul> <a href="/third-party-projects.html">Third-Party Projects</a> </div> </div> </div> <footer class="small"> <hr> Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. 
See guidance on use of Apache Spark <a href="/trademarks.html">trademarks</a>. All other marks mentioned may be trademarks or registered trademarks of their respective owners. Copyright &copy; 2018 The Apache Software Foundation, Licensed under the <a href="https://www.apache.org/licenses/">Apache License, Version 2.0</a>. </footer> </div> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js" integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM" crossorigin="anonymous"></script> <script src="https://code.jquery.com/jquery.js"></script> <script src="/js/lang-tabs.js"></script> <script src="/js/downloads.js"></script> </body> </html>