
Release Process | Apache Spark

class="nav-link active" aria-current="page" href="/examples.html">Examples</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="community" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Community </a> <ul class="dropdown-menu" aria-labelledby="community"> <li><a class="dropdown-item" href="/community.html">Mailing Lists &amp; Resources</a></li> <li><a class="dropdown-item" href="/contributing.html">Contributing to Spark</a></li> <li><a class="dropdown-item" href="/improvement-proposals.html">Improvement Proposals (SPIP)</a> </li> <li><a class="dropdown-item" href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a> </li> <li><a class="dropdown-item" href="/powered-by.html">Powered By</a></li> <li><a class="dropdown-item" href="/committers.html">Project Committers</a></li> <li><a class="dropdown-item" href="/history.html">Project History</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="developers" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Developers </a> <ul class="dropdown-menu" aria-labelledby="developers"> <li><a class="dropdown-item" href="/developer-tools.html">Useful Developer Tools</a></li> <li><a class="dropdown-item" href="/versioning-policy.html">Versioning Policy</a></li> <li><a class="dropdown-item" href="/release-process.html">Release Process</a></li> <li><a class="dropdown-item" href="/security.html">Security</a></li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="github" role="button" data-bs-toggle="dropdown" aria-expanded="false"> GitHub </a> <ul class="dropdown-menu" aria-labelledby="github"> <li><a class="dropdown-item" href="https://github.com/apache/spark">spark</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-connect-go">spark-connect-go</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-docker">spark-docker</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-kubernetes-operator">spark-kubernetes-operator</a></li> <li><a class="dropdown-item" href="https://github.com/apache/spark-website">spark-website</a></li> </ul> </li> </ul> <ul class="navbar-nav ml-auto"> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="apacheFoundation" role="button" data-bs-toggle="dropdown" aria-expanded="false"> Apache Software Foundation </a> <ul class="dropdown-menu" aria-labelledby="apacheFoundation"> <li><a class="dropdown-item" href="https://www.apache.org/">Apache Homepage</a></li> <li><a class="dropdown-item" href="https://www.apache.org/licenses/">License</a></li> <li><a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> <li><a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a></li> <li><a class="dropdown-item" href="https://www.apache.org/security/">Security</a></li> <li><a class="dropdown-item" href="https://www.apache.org/events/current-event">Event</a></li> </ul> </li> </ul> </div> </nav> <div class="container"> <div class="row mt-4"> <div class="col-12 col-md-9"> <h1>Preparing Spark releases</h1> <p>The release manager role in Spark means you are responsible for a few different things:</p> <ul> <li><a href="#preparing-your-setup">Preparing your setup</a> <ul> <li><a href="#preparing-gpg-key">Preparing gpg key</a> <ul> <li><a href="#generate-key">Generate key</a></li> <li><a href="#upload-key">Upload key</a></li> 
<li><a href="#update-keys-file-with-your-code-signing-key">Update KEYS file with your code signing key</a></li> </ul> </li> <li><a href="#installing-docker">Installing Docker</a></li> </ul> </li> <li><a href="#preparing-for-release-candidates">Preparing for release candidates</a> <ul> <li><a href="#cutting-a-release-candidate">Cutting a release candidate</a></li> <li>Informing the community of timing</li> <li>Working with component leads to clean up JIRA</li> <li>Making code changes in that branch with necessary version updates</li> </ul> </li> <li>Running the voting process for a release: <ul> <li><a href="#creating-release-candidates-using-automated-tooling">Creating release candidates using automated tooling</a></li> <li><a href="https://s.apache.org/spark-jira-versions">Triaging issues</a></li> <li><a href="#call-a-vote-on-the-release-candidate">Call a vote on the release candidate</a></li> </ul> </li> <li><a href="#finalize-the-release">Finalizing and posting a release</a> <ul> <li><a href="#upload-to-apache-release-directory">Upload to Apache release directory</a></li> <li><a href="#upload-to-pypi">Upload to PyPI</a></li> <li><a href="#remove-rc-artifacts-from-repositories">Remove RC artifacts from repositories</a></li> <li><a href="#remove-old-releases-from-mirror-network">Remove old releases from Mirror Network</a></li> <li><a href="#update-the-apache-spark-repository">Update the Apache Spark<sup>TM</sup> repository</a></li> <li><a href="#update-the-configuration-of-algolia-crawler">Update the configuration of Algolia Crawler</a></li> <li><a href="#update-the-spark-website">Update the Spark website</a> <ul> <li><a href="#upload-generated-docs">Upload generated docs</a></li> <li><a href="#update-the-rest-of-the-spark-website">Update the rest of the Spark website</a></li> </ul> </li> <li><a href="#create-and-upload-spark-docker-images">Create and upload Spark Docker Images</a></li> <li><a href="#create-an-announcement">Create an announcement</a></li> </ul> </li> </ul> <h2 id="preparing-your-setup">Preparing your setup</h2> <p>If you are a new Release Manager, you can read up on the process from the followings:</p> <ul> <li>release signing <a href="https://www.apache.org/dev/release-signing.html">https://www.apache.org/dev/release-signing.html</a></li> <li>gpg for signing <a href="https://www.apache.org/dev/openpgp.html">https://www.apache.org/dev/openpgp.html</a></li> <li>svn <a href="https://infra.apache.org/version-control.html#svn">https://infra.apache.org/version-control.html#svn</a></li> </ul> <h3 id="preparing-gpg-key">Preparing gpg key</h3> <p>You can skip this section if you have already uploaded your key.</p> <h4 id="generate-key">Generate key</h4> <p>Here&#8217;s an example of gpg 2.0.12. If you use gpg version 1 series, please refer to <a href="https://www.apache.org/dev/openpgp.html#generate-key">generate-key</a> for details.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --full-gen-key gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Please select what kind of key you want: (1) RSA and RSA (default) (2) DSA and Elgamal (3) DSA (sign only) (4) RSA (sign only) Your selection? 1 RSA keys may be between 1024 and 4096 bits long. What keysize do you want? (2048) 4096 Requested keysize is 4096 bits Please specify how long the key should be valid. 
<h4 id="generate-key">Generate key</h4>

<p>Here&#8217;s an example of gpg 2.0.12. If you use the gpg version 1 series, please refer to <a href="https://www.apache.org/dev/openpgp.html#generate-key">generate-key</a> for details.</p>

<pre><code>$ gpg --full-gen-key
gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      &lt;n&gt;  = key expires in n days
      &lt;n&gt;w = key expires in n weeks
      &lt;n&gt;m = key expires in n months
      &lt;n&gt;y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

GnuPG needs to construct a user ID to identify your key.

Real name: Robert Burrell Donkin
Email address: rdonkin@apache.org
Comment: CODE SIGNING KEY
You selected this USER-ID:
    "Robert Burrell Donkin (CODE SIGNING KEY) &lt;rdonkin@apache.org&gt;"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: key 04B3B5C426A27D33 marked as ultimately trusted
gpg: revocation certificate stored as '/home/ubuntu/.gnupg/openpgp-revocs.d/08071B1E23C8A7E2CA1E891A04B3B5C426A27D33.rev'
public and secret key created and signed.

pub   rsa4096 2021-08-19 [SC]
      08071B1E23C8A7E2CA1E891A04B3B5C426A27D33
uid           Jack (test) &lt;Jack@mail.com&gt;
sub   rsa4096 2021-08-19 [E]
</code></pre>

<p>Note that the last 8 digits (26A27D33) of the public key fingerprint are the <a href="https://infra.apache.org/release-signing.html#key-id">key ID</a>.</p>

<h4 id="upload-key">Upload key</h4>

<p>After generating the public key, we should upload it to a <a href="https://infra.apache.org/release-signing.html#keyserver">public key server</a>:</p>

<pre><code>$ gpg --keyserver hkps://keys.openpgp.org --send-key 26A27D33
</code></pre>

<p>Please refer to <a href="https://infra.apache.org/release-signing.html#keyserver-upload">keyserver-upload</a> for details.</p>

<h4 id="update-keys-file-with-your-code-signing-key">Update KEYS file with your code signing key</h4>

<p>To get the code signing key (a.k.a. the ASCII-armored public key), run the command:</p>

<pre><code>$ gpg --export --armor 26A27D33
</code></pre>

<p>And then append the generated key to the KEYS file by:</p>

<pre><code># Move dev/ to release/ when the voting is completed. See "Finalize the release" below.
svn co --depth=files "https://dist.apache.org/repos/dist/dev/spark" svn-spark
# edit svn-spark/KEYS file
svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Update KEYS"
</code></pre>

<p>If you want to do the release on another machine, you can transfer your secret key to that machine via the <code>gpg --export-secret-keys</code> and <code>gpg --import</code> commands.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="installing-docker">Installing Docker</h3>

<p>The scripts that create release candidates are run through Docker. You need to install Docker before running these scripts. Please make sure that you can run Docker as a non-root user. See <a href="https://docs.docker.com/install/linux/linux-postinstall/">https://docs.docker.com/install/linux/linux-postinstall/</a> for more details.</p>
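<p>A quick way to confirm the non-root Docker setup before starting a release build (a simple sanity check, not part of the release tooling itself):</p>

<pre><code># Should run without sudo if your user is in the docker group
$ docker run --rm hello-world
</code></pre>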
<h2 id="preparing-for-release-candidates">Preparing for release candidates</h2>

<p>The main step towards preparing a release is to create a release branch. This is done via the standard Git branching mechanism and should be announced to the community once the branch is created.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="cutting-a-release-candidate">Cutting a release candidate</h3>

<p>If this is not the first RC, then make sure that the JIRA issues that have been solved since the last RC are marked as <code>Resolved</code> and have <code>Target Version/s</code> set to this release version.</p>

<p>To track any issues with pending PRs targeting this release, create a filter in JIRA with a query like this: <code>project = SPARK AND "Target Version/s" = "12340470" AND status in (Open, Reopened, "In Progress")</code></p>

<p>To find the numeric value to use for the target version, look at an existing issue with that target version and click on the version (e.g., find an issue targeting 2.2.1 and click on the version link of its Target Version/s field).</p>

<p>Verify from <code>git log</code> whether those fixes actually made it into the new RC or not. Check for JIRA issues with the <code>release-notes</code> label, and make sure they are documented in the relevant migration guide for breaking changes, or in the release news on the website later.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="creating-release-candidates-using-automated-tooling">Creating release candidates using automated tooling</h3>

<p>To cut a release candidate, there are 4 steps:</p>

<ol>
  <li>Create a git tag for the release candidate.</li>
  <li>Package the release binaries &amp; sources, and upload them to the Apache staging SVN repo.</li>
  <li>Create the release docs, and upload them to the Apache staging SVN repo.</li>
  <li>Publish a snapshot to the Apache staging Maven repo.</li>
</ol>

<p>The process of cutting a release candidate has been automated via the <code>dev/create-release/do-release-docker.sh</code> script. Run this script, enter the information it requires, and wait until it finishes. You can also run a single step via the <code>-s</code> option. Please run <code>do-release-docker.sh -h</code> for more details.</p>
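<p>As an illustration, a typical invocation looks roughly like the following. This is only a sketch: the <code>-d</code> working-directory flag and the <code>tag</code> step name are assumptions about the script's options, so confirm them with <code>do-release-docker.sh -h</code> first.</p>

<pre><code>$ cd dev/create-release
# Full interactive run; -d points at a scratch directory for build output (assumed flag)
$ ./do-release-docker.sh -d /path/to/spark-rc-workdir

# Run a single step only, e.g. just creating the release tag (assumed step name)
$ ./do-release-docker.sh -d /path/to/spark-rc-workdir -s tag
</code></pre>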
<p align="right"><a href="#top">Return to top</a></p>

<h3 id="call-a-vote-on-the-release-candidate">Call a vote on the release candidate</h3>

<p>The release voting takes place on the Apache Spark developers list (the PMC is voting). Look at past voting threads to see how this proceeds. The email should follow <a href="https://mail-archives.apache.org/mod_mbox/spark-dev/201407.mbox/%3cCABPQxss7Cf+YaUuxCk0jnusH4207hCP4dkWn3BWFSvdnD86HHQ@mail.gmail.com%3e">this format</a>.</p>

<ul>
  <li>Make a shortened link to the full list of JIRAs using <a href="https://s.apache.org/">https://s.apache.org/</a></li>
  <li>If possible, attach a draft of the release notes with the email</li>
  <li>Make sure the voting closing time is in UTC format. Use this script to generate it (one option is sketched below)</li>
  <li>Make sure the email is in text format and the links are correct</li>
</ul>
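<p>For the closing time, one simple option is a GNU <code>date</code> one-liner (an illustrative sketch assuming GNU coreutils; 72 hours is the customary minimum voting window):</p>

<pre><code># Print a closing time 72 hours from now, in UTC
$ date -u -d '+72 hours' '+%a, %d %b %Y %H:%M:%S %Z'
</code></pre>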
<p>Once the vote is done, you should also send out a summary email with the totals, with a subject that looks something like <code>[VOTE][RESULT] ...</code>.</p>

<h2 id="finalize-the-release">Finalize the release</h2>

<p>Note that the <code>dev/create-release/do-release-docker.sh</code> script (<code>finalize</code> step) automates most of the following steps <strong>except</strong> for:</p>

<ul>
  <li>Update the configuration of Algolia Crawler</li>
  <li>Remove old releases from Mirror Network</li>
  <li>Update the rest of the Spark website</li>
  <li>Create and upload Spark Docker Images</li>
  <li>Create an announcement</li>
</ul>

<p>Please manually verify the result after each step.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="upload-to-apache-release-directory">Upload to Apache release directory</h3>

<p><strong>Be careful!</strong></p>

<p><strong>THIS STEP IS IRREVERSIBLE, so make sure you selected the correct staging repository. Once you move the artifacts into the release folder, they cannot be removed.</strong></p>

<p>After the vote passes, to upload the binaries to the Apache mirrors, you move the binaries from the dev directory (this should be where they were voted on) to the release directory. This &#8220;moving&#8221; is the only way you can add anything to the actual release directory. (Note: only the PMC can move artifacts to the release directory.)</p>

<pre><code># Move the sub-directory in "dev" to the
# corresponding directory in "release"
$ export SVN_EDITOR=vim
$ svn mv https://dist.apache.org/repos/dist/dev/spark/v1.1.1-rc2-bin https://dist.apache.org/repos/dist/release/spark/spark-1.1.1

# If you've added your signing key to the KEYS file, also update the release copy.
svn co --depth=files "https://dist.apache.org/repos/dist/release/spark" svn-spark
curl "https://dist.apache.org/repos/dist/dev/spark/KEYS" &gt; svn-spark/KEYS
(cd svn-spark &amp;&amp; svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Update KEYS")
</code></pre>

<p>Verify that the resources are present in <a href="https://www.apache.org/dist/spark/">https://www.apache.org/dist/spark/</a>. It may take a while for them to be visible. They will be mirrored throughout the Apache network. Check the release checker result for the release at <a href="https://checker.apache.org/projs/spark.html">https://checker.apache.org/projs/spark.html</a>.</p>

<p>For the Maven Central Repository, you can release from the <a href="https://repository.apache.org/">Apache Nexus Repository Manager</a>. This is already populated by the <code>release-build.sh publish-release</code> step. Log in, open Staging Repositories, find the one voted on (e.g. orgapachespark-1257 for <a href="https://repository.apache.org/content/repositories/orgapachespark-1257/">https://repository.apache.org/content/repositories/orgapachespark-1257/</a>), select it, click Release, and confirm. If successful, it should show up under <a href="https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.11/2.2.1/">https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.11/2.2.1/</a> and the same under <a href="https://repository.apache.org/content/groups/maven-staging-group/org/apache/spark/spark-core_2.11/2.2.1/">https://repository.apache.org/content/groups/maven-staging-group/org/apache/spark/spark-core_2.11/2.2.1/</a> (look for the correct release version). After some time this will be synced to <a href="https://search.maven.org/">Maven Central</a> automatically.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="upload-to-pypi">Upload to PyPI</h3>

<p>You&#8217;ll need your own PyPI account. If you do not have a PyPI account that has access to the <code>pyspark</code> and <code>pyspark-connect</code> projects on PyPI, please ask the <a href="mailto:private@spark.apache.org">PMC</a> to grant permission for both.</p>

<p>The artifacts can be uploaded using <a href="https://pypi.org/project/twine/">twine</a>. Just run:</p>

<pre><code>twine upload -u __token__ -p $PYPI_API_TOKEN \
  --repository-url https://upload.pypi.org/legacy/ \
  "pyspark-$PYSPARK_VERSION.tar.gz" \
  "pyspark-$PYSPARK_VERSION.tar.gz.asc"
</code></pre>

<p>Adjust the command for the files that match the new release. If for some reason the twine upload is incorrect (e.g. an HTTP failure or other issue), you can rename the artifact to <code>pyspark-version.post0.tar.gz</code>, delete the old artifact from PyPI, and re-upload.</p>
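<p>Before uploading, it is worth re-checking that the signature actually matches the artifact you are about to publish (a minimal sanity check using the same files as above):</p>

<pre><code># Verify the detached signature against the source distribution
$ gpg --verify "pyspark-$PYSPARK_VERSION.tar.gz.asc" "pyspark-$PYSPARK_VERSION.tar.gz"
</code></pre>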
<p align="right"><a href="#top">Return to top</a></p>

<h3 id="remove-rc-artifacts-from-repositories">Remove RC artifacts from repositories</h3>

<p><strong>NOTE! If you did not make a backup of the docs for the approved RC, this is the last time you can make one. The backup will be used to upload the docs to the website in the next few steps. Check out the docs from svn before removing the directory.</strong></p>

<p>After the vote passes and you have moved the approved RC to the release repository, you should delete the RC directories from the staging repository. For example:</p>

<pre><code>RC=v3.5.2-rc3 &amp;&amp; \
svn rm https://dist.apache.org/repos/dist/dev/spark/"${RC}"-bin/ \
  https://dist.apache.org/repos/dist/dev/spark/"${RC}"-docs/ \
  -m"Removing RC artifacts."
</code></pre>

<p>Make sure to also remove the unpublished staging repositories from the <a href="https://repository.apache.org/">Apache Nexus Repository Manager</a>.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="remove-old-releases-from-mirror-network">Remove old releases from Mirror Network</h3>

<p>Spark always keeps the latest maintenance release of each branch in the mirror network. To delete older versions, simply use <code>svn rm</code>:</p>

<pre><code>$ svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.0
</code></pre>

<p>You will also need to update <code>js/downloads.js</code> to indicate that the release is no longer mirrored, so that the correct links are generated on the site.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="update-the-apache-spark-repository">Update the Apache Spark<span class="tm">&trade;</span> repository</h3>

<p>Check out the tagged commit for the release candidate that passed and apply the correct version tag.</p>

<pre><code>$ git tag v1.1.1 v1.1.1-rc2 # the RC that passed
$ git push apache v1.1.1
</code></pre>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="update-the-configuration-of-algolia-crawler">Update the configuration of Algolia Crawler</h3>

<p>The search box on the <a href="https://spark.apache.org/docs/latest/">Spark documentation website</a> leverages the <a href="https://www.algolia.com/products/search-and-discovery/crawler/">Algolia Crawler</a>. Before a release, please update the crawler configuration for Apache Spark with the new version on the <a href="https://crawler.algolia.com/">Algolia Crawler Admin Console</a>. If you don&#8217;t have access to the configuration, contact <a href="mailto:gengliang@apache.org">Gengliang Wang</a> or <a href="mailto:lixiao@apache.org">Xiao Li</a> for help.</p>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="update-the-spark-website">Update the Spark website</h3>

<h4 id="upload-generated-docs">Upload generated docs</h4>

<p>The website repository is located at <a href="https://github.com/apache/spark-website">https://github.com/apache/spark-website</a>.</p>

<p>It&#8217;s recommended not to remove the generated docs of the latest RC, so that you can copy them to spark-website directly; otherwise you need to re-build the docs.</p>

<pre><code># Build the latest docs
$ git checkout v1.1.1
$ cd docs
$ PRODUCTION=1 bundle exec jekyll build

# Copy the new documentation to Apache
$ git clone https://github.com/apache/spark-website
...
$ cp -R _site spark-website/site/docs/1.1.1

# Update the "latest" link
$ cd spark-website/site/docs
$ rm latest
$ ln -s 1.1.1 latest
</code></pre>

<h4 id="update-the-rest-of-the-spark-website">Update the rest of the Spark website</h4>

<p>Next, update the rest of the Spark website. See how the previous releases are documented (all the HTML file changes are generated by <code>jekyll</code>).
In particular:</p>

<ul>
  <li>update <code>_layouts/global.html</code> if the new release is the latest one</li>
  <li>update <code>documentation.md</code> to add a link to the docs for the new release</li>
  <li>add the new release to <code>js/downloads.js</code> (pay attention to the order of releases)</li>
  <li>add the new release to <code>site/static/versions.json</code> (pay attention to the order of releases) [used by the Spark version drop-down of the <code>PySpark</code> docs]</li>
  <li>check <code>security.md</code> for anything to update</li>
</ul>

<pre><code>$ git add 1.1.1
$ git commit -m "Add docs for Spark 1.1.1"
</code></pre>

<p>Then, create the release notes. Go to the <a href="https://issues.apache.org/jira/projects/SPARK?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page">release page in JIRA</a>, pick the release version from the list, then click on &#8220;Release Notes&#8221;. Copy this URL and then make a short URL on <a href="https://s.apache.org/">s.apache.org</a>: sign in to your Apache account and pick an ID like <code>spark-2.1.2</code>. Create a new release post under <code>releases/_posts</code> to include this short URL. The date of the post should be the date you create it.</p>
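<p>In practice the release post is just a dated Jekyll post; the easiest approach is to copy the most recent one and edit it (the file names below are illustrative, not prescribed by the tooling):</p>

<pre><code>$ cd spark-website
# Copy the newest existing post as a template (names are illustrative)
$ cp releases/_posts/2024-08-10-spark-release-3-5-2.md \
     releases/_posts/$(date +%Y-%m-%d)-spark-release-3-5-3.md
# Edit the title, version numbers, and the short release-notes URL created above
</code></pre>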
<p>Then run <code>bundle exec jekyll build</code> to update the <code>site</code> directory.</p>

<p>Considering the pull request will be large, please separate the commits of the code changes and the generated <code>site</code> directory for an easier review.</p>

<p>After merging the change into the <code>asf-site</code> branch, you may need to create a follow-up empty commit to force synchronization between ASF&#8217;s git and the website, and also the GitHub mirror. For some reason synchronization seems not to be reliable for this repository.</p>

<p>On a related note, make sure the version is marked as released on JIRA. Find the release page as above, e.g. <a href="https://issues.apache.org/jira/projects/SPARK/versions/12340295"><code>https://issues.apache.org/jira/projects/SPARK/versions/12340295</code></a>, click the &#8220;Release&#8221; button on the right, and enter the release date.</p>

<p>(Generally, this is only for major and minor releases, not patch releases.) The contributors list can be automatically generated through <a href="https://github.com/apache/spark/blob/branch-1.1/dev/create-release/generate-contributors.py">this script</a>. It accepts the tag that corresponds to the current release and another tag that corresponds to the previous one (not including maintenance releases). For instance, if you are releasing Spark 1.2.0, set the current tag to v1.2.0-rc2 and the previous tag to v1.1.0. Once you have generated the initial contributors list, it is highly likely that there will be warnings about author names not being properly translated. To fix this, run <a href="https://github.com/apache/spark/blob/branch-1.1/dev/create-release/translate-contributors.py">this other script</a>, which fetches potential replacements from GitHub and JIRA. For instance:</p>

<pre><code>$ cd dev/create-release
# Set RELEASE_TAG and PREVIOUS_RELEASE_TAG
$ export RELEASE_TAG=v1.1.1
$ export PREVIOUS_RELEASE_TAG=v1.1.0
# Generate initial contributors list, likely with warnings
$ ./generate-contributors.py
# Set GITHUB_OAUTH_KEY.
$ export GITHUB_OAUTH_KEY=blabla
# Set either JIRA_ACCESS_TOKEN (for 4.0.0 and later) or JIRA_USERNAME / JIRA_PASSWORD.
$ export JIRA_ACCESS_TOKEN=blabla
$ export JIRA_USERNAME=blabla
$ export JIRA_PASSWORD=blabla
# Translate names generated in the previous step, reading from known_translations if necessary
$ ./translate-contributors.py
</code></pre>

<p>Additionally, if you wish to give more specific credit for developers of larger patches, you may use the following commands to identify large patches. Extra care must be taken to make sure commits from previous releases are not counted, since git cannot easily associate commits that were back-ported into different branches.</p>

<pre><code># Determine PR numbers closed only in the new release
$ git log v1.1.1 | grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort &gt; closed_1.1.1
$ git log v1.1.0 | grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort &gt; closed_1.1.0
$ diff --new-line-format="" --unchanged-line-format="" closed_1.1.1 closed_1.1.0 &gt; diff.txt

# Grep expression with all new patches
$ EXPR=$(cat diff.txt | awk '{ print "\\("$1" "$2" \\)"; }' | tr "\n" "|" | sed -e "s/|/\\\|/g" | sed "s/\\\|$//")

# Contributor list
$ git shortlog v1.1.1 --grep "$EXPR" &gt; contrib.txt

# Large patch list (300+ lines)
$ git log v1.1.1 --grep "$EXPR" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][1-9][1-9][1-9] insert" | grep SPARK &gt; large-patches.txt
</code></pre>

<p align="right"><a href="#top">Return to top</a></p>

<h3 id="create-and-upload-spark-docker-images">Create and upload Spark Docker Images</h3>

<p>The apache/spark-docker repository provides the Dockerfiles and GitHub Actions used to publish Spark Docker images. Please follow the <a href="https://github.com/apache/spark-docker?tab=readme-ov-file#create-a-new-version">instructions</a> to create and upload the Docker images.</p>
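<p>Once the images are published, a quick smoke test of the pushed image is worthwhile (a sketch only; the tag and the in-container path are assumptions about the published image layout):</p>

<pre><code># Pull the new image and print the Spark version it ships
# (tag and spark-submit path are illustrative -- adjust for the actual release)
$ docker pull apache/spark:4.0.0
$ docker run --rm apache/spark:4.0.0 /opt/spark/bin/spark-submit --version
</code></pre>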
<p align="right"><a href="#top">Return to top</a></p>

<h3 id="create-an-announcement">Create an announcement</h3>

<p>Once everything is working (website docs, website changes), create an announcement on the website and then send an e-mail to the mailing list with a subject that looks something like <code>[ANNOUNCE] ...</code>. To create the announcement, create a post under <code>news/_posts</code> and then run <code>bundle exec jekyll build</code>.</p>

<p>Enjoy an adult beverage of your choice, and congratulations on making a Spark release.</p>

<p align="right"><a href="#top">Return to top</a></p>
