<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>Apache Zeppelin 0.10.0 Documentation: Apache Hadoop Submarine Interpreter for Apache Zeppelin</title> <meta name="description" content="Hadoop Submarine is the latest machine learning framework subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, MXNet, Caffe, Spark, etc."> <meta name="author" content="The Apache Software Foundation"> <!-- Enable responsive viewport --> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <!-- Le HTML5 shim, for IE6-8 support of HTML elements --> <!--[if lt IE 9]> <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script> <![endif]--> <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"> <!-- Le styles --> <link href="/docs/0.10.0/assets/themes/zeppelin/bootstrap/css/bootstrap.css" rel="stylesheet"> <link href="/docs/0.10.0/assets/themes/zeppelin/css/style.css?body=1" rel="stylesheet" type="text/css"> <link href="/docs/0.10.0/assets/themes/zeppelin/css/syntax.css" rel="stylesheet" type="text/css" media="screen" /> <!-- Le fav and touch icons --> <!-- Update these with your own images <link rel="shortcut icon" href="images/favicon.ico"> <link rel="apple-touch-icon" href="images/apple-touch-icon.png"> <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png"> <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png"> --> <!-- Js --> <script src="https://code.jquery.com/jquery-1.10.2.min.js"></script> <script src="/docs/0.10.0/assets/themes/zeppelin/bootstrap/js/bootstrap.min.js"></script> <script src="/docs/0.10.0/assets/themes/zeppelin/js/docs.js"></script> <script src="/docs/0.10.0/assets/themes/zeppelin/js/anchor.min.js"></script> <script src="/docs/0.10.0/assets/themes/zeppelin/js/toc.js"></script> <script src="/docs/0.10.0/assets/themes/zeppelin/js/lunr.min.js"></script> <script 
src="/docs/0.10.0/assets/themes/zeppelin/js/search.js"></script> <!-- atom & rss feed --> <link href="/docs/0.10.0/atom.xml" type="application/atom+xml" rel="alternate" title="Sitewide ATOM Feed"> <link href="/docs/0.10.0/rss.xml" type="application/rss+xml" rel="alternate" title="Sitewide RSS Feed"> </head> <body> <div id="menu" class="navbar navbar-inverse navbar-fixed-top" role="navigation"> <div class="container navbar-container"> <div class="navbar-header"> <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <div class="navbar-brand"> <a class="navbar-brand-main" href="http://zeppelin.apache.org"> <img src="/docs/0.10.0/assets/themes/zeppelin/img/zeppelin_logo.png" width="50" style="margin-top: -2px;" alt="I'm zeppelin"> <span style="margin-left: 5px; font-size: 27px;">Zeppelin</span> <a class="navbar-brand-version" href="/docs/0.10.0" style="font-size: 15px; color: white;"> 0.10.0 </a> </a> </div> </div> <nav class="navbar-collapse collapse" role="navigation"> <ul class="nav navbar-nav"> <li> <a href="#" data-toggle="dropdown" class="dropdown-toggle">Quick Start <b class="caret"></b></a> <ul class="dropdown-menu"> <li class="title"><span>Getting Started</span></li> <li><a href="/docs/0.10.0/quickstart/install.html">Install</a></li> <li><a href="/docs/0.10.0/quickstart/explore_ui.html">Explore UI</a></li> <li><a href="/docs/0.10.0/quickstart/tutorial.html">Tutorial</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Run Mode</span></li> <li><a href="/docs/0.10.0/quickstart/kubernetes.html">Kubernetes</a></li> <li><a href="/docs/0.10.0/quickstart/docker.html">Docker</a></li> <li><a href="/docs/0.10.0/quickstart/yarn.html">Yarn</a></li> <li role="separator" class="divider"></li> <li><a href="/docs/0.10.0/quickstart/spark_with_zeppelin.html">Spark 
with Zeppelin</a></li> <li><a href="/docs/0.10.0/quickstart/flink_with_zeppelin.html">Flink with Zeppelin</a></li> <li><a href="/docs/0.10.0/quickstart/sql_with_zeppelin.html">SQL with Zeppelin</a></li> <li><a href="/docs/0.10.0/quickstart/python_with_zeppelin.html">Python with Zeppelin</a></li> <li><a href="/docs/0.10.0/quickstart/r_with_zeppelin.html">R with Zeppelin</a></li> </ul> </li> <li> <a href="#" data-toggle="dropdown" class="dropdown-toggle">Usage<b class="caret"></b></a> <ul class="dropdown-menu scrollable-menu"> <li class="title"><span>Dynamic Form</span></li> <li><a href="/docs/0.10.0/usage/dynamic_form/intro.html">What is Dynamic Form?</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Display System</span></li> <li><a href="/docs/0.10.0/usage/display_system/basic.html#text">Text Display</a></li> <li><a href="/docs/0.10.0/usage/display_system/basic.html#html">HTML Display</a></li> <li><a href="/docs/0.10.0/usage/display_system/basic.html#table">Table Display</a></li> <li><a href="/docs/0.10.0/usage/display_system/basic.html#network">Network Display</a></li> <li><a href="/docs/0.10.0/usage/display_system/angular_backend.html">Angular Display using Backend API</a></li> <li><a href="/docs/0.10.0/usage/display_system/angular_frontend.html">Angular Display using Frontend API</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Interpreter</span></li> <li><a href="/docs/0.10.0/usage/interpreter/overview.html">Overview</a></li> <li><a href="/docs/0.10.0/usage/interpreter/interpreter_binding_mode.html">Interpreter Binding Mode</a></li> <li><a href="/docs/0.10.0/usage/interpreter/user_impersonation.html">User Impersonation</a></li> <li><a href="/docs/0.10.0/usage/interpreter/dependency_management.html">Dependency Management</a></li> <li><a href="/docs/0.10.0/usage/interpreter/installation.html">Installing Interpreters</a></li> <!--<li><a href="/docs/0.10.0/usage/interpreter/dynamic_loading.html">Dynamic 
Interpreter Loading (Experimental)</a></li>--> <li><a href="/docs/0.10.0/usage/interpreter/execution_hooks.html">Execution Hooks (Experimental)</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Other Features</span></li> <li><a href="/docs/0.10.0/usage/other_features/publishing_paragraphs.html">Publishing Paragraphs</a></li> <li><a href="/docs/0.10.0/usage/other_features/personalized_mode.html">Personalized Mode</a></li> <li><a href="/docs/0.10.0/usage/other_features/customizing_homepage.html">Customizing Zeppelin Homepage</a></li> <li><a href="/docs/0.10.0/usage/other_features/notebook_actions.html">Notebook Actions</a></li> <li><a href="/docs/0.10.0/usage/other_features/cron_scheduler.html">Cron Scheduler</a></li> <li><a href="/docs/0.10.0/usage/other_features/zeppelin_context.html">Zeppelin Context</a></li> <li role="separator" class="divider"></li> <li class="title"><span>REST API</span></li> <li><a href="/docs/0.10.0/usage/rest_api/interpreter.html">Interpreter API</a></li> <li><a href="/docs/0.10.0/usage/rest_api/zeppelin_server.html">Zeppelin Server API</a></li> <li><a href="/docs/0.10.0/usage/rest_api/notebook.html">Notebook API</a></li> <li><a href="/docs/0.10.0/usage/rest_api/notebook_repository.html">Notebook Repository API</a></li> <li><a href="/docs/0.10.0/usage/rest_api/configuration.html">Configuration API</a></li> <li><a href="/docs/0.10.0/usage/rest_api/credential.html">Credential API</a></li> <li><a href="/docs/0.10.0/usage/rest_api/helium.html">Helium API</a></li> <li class="title"><span>Zeppelin SDK</span></li> <li><a href="/docs/0.10.0/usage/zeppelin_sdk/client_api.html">Client API</a></li> <li><a href="/docs/0.10.0/usage/zeppelin_sdk/session_api.html">Session API</a></li> </ul> </li> <li> <a href="#" data-toggle="dropdown" class="dropdown-toggle">Setup<b class="caret"></b></a> <ul class="dropdown-menu scrollable-menu"> <li class="title"><span>Basics</span></li> <li><a 
href="/docs/0.10.0/setup/basics/how_to_build.html">How to Build Zeppelin</a></li> <li><a href="/docs/0.10.0/setup/basics/hadoop_integration.html">Hadoop Integration</a></li> <li><a href="/docs/0.10.0/setup/basics/multi_user_support.html">Multi-user Support</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Deployment</span></li> <!--<li><a href="/docs/0.10.0/setup/deployment/docker.html">Docker Image for Zeppelin</a></li>--> <li><a href="/docs/0.10.0/setup/deployment/spark_cluster_mode.html#spark-standalone-mode">Spark Cluster Mode: Standalone</a></li> <li><a href="/docs/0.10.0/setup/deployment/spark_cluster_mode.html#spark-on-yarn-mode">Spark Cluster Mode: YARN</a></li> <li><a href="/docs/0.10.0/setup/deployment/spark_cluster_mode.html#spark-on-mesos-mode">Spark Cluster Mode: Mesos</a></li> <li><a href="/docs/0.10.0/setup/deployment/flink_and_spark_cluster.html">Zeppelin with Flink, Spark Cluster</a></li> <li><a href="/docs/0.10.0/setup/deployment/cdh.html">Zeppelin on CDH</a></li> <li><a href="/docs/0.10.0/setup/deployment/virtual_machine.html">Zeppelin on VM: Vagrant</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Security</span></li> <li><a href="/docs/0.10.0/setup/security/authentication_nginx.html">HTTP Basic Auth using NGINX</a></li> <li><a href="/docs/0.10.0/setup/security/shiro_authentication.html">Shiro Authentication</a></li> <li><a href="/docs/0.10.0/setup/security/notebook_authorization.html">Notebook Authorization</a></li> <li><a href="/docs/0.10.0/setup/security/datasource_authorization.html">Data Source Authorization</a></li> <li><a href="/docs/0.10.0/setup/security/http_security_headers.html">HTTP Security Headers</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Notebook Storage</span></li> <li><a href="/docs/0.10.0/setup/storage/storage.html#notebook-storage-in-local-git-repository">Git Storage</a></li> <li><a 
href="/docs/0.10.0/setup/storage/storage.html#notebook-storage-in-s3">S3 Storage</a></li> <li><a href="/docs/0.10.0/setup/storage/storage.html#notebook-storage-in-azure">Azure Storage</a></li> <li><a href="/docs/0.10.0/setup/storage/storage.html#notebook-storage-in-oss">OSS Storage</a></li> <li><a href="/docs/0.10.0/setup/storage/storage.html#notebook-storage-in-zeppelinhub">ZeppelinHub Storage</a></li> <li><a href="/docs/0.10.0/setup/storage/storage.html#notebook-storage-in-mongodb">MongoDB Storage</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Operation</span></li> <li><a href="/docs/0.10.0/setup/operation/configuration.html">Configuration</a></li> <li><a href="/docs/0.10.0/setup/operation/proxy_setting.html">Proxy Setting</a></li> <li><a href="/docs/0.10.0/setup/operation/upgrading.html">Upgrading</a></li> <li><a href="/docs/0.10.0/setup/operation/trouble_shooting.html">Trouble Shooting</a></li> </ul> </li> <li> <a href="#" data-toggle="dropdown" class="dropdown-toggle">Interpreter <b class="caret"></b></a> <ul class="dropdown-menu scrollable-menu"> <li class="title"><span>Interpreters</span></li> <li><a href="/docs/0.10.0/usage/interpreter/overview.html">Overview</a></li> <li role="separator" class="divider"></li> <li><a href="/docs/0.10.0/interpreter/spark.html">Spark</a></li> <li><a href="/docs/0.10.0/interpreter/flink.html">Flink</a></li> <li><a href="/docs/0.10.0/interpreter/jdbc.html">JDBC</a></li> <li><a href="/docs/0.10.0/interpreter/python.html">Python</a></li> <li><a href="/docs/0.10.0/interpreter/r.html">R</a></li> <li role="separator" class="divider"></li> <li><a href="/docs/0.10.0/interpreter/alluxio.html">Alluxio</a></li> <li><a href="/docs/0.10.0/interpreter/beam.html">Beam</a></li> <li><a href="/docs/0.10.0/interpreter/bigquery.html">BigQuery</a></li> <li><a href="/docs/0.10.0/interpreter/cassandra.html">Cassandra</a></li> <li><a href="/docs/0.10.0/interpreter/elasticsearch.html">Elasticsearch</a></li> <li><a 
href="/docs/0.10.0/interpreter/geode.html">Geode</a></li> <li><a href="/docs/0.10.0/interpreter/groovy.html">Groovy</a></li> <li><a href="/docs/0.10.0/interpreter/hazelcastjet.html">Hazelcast Jet</a></li> <li><a href="/docs/0.10.0/interpreter/hbase.html">HBase</a></li> <li><a href="/docs/0.10.0/interpreter/hdfs.html">HDFS</a></li> <li><a href="/docs/0.10.0/interpreter/hive.html">Hive</a></li> <li><a href="/docs/0.10.0/interpreter/ignite.html">Ignite</a></li> <li><a href="/docs/0.10.0/interpreter/influxdb.html">influxDB</a></li> <li><a href="/docs/0.10.0/interpreter/java.html">Java</a></li> <li><a href="/docs/0.10.0/interpreter/jupyter.html">Jupyter</a></li> <li><a href="/docs/0.10.0/interpreter/kotlin.html">Kotlin</a></li> <li><a href="/docs/0.10.0/interpreter/ksql.html">KSQL</a></li> <li><a href="/docs/0.10.0/interpreter/kylin.html">Kylin</a></li> <li><a href="/docs/0.10.0/interpreter/lens.html">Lens</a></li> <li><a href="/docs/0.10.0/interpreter/livy.html">Livy</a></li> <li><a href="/docs/0.10.0/interpreter/mahout.html">Mahout</a></li> <li><a href="/docs/0.10.0/interpreter/markdown.html">Markdown</a></li> <li><a href="/docs/0.10.0/interpreter/mongodb.html">MongoDB</a></li> <li><a href="/docs/0.10.0/interpreter/neo4j.html">Neo4j</a></li> <li><a href="/docs/0.10.0/interpreter/pig.html">Pig</a></li> <li><a href="/docs/0.10.0/interpreter/postgresql.html">Postgresql, HAWQ</a></li> <li><a href="/docs/0.10.0/interpreter/sap.html">SAP</a></li> <li><a href="/docs/0.10.0/interpreter/scalding.html">Scalding</a></li> <li><a href="/docs/0.10.0/interpreter/scio.html">Scio</a></li> <li><a href="/docs/0.10.0/interpreter/shell.html">Shell</a></li> <li><a href="/docs/0.10.0/interpreter/sparql.html">Sparql</a></li> <li><a href="/docs/0.10.0/interpreter/submarine.html">Submarine</a></li> </ul> </li> <li> <a href="#" data-toggle="dropdown" class="dropdown-toggle">More<b class="caret"></b></a> <ul class="dropdown-menu scrollable-menu" style="right: 0; left: auto;"> <li 
class="title"><span>Extending Zeppelin</span></li> <li><a href="/docs/0.10.0/development/writing_zeppelin_interpreter.html">Writing Zeppelin Interpreter</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Helium (Experimental)</span></li> <li><a href="/docs/0.10.0/development/helium/overview.html">Overview</a></li> <li><a href="/docs/0.10.0/development/helium/writing_application.html">Writing Helium Application</a></li> <li><a href="/docs/0.10.0/development/helium/writing_spell.html">Writing Helium Spell</a></li> <li><a href="/docs/0.10.0/development/helium/writing_visualization_basic.html">Writing Helium Visualization: Basics</a></li> <li><a href="/docs/0.10.0/development/helium/writing_visualization_transformation.html">Writing Helium Visualization: Transformation</a></li> <li role="separator" class="divider"></li> <li class="title"><span>Contributing to Zeppelin</span></li> <li><a href="/docs/0.10.0/setup/basics/how_to_build.html">How to Build Zeppelin</a></li> <li><a href="/docs/0.10.0/development/contribution/useful_developer_tools.html">Useful Developer Tools</a></li> <li><a href="/docs/0.10.0/development/contribution/how_to_contribute_code.html">How to Contribute (code)</a></li> <li><a href="/docs/0.10.0/development/contribution/how_to_contribute_website.html">How to Contribute (website)</a></li> <li role="separator" class="divider"></li> <li class="title"><span>External Resources</span></li> <li><a target="_blank" href="https://zeppelin.apache.org/community.html">Mailing List</a></li> <li><a target="_blank" href="https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Home">Apache Zeppelin Wiki</a></li> <li><a target="_blank" href="http://stackoverflow.com/questions/tagged/apache-zeppelin">Stackoverflow Questions about Zeppelin</a></li> </ul> </li> <li> <a href="/docs/0.10.0/search.html" class="nav-search-link"> <span class="fa fa-search nav-search-icon"></span> </a> </li> </ul> </nav><!--/.navbar-collapse --> </div> </div> 
<div class="content"> <!--<div class="hero-unit Apache Hadoop Submarine Interpreter for Apache Zeppelin"> <h1></h1> </div> --> <div class="row"> <div class="col-md-12"> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <h1>Submarine Interpreter for Apache Zeppelin</h1> <div id="toc"></div> <p><a href="https://hadoop.apache.org/submarine/">Hadoop Submarine</a> is the latest machine learning framework subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, MXNet, Caffe, Spark, and a variety of other deep learning frameworks. It provides a full-featured system framework for machine learning algorithm development, distributed model training, model management, and model publishing. Combined with Hadoop&#39;s intrinsic data storage and data processing capabilities, it enables data scientists to better mine the value of their data.</p> <p>A deep learning algorithm project involves many steps: data acquisition, data processing, data cleaning, interactive visual programming and parameter tuning, algorithm testing, algorithm publishing, algorithm job scheduling, offline model training, online model serving, and more. Zeppelin is a web-based notebook that supports interactive data analysis. You can use SQL, Scala, Python, etc. to make data-driven, interactive, collaborative documents.</p> <p>You can use the more than 20 interpreters in Zeppelin (for example: Spark, Hive, Cassandra, Elasticsearch, Kylin, HBase, etc.) to collect data, clean data, extract features, etc. 
from the data in Hadoop before training the machine learning model. This is the data preprocessing stage.</p> <p>By integrating Submarine in Zeppelin, we use Zeppelin&#39;s data discovery, data analysis, data visualization, and collaboration capabilities to visualize the results of algorithm development and parameter tuning during machine learning model training.</p> <h2>Architecture</h2> <p><img class="submarine-dashboard" src="/docs/0.10.0/assets/themes/zeppelin/img/docs-img/submarine-architecture.png" /></p> <p>The figure above shows, from the system architecture perspective, how Submarine supports machine learning algorithm development and model training through Zeppelin.</p> <p>After you install and deploy Hadoop 3.1+ and Zeppelin, Submarine creates a fully separate Zeppelin Submarine interpreter Docker container for each user in YARN. This container contains the development and runtime environment for Tensorflow. Zeppelin Server connects to the Zeppelin Submarine interpreter Docker container in YARN, which allows algorithm engineers to develop algorithms and visualize data in a stand-alone Tensorflow environment from a Zeppelin notebook.</p> <p>After the algorithm is developed, the algorithm engineer can submit it directly from Zeppelin to YARN for offline training, and Submarine&#39;s TensorBoard shows each algorithm engineer the model training in real time.</p> <p>Besides training the model, you can also use the more than twenty interpreters in Zeppelin to preprocess the model&#39;s data. For example, you can perform data extraction, filtering, and feature extraction through the Spark interpreter in Zeppelin in the algorithm Note.</p> <p>In the future, you can also use Zeppelin&#39;s upcoming workflow orchestration service. You can complete Spark and Hive data processing and Tensorflow model training in one Note. 
The Note is organized into a workflow through visualization, etc., and its jobs are scheduled in the production environment.</p> <h2>Overview</h2> <p><img class="submarine-dashboard" src="/docs/0.10.0/assets/themes/zeppelin/img/docs-img/submarine-interpreter.png" /></p> <p>The figure above shows, from the internal implementation perspective, how Submarine combines machine learning algorithm development and model training in Zeppelin.</p> <ol> <li><p>The algorithm engineer creates a Tensorflow notebook (left image) in Zeppelin by using the Submarine interpreter.</p> <p>Note that you need to complete the development of the entire algorithm in a single Note.</p></li> <li><p>You can use Spark for data preprocessing in some of the paragraphs in the Note.</p></li> <li><p>Use Python to develop and debug the Tensorflow algorithm in other paragraphs of the notebook. Submarine creates a Zeppelin Submarine Interpreter Docker Container for you in YARN, which contains the following features and services:</p></li> </ol> <ul> <li><strong>Shell command line tool</strong>: Allows you to view the system environment in the Zeppelin Submarine Interpreter Docker Container and install the extension tools or Python dependencies you need.</li> <li><strong>Kerberos lib</strong>: Allows you to perform Kerberos authentication and access Hadoop clusters with Kerberos authentication enabled.</li> <li><strong>Tensorflow environment</strong>: Allows you to develop Tensorflow algorithm code.</li> <li><strong>Python environment</strong>: Allows you to develop Tensorflow code.</li> <li>Develop a complete algorithm within one Note in Zeppelin. If the algorithm contains multiple modules, you can write different algorithm modules in multiple paragraphs in the Note. The title of each paragraph is the name of the algorithm module. 
The content of the paragraph is the code of that algorithm module.</li> <li><p><strong>HDFS Client</strong>: The Zeppelin Submarine Interpreter automatically submits the algorithm code you wrote in the Note to HDFS.</p> <p><strong>Submarine interpreter Docker Image</strong>: Submarine provides an image file that supports Tensorflow (CPU and GPU versions) and has the commonly used Python algorithm libraries installed. You can also install other development dependencies you need on top of the base image provided by Submarine.</p></li> </ul> <ol start="4"> <li>When you finish developing the algorithm modules, create a new paragraph in the Note and type <code>%submarine dashboard</code>. Zeppelin will create a Submarine Dashboard. You can submit the machine learning algorithm written in this Note to YARN as a JOB by selecting the <code>JOB RUN</code> command option in the Control Panel. This creates a Tensorflow Model Training Docker Container, which contains the following components:</li> </ol> <ul> <li>Tensorflow environment</li> <li><p>HDFS Client: Automatically downloads the algorithm file from HDFS into the container for distributed model training, and mounts the algorithm file to the Work Dir path of the container.</p> <p><strong>Submarine Tensorflow Docker Image</strong>: Submarine provides an image file that supports Tensorflow (CPU and GPU versions) and has the commonly used Python algorithm libraries installed. 
You can also install other development dependencies you need on top of the base image provided by Submarine.</p></li> </ul> <table class="table-configuration"> <tr> <th>Name</th> <th>Class</th> <th>Description</th> </tr> <tr> <td>%submarine</td> <td>SubmarineInterpreter</td> <td>Provides the interpreter for the Apache Submarine dashboard</td> </tr> <tr> <td>%submarine.sh</td> <td>SubmarineShellInterpreter</td> <td>Provides the interpreter for the Apache Submarine shell</td> </tr> <tr> <td>%submarine.python</td> <td>PySubmarineInterpreter</td> <td>Provides the interpreter for Apache Submarine Python</td> </tr> </table> <h3>Submarine shell</h3> <p>After creating a Note with the Submarine Interpreter in Zeppelin, you can add a paragraph to the Note whenever you need one. Using the <code>%submarine.sh</code> identifier, you can run Shell commands to perform various operations on the Submarine Interpreter Docker Container, such as:</p> <ol> <li>View the Python version in the Container</li> <li>View the system environment of the Container</li> <li>Install the dependencies you need yourself</li> <li>Perform Kerberos authentication with kinit</li> <li>Use Hadoop in the Container for HDFS operations, etc.</li> </ol> <h3>Submarine python</h3> <p>You can add one or more paragraphs to the Note and write the Tensorflow algorithm modules in Python using the <code>%submarine.python</code> identifier.</p> <h3>Submarine Dashboard</h3> <p>After writing the Tensorflow algorithm by using <code>%submarine.python</code>, you can add a paragraph to the Note. Enter <code>%submarine dashboard</code> and execute it. 
Zeppelin will create a Submarine Dashboard.</p> <p><img class="submarine-dashboard" src="/docs/0.10.0/assets/themes/zeppelin/img/docs-img/submarine-dashboard.gif" /></p> <p>With the Submarine Dashboard you can perform all operational control of Submarine, for example:</p> <ol> <li><p><strong>Usage</strong>: Displays Submarine&#39;s command description to help developers locate problems.</p></li> <li><p><strong>Refresh</strong>: Zeppelin will erase all your input in the Dashboard.</p></li> <li><p><strong>Tensorboard</strong>: You will be redirected to the Tensorboard web system created by Submarine for each user. With Tensorboard you can view the status of the Tensorflow model training in real time.</p></li> <li><p><strong>Command</strong></p></li> </ol> <ul> <li><strong>JOB RUN</strong>: Selecting <code>JOB RUN</code> will display the parameter input interface for submitting a JOB.</li> </ul> <table class="table-configuration"> <tr> <th>Name</th> <th>Description</th> </tr> <tr> <td>Checkpoint Path</td> <td>Submarine sets up a separate Checkpoint path for each user's Note for Tensorflow training. It stores the training data from this Note's history and the model data output by training, and Tensorboard uses the data in this path for model presentation. Users cannot modify it. For example: <code>hdfs://cluster1/...</code>. The environment variable name for Checkpoint Path is <code>%checkpoint_path%</code>; you can use <code>%checkpoint_path%</code> in place of the Checkpoint Path value in <code>PS Launch Cmd</code> and <code>Worker Launch Cmd</code>.</td> </tr> <tr> <td>Input Path</td> <td>The user specifies the data directory of the Tensorflow algorithm. Only HDFS-enabled directories are supported. 
The environment variable name for Input Path is <code>%input_path%</code>; you can use <code>%input_path%</code> in place of the Input Path value in <code>PS Launch Cmd</code> and <code>Worker Launch Cmd</code>.</td> </tr> <tr> <td>PS Launch Cmd</td> <td>Tensorflow parameter services launch command, for example: <code>python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0 ...</code></td> </tr> <tr> <td>Worker Launch Cmd</td> <td>Tensorflow worker services launch command, for example: <code>python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=1 ...</code></td> </tr> </table> <ul> <li><p><strong>JOB STOP</strong></p> <p>You can choose to execute the <code>JOB STOP</code> command to stop a Tensorflow model training task that has been submitted and is running.</p></li> <li><p><strong>TENSORBOARD START</strong></p> <p>You can choose to execute the <code>TENSORBOARD START</code> command to create your TENSORBOARD Docker Container.</p></li> <li><p><strong>TENSORBOARD STOP</strong></p> <p>You can choose to execute the <code>TENSORBOARD STOP</code> command to stop and destroy your TENSORBOARD Docker Container.</p></li> </ul> <ol start="5"> <li><strong>Run Command</strong>: Executes the action command of your choice.</li> <li><strong>Clean Checkpoint</strong>: Checking this option will clear the data in this Note&#39;s Checkpoint Path before each <code>JOB RUN</code> execution.</li> </ol> <h3>Configuration</h3> <p>The Zeppelin Submarine interpreter provides the following properties to customize the Submarine interpreter:</p> <table class="table-configuration"> <tr> <th>Attribute name</th> <th>Attribute value</th> <th>Description</th> </tr> <tr> <td>DOCKER_CONTAINER_TIME_ZONE</td> <td>Etc/UTC</td> <td>Set the time zone in the container</td> </tr> <tr> <td>DOCKER_HADOOP_HDFS_HOME</td> <td>/hadoop-3.1-0</td> <td>Hadoop path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image)</td> </tr> <tr> <td>DOCKER_JAVA_HOME</td> <td>/opt/java</td> <td>JAVA 
path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image)</td> </tr> <tr> <td>HADOOP_YARN_SUBMARINE_JAR</td> <td></td> <td>Path to the Submarine JAR package in the Hadoop 3.1+ release installed on the Zeppelin server</td> </tr> <tr> <td>INTERPRETER_LAUNCH_MODE</td> <td>local/yarn</td> <td>Run the Submarine interpreter instance in local or YARN mode. Local mode is mainly for Submarine interpreter development and debugging; YARN mode is for production environments</td> </tr> <tr> <td>SUBMARINE_HADOOP_CONF_DIR</td> <td></td> <td>Set the HADOOP-CONF path to support multiple Hadoop cluster environments</td> </tr> <tr> <td>SUBMARINE_HADOOP_HOME</td> <td></td> <td>Path of the Hadoop 3.1+ installation on the Zeppelin server</td> </tr> <tr> <td>SUBMARINE_HADOOP_KEYTAB</td> <td></td> <td>Keytab file path for a Hadoop cluster with Kerberos authentication turned on</td> </tr> <tr> <td>SUBMARINE_HADOOP_PRINCIPAL</td> <td></td> <td>Principal for the keytab file of a Hadoop cluster with Kerberos authentication turned on</td> </tr> <tr> <td>SUBMARINE_INTERPRETER_DOCKER_IMAGE</td> <td></td> <td>When INTERPRETER_LAUNCH_MODE=yarn, Submarine uses this image to create a Zeppelin Submarine interpreter container that provides an algorithm development environment for the user. 
</td> </tr> <tr> <td>docker.container.network</td> <td></td> <td>YARN's Docker network name</td> </tr> <tr> <td>machinelearing.distributed.enable</td> <td></td> <td>Whether <code>JOB RUN</code> submits model training in distributed mode</td> </tr> <tr> <td>shell.command.timeout.millisecs</td> <td>60000</td> <td>Execution timeout, in milliseconds, for shell commands in the Submarine interpreter container</td> </tr> <tr> <td>submarine.algorithm.hdfs.path</td> <td></td> <td>HDFS path where machine learning algorithms developed with the Submarine interpreter are saved as files</td> </tr> <tr> <td>submarine.yarn.queue</td> <td>root.default</td> <td>Name of the YARN queue to which Submarine submits model training</td> </tr> <tr> <td>tf.checkpoint.path</td> <td></td> <td>Tensorflow checkpoint path. Under this path, a second-level checkpoint path is created for each user using the username, and a third-level checkpoint path is created for each algorithm the user submits using the note id (the user's Tensorboard uses the checkpoint data in this path for visual display)</td> </tr> <tr> <td>tf.parameter.services.cpu</td> <td></td> <td>Number of CPU cores requested for Tensorflow parameter services when Submarine submits distributed model training</td> </tr> <tr> <td>tf.parameter.services.docker.image</td> <td></td> <td>Docker image that Submarine uses to create Tensorflow parameter services when submitting distributed model training</td> </tr> <tr> <td>tf.parameter.services.gpu</td> <td></td> <td>Number of GPU cores requested for Tensorflow parameter services when Submarine submits distributed model training</td> </tr> <tr> <td>tf.parameter.services.memory</td> <td>2G</td> <td>Memory resources requested for Tensorflow parameter services when Submarine submits distributed model training</td> </tr> <tr> <td>tf.parameter.services.num</td> <td></td> <td>Number of Tensorflow parameter services used when Submarine submits distributed model training</td> </tr> <tr> <td>tf.tensorboard.enable</td> <td>true</td> <td>Create a separate Tensorboard for each 
user</td> </tr> <tr> <td>tf.worker.services.cpu</td> <td></td> <td>CPU resources requested for Tensorflow worker services when Submarine submits model training</td> </tr> <tr> <td>tf.worker.services.docker.image</td> <td></td> <td>Docker image that Submarine uses to create Tensorflow worker services when submitting distributed model training</td> </tr> <tr> <td>tf.worker.services.gpu</td> <td></td> <td>GPU resources requested for Tensorflow worker services when Submarine submits model training</td> </tr> <tr> <td>tf.worker.services.memory</td> <td></td> <td>Memory resources requested for Tensorflow worker services when Submarine submits model training</td> </tr> <tr> <td>tf.worker.services.num</td> <td></td> <td>Number of Tensorflow worker services used when Submarine submits distributed model training</td> </tr> <tr> <td>yarn.webapp.http.address</td> <td>http://hadoop:8088</td> <td>YARN web UI address</td> </tr> <tr> <td>zeppelin.interpreter.rpc.portRange</td> <td>29914</td> <td>You need to expose this port in the image configured by SUBMARINE_INTERPRETER_DOCKER_IMAGE. 
It is used for RPC communication between Zeppelin Server and the Submarine interpreter containers</td> </tr> <tr> <td>zeppelin.ipython.grpc.message_size</td> <td>33554432</td> <td>Message size setting for IPython gRPC in the Submarine interpreter container</td> </tr> <tr> <td>zeppelin.ipython.launch.timeout</td> <td>30000</td> <td>IPython execution timeout setting in the Submarine interpreter container</td> </tr> <tr> <td>zeppelin.python</td> <td>python</td> <td>Execution path of Python in the Submarine interpreter container</td> </tr> <tr> <td>zeppelin.python.maxResult</td> <td>10000</td> <td>The maximum number of Python execution results returned from the Submarine interpreter container</td> </tr> <tr> <td>zeppelin.python.useIPython</td> <td>false</td> <td>IPython is currently not supported, so this must be false</td> </tr> <tr> <td>zeppelin.submarine.auth.type</td> <td>simple/kerberos</td> <td>Whether Hadoop has Kerberos authentication enabled</td> </tr> </table> <h3>Docker images</h3> <p>The Docker image files are stored in the <code>zeppelin/scripts/docker/submarine</code> directory.</p> <ol> <li><p>Submarine interpreter CPU version</p></li> <li><p>Submarine interpreter GPU version</p></li> <li><p>Tensorflow 1.10 &amp; Hadoop 3.1.2 CPU version</p></li> <li><p>Tensorflow 1.10 &amp; Hadoop 3.1.2 GPU version</p></li> </ol> <h2>Change Log</h2> <p><strong>0.1.0</strong> <em>(Zeppelin 0.9.0)</em> :</p> <ul> <li>Support distributed or standalone Tensorflow model training.</li> <li>Support running the Submarine interpreter locally.</li> <li>Support running the Submarine interpreter on YARN.</li> <li>Support Docker on YARN 3.3.0; compatibility with lower versions of YARN is planned.</li> </ul> <h2>Bugs &amp; Contacts</h2> <ul> <li><strong>Submarine interpreter BUG</strong> If you encounter a bug in this interpreter, please create a sub <strong>JIRA</strong> ticket on <a href="https://issues.apache.org/jira/browse/ZEPPELIN-3856">ZEPPELIN-3856</a>.</li> <li><strong>Submarine Running problem</strong> If you encounter a problem with the 
Submarine runtime, please create an <strong>ISSUE</strong> on <a href="https://github.com/hadoopsubmarine/hadoop-submarine-ecosystem">hadoop-submarine-ecosystem</a>.</li> <li><strong>YARN Submarine BUG</strong> If you encounter a bug in YARN Submarine, please create a <strong>JIRA</strong> ticket on <a href="https://issues.apache.org/jira/browse/SUBMARINE">SUBMARINE</a>.</li> </ul> <h2>Dependency</h2> <ol> <li><strong>YARN</strong> Submarine currently needs to run on Hadoop 3.3+</li> </ol> <ul> <li>The Hadoop version in the Hadoop Submarine team's git repository is periodically submitted to the Hadoop code repository.</li> <li>The Hadoop Submarine team's git repository is updated faster than the Hadoop release cycle.</li> <li>You can use the Hadoop version from the Hadoop Submarine team's git repository.</li> </ul> <ol start="2"> <li><strong>Submarine runtime environment</strong> You can use Submarine-installer (https://github.com/hadoopsubmarine) to deploy the Docker and network environments.</li> </ol> <h2>More</h2> <p><strong>Hadoop Submarine Project</strong>: https://hadoop.apache.org/submarine<br> <strong>Youtube Submarine Channel</strong>: https://www.youtube.com/channel/UC4JBt8Y8VJ0BW0IM9YpdCyQ</p> </div> </div> <hr> <footer> <!-- <p>&copy; 2021 The Apache Software Foundation</p>--> </footer> </div> <script type="text/javascript"> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-45176241-5', 'zeppelin.apache.org'); ga('require', 'linkid', 'linkid.js'); ga('send', 'pageview'); </script> </body> </html>
