<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>Setup a standard job (headless) | Socrata</title> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content=""> <meta name="author" content=""> <!-- Le Bootstrap--> <link href="//netdna.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" rel="stylesheet"> <!-- Font Awesome by Dave Gandy - http://fontawesome.io/ --> <link href="//netdna.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.css" rel="stylesheet"/> <!-- Featherlight Lightbox --> <link href="//cdn.rawgit.com/noelboss/featherlight/1.3.4/release/featherlight.min.css" rel="stylesheet"/> <!-- Google Web Fonts --> <link href="//fonts.googleapis.com/css?family=Ubuntu:bold" rel="stylesheet" type="text/css"/> <link href="//fonts.googleapis.com/css?family=Nobile" rel="stylesheet" type="text/css"/> <link href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.1.0/styles/default.min.css" rel="stylesheet"/> <!-- hljs highlighting --> <!-- CSS customizations --> <link href="/datasync/common/css/common.css" rel="stylesheet"/> <link href="/datasync/common/css/murphy.css" rel="stylesheet"/> <link href="/datasync/css/local.css" rel="stylesheet"/> <!-- HTML5 shim, for IE6-8 support of HTML5 elements --> <!--[if lt IE 9]> <script src="//html5shim.googlecode.com/svn/trunk/html5.js"></script> <![endif]--> <!-- Require.js is either the best thing to ever happen to me or my worst enemy --> <script src="/datasync/common/js/require.js"></script> <script> // Load common code and custom includes requirejs(['/datasync/common/js/common.js'], function(common) { var rel_require = function(script) { if(script.match(/^\/[^\/]/)) { script = '/datasync' + script; } requirejs([script]); }; // Site scripts // Page scripts }); </script> <!-- Browser Icons --> <link rel="apple-touch-icon-precomposed" sizes="144x144" href="/datasync/common/ico/apple-touch-icon-144-precomposed.png"/> <link rel="apple-touch-icon-precomposed" 
sizes="114x114" href="/datasync/common/ico/apple-touch-icon-114-precomposed.png"/> <link rel="apple-touch-icon-precomposed" sizes="72x72" href="/datasync/common/ico/apple-touch-icon-72-precomposed.png"/> <link rel="shortcut icon" href="/datasync/common/ico/favicon.png"/> <!-- Blog RSS --> <link type="application/atom+xml" rel="alternate" href="https://dev.socrata.com/feed.xml" title="Socrata Developer Blog"> </head> <body class="dev homepage setup-standard-job-headless"> <!-- Path: "guides/setup-standard-job-headless.md" --> <!-- URL: "/guides/setup-standard-job-headless.html" --> <!-- Header Nav --> <div class="navbar navbar-inverse navbar-fixed-top dev" role="navigation"> <!-- Current Site --> <div class="navbar-header"> <!-- Collapse Button --> <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#nav-collapse"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand dev" href="/datasync/"><i class="fa fa-refresh"></i> Socrata DataSync</a> </div> <div class="collapse navbar-collapse" id="nav-collapse"> <!-- Right side nav --> <form class="navbar-form navbar-right visible-lg" action="/datasync/search.html" method="GET" role="search"> <div class="form-group"> <input name="q" class="form-control search" type="text" placeholder="Search"> </div> </form> <!-- Nav elements --> <ul class="nav navbar-nav"> <li id="socrata-status" style="display: none"> <button type="button" class="btn" data-toggle="popover" data-placement="bottom" data-html="true"></button> </li> <li class="dropdown "> <ul class="dropdown-menu"> <li><a href="/datasync/">Getting Started</a></li> <li class="nav-header dropdown-header">General Guides</li> <li><a href="/datasync/guides/quick-start.html">Quick Start (GUI)</a></li> <li><a href="/datasync/guides/setup-standard-job.html">Setup a Standard Job (GUI)</a></li> <li><a 
href="/datasync/guides/setup-standard-job-headless.html">Setup a Standard Job (Headlessly)</a></li> <li><a href="/datasync/guides/setup-port-job.html">Setup a Port Job (GUI)</a></li> <li><a href="/datasync/guides/setup-port-job-headless.html">Setup a Port Job (Headlessly)</a></li> <li><a href="/datasync/guides/setup-gis-job.html">Setup a GIS Job (GUI)</a></li> <li><a href="/datasync/guides/setup-gis-job-headless.html">Setup a GIS Job (Headlessly)</a></li> <li class="nav-header dropdown-header">Additional Resources</li> <li><a href="/datasync/resources/control-config.html">Control File Config</a></li> <li><a href="/datasync/resources/preferences-config.html">Preferences Config</a></li> <li><a href="/datasync/resources/schedule-job.html">Scheduling a Job</a></li> <li><a href="/datasync/resources/checking-log.html">Checking Logs</a></li> <li><a href="/datasync/resources/faq-common-problems.html">FAQ / Common Problems</a></li> <li><a href="/datasync/resources/network-considerations.html">Network Considerations</a></li> <li><a href="/datasync/resources/conditions-restrictions.html">Data Conditions & Restrictions</a></li> <li><a href="/datasync/resources/using-map-fields-dialog.html">Using the Map Fields Dialog</a></li> <li class="nav-header dropdown-header">Developer Guides</li> <li><a href="/datasync/guides/datasync-library-sdk.html">DataSync Library/SDK (Java)</a></li> <li><a href="/datasync/guides/compiling-on-windows-eclipse.html">Compiling on Windows (with Eclipse)</a></li> <li><a href="/datasync/guides/compiling-with-maven.html">Compiling with Maven</a></li> </ul> </li> </ul> </div><!--/.nav-collapse --> </div> <div class="container-fluid content"> <h1 id="title">Setup a standard job (headless)</h1> <div class="row with-sidebar"> <div class="col-md-3 hidden-phone"> <div class="well sidebar-nav"> <ul class="nav nav-list sidebar "> <li><a href="/datasync/">Getting Started</a></li> <li class="nav-header dropdown-header">General Guides</li> <li><a 
href="/datasync/guides/quick-start.html">Quick Start (GUI)</a></li> <li><a href="/datasync/guides/setup-standard-job.html">Setup a Standard Job (GUI)</a></li> <li><a href="/datasync/guides/setup-standard-job-headless.html">Setup a Standard Job (Headlessly)</a></li> <li><a href="/datasync/guides/setup-port-job.html">Setup a Port Job (GUI)</a></li> <li><a href="/datasync/guides/setup-port-job-headless.html">Setup a Port Job (Headlessly)</a></li> <li><a href="/datasync/guides/setup-gis-job.html">Setup a GIS Job (GUI)</a></li> <li><a href="/datasync/guides/setup-gis-job-headless.html">Setup a GIS Job (Headlessly)</a></li> <li class="nav-header dropdown-header">Additional Resources</li> <li><a href="/datasync/resources/control-config.html">Control File Config</a></li> <li><a href="/datasync/resources/preferences-config.html">Preferences Config</a></li> <li><a href="/datasync/resources/schedule-job.html">Scheduling a Job</a></li> <li><a href="/datasync/resources/checking-log.html">Checking Logs</a></li> <li><a href="/datasync/resources/faq-common-problems.html">FAQ / Common Problems</a></li> <li><a href="/datasync/resources/network-considerations.html">Network Considerations</a></li> <li><a href="/datasync/resources/conditions-restrictions.html">Data Conditions & Restrictions</a></li> <li><a href="/datasync/resources/using-map-fields-dialog.html">Using the Map Fields Dialog</a></li> <li class="nav-header dropdown-header">Developer Guides</li> <li><a href="/datasync/guides/datasync-library-sdk.html">DataSync Library/SDK (Java)</a></li> <li><a href="/datasync/guides/compiling-on-windows-eclipse.html">Compiling on Windows (with Eclipse)</a></li> <li><a href="/datasync/guides/compiling-with-maven.html">Compiling with Maven</a></li> </ul> </div><!--/.well --> </div><!--/span--> <div class="col-md-9"> <p><em>NOTICE</em>: The guide below only pertains to DataSync versions 1.0 and higher.</p> <p><em>NOTICE</em>: Before using DataSync in headless mode, we recommend familiarizing 
yourself with DataSync through the UI. For information on using DataSync’s UI, please see the <a href="/datasync/guides/setup-standard-job.html">guide to setting up a standard job (GUI)</a>.</p> <p>DataSync’s command line interface, or “headless mode,” enables easy integration of DataSync into ETL code or other software systems. DataSync jobs can be run from the command line in one of two ways: (1) passing job parameters as command-line arguments/flags or (2) running an .sij file that was previously saved using the user interface. This guide focuses on (1).</p> <h3 id="step-1-establish-your-configuration-eg-authentication-details">Step 1: Establish your configuration (e.g. authentication details)</h3> <p>Information about your domain, username, password and app token is required for all DataSync jobs. Note that the user running the job must have publisher rights on the dataset. A number of other global settings, such as logging and emailing preferences, can also be configured. Please refer to the <a href="/datasync/resources/preferences-config.html">configuration guide</a> to establish your credentials and preferences.</p> <h3 id="step-2-configure-job-details">Step 2: Configure job details</h3> <p>For general help using DataSync in headless/command-line mode, run:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar &lt;DATASYNC_JAR&gt; --help </code></pre></div></div> <p>To run a job, execute the following command, replacing &lt;..&gt; with the appropriate values (flags explained below):</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar &lt;DATASYNC_JAR&gt; -c &lt;CONFIG.json FILE&gt; -f &lt;FILE TO PUBLISH&gt; -h &lt;HAS HEADER ROW&gt; -i &lt;DATASET ID&gt; -m &lt;PUBLISH METHOD&gt; -pf &lt;PUBLISH VIA FTP&gt; -ph &lt;PUBLISH VIA HTTP&gt; -cf &lt;FTP CONTROL.json FILE&gt; </code></pre></div></div> <p>Explanation of flags: <code class="language-plaintext highlighter-rouge">*</code> = required flag</p> <table> 
<thead> <tr> <th>Flag - Short Name</th> <th>Flag - Long Name</th> <th>Example Values</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td style="text-align: left;">-c</td> <td style="text-align: left;">--config</td> <td style="text-align: left;">/Users/home/config.json</td> <td style="text-align: left;">Points to the config.json file you created in Step 1</td> </tr> <tr> <td style="text-align: left;">-f <code>*</code></td> <td style="text-align: left;">--fileToPublish</td> <td style="text-align: left;">/Users/home/data_file.csv</td> <td style="text-align: left;">CSV or TSV file to publish</td> </tr> <tr> <td style="text-align: left;">-h</td> <td style="text-align: left;">--fileToPublishHasHeaderRow</td> <td style="text-align: left;">true</td> <td style="text-align: left;">Set this to <code>true</code> if the file to publish has a header row, otherwise set it to <code>false</code></td> </tr> <tr> <td style="text-align: left;">-i <code>*</code></td> <td style="text-align: left;">--datasetID</td> <td style="text-align: left;">m985-ywaw</td> <td style="text-align: left;">The <a href="http://socrata.github.io/datasync/resources/faq-common-problems.html#what-is-the-id-of-my-dataset">dataset identifier</a> to publish to.</td> </tr> <tr> <td style="text-align: left;">-m</td> <td style="text-align: left;">--publishMethod</td> <td style="text-align: left;">replace</td> <td style="text-align: left;">Specifies the publish method to use (<code>replace</code>, <code>upsert</code>, <code>append</code>, and <code>delete</code> are the only acceptable values). For details on the publishing methods, refer to Step 3 of the <a href="http://socrata.github.io/datasync/guides/setup-standard-job.html">Setup a Standard Job (GUI)</a> guide.</td> </tr> <tr> <td style="text-align: left;">-ph</td> <td style="text-align: left;">--publishViaHttp</td> <td style="text-align: left;">true</td> <td style="text-align: left;">Set this to <code>true</code> to use HTTP (rather than FTP or Soda2); this is the 
preferred method because it is highly efficient and can reliably handle very large files (1 million+ rows). If <code>false</code> and --publishViaFTP is <code>false</code>, the dataset update is performed using Soda2. (false is the default value)</td> </tr> <tr> <td style="text-align: left;">-pf</td> <td style="text-align: left;">--publishViaFTP</td> <td style="text-align: left;">true</td> <td style="text-align: left;">Set this to <code>true</code> to use FTP (currently only works for replace). If <code>false</code> and --publishViaHttp is <code>false</code>, the dataset update is performed using Soda2. (false is the default value)</td> </tr> <tr> <td style="text-align: left;">-cf</td> <td style="text-align: left;">--pathToControlFile</td> <td style="text-align: left;">/Users/home/control.json</td> <td style="text-align: left;">Specifies a <a href="http://socrata.github.io/datasync/resources/control-config.html">control file</a> that configures HTTP and ‘replace via FTP’ jobs. Only required when --publishViaHttp or --publishViaFTP is set to <code>true</code>. When this flag is set, the --fileToPublishHasHeaderRow and --publishMethod flags are overridden by the settings in the supplied control file.</td> </tr> <tr> <td style="text-align: left;">-t <code>*</code></td> <td style="text-align: left;">--jobType</td> <td style="text-align: left;">LoadPreferences</td> <td style="text-align: left;">Specifies the type of job to run (<code>IntegrationJob</code>, <code>LoadPreferences</code> and <code>PortJob</code> are the only acceptable values)</td> </tr> </tbody> </table> <h3 id="step-3-job-output">Step 3: Job Output</h3> <p>Information about the status of the job will be output to STDOUT. If the job runs successfully, a ‘Success’ message will be output to STDOUT and the job will exit with a normal status code (0). If there was a problem running the job, a detailed error message will be output to STDERR and the program will exit with an error status code (1). 
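</p> <p>A wrapper script can act on these exit codes. The following is a minimal sketch rather than part of DataSync: the function name <code class="language-plaintext highlighter-rouge">run_job</code> and the <code class="language-plaintext highlighter-rouge">DATASYNC_JAR</code> variable are hypothetical, and it assumes only the behavior described here (exit code 0 on success, a non-zero code on failure).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
# Hypothetical helper (not part of DataSync): run the given command and
# report the outcome based on its exit code (0 = success, other = failure).
run_job() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "job succeeded"
    else
        echo "job failed with exit code $status"
    fi
    # Pass the original exit code on to the caller.
    return "$status"
}

# Example call, assuming DATASYNC_JAR holds the path to the DataSync jar:
# run_job java -jar "$DATASYNC_JAR" -c config.json -f data_file.csv -h true -i m985-ywaw -m replace
</code></pre></div></div>
<p>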
You can capture the exit code to configure error handling logic within your ETL process.</p> <h3 id="complete-example-job">Complete example job</h3> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar &lt;DATASYNC_JAR&gt; -c config.json -f business_licenses_2014-02-10.csv -h true -i 7tgi-grrk -m replace -pf true -cf control.json </code></pre></div></div> <p>config.json contents:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "domain": "https://opendata.socrata.com",
  "username": "publisher@opendata.socrata.com",
  "password": "secret_password",
  "appToken": "fPsJQRDYN9KqZOgEZWyjoa1SG",
  "adminEmail": "",
  "emailUponError": "false",
  "logDatasetID": "",
  "outgoingMailServer": "",
  "smtpPort": "",
  "sslPort": "",
  "smtpUsername": "",
  "smtpPassword": ""
}
</code></pre></div></div> <p>control.json contents:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "action" : "Replace",
  "csv" : {
    "useSocrataGeocoding" : true,
    "columns" : null,
    "skip" : 0,
    "fixedTimestampFormat" : ["ISO8601","MM/dd/yy","MM/dd/yyyy"],
    "floatingTimestampFormat" : ["ISO8601","MM/dd/yy","MM/dd/yyyy"],
    "timezone" : "UTC",
    "separator" : ",",
    "quote" : "\"",
    "encoding" : "utf-8",
    "emptyTextIsNull" : true,
    "trimWhitespace" : true,
    "trimServerWhitespace" : true,
    "overrides" : {}
  }
}
</code></pre></div></div> <p><strong>Running a previously saved job file (.sij file)</strong></p> <p>Simply run:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar &lt;DATASYNC_JAR&gt; &lt;.sij FILE TO RUN&gt; </code></pre></div></div> <p>For example:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar &lt;DATASYNC_JAR&gt; /Users/john/Desktop/business_licenses.sij </code></pre></div></div> <p><strong>NOTE:</strong> you can also create an .sij file directly 
(rather than saving a job using the DataSync UI), which stores the job details in JSON format. Here is an example:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "datasetID" : "2bw7-dr67",
  "fileToPublish" : "/Users/john/Desktop/building_permits_2014-12-05.csv",
  "publishMethod" : "replace",
  "fileToPublishHasHeaderRow" : true,
  "publishViaFTP" : true,
  "pathToFTPControlFile" : "/Users/john/Desktop/building_permits_control.json"
}
</code></pre></div></div> <div class="related-pages"></div> </div><!--/span--> </div><!--/row--> <footer class="muted"> <hr /> <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US"> <img alt="Creative Commons License" src="https://licensebuttons.net/l/by-nc-sa/3.0/80x15.png" /> </a> Licensed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://www.socrata.com" property="cc:attributionName" rel="cc:attributionURL">Socrata</a> under <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US">CC BY-NC-SA 3.0</a>. Learn how <a href="/datasync/contributing.html">you can contribute!</a> </footer> </div> <!-- /container --> </body> </html>