<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8" /> <meta property="og:title" content="Categories for Sequence Data" /> <meta property="og:url" content="https://www.ddbj.nig.ac.jp/documents/data-categories-e.html" /> <meta property="og:description" content="Categories for Sequence DataAcceptable data for DDBJFor the request of Primary e..." /> <meta property="og:image" content="/images/thumbnail/logo_ddbj_fb.png" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>Categories for Sequence Data</title> <script async src="https://www.google-analytics.com/analytics.js"></script> <script src="https://code.jquery.com/jquery-3.5.0.js" integrity="sha256-r/AaFHrszJtwpe+tHyNi/XCfMxYpbsRg2Uqn0x3s2zc=" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.hoverintent/1.10.1/jquery.hoverIntent.min.js" integrity="sha512-gx3WTM6qxahpOC/hBNUvkdZARQ2ObXSp/m+jmsEN8ZNJPymj8/Jamf8+/3kJQY1RZA2DR+KQfT+b3JEB0r9YRg==" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/spin.js/4.1.0/spin.min.js" integrity="sha512-CbohqWjAgarTqRHcX1MbwkF2pujwbsCee1PABpnBWC+VqSldvlNEEI5+4OSsR/HbFQOFFpwY2YvZZNjBMxNnXg==" crossorigin="anonymous"></script> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery.colorbox/1.6.4/jquery.colorbox-min.js"></script> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery-deparam/0.5.3/jquery-deparam.min.js"></script> <script type="text/javascript" src="https://www.ddbj.nig.ac.jp/assets/js/jquery.trace.js"></script> <script type="text/javascript" src="https://www.ddbj.nig.ac.jp/assets/js/jquery.json_search.js"></script> <link rel="icon" href="https://www.ddbj.nig.ac.jp/assets/images/favicon_ddbj.ico"> <link rel="stylesheet" href="https://www.ddbj.nig.ac.jp/assets/css/colorbox.css" /> <link rel="stylesheet" href="https://www.ddbj.nig.ac.jp/assets/css/main.css" /> <link rel="alternate" 
type="application/rss+xml" title="My Site RSS" href="/feed.xml" /> <script src="https://www.ddbj.nig.ac.jp/assets/js/main.js"></script> </head> <body data-category="documents"> <script src="https://www.ddbj.nig.ac.jp/assets/js/ddbj_common_framework.js" id="DDBJ_common_framework" style="display: block; height: 40px;" data-bottom-menu="true" data-ddbj-home-page="true" data-search="true" ></script> <section class="top-news-view"> <div class="inner"> <ul> <li class="item"> <a href="https://www.ddbj.nig.ac.jp/news/en/2025-02-06-e">(14th February - mid-March) Suspension of the services due to the NIG Supercomputer replacement</a> </li> </ul> </div> </section> <div id="primary"> <header id="PageHeader"> <div class="inner"> <div class="page-title"> <h1 class="title">Categories for Sequence Data</h1> <nav class="breadcrumb-view"> <ul> <li> <a href="https://www.ddbj.nig.ac.jp/index-e.html">Home</a> </li> <li> <a href="https://www.ddbj.nig.ac.jp/documents/index-e.html">documents</a> </li> <li><a>Categories for Sequence Data</a></li> </ul> </nav> </div> </div> </header> <section id="NavigationAndMainView"> <div class="inner"> <div class="subview"> <nav id="TableOfContents" class="internal-link"></nav> </div> <section id="MainContentView" class="mainview"> <main class="md-content"> <h1 id="categories-for-sequence-data">Categories for Sequence Data</h1> <h2 id="accept">Acceptable data for DDBJ</h2> <p>For <a href="/ddbj/submission.html#primary_entry">Primary entry</a> submissions, DDBJ in principle accepts any nucleotide sequence that the submitter has determined experimentally, but cannot accept computationally predicted and/or cited sequences.</p> <p>Even if your sequence is identical to previously reported sequence(s), you can submit it as a “new” entry, provided that it was determined independently.</p> <p>DDBJ also accepts an entry that is obtained by assembling primary entries publicized from DDBJ/ENA/GenBank of INSDC and/or has been given 
annotation(s) by the submitter using experimental or inferential methods, as <a href="/ddbj/tpa-e.html">TPA (Third Party Data)</a>.</p> <p>However, some types of sequence data are <a href="/ddbj/sequence-e.html#not_acceptable">not acceptable for DDBJ</a>.</p> <p>If you plan to publicize raw output data from studies related to SNPs, <a href="/ddbj/wgs-e.html#acceptance">WGS</a>, <a href="/ddbj/tsa-e.html">transcriptome</a> and so on, we recommend that you contact the <a href="/dta/index-e.html">DDBJ Trace Archive</a> or the <a href="/dra/index-e.html">DDBJ Sequence Read Archive</a> instead of DDBJ / ENA / GenBank.</p> <p>See <a href="/about/insdc-e.html#policy">Overview of International Nucleotide Sequence Databases Policies</a>.</p> <h3 id="submisson-of-the-data-including-identical-sequences-or-partially-duplicated-sequences">Submission of data including identical or partially duplicated sequences</h3> <p>Basically, DDBJ accepts all sequence data that are independently determined, even if the sequences are identical to each other. For variation studies, DDBJ also accepts <a href="/ddbj/representative-sequence-e.html">submissions of representative data</a>.</p> <p>If you determine many sequences derived from the same individual, we strongly recommend updating the previously submitted sequence data rather than submitting new sequence data many times.<br /> However, because repeated submissions for a single resource may be required for various reasons (rights to the sequence data, phases of sequencing, etc.), DDBJ does not restrict them.</p> <h2 id="real">Sequencing Data</h2> <h3 id="ann">Annotated/assembled sequences</h3> <dl> <dt><a href="/ddbj/index.html">DDBJ</a></dt> <dd>Narrowly-defined DDBJ. 
DDBJ is the counterpart of GenBank and ENA (EMBL-bank); it accepts sequences with feature annotation and provides them in <a href="/ddbj/flat-file-e.html">flat file</a> format.</dd> <dd>For how the data in traditional DDBJ are classified, see <a href="#detail">Categories of Annotated/Assembled Data</a>.</dd> </dl> <div class="attention"> <p>If you are not sure which database you should submit your data to, see the following pages:</p> <ul> <li><a href="/ddbj/genome-e.html">Steps of genome sequencing, categories of sequence data and their correspondences</a></li> <li><a href="/ddbj/transcriptome-e.html">Steps of transcriptome project, categories of sequence data and their correspondences</a></li> <li><a href="/ddbj/flat-file-e.html#Division">Division</a></li> <li><a href="#detail">Categories of Annotated/Assembled Data</a></li> </ul> <p>With the <a href="/ddbj/mss-e.html">Mass Submission System (MSS)</a>, submitted nucleotide sequences are classified into one of the categories according to the descriptions of <a href="/ddbj/file-format-e.html#annotation">DATATYPE, DIVISION, KEYWORD</a>.</p> </div> <h3 id="ngs">Sequencing and alignment data from next-generation sequencing platforms</h3> <dl> <dt><a href="/dra/index-e.html">DRA: DDBJ Sequence Read Archive</a></dt> <dd>Archival database for output data generated by next-generation sequencing machines including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System and others.</dd> <dt><a href="/dta/index-e.html">DTA: DDBJ Trace Archive</a></dt> <dd>Archival database of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects.</dd> </dl> <h3 id="fg">Functional genomics data</h3> <dl> <dt><a href="/gea/index-e.html">Genomic Expression Archive (GEA)</a></dt> <dd>A public database of functional genomics data such as gene expression, epigenetics and genotyping SNP array. 
Both microarray- and sequence-based data are accepted.</dd> </dl> <h2 id="project">Research project</h2> <dl> <dt><a href="/bioproject/index-e.html">BioProject</a></dt> <dd>Database to organize research projects and the corresponding data.<br /> A BioProject submission is required before sequence data submissions for <a href="/ddbj/tsa-e.html">TSA</a>, <a href="/ddbj/tls-e.html">TLS</a>, <a href="/ddbj/wgs-e.html">WGS</a> or <a href="/ddbj/genome-e.html">complete-genome scale</a> data, except for viruses, plasmids and organelles.</dd> </dl> <h2 id="project">Biological sample</h2> <dl> <dt><a href="/biosample/index-e.html">BioSample</a></dt> <dd>Database to capture and store descriptive information about the biological source materials, or samples, used to generate experimental data.</dd> </dl> <h2 id="control">Human data requiring controlled-access</h2> <dl> <dt><a href="/jga/index-e.html">JGA: Japanese Genotype-phenotype Archive</a></dt> <dd>Database for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects.</dd> </dl> <h2 id="detail">Annotated/Assembled Data Categories</h2> <h3 id="division"><a href="/ddbj/flat-file-e.html#Division">Division</a> conventional sequence data</h3> <h4 id="general"><strong>General data: classified by source species</strong></h4> <p>Data that are not classified into any of the categories described in the sections below are called general data and belong here.<br />In principle, general data must have at least one source feature and at least one other <a href="/ddbj/file-format-e.html#biological_feature">Biological feature</a>.<br />Submitted sequences are automatically classified into one of the following divisions on the basis of the taxonomy of the source organisms.</p> <table> <thead> <tr> <th><em>Division</em></th> <th><em>Description</em></th> </tr> </thead> <tbody> <tr> <td>HUM</td> <td>Human</td> </tr> <tr> <td>PRI</td> <td>Primates 
(other than human)</td> </tr> <tr> <td>ROD</td> <td>Rodents</td> </tr> <tr> <td>MAM</td> <td>Mammals (other than primates or rodents)</td> </tr> <tr> <td>VRT</td> <td>Vertebrates (other than mammals)</td> </tr> <tr> <td>INV</td> <td>Invertebrates</td> </tr> <tr> <td>PLN</td> <td>Plants or fungi</td> </tr> <tr> <td>BCT</td> <td>Bacteria</td> </tr> <tr> <td>VRL</td> <td>Viruses</td> </tr> <tr> <td>PHG</td> <td>Phages</td> </tr> </tbody> </table> <h4 id="env"><strong>ENV/SYN: impossible to identify source species, Environmental Samples and Synthetic Constructs</strong></h4> <p>Environmental samples and artificially constructed sequences are classified into the <a href="/ddbj/env-e.html">ENV</a> and SYN divisions, respectively.<br />In principle, ENV and SYN data must have at least one source feature and at least one other <a href="/ddbj/file-format-e.html#biological_feature">Biological feature</a>.</p> <table> <thead> <tr> <th><em>Division</em></th> <th><em>Description</em></th> </tr> </thead> <tbody> <tr> <td><a href="/ddbj/env-e.html">ENV</a></td> <td>Sequences obtained via environmental sampling methods, direct PCR, DGGE, etc.<br />For ENV submissions, it is necessary to describe an <a href="/ddbj/qualifiers-e.html#environmental_sample">environmental_sample qualifier</a> on the source feature.</td> </tr> <tr> <td>SYN</td> <td>Synthetic constructs, sequences constructed by artificial manipulation.<br />SYN entries often have multiple source features, so take care when describing them.<br />See also <a href="/ddbj/example-e.html#E05">Description Examples of Sequence Data: E05) synthetic construct</a>.</td> </tr> </tbody> </table> <h4 id="est"><strong>EST/GSS/HTC/HTG/STS: Divisions for Feasibility of Sequencing</strong></h4> <p>Sequences derived from high-throughput projects, such as large-scale analyses like EST datasets or ongoing whole-genome-scale sequencing, are classified into the following divisions, respectively.<br /> 
Basically, only one source feature should be described for an entry in those divisions.<br /> However, entries in the HTC or HTG division can have some <a href="/ddbj/file-format-e.html#biological_feature">Biological features</a>, as general data do, if necessary.</p> <table> <thead> <tr> <th><em>Division</em></th> <th><em>Description</em></th> </tr> </thead> <tbody> <tr> <td><a href="/ddbj/est-e.html">EST</a></td> <td>Expressed sequence tags: short single-pass reads of cDNA sequences.</td> </tr> <tr> <td><a href="/ddbj/gss-e.html">GSS</a></td> <td>Genome survey sequences: short single-pass reads of genome sequences.</td> </tr> <tr> <td><a href="/ddbj/htc-e.html">HTC</a></td> <td>High throughput cDNA sequences from cDNA sequencing projects, not EST.<br />This division includes unfinished high throughput cDNA sequences.</td> </tr> <tr> <td><a href="/ddbj/htg-e.html">HTG</a></td> <td>High throughput genomic sequences mainly from genome sequencing projects.<br />Unfinished HTG entries are classified into the following phases:<br /><ul><li>phase0: Survey sequence generated for the purpose of library quality assessment and detection of overlaps with other clones before construction of piece contig(s)</li><li>phase1: Unfinished sequence having contigs that have NOT been ordered and oriented</li><li>phase2: Unfinished sequence having contigs that have been ordered and oriented</li></ul></td> </tr> </tbody> </table> <h3 id="data_type">Data type, bulk sequence data</h3> <h4 id="wgs"><strong>WGS: Fragment Sequences during WGS Assembling Process</strong></h4> <p>A large set of contigs from an ongoing genome project can be submitted as one type of bulk sequence data, <a href="/ddbj/wgs-e.html">Whole Genome Shotgun (WGS)</a>.<br /> Please note that WGS data is different from others in its <a href="/ddbj/flat-file-e.html#Accession">format of accession number</a>. 
<br /> See also <a href="/ddbj/genome-e.html">Steps of genome sequencing, categories of sequence data and their correspondences</a>.</p> <h4 id="tsa"><strong>TSA: Transcriptome Shotgun Assembly</strong></h4> <p>Since 2008, we have accepted <a href="/ddbj/tsa-e.html">Transcriptome Shotgun Assembly (TSA)</a>, a type of bulk sequence data for assembled RNA transcript sequences.<br /> Basically, only one source feature should be described for a TSA entry.<br /> TSA entries can have some <a href="/ddbj/file-format-e.html#biological_feature">Biological features</a>, as general data do, if necessary.<br /> Please note that TSA data may be different from others in its <a href="/ddbj/flat-file-e.html#Accession">format of accession number</a>.<br /> See also <a href="/ddbj/transcriptome-e.html">steps of transcriptome project, categories of sequence data and their correspondences</a>.</p> <h4 id="tls"><strong>TLS: Targeted Locus Study</strong></h4> <p>Since 2016, we have accepted <a href="/ddbj/tls.html">Targeted Locus Study (TLS)</a>, a type of bulk sequence data that includes 16S rRNA or other targeted loci, mainly to be clustered into operational taxonomic units.<br /> TLS entries can have some <a href="/ddbj/file-format-e.html#biological_feature">Biological features</a>, as general data do. 
<br />Please note that TLS data is different from others in its <a href="/ddbj/flat-file-e.html#Accession">format of accession number</a>.</p> <h3 id="whom">Sequenced by whom</h3> <h4 id="tpa"><strong>TPA: Third Party Data and primary sequence data</strong></h4> <p><a href="/ddbj/tpa-e.html">TPA (Third Party Data)</a> is a nucleotide sequence data collection in which each entry is obtained by assembling primary entries publicized from DDBJ/ENA/GenBank and/or the <a href="/dra/index-e.html">Sequence Read Archive</a>, with additional feature annotation(s) determined by experimental or inferential methods by the TPA submitter.<br /> Those assemblies include two cases: one in which one or more primary entries are used, and one in which newly determined sequence is also contained.<br /> TPA sequence data should be submitted to DDBJ/ENA/GenBank as part of the process of publishing biological research on primary nucleotide sequences.<br /> See also <a href="/ddbj/tpa-table-e.html">TPA Submission Guidelines</a>.</p> </main> <aside class="related-pages"> <h2 class="caption">Related pages</h2> <div class="navigation"> <nav> <ul> <li> <a href="/documents/prefix-e.html">Prefix Letter List</a> </li> <li> <a href="/documents/accessions.html">Accession Number Assigned by INSD</a> </li> <li> <a href="/documents/data-release-policy-e.html">Principle of “Hold-Until-Published” data release</a> </li> </ul> </nav> </div> </aside> </section> </div> </section> </div> <footer></footer> <div id="back-top"></div> </body> </html>