<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8" /> <meta property="og:title" content="DDBJ flat file format" /> <meta property="og:url" content="https://www.ddbj.nig.ac.jp/ddbj/flat-file-e.html" /> <meta property="og:description" content="DDBJ (DNA Data Bank of Japan) shares annotated/assembled nucleotide sequence dat..." /> <meta property="og:image" content="/images/thumbnail/logo_ddbj_fb.png" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>DDBJ flat file format</title> <script async src="https://www.google-analytics.com/analytics.js"></script> <script src="https://code.jquery.com/jquery-3.5.0.js" integrity="sha256-r/AaFHrszJtwpe+tHyNi/XCfMxYpbsRg2Uqn0x3s2zc=" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.hoverintent/1.10.1/jquery.hoverIntent.min.js" integrity="sha512-gx3WTM6qxahpOC/hBNUvkdZARQ2ObXSp/m+jmsEN8ZNJPymj8/Jamf8+/3kJQY1RZA2DR+KQfT+b3JEB0r9YRg==" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/spin.js/4.1.0/spin.min.js" integrity="sha512-CbohqWjAgarTqRHcX1MbwkF2pujwbsCee1PABpnBWC+VqSldvlNEEI5+4OSsR/HbFQOFFpwY2YvZZNjBMxNnXg==" crossorigin="anonymous"></script> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery.colorbox/1.6.4/jquery.colorbox-min.js"></script> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery-deparam/0.5.3/jquery-deparam.min.js"></script> <script type="text/javascript" src="https://www.ddbj.nig.ac.jp/assets/js/jquery.trace.js"></script> <script type="text/javascript" src="https://www.ddbj.nig.ac.jp/assets/js/jquery.json_search.js"></script> <link rel="icon" href="https://www.ddbj.nig.ac.jp/assets/images/favicon_ddbj.ico"> <link rel="stylesheet" href="https://www.ddbj.nig.ac.jp/assets/css/colorbox.css" /> <link rel="stylesheet" href="https://www.ddbj.nig.ac.jp/assets/css/main.css" /> <link rel="alternate" type="application/rss+xml" title="My Site 
RSS" href="/feed.xml" /> <script src="https://www.ddbj.nig.ac.jp/assets/js/main.js"></script> </head> <body data-category="ddbj"> <script src="https://www.ddbj.nig.ac.jp/assets/js/ddbj_common_framework.js" id="DDBJ_common_framework" style="display: block; height: 40px;" data-bottom-menu="true" data-ddbj-home-page="true" data-search="true" ></script> <div id="primary"> <header id="PageHeader"> <div class="inner"> <div class="page-title"> <p class="title -normal">DDBJ Annotated/Assembled Sequences</p> </div> </div> </header> <section id="NavigationAndMainView"> <div class="inner"> <div class="subview"> <nav id="TableOfContents" class="internal-link"> </nav> </div> <section 
id="MainContentView" class="mainview"> <header class="header"> <h1 class="title">DDBJ flat file format</h1> </header> <main class="md-content"> <p>DDBJ (DNA Data Bank of Japan) shares annotated/assembled nucleotide sequence data as a member of <a href="/about/insdc-e.html">INSDC</a> (International Nucleotide Sequence Database Collaboration). <br /> To this end, DDBJ collects experimentally determined nucleotide sequences and constructs the database in accordance with the rules agreed within INSDC.</p> <p>The database also includes data from the <a href="https://www.jpo.go.jp/e/index.html">Japan Patent Office</a> (JPO), <a href="https://www.epo.org/">European Patent Office</a> (EPO), <a href="https://www.uspto.gov/">United States Patent and Trademark Office</a> (USPTO), and <a href="https://www.kipo.go.kr/">Korean Intellectual Property Office</a> (KIPO).</p> <p>The database is a collection of “entries”; an entry is the unit of data.<br /> Each entry submitted to DDBJ is processed and released in the DDBJ distribution format (flat file).<br /> A flat file contains the sequence together with information on submitters, references, source organisms, and “features”.<br /> A “feature” is defined by <a href="/ddbj/feature-table-e.html">The DDBJ/ENA/GenBank Feature Table Definition</a> and describes the biological nature of the nucleotide sequence, such as gene function and other properties.</p> <h2 id="The_virtual_sample_of_DDBJ_flat_file">The virtual sample of DDBJ flat file</h2> <pre><code><a id="LocusA" href="/ddbj/flat-file#LocusB">LOCUS</a> <a id="LocusNameA" href="/ddbj/flat-file#LocusNameB">AB000000</a> <a id="SequenceLengthA" href="/ddbj/flat-file#SequenceLengthB">450 bp</a> <a id="MoleculeTypeA" 
href="/ddbj/flat-file#MoleculeTypeB">mRNA</a> <a id="MoleculeFormA" href="/ddbj/flat-file#MoleculeFormB">linear</a> <a id="DivisionA" href="/ddbj/flat-file#DivisionB">HUM</a> <a id="ModificationDateA" href="/ddbj/flat-file#ModificationDateB">01-JUN-2009</a> <a id="DefinitionA" href="/ddbj/flat-file#DefinitionB">DEFINITION</a> Homo sapiens GAPD mRNA for glyceraldehyde-3-phosphate dehydrogenase, partial cds. <a id="AccessionA" href="/ddbj/flat-file#AccessionB">ACCESSION</a> AB000000 <a id="VersionA" href="/ddbj/flat-file#VersionB">VERSION</a> AB000000.1 <a id="KeywordsA" href="/ddbj/flat-file#KeywordsB">KEYWORDS</a> . <a id="SourceA" href="/ddbj/flat-file#SourceB">SOURCE</a> Homo sapiens (human) <a id="OrganismA" href="/ddbj/flat-file#OrganismB">ORGANISM</a> Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. <a id="Reference1A" href="/ddbj/flat-file#Reference1B">REFERENCE 1</a> (bases 1 to 450) <a id="AuthorsA" href="/ddbj/flat-file#AuthorsB">AUTHORS</a> Mishima,H. and Shizuoka,T. <a id="TitleA" href="/ddbj/flat-file#TitleB">TITLE</a> Direct Submission <a id="JournalA" href="/ddbj/flat-file#JournalB">JOURNAL</a> Submitted (30-NOV-2008) to the DDBJ/EMBL/GenBank databases. Contact:Hanako Mishima National Institute of Genetics, DNA Data Bank of Japan; Yata 1111, Mishima, Shizuoka 411-8540, Japan <a id="Reference2A" href="/ddbj/flat-file#Reference2B">REFERENCE 2</a> AUTHORS Mishima,H., Shizuoka,T. and Fuji,I. TITLE Glyceraldehyde-3-phosphate dehydrogenase expressed in human liver JOURNAL Unpublished (2009) <a id="CommentA" href="/ddbj/flat-file#CommentB">COMMENT</a> Human cDNA sequencing project. 
<a id="FeaturesA" href="/ddbj/flat-file#FeaturesB">FEATURES</a> Location/Qualifiers <a id="FeaturesSourceA" href="/ddbj/flat-file#FeaturesSourceB">source</a> <a href="/ddbj/location-e.html">1..450</a> /<a href="/ddbj/qualifiers-e.html#chromosome">chromosome</a>="12" /<a href="/ddbj/qualifiers-e.html#clone">clone</a>="GT200015" /<a href="/ddbj/qualifiers-e.html#collection_date">collection_date</a>="2007" /<a href="/ddbj/qualifiers-e.html#db_xref">db_xref</a>="taxon:9606" /<a href="/ddbj/qualifiers-e.html#geo_loc_name">geo_loc_name</a>="Japan" /<a href="/ddbj/qualifiers-e.html#map">map</a>="12p13" /<a href="/ddbj/qualifiers-e.html#mol_type">mol_type</a>="mRNA" /<a href="/ddbj/qualifiers-e.html#organism">organism</a>="Homo sapiens" /<a href="/ddbj/qualifiers-e.html#tissue_type">tissue_type</a>="liver" <a id="CDSA" href="#CDSB">CDS</a> <a href="/ddbj/location-e.html">86..>450</a> /<a href="/ddbj/qualifiers-e.html#codon_start">codon_start</a>=1 /<a href="/ddbj/qualifiers-e.html#gene">gene</a>="GAPD" /<a href="/ddbj/qualifiers-e.html#product">product</a>="glyceraldehyde-3-phosphate dehydrogenase" /<a href="/ddbj/qualifiers-e.html#protein_id">protein_id</a>="BAA12345.1" /<a href="/ddbj/qualifiers-e.html#transl_table">transl_table</a>=1 /<a href="/ddbj/qualifiers-e.html#translation">translation</a>="MAKIKIGINGFGRIGRLVARVALQSDDVELVAVNDPFITTDYMT YMFKYDTVHGQWKHHEVKVKDSKTLLFGEKEVTVFGCRNPKEIPWGETSAEFVVEYTG VFTDKDKAVAQLKGGAKKV" <a id="BaseCountA" href="#BaseCountB">BASE COUNT</a> 102 a 119 c 131 g 98 t <a id="OriginA" href="#OriginB">ORIGIN</a> 1 cccacgcgtc cggtcgcatc gcacttgtag ctctcgaccc ccgcatctca tccctcctct 61 cgcttagttc agatcgaaat cgcaaatggc gaagattaag atcgggatca atgggttcgg 121 gaggatcggg aggctcgtgg ccagggtggc cctgcagagc gacgacgtcg agctcgtcgc 181 cgtcaacgac cccttcatca ccaccgacta catgacatac atgttcaagt atgacactgt 241 gcacggccag tggaagcatc atgaggttaa ggtgaaggac tccaagaccc ttctcttcgg 301 tgagaaggag gtcaccgtgt tcggctgcag gaaccctaag gagatcccat ggggtgagac 361 tagcgctgag tttgttgtgg 
agtacactgg tgttttcact gacaaggaca aggccgttgc 421 tcaacttaag ggtggtgcta agaaggtctg <a id="EndA" href="/ddbj/flat-file#EndB">//</a></code></pre> <p>A flat file displays the information provided by submitters in the DDBJ format.<br /> Even when sequences are similar, the contents of their flat files may differ depending on, for example, the submitters’ research aims.<br /> Please keep this in mind when you refer to search results.</p> <h2 id="FIELD_COMMENTS">FIELD COMMENTS</h2> <div id="LocusB"> <h3><a href="#LocusA">LOCUS</a></h3> </div> <p>The LOCUS line consists of the locus name, sequence length, molecule type, molecular form, division, and the date of last release.</p> <div id="LocusNameB"> <h4><a href="#LocusNameA">Locus Name</a></h4> </div> <p>The locus name is a unique ID of the entry in the database. Since July 1996, DDBJ has assigned a locus name identical to the <a href="#AccessionB">accession number</a>.</p> <div id="SequenceLengthB"> <h4><a href="#SequenceLengthA">Length of Sequence</a></h4> </div> <p>Notice: No sequence length is shown on the Master record of MGA data.</p> <div id="MoleculeTypeB"> <h4><a href="#MoleculeTypeA">Molecule Type</a></h4> </div> <p>Depending on the value of the /<a href="/ddbj/qualifiers-e.html#mol_type">mol_type</a> qualifier of the source feature, the molecule type is shown as DNA, RNA, mRNA, rRNA, tRNA, or cRNA.</p> <div id="MoleculeFormB"> <h4><a href="#MoleculeFormA">Molecular Form</a></h4> </div> <p>This column indicates whether the molecular form of the nucleotide sequence is “linear” or “circular”. 
If the entry covers the full length of a circular molecule, “circular” is shown.</p> <div id="DivisionB"> <h4><a href="#DivisionA">Division</a></h4> </div> <p>DDBJ classifies entries into the following 21 divisions.</p> <p>a: taxonomic divisions</p> <table> <tbody> <tr> <td>HUM</td> <td>human</td> </tr> <tr> <td>PRI</td> <td>primates (other than human)</td> </tr> <tr> <td>ROD</td> <td>rodents</td> </tr> <tr> <td>MAM</td> <td>mammals (other than primates and rodents)</td> </tr> <tr> <td>VRT</td> <td>vertebrates (other than mammals)</td> </tr> <tr> <td>INV</td> <td>invertebrates (animals other than vertebrates)</td> </tr> <tr> <td>PLN</td> <td>plants, fungi, plastids (eukaryotes other than animals)</td> </tr> <tr> <td>BCT</td> <td>bacteria (including both Eubacteria and Archaea)</td> </tr> <tr> <td>VRL</td> <td>viruses</td> </tr> <tr> <td>PHG</td> <td>bacteriophages</td> </tr> </tbody> </table> <p>b: other divisions</p> <table> <tbody> <tr> <td><a href="/ddbj/patent-data-e.html">PAT</a></td> <td>sequence data related to patent applications<br />Data collected, processed and released by the Japan Patent Office (JPO), United States Patent and Trademark Office (USPTO), European Patent Office (EPO), and Korean Intellectual Property Office (KIPO).</td> </tr> <tr> <td><a href="/ddbj/env-e.html">ENV</a></td> <td>sequences obtained via environmental sampling methods</td> </tr> <tr> <td>SYN</td> <td>synthetic constructs; artificially constructed sequences</td> </tr> <tr> <td><a href="/ddbj/est-e.html">EST</a></td> <td>expressed sequence tags; short single pass cDNA sequences</td> </tr> <tr> <td><a href="/ddbj/tsa-e.html">TSA</a></td> <td>transcriptome shotgun assemblies; assembled mRNA sequences</td> </tr> <tr> <td><a href="/ddbj/gss-e.html">GSS</a></td> <td>genome survey sequences; short single pass genomic sequences</td> </tr> <tr> <td><a href="/ddbj/htc-e.html">HTC</a></td> <td>high throughput cDNA sequences;<br />Sequences submitted from cDNA sequencing projects, except for 
EST. This division contains unfinished high throughput cDNA sequences, each of which has a 5’UTR and a 3’UTR at its ends and part of a coding region. The sequence may also include introns. Once the sequence is finished, it moves to the corresponding taxonomic division.</td> </tr> <tr> <td><a href="/ddbj/htg-e.html">HTG</a></td> <td>high throughput genomic sequences;<br />Sequences submitted mainly from genome sequencing projects that treat a clone as the sequencing unit.</td> </tr> <tr> <td>STS</td> <td><span class="red">DDBJ no longer accepts new submissions. </span><br />sequence tagged sites<br />Tag sites for genome sequencing. Information on chromosome, map, and PCR_condition is required for this division.</td> </tr> <tr> <td>UNA</td> <td><span class="red">DDBJ no longer accepts new submissions. </span><br />unannotated data</td> </tr> <tr> <td><a href="con-e.html">CON</a></td> <td><span class="red">DDBJ no longer accepts new submissions. </span><br />Contig / Constructed<br />To join a series of entries, such as those submitted from a genome project, each of the three data banks constructs an entry and assigns an accession number to the large-scale sequence dataset. Such entries are classified into the CON division. An entry in the CON division holds the joined accession numbers instead of the sequence data. The component entries of a CON entry are submitted to other divisions.</td> </tr> </tbody> </table> <div id="ModificationDateB"> <h4><a href="#ModificationDateA">The date of last release</a></h4> </div> <p>The date of the most recent public release is shown. If the entry is updated and released again, this date changes.</p> <div id="DefinitionB"> <h3><a href="#DefinitionA">DEFINITION</a></h3> </div> <p>The definition briefly describes the gene(s) contained in the entry. 
In principle, “DEFINITION” is constructed by each of the three data banks in accordance with standard rules. However, for EST or GSS submissions through the Mass Submission System, DDBJ sometimes asks submitters to construct the “DEFINITION”.</p> <p>[Sample]</p> <dl> <dt>Complete sequence of maize catalase coding gene</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION Zea mays Cat3 gene for catalase, complete cds. </code></pre> <p>Format: [organism name] [gene name] gene for [product name], complete cds.</p> <ul> <li>organism name: in principle, the scientific name of the organism</li> <li>gene name: the symbol of the gene</li> <li>product name: the general name of the product</li> <li>complete cds: the coding sequence is complete</li> </ul> <dl> <dt>Partial sequence of human glyceraldehyde-3-phosphate dehydrogenase coding cDNA</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION Homo sapiens mRNA for glyceraldehyde-3-phosphate dehydrogenase, partial cds. </code></pre> <dl> <dt>Format: [organism name] mRNA for [product name], partial cds.</dt> <dd> <ul> <li>partial cds: the protein coding sequence is partial</li> <li>The gene name is omitted because the submitter did not provide it.</li> </ul> </dd> <dt>Partial sequence of Bacillus 16S rRNA</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION Bacillus sp. AZ25 gene for 16S rRNA, partial sequence. 
</code></pre> <p>Format: [organism name] [strain name] gene for [product name], partial sequence.</p> <ul> <li>For unidentified species, intraspecies comparisons, and similar cases, the name of the strain, isolate, or the like is given as an identifier.</li> <li>partial sequence: the sequence is part of the 16S rRNA.</li> </ul> <dl> <dt>Multiple CDS of rat mitochondrial DNA</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION Rattus norvegicus mitochondrial genes for cytochrome c oxidase subunit II, ATPase subunit 6, cytochrome c oxidase subunit III, partial and complete cds. </code></pre> <dl> <dt>Format: [organism name] [gene name 1], [gene name 2], … genes for [product name 1], [product name 2], …, partial and complete cds.</dt> <dd> <ul> <li>The gene names and/or product names are listed in order from the 5’ end to the 3’ end.</li> <li>“partial, complete and partial cds” is abbreviated to “partial and complete cds”.</li> <li>If a gene has only a gene name or only a product name, in principle just that name is given.</li> <li>If the “DEFINITION” would be too long, other information, such as the map position, is given instead of the gene or product names.</li> <li>A gene cluster or operon name may be given where this is considered reasonable.</li> </ul> </dd> <dt>EST data of human liver 3’ end</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION Homo sapiens cDNA, clone:ABC123, 3' end, expressed in liver. </code></pre> <dl> <dt>Format: [organism name] cDNA, clone:[clone name], [other information].</dt> <dd> <ul> <li>The clone name is mandatory.</li> </ul> </dd> <dt>GSS data of mouse chromosome 1q</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION Mus musculus DNA, clone:1H11A14, 1q region. 
</code></pre> <p>Format: [organism name] DNA, clone:[clone name], [other information].</p> <ul> <li>The clone name is mandatory.</li> </ul> <dl> <dt>TPA (Third Party Data) of human GAPD</dt> <dd></dd> </dl> <pre class="code flat-file"><code> DEFINITION TPA_exp: Homo sapiens GAPD mRNA for glyceraldehyde-3-phosphate dehydrogenase, complete cds. </code></pre> <dl> <dt>Format: [TPA header]: [organism name] [gene name] mRNA for [product name], complete cds.</dt> <dd> <ul> <li>For <a href="/ddbj/tpa-e.html">TPA (Third Party Data)</a>, either “TPA_exp” (for TPA:experimental) or “TPA_inf” (for TPA:inferential) is placed at the beginning of the DEFINITION.</li> </ul> </dd> </dl> <div id="AccessionB"> <h3><a href="#AccessionA">ACCESSION</a></h3> </div> <p>This line shows the accession number of the entry.</p> <dl> <dt>Conventional sequence data</dt> <dd>Each of the three data banks issues a unique accession number to the data submitter. The accession number consists of 1 letter and 5 digits (e.g. A12345) or 2 letters and 6 digits (e.g. AB123456). The former style was used in the 1980s; the latter was introduced later because of the rapid growth of data. <br />The alphabetic part is called the “prefix”. Please refer to <a href="/documents/prefix-e.html">the prefix list</a>.</dd> <dd> <p>If multiple entries are merged into one entry, or if an entry is extensively modified after submission, the responsible data bank may assign a new accession number to it. In these cases, the new accession number is called the primary accession number, and the old accession number(s) the secondary accession number(s). In the flat file, the primary accession number is listed first, followed by the secondary accession number(s). 
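</p> <p>As a rough illustration of the rules above, the following Python sketch splits the ACCESSION line of a conventional entry into its primary and secondary accession numbers and checks the two conventional formats. The names <code>is_conventional</code> and <code>parse_accession_line</code> are hypothetical helpers introduced here, not part of any DDBJ tool, and the sketch assumes upper-case prefixes and whitespace-separated fields.</p>
<pre><code class="language-python">import re

# Conventional accession numbers: 1 letter + 5 digits, or 2 letters + 6 digits.
# Assumption: prefixes are upper-case ASCII letters.
CONVENTIONAL = re.compile(r"[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6}")


def is_conventional(accession):
    """Return True if the string has one of the two conventional formats."""
    return CONVENTIONAL.fullmatch(accession) is not None


def parse_accession_line(line):
    """Split an ACCESSION line into (primary, [secondary, ...])."""
    fields = line.split()
    if len(fields) == 0 or fields[0] != "ACCESSION" or len(fields) == 1:
        raise ValueError("not an ACCESSION line: " + repr(line))
    # The primary accession number comes first; any others are secondary.
    return fields[1], fields[2:]


primary, secondary = parse_accession_line("ACCESSION AB999999 AB888888 AB777777")
print(primary)    # AB999999
print(secondary)  # ['AB888888', 'AB777777']
</code></pre>
<p>This sketch covers conventional entries only; other data types use different accession layouts, so the meaning of the fields after the first should not be assumed to carry over.</p> <p>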
The updated entry can be retrieved with either the primary or a secondary accession number.</p> </dd> <dd></dd> </dl> <pre class="code flat-file"><code> ACCESSION AB999999 AB888888 AB777777 </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">AB999999</code></td> <td>primary accession number</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">AB888888 AB777777</code></td> <td>secondary accession numbers</td> </tr> </tbody> </table> <dl> <dt>Bulk sequence data; WGS, TSA, TLS</dt> <dd>The accession number assigned to each entry of <a href="/ddbj/wgs-e.html">WGS</a>, <a href="/ddbj/tsa-e.html">TSA</a> and TLS data consists of 4 letters and 8 (sometimes 9 or 10, if necessary) digits.<br />The alphabetic part is called the <a href="/documents/prefix-e.html">prefix</a>.<br />See also <a href="/documents/prefix-e.html#large">For Large Scale Data (four prefix)</a>.<br />Example: ZZZZ01000001 <table> <tbody> <tr> <td>ZZZZ (4 letters)</td> <td>Prefix to distinguish each project, project_id</td> </tr> <tr> <td>01 (2 digits)</td> <td>Version number of the data set, set_version</td> </tr> <tr> <td>000001 (6 digits)</td> <td>ID of each individual sequence (may be 7 or 8 digits, depending on the number of entries)</td> </tr> </tbody> </table> <p>The set_version is incremented each time the dataset is updated. Example: ZZZZ02000001</p> </dd> <dd></dd> </dl> <pre class="code flat-file"><code> ACCESSION ZZZZ01000001 ZZZZ01000000 </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">ZZZZ01000001</code></td> <td>primary accession number</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">ZZZZ01000000</code></td> <td>set ID</td> </tr> </tbody> </table> <dl> <dt>For MGA data</dt> <dd>This (ACCESSION) line shows a number assigned by INSDC to a resource.</dd> <dd>The number consists of 5 letters and 7 digits (e.g. 
ABCDE0000001). The accession number assigned to an individual entry of a resource unit is displayed in the MGA lines.</dd> <dd>Example: ABCDE0000001 <table> <tbody> <tr> <td>AB (first two characters)</td> <td>identifier of each project.</td> </tr> <tr> <td>CDE (third to fifth characters)</td> <td>identifier of each resource within a project.</td> </tr> <tr> <td>0000001 (7 digits)</td> <td>number for each sequence entry in a resource.</td> </tr> </tbody> </table> <p>*1 Information about each project id is available at the <a href="https://ddbj.nig.ac.jp/public/ddbj_database/mga/project_index-e.html">project_index </a> page.<br />*2 A “resource” here means a unit of identical origin, such as a tissue or cells, from which sequences are obtained.</p> </dd> <dd></dd> </dl> <pre class="code flat-file"><code> ACCESSION ZZZZZ0000000 </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">ZZZZZ0000000</code></td> <td>number assigned to a resource unit</td> </tr> </tbody> </table> <div id="VersionB"> <h3><a href="#VersionA">VERSION</a></h3> </div> <p>This line consists of an accession number and a version number, such as “AB123456.1”, in which the digit(s) after the period are the version number.</p> <p>Data released to the public for the first time has version number “1”. VERSION exists because a released sequence is sometimes revised by the submitter, so the accession number alone cannot identify the exact sequence in question. The version number is increased by one each time a revised sequence is made public. 
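</p> <p>The relationship between accession number and version number can be sketched in a few lines of Python. This is only an illustration under the assumption that a VERSION value is always “accession, period, decimal digits”; <code>split_version</code> and <code>is_revision_of</code> are hypothetical names introduced here, not DDBJ functions.</p>
<pre><code class="language-python">def split_version(version):
    """Split a VERSION value such as 'AB123456.1' into (accession, number)."""
    accession, dot, number = version.rpartition(".")
    if not dot or not accession or not number.isdecimal():
        raise ValueError("not an accession.version value: " + repr(version))
    return accession, int(number)


def is_revision_of(newer, older):
    """True if both values share an accession and 'newer' has a higher version."""
    new_acc, new_num = split_version(newer)
    old_acc, old_num = split_version(older)
    return new_acc == old_acc and new_num > old_num


print(split_version("AB000000.1"))                  # ('AB000000', 1)
print(is_revision_of("AB123456.2", "AB123456.1"))   # True
</code></pre>
<p>Comparing the two parts separately reflects the point made here: a revised sequence keeps its accession number and differs only in the version number.</p> <p>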
The accession number itself generally does NOT change.</p> <pre class="code flat-file"><code> VERSION AB000000.1 </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">AB000000</code></td> <td>accession number</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">1</code></td> <td>version number</td> </tr> </tbody> </table> <dl> <dt>For MGA data</dt> <dd>This line consists of the number assigned to a resource unit, followed by a period and a version number.<br /> Since the sequence of an MGA entry cannot be updated, the version number is always “1”.</dd> <dd></dd> </dl> <pre class="code flat-file"><code> VERSION ZZZZZ0000000.1 </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">ZZZZZ0000000</code></td> <td>number assigned to a resource unit</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">1</code></td> <td>version number</td> </tr> </tbody> </table> <div id="DblinkB"> <h3><a href="#DblinkA">DBLINK</a></h3> </div> <p>The DBLINK line links the entry to other databases, using BioProject and BioSample accession numbers, Sequence Read Archive Run accession numbers, and so on.</p> <p>Since 2009, DDBJ has used the DBLINK line in place of the PROJECT line so that data resources other than projects can also be linked.</p> <pre class="code flat-file"><code>DBLINK      BioProject:PRJDA12345
            BioSample:SAMD01234567
            Sequence Read Archive:DRR012345, DRR012346
</code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">BioProject</code></td> <td>The name of linked database: <a href="/bioproject/index-e.html">BioProject Database</a></td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">PRJDA12345</code></td> <td>Linked ID in the database; BioProject accession number</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">BioSample</code></td> <td>The name of linked database: <a href="/biosample/index-e.html">BioSample 
Database</a></td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">SAMD01234567</code></td> <td>Linked ID in the database; BioSample accession number</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Sequence Read Archive</code></td> <td>The name of linked database: <a href="/dra/index-e.html">Sequence Read Archive</a> (SRA)</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">DRR012345, DRR012346</code></td> <td>Linked ID in the database; SRA Run accession numbers</td> </tr> </tbody> </table> <div id="KeywordsB"> <h3><a href="#KeywordsA">KEYWORDS</a></h3> </div> <p>In the past, the KEYWORDS lines were used for indexing <a href="/ddbj/qualifiers-e.html#gene">gene</a> and <a href="/ddbj/qualifiers-e.html#product">product</a> names.</p> <p>Currently, KEYWORDS lines indicate <a href="/documents/data-categories-e.html#detail">the detailed category of the data</a> (<a href="/ddbj/est-e.html">EST</a>, <a href="/ddbj/tsa-e.html">TSA</a>, <a href="/ddbj/htc-e.html">HTC</a>, <a href="/ddbj/htg-e.html">HTG</a>, <a href="/ddbj/gss-e.html">GSS</a>, <a href="/ddbj/wgs-e.html">WGS</a>, <a href="/ddbj/tpa-e.html">TPA</a>, etc.) and, where necessary, information such as the experimental method and the “finishing level” of genome sequencing. 
See also <a href="https://insdc.org/submitting-standards/methodological-keywords/">INSDC agreed methodological keywords</a>.</p> <div id="SourceB"> <h3><a href="#SourceA">SOURCE</a></h3> </div> <p>This line shows the scientific name (and the common name, if defined) of the organism from which the sequence was obtained, and the organelle type if the sequence is derived from an organelle other than the nucleus.</p> <pre class="code flat-file"><code>SOURCE Homo sapiens (human) </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">Homo sapiens (human)</code></td> <td>The scientific name (and common name) of the organism from which the sequence was obtained.</td> </tr> </tbody> </table> <div id="OrganismB"> <h4><a href="#OrganismA">ORGANISM</a></h4> </div> <p>The name of the organism from which the sequence was obtained and its phylogenetic lineage are described.</p> <p>The first line gives the scientific name as the organism name. If the sequence was obtained from an unidentified organism or was artificially synthesized, the name registered in the Unified Taxonomy Database is given instead of a scientific name.</p> <p>The phylogenetic lineage, based on the Unified Taxonomy Database, starts from the second line.</p> <pre class="code flat-file"><code>  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
</code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">Homo sapiens</code></td> <td>The scientific name of the organism from which the sequence was obtained</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.</code></td> <td>The phylogenetic lineage of Homo sapiens</td> </tr> </tbody> </table> <div id="Reference1B"> <h3><a href="#Reference1A">REFERENCE 1</a></h3> </div> <p>Information on the submitter(s) is described as REFERENCE 1 (except in old entries and some <a href="/ddbj/con-e.html">CON</a> entries).</p> <p>For submissions through the <a href="/ddbj/web-submission-e.html">Nucleotide Sequence Submission System</a>, REFERENCE 1 is generated from the information entered on the “Contact person” and “Submitter” pages. For the Mass Submission System, REFERENCE 1 is generated from the information entered in the annotation file.</p> <pre class="code flat-file"><code>REFERENCE 1 (bases 1 to 450) </code></pre> <p>Notice: The portion “(bases 1 to 450)” does not appear on the Master record of MGA data.</p> <div id="AuthorsB"> <h4><a href="#AuthorsA">AUTHORS</a></h4> </div> <p>In principle, the submitter(s) of the entry are listed. The submitter is responsible for the data and can update it.</p> <pre class="code flat-file"><code> AUTHORS Mishima,H. and Shizuoka,T. </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">Mishima,H. and Shizuoka,T.</code></td> <td>The submitters of this entry</td> </tr> </tbody> </table> <div id="TitleB"> <h4><a href="#TitleA">TITLE</a></h4> </div> <p>“Direct Submission” is shown, following the standard form.</p> <pre class="code flat-file"><code> TITLE Direct Submission </code></pre> <div id="JournalB"> <h4><a href="#JournalA">JOURNAL</a></h4> </div> <p>First, the “Accept Date” of the entry is shown. 
“Accept Date” is defined, in principle, as the date when DDBJ received data acceptable for assigning an accession number. Even if the entry is updated, the “Accept Date” is NOT changed. Then, the address and affiliation of the “Contact Person” are indicated.</p> <pre class="code flat-file"><code> JOURNAL Submitted (30-NOV-2008) to the DDBJ/EMBL/GenBank databases. Contact:Hanako Mishima National Institute of Genetics, DNA Data Bank of Japan; Yata 1111, Mishima, Shizuoka 411-8540, Japan </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">Submitted (30-NOV-2008) to the DDBJ/EMBL/GenBank databases.</code></td> <td>The accept date of this entry is 30-NOV-2008</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Contact:Hanako Mishima</code><br /><code class="language-plaintext highlighter-rouge">National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,</code><br /> <code class="language-plaintext highlighter-rouge">Mishima, Shizuoka 411-8540, Japan</code></td> <td>The address and affiliation of Hanako Mishima.</td> </tr> </tbody> </table> <p>E-mail address, phone and fax numbers</p> <ul> <li>To comply with the Japanese law on the protection of personal information, DDBJ deletes phone and fax numbers and E-mail addresses from the flat files of entries submitted to DDBJ. However, if you wish to disclose any of the three items, please contact us through the <a href="/contact-ddbj-e.html#to-ddbj">contact form</a>, specifying the item(s) to be disclosed.</li> <li>If you wish to contact the submitter(s) of an entry of interest, please contact us through <a href="/contact-ddbj-e.html#to-submitters">the inquiry form</a>, briefly stating your reason; e.g. 
asking to transfer cloned sequences, etc. We will then forward your message to the submitter(s).</li> </ul> <p>Phone and fax numbers and E-mail address are deleted.</p> <pre class="code flat-file"><code> JOURNAL Submitted (30-NOV-2000) to the DDBJ/EMBL/GenBank databases. Contact:Hanako Mishima National Institute of Genetics, DNA Data Bank of Japan; Yata 1111, Mishima, Shizuoka 411-8540, Japan </code></pre> <p>When the submitters wish to have their contact information disclosed, it is described as follows:</p> <pre class="code flat-file"><code> JOURNAL Submitted (30-NOV-2000) to the DDBJ/EMBL/GenBank databases. Contact:Hanako Mishima National Institute of Genetics, DNA Data Bank of Japan; Yata 1111, Mishima, Shizuoka 411-8540, Japan E-mail :mishima@supernig.nig.ac.jp Phone :81-55-981-6853 Fax :81-55-981-6849 </code></pre> <div id="Reference2B"> <h3><a href="#Reference2A">REFERENCE 2</a></h3> </div> <p>Information about references related to the submitted sequence is indicated on REFERENCE lines other than <a href="#Reference1B">REFERENCE 1</a>. Since REFERENCE 2 indicates <a href="/ddbj/submission.html#pcite">the publication status of the sequence</a>, a reference that does not describe the submitted sequence is indicated as REFERENCE 3 or later, not as REFERENCE 2.</p> <p>When DDBJ notices that a paper citing an accession number has been published, DDBJ updates the entry with that accession number, if necessary. During the update, the prepublication paper(s) described in REFERENCE 2 and/or later lines are revised without notice to the submitters, if applicable; i.e. when the submitted data, the submitters’ affiliation, and the author names, title, and journal name of the prepublication paper give reasonable grounds for the revision.</p> <dl> <dt>When the manuscript is in preparation, submitted for publication, in press, or published</dt> <dd></dd> </dl> <pre class="code flat-file"><code> REFERENCE 2 AUTHORS Mishima,H., Shizuoka,T. and Fuji,I. 
TITLE Glyceraldehyde-3-phosphate dehydrogenase expressed in human liver JOURNAL Unpublished (2009) </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">AUTHORS</code></td> <td>The (presumptive) author(s) of the reference is/are described.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">TITLE</code></td> <td>The (presumptive) title of the reference is described.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">JOURNAL</code></td> <td>For a published or in-press paper, the journal name is described. For an unpublished manuscript, “Unpublished” is described, following the standard form.</td> </tr> </tbody> </table> <dl> <dt>When there is no plan to publish other than in the international nucleotide database</dt> <dd></dd> </dl> <pre class="code flat-file"><code> REFERENCE 2 AUTHORS Mishima,H., Shizuoka,T. and Fuji,I. TITLE Glyceraldehyde-3-phosphate dehydrogenase expressed in human liver JOURNAL Published Only in Database(2009) </code></pre> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">AUTHORS</code></td> <td>The author(s) of the submission, as entered by the submitter(s), is/are described.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">TITLE</code></td> <td>The title of the submission, as entered by the submitter(s), is described.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">JOURNAL</code></td> <td>“Published Only in Database” is indicated.<br />The number in parentheses is the year when the entry was first made public.</td> </tr> </tbody> </table> <div id="CommentB"> <h3><a href="#CommentA">COMMENT</a></h3> </div> <p>Information about an entry that cannot be described using FEATURES or the other fields is described here. 
For instance, if a submitter has an affiliation other than the one shown in <a href="#Reference1B">REFERENCE 1</a>, it can be described on the COMMENT line.</p> <pre class="code flat-file"><code> COMMENT Human cDNA sequencing project. </code></pre> <dl> <dt>Structured COMMENT</dt> <dd>Structured COMMENT is a format for describing and sharing datasets that are not defined as features or qualifiers.<br />Using structured COMMENTs, datasets can be shared via INSDC flat files within the community of submitters and users.<br />In a structured COMMENT, the dataset must be described on the COMMENT line as structured sets of [names of items] and [values of items].<br />Some predetermined formats of structured COMMENTs are required when submitting certain kinds of sequence data derived from genome projects (including <a href="/ddbj/wgs-e.html">WGS</a>), transcriptome projects (including <a href="/ddbj/tsa-e.html">TSA</a>) and so on.</dd> <dd></dd> </dl> <pre class="code flat-file"><code> COMMENT ##Genome-Assembly-Data-START## Finishing Goal :: Finished Current Finishing Status :: High Quality Draft Assembly Method :: Newbler v. 
2.3 Genome Coverage :: 30x Sequencing Technology :: 454 GS Junior; Illumina GA II ##Genome-Assembly-Data-END## </code></pre> <p>The above example shows the additional information, “Genome-Assembly-Data”, that is required for genome projects.<br />Between ##Genome-Assembly-Data-START## and ##Genome-Assembly-Data-END##, each item name and its value are delimited by “ :: ”.</p> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">##Genome-Assembly-Data-START##</code></td> <td>The first line of the structured COMMENT defined as “Genome-Assembly-Data”.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">##Genome-Assembly-Data-END##</code></td> <td>The last line of the structured COMMENT defined as “Genome-Assembly-Data”.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Finishing Goal :: Finished</code></td> <td>The final goal of the genome project is the “Finished” level.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Current Finishing Status :: High Quality Draft</code></td> <td>The current status of the genome project is the “High Quality Draft” level.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Assembly Method :: Newbler v. 
2.3</code></td> <td>The software used to assemble the sequence reads is Newbler, version 2.3.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Genome Coverage :: 30x</code></td> <td>The sequencing depth of the genome sequences is approximately 30-fold.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">Sequencing Technology :: 454 GS Junior; Illumina GA II</code></td> <td>The platforms (sequencers) used to determine the genome sequences are “454 GS Junior” and “Illumina GA II”.</td> </tr> </tbody> </table> <dl> <dt>For MGA data</dt> <dd>For <a href="/ddbj/mga-e.html">MGA Submission</a>, the process for obtaining the submitted sequence data (e.g. methods for preparing sequences from tissues or cells and processing the sequences for submission) is described.</dd> <dd></dd> </dl> <pre class="code flat-file"><code> COMMENT The CAGE (cap analysis gene expression) is based on preparation and sequencing of concatamers of DNA tags deriving from the initial 20/21 nucleotides from 5' end mRNAs. Full-length cDNAs were at first selected with the Cap-Trapper method. Then, a specific linker (Linker1, some linker contain 5 bp sequences that have 15 variations for each rna sample) containing the ClassIIs restriction enzyme site MmeI was then ligated to the single-strand cDNA and then the second strand of cDNA synthesized. The resulting double-stranded cDNA was cleaved by the restriction enzyme MmeI and a second linker (Linker2) was ligated to the 2 bp overhang at the MmeI cleaved site, to produce a 5' 20/21 tag having two linkers at both sides. The ligation products were separated from unmodified DNA with magnetic beads. The 5' end cDNA tags were released from the beads, and the DNA fragments were amplified in a PCR step by using the two linker-specific primers (Primer1 (uni-PCR), Primer2 (MmeI-PCR)). 
The desired 32-37 bp tags were purified and ligated to form concatamers, and then the concatamer were fractionated and ligated to the plasmid ZErO-2. The ligations were finally electroporated into DH10b cells (Invitrogen) and obtained plasmids were sequenced with forward primers. CAGE libraries were sequenced with forward primers essentially as described with minor modifications to use zeocin for selection of recombinants. We used in-house developed algorithms for the extraction of tags and for masking the vectors. CAGE tags were extracted with the following parameters: vector masking, minimum 12 bp recognition allowed; linker (13 bp) masking: maximum mismatch, 2 bp allowed; XmaJI site maximum mismatch, 2 bp allowed; tag length, 17-24 bp. Linker1: "Upper oligonucleotide GN6": biotin-agagagagacctcgagtaactataacggtcctaaggtagcgacctagg (5 bp) tccgacGNNNNN and "Upper oligonucleotide N6": </code></pre> <div id="FeaturesB"> <h3><a href="#FeaturesA">FEATURES</a></h3> </div> <p>Biological features of the submitted sequence are described with a “Feature” key (the biological nature of the annotated feature), a “Location” (the region of the sequence that corresponds to the feature), and “Qualifiers” (supplementary information about the feature). In principle, EST and GSS entries are not described with any feature other than the “source” key.</p> <p>FEATURES are indicated on the basis of the information provided by the submitter and may be modified by the databanks so that the annotation is described appropriately. 
The rules of feature description agreed among the three databanks are explained in detail at <a href="/ddbj/feature-table-e.html">The DDBJ/ENA/GenBank Feature Table Definition</a>.</p> <p>Feature keys are broadly classified into three groups:</p> <ul> <li>group 1: biological source of the sequence (source)<br />The “source” feature (group 1) is mandatory for all entries in the international nucleotide database.<br />The qualifiers “/organism” and “/mol_type” are mandatory for the source feature.</li> <li>group 2: biological function features of the region<br />Feature keys in group 2 fall into families which are in some sense similar in function and which are annotated in a similar manner. A functional family may have a “generic” or miscellaneous key, recognizable by the ‘misc_’ prefix, that can be used for instances not covered by the other defined keys of that group.<br />e.g. CDS, rRNA, etc.</li> <li>group 3: difference and/or change of the sequence data<br />e.g. variation, conflict, etc.</li> </ul> <p>One of the most frequently used feature keys is “CDS”, which describes a protein-coding sequence. 
See also <a href="/ddbj/cds-e.html">CDS feature</a> page.</p> <pre class="code flat-file"><code>FEATURES Location/Qualifiers source 1..450 /chromosome="12" /clone="GT200015" /collection_date="2007" /db_xref="taxon:9606" /geo_loc_name="Japan" /map="12p13" /mol_type="mRNA" /organism="Homo sapiens" /tissue_type="liver" CDS 86..>450 /codon_start=1 /gene="GAPD" /product="glyceraldehyde-3-phosphate dehydrogenase" /protein_id="BAA12345.1" /transl_table=1 /translation="MAKIKIGINGFGRIGRLVARVALQSDDVELVAVNDPFITTDYMT YMFKYDTVHGQWKHHEVKVKDSKTLLFGEKEVTVFGCRNPKEIPWGETSAEFVVEYTG VFTDKDKAVAQLKGGAKKV" </code></pre> <div id="FeaturesSourceB"> <h3><a href="#FeaturesSourceA">source</a></h3> </div> <p>Identifies the biological source of the specified span of the sequence.</p> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">source 1..450</code></td> <td>The region from the 1st to the 450th base of the sequence is derived from the source described with the following qualifiers.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/chromosome="12"</code></td> <td>The sequence is obtained from chromosome 12.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/clone="GT200015"</code></td> <td>The name of the clone from which the sequence is obtained.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/collection_date="2007"</code></td> <td>The collection date of the sample.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/geo_loc_name="Japan"</code></td> <td>The collection site of the sample.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/map="12p13"</code></td> <td>The sequence is located on 12p13.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/db_xref="taxon:9606"</code></td> <td>The sequence is derived from an organism corresponding to taxonomy database ID 9606 (human).</td> </tr> <tr> <td><code class="language-plaintext 
highlighter-rouge">/mol_type="mRNA"</code></td> <td>The sequence is derived from mRNA.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/organism="Homo sapiens"</code></td> <td>The sequence is obtained from human.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/tissue_type="liver"</code></td> <td>The sequence is obtained from liver.</td> </tr> </tbody> </table> <div id="CDSB"> <h3><a href="#CDSA">CDS</a></h3> </div> <p>Coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon).</p> <table> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">CDS 86..>450</code></td> <td>The region from the 86th to the 450th base of the sequence codes for a protein described with the following qualifiers. “>” means that the 3’ end of the CDS region is incomplete. The rules for describing “Location” are explained in detail at <a href="/ddbj/location-e.html">Description of Location</a>.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/codon_start=1</code></td> <td>The reading <a href="/ddbj/cds-e.html#frame">frame</a> for amino acid translation starts at the 1st base of this region (86th base of the entry).</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/gene="GAPD"</code></td> <td>gene symbol, see <a href="/ddbj/qualifiers-e.html#gene">gene</a> qualifier</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/product="glyceraldehyde-3-phosphate dehydrogenase"</code></td> <td>product name, see <a href="/ddbj/qualifiers-e.html#product">product</a> qualifier </td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/protein_id="BAA12345.1"</code></td> <td>This is the ID assigned to the amino acid sequence by the international nucleotide database.<br /> It consists of 3 alphabetic characters followed by 5 digits.<br />The number after “.” indicates the version number of the protein ID. 
If the amino acid sequence is updated, the version number is incremented (the protein_id itself is NOT changed).</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/transl_table=1</code></td> <td>The nucleotide sequence of the CDS region is translated into an amino acid sequence according to genetic code table 1.</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">/translation="MAKIKIGINGF(omitted)AVAQLKGGAKKV"</code></td> <td>The nucleotide sequence of the CDS region is conceptually translated into an amino acid sequence in one-letter codes (<a href="/ddbj/code-e.html#amino-1">Amino Acid Codes</a>), unless the qualifier <a href="/ddbj/qualifiers-e.html#exception">exception</a> is set.<br />When the qualifier <a href="/ddbj/qualifiers-e.html#pseudogene">pseudogene</a> or <a href="/ddbj/qualifiers-e.html#pseudo">pseudo</a> is set, /translation is NOT indicated.</td> </tr> </tbody> </table> <div id="EndB"> <h3><a href="#EndA">//</a></h3> </div> <p>“//” is the terminator of the entry.</p> </main> <aside class="related-pages"> <h2 class="caption">Related pages</h2> <div class="navigation"> <nav> <ul> <li> <a href="/ddbj/location-e.html">Description of Location</a> </li> <li> <a href="/ddbj/cds-e.html">Protein Coding Sequence; CDS feature</a> </li> <li> <a href="/ddbj/example-e.html">Description Examples of Sequence Data</a> </li> <li> <a href="/ddbj/code-e.html">Codes Used in Sequence Description</a> </li> <li> <a href="/ddbj/organism-e.html">Organism qualifier</a> </li> <li> <a href="/ddbj/features-e.html">Feature Key</a> </li> <li> <a href="/ddbj/qualifiers-e.html">Qualifier key</a> </li> <li> <a href="/ddbj/geneticcode-e.html">The Genetic Codes</a> </li> </ul> </nav> </div> </aside> </section> </div> </section> </div> <footer></footer> <div id="back-top"></div> </body> </html>