

<!DOCTYPE html> <html lang="en" class="no-js"> <head> <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= 'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-5XFN8N2');</script> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link rel="shortcut icon" href="/img/hgnc/favicon/favicon.ico" type="image/x-icon" /> <title>HGNC data archive help | HUGO Gene Nomenclature Committee</title> <meta name="Description" content="" /> <link type="text/css" rel="stylesheet" media="all" href="/css/default-bf.css?t=1728642555095" /> </head> <body> <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5XFN8N2" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript> <div class="page"> <header> <nav class="navbar navbar-genenames"> <div class="container"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target=".navbar-collapse" aria-expanded="false" > <span class="sr-only">Toggle navigation</span> <i class="fa fa-bars"></i> <span>Menu</span> <i class="fa fa-search"></i> </button> <a class="navbar-brand" href="/"> <img alt="HGNC" src="/img/hgnc/logo/hgnc-logo-dark-bckgrnd-small.svg" /> </a> </div> <div class="collapse navbar-collapse"> <div class="navbar-form navbar-right search-bar" role="search"> <search-bar></search-bar> </div> </div> <div class="collapse navbar-collapse"> <hgnc-nav></hgnc-nav> </div> </div> </nav> <div class="message-banner col-xs-12"> <p class="container" style="font-size: 1.3rem;">All download files <strong>including</strong> the <a href="/download/archive/">archive</a> files are now in a publicly accessible Google Storage Bucket. 
<a href="/download/statistics-and-files/">Downloads</a> page links have been updated.</p> </div> </header> <div id="main" class="container"> <h1 class="title">HGNC data archive help</h1> <noscript> <p class="warning grid_24"> <strong>Warning:</strong> Search requires JavaScript to function. <a target="_blank" title="Learn how to enable JavaScript" href="/help/browser" >more...</a > </p> </noscript> <div id="toc-container" class="panel panel-default"> <div class="panel-heading"> <div id="toctitle"><a role="button" href="" class="title toctoggle"> <i class="far ng-scope fa-caret-square-up"></i> Contents </a> </div> </div> <div class="panel-body toctoggle"> <ol> <li class="toc_level-1 toc_section-1"> <a data-ng-anchor="#tocAnchor-1-1"><span class="tocnumber"></span> <span class="toctext">File fields/columns information</span></a> </li> <li class="toc_level-1 toc_section-2"> <a data-ng-anchor="#tocAnchor-1-2"><span class="tocnumber"></span> <span class="toctext">Archive files location</span></a> </li> <li class="toc_level-1 toc_section-3"> <a data-ng-anchor="#tocAnchor-1-3"><span class="tocnumber"></span> <span class="toctext">Quick links to HGNC current release complete set files.</span></a> </li> </ol> </div> </div><p>The HGNC archives the complete HGNC dataset files (in both tab-separated and JSON formats) every month and every quarter and stores them in our Google Cloud Storage Bucket.</p> <h2 id="tocAnchor-1-1">File fields/columns information</h2> <p>The monthly files are produced on the 1st of every month, while the quarterly files are produced on the 1st of January, April, July and October. Monthly files that are more than 365 days old are deleted to save disk space. The quarterly files are not currently deleted, so for snapshots older than 365 days please use the quarterly files.</p> <p>Regardless of file format, there are essentially two types of data file: hgnc_complete_set and withdrawn. 
The hgnc_complete_set is a set of all approved gene symbol reports found on the GRCh38 reference and the alternative reference loci (see fig. 1 for a list of columns/headings).</p> <figure> <pre class="center-block" style="background: rgba(238,238,238,0.92); color: #000;">
hgnc_id = HGNC ID. A unique ID created by the HGNC for every approved symbol.
symbol = The HGNC approved gene symbol. Equates to the "APPROVED SYMBOL" field within the gene symbol report.
name = HGNC approved name for the gene. Equates to the "APPROVED NAME" field within the gene symbol report.
locus_group = A group name for a set of related locus types as defined by the HGNC (e.g. non-coding RNA).
locus_type = The locus type as defined by the HGNC (e.g. RNA, transfer).
status = Status of the symbol report, which can be either "Approved" or "Entry Withdrawn".
location = Cytogenetic location of the gene (e.g. 2q34).
location_sortable = Same as "location" but single digit chromosomes are prefixed with a 0, enabling them to be sorted in correct numerical order (e.g. 02q34).
alias_symbol = Other symbols used to refer to this gene as seen in the "SYNONYMS" field in the gene symbol report.
alias_name = Other names used to refer to this gene as seen in the "SYNONYMS" field in the gene symbol report.
prev_symbol = Symbols previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS &amp; NAMES" field within the gene symbol report.
prev_name = Gene names previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS &amp; NAMES" field within the gene symbol report.
gene_family = Name given to a gene family or group the gene has been assigned to. Equates to the "GENE FAMILY" field within the gene symbol report.
gene_family_id = ID used to designate a gene family or group the gene has been assigned to.
date_approved_reserved = The date the entry was first approved.
date_symbol_changed = The date the gene symbol was last changed.
date_name_changed = The date the gene name was last changed.
date_modified = Date the entry was last modified.
entrez_id = Entrez gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.
ensembl_gene_id = Ensembl gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.
vega_id = Vega gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.
ucsc_id = UCSC gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.
ena = International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report.
refseq_accession = RefSeq nucleotide accession(s). Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report.
ccds_id = Consensus CDS ID. Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report.
uniprot_ids = UniProt protein accession. Found within the "PROTEIN RESOURCES" section of the gene symbol report.
pubmed_id = PubMed and Europe PubMed Central PMID(s).
mgd_id = Mouse Genome Informatics database ID. Found within the "HOMOLOGS" section of the gene symbol report.
rgd_id = Rat Genome Database gene ID. Found within the "HOMOLOGS" section of the gene symbol report.
lsdb = The name of the Locus Specific Mutation Database and URL for the gene, separated by a | character.
cosmic = Symbol used within the Catalogue of Somatic Mutations in Cancer for the gene.
omim_id = Online Mendelian Inheritance in Man (OMIM) ID
mirbase = miRBase ID
homeodb = Homeobox Database ID
snornabase = snoRNABase ID
bioparadigms_slc = Symbol used to link to the SLC tables database at bioparadigms.org for the gene
orphanet = Orphanet ID
pseudogene.org = Pseudogene.org
horde_id = Symbol used within HORDE for the gene
merops = ID used to link to the MEROPS peptidase database
imgt = Symbol used within the international ImMunoGeneTics information system
iuphar = The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database. To link to the IUPHAR/BPS Guide to PHARMACOLOGY database, use only the number (e.g. use only 1 from the result objectId:1)
kznf_gene_catalog = ID used to link to the Human KZNF Gene Catalog
mamit-trnadb = ID to link to the Mamit-tRNA database
cd = Symbol used within the Human Cell Differentiation Molecule database for the gene
lncrnadb = lncRNA Database ID
enzyme_id = ENZYME EC accession number
intermediate_filament_db = ID used to link to the Human Intermediate Filament Database
agr = The HGNC ID that the Alliance of Genome Resources (AGR) have linked to their record of the gene. Use the HGNC ID to link to the AGR.
mane_select = NCBI and Ensembl transcript IDs/accessions, including the version number, for one high-quality representative transcript per protein-coding gene that is well-supported by experimental data and represents the biology of the gene. The IDs are delimited by |.
</pre> <figcaption class="text-center">Figure 1: Columns/headings within the hgnc_complete_set files.</figcaption> </figure> <p>The withdrawn file contains all gene symbol reports that are no longer approved, either because the symbol has been withdrawn or because the report has been merged/split into another report (see fig. 2 for a list of columns/headings).</p> <figure> <pre class="center-block" style="background: rgba(238,238,238,0.92); color: #000;">
HGNC_ID = The HGNC ID of the withdrawn record.
STATUS = Can be either "Entry Withdrawn" or "Merged/Split".
WITHDRAWN_SYMBOL = The symbol of the withdrawn record.
MERGED_INTO_REPORT(S) = Shows which record(s) replaced the withdrawn record if the status is "Merged/Split". Each replacement has the format HGNC_ID|SYMBOL|STATUS. If the withdrawn record is split, there will be more than one replacement and they will be comma separated.
</pre> <figcaption class="text-center">Figure 2: Columns/headings within the withdrawn files.</figcaption> </figure> <h2 id="tocAnchor-1-2">Archive files location</h2> <ul> <li>Monthly files: <ul> <li><a href="/download/archive/monthly/tsv/">Tab separated files</a></li> <li><a href="/download/archive/monthly/json/">JSON files</a></li> </ul> </li> <li>Quarterly files: <ul> <li><a href="/download/archive/quarterly/tsv/">Tab separated files</a></li> <li><a href="/download/archive/quarterly/json/">JSON files</a></li> </ul> </li> </ul> <h2 id="tocAnchor-1-3">Quick links to HGNC current release complete set files.</h2> <ul> <li><a href="//storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt">Current tab separated hgnc_complete_set file</a></li> <li><a href="//storage.googleapis.com/public-download-files/hgnc/json/json/hgnc_complete_set.json">Current JSON format hgnc_complete_set file</a></li> <li><a href="//storage.googleapis.com/public-download-files/hgnc/tsv/tsv/withdrawn.txt">Current tab separated withdrawn file</a></li> <li><a href="//storage.googleapis.com/public-download-files/hgnc/json/json/withdrawn.json">Current JSON format withdrawn file</a></li> </ul> </div> <hgnc-footer></hgnc-footer> </div> <script type="text/javascript" src="/js/default-bf.js?t=1728642555095"></script> </body> </html>
