<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" > <channel> <title>Ensembl Blog</title> <atom:link href="https://www.ensembl.info/feed/" rel="self" type="application/rss+xml" /> <link>https://www.ensembl.info</link> <description>News about the Ensembl Project and its genome browser</description> <lastBuildDate>Mon, 18 Nov 2024 09:40:28 +0000</lastBuildDate> <language>en-GB</language> <sy:updatePeriod> hourly </sy:updatePeriod> <sy:updateFrequency> 1 </sy:updateFrequency> <generator>https://wordpress.org/?v=6.4.5</generator> <image> <url>https://www.ensembl.info/wp-content/uploads/2018/01/cropped-ebang-512-32x32.png</url> <title>Ensembl Blog</title> <link>https://www.ensembl.info</link> <width>32</width> <height>32</height> </image> <site xmlns="com-wordpress:feed-additions:1">52918201</site> <item> <title>We have frozen Ensembl Rapid Release</title> <link>https://www.ensembl.info/2024/11/14/we-will-be-freezing-ensembl-rapid-release/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=we-will-be-freezing-ensembl-rapid-release</link> <comments>https://www.ensembl.info/2024/11/14/we-will-be-freezing-ensembl-rapid-release/#respond</comments> <dc:creator><![CDATA[Jorge Batista da Rocha]]></dc:creator> <pubDate>Thu, 14 Nov 2024 16:40:07 +0000</pubDate> <category><![CDATA[Release announcements]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl Genomes]]></category> <category><![CDATA[rapid]]></category> <category><![CDATA[rapid release]]></category> <category><![CDATA[release]]></category> <guid 
isPermaLink="false">https://www.ensembl.info/?p=15780</guid> <description><![CDATA[All Rapid Release data, including Release 65, has been uploaded into the new Ensembl Beta site. New genome data will only be released onto the Ensembl Beta site. The Ensembl Rapid Release website will remain active for the foreseeable future; however, the data and species set will no longer be updated. If you feel that [&#8230;]]]></description> <content:encoded><![CDATA[ <p>All Rapid Release data, including Release 65, has been uploaded into the new Ensembl Beta site. New genome data will only be released onto the <a href="https://beta.ensembl.org" data-type="link" data-id="https://beta.ensembl.org">Ensembl Beta site</a>.</p> <span id="more-15780"></span> <p>The Ensembl Rapid Release website will remain active for the foreseeable future; however, the data and species set will no longer be updated. If you feel that this change will have a significant impact on your analyses, please let us know by emailing the <a href="mailto:helpdesk@ensembl.org" data-type="mailto" data-id="mailto:helpdesk@ensembl.org.">Ensembl Helpdesk</a>.</p> <p>The up-to-date versions of all of these data* will be accessible from the new Ensembl Beta site. However, we will continue to provide access to the frozen data on the <a href="https://rapid.ensembl.org/">Ensembl Rapid Release genome browser</a> and <a href="https://ftp.ensembl.org/pub/rapid-release/">FTP site</a>.&nbsp;</p> <p>*Please note that data for the reference genomes from The Mouse Genomes Project (available at <a href="https://www.mousegenomes.org/reference-genomes/">The Mouse Genomes Project</a>) are currently excluded. We have identified some issues with the data and are actively working to resolve them. 
The corrected data will be accessible through ensembl.org starting with release e114 and will be integrated into the new website as soon as possible.</p> <p>Launched in June 2020, Rapid Release was introduced as a streamlined Ensembl platform designed to enable the swift release of gene annotations. In addition to gene annotations, the platform also provides protein feature annotation, BLAST functionality, and homology data. This innovation allowed Ensembl to maintain its leadership in delivering functional annotation data, keeping pace with the exponentially increasing number of assembled genomes released by key initiatives. These initiatives include global biodiversity efforts such as the <a href="https://www.earthbiogenome.org/">Earth Biogenome Project</a>, the <a href="https://www.erga-biodiversity.eu/erga-bge">European Reference Genome Atlas</a>, and the <a href="https://www.darwintreeoflife.org/">Darwin Tree of Life Project</a>; livestock and domestic animal projects like the <a href="https://www.animalgenome.org/community/FAANG/">FAANG</a> initiatives; and pangenome projects, including the <a href="https://humanpangenome.org/">Human Pangenome Reference Consortium</a>, among others.</p> ]]></content:encoded> <wfw:commentRss>https://www.ensembl.info/2024/11/14/we-will-be-freezing-ensembl-rapid-release/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">15780</post-id> </item> <item> <title>Temporary disruption in Ensembl BioMart</title> <link>https://www.ensembl.info/2024/11/13/temporary-disruption-in-ensembl-biomart/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=temporary-disruption-in-ensembl-biomart</link> <comments>https://www.ensembl.info/2024/11/13/temporary-disruption-in-ensembl-biomart/#respond</comments> <dc:creator><![CDATA[Aleena Mushtaq]]></dc:creator> <pubDate>Wed, 13 Nov 2024 11:32:09 +0000</pubDate> <category><![CDATA[Service status]]></category> 
<category><![CDATA[Bioinformatics]]></category> <category><![CDATA[biomart]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[release]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15776</guid> <description><![CDATA[We are experiencing issues with Ensembl BioMart due to unforeseen errors. Some species are currently unavailable in the list of datasets in Ensembl 113. To access these species in BioMart, you can use the previous version of Ensembl release 112.  We apologise for the inconvenience this may cause and we hope to restore them as [&#8230;]]]></description> <content:encoded><![CDATA[ <p>We are experiencing issues with Ensembl BioMart due to unforeseen errors. </p> <span id="more-15776"></span> <p>Some species are currently unavailable in the list of datasets in Ensembl 113. To access these species in BioMart, you can use the previous version of <a href="https://may2024.archive.ensembl.org/biomart/martview/">Ensembl release 112</a>. </p> <p>We apologise for the inconvenience this may cause and we hope to restore them as soon as we can.</p> ]]></content:encoded> <wfw:commentRss>https://www.ensembl.info/2024/11/13/temporary-disruption-in-ensembl-biomart/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">15776</post-id> </item> <item> <title>Ensembl 113 has been released!</title> <link>https://www.ensembl.info/2024/10/18/ensembl-113-has-been-released/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ensembl-113-has-been-released</link> <dc:creator><![CDATA[Louisse Paola Mirabueno]]></dc:creator> <pubDate>Fri, 18 Oct 2024 11:29:06 +0000</pubDate> <category><![CDATA[Release announcements]]></category> <category><![CDATA[archives]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl Genomes]]></category> <category><![CDATA[gene annotation]]></category> <category><![CDATA[Genebuild]]></category> <category><![CDATA[Metazoa]]></category> 
<category><![CDATA[Plants]]></category> <category><![CDATA[regulation]]></category> <category><![CDATA[REST API]]></category> <category><![CDATA[variation]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15700</guid> <description><![CDATA[We are pleased to announce the release of Ensembl 113, and the corresponding release of Ensembl Genomes 60. This release brings major gene and regulatory feature annotation updates in Homo sapiens (Human) and Mus musculus (Mouse). We have updated existing genomes and added additional genomes across the different Ensembl sites, including livestock breeds in Ensembl, [&#8230;]]]></description> <content:encoded><![CDATA[ <p>We are pleased to announce the release of Ensembl 113, and the corresponding release of Ensembl Genomes 60. This release brings major gene and regulatory feature annotation updates in <em>Homo sapiens</em> (Human) and <em>Mus musculus</em> (Mouse). We have updated existing genomes and added additional genomes across the different Ensembl sites, including livestock breeds in Ensembl, three new species in Ensembl Plants and 26 new species in Ensembl Metazoa. Can’t find the species you are looking for? Don’t forget that new and exciting genome assemblies and annotations are continuously added to <a href="https://rapid.ensembl.org/index.html" target="_blank" rel="noreferrer noopener">Ensembl Rapid Release</a>!</p> <span id="more-15700"></span> <h2 class="wp-block-heading"><strong>Vertebrates</strong></h2> <h3 class="wp-block-heading"><strong>Gene annotation</strong></h3> <p>Ensembl 113 features the biggest ever expansion of the GENCODE human and mouse gene annotations from one release to the next. 
This is due to our integration of large numbers of new transcript models produced by the GENCODE Capture Long-read Sequencing (CLS) project, further details of which are provided on <a href="https://public-docs.crg.es/rguigo/CLS/" data-type="link" data-id="https://www.gencodegenes.org/pages/cls_lncrnas/">this GENCODE documentation page</a>. The present phase of the CLS project was specifically designed to expand the annotation of lncRNAs, and we have added over 130,000 new lncRNA transcripts to each species. We now have approximately twice as many lncRNA genes for these species as we did before. A manuscript to describe this work is currently being finalised.&nbsp;</p> <p>The default gene track has changed from GENCODE Comprehensive to GENCODE Basic to accommodate the increased number of transcripts. Consequently, a new GENCODE Primary tag has been introduced. This tag represents a minimal set of transcripts including all well-conserved exons and exon skips that have high expression in human protein-coding genes. It will be expanded to other biotypes and to mouse at a later date. Further details can be found <a href="https://www.gencodegenes.org/pages/gencode_primary/" target="_blank" rel="noreferrer noopener">in the GENCODE FAQs</a>. Transcripts in this set are labelled <code>gencode_primary</code> in REST API responses and GFF3/GTF files. The existing GENCODE Basic tag, previously labelled <code>basic</code>, has been changed to <code>gencode_basic</code>. Tags have not been changed in archives.</p> <p>Updated Havana manual gene annotations are also available for Human (GRCh38) and Mouse. 
The major histocompatibility complex (MHC) genes in <em>Rattus norvegicus</em> (Norway rat; mRatBN7.2) and <em>Sus scrofa</em> (Pig; Sscrofa11.1) have also been manually updated, along with additional immune genes.</p> <p>Moreover, genome assemblies and gene annotations for additional breeds of existing species are now available:</p> <ul> <li><em>Capra hircus</em> (Goat): 2 breeds (Saanen dairy and Xinong Saanen dairy)</li> <li><em>Ovis aries </em>(Sheep): 8 breeds (Qiaoke, Kermani, East Friesian, Polled Dorset, White dorper, Hu, Chinese merino and Romanov)</li> <li><em>S. scrofa</em> (Pig): 8 breeds (Bama miniature, Duroc, NIHS-2020, PB115, Euw1, Ossabaw miniature, Meishan and Ningxiang)</li> </ul> <h3 class="wp-block-heading"><strong>Variation</strong></h3> <p>To simplify variant displays during the large increase in Human GRCh38 transcripts driven by the on-going incorporation of models from long-read sequencing data, only pre-calculated transcript consequences for the GENCODE Primary set, which includes all exons in this assembly, are being displayed. The Human GRCh37 displays remain unchanged. A <a href="http://ftp.ensembl.org/pub/release-113/variation/vcf/homo_sapiens/all_transcripts" target="_blank" rel="noreferrer noopener">VCF file containing variant annotations for the full set of GRCh38 transcripts is available on the FTP site</a>.</p> <p>The Ensembl Variant Effect Predictor (VEP) now supports the GENCODE Primary transcript set to enable annotation of all potential variant consequences without duplication across multiple transcripts. The following features are supported:</p> <ul> <li>Optionally limiting Ensembl VEP annotation to this set</li> <li>Reporting the flag in Ensembl VEP output (as was done for MANE) as a boolean</li> </ul> <p>GnomAD v4.1 population allele frequency data is now available for Human on the website, command-line, and REST API instances of Ensembl VEP. 
Additionally, REVEL and ClinPred scores along with ribosome profiling open reading frame (Ribo-seq ORF) annotations are now available on the website and REST API version of Ensembl VEP. The <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#:~:text=Ensembl-,Paralogues,-A%20VEP%20plugin" target="_blank" rel="noreferrer noopener">Paralogues plugin</a> is now supported on the website and the LOEUF plugin is available on the REST API. There are also some plugin data updates for CADD (v1.6 to v1.7) and dbNSFP (v4.5c to v4.7c). From this release, and going forward, we only generate indexed Ensembl VEP cache files.&nbsp;</p> <p>We have updated our Nextflow VEP pipeline so that it can accept all the input formats Ensembl VEP supports, including VCF.</p> <p>The <em>O. aries</em> (Sheep; ARS-UI_Ramb_v2.0; GCA_016772045.1) genome now includes variants from the European Variation Archive (EVA) release 5.</p> <h3 class="wp-block-heading"><strong>Regulation</strong></h3> <p>We have added new regulatory annotation for Cow. Regulatory annotation for Human and Mouse has undergone a major update, including regulatory features, a set of epigenomes (tissues and cell lines), and the associated regulatory activity. The <em>Transcription Factor Binding Sites (TFBS)</em> regulatory feature has been retired for both Human and Mouse. Instead, we recommend using the motif features track, which has been updated for Human and added to Mouse. This track can be found in the <em>Other regulatory features</em> under the <em>Regulation</em> section available in the configuration menu in the Location tab:</p> <figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="528" src="https://www.ensembl.info/wp-content/uploads/2024/10/image-1-1024x528.png" alt="A screenshot of the 'Configure this page' menu in the Ensembl 'Location' tab. The 'Motif features' track can be found under the 'Regulation' group in the left-hand panel." 
class="wp-image-15712" srcset="https://www.ensembl.info/wp-content/uploads/2024/10/image-1-1024x528.png 1024w, https://www.ensembl.info/wp-content/uploads/2024/10/image-1-300x155.png 300w, https://www.ensembl.info/wp-content/uploads/2024/10/image-1-768x396.png 768w, https://www.ensembl.info/wp-content/uploads/2024/10/image-1-1536x792.png 1536w, https://www.ensembl.info/wp-content/uploads/2024/10/image-1-2048x1056.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <p>The format of ENSR IDs has been updated for chromosome-level assemblies, and now includes additional characters (capital letters and the underscore symbol ‘<code>_</code>’). This change affects eight species with regulatory annotation. Regulatory annotation has been updated across all species to address minor inconsistencies, incorporate additional data, and introduce the new ENSR IDs.</p> <p>The following REST API regulation endpoints are now retired:<br><code>GET regulatory/species/:species/id/:id<br>GET regulatory/species/:species/epigenome<br>GET regulatory/species/:species/microarray/:microarray/vendor/:vendor<br>GET regulatory/species/:species/microarray<br>GET regulatory/species/:species/microarray/:microarray/probe/:probe<br>GET regulatory/species/:species/microarray/:microarray/probe_set/:probe_set</code></p> <p>In addition, the <code>other_regulatory</code> and <code>array_probe</code> options in the <code>feature</code> parameter have been removed from the following endpoints:<br><code>GET overlap/region/:species/:region<br>GET overlap/id/:id</code></p> <h2 class="wp-block-heading"><strong>Ensembl Plants</strong></h2> <p>Additional Watkins Wheat variation data has been added for <em>Triticum aestivum</em> (Common wheat; GCA_900519105.1). The <a href="https://wwwg2b.com/watkins" target="_blank" rel="noreferrer noopener">Watkins Wheat variation collection</a> encompasses the genetic diversity found in the A.E. 
Watkins Landrace Collection of bread wheat, which includes 827 landraces from 32 countries.&nbsp;</p> <p>Gene annotation for the following existing species has been updated:</p> <ul> <li><em>Brassica rapa </em>(Field mustard; GCA_900412535.3)</li> <li><em>Marchantia polymorpha</em> (Common liverwort; GCA_039105155.1)</li> <li><em>Triticum aestivum</em> (Common wheat; Paragon v2; GCA_949126075.1)</li> </ul> <p>Genome assemblies and gene annotation for the following new species are now available:</p> <ul> <li><em>Arachis hypogaea</em> (Peanut; GCA_003086295.3)</li> <li><em>Lathyrus sativus </em>(Grass pea; GCA_963859935.3)</li> <li><em>Triticum timopheevii </em>(Sanduri wheat; GCA_963921465.1)</li> </ul> <h2 class="wp-block-heading"><strong>Ensembl Metazoa</strong></h2> <p>Multiple mosquito genomes have been added, including the latest reference genomes for <em>Anopheles coluzzii</em> (GCA_943734685.1) and <em>Anopheles funestus</em> (African malaria mosquito, GCA_943734845.1). Additionally, a new annotation source has also been added for the existing <em>Culex quinquefasciatus </em>(Southern house mosquito; GCA_015732765.1) assembly. 
Furthermore, genome assemblies and gene annotations are now available for the following species:</p> <ul> <li><em>Anastrepha ludens</em> (Mexican fruit fly; GCA_028408465.1)</li> <li><em>Anastrepha obliqua</em> (West Indian fruit fly; GCA_027943255.1)</li> <li><em>Anopheles coluzzii </em>(Mosquitos; GCA_943734685.1)</li> <li><em>Anopheles funestus </em>(African malaria mosquito; GCA_943734845.1)</li> <li><em>Bombus affinis</em> (Rusty patched bumble bee; GCA_024516045.2)</li> <li><em>Culex pipiens pallens </em>(Common house mosquito; GCA_016801865.2)</li> <li><em>Cylas formicarius</em> (Sweet potato weevil; GCA_029955315.1)</li> <li><em>Diorhabda carinulata</em> (Northern tamarisk beetle; GCA_026250575.1)</li> <li><em>Diorhabda sublineata</em> (Subtropical tamarisk beetle; GCA_026230105.1)</li> <li><em>Hylaeus anthracinus</em> (Anthricinan yellow-faced bee; GCA_026225885.1)</li> <li><em>Hylaeus volcanicus </em>(Volcano masked bee; GCA_026283585.1)</li> <li><em>Lutzomyia longipalpis</em> (Sandfly; GCA_024334085.1)</li> <li><em>Malaya genurostris </em>(Mosquitos; GCA_030247185.2)</li> <li><em>Microplitis demolitor </em>(Parasitoid wasp; GCA_026212275.2)</li> <li><em>Microplitis mediator </em>(Endoparasitoid wasp; GCA_029852145.1)</li> <li><em>Mytilus californianus </em>(California mussel; GCA_021869535.1)</li> <li><em>Phlebotomus argentipes</em> (Sandfly; GCA_947086385.1)</li> <li><em>Phlebotomus papatasi </em>(Sandfly; GCA_024763615.2)</li> <li><em>Plodia interpunctella </em>(Indianmeal moth; GCA_027563975.1)</li> <li><em>Spodoptera frugiperda</em> (Fall armyworm; GCA_023101765.3)</li> <li><em>Topomyia yanbarensis</em> (Mosquitos; GCA_030247195.1)</li> <li><em>Toxorhynchites rutilus septentrionalis</em> (Elephant mosquito; GCA_029784135.1)</li> <li><em>Uranotaenia lowii </em>(Sandfly; GCA_029784155.1)</li> <li><em>Wyeomyia smithii </em>(Pitcher plant mosquito; GCA_029784165.1)</li> <li><em>Zeugodacus cucurbitae </em>(Melon fly; GCA_028554725.2)</li> </ul> <p>The 
following genomes are no longer available:</p> <ul> <li><em>Athalia rosae</em> (Turnip sawfly; GCA_000344095.2)</li> <li><em>Drosophila yakuba</em> (Fruit fly; GCA_000005975.1)</li> <li><em>Melitaea cinxia</em> (Glanville fritillary; GCA_000716385.1)</li> </ul> <h2 class="wp-block-heading"><strong>Other updates and changes</strong></h2> <ul> <li>The latest InterProScan version (5.69-101.0) has been run on all species, including bacteria.</li> <li>Cross-references for all plant and vertebrate species are fully updated.</li> <li>UniProt references for all species are up-to-date.</li> <li>The <a href="https://github.com/Ensembl/ensembl" target="_blank" rel="noreferrer noopener">Ensembl core API</a> is now available on CPAN.</li> <li>Ensembl Genomes 37 (October 2017), Ensembl Genomes 40 (July 2018) and Ensembl 97 (July 2019) archives are now retired. Data are still available via the <a href="https://ftp.ensembl.org/pub/" target="_blank" rel="noreferrer noopener">Ensembl</a> and <a href="http://ftp.ensemblgenomes.org/pub/" target="_blank" rel="noreferrer noopener">Ensembl Genomes</a> FTP sites.</li> </ul> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15700</post-id> </item> <item> <title>What’s coming in Ensembl release 113 / Ensembl Genomes 60?</title> <link>https://www.ensembl.info/2024/08/13/whats-coming-in-ensembl-release-113-ensembl-genomes-60/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=whats-coming-in-ensembl-release-113-ensembl-genomes-60</link> <dc:creator><![CDATA[Louisse Paola Mirabueno]]></dc:creator> <pubDate>Tue, 13 Aug 2024 09:49:07 +0000</pubDate> <category><![CDATA[Release announcements]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl 97]]></category> <category><![CDATA[Ensembl Genomes]]></category> <category><![CDATA[Ensembl Genomes 37]]></category> <category><![CDATA[Ensembl Genomes 40]]></category> <category><![CDATA[Ensembl Metazoa]]></category> <category><![CDATA[Ensembl 
Plants]]></category> <category><![CDATA[Ensembl VEP]]></category> <category><![CDATA[Metazoa]]></category> <category><![CDATA[microarray]]></category> <category><![CDATA[Plants]]></category> <category><![CDATA[regulation]]></category> <category><![CDATA[REST API]]></category> <category><![CDATA[VEP]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15617</guid> <description><![CDATA[We expect to release Ensembl 113 and Ensembl Genomes 60 in September October 2024. Below is a list of updates that we are hoping to include in the upcoming release. However, please note that we cannot guarantee everything listed here will make it into the final release. Vertebrates GENCODE The Havana team is planning to [&#8230;]]]></description> <content:encoded><![CDATA[ <p>We expect to release Ensembl 113 and Ensembl Genomes 60 in <s>September</s> October 2024. Below is a list of updates that we are hoping to include in the upcoming release. However, please note that we cannot guarantee everything listed here will make it into the final release.</p> <span id="more-15617"></span> <h2 class="wp-block-heading">Vertebrates</h2> <h3 class="wp-block-heading">GENCODE</h3> <p>The <a href="https://www.ensembl.org/info/genome/genebuild/manual_havana.html" target="_blank" rel="noreferrer noopener">Havana</a> team is planning to integrate a large number of long non-coding RNA (lncRNA) transcripts originating from the Capture Long-read Sequencing (CLS) project within the <a href="https://www.gencodegenes.org/" target="_blank" rel="noreferrer noopener">GENCODE</a> consortium. The number of annotated lncRNA transcripts in <em>Homo sapiens</em> (Human) and <em>Mus musculus </em>(Mouse) will increase by roughly 130,000 in each species in release 113. We expect approximately 20,000 new lncRNA genes.</p> <p>The default <em>Gene</em> track is changing from <em>GENCODE Comprehensive</em> to <em>GENCODE Basic</em> to enable the display of the increased number of transcripts. 
Additionally, the <em>GENCODE Primary</em> tag will be introduced. This is a new transcript subset which covers all human exons in a minimal set of transcripts. The tag will be represented in REST API responses and GFF3/GTF files as <code>gencode_primary</code>. The <em>GENCODE Basic</em> tag (currently labelled as <code>basic</code>) will be changed to the tag <code>gencode_basic</code>.</p> <h3 class="wp-block-heading">Havana manual annotation</h3> <p>Updated Havana manual gene annotation will become available for <em>H. sapiens</em> (Human; GRCh38) and <em>M. musculus</em> (Mouse). Moreover, the major histocompatibility complex (MHC) genes in <em>Rattus norvegicus</em> (Norway rat; mRatBN7.2) and <em>Sus scrofa</em> (Pig; Sscrofa11.1) have been manually updated together with additional immune genes.&nbsp;</p> <h3 class="wp-block-heading">New gene annotation</h3> <p>Additional genome assemblies and gene annotation for breeds will be added for existing species:</p> <ul> <li><em>Capra hircus </em>(Goat): 2 breeds</li> <li><em>Ovis aries </em>(Sheep): 8 breeds</li> <li><em>Sus scrofa </em>(Pig): 8 breeds</li> </ul> <h3 class="wp-block-heading">Variation data</h3> <p>Our <em>O. aries</em> (Sheep; ARS-UI_Ramb_v2.0) displays have been updated to show variants from the <a href="https://www.ebi.ac.uk/eva/" target="_blank" rel="noreferrer noopener">European Variation Archive (EVA)</a> release 5.</p> <p>To simplify variant displays during the large increase in <em>H. sapiens </em>(Human; GRCh38) transcripts driven by the ongoing incorporation of models from long-read sequencing data, we will only display pre-calculated transcript consequences for the <em>GENCODE Primary</em> set, which includes all exons on this assembly. The <em>H. sapiens </em>(Human; GRCh37) displays will remain unchanged. 
We will provide a VCF file on the FTP site containing variant annotations for the full set of GRCh38 transcripts.</p> <p>The Ensembl Variant Effect Predictor (VEP) will support the <em>GENCODE Primary</em> transcript set to enable annotation of all potential variant consequences without duplication across multiple transcripts. The following will be supported:</p> <ul> <li>optionally limiting Ensembl VEP annotation to this set</li> <li>reporting the flag in Ensembl VEP output (as is currently done for <a href="https://www.ensembl.org/info/genome/genebuild/mane.html">MANE transcripts</a>) as a boolean</li> </ul> <p>GnomAD population allele frequency data for Human will be updated to v4.1. This will be available for the website, command-line and REST API instances of Ensembl VEP. We are also making REVEL and ClinPred scores available on the website and REST API version of Ensembl VEP for the Human GRCh38 assembly.</p> <h3 class="wp-block-heading">Regulation</h3> <p>Regulatory annotation for <em>H. sapiens </em>(Human) and <em>M. musculus </em>(Mouse) will receive a major update. This will include regulatory features, the set of epigenomes (tissues and cell lines) and the associated regulatory activity. We will be retiring the Transcription Factor Binding Sites (TFBS) regulatory feature from Human and Mouse. We will also update the motif features for Human and add motif feature annotation to Mouse.&nbsp;</p> <p>The format of ENSR IDs will be updated to use additional characters (capital letters and the underscore symbol <code>_</code>). This affects all species. 
Additionally, regulatory annotation will be updated for all species to address minor inconsistencies, add further data and introduce the new ENSR IDs.</p> <p>Microarray annotation will be retired for the following existing species:</p> <ul> <li><em>Anas platyrhynchos </em>(Mallard)</li> <li><em>Aotus nancymaae</em> (Nancy Ma&#8217;s night monkey)</li> <li><em>Callithrix jacchus</em> (Common marmoset)</li> <li><em>Carlito syrichta</em> (Philippine tarsier)</li> <li><em>Cavia porcellus</em> (Guinea pig)</li> <li><em>Cercocebus atys</em> (Sooty mangabey)</li> <li><em>Colobus angolensis palliatus</em> (Angolan colobus)</li> <li><em>Cricetulus griseus</em> (Chinese hamster) CHOK1GS</li> <li><em>Cricetulus griseus </em>(Chinese hamster) CriGri</li> <li><em>Cyprinodon variegatus</em> (Sheepshead minnow)</li> <li><em>Fundulus heteroclitus </em>(Mummichog)</li> <li><em>Ictalurus punctatus</em> (Channel catfish)</li> <li><em>Mandrillus leucophaeus</em> (Drill)</li> <li><em>Mesocricetus auratus</em> (Golden hamster)</li> <li><em>Microcebus murinus</em> (Gray mouse lemur)</li> <li><em>Mus spretus </em>(Algerian mouse)</li> <li><em>Nannospalax galili</em> (Middle East blind mole-rat)</li> <li><em>Nomascus leucogenys</em> (Northern white-cheeked gibbon)</li> <li><em>Ornithorhynchus anatinus</em> (Platypus)</li> <li><em>Papio anubis </em>(Olive baboon)</li> <li><em>Piliocolobus tephrosceles</em> (Ugandan red colobus)</li> <li><em>Prolemur simus</em> (Greater bamboo lemur)</li> <li><em>Propithecus coquereli</em> (Coquerel&#8217;s sifaka)</li> <li><em>Rhinopithecus bieti</em> (Black-and-white snub-nosed monkey)</li> <li><em>Rhinopithecus roxellana</em> (Golden snub-nosed monkey)</li> <li><em>Saimiri boliviensis boliviensis</em> (Black-capped squirrel monkey)</li> <li><em>Theropithecus gelada</em> (Gelada)</li> </ul> <p>The following REST API regulation endpoints will be retired:</p> <pre class="wp-block-code"><code>GET regulatory/species/:species/id/:id<br>GET 
regulatory/species/:species/epigenome<br>GET regulatory/species/:species/microarray/:microarray/vendor/:vendor<br>GET regulatory/species/:species/microarray<br>GET regulatory/species/:species/microarray/:microarray/probe/:probe<br>GET regulatory/species/:species/microarray/:microarray/probe_set/:probe_set</code></pre> <p>In addition, the <code>other_regulatory</code> and <code>array_probe</code> options in the feature parameter will be removed from the following endpoints:</p> <pre class="wp-block-code"><code>GET overlap/region/:species/:region<br>GET overlap/id/:id</code></pre> <h2 class="wp-block-heading">Ensembl Plants</h2> <p>Additional variation data from the <a href="https://wwwg2b.com/watkins" target="_blank" rel="noreferrer noopener">Watkins collection</a> will be added for <em>Triticum aestivum</em> (Common wheat; GCA_900519105.1).</p> <p>Genome assemblies and gene annotation will be updated for the following species:</p> <ul> <li><em>Marchantia polymorpha</em> (Common liverwort; GCA_039105155.1)</li> <li><em>Triticum aestivum</em> (Common wheat; Paragon v2; GCA_949126075.1)</li> </ul> <p>Additional genome assemblies and gene annotation will be added for the following existing species:</p> <ul> <li><em>Brassica rapa </em>(Field mustard; GCA_900412535.3)</li> </ul> <p>Genome assemblies and gene annotation will be added for the following new species:</p> <ul> <li><em>Arachis hypogaea</em> (Peanut; GCA_003086295.3)</li> <li><em>Lathyrus sativus </em>(Grass pea; GCA_963859935.3)</li> <li><em>Triticum timopheevii </em>(Sanduri wheat; GCA_963921465.1)</li> </ul> <h2 class="wp-block-heading">Ensembl Metazoa</h2> <p>A new annotation source will be added for the existing <em>Culex quinquefasciatus </em>(Southern house mosquito; GCA_015732765.1) assembly.</p> <p>Additional genome assemblies and gene annotation will be added for existing species:</p> <ul> <li><em>Anopheles coluzzii </em>(Mosquitos; GCA_943734685.1)</li> <li><em>Anopheles funestus </em>(African 
malaria mosquito; GCA_943734845.1)</li> <li><em>Lutzomyia longipalpis</em> (Sandfly; GCA_024334085.1)</li> <li><em>Phlebotomus papatasi </em>(Sandfly; GCA_024763615.2)</li> </ul> <p>Genome assemblies and gene annotation will be added for the following new species:</p> <ul> <li><em>Anastrepha ludens</em> (Mexican fruit fly; GCA_028408465.1)</li> <li><em>Anastrepha obliqua</em> (West Indian fruit fly; GCA_027943255.1)</li> <li><em>Bombus affinis</em> (Rusty patched bumble bee; GCA_024516045.2)</li> <li><em>Culex pipiens pallens </em>(Common house mosquito; GCA_016801865.2)</li> <li><em>Cylas formicarius</em> (Sweet potato weevil; GCA_029955315.1)</li> <li><em>Diorhabda carinulata</em> (Northern tamarisk beetle; GCA_026250575.1)</li> <li><em>Diorhabda sublineata</em> (Subtropical tamarisk beetle; GCA_026230105.1)</li> <li><em>Hylaeus anthracinus</em> (Anthricinan yellow-faced bee; GCA_026225885.1)</li> <li><em>Hylaeus volcanicus </em>(Volcano masked bee; GCA_026283585.1)</li> <li><em>Malaya genurostris </em>(Mosquitos; GCA_030247185.2)</li> <li><em>Microplitis demolitor </em>(Parasitoid wasp; GCA_026212275.2)</li> <li><em>Microplitis mediator </em>(Endoparasitoid wasp; GCA_029852145.1)</li> <li><em>Mytilus californianus </em>(California mussel; GCA_021869535.1)</li> <li><em>Phlebotomus argentipes</em> (Sandfly; GCA_947086385.1)</li> <li><em>Plodia interpunctella </em>(Indianmeal moth; GCA_027563975.1)</li> <li><em>Spodoptera frugiperda</em> (Fall armyworm; GCA_023101765.3)</li> <li><em>Topomyia yanbarensis</em> (Mosquitos; GCA_030247195.1)</li> <li><em>Toxorhynchites rutilus septentrionalis</em> (Elephant mosquito; GCA_029784135.1)</li> <li><em>Uranotaenia lowii </em>(Sandfly; GCA_029784155.1)</li> <li><em>Wyeomyia smithii </em>(Pitcher plant mosquito; GCA_029784165.1)</li> <li><em>Zeugodacus cucurbitae </em>(Melon fly; GCA_028554725.2)</li> </ul> <p>The following genomes will be removed:</p> <ul> <li><em>Athalia rosae</em> (Turnip sawfly; GCA_000344095.2)</li> 
<li><em>Drosophila yakuba</em> (Fruit fly; GCA_000005975.1)</li> <li><em>Melitaea cinxia</em> (Glanville fritillary; GCA_000716385.1)</li> </ul> <h2 class="wp-block-heading">Other updates and changes</h2> <ul> <li>The latest InterProScan version (5.69-101.0) will be run on all species, including those in Ensembl Bacteria</li> <li>Cross-references for all plant and vertebrate species will be fully updated</li> <li>UniProt references for all species will be updated</li> <li>Ensembl Genomes 37 (October 2017), Ensembl Genomes 40 (July 2018) and Ensembl 97 (July 2019) archives will be retired</li> </ul> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15617</post-id> </item> <item> <title>Cool stuff Ensembl VEP can do: supporting alternative human assemblies</title> <link>https://www.ensembl.info/2024/08/09/cool-stuff-ensembl-vep-can-do-supporting-alternative-human-assemblies/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=cool-stuff-ensembl-vep-can-do-supporting-alternative-human-assemblies</link> <dc:creator><![CDATA[Aleena Mushtaq]]></dc:creator> <pubDate>Fri, 09 Aug 2024 07:22:00 +0000</pubDate> <category><![CDATA[Ensembl VEP]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[variation]]></category> <category><![CDATA[VEP]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15608</guid> <description><![CDATA[The Ensembl VEP command-line tool can annotate and filter variants called against the latest human assemblies, including the telomere-to-telomere assembly of the CHM13 cell line (T2T-CHM13). In this blog post, we provide examples of how to run Ensembl VEP with these new assemblies and list the additional annotations supported via plugins. 
The Human Pangenome Reference [&#8230;]]]></description> <content:encoded><![CDATA[ <p>The Ensembl VEP command-line tool can annotate and filter variants called against the latest human assemblies, including the telomere-to-telomere assembly of the CHM13 cell line (T2T-CHM13). In this blog post, we provide examples of how to run Ensembl VEP with these new assemblies and list the additional annotations supported via plugins.</p> <span id="more-15608"></span> <p>The <a href="https://humanpangenome.org/" target="_blank" rel="noreferrer noopener">Human Pangenome Reference Consortium (HPRC)</a> aims to sequence 350 individuals of diverse ancestries, producing a pangenome of 700 haplotypes by the end of 2024. The first publication (<a href="https://www.nature.com/articles/s41586-023-05896-x" target="_blank" rel="noreferrer noopener"><em>A draft human pangenome reference</em></a>) describes 47 phased, diploid assemblies from a cohort of genetically diverse individuals.</p> <p>We have annotated genes on these human assemblies, based on Ensembl/<a href="https://www.gencodegenes.org/human/release_38.html" target="_blank" rel="noreferrer noopener">GENCODE 38</a> genes and transcripts, via a new mapping pipeline as detailed in the Methods section of <a href="https://www.nature.com/articles/s41586-023-05896-x#Sec59" target="_blank" rel="noreferrer noopener"><em>A draft human pangenome reference</em></a>. The links to download and visualise the human annotations for HPRC assemblies are summarised in the <a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC data page</a>.</p> <h1 class="wp-block-heading">Running Ensembl VEP with HPRC assemblies</h1> <p>Currently, Ensembl VEP can only be run with HPRC assemblies in <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_offline">offline mode</a> and one assembly at a time. 
There are two ways to use Ensembl VEP with HPRC assemblies:</p> <ul> <li>Using <strong>Ensembl VEP cache</strong> with (recommended) <strong>FASTA sequence</strong> (the most efficient way)</li> <li>Using <strong>GTF annotation</strong> with (mandatory) <strong>FASTA sequence</strong></li> </ul> <p>In the examples below, we demonstrate annotating variants on the <strong>T2T-CHM13v2.0 </strong>(<a href="https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/" target="_blank" rel="noreferrer noopener">GCA_009914755.4</a>) assembly. To create a sample VCF to use in the examples below, you can take the first 100 lines from the ClinVar VCF file mapped to T2T-CHM13:</p> <pre class="wp-block-code"><code>clinvar=https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/2024_07/clinvar_20240624_GCA_009914755.4.vcf.gz<br>tabix -h $clinvar 1 | head -n 100 &gt; test.vcf</code></pre> <h2 class="wp-block-heading">Ensembl VEP cache</h2> <p>The <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache" target="_blank" rel="noreferrer noopener">cache</a> is a downloadable archive containing all transcript models for an assembly; it may also contain regulatory features and variant data.<br>Let’s start by downloading the Ensembl VEP cache (available for each annotation by clicking on <strong>VEP cache</strong> in the <a href="https://projects.ensembl.org/hprc/">Ensembl HPRC data page</a>) and extracting it to the default Ensembl VEP directory. In the case of T2T-CHM13:</p> <pre class="wp-block-code"><code>cd $HOME/.vep<br>curl -O https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/indexed_vep_cache/Homo_sapiens-GCA_009914755.4-2022_10.tar.gz<br>tar xzf Homo_sapiens-GCA_009914755.4-2022_10.tar.gz</code></pre> <p>This will create the folder homo_sapiens_gca009914755v4/107_T2T-CHM13v2.0 with the gene data required to run Ensembl VEP. 
The name of this folder encodes the values you need when running VEP:</p> <ul> <li>Species: homo_sapiens_gca009914755v4</li> <li>Cache version: 107</li> <li>Assembly: T2T-CHM13v2.0</li> </ul> <p>As well as molecular consequence predictions, many gene/transcript-based <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html" target="_blank" rel="noreferrer noopener">Ensembl VEP options</a> are supported for HPRC assemblies:</p> <pre class="wp-block-code"><code>vep -i test.vcf --offline --species homo_sapiens_gca009914755v4 --cache_version 107 --fasta Homo_sapiens-GCA_009914755.4-softmasked.fa.gz --domains --symbol --canonical --protein --biotype --uniprot --variant_class</code></pre> <p>The cache does not contain other annotations, such as RefSeq transcripts or variant information.&nbsp;</p> <p>To run Ensembl VEP with the downloaded cache in offline mode, please specify the species (which includes the assembly identifier) and the cache version:</p> <pre class="wp-block-code"><code>vep -i test.vcf --offline --species homo_sapiens_gca009914755v4 --cache_version 107</code></pre> <h2 class="wp-block-heading">FASTA sequence</h2> <p>When using the Ensembl VEP cache, supplying the reference genomic sequence in a FASTA file is optional, but it is required to enable the following options:</p> <ul> <li>Create HGVS notations (<a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_hgvs" target="_blank" rel="noreferrer noopener">--hgvs</a> and <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_hgvsg" target="_blank" rel="noreferrer noopener">--hgvsg</a>)</li> <li>Check the reference sequence given in input data (<a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_check_ref" target="_blank" rel="noreferrer noopener">--check_ref</a>)</li> </ul> <p>Genomic FASTA files can be found in <a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC 
data page</a> <strong>&gt; FTP dumps</strong> <strong>&gt; ensembl &gt; genome</strong>. FASTA files need to be either uncompressed or compressed with <strong>bgzip</strong> (recommended) to be compatible with VEP. For instance, to download a compressed FASTA file, uncompress it and then re-compress it with bgzip:</p> <pre class="wp-block-code"><code>curl -O https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/genome/Homo_sapiens-GCA_009914755.4-softmasked.fa.gz<br>gzip -d Homo_sapiens-GCA_009914755.4-softmasked.fa.gz<br>bgzip Homo_sapiens-GCA_009914755.4-softmasked.fa</code></pre> <p>Afterwards, you can run VEP using the cache and the <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_fasta" target="_blank" rel="noreferrer noopener">--fasta</a> flag:</p> <pre class="wp-block-code"><code>vep -i test.vcf --offline --species homo_sapiens_gca009914755v4 --cache_version 107 --fasta Homo_sapiens-GCA_009914755.4-softmasked.fa.gz</code></pre> <p>Visit the documentation for more information on <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#fasta" target="_blank" rel="noreferrer noopener">using FASTA files with VEP</a>.</p> <h2 class="wp-block-heading">GTF and GFF annotation</h2> <p>As an alternative to using cache files, Ensembl VEP can utilise gene information in appropriately indexed GTF or GFF files. GTF and GFF files can be downloaded from the annotation column in the <a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC data page</a>. The data needs to be re-sorted in chromosomal order, compressed with <strong>bgzip</strong> and indexed with <strong>tabix</strong>. 
We present here the example for a GTF file:</p> <pre class="wp-block-code"><code>curl -O https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/geneset/2022_07/Homo_sapiens-GCA_009914755.4-2022_07-genes.gtf.gz<br>gzip -d Homo_sapiens-GCA_009914755.4-2022_07-genes.gtf.gz<br>grep -v "#" Homo_sapiens-GCA_009914755.4-2022_07-genes.gtf | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c &gt; Homo_sapiens-GCA_009914755.4-2022_07-genes.gtf.gz<br>tabix Homo_sapiens-GCA_009914755.4-2022_07-genes.gtf.gz</code></pre> <p>FASTA files are <strong>always</strong> required when running HPRC data with GTF or GFF annotation, as the transcript sequences are not available in these files.</p> <p>Afterwards, you can run Ensembl VEP using the GTF and FASTA files:</p> <pre class="wp-block-code"><code><strong>vep</strong> -i test.vcf --gtf Homo_sapiens-GCA_009914755.4-2022_07-genes.gtf.gz --fasta Homo_sapiens-GCA_009914755.4-softmasked.fa.gz</code></pre> <p>Check our documentation for more information on using <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff" target="_blank" rel="noreferrer noopener">GTF and GFF annotation</a>.</p> <h2 class="wp-block-heading">Missense deleteriousness predictions&nbsp;</h2> <p>Using our <a href="https://github.com/Ensembl/ensembl-variation/tree/main/nextflow/ProteinFunction" target="_blank" rel="noreferrer noopener">ProteinFunction pipeline</a>, we ran PolyPhen-2 2.2.3 and SIFT 6.2.1 on the proteome sequences for GRCh38 and all HPRC assemblies (the protein FASTA files indicated in the <a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC data page</a>) and stored the results in a single SQLite file: <a href="https://ftp.ensembl.org/pub/current_variation/pangenomes/Human/homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db" target="_blank" rel="noreferrer noopener">homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db</a>.</p> <p>Pre-computed scores and predictions can be retrieved 
by downloading this file and running VEP with the <a href="https://github.com/Ensembl/VEP_plugins/blob/main/PolyPhen_SIFT.pm" target="_blank" rel="noreferrer noopener">PolyPhen_SIFT plugin</a>:</p> <pre class="wp-block-code"><code><strong>curl </strong>-O https://ftp.ensembl.org/pub/current_variation/pangenomes/Human/homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db</code></pre> <pre class="wp-block-code"><code><strong>vep</strong> -i test.vcf --offline --species homo_sapiens_gca009914755v4 --cache_version 107 --fasta Homo_sapiens-GCA_009914755.4-softmasked.fa.gz --plugin PolyPhen_SIFT,db=homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db</code></pre> <h2 class="wp-block-heading">Matched variant annotations (ClinVar, gnomAD and dbSNP)</h2> <p>We don’t have variant data in the Ensembl VEP caches for the pangenome assemblies, but it can be integrated using the <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html" target="_blank" rel="noreferrer noopener">--custom</a> option with data files using the same assembly coordinates. 
We have lifted over some key datasets, including ClinVar and gnomAD, to the HPRC assemblies (downloadable from the VCF column in the <a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC data page</a>).</p> <p><em># Download ClinVar data and respective index (TBI)</em></p> <pre class="wp-block-code"><code><strong>curl </strong>-O https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/2024_07/clinvar_20240624_GCA_009914755.4.vcf.gz</code></pre> <pre class="wp-block-code"><code><strong>curl </strong>-O https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/variation/2022_10/vcf/2024_07/clinvar_20240624_GCA_009914755.4.vcf.gz.tbi</code></pre> <p><em># Run VEP with ClinVar data</em></p> <pre class="wp-block-code"><code>vep -i test.vcf --offline \<br>--species homo_sapiens_gca009914755v4 --cache_version 107 \<br>--fasta Homo_sapiens-GCA_009914755.4-softmasked.fa.gz \<br>--custom file=clinvar_20240624_GCA_009914755.4.vcf.gz,short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN</code></pre> <h2 class="wp-block-heading">Additional annotations&nbsp;</h2> <p>Ensembl VEP plugins are a simple way to add new functionality to your analysis. 
Many require data that is only available for GRCh37 or GRCh38, but others, for example those based on gene attributes or on-the-fly analysis, are compatible with the HPRC assemblies.</p> <p>Here is a list of compatible Ensembl VEP plugins that can be easily used with HPRC assemblies:</p> <figure class="wp-block-table has-small-font-size"><table class="has-fixed-layout"><thead><tr><th><strong>Plugin</strong></th><th><strong>Description</strong></th><th><strong>Plugin data</strong></th><th><strong>Usage example</strong></th></tr></thead><tbody><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/Blosum62.pm" target="_blank" rel="noreferrer noopener">Blosum62</a></td><td>Looks up the BLOSUM 62 substitution matrix score for the reference and alternative amino acids predicted for a missense mutation.</td><td></td><td>--plugin Blosum62</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/DosageSensitivity.pm" target="_blank" rel="noreferrer noopener">DosageSensitivity</a></td><td>Retrieves haploinsufficiency and triplosensitivity probability scores for affected genes (<a href="https://doi.org/10.1016/j.cell.2022.06.036">Collins <em>et al.</em>, 2022</a>).</td><td><a href="https://zenodo.org/record/6347673/files/Collins_rCNV_2022.dosage_sensitivity_scores.tsv.gz" target="_blank" rel="noreferrer noopener">Collins_rCNV_2022.dosage_sensitivity_scores.tsv.gz</a></td><td>--plugin DosageSensitivity,file=Collins_rCNV_2022.dosage_sensitivity_scores.tsv.gz</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/Downstream.pm" target="_blank" rel="noreferrer noopener">Downstream</a></td><td>Predicts downstream effects of a frameshift variant on the protein sequence of a transcript.</td><td>Requires a FASTA file provided via the <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_fasta" target="_blank" rel="noreferrer noopener">--fasta</a> option</td><td>--plugin 
Downstream</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/Draw.pm" target="_blank" rel="noreferrer noopener">Draw</a></td><td>Draws pictures of the transcript model showing the variant location.</td><td></td><td>--plugin Draw</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/GeneSplicer.pm" target="_blank" rel="noreferrer noopener">GeneSplicer</a></td><td>Runs <a href="https://ccb.jhu.edu/software/genesplicer/">GeneSplicer</a> to get splice site predictions.</td><td>Binary and training data for GeneSplicer (<a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#GeneSplicer" target="_blank" rel="noreferrer noopener">plugin instructions</a>)</td><td>--plugin GeneSplicer,binary=genesplicer/bin/linux/genesplicer,training=genesplicer/human</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/GO.pm" target="_blank" rel="noreferrer noopener">GO</a></td><td>Retrieves Gene Ontology (GO) terms associated with genes (for HPRC assemblies, specifically) using custom GFF annotation containing GO terms.</td><td><a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC data page</a><strong> &gt; FTP dumps &gt; ensembl &gt; variation &gt; [date] &gt; gff:</strong><br>*_GO_plugin.gff.gz<br>*_GO_plugin.gff.gz.tbi</td><td>--plugin GO,file=homo_sapiens_gca009914755v4_110_VEP_GO_plugin.gff.gz</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/HGVSIntronOffset.pm" target="_blank" rel="noreferrer noopener">HGVSIntronOffset</a></td><td>Returns HGVS intron start and end offsets. 
To be used with the <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_hgvs">--hgvs</a> option.</td><td></td><td>--plugin HGVSIntronOffset</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/LoFtool.pm" target="_blank" rel="noreferrer noopener">LoFtool</a></td><td>Provides a rank of genic intolerance and consequent susceptibility to disease based on the ratio of loss-of-function (LoF) to synonymous mutations for each gene.</td><td></td><td>--plugin LoFtool</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/MaxEntScan.pm" target="_blank" rel="noreferrer noopener">MaxEntScan</a></td><td>Runs <a href="http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html">MaxEntScan</a> to get splice site predictions.</td><td>Extracted directory from <a href="http://hollywood.mit.edu/burgelab/maxent/download/fordownload.tar.gz" target="_blank" rel="noreferrer noopener">fordownload.tar.gz</a></td><td>--plugin MaxEntScan,/path/to/fordownload</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/NearestExonJB.pm" target="_blank" rel="noreferrer noopener">NearestExonJB</a></td><td>Finds the nearest exon junction boundary to a coding sequence variant.</td><td></td><td>--plugin NearestExonJB</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/NMD.pm" target="_blank" rel="noreferrer noopener">NMD</a></td><td>Predicts if a variant allows the transcript to escape nonsense-mediated mRNA decay based on certain rules.</td><td></td><td>--plugin NMD</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/Phenotypes.pm" target="_blank" rel="noreferrer noopener">Phenotypes</a></td><td>Retrieves overlapping phenotype information.</td><td><a href="https://projects.ensembl.org/hprc/" target="_blank" rel="noreferrer noopener">Ensembl HPRC data page</a><strong> &gt; FTP dumps &gt; ensembl &gt; variation &gt; [date] &gt; 
gff:</strong><br>*_phenotypes_plugin.gvf.gz<br>*_phenotypes_plugin.gvf.gz.tbi</td><td>--plugin Phenotypes,file=homo_sapiens_gca009914755v4_110_VEP_phenotypes_plugin.gvf.gz</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/pLI.pm" target="_blank" rel="noreferrer noopener">pLI</a></td><td>Adds the probability of a gene being loss-of-function intolerant (pLI).</td><td></td><td>--plugin pLI</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/PolyPhen_SIFT.pm" target="_blank" rel="noreferrer noopener">PolyPhen_SIFT</a></td><td>Retrieves PolyPhen and SIFT predictions from a SQLite database.</td><td><a href="https://ftp.ensembl.org/pub/current_variation/pangenomes/Human/homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db" target="_blank" rel="noreferrer noopener">homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db</a></td><td>--plugin PolyPhen_SIFT,db=homo_sapiens_pangenome_PolyPhen_SIFT_20240502.db</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/ProteinSeqs.pm" target="_blank" rel="noreferrer noopener">ProteinSeqs</a></td><td>Writes two files with the reference and mutated protein sequences of any proteins found with non-synonymous mutations in the input file.</td><td></td><td>--plugin ProteinSeqs</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/SingleLetterAA.pm" target="_blank" rel="noreferrer noopener">SingleLetterAA</a></td><td>Returns HGVSp string with single amino acid letter codes.</td><td></td><td>--plugin SingleLetterAA</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/SpliceRegion.pm" target="_blank" rel="noreferrer noopener">SpliceRegion</a></td><td>Provides more granular predictions of splicing effects.</td><td></td><td>--plugin SpliceRegion</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/SubsetVCF.pm" target="_blank" rel="noreferrer noopener">SubsetVCF</a></td><td>Retrieves overlapping 
records from a given VCF file.</td><td>A VCF file</td><td>--plugin SubsetVCF,file=file.vcf.gz,name=myvfc</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/TranscriptAnnotator.pm" target="_blank" rel="noreferrer noopener">TranscriptAnnotator</a></td><td>Annotates variant-transcript pairs based on a given file.</td><td>Tab-separated annotation file (<a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#TranscriptAnnotator" target="_blank" rel="noreferrer noopener">plugin instructions</a>)</td><td>--plugin TranscriptAnnotator,file=annotation.txt.gz</td></tr><tr><td><a href="https://github.com/Ensembl/VEP_plugins/blob/main/TSSDistance.pm" target="_blank" rel="noreferrer noopener">TSSDistance</a></td><td>Calculates the distance from the transcription start site for upstream variants.</td><td></td><td>--plugin TSSDistance</td></tr></tbody></table></figure> <p>Written by: Nuno Agostinho, Jamie Allen and Sarah Hunt. Edited by Aleena Mushtaq. 
</p> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15608</post-id> </item> <item> <title>Getting to know us: Reham from Ensembl Havana</title> <link>https://www.ensembl.info/2024/07/31/getting-to-know-us-reham-from-ensembl-havana/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=getting-to-know-us-reham-from-ensembl-havana</link> <dc:creator><![CDATA[Louisse Paola Mirabueno]]></dc:creator> <pubDate>Wed, 31 Jul 2024 13:23:26 +0000</pubDate> <category><![CDATA[Community]]></category> <category><![CDATA[annotation]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[career]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Havana]]></category> <category><![CDATA[manual annotation]]></category> <category><![CDATA[manual curation]]></category> <category><![CDATA[Teamsembl]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15581</guid> <description><![CDATA[In July, we talked to Reham Fatima, a member of the Havana team. The Havana team carry out manual annotation on human, mouse, zebrafish and rat genomes. Reham has contributed to the MANE and GENCODE projects. Read more about Reham&#8217;s work at EMBL-EBI and what kind of cuisine she likes the most. You can also [&#8230;]]]></description> <content:encoded><![CDATA[ <p>In July, we talked to <a href="https://www.ebi.ac.uk/people/person/reham-fatima/" target="_blank" rel="noreferrer noopener">Reham Fatima</a>, a member of the Havana team. The Havana team carry out manual annotation on human, mouse, zebrafish and rat genomes. Reham has contributed to the MANE and GENCODE projects. Read more about Reham&#8217;s work at <a href="https://www.ebi.ac.uk/" target="_blank" rel="noreferrer noopener">EMBL-EBI</a> and what kind of cuisine she likes the most. 
You can also see some amazing photographs she has taken in her free time.</p> <span id="more-15581"></span> <h2 class="wp-block-heading">What do you enjoy the most about your job?</h2> <p>I enjoy both interacting with smart and kind colleagues, as well as the interdisciplinarity of my role itself: applying programming and performing statistical analyses on genetic data.</p> <h2 class="wp-block-heading">What is your favourite thing about working at EMBL-EBI?</h2> <p>The multicultural environment and international exposure. Not only do people bring a variety of ideas from all over the world, I have learnt a lot about their cultural backgrounds and home countries.</p> <h2 class="wp-block-heading">Can you share a project or accomplishment that you consider to be the most significant in your career at Ensembl so far?</h2> <p>I would take pride in my work on both contributing to <a href="https://www.ensembl.org/info/genome/genebuild/mane.html" target="_blank" rel="noreferrer noopener">Matched Annotation between NCBI and EMBL-EBI (MANE)</a> and my current work on GENCODE Primary.</p> <h2 class="wp-block-heading">What inspired you to pursue a career in computational biology / bioinformatics?</h2> <p>I had wanted to be a scientist since I was young. It was a bit of a struggle back home as the most “acceptable” jobs at that time were either medical doctor or engineer. I did not feel I was cut out for either of them individually. I had always been intrigued with using different disciplines to solve the problems of another one. It was initially a struggle to convince my father, who was financing my education at that time. 
But for every job I got after my degrees, he had been immensely proud, and I would feel that was a cherry on top of my scientific job itself.</p> <figure class="wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex"> <figure class="wp-block-image size-large"><img decoding="async" width="768" height="1024" data-id="15595" src="https://www.ensembl.info/wp-content/uploads/2024/07/IMG_6376-1-768x1024.jpg" alt="" class="wp-image-15595" srcset="https://www.ensembl.info/wp-content/uploads/2024/07/IMG_6376-1-768x1024.jpg 768w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_6376-1-225x300.jpg 225w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_6376-1-1152x1536.jpg 1152w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_6376-1-1536x2048.jpg 1536w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_6376-1-scaled.jpg 1920w" sizes="(max-width: 768px) 100vw, 768px" /><figcaption class="wp-element-caption">A photograph of Reham in front of King&#8217;s College in Cambridge, UK.</figcaption></figure> <figure class="wp-block-image size-large"><img decoding="async" width="576" height="1024" data-id="15596" src="https://www.ensembl.info/wp-content/uploads/2024/07/VideoCapture_20240521-220345-1-576x1024.jpg" alt="" class="wp-image-15596" srcset="https://www.ensembl.info/wp-content/uploads/2024/07/VideoCapture_20240521-220345-1-576x1024.jpg 576w, https://www.ensembl.info/wp-content/uploads/2024/07/VideoCapture_20240521-220345-1-169x300.jpg 169w, https://www.ensembl.info/wp-content/uploads/2024/07/VideoCapture_20240521-220345-1-768x1365.jpg 768w, https://www.ensembl.info/wp-content/uploads/2024/07/VideoCapture_20240521-220345-1-864x1536.jpg 864w, https://www.ensembl.info/wp-content/uploads/2024/07/VideoCapture_20240521-220345-1.jpg 1080w" sizes="(max-width: 576px) 100vw, 576px" /><figcaption class="wp-element-caption">A photograph of Reham near Khor al Udaid in 
Qatar.</figcaption></figure> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="768" data-id="15592" src="https://www.ensembl.info/wp-content/uploads/2024/07/IMG_20230117_002242-1-1024x768.jpg" alt="" class="wp-image-15592" srcset="https://www.ensembl.info/wp-content/uploads/2024/07/IMG_20230117_002242-1-1024x768.jpg 1024w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_20230117_002242-1-300x225.jpg 300w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_20230117_002242-1-768x576.jpg 768w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_20230117_002242-1-1536x1152.jpg 1536w, https://www.ensembl.info/wp-content/uploads/2024/07/IMG_20230117_002242-1.jpg 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A photograph of Reham in Cambridge, UK.</figcaption></figure> </figure> <h2 class="wp-block-heading">What is your favourite way to relax and unwind after a long day at work?</h2> <p>Have a lovely cup of chai and a savoury snack with either watching one of my favourite TV shows or talking to a good friend.</p> <h2 class="wp-block-heading">What is your favourite type of cuisine and what do you like about it?</h2> <p>Pakistani food, hands down. The taste, the combinations of spices and herbs (or sugar for sweets <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f600.png" alt="😀" class="wp-smiley" style="height: 1em; max-height: 1em;" />).</p> <h2 class="wp-block-heading">If you could have any superpower, what would it be and why?</h2> <p>Most likely flying. 
So I could fly to any country and back without queuing at airports and border controls <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f600.png" alt="😀" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15581</post-id> </item> <item> <title>Updated gene annotation for Rattus norvegicus (Norway rat)</title> <link>https://www.ensembl.info/2024/07/19/updated-gene-annotation-for-rattus-norvegicus-norway-rat/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=updated-gene-annotation-for-rattus-norvegicus-norway-rat</link> <dc:creator><![CDATA[Louisse Paola Mirabueno]]></dc:creator> <pubDate>Fri, 19 Jul 2024 09:53:11 +0000</pubDate> <category><![CDATA[New data and web features]]></category> <category><![CDATA[annotation]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl Rapid Release]]></category> <category><![CDATA[Genebuild]]></category> <category><![CDATA[Havana]]></category> <category><![CDATA[Norway rat]]></category> <category><![CDATA[rat]]></category> <category><![CDATA[Rattus norvegicus]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15569</guid> <description><![CDATA[We have made updates to the Rattus norvegicus (Norway rat) gene annotation. The GRCr8 assembly is now the reference and mRatBN7.2 has been enhanced with manual annotation. GRCr8 is now the reference We are excited to announce the release of the Ensembl automated annotation for the latest Rattus norvegicus (Norway rat) reference assembly from the [&#8230;]]]></description> <content:encoded><![CDATA[ <p>We have made updates to the <em>Rattus norvegicus</em> (Norway rat) gene annotation. 
The GRCr8 assembly is now the reference and mRatBN7.2 has been enhanced with manual annotation.</p> <span id="more-15569"></span> <h1 class="wp-block-heading">GRCr8 is now the reference</h1> <p>We are excited to announce the release of the Ensembl automated annotation for the latest <em>Rattus norvegicus </em>(Norway rat) reference assembly from the <a href="https://www.ncbi.nlm.nih.gov/grc" target="_blank" rel="noreferrer noopener">Genome Reference Consortium (GRC)</a>! The new assembly, GRCr8 (<a href="https://www.ebi.ac.uk/ena/browser/view/GCA_036323735" target="_blank" rel="noreferrer noopener">GCA_036323735.1</a>), which became available in January 2024, replaces mRatBN7.2 (<a href="https://www.ebi.ac.uk/ena/browser/view/GCA_015227675.2" target="_blank" rel="noreferrer noopener">GCA_015227675.2</a>) as the reference assembly for this species.</p> <p>The latest genome, which is now available in <a href="https://rapid.ensembl.org/Rattus_norvegicus_GCA_036323735.1/Info/Index" target="_blank" rel="noreferrer noopener">Ensembl Rapid Release</a>, was annotated via the Ensembl automated annotation system. More details on the annotation can be found via our <a href="https://rapid.ensembl.org/info/genome/genebuild/full_genebuild.html" target="_blank" rel="noreferrer noopener">Rapid Release documentation</a>.</p> <h2 class="wp-block-heading">mRatBN7.2 has been enhanced with manual annotation</h2> <p>Additionally, the previous assembly, mRatBN7.2 (GCA_015227675.2), has been significantly enhanced with manual annotation by the HAVANA team. 
The updated annotation is available in <a href="https://rapid.ensembl.org/Rattus_norvegicus_GCA_015227675.2/Info/Index" target="_blank" rel="noreferrer noopener">Ensembl Rapid Release 62</a>.</p> <p>The HAVANA team manually updated the annotation of over 200 community-selected genes, corrected automatic annotation errors such as incorrect splicing and incomplete transcripts, and expanded isoforms using long-read RNA-seq data. The entire major histocompatibility complex (MHC) region was also manually updated, adding 38 new genes in this region. This manual enrichment has ensured higher accuracy and completeness, which is essential for advanced genomic studies.</p> <h2 class="wp-block-heading">Future goals</h2> <p>The GRCr8 assembly currently features automatic annotation, primarily aimed at generating a conservative set of protein-coding gene models, with non-coding genes and pseudogenes also annotated. Looking ahead, a manually updated annotation of the GRCr8 assembly will be available soon. Manual annotation from mRatBN7.2 will be projected onto GRCr8, with additional isoforms incorporated from long-read RNA-seq data on Ensembl. We hope that this will provide the scientific community with a more accurate tool for advancing research and understanding of the <em>R.
norvegicus</em> genome.</p> <p>Stay tuned for further updates on our blog!</p> <div style="height:44px" aria-hidden="true" class="wp-block-spacer"></div> <p>Authors: Francesca Tricomi and Jose Gonzalez<br>Editors: Leanne Haggerty and Louisse Paola Mirabueno</p> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15569</post-id> </item> <item> <title>Getting to know us: Ian Tsang (visiting PhD student)</title> <link>https://www.ensembl.info/2024/06/28/getting-to-know-us-ian-tsang-visiting-phd-student/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=getting-to-know-us-ian-tsang-visiting-phd-student</link> <dc:creator><![CDATA[Louisse Paola Mirabueno]]></dc:creator> <pubDate>Fri, 28 Jun 2024 10:34:40 +0000</pubDate> <category><![CDATA[Community]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[career]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl Genomes]]></category> <category><![CDATA[Ensembl Plants]]></category> <category><![CDATA[Outreach]]></category> <category><![CDATA[Plants]]></category> <category><![CDATA[Teamsembl]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15545</guid> <description><![CDATA[Meet visiting PhD student Ian Tsang, who is doing a three-month work experience placement with the Ensembl Plants team. We talked all about his PhD, what projects he is working on during his time in Ensembl and what he does during his free time. What’s your PhD research about and why is it important? My [&#8230;]]]></description> <content:encoded><![CDATA[ <p>Meet visiting PhD student <a href="https://www.ebi.ac.uk/people/person/ian-tsang/" target="_blank" rel="noreferrer noopener">Ian Tsang</a>, who is doing a three-month work experience placement with the Ensembl Plants team. We talked all about his PhD, what projects he is working on during his time in Ensembl and what he does during his free time. 
</p> <span id="more-15545"></span> <div class="wp-block-columns is-layout-flex wp-container-core-columns-layout-1 wp-block-columns-is-layout-flex"> <div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow"> <div style="height:39px" aria-hidden="true" class="wp-block-spacer"></div> <figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" width="683" height="1024" src="https://www.ensembl.info/wp-content/uploads/2024/06/2024-06_placementIan-683x1024.png" alt="" class="wp-image-15552" style="width:383px;height:auto" srcset="https://www.ensembl.info/wp-content/uploads/2024/06/2024-06_placementIan-683x1024.png 683w, https://www.ensembl.info/wp-content/uploads/2024/06/2024-06_placementIan-200x300.png 200w, https://www.ensembl.info/wp-content/uploads/2024/06/2024-06_placementIan-768x1152.png 768w, https://www.ensembl.info/wp-content/uploads/2024/06/2024-06_placementIan.png 1000w" sizes="(max-width: 683px) 100vw, 683px" /><figcaption class="wp-element-caption">A portrait of Ian Tsang.</figcaption></figure> </div> <div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow"> <h2 class="wp-block-heading">What’s your PhD research about and why is it important?</h2> <p>My PhD project focusses on <em>Triticum aestivum</em> (wheat) trait genetics. Specifically, I work on investigating root hair development in wheat. Root hairs are very small, fine hair projections on root surfaces, and they are critical for nutrient and water uptake in plants. To date, most fundamental knowledge on root hair development has been in the model plant <em>Arabidopsis</em>, and translating this knowledge to a much more complex species, like wheat, is challenging. 
My work aims to investigate how root hair development occurs and is regulated in wheat, and to investigate the natural variation in root hair morphology across available wheat cultivars, with the ultimate goal of targeting root hairs as an exploitable trait for future wheat improvement.</p> </div> </div> <h2 class="wp-block-heading">Why did you choose to do a placement at Ensembl?</h2> <p>During my PhD, I discovered how much I enjoy bioinformatics and programming. Having come from a biology background, I wanted to experience what bioinformatics was like on an industrial scale before deciding whether to fully commit to a career in this field. For my PhD work, I used <a href="https://plants.ensembl.org/" target="_blank" rel="noreferrer noopener">Ensembl Plants</a> on a daily basis, and I always loved how many features were available and the wealth of information presented to users. When I had to decide where to spend my three-month placement, I knew that an opportunity to work at EMBL-EBI with the Ensembl Plants team would be extremely beneficial for my research and my future career. It would provide me with a new set of skills and help enhance my existing skills in Python and SQL, and also allow me to meet and learn from many bioinformaticians far more experienced than me!</p> <h2 class="wp-block-heading">What are you working on and how long is your placement at Ensembl?</h2> <p>During my three-month stay, I will be working on a few different projects. On the production side, I have been using internal pipelines to upload the new barley pangenome, which consists of 76 primary accessions, to Ensembl. On the side, I have been working on scripts/recipes to mine useful data from Ensembl Plants. As a user of Ensembl, I found that certain data I wanted to retrieve during my research were unavailable or difficult to access through the website.
The scripts I am writing will help other users access data in a high-throughput manner with next to no required knowledge of Python, SQL or APIs. I will also be working on rice pangene IDs and helping the Ensembl Outreach team with some teaching!</p> <h2 class="wp-block-heading">How did you get into bioinformatics?</h2> <p>As you can probably tell from my answer on my PhD research, my PhD is not bioinformatics-focussed! Before I started my PhD, I knew nothing about programming or bioinformatics &#8211; I only knew how to use ggplot2 in R. At the very beginning of my PhD, I inherited a large amount of wheat exome and RNA-seq data, and I had to spend the best part of my first year learning bioinformatics, including how to use the HPC and Python to process and analyse these data. I very quickly realised how much I enjoyed bioinformatics compared to wet lab work, and have since been steering my PhD more and more towards bioinformatics. This early realisation in my PhD has helped me decide that I want to pursue a career in bioinformatics.</p> <h2 class="wp-block-heading">What do you enjoy doing outside of work?</h2> <p>I love cars, and I spend most of my free time thinking about and looking at cars on Autotrader. If I’m not looking at cars, I’m probably buying car parts on eBay or tinkering with my car here and there. Aside from cars, I routinely go to the gym, run and play video games with my friends.</p> <h2 class="wp-block-heading">Do you have a hidden talent or skill that most people don’t know about?</h2> <p>This is a very random hidden talent, but I have perfect pitch. It’s an innate ability that allows me to immediately recognise any musical note, or determine the key of a song, without any reference. In essence, it’s like how most people can identify colours immediately, but I can do it with musical notes. It allows me to tell if anything is out of tune, whether it’s a song or an instrument that’s not been tuned/calibrated.
It was very helpful growing up when I was learning musical instruments!&nbsp;</p> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15545</post-id> </item> <item> <title>Integration of AlphaMissense scores into Ensembl</title> <link>https://www.ensembl.info/2024/05/24/integration-of-alphamissense-scores-into-ensembl/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=integration-of-alphamissense-scores-into-ensembl</link> <dc:creator><![CDATA[Louisse Paola Mirabueno]]></dc:creator> <pubDate>Fri, 24 May 2024 10:30:27 +0000</pubDate> <category><![CDATA[New data and web features]]></category> <category><![CDATA[alphafold]]></category> <category><![CDATA[AlphaMissense]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl VEP]]></category> <category><![CDATA[variation]]></category> <category><![CDATA[VEP]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15474</guid> <description><![CDATA[We’re excited to announce the integration of AlphaMissense pathogenicity scores into Ensembl! The display of AlphaMissense data in Ensembl and other EMBL-EBI resources aims to empower scientists to gain new insights, expand the exploration of genetic variation and the tolerance to change of different regions in proteins. AlphaMissense is an AI model developed by Google [&#8230;]]]></description> <content:encoded><![CDATA[ <p>We’re excited to announce the integration of AlphaMissense pathogenicity scores into Ensembl! 
The display of AlphaMissense data in Ensembl and other EMBL-EBI resources aims to empower scientists to gain new insights, expand the exploration of genetic variation and the tolerance to change of different regions in proteins.</p> <span id="more-15474"></span> <p>AlphaMissense is an AI model developed by <a href="https://deepmind.google/" target="_blank" rel="noreferrer noopener">Google DeepMind</a> which classifies genetic variants, specifically missense variants, as more likely to be pathogenic or benign. This information is helpful when researching variant-disease associations and provides an indication of the most functionally important parts of a protein.</p> <p>AlphaMissense scores are now integrated into the Ensembl Variant Effect Predictor (VEP) tool, enabling easy annotation of variants via the web interface, REST API, or command-line interface. Average AlphaMissense pathogenicity scores for each amino acid can also be visualised on the AlphaFold predicted 3D protein structure, available from the associated Ensembl transcript page.</p> <div style="height:17px" aria-hidden="true" class="wp-block-spacer"></div> <h3 class="wp-block-heading">Ensembl VEP</h3> <p>You can enable AlphaMissense scores on Ensembl VEP as follows:</p> <h4 class="wp-block-heading">Web interface</h4> <p>Open <a href="https://www.ensembl.org/Multi/Tools/VEP?db=core" target="_blank" rel="noreferrer noopener">Ensembl VEP</a> on the browser, scroll down to ‘Additional configurations’ and expand the ‘Predictions’ tab where you can enable AlphaMissense scores.</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="638" src="https://www.ensembl.info/wp-content/uploads/2024/05/image-1-1024x638.png" alt="A screenshot of the Ensembl VEP web interface input form highlighting how to retrieve AlphaMissense scores. Under 'Additional configurations', open the 'Predictions' tab and select 'AlphaMissense'." 
class="wp-image-15477" srcset="https://www.ensembl.info/wp-content/uploads/2024/05/image-1-1024x638.png 1024w, https://www.ensembl.info/wp-content/uploads/2024/05/image-1-300x187.png 300w, https://www.ensembl.info/wp-content/uploads/2024/05/image-1-768x479.png 768w, https://www.ensembl.info/wp-content/uploads/2024/05/image-1.png 1300w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A screenshot of how to enable AlphaMissense scores in the Ensembl VEP web interface input form.</figcaption></figure> <p>The AlphaMissense pathogenicity score and classification will be reported in separate fields in the Ensembl VEP output table:</p> <figure class="wp-block-image"><img decoding="async" src="https://lh7-us.googleusercontent.com/VffWKKRjJgY-axxwtN9_UKWL-Pr5VgK9dvBQ2Boqr37MZ8h_mJpoEsn1RLLYpuH2Q8f61vK5V0gHB67-1TAn98z8XwZUa4IPmwrj-DThYxxmaofuQYm5JjvSYCZnhk7w6b0mIOK8M5Ti9BcFxl5pDTE" alt="A screenshot of an example output table of an Ensembl VEP web interface query. AlphaMissense classification and scores can be found in separate columns in the output table." /><figcaption class="wp-element-caption">A screenshot of an example output table including &#8216;AlphaMissense classification&#8217; and &#8216;AlphaMissense pathogenicity score&#8217; in separate columns.</figcaption></figure> <h4 class="wp-block-heading">REST API</h4> <p>AlphaMissense scores and pathogenicity classifications can be enabled by adding the optional parameter <code>AlphaMissense=1</code> when querying any <a href="https://rest.ensembl.org/#:~:text=phased%20genotype%20data-,VEP,-Resource" target="_blank" rel="noreferrer noopener"><code>/vep/</code> endpoint</a>.</p> <h4 class="wp-block-heading">Command-line</h4> <p>On the command-line Ensembl VEP, you can use the <a href="https://github.com/Ensembl/VEP_plugins/blob/release/112/AlphaMissense.pm" target="_blank" rel="noreferrer noopener">AlphaMissense plug-in</a> to analyse your data locally. 
Read more about how to use plug-ins on the <a href="https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#plugins_use" target="_blank" rel="noreferrer noopener">Ensembl VEP documentation page</a>.</p> <div style="height:17px" aria-hidden="true" class="wp-block-spacer"></div> <h3 class="wp-block-heading">AlphaMissense scores in the Ensembl browser</h3> <p>Average AlphaMissense pathogenicity scores for each amino acid can also be visualised on the AlphaFold predicted 3D protein structure, available under ‘Transcript-based displays: Protein Information’ and selecting the ‘AlphaFold predicted model’ display.</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="561" src="https://www.ensembl.info/wp-content/uploads/2024/05/image-1024x561.png" alt="A screenshot of the 'Transcript' tab for the human CLINT1 gene. AlphaMissense scores can be visualised by opening the 'Protein Information: AlphaFold predicted model' display." class="wp-image-15476" srcset="https://www.ensembl.info/wp-content/uploads/2024/05/image-1024x561.png 1024w, https://www.ensembl.info/wp-content/uploads/2024/05/image-300x164.png 300w, https://www.ensembl.info/wp-content/uploads/2024/05/image-768x421.png 768w, https://www.ensembl.info/wp-content/uploads/2024/05/image.png 1478w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A screenshot of the &#8216;Transcript&#8217; tab of the human CLINT1-201 transcript highlighting how to visualise AlphaMissense scores in the AlphaFold predicted protein structure.</figcaption></figure> <p>This interactive view allows users to switch between variants, domains, exons, and AlphaMissense results, supporting the interpretation of different regions of the protein and their sensitivity to change.</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="545" 
src="https://www.ensembl.info/wp-content/uploads/2024/05/image-2-1024x545.png" alt="A screenshot of the 'AlphaFold predicted model' for the human CLINT1-201 transcript. AlphaMissense scores can be visualised in the protein model by selecting 'AlphaMissense Pathogenicity' under 'Toggle' in the right-hand panel. Likely pathogenic regions are highlighted in red in the protein structure and likely benign regions are highlighted in blue." class="wp-image-15478" srcset="https://www.ensembl.info/wp-content/uploads/2024/05/image-2-1024x545.png 1024w, https://www.ensembl.info/wp-content/uploads/2024/05/image-2-300x160.png 300w, https://www.ensembl.info/wp-content/uploads/2024/05/image-2-768x409.png 768w, https://www.ensembl.info/wp-content/uploads/2024/05/image-2.png 1522w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A screenshot of the &#8216;AlphaFold predicted model&#8217; for the human CLINT1-201 transcript, highlighting where to enable AlphaMissense scores. Likely pathogenic regions are highlighted in red in the protein structure and likely benign regions are highlighted in blue.</figcaption></figure> <div style="height:17px" aria-hidden="true" class="wp-block-spacer"></div> <p>AlphaMissense data are also available in two other EMBL-EBI resources: <a href="https://www.uniprot.org/" target="_blank" rel="noreferrer noopener">UniProt</a> and the <a href="https://alphafold.ebi.ac.uk/" target="_blank" rel="noreferrer noopener">AlphaFold Protein Structure Database</a>. You can read more about AlphaMissense in the <a href="https://europepmc.org/article/MED/37733863" target="_blank" rel="noreferrer noopener">2023 publication by Cheng et al.</a> and about its availability in EMBL-EBI resources in the blog post <a href="https://www.ebi.ac.uk/about/news/technology-and-innovation/alphamissense-data-integration/" target="_blank" rel="noreferrer noopener">&#8216;AlphaMissense data integrated into Ensembl, UniProt and AlphaFold DB&#8217;</a>.
</p> <div style="height:44px" aria-hidden="true" class="wp-block-spacer"></div> <p>Author: Louisse Paola Mirabueno<br>Editors: Sarah Hunt, Benjamin Moore, Roz Onions, Jamie Allen</p> ]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15474</post-id> </item> <item> <title>Ensembl 112 has been released.</title> <link>https://www.ensembl.info/2024/05/13/ensembl-112-has-been-released/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ensembl-112-has-been-released</link> <dc:creator><![CDATA[Aleena Mushtaq]]></dc:creator> <pubDate>Mon, 13 May 2024 14:33:16 +0000</pubDate> <category><![CDATA[Release announcements]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Ensembl]]></category> <category><![CDATA[Ensembl Genomes]]></category> <category><![CDATA[Genomics]]></category> <category><![CDATA[Plants]]></category> <category><![CDATA[release]]></category> <category><![CDATA[REST API]]></category> <category><![CDATA[VEP]]></category> <guid isPermaLink="false">https://www.ensembl.info/?p=15429</guid> <description><![CDATA[We are pleased to announce the release of Ensembl 112, and the corresponding release of Ensembl Genomes 59. We have some exciting new fish species, many more Drosophila species and some incredible VEP updates. Regulation We are transitioning our regulatory annotation over the next few releases to be based on open chromatin, rather than genomic [&#8230;]]]></description> <content:encoded><![CDATA[ <p>We are pleased to announce the release of Ensembl 112, and the corresponding release of Ensembl Genomes 59. We have some exciting new fish species, many more <em>Drosophila</em> species and some incredible VEP updates.</p> <span id="more-15429"></span> <h2 class="wp-block-heading"><strong>Regulation</strong></h2> <p>We are transitioning our regulatory annotation over the next few releases to be based on open chromatin, rather than genomic segmentation of histone marks.
As a necessary step, we have removed segmentation data and tracks from human and mouse regulatory annotation in this release.&nbsp;</p> <p>In addition, our promoters now align with the 5’ ends of known transcripts (specifically 10 bp downstream). Our feature annotation GFF file on the <a href="https://ftp.ensembl.org/pub/release-112/regulation/homo_sapiens/GRCh38/annotation/">FTP</a> site includes the gene(s) associated with each promoter.</p> <h2 class="wp-block-heading"><strong>New Assemblies and/or Annotation</strong></h2> <h3 class="wp-block-heading">Vertebrates</h3> <p><em>Amphiprion ocellaris</em> (Clown anemone fish) &#8211; GCA_022539595.1</p> <p><em>Anabas testudineus</em> (Climbing perch) &#8211; GCA_900324465.3</p> <p><em>Astatotilapia calliptera</em> (Eastern happy) &#8211; GCA_900246225.5</p> <p><em>Clupea harengus</em> (Atlantic herring) &#8211; GCA_900700415.2</p> <p><em>Denticeps clupeoides </em>(Denticle herring) &#8211; GCA_900700375.2</p> <p><em>Electrophorus electricus</em> (Electric eel) &#8211; GCA_013358815.1</p> <p><em>Esox lucius</em> (Northern pike) &#8211; GCA_011004845.1</p> <p><em>Gasterosteus aculeatus</em> (Three-spined stickleback) &#8211; GCA_016920845.1</p> <p><em>Ictalurus punctatus</em> (Channel catfish) &#8211; GCA_004006655.3</p> <p><em>Oncorhynchus tshawytscha</em> (Chinook salmon) &#8211; GCA_018296145.1</p> <p><em>Oreochromis aureus</em> (Guangdong) &#8211; GCA_013358895.1</p> <p><em>Parambassis ranga</em> (Indian glassy fish) &#8211; GCA_900634625.2</p> <p><em>Periophthalmus magnuspinnatus</em> (Bony fishes) &#8211; GCA_009829125.3</p> <p><em>Pygocentrus nattereri</em> (Red-bellied piranha) &#8211; GCA_015220715.1</p> <p><strong>Additional strains have been added for the following fish species</strong>:</p> <p><em>Gadus morhua</em> (Atlantic cod):</p> <ul> <li>Celtic sea &#8211; GCA_010882105.1</li> </ul> <p><em>Salmo salar</em> (Atlantic salmon):</p> <ul> <li>North American Atlantic salmon &#8211; GCA_021399835.1</li> 
<li>Brian &#8211; GCA_923944775.1</li> <li>European origin &#8211; GCA_931346935.2</li> </ul> <p><em>Gasterosteus aculeatus</em> (Three-spined stickleback):</p> <ul> <li>Marine &#8211; GCA_006232285.1</li> <li>Marine &#8211; GCA_006232265.1</li> <li>Freshwater &#8211; GCA_006229185.1</li> </ul> <h3 class="wp-block-heading">Non-Vertebrates</h3> <p><strong>Plants:</strong></p> <p><strong>New Genomes</strong></p> <p><em>Vicia faba</em> (Broad bean) &#8211; GCA_948472305.1</p> <p><em>Aegilops umbellulata</em> (Umbel goatgrass) &#8211; GCA_032464435.1</p> <p><strong>Updated species</strong></p> <p><em>Manihot esculenta</em> (Cassava) &#8211; GCA_001659605.2</p> <p><em>Medicago truncatula</em> (Barrel Medic) &#8211; GCA_003473485.2</p> <p><strong>Metazoa:</strong></p> <p><strong>New Drosophila Pangenome</strong></p> <p>We have introduced a new <em>Drosophila</em> genus-wide pangenome, which incorporates resources from the main <a href="https://metazoa.ensembl.org/index.html">Metazoa site</a>.</p> <p>This pangenome covers a whopping 36 species of <em>Drosophila</em> and 4 outgroup species.
These species are currently hosted on both Ensembl Metazoa and <a href="https://rapid.ensembl.org/index.html">Rapid Release</a>.</p> <p><strong>New species:</strong></p> <p><em>Bactrocera neohumeralis</em> (GCA_024586455.2)</p> <p><em>Cherax quadricarinatus</em> (GCA_026875155.2)</p> <p><em>Coremacera marginata</em> (GCA_914767935.1)</p> <p><em>Ctenocephalides felis</em> (GCA_003426905.1)</p> <p><em>Daphnia carinata</em> (GCA_022539665.3)</p> <p><em>Diaphorina citri</em> (GCA_000475195.1)</p> <p><em>Drosophila albomicans</em> (GCA_009650485.2)</p> <p><em>Drosophila arizonae</em> (GCA_001654025.1)</p> <p><em>Drosophila biarmipes</em> (GCA_025231255.1)</p> <p><em>Drosophila bipectinata</em> (GCA_000236285.2)</p> <p><em>Drosophila busckii</em> (GCA_011750605.1)</p> <p><em>Drosophila elegans</em> (GCA_000224195.2)</p> <p><em>Drosophila eugracilis</em> (GCA_018153835.1)</p> <p><em>Drosophila ficusphila</em> (GCA_018152265.1)</p> <p><em>Drosophila guanche</em> (GCA_900245975.1)</p> <p><em>Drosophila gunungcola</em> (GCA_025200985.1)</p> <p><em>Drosophila hydei</em> (GCA_003285905.2)</p> <p><em>Drosophila innubila</em> (GCA_004354385.1)</p> <p><em>Drosophila kikkawai</em> (GCA_018152535.1)</p> <p><em>Drosophila mauritiana</em> (GCA_004382145.1)</p> <p><em>Drosophila miranda</em> (GCA_003369915.2)</p> <p><em>Drosophila navojoa</em> (GCA_001654015.2)</p> <p><em>Drosophila obscura</em> (GCA_018151105.1)</p> <p><em>Drosophila rhopaloa</em> (GCA_018152115.1)</p> <p><em>Drosophila santomea</em> (GCA_016746245.2)</p> <p><em>Drosophila subobscura</em> (GCA_008121235.1)</p> <p><em>Drosophila subpulchrella</em> (GCA_014743375.2)</p> <p><em>Drosophila suzukii</em> (GCA_013340165.1)</p> <p><em>Drosophila takahashii</em> (GCA_018152695.1)</p> <p><em>Drosophila teissieri</em> (GCA_016746235.2)</p> <p><em>Eriocheir
sinensis</em> (GCA_024679095.1)</p> <p><em>Halyomorpha halys</em> (GCA_000696795.2)</p> <p><em>Homarus gammarus</em> (GCA_958450375.1)</p> <p><em>Hydractinia symbiolongicarpus</em> (GCA_029227915.2)</p> <p><em>Lytechinus pictus</em> (GCA_015342785.2)</p> <p><em>Machimus atricapillus</em> (GCA_933228815.1)</p> <p><em>Melanaphis sacchari</em> (GCA_002803265.2)</p> <p><em>Microctonus aethiopoides</em> (GCA_030272655.1)</p> <p><em>Microctonus aethiopoides</em> (GCA_030272935.1)</p> <p><em>Microctonus aethiopoides</em> (GCA_030347275.1)</p> <p><em>Microctonus hyperodae</em> (GCA_030347285.1)</p> <p><em>Myopa tessellatipennis</em> (GCA_943737955.1)</p> <p><em>Octopus bimaculoides</em> (GCA_001194135.2)</p> <p><em>Paramacrobiotus metropolitanus</em> (GCA_019649055.1)</p> <p><em>Pecten maximus</em> (GCA_902652985.1)</p> <p><em>Tribolium madens</em> (GCA_015345945.1)</p> <p><em>Uloborus diversus</em> (GCA_026930045.1)</p> <p><strong>Updated genomes</strong>:</p> <p><em>Drosophila ananassae</em> (GCA_017639315.2)</p> <p><em>Drosophila erecta</em> (GCA_003286155.2)</p> <p><em>Drosophila grimshawi</em> (GCA_018153295.1)</p> <p><em>Drosophila mojavensis</em> (GCA_018153725.1)</p> <p><em>Drosophila persimilis</em> (GCA_003286085.2)</p> <p><em>Drosophila pseudoobscura</em> (GCA_009870125.2)</p> <p><em>Drosophila sechellia</em> (GCA_004382195.2)</p> <p><em>Drosophila simulans</em> (GCA_016746395.2)</p> <p><em>Drosophila virilis</em> (GCA_003285735.2)</p> <p><em>Drosophila willistoni</em> (GCA_018902025.2)</p> <p><em>Drosophila yakuba</em> (GCA_016746365.2)</p> <p><strong>The following outdated genomes have been removed</strong>:</p> <p><em>Daphnia pulex</em> (GCA_000187875.1)</p> <p><em>Hydra vulgaris</em> (GCA_000004095.1)</p> <p><em>Octopus bimaculoides</em> (GCA_001194135.1)</p>
<p><em>Rhipicephalus sanguineus</em> (GCA_013339695.1); we will retain the V2 assembly version (GCA_013339695.2).</p> <h2 class="wp-block-heading"><strong>Other updates and changes</strong></h2> <ul> <li>Population frequency data is available for more species in the <a href="https://www.ensembl.org/info/docs/tools/vep/index.html" data-type="link" data-id="https://www.ensembl.org/info/docs/tools/vep/index.html" target="_blank" rel="noreferrer noopener">Ensembl VEP</a> web tool, including <a href="https://www.ensembl.org/Gallus_gallus/Info/Index" data-type="link" data-id="https://www.ensembl.org/Gallus_gallus/Info/Index" target="_blank" rel="noreferrer noopener">chicken</a>, <a href="https://www.ensembl.org/Canis_lupus_familiaris/Info/Index" target="_blank" rel="noreferrer noopener">dog</a>, <a href="https://www.ensembl.org/Capra_hircus/Info/Index" target="_blank" rel="noreferrer noopener">goat</a> and <a href="https://www.ensembl.org/Ovis_aries_rambouillet/Info/Index" data-type="link" data-id="https://www.ensembl.org/Ovis_aries_rambouillet/Info/Index" target="_blank" rel="noreferrer noopener">sheep</a>.</li> <li>A new Ensembl VEP option has been added to predict the molecular consequences of variants on <a href="https://www.ensembl.org/Homo_sapiens/Info/Index" data-type="link" data-id="https://www.ensembl.org/Homo_sapiens/Info/Index" target="_blank" rel="noreferrer noopener">human</a> GRCh38 open reading frames found in long non-coding RNAs (lncRNAs) and untranslated regions (UTRs) of protein-coding genes, as described in <a href="https://www.nature.com/articles/s41587-022-01369-0">Mudge et al.</a></li> <li>The Ensembl VEP web and <a href="https://rest.ensembl.org/">REST</a> interfaces have been updated to use the <a href="https://sites.google.com/site/jpopgen/dbNSFP" target="_blank" rel="noreferrer noopener">dbNSFP</a> commercial data release.</li> <li>We have retired Ensembl archives 95 and 96 with this release.</li> </ul>
]]></content:encoded> <post-id xmlns="com-wordpress:feed-additions:1">15429</post-id> </item> </channel> </rss>
