poster

MVAR: a mouse variation registry

Authors:

Bahá El Kassaby,

Govindarajan Kunde-Ramamoorthy,

Francisco Castellanos,

Carol Bult

BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Article No.: 74, Page 1

https://doi.org/10.1145/3459930.3469503

Published: 01 August 2021


Abstract

Model organisms are essential to understanding the biological and disease consequences of human genome variation. Bioinformatics resources that support meaningful comparisons of mouse and human genotype-to-phenotype data and knowledge are needed to support the translation from bench to bedside and back again [1].

There is no genome variation resource for mouse comparable to those available for human genome variation data, such as ExAC [2], ClinVar [3], or ClinGen [4]. NCBI resources such as dbSNP and ClinVar no longer accept data from model organisms. Although the European Variation Archive (EVA) serves as a repository of SNP data for mouse, it does not accept imputed variation data or the curated phenotype annotations associated with variation data that are central to data interpretation and analysis. The Mouse Genome Informatics database (MGI) [5] serves as a comprehensive mouse allele registry and curates information about the association of mouse variants with phenotypes and disease, but the variation data in MGI are not currently available in a format consistent with the Human Genome Variation Society (HGVS) standards [6]. The Mouse Variation Registry (MVAR) will integrate all mouse genome variation data and includes processes that automatically canonicalize variants, so that each variant is uniquely represented in the database along with comprehensive annotation and its distribution across strains.

The starting dataset used as input to MVAR was downloaded in VCF format [7] (as a 42 GB gzipped file) from the Mouse Genomes Project [8] and contains about 81M single-nucleotide variants (SNVs), ~9M deletions, and ~8M insertions. Other data will be obtained from MGI, the Mouse Mutant Repository Database (MMRDB), the Diversity Outbred Database (DODB), and computationally imputed SNP data.
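As a rough illustration of how the SNV, deletion, and insertion counts quoted above could be tallied from VCF records, the following minimal Python sketch classifies each REF/ALT pair by allele length. The function names are hypothetical and are not part of MVAR; the sketch assumes uncompressed, tab-separated VCF lines and skips symbolic or missing ALT alleles.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"  # equal-length alleles longer than one base


def count_variant_types(vcf_lines) -> Counter:
    """Tally variant types over an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):
            if alt in (".", "*") or alt.startswith("<"):
                continue  # missing, spanning-deletion, or symbolic allele
            counts[classify_variant(ref, alt)] += 1
    return counts
```

For a gzipped input, the same function could be fed from `gzip.open(path, "rt")`, which yields text lines one at a time without loading the whole file into memory.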

The MVAR data ingest workflow has been developed to normalize, prepare, and annotate input variation data. The first step of the pipeline uses the GATK framework [9] to normalize each variant (i.e., left-align it) and to decompose multi-allelic variants (rows of data that contain more than one alternate allele). The next step uses the Ensembl Variant Effect Predictor (VEP) [10] to annotate the variation data with the corresponding HGVS nomenclature and any existing external identifiers. The final step uses the Jannovar library [11] to enrich the data with functional consequence annotations. After the data have been pre-processed through the pipeline, they are inserted into a MySQL database using custom tools developed to create the canonical variant representations.
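To make the normalization and decomposition steps concrete, here is a minimal plain-Python sketch. It is not the GATK-based implementation the pipeline uses: it applies the widely used trim-and-left-align procedure for a single REF/ALT pair against a reference sequence held in memory. The function names and the `chrom:pos:ref:alt` key format are assumptions made for illustration only.

```python
def decompose(chrom, pos, ref, alts):
    """Split a multi-allelic record into one record per ALT allele."""
    return [(chrom, pos, ref, alt) for alt in alts.split(",")]


def normalize(pos, ref, alt, genome):
    """Trim and left-align one variant.

    pos is 1-based; genome is the chromosome sequence as a string.
    Assumes ref != alt.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def canonical_keys(chrom, pos, ref, alts, genome):
    """Return one hypothetical canonical key per decomposed, normalized allele."""
    keys = []
    for c, p, r, a in decompose(chrom, pos, ref, alts):
        p, r, a = normalize(p, r, a, genome)
        keys.append(f"{c}:{p}:{r}:{a}")
    return keys
```

Because two records that describe the same change inside a repeat normalize to the same position and alleles, a key built this way could serve as a uniqueness constraint when loading variants into a database.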

MVAR supports programmatic data access to the registry through an API for interoperability. This API is used by a user-friendly web application with rich user interfaces to query the database and display results. The API is also available as a resource for other services or applications over HTTP with JSON data payloads. Widely used industry frameworks such as Angular and Groovy Grails were leveraged to build the MVAR web application.
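A client of an HTTP API with JSON payloads might look like the following Python sketch. The base URL, the `/variant/query` path, the filter names, and the JSON field names are all hypothetical placeholders, since the abstract does not document the actual MVAR endpoints; only the standard-library calls are real.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; replace with the address of the real service.
BASE_URL = "https://example.org/mvar/api"


def build_search_url(base_url, **filters):
    """Build a query URL from keyword filters (hypothetical endpoint path)."""
    query = urlencode(sorted(filters.items()))
    return f"{base_url}/variant/query?{query}"


def parse_variants(payload):
    """Extract (chr, pos, ref, alt) tuples from an assumed JSON response shape."""
    data = json.loads(payload)
    return [(v["chr"], v["pos"], v["ref"], v["alt"]) for v in data["variants"]]


def fetch_variants(base_url, **filters):
    """Send the request over HTTP and parse the JSON body."""
    with urlopen(build_search_url(base_url, **filters)) as response:
        return parse_variants(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call lets both be tested without a live server.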

To conclude, the lack of a comprehensive, annotated genome variation resource for mouse is a significant barrier to comparing variation and its biological consequences between mouse and human, and it limits the impact of many research and resource development programs. The MVAR project seeks to address this resource gap by bringing together investigators who have active projects in the area of genome variation in mouse, human, or both. Many of the investigators on this project have developed independent resources to curate or manage genome variation. This project aims to unify these efforts and build a common data resource. Future work will include the incorporation of structural variants into MVAR.

References

[1]

Manolio, T.A., et al. 2017. Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research. Cell, (Mar 2017). 169 (1): p. 6--12.


[2]

Karczewski, K.J., et al. 2017. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res, (Jan 2017). 45 (D1): p. D840-5.


[3]

Landrum, M.J., et al. 2014. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res, (Jan 2014). 42 (Database issue): p. D980-5.


[4]

Rehm, H.L., et al. 2015. ClinGen-the Clinical Genome Resource. N Engl J Med, (Jun 2015). 372 (23): p. 2235--42.


[5]

Bult, C.J., et al. 2016. Mouse genome database 2016. Nucleic Acids Res, (Jan 2016). 44 (D1): p. D840-7.


[6]

Sequence Variant Nomenclature. Retrieved from http://varnomen.hgvs.org/


[7]

Variant Call Format. Retrieved from https://samtools.github.io/hts-specs/


[8]

Mouse Genomes Project. Retrieved from https://www.sanger.ac.uk/data/mouse-genomes-project/


[9]

McKenna, A., et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res, (Sep 2010). 20 (9): p. 1297--303.


[10]

McLaren, W., et al. 2016. The Ensembl Variant Effect Predictor. Genome Biol, (Jun 2016). 17 (1): p. 122.


[11]

Jager, M., et al. 2014. Jannovar: a java library for exome annotation. Hum Mutat, (May 2014). 35 (5): p. 548--55.


Index Terms

MVAR: a mouse variation registry
1. Applied computing
  1. Life and medical sciences
2. Information systems

Index terms have been assigned to the content through auto-classification.


Published In

BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

August 2021

603 pages

ISBN:9781450384506

DOI:10.1145/3459930

General Chairs: Hongmei Jiang (Northwestern University), Xiuzhen Huang (Arkansas State University), Jiajie Zhang (The University of Texas Health Science Center at Houston)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Poster

Funding Sources

Jackson Laboratory Director's Innovation Fund

Conference

BCB '21

Sponsor:

SIGBIOM

BCB '21: 12th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

August 1 - 4, 2021

Gainesville, Florida

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%


Article Metrics

Total Citations: 0
Total Downloads: 39
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 11 Aug 2024
