DOI: 10.1145/3459930.3469503
Poster

MVAR: a mouse variation registry

Published: 01 August 2021

Abstract

    Model organisms are essential to understanding the biological and disease consequences of human genome variation. Bioinformatics resources that support meaningful comparisons of mouse and human genotype-to-phenotype data and knowledge are needed to support the translation from bench to bedside and back again [1].
    There is no genome variation resource for mouse comparable to those available for human genome variation data, such as ExAC [2], ClinVar [3], or ClinGen [4]. NCBI resources such as dbSNP and ClinVar no longer accept data from model organisms. Although the European Variation Archive (EVA) serves as a repository of SNP data for mouse, it does not accept imputed variation data or the curated phenotype annotations associated with variants, both of which are central to data interpretation and analysis. The Mouse Genome Informatics database (MGI) [5] serves as a comprehensive mouse allele registry and curates the associations of mouse variants with phenotypes and disease, but its variation data are not currently available in a format consistent with the Human Genome Variation Society (HGVS) nomenclature standards [6]. The Mouse Variation Registry (MVAR) will integrate all mouse genome variation data and includes processes that automatically canonicalize variants, so that each variant is represented uniquely in the database together with comprehensive annotation and its distribution across strains.
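    The canonicalization idea can be illustrated with a small in-memory sketch. It assumes that every incoming variant has already been normalized (as described for the ingest workflow) and that a variant's identity is the tuple of assembly, chromosome, position, REF and ALT. The names `VariantRegistry`, `register` and `CanonicalVariant` are hypothetical and do not describe MVAR's actual schema. Records describing the same variant from different sources or strains collapse into one entry that accumulates its strain distribution.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical canonical key: (assembly, chromosome, 1-based position, REF, ALT).
# It is only meaningful if the variant was normalized before the key is built.
CanonicalKey = Tuple[str, str, int, str, str]


@dataclass
class CanonicalVariant:
    key: CanonicalKey
    strains: Set[str] = field(default_factory=set)  # strains reported to carry the ALT allele
    sources: Set[str] = field(default_factory=set)  # data sources that reported this variant


class VariantRegistry:
    """Toy in-memory registry that keeps exactly one record per canonical variant."""

    def __init__(self) -> None:
        self._variants: Dict[CanonicalKey, CanonicalVariant] = {}

    def register(self, assembly: str, chrom: str, pos: int, ref: str, alt: str,
                 strain: str, source: str) -> CanonicalVariant:
        # Harmonize superficial differences (e.g. "chr1" vs "1", lower-case bases).
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        key = (assembly, chrom, pos, ref.upper(), alt.upper())
        variant = self._variants.get(key)
        if variant is None:
            variant = self._variants[key] = CanonicalVariant(key)
        # Merge the strain distribution and provenance into the single record.
        variant.strains.add(strain)
        variant.sources.add(source)
        return variant

    def __len__(self) -> int:
        return len(self._variants)
```

    In a relational store the same rule would typically be enforced with a unique index over the key columns, so that a second report of a variant updates its strain list instead of creating a duplicate row.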
    The starting dataset used as input to MVAR was downloaded in VCF format [7] (as a 42 GB gzipped file) from the Mouse Genomes Project [8] and contains about 81M single-nucleotide variants (SNVs), ~9M deletions, and ~8M insertions. Additional data will be obtained from MGI, the Mouse Mutant Repository Database (MMRDB), the Diversity Outbred Database (DODB), and computationally imputed SNP data.
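    Because a file of this size should be streamed rather than loaded into memory, a tally of variant classes can be computed line by line. The sketch below gives a coarse, length-based classification of each REF/ALT pair; it assumes plain VCF text, counts every ALT allele of a multi-allelic row separately, and treats symbolic alleles and equal-length multi-base substitutions as "other". The function names are illustrative and are not part of the MVAR pipeline.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def open_vcf(path: str) -> Iterator[str]:
    """Stream lines from a plain or gzip-compressed VCF without reading it all into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "other"  # symbolic, spanning-deletion, or breakend alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "other"  # equal-length multi-base substitution


def count_variant_types(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT are the 4th and 5th VCF columns
        for alt in alts.split(","):  # each ALT allele of a multi-allelic row is counted
            counts[classify(ref, alt)] += 1
    return counts
```

    For example, `count_variant_types(open_vcf("mgp.vcf.gz"))` (a hypothetical file name) returns a `Counter` keyed by variant class.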
    The MVAR data ingest workflow has been developed to normalize, prepare, and annotate input variation data. The first step uses the GATK framework [9] to normalize each variant (i.e., left-align it) and to decompose multi-allelic variants (rows that list more than one alternate allele) into separate records. The next step uses the Ensembl Variant Effect Predictor (VEP) [10] to annotate each variant with its corresponding HGVS nomenclature and any existing external identifiers. The final step uses the Jannovar library [11] to add functional consequence annotations. After pre-processing, the data are loaded into a MySQL database using custom tools developed to create canonical variant representations.
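    The normalization step can be sketched in plain Python. This is not GATK's implementation: it is the widely used trim-and-shift scheme (as in the vt normalize tool), applied to a toy reference string held in memory, and it ignores genotype columns when splitting multi-allelic rows. Positions are 1-based as in VCF, and `left_align` and `decompose_and_normalize` are hypothetical names.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str]  # (1-based POS, REF, ALT)


def left_align(ref_seq: str, pos: int, ref: str, alt: str) -> Variant:
    """Trim bases shared by REF and ALT and shift an indel as far left as the reference allows."""
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    if ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    while True:
        changed = False
        # Drop a shared right-most base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An emptied allele is re-anchored on the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot shift past the start of the sequence")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Drop shared left-most bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose_and_normalize(ref_seq: str, pos: int, ref: str, alt_field: str) -> List[Variant]:
    """Split a comma-separated ALT column into biallelic records, then normalize each one."""
    return [left_align(ref_seq, pos, ref, alt) for alt in alt_field.split(",")]
```

    With the toy reference `GCACACAT`, a row at position 5 with REF `ACA` and ALT `A,ACACA` decomposes into a deletion and an insertion of one `CA` repeat, which normalize to `(1, "GCA", "G")` and `(1, "G", "GCA")`. Shifting every indel to its left-most equivalent position is what lets the same event, reported at different offsets by different sources, collapse to a single canonical record.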
    MVAR supports programmatic access to the registry through an API for interoperability. A user-friendly web application with rich user interfaces uses this API to query the database and display results, and the API is also available to other services and applications over HTTP with JSON payloads. The MVAR web application was built with widely used industry frameworks, Angular and Grails (Groovy).
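    A client of an HTTP/JSON API like the one described might look like the sketch below. The abstract does not give MVAR's host, endpoint paths, or parameter names, so `BASE_URL` (a reserved example.org domain), the `/variants` path, and the `chr`/`start`/`end` query parameters are placeholders, and the response is assumed to be a JSON array of variant objects. Only the Python standard library is used.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder values: the real MVAR host, path, and parameter names are not specified.
BASE_URL = "https://mvar.example.org/api"


def build_query_url(base_url: str, chrom: str, start: int, end: int) -> str:
    """Build a GET URL for variants in a genomic interval (hypothetical endpoint)."""
    params = urlencode({"chr": chrom, "start": start, "end": end})
    return f"{base_url.rstrip('/')}/variants?{params}"


def parse_variants(payload: str) -> List[Dict[str, Any]]:
    """Decode a JSON payload that is assumed to be an array of variant objects."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of variants")
    return data


def fetch_variants(chrom: str, start: int, end: int, timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Query the (hypothetical) endpoint over HTTP and return the decoded JSON records."""
    request = Request(build_query_url(BASE_URL, chrom, start, end),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_variants(response.read().decode("utf-8"))
```

    Keeping URL construction and JSON decoding in separate functions lets them be checked offline, while `fetch_variants` is the only part that touches the network.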
    In conclusion, the lack of a comprehensive, annotated genome variation resource for mouse is a significant barrier to comparing variation and its biological consequences between mouse and human, and it limits the impact of many research and resource development programs. The MVAR project seeks to close this gap by bringing together investigators who have active projects on genome variation in mouse, human, or both. Many of these investigators have developed independent resources to curate or manage genome variation; this project aims to unify those efforts into a common data resource. Future work will incorporate structural variants into the MVAR registry.

    References

    [1] Manolio, T.A., et al. 2017. Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research. Cell 169(1): 6-12 (Mar 2017).
    [2] Karczewski, K.J., et al. 2017. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res 45(D1): D840-D845 (Jan 2017).
    [3] Landrum, M.J., et al. 2014. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42(Database issue): D980-D985 (Jan 2014).
    [4] Rehm, H.L., et al. 2015. ClinGen - the Clinical Genome Resource. N Engl J Med 372(23): 2235-2242 (Jun 2015).
    [5] Bult, C.J., et al. 2016. Mouse genome database 2016. Nucleic Acids Res 44(D1): D840-D847 (Jan 2016).
    [6] HGVS Sequence Variant Nomenclature. Retrieved from http://varnomen.hgvs.org/
    [7] Variant Call Format. Retrieved from https://samtools.github.io/hts-specs/
    [8] Mouse Genomes Project. Retrieved from https://www.sanger.ac.uk/data/mouse-genomes-project/
    [9] McKenna, A., et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9): 1297-1303 (Sep 2010).
    [10] McLaren, W., et al. 2016. The Ensembl Variant Effect Predictor. Genome Biol 17(1): 122 (Jun 2016).
    [11] Jäger, M., et al. 2014. Jannovar: a java library for exome annotation. Hum Mutat 35(5): 548-555 (May 2014).


            Published In

            BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
            August 2021
            603 pages
            ISBN:9781450384506
            DOI:10.1145/3459930
            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • Poster

            Funding Sources

            • Jackson Laboratory Director's Innovation Fund

            Conference

            BCB '21
            Acceptance Rates

            Overall Acceptance Rate 254 of 885 submissions, 29%
