DOI: 10.1145/3041021.3053060

BD2K ERuDIte: the Educational Resource Discovery Index for Data Science

Published: 03 April 2017

Abstract

    The field of data science has developed over the years to enable the efficient integration and analysis of the increasingly large amounts of data being generated across many domains, ranging from social media, to sensor networks, to scientific experiments. Numerous subfields of biology and medicine, such as genetics, neuroimaging, and mobile health, are witnessing a data explosion that promises to revolutionize biomedical science by yielding novel insights and discoveries. To address the challenges posed by biomedical big data, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative (datascience.nih.gov). An important component of this effort is the training of biomedical researchers. To this end, the NIH has funded the BD2K Training Coordinating Center (TCC). A core activity of the BD2K TCC is to develop a web portal (bigdatau.org) to provide personalized training in data science to biomedical researchers.
    In this paper, we describe our approach and initial efforts in constructing ERuDIte, the Educational Resource Discovery Index for Data Science, which powers the BD2K TCC web portal. ERuDIte harvests a wealth of resources available online for learning data science, both for beginners and experts, including massive open online courses (MOOCs), videos of tutorials and research talks presented at conferences, textbooks, blog posts, and standalone web pages. Though the potential volume of resources is exciting, these online learning materials are highly heterogeneous in quality, difficulty, format, and topic. As a result, this mix of content makes the field intimidating to enter and difficult to navigate. Moreover, data science is a rapidly evolving field, so there is a constant influx of new materials and concepts. ERuDIte leverages data science techniques to build the data science index. This paper describes how ERuDIte uses data extraction, data integration, machine learning, information retrieval, and natural language processing techniques to automatically collect, integrate, describe and organize existing online resources for learning data science.
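As a rough illustration of the information retrieval step named above, the following Python sketch indexes a few resource descriptions with TF-IDF weights and ranks them against a query by cosine similarity. It is not ERuDIte's implementation: the records, identifiers, and function names (RESOURCES, build_index, search) are hypothetical, and the abstract does not state which weighting or ranking scheme the system uses.

```python
import math
import re
from collections import Counter

# Hypothetical resource records, invented for illustration only.
RESOURCES = [
    {"id": "mooc-1", "type": "MOOC",
     "text": "Introduction to machine learning with regression and classification"},
    {"id": "video-1", "type": "video",
     "text": "Conference tutorial on natural language processing and dependency parsing"},
    {"id": "book-1", "type": "textbook",
     "text": "Linked data and data integration on the web"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(resources):
    """Return (TF-IDF vector per resource id, IDF weight per term)."""
    docs = {r["id"]: Counter(tokenize(r["text"])) for r in resources}
    n_docs = len(docs)
    # Document frequency: number of resources containing each term.
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed IDF (an assumption; other variants would work equally well).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in counts.items()}
        for doc_id, counts in docs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf):
    """Return resource ids with a non-zero score, best match first."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    vectors, idf = build_index(RESOURCES)
    print(search("machine learning", vectors, idf))
    print(search("data integration", vectors, idf))
```

A real index over MOOCs, videos, and textbooks would also need the extraction and integration steps the abstract lists, so that heterogeneous sources share one schema before any ranking takes place.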




        Published In

        WWW '17 Companion: Proceedings of the 26th International Conference on World Wide Web Companion
        April 2017
        1738 pages
        ISBN: 9781450349147

        Sponsors

        • IW3C2: International World Wide Web Conference Committee

        Publisher

        International World Wide Web Conferences Steering Committee

        Republic and Canton of Geneva, Switzerland

        Author Tags

        1. information integration
        2. machine learning
        3. online educational resources

        Qualifiers

        • Research-article

        Funding Sources

        • NIH

        Conference

        WWW '17
        Sponsor:
        • IW3C2

        Acceptance Rates

        WWW '17 Companion Paper Acceptance Rate 164 of 966 submissions, 17%;
        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Cited By

        • (2021) BD2K Training Coordinating Center's ERuDIte: The Educational Resource Discovery Index for Data Science. IEEE Transactions on Emerging Topics in Computing 9:1 (316-328). DOI: 10.1109/TETC.2019.2903466. Online publication date: 1-Jan-2021
        • (2020) Indexing. Information Retrieval: A Biomedical and Health Perspective (181-223). DOI: 10.1007/978-3-030-47686-1_4. Online publication date: 23-Jul-2020
        • (2019) Linking educational resources on data science. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence (9404-9409). DOI: 10.1609/aaai.v33i01.33019404. Online publication date: 27-Jan-2019
        • (2019) Advancing the international data science workforce through shared training and education. F1000Research 8 (251). DOI: 10.12688/f1000research.18357.1. Online publication date: 4-Mar-2019
        • (2017) VIM: A Big Data Analytics Tool for Data Visualization and Knowledge Mining. 2017 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE) (224-227). DOI: 10.1109/WIECON-ECE.2017.8468939. Online publication date: Dec-2017
