research-article

GeCoAgent: A Conversational Agent for Empowering Genomic Data Extraction and Analysis

Published: 15 October 2021

    Abstract

    With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools.
    This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogic interface, animated by a chatbot, to support end-users’ interaction with computational tools, accompanied by multi-modal support. As the dialogue progresses, the user is guided through extracting the relevant data from repositories and then performing data analysis, which often requires statistical methods or machine learning. Results are returned in simple representations (spreadsheets and graphics), and at the end of a session the dialogue is summarized in textual format. The innovation presented in this article concerns not only the delivery of a new tool but also our novel approach to conversational technologies, which is potentially extensible to other healthcare domains or to general data science.

    Information

    Published In

    ACM Transactions on Computing for Healthcare, Volume 3, Issue 1
    January 2022
    255 pages
    EISSN: 2637-8051
    DOI: 10.1145/3485154
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2021
    Accepted: 01 April 2021
    Revised: 01 March 2021
    Received: 01 July 2020
    Published in HEALTH Volume 3, Issue 1

    Author Tags

    1. Conversational agents
    2. Natural language understanding
    3. Genomic computing

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • ERC Advanced

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 192
    • Downloads (last 6 weeks): 18
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2024) Roles, Users, Benefits, and Limitations of Chatbots in Health Care: Rapid Review. Journal of Medical Internet Research 26 (e56930). DOI: 10.2196/56930. Online publication date: 23-Jul-2024
    • (2024) Democratizing Data Science: Using Language Models for Intuitive Data Insights and Visualizations. 2024 4th International Conference on Pervasive Computing and Social Networking (ICPCSN), 1065-1069. DOI: 10.1109/ICPCSN62568.2024.00177. Online publication date: 3-May-2024
    • (2024) Transparent, Low Resource, and Context-Aware Information Retrieval From a Closed Domain Knowledge Base. IEEE Access 12, 44233-44243. DOI: 10.1109/ACCESS.2024.3380006. Online publication date: 2024
    • (2023) Ask Your Data—Supporting Data Science Processes by Combining AutoML and Conversational Interfaces. IEEE Access 11, 45972-45988. DOI: 10.1109/ACCESS.2023.3272503. Online publication date: 2023
    • (2023) Interactivity. Natural Language Interfaces to Databases, 177-229. DOI: 10.1007/978-3-031-45043-3_7. Online publication date: 25-Nov-2023
    • (2022) Enhancing Conversational Troubleshooting with Multi-modality: Design and Implementation. Chatbot Research and Design, 103-117. DOI: 10.1007/978-3-031-25581-6_7. Online publication date: 22-Nov-2022
    • (2022) Model, Integrate, Search... Repeat: A Sound Approach to Building Integrated Repositories of Genomic Data. Special Topics in Information Technology, 89-99. DOI: 10.1007/978-3-030-85918-3_8. Online publication date: 1-Jan-2022
