Abstract
Currently, the dominant technology for providing non-technical users with access to Linked Data is keyword-based search. This is problematic because keywords are often inadequate for expressing user intent. In addition, while a structured query language can provide convenient access to the information needed for advanced analytics, unstructured keyword-based search cannot meet this extremely common need. This makes it harder than necessary for non-technical users to generate analytics. We address these difficulties by developing a natural-language-based system that allows non-technical users to create well-formed questions. Our system, called TR Discover, maps from a fragment of English into an intermediate First Order Logic representation, which is in turn mapped into SPARQL or SQL. The mapping from natural language to logic makes crucial use of a feature-based grammar with full formal semantics. The fragment of English covered by the grammar is domain specific and tuned to the kinds of questions that the system can handle. Because users will not necessarily know what the coverage of the system is, TR Discover offers a novel auto-suggest mechanism that helps users construct well-formed and useful natural language questions. TR Discover was developed for future use with Thomson Reuters Cortellis, an existing product built on top of a Linked Data system targeting the pharmaceutical domain, which users currently access via a keyword-based query interface. We report results and performance measures for TR Discover on Cortellis and, to demonstrate the portability of the system, on the QALD-4 dataset, which is associated with a public shared task. We show that the system is usable and portable, and report on the relative performance of queries using SQL and SPARQL back ends.
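The abstract describes a two-stage pipeline: English is parsed into a First Order Logic representation, and that single logical form is then translated into either SPARQL or SQL. The Python sketch below illustrates only the second stage, and only under stated assumptions: the logical form is restricted to a conjunction of binary atoms; the predicate and entity names (`developedBy`, `hasTarget`, `AcmePharma`, `EGFR`) and the `http://example.org/` namespace are hypothetical; and the SQL back end is assumed to store data in a single hypothetical `triples(s, p, o)` table. TR Discover's actual logical forms, grammar, and relational schema are not given here, so this is an illustration of the general technique rather than the system's implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    """A logic variable, e.g. x in developedBy(x, AcmePharma)."""
    name: str


@dataclass(frozen=True)
class Const:
    """A constant naming an entity (hypothetical identifiers in this sketch)."""
    value: str


Term = Union[Var, Const]


@dataclass(frozen=True)
class Atom:
    """A binary predicate applied to two terms: predicate(subj, obj)."""
    predicate: str
    subj: Term
    obj: Term


@dataclass(frozen=True)
class Query:
    """A conjunctive query: return bindings of `answer` satisfying all atoms."""
    answer: Var
    body: List[Atom]


EX = "http://example.org/"  # hypothetical namespace, an assumption of this sketch


def to_sparql(q: Query) -> str:
    """Map each atom to one triple pattern; shared variables join implicitly."""
    def term(t: Term) -> str:
        return f"?{t.name}" if isinstance(t, Var) else f"ex:{t.value}"

    patterns = " .\n  ".join(
        f"{term(a.subj)} ex:{a.predicate} {term(a.obj)}" for a in q.body
    )
    return f"PREFIX ex: <{EX}>\nSELECT DISTINCT ?{q.answer.name} WHERE {{\n  {patterns} .\n}}"


def sql_str(v: str) -> str:
    """Quote a string literal for SQL, doubling any embedded single quotes."""
    return "'" + v.replace("'", "''") + "'"


def to_sql(q: Query) -> str:
    """Map each atom to a self-join on a hypothetical triples(s, p, o) table."""
    froms: List[str] = []
    conds: List[str] = []
    first_seen = {}  # variable name -> first column reference that binds it
    for i, a in enumerate(q.body):
        alias = f"t{i}"
        froms.append(f"triples AS {alias}")
        conds.append(f"{alias}.p = {sql_str(a.predicate)}")
        for col, t in (("s", a.subj), ("o", a.obj)):
            ref = f"{alias}.{col}"
            if isinstance(t, Const):
                conds.append(f"{ref} = {sql_str(t.value)}")
            elif t.name in first_seen:
                # A repeated variable becomes an explicit join condition.
                conds.append(f"{ref} = {first_seen[t.name]}")
            else:
                first_seen[t.name] = ref
    return (
        f"SELECT DISTINCT {first_seen[q.answer.name]}\n"
        f"FROM {', '.join(froms)}\n"
        f"WHERE {' AND '.join(conds)}"
    )


if __name__ == "__main__":
    # Hypothetical logical form for a question such as
    # "Which drugs are developed by AcmePharma and target EGFR?"
    x = Var("x")
    q = Query(x, [Atom("developedBy", x, Const("AcmePharma")),
                  Atom("hasTarget", x, Const("EGFR"))])
    print(to_sparql(q))
    print(to_sql(q))
```

Both back ends are generated from the same logical form: SPARQL maps each atom directly to a triple pattern and lets repeated variables join implicitly, whereas the relational version needs one self-join per atom plus explicit equality conditions for shared variables.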
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Song, D. et al. (2015). TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets. In: Arenas, M., et al. The Semantic Web - ISWC 2015. ISWC 2015. Lecture Notes in Computer Science, vol 9367. Springer, Cham. https://doi.org/10.1007/978-3-319-25010-6_2
DOI: https://doi.org/10.1007/978-3-319-25010-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25009-0
Online ISBN: 978-3-319-25010-6
eBook Packages: Computer Science, Computer Science (R0)