Abstract
This paper addresses the problem of building a comprehensive information retrieval system that supports decision making within a specified broad topic. We analyze the requirements for such a system, the types of information sources, and typical search queries, and we propose an architecture and an integrated pipeline. We also present a case study in the field of Arctic exploration (oil and mining, environmental issues, etc.) and report its results, including the most actively discussed topics and typical associations between entities.
Additional information
Original Russian Text © D.A. Devyatkin, R.E. Suvorov, I.V. Sochenkov, 2016, published in Iskusstvennyi Intellekt i Prinyatie Reshenii, 2016, No. 1, pp. 37–46.
About this article
Cite this article
Devyatkin, D.A., Suvorov, R.E. & Sochenkov, I.V. An Information Retrieval System for Decision Support: An Arctic-Related Mass Media Case Study. Sci. Tech. Inf. Proc. 44, 329–337 (2017). https://doi.org/10.3103/S0147688217050033