1. Introduction
Despite the development of various technologies and systems using artificial intelligence (AI) to solve the problems related to disasters, difficult challenges still remain [
1]. The quantity of disaster data is vast and consistently varies. A disaster seriously affects human lives and property. Furthermore, it can cause critical damage to the country in which it occurs [
2]. Various studies have been proposed to prevent damage or predict disasters. Representatively, future atmospheric conditions are predicted using modeling and supercomputers. Disaster occurrence may also be predicted using AI technology based on previous disaster datasets [
3,
4,
5,
6,
7]. When a disaster occurs, attempts to predict future situations are carried out through learning using various features, such as estimation of damage to property and buildings as well as economic damage [
8,
9,
10]. These research results can be used to support decision making for disaster management organizations.
First of all, data are fundamental and essential for studies. However, most disaster data depend on the domain by disaster type, include heterogeneous data, and lack interoperability. In particular, disaster data or related information typically require a large number of and various types of datasets such as geographical data, rainfall data, building data, social property data, administrative district data, and so on. Because of these reasons, syntactic and sematic heterogeneity occurs despite several organizations having tried to cooperate with each other in terms of consistent data sharing [
11,
12]. Likewise, in the case of open data related to disasters, there are several issues in which the source and format of data are different because various data are collected by different organizations [
13,
14]. The vocabularies used for each domain are also inconsistent [
15]. Therefore, it is difficult to access and use critical data or resources related to a disaster in a timely manner [
16]. Owing to these reasons, it is hard to find and use suitable data for solving disaster issues.
As regards solving the abovementioned problems, previous studies have already suggested many methods to address disaster data silos and improve interoperability in terms of data management that defines and uses relationships between data [
17,
18,
19,
20]. The knowledge graph is one of the representative methods representing the relationship between data along with ontology. It is expressed as a triple consisting of a subject, a predicate, and an object. The subject and the object are represented by nodes as entities. The predicate represents the relationship between entities and is represented by edges. The knowledge graph is excellent at expressing the relationship between heterogeneous data in various domains. However, the previously proposed flooding disaster knowledge graphs are constructed using different standards and terms based on case studies for specific areas; hence, some issues are difficult to interoperate and have limitations in reusability and scalability.
In this study, we propose a knowledge graph for disaster data management related to flooding disasters using open data. Among several kinds of disaster, we focus herein on flood disaster data. As mentioned above, one of the main challenges is sharing and searching for suitable data for domain experts or researchers. Our proposed model improves interoperability, reusability, and scalability using universally existing concepts and schema in the knowledge graph. This study builds a richer concept and relations using actual open data in Korea.
The remainder of this paper is organized as follows:
Section 2 presents the background of the study and the related works for flooding disaster knowledge graphs;
Section 3 describes our proposed model and the knowledge graph for flooding disasters and presents the principal datasets selected from an open dataset related to flooding disasters;
Section 4 discusses the performed experiment; and finally,
Section 5 concludes this study.
2. Background and Related Works
This section introduces the background for the open knowledge graphs and summarizes previous studies on flooding disaster knowledge graphs.
2.1. Open Knowledge Graphs
The knowledge graph is excellent at expressing the relationship between large-scale heterogeneous data in multiple domains [
21,
22,
23]. The knowledge graph is expressed as a triple comprising a subject, a predicate, and an object. The subject and the object are represented by nodes as entities. The predicate represents the relationship between entities and is represented by edges. The knowledge graph, first proposed by Google in 2012, aims to improve search results by using semantic search information accumulated from various sources [
21]. It is currently used in semantic search engines, recommendation systems, voice search, question-and-answer systems, and AI systems in various domains.
Among the various previously proposed open knowledge graphs, representative studies of the cross-domain are Wikidata [
24], DBpedia [
25], and YAGO [
26]. Freebase was excluded in this study because it was shut down in 2017, and data have been transferred to Wikidata [
27]. Wikidata is a collaborative large-scale knowledge-graph-based global community [
24] that aims to provide information commonly used in Wikimedia projects, such as Wikipedia. Each item has a unique ID preceded by the item name with a Q (e.g., Q8068 (“Flood”)). Therefore, it can provide information in any language without needing translation. Consequently, Wikidata supports reusability and extensibility. DBpedia is an open knowledge graph built by extracting information from various Wikimedia projects [
25]. DBpedia is linked to other knowledge graphs, such as Wikidata, Freebase, and OpenCyc, and acts as a hub for linked data. DBpedia describes 4.58 million entities, including persons, places, creative works, organizations, species, and disease domains. DBpedia has the advantage of automatic modification as Wikipedia changes and supports multiple languages and various domains. Meanwhile, YAGO was automatically built from other sources, such as Wikipedia, GeoNames, and WordNet. YAGO3 provides more than 10 million entities and contains more than 120 million facts about these entities as of 2019. One of the advantages of YAGO is that it has a manually evaluated accuracy of 95% [
26].
2.2. Flooding Disaster Knowledge Graphs
Several studies have already proposed flood disaster knowledge graphs to solve the issue for heterogeneous data using ontology engineering [
17,
18,
19,
20]. In [
20], the authors proposed a data management and utilization framework for urban flooding. To manage heterogeneous data, the relationship between data was defined in terms of time (e.g., before, after, during, equal, meet, overlap, and disjoint), space, and semantics (e.g., is-a, part-of, equivalent-of, synonymy-of, and relevant-of). The influence index of the factors influencing the flood disaster was calculated using the proposed framework. In [
15], the authors approached the solution of data management and sharing problems in disaster response management. They designed a flood disaster response domain ontology using interlocking institutional worlds to solve this problem. Finally, [
19] developed an information-centric flood ontology for structural and easy access to critical information, such as disasters. The developed ontology can be used in cyber infrastructure systems for natural disaster preparedness, monitoring, response, and recovery. Furthermore, the proposed methodology can be easily integrated with the domain knowledge of the expert system and used in voice-enabled intelligent applications through a web-based information platform. As mentioned above, most of the previous studies related to flooding disaster knowledge graphs constructed knowledge graphs based on scenarios or case studies. Therefore, although the common main purpose is to solve the issues of heterogeneous data, the knowledge graph is defined with different terms and relationship definitions, which imposes the limitation of difficult reuse or integration. Fortunately, there are recent several studies resulting from the beAWARE project within the European Horizon 2020 program, which aims to integrate and harmonize heterogeneous data from various sources such as sensors, social media, general public data, and data from the first responders for real-time crisis management due to climate-related natural disasters. Those results provide useful ontologies that can refer to the area of knowledge-based disaster management [
27,
28]. However, not only the integration of data from various sources and formats but also the issue of standardization and interoperability are still ongoing because various organizations still use different terms and data fields despite data having the same meaning.
3. Proposed Model
This section presents our proposed model. First, we describe a data management model for flooding disasters using open datasets. We then define elements for building the knowledge graph in the model. We propose the knowledge graph for flooding disasters according to the model.
3.1. Data Management Model for Flooding Disasters
We propose herein a data management model for flooding disasters to build a knowledge graph using open datasets. We tried to consider the reusability, scalability, and interoperability of the knowledge graph in the other disaster domains. To this end, first of all, open data needs to be identified and refined for their data fields that are used differently for each organization. We analyzed the data fields of Korean open data, extracted common concepts of different data sets, and analyzed the relationship between them [
29]. Standardization of data fields is based on the extracted common concepts. For extensibility, our model reuses the vocabulary of the existing knowledge graph and schema using URI [
30].
Figure 1 summarizes the proposed model and defined elements.
As shown in
Figure 1, this model consists of the following eight elements:
Open dataset: Data providers provide the open data of various formats and domain and disaster data;
Collect data: We define the purpose or objective of solving the disaster domain preferentially. The heterogeneous open data related to flooding disasters are collected from several data providers;
Extract nodes and relations: On the basis of the collected data, the relationship between the data and the main data columns are analyzed to extract the concept, attribute, synonym, and semantic relationship in the flood disaster knowledge graph;
Domain expert: When extracting nodes and relationships based on data, domain experts provide criteria and scope;
Integrate schema: A general schema and an existing knowledge graph are used (e.g., schema.org, Wikidata, DBpedia, and owl vocabulary) to increase the reusability and the extensibility of the established knowledge graph;
Data scientists: The nodes, edges, and URIs for the knowledge graph are defined on the basis of the concepts and relationships extracted by domain experts;
Knowledge graph: Triple structure as an S-P-O type;
Graph database: The constructed knowledge graph is stored in the database, such as Neo4j, in graph form.
On the basis of this model, we collected 24 datasets related to real flooding disasters in Korea from an open data provider [
31].
Table 1 describes the principal datasets selected from the open dataset related to flooding disasters. As shown in
Table 1, we extracted the data columns from each dataset required for constructing the knowledge graph through an analysis of domain experts. We then integrated and used the existing knowledge graph concept and schema that are universally used terms. Finally, the knowledge graph was built and stored in a database.
3.2. Knowledge Graph for Flooding Disaster
This section presents the proposed flooding disaster knowledge graph. First of all, our knowledge graph was constructed to consider reusability and scalability. The flooding disaster knowledge graph defines a hierarchical skeleton structure between the flooding concept and the existing knowledge graph and schema (
Figure 2). We linked the concept by limiting only Wikidata, DBpedia, and schema.org among various knowledge graphs on the basis of the Flooding reference concepts in KBpedia [
32].
The flooding concept has the “same as” relationship with the Flood of Wikidata and has the same meaning. In addition, Wikidata’s Natural Disaster and High Tide exist as super classes of Flooding. In other words, all concepts in the flooding disaster knowledge graph are subordinate to these classes. According to Flooding reference concept of KBpedia, we can infer that Wikidata’s Hazard, DBpedia’s Activity and Event, and schema.org’s Action classes are closely related to the Flooding concept. We thus define their relationship as a new relationship called isCloselyRelated. Therefore, the flooding disaster knowledge graph improves the interoperability with existing knowledge bases, search accessibility, and expandability by linking and reusing the general schema with the terms of the existing knowledge graph.
We now explore the principal relationships with Flooding based on the collected open data (
Figure 3). The three relationships first applied are “has cause”, “has effect”, and “subclass of”. Using these relationships, we can expand the information from flooding to damage to human lives and property as well as other kinds of natural disasters, such as earthquakes and typhoons. Although the knowledge graphs developed in this study may appear similar to those released for other reference sites, we incorporated many details based on real open data provided by the Korean government. Moreover, the use of similar or the same relationships allows us to easily link our knowledge graphs with other references, which is a good characteristic of our knowledge graphs in terms of compatibility.
We first analyzed what causes flooding while considering the causes of domestic and outside water immersion. We could then provide the natural disaster information and status of essential facilities near rivers and residential areas in addition to the list of flooding cases and definitions. We utilized the external data built in a knowledge graph format as much as possible. The left side of
Figure 3 shows how we linked the external knowledge graph released by Wikidata. Moreover, we linked the information of possible damage due to flooding to the flooding data. For more detailed information, we included local and national open data; hence, our knowledge graphs are expected to be useful for residents and policy makers at the local region of interest. Accordingly, one can obtain more fruitful flooding information from those linked data with meaningful relationships within our knowledge graph. Those links can be further expanded to various natural disasters based on geophysical background knowledge, such as rainfall, typhoons, summer monsoons, storms, waves, tides, tsunamis, and earthquakes. We expect that these knowledge graphs can be widely used for a range of relevant data and knowledge. The relationships of “has cause” and “has effect” released by Wikidata were considered in our knowledge graph.
4. Experimental Results and Discussion
We performed an experiment to evaluate the proposed flooding disaster knowledge graph. The experiment aimed to show the usability and the interoperability of search results. Accordingly, the experiment compared the search results of open datasets (
data.go.kr), Wikidata, and our knowledge graph according to the query types. We defined simple query types using the concept and relation of the knowledge graph. The query types also consisted of a few questions, but these questions cannot be directly queried from the experimental data. Therefore, we translated each suitable query language or retrieval method such as keyword mapping or manual searching.
Table 2 shows three questions and query results.
The result of Q1, which is a question to find a concept, showed 21 datasets as the search results when the open datasets were searched for flooding disaster data. We can find datasets such as Namdong-gu, Incheon_Status of flood risk areas, and Korea National Territory Information Corporation Flooding trace information flooding level line (annual). Wikidata returned 338 triples, whereas the flooding disaster knowledge graph returned 360 triples, including Wikidata triples. Q2 and Q3 are questions about the relationship between the data. In the case of open datasets, no query results were provided for Q2 and Q3 because no relation existed between the data. However, Wikidata showed five search results in Q2, whereas the flooding disaster knowledge graph returned more query results, including the search results of Wikidata. In the case of Q3, our knowledge graph returned more query results than did Wikidata.
We proposed a model by reusing and extending the existing knowledge graph. In summary, the proposed model provided more information and improved the interoperability in the flooding disaster domain.
The knowledge graphs shown here were based on the data written in ASCII format. However, many more data on natural disasters have been written in binary format. Many operational weather/environment service centers in the world generate high-resolution global weather and environmental prediction data from their numerical forecasting models every day. We have an increasing number of data obtained from remote sensing instruments. Undoubtedly, the numerical forecast and satellite data from remote sensing instruments are useful, vast, and qualified for studies in the field of natural disasters. However, they are usually written in NetCDF and/or HDF5 formats, which are forms of binary format, and we did not collect any binary data for the knowledge graph development in this study. We will certainly address how to utilize such valuable data for our knowledge graphs in the future to give users of our knowledge graph more benefits.
5. Conclusions and Future Works
This study proposed a data management model and a knowledge graph for flooding disasters using open datasets. Among several kinds of disasters, we focused herein on flood disaster data. We mainly aimed to share and search for suitable data for domain experts or researchers. To this end, our knowledge graph defined a hierarchical skeleton structure between the Flooding concept and the existing knowledge graph and schema and expanded the concept and the relation using open data. We performed an experiment to evaluate the usability and interoperability of the query results. In summary, our proposed model improves interoperability, reusability, and scalability using universally existing concepts and schema in the knowledge graph. Furthermore, we built a richer concept and relations using the actual open data in Korea. The contribution of this study is to define a domain knowledge graph using the Korean open dataset. We defined and described the causes and effects of outside water and domestic immersion in consideration of not only the natural disaster flood data but also the geographical characteristics of Korea. We expect that our efforts to integrate open data related to floods will help users to solve and manage issues of disasters induced by flooding. Furthermore, this study opens up the possibility of extensibility to other disaster domains associated with flooding.
For the future research, various disaster domain data could be investigated or integrated. In addition, as mentioned in the discussion, we should expand the flooding disaster knowledge graph using various data formats.
Author Contributions
Conceptualization, J.S.; methodology, J.S.; investigation, H.-S.S.; resources, J.-S.K.; data curation, C.-S.L. and H.-S.S.; writing—original draft preparation, J.S.; writing—review and editing, J.-S.K.; visualization, J.-S.K.; project administration, C.-S.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was conducted with the support of Open Data Solutions (DDS) Convergence Research Program funded by the National Research Council of Science and Technology “Development of solutions for region issues based on public data using AI technology -Focused on the actual proof research for realizing safe and reliable society-”.
Data Availability Statement
Data available in a publicly accessible repository that does not issue DOIs (Publicly available datasets were analyzed in this study. This data can be found here: [
https://www.data.go.kr], accessed on 10 May 2021).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Castillo, C. Big Crisis Data: Social Media in Disasters and Time-Critical Situations; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
- Khan, A.; Gupta, S.; Gupta, S.K. Multi-hazard disaster studies: Monitoring, detection, recovery, and management, based on emerging technologies and optimal techniques. Int. J. Disaster Risk Reduct. 2020, 101642. [Google Scholar] [CrossRef]
- Kim, S.; Kang, J.S.; Lee, M.; Song, S.K. DeepTC: ConvLSTM network for trajectory prediction of tropical cyclone using spatiotemporal atmospheric simulation data. OpenReview Net. 2018. Available online: https://openreview.net/forum?id=HJlCVoPAF7 (accessed on 9 May 2021).
- Kang, J.S.; Lee, S.K.; Choi, K.S. Computational Efficiency Examination of a Regional Numerical Weather Prediction Model using KISTI Supercomputer NURION. Turk. J. Comput. Math. Educ. 2021, 743–749. [Google Scholar] [CrossRef]
- Mizielinski, M.S.; Roberts, M.J.; Vidale, P.L.; Schiemann, R.; Demory, M.-E.; Strachan, J.; Edwards, T.; Stephens, A.; Lawrence, B.N.; Pritchard, M.; et al. High-resolution global climate modelling: The UPSCALE project, a large simulation campaign. Geosci. Model. Dev. 2014, 7, 1629–1640. [Google Scholar] [CrossRef] [Green Version]
- Cardoso, R.M.; Soares, P.M.M.; Miranda, P.M.A.; Belo-Pereira, M. WRF high resolution simulation of Iberian mean and extreme precipitation climate. Int. J. Climatol. 2013, 33, 2591–2608. [Google Scholar] [CrossRef]
- Cunden, T.M.; Dhunny, A.Z.; Lollchund, M.R.; Rughooputh, S.D.D.V. Sensitivity Analysis of WRF Model for Wind Modelling Over a Complex Topography under Extreme Weather Conditions. In Proceedings of the 2018 5th International Symposium on Environment-Friendly Energies and Applications (EFEA), Rome, Italy, 24–26 September 2018; IEEE: Piscataway Township, NJ, USA, 2018; pp. 1–6. [Google Scholar]
- Lee, M.; Hong, J.H.; Kim, K.Y. Estimating damage costs from natural disasters in Korea. Nat. Hazards Rev. 2017, 18, 04017016. [Google Scholar] [CrossRef]
- Murnane, R.J.; Elsner, J.B. Maximum wind speeds and US hurricane losses. Geophys. Res. Lett. 2012, 39. [Google Scholar] [CrossRef] [Green Version]
- Pielke, R.A.; Downton, M.W. Precipitation and damaging floods: Trends in the United States, 1932–1997. J. Clim. 2000, 13, 3625–3637. [Google Scholar] [CrossRef] [Green Version]
- Lettieri, E.; Masella, C.; Radaelli, G. Disaster management: Findings from a systematic review. Disaster Prev. Manag. Int. J. 2009. [Google Scholar] [CrossRef]
- Mansourian, A.; Rajabifard, A.; Zoej, M.V.; Williamson, I. Using SDI and web-based system to facilitate disaster management. Comput. Geosci. 2006, 32, 303–315. [Google Scholar] [CrossRef]
- Li, J.; Li, Q.; Liu, C.; Khan, S.U.; Ghani, N. Community-based collaborative information system for emergency management. Comput. Oper. Res. 2004, 42, 116–124. [Google Scholar] [CrossRef]
- Hristidis, V.; Chen, S.C.; Li, T.; Luis, S.; Deng, Y. Survey of data management and analysis in disaster situations. J. Syst. Softw. 2010, 83, 1701–1714. [Google Scholar] [CrossRef]
- Khantong, S.; Ahmad, M.N. An Ontology for Sharing and Managing Information in Disaster Response: In Flood Response Usage Scenarios. J. Data Semant. 2019, 1–14. [Google Scholar] [CrossRef]
- Purohit, H.; Kanagasabai, R.; Deshpande, N. Towards Next Generation Knowledge Graphs for Disaster Management. In Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, 30 January–1 February 2019. [Google Scholar]
- Liu, P.; Huang, Y.; Wang, P.; Zhao, Q.; Nie, J.; Tang, Y.; Sun, L.; Wang, H.; Wu, X.; Li, W. Construction of Typhoon Disaster Knowledge Graph Based on Graph Database Neo4j. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; IEEE: Piscataway Township, NJ, USA, 2020; pp. 3612–3616. [Google Scholar]
- De Wrachien, D.; Garrido, J.; Mambretti, S.; Requena, I. Ontology for flood management: A proposal. WIT Trans. Ecol. Environ. 2012, 159, 3–13. [Google Scholar]
- Sermet, Y.; Demir, I. Towards an information centric flood ontology for information management and communication. Earth Sci. Inform. 2019, 12, 541–551. [Google Scholar] [CrossRef]
- Wu, Z.; Shen, Y.; Wang, H.; Wu, M. An ontology-based framework for heterogeneous data management and its application for urban flood disasters. Earth Sci. Inform. 2020, 1–14. [Google Scholar] [CrossRef]
- Singhal, A. Introducing the Knowledge Graph: Things, Not Strings. Off. Blog Google 2012. Available online: https://www.benton.org/headlines/introducing-knowledge-graph-things-not-strings (accessed on 9 May 2021).
- Pujara, J.; Miao, H.; Getoor, L.; Cohen, W. Knowledge Graph Identification. In Proceedings of the International Semantic Web Conference, Sydney, Australia, 21–25 October 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 542–557. [Google Scholar]
- Fensel, D.; Şimşek, U.; Angele, K.; Huaman, E.; Kärle, E.; Panasiuk, O.; Toma, I.; Umbrich, J.; Wahler, A. Introduction: What Is a Knowledge Graph? In Knowledge Graphs; Springer: Cham, Switzerland, 2020; pp. 1–10. [Google Scholar]
- Vrandečić, D.; Krötzsch, M. Wikidata: A free collaborative knowledgebase. Commun. ACM 2014, 57, 78–85. [Google Scholar] [CrossRef]
- Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. Dbpedia: A Nucleus for a Web of Open Data. In The Semantic Web; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735. [Google Scholar]
- Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A Core of Semantic Knowledge. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; pp. 697–706. [Google Scholar]
- Hellmund, T.; Schenk, M.; Hertweck, P.; Moßgraber, J. Employing Geospatial Semantics and Semantic Web Technologies in Natural Disaster Management. In Proceedings of the International Conference on Semantic Systems (SEMANTiCS), Karlsruhe, Germany, 9–12 September 2019. [Google Scholar]
- Hilbring, D.; Moßgraber, J.; Hertweck, P.; Hellmund, T. Harmonizing Data Collection in an Ontology for a Risk Management Platform. In Proceedings of the Int. Conf. Inform. Environ. Protection 2018, Pretoria, South Africa, 15–16 August 2018; pp. 1–6. [Google Scholar]
- Kim, H. Metadata Analysis of Open Government Data by Formal Concept Analysis. J. Korea Contents Assoc. 2018, 18, 305–313. [Google Scholar]
- Kim, H. Interlinking open government data in Korea using administrative district knowledge graph. J. Inf. Sci. Theory Pract. 2018, 6, 18–30. [Google Scholar]
- Hwang, Y.Y.; Son, J.; Yuk, J.H.; Shin, S.; Choi, K.S. A Spatial Dataset Knowledge Model for Knowledge Navigation. Int. J. Electr. Eng. Educ. 2021. accepted. [Google Scholar]
- KBpedia. Available online: http://kbpedia.org (accessed on 9 May 2021).
| Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).