Crisis and Disaster Situations on Social Media Streams: An Ontology-Based Knowledge Harvesting Approach

Interdisciplinary Journal of Information, Knowledge, and Management, 2019


Volume 14, 2019

Accepting Editor Iris A Humala │ Received: June 23, 2019 │ Revised: August 19, August 21, 2019 │ Accepted: August 22, 2019.

Cite as: Narayanasamy, S., Muruganantham, D., & Elçi, A. (2019). Crisis and disaster situations on social media streams: An ontology-based knowledge harvesting approach. Interdisciplinary Journal of Information, Knowledge, and Management, 14, 343-366. https://doi.org/10.28945/4420

(CC BY-NC 4.0) This article is licensed to you under a Creative Commons Attribution-NonCommercial 4.0 International License. When you copy and redistribute this paper in full or in part, you need to provide proper attribution to it to ensure that others can later locate this work (and to ensure that others do not accuse you of plagiarism). You may (and we encourage you to) adapt, remix, transform, and build upon the material for any non-commercial purposes. This license does not permit you to use this material for commercial purposes.

SenthilKumar Narayanasamy*, Vellore Institute of Technology, Vellore, Tamil Nadu, India, senthilkumar.n@vit.ac.in
Dinakaran Muruganantham, Vellore Institute of Technology, Vellore, Tamil Nadu, India, dinakaran.m@vit.ac.in
Atilla Elçi, Aksaray University, Merkez, Aksaray, Turkey, atilla.elci@gmail.com
* Corresponding author

ABSTRACT

Aim/Purpose
With regard to the management of crisis and disaster situations, this paper focuses on important use cases of social media functions, such as information collection and dissemination, disaster event identification and monitoring, collaborative problem solving, and decision making. By making extensive use of a disaster-based ontological framework, a strong disambiguation system is realized, which enhances search over user requests and returns unambiguous results.
Background
Even though social media is information-rich, it makes reaching a decision in critical crisis-related cases challenging. To make the whole process effective and support quality decision making, sufficiently clear semantics of such information is necessary, which can be supplied by employing semantic web technologies.

Methodology
This paper develops a disaster ontology-based system built on a framework model for monitoring uses of social media during risk and crisis-related events. The proposed system monitors a discussion thread to discover whether it has reached its peak or is in decline after taking root in a social forum such as Twitter. Content in social media can be accessed in two typical ways: Search Application Program Interfaces (APIs) and Streaming APIs. These two kinds of API processes can be used interchangeably. News content may be filtered by time, geographical region, keyword occurrence and availability ratio. With the support of the disaster ontology, domain knowledge is extracted and compared against all possible concepts. In addition, the proposed method uses SPARQL to disambiguate the query and return results with high precision.

Contribution
The model provides for the collection of crisis-related temporal data and for decision making through semantic mapping of entities onto concepts in a disaster ontology we developed, thereby disambiguating potential named entities. Results of empirical testing and analysis indicate that the proposed model outperforms other similar models.

Findings
The crucial findings of this research lie in three aspects:
(1) Twitter streams and conventional news media tend to offer almost the same types of news coverage for a specified event, but the rate of distribution among topics/categories differs.
(2) For specific events such as disasters, crises or other emergency situations, the volume of information accumulated by the two news media diverges, and filtering out the most relevant information is a challenging task.
(3) Relational mapping/co-occurrence of terms is well established for conventional news media, but the shortness and sparseness of tweets remain a bottleneck for researchers.

Recommendations for Practitioners
Though metadata provides collaborative details of news content and has conventionally been used in many areas such as information retrieval, natural language processing, and pattern recognition, the semantic aspects of data are still poorly served. Hence, pervasive use of ontologies is highly recommended to build semantics-oriented metadata for concept-based modeling, information flow searching and knowledge exchange.
Recommendation for Researchers
The strong recommendation for researchers is that, instead of relying heavily on conventional Information Retrieval (IR) systems, one can focus more on ontology to improve the accuracy rate and thereby reduce the ambiguous terms persisting in the result sets. To harness the relevant information and derive hidden facts, this research recommends clustering information from diverse sources rather than pruning a single news source. It is advisable to use a domain ontology to segregate the entities that are ambiguous with respect to other candidate sets, thus strengthening the outcome.

Impact on Society
The objective of this research is to provide informative summarization of happenings such as crisis, disaster, emergency and havoc-based situations in the real world. A system is proposed that provides summarized views of such happenings and corroborates news items by relating them to one another. Its major task is to monitor events that are trending strongly and deemed important from a crowd’s perspective.

Future Research
Future work will aim to summarize and visualize the relevant information that the model ranks highly.

Keywords
disaster management, social media, ontological support, semantic search, SPARQL, RDF

INTRODUCTION
For a long time, there has been a huge demand for an efficient mechanism to search and extract much-needed information from the social web. Manual annotation is feasible in information retrieval for a limited number of documents, but impractical for a large accumulation of content, particularly in social media. Moreover, automatic annotation processes are at an early stage. As Ritter, Etzioni and Clark (2012) indicated, automatic annotation has not yet reached maturity. Ontologies therefore need to be used properly to govern precisely which types of knowledge are harvested. With the support of an ontology (Sakaki, Okazaki, & Matsuo, 2010), domain knowledge extraction becomes highly relevant and relates all possible concepts. Ontology-based knowledge extraction is expected to provide a boost to the domain of this research, disaster management. Hence, pervasive use of ontologies has been highly recommended (Ha-Thuc, Mejova, Harris, & Srinivasan, 2010; Luo, Osborne, & Wang, 2012) to build semantics-oriented metadata for concept-based modeling, information flow searching and knowledge exchange. Once the semantic aspect of metadata is built for the content available over the Web in a domain of interest, it provides common ground for understanding and sharing the information, as well as increasing relevancy and reducing, perhaps even minimizing, inherent ambiguities. In the past, there were projects that harvested Web content and achieved exemplary results in semantically annotating the metadata for domain-specific semantic searches.
Several noteworthy examples include systems such as PlanetOnto and ArtEquAKT, the sparse kernel learning continuous relevance model for image annotation (Moran & Lavrenko, 2014), and the integration of linked data in Knowledge-Based Systems for process planning (Rehage, Joppen, & Gausemeier, 2016). However, working with the fast, transient content of social networks requires a different paradigm, and this is where this research contributes.

The next task is accessing news content from social media platforms. Though the task seems simple, it is inherently complicated by interoperability and accessibility problems. Most social media platforms grant programmers access to content through their own APIs, but these differ from one another in what they provide and are mostly resource-limited (i.e., in the number of requests allowed per unit of time). Content in social media can be accessed in two typical ways: Search APIs, which permit users to access and archive past messages, and Streaming APIs, which allow users to subscribe to real-time data feeds. These two kinds of API processes can be used interchangeably and allow information needs to be expressed, such as filtering the news content by time, geographical region, keyword occurrence and availability ratio (Celik, Abel, & Houben, 2011; Raman, Kuppusamy, Dorasamy, & Nair, 2014). The harvested data, however, still requires further pre-processing.

DATA PRE-PROCESSING AND NORMALIZATION
Though we have effective information extraction processes and well-established APIs to gather news source content, the extracted data still needs pre-processing. The news content extracted from social media can be pre-processed with a natural language processing (NLP) toolkit.
Common pre-processing operations are tokenization and labeling, part-of-speech tagging, semantic role labeling, dependency parsing and named entity linking (Kumar & Muruganantham, 2016; Lima, Espinasse, & Freitas, 2017). The next challenge in data pre-processing is to reduce the amount of data by identifying and eliminating duplicate messages. This is not an easy task, since every message posted on social media can be valuable; de-duplicating the messages requires thorough clustering and then prioritizing them based on the event context. Semantic-based technologies are used to prune this whole process.

Apart from that, there are various issues associated with handling social media messages. The most prominent are scalability and content. The scalability issue concerns Twitter stream size, volume, and velocity. Particularly during a large crisis or severe havoc (Otegi, Arregi, Ansa, & Agirre, 2015; Sakaki et al., 2010), a huge volume of tweets and millions of messages pertaining to the event may be posted. In these critical situations, tweet velocity is never constant. Instead, it grows drastically and reflects a huge response from people to the event. Observed at various times, the same or similar tweets turn out to be repeated and reposted again and again for the same event. This is the foremost challenge for scalability; avoiding redundancy is a core factor in decision making and improves understanding of the specified event. The content issue, in turn, concerns tweets that are very brief and colloquial (Abel, Gao, Houben, & Tao, 2011); most tweets posted on social media are akin to normal speech, which poses a major challenge for computational methods that must recover the correct form.
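The duplicate-elimination step described above can be illustrated with a minimal Python sketch. It is not the method used in this paper: instead of clustering, it simply normalizes each tweet (lowercasing, removing URLs, mentions, retweet markers and punctuation) and groups tweets whose normalized text is identical, keeping a repeat count that could later feed prioritization. The function names `normalize` and `deduplicate` and the sample tweets are hypothetical.

```python
import re

# Patterns for the noise that typically differs between reposts of one tweet.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
RT_RE = re.compile(r"^(rt\s+)+", re.IGNORECASE)
NON_WORD_RE = re.compile(r"[^\w\s#]")  # keep word characters, spaces and '#'


def normalize(text: str) -> str:
    """Reduce a tweet to a canonical key (assumed normalization rules)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RT_RE.sub("", text.strip())
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def deduplicate(tweets):
    """Group tweets by normalized text.

    Returns a list of (first_seen_tweet, repeat_count) in arrival order.
    """
    groups = {}
    for tweet in tweets:
        key = normalize(tweet)
        if not key:
            continue  # nothing left after normalization
        if key not in groups:
            groups[key] = [tweet, 0]
        groups[key][1] += 1
    return [(first, count) for first, count in groups.values()]


# Hypothetical sample stream.
sample = [
    "Flood in Chennai!",
    "RT @user: Flood in Chennai! http://t.co/x",
    "Power outage reported in Vellore",
]
unique = deduplicate(sample)
```

Exact matching on normalized text only catches verbatim reposts; paraphrased duplicates would still need the clustering step the text refers to.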
OBJECTIVES OF THIS RESEARCH
The objective of this research is to provide informative summarization of social network content concerning happenings such as crisis, disaster, emergency and havoc-based situations in the real world. A system is proposed that provides summarized views of such happenings and corroborates news items by relating them to one another. Its major task is to monitor events that are trending strongly and deemed important from a crowd’s perspective (Samuel & Sharma, 2018; Sheth, Thomas, & Mehra, 2010). An event cannot simply be declared important. Instead, a candidate event must take root on a social media platform such as Twitter after the news first breaks, and it must make a serious impact there through a series of posts by social media users (Lei, Rao, Li, Quan, & Wenyin, 2014). The system proposed and developed in this research monitors a discussion thread to determine whether it has reached its peak or is in decline after taking root in a social forum such as Twitter. Eventually, this gives insight into the evolutionary trends of the specified events over time.

Thus, this research aims to address the problems of entity ambiguity and the associated entity types for the purposes of disaster management. We categorize the disaster-based entity domains using an ontology and enhance search capability by strengthening the explicit connection that exists between an entity and an ontology class. To achieve this, we identify the major issues and study them thoroughly, which leads to the following observations:
(1) Twitter streams and conventional news media tend to offer almost the same types of news coverage for a specified event, but the rate of distribution among topics/categories differs.
(2) For specific events such as disasters, crises or other emergency situations, the volume of information accumulated by the two news media diverges, and filtering out the most relevant information is a challenging task.
(3) Relational mapping/co-occurrence of terms suits conventional news media well, but the shortness and sparseness of tweets remain a bottleneck for researchers.

Therefore, we cover the above-stated problems at length and propose algorithms and methods to handle these cases conservatively. In the following sections, we present semantic filtering of entities from Twitter streams and the identification of the likely meaning of tweets using a domain ontology. In addition, we highlight a semantic model for disaster situations and detail the different ontological tools available for effective filtering of semantic content. In that connection, we propose a model with which to analyze the semantic mapping of entities/terms onto concepts and to disambiguate potential named entities. Finally, we detail the analysis and present the empirical results obtained from our proposed model, which lead us to the conclusions.

RELATED WORKS
In recent years, social media has gained momentum as a means of collecting real-time events, and it has been shown to provide appropriate responses in times of crisis when compared with other sources used by decision-making systems. Celik et al. (2011) and Lei et al. (2014) carried out timely searches over crisis-related events even when access to the online events dropped consistently due to network latency and data traffic on the social media sites. Also, Abel et al. (2011) stated that extracting relevant content in crisis-related situations mostly produced ambiguous and often redundant information. The key challenges they addressed are how to avoid ambiguous results and how to remove redundancy. Sakaki et al.
(2010) empirically discovered that, when extracting crisis-related information, how well the phrases entered by social media users are comprehended directly affects the search results. Hence they used Wikipedia and other organized knowledge sources to map the canonical terms, yielding better results by employing term dissimilarity.

Grolinger, Capretz, Shypanski and Gill (2011) empirically demonstrated event identification using statistical methods that observe an event’s history, and drew conclusions from a proximity-based estimate of preciseness. They proposed a robust system that employs an algorithmic method based on Latent Dirichlet Allocation, which monitors events and omits content that is not relevant. They computed the average Euclidean distance between events, segregated abnormal changes present in the streams so that data of unknown distribution would be neglected, and obtained the Bag-of-Words. Another approach in their algorithm is the log-likelihood rate, by which the statistical ratio of the data and the TF-IDF difference present in the user-generated streams can be normalized based on term weighting and similarity score.

In Wirtz, Kron, Löw and Steuer (2014), classification algorithms were used extensively for event identification and classification. Many supervised learning algorithms were applied to divide the user-generated streams into predefined topic categories, making the detection process much easier than before. Approaches such as text classification and named entity recognition were employed extensively for exploring hidden events and disambiguating the selection process (using a vector space model to increase the viability of event detection). Liu, Brewster and Shaw (2013) explored the facts covering the most common patterns in disaster-related content and stored the details in a separate database.
Weidong, Jidong, Jia, and Danni (2012) and Liu, Shaw, and Brewster (2013) developed systems that detect online news events and search for related events on social media in order to collect related event information widely. In contrast to our approach of ontology-based discrimination, all of the works mentioned above are statistical in nature.

Lei Li and Tao Li (2013) developed a domain ontology for multi-document summarization pertaining to disaster or crisis management. They provided many experimental models for developing an efficient ontological framework, specifically on Hurricane Wilma in 2005. With that ontology, they demonstrated that ontology-based multi-document summarization performed well compared to other existing models. Jerman-Blažič, Matskanis and Bojanc (2017) discussed the differences between man-made and natural calamities in detail and delineated strategic measures, such as preparation, response, and recovery, for crisis or disaster-related situations. Their model was approved by the European Union and funded under the project REDIRNET. Later, Selvam, Balakrishnan and Ramakrishnan (2018) constructed an ontology for Social Event Detection (SED) and applied it to the Flickr website (an online photo management and sharing application) by extracting metadata features such as geolocation, photoID, tags, description, title, and timestamp. They also made extensive use of Linked Open Data (LOD) sources, such as Last.fm, Eventful, GeoNames, and FourSquare, for productive discrimination of ontological properties. Although ontology/metadata-based, these works considered full documents or photo metadata, in contrast to our study of short, bursty messages such as those on Twitter.
SEMANTIC TECHNOLOGIES IN DISASTER MANAGEMENT
The primary objective of Semantic Web technologies is to help users easily find relevant information, navigate among diverse data sources and integrate heterogeneous information. For example, semantic technologies are highly important for retrieving relevant content and linking data elements to search concepts in Twitter content during a heavy crisis or mass convergence event (Schulz, Ristoski, & Paulheim, 2013; Yates & Paquette, 2011). Nevertheless, the task is complicated because all such processes should be machine-readable and automated; this is where Semantic Web technologies come in handy for effecting ontological enrichment (see Table 1).

Table 1. Semantic technologies for unambiguous data handling

Component Name | Cached | Description
RDF, RDFS | YES | Vocabulary & Markup supported
MicroFormats | YES | Coupling Vocabulary & Markup Languages
Data RSS Feeds | YES | Fetching Atom and Metadata
XSLT | NO | Define data prototyping
Web Services | NO | Interoperate with remote data objects

In the rest of this section, we expand on the need to semantically enrich social media content, survey Semantic Web technologies to that end, and discuss how to integrate data from various sources to facilitate crisis management.

SEMANTIC ENRICHMENT OF SOCIAL MEDIA CONTENT
As Semantic Web technologies play a major role in extracting meaningful information from social media in the context of crisis management (Heath & Bizer, 2011; Liu, Shaw, & Brewster, 2013), robust methods are required for dealing with different textual expressions that all point to the same central concepts. The only way to resolve these differences among social media messages is to make the Web machine-readable.
To deal with the above-stated problems, semantic technologies provide an efficient technique called “named entity linking.” Named entity linking is the process of detecting the entities present in social media messages and associating those that closely match the specified event (see Figure 1).

Figure 1. Semantic enrichment of news contents

Named entity linking does two things. First, it crawls the social media messages related to the crisis event, detecting potentially emerging entities such as names, places, locations, organizations, and categories. Table 2 displays the ontological relations that are instrumental in extracting potential named entities of various genres using semantic technologies.

Table 2. General domain-based ontological relations

Event | Identify Relation
General Relation | is-a, isAbout, defines, occurs, exists, classifies, express, describes, isRelatedTo, sameAs
Temporal Relation | hasTime, timeInterval, time-span, timeStamp, during, eventDate, begin, end, since, nextTo
Spatial Relation | place, region, space, location, hasBoundary, nearTo, direction, overlap, placeName
Causal Relation | cause, result, factor, agent, actor, action, activities, impact, consequence, result, participant, role, product, instrument
Exceptional Relation | isPartOf, hasSubEvent, hasComponent, hasMember, unifies, includes, involves, transitive, symmetric, negative, opposite

Second, for every candidate event found, named entity linking searches for events in proximity to the named entity in that context. Unlike conventional search engines, it never searches based on a keyword. Instead, it uses ontologies to augment the search, so that the system can automate the process through enabling technologies such as RDF, RDFS, and OWL (Celik et al., 2011; Gruhl, Nagarajan, Pieper, Robson, & Sheth, 2009). Table 3 provides a sample of ontologies relevant to the crisis management domain.

Table 3. Domain specification of ontologies for crisis management

Domain Specification | Ontology Name | SW Representation
Resources | SoKNOS | OWL-DL
Resources | MOAC | RDF
Resources | SIADEX | Unknown
People | FOAF | RDF
People | BIO | RDF
Organizations | IntelLEO | RDF
Organizations | Organization OL | RDF
Disaster | EM-DAT | Online query
Disaster | UNEP-DTIE | Online query
Disaster | Canadian Disaster Database | Application System
Damage | HXL | RDF
Infrastructure | OTN | RDF
Infrastructure | EPANET | Development
Geography | GeoNames | RDF
Meteorology | NEW weather ontology | OWL
Hydrology | Ordnance Survey Hydrology Ontology | OWL

Once the named entity linking process is completed, and the messages present in social media have thus been semantically enriched, users can search for the information they want. This is called “faceted search”. In faceted search, information is not retrieved based on the keyword supplied but by associating concepts with the term through proper ontological support (Grolinger et al., 2011; Malizia, Onorati, Díaz, Aedo, & Astorga-Paliza, 2010). For example, given the search word “virus”, the system does not search for the literal keyword “virus”; instead, it finds the relevant concepts for the keyword and the associated terms in the ontology hierarchy, and then returns the media messages related to those ontological concepts.

PERTAINING CHALLENGES
Tweets extracted from Twitter entail a large number of challenges that must be overcome before they can be put to use for a particular purpose. Wirtz et al. (2014) pointed out that no metrics stipulate how much information needs to be monitored, extracted and evaluated. The following major problems make it difficult to obtain useful results:
a) Extracted tweets cover different aspects of the event and fail to distinguish between the choices of events.
b) Many tweets tend to be very noisy and sometimes irrelevant to the event, thus causing unnecessary computational problems.
c) There is no measure to counteract rumors that arise during the events, and it is feared that they spread rapidly within a short period.
d) Dealing with misspelled tweets is a big task, since no suitable dictionary is available to correct them.

EVENT EXTRACTION

Event extraction from Twitter is carried out through the Twitter Streaming API, which is the standard means of filtering the stream effectively (El-Halees, & Al-Asmar, 2017; Sakaki et al., 2010). It extracts the tweets for an event by crawling the hashtags created for the event by various news sources on Twitter and subsequently monitors the user-generated posts (Silva, Wuwongse, & Sharma, 2013). The crawler fetches the monitored tweets and stores them in the data store. Every tweet consists of the original post, author, timestamp, geographic information, and the hashtag from which it was obtained. All these properties are useful in deriving patterns and identifying the purpose of the posts.

The formal definition for entity extraction from Twitter streams is expressed in graph theory as $G(E, V)$, where $V$ is a set of vertices and $E$ is a set of edges over those vertices. To determine the potential social entities present in the Twitter streams and build the appropriate relationships between them, each vertex $v_i \in V$, $i = 1, \ldots, N$, denotes an entity extracted from the Twitter streams, and each edge $v_i v_j \in E$ denotes a relationship between entities $v_i, v_j \in V$.
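The graph definition above can be made concrete with a small sketch. It assumes entities have already been extracted per tweet and treats co-mention in the same tweet as the relationship that creates an edge; the edge weights (co-mention counts) and the sample entities are illustrative assumptions, not part of the original formulation.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_graph(tweet_entities):
    """Build G(E, V) from per-tweet entity lists.

    V is the set of extracted entities; E maps each unordered entity pair
    (stored as a sorted tuple) to the number of tweets mentioning both.
    """
    vertices = set()
    edges = defaultdict(int)
    for entities in tweet_entities:
        unique = sorted(set(entities))  # ignore repeats within one tweet
        vertices.update(unique)
        for v_i, v_j in combinations(unique, 2):
            edges[(v_i, v_j)] += 1
    return vertices, dict(edges)


# Hypothetical entities, as if already extracted from three tweets.
tweets = [
    ["Chennai", "flood", "NDRF"],
    ["Chennai", "flood"],
    ["NDRF", "rescue"],
]
V, E = build_entity_graph(tweets)
```

Storing each pair as a sorted tuple keeps the graph undirected, so the edge between two entities is counted once regardless of the order in which they appear.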
In this connection, to estimate the candidate entity for the query $q$, a search engine would normally generate highly ambiguous entity sets for the given candidate entity, denoted

$$a_i = q \leftarrow a_i \tag{1}$$

$$aw_i = q \leftarrow a_i, KW \tag{2}$$

However, our proposed approach introduces a novel method to tackle this ambiguity in the search results by bringing in the semantic Web ontology of the domain to which the entity belongs, at an appropriate level of ontological weight. Adding the ontology candidate keyword $KW$ reduces the entity ambiguity drastically, so that $|aw_i| \le |a_i|$, where $|a_i|$ is the cardinality of $a_i$ and $|aw_i|$ is the cardinality of the result for $a_i$, $KW$. With a well-formed (quoted) query for the candidate entity, the named entity information is obtained as

$$a_{"i"} = q \leftarrow "a_i" \tag{3}$$

$$aw_{"i"} = q \leftarrow "a_i", KW \tag{4}$$

In some cases, $|a_{"i"}| \le |a_j|$, where $|a_{"i"}|$ is the cardinality of “$a_i$”. The same holds for the quoted query with the keyword $KW$ in (4), which captures one of the information concentrations of a named entity. After pruning the entity cardinality, the next step relates two named entities based on the concept of co-occurrence:

$$a_i a_j = q \leftarrow a_i, a_j \tag{5}$$

This augments the semantic similarity between the two named entities and builds the relationships between them, with $|a_i \cap a_j| \le |a_i|$ and $|a_i \cap a_j| \le |a_j|$, where $|a_i \cap a_j|$ is the cardinality of $a_i a_j$. Besides, supplementing the co-occurrence query with a keyword usually reduces the number of entities returned:

$$aw_i aw_j = q \leftarrow a_i, a_j, KW \tag{6}$$

It should satisfy $|aw_i \cap aw_j| \le |a_i \cap a_j|$, where $|aw_i \cap aw_j|$ is the cardinality of the result for $a_i a_j$, $KW$. Similarly, effective use of a well-defined entity set for the query yields the appropriate relationships between the two named entities.
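The cardinality inequalities above can be checked on a toy document collection. The sketch below models a query result as the set of documents containing every query term, which is an assumption standing in for a real search engine; the `hits` helper and the sample documents are hypothetical.

```python
def hits(docs, *terms):
    """Indices of documents containing every term (case-insensitive substring match)."""
    return {
        i
        for i, doc in enumerate(docs)
        if all(term.lower() in doc.lower() for term in terms)
    }


# Hypothetical mini-corpus in which "jaguar" is an ambiguous entity.
docs = [
    "Jaguar spotted near the flooded river bank",
    "Jaguar recalls cars after flood damage",
    "Flood warning issued for the river delta",
    "New Jaguar model launched",
]
KW = "flood"  # ontology candidate keyword

a_i = hits(docs, "jaguar")         # query for the entity alone, as in (1)
aw_i = hits(docs, "jaguar", KW)    # entity plus keyword, as in (2)
a_j = hits(docs, "river")
aw_j = hits(docs, "river", KW)
a_ij = a_i & a_j                   # co-occurrence of both entities, as in (5)
aw_ij = aw_i & aw_j                # co-occurrence restricted by KW, as in (6)
```

In this model every added term can only shrink the result set, which is why the keyword never increases the cardinality.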
ONTOLOGICAL INCLUSION FOR DISASTER MANAGEMENT

Semantic technologies that support crisis event identification often have to mediate between different developers and software applications operated by various agencies. In this context, getting semantically enabled systems to communicate information in a unified format is the critical challenge, since social media platforms each behave differently and do not readily interrelate with one another. Interoperability shortcomings at the semantic level of concepts can be alleviated using common vocabularies as well as shared concepts in linking the whole processes (Liu, Brewster, & Shaw, 2013; Rajpathak & De, 2016). The best way of accomplishing this critical task is through machine-understandable ontologies that precisely define the concepts, categorize events using a clustering approach, and build an appropriate path between concepts and unified communication.

The next challenge in retrieving crisis-related information lies in extracting related events or messages from blogs, forums, and wikis. These are places where information is produced at a high rate, together with opinions and suggestions shared by many authors. The critical challenge is to interlink not only social media platforms but also these blogs and forums. It is difficult to repurpose the content and to identify the common events among these sites. For instance, Wikipedia is a huge repository of publicly accessible knowledge, but reusing that knowledge in other applications presents new challenges and difficulties (Kumar & Muruganantham, 2016; Yates & Paquette, 2011). Furthermore, a user can create accounts on many sites, such as blogs, forums, wikis, and other social media platforms, but it is very complicated to interrelate the candidate entities across these different social sites.
The major problem with these media sources is that the information items on such sites are entirely disconnected from one another. The semantics of entities are not exchanged, and facts cannot be derived from such information silos. Every site holds the information posted by its registered users independently and, at times, it turns into stagnant information silos untapped by others.

To meet the challenges arising from crisis or havoc situations, there is a strong need to decentralize the process and enable interactive processes that fetch hidden relevant facts from the content. Table 4 gives the details of handling the events from the time news originates to the planning of future action against the crisis events (i.e., past to present).

Table 4. Information handling for crisis situations

| | Goal | Main Activity | Content | Information Handling | Software Tools |
|---|---|---|---|---|---|
| News Inception | Inform & Publish the news | Gather relevant event-based information | Initially, discrete data | Confidential | In-house Software |
| Present State | Share & Collect the news | Track & Monitor the events | Clustered Data | Privileged Access | Commercial Software |
| Future Plan | Engage & Prune | Prepare the action | Find Relationships | Absolute transparency | Open Source Software |

The Semantic Web has recently provided the necessary tools for effective information linking and interoperability. Moreover, many semantic Web vocabularies have been successfully deployed on various social platforms, facilitating machine-understandable message processing (Benali, & Rahal, 2017; Ritter et al., 2012). Some of the semantic Web vocabularies are RSS, FOAF (Friend of a Friend), and SIOC (Semantically Interlinked Online Communities). With the help of these and other more refined semantic vocabularies, interlinking communities and social sites has become effective and has helped curtail information redundancy.
For example, consider a query for crisis-related content on Twitter such as “Was there a storm near the city?” In this query, the name of the city is not mentioned, but a tagging engine such as DBpedia Spotlight (http://wiki.dbpedia.org/projects/dbpedia-spotlight) or OpenCalais (http://www.opencalais.com/) would annotate the query and fix the appropriate entity for the annotated tokens based on the agglomerative clustering techniques applied to the collected tweets. Since there is a semantic link (i.e., rdf:type) between the query and DBpedia, the entity can be computed based on the similarity score and the higher relevance of content.

ONTOLOGY LANGUAGES

The so-called ‘semantic Web stack’, comprising a number of semantic Web languages, was proposed for effective information processing and leads to efficient semantic implementations in retrieval systems. The first language suggested was the Resource Description Framework (RDF), which provides the basic framework for representing information about Web content. The basic structure of any RDF statement is a triple (subject, predicate, and object), and a set of triples yields the RDF graph used to organize the data effectively. In simpler terms, an RDF statement denotes a relationship between things called nodes, each of which is interconnected with other, semantically related nodes. RDF serialization uses XML syntax and terms such as element names, attributes, and values. The purpose of RDF is to make information machine-readable and to process it semantically (see Table 5). Besides, it integrates data without serious glitches, as it follows a well-formed logic that is universally acknowledged.

Table 5.
Semantic Web languages and their structural contents

| SW Languages | Structural Contents |
|---|---|
| OWL-S, SWSL, WSML | Services in a machine-understandable format |
| OWL | Equality; Relation between Classes; Cardinality; Restrictions for Properties |
| RDF Schema | Relation of Properties; Classes |
| RDF | Objects; Properties; Propositions as Triplets |
| WSDL | Services in an exact human understandable format |

A set of RDF statements forms an RDF graph through interconnected nodes. While the RDF graph is conventionally used to express logical facts about the potential named entities, ontologies give the domain and category of each thing (i.e., entity) and define the appropriate relationships between them. Ontologies contain features to express rich relationships between entities and to set appropriate constraints on them (Grolinger et al., 2011; Wang et al., 2018). The languages used for creating ontologies are RDF Schema and OWL (see Tables 6 and 7). RDF Schema (RDFS for short) defines the set of classes related to an entity and the properties appropriate to its domain. The basic objective of RDFS is to provide a well-structured description of the properties of entities. Some ontologies used to define classes and properties are called RDF vocabularies; examples include FOAF, SKOS, MOAC, and Dublin Core. OWL, in turn, facilitates information interoperability and provides additional vocabulary to enhance the formal semantics of RDF Schema.

Table 6.
RDF and OWL Ontological Constructs

| RDF/OWL Category | | Functions |
|---|---|---|
| Class | Definition | Class, Enumerated Class, Restriction, IntersectionOf, UnionOf, ComplementOf |
| Class | Axiom | subClassOf, Equality, disjointWith |
| Relation | Definition | Property |
| Relation | Axiom | Domain, range, subPropertyOf, (Inverse)Functional, Equality, inverseOf, Transitive, symmetric |
| Instance | Definition | Type |
| Instance | Axiom | (In)Equality |

ONTOLOGY EDITORS

In order to derive meaning from the information collected from Twitter streams, and as we process the tweets into their semantic representations, there is a need to design an ontology that is well mapped to the information and facilitates fetching the content in a hierarchical format (Liu, Shaw, & Brewster, 2013; Malizia et al., 2010). To this end, the ontology is built using standard editors that are simple to use for design and development. The result can be serialized in a semantic Web format such as RDF/XML. One such editor is Protégé (https://protege.stanford.edu/), a tool that permits designing OWL ontologies and helps connect interrelated data within the overall ontological framework. With Protégé, querying and reasoning help to disambiguate the information and to reduce errors in filtering. Other popular ontology editors are SWOOP, OntoStudio, NeOn, Altova, and WebODE. Among all the editors, Protégé was the first freely available, open-source ontology editor and framework for building intelligent systems; it thus tops the list and is widely deployed in recent research.

Table 7.
A notional mapping between RDF/OWL and relational concepts

| RDF/OWL Terms | Relational Concepts |
|---|---|
| rdfs:Class | Table |
| rdf:Property | Column |
| rdfs:domain | Table that the rdf:Property is a column of |
| rdfs:range | Data type of the column |
| rdf:type | Values of the primary key column |

SEMANTIC MATCHING AND TRANSLATION

As ontologies play a seminal role in the semantic processing of information (Celik et al., 2011; Sheth et al., 2010), we should try to harness the meaning hidden in the collected information streams. Ontologies help process the meaning of the different terms represented in the information and enable the system to understand the basic structure of the information precisely. The system processes the collected data automatically and performs matching and translation with its rich vocabulary sets, which are fed to the ontology as the dataset while the complete framework of the domain ontology is designed (Gruhl et al., 2009; Madani, Boussaid, & Zegour, 2015). Each term (i.e., a concept) in a tweet is mapped to the corresponding vocabulary sets in the specific ontology domain, and sometimes the mapping reaches into the vocabulary sets of other domains to find the exact meaning of the concepts represented in the tweet. Mapping terms over multiple ontologies is the biggest challenge in designing an application that integrates the ontologies and disambiguates in parallel, and it remains a challenging research area. Further, to make the automation of semantic matching and translation effective, appropriate mapping rules over the information are necessary and should be defined using ontology matching tools. Ontology matching is also variously called ontology aligning, mapping, and translation (for example, for Web services discovery: Fellah, Malki, & Elçi, 2016).
Conventionally, ontology mapping tools fall into two categories: element-based approaches, such as name similarity, entity similarity, and concept similarity, and structure-based approaches, such as sub/super-categories, -domains, and -levels. Besides, mapping requires infusing external knowledge such as thesauri and WordNet to yield high precision and recall. Some of the ontology mapping tools available on the market are RiMOM, ASMOV, and AgreementMaker.

SEMANTIC SEARCH

The next level in the semantic utilization of information harvested from social media is semantic search, which should return results without ambiguity or sparseness. As semantic mapping directly links information repositories, domain ontologies should facilitate effective semantic search (Kumar & Muruganantham, 2016), retrieving facts that are interconnected with one another in the ontological framework. To make the search process easier, appropriate indexing of the concepts included in the ontology is important. Ontologies follow a semantic indexing approach based on the standard principle of “indexing the RDF triplets”, which smooths the way for semantic search over the collected information. Several studies have been carried out in this regard to fetch precise and unambiguous results by semantically integrating information from diverse ontological frameworks when retrieving results from multiple repositories.

INTEGRATION OF DATA

Another critical issue faced in utilizing social media content is the integration of data, which may be considered from two different perspectives: data source (database/stream) reconciliation, and information integration. Overcoming the semantic heterogeneity between these two categories using an appropriate ontology framework is a challenging research task (Liu, Shaw, & Brewster, 2013).
In database integration, the role of ontologies lies in the upper layer of the schema (i.e., the semantic matching of information and the table schema should be shared with the domain ontology, and the ontology should additionally be used to integrate the database schema in the right order). In information integration, the core problem lies in integrating terms from various sources, mapping the candidate terms to their relevant vocabulary sets, and bringing them into a consolidated view, that is, a new derived collection. The challenge here is not to change the original sense of the terms while mapping them to the appropriate sense in the vocabulary sets. Besides, in database integration, ontologies convert tables into the respective classes in RDF triples and the columns of a table into data relations in the RDF Schema. The data models followed in the semantic conversion always have 1:1 mapping cardinalities. In recent works (Kumar & Muruganantham, 2016; Kwak, Lee, Park, & Moon, 2010), the authors propose rules to dynamically map data models into ontologies and consider mapping instances to class levels. Also, some language constructs are given to fetch data objects, which can then be annotated dynamically through queries and represented in RDF. In information integration, by contrast, several recent efforts to accumulate terms from various datasets into a unified collection have failed to resolve the problem. Some early researchers applied Description Logic (DL) as the ontology language and observed few changes in the outcome. Later, the Prolog programming language was employed to express the information formally and to integrate the terms using an appropriate domain ontology.
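The table-to-class and column-to-property conversion described above for database integration can be sketched as follows. This is a minimal illustration under stated assumptions: the `ex:` prefix, the URI pattern built from the primary key, and the sample `Event` rows are all hypothetical, and triples are kept as plain tuples rather than handed to an RDF library.

```python
RDF_TYPE = "rdf:type"


def rows_to_triples(table, primary_key, rows, base="ex:"):
    """Convert relational rows to (subject, predicate, object) triples.

    The table becomes a class, each row becomes an instance identified by
    its primary key, and each remaining column becomes a property.
    """
    triples = []
    table_class = f"{base}{table}"
    for row in rows:
        subject = f"{base}{table}/{row[primary_key]}"
        triples.append((subject, RDF_TYPE, table_class))
        for column, value in row.items():
            if column == primary_key or value is None:
                continue  # the key is already in the URI; NULLs yield no triple
            triples.append((subject, f"{base}{column}", value))
    return triples


# Hypothetical rows of an Event table.
rows = [
    {"id": 1, "kind": "earthquake", "place": "Chennai"},
    {"id": 2, "kind": "tsunami", "place": None},
]
triples = rows_to_triples("Event", "id", rows)
```

Each row maps to exactly one instance and each non-null cell to exactly one triple, which mirrors the 1:1 mapping cardinality mentioned in the text.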
In the next section, we introduce our proposed information-centric model for the management of crisis and disaster situations, which integrates many of the technologies mentioned above through our approach. The model is introduced first, followed by the empirical tests and a discussion of the findings.

THE PROPOSED MODEL

The objective of this research is to harness the information gathered from various social media platforms and interconnect it with the selected news articles. To do so, we first introduce some notation and formally define the problems before presenting the Semantic Search for Events algorithm.

Problem 1 (News stream): For every news article related to a disaster or crisis situation, the content must be analyzed and scrutinized for a further level of comprehension. Let $N = \{n_0, n_1, \ldots, n_i\}$ be the news posted on various sites and gathered from various news agencies. For every news article $n_i$, we find the actual publication time $t(n_i)$. Since the origin of a news story gives the real arrival time of the news, it establishes the proximity among related news articles.

Problem 2 (Tweet stream): Upon the arrival of every news article related to a disaster or crisis, the next task of the system is to identify the equivalent social media content, such as tweets, in which the relevant news item is discussed and circulated. Let $S = \{s_0, s_1, \ldots, s_j\}$ be the Twitter streams for the selected news articles, loaded with the interrelated Twitter messages posted by various social users. For every tweet $s_j$, we find the actual posting time $t(s_j)$ and the responses to the message.

Problem 3 (News recommendation problem): Once the news items $N$ and their associated social media contents (Twitter streams) $S$ are mapped, the real task is to find the top-k most relevant news for the topic.
Let $U = \{u_0, u_1, u_2, \ldots, u_n\}$ be the set of users who interacted on the particular news topic on the social media platforms, and explicitly categorize the social messages and news streams of general interest (i.e., for any social user $u \in U$ at any point of time $T$, we recursively adopt a functional ranking that links the user's interest with that of its neighbors).

Problem 4 (Social influence): In order to find whether the news item $n_i \in N$ has effectively influenced the social media users $U = \{u_0, u_1, u_2, \ldots, u_n\}$, we define the social influence model as a $|U| \times |U|$ matrix $S$, where $S(i, j)$ is the cumulative interest of the selected user $u_i$ in the user-generated content of $u_j$. This models each user in the context as taking an interest in the user-generated content posted by the other users.

Problem 5 (Tweets-to-news model): To merge the process, let $N$ be the ordered news collected and $S$ be the streams of social media messages. We model the relationships between user-generated content and news items as an $|S| \times |N|$ matrix $M$, where $M(i, j)$ is the proximity of user-generated content $s_i$ to news item $n_j$.

Algorithm (Semantic Search for Events)
Input: Seed words for each crisis event
Output: Generation of semantic classes

    BaseTerms ← set of seed words given;
    for i: 1 to N (Number of Iterations) do
        BaseTerms ← ExpansionOf (Seed words, Corpus);
        BaseTerms ← Cluster (BaseTerms, Seed words, Corpus);
    end
    return BaseTerms

The algorithm carries out two significant operations: (1) it expands the seed words with the assistance of ontologies; and (2) it clusters the events based on the similarities existing in the classification. For the given seed words of an event, it crawls for new terms that possess distributional features similar to those of the seed words and adds them to the set of seed words (also called candidate words). In fact, new terms for a seed word can be extracted based on contextual features and a top-score similarity measure.
For the clustering, the selection procedure processes the learned terms and seed terms based on distributional similarity and sets a minimum threshold value for estimating the precision of the terms. The sole aim of this algorithm is to obtain the patterns that are interlinked by a semantic relation and to derive the semantic class for the search terms (Kumar & Muruganantham, 2016; Wang et al., 2018). As listed in the algorithm, it has three core operations for finding the patterns existing in disaster situations:

1. using the semantic class expansion algorithm, extract the candidate terms for the disaster event tags;
2. find the patterns for the selection of candidate terms and fix the semantic category of the events; and
3. choose the cluster events which hold similar action terms and evaluate the patterns for further classification.

It has been noted in several instances (Abel et al., 2011; Sheth et al., 2010; Wirtz et al., 2014) that news items and user-generated content on social media platforms (say, Twitter streams) co-exist on the same news topic (see Figure 2). Sometimes, a published news story is pushed into social media platforms for further discussion and circulation. And, many times, a news item is first discussed intensely on social networks and then becomes a topic in news stories (Lei et al., 2014; Li, Liu, Li, & Qin, 2016). In both cases, the predominant factor is the currently trending entities, which form a firm bond between social networks and news sites. This relationship between user-generated content and news stories creates an intermediary layer that paves the way to generalizing the analysis. Hence, our work is equally applicable to deriving decisions for disaster management and to assessing the core patterns for decision making.
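The expand-then-cluster loop of the Semantic Search for Events algorithm above can be sketched in a few lines. This is only an illustration under stated assumptions: context-window co-occurrence vectors and cosine similarity stand in for the ontology-assisted expansion the text describes, and the function names, window size, threshold, and two-sentence corpus are all hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vectors(corpus, window=2):
    """Map every token to a Counter of the tokens seen within `window` positions."""
    vectors = {}
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), i + window + 1
            neighbours = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(token, Counter()).update(neighbours)
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similar_to_any(term, targets, vectors, threshold):
    vec = vectors.get(term, Counter())
    return any(cosine(vec, vectors.get(t, Counter())) >= threshold for t in targets)


def expansion_of(base_terms, vectors, threshold):
    """Add corpus terms whose contexts resemble those of a current base term."""
    candidates = set(vectors) - base_terms
    return base_terms | {
        c for c in candidates if similar_to_any(c, base_terms, vectors, threshold)
    }


def cluster(base_terms, seeds, vectors, threshold):
    """Keep the seeds plus the learned terms that stay close to at least one seed."""
    return {
        t for t in base_terms
        if t in seeds or similar_to_any(t, seeds, vectors, threshold)
    }


def semantic_search_for_events(seed_words, corpus, iterations=2, threshold=0.8):
    seeds = {w.lower() for w in seed_words}
    vectors = context_vectors(corpus)
    base_terms = set(seeds)
    for _ in range(iterations):
        base_terms = expansion_of(base_terms, vectors, threshold)
        base_terms = cluster(base_terms, seeds, vectors, threshold)
    return base_terms


corpus = ["earthquake hits city", "tremor hits city"]
terms = semantic_search_for_events(["earthquake"], corpus)
```

In this toy corpus "tremor" shares the same context words as the seed "earthquake" and is therefore learned, whereas "hits" and "city" fall below the threshold and are left out.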
During the analysis of the relationship between events and results, a similarity score is needed for the ultimate decision process. The similarity score for crisis/disaster management (Liu, Shaw, & Brewster, 2013; Malizia et al., 2010; Schulz et al., 2013) can be formulated based on the following assumptions:

1. If there is a high or low hazard during the disaster situation, then scientific or technical measures are required, and precautionary steps should be taken on scientific or technical grounds.
2. If there is high or low outrage, then it is an emotive issue and should be tackled through proper negotiation or political balance.

In these two cases, the analysis of the events plays a crucial role in disseminating the user-generated content posted on social networks and in determining an effective decision-making process (Heath & Bizer, 2011).

DISASTER ONTOLOGY

To substantiate our proposed model, we constructed an ontology for disaster datasets with a glossary of more than 150 definitions (mostly of recurring terms) and further accumulated disaster-related terms from books, papers, surveys on seismic risk, and other relevant disaster web sites. We constructed the ontology with associated concepts, their attributes, and proper relationships between concepts. In constructing this ontology, we followed many dictionary terms (i.e., entities related to disaster) with their associated meanings (axioms) and connected the terms with taxonomic relationships. Relationship mapping of terms can be done in many ways, such as Taxonomic (IS-A relationship), Meronomic (PART-OF relationship), and Telic (PURPOSE-OF relationship). The relationship mapping of terms can be achieved through inference rules that support better reasoning and increase the credibility of the knowledge representation.
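As an illustration of how an inference rule can extend the taxonomic relationships described above, the sketch below derives implied IS-A links by transitivity. The concept names and statements are hypothetical and are not taken from the authors' disaster ontology; a reasoner in an ontology editor would normally perform this step.

```python
# Hypothetical (child, relation, parent) statements for a tiny seismic-risk ontology.
STATEMENTS = [
    ("Aftershock", "IS-A", "Earthquake"),
    ("Earthquake", "IS-A", "SeismicEvent"),
    ("SeismicEvent", "IS-A", "NaturalDisaster"),
    ("Epicenter", "PART-OF", "Earthquake"),
    ("EvacuationPlan", "PURPOSE-OF", "RiskMitigation"),
]


def infer(statements, relation):
    """Transitive closure of one relation: if A r B and B r C, then A r C."""
    pairs = {(s, o) for s, r, o in statements if r == relation}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


is_a = infer(STATEMENTS, "IS-A")
```

With the closure in place, a tweet tagged with the specific concept "Aftershock" can also be retrieved by a query for the broader concept "NaturalDisaster".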
In the proposed system, we used Protégé, an ontology tool that follows an object-oriented paradigm and is well suited to term inheritance. The IS-A relationship is a generalization/specialization between the candidate entities: superclass entities generalize the subclass entities, and subclass entities specialize the superclass entities. Likewise, Protégé permits formulating the disaster ontology by inserting different instances and is able to accommodate a huge set of information for a digital archive.

Figure 2. Disaster Ontology using Protégé

Our proposed system is concerned mostly with urban risk, with a specific focus on seismic risk management. The effective building of this ontology paves the way for common knowledge, makes the concepts understandable, and gives the information unambiguous semantics. The ontology construction was performed in three steps:

1. Fetch the core concepts of the domain (Seismic Risk) and relevant terms in the glossary.
2. Extract the Super-Classes and Sub-Classes of the concepts using the IS-A relationship.
3. Find other related types of relationships using inference rules (properties, slots, and roles associated with each concept).

Relation mapping for the collected tweets can be performed and filtered using the relational properties displayed in Table 8. Entity resolution and disambiguation are handled by the Disaster Ontology constructed above, which resolves the term ambiguity persisting in the collected documents (Twitter streams).

Table 8.
Relationship mapping between concepts and classes

| Relation Name | Source | Target | Description |
|---|---|---|---|
| isResponsibleFor | Department | Process | Identify which sector is responsible for the event and map the relationship between department and process. |
| workIn | Actor | Department | Map the relationship between the person and the department. Identify the actor responsible for the event. |
| isPartOf | Task | Process | Find the task which is responsible for the process and filter out the concepts related to the event. |
| isA | | | Relationship between super-class and sub-class. |
| Perform | Actor | Task | Group the actors who performed the task on the event. |
| Produce | Task | Information | Filter the information for the task on the event. |

ENTITY RELATIONSHIP AND RANKING SCORE

The disaster ontology now serves as a knowledge source for our disambiguation effort. When we process each tweet, we look for an exact match of its entities in a knowledge source such as DBpedia or YAGO. If an entity is not present, the lookup returns a NIL result. By means of our proposed method, we can then cross-match against our own ontology created from news articles and find the exact match of those entities. In this method, the accuracy is relatively high because the ontology is extracted from news articles related to the tweets, and the context of the news articles is highly relevant to and an appropriate match for the tweets. If we perform the entity-mention match with DBpedia, it lists candidate mentions for the entity, and we need to probe for the context pertaining to the tweet. But if we match the same entity against our own ontology, the result is exact and gives an appropriate match. Hence, in our approach, we take the link probability (Kumar & Muruganantham, 2016; Yates & Paquette, 2011) for the entity with a DBpedia mention, which can be defined as follows:

$$F(e, m) = \frac{\mathrm{Count}(m, e)}{\mathrm{Count}(m)} \tag{7}$$

Here, we used an outlined ontology to arrange the mentions for the given named entities and to estimate the similarity distance between them.
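Equation (7) can be computed directly from a list of observed (mention, entity) links. The sketch below is a minimal illustration; the observation counts are invented for the example, and the function name is hypothetical.

```python
from collections import Counter


def link_probability(anchor_pairs):
    """Return F(e, m) = Count(m, e) / Count(m) for every observed (mention, entity) pair."""
    pair_count = Counter(anchor_pairs)
    mention_count = Counter(m for m, _ in anchor_pairs)
    return {(e, m): n / mention_count[m] for (m, e), n in pair_count.items()}


# Hypothetical observations: how often the mention "Chennai" was linked to each entity.
observed = (
    [("Chennai", "dbpedia:Chennai")] * 3
    + [("Chennai", "dbpedia:Chennai_Super_Kings")]
)
F = link_probability(observed)
```

For a fixed mention the probabilities over its candidate entities sum to one, so the values can be compared directly when ranking candidates.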
Now the task is to estimate the distance between the entity and the suggested set of mentions from DBpedia. To this end, we use the cosine similarity measure to assess the similarity between the entity and the candidate mentions as follows:

$$\mathrm{CosSim}(e, m) = \frac{\mathrm{Product}(e, m)}{\|e\| \cdot \|m\|} \tag{8}$$

By this method, we filter the exact match of a mention for the given entity and reference it with the appropriate DBpedia URI, as stated in (Liu, Brewster, & Shaw, 2013; Malizia et al., 2010; Schulz et al., 2013). We used DBpedia Spotlight to get the URI match of each entity and return the JSON results for our implementation:

    def filter(entity):
        return JSON(DBpediaSpotlight.annotate(entity))

The result of the proposed approach is a binary mapping of the entity and its mentions, as seen in Table 9.

Table 9. Identifying the relation between named entity and candidate mention

| Mention | NE Class | NE Link | DBpedia Ontology Class | Score |
|---|---|---|---|---|
| Barack Obama | Person | Dbpedia: Obama, USA | Dbpedia-owl: Person | 3 |
| Chennai | Location | Dbpedia: Chennai, India | Dbpedia-owl: Place | 1 |
| Cricket | Sports | Dbpedia: Cricket | Dbpedia-owl: Sports | 2 |

Generally, entities in DBpedia have a name, label, type, etc. The entity name given in DBpedia for a specified URI can be fetched with a SPARQL query. For example, searching for ‘Sachin Tendulkar’:

    SELECT DISTINCT * WHERE {
      ?URI rdfs:label ?name .
      ?URI dbpprop:iupacname ?name .
      FILTER(str(?name) = "Sachin Tendulkar")
    }

In order to get the category of a given entity from DBpedia, we issue the following SPARQL query. For example, for ‘Vehicle’:

    SELECT * WHERE {
      <http://dbpedia.org/resource/Vehicle> <http://purl.org/dc/terms/subject> ?categories .
    }

EMPIRICAL TEST AND ANALYSIS

We used the Twitter4J API to gather disaster-related tweets from Twitter and the TextRazor API to recognize the named entities present in the tweets and link them to their respective DBpedia URIs. Additionally, we used the natural language processing tools of the Stanford CoreNLP library to segregate tweet patterns and performed sentiment analysis to grasp the sense of the tweets. Tweets were collected in August 2017 and, to ensure trustworthiness, we followed leading news agencies on Twitter such as BBC World, CNN, New York Times, NDTV, and Breaking News. Tweets were crawled and stored only if they had at least one named entity with a DBpedia URI. In our datasets, we were able to filter out 20 different topics and classified the tweets based on seismic risk by applying the classification rules. The algorithm proposed above is able to detect factual information in about 3 out of 5 tweets.

Table 10. Event relevance and categories
(R3, R3+R2, R3-R1: potential sub-events by relevance)

| Event Category | Total Events | R3 | R3+R2 | R3-R1 |
|---|---|---|---|---|
| Earthquake | 75 | 35 (46%) | 51 (68%) | 59 (78%) |
| Tsunami | 120 | 46 (38%) | 79 (65%) | 88 (73%) |
| Cyberattack | 114 | 51 (44%) | 87 (76%) | 95 (83%) |
| Unrest in a Country | 150 | 77 (51%) | 90 (60%) | 97 (64%) |
| Celebrity Death | 115 | 43 (37%) | 66 (57%) | 81 (70%) |
| Terror Attack | 120 | 68 (56%) | 79 (65%) | 85 (70%) |

We tested the DBpedia corpus to identify potential events on seismic risk, which provided the six complex event categories listed in Table 10. The entities were extracted based on the recommendations stated above, and their relationship types were identified in the corresponding DBpedia URIs. Besides, we again queried the DBpedia knowledge source for the sub-events correlated with the events extracted from the tweets. We ranked the sub-events on the basis of frequency of occurrence and chose the best-matched event category for each tweet.
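The percentages printed in Table 10 are consistent with each sub-event count divided by the total events of its category and rounded down to a whole percent. That reading is an inference from the numbers rather than something the text states; the sketch below reproduces it, with the counts transcribed from the table and a hypothetical helper name.

```python
# Counts transcribed from Table 10: total events, then sub-events at R3, R3+R2, R3-R1.
TABLE_10 = {
    "Earthquake": (75, 35, 51, 59),
    "Tsunami": (120, 46, 79, 88),
    "Cyberattack": (114, 51, 87, 95),
    "Unrest in a Country": (150, 77, 90, 97),
    "Celebrity Death": (115, 43, 66, 81),
    "Terror Attack": (120, 68, 79, 85),
}


def relevance_shares(total, *counts):
    """Percentage of the total for each count, rounded down to a whole percent."""
    return tuple(count * 100 // total for count in counts)


shares = {category: relevance_shares(*row) for category, row in TABLE_10.items()}
```

Integer floor division avoids floating-point rounding, so the computed shares can be compared exactly with the printed values.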
After evaluating the event categories against DBpedia, we determined whether each event was a positive instance or not. Sometimes a retrieved event is hard to classify, for example when it is partially relevant but not exactly appropriate to the categorized concepts. For these cases, we assigned the following relevance scores in order to fit the events into their appropriate groups:

• Relevance (R1): Events with a fuzzy relationship to the concept/category.
• Relevance (R2): Events with positive occurrences of sub-events or subject-object mapping.
• Relevance (R3): Events that are positive instances and fit into the category for the posted query.
• Otherwise, relevance zero indicates events with no relationship at all.

Table 10 displays the detected event categories and the potential sub-events or co-occurrences of events with their relevance scores. As can be seen, the precision values varied considerably among the categories. The Stanford NLP library proved suitable for extracting the potentially relevant tweets, and type filtering of events was effective at identifying the appropriate named entities. We obtained an accuracy of 74.13% and computed the Precision (0.641), Recall (0.716), and F-Measure (0.691) for the given datasets.

DISCUSSION

The dynamic change in the amount of information gathered across the various media platforms indicates the need for a rapid decision-making process in crisis events. It was observed that the information obtained from these sources varied rapidly. Statistics (Wirtz et al., 2014) showed that the frequency of report variation grows to ten times that of the previous day. In addition, to better account for the variation in accumulated reports, the report dimensions were categorized into three crucial breakpoints, i.e., D + 1, D + 5, D + 10.
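As a rough illustration of the three breakpoints, the sketch below counts how many reports have accumulated by D + 1, D + 5, and D + 10. It rests on assumptions that are not stated in the paper: D is taken to be the calendar day of the event, the counts are cumulative, and the function name and sample dates are hypothetical.

```python
from datetime import date

# Breakpoints D + 1, D + 5, D + 10 taken from the text.
BREAKPOINTS = (1, 5, 10)


def cumulative_counts(event_day, report_days, breakpoints=BREAKPOINTS):
    """Count reports published from event_day up to each breakpoint day.

    Assumes D is the event day itself; reports dated before D are ignored.
    """
    offsets = [(day - event_day).days for day in report_days]
    return {
        f"D+{b}": sum(1 for offset in offsets if 0 <= offset <= b)
        for b in breakpoints
    }


if __name__ == "__main__":
    # Hypothetical report dates for one event.
    event = date(2017, 8, 1)
    reports = [date(2017, 8, 1), date(2017, 8, 2), date(2017, 8, 4),
               date(2017, 8, 9), date(2017, 8, 20)]
    print(cumulative_counts(event, reports))
```

Comparing the counts at successive breakpoints gives a simple view of how quickly reporting on an event grows over the observation window.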
This elapsed gap provides a detailed overview of the crisis or disaster events and shows the real extent of what happened (see Figure 3).

Figure 3. Daily frequency of information on social media platforms (tweet counts per day, Day 1 to Day 10, for Events 1 to 4)

From the data obtained from various sources and on different days of report gathering, we can identify deviations in patterns and examine the anomalies that exist in the reports. By applying the pruning algorithm, we can sort the crisis events for the decision-making process and get to the core of the events. In this research, the real task is to find the actual reason for the crisis event and obtain substantiated evidence for its occurrence. To support this process, we classified the events into chronological orders using the ontological background together with semantic technologies. By mapping event reports from different days, we scrutinize the process for discrimination (i.e., fetch the positive, negative, or neutral feedback from the potential users on social media) and filter the facts by cross-checking when tabulating the actual events of the situation. Our approach achieved an accuracy of 74.13%, whereas existing models achieved 68.42% using Support Vector Machine (SVM), 67.93% using Maximum Entropy Model (MEM), and 64.71% using Conditional Random Fields (CRF), based on the analysis performed with the help of Table 9. Since our proposed model extensively uses the dedicated ontology of Crisis and Disaster, instead of employing the Bag-of-Words (BoW) method, we employed Bag-of-Concepts (BoC) and Relevance of Concepts, and calculated the semantic similarity score between ambiguous terms.
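The difference between Bag-of-Words and Bag-of-Concepts can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' system: the `CONCEPT_OF` lexicon is a hypothetical stand-in for the disaster ontology, tokenization is plain whitespace splitting, and the cosine similarity of Equation (8) is used here as an assumed choice for the similarity score.

```python
import math
from collections import Counter

# Hypothetical term-to-concept lexicon standing in for the disaster ontology.
CONCEPT_OF = {
    "quake": "Earthquake",
    "earthquake": "Earthquake",
    "tremor": "Earthquake",
    "tsunami": "Tsunami",
    "wave": "Tsunami",
}


def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def bag_of_concepts(text, concept_of=CONCEPT_OF):
    """Count ontology concepts; tokens without a known concept are dropped."""
    return Counter(concept_of[t] for t in text.lower().split() if t in concept_of)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    t1 = "strong quake hits city"
    t2 = "earthquake tremor reported"
    print(cosine(bag_of_words(t1), bag_of_words(t2)))
    print(cosine(bag_of_concepts(t1), bag_of_concepts(t2)))
```

The two sample texts share no surface tokens, so their word vectors do not overlap, while both map to the same ontology concept. This is the kind of match that a concept-level representation is meant to capture.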
Deep traversal of the ontological network made it possible to yield the subcategories of a topic and to skim the words that are completely unambiguous. The relevance R of the concepts was derived from the three relevances R1, R2, and R3 shown in Table 10, whereas the other existing methods mostly used only a single relevance score and restricted the research scope to the Bag-of-Words model.

The major contribution of this research is in collecting crisis-related temporal data from multiple bursty short-message sources and in supporting decision making through semantic mapping of entities onto concepts, which disambiguates potential named entities. The problems of entity ambiguity and the associated entity types were addressed as well. We categorized the disaster-based entity domains using an ontology and enhanced the searching capability of the system by strengthening the explicit connection between an entity and its ontology class.

CONCLUSIONS

In this paper, we proposed a novel solution to harvest and compare the content of Twitter streams and conventional news sources such as CNN, New York Times, BBC World, NDTV, and Breaking News in crisis situations. We developed a semantic filter that can map the concepts correlated between Twitter streams and traditional news sources, and can disambiguate the candidate entities based on an ontological framework populated with disaster/crisis events.

The major advantage of our work is that, instead of pruning a single news source, it paves the way for clustering information from diverse sources and harnessing it to derive the facts hidden in it. We also developed a disaster ontology for this research and used it to segregate the entities that are ambiguous with respect to other candidate sets.
Empirical results show that our approach outperforms other models in the literature that address this research gap. In the future, we shall strive to extend the model in order to help summarize and visualize the potential information ranked high by the model.

REFERENCES

Abel, F., Gao, Q., Houben, G.-J., & Tao, K. (2011). Analyzing user modeling on Twitter for personalized news recommendations. In J. A. Konstan, R. Conejo, J. L. Marzo, & N. Oliver (Eds.), User modeling, adaption and personalization. UMAP 2011. Lecture notes in Computer Science (Vol. 6787). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-22362-4_1
Benali, K., & Rahal, S. A. (2017). OntoDTA: Ontology-guided decision tree assistance. Journal of Information & Knowledge Management, 16(3), 1750031. https://doi.org/10.1142/S0219649217500319
Celik, I., Abel, F., & Houben, G. J. (2011). Learning semantic relationships between entities in Twitter. In S. Auer, O. Díaz, & G. A. Papadopoulos (Eds.), Web engineering. ICWE 2011. Lecture notes in Computer Science (Vol. 6757). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-22233-7_12
El-Halees, A., & Al-Asmar, A. (2017). Ontology based Arabic opinion mining. Journal of Information & Knowledge Management, 16(3), 1750028. https://doi.org/10.1142/S0219649217500289
Fellah, A., Malki, M., & Elçi, A. (2016). Web services matchmaking based on a partial ontology alignment. International Journal of Information Technology and Computer Science, 8(6), 9-20. https://doi.org/10.5815/ijitcs.2016.06.02
Grolinger, K., Capretz, M. A., Shypanski, A., & Gill, G. S. (2011, May). Federated critical infrastructure simulators: Towards ontologies for support of collaboration. Proceedings 24th Canadian Conference on Electrical and Computer Engineering (CCECE), Niagara Falls, ON, Canada, 1503-1506. https://doi.org/10.1109/CCECE.2011.6030715
Gruhl, D., Nagarajan, M., Pieper, J., Robson, C., & Sheth, A. (2009).
Context and domain knowledge enhanced entity spotting in informal text. In A. Bernstein et al. (Eds.), The semantic web. ISWC 2009. Lecture notes in Computer Science (Vol. 5823, pp. 260-276). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-04930-9_17
Ha-Thuc, V., Mejova, Y., Harris, C., & Srinivasan, P. (2010, September). News event modeling and tracking in the social web with ontological guidance. Proceedings of the 2010 IEEE Fourth International Conference on Semantic Computing, 414-419. https://doi.org/10.1109/ICSC.2010.75
Heath, T., & Bizer, C. (2011). Linked data: Evolving the web into a global data space. Synthesis lectures on the semantic web: Theory and technology (Vol. 1, pp. 1-136). San Rafael: Morgan & Claypool. https://doi.org/10.2200/S00334ED1V01Y201102WBE001
Jerman-Blažič, B., Matskanis, N., & Bojanc, R. (2017). Semantic ontology design for a multi-cooperative first responder interoperable platform. Computing and Informatics, 35(6), 1249-1276.
Kumar, N. S., & Muruganantham, D. (2016). Disambiguating the Twitter stream entities and enhancing the search operation using DBpedia ontology: Named entity disambiguation for Twitter streams. International Journal of Information Technology and Web Engineering, 11(2), 51-62. https://doi.org/10.4018/IJITWE.2016040104
Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social network or a news media? Proceedings of the 19th International World Wide Web Conference (pp. 591-600). New York: ACM. https://doi.org/10.1145/1772690.1772751
Lei, J., Rao, Y., Li, Q., Quan, X., & Wenyin, L. (2014). Towards building a social emotion detection system for online news. Future Generation Computer Systems, 37, 438-448. https://doi.org/10.1016/j.future.2013.09.024
Li, L., & Li, T. (2013). An empirical study of ontology-based multi-document summarization in disaster management.
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 44(2), 162-171.
Li, Z., Liu, Y., Li, Q., & Qin, B. (2016). Relationships between knowledge bases and related results. Knowledge and Information Systems, 49(1), 171-195. https://doi.org/10.1007/s10115-015-0902-z
Lima, R., Espinasse, B., & Freitas, F. (2017). OntoILPER: An ontology- and inductive logic programming-based system to extract entities and relations from text. Knowledge and Information Systems, 56(1), 223-255. https://doi.org/10.1007/s10115-017-1108-3
Liu, S., Brewster, C., & Shaw, D. (2013). A semantic framework for enhancing information interoperability in emergency and disaster management. International Conference on Social Media and Semantic Technologies in Emergency Response, 1-20.
Liu, S., Shaw, D., & Brewster, C. (2013). Ontologies for crisis management: A review of state of the art in ontology design and usability. Proceedings of the Information Systems for Crisis Response and Management Conference, Baden-Baden, Germany, 349-359.
Luo, Z., Osborne, M., & Wang, T. (2012, May). Opinion retrieval in Twitter. In Sixth International AAAI Conference on Weblogs and Social Media.
Madani, A., Boussaid, O., & Zegour, D. E. (2015). New information in trending topics of tweets by labelled clusters. Journal of Information & Knowledge Management, 14(3), 1550019. https://doi.org/10.1142/S0219649215500197
Malizia, A., Onorati, T., Díaz, P., Aedo, I., & Astorga-Paliza, F. (2010). SEMA4A: An ontology for emergency notification systems accessibility. Expert Systems with Applications, 37(4), 3380-3391. https://doi.org/10.1016/j.eswa.2009.10.010
Moran, S., & Lavrenko, V. (2014). Sparse kernel learning for image annotation. Proceedings of the International Conference on Multimedia Retrieval, Glasgow, UK, 113-120. https://doi.org/10.1145/2578726.2578734
Otegi, A., Arregi, X., Ansa, O., & Agirre, E. (2015). Using knowledge-based relatedness for information retrieval.
Knowledge and Information Systems, 44(3), 689-718. https://doi.org/10.1007/s10115-014-0785-4
Rajpathak, D., & De, S. (2016). A data- and ontology-driven text mining-based construction of reliability model to analyze and predict component failures. Knowledge and Information Systems, 46(1), 87-113. https://doi.org/10.1007/s10115-014-0806-3
Raman, M., Kuppusamy, M. V., Dorasamy, M., & Nair, S. (2014). Knowledge management systems and disaster management in Malaysia: An action research approach. Journal of Information & Knowledge Management, 13(1), 1450003. https://doi.org/10.1142/S0219649214500038
Rehage, G., Joppen, R., & Gausemeier, J. (2016). Perspective on the design of a knowledge-based system embedding linked data for process planning. Procedia Technology, 26, 267-276. https://doi.org/10.1016/j.protcy.2016.08.036
Ritter, A., Etzioni, O., & Clark, S. (2012). Open domain event extraction from Twitter. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1104-1112. https://doi.org/10.1145/2339530.2339704
Sakaki, T., Okazaki, M., & Matsuo, Y. (2010, April). Earthquake shakes Twitter users: Real-time event detection by social sensors. Proceedings of the 19th International World Wide Web Conference, 851-860. https://doi.org/10.1145/1772690.1772777
Samuel, A., & Sharma, D. K. (2018). A novel framework for sentiment and emoticon-based clustering and indexing of tweets. Journal of Information & Knowledge Management, 17(2), 1850013. https://doi.org/10.1142/S0219649218500132
Schulz, A., Ristoski, P., & Paulheim, H. (2013). I see a car crash: Real-time detection of small scale incidents in microblogs. In P. Cimiano, M. Fernández, V. Lopez, S. Schlobach, & J. Völker (Eds.), The Semantic Web: ESWC 2013 Satellite Events. Lecture Notes in Computer Science (Vol. 7955, pp. 22-33). Berlin, Heidelberg: Springer.
https://doi.org/10.1007/978-3-642-41242-4_3
Selvam, S., Balakrishnan, R., & Ramakrishnan, B. (2018). Social event detection: A systematic approach using ontology and linked open data with significance to semantic links. The International Arab Journal of Information Technology, 15(4), 729-738.
Sheth, A., Thomas, C., & Mehra, P. (2010). Continuous semantics to analyze real-time data. IEEE Internet Computing, 14(6), 84. https://doi.org/10.1109/MIC.2010.137
Silva, T., Wuwongse, V., & Sharma, H. N. (2013). Disaster mitigation and preparedness using linked open data. Journal of Ambient Intelligence and Humanized Computing, 4(5), 591-602. https://doi.org/10.1007/s12652-012-0128-9
Wang, M., Liu, J., Wei, B., Yao, S., Zeng, H., & Shi, L. (2018). Answering why-not questions on SPARQL queries. Knowledge and Information Systems, 58(1), 169-208. https://doi.org/10.1007/s10115-018-1155-4
Weidong, H., Jidong, Y., Jia, Z., & Danni, Z. (2012). Study on construction of emergency plan ontology model. Information Technology Journal, 11(4), 414. https://doi.org/10.3923/itj.2012.414.419
Wirtz, A., Kron, W., Löw, P., & Steuer, M. (2014). The need for data: Natural disasters and the challenges of database management. Natural Hazards, 70(1), 135-157. https://doi.org/10.1007/s11069-012-0312-4
Yates, D., & Paquette, S. (2011). Emergency knowledge management and social media technologies: A case study of the 2010 Haitian earthquake. International Journal of Information Management, 31(1), 6-13. https://doi.org/10.1016/j.ijinfomgt.2010.10.001

BIOGRAPHIES

Prof. SenthilKumar Narayanasamy received his Master's degree (M.Tech IT) from VIT University, Vellore, and is currently working as Assistant Professor at VIT University, Vellore, India. He has 10 years of teaching experience, and his research areas include Semantic Web, Information Retrieval, and Web Services.

Dr.
Dinakaran Muruganantham received his Doctorate in Computer Science from Anna University, Chennai, and his Master's degree (M.Tech IT) from VIT University, Vellore. He is currently working as Associate Professor at VIT University, Vellore, India. He has more than 8 years of teaching experience. His areas of research include Information Retrieval, Networking, and Web Service Management.

Atilla Elçi is full professor emeritus chairman of the Department of Electrical & Electronics Engineering at Aksaray University, Turkey. He established the Internet Technologies Research Center (2003-2009) and is founder and managing partner of IT&T Inc., Turkey (2008-2003). He was also at ITU, Switzerland, where he was chief technical advisor (1985-1997); METU, Turkey, where he was assistant chair and chair of the Computer Engineering Department (1976-1985); and Purdue University, USA. His research is on web semantics, agent-based systems, robotics, machine learning, ontology, information security, and software engineering. He has published over a hundred edited books by Springer and IGI Global. He has organized IEEE Engineering Semantic Agent Systems Workshops since 2006, Security of Information and Networks Conferences since 2007, and COMPSAC since 2005. He holds a B.Sc. (with Honors) in Computer/Control from METU (1970), and an M.Sc. and Ph.D. (5.63/6.00) in Computer Sciences from Purdue University, USA (1973, 1975).
