Research I Automatic Ontology Construction From Relational Databases
QuratulAin Rajput
1/26/2012 By: Sameen Fatima
RESEARCH I
Contents
Dedication
Acknowledgement
Chapter I: Introduction
Chapter II: Overview of Related Fields
    Relational Databases
        Relational Model
    Ontology
Chapter III: Ontology Construction for Relational Databases: A Literature Review
    Paper 1: Ontology Automatic Constructing Based on Relational Database
    Paper 2: Relational Database as a Source of Ontology Creation
    Paper 3: An Approach for Ontology Construction Based on Relational Database
Acknowledgement
I would like to express my sincere gratitude to my supervisor, Dr. QuratulAin Rajput, for her patient guidance, persistent support, and encouragement throughout this research. She also gave me both the motivation for my topic and the freedom to pursue my research interests.
I would like to thank Dr. Sajjad Haider and Dr. Zaheeruddin Asif for their expert guidance.
Data is a precious thing and will last longer than the systems themselves.
Tim Berners-Lee
Abstract:
Relational databases (RDBs) play a vital role in managing an organization's data, but because they run on autonomous hardware and software platforms, they create a data integration problem. Ontologies, on the other hand, are considered one of the most popular solutions for knowledge representation, serving as a universal language for sharing and integrating knowledge. To realize the vision of the Semantic Web, more and more data is being published on the Web using ontologies, and relational databases hold a large amount of data that should also be published on the Web. The purpose of this study is to explore algorithms that construct ontologies automatically from relational databases. The algorithms are compared on the basis of the conversion rules they define between the corresponding constructs of the two models, and the strengths and weaknesses of each are assessed. Finally, future research directions are explored for using more of the semantic constructs of ontologies for RDB data.
Chapter I: Introduction
It is said that information is power. If information is isolated from other data, it cannot bring any value to the organization. Before the emergence of database systems, this information was difficult to manage. Because organizations spent their resources managing large amounts of duplicated data, handling data dependencies, and dealing with incompatible file formats and user-specific representations of data, they could not use this information to its full potential. Database systems were introduced to manage these autonomous files as a single, centralized collection of data. These systems reduce data duplication, avoid data inconsistency, allow data sharing, increase security, maintain data integrity, and make data available on demand [1]. This approach remained successful and met user requirements for several years. However, today's data processing requirements and capabilities have changed. New applications often need to access and maintain data from several pre-existing databases, which typically run on autonomous software and hardware platforms distributed over the many sites of a large computer network. This leads to heterogeneity and legacy problems and creates a need for timely and efficient solutions that share existing knowledge [2]. Despite ongoing advances in database technology, providing uniform and scalable access to multiple information sources, including databases and other repositories, remains a challenge [3]. The World Wide Web (WWW) now plays an increasingly vital role in sharing information for education, business, research, and other purposes, so more and more people publish data on the Web to share it with large audiences.
Data can be published in different formats, such as PDF, DOC, and HTML, including database content rendered as HTML; however, among these representations, most of the information comes from databases. One study reported that Internet-accessible databases contain up to 500 times more data than the static Web and that roughly 70% of websites are backed by relational databases [4].
With the increase in the volume of published data, it is desirable to provide automatic mechanisms to search and integrate information over the Web, which the existing Web does not support. In recent years, Semantic Web technologies (RDF, RDFS, OWL), standardized by the W3C, have proven to be powerful support for data management techniques and for the problems of data heterogeneity and semantic interoperability [5]. Ontologies (RDFS or OWL) have been suggested as a way to solve the problem of information heterogeneity by providing formal, shared, and explicit definitions of data, called semantics. Adding such semantics also improves query processing by providing more meaningful answers. Additionally, ontologies support reasoning, which can infer new knowledge and identify inconsistencies. Ontology-based access to relational data lowers the barriers to data exchange and integration; its expressive and formal semantics increase the value of existing data and enable new applications on that data [3]. Several projects, such as DBpedia and semantic wikis, have recently been built on the Web using Semantic Web technologies. Moreover, because of the popularity of ontologies, commercial relational databases (such as Oracle) now also support ontologies. However, ontologies are still constructed manually [6]. It is therefore highly desirable to transform databases into ontologies, for two reasons: (i) to publish relational data as RDF on the Web, and (ii) to combine relational data with existing RDF.
This thesis presents an algorithm to construct ontologies automatically or semi-automatically from relational databases, providing a conceptual view over the data so that the advantages of both technologies can be exploited. To construct the ontology model, the algorithm defines mappings between ontological constructs (concepts, relations, individuals, etc.) and relational database constructs (tables, attributes, attribute values, etc.), as well as variants of Description Logic. Several approaches to converting databases into ontologies have recently been proposed in the literature [7][8][9][10]. The rest of the thesis is organized as follows. Chapter 2 presents a brief overview of relational databases and ontologies. Chapter 3 describes algorithms recently proposed in the literature for constructing ontologies from databases. Chapter 4 provides the conclusion and future research directions.
The Semantic Web can assist the evolution of human knowledge as a whole. (SciAm)
Chapter II: Overview of Related Fields
Relational Databases
Every organization has data that needs to be collected, managed, and analyzed, and a relational database fulfills these needs. Data analysts, database designers, and database administrators (DBAs) need to be able to turn the data in a database into useful information for both day-to-day operations and long-term planning.
Relational Model
The relational model is the basis for any relational database management system (RDBMS). A relational model has three core components: a collection of objects or relations, operators that act on the objects or relations, and data integrity methods. In other words, it has a place to store the data, a way to create and retrieve the data, and a way to make sure that the data is logically consistent. A database consists of both data and metadata. Metadata is data that describes the structure of the data within a database. If you know how your data is arranged, you can retrieve it. Because the database contains a description of its own structure, it is self-describing.
The database is integrated because it includes not only data items but also the relationships among data items. The database stores metadata in an area called the data dictionary, which describes the tables, columns, indexes, constraints, and other items that make up the database.[11]
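Because the data dictionary can itself be queried, an ontology construction algorithm can read the schema (keys, foreign keys, constraints) directly from the database. The following Python sketch is illustrative only: it uses SQLite, through the standard `sqlite3` module, as a stand-in RDBMS, and the `department`/`employee` tables and the `describe_table` helper are hypothetical. It reads column names, declared types, NOT NULL flags, primary-key membership, and foreign keys from SQLite's catalog via `PRAGMA table_info` and `PRAGMA foreign_key_list`; other systems expose the same metadata through their own catalogs (for example `information_schema`).

```python
import sqlite3

# Hypothetical example schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
""")

def describe_table(conn, table):
    """Read one table's structure from SQLite's catalog (its data dictionary)."""
    columns = []
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 means the column is part of the primary key.
    # Table names are interpolated directly, so only pass trusted names.
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({table})"):
        columns.append({"name": name, "type": col_type,
                        "not_null": bool(notnull), "pk": pk > 0})
    # Each PRAGMA foreign_key_list row is
    # (id, seq, table, from, to, on_update, on_delete, match).
    foreign_keys = [(row[3], row[2], row[4])
                    for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
    return {"columns": columns, "foreign_keys": foreign_keys}

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
for table in tables:
    print(table, describe_table(conn, table))
```

The UNIQUE constraint on `department.name` does not appear in `table_info`; SQLite records it as an automatically created index, which `PRAGMA index_list` would expose.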
The relational model was first proposed by Dr. E. F. Codd in 1970. A relational database uses relations, or two-dimensional tables, to store the information needed to support a business. Relational databases have replaced databases built according to earlier models because the relational type has valuable attributes that distinguish relational databases from those other database types.
Ontology
Ontologies have become increasingly prominent because of the crucial role they play: they provide a concise and unambiguous description of the concepts of a domain of interest and their relationships, and this knowledge can be shared and reused by different participants. Tom Gruber defined an ontology as "an explicit specification of a conceptualization" [12]; this definition was later amended to express new features, giving "a formal, explicit specification of a shared conceptualization" [13]. Here, formal means that ontologies are built in a machine-understandable language; explicit specification means providing descriptive names for ontology concepts, for clear understanding, and defining the relationships among concepts; shared refers to consensus on the knowledge, which allows ontologies to be reused across applications and gives a shared understanding of domain concepts; and conceptualization is the comprehensive definition of domain concepts [12][13].
OWL (Web Ontology Language) was developed by the W3C to describe semantics. It has three sublanguages: OWL Lite, OWL DL, and OWL Full. OWL Lite is the least expressive; it is a subset of OWL DL and supports efficient reasoning by restricting the constructs available in OWL DL. OWL DL provides reasoning based on description logic, which is the formal basis of OWL. OWL Full contains all ontology constructs without restriction, but offers no computational guarantees.
Both technologies have features that make them powerful. Databases are mature and store data efficiently. They provide transaction systems that handle concurrent users and ensure that stored data remains consistent even in case of failures. They have proved capable of handling large amounts of data with optimized query performance. Although ontologies are a newer idea than databases, they go beyond databases by adding semantics to information and resolving issues of heterogeneity. Which technology to use depends on the deployment context, and information can be manipulated more effectively by combining the powerful features of both.
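Chapter III: Ontology Construction for Relational Databases: A Literature Review
Paper 1: Ontology Automatic Constructing Based on Relational Database [7]
The algorithm defines two types of relations: co-relative relations and basic relations. Co-relative relations have no non-key attributes and only represent the relationships between entities. Basic relations are all relations that are not co-relative [7].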
Rules for Ontology Classes:
Rule 1: If a relation is a basic relation, it is converted into an ontology class.
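The classification step and Rule 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Relation` class and the example tables are hypothetical, "non-key attribute" is read as "a column that belongs to neither the primary key nor a foreign key", and a co-relative relation is additionally required to have at least one foreign key, since it must refer to other entities.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Relation:
    """Hypothetical minimal description of a table's schema."""
    name: str
    columns: list[str]
    primary_key: list[str]
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table

def is_co_relative(rel: Relation) -> bool:
    """Co-relative: no non-key attribute, and the relation refers to other relations."""
    key_columns = set(rel.primary_key) | set(rel.foreign_keys)
    return bool(rel.foreign_keys) and all(c in key_columns for c in rel.columns)

def classes_from_relations(relations: list[Relation]) -> list[str]:
    """Rule 1: every basic (i.e. not co-relative) relation becomes an ontology class."""
    return [rel.name for rel in relations if not is_co_relative(rel)]

student = Relation("student", ["student_id", "name"], ["student_id"])
course = Relation("course", ["course_id", "title"], ["course_id"])
enrolment = Relation("enrolment", ["student_id", "course_id"],
                     ["student_id", "course_id"],
                     {"student_id": "student", "course_id": "course"})
print(classes_from_relations([student, course, enrolment]))  # ['student', 'course']
```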
Rules for Ontology Properties:
Rule 2: If a relation is a co-relative relation, it is converted into two simultaneous object properties.
Rule 3: If an attribute of a relation is not a foreign key attribute, it becomes a datatype property in the ontology.
Rule 6: All the primary keys of a co-relative relation are converted into object properties with their referenced tables.
Rule 7: All the foreign keys in a basic relation also create object properties.
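Rules 2, 3, 6, and 7 can be sketched as a function that emits property descriptions as `(kind, name, domain, range)` tuples. The sketch assumes that a co-relative relation references exactly two tables and reads the "two simultaneous object properties" of Rules 2 and 6 as one property in each direction; the naming scheme (`has_<column>`, `<table>_<a>_<b>`) and the small SQL-to-XSD type map are hypothetical choices, not taken from the paper.

```python
# Hypothetical mapping from SQL column types to XML Schema datatypes.
XSD_TYPES = {"INTEGER": "xsd:integer", "TEXT": "xsd:string", "REAL": "xsd:double"}

def properties(table, co_relative):
    """Return (kind, name, domain, range) tuples for one relation.

    table: {"name": str, "columns": [(column, sql_type), ...],
            "foreign_keys": {column: referenced_table}}
    """
    fks = table["foreign_keys"]
    if co_relative:
        # Rules 2 and 6: the relation only links the two tables its keys
        # reference, so it yields one object property in each direction.
        a, b = fks.values()  # raises ValueError unless exactly two tables
        return [("ObjectProperty", f"{table['name']}_{a}_{b}", a, b),
                ("ObjectProperty", f"{table['name']}_{b}_{a}", b, a)]
    props = []
    for column, sql_type in table["columns"]:
        if column in fks:
            # Rule 7: a foreign key in a basic relation becomes an object property.
            props.append(("ObjectProperty", f"has_{column}", table["name"], fks[column]))
        else:
            # Rule 3: any other attribute becomes a datatype property
            # (unknown SQL types fall back to xsd:string in this sketch).
            props.append(("DatatypeProperty", column, table["name"],
                          XSD_TYPES.get(sql_type, "xsd:string")))
    return props
```

For a basic `employee` table with a `dept_id` foreign key, this yields datatype properties for `emp_id` and `name` and an object property `has_dept_id` from `employee` to `department`.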
Rules for Ontology Restrictions:
Rule 4: If a relation is a basic relation and its primary key consists of a single attribute, that attribute is converted into a functional datatype property with a cardinality restriction of 1.
Rule 5: If a relation is a basic relation and its primary key consists of more than one attribute, each primary key attribute gets a property with a cardinality restriction of 1.
Rule 8: If an attribute of a relation is declared UNIQUE, the corresponding property is restricted to be functional.
Rule 9: If an attribute of a relation is declared NOT NULL, the corresponding property gets the cardinality restriction minCardinality = 1.
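The restriction rules can be sketched in the same style. The example below is illustrative: the column dictionaries, the `(property, restriction, value)` output format, and the decision to skip Rules 8 and 9 for primary-key columns (whose cardinality of exactly 1 already implies both a minimum and a maximum of 1) are assumptions made for this sketch.

```python
def restrictions(table):
    """Apply Paper 1's restriction rules (4, 5, 8, 9) to one basic relation.

    table: {"primary_key": [column, ...],
            "columns": [{"name": str, "unique": bool, "not_null": bool}, ...]}
    Returns (property, restriction, value) tuples.
    """
    pk = table["primary_key"]
    out = []
    for col in table["columns"]:
        name = col["name"]
        if name in pk:
            # Rules 4 and 5: every primary-key attribute gets cardinality = 1;
            # a single-attribute key (Rule 4) is also made functional.
            out.append((name, "cardinality", 1))
            if len(pk) == 1:
                out.append((name, "functional", True))
            continue  # cardinality = 1 already implies the Rule 8/9 restrictions
        if col.get("unique"):
            out.append((name, "functional", True))   # Rule 8
        if col.get("not_null"):
            out.append((name, "minCardinality", 1))  # Rule 9
    return out
```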
Paper 2: Relational Database as a Source of Ontology Creation [8]
Rules for creating Classes: Two rules form the basis for creating classes:
1. If multiple database relations all have the same primary key (like the basic relations in Paper 1), they can be integrated into a single ontology class/concept.
2. If a relation cannot be integrated with any other relation according to rule 1, a class can still be created from it when one of the following conditions holds:
a. the primary key consists of exactly one attribute; or
b. the primary key consists of more than one attribute and there is an attribute A that is part of the primary key but is not a foreign key.
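Rules 1 and 2 can be sketched as follows, assuming that "the same primary key" means an identical set of primary-key column names. In a real schema, identically named keys (such as a generic `id`) would need an extra check that they refer to the same entity. The `relations` dictionary format and the example tables are hypothetical.

```python
from collections import defaultdict

def build_classes(relations):
    """Paper 2, class rules 1 and 2 (sketch).

    relations: {name: {"pk": [column, ...], "fk": {column: referenced_table}}}
    Returns a list of classes, each given as the list of relations it integrates.
    """
    by_key = defaultdict(list)
    for name, rel in relations.items():
        by_key[frozenset(rel["pk"])].append(name)

    classes = []
    for names in by_key.values():
        if len(names) > 1:
            classes.append(sorted(names))  # Rule 1: integrate into one class
            continue
        rel = relations[names[0]]
        pk = rel["pk"]
        # Rule 2: (a) a single-attribute key, or (b) a composite key with at
        # least one attribute that is not also a foreign key.
        if len(pk) == 1 or any(a not in rel["fk"] for a in pk):
            classes.append(names)
    return classes
```

A pure link table such as `enrolment(person_id, course_id)`, whose key attributes are all foreign keys, produces no class under these two rules; Paper 2 turns such tables into properties instead.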
Rules for creating Properties: OWL identifies two types of properties, datatype properties and object properties. The following rules apply to property creation:
3. If one relation is a subset of another and the common attribute is not the primary key, an object property can be created. The subset relation becomes the domain of the property and the other (superset) relation becomes its range.
4. Two object properties, has_part and is_part_of, can be created from two relations if both of the following conditions hold:
a. relation1 has more than one primary key attribute;
b. the foreign key of relation2 belongs to the primary key of relation1.
If relation1 is mapped to class1 and relation2 to class2, then for has_part class2 is the domain and class1 the range, and for is_part_of class1 is the domain and class2 the range.
5. A relation used to break a many-to-many relationship in the database yields two inverse object properties in the ontology.
6. All other attributes that cannot be converted into object properties by the rules above become datatype properties in the ontology.
Rules for creating Hierarchies:
7. If two relations have their own primary keys and cannot be integrated according to rule 1, but one relation is a subset of the other, a subclass/subproperty can be created.
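Rule 4 is the least obvious of these rules, so a sketch may help. Its condition (b) is ambiguous as worded; the code below adopts one reading, namely that one of relation1's primary-key attributes is a foreign key referencing relation2 (as in an order line whose key includes the order number). As a further assumption, relations whose primary key consists only of foreign keys are treated as many-to-many link tables handled by rule 5 and are skipped. Relation names stand in for their mapped classes.

```python
def part_whole_properties(relations):
    """Paper 2, rule 4 (sketch, under the reading described above).

    relations: {name: {"pk": [column, ...], "fk": {column: referenced_table}}}
    Returns (property, domain, range) tuples.
    """
    props = []
    for r1, rel in relations.items():
        pk, fk = rel["pk"], rel["fk"]
        if len(pk) < 2:
            continue  # condition (a): relation1 needs a composite primary key
        if all(col in fk for col in pk):
            continue  # pure link table: left to rule 5 (many-to-many)
        for col in pk:
            r2 = fk.get(col)
            if r2 is not None:  # condition (b): key attribute references relation2
                props.append(("has_part", r2, r1))    # domain class2, range class1
                props.append(("is_part_of", r1, r2))  # domain class1, range class2
    return props
```

With an `orders` table and an `order_line(order_id, line_no)` table, this yields `orders has_part order_line` and `order_line is_part_of orders`.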
Rules for creating Cardinalities:
8. If an attribute of a relation is part of the primary key, the corresponding property has minCardinality = maxCardinality = 1.
9. If an attribute of a relation is declared NOT NULL, the minimum cardinality of the corresponding property is 1.
10. If an attribute of a relation is declared UNIQUE, the maximum cardinality of the corresponding property is 1.
Rules for creating Instances:
11. If C is the class corresponding to database relations R1, R2, ..., then each tuple t to which all of the relations R1, R2, ... contribute becomes an instance of class C.
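Rule 11 can be read as a join: tuples of R1, R2, ... that share a primary-key value together describe one individual of class C. The sketch below takes that inner-join reading, so only key values present in every relation produce an instance; the row format, the `build_instances` helper, and the example data are hypothetical, and an outer-join reading would instead keep every key value.

```python
def build_instances(key, *tables):
    """Paper 2, rule 11 (sketch): merge rows that share a primary-key value.

    key:    name of the shared primary-key column
    tables: one list of row dicts per relation R1, R2, ... mapped to class C
    Returns {key_value: merged_row} for key values present in every relation.
    """
    merged = {}
    common = None
    for rows in tables:
        keys = {row[key] for row in rows}
        common = keys if common is None else common & keys
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return {k: merged[k] for k in sorted(common or ())}

person = [{"person_id": 1, "name": "Ali"}, {"person_id": 2, "name": "Sara"}]
contact = [{"person_id": 1, "email": "ali@example.org"}]
print(build_instances("person_id", person, contact))
# {1: {'person_id': 1, 'name': 'Ali', 'email': 'ali@example.org'}}
```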
Paper 3: An Approach for Ontology Construction Based on Relational Database [9]
The paper describes two types of tables in a database. One is the entity table (similar to the basic relation in Paper 1), which represents entities; the other is the relation table (similar to the co-relative relation in Paper 1), which shows the links between entities and hence provides some semantics [9].
Rules for creating Classes:
1. If a table is an entity table, it becomes an ontology concept/class named after the table.
Rules for creating Properties:
2. If table T1 is a relation table, T2 and T3 are entity tables, T1 holds foreign keys to T2 and T3, and the concepts corresponding to T2 and T3 are C2 and C3, then two inverse object properties are created, with domains and ranges set accordingly: the first property has C2 as domain and C3 as range, and the second has the inverse domain and range.
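Rules 1 and 2 of this paper can be illustrated by generating OWL declarations in Turtle syntax. The sketch is hypothetical: the `ex:` namespace, the property-naming scheme, and the helper names are invented for illustration. It names each class after its table, as rule 1 prescribes, and links the two generated properties with `owl:inverseOf`.

```python
PREFIXES = """@prefix ex:   <http://example.org/onto#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
"""

def entity_table_to_turtle(table):
    """Rule 1: an entity table becomes a class named after the table."""
    return f"ex:{table} a owl:Class ."

def relation_table_to_turtle(t1, t2, t3):
    """Rule 2: relation table t1 holds foreign keys to entity tables t2 and t3.

    Two mutually inverse object properties are produced: C2 -> C3 and C3 -> C2.
    """
    forward = f"{t1}_{t2}_{t3}"  # hypothetical naming scheme
    inverse = f"{t1}_{t3}_{t2}"
    return "\n".join([
        f"ex:{forward} a owl:ObjectProperty ;",
        f"    rdfs:domain ex:{t2} ;",
        f"    rdfs:range ex:{t3} .",
        f"ex:{inverse} a owl:ObjectProperty ;",
        f"    rdfs:domain ex:{t3} ;",
        f"    rdfs:range ex:{t2} ;",
        f"    owl:inverseOf ex:{forward} .",
    ])

print(PREFIXES)
print(entity_table_to_turtle("student"))
print(entity_table_to_turtle("course"))
print(relation_table_to_turtle("enrolment", "student", "course"))
```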
3. If a column is not a foreign key, it is transformed into a property of the related concept, with the column's name, data type, and range.
4. If table T1 holds a foreign key to T2, it is transformed into an object property whose domain is the concept corresponding to T1 and whose range is C2, the concept generated from T2. If a column in a relation table is not a foreign key, it becomes a common property of the related concepts.
Rules for creating Hierarchies:
5. If a column of an entity table takes only a few distinct values, regardless of how many records there are, sub-concepts can be created from the column's data values.
Rules for creating Cardinalities:
6. For a column A of an entity table, if A is declared NOT NULL, the related property has minCardinality = 1; for properties corresponding to foreign keys and primary keys, both minCardinality and maxCardinality are set to 1.
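Rule 5 does not say how few values count as "several", so the sketch below makes that an explicit, hypothetical threshold (`max_distinct`). It collects the distinct non-null values of one column of an entity table and, if there are few enough, proposes one sub-concept per value as `(subclass, superclass)` pairs; the `<table>_<value>` naming is also an assumption.

```python
def subconcepts_from_column(table, rows, column, max_distinct=5):
    """Paper 3, rule 5 (sketch): derive sub-concepts from a low-cardinality column.

    rows: list of row dicts from the entity table
    Returns (subclass, superclass) pairs, or [] if the column has too many values.
    """
    values = sorted({str(row[column]) for row in rows if row[column] is not None})
    if not values or len(values) > max_distinct:
        return []
    # One sub-concept per distinct value; spaces are replaced to keep names usable.
    return [(f"{table}_{v.replace(' ', '_')}", table) for v in values]

students = [
    {"id": 1, "level": "Undergraduate"},
    {"id": 2, "level": "Graduate"},
    {"id": 3, "level": "Graduate"},
]
print(subconcepts_from_column("student", students, "level"))
# [('student_Graduate', 'student'), ('student_Undergraduate', 'student')]
```

In practice the threshold would need tuning per schema: a key-like column such as `id` should never qualify, which a small `max_distinct` enforces only when the table has more rows than the threshold.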
Rules for creating Instances:
7. Each record is transformed into an instance of the related concept, and the primary key value becomes the instance's ID. Relationships between instances are created according to the foreign keys between the tables.
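Rule 7 can be sketched as a row-to-triple conversion. The following is illustrative only: triples are plain Python tuples rather than RDF terms, instance IDs take the hypothetical form `<table>/<primary key value>`, and each foreign key is assumed to reference the primary key of its target table, so its value identifies the linked instance.

```python
def rows_to_triples(table, rows, pk, fks):
    """Paper 3, rule 7 (sketch): turn records into instances and links.

    pk:  name of the primary-key column (its value becomes the instance ID)
    fks: {column: referenced_table}
    Returns (subject, predicate, object) tuples.
    """
    triples = []
    for row in rows:
        subject = f"{table}/{row[pk]}"
        triples.append((subject, "rdf:type", table))
        for column, value in row.items():
            if value is None:
                continue  # NULL: no statement is made
            if column in fks:
                # Foreign key: relationship to the referenced instance.
                triples.append((subject, f"has_{column}", f"{fks[column]}/{value}"))
            else:
                # Other column: literal-valued property (rule 3).
                triples.append((subject, column, value))
    return triples

employees = [{"emp_id": 7, "name": "Ali", "dept_id": 2}]
for triple in rows_to_triples("employee", employees, "emp_id", {"dept_id": "department"}):
    print(triple)
```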
Discussion:
The papers discussed above show similar mapping rules. Paper I creates ontology classes, properties, and restrictions mainly from primary keys, foreign keys, and other constraints such as UNIQUE and NOT NULL, and follows third normal form (3NF) conventions. Paper II creates classes, properties, hierarchies, cardinalities, and instances by reversing the process of transforming a conceptual model into a relational model. It uses the decomposition of N:M relationships to model semantics that would otherwise remain hidden in the transformation process. Paper III addresses ontology concepts, the relations between concepts, and the transformation of database constraints into ontology restrictions, mainly using the information hidden behind primary and foreign keys and other integrity constraints such as UNIQUE and NOT NULL. The rules defined in Paper III say nothing about hierarchies when tables come from different databases, and similar concepts cannot be found, because the paper focuses on designing one concept per table to avoid data redundancy. Overall, the algorithms follow a similar pattern of transformation rules. However, much more could be learned from databases beyond the typical ontology constructs, such as property characteristics (reflexive, transitive, etc.), so the topic remains of interest in Semantic Web research [10].
Chapter IV: Conclusion
This report gives an overview of relational databases and ontologies (RDF/OWL) and discusses the work done so far in the literature, explaining how the proposed algorithms transform databases into ontologies to make information more useful and integrated. More specifically, these algorithms define rules for a generic approach that transforms a relational database into an ontology without human involvement. Despite the efforts made so far, the transformation process still needs to be standardized and remains a work in progress.
REFERENCES:
[1] Te-Wei Wang and Kenneth E. Murphy, "Semantic Heterogeneity in Multidatabase Systems: A Review and a Proposed Meta-Data Structure", Journal of Database Management, Vol. 15, No. 4, pp. 71-87, 2004.
[2] Natalya F. Noy, "Semantic Integration: A Survey of Ontology-Based Approaches", SIGMOD Record, Vol. 33, No. 4, pp. 65-70, 2004.
[3] Matthias Hert, "Relational Databases as Semantic Web Endpoints", Department of Informatics, University of Zurich, Switzerland.
[4] Juan Sequeda, "Relational Database and the Semantic Web", blog post on semanticweb.com, June 18, 2010.
[5] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities", Scientific American, 2001.
[6] www.oracle.com
[7] Peng Liu, Xiaoying Wang, Aihua Bao, and Xiaoxuan Wang, "Ontology Automatic Constructing Based on Relational Database", 2010 Ninth International Conference on Grid and Cloud Computing.
[8] Zdenka Telnarova, "Relational Database as a Source of Ontology Creation", Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 135-139, IEEE, 2010.
[9] Xu Zhou, Guoji Xu, and Lei Liu, "An Approach for Ontology Construction Based on Relational Database", International Journal of Research and Reviews in Artificial Intelligence, Vol. 1, No. 1, March 2011, Science Academy Publisher, United Kingdom.
[10] W3C RDB2RDF Incubator Group, "A Survey of Current Approaches for Mapping of Relational Databases to RDF", January 2008-09.
[11] Allen G. Taylor, SQL For Dummies, 7th Edition.
[12] John Wiley, Relational Database Concepts (book), Chapter 1.
[13] Deborah L. McGuinness and Frank van Harmelen, "OWL Web Ontology Language Overview".
[14] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web", Scientific American, May 2001, pp. 34-43.
[15] Man Li, Xiao-Yong Du, and Shan Wang, "Learning Ontology from Relational Database", Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005.