SQL TO JSON-LD

SQL MIGRATE TO JSON-LD

BY

HARITHARAN S/O NEDUNSELIAN

SESSION 2016/2017
MULTIMEDIA UNIVERSITY
MARCH 2017
© 2015 Universiti Telekom Sdn. Bhd. ALL RIGHTS RESERVED
DECLARATION
I hereby declare that the work has been done by myself and that no portion of the work
contained in this thesis has been submitted in support of any application for any other
degree or qualification of this or any other university or institute of learning.
____________________
Haritharan S/O Nedunselian
Faculty of Information Science & Technology
Multimedia University
Date: 13 March 2017
ACKNOWLEDGEMENT
ABSTRACT
Relational database management system (RDBMS) concepts are used all over
the world, and RDBMSs are deployed globally by many organizations. Databases today
fall broadly into two families: SQL databases, designed mainly for structured data, and
NoSQL databases, designed for semi-structured and unstructured data. The SQL
database was the dominant and widely known option in earlier periods, but nowadays
it shows several drawbacks with respect to storage. In large organizations, the massive
amount of data collected from different fields creates serious storage problems and is
also costly. The relational database is not horizontally scalable, is not partition tolerant,
and is not a good fit for current web service technology. This situation has led more
organizations and database vendors to adopt new technologies that attempt to
overcome the disadvantages of SQL.
TABLE OF CONTENTS
ACKNOWLEDGEMENT ...................................................................................... IV
ABSTRACT ...............................................................................................................V
CHAPTER 1 INTRODUCTION.............................................................................. 1
2.4.5 Graph Database ................................................................................. 22
2.4.6 Cap Theorem ..................................................................................... 24
2.5 What is JSON-LD? ................................................................................ 25
2.6 Existing migration methods and tools for transforming SQL to JSON-LD .... 29
2.7 Comparison of transformation methods proposed by authors ................... 36
2.8 Chapter Summary .................................................................................. 37
CHAPTER 3 METHODOLOGY............................................................................ 39
5.1 Discussion.............................................................................................. 74
CHAPTER 6 CONCLUSION ................................................................................ 74
REFERENCES ........................................................................................................ 75
APPENDICES.......................................................................................................... 79
LIST OF TABLES
LIST OF FIGURES
Figure 2:1 Relational Database Model (Ramakrishnan & Gehrke, 2000) ........... 6
Figure 2:2 A design for a Primary Key ...................................................... 7
Figure 2:3 A design for a Foreign Key ....................................................... 7
Figure 2:4 The process of normalization (Dharaneeswaran) ......................... 8
Figure 2:5 An example of an ERD (Jones, 2006) .............................................. 14
Figure 2:6 The difference between ACID and BASE (McPhillips, 2012) .... 16
Figure 2:7 RDB data model and Document data model (Kammerer & Nimis, 2014) .... 17
Figure 2:8 JSON data in the form of a document structure (Lamllari, 2013) ..... 18
Figure 2:9 Keys and values (Manoj, 2014) .......................................... 18
Figure 2:10 A JSON document (Manoj, 2014) ............................. 20
Figure 2:11 A super column (Manoj, 2014)................................. 20
Figure 2:12 A column family database (davevalz, 2013) ........................... 21
Figure 2:13 The difference between an RDB and a columnar database (davevalz, 2013) .... 22
Figure 2:14 A graph database using Neo4j (Hunge, 2016) ........................ 23
Figure 2:15 The CAP theorem (Lamllari, 2013) ............................. 25
Figure 2:16 JSON-LD data types (Presbrey, 2014) .................................. 27
Figure 2:17 A simple CouchDB document (Södergren & Englund, 2011) ....... 33
Figure 3:1 Performance Comparison Line Graph ................................ 40
Figure 3:2 Hardware Review ................................................................. 41
Figure 3:3 Installed hardware review on the PC ................................... 41
Figure 3:4 The sharding process ................................................. 45
Figure 3:5 An example of a JSON document ..................................... 46
Figure 3:6 An example of fields in NoSQL ....................................... 47
Figure 3:7 Examples of SQL join relations and a NoSQL relation ............. 47
Figure 3:8 Summary of the SQL to NoSQL migration process ................ 48
Figure 3:9 The map-reduce query process ................................ 48
Figure 3:10 Development Process .......................................................... 52
Figure 3:11 Data Flow Diagram for SQL to JSON-LD ....................... 52
Figure 4:1 List of databases in MySQL Workbench and MySQL ............... 54
Figure 4:2 Interface of MongoDB and the list of the same databases as in MySQL .... 54
Figure 4:3 Mongify as the intermediary for migrating SQL to MongoDB ......... 55
Figure 4:4 Configuration between the MySQL database and MongoDB .. 55
Figure 4:5 Testing the Mongify connection ........................ 56
Figure 4:6 Source code for translating SQL to MongoDB ... 56
Figure 4:7 The translation process between SQL and MongoDB . 57
Figure 4:8 Data after translation into MongoDB, stored in the document database .... 57
Figure 4:9 Querying a specific document database ................. 58
Figure 4:10 A document database in JSON format ............ 58
Figure 4:11 Robomongo representing the document database in a GUI ................. 59
Figure 4:12 Standard RDB mapping approach .................................... 60
Figure 4:13 Successfully generated standard mapping ....................... 60
Figure 4:14 Custom relational database mapping ............................... 61
Figure 4:15 Interface for the manual mapping process between a relational database and an ontology .... 61
Figure 4:16 Configuring the MySQL database .................... 62
Figure 4:17 The manual mapping process between a relational database and an ontology .... 63
Figure 4:18 Dump to RDF datasets .............................. 64
Figure 4:19 Query RDF to file ................................................................ 64
Figure 4:20 Starting the server .............................. 65
Figure 4:21 The D2R server has started ............................ 65
Figure 4:22 The relation between classes, properties, subject, predicate and object .... 66
Figure 4:23 The difference in results between standard mapping and custom mapping .... 67
Figure 4:24 The difference between standard and custom mapping in terms of a SPARQL query .... 67
Figure 4:25 The SQL data in OntoGraf as a result ................. 69
Figure 4:26 OntoGraf .................................................... 69
Figure 4:27 The relationship between tables in OWLViz form ... 70
Figure 4:28 Object properties for the SQL database.................................... 70
Figure 4:29 The entities of the SQL data ........................................... 71
Figure 5:1 How the data is stored in the different migration processes ....... 72
Figure 5:2 OWL data represented as subject, predicate, and object in JSON visualization .... 72
Figure 5:3 RDF representing the data in OntoGraf form .............. 73
Figure 5:4 RDF representing the data as subject, predicate and object in OntoGraf .... 73
LIST OF ABBREVIATIONS/ SYMBOLS
LIST OF APPENDICES
CHAPTER 1
INTRODUCTION
1.1 Overview
In the current period, the growth of internet and PC usage all over the world is
increasing; in the past 10 years the number of internet users has increased by 28.5%.
Compared to earlier times, people nowadays have adopted technology through the
computer. When the personal computer was first introduced to the market, people
were wary of the technology, but today technology is widely accepted, which shows
its growth toward meeting people's needs. The Relational Database Management
System (RDBMS) is software used all over the world to store data; the relational
model was introduced by E. F. Codd around 1970. IBM, Oracle, and Microsoft
successfully offer relational database services worldwide, but for a larger scale of
database the users need to pay more, which makes it less cost-effective. As more
people started using the internet, performance issues emerged over the relational
database; in addition, the relational database is not scalable. Many organizations
therefore looked for a solution to the performance and scalability problems.
NoSQL is the database family that fills this gap in the database market, and it is
capable of managing data at large scale. NoSQL database systems came into use in
the Web 2.0 era and have become the new kind of database competing with the RDB.
Compared to the RDB, the NoSQL database has become more popular because it
supports both structured and unstructured data, whereas the RDB supports only
structured data. Moreover, a NoSQL database system can handle huge amounts of
data by distributing it across various storage devices at low cost. NoSQL databases
support both horizontal and vertical scaling: NoSQL distributes simple operations
across many servers, and it can also spread the records of a single dataset over
different servers, replicating and distributing data between them. Additionally, a
NoSQL database can dynamically add new fields to records. Compared with SQL,
NoSQL is popular because it supports unstructured data, semi-structured data and
schema-less design; being schema-less, it can span many servers and perform tasks
faster than an RDB. Giant companies such as Facebook, Twitter, and Google have
adopted NoSQL databases for their availability and scalability. Other key features of
NoSQL databases are shared-nothing architecture, replication, and partitioning of data
over many clients. NoSQL database systems generally do not support the ACID
properties, with the partial exception of CouchDB, and even there the support is
limited. NoSQL systems are instead characterized by the CAP theorem, under which
a distributed system can guarantee only two of its three properties. In addition, NoSQL
has no standard query-processing method, but its query methods are faster than SQL's.
The World Wide Web Consortium (W3C) plays an important role once SQL data is
migrated to NoSQL and stored in JSON and JSON-LD documents. JSON data is
highly scalable and self-describing; like NoSQL, it is schema-less. Finally, the focus
of this research is migrating SQL data into a NoSQL document store in JSON-LD.
The aim is to show how JSON-LD data is supported over the web and what challenges
are faced in translating SQL to NoSQL.
1.2 Problem Statement

For the current web, SQL plays a standard role in database representation.
However, the relational database has significant drawbacks: SQL is not highly
scalable and it performs poorly across multiple databases. To solve this problem, a
transformation of SQL into NoSQL should be carried out, together with a mapping
process from the SQL to the NoSQL database. Documents in the NoSQL database are
stored as JSON documents; by plugging in a library, JSON is converted into JSON-LD,
which is easier to use on the web.
1.3 Project Objectives
The objective of this research is to migrate SQL data into JSON-LD.

1.4 Project Scope

The scope of the thesis is migrating a relational database into a NoSQL
database and transforming it into JSON-LD. Upon migrating SQL to the NoSQL
database, the next step is representation: linking, fields, documents, and collections.
After mapping is completed, the documents are stored in JSON or JSON-LD
depending on the plugin library. The mapping file must be manipulated as a graph to
obtain an identical result.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
This literature review discusses current database trends and the Web 2.0-era
database technology, the NoSQL database. It also introduces JSON-LD, which
NoSQL databases use to store data. The current industry trend is that companies such
as Facebook, Twitter, and Netflix have started using NoSQL databases, so some
existing vendors have also started migrating to NoSQL. The following sections
discuss in depth which tools are used for migration and which properties NoSQL
databases rely on. The objective of this research is to analyse the relational database,
SQL, NoSQL, and JSON-LD. Bearing this in mind, this literature review gives brief
information on how each of these works, collected from recent articles, journals and
research papers.
2.2 Relational Database

A relational database (RDB) is a database organized into a collection of
relations. The idea of "relational" is inspired by the mathematical concept of a relation.
Before relational concepts were implemented, several other approaches were
practiced, but none were as successful as the relational database. A relational database
is a collection of one or more relations, made up of columns (attributes) and rows of
data known as tuples. The relational model was developed by Edgar Frank Codd and
matured through the early and mid-1970s. A Relational Database Management
System (RDBMS) is the software used by various systems to maintain relational
databases. From the mid-1970s to the early 1990s, relational databases were built on
two-dimensional arrays of rows and columns, recombining different relations to give
great flexibility in working with data. A standard for structured query methods was
introduced in 1986.
According to Elmasri and Navathe (2009), a geographic information system
(GIS) stores data in two-dimensional tables and, through its processing, can relate
multiple data elements of a data set. Dynamic data is used operationally; for example,
organizations, foundations and associations use such data as records that they modify
and maintain. Static data is used analytically, to track historical databases.
Inconsistent and redundant information were problems associated with an early
database model referred to as the hierarchical model, whose multi-level tables
represent parent-child relationships. Network database models are similarly
complicated. Both were useful steps toward developing the relational database model.
The relational data model stores dataset information using relations, or tables.
The database management software allows the information stored on the server to be
stored, accessed, modified, deleted, and retrieved; the database model is in turn used
by other application software. The relational database model can be identified by its
main mechanisms: the entity, the tuple, the attribute, and the domain.
Figure 2:1 Relational Database Model (Ramakrishnan & Gehrke, 2000)
The most important data integrity features in the relational database model
are the candidate key, the primary key, and the foreign key. Relations appear as tables,
tuples as rows, and attributes as columns. A candidate key refers to one or more
columns of a table that uniquely identify its rows; a table can have several candidate
keys, but if there is only one, it is normally assigned as the primary key of the table.
A primary key is a candidate key that contains no repeated values and no NULL
values. A primary key unambiguously identifies every row in a table, so it is chiefly
used for record lookup.

A primary key in any table is both a super key and a candidate key. It is
possible to have more than one candidate key to choose from in a given table; in this
case, the choice of primary key is driven by the designer's judgment or by user
requirements. A primary key is shown below in Figure 2.2. A foreign key is usually a
field in one table that refers to the primary key of another table to which it is related.
A primary key column cannot be NULL; it must contain a unique, non-null value for
each row, which is known as entity integrity. A foreign key must contain only values
that refer to existing rows in the parent table; ensuring that no row can be added that
does not match an entry in the parent table is known as referential integrity. A foreign
key is shown below in Figure 2.3.
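As a minimal SQL illustration of entity and referential integrity (the table and
column names here are hypothetical, not taken from the thesis):

    -- A hypothetical parent table: the primary key enforces entity integrity.
    CREATE TABLE department (
        dept_id   INT         NOT NULL,
        dept_name VARCHAR(50) NOT NULL,
        PRIMARY KEY (dept_id)
    );

    -- A child table: the foreign key enforces referential integrity.
    CREATE TABLE student (
        student_id INT         NOT NULL,
        full_name  VARCHAR(80) NOT NULL,
        dept_id    INT,
        PRIMARY KEY (student_id),
        FOREIGN KEY (dept_id) REFERENCES department (dept_id)
    );

    -- This insert would fail: dept_id 99 does not exist in department.
    -- INSERT INTO student VALUES (1, 'Ali', 'Ali bin Abu', 99);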
Figure 2:2 A design for Primary Key
First Normal Form (1NF): repeating groups of data are eliminated by creating
a new table for each group of related data, with each such table identified by a primary
key.

Second Normal Form (2NF): the table must already satisfy the 1NF rules;
where the primary key is composed of several columns, every non-key value must
depend on the whole primary key and not on only part of it.

Third Normal Form (3NF): the table, already in 1NF and 2NF, must contain
no non-primary-key value that is determined by another non-primary-key value. For
example, if a 2NF Products table has networkID as its primary key, a discount rate
should not be allocated as a column in the Products table if it is dependent on
productPrice, which is not part of the primary key.
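A small SQL sketch of the 3NF fix described above, using hypothetical column
names: the discount rate, which depends on productPrice rather than on the key, is
moved out into its own table.

    -- Violates 3NF: discountRate depends on productPrice, not on the key.
    CREATE TABLE products_2nf (
        networkID    INT PRIMARY KEY,
        productPrice DECIMAL(10,2),
        discountRate DECIMAL(4,2)
    );

    -- 3NF decomposition: the transitive dependency gets its own table.
    CREATE TABLE products_3nf (
        networkID    INT PRIMARY KEY,
        productPrice DECIMAL(10,2)
    );

    CREATE TABLE price_discount (
        productPrice DECIMAL(10,2) PRIMARY KEY,
        discountRate DECIMAL(4,2)
    );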
2.2.2 Relational Database Terminology
The data stored in the database is static data until some modification is made
to it. Sometimes a NULL value in the database represents an empty value, or a real
value that cannot be evaluated by the relational database algorithms.

Table: the table is one of the most important elements of a relational database;
it consists of entities, attributes and a great deal of information. The ordering of rows
in a table is not significant. A table usually represents a single subject, which can be
an object or an event.

Field: a field is the smallest structure in the relational database, also known as
a column. A column represents a relationship between values, and it can also express
multivalued relationships.

Record: the structure of a table in the relational database is represented as
records, each with a unique identifier known as the primary key. Within a table, the
tuple also plays the main role of representing the subject of the content. Processing is
carried out through saved SQL queries.
Relationship-related Terms:
Types of Relationships:
In replication, a change on the destination database causes a change on all
other databases, including the source database; it requires more computing resources
to perform the replication process. Duplication is a process based on a source, or
master, database: the database is simply copied, and changes made to the duplicated
data do not affect the source database. However, without a defined notion of a
transaction, the database can be left in a broken state. The better solution for this case
is the ACID properties. Yu (2009) stated that a distributed database can have ACID
properties, which make the database reliable. ACID stands for Atomicity,
Consistency, Isolation and Durability.
Atomicity: a transaction is all-or-nothing; either every operation in it completes
or none takes effect.

Durability: once a transaction has committed, its changes persist even after a
system failure.
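A minimal sketch of atomicity in SQL (the account table and values are
hypothetical): either both updates commit together or, on error, a rollback leaves the
data untouched.

    -- Transfer between two hypothetical accounts: all-or-nothing.
    START TRANSACTION;

    UPDATE account SET balance = balance - 100 WHERE id = 1;
    UPDATE account SET balance = balance + 100 WHERE id = 2;

    -- If either update fails, ROLLBACK restores the original state;
    -- otherwise COMMIT makes both changes durable.
    COMMIT;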
2.2.4 The SQL Language
Entity Relationship Diagrams are a major data modelling instrument and help
organize the data in a project into entities and define the relationships between those
entities. It is one of the methods established to supply a good data structure so that
data can be stored and retrieved in the best possible manner. An Entity Relationship
Diagram is a conceptual model of the structure of a system; since ERDs represent
only structural features, they give a static view of the system.
Components of an ERD:
Entity: the entity is the main part of a table because it represents the data
relating to the table, and the relationships of the table are composed based on the
entity data; for example: city, student, hostel, and so on.
Figure 2:5 Shows an example of an ERD (Jones, 2006)
Nowadays, one of the leading technologies in the information technology (IT)
world is the relational database management system (RDBMS). The RDBMS grew
out of the relational model developed by E. F. Codd around 1970 and came to rule the
business. Different database model concepts preceded it: the hierarchical model and
the network data model were introduced in the 1960s. The hierarchical database
model records data in a tree structure, with records in parent-child relationships. The
network model allows multiple parent and child relationships in the database,
generating a graph structure; it originates with Charles Bachman and was
standardized in 1969 by the Conference/Committee on Data Systems Languages
(CODASYL) consortium. These approaches influenced every relational database
system. After the introduction of Web 2.0 applications, NoSQL databases became a
more scalable and reliable solution for data storage than the relational database.
McPhillips (2012) stated that "Transitioning an application from a relational
environment to NoSQL environment presents numerous difficulties as far as diagram
outline and information access techniques".
The relational database thus has the capacity to process and manage a large
amount of data, and taking care of large databases is obligatory for IT companies.
However, an RDBMS is not well suited to managing very large amounts of data on
clustered computing; to solve this problem another approach, NoSQL, was introduced
to mitigate the issues that are not handled well by the RDBMS. Carlo Strozzi
introduced "Not only SQL" as the name of an open-source relational database,
"Strozzi NoSQL", first used in 1998. The name NoSQL now refers to databases built
from the combination and collaboration of many plugins and libraries behind an
Application Programming Interface (API).
Schema-less design means, for the most part, that there is an immediate
similarity between this "schema-less" style and dynamically typed languages.
Constructs such as those above are easy to represent in PHP, Python and Ruby; what
NoSQL attempts to do is make this mapping to the database natural.
Highly available: information is replicated over various servers in a highly
available design that can handle multiple server failures and support disaster
recovery. Simple query process: NoSQL does not support the SQL query language,
but it provides different query methods that are simpler than SQL queries. NoSQL is
used for unstructured and semi-structured data. It also replicates asynchronously, a
procedure for copying information between databases from the primary storage;
asynchronous replication has the benefit of speed, at the expense of an increased risk
of data loss due to communication or replica system failure. Mughees (2014)
explained that the relational database makes use of the ACID properties, whereas
NoSQL databases use the BASE concept. BASE is an acronym for basically
available, soft state, and eventually consistent. BASE concepts are implemented in
NoSQL databases and are highly recommended there. Data in a relational database is
consistent, but NoSQL data is not guaranteed to be; BASE and ACID are two
different concepts, with BASE regarded as the alternative to ACID. The diagram
below shows the difference between ACID and BASE.
Figure 2:6 Shows the difference between ACID and BASE (McPhillips, 2012)
2.3.1 Document Store (DS)
In the NoSQL database, data is stored in semi-structured and unstructured
form, and the document store (DS) is used to hold that information. A relational
database uses tables to store information and SQL to query it; a NoSQL database
contains no tables and instead uses a document store to hold the data. The DS does
not enforce any schema; it is schema-independent, which makes the programmer's
life easier and simplifies integrity issues. If a field is expected to hold a null value, it
is simply not made part of the document. In a NoSQL database, every stored
document automatically or manually receives a unique identifier (UID), which serves
the same function as a primary key in an RDBMS. In distributed networks, the
document store is useful for web-based applications. Figure 2.7 (Kammerer & Nimis,
2014) below shows the difference between an RDBMS and a DS.
Figure 2:7 Shows the RDB data model and the Document data model (Kammerer &
Nimis, 2014)
Figure 2:8 JSON format of Data in the form of Document Structure (Lamllari,
2013)
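For illustration, a hypothetical document of this kind (the field names are
invented here, not taken from the thesis), with its unique identifier serving the role of
a primary key:

    {
      "_id": "5f2a9c1e",
      "name": "Haritharan",
      "faculty": "FIST",
      "courses": ["DBMS", "Semantic Web"]
    }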
2.3.2 Key-Value Store

These systems use a different database style from SQL: NoSQL can use a
key-value database to store the data in a system. Lamllari (2013) stated that Amazon
was the first organization to use a key-value database, implementing key-value pairs
in its Dynamo model.

The key-value database is the foundation for all NoSQL databases. In a
key-value database, the keys are most commonly generated automatically, although a
user can set them manually according to requirements. The values in the key-value
database are based on user input, and each value must be addressed by its key.
Key-value databases use hash tables to store the data; queries can therefore be
answered, but only from source to destination (key to value), not the other way
around.

The NoSQL database also accepts any type of data, such as arrays, integers,
and characters, depending on the architecture of the system. On a key-value store we
can perform four fundamental operations: instantiate, read, write and delete. Figure
2.9 shows an image of a key-value database.
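A minimal sketch of those four operations over a hash table, written here in
Python with a plain dict standing in for the store (the keys and values are invented for
illustration):

    # A Python dict is a hash table: lookups go from key to value only,
    # mirroring the source-to-destination query limitation described above.
    store = {}

    store["user:1001"] = {"name": "Ali", "age": 23}   # instantiate (create)
    profile = store["user:1001"]                       # read
    store["user:1001"]["age"] = 24                     # write (update)
    del store["user:1001"]                             # delete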
2.4 Columnar Database

Manoj (2014) described the columnar database as one of the most common
targets for queries in the NoSQL world. The columnar database is also known as the
column-oriented database. The first organization to implement a columnar database
was Google; other organizations were inspired by the method and began to implement
it. It has been used on very large scales of structured data for storage systems and was
popularized by Google Bigtable. Facebook messaging likewise uses a columnar
database, with the functionality provided by HBase; HBase can support processing of
thousands of messages. This approach has been widely adopted for NoSQL databases.
A columnar NoSQL database reads data column by column or row by row rather than
scanning everything. For example, in a relational database, a query over a large
database takes a long time because it runs over all rows and columns, but in a
columnar NoSQL database the query touches only the relevant rows and columns,
which makes the process faster than in a traditional database. A columnar database is
not stored as a single table; it is stored in distributed form. A columnar database can
be divided into three ideas: the column, the super column and the column family.
2.4.2 Column
In the columnar database, all values are stored in column format, and rows in
NoSQL are always paired as keys and values. The best-known systems using columns
are Google Bigtable, HBase and Cassandra, with Bigtable the original. Below is an
example in JSON notation.
Figure 2:10 Shows JSON document (Manoj, 2014)
2.4.3 Super Column

NoSQL also has the super column, an object contained by column families. In
a super column, the key-value pairs are mapped to one another within column
families, so that the user can view a number of tables in a way similar to a relational
database. NoSQL does not support a view function as such, but the effect is similar to
a view on a table. Collisions during the mapping process, such as similar fields like
name, age and first name, can occur, but the process continues. A super column reads
whole columns and rows and is generated in two subsystems in memory. A super
column has the same function as a composite column in NoSQL. Columns and super
columns differ in their key-values: normally column values are of int or string type,
while super column values are different, being produced by the mapping process,
which also sorts the values of the column into an array. Below is a simple example of
a super column.
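A sketch of the idea in JSON-like notation (the names are invented for
illustration, and the comments are not valid JSON): a super column groups related
columns under one key inside a column family.

    {
      "students": {                      // column family
        "student:1001": {                // row key
          "profile": {                   // super column
            "name": "Ali",
            "age": "23"
          },
          "contact": {                   // super column
            "email": "ali@example.com"
          }
        }
      }
    }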
2.4.4 Column Family
NoSQL data is stored in a column family database; a row can be associated
with many columns. For example, a user accessing Facebook privacy settings may at
the same time access a friend's Facebook profile. A relational database holds bulk
data in rows, but in NoSQL a row is stored with multiple columns, and another
difference is that different rows need not have the same columns.

The diagram above shows an example of a column family database. The key
illustrates the difference between the relational database and NoSQL: a key can be
arbitrarily large, and a column can contain further columns and rows, in effect a
"table within a table". A relational database would create another table and link it
with primary and foreign keys, whereas a column family contains it all within a
single column.
Figure 2:13 Shows the difference between an RDB and a columnar database
(davevalz, 2013)
A huge advantage here is that related information is brought closer together,
which makes data retrieval fast. However, trying to aggregate this kind of data
becomes difficult and requires a MapReduce job.

Much as with key-value stores, the two major advantages of column stores are
horizontal scaling and the absence of a schema requirement. That said, with column
stores, a good understanding of the tables and key structure, defined prior to
development, helps in implementing a useful data store.
2.4.5 Graph Database

A graph database also uses shortest-path algorithms to calculate short paths,
and it is scalable. There can be different types of relationships between nodes, and
there is no limit on the number of relationships. Traversals in a graph database start
and end at nodes, passing along relationships of different types.
2.4.6 Cap Theorem
Mughees (2014) explained that the CAP theorem was proposed by Eric
Brewer in the year 2000. CAP stands for consistency, availability and partition
tolerance, the three attributes of a distributed system, that is, a system made up of
multiple machines and multiple nodes communicating with one another over a
network. These are its three promises.
C – Consistency:
If we write to one node, then when we read from another node it will return
what we just wrote or, to be more precise, an up-to-date value.

A – Availability:
The promise that when we talk to a node it will respond, unless that node has
failed. Availability is not promised for a failed node, but a node that has not
failed will respond.

P – Partition Tolerance:
When the network is partitioned, whatever other promises the system has
made about itself, it will still keep those promises. A network partition occurs
when messages can no longer flow from one machine to another; this might
happen when two different data centres are set up and the wide-area
connection between the two is severed. Figure 2.15 shows the CAP theorem
classification diagram.
2.5 What is JSON-LD?

Presbrey (2014) described linked data as one of the fastest-rising
technologies in the IT world. The semantic web is one of the most common
technologies in use today, and it is similar to the web; many changes have been made
to it under the World Wide Web Consortium. The semantic web has gone through
several evolutions, described as semantic web 1.0 through semantic web 5.0.
Semantic web 1.0 is based on the "read only" web and viewing.
Semantic web 2.0 is based on personal web pages, portals, wikis and widgets
powered by the web; NoSQL was introduced during this time. It is also
known as the social web.

Semantic web 3.0 is called the intelligent semantic web because it connects
existing data with intelligent data for smarter uses. The linked data approach
was introduced during this evolution; it enables computers to do more of the
work of reading web pages and is one way to standardize data on the web.
Linked data is machine-readable across the world.
Linked data improves the web: it engages the individuals who publish on the
web and ensures their data can be reused. Data on the web is shared with common
vocabularies and standards published across the IT world. JavaScript Object Notation
(JSON) is the standard format used on the web for transmitting data. A human can
easily read and change JSON code, and it is easy to implement compared with other
approaches. JSON is based on JavaScript but is a language-independent format. JSON
was first introduced in the Web 2.0 era as an alternative to the eXtensible Markup
Language (XML). JSON is lightweight and easy to execute on the web; it is also
"no-schema", which makes it easy to implement on a web server and quick to
integrate with other sources. Facebook, Google services and Twitter now use JSON
to support the web. Compared with a relational database, JSON brings some
disadvantages: query processing in the NoSQL systems that have grown around it is
limited, and JSON has no standard query method. Moreover, JSON does not fully
satisfy the ACID properties. JSON represents six types of data, which can be divided
into two groups: primitive types, comprising string, number, Boolean and null, and
structured types, comprising object and array.
String: text must be enclosed in double quotation marks (").
Number: a numeric value such as an integer or a floating-point number; it
cannot carry ANSI type annotations.
Boolean: true or false.
Null: an empty value.
Array: an ordered, comma-separated list of valid values.
Object: a set of key-value pairs, the attributes of the object.
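A single hypothetical JSON document showing all six types at once:

    {
      "name": "Haritharan",
      "age": 23,
      "graduated": true,
      "middleName": null,
      "courses": ["DBMS", "Semantic Web"],
      "faculty": { "code": "FIST", "campus": "Cyberjaya" }
    }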
On the web, HTML is created mainly for humans, to keep services
user-readable, while the machine-readable side is served by JSON. JSON is a
back-end format that is both human- and machine-readable. RDFa and HTML can be
used for expressing linked data, but not in a standard, lightweight way, so JSON-LD
was introduced. JSON-LD is human-readable and machine-readable, lightweight
linked data. It builds on the already successful JSON format and provides an
approach that helps JSON data interoperate at web scale. JSON-LD is an ideal data
format for programming environments, REST web services, and unstructured
databases such as CouchDB and MongoDB. JSON-LD is similar to RDF, a successful
factor in the World Wide Web, and it is used and integrated in the semantic web.
Since JSON-LD is 100% compatible with traditional JSON, developers can keep
using their existing tools and libraries, and it works naturally with REST services.
An RDBMS does not support the XML, JSON or JSON-LD formats, but it is
possible to migrate an RDBMS to JSON-LD through MongoDB. RDBMS data can
also be transformed into Turtle and N-Triples form and then migrated to JSON; the
translation of an RDB to Turtle is close and similar to JSON-LD. The JSON-LD
syntax changed drastically in the most recent versions and now allows data to be
serialized in a way that is often indistinguishable from conventional JSON. This is
remarkable, since JSON is normally used to serialize tree-form, parent-child models,
while JSON-LD uses it to serialize a graph model.
The graph model in JSON-LD is represented as subjects, objects and
properties, similar to RDF. An edge from a subject node must end at another node.
The IRI is the internationalized extension of the URI, a standard published as
RFC 3987. The subject nodes must be labelled with an IRI; this is the first
requirement of JSON-LD. JSON-LD supports labelled and unlabelled nodes: in some
cases nodes that do not fully meet the requirement are still accepted, but not in all
cases. The edges must be labelled with IRIs, which are used to refer to other
documents. At present, most developers think that adopting JSON-LD requires
considerable effort to migrate from JSON; in fact, JSON and JSON-LD share a
common method, with JSON-LD adding the additional keywords @context and @id.
Lanthaler and Gütl (2012) described JSON-LD as similar to JSON, yet most
developers and users think JSON-LD is more complicated, and so they carry on with
their existing tools and methods. JSON-LD fundamentally supports the RDF model.
JSON-LD was created to solve the problem of multiple websites contributing the
same data: for example, if a name value on a first web page is similar to one on a
second web page, an ambiguity problem arises. To solve this kind of problem,
JSON-LD uses global identification through @id and @context. Migrating from
JSON to JSON-LD is done simply by adding these global identifiers, or by calling an
API to produce the output.
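A minimal sketch of that migration (the vocabulary URLs and values here are
illustrative): a plain JSON document becomes JSON-LD simply by adding @context,
which maps its keys to shared vocabulary terms, and @id, which names the entity
globally.

    {
      "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url"
      },
      "@id": "http://example.com/people/haritharan",
      "name": "Haritharan",
      "homepage": "http://example.com/"
    }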
2.6 Existing migration methods and tools for transforming SQL to JSON-LD
MongoDB:
Dzhakishev (2014) proposed MongoDB as one of the tools used nowadays
for migrating a relational database to NoSQL, with documents stored in JSON-LD
format. MongoDB is open-source software published by 10gen in 2007; it is
developed in C++ and the service is provided for Linux, Mac OS and Windows.
MongoDB is best known as a NoSQL database. Relational database tables can be
transformed into JSON document format by MongoDB. In MongoDB, documents
are stored in Binary JSON (BSON), and each BSON document is limited to 16 MB
in size. MongoDB data is stored in documents rather than in tables of rows and
columns. MongoDB automatically creates an id for each document; this id is similar
to the primary key in a relational database.

The id must be unique and must not match any other id. MongoDB uses
key-values to hold the data, and it can hold data in memory instead of requiring a
large storage device for the system, which saves the cost of buying one. MongoDB
also supports sharding. Sharding divides a large amount of data into smaller pieces;
by splitting the data, processing can be done quickly and scalably. Usually, data in
MongoDB is divided using shard keys: a particular range of the key is designated as
a shard, and any key falling within that range is assigned to that shard. MongoDB
also supports indexing and MapReduce. The MapReduce method was proposed by
Google in 2004; MapReduce reads a large amount of data very quickly by splitting
it into individual pieces distributed over networked computers. The main idea of
MapReduce is to divide the work and execute it on a cluster, which is done by two
functions, map() and reduce().
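A toy sketch of the two-function idea in Python (a word count, not MongoDB's
own API): map() emits key-value pairs, and reduce() folds all values for a key into
one result.

    from collections import defaultdict

    def map_phase(document):
        # Emit a (word, 1) pair for every word in the document.
        return [(word, 1) for word in document.split()]

    def reduce_phase(word, counts):
        # Fold all emitted counts for one key into a single value.
        return word, sum(counts)

    documents = ["sql to nosql", "nosql to json ld"]
    grouped = defaultdict(list)
    for doc in documents:                      # map + shuffle
        for word, count in map_phase(doc):
            grouped[word].append(count)

    totals = dict(reduce_phase(w, c) for w, c in grouped.items())
    print(totals)   # {'sql': 1, 'to': 2, 'nosql': 2, 'json': 1, 'ld': 1}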
One benefit of using MapReduce over MongoDB is that questions can be
asked of the document store with no less power than the programming language in
which the MapReduce job is implemented allows. MongoDB includes all the JSON
data types: Boolean, null, array, double and object. MongoDB also offers special
commands such as findAndModify, update-if-current and the modifier operations.
Cattell (2011) described the modifier commands, which are used to set
values, append, remove values, retrieve records and update records. The
findAndModify command performs an immediate update and stores the value in the
document; this process is very fast. The update-if-current command makes a change
only if the value still matches the previously read value. MongoDB also has a
replication mechanism: it provides database durability and concurrency over
documents, synchronizes information over numerous servers, and uses master-slave
replication. MongoDB reads and writes data through a master and one or more
slaves in the network; in this process the master can read, write and update the
slaves. Finally, the MongoDB server uses the mongo server to publish the JSON and
JSON-LD text over HTTP.
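A small pymongo sketch of the document operations described above (the
connection string, database and field names are hypothetical); find_one_and_update
is the modern driver's counterpart of findAndModify.

    from pymongo import MongoClient, ReturnDocument

    client = MongoClient("mongodb://localhost:27017")   # local server assumed
    people = client["thesis_db"]["people"]

    # Insert: MongoDB assigns a unique _id automatically.
    people.insert_one({"name": "Ali", "age": 23})

    # Atomic find-and-modify: update and get the new document in one step.
    updated = people.find_one_and_update(
        {"name": "Ali"},
        {"$set": {"age": 24}},               # $set is a modifier operation
        return_document=ReturnDocument.AFTER,
    )
    print(updated)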
Google App Engine:

Google App Engine is an integral part of the Google Cloud Platform. App
Engine provides everything needed to build for the cloud, including platform and
infrastructure as a service as well as big data, cloud storage and much more. A cloud
application should be able to scale at a moment's notice to handle huge demand.
Using App Engine, we can build applications to run on top of Google's world-class
infrastructure without worrying about provisioning and managing a data centre that
scales to meet demand. Zahariev (2009) stated that in App Engine we can build
scalable cloud apps with Python, PHP or Go using our favourite development tools,
in a local environment for testing and debugging before deployment to the cloud;
App Engine also supports Java with Eclipse, so we can likewise develop, test and
debug locally before deploying to the cloud. In App Engine, we have multiple
options for storing our data, including relational databases, scalable file storage and
a lightweight data store. App Engine makes developers more productive by reducing
the need to write boilerplate code, and managed services such as task queues and the
users API help to build world-class applications.
Couch DB:
Cattell (2011) describes Apache CouchDB as one of the best-known NoSQL
databases; it is also open source. CouchDB is aimed mostly at the web and stores
JSON or JSON-LD documents; it normally saves plain JSON documents, but these
can now be migrated to JSON-LD. CouchDB is written in the Erlang programming
language, and users access the documents via HTTP from a web browser. In
CouchDB, documents are queried using JavaScript. Many mobile and modern web
pages use CouchDB. CouchDB also provides a web administration console for the
admin, and it supports add, delete, modify and query functions on database
documents stored in JSON format.

CouchDB was released in 2005 and became well known in 2008 when it
became an Apache product. CouchDB is totally different from a relational database
because the data is not stored in related tables; each item of the data set is stored in
an individual document.
Anderson, Lehnardt, and Slater (2010) stated that, in terms of the CAP
theorem, CouchDB supports availability and partition tolerance. In a critical session
all clients can view the database; while users perform reads and writes, availability
takes priority over consistency, a case handled differently in an RDB but performed
successfully in CouchDB. Documents in CouchDB can be written concurrently
during edit, delete and add operations. On the client side, users do not have to wait
for reads or writes during concurrent processing. Each record carries special
identifiers, a DocID and a Sequence ID; every record gets a Sequence ID within the
database, and any change or modification is updated simultaneously with the
document and stored.
The document's data is split into separate rows and columns in the database
system. When documents are committed to disk, the document fields and metadata
are packed into buffers, sequentially one document after another. CouchDB also
supports a compaction method: when, according to the database settings, wasted
space exceeds a specific amount, the compaction process takes over, clones all the
active data into a new file, and then discards the old file. The database stays online
the whole time, and all updates and reads are allowed to complete successfully; the
old database file is deleted only when every piece of data has been copied and all
clients have transitioned to the new file. Finally, CouchDB is intended for lock-free
concurrency, both in its conceptual model and in the actual Erlang implementation.
This reduces bottleneck problems and keeps the whole system from blocking under
overload. It also supports concurrency and availability: when a data centre fails, an
ordinary system must be scanned, repaired and restarted, but when CouchDB's
distributed system detects a failure it restarts itself while the system remains
available.
Figure 2:17 Shows a simple CouchDB document (Södergren & Englund, 2011)
Simple DB:
According to Dimovski (2013), SimpleDB is another popular NoSQL data
store written in the Erlang programming language. It is used for web services and
was developed by Amazon; it shipped in 2007 and is partly paid software, billed by
client storage. The name SimpleDB reflects its simple operations: Select, Delete,
GetAttributes, and PutAttributes. SimpleDB's document store is different from the
others and simpler. In terms of the CAP theorem, Cattell (2011) describes SimpleDB
as supporting consistency, but not transactional consistency. Like most of the
alternative systems, it replicates asynchronously. Unlike the other tools, it does not
use a key-value document store but attribute-value pairs, concepts that make the
client's terminology practically equivalent to that of a traditional spreadsheet table. It
also supports multiple attributes per stored document. Select operations work within
one domain and specify a conjunction of constraints on attributes. SimpleDB selects
attributes and updates them automatically, and in some cases it also performs queries
automatically; however, it does not support automatic transfer of data across
domains. SimpleDB also has a disadvantage in the predictability of its performance.
Finally, SimpleDB provides a limit of 10 GB of space per domain, so it cannot be
used for a daily multi-terabyte internet crawl.
HBASE:
HBase is one of the popular "Not only SQL" databases, and it stores data in
column-oriented form. The system itself runs on top of the Hadoop Distributed File
System (HDFS). HBase can also manage big data, parsing the data in a cloud
environment. HBase applications are written in Java, much like typical MapReduce
applications, and can also be written against Apache Avro and RESTful APIs. An
HBase table is made up of sets of key-value pairs in table form. The tables are stored
in HDFS, though HBase does not typically use MapReduce itself. Each table
contains rows and columns; a table must be defined with a column acting as the
primary key, which other documents can use for access. HBase columns represent
attributes of an object, and they are grouped together into column families.
RDB2LOD:

RDB2LOD is used to migrate a SQL database into RDF and store the data in
JSON format, following an ontology-based concept. The RDB2LOD application
was developed to install and run on the Java virtual machine environment, and it
supports the MySQL database management system (via the native D2RQ platform
driver) and SQL Server (via a driver enclosed within the framework). Moreover,
relational database mapping can also be done through this mapping process; a
SPARQL query can be made through the D2R Server over the JSON data, and the
mapping files are saved as JSON, N-Triples and ontology. The tool makes it possible
to customize the mapping file semi-automatically through the OWLtoD2RQ-Mapping
Tool, which replaces the standard vocabulary with the terms of the domain RDF. The
authors then compared the views of the linked data generated, both from the standard
RDB mapping and from the customized mapping (with the incorporation of a domain
ontology). Applying the RDB2LOD approach gave better expressiveness to the
generated RDF triples, presenting well-defined meanings (approximating natural
language) for the subjects, predicates and objects; these were obtained by
customizing the RDB mapping with the incorporation of a domain ontology.
The RDB2LOD approach also provides user interactivity by means of
graphical interfaces, eliminating the need for manual configuration and operation of
the tools applied to the linked data publishing process. Thus, the customization of the
mapping of associations between RDB tables and columns and the classes and
properties of the ontology, which until then was done manually, can be done in an
automated manner through this approach, which reduces the time spent and the
technical knowledge needed of the mapping-file language generated by the applied
tool. Customizing the mapping of databases with a large number of tables and
columns, impractical by the conventional (manual) method, becomes possible
through this approach. Finally, after the mapping process, the dataset is saved in
JSON format, and the JSON file is used as linked data in the ontology.
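As a sketch of the kind of SPARQL query that could be issued against the D2R
Server once such a mapping is in place (the vocabulary prefix and class names here
are hypothetical):

    PREFIX vocab: <http://example.com/vocab/>

    SELECT ?student ?name
    WHERE {
      ?student a vocab:Student ;       # subject typed by a mapped class
               vocab:name ?name .      # predicate mapped from a column
    }
    LIMIT 10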
2.7 Comparison of transformation methods proposed by authors
2.8 Chapter Summary
for the query process. SQL supports only structured data, whereas NoSQL supports
both structured and unstructured data. That organizations are changing from SQL to
NoSQL is not a surprising statement, because the choice depends on the goals of the
organization, and some organizations have already adopted such systems and started
using them. On this basis, it is useful to migrate SQL to NoSQL and JSON-LD: it
helps in understanding how data is stored in an RDB and how NoSQL maintains it
differently.
JSON-LD therefore takes the main role in the NoSQL database for storing
documents. JSON is used to transmit data back and forth between websites and
browsers. Linked Data (LD) is a newer term for a way of publishing data on the web
such that it is interconnected between different websites; the Web 2.0 era and
semantic web technology are used to link data in JSON-LD. JSON-LD is very good
at being both human-readable and easily processable by machines, and it addresses
the problem that arises when we start to gather data from multiple websites.
CHAPTER 3
METHODOLOGY
3.1 Hardware Requirements

Windows 7 has good visual styles and animation, and it supports installation
of the software that will be used for the real implementation, as well as XML
editors. The laptop used is an ASUS A43SD model. Its Intel(R) Core(TM) i5-2450M
CPU @ 2.50GHz processor provides average performance, as shown below; it has a
turbo speed of 3.1 GHz and a base clock speed of 2.5 GHz.
Figure 3:1 Performance Comparison Line Graph
Based on the performance comparison shown above, this research would run
faster and produce better results if the current processor were replaced with an Intel
Core i7 4790K @ 4.00GHz. In addition, the hardware includes an NVIDIA GeForce
GT 630M, a mid-range graphics card; this card ensures low power consumption and
offers PureVideo HD technology for video encoding.
Figure 3:2 Hardware Review
3.2 Software Requirements
MYSQL:
FME workbench:
It helps you comprehend your database, create new databases, or
reverse-engineer existing databases in order to modify, document, analyse, and
optimize them.
libcurl:
libcurl is an open-source multiprotocol file transfer library that is easy to use
on the client side. In this research it is used for importing and exporting files
over HTTP, FTP, and TFTP. It also supports IPv6 and very fast file transfer.
The library is written in the C language.
Python Sphinx:
Sphinx is a documentation tool written in the Python language that makes it
easy for users to create documents; it converts reStructuredText files into
HTML websites. Using this tool, users will find it easy to create intelligent
documentation. Moreover, it is user-friendly.
for SQL to JSON-LD migration. MongoDB uses only its own storage. Clients can
build further applications on top of the MongoDB tools. MongoDB also addresses
the RDB problem that the database does not scale for larger data volumes: it
provides the building blocks to solve the large-database problem.
Sharding is performed automatically across more than one shard, and each shard carries a portion of the total data. Reads and writes are distributed automatically as well. Each shard is backed by a replica set, which holds the data belonging to that shard; keeping replicas of a shard provides high availability and redundancy. A replica set in MongoDB can span more than one server, all containing the same data; in this arrangement only one server is the primary and the others become secondaries. When the primary server goes down or a disaster occurs, MongoDB automatically promotes another server to primary and the rest remain secondaries. Sharding therefore helps scale the system, while replication supports it behind the scenes, providing high availability, data security, and disaster recovery.
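As an illustrative sketch only (the cluster setup is not part of the thesis toolchain), the Python snippet below shows how sharding could be switched on through the pymongo driver, assuming a running sharded cluster reachable through a mongos router on localhost:27017; the database name "fulldb" and collection name "creature" are borrowed from the appendix, but any names would do.

from pymongo import MongoClient

# Assumes a sharded cluster behind a mongos router on localhost:27017.
client = MongoClient("localhost", 27017)

# Enable sharding for the database as a whole.
client.admin.command("enableSharding", "fulldb")

# Shard one collection on a hashed _id so MongoDB balances the
# documents automatically across the shards.
client.admin.command("shardCollection", "fulldb.creature",
                     key={"_id": "hashed"})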
Mughees (2014) stated that, among the different tools, MongoDB gives good results when performing queries and filtering the data. MongoDB uses JSON because static filtering is faster with it; these advantages are the reason MongoDB was chosen for migrating SQL to JSON-LD.
3.4 Justification for Selected Tools for SQL to JSON-LD
Documents in MongoDB can have different schemas. For example, if a user wants to add a tag to a web page, the user can simply write the tag and save it as a document; on the web the older tag is discarded and the new tag is shown, but the older tag is still available in the MongoDB database for a specific period. The keyword "tag" is used in information systems to label a computer file, a digital image and more. This concept is not available in an RDB. Additionally, a document in NoSQL can contain fields that differ from those of other documents in the same database.
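A minimal sketch of this schema-less behaviour, written with the Python driver pymongo purely for illustration (the thesis performs the actual migration with Mongify, and the database, collection and field names here are invented): two documents with different sets of fields are accepted into the same collection, which an RDB table could not do without altering its schema.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
pages = client["demo"]["pages"]  # hypothetical database and collection

# Two documents in one collection with different fields: the first has
# four fields, the second only three, and both are stored as-is.
pages.insert_one({"title": "Home", "tag": "news", "views": 10, "author": "a"})
pages.insert_one({"title": "About", "tag": "info", "views": 3})

for doc in pages.find({}, {"_id": 0}):
    print(doc)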
Subsequently, the diagram on the left side shows four fields while the document on the right side shows only three; the fields differ, yet this is supported by NoSQL and all the fields are saved into a single document. This schema-less approach cannot be reproduced with RDB tables and rows, because creating such a database would not be efficient. Another key difference is that NoSQL does not support inner joins. An RDB uses primary keys, composite keys and foreign keys for the relationships between tables, but in a NoSQL database there is no such relationship.
Figure 3:7 Shows an example of SQL join relations and the NoSQL relation.
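To make the contrast concrete, the hedged sketch below shows the same one-to-many relation handled the NoSQL way: where SQL would keep two tables joined through a foreign key, MongoDB can embed the child rows directly in the parent document (all names are illustrative).

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
orders = client["demo"]["orders"]  # hypothetical names

# In SQL this would be two tables (orders, order_items) linked by a
# foreign key and read back with an INNER JOIN.  Here the child rows
# are embedded, so no join is needed at query time.
orders.insert_one({
    "_id": 1,
    "customer": "Alice",
    "items": [
        {"name": "keyboard", "qty": 1},
        {"name": "mouse", "qty": 2},
    ],
})

doc = orders.find_one({"_id": 1})
for item in doc["items"]:
    print(item["name"], item["qty"])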
The diagram below shows how MongoDB uses a linking approach between database documents. MongoDB automatically generates an id field, which is similar to an RDB key; this identifier is called "_id". The "_id" is used to guard against stale data: if someone makes changes to a document, the old copy becomes stale, and to overcome this problem the user must resave the document.
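As a small pymongo sketch of that resave step (the field names are invented), replacing the document stores a fresh copy under the same "_id", so readers no longer see the stale version:

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
pages = client["demo"]["pages"]  # hypothetical names

doc = pages.find_one({"title": "Home"})
doc["tag"] = "breaking-news"  # edit the in-memory copy

# Resave: replace_one overwrites the stored document under the same
# _id, discarding the stale copy.
pages.replace_one({"_id": doc["_id"]}, doc)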
Figure 3:8 Shows the summary of the SQL to NoSQL migration process.
Finally, the RDB2LOD tool allows creating an ontology from a database and mapping a database to an existing ontology. The tool analyses the data components (tables, columns, primary/foreign keys) and transforms them into ontology components such as individuals (instances), classes (concepts), attributes and relations. Where an existing ontology is available, a manual mapping is used; in some cases a semi-automatic approach can be considered for one-to-one mappings. More refined mappings, such as one-to-many or many-to-many, are based on conditional mappings; in such cases an automatic or semi-automatic approach cannot solve the problem, so the tool offers a graphical user interface that allows a user to apply the mapping manually. Two major components can be identified in the mapping approach. The first is D2R Map, which associates ontology components such as classes, object properties and datatype properties with SQL statements. The other is an additional refined mapping specification language proposed on the basis of D2RQ and D2R Server. Elements of the RDB data are mapped to owl:Class, and from the database the RDB2LOD tool automatically generates relations such as "has", "Part-of" and more, based on the SQL tables stored in the database and on standard ontology terms used to represent relationships between classes. After the mapping process is complete, the user can start querying using JSON-LD and SPARQL.
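The table-to-class idea can be pictured with the short, purely illustrative Python sketch below. RDB2LOD itself is a Java tool built on Jena; here rdflib stands in for it, and the namespace, table and column names are hypothetical. A table becomes an owl:Class, a foreign key becomes a "has" object property, and a row becomes an individual.

from rdflib import RDF, RDFS, Graph, Literal, Namespace
from rdflib.namespace import OWL

EX = Namespace("http://example.org/ontology#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Tables -> owl:Class
g.add((EX.Creature, RDF.type, OWL.Class))
g.add((EX.Equipment, RDF.type, OWL.Class))

# Foreign key creature.equipment_id -> a "has" object property
g.add((EX.hasEquipment, RDF.type, OWL.ObjectProperty))
g.add((EX.hasEquipment, RDFS.domain, EX.Creature))
g.add((EX.hasEquipment, RDFS.range, EX.Equipment))

# One row -> one individual; a column value becomes a datatype property
row = EX["creature/1"]
g.add((row, RDF.type, EX.Creature))
g.add((row, EX.curhealth, Literal(100)))
g.add((row, EX.hasEquipment, EX["equipment/7"]))

print(g.serialize(format="turtle"))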
3.5 Converting SQL to RDF using RDB2LOD and storing the result in a JSON-LD
document
After the mapping document has been created, it can be used for queries. Mapping a single database to a local ontology is not required, because the RDB2LOD tool already creates its own local ontology; should an expert user want to map manually between the database and a local ontology, that is also possible in this tool. Both creating an ontology from the database and mapping the database to an existing ontology can produce the mapping document, depending on the user's requirements.
The mapping is made between the database and an existing ontology, and the result is the mapped JSON-LD document. The RDB2LOD tool provides a graphical user interface for associating ontology components with SQL statements. The mapping process is therefore done by selecting the database through SQL commands and associating it with ontology classes and properties. The JDBC (Java Database Connectivity) API is used in the RDB2LOD tool to connect to the database and associate it with SQL statements. The API extracts the metadata from the database, including tables and columns; the database is connected through MySQL statements and its contents are extracted using the JDBC API. The extracted data is encapsulated into an internal database model. The existing ontology is then loaded and expressed with the Jena Semantic Web Framework, and the database and ontology are loaded into the GUI, where ontology components are associated with SQL statements. The database model is used as the input to the ontology generation algorithms. The execution of these algorithms generates an internal ontology model, which is converted into a Jena ontology model to obtain the OWL ontology. At last, the mapping model is translated using the JDOM API into XML format and transformed into the mapping document. The metadata itself is managed by the RDBMS.
The RDB2LOD tool uses the API of Jena (a Semantic Web framework for Java) for reading, manipulating and writing the OWL ontologies. Jena is Java-based and provides the RDF, OWL and SPARQL environment for the tool. An SQL parser parses and manipulates the SQL statements. The Java Document Object Model for XML (JDOM) is another tool, which provides the XML documents, so the user can easily read and manipulate the database-to-ontology mapping process. The RDB2LOD tool supports MySQL and Oracle databases. It benefits from specific views provided by these RDBMSs for describing the database metadata, and it can easily be extended to support other RDBMSs that provide such views.
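The metadata-extraction step can be pictured with the self-contained Python sketch below. It uses the standard-library sqlite3 module instead of JDBC and MySQL, an assumption made purely so the example runs on its own, but the idea of reading table and column metadata from the RDBMS's own views is the same.

import sqlite3

# Stand-in for the JDBC step: build a tiny database in memory, then
# read its table and column metadata back from the system views.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creature (id INTEGER PRIMARY KEY, curhealth INTEGER)")
conn.execute("CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT)")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    # PRAGMA table_info returns (cid, name, type, notnull, default, pk)
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    print(table, [(col[1], col[2]) for col in columns])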
3.6 Schematic diagram
Figure 3:12 Data Flow Diagram for SQL to RDF then JSON-LD
CHAPTER 4
Figure 4:2 Shows the interface of MongoDB and the list of the same databases as in MySQL
Figure 4:3 Mongify as the intermediary for migrating SQL to MongoDB
Figure 4:5 Testing the Mongify connection
Figure 4:6 Shows the source code for translating SQL to MongoDB
Figure 4:7 Shows the translation process from SQL to MongoDB
Figure 4:8 Shows the data after translation into MongoDB, stored in the
document database
Figure 4:9 Performing a query on a specific document database
Figure 4:11 Robomongo represents the document database in a GUI
Figure 4:12 Standard RDB mapping approach
The user needs to key in the database information, such as server name, port number, database name, username and password, to gain authorized access to the database. The user then keys in the name of the file to be mapped, which is stored as a .ttl file. After keying in all the information, the user clicks the Generate button to create the mapping file. If the entered information is correct, a message box pops up as shown below.
4.2.2 Custom RDB Mapping
After the user clicks Custom Mapping, the database and ontology panels are shown empty; the user then needs to click "Config Database" to configure the database.
Figure 4:15 Shows the interface for the manual mapping process between the
relational database and the ontology
The database configuration is shown below. The user needs to select the database engine, such as MySQL Server or MS SQL Server, and then key in the database information, such as server name, port number, database name, username and password, for authorized access to the database.
After the user configures the database, the tables and columns are automatically arranged into their own form. The user then opens the "Open OWL File" dialog to perform the manual mapping. In the "Database Tables" check box the tables obtained from the RDB structure are displayed, and the classes of the loaded ontology that they can be associated with appear in the "Ontology Classes" selection box. Each association between an RDB table and an ontology class is recorded by clicking the ">>" button, which immediately displays the association in the "Table to Classes" check box. To remove an association, just select the upper "<<" button. This association makes the subject of each generated RDF triple the key field of a table, typed as the class of the ontology provided by the user.
Similarly, the "Database Columns" check box displays the columns of the previously selected table, and the properties of the selected class that they can be associated with are displayed in the "Datatype Properties" check box. Each association between an RDB column and an ontology property is made by selecting the lower ">>" button and is immediately displayed in the "Column to Properties" check box. To remove an association, just select the lower "<<" button. This association makes the attribute of each generated RDF triple a column of a table, typed as the datatype property of the class of the ontology provided by the user. During the process the user can also select the RDF syntax format and a custom or default URI. Once the entered information is correct, a message box pops up showing "Mapping File Successfully Generated".
Figure 4:17 Shows the manual mapping process between the relational database
and the ontology
After completing the "Standard RDB Mapping" or "Custom RDB Mapping", the user clicks the Output button to perform the RDB to RDF mapping process. The RDB2LOD application provides the "Dump RDF Dataset" options window, through which all the contents of the mapped RDB are dumped; in this process the RDB database is mapped into an RDF dataset and the result is stored in RDF dataset form. During the process the user can also select the RDF syntax format and a custom or default URI. Finally, once the user has entered the correct information, a message box pops up as shown below.
Figure 4:18 Shows the dump to RDF datasets
Secondly, the diagram presents the "Query RDF to File" options window, whose goal is to run a SPARQL query against the mapped RDB database and save the result as RDF triples in a file. The user needs to select the mapping file, create the query file name and choose a file format such as (.txt, .xml, .json, .csv, .srb, .ttl). The user can then choose a default or custom URI, as required. The application also has an additional feature allowing the user to set a "Timeout". Finally, the "SPARQL Query" text box in the window is intended for formulating the SPARQL query to be performed. At the end, once the user has entered the correct information, a dialog displays the message "Query File successfully generated."
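A rough Python stand-in for this query-to-file step is sketched below, using rdflib rather than the Jena stack the tool actually runs on; the file names are placeholders for whatever the mapping step produced.

from rdflib import Graph

g = Graph()
g.parse("mapped_database.ttl", format="turtle")  # placeholder file name

query = """
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
"""

# Run the SPARQL query and save the result table to a file, here in
# CSV format (rdflib can also serialize results as json, xml or txt).
results = g.query(query)
results.serialize(destination="query_result.csv", format="csv")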
Finally, the user clicks "Start D2R Server", which triggers the D2R Server tool to boot a web server that allows viewing and exploring the contents of the RDB mapped to RDF, or making queries through a SPARQL console. The user then clicks the "Select Mapping File ..." button to select the required mapping file, and clicks the Start button to start the server and view the content.
Furthermore, after the mapping process is complete, the user opens a browser and types http://localhost:2020/, which brings up the D2R Server showing the RDF and OWL file views.
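Besides the browser view, a running D2R Server can also be queried programmatically over the standard SPARQL protocol. The sketch below assumes the endpoint is exposed at http://localhost:2020/sparql, which may differ between D2R Server versions; check the server's start page if so.

import json
import urllib.parse
import urllib.request

# Assumed endpoint path for the local D2R Server.
endpoint = "http://localhost:2020/sparql"
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

url = endpoint + "?" + urllib.parse.urlencode({"query": query})
req = urllib.request.Request(
    url, headers={"Accept": "application/sparql-results+json"})

with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# Each binding row carries the subject, predicate and object values.
for row in data["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])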
In the SPARQL result, the subject, predicate and object are shown in RDF form, while the classes and properties are shown in the OWL view through the browser. The user can also type SPARQL commands manually to view the results. The content (entities and attributes) is translated, or mapped, into RDF form.
Figure 4:22 Shows the relations between classes, properties, subject, predicate and
object
4.2.3 Differences between Standard RDB Mapping and Custom RDB Mapping
Figure 4:23 Shows the difference in results between standard mapping and custom
mapping
Figure 4:24 Shows the difference between standard and custom mapping in terms
of SPARQL query results
From the comparisons presented, both in the display and in the SPARQL consultation, we can say that the RDB2LOD approach provides better expressiveness when generating RDF triples, presenting well-defined meanings (close to natural language) for their subjects, predicates and objects, which were obtained by customizing the RDB mapping with the incorporation of a domain ontology.
4.2.4 Importing the Standard RDB Mapping and Custom RDB Mapping files into
Protégé 4.3
First, the user opens the .rdf, .owl or .ttl file to view the RDB in RDF form. The user can then explore it by clicking "Active Ontology", "Entities", "Classes", "Object Properties", "Data Properties", "Annotation Properties", "Individuals", "OWLViz", "DL Query", "OntoGraf", "SPARQL Query" and "Ontology Differences".
Figure 4:25 Shows the SQL data in OntoGraf as a result
Figure 4:27 Shows the relationships between the tables in OWLViz form
Figure 4:29 Shows the entities of the SQL data
Finally, this shows the proof that an RDB database can be mapped into RDF triple form and viewed through Protégé 4.3. The user does not need to key in the data manually in Protégé 4.3; through these tools the user can import the RDB database into the RDB2LOD application, map the database into RDF triple form and save the result. The saved RDF triple file is then imported into Protégé 4.3 to view the result in RDF and OWL form.
CHAPTER 5
To restate it simply, the main objective of this research is to implement the existing mapping methods and tools discussed in the previous chapters. The diagrams below compare the JSON visualization with the OntoGraf representation; they represent the final result of the research objectives, obtained with the tools and methods that were used.
Figure 5:1 Shows how the data is stored in the different migration processes
Figure 5:2 OWL data represented as subject, predicate and object in the JSON
visualization
Figure 5:3 RDF data represented in OntoGraf form
Figure 5:4 RDF data represented as subject, predicate and object in OntoGraf
5.1 Discussion
As this research shows, there are many methods and tools for migrating SQL to JSON-LD, but this research primarily chose MongoDB as the tool for producing the result. In addition, the Relational Database to Linked Open Data (RDB2LOD) tool, which migrates SQL to the Resource Description Framework (RDF), was used to store the data in JSON and JSON-LD. The reason for also choosing this tool for migrating SQL to JSON-LD was to produce a different result and to determine the advantages and disadvantages of each approach. MongoDB is easy to use and transforms SQL data directly into JSON, whereas RDB2LOD takes a longer route, migrating the SQL data first to RDF and then to JSON. MongoDB is even faster when the database is large and each record is represented as a single document. The migrated SQL data is stored in document form, which is one reason why MongoDB is fast in the query process. MongoDB also supports the sharding technique, also known as horizontal scaling, during which the data is balanced automatically or manually over the cluster. It handles concurrent processes during sharding and supports high performance. MongoDB also gives each document a unique ID to prevent interference between identical values.
The migration process from SQL to JSON and JSON-LD belongs to a new era, so JSON visualization was used to represent the result in tree and graph form. The diagrams show how the data is stored under the different approaches, and there are two different types of result. The MongoDB result shows a unique ID for each single document rather than relationships between the contents; this result suits a new user who wants to migrate SQL to JSON-LD in the new Semantic Web era, and the MongoDB migration process will fully satisfy such a user's requirements. The RDB2LOD result, on the other hand, shows subject, predicate and object represented in a parent-child approach; if the user needs RDF in an ontology representation, then RDB2LOD is the better choice for the JSON-LD migration process.
CHAPTER 6
CONCLUSION
Moreover, this research also provides the steps of the mapping method from SQL to NoSQL. The migration gives an advantage because it makes the query process easy. Based on this research, it can be concluded that the NoSQL database performs better than the SQL database. MongoDB is suitable for distributed networks and for the replication process, and it uses a sharding mechanism that supports multiple users accessing the system. MongoDB is easier than SQL in programming terms. The reason MongoDB performs better and faster is that the data is kept as JSON-LD documents. JSON-LD is highly scalable and independent, and JSON is widely used for transferring data in web services.
In conclusion, the NoSQL database system has more advantages compared to SQL. In addition, according to this research there is another tool, called Relational Database to Linked Open Data (RDB2LOD), that can migrate Structured Query Language (SQL) data to the Resource Description Framework (RDF) and store the data in JSON or JSON-LD, based on user requirements. In this research the results of MongoDB and RDB2LOD were compared through JSON visualization and OntoGraf. The research produced two types of result, and both are acceptable. The result of this research is helpful for organizations that want to migrate RDF to JSON-LD or SQL to JSON-LD. In both approaches, the MongoDB and RDB2LOD tools transform the actual content of SQL into the JSON-LD data that the user requires.
REFERENCES
Anderson, J. C., Lehnardt, J., & Slater, N. (2010). CouchDB: The Definitive Guide. O'Reilly Media, Inc.
Aswamenakul, C., Buranarach, M., & Saikaew, K. R. A review and design of framework for storing and querying RDF data using NoSQL database. Paper presented at the 4th Joint International Semantic Technology Conference.
Bonham-Carter, G. F. (2014). Geographic Information Systems for Geoscientists: Modelling with GIS (Vol. 13). Elsevier.
Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), 12-27.
davevalz. (2013). Rules of Engagement – NoSQL Column Data Stores.
Dharaneeswaran, M. 1NF-to-5NF-Normalization-with-Eg. Retrieved from https://www.scribd.com/doc/49645421/1NF-to-5NF-Normalization-with-Eg
Dimovski, D. (2013). Database management as a cloud-based service for small and medium organizations. Master's thesis, Masaryk University, Brno.
Dzhakishev, D. (2014). NoSQL databases in the enterprise: An experience with Tomra's receipt validation system.
Manoj, V. (2014). Comparative study of NoSQL document, column store databases and evaluation of Cassandra. International Journal of Database Management Systems, 6(4), 11.
McPhillips, J. (2012). Transitioning from relational to NoSQL: A case study. Regis University.
Mughees, M. (2014). Data migration from standard SQL to NoSQL.
Presbrey, J. J. W. (2014). Linked data platform for web applications. Massachusetts Institute of Technology.
Ramakrishnan, R., & Gehrke, J. (2000). Database Management Systems. McGraw-Hill.
Södergren, P., & Englund, B. (2011). Investigating NoSQL from a SQL perspective.
Yu, S. (2009). ACID properties in distributed databases. Advanced eBusiness Transactions for B2B-Collaborations.
APPENDICES
Appendix-B: CD Cover
Appendix-C: Source Code
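# Mongify translation file (excerpt): each "table" block maps one
# MySQL table to a MongoDB collection, declaring the key and the type
# of every column carried over during the migration.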
table "areatrigger_involvedrelation" do
column "id", :key, :as => :integer
column "quest", :integer
end
table "areatrigger_tavern" do
column "id", :key, :as => :integer
column "name", :text
end
table "areatrigger_teleport" do
column "id", :key, :as => :integer
column "name", :text
column "required_level", :integer
column "required_item", :integer
column "required_item2", :integer
column "required_quest_done", :integer
column "target_map", :integer
column "target_position_x", :float
column "target_position_y", :float
column "target_position_z", :float
column "target_orientation", :float
end
table "battleground_events" do
column "map", :integer
column "event1", :integer
column "event2", :integer
column "description", :string
end
table "battleground_template" do
column "id", :key, :as => :integer
column "MinPlayersPerTeam", :integer
column "MaxPlayersPerTeam", :integer
column "MinLvl", :integer
column "MaxLvl", :integer
column "AllianceStartLoc", :integer
column "AllianceStartO", :float
column "HordeStartLoc", :integer
column "HordeStartO", :float
end
table "battlemaster_entry" do
column "entry", :integer
column "bg_template", :integer
end
table "command" do
column "name", :string
column "security", :integer
column "help", :text
end
table "conditions" do
column "condition_entry", :integer
column "type", :integer
column "value1", :integer
column "value2", :integer
end
table "creature" do
column "guid", :integer
column "id", :key, :as => :integer
column "map", :integer
column "modelid", :integer
column "equipment_id", :integer, :references => "equipment"
column "position_x", :float
column "position_y", :float
column "position_z", :float
column "orientation", :float
column "spawntimesecs", :integer
column "spawndist", :float
column "currentwaypoint", :integer
column "curhealth", :integer
column "curmana", :integer
column "DeathState", :integer
column "MovementType", :integer
end
table "creature_addon" do
column "guid", :integer
column "mount", :integer
column "bytes1", :integer
column "b2_0_sheath", :integer
column "b2_1_flags", :integer
column "emote", :integer
column "moveflags", :integer
column "auras", :text
end
table "creature_ai_scripts" do
column "id", :key, :as => :integer
column "creature_id", :integer, :references => "creatures"
column "event_type", :integer
column "event_inverse_phase_mask", :integer
column "event_chance", :integer
column "event_flags", :integer
column "event_param1", :integer
column "event_param2", :integer
column "event_param3", :integer
column "event_param4", :integer
column "action1_type", :integer
column "action1_param1", :integer
column "action1_param2", :integer
column "action1_param3", :integer
column "action2_type", :integer
column "action2_param1", :integer
column "action2_param2", :integer
column "action2_param3", :integer
column "action3_type", :integer
column "action3_param1", :integer
column "action3_param2", :integer
column "action3_param3", :integer
column "comment", :string
end
table "creature_ai_summons" do
column "id", :key, :as => :integer
column "position_x", :float
column "position_y", :float
column "position_z", :float
column "orientation", :float
column "spawntimesecs", :integer
column "comment", :string
end
table "creature_ai_texts" do
column "entry", :integer
column "content_default", :text
column "content_loc1", :text
column "content_loc2", :text
column "content_loc3", :text
column "content_loc4", :text
column "content_loc5", :text
column "content_loc6", :text
column "content_loc7", :text
column "content_loc8", :text
column "sound", :integer
column "type", :integer
column "language", :integer
column "emote", :integer
column "comment", :text
end
table "creature_battleground" do
column "guid", :integer
column "event1", :integer
column "event2", :integer
end
table "creature_equip_template" do
column "entry", :integer
column "equipentry1", :integer
column "equipentry2", :integer
column "equipentry3", :integer
end
# Mongify workflow: check the configuration, preview the translation
# generated from the SQL schema, save it to a file, then run the
# migration with that translation.
mongify check database.config
mongify translation database.config
mongify translation database.config > fulldb.rb
mongify process database.config fulldb.rb
sql_connection do
  adapter  "mysql2"
  host     "localhost"
  username "root"
  password "tharan"
  database "fulldb"
  # batch_size defaults to 10000; set it lower on low-RAM machines.
end

mongodb_connection do
  host     "localhost"
  database "fulldb"
end
Faculty of Information Science and Technology (FIST)
Final Year Project Meeting Log
2. WORK TO BE DONE
3. PROBLEMS ENCOUNTERED
4. COMMENTS
*: week 1, 3, 5, 7, 9, 11 or 2, 4, 6, 8, 10 of the first trimester (week 11: report submission, weeks 13 &
14: presentation)
**: week 1, 3, 5, 7, 9, 11 or 2, 4, 6, 8, 10 of the second trimester (week 11: report submission,
weeks 13 & 14: presentation)
Appendix B: Checklist for FYP Interim Submission
________________________
Student’s Signature & Date