
Semantic Annotation and Revision Control

Proceedings of the Fourth International Conference on Web Information Systems and Technologies, 2008
SEMANTIC ANNOTATION AND REVISION CONTROL

Kiavash Bahreini and Atilla Elci
Department of Computer Engineering, and Internet Technologies Research Centre
Eastern Mediterranean University, T.R.N.C., Famagusta, via Mersin 10, Turkey

Keywords: Annotation, Metadata, OWL, Revision Control.

Abstract: Software engineers and programmers often need to manage multiple versions of their software. This entails, among other things, managing source code, inserting metadata tags for annotation, tracing source changes between the current and previous versions, checking the corresponding change logs, and retrieving different versions of the source code. These issues are more pronounced for software teams, especially those working in distributed development environments. Similar issues arise when dealing with OWL files and other enterprise systems documentation resources. Although not currently practiced, ontology-based annotation techniques in revision control can help surmount many of the problems associated with these issues. This paper considers these issues and related new approaches to revision control. We introduce a novel revision control approach based on semantic ontology annotation in distributed environments.

1 INTRODUCTION

This paper concerns source code control, building separate ontologies in distributed environments using OWL (OWL 2004), applying annotation techniques to semantic ontology construction, and a security model for sharing data and documents based on multilevel security in enterprise systems.

The first issue is source code control (SCC), which is only one step in software configuration management (CCC 1984). SCC is especially important for any software development team working in a distributed environment (Sink 2004). Distributed revision control is the other topical issue: software development teams must deal with the problem of controlling shared documents in enterprise systems.
Distributed revision control takes a peer-to-peer approach, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the code base is a bona fide repository (Wheeler 2004). An open system of distributed revision control is characterized by its support for independent branches and its heavy reliance on merge operations.

The next important issue, managing separate ontologies in distributed environments using OWL, involves related information resources placed in distributed systems on the Web. To provide seamless services at any distance, efficient management of the distributed information resources is required (Kim et al. 2007).

It is useful to keep some general questions in mind while reading this paper. How can we manage shared documents or resources among individual or distributed ontologies? How can we process text-based documents? How can we identify the essential lines that have been changed in a text? How can we check the version of source code via ontologies? How can we annotate ontologies using version control? How can we merge the contents of ontologies? How can we run queries over ontologies to retrieve new changes in the source code? Is it better to put the whole content of the changed source code inside the ontologies, or just the lines that have changed compared with the old version?

In the Semantic Web, the dispensability will be increased. Among languages for describing metadata and ontology, only OWL can support the dispensability of the Web (Kim et al. 2007); in this research, we therefore use OWL as the semantic language in our distributed ontologies to annotate documents.

The next issue, which has been considered for years, is metadata-based ontology, and one established answer to this problem is annotation. Annotation is defined as extra information asserted at a particular point in a document or other piece of information (Annotation 2007). Offering a way to create semantic annotations that can be easily organized and found is one of the advantages of the Semantic Web (Pereira et al. 2006). For convenience, annotations are represented as an extension of the ontology, but they could be implemented by any other means, as they are not a proper part of the domain knowledge (Castells et al. 2007). Annotation can be used to overcome two of the major challenges of the Semantic Web initiative: the availability of Semantic Web content and ontology-based information retrieval (Abrahams 2005). In our approach, annotation is used as a high-level tier that works with OWL files on top of the source code files to handle metadata over text (source code), to hold the versioning of code, and to retrieve the change history of data.

In this paper, we first consider the features of revision control that support distributed environments on the Semantic Web, including the repository and text processing. We then classify the features of OWL and ontologies that support distributed environments on the Web and consider metadata in distributed systems. Next, we consider semantic annotation, and finally we present conclusions and future work.

Bahreini K. and Elci A. (2008). SEMANTIC ANNOTATION AND REVISION CONTROL. In Proceedings of the Fourth International Conference on Web Information Systems and Technologies, pages 294-297. DOI: 10.5220/0001522102940297. Copyright © SciTePress

2 REVISION CONTROL

Several forms of version control have been pursued to solve problems faster by controlling slices of the source code. The main goal of these forms can be described as relating changes to the metadata files. The first major division among kinds of version control is between revision control over only the changed items and revision control over the whole content of the documents.
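The distinction between recording only the changed items and recording the whole content can be sketched in a few lines. The example below is a minimal illustration, not the method proposed in this paper: it assumes line-oriented source text and uses Python's standard difflib module to extract the changed lines. The function name changed_lines and the sample strings OLD and NEW are hypothetical.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[tuple[str, int, str]]:
    """Return (operation, line_number, text) for every line that differs.

    Line numbers are 1-based. "removed" entries refer to the old version,
    "added" entries refer to the new version.
    """
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode contributes both removed and added lines.
        if tag in ("delete", "replace"):
            changes.extend(("removed", i + 1, old[i]) for i in range(i1, i2))
        if tag in ("insert", "replace"):
            changes.extend(("added", j + 1, new[j]) for j in range(j1, j2))
    return changes


# Hypothetical old and new versions of a small source file.
OLD = "int total = 0;\nfor (i = 0; i < n; i++)\n    total += a[i];\nreturn total;\n"
NEW = "long total = 0;\nfor (i = 0; i < n; i++)\n    total += a[i];\nlog(total);\nreturn total;\n"

if __name__ == "__main__":
    # Whole-content control would store NEW in full; change-based control
    # stores only the entries printed here.
    for operation, line_number, text in changed_lines(OLD, NEW):
        print(operation, line_number, text)
```

Under these assumptions, a change-based record holds three entries for the sample above, whereas whole-content control would store all five lines of the new version again.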
Change-based analysis of the transformations of data inside the source code is probably better understood than full-content control, which is an old technique in source code control.

2.1 Repository

Repository and working folder are two key terms in revision control: the repository is the place where the source code is stored, and a working folder is the place where each individual developer works. All version control systems have to solve the same fundamental problem: how will the system allow users to share information, but prevent them from accidentally stepping on each other's feet? It's all too easy for users to accidentally overwrite each other's changes in the repository (Collins-Sussman et al. 2007).

2.2 Distributed Revision Control

Distributed source control approaches to discovery require that remote methods be synchronized regularly. Because distributed revision control tasks involve comparing several sources, analyzing and understanding the relations between their elements can be cognitively difficult. In this case, we strongly suggest taking advantage of semantic metadata files. The distributed revision control model on the Internet has many aspects, such as repository replication, repository backup, access controls, support for multiple repository access methods, customizing the Subversion experience, branching and merging, the entries file, and metadata files.

2.3 Text Processing

Text processing is an important task in revision control. Our aim in this investigation did not include devising a text processing algorithm that builds metadata in order to retrieve source code changes; known text search processing techniques suit this goal. This subject has been discussed in great depth in others' research.

3 ONTOLOGY AND METADATA

Metadata and ontology will play important roles in advanced information retrieval systems.
Metadata is data about data, describing the content, quality, condition, and other characteristics of data. Metadata can also represent semantic relations between information resources. An ontology defines a vocabulary and represents relations between terms for a specific domain (Kim et al. 2007). OWL has rich expressive power and is considered the next standard language for describing metadata and ontologies. In the Semantic Web, the dispensability will be increased. Among languages for describing metadata and ontology, only OWL can support the dispensability of the Web (Kim et al. 2007). All ontologies in our model are distributed across the network. Therefore, to provide seamless services without concern for distance, the distributed ontologies must be integrated.
If we want to retrieve programmers' code, we should apply integration techniques over the ontologies on the network. Special relations between classes or properties in the ontologies should be expressed. Some of the OWL features used to support the integration of distributed ontologies are disjointWith, equivalentClass, and equivalentProperty. Kim et al. (2007) modified the storage part of Sesame to support distributed ontologies written in OWL. Their structure consists of four tables in a relational database (RDB). For further information, the reader is directed to Kim et al. (2007).

4 DATA INTEGRATION

One of the greatest challenges facing distributed revision control today is the integration of data and resources. Two approaches to integrating applications are commonly used today: point-to-point and service bus integration (Microsoft Journal 2006a). These approaches can be extended to realize our goal of data integration for revision control over enterprise systems. In the first approach, a direct link is created to connect two nodes, while message passing is used in the second approach.
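The two integration styles described above can be compared by counting the connections each one needs. The following is a rough sketch under simplifying assumptions that are not taken from the cited source: point-to-point integration is modelled as a fully connected topology with one link per pair of nodes, and service bus integration as one connection per node to a shared bus. All function names are hypothetical.

```python
def point_to_point_links(nodes: int) -> int:
    """Links in a fully connected topology: one per unordered pair of nodes."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


def service_bus_links(nodes: int) -> int:
    """Links when every node keeps a single connection to a shared bus."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes


def links_for_new_node(existing_nodes: int, style: str) -> int:
    """Extra links needed when one node joins an existing set of nodes."""
    if style == "point-to-point":
        return existing_nodes  # the newcomer must link to every existing node
    if style == "service-bus":
        return 1  # the newcomer only connects to the bus
    raise ValueError(f"unknown integration style: {style}")


if __name__ == "__main__":
    print("nodes  point-to-point  service-bus")
    for n in (2, 5, 10, 50):
        print(f"{n:5d}  {point_to_point_links(n):14d}  {service_bus_links(n):11d}")
```

In this simplified model, each node added to a point-to-point architecture needs a link to every existing node, while a node added to a bus needs only one connection.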
When a new node is added to the enterprise architecture, the cost of maintaining complete integration between nodes rises sharply with each additional node (Microsoft Journal 2006a).

5 SEMANTIC ANNOTATION

Semantic annotation is a specific metadata generation and usage schema aiming to enable new information access methods and to enhance existing ones (SWT 2006). Annotation is extra information asserted at a particular point in a document or other piece of information (Annotation 2007). In our approach, whether we annotate OWL files or source code, we can apply metadata to code that is already tagged or create new tags for any other changes in the source files. Highlighting text and posting sticky notes on our metadata documents or web pages, and sharing these with other people, would be very useful. We can insert many tags inside our metadata files between the two tags <owl:AnnotationProperty rdf:ID=""> and </owl:AnnotationProperty>, which are declared in OWL files to annotate data. Some criteria should be applied in OWL files in order to use annotation on source code in our model: first of all, author information (e.g. name and surname), followed by fields such as the date and time the document was implemented, line numbers, changed line numbers, the size of the file before and after changes, the owner of the file, and the number of history records in the OWL files related to this file. The sample code lines below show how annotation tags can be applied in our model.
    …
    <owl:AnnotationProperty rdf:ID="Smith">
      <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
      <rdfs:domain rdf:resource="#LowLevelProgrammer"/>
      <rdfs:range>
        <owl:DataRange>
          <owl:oneOf rdf:parseType="Resource">
            <rdf:first rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Smith</rdf:first>
            <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
          </owl:oneOf>
        </owl:DataRange>
      </rdfs:range>
    </owl:AnnotationProperty>
    …

In a nutshell, semantic annotation is about assigning to the entities and relations in the text links to their semantic descriptions in an ontology. Most importantly, automatic semantic annotation enables many new applications: highlighting, semantic search, categorisation, generation of more advanced metadata, and smooth traversal between unstructured text and formal knowledge. Semantic annotation is applicable to any kind of content: web pages, regular (non-web) documents, text fields in databases, video, etc. (SWT 2006). Since OWL files are becoming standard in the Semantic Web environment and nearly all applications, such as Semantic Web programs and Semantic Web sites, are going to use them, we prefer to adopt OWL files as our standard and use them for metadata and annotation.

6 CONCLUSIONS

Investigating the security impact of semantically-enhanced OWL-based programs opens a new research area in OWL database security. Ontologies are used extensively in XML-based applications to improve data exchange in decentralized environments. We discussed that these new technologies might lead to undesirable data disclosure. Ontologies and OWL-based tools can facilitate inference attacks on large, publicly available XML documents. We proposed some theoretical and practical approaches to source code control based on the Semantic Web, annotation techniques, and the use of OWL as a metadata file format over secure connections in an enterprise architecture.
This approach uses an ontology-aided inference process to identify ontology-equivalent information with inconsistent security classification. We presented some solutions for building a system that can monitor users' source code, process text, and annotate metadata in OWL files in distributed environments. As mentioned above, our approach helps software architects, project managers, programmers, developers, and others obtain secure distributed control over source code and annotate the whole content of documents.

7 FUTURE WORK

Our first future goal is to design and implement a source code control system, which can be called knowledge-based SCC (KBSCC), based on semantic annotation of distributed ontologies and using enhanced multilevel security in enterprise systems. The Semantic Web is going to define the next generation of the Web, so grafting ideas from current applications, such as web-based programs, onto the Semantic Web will be useful. Building a secure system and using multi-agents or a service-oriented architecture (SOA) are three important issues in distributed networks; hence, using a multi-agent approach or SOA to implement Semantic Web-based programs over secure connections is our next target for establishing a high-level and stable system.

REFERENCES

CCC. (1984). Change and Configuration Control. IEEE Software, July 1984, Volume 1, Issue 3, page(s) 112a-112a. Digital Object Identifier: 10.1109/MS.1984.234733.

Sink, E. (2004). What is source control. http://www.ericsink.com/scm/scm_intro.html. August 26, 2004.

Wheeler, David A. (2004). Comments on Open Source Software / Free Software (OSS/FS) Software Configuration Management (SCM) Systems. April 10, 2004; lightly revised May 18, 2005. Retrieved on 2007-05-08. http://www.dwheeler.com/essays/scm.html.

OWL. (2004). Web Ontology Language; W3C Recommendation 10 February 2004. http://www.w3.org/TR/owl-features/. Accessed October 11, 2007.
Kim, YounHee, YongWook Kim, ByungGon Kim, and HaeChull Lim. (2007). Management System for OWL Documents in Distributed Environment on the Semantic Web. The 9th International Conference on Advanced Communication Technology, 12-14 Feb. 2007, Gangwon-Do, Volume 2, page(s) 1216-1220. IEEE. ISSN: 1738-9445. ISBN: 978-89-5519-131-8. INSPEC Accession Number: 9551150. Digital Object Identifier: 10.1109/ICACT.2007.358577. Posted online: 2007-05-07 11:28:34.0.

Annotation. (2007). Wikipedia, 9 October 2007. http://en.wikipedia.org/wiki/Annotation

Pereira, Rui G. and Mário M. Freire. (2006). SWedt: A Semantic Web Editor Integrating Ontologies and Semantic Annotations with Resource Description Framework. Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT-ICIW '06), 19-25 Feb. 2006, page(s) 200-200. IEEE. ISBN: 0-7695-2522-9. Digital Object Identifier: 10.1109/AICT-ICIW.2006.184.

Castells, Pablo, Miriam Fernandez, and David Vallet. (2007). An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering, Feb. 2007, Volume 19, Issue 2, page(s) 261-272. Location: Los Angeles, CA, USA. ISSN: 1041-4347. INSPEC Accession Number: 9317010. Digital Object Identifier: 10.1109/TKDE.2007.22.

Abrahams, B. and Wei Dai. (2005). Architecture for automated annotation and ontology based querying of semantic Web resources. Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, 19-22 Sept. 2005, page(s) 413-417. ISBN: 0-7695-2415-X. INSPEC Accession Number: 8747729. Digital Object Identifier: 10.1109/WI.2005.34.

Collins-Sussman, Ben, Brian W. Fitzpatrick, and C. Michael Pilato. (2007). Version Control with Subversion: For Subversion 1.5 (Compiled from r2880), pp. 2-7. http://svnbook.red-bean.com/.

SWT. (2006). Semantic Web Technologies: Trends and Research in Ontology-based Systems. John Davies (Editor), Rudi Studer (Co-Editor), Paul Warren (Co-Editor). ISBN: 978-0-470-02596-3. Hardcover, 326 pages, April 2006.

Microsoft Journal. (2006a). The Architecture Journal, issue 7, Generation Workflow. Workflow in application integration, Kevin Francis, 2006, pp. 19-23. http://msdn2.microsoft.com/en-us/library/bb245667.aspx.