Harmonization methodology for metadata models

Ambjørn Naeve

Harmonization methodology for metadata models Mikael Nilsson, Ambjörn Naeve, Erik Duval, Pete Johnston, David Massart To cite this version: Mikael Nilsson, Ambjörn Naeve, Erik Duval, Pete Johnston, David Massart. Harmonization methodology for metadata models. Research report of the ProLearn Network of Excellence (IST 507310), Deliverable 4.7. 2008. <hal-00591548> HAL Id: hal-00591548 https://hal.archives-ouvertes.fr/hal-00591548 Submitted on 10 May 2011 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Network of Excellence in Professional Learning PROLEARN European Commission Sixth Framework Project (IST-507310) Deliverable D4.7 Harmonization of Metadata Standards Editor Mikael Nilsson Work Package Learning Objects, Metadata and Standards Status Final Date 2008-01-21 The PROLEARN Consortium 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. Universität Hannover, Learning Lab Lower Saxony (L3S), Germany Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), Germany Open University (OU), UK Katholieke Universiteit Leuven (K.U.Leuven) / ARIADNE Foundation, Belgium Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. (FHG), Germany Wirtschaftsuniversität Wien (WUW), Austria Universität für Bodenkultur, Zentrum für Soziale Innovation (CSI), Austria École Polytechnique Fédérale de Lausanne (EPFL), Switzerland Eigenössische Technische Hochschule Zürich (ETHZ), Switzerland Politecnico di Milano (POLIMI), Italy Jožef Stefan Institute (JSI), Slovenia Universidad Polictécnica de Madrid (UPM), Spain Kungl. Tekniska Högskolan (KTH), Sweden National Centre for Scientific Research “Demokritos” (NCSR), Greece Institut National des Télécommunications (INT), France Hautes Etudes Commerciales (HEC), France Technische Universiteit Eindhoven (TU/e), Netherlands Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Germany Helsinki University of Technology (HUT), Finland imc information multimedia communication AG (IMC), Germany Open Universiteit Nederland (OU NL), Netherlands Page 1 of 20 Document Control Title: Harmonization of metadata standards Author/Editor: Mikael Nilsson E-mail: mikael@nilsson.name Amendment History Version Date Author/Editor Description/Comments 0.1 2007-06-10 Mikael Nilsson Initial draft 0.2 2007-06-25 Mikael Nilsson First round of feedback integrated 0.3 2007-07-28 Mikael Nilsson Draft deliverable finalized 0.9 2008-12-13 Mikael Nilsson Final draft to reviewers 1.0 2008-01-21 Mikael Nilsson Final version Legal Notices The information in this document is subject to change without notice. The Members of the PROLEARN Consortium make no warranty of any kind with regard to this document, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The Members of the PROLEARN Consortium shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material. Page 2 of 20 Harmonization of Metadata Standards ProLEARN Deliverable D4.7 Final version, January 2008 Editor: Mikael Nilsson <mikael@nilsson.name> Contributors: Ambjörn Naeve Erik Duval Pete Johnston David Massart Contents 1 INTRODUCTION .....................................................................................................................4 2 THE NOTION OF METADATA ................................................................................................5 3 METADATA STANDARDS ......................................................................................................6 3.1 Domain-specific complete element sets and schemas ...............................................6 4 HARMONIZATION ..................................................................................................................6 4.1 Abstract Model standards ..............................................................................................7 4.2 Vocabulary standards .....................................................................................................8 4.2.1 Element vocabularies .................................................................................................9 4.2.2 Value vocabularies ...................................................................................................10 4.3 Syntax standards ..........................................................................................................11 4.4 Application profiles ......................................................................................................12 5 CHALLENGES FOR HARMONIZATION ...............................................................................13 5.1 Application Profiles in DC and LOM ............................................................................13 5.2 Identifying and reusing elements ................................................................................14 5.3 Requirements for application profiles ........................................................................15 5.4 Summary of obstacles ..................................................................................................15 Page 3 of 20 6 ADDRESSING THE HARMONIZATION ISSUES ..................................................................16 7 CONCLUSIONS ....................................................................................................................18 8 REFERENCES ......................................................................................................................19 1 Introduction Metadata allows systems, applications and users to manage and access resources without a need for interaction with the resource itself. For this reason, the administration and exchange of metadata is a central activity in systems that manage learning objects. Metadata considerations are fundamental when creating interoperable e-learning tools, and metadata standards have been among the very first learning technology standards to mature. However, despite enormous progress in the harmonization of learning object metadata standards, culminating in the release of the IEEE Learning Object Metadata standard in 2002, there remains a core of unsolved issues with respect to metadata interoperability and metadata harmonization. Today there is a plethora of metadata specifications (such as IEEE LOM, Dublin Core, METS, MODS, MPEG-7, etc), many of which are useful in whole or part for activities related to teaching and learning. While each specification in itself is designed to increase system interoperability, we are increasingly seeing systems that need to work with more than one of these specifications. Adding support for an additional specification generally presents a significant amount of added complexity in implementation. The reason for this is a lack of harmonization between specifications. In an ideal world, adding support for an additional metadata specification would be a simple matter of slightly extending the existing system. Existing solutions to the metadata harmonization issue are few - systems are either limited to a single specification, or implement ad-hoc solutions that only work in that particular environment. There are many examples of "mappings" between specifications that provide partial solutions to the problem, but generally fail due to lowfidelity translations and lack of generality (i.e. the mapping only works for limited parts of specifications). Another solution is to create a top-level data model that encompasses the common aspects of all the specifications. This has proven to be feasible in relatively well-constrained domains such as resource aggregation, where the work on the RAMLET top-level ontology for resource aggregation has proceeded well within the IEEE. In the field of general metadata, where there is no such common ground, sucn an approach is substantially less likely to be successful. This deliverable analyses a number of existing metadata specifications in order to isolate the reasons and issues behind harmonization problems. By making these issues explicit, we hope to contribute towards producing a benchmark against which possible solutions can be measured. The deliverable also discusses the potential of a solution based on harmonization of the various abstract models used in the metadata specifications. The deliverable begins with a short introduction to metadata in Section 3. Section 4 discusses a set of metadata specifications that are highly relevant to learning and Page 4 of 20 teaching. Section 5 forms the core of the deliverable and analyzes the harmonization issues among a chosen set of specifications. Section 6 generalizes the analysis in Section 5 and makes a deeper analysis of the relationship between IEEE LOM and Dublin Core. Section 7, finally, points to possible ways to address the identified harmonization issues. 2 The notion of metadata In practice, most modern metadata standards adopt a definition of metadata that allows descriptions of digital and non-digital things alike, usually collectively termed resources, but limits the type of metadata descriptions to a very restricted kind of metadata, as defined by the metadata standard. In this deliverable, we will use the term “metadata” in the sense of the following modern definition: Machine-processable information about (digital and non-digital) resources This definition encompasses not only human-assigned information about a resource (such as name/title, subject and creator), but may also be used for information relating to e.g.: • the life-cycle of a resource (different versions, history, etc.) • technical aspects of a resource (size, format, functionality, etc.) • relations between resources and aggregations of resources (lessons comprised of learning objects etc.) It encompasses information not only about digital resources, but also about e.g.: • learners and teachers (history, competencies, etc.) • events (location, participants etc.) • abstract notions (pedagogical designs, terms in taxonomies etc.) In recent years, the notion of metadata has started to expand, taking new forms and being managed in new ways. For example, the following aspects go beyond the traditional notion of metadata as we have encoded in the metadata specifications of today: • • • • Automated generation of metadata, where metadata is generated as part of the creative process, or part of usage or context of resources, or inferred from the contents of a resource. Attention metadata, where information about users’ actions and progress in systems is captured automatically and processed for different purposes. Such metadata is not used for traditional descriptive purposes, but rather for expressing contextual relationships and supporting adaptive system behaviour. Truly subjective metadata, such as metadata about emotions, mood, opinions, arguments, ratings etc., as sometimes seen in social networks and instant messaging contexts. Collaborative tagging as a way of capturing user-generated classifications. Many of these developments have not yet been formalized in metadata specifications, and as such will not form part of the analysis in this deliverable. However, any attempt Page 5 of 20 at harmonization of metadata standards must address these aspects of metadata as well in order to be prepared for future developments within this field. 3 Metadata standards The terms "metadata standard" or "metadata schema" are often used to refer to the various kinds of specifications for metadata available from different organizations. Note that in the notion of "standard" we include both international de jure standards, as well as metadata specifications from established specification organizations. Metadata standards come in different forms and with different kinds of audiences, and for the purpose of this deliverable it is useful to look at the following broad categories of standards. Generic, framework-level models • • The DCMI Abstract Model (DCAM), defining the underlying model for Dublin Core metadata terms (Powell, Nilsson, Naeve, and Johnston 2004). RDF, the Resource Description Framework, a general-purpose, web-oriented metadata framework, defined by the W3C. Generic, framework-level syntaxes • • Expressions of Dublin Core in RDF/XML/XHTML, describing syntaxes for encoding DCAM-compatible metadata in various syntaxes. RDF/XML and other RDF syntaxes. General-purpose element sets to be reused in many different contexts • DCMI Metadata Terms, defining a set of metadata terms conforming to the DCMI Abstract Model. 3.1 Domain-specific complete element sets and schemas • • • IEEE LOM Data Model, defining the basic metadata elements and how they combine into a LOM instance. IEEE LOM currently has an XML syntax only. MODS, Metadata Object Description Schema - an XML schema for encoding MARC21 library records defined by the Library of Congress. MPEG-7 MDS, defining a complex XML format for multimedia metadata. It should be obvious from this list that comparing "metadata standards" is not an easy task. In this deliverable, we will tackle the problem by analysing five "groups" of specifications: 1. 2. 3. 4. 5. The IEEE LOM family of specifications The DCMI family of specifications The RDF family of specifications The MODS schema The MPEG-7 specification Page 6 of 20 4 Harmonization Learning object metadata interoperability refers to the ability of different systems to exchange information about resources. Metadata created in one system and then transferred to a second system will be processed by that second system in ways which are consistent with the intentions of the metadata creators (human or software). Duval, Hodgins, Sutton, and Weibel (2002) set forth four fundamental principles for such interoperability, repeated in the Dublin Core – IEEE LTSC Memorandum of Understanding (“Memorandum”, 2000). These are: • • • • Extensibility, or the ability to create structural additions to a metadata standard for application-specific or community-specific needs. Given the diversity of resources and information, extensibility is a critical feature of metadata standards and formats. Modularity, or the ability to combine metadata fragments adhering to different standards. Modularity is stronger than simple extensibility in that it requires that metadata from different standards, including metadata extensions from different sources, should be usable in combination without causing ambiguities or incompatibilities. Refinements, or the ability to create semantic extensions, i.e., more fine-grained descriptions that are compatible with more coarse-grained metadata, and to translate a fine-grained description into a more coarse-grained description. Multilingualism, or the ability to express, process and display metadata in a number of different linguistic and cultural circumstances. One important aspect of this is the ability to distinguish between what needs to be human-readable and what needs to be machine-processable. Metadata harmonization as used in this deliverable refers to a further step beyond this level of system interoperability, and instead refers to interoperability between metadata standards. Harmonization then refers to the ability to use several different metadata standards in combination in a single software system. The rest of the deliverable will analyze the different groups of standards and try to find obstacles to harmonizations. In that analysis, the four interoperability principles above form a useful basis for evaluating the achieved progress in metadata harmonization. In (Nilsson et al., 2007), a fifth principle is suggested, namely • Machine-processability, or the ability to automate processing of different aspects of the metadata specifications, so that machines can handle extensions, manage modules, understand refinements and provide support for multilingualism. This principle suggests that given the right support, harmonization may be realized in an automated fashion, with no need for translations, mappings or other manual interventions. 4.1 Abstract Model standards Underlying most metadata specifications there is an assumption about an abstract model (sometimes referred to as "metamodel" or "data model"), within the framework of which the metadata is defined. The abstract model specifies the concepts used in the Page 7 of 20 standard, the nature of terms and how they combine to form a metadata description. The abstract model is the schematics used by an application to understand a metadata expression given in a specific format, thus making it possible for a single standard, though expressed in several different formats, to still be understood in a uniform way by users and applications. Metadata elements are defined and presented using this model. In other words - the abstract model is the framework that exists independently of the particular metadata elements used. Abstract models are sometimes also the basis for query languages just like SQL is dependent on the underlying relational database model. In the above examples, we see several abstract models defined. IEEE LOM uses an abstract hierarchical model with no formal semantics. RDF, and as consequence, Dublin Core, use an entity-relationship model grounded in model-theoretical semantics, while MPEG-7 and MODS use an XML-based structure, to which MPEG-7 adds objectoriented semantics. The models differ substantially in their methods for adding extensions - the XML-based models base their extensions on XML Schema, IEEE LOM depends on being able to extend the hierarchy, while the entity-relationship-based models have no notion of "extensions" as there is no base set of elements to begin with. Specification Structure Formal semantics No formal semantics Extensions IEEE LOM Tree-based Additions to the tree The DCMI specifications Entity-relationship Model-theoretic model semantics Any entity or relation can be used RDF Entity-relationship Model-theoretic model semantics Any entity or relation can be used MODS XML tree No formal semantics XML Schema extensions MPEG-7 XML tree object-oriented XML Schema and DDL (Data Definition Language) extensions As is apparent from the many proposals about mapping from one model to another (MPEG-7 to DC, MPEG-7 to RDF/OWL, IEEE LOM to DC, MODS to DC, DC in MPEG-7), this plethora of models is making metadata harmonization a very difficult task. 4.2 Vocabulary standards In order to fill the abstract models with concrete metadata elements, metadata vocabularies are needed. Nilsson et al. (2007) identifies two kinds of metadata vocabularies: • Element vocabulary - a set of metadata elements, that are used as some form of "descriptive attribute" in a metadata record. Examples of elements include dcterms:creator (the Creator element from Dublin Core) and "General.Title" (the Page 8 of 20 • title element from IEEE LOM). The corresponding element vocabularies would be the set of DCMI Metadata Terms and the set of IEEE LOM Data elements. Value vocabulary - a set of concepts or terms that can be used as value for a metadata element. Examples of such value vocabularies include the Library of Congress Headings, which includes terms such as "Biology" etc. Another example of a value vocabulary is the IEEE LOM Contributor Role vocabulary, containing terms such as "author", "illustrator" etc. 4.2.1 Element vocabularies Element vocabularies and value vocabularies have fundamentally different characteristics. While value vocabularies are used to construct taxonomies and thesauri that describe relationships between concepts in terms of broader/narrower, containment etc, element vocabularies are used to construct schemas and ontologies that describe how metadata instances are to be constructed. Abstract models tend to contain a model for describing element vocabularies. The following table summarizes a few important differences between the ways element vocabularies are handled in the different models. In particular, we highlight the method of defining element vocabularies, and the method for identifying elements in metadata instances. The table also summarizes the ways that elements may reference other elements. Semantic models generally support refinement, i.e. defining elements that are more precise than an existing element (such as dcterms:creator being more precise than dcterms:contributor). Hierarchical models generally support structural relationships between elements. Specification Method for defining element vocabularies Element identification Element relationships IEEE LOM Defines element vocabularies by describing the placement of the elements at a particular node in the Tree path hierarchy describing the metadata instances. Does not allow for refinements of elements, but does allow substructures. The DCMI specifications Define element vocabularies using URI RDF Schema. Allow refinement using RDF Schema constructs. RDF Defines element vocabularies using URI RDF Schema. Allows refinement using RDF Schema constructs. MODS Defined as XML elements only Does not allow for refinements of elements, but does allow substructures. MPEG-7 Elements defined in MPEG-7 DDL XML name (Data Definition Language). XML name Allows refinement through subclassing in DDL, as well as subPage 9 of 20 structures. It seems clear that element vocabularies are very problematic from a harmonization point of view, as elements in the different standards are defined, identified and related using fundamentally different underlying mechanisms. For example, an XML element and an RDF property have fundamentally different characteristics. 4.2.2 Value vocabularies Value vocabularies are usually simply referred to in different ways in metadata instances. The following table summarizes how value vocabularies are defined and referenced in the different specifications. Specification Defining value vocabularies Referring to values IEEE LOM Refers to values using two string IEEE LOM does not define a method for tokens: the "Source" and the describing value vocabularies. "Value". The DCMI specifications Do not define a preferred method for defining value vocabularies, although SKOS is becoming more and more popular (see below). Refer to values using URIs or natural language strings. RDF Does not define a preferred method for defining value vocabularies other than RDF Schema, although SKOS is becoming more and more popular (see below). Refers to values primarily using URIs. MODS Has no way of defining vocabularies Refers to values using natural except listing them in the XML Schema. language strings. MPEG-7 Defines vocabularies by listing them in DDL. Refers to values using natural language strings, unless they are XML elements, in which case there is a built-in reference mechanism. SKOS (Simple Knowledge Organisation Systems) is a W3C working draft specification for defining taxonomies and classification schemes (see http://www.w3.org/2004/02/skos/). The major harmonization issue with value vocabularies has to do with the way terms in the vocabulary are referenced in metadata instances. In the above table, there are four major methods used: URIs, Souce/Value pairs, string tokens and natural language strings. Different methods of identification imply different levels of precision, support for multilingualism and application independence. In order of decreasing precision (the examples are made up for illustration purposes only): Page 10 of 20 Value referencing method URI Example Ambiguity http://www.loc.go Depends on URI v/subjects/Biolog scheme used and y identifier stability Multilingualism Application independence fully multilingual reusable across any kind of application Source: LCSH, Value: Biology Depends on what "Source" token is reusable across any used, as well as pre- fully multilingual application agreement on allowed "source" token. Token EA32 Unique as long as it is depends on tied to a particular fully multilingual knowledge of XML XML schema or other Schema/context context natural language string Biology Ambiguous Source/value pair Cannot be reused, Not multilingual as meaning is context-dependent Clearly, URIs and source/value pairs are potent ways of referencing value vocabularies. See also CORES (Baker & Dekkers, 2002), an agreement to use URIs to identify components of metadata standards. 4.3 Syntax standards Exchanging metadata records requires a serialization format, or metadata syntax. Specification Syntax IEEE LOM Can be expressed in XML, other syntaxes can be defined. The DCMI specifications Dublin Core metadata can be expressed using XML, any RDF syntax (see below) or HTML meta tags. RDF RDF can be expressed in RDF/XML, N3, Turtle, RDFa (RDF in HTML attributes) and a few other, specialized syntaxes. MODS As the model is based on XML, MODS can only be expressed in XML. MPEG-7 Like MODS, only XML expression is possible. Metadata syntaxes are strongly linked to the abstract models of the respective specifications. As can be seen, the specifications whose abstract models are tightly linked to XML are also restricted to be expressed in XML. In the case of IEEE LOM, the abstract model is not based on XML, but is still based on a similar hierarchy. This is one reason for the existence of only one syntax of LOM so far (Nilsson et al, 2003). While that does not make other syntaxes for these metadata specifications impossible, clearly the design of an abstract model can influence the complexity of expression using different syntaxes. Page 11 of 20 Why would it be important to be able to express information using many syntaxes? There are several, related reasons for this: • • • Different syntaxes have different features. For example, XML provides an easily parsed, well-structured syntax in the cases that the data is relatively homogeneous, while RDF provides a more flexible model in the face of heterogeneous data. Thus different applications will support certain syntaxes better. Different syntaxes will support different query formalisms, which are useful in different contexts. Metadata might be stored not only in files using text-based syntaxes, but also in databases - or be accessed using programming interfaces. A standard that can more easily be bound to different formalisms will be easier to implement in various systems. 4.4 Application profiles In order to support community-specific and regional needs, metadata standards generally support a notion of customization through application profiles. While the exact methods used vary from specification to specification, the customization generally encompasses selecting a set of metadata elements from one or several element vocabularies, possibly extending the base element vocabulary as defined in the specification using locally defined elements, and choosing a set of useful value vocabularies for use with these elements. Enabling such customizations of metadata standards is one of the ultimate goals of metadata harmonization as we have described it in this chapter, since many such usecases depend on being able to reuse metadata elements from different specifications. Reusing an element in this case means referencing a metadata element in a way that can be automatically understood by an application, without reference to the definition of the application profile itself. If application profile-specific handling is needed, the process is better referred to as mapping. Application profiles rely on the interoperability features of the respective metadata standards. The metadata standards we have discussed use slightly different notions of application profiles. Combined with the differences in abstract models we have discussed previously, this produces significant barriers for the harmonization that application profiles have been designed to enable. The following table summarizes how application profiles are defined in the different specifications. In some cases, an application profile can be expressed in a machineprocessable format. The table also summarizes the support for reuse of metadata elements across application profiles. Page 12 of 20 Specification Application Profile support Machine-readable expression of Application Profiles Reusability IEEE LOM Profiles defined as restrictions/extensions of the base schema. Difficult to reuse Currently only possible extensions reliably as through XML Schema. element vocabularies are not well-defined. The DCMI specifications Profiles defined as arbitrary restrictions of arbitrary combinations of elements. Several proposed formats ("Guidelines", 2005, Description Set Profiles). RDF No notion of application profiles, though OWL No formalism except ontologies sometimes fill a OWL for ontologies. similar function. Fully reusable. MODS Profiles are defined as XML extensions. XML Schema. Difficult to reuse extensions, though XML namespaces could help. MPEG-7 Profiles are defined as XML extensions. MPEG-7 DDL (Data Definition Language). Difficult to reuse extensions, though XML namespaces could help. Any part of an application profile can be reused separately. 5 Challenges for harmonization Let us now focus on the two major metadata specifications in the e-learning domain: IEEE LOM and Dublin Core, and the possibilities for harmonization. 5.1 Application Profiles in DC and LOM The Dublin Core and LOM interpretations of the concept of application profile are both rooted in the corresponding abstract models underpinning these standards. A Dublin Core application profile refers to properties, vocabulary encoding schemes and syntax encoding schemes; a LOM application profile refers to LOM data elements or extended data elements and their value spaces, using the range of datatypes specified by the LOM standard. As has already been discussed these are fundamentally different types of constructs: an occurrence of a LOM data element is interpreted through the semantics of the LOM abstract model, and a reference to a DC property is interpreted through the semantics of the DCMI abstract model. Neither approach is sufficient to support the Lego-like assembly of a modular metadata description which draws on both the LOM and DC metadata standards. Secondly, the LOM standard provides not only a set of data elements, but also a default pattern for the use of those data elements, a “base” application profile to which other community- or application-specific LOM application profiles should also conform. Page 13 of 20 Closely related to this second point is that the LOM abstract model does not define a mechanism for uniquely identifying and referencing data elements within a global context. While the use of extended data elements is possible, the disambiguation of those elements is reliably possible only within a context where the use of names is controlled. The LOM abstract model does not lend itself to the reuse of data elements within a global context, or to the sharing of LOM metadata descriptions beyond a context in which names are controlled. The DC and LOM application profile constructs are both useful in formalising the way in which the implementers of metadata standards customise and (to a greater or lesser degree) extend those standards. They also provide a basis for disclosing existing work and encouraging the reuse of components used within existing application profiles, again subject to some limitations. They highlight that a degree of mixing and matching is indeed possible – but only within the framework of the corresponding abstract model. For DC and LOM, the incompatibility of those abstract models means that the two incompatible application profile constructs are not sufficient to address the problem of how to use component parts of those two standards in combination. 5.2 Identifying and reusing elements As shown in Nilsson et al. (2007), mixing different metadata standards using the XML format does not work the way we would want it to. Using RDF as a common format works well with standards that use an abstract model compatible with RDF, but is still problematic for LOM and other standards based on an elements-in-elements model. The main reason for this is that such models have no canonical interpretation as entityrelationship models, and thus need to be reinterpreted/reengineered in order for them to be usable in RDF. The CORES Resolution (Baker and Dekkers, 2002), which has been signed by both the IEEE LTSC and the Dublin Core Metadata Initiative, encouraged the owners of metadata standards to assign URI references to their “elements”, the “units of meaning comparable and mappable to elements of other standards”, but it did not specify what “comparable and mappable” meant. As a consequence the owners of different standards assigned URI references to "elements" that were created within different abstract models and used metadata formats that rely on those incompatible abstract models for their meaning and interpretation. The assignment of a URI reference to an "element" means that it can be unambiguously cited, but it does not change the nature of the "element". For example, it is not meaningful to use a URI reference for a LOM element as, e.g., a property URI in a Dublin Core metadata description. Similar incompatibilities have been noted between, e.g., RDF and MPEG-7 (van Ossenbruggen, Nack and Hardman, 2004 and Nack, van Ossenbruggen and Hardman, 2005). The conclusion we may draw from this analysis is that we must not confuse the components used in a metadata format with the constructs in the abstract model. The components in a metadata format, such as “element URIs” may seem to be similar and compatible, but in reality they belong to completely different frameworks that might not be compatible. There are several problematic scenarios, including: • Mixing two metadata formats created to conform to different abstract models, such as Dublin Core XML and LOM XML. A similar example is trying to use parts Page 14 of 20 of a Dublin Core RDF description serialized in the RDF/XML language together with elements from another XML language such as the LOM XML language. As LOM and RDF use incompatible abstract models, this also leads to ‘nonsensical’ metadata constructs (Johnston, 2005). • Reusing metadata terms or elements adhering to different abstract models, regardless of the metadata format used, such as reusing a Dublin Core element URI in a LOM metadata description. As discussed in Nilsson et al. (2007), this leads to nonsensical metadata constructs, since the URIs of Dublin Core and of LOM must be interpreted in terms of different abstract models. For example, the Dublin Core XML expression forces an interpretation of XML elements as properties - an interpretation that may not apply to included LOM metadata elements expressed in XML. • Mixing two different syntaxes expressing the same specification, when those two expressions apply different interpretations to the use of similar components in the metadata format. This is the case with the Dublin Core XML binding, which must be interpreted using a different set of rules than the RDF/XML serialization of the Dublin Core RDF binding, although they contain component parts that are confusingly similar. Hence we must conclude that the notion of “reusing elements” between metadata standards and formats using incompatible abstract models is fundamentally flawed. While assigning URI references for the component parts of a metadata standard is clearly a worthwhile effort in other ways, this does not really address the fundamental issue when creating interoperable metadata standards, namely the compatibility of their respective abstract models. 5.3 Requirements for application profiles In conclusion, we see that in order to reuse components of different standards in a machine-processable way as discussed above, the following criteria must be met: 1. The components must be unambiguously identified, so that components from different sources can be clearly distinguished and their origins can be separated. This is addressed by the CORES resolution. 2. The components must adhere to compatible abstract models. There is currently no resolution to address this, although the Dublin Core – IEEE Memorandum of Understanding (“Memorandum”, 2000) points in this direction. 3. A metadata format must be used that allows for consistent interpretation of the components with respect to their respective abstract models. This too is mentioned in the “Memorandum”, but has yet to be realized. 5.4 Summary of obstacles The following harmonization obstacles were raised in the previous section: Extensibility Different abstract models have different methods for extensions, rendering the extensions mutually incompatible, and therefore not reusable across specifications. Modularity Page 15 of 20 Different notions of application profiles lead to impossibility of combining fragments from different specifications. Refinements Not all abstract models support refinements, meaning that cross-vocabulary refinements becomes impossible. Multilingualism In the cases where abstract models do not clearly separate natural language items from abstract tokens, multilingualism quickly becomes an issue. This is handled well in at least LOM and Dublin Core. Machine-processability of standards and extensions This depends on articulated abstract models with well-defined semantics, which is currently provided by at least Dublin Core, RDF and MPEG-7. Identification of elements and values Clearly, a common model for referring to terms from element and value vocabularies is needed. Syntaxes The syntaxes used must be firmly rooted in the applicable abstract model. Application Profiles A common, formal model for Applications profiles is needed. 6 Addressing the harmonization issues The above analysis shows that there are many difficulties on the road towards metadata harmonization. This chapter outlines a roadmap towards metadata harmonization generally, and between LOM and Dublin Core in particular. Five areas of harmonization are identified: identification harmonization, abstract model harmonization, vocabulary harmonization, application profile harmonization and syntax harmonization. Issue Comment The first important issue to be resolved is that of identification, of both metadata elements and values taken from vocabularies. The analysis above shows that the tokens work locally and in wellIdentification defined communities, but on a global scale, global identifications is necessary. A related issue is when element identification depends on the placement of an element in a hierarchy, as in the LOM standard. Needed actions • Encourage the specification of URIs for values in controlled vocabularies. • Provide mappings from such URIs to tokens and natural language strings. • Encourage the specification of URIs for metadata elements. Page 16 of 20 Issue Abstract Model Vocabulary model Comment As has been shown above, value identification is relatively unproblematic, while element identification relies on understanding precisely what is being identified. In order for element identification to have an effect on harmonization, the elements need to be of the same kind, using a common understanding of the underlying model. While there is no strong requirement for a common value vocabulary model (since value identification is the major issue), a common model for element vocabularies is tightly linked to the harmonization of abstract models. Common, machine-understandable formats for element vocabularies are a prerequisite for enabling modularity - since this will enable automatic disassembling and processing of composite metadata. Needed actions • Encourage harmonization through synchronization of abstract models. As we have seen in the analysis above, differences in abstract models create unnecessary incompatibilities. • Avoid relying on mapping of instance data for harmonization. As described in Nilsson et al (2007), except for highly similar standards, this creates a m x n problem, where every standard needs to mapped to every other. Instead try to align on the abstract model level. • Discourage the introduction of new abstract models into the domain, as this further fragments the community. Of particular worry is the work in ISO/IEC JTC1 SC36 on Metadata for Learning Resources. • In the cases where such synchronization is unfeasible, try to provide mappings on the level of abstract models. In the analyzed specifications, three vocabulary models are used: RDF Schema, XML Schema and MPEG-7 DDL. Relying on a syntax-oriented model such as XML Schema to define abstract entities that can be reused across syntaxes and systems leads to difficult interoperability issues. Therefore the recommendation is to define element vocabularies using RDF Schema, even if RDF itself might not always be used as a way of expressing the metadata. Page 17 of 20 Issue Comment Working cross-standard application profiles require a common understanding of what an application profile is. This is dependent on the issues above, in particular regarding identifying and defining element vocabularies. If we Application are to support the multitude of Profile Model description types mentioned in the beginning of this paper, an application profile model cannot be based on a "base" model such as LOM, as this would render the model unusable for describing other things than e.g. learning objects. Syntaxes A syntax is useless without a processing model, and such a model must be based on the abstract model of the metadata standard. Needed actions Models for application profiles that are independent of a particular element set need to be developed. As the LOM example shows, such models must be able to handle high structural complexity and specificity. Make sure metadata syntaxes are firmly grounded in an abstract model, and that, conversely, the abstract model is considered before the syntax when developing metadata specifications. 7 Conclusions In this deliverable we have analyzed the obstacles to metadata harmonization. The issues fall in three broad categories: Conventions The different metadata specifications use different methods for identifying and describing metadata elements and terms from value vocabularies. It seems possible to enable high fidelity harmonization solutions to these issues without disrupting the existing specifications. For example, terms identified by source/value pairs can be assigned URIs. Models The specifications differ substantially in how they define metadata records, and in how metadata is structured and processed. A mapping solution is therefore destined to be incomplete and suffer from not being generalizable to extensions. For example, the IEEE LOM notion of "Category" has no correspondence in Dublin Core metadata, and any generalizable mapping of Categories will therefore be problematic. Combinations Combining elements to form application profiles, and encoding them in syntaxes are both processes that rely heavily on models as well as conventions. It is likely that once conventions and models are harmonized, applications profiles and syntaxes will become more easily addressable harmonization issues. The above three categories also represent milestones on a roadmap to harmonization harmonize conventions, then models, then application profiles and syntaxes. It should Page 18 of 20 therefore be clear that a solution to the harmonization problems needs to take the whole framework of conventions, models and application profiles and syntaxes into account. Recently, there has been a clear movement towards conventions based on Web architecture, leading to a strong recommendation for basing identification on URIs. There is also increased momentum towards describing element and value vocabularies in a Web architecture-friendly way, using the RDF Vocabulary Description language (RDF Schema) for element vocabularies and SKOS (Simple Knowledge Organization Systems) for describing value vocabularies, i.e. controlled vocabularies, taxonomies and classification schemes. For abstract models, a consensus has yet to be reached, although the Resource Description Framework (RDF) does provide a framework well founded in Web architecture and a formal semantics. This deliverable still recommends that metadata specifications harmonize their models with the RDF model and, by extension, the semantic web. For application profiles and syntaxes, no firm guidances can really be given, though developments such as ontologies and the Dublin Core Description Set Profile specification remain highly relevant. Concrete work on harmonizing IEEE LOM and Dublin Core is currently progressing within the Joint DCMI / IEEE LTSC Taskforce1. The approach taken is that of reinterpretation of the IEEE LOM data elements in terms of a completely different abstract model – the DCMI Abstract Model. The resulting specifications make LOM elements reusable in Dublin Core and RDF metadata, but at the cost of imperfect translation. At the same time, work on a “new LOM” is slowly starting, although it is unclear at this point what approach towards harmonization that will be taken. In a similar spirit, the RDA (Resource Discovery and Access)2 project is redesigning the data model that is behind the world's library data specifications (the MARC data model). The work is done in collaboration with the Dublin Core Metadata Initiative, and aims to produce a model that is compatible with the DCMI Abstract Model and RDF3. Together, these two initiatives, both of which include important contributions from ProLEARN members, demonstrate important progress towards harmonization of several important metadata domains – generic metadata using Dublin Core, educational metadata, and library metadata, as well as a widening from the all-digital domain to the domain of physical artefacts (books). Harmonizing metadata specifications in the way outlined in this document seems an overwhelming task, but the steady flow of important developments still makes the future seem bright. 1 http://dublincore.org/educationwiki/DCMIIEEELTSCTaskforce 2 http://www.collectionscanada.gc.ca/jsc/rda.html 3 http://dublincore.org/dcmirdataskgroup/ Page 19 of 20 8 References Baker, T. & Dekkers, M., (2002), CORES Standards Interoperability Forum Resolution on Metadata Element Identifiers. http://www.cores-eu.net/interoperability/cores-resolution/ Dublin Core Application Profile Guidelines (2003), CEN Workshop Agreement CWA 14855. ftp://ftp.cenorm.be/PUBLIC/CWAs/e-Europe/MMI-DC/cwa14855-00-2003-Nov.pdf Duval, E., Hodgins, W., Sutton, S. & Weibel, S. L. (2002), Metadata Principles and Practicalities, D-Lib Magazine, April 2002. http://www.dlib.org/dlib/april02/weibel/04weibel.html Friesen, N., Mason, J. & Ward, N. (2002), Building Educational Metadata Application Profiles, Dublin Core - 2002 Proceedings: Metadata for e-Communities: Supporting Diversity and Convergence. http://www.bncf.net/dc2002/program/ft/paper7.pdf Godby, C. J., Smith, D. & Childress, E. (2003), Two Paths to Interoperable Metadata, Proceedings of DC-2003: Supporting Communities of Discourse and Practice – Metadata Research & Applications, Seattle, Washington (USA). http://www.siderean.com/dc2003/103_paper-22.pdf Guidelines for machine-processable representation of Dublin Core Application Profiles (2005), CEN Workshop Agreement CWA 15248. ftp://ftp.cenorm.be/PUBLIC/CWAs/eEurope/MMI-DC/cwa15248-00-2005-Apr.pdf Heery, R. & Patel, M. (2000), Application Profiles: mixing and matching metadata schemas, Ariadne Issue 25, September 2000. http://www.ariadne.ac.uk/issue25/app-profiles/ Johnston, P., (2005a), XML, RDF, and DCAPs. from http://www.ukoln.ac.uk/metadata/dcmi/dcelem-prop/ Memorandum of Understanding between the Dublin Core Metadata Initiative and the IEEE Learning Technology Standards Committee (2000). http://dublincore.org/documents/2000/12/06/dcmi-ieee-mou/ Nack, F., van Ossenbruggen, J. & Hardman, L. (2005), That obscure object of desire: multimedia metadata on the Web, part 2, IEEE Multimedia 12 (1) 54-63. http://ieeexplore.ieee.org/iel5/93/30053/01377102.pdf?arnumber=1377102 Nilsson, M., Palmér, M. & Brase, J. (2003), The LOM RDF Binding - Principles and Implementation, Proceedings of the Third Annual ARIADNE conference. http://kmr.nada.kth.se/papers/SemanticWeb/LOMRDFBinding-ARIADNE.pdf Nilsson, M., Johnston, P., Naeve, A., Powell, A. (2007), The Future of Learning Object Metadata Interoperability, in Harman, K., Koohang A. (eds.) Learning Objects: Standards, Metadata, Repositories, and LCMS (pp 255-313), Informing Science press, ISBN 8392233751. Powell, A., Nilsson, M., Naeve, A., Johnston, P. Dublin Core Metadata Initiative Abstract Model, DCMI recommendation, 2007, http://dublincore.org/documents/abstract-model/ van Ossenbruggen, J., Nack, F. & Hardman, L. (2004), That obscure object of desire: multimedia metadata on the Web, part 1, IEEE Multimedia 11 (4) 38-48. from http://ieeexplore.ieee.org/iel5/93/29587/01343828.pdf?arnumber=1343828 Page 20 of 20

RELATED PAPERS

RELATED TOPICS

Log In

Harmonization methodology for metadata models

Harmonization methodology for metadata models

Related Papers

RELATED PAPERS

RELATED TOPICS