Metadata and Semantics
Sicilia, M.; Lytras, M. (eds)
2009 Springer 549pp
This volume contains the proceedings of the 2nd International Conference on Metadata and Semantics Research, held in Corfu in October 2007. The conference had a 50% acceptance rate, with most of the presented papers being published in the book.
Although many papers present only interim or early research work, the volume provides a fascinating overview of a cross section of metadata and semantics research, both in pure form and as applied in a variety of domains, including cultural heritage, education and agriculture. In addition to the advances in these individual fields and domains, one is able to discern some general trends and patterns running through the work presented. Perhaps overriding all is the sense that many researchers and archivists are struggling to balance two demands: describing the information resources within a domain in adequate detail, while ensuring that discrete sets of knowledge can actually be linked and that systems to guide the user in retrieval and discovery can usefully be built on top of these structures.
Immediately apparent as one reads the volume is the dominance of semantic web approaches to metadata and knowledge modelling, influencing migrations from stand-alone formats and conventions to those based on semantic web standards such as RDF and OWL. Sections are devoted to semantic web applications and to ontology engineering. In several places one sees how such approaches can add the missing semantic aspect to more well-established syntactic rules and begin to enable reasoning and knowledge discovery through the use of ontologies. In Semantic Application Profiles: A Means to Enhance Knowledge Discovery in Domain Metadata Models, for instance, Koutsomitropoulos and colleagues show how the CIDOC Conceptual Reference Model for cultural heritage can be represented in OWL and used to discover related works (using a painting example). In Capturing MPEG-7 Semantics, Dasiopoulou et al argue that a semantic model is needed to enhance and disambiguate the multimedia mark-up enabled by the MPEG-7 format. Also very much worthy of mention in this regard is the good article by Voss, Encoding changing country codes for the Semantic Web with ISO 3166 and SKOS, which demonstrates with a grounded example how changes and versioning can be accommodated using the RDF-based Simple Knowledge Organisation System. This is clearly paramount when considering the provision of globally applicable yet cohesive and adaptive data sets.
The complexity and unwieldiness of ontologies has been an issue in the past, and several of the papers address this either by providing automated means of deriving ontology structures or by reviewing the tools used (or needed) for ontology and semantic web data manipulation.
In the first category, Ensemble Learning of Economic Taxonomy Relations from Modern Greek Corpora by Kermanidis shows how a number of clustering algorithms can be applied in combination to arrive at quite a high level of accuracy in deriving relations, based only on the learning sets themselves. In A new Formal Concept Analysis-based learning approach to Ontology Building, Jia and colleagues promote the FCA approach because it yields a lattice of concepts, which they argue is more “true to life” than the hierarchy given by other popular clustering methods. Jia et al then show how the derived ontology can be used to provide a query refinement/expansion method.
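For readers unfamiliar with FCA, the difference between a lattice and a strict hierarchy can be made concrete with a small sketch. This is not the method of Jia and colleagues; it is a naive illustration under stated assumptions, in which the toy context (a handful of economic terms and their attributes) and all function names are hypothetical. It enumerates the formal concepts of the context and shows that one concept can have two direct parents, which a tree-shaped hierarchy cannot express.

```python
from itertools import combinations

# Hypothetical toy context: each object (term) mapped to the attributes it carries.
CONTEXT = {
    "bank": {"institution", "financial"},
    "insurer": {"institution", "financial"},
    "bond": {"financial", "instrument"},
    "museum": {"institution"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    Brute force, exponential in the number of objects: fine for a toy example only.
    """
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def parents(concept, concepts):
    """Direct upper neighbours of `concept`, ordering concepts by extent inclusion."""
    extent = concept[0]
    larger = [c for c in concepts if extent < c[0]]
    return [c for c in larger if not any(extent < d[0] < c[0] for d in larger)]


if __name__ == "__main__":
    all_concepts = formal_concepts(CONTEXT)
    node = (frozenset({"bank", "insurer"}), frozenset({"institution", "financial"}))
    for parent_extent, parent_intent in parents(node, all_concepts):
        print(sorted(parent_extent), sorted(parent_intent))
```

In this toy context the concept covering "bank" and "insurer" sits directly beneath both the "institution" concept and the "financial" concept, so it has two parents rather than one.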
While there are a number of papers that demonstrate derivations and developments of specialised or extended ontologies and metadata schemes, I found the more pragmatic papers to be more valuable and progressive for the way they manage to remain in touch with the ultimate aim of their work. A nice balance between the convenience of machine cataloguing and the need for oversight by human cataloguers is struck in Whitelaw and Collins’ Pragmatic support for taxonomy-based annotation of structured digital documents. The authors describe the derivation of metadata for Open University course documents (based on IEEE Learning Object Metadata) using a series of more or less automated approaches, each of which was then evaluated by library cataloguing staff. The optimal solution was found to be one where terms are suggested from the vocabulary based on document content, but where the staff member can click through to the definition before confirming the use of the term as metadata. The authors also begin to look at the cost-benefit trade-off, weighing the collection of more detailed metadata against a slight increase in cataloguing effort. While these authors don’t get as far as exploring the efficiency of document retrieval, this is handled in Cervera et al’s Quality Metrics in Learning Objects, which stresses the reusability objective and notes that quality metrics are crucial to the navigation of resources. Notably, Cervera et al describe the need for socially based rankings and ratings to be incorporated into the metadata scheme.
A further example of attempts to enhance the usability and comparability of metadata is provided by Nilsson et al in Formalizing Dublin Core Application Profiles: Description Set Profiles and Graph Constraints. A number of practical measures for introducing conventions for how metadata elements are used and coded are presented, including standard templates and the use of a wiki to provide for easy human editing of profiles, which can then also be exported as XML, thereby bridging the gap between qualitative and more structured usage guides.
Against this general background of attempts to standardise, enable interoperability between collections and provide enhanced semantic infrastructures stands the inherent messiness of human decisions as to meaning and the complexity of the encoding act itself, something that is probably not discussed enough in this community but which is covered in Scifleet et al’s The Human Art of Encoding: Markup as Documentary Practice. The authors note that metadata cataloguing is socially situated and may often be idiosyncratic, though the extent to which this may affect quality and comparability is poorly understood. They report on a programme of research to study this variety in the practice of participants in the Text Encoding Initiative (TEI). Unfortunately their full findings are not covered in this volume, but they are well worth following up for those interested in this important question.
In summary, this is an interesting and varied read, with enough “current state of the art” review papers to interest the non-specialist as well as some good fodder for those already wrestling with metadata, semantics and knowledge representation issues within particular domains.