Topic Maps for Cultural Heritage Collections

By murray - Posted on 04 April 2008

The New Zealand Electronic Text Centre (NZETC) is a digital library project at Victoria University of Wellington. The Centre has been digitising New Zealand books since early 2002, and publishing them on its website. Since April 2005, the structure of the website has been based on Topic Maps.

Many digital library websites use a resource-centric organisation, in which individual documents are the primary or exclusive objects of interest. By contrast, the NZETC’s Topic Maps-based website has a more subject-centric architecture, which accommodates not only the digitised texts and images but equally their subjects and themes, their authors and publishers, and the people and places mentioned or depicted in those texts and images. Because of the generality of the Topic Maps paradigm, the conceptual structure can be extended as needed, e.g. to include extra classification schemes such as Linnaean classification for biological texts, or to provide more specific types of relationships between texts, such as a new law repealing a section of an old law.

Texts in the NZETC collection are transcribed into XML which encodes the logical structure of the texts, such as their division into paragraphs, sections, and chapters, and also bibliographic metadata such as the author, subject classification, and publication information. This presentation describes the work of the NZETC to further identify and encode subjects and relationships implicit in our texts, and to use this information to generate a navigational framework capable of delivering enhanced resource discovery and navigation both within and between collections.

A key part of the work is authority control: unambiguously identifying people, places, and other entities mentioned in the digitised texts. To support this work, the NZETC has developed software tools for searching and matching names, including a database application called Entity Authority Tool Set (EATS), which we use to allocate unique identifiers for all the subjects in our texts, and store names, biographical data, and external links.
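As a rough illustration of the name-matching side of authority control, the sketch below compares a name found in a text against the name forms stored for each entity and returns the closest entity above a similarity threshold. It is not the EATS implementation: the `Entity` class, the `best_match` function, the identifiers, and the 0.85 threshold are all assumptions made for the example, and the similarity measure is simply the one in Python's standard `difflib` module.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entity:
    """Hypothetical authority record: one identifier, many name forms."""
    entity_id: str
    names: list[str] = field(default_factory=list)


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, and sort the name parts,
    so that 'Mansfield, Katherine' and 'Katherine Mansfield' compare equal."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(sorted(cleaned.split()))


def best_match(mention: str, authority: list[Entity], threshold: float = 0.85):
    """Return the entity whose stored name is most similar to the mention,
    or None if no name reaches the (assumed) similarity threshold."""
    target = normalise(mention)
    best, best_score = None, 0.0
    for entity in authority:
        for name in entity.names:
            score = SequenceMatcher(None, target, normalise(name)).ratio()
            if score > best_score:
                best, best_score = entity, score
    return best if best_score >= threshold else None


# Illustrative records only; the identifiers are invented for this example.
authority = [
    Entity("entity-1", ["Katherine Mansfield", "Mansfield, Katherine"]),
    Entity("entity-2", ["Apirana Ngata"]),
]
match = best_match("Mansfield, Katherine", authority)
no_match = best_match("Ernest Rutherford", authority)
```

In practice a match like this would only be a suggestion for a human editor to confirm, since two different people can share a name and one person can appear under very different names.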

The NZETC system is based on international standards for the representation and interchange of knowledge. Digitised texts are encoded according to the Text Encoding Initiative (TEI) guidelines. Authority lists are maintained in a purpose-built database and exchanged using the Metadata Authority Description Schema (MADS). TEI and MADS documents are transformed using Extensible Stylesheet Language Transformations (XSLT) into XML Topic Maps (XTM). The NZETC’s topic map is constructed automatically by transforming each of the TEI documents, the authority database, and other sources into individual XTM documents, and merging these topic maps together using the open source TM4J Topic Map engine. The topic maps themselves use the CIDOC Conceptual Reference Model (CRM). The collection includes over 2,000 texts and tens of thousands of records of people and places, totalling about 110,000 topics.
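The merging step relies on a basic rule of the Topic Maps model: topics that share a subject identifier represent the same subject and are combined into one. The sketch below illustrates that rule only. It does not use TM4J or parse XTM; the `Topic` class, the `merge_topic_maps` function, and the example.org identifiers are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Simplified topic: identifiers for its subject, plus names and
    occurrences (here just strings naming related resources)."""
    subject_identifiers: set[str]
    names: set[str] = field(default_factory=set)
    occurrences: set[str] = field(default_factory=set)


def merge_topic_maps(*topic_maps: list[Topic]) -> list[Topic]:
    """Combine several topic maps, merging any topics that share
    at least one subject identifier."""
    merged: list[Topic] = []
    for topic_map in topic_maps:
        for topic in topic_map:
            # Work on a copy so the input topic maps are left unchanged.
            combined = Topic(
                set(topic.subject_identifiers),
                set(topic.names),
                set(topic.occurrences),
            )
            kept = []
            for existing in merged:
                if existing.subject_identifiers & topic.subject_identifiers:
                    # Same subject: absorb the existing topic's properties.
                    combined.subject_identifiers |= existing.subject_identifiers
                    combined.names |= existing.names
                    combined.occurrences |= existing.occurrences
                else:
                    kept.append(existing)
            kept.append(combined)
            merged = kept
    return merged


# One small topic map derived from a text, one from the authority database.
from_text = [
    Topic({"http://example.org/person/1"}, {"K. Mansfield"}, {"text-1.xml"}),
]
from_authority = [
    Topic({"http://example.org/person/1"}, {"Katherine Mansfield"}),
    Topic({"http://example.org/place/1"}, {"Wellington"}),
]
merged = merge_topic_maps(from_text, from_authority)
person = next(
    t for t in merged if "http://example.org/person/1" in t.subject_identifiers
)
```

Because both sources use the same identifier for the person, the merged map holds a single topic carrying both name forms and the link to the text. This is why allocating unique identifiers through authority control matters before the merge.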