The focus of HOPE's work thus far has been to produce, collect, and disseminate data of a relatively uniform quality and form. Much of the onus has been placed on the content providers to comply with domain standards and with the acceptance criteria set by the Aggregator and external services as well as with the best practice recommendations for data harmonization and cross searching. HOPE has just begun to explore the possibilities that lie beyond this.
Looking ahead, HOPE sees potential in the enrichment of supplied metadata through special Aggregator tools for mining data, clustering information, or adding context to locally-produced description. One such endeavor was the development of the HOPE collection record. (See: Semantic Harmonization, section on Metadata Recommendations for Collection Descriptive Units.) Placed over and above lower-level domain-based descriptions, the collection record serves as a uniform entry point to HOPE's heterogeneous content. Another is the possibility of developing common authority lists for agents, places, events, and concepts. These would help integrate collections on the Social History Portal; it is notable that in the early survey of target user habits, 79 percent of respondents reported using personal or organization search terms, 67.7 percent thematic or historical terms, and 40.3 percent geographical terms. Common authority lists would also enable HOPE metadata to be linked to published controlled vocabularies such as the Virtual International Authority File (VIAF).
The following discussion will treat HOPE's plans to cluster and merge local authority lists using the Aggregator's Authority File Manager Tool as well as the successful realization of the Aggregator's Tagging Tool, which already allows content providers to add HOPE Themes—a special list of social history themes developed within the framework of the project—to HOPE collections and items.
Feasibility of Building Common Authority Lists in HOPE
According to the HOPE Content Providers Survey, metadata on persons and organizations, so-called 'agents', is validated by content providers representing 78.5 percent of the supplied metadata records. On the other hand, content providers representing only 23.1 percent of the supplied metadata employ a published standard such as Personennamendatei (PND-German), Gemeinsame Körperschaftsdatei (GKD-German), and RAMEAU (French). In general, content providers indicated that the syntax of this metadata is fairly consistent but often includes additional descriptive metadata, such as the title of the person and the dates of birth and death. Any mapping and merging procedure that is developed should take this into account. Nevertheless, a common agent authority file seems feasible within the scope of the HOPE project.
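The point about additional descriptive metadata in agent names can be illustrated with a minimal sketch. It assumes, purely for illustration, that life dates appear as a trailing four-digit year range (e.g. ", 1871-1919" or "(1870-1924)"); the function name `parse_agent` and the pattern are hypothetical and not part of the HOPE tools. Titles of persons are not handled here and would likely need provider-specific rules.

```python
import re
from typing import NamedTuple, Optional


class ParsedAgent(NamedTuple):
    name: str
    birth: Optional[str]
    death: Optional[str]


# Assumed convention: a trailing year range such as ", 1871-1919",
# "(1870-1924)" or an open-ended ", 1950-". Real provider data will vary.
_LIFE_DATES = re.compile(r"[\s,(]*(?<!\d)(\d{4})\s*-\s*(\d{4})?\)?\s*$")


def parse_agent(raw: str) -> ParsedAgent:
    """Split a supplied agent string into a name and optional life dates."""
    match = _LIFE_DATES.search(raw)
    if match:
        name = raw[:match.start()].strip()
        return ParsedAgent(name, match.group(1), match.group(2))
    return ParsedAgent(raw.strip(), None, None)


print(parse_agent("Luxemburg, Rosa, 1871-1919"))
```

Separating the dates before comparison keeps the name available as a matching key, while the dates remain usable as descriptive metadata for disambiguation.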
Metadata on geographical locations is validated by content providers representing 96 percent of the supplied metadata records. Yet those representing only 18.2 percent of the supplied metadata use a published standard such as ISO 3166-1, Merriam-Webster's Geographical Dictionary, MOTPRO (Finnish), and RAMEAU (French). Metadata includes geographical names with differing scope, including local, regional, national, and supra-national terms. Any mapping and merging procedure developed should distinguish between geographical names of varying scopes. Nevertheless, a common geographic authority file seems feasible within the scope of the HOPE project.
Metadata on subject terms is validated by content providers representing 95.4 percent of the supplied metadata records. Yet those representing only 21 percent of the supplied metadata use a published standard such as LCSH, Helvetosaurus (Swiss), ONKI Finnish Ontology Library Service: YSO (Finnish General Upper Ontology), TYPO (Finnish labour history thesaurus), and RAMEAU (French). Metadata includes topical terms as well as personal or family names, corporate names, names of events, geographical names, chronological terms, and genre/form terms. Thus, HOPE should look into the possibilities for using named entity extraction tools to distinguish between the different types of subjects. Moreover, any mapping and merging procedure should take into account that some content providers supply articulated terms (i.e. terms consisting of multiple parts, organized into a meaningful order) for topical terms and/or genre/form terms. It remains unclear whether a common concept authority file is feasible within the scope of the HOPE project.
In sum, considering persons and organizations, geographic names, and subjects, a vast majority of the metadata has been validated using controlled vocabularies, which makes this metadata eligible for creating common HOPE authority lists. At the same time, only around 20 percent of this metadata has been validated using a published data value standard. Of those that do use a published standard, most use national data value standards. This suggests that HOPE should look to international published vocabularies that build on national terminologies. The remaining content providers use controlled vocabularies developed and maintained as in-house standards. Furthermore, when asked whether those metadata elements that are hard validated also contain non-validated terms, about half indicated that they do not validate all terms in those elements. This indicates that mapping and merging procedures should accommodate terms in submitted (agent, geographic, and subject) metadata that are not hard validated against a controlled vocabulary.
It is clear from the above that the Aggregator's Authority File Manager Tool should merge locally-employed terminology supplied by the HOPE content providers. But the tool should also make it possible to compare and merge terms with published controlled vocabularies. Based on a preliminary analysis, the following published vocabularies should be considered: 1) VIAF and Union List of Artist Names (ULAN) for personal and corporate names; 2) geonames.org for geographical locations; and 3) Library of Congress Subject Headings (LCSH) or Art and Architecture Thesaurus (AAT) for topical terms. Importantly, the Authority File Manager Tool requires content providers to supply controlled vocabularies as separate authority lists, including a local identifier for each vocabulary term; such identifiers are needed to track the provenance of the term. It is still unknown whether HOPE content providers would be able to supply such identifiers.
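To make the role of local identifiers in merging more concrete, the following is a minimal sketch of how locally supplied terms might be grouped into candidate clusters while keeping their provenance. It is not the Authority File Manager Tool's actual procedure: the `LocalTerm` record, the `match_key` heuristic (ignoring case, accents, punctuation, and word order), and the provider codes are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LocalTerm:
    provider: str   # hypothetical content provider code
    local_id: str   # identifier of the term in the provider's own system
    label: str      # term as supplied


def match_key(label: str) -> str:
    """Assumed heuristic: ignore case, accents, punctuation and word order."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.casefold())
    return " ".join(sorted(cleaned.split()))


def cluster_terms(terms: Iterable[LocalTerm]) -> Dict[str, List[LocalTerm]]:
    """Group terms sharing a match key; each member keeps provider and local_id."""
    clusters: Dict[str, List[LocalTerm]] = defaultdict(list)
    for term in terms:
        clusters[match_key(term.label)].append(term)
    return dict(clusters)


terms = [
    LocalTerm("A", "p1", "Luxemburg, Rosa"),
    LocalTerm("B", "x9", "Rosa Luxemburg"),
    LocalTerm("B", "x10", "Zürich"),
]
for key, members in cluster_terms(terms).items():
    print(key, [(t.provider, t.local_id) for t in members])
```

Clusters produced this way would only be candidates for merging. Because each member retains its provider and local identifier, a reviewer could accept or split a cluster without losing track of where each term came from.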
Mapping Authorities to the HOPE Data Model
As a basis for future common authority file management using the Authority File Manager Tool, the HOPE data model has specified the following elements for recording vocabulary terms for persons, organizations, places, concepts, and events. (An Events Entity has been included based on the Europeana Data Model. The practical use of this entity is still unclear.) Terms would be supplied as part of the submitted metadata record. In the first instance, the vocabulary term would be mapped as a property of the Descriptive Unit into an associated pair of elements (from two distinct groups).
The first element in each pair already records values supplied by the content provider as part of the descriptive metadata. If authority management is implemented, this element will serve to store a back-up of the originally supplied term. The element will also record the local identifier of the vocabulary term as an attribute. It is important to remember that this element will hold both non-validated and validated data supplied by content providers. Here is a list of such Descriptive Unit elements by authority type.
- For Agents: Creator, Contributor, Publisher, Subject;
- For Places: Creator/place, Publication/place, Spatial Coverage;
- For Concepts: Subject;
- For Events: Temporal Coverage.
The second element in the pair would record the vocabulary term after it has been merged with other vocabulary terms. This term would be the 'unified' term, which would also be recorded in the corresponding entity (i.e. Agent, Place, Concept, or Event). In the first instance, such elements would record the content provider's local identifier for each term. (In the second instance, they would also record the PID and preferred term resulting from the merging procedure.) Here is a list by authority type of Descriptive Unit elements recording the unified term corresponding to the above:
- For Agents: Is Created By, Has Contributions By, Is Published By, Associated Agent, Depicted Agent (from the Visual Profile only);
- For Places: Is Created In, Is Published In, Associated Place, Depicted Place (from the Visual Profile only);
- For Concepts: Associated Concept, Depicted Concept (from the Visual Profile only);
- For Events: Associated Event, Depicted Event (from the Visual Profile only).
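The pairing of elements described above can be sketched as a simple lookup table. The pairings below are inferred from the order of the two lists and are an assumption, not a statement of the HOPE data model; the 'Depicted' elements from the Visual Profile are omitted because the lists do not indicate which supplied element they correspond to. The names `ELEMENT_PAIRS` and `unified_element` are hypothetical.

```python
from typing import Dict

# Supplied element -> element recording the unified term, per authority type.
# Pairings are inferred from list order and should be treated as assumptions.
ELEMENT_PAIRS: Dict[str, Dict[str, str]] = {
    "agent": {
        "Creator": "Is Created By",
        "Contributor": "Has Contributions By",
        "Publisher": "Is Published By",
        "Subject": "Associated Agent",
    },
    "place": {
        "Creator/place": "Is Created In",
        "Publication/place": "Is Published In",
        "Spatial Coverage": "Associated Place",
    },
    "concept": {"Subject": "Associated Concept"},
    "event": {"Temporal Coverage": "Associated Event"},
}


def unified_element(authority_type: str, supplied_element: str) -> str:
    """Return the element that would hold the unified term for a supplied element."""
    return ELEMENT_PAIRS[authority_type][supplied_element]


print(unified_element("agent", "Subject"))
print(unified_element("concept", "Subject"))
```

The table is keyed by authority type first because the same supplied element, Subject, appears under both Agents and Concepts and would map to a different unified element in each case.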
In the second instance, all vocabulary terms would be mapped to a corresponding entity (e.g. person and organization terms would be mapped to the Agent Entity). These entities by default record the following metadata elements:
- A PID created by the Aggregator for the vocabulary term. The PID unambiguously identifies the term in the HOPE System. The PID would also be supplied to the second set of elements above;
- The local identifier of the vocabulary term. This identifier refers to the local system of the content provider who supplied the term;
- The unified term, or so-called 'preferred term'. The unified term would likewise be supplied to the second set of elements above;
- Any alternative terms.
In addition, each entity also records a set of descriptive metadata for each vocabulary term, such as dates, description, spatial coverage, and source. They may also record the PID of the term in one or more established vocabularies (e.g. VIAF, geonames.org, or AAT). The specific workflow for populating the additional elements would need to be worked out in detail.
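As an illustration of the elements listed above, an entity record might be modelled roughly as follows. This is a sketch only: the class and field names are hypothetical, the optional descriptive fields are simplified to plain strings, and the PID and local identifier shown are placeholders rather than real HOPE identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorityEntity:
    pid: str                      # PID created by the Aggregator
    local_identifier: str         # identifier in the content provider's system
    preferred_term: str           # the unified term
    alternative_terms: List[str] = field(default_factory=list)
    dates: Optional[str] = None
    description: Optional[str] = None
    spatial_coverage: Optional[str] = None
    source: Optional[str] = None
    # PIDs of the term in established vocabularies, keyed by vocabulary name
    external_pids: Dict[str, str] = field(default_factory=dict)

    def all_labels(self) -> List[str]:
        """Preferred term first, followed by any alternative terms."""
        return [self.preferred_term, *self.alternative_terms]


entity = AuthorityEntity(
    pid="hope:agent:EXAMPLE",     # placeholder, not a real PID
    local_identifier="p1",        # placeholder local identifier
    preferred_term="Luxemburg, Rosa",
    alternative_terms=["Rosa Luxemburg"],
)
print(entity.all_labels())
```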
The future of HOPE lies in functionality and services targeted to social history end users as well as in high-level linked open data initiatives which promise to unite HOPE collections with a vast body of content. HOPE has demonstrated the feasibility of using the Authority File Manager Tool to create common HOPE authority lists, and has developed its data model to support the mapping and merging of terms across a range of elements. Nevertheless, a detailed business case for common authority management still needs to be made.
In addition to the four entities mentioned above, the HOPE data model has specified an entity for HOPE Themes. The HOPE Theme entity accommodates thematic headings specific to the fields of social and labor history. To populate this entity, HOPE has created its own list of 47 HOPE Themes, organized under seven general facets or headings:
- ''Political Movements:'' Anarchist movements; Christian-Democrat movements and parties; Communist movements and parties; Conservative and Liberal movements and parties; Elections/Electoral campaigns; Fascism and Nazism/Fascist and Nazi movements and parties/Anti-fascist movements and parties; Green movements and parties; Radical movements and parties; Socialist and Social Democrat parties/Socialist International;
- ''Social Movements:'' Environmentalists and anti-nuclear movements; Feminist movements/Women's movements; Migrations/Migrant movements; Religious movements/Anti-clericalism/Atheism; Sexualities/LGBT movement; Youth and students movements;
- ''Labour and Economic Activities:'' Capitalism/Anti-capitalist movements/Anti-globalization movements; Cultural and sociocultural movements; Joblessness/Social security; Negotiations/Labour exchanges; Peasant movements; Science and technology; Social economy; Syndicalism/Trade unions; Industrial and agricultural transformation issues; Workers movements/Workers councils/Workers International organizations; Workers sport organizations and activities;
- ''Life Conditions:'' Culture, media and arts; Education; Health; Housing;
- ''War and Peace:'' Military conflicts and activities; Occupation/Resistance movements; Pacifism/Peace movements; War crimes and trials; War prisoners;
- ''Human Rights:'' Anti-slavery movements; Concentration camps/Internment camps/Forced labor camps; Exiles/Political refugees, Human rights organizations; Political prisoners/Political trials; Racism/Anti-racist movements; Censorship;
- ''State and International Questions:'' Colonial question/Anti-colonialist movements; Independence/Independence movements; Imperialism/Anti-imperialist movements; International relations; Nationalism/Autonomist and separatist movements; State and administrative activities.
The HOPE Theme Entity was developed to accommodate several features of the HOPE Themes. First, it supports hierarchical relations between a set of broader terms, i.e. cluster titles, and narrower terms, i.e. the actual HOPE Themes which will be assigned to the metadata. Note that though the initial stage only proposes the management and assignment of a flat list of themes, in later stages HOPE Themes may be expanded into a fully-supported hierarchical list. The HOPE Theme Entity also allows the possibility of recording translations for each HOPE Theme. It also supports descriptive metadata on each theme, including a date or date range and a description or definition.
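The features of the HOPE Theme Entity described above can be sketched as a small data structure. The class and field names are hypothetical, and the French translation is an illustrative value rather than one taken from a HOPE translation list; only the theme labels and headings come from the list of HOPE Themes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class HopeTheme:
    label: str
    broader: Optional[str] = None            # cluster title, if any
    translations: Dict[str, str] = field(default_factory=dict)  # language code -> label
    date_range: Optional[str] = None
    description: Optional[str] = None


def narrower_terms(themes: Iterable[HopeTheme], broader_label: str) -> List[str]:
    """List the themes assigned to a given broader term (cluster title)."""
    return sorted(t.label for t in themes if t.broader == broader_label)


themes = [
    HopeTheme("Anarchist movements", broader="Political Movements",
              translations={"fr": "Mouvements anarchistes"}),  # illustrative translation
    HopeTheme("Education", broader="Life Conditions"),
    HopeTheme("Health", broader="Life Conditions"),
]
print(narrower_terms(themes, "Life Conditions"))
```

Storing the broader term on each theme keeps the initial flat list usable as it is, while leaving room for the hierarchy to be expanded in later stages.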
In contrast to the authority terms, HOPE Themes are not provided through the mapping or merging of terms. Instead they are assigned to collection, series, and item records using the HOPE Tagging Tool. The Tagging Tool allows descriptive units to be tagged only after they have been ingested and only by the content provider who supplied them. The tool's search interface supports the search of uploaded collections using a variety of criteria, including local identifiers, collection name, and thematic tags already present. General keyword searching likewise makes it possible to tag thematic subsets of metadata within heterogeneous collections. The tool refines the tagging process with the option to 'replace existing themes' or 'add to existing themes.' The tool also provides the option to propagate terms from a parent to all child records or from any child to its parent record. Once applied, the thematic tags can be indexed, searched, and presented alongside submitted metadata. If metadata is re-ingested, assigned tags are re-applied.
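The tagging options described above can be illustrated with a minimal sketch. It is not the Tagging Tool's implementation: the `TagStore` class, its method names, and the record identifiers are hypothetical, and the sketch assumes that propagation adds themes to existing ones and reaches all descendants of a parent.

```python
from typing import Dict, Iterable, List, Set


class TagStore:
    def __init__(self, children: Dict[str, List[str]]):
        self.children = children              # parent record id -> child record ids
        self.tags: Dict[str, Set[str]] = {}   # record id -> assigned themes

    def tag(self, record_id: str, themes: Iterable[str], replace: bool = False) -> None:
        """'Replace existing themes' if replace is True, else 'add to existing themes'."""
        existing = set() if replace else self.tags.get(record_id, set())
        self.tags[record_id] = existing | set(themes)

    def propagate_down(self, parent_id: str) -> None:
        """Add the parent's themes to every descendant (assumed 'add' semantics)."""
        for child in self.children.get(parent_id, []):
            self.tag(child, self.tags.get(parent_id, set()))
            self.propagate_down(child)

    def propagate_up(self, child_id: str) -> None:
        """Add a child's themes to its parent record."""
        for parent, kids in self.children.items():
            if child_id in kids:
                self.tag(parent, self.tags.get(child_id, set()))


store = TagStore({"coll1": ["item1", "item2"]})
store.tag("coll1", ["Education"])
store.propagate_down("coll1")
store.tag("item1", ["Housing"], replace=True)
store.propagate_up("item1")
print(sorted(store.tags["coll1"]))
```

Keeping the assigned themes in a store separate from the submitted metadata, keyed by record identifier, is one way tags could be re-applied after metadata is re-ingested.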
Importantly, the Tagging Tool is not intended to create 'virtual collections' with their own collection-level description that can be displayed as a separate entity. Instead, it enriches HOPE's heterogeneous metadata to facilitate cross searching. HOPE Themes have in fact proved a relatively easy-to-implement 'first response' to the problem of cross searching. Common authority management remains the next obvious step.
''Art & Architecture Thesaurus (AAT) Online'' (http://www.getty.edu/research/tools/vocabularies/aat)
HOPE: Heritage of the People's Europe. "Section: Data Model." ''The HOPE Common Metadata Structure, including Harmonisation Specifications''. May 2011. (http://www.peoplesheritage.eu/pdf/D2_2_Metadata%20Structure.pdf)
''Library of Congress Subject Headings (LCSH)'' (http://id.loc.gov/authorities/subjects.html)
''Union List of Artist Names (ULAN) Online'' (http://www.getty.edu/research/tools/vocabularies/ulan)
''Virtual International Authority File (VIAF)'' (http://viaf.org)