Article contents
- Scholarly Use of Digital Collections
- Resources: Information needs of scholars
- Collections Development Principles
- Resources: Collection development
- Descriptive Metadata for Collections and Items
- Collection description
- Resources: Collection description
- Item description for aggregation and interoperability
- Resources: Item description
- Aggregation issues and problems
- Resources: Aggregation development
Libraries, museums, and archives have been producing digital collections for decades, providing scholars with broad access to countless special collections. Researchers engaged in digital scholarship have also created many digital collections tailored to the interests of their particular research communities. Both kinds of collections are curated, in that they have been carefully selected and assembled for a specific purpose or audience. In the networked information environment, curated collections will become increasingly important as organizational units for the scattered and diverse mass of available digital information and for providing coherent contexts for meaningful engagement with that information. Aggregations, or collections of collections, are essential backbone resources in the evolving e-research platform that also need to be curated if they are to truly support and enhance discovery and innovation across the disciplines. Curatorial activities, such as archiving, preservation, and maintenance, are also important for managing the entire lifecycle of collections, but collection development and collection description are formative curation activities that add value for scholarly inquiry at both the collection and aggregation levels. This chapter is focused on key formative activities, particularly selection, representation, and related practices that support the integration of collections into large-scale digital aggregations. Without careful attention to interoperability and metadata for intellectual and navigational context, for example, aggregations can become incoherent and unwieldy masses of undifferentiated content. The resources presented below cover four key aspects for development of digital collections that are fit for purpose, function effectively in the networked information environment, and can contribute to the creation of rich, extensive, and diverse aggregations for scholarly use.
There is a large body of literature on scholarly practices, including how researchers use libraries, archives, and digital collections. It provides a substantial base of knowledge on the information resources of value to scholars and how they search, explore, and use collections. The resources listed in this section address foundational issues related to the role of collections in the research process and the implications for development of digital collections and services. They range from accounts of the scholarly primitives, or basic activities involved in conducting scholarship in the humanities and sciences, to the kinds of sources that are of value in different disciplines and related aspects of organization and description. Together, they identify and analyze the information practices and needs of scholars that well-curated collections must support.
Collection curation includes careful selection and organization in anticipation of user needs. Proceeding from the user perspectives established above, this section is concerned with guiding principles for building valuable digital collections. In the following resources, conceptual approaches to collection definition, development, and curation feed into specific and pragmatic guidelines for collections planning, implementation, and sustainability.
Many of the aspects covered affect how well collections can be aggregated into larger, shared resources, in part because collection administrators often neglect the possibility of aggregation or eventual repurposing during collection-making, even though both are common in practice at cultural heritage institutions. The possibility of aggregation – along with other forms of repurposing that might entail loss of context for collections or items in collections – must be accounted for during collection curation: interoperability is essential to sustainability in the networked digital environment. Unfortunately, there are currently no sources that provide comprehensive coverage of collection development principles or practices for building large-scale aggregations.
The previous sections covered high-level considerations that should guide collection building into alignment with user needs. This section begins to elaborate collection curation, especially as it relates to description, with greater specificity: how does one set about making a useful, rich, and shareable collection description that will facilitate discovery?
A recurrent theme in the following resources is the curatorial necessity of keeping long-term considerations in mind during collection description, with awareness of how the collection may be shared, aggregated, or repurposed outside of its original context. To preserve for the long term the contextual value that a collection provides, collection descriptions must answer certain questions:
- On collection characteristics: What is the title of the collection? Where can the collection be accessed (persistent URL)? How many items are in the collection? What media types and formats are represented in the collection? What is the topical, temporal, and geographic coverage of the collection? What languages are represented?
- On the provenance, administration, rights or access restrictions, and institutional affiliations of the collection: Where do the collection and items within the collection come from? How were they brought together and by whom? What is the custodial history of the collection? Are items readily accessible online, and if so, with what restrictions? What copyrights apply to this collection and items in it? Were resources in the collection born digital or do they exist in physical form, and if so, how can they be accessed? For aggregators: What metadata formats are in use? Are there alternative access points (e.g., an OAI-PMH data provider, an API, or an RSS/Atom feed for items in the collection)? Does this collection have associated projects or other collections?
- On the relationship between items and the collection: Why have these items been brought together as a collection? What is in the collection, or what is the shape of the whole? What is the collection’s target audience? How does the gathering of these items into a single collection create a new information resource with value added beyond the value of individual items?
Of these, the last set of questions – on the relationship between items and the collection itself – highlights the primary way in which collection description differs from item-level description and requires special consideration. In this way, collections as information resources are distinct from items as information resources, and also represent a new resource of greater value than the sum of items within the collection. Section 3.1 addresses collection description, building on the idea that description that facilitates discovery and use over the short and long terms is a curatorial activity; the section following carries the themes of section 3.1 further, to item description in the context of collections and aggregations.
Collection description should rely on relevant element set and vocabulary standards whenever possible, maintaining two kinds of balance:
- The collection description should be expressive enough to capture local context and domain-specific information, both for users and for collection administrators, and at the same time shareable and fully expressive outside of the local context.
- The collection description should rely both on structured data (for indexing and automated retrieval) and free text description; there is interplay between the two.
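The balance between structured data and free text can be sketched as a simple record type. This is a minimal illustration only: the class name, field names, and the example collection are hypothetical and do not follow any particular collection description standard.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class CollectionDescription:
    """Hypothetical collection-level record; all field names are illustrative."""

    # Structured fields, suited to indexing and automated retrieval.
    title: str
    access_url: str  # ideally a persistent URL
    item_count: int
    formats: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    rights: str = ""
    # Free-text fields, suited to conveying context to human readers.
    description: str = ""
    provenance: str = ""
    purpose: str = ""  # why these items were brought together


def missing_fields(record):
    """Return the names of fields that are still empty, in declaration order."""
    return [name for name, value in asdict(record).items() if not value]


# Example with made-up values.
record = CollectionDescription(
    title="Example Postcard Collection",
    access_url="https://example.org/collections/postcards",
    item_count=1200,
    formats=["image/jpeg"],
    description="Postcards assembled to document a town's commercial district.",
)
print(missing_fields(record))
# → ['languages', 'subjects', 'rights', 'provenance', 'purpose']
```

A check like `missing_fields` is one way a collection administrator might notice that the questions about provenance and purpose remain unanswered before the description is shared.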
Certain lessons about creating useful collection descriptions translate to describing and sharing metadata for items within collections:
- Metadata should be created with long-term considerations in mind, including the potential for repurposing, relocation, or loss of collection and institutional context.
- Balance between expressiveness and interoperability is critical. Metadata should rely on standards suited to the nature of the items and expressive enough to capture relevant information and context, and yet should prioritize interoperability. Collection administrators should ask themselves the following sorts of questions when creating item-level metadata: Can the data be shared without information loss? Is the data structured and made accessible such that sharing and repurposing are technically feasible? What information would be lost by moving item data to another context?
Metadata standards for describing all types of items from all domains abound; we do not broach specific metadata or sharing standards or protocols. Instead, resources in this section cover best practices for shareable metadata (regardless of format, though in practice the most common example is Dublin Core) in the context of collections and aggregations of collections.
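The question of whether data can be shared without information loss can be made concrete with a small crosswalk check. The sketch below assumes a hypothetical local schema and a hypothetical mapping to the fifteen elements of simple Dublin Core; only the element names come from the Dublin Core Metadata Element Set, and everything else is illustrative.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk from local field names to Dublin Core elements.
CROSSWALK = {
    "caption": "title",
    "photographer": "creator",
    "date_taken": "date",
    "place": "coverage",
    "rights_statement": "rights",
}


def to_dublin_core(local_record, crosswalk=CROSSWALK):
    """Map a local record to simple Dublin Core.

    Returns (dc_record, lost_fields), where lost_fields lists the local
    fields that have no target element and would be dropped when sharing.
    """
    dc_record = {}
    lost_fields = []
    for name, value in local_record.items():
        element = crosswalk.get(name)
        if element in DC_ELEMENTS:
            # Dublin Core elements are repeatable, so values are kept in lists.
            dc_record.setdefault(element, []).append(value)
        else:
            lost_fields.append(name)
    return dc_record, lost_fields


# Example with a made-up item record.
item = {
    "caption": "Main Street, looking north",
    "photographer": "Unknown",
    "date_taken": "1912",
    "negative_number": "A-114",
    "exhibit_case": "3",
}
dc, lost = to_dublin_core(item)
print(lost)
# → ['negative_number', 'exhibit_case']
```

Listing the dropped fields shows exactly what a collection administrator would lose by moving item data to another context, which can inform whether the crosswalk or the local schema needs to change.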
Digital aggregations can provide essential metastructures for unifying distributed collections and content, and therefore have a role to play in collection curation. Aggregation is increasingly common at several levels, including the following:
- Large-scale, national and international digital libraries aggregate highly diverse collections from all kinds of institutions.
- Thematic aggregations pull together topically related collections or items from a variety of institutions, usually for a particular audience with a particular topical interest.
- Local, institution-specific collections of collections often integrate technically and topically diverse data into a single content management system behind a single point of access.
However, the act of bringing together and providing access to a large number of collections does not guarantee that the resulting aggregation will be a useful resource for researchers. Each kind of aggregation entails challenges for both aggregation administrators and individual collection administrators. Because aggregation is common and can strip away context, collection curation should anticipate how aggregation, and other kinds of data repurposing, may affect the informative value of collection- and item-level data. For example, aggregations that provide item-level search across hundreds or thousands of collections often collapse or obscure original collection organization, thereby weakening or obfuscating – from both users’ and administrators’ perspectives – the relationship between an item and its original collection. Aggregators should develop the aggregation in a principled way, both to create an aggregate resource of greater value, and to preserve institutional and collection contexts within the aggregation. As collection administrators develop and describe collections for use in a local context, the following resources on aggregation-development principles and strategies may inform how curation activities can anticipate aggregation.
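One way an aggregation might keep the relationship between an item and its original collection is to attach collection-level context to every item before building an item-level index. The following is a minimal sketch under that assumption; the record layout, the `is_part_of` key, and the sample collections are all hypothetical.

```python
def aggregate(collections):
    """Flatten collections into one item-level index, keeping collection context."""
    index = []
    for coll in collections:
        # Collection-level context that would otherwise be lost on flattening.
        context = {
            "collection_id": coll["id"],
            "collection_title": coll["title"],
            "institution": coll["institution"],
        }
        for item in coll["items"]:
            # Copy the item and attach a pointer back to its home collection.
            index.append({**item, "is_part_of": context})
    return index


def search(index, term):
    """Case-insensitive title search across the whole aggregation."""
    return [item for item in index if term.lower() in item["title"].lower()]


# Example with made-up collections.
collections = [
    {
        "id": "coll-a",
        "title": "Hypothetical Postcards",
        "institution": "Example Library",
        "items": [{"title": "Main Street, 1912"}, {"title": "Harbor view"}],
    },
    {
        "id": "coll-b",
        "title": "Hypothetical Maps",
        "institution": "Example Archive",
        "items": [{"title": "Harbor survey map"}],
    },
]

index = aggregate(collections)
for hit in search(index, "harbor"):
    print(hit["title"], "|", hit["is_part_of"]["collection_title"])
# → Harbor view | Hypothetical Postcards
# → Harbor survey map | Hypothetical Maps
```

Because each search hit still carries its collection title and institution, an interface built on such an index could show where an item came from even when results are drawn from many collections at once.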