Article contents
- Planning Questions and Issue Spotting
- University of Michigan Library: A Case Study
- Resources: General
- Resources: Copyright
- Resources: Privacy
- Resources: Data
- Open Ideas, Open Access
- Resources: Open access
- Resources: Articles Addressing the Benefits of Open Access
Memory institutions have a long history of curation: collecting, preserving, sharing, describing, and interpreting all kinds of tangible material. To address the legal issues for digital curation projects, it is important to start with the “big picture” questions: what does digital curation in the humanities mean for our collective duty as stewards of memory? How is digital curation in the humanities a fundamental concern for new invention, ideas, and expression? Asking and answering the big questions helps shape the way legal issues are addressed in the context of humanities projects. Collecting digital creative or original works prompts concerns about a wide range of possible legal issues – copyright, contract, and privacy, to name but a few. These legal responsibilities are significant, but they should be considered in the larger context of the socially valuable stewardship that libraries, archives, and museums engage in, and of new ways of teaching and approaching education.
Where to begin? Start with core principles and encourage decision makers to learn to spot the legal issues, address them in written policy and practice, and rely on the growing body of standards as appropriate to your effort. Is anyone harmed? What is lost if we fail to act? How is our role different from the private sector? How can we collaborate across funding opportunities (whether public, private, or philanthropic) to meet the potential of digital curation in the humanities? Different approaches to legal issues may be needed for different kinds of projects. Are you scanning special collections of distinct analog materials (books, sound recordings, artwork), collecting and preserving digital-only web-based material, or setting up preservation and emulation collections for digital video games? High-level principles, policies, and standards can help you develop responsible, productive projects even for very different kinds of materials or collections.
When people think about legal issues, they commonly think of contracts, copyright, privacy, and similar concerns. However, much of the information with legal significance is also to be found in your organization's written policies. It may be legally significant to have written practices and procedures for your specific project. Such documents can be as simple or complex as the situation warrants. They provide a framework for consistency that helps with project management and with orientation for staff working on the project. It is worth investing in these kinds of documents and revising them as needed, tracking version dates. Policies and procedures are important for good communication and for keeping projects and people focused on a given goal. Further, they are important legal evidence – having these kinds of records in place helps minimize legal exposure in general and limits it if questions do arise. This is why there is an emphasis on guides, standards, and planning in the Resources section. Community standards and practices, and the specific policies and procedures associated with a project, are important as a functional matter as well as a legal concern.
The following questions are meant to help project managers plan and recognize some of the matters that should be addressed – this list is by no means comprehensive. (This list is influenced by the Digital Preservation Workshop led by Nancy McGovern and Kari Smith: http://www.icpsr.umich.edu/dpm/workshops/instructors.html.)
Content in the information ecosystem
- Did you create it? Can you share it?
- Did you consider Creative Commons and open licenses generally as tools for communicating your intentions with regard to your own work?
- If you did not create it, is there a copyright holder? Do you need permission to copy the work?
- What level of access will you provide?
- How will you document all of the known legal characteristics in metadata?
- Are you licensing materials or software? How do the terms of the licenses affect long-term usability?
- Did you get information from creators of content when possible (e.g., copyright, privacy, donor restrictions)?
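One way to act on the metadata question above is to keep a small, structured rights record for each item. The following is a minimal sketch, not a community standard: the `RightsRecord` class, its field names, the access-level values, and the `open_questions` helper are all hypothetical names chosen for illustration, and a real project would map them onto whatever metadata schema it has adopted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RightsRecord:
    """Hypothetical rights metadata for one item (illustrative field names)."""
    item_id: str
    created_by_us: bool                # Did you create it?
    copyright_holder: Optional[str]    # None if not yet identified
    license_name: Optional[str]        # e.g. the name of an open license, if any
    permission_obtained: bool          # Do you have permission to copy the work?
    access_level: str                  # assumed values: "public" or "restricted"
    donor_restrictions: List[str] = field(default_factory=list)
    privacy_flags: List[str] = field(default_factory=list)


def open_questions(record: RightsRecord) -> List[str]:
    """Return the planning questions this record still leaves unanswered."""
    issues = []
    if not record.created_by_us and record.copyright_holder is None:
        issues.append("copyright holder not identified")
    if (not record.created_by_us
            and not record.permission_obtained
            and record.license_name is None):
        issues.append("no permission or license recorded")
    if record.privacy_flags and record.access_level == "public":
        issues.append("privacy flags present on a publicly accessible item")
    return issues


# Example: a donated item whose rights information is incomplete.
letter = RightsRecord(
    item_id="item-001",
    created_by_us=False,
    copyright_holder=None,
    license_name=None,
    permission_obtained=False,
    access_level="public",
    privacy_flags=["medical"],
)
for issue in open_questions(letter):
    print(issue)
```

Keeping the record explicit makes gaps visible at intake, when creators and donors are still available to answer questions.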
Collecting or depositing in an archive or digital repository
- Do you have the legal right to collect, preserve, and/or provide access to or use the materials?
- Are there any copyright concerns in the material being collected or deposited? What are they and how are they documented?
- Are there privacy concerns related to the material being collected?
- Have you looked at examples of submission agreements – will you use a model or tailor it to your project? (See Resources for agreements in the TAPER project, ICPSR, Data-PASS, MetaArchive, Deep Blue (UM), HathiTrust.)
- Who is in charge? Who is the lead? Who are the parties?
- Is there a steering committee, an operations committee – how are they composed?
- Who are the authorized representatives for each role, and who approves replacements? How will you communicate and document these matters?
- How will new members be handled?
- What rights do members have to contribute data? Do they retain any copyright or intellectual property rights? Do they have information about copyright, privacy, or other rights, and is it included in metadata or recorded in some other manner?
- How can members withdraw or be removed? If this is necessary, will their contributions remain with the corpus of the collection? Will they have access to their contributions after their membership ends?
- How will disputes be handled?
- How will changes to agreements be handled (in writing)?
- Is the content your creation or someone else’s?
- Is there any impetus to secure the resulting collection or body of material or can it be shared and remixed?
- Do any of the exceptions to copyright apply? These include Section 107, Fair Use; Section 108, Exceptions for Libraries and Archives; and Section 110, TEACH Act.
- Even if there are no copyright concerns, there may be other legal, ethical, or practice reasons for managing collections and access to collections.
Privacy and Human Subjects
- Are there any privacy or confidentiality requirements such as medical, financial, or other personal information included in the material?
- How will private information be protected? Should you collect it at all – if so can you manage the security and access control needed?
Access and Security
- Who will have access to the materials online? Who will have access to the servers/storage?
- What security restrictions will be needed, over and above the normal security for a data installation?
- Did you get information about the desired level of access, both in terms of duration (short, mid, or long term) and in terms of who will have access?
- How will you control the material so that it conforms with any rights you have cleared, and how will you ensure that metadata remains accurate over time?
- Have you addressed accountability in terms of responsibilities and documentation? Do you need to plan for audits for financial matters, security or other areas of concern? Will you do periodic self-audits or self reviews?
- How will you deal with risk of loss as a legal matter, as a financial matter, and as a technical or practical matter (e.g. redundant backups)?
- How will derivative research be managed?
- How will derivative research be made available to others inside and outside your organization?
- How will publication of derivative research be managed legally and logistically?
- How will the “remixed” data be maintained with regard to changes made to source data in a digital repository or collection?
- Are policies and procedures in writing and publicly available where appropriate?
- Are there multiple participants? If so, responsibilities should be stated explicitly in agreements with and among the participants.
- Are policies and procedures reviewed on an ongoing basis to confirm they are accurate, being followed, and to consider needed updates?
- Are there policies and procedures addressing copying, the need for redundant data, authentication systems, firewalls, backup, disaster preparedness, and staff training?
- Is your project consistent with your organization’s policies and procedures?
- Is your project documented in a way that ensures transparency? Are there areas (e.g., privacy or security matters) where full transparency is not appropriate? Can you distinguish these?
- Are there opportunities for open design and if so what are the implications for practical policy and procedure?
The environment at the University of Michigan Library is one of “policy in action” in support of the campus, research, and scholarship generally. The culture is proactive, with attention to both opportunities and challenges. There is high value placed on collaboration, cooperation, and shared responsibilities. Many projects are related by an overall culture of creative thinking and possibility combined with nuts and bolts pragmatism (“how” do we do it?). This culture is further supported by vital help from the University’s office of general counsel, which is able and willing to think creatively with the Library to find ways to address new and evolving legal scenarios in a manner consistent with their overarching duty to the University as a whole. In this environment, seemingly distinct projects and services are thematically or functionally related because of this institutional culture.
Deep Blue is the University of Michigan's institutional repository service. To participate in Deep Blue, authors (as the creators and copyright holders of their work) must enter an agreement. These standard agreements and the intellectual property policy that governs the service are easily accessed at the Deep Blue website. (See Resources.) The agreements are very simple and straightforward. They are significant because they only require authors to grant permission to the repository for us to do necessary tasks – no transfer of copyright is required or needed. Authors provide a simple non-exclusive grant to the repository to “display and distribute the submission including its abstract and descriptive information in electronic format in accordance with the Repository's policies, copy, convert or migrate the submission to any medium or format for the purpose of preservation and access, keep more than one copy of the submission for purposes of security, back-up, and preservation.” Deep Blue’s approach to author agreements reflects the need for permanence and consistency as well as a commitment to open access to the scholarly work product of the University.
The HathiTrust is another manifestation of this commitment to preservation and access. HathiTrust presents more possible permutations than Deep Blue regarding agreements and copyright. As a partnership of major research institutions and libraries, the HathiTrust works to preserve the cultural record and ensure that it is available in the future. The HathiTrust’s website reflects overarching policy and culture as an informative, transparent, living record of its policy and governance documents, which facilitate collaboration among partner institutions.
HathiTrust preserves a work such as a book both as an artifact and as a cohesive object to be read, but it also makes it possible to treat the same content as data. This in turn allows for non-consumptive research. Non-consumptive research allows researchers to use “computational analysis of one or more books without the researcher having the ability to reassemble the collection. Rather than reading the material, researchers use specialized algorithms to analyze text as a massive data set…” The Sloan Foundation is funding a research project by Indiana University's Data To Insight Center (D2I) on non-consumptive research with HathiTrust as a large mass digitized collection. D2I is partnering with the HathiTrust Research Center (HTRC) and the University of Michigan's Department of Electrical Engineering and Computer Science on the project. This kind of research opens new possibilities for scholarship and discovery.
For books “as books”, the HathiTrust provides full public access to read works in the public domain, and to books for which permission to share publicly is obtained from the copyright holder. (We make Creative Commons options available to interested copyright holders.) For the public domain determination, we use conservative cutoffs: 1923 for US works and 1870 for non-US works. But there are many works beyond that category that are likely in the public domain.
To support deeper research that identifies books that may be in the public domain, the IMLS generously funded the Copyright Review Management System (CRMS). In CRMS, the University of Michigan Library partnered with other research libraries to ascertain the copyright status of works published in the US between 1923 and 1963. In that time frame, formalities were required to maintain a copyright in the US; failure to comply with them meant a work entered the public domain. The analysis is complex, so conservative cutoff dates are typically efficient. However, because the process is transparent, documented, and executed in collaboration with other libraries, every day brings new information about the copyright status of books. Reviewers trained in the process examine works in the HathiTrust – their access is secured and limited to the review process because of the need to protect scans of books that are subject to copyright. Because legal status (and thus metadata) changes, the sequence for providing access and for determining the rights status of works is more complex in HathiTrust than in, for example, Deep Blue. The IMLS will provide another grant to the Library (the initial grant period concludes November 2011). The new grant allows us to work with a large number of partner institutions to continue our reviews of books published in the US from 1923 to 1963 and to develop a process for learning more about the copyright status of books published in the UK, Canada, Australia and Spain. These grants are opportunities to think further about legal mechanisms to ensure access and thus relevance for books in our collections. Copyright-related information is an invaluable component of thinking about how we can improve and innovate.
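The conservative cutoffs and the CRMS review window described above amount to a simple first-pass triage by publication year and place. The sketch below illustrates that logic only; it is not HathiTrust's or CRMS's actual code. The `Triage` categories and the `triage` function are hypothetical names, and treating the cutoff year as "published strictly before" is an assumption. Works shared with the copyright holder's permission are a separate path and are not modeled here.

```python
from enum import Enum


class Triage(Enum):
    PRESUMED_PUBLIC_DOMAIN = "presumed public domain"
    CANDIDATE_FOR_REVIEW = "candidate for copyright review"
    PRESUMED_IN_COPYRIGHT = "presumed in copyright"


US_CUTOFF = 1923              # conservative cutoff for US works (from the text)
NON_US_CUTOFF = 1870          # conservative cutoff for non-US works (from the text)
REVIEW_RANGE = (1923, 1963)   # US publication years examined in CRMS (from the text)


def triage(pub_year: int, published_in_us: bool) -> Triage:
    """First-pass sort by year and place; a human review would still follow."""
    cutoff = US_CUTOFF if published_in_us else NON_US_CUTOFF
    # Assumption: "cutoff" means published strictly before the cutoff year.
    if pub_year < cutoff:
        return Triage.PRESUMED_PUBLIC_DOMAIN
    # US works in the review window may have entered the public domain
    # through failure to meet formalities, so they merit individual review.
    if published_in_us and REVIEW_RANGE[0] <= pub_year <= REVIEW_RANGE[1]:
        return Triage.CANDIDATE_FOR_REVIEW
    return Triage.PRESUMED_IN_COPYRIGHT


for year, in_us in [(1900, True), (1940, True), (1900, False)]:
    print(year, "US" if in_us else "non-US", "->", triage(year, in_us).value)
```

The middle category is the point of the exercise: a fixed cutoff is efficient, but it hides works whose status only a documented review can establish.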
These examples demonstrate the way different perspectives and roles can be fluid and influence each other. A general observation: experience with analog collections offers lessons for ways to approach the multidimensional issues of digital curation and associated legal issues. For example, the context of a collection of analog material may convey meaning as a body of material. Further, there may be significance in each of the items that make up the collection, and further significance in how those things were made as well as in the materials they are made from. All of these considerations exist in the realm of digital curation, making for even more multidimensional research opportunity as well as growing complexity. Each of those perspectives may have different legal implications. Get comfortable thinking about your work as continuous problem solving, because there are rarely clear or definitive answers; and if there are, they will change over time. Institutional and professional culture as well as overarching policy are key aspects of working through legal questions.
Remember that copyright status changes over time, which makes it a real challenge. When you can get intake agreements directly from creators that allow you to preserve and use their material over time, you ease the circulation of material subject to copyright. Agreements to ingest content – like those used for Deep Blue – obtain from the author the rights needed for repository preservation and access while leaving the author with their copyright. The field of copyright is so large and nuanced that I am only providing a few resources here – there are many other comprehensive copyright guides. You should be familiar with basic copyright principles, including exceptions under US law. There is active discussion through IFLA and other library groups regarding copyright on a global scale and its intersection with digital collections of all sorts. There are no absolute solutions. It is important to be comfortable with indefiniteness in this arena and critical to act in the letter and spirit of the law. Given this indefiniteness, your written policies and community practices are legally significant.
Think about privacy concerns associated with the content being collected – medical and financial information are flags. These concerns should be identified prior to collecting and on an ongoing basis. They may need to be documented in written agreements with subjects. Can information be depersonalized? Consider what substantive administrative metadata are necessary and how to handle them as private information.
Contracts are legally enforceable promises to do or not do something based on some reliance and/or exchange. A license, a memorandum of understanding, an agreement – all of these may be legally binding contracts regardless of what they are called. Approach contracts as an opportunity for communication, which may lessen the risk or need for legal action (or the need to respond to legal action) once your project is underway. Agreements may be needed between you and other organizations that are collecting or contributing to the body of material, along with licenses for software to support the whole, intake agreements, and more. Some general comments on contracts: state what each party is expected to do and reference any relevant procedures or practice standards. Contracts should address duties, how warranties will be made, indemnifications if any (that is, what happens if the party making the promise fails to meet its duties or breaches), and how that will be funded (insurance requirements). Sometimes contracts are silent about certain issues depending on the nature of what is at stake, whether it could be cured, and the nature of the legal entities who are party to the contract – but that should be a conscious choice, not an oversight. Think about how agreements can help address rights or duties from the moment of creation of the content as well as through its lifecycle.
In the digital realm, content is data. You can look at an object in many ways – as the whole, as the sum of its parts. It quickly becomes rather metaphysical. Because scientists are accustomed to thinking about data, the sciences provide many resources that scholars and project managers will find useful for digital curation in the humanities. In the humanities, one needs to think about the same material from multiple perspectives.
Ideas associated with open access and tools like Creative Commons licenses may help shape the way you approach projects. Applying these concepts in different arenas may help with interoperability, access, and long-term preservation, to name a few areas. As a legal matter, “open” approaches are more fluid and easier to administer. Open access policies functionally remove copyright concerns, easing transactional costs and expanding opportunity in situations where sharing is desired by the copyright holder (typically the author). That said, one also needs to keep privacy responsibilities in mind. You can have content with no copyright issue that still needs careful management or access limits for privacy reasons. It's important to keep these different strands in mind and distinct from one another. Resources are provided as background reading.