Article contents
- Cyberinfrastructure for Digital Classics
- Resources: Cyberinfrastructure for Digital Classics
- Virtual Research Environments and Research Portals/Projects
- Resources: General Classics portals
- Resources: Ancient Near Eastern Studies portals
- Resources: Archaeology portals
- Resources: Manuscript Resources and Philology portals
- Resources: Papyrology portals
- Digital Repositories
- Primary Sources and Research Data
- Resources: Archaeology research data
- Resources: Epigraphy research data
- Resources: Manuscript Studies research data
- Secondary Scholarship
- Resources: Working papers
- Federated Collections and Research Databases
- Resources: Federated Collections and Research Databases
- Resources: Ancient Near Eastern Studies standards
- Resources: Archaeology standards
- Resources: Epigraphy standards
- Resources: Manuscript Studies standards
- Resources: Papyrology
- Resources: Philology
- Collaborative and Communication Tools
- Resources: Collaborative and Communication Tools
- Model Digital Resources
- Resources: Epigraphy best practices models
- Resources: Manuscript Studies best practices models
- Resources: Papyrology best practices models
- Digital Preservation Lessons from Classics
- Resources: Digital Preservation Lessons from Classics
- Research Practices
- Manuscript Studies and Related Classical Disciplines
- General article comments
The field of classics encompasses a large number of related disciplines such as archaeology, epigraphy, manuscript studies, numismatics, philology and papyrology, each with its own set of research methods, objects of study (including various types of artifacts, ancient monuments, ancient documentary and literary texts, coins, inscriptions, papyri, etc.), and ways of sharing and preserving data. Classical studies is thus an inherently interdisciplinary field that has also long made use of advanced digital technologies (e.g. advanced imaging and document recognition for classical languages such as Sumerian, Ancient Greek and Latin, the 3D reconstruction and visualization of ancient monuments, and the use of TEI-XML to create digital editions of classical texts). The relatively advanced digital nature of many classical disciplines has in turn both shaped the research practices of the field and increased the need for data curation strategies that address the complex needs of specific disciplinary research methods and of the specific types of digital data that are created as part of the research process. While the field of classics has long faced the issue of preserving fragile physical artifacts such as damaged manuscripts and centuries-old fragments of papyri, it now increasingly faces the challenge of preserving the digital objects created to represent these artifacts as well.
Complicating matters is the fact that for many of the disciplines of classics, preserving the interpretative stages and individual decisions involved in creating a “final” scholarly argument can be as important as preserving the final result of such scholarship, such as a virtual reconstruction or digital edition of an inscription. For example, the digital reconstructions of archaeological monuments typically involve a large amount of uncertainty and individual scholarly interpretation, yet many visualizations are often viewed by students as complete and accurate representations of “reality.” Similarly, in creating a digital edition of a classical text (e.g. a play of Aeschylus with many manuscript sources), many individual scholarly decisions are made in terms of what text variants to include or what manuscript witnesses are considered more reliable, yet many digital editions lack the “apparatus criticus” that contains such decisions and can give the illusion of one text. These issues and projects/solutions that have been created to address them will receive further attention below.
For data curators, the key question to consider is how the research practices of “digital classics” are creating new challenges for data curation, and indeed a number of significant projects across the disciplines are currently working to address some of these challenges. Despite often seemingly huge differences between various disciplines in the field of classics, there are a number of common themes that will require further research and collaboration between classical scholars and those working in data curation.
To begin with, the difficulty of defining both the complex semantics and the structure of the classical data that needs to be preserved must be addressed. Data in classics is extensively multilingual and multi-script, spanning languages such as Ancient Greek, Latin, Sumerian and Sanskrit. The same data (e.g. data about the same classical place, person or other named entity, descriptions of the same archaeological object, multiple images of the same inscription) found across different projects is also often described using very different vocabularies. Similarly, more research will need to focus on how meaningful data integration might be used to create larger digital classical resources that could then be curated more effectively. A variety of issues complicate this process, including the fact that multiple digital facsimiles of objects exist in various digital data collections (often with greatly varying levels of metadata). To address this issue, many projects have chosen to create virtual data centers or have adopted a federated approach that allows data to remain distributed and independent. In fact, complete interoperability or full data integration may be impossible to attain, and according to many practitioners it is not necessarily an ideal solution either.
As indicated by the projects that will be covered here, the process of data curation and of ensuring data sustainability has many components. Many consider the technical components of sustainability to be the easiest task for the long run and stress that long-term financial planning for the organization (or organizations) that will host and curate the data is far more essential. Similarly, political considerations of both the needs of data contributors/partners and users (both current and future) must be taken into consideration.
The issue of standards and digital preservation/curation is also of great importance, and many projects emphasized the potential use of XML as a preservation format for digital humanities data. Standard formats are required to ensure some level of data conformity for both interoperability and long-term curation, and tools were frequently developed by projects to help scholars and other contributors create data that conforms to standard formats or ontologies.
Regardless of the discipline, a number of basic requirements for a digital repository or data curation system have been clearly identified. To begin with, systems must clearly record the authorship of contributed data to ensure proper attribution and credit. Varying levels of editorial control and authorization will also typically be required, and many system developers note that users frequently ask for the ability to embargo data, whether for a short or a longer period. All data must be clearly versioned, and earlier versions must also remain available in order to ensure persistent citation and a fully traceable scholarly record. To assist in this process, standard identifiers should be used whenever possible to encourage persistent citation and linking. Another important feature is support for different levels of participation (e.g. supporting full service hosting of content vs. allowing users to act as digital curation partners).
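The requirements listed above (explicit authorship, embargo support, retained versions and stable identifiers) can be made concrete with a small data model. The following is only a minimal sketch in Python: the class names, field names and the identifier string are hypothetical and are not drawn from any of the repositories discussed in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One immutable version of a deposited dataset."""
    number: int
    contributors: tuple[str, ...]          # explicit authorship for attribution
    payload: str                           # stand-in for the actual data
    embargo_until: Optional[date] = None   # None means the version is not embargoed

    def is_public(self, today: date) -> bool:
        # A version becomes public on the day its embargo ends.
        return self.embargo_until is None or today >= self.embargo_until


@dataclass
class Record:
    """A deposited object addressed by a stable identifier (hypothetical scheme)."""
    identifier: str
    versions: list[Version] = field(default_factory=list)

    def deposit(self, contributors, payload, embargo_until=None) -> Version:
        # New versions are appended; earlier versions are never overwritten.
        version = Version(len(self.versions) + 1, tuple(contributors),
                          payload, embargo_until)
        self.versions.append(version)
        return version

    def cite(self, number: int) -> str:
        # A citation names both the identifier and the exact version cited.
        version = self.versions[number - 1]
        authors = ", ".join(version.contributors)
        return f"{authors}. {self.identifier}, version {version.number}."
```

Because every deposit appends a new version rather than replacing the old one, a citation of version 1 still resolves to the same content after version 2 has been deposited, which is the property the requirement for a traceable scholarly record depends on.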
Another significant issue is the challenge of curating humanities research as well as digital data. Many projects and resources have illustrated that the scholarly interpretations of digital objects need to be encoded and stored as one form of metadata along with other more traditional types of metadata (e.g. technical, administrative, etc.). Curators will need to understand how digital data is used by scholars in their research in order both to support active curation and to help plan for future use of that data. One potential research topic is studying how to develop collaborative workspaces that save and curate data as it is being created: for instance, how does one effectively curate a distributed editing environment? Similarly, further research is needed into how best to curate the algorithms and computational processes that are used in the creation of digital data and now often serve as a key part of creating humanities scholarship.
A final topic frequently raised is how active curation that supports data reuse might serve as one method of effective long-term preservation. To begin with, open licensing schemes are required (in order to make multiple copies of data for preservation and to support reuse). Additionally, at least some minimal metadata must be encoded with digital data to support greater reuse in the future. An essential role for curators in this task is to promote the development of "communities of use" around the digital objects they curate.
As illustrated above, a number of important issues for data curation exist across the classical disciplines, and a number of projects seeking to address these issues will be examined in greater detail here. The resources presented here have been grouped by important themes and drawn from across the related disciplines.
In order to support data curation and digital preservation across the spectrum of classical disciplines, there have been increasing calls from within the discipline to build a sophisticated cyberinfrastructure that can support a wide variety of research needs. A recent issue of the Digital Humanities Quarterly (DHQ) entitled Changing the Center of Gravity: Transforming Classical Studies Through Cyberinfrastructure included a series of articles that addressed different aspects of such a cyberinfrastructure. There have also been calls to create a comprehensive digital library or repository that can preserve a variety of digital classical scholarship, as addressed by the articles listed below.
Another major challenge for data curation in classics and across the humanities will be determining how to curate data that is created through increasing scholarly use of virtual research environments (VREs) or disciplinary research portals that are currently being created in a number of humanities disciplines. Data is broadly construed here not just as the structured or unstructured data sources that are made available through such environments, but also as the algorithms and computational processes that are often used in the creation of digital scholarship. There are many projects across the classical disciplines that have sought to develop VREs or portals that have already created and will continue to create large amounts of digital data, and this list of resources has been grouped by larger discipline.
While many VREs and research portals listed above also include some basic data preservation functions, most have likely not been designed with the long-term curation of the digital research data that has been created within them in mind. To address issues of longer term sustainability and curation of both digitally created research data and digitized copies of analog data, digital repositories have been developed for a number of disciplines within the field of classics, with the largest number by far within the discipline of archaeology.
Much research within the classical disciplines as well as within the humanities involves the use of primary sources and data, from manuscripts and early printed editions of classical texts, to ancient inscriptions, scraps of papyri, excavated artifacts, ancient coins, images of classical art objects (as well as the analog objects themselves), and virtual reconstructions, among many other types of sources. Although research in the humanities is often not considered to produce much in the way of “research data,” increasingly digital research across many classics disciplines (as well as “traditional” research in many fields such as archaeology) in fact has produced a wealth of data that now needs active curation. In addition to curating already existing legacy and born digital data, curators will also need to explore new ways to encourage researchers to contribute the data that they are actively creating as well.
As with most humanities disciplines, effective data curation strategies for classics will require preserving both the data used in creating scholarship as well as the final products of that scholarship, including publications such as journal articles and monographs. The preservation and curation of formally published scholarship in digital repositories (either institutionally based or disciplinary) is perhaps the most well-established and supported form of digital preservation, but classics has lagged somewhat behind in this area.
One long-term challenge for data curation in classics and indeed across the humanities is the large number of individual digital projects that have idiosyncratic or “fuzzy” data, and even when complementary projects have data regarding the same object (e.g. an image of a classical vase or a Latin inscription) it may have been described using very different metadata vocabularies or ontologies. The need to allow individual projects to maintain their autonomy while also providing a long-term infrastructure that can provide greater semantic interoperability and integration, more sophisticated levels of access, and ideally long term curation of federated data sets is currently being explored in greater detail by a number of projects.
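One way to picture the federated approach described above is a crosswalk that maps each project's own field names onto a small shared vocabulary at query time, while the source records stay untouched in their home collections. The sketch below is a hedged illustration in Python only: the project names, field names and shared vocabulary are all invented for the example, and real projects would more likely map to a published ontology than to an ad hoc dictionary.

```python
# Hypothetical per-project crosswalks: local field name -> shared field name.
CROSSWALKS = {
    "project_a": {"title": "label", "findspot": "place", "lang": "language"},
    "project_b": {"name": "label", "provenance": "place", "language": "language"},
}


def to_shared(project, record):
    """Return a view of one record in the shared vocabulary.

    Fields without a crosswalk entry are left out of the view; the original
    record is not modified, so each project keeps its own description.
    """
    crosswalk = CROSSWALKS[project]
    view = {crosswalk[key]: value for key, value in record.items() if key in crosswalk}
    view["source"] = project  # keep track of where each hit came from
    return view


def federated_search(sources, shared_field, value):
    """Search every project's records through their shared-vocabulary views."""
    hits = []
    for project, records in sources.items():
        for record in records:
            view = to_shared(project, record)
            if view.get(shared_field) == value:
                hits.append(view)
    return hits


sources = {
    "project_a": [{"title": "Dedication to Apollo", "findspot": "Delphi", "lang": "grc"}],
    "project_b": [{"name": "Votive inscription", "provenance": "Delphi",
                   "language": "grc", "notes": "local field with no mapping"}],
}
hits = federated_search(sources, "place", "Delphi")
```

Both records are found by a single query on the shared field even though one project calls it a findspot and the other a provenance. The unmapped local field is simply not exposed, which reflects the trade-off noted above between project autonomy and full integration.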
The existence of standards in well-understood formats can be one key component for the long-term curation of digital data, and various standards have been developed for different classical disciplines and the larger digital humanities, including the Text Encoding Initiative (TEI), EpiDoc, and ArchaeoML. TEI in particular has proved important and adaptable across different classical disciplines, and is often used as the core standard to promote interoperability. Standards proved to be important not just for the integration of data from different projects, but also at times served as a tool that could be used to support the reuse of data created for an earlier project by later research.
While a great deal of research in the humanities is often conducted by individual scholars, one prominent feature of much digital scholarship is that it is often quite collaborative in nature, and requires resources that support both collaboration and scholarly communication. A number of important resources/projects exist within the digital classics community that promote greater collaboration and communication among interested scholars who hope to share their work, find new colleagues, and identify best practices, among many other possibilities.
The following list of resources illustrates projects that model best practices for building digital resources that can be sustained and effectively curated for the long-term and is drawn from across the classical disciplines. These best practices include the use of standards, the provision of documentation, permanent and fully citable URLs, and open access to data that can be exported in a variety of formats.
As the various disciplines of classics have long dealt with both utilizing fragmentary and fragile sources from antiquity as well as preserving them, those working in the discipline have some useful advice regarding larger data curation issues.
While classical studies is an inherently interdisciplinary field with a number of related disciplines, a number of these disciplines have specific research practices that will require different strategies for data curators. At the same time, many of the disciplines of classics share the same or similar kinds of primary sources as well as a number of research techniques that might lend themselves to common solutions.
The field of classical archaeology has long made use of information technology, and of all the related disciplines of classics it has perhaps the most diverse and complicated types of data, both analog legacy data and newly digitized data, all of which need to be actively curated. Since archaeological sites are typically destroyed during an excavation, the need to carefully document data as it is excavated is also very important. Archaeological research includes significant work in the field, and while extensive information is typically recorded at the site, such as through the creation of context cards, much information about excavated objects is ultimately entered into a database after the site excavation is completed. This double entry is not without its problems, and the VERA project tried to address this issue through the introduction of information technology into the field. The project nonetheless found that the introduction of technology did not necessarily improve data quality, and illustrated the importance of documenting actual research practice and critically examining how technology may or may not assist in solving difficult issues.
While many archaeologists actively preserve the data that they have created, the creators of both Digital Antiquity and OpenContext noted that much of this data is not preserved in a systematic or sustainable fashion. Both projects are seeking to encourage archaeologists to not only deposit their legacy data, but also to actively deposit their current research data so that it is easier to access and can be shared with the larger scholarly archaeology community. Fears about copyright, intellectual property, and being “scooped” by another scholar before formally publishing research that makes use of data they have gathered and organized are all complicated issues that the Archaeological Data Service, Digital Antiquity and OpenContext are trying to address through a variety of solutions (e.g. indicating clear authorship of contributed data, supporting embargoing of data). Similar fears about the loss of recognition were also expressed by papyrologists in Roger Bagnall’s discussion of attempting to get papyrologists to contribute their data to the Integrating Digital Papyrology project. These issues will thus need to be considered by data curators seeking to actively work with archaeologists to preserve their data.
A related issue that was often cited by many working in the field of archaeological data preservation is the lack of common standards for describing, integrating and sharing archaeological data, despite the existence of standards such as ArchaeoML. While the existence of a single standard for recording and sharing archaeological data is both unlikely and impractical, the vastly varying nature of descriptive practices has made the development of larger digital repositories for archaeological data far more challenging, particularly in terms of efficiently sharing data of potential interest between related researchers. Nonetheless, a growing number of voices within the field argue that one of the major benefits of making data available digitally is that it allows for data to be more closely integrated and linked to published interpretations of that data in scholarly publications and archaeological site reports. Data curators will thus need to ensure that archaeological data within their care can easily be linked to and cited online in order to support the next level of digital scholarship. At the same time, while electronic publication in archaeology, according to a recent report by the Center for Studies in Higher Education at the University of California, Berkeley (Harley et al. 2010, above), is slowly growing in acceptance by senior scholars, the traditional and established format of publishing for tenure in archaeology remains the monograph.
The increasing availability of technologies to create 3D models or digital reconstructions of archaeological sites has also led to a huge growth in the creation of such models within the discipline of archaeology, with two of the best known sites being Digital Karnak (http://dlib.etc.ucla.edu/projects/Karnak/) and Rome Reborn (http://www.romereborn.virginia.edu/). The consequent growth in these models has also led to new challenges, not just in archiving the models but also in finding ways to record and preserve the levels of scholarly interpretation and uncertainty inherent in creating them, a problem that has been addressed by the SAVE project discussed above. Curators will need to actively work with scholars to develop metadata models and technical solutions to address these issues.
The need not just to actively curate the digital data that scholars create but also to preserve, or at least record or store, the levels of interpretation, uncertainty and individual scholarly decision making involved in the creation of many digital objects, such as digital editions of cuneiform texts, inscriptions, papyrus fragments and manuscript transcriptions, illustrates one of the common data curation problems found across the classical disciplines, including epigraphy, papyrology and manuscript studies.
Epigraphy is one of the largest related disciplines of classics and involves the study of ancient inscriptions or epigraphs that have been engraved into durable materials such as stone. The discipline of epigraphy is quite advanced digitally, and numerous projects exist online, including large digitized inscription collections such as the Corpus Inscriptionum Latinarum (http://cil.bbaw.de/cil_en/dateien/forschung.html) or the PHI Greek Epigraphy Project (http://epigraphy.packhum.org/inscriptions/main), and federated databases that search various inscription collections such as EAGLE (Electronic Archive of Greek and Latin Epigraphy, http://www.eagle-eagle.it/). Although a large number of inscription collections are still published only in print, Gabriel Bodard recently argued that digital publication provides a more sophisticated way of managing inscriptions in that it allows for inscriptions to be more fully treated both as texts (e.g. through markup and subject encoding) and as archaeological objects (see Bodard 2008, above). The digitization of large numbers of inscribed texts thus raises a number of issues for the long-term curation of these inscriptions and the research practices of epigraphy. The most important will be to curate the encoded text or texts of an inscription effectively while also storing the multiple digitized images of that inscription, and potentially visualizations of the archaeological object on which it was found, and linking them to the text.
The standard practice in epigraphy has long been print publication of inscriptions that have been transcribed using a set of publishing protocols called the Leiden conventions, which according to Roued (2009, above) are “a type of semantic encoding” that uses brackets, underdots and other markings to indicate missing characters, uncertainty, additions or other corrections made by the editor of the inscription. Many digital epigraphy projects (largely in the form of relational databases) have consequently directly transferred Leiden encoded inscriptions to a digital form. Cayless et al. (2009, above) and others within the field of epigraphy have increasingly challenged this practice by advocating the use of the EpiDoc standard, because the use of EpiDoc allows not just the encoding of an inscription text, but the inclusion of information about the history of the inscription, scholarly commentary on and descriptions of the text, as well as links to photographs and translations. In addition, Cayless et al. argued that inscriptions that are encoded in EpiDoc XML could be far more easily exported both as individual inscriptions and as entire corpora, thus making it far easier to distribute copies of a digital epigraphic corpus and preserve it. Data curators will thus need to work with creators of digital epigraphy projects to encourage wider adoption of standards such as EpiDoc, encourage creators of projects to make them available for download, and then actively download copies of digital corpora of inscriptions so that they can be preserved in multiple locations.
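To make the contrast between Leiden sigla and explicit markup more tangible, the following Python sketch converts a very small subset of Leiden notation into TEI elements of the kind used in EpiDoc. It is illustrative only: the function name is hypothetical, only four sigla are handled (square brackets for lost text, dots inside square brackets for a gap of known length, round brackets for an expanded abbreviation, and the combining underdot for unclear letters), and the element and attribute choices reflect an understanding of the EpiDoc guidelines that should be checked against the guidelines themselves before any real use.

```python
import re
from xml.sax.saxutils import escape


def leiden_to_epidoc(text):
    """Convert a tiny, illustrative subset of Leiden sigla to EpiDoc-style TEI."""
    text = escape(text)  # protect &, < and > before any markup is added

    # [...] : lost characters of known extent, one dot per character
    text = re.sub(
        r"\[(\.+)\]",
        lambda m: f'<gap reason="lost" quantity="{len(m.group(1))}" unit="character"/>',
        text,
    )
    # [abc] : lost text restored by the editor
    text = re.sub(r"\[([^\[\]]+)\]", r'<supplied reason="lost">\1</supplied>', text)
    # abc(def) : abbreviation expanded by the editor
    text = re.sub(r"(\w+)\((\w+)\)", r"<expan><abbr>\1</abbr><ex>\2</ex></expan>", text)
    # letters followed by a combining dot below (U+0323) : unclear reading
    text = re.sub(
        r"((?:\w\u0323)+)",
        lambda m: "<unclear>" + m.group(1).replace("\u0323", "") + "</unclear>",
        text,
    )
    return text


encoded = leiden_to_epidoc("Imp(erator) Caes[ar] [...]")
```

The point of the sketch is the one made by Cayless et al.: once the editor's interventions are expressed as named elements rather than typographic marks, they can be queried, validated and exported independently of how they are displayed. A production conversion would need a proper parser, since sigla can nest and interact in ways that simple pattern substitution cannot handle.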
One other significant issue identified by both Bodard (2008) and Cayless et al. (2009) in the design of digital corpora of inscriptions was the need to use persistent identifiers (such as DOIs) at the level of individual digital objects (e.g. an individual inscription) so that in a future where the same digital inscriptions may be published and even preserved in multiple locations, individual inscriptions could still be cited independently of location. The need to support persistent citation and linking to all types of data stored within digital repositories will be an important feature that data curators will need to implement in order to support digital scholarship in the future.
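The idea of citing an inscription independently of where its copies are held can be sketched as a simple resolver that maps one identifier to every location currently holding the object. This is a deliberately minimal Python illustration, not a description of how DOIs or any other identifier system is implemented: the class name, the identifier string and the example.org URLs are all hypothetical.

```python
from collections import defaultdict


class IdentifierResolver:
    """Maps a persistent identifier to every location that holds a copy."""

    def __init__(self):
        self._locations = defaultdict(list)

    def register(self, identifier, url):
        # The same object may be held by several repositories at once.
        if url not in self._locations[identifier]:
            self._locations[identifier].append(url)

    def withdraw(self, identifier, url):
        # A copy can move or disappear without invalidating the identifier.
        self._locations[identifier].remove(url)

    def resolve(self, identifier):
        # Citations store only the identifier; locations are looked up on demand.
        return list(self._locations.get(identifier, []))


resolver = IdentifierResolver()
resolver.register("example:inscr-42", "https://repo-a.example.org/inscr/42")
resolver.register("example:inscr-42", "https://repo-b.example.org/objects/9")
resolver.withdraw("example:inscr-42", "https://repo-a.example.org/inscr/42")
remaining = resolver.resolve("example:inscr-42")
```

A citation that records only the identifier keeps working after the first repository withdraws its copy, because the lookup now returns the second location. Holding copies in multiple locations, as recommended above for epigraphic corpora, therefore supports persistent citation rather than undermining it.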
Although manuscript studies is not a “discipline” of classics per se, the first standard critical editions of classical texts upon which modern scholars frequently still rely were based on the study of large numbers of medieval manuscripts. Ancient and medieval manuscripts are an important data source for study across a number of classical and historical disciplines including codicology, philology and palaeography. Access to both the digitized images of manuscripts as well as various levels of transcriptions (e.g. a basic text transcription, a TEI-XML digital edition) has thus been cited as a key component of a cyberinfrastructure for classics (see Crane et al. 2009, above).
The large amount of data created by both libraries and scholars in the digitization and transcription of manuscripts raises a number of significant curation issues. Data curation and digital infrastructure for manuscripts will require providing both long-term management of and effective access to the wealth of data created in their digitization. This data includes digital images of individual manuscript pages as well as digital versions of entire manuscripts, diplomatic transcriptions of individual pages and entire manuscripts, possibly multiple TEI editions of the same manuscript by different scholars, and scholarly annotations (both historical ones such as scholia and modern ones such as philological commentaries on words).
Data curation will also need to consider the various ways in which different types of scholars use these data sources in order to provide varying levels of service. For example, the discipline of philology studies historical languages, particularly the grammar and history of words and their variant forms, and philological scholarship within classics typically produces “shared primary and secondary sources about linguistic sources” (Crane, Seales and Terras 2009, see above) with a particular focus upon Greek and Latin. The research needs of philologists, especially in terms of creating digital critical editions, are thus closely tied to the digitization of individual Greek and Latin manuscripts and annotated corpora of historical manuscripts, and philologists typically require access to all of the data used in the creation of those editions (e.g. multiple manuscripts of the same classical work, diplomatic transcriptions that include all textual variants). Teuchos and TextGrid are thus seeking to create comprehensive environments that not only support the creation of new philological data by scholars working with digitized manuscripts and other sources, but also the development of collaborative workspaces for sharing data and archives for maintaining such data in the long term.
The discipline of papyrology studies documentary texts, ancient literature, personal correspondence and many other types of texts that have been preserved in papyri. The publication of papyri online, as with inscriptions and archaeological reconstructions, involves a large amount of scholarly editing and individual interpretation, and has been described by many within the field as akin to the act of creating a scholarly edition. Data curation for papyrology, as with many other disciplines within the larger field of classics (including both epigraphy and archaeology), will thus require preserving not just the digital objects and databases that have been created, but also the scholarly interpretations involved in the creation of digital objects/editions of individual ancient texts.
There are numerous digital collections of papyri available online. Two well-known and large full-text collections are Oxyrhynchus (http://www.papyrology.ox.ac.uk/POxy) and the Duke Data Bank of Documentary Papyri (DDbDP, http://www.papyri.info/ddbdp/); there are also major papyri aggregators such as APIS and important databases of papyrological metadata such as the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV, http://aquila.papy.uni-heidelberg.de/gvzFM.html). These resources as well as several prominent research projects, namely e-Science and Ancient Documents (eSAD) and Integrating Digital Papyrology (IDP), have illustrated a number of important lessons about the practice of papyrology that will prove important for long-term data curation.
To begin with, there is a significant amount of digital papyrological data and metadata currently available online, often with differing levels of data and metadata about the same papyri in different collections and databases. Effective curation of this data will involve meaningful integration and access to this data, and the Integrating Digital Papyrology (IDP) project is seeking to provide one possible model of data integration for papyrology. This project has also offered important lessons (see Bagnall 2010, above) in terms of what is required to actively solicit the participation of scholars in terms of sharing their research data (e.g. ensuring appropriate authorship and credit) within a long-term archive and how to build editorial models that reassure potential contributors/users of the data quality of such contributed data.
Similarly, eSAD’s research offers insights into the need to curate not just the research outputs of digital papyrology but also the research methods used by scholars in creating digital resources. The work on an interpretation support system by eSAD to model and store scholarly interpretations (Roued 2009, de la Flor et al. 2010, above) during the use of their prototype research environment for working with ancient texts illustrates one important type of data that is only created during the research process itself and would likely be hard to recreate later. Data curation processes will thus need to be created that can address the archiving of the evidence and methodology of digital scholarly research, not just the “final” scholarly output (e.g. a digital edition of a papyrus fragment).