Archiving and Text Fluidity / Version Control
- Archiving Scholarly Publications in Digital Form
- Text Fluidity / Version Control
- The Interviews
Alan Burk, James Kerr, and Andy Pope
Four processes might preserve digital objects: physical conversion to an accepted archival medium; digital conversion that parallels changes in the hardware and software environment; physical maintenance of the appropriate hardware and software environment; or virtual maintenance of the appropriate hardware and software environment. The literature describing these four processes or promoting one or another of them is complex and necessarily speculative. The explicit speculation tends to focus on what is technically possible or administratively plausible. The social and economic components of any digital preservation program are assumed more often than they are stated or examined. No one knows what a digital preservation program would cost or how its implementation would be structured. Preservation of human artifacts has been largely a societal, rather than an individual or corporate, activity. Funding for libraries, museums, and archives is implicitly contingent on their acceptance of various preservation responsibilities. Digital preservation is different from traditional preservation in the sense that no agency is currently being funded to provide general digital preservation. The institutional will to create digital archives and the societal support to maintain them may develop, but if they do not, it really will not matter whether the technical problems raised by digital archives can be solved.
Before reviewing the distinguishing characteristics of each preservation option, it is worth noting that they share some common features and limitations. What each can preserve is more like a single snapshot than a motion picture. If it is desirable to preserve the stages in the evolution of a specific digital object, then the object must be preserved at each stage no matter what preservation scheme is utilized. The issue of text fluidity is a red herring as far as digital preservation is concerned. Similarly, if it is desirable to preserve the objects linked from a specific digital object, their preservation is a separate task within each process.
Who may use a digital object and what rights accrue to the creator of that object are time dependent features that can be preserved by any of the schemes. However, preserving these attributes of an object and having them make sense at some point in the future are two distinct problems. In order for authorization or rights management to function properly over time, not only must they have been preserved with the object but there must also be an algorithm to transform their values from what was appropriate at the earlier time to what is appropriate at the later time. Finally, if it is desirable in the future to be able to assert that a specific digital object is identical with the object as referenced in the past, the grounds for such an assertion need to be built into the preservation scheme. This is a separate problem for each scheme rather than being an intrinsic feature of any scheme.
Adherence to standards is sometimes suggested as an additional method for digital preservation, but the relation between standards and preservation is both simpler and more complex than that. On the simpler side, past standards have not always ensured preservation. Some scholarly information, encoded in what may have been or may still be a standard scheme, is no longer accessible because there is no functioning hardware to read data from the particular storage medium.
On the more complex side, there is a real concern that "adherence to standards" is a phrase that needs always to be tied to specifics to avoid confusion. It is open to far too many interpretations. Many standards, both de jure and de facto, relate in some fashion to scholarly publishing. Judiciously chosen standards might significantly improve the prospects for preservation, while others might be of little value in this context. Moreover, some standards, such as ANSI Z39.50, are so complex or permissive that widely varying implementations can, and do, reasonably claim adherence.
Standards are most usefully viewed as a facilitator to any one of the four preservation processes, rather than as a fifth competing process. They are only as good as their level of social acceptance, now and in the unknown future. Even those standards that have wide acceptance now may eventually be rendered obsolete by changes in the external environment.
Physical Conversion to an Accepted Archival Medium
The content of some digital objects may be preserved by replicating the digital object in paper or microform. This is not digital preservation, per se, because the underlying digital structure is not preserved. Moreover, it is only applicable to those digital objects which can be represented in two dimensions. However, the simplicity of this process has some undeniable appeal. The cost of conversion is incurred only once. When the conversion is complete, the preservation problem has been transformed from a context that is not understood to one that is. None of the three methods that do preserve digital structure can currently justify either claim.
Digital Conversion in Parallel with Hardware and Software Changes
The migration of digital objects from one machine, operating system, or application program to another has been the preservation method of choice for about 30 years. Migration has worked well with small, self-contained, high-value, institutionally-created digital objects, such as financial records or bibliographic data. While migration does not necessarily preserve the underlying bit stream, the preservation of the content, rather than the bits, has been the accepted goal with these objects. Migration is a reactive strategy and must be achieved in conjunction with the hardware or software changes. It may well not be possible to migrate after the fact, and it is generally not possible to migrate in advance of the hardware or software changes that drive the process. The cost of preservation using migration is highly time dependent, and it is incurred each time the hardware or software environment changes. The success of migration seems to be tied to the highly centralized situations in which it has been applied. It is not obvious that the appropriate resources can be assembled at the right time to migrate digital objects in the highly decentralized web environment.
Physical Maintenance of the Hardware and Software Environment
The argument in favor of the nature refuge approach to preservation can be stated simply. If a digital object currently exists in a particular hardware and software environment, the object can be preserved by preserving its environment. Instead of migrating the object to the new environment, it can be preserved without modification in the old environment.
The argument against this approach can also be stated simply. This method is currently associated with no long-term successes and many short-term failures. Whatever its naive appeal might be, it is not a route favoured by those who actually have to work with hardware or system-level software.
Virtual Maintenance of the Hardware and Software Environment
Emulation is the most ambitious of the proposed methods of digital archiving. It involves wrapping the digital object's bit stream in a digital package that contains the entire hardware and software specifications required to recreate the object in its original form and with all its original attributes intact. The cost of producing the emulation package would be incurred once and could be attributed to the preservation of all the objects functioning in that environment. There would still be some costs associated with migrating the objects and emulators from one storage medium to another, but such costs would probably be minor.
The fundamental problem with emulation, aside from the fact that it is largely untried and probably very expensive to initiate, is that the hardware and software environments to be emulated are generally proprietary. Whether the owners of the rights to the appropriate hardware and software systems would sanction their emulation to preserve digital objects in which they have no proprietary interest is an open question.
What Is To Be Preserved
No academic or research library aims to preserve the entire corpus of print publications. Formal characteristics, such as language, discipline, and medium, and institutional variables, such as scope, funding, staffing, and space, enter into preservation decisions. Interinstitutional aspects are also beginning to make a difference in preservation decisions. The existence of joint storage facilities, document delivery consortia, automated cascading searches against a hierarchy of library catalogues, and electronic delivery of what were originally print articles have made it possible to incorporate greater concern for cost effectiveness in preservation decisions. Moreover, each library segments, implicitly if not explicitly, printed material into three broad categories for preservation: essential, desirable, and ephemeral. Preservation of the scholarly record, consisting of peer reviewed publications plus the indexing and abstracting tools to access this literature, is generally considered essential to the role of academic and research libraries.
Preservation has been a component in the transition of the scholarly record from print to digital publication thus far. Migration of bibliographic databases, parallel print and electronic issues of periodicals, and the preservation constraints within JSTOR all exemplify the cautious approach that has guided almost all of the digitization of the scholarly record. The need to institute a digital preservation process for the scholarly record is just being postponed by this caution, but for the moment postponing the problem seems a very good approach. Of course, where those responsible for the creation of the scholarly record have not been cautious, there is an immediate and significant problem.
The larger and more pressing digital preservation problem seems to lie with the incredible expansion over the last decade or so of the number and variety of digital objects that are not a part of the scholarly record, per se, but are cited within the scholarly record or might be the subject for future scholarly investigation. If a feasible solution to this problem is ever to be implemented, a few basic points seem germane.
- The scholarly process has always assumed as a matter of trust that most researchers are fair, honest and conscientious in recording data, reporting observations, summarizing results, citing other literature, etc. Peer review attempts to ensure that this trust is well deserved. Most readers of scholarly articles in print, however, neither wish to inspect every element of the underlying structure of the article, nor could they do so even if they wished. The fact that publication in digital form makes it possible to incorporate within an article links to the elements of this underlying structure does not mean that readers will want to inspect the objects linked. Nor does it mean that the scholarly process will be undermined if some of these linked objects become inaccessible. Any proposal based on the converse assumption is likely to fail.
- The significance of digital preservation may well vary by discipline and by the level of institutional support built into particular digital objects. Some disciplines, mathematics or philosophy, for instance, might have a relatively low frequency of references to digital objects within their scholarly literatures. Others, astronomy, for instance, might have literatures that refer to digital objects frequently but these digital objects are supported by a few large agencies, such as NASA. Disciplines of either type will have an easier time in focussing effort on digital preservation than will a discipline whose literature frequently refers to digital objects scattered among a large number of sites, all of which have little or no institutional support.
- A well articulated triage approach will be a necessary step in initiating productive digital preservation projects. Not everything can be preserved. Not every preservation method can be funded. Not every institution that has had a role in the preservation of print can have a role in digital preservation. Not every country can afford to try being a leader in digital preservation. Not every discipline can separate its wheat from its chaff. Any digital preservation proposal that ignores these factors is likely to fail.
A fluid is not fixed, firm or stable. A text is fixed, firm and stable, at least for the duration of any reading. The conjunction of the two concepts, "text fluidity", produces one of those jarring metaphors intended to enlighten but inclined to mislead and confuse. The intent behind this essay is to unpack this metaphor in terms of the more neutral phrase, "version control" and thereby to clarify one perspective on the future of electronic scholarly publishing.
Texts can be adapted, abstracted, translated, edited, condensed, corrected, marked up, transcribed, annotated, amended, paraphrased, transliterated, illustrated, indexed, or abridged. They can be commented upon or referenced by other texts. They can be analysed or synthesized. These modifications, and any others that might be considered, produce distinct versions, all related to the original text. Modifications to a text do not produce a text that is fluid. Every version of a given text could be uniquely named and its relation to the first version, indeed its relation to every other version, could be specified if such relational specificity made sense. All of this holds whether a text is handwritten, typewritten, printed, in braille, on punched cards, in some analog auditory medium, in electronic form, or produced in some form yet to be developed.
Naming versions of a text and specifying the differences between the revision and the original text are the two elements of version control. There is no universal naming convention for versions of a text, but adding a revision date or version number to the original title is a simple procedure that is adequate for most cases. This device also serves to specify the sequence of texts, which in many instances is the only relation that need be specified. The differences between two versions of a text can be expressed at whatever level is appropriate, from fairly general terms, which work in most cases, to a syntactic level, which can be automatically generated for some pairs of electronic texts.
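The two elements of version control described above, naming versions and specifying their differences, can be sketched concretely. The following Python fragment uses the standard difflib module to generate a syntactic comparison between two versions of a text; the version labels, dates, and sample sentences are hypothetical illustrations, not drawn from any real publication.

```python
# A minimal sketch of syntactic version comparison between two
# electronic texts, using Python's standard difflib module.
# The version labels, dates, and sample lines are invented examples.
import difflib

original = [
    "Four processes might preserve digital objects.\n",
    "The literature on these processes is speculative.\n",
]
revised = [
    "Four processes might preserve digital objects.\n",
    "The literature on these processes is necessarily speculative.\n",
]

# A unified diff names both versions (here by version number and
# revision date) and records exactly which lines changed between them.
diff = difflib.unified_diff(
    original, revised,
    fromfile="essay-v1 (2000-08-01)",
    tofile="essay-v2 (2000-08-15)",
)
print("".join(diff))
```

For many pairs of electronic texts such a comparison can be generated automatically; for print texts the same differences would have to be described in more general terms.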
The paradigm shift from manuscript to printed texts greatly expanded the number and variety of versions that could be produced for any particular text. The current shift to electronic publication does to print what print did to manuscripts. The change is so substantial that the solidity we have come to associate with printed texts seems to have melted into a liquid or sublimated into a gas. The image is powerful; the substance behind the image is more prosaic.
Electronic publications can be changed with ease. It does not follow from this fact that the canons of scholarly publishing could be or should be modified to accommodate large numbers of variants on a single text. The production of knowledge has evolved with features intended to ensure incremental growth and social acceptance. One of the most important of these features is that the version control be appropriate for the type of publication. All of these features are necessarily conservative. Individuals, now given the power of publishing from their desktop, may be able to ignore the traditional rules of scholarly publishing, but they cannot expect to attain social acceptance and credibility automatically. Something more is needed, and it is not obvious how new conventions will emerge or what form they will take.
Scholarly publishing can be expected to change in response to the opportunities offered by electronic texts, but the change will be incremental. Publishers will impose the print conventions on electronic texts unless there is a good reason to change. Procedural modifications may be common in areas such as manuscript submission, distribution to referees, and editing. New features will be possible within all types of scholarly texts. The most likely scenario for version control, however, remains as it is now: acceptance of variant texts will depend on the type of text and possibly on the discipline.
Types of Variation Among Texts
There are currently four levels of acceptance for variant versions of scholarly publications. Publishers of refereed journal articles reject variants almost completely. Publishers of scholarly monographs accept variants, subject to the obvious economic constraints, as long as the variation is reasonably well documented and meets some scholarly need. Publishers of grey literature accept variants with very few, if any, restrictions. Publishers of secondary scholarly texts often require that the texts be changed on a regular basis.
Refereed articles are the foundation for scholarship in most disciplines. The refereeing and editing process culminates in a text that has to remain permanently fixed in all its essentials for the sake of the discipline. Most journals permit the subsequent publication of minor corrections and some permit a limited amount of follow-up discussion in the form of letters to the editor and responses. However, if an author has serious second thoughts after publishing an article, he or she must, in almost all cases, express those thoughts through a second article, which also must be refereed and edited.
Publishers of electronic scholarly journals will likely add a version control section to each article that requires subsequent minor corrections. This section will document these corrections for future readers. The corrections themselves can simply replace the original portion of the text. The likelihood is that the scope for such corrections will be under strict editorial control.
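The version control section described above might amount to no more than a dated log of editorially approved corrections attached to the article. The following Python sketch shows one possible shape for such a record; all field names and values are hypothetical, not a description of any publisher's actual system.

```python
# A hypothetical sketch of the audit record an electronic journal
# might attach to an article that has received post-publication
# corrections. All field names and values here are invented examples.
correction_log = [
    {
        "date": "2000-09-01",
        "location": "Figure 2",
        "change": "figure reoriented; originally published upside down",
        "authorized_by": "editor",
    },
]

def version_control_section(log):
    """Format the correction log as readable lines for display
    in the article's version control section."""
    return [
        f"{entry['date']}: {entry['location']}: {entry['change']}"
        f" (authorized by {entry['authorized_by']})"
        for entry in log
    ]

for line in version_control_section(correction_log):
    print(line)
```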
There are two sections that could be added to scholarly articles in electronic journals without undermining the integrity of the original text. One is a cited-by section. This section could be populated automatically as the article is cited by other articles. The other possible section would consist of reactions by other scholars and responses by the author. It is not obvious that authors and editors would be willing to entertain the additional workload that such a section would entail.
Scholarly monographs are often revised and republished in new editions. Academic libraries, other than the very largest, are rather casual concerning any implied obligation to preserve all editions of a scholarly text. Scholars are generally concerned with only the latest edition of such texts. Except in some very close analysis of the history of a discipline, it would be considered bad form to cite a portion of an early version of a scholarly monograph if the author had subsequently revised that section in a later version. Publishers of scholarly monographs similarly have an obligation to indicate clearly the version of the monograph being made available. It is considered equally bad form for a publisher to present as a new edition a text which is fundamentally a new printing of an earlier edition.
The production of scholarly monographs in electronic form has thus far been extremely limited compared to the electronic production of scholarly journals. It is, however, difficult to see any substantive reason why version control of scholarly monographs will be much different in the electronic domain than it has been in print.
Grey literature resides on the periphery of scholarship. It is usually regarded as a byproduct of the scholarly process, rather than an end product. The common constituents of grey literature are publications that are preliminary to, ancillary to or just not up to the standards of refereed articles. Some scholarly contributions go through a number of instantiations, each of which enters the grey literature of the discipline, before being presented in some final form, usually as a refereed journal article. In most disciplines the number of variants that precede the final version causes no problems. It would not be appropriate to cite a preliminary version if the final version is in print. If the final version has not yet appeared, then citing a preliminary version may be acceptable. Much of what appears on the web and is presented as scholarly publication is, in fact, grey literature of this sort and will remain so forever.
The web offers new possibilities for the presentation of material ancillary to refereed articles. Data, which often only appears in summary in journal articles, can be presented in full on the web. The web can also be a medium for referencing expanded methodological details or the content of social survey instruments. Social mechanisms will probably emerge to codify and raise the status of some of this material in much the same way that scholarly journals evolved to remedy the deficiencies of personal correspondence.
Secondary publications, such as textbooks, annual reviews, and encyclopedia articles, are often syntheses of current knowledge on a topic and have to be revised regularly to continue to serve their intended function. Since the revisions are clearly sequenced and normally incorporate content that indicates the relevant changes, version control is almost a non-issue with publications of this type in print. There are, as yet, few such secondary publications in electronic form, but this development will, no doubt, come in due time, and the carryover of the print conventions for version control will likely be an element in this development.
Web Sites, Credibility and Version Control
While the web is being populated with texts that are functionally indistinguishable from their print equivalents, it has also become a medium for new sorts of texts that do not fit well within the standard taxonomy of scholarly publication. Web sites can incorporate traditional text, non-linear sequencing, various sorts of digital multimedia objects, and a level of interactivity that goes far beyond the relation between reader and static print. Academics who invest their time, skills and creativity in the development of such web sites need a reasonable expectation that their efforts will not be dismissed as just a new form of grey literature.
Implementing some simple conventions for version control, particularly in conjunction with greater use of metadata by web site creators, would facilitate the archiving process for innovative scholarly web sites and would constitute a starting point for gaining academic acceptance. Attaining academic respectability, however, will require much more than formal version control. What really seems to be needed is more peer review and less individualism. Refereed articles are valued because they combine individual effort with disciplinary approval. Without some similarly rigorous mechanism for evaluating, improving, and presenting web sites it is difficult to see how they will ever attain the level of approbation they deserve.
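The simple conventions suggested above could be as modest as embedding version metadata in each page of a scholarly web site. The Python sketch below renders a small set of Dublin Core style fields as HTML meta elements; the Dublin Core element names are real conventions, but the values and the exact choice of fields are invented examples, not a proposed standard.

```python
# A hypothetical sketch of simple version-control metadata that a
# web site creator might embed in each page. The Dublin Core element
# names (dc.*) are real conventions; the values are invented examples.
version_metadata = {
    "dc.title": "An Innovative Scholarly Web Site",
    "dc.creator": "A. Scholar",
    "dc.date": "2000-08-08",         # date of this version
    "dc.identifier": "version 2.1",  # version label
    "dc.relation": "supersedes version 2.0",
}

def to_meta_tags(fields):
    """Render the metadata fields as HTML <meta> elements."""
    return "\n".join(
        f'<meta name="{name}" content="{value}">'
        for name, value in fields.items()
    )

print(to_meta_tags(version_metadata))
```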
Interviews with Key Organizations on the Preservation of Scholarly Texts and Text Fluidity
In investigating issues related to the preservation of scholarly electronic texts and to text fluidity, study participants from the Electronic Text Centre at the University of New Brunswick interviewed representatives from the National Library of Canada, the Electronic Text Center at the University of Virginia, and Reed Elsevier.
The interview format was to ask each organization one or two questions, with each question having several suggested focal points. A sample introductory letter, which describes the project and the interview and outlines a question with focal points, is included below.
Questions were tailored to each organization in order to elicit information to which the organization could reasonably respond, which was relevant to the report, and which was unlikely to have been previously published. The questions directed to each organization are also included below.
The HSSFC (Humanities and Social Sciences Federation of Canada) has commissioned a group of academics and humanities computing specialists to look at a number of issues having to do with the credibility of electronic scholarly publishing in the Canadian context. The Electronic Text Centre at the University of New Brunswick Libraries, as its part in the project, is providing a report on archiving and on what the HSSFC has labelled text fluidity. In the group's proposal to HSSFC, the sections on archiving and text fluidity are characterized as follows:
2.3 Archiving
Electronic scholarly texts can be thought of as ordered collections of digital objects, comprised of such entities as texts, images, audio, and simulations. To read them may require specific hardware and software. In such an environment, the archiving challenges are many. This review will include considerations of migration strategies as well as issues related to version control and to text boundaries, such as referenced objects. The review will yield a list of identifiable strategies (technological and cultural) to deal with these intersecting needs. (Members of the University of New Brunswick Electronic Text Centre, currently involved in a larger study of journal metadata, citation linking and journal archiving, will contribute this section of the report, and that of Text Fluidity, with input from the entire team.)
2.3.1 Text Fluidity
Earlier studies have suggested that some academics perceive electronic publications as being texts that undergo continuous change and adjustment, and that this undermines concerns related to stability and archiving. The team will explore the consequences on traditional scholarship associated with the perception of what might best be called "textual fluidity." The team will also examine options for ensuring textual integrity in a fluid environment. This review will seek to reconcile the need for stability in archiving texts with a recognition that one of the features of the electronic text is that it can be "maintained" and kept up to date in a way that the print medium will not allow. In this context, it becomes possible to speak positively of the electronic text as "dynamic" rather than "unstable."
We thought it would be helpful to supplement our summaries and reactions to some of the literature on archiving and text fluidity with responses to questions from some relevant stakeholders in electronic scholarly publishing. We are asking for your participation.
(Specific questions were directed at each participant. Please consult the interviews below.)
Could you confirm your company's participation and whom we should contact? We will then look to arrange a time for a conference call. If possible, we would like to record your answers to ensure that we provide an accurate summary in our report. If you would prefer to provide a written response, we will add it as an appendix to our report.
Much of the information coming out of the three completed interviews is new and significant. The interview with the National Library gives a clear indication of the complexity of a national level digital preservation program and the issues the National Library is facing in trying to implement one. Reed Elsevier, a major commercial publisher of scientific print and electronic journals, is taking preservation seriously and talks about a focused, commercial approach to preserving their journal publications. David Seaman, Founding Director of the Electronic Text Center at the University of Virginia, discusses version control and preservation at a local level, how he sees their Center's mandate to preserve publications, and their ability to carry it out.
The interviews expand upon and illuminate some of the contentious issues in the sections, Archiving Scholarly Publications in Digital Form and Text Fluidity/Version Control. They illustrate just how complex these issues are in practice and how the long-established print paradigms inform, but do not resolve, the problems inherent in the transition to electronic scholarly publishing. They also demonstrate that the values from which the traditional model of scholarly publishing arose remain strong, even if there is little in the way of consensus regarding the best way of protecting these values in an emerging digital world.
Interview with Karen Hunter
The following is a transcription summary of a teleconference interview of Karen Hunter from Reed Elsevier by Alan Burk and Andy Pope from the Electronic Text Centre at the University of New Brunswick Libraries. The interview was held on the 8th day of August, 2000. This is not a literal transcription. Some parts of the interview were not included. For the included parts, changes were made to the wording, grammatical structures, and order of comments in order to make the interview more readable, but not to change or distort the content.
Reed Elsevier is a leading publisher and information provider with its principal operations in North America and Europe. The Reed Elsevier group employs over 26,000 people. Reed Elsevier's major objective is to deliver highly valued and demonstrably superior and flexible information solutions, increasingly via the internet. (http://www.reedelsevier.com/) Science Direct, frequently referred to in this interview, is a major Reed Elsevier service and is one where Karen Hunter has played a key developmental role. As their mission statement explains, Science Direct is a Web Information Source for scientific, technical and medical research that offers access to more than 1,100 journals across 16 fields of science. (http://www.sciencedirect.com/)
Karen Hunter has been with Reed Elsevier since 1976 and has concentrated for several years on strategic planning and the electronic delivery of journal information. She is, at present, Senior Vice President of Elsevier Science.
Questions for Reed Elsevier: Transcript Summary
How do you see your company's role in preserving its electronic publications? To help focus the question, we are interested in such issues as:
- The preservation of publications that may no longer have a direct market value.
- The possibility of having partners to share in the preservation role, such as libraries or archiving centres and how that might work.
- Appropriate levels of preservation for your publications (In a recent article on the preservation of electronic scientific serials, William Arms refers to three levels of preservation: conservation of the look and feel of the publication; preservation of the digital object and access to that object; and preservation of the content of a publication).
- Obstacles for you in preserving scholarly publications, whether they be social or technological.
Part 1: Version Control
Version control is an important issue for publishers of scholarly electronic texts. It is relatively easy technically to make changes to electronic objects, even after they have been published. What kinds of changes should be allowed to these objects, if any, and under what conditions?
Karen Hunter discusses these questions in the context of Elsevier's scholarly journal publishing.
Interviewer: Could you talk about version control, your thoughts on it and how Elsevier intends to handle it?
Karen: We get together with about six or eight publishers that talk on a regular basis about things like this; we're all struggling with it. I think all of us pretty much agree that when the publication first appears that then becomes the official publication. So, whether it first comes out in paper or electronically, the first release to the public is the official publication. If you catch an error that is the publisher's error, the inclination is increasingly to say that since we have an opportunity to fix it, we probably should fix it, and then link to an audit record that documents the correction. In the print world, a publisher's error once published stays that way; it is not corrected. There is, however, no unanimity on that; rather, this is a view that I am advocating for electronic publications in particular. I would rather give them the right version and show a link to what was wrong, than to give them the wrong version and make a file and a link to what's right. There is not a standard code of practice for handling publishers' errors, but however you do it you have to make sure that there is a good audit trail. You must provide a very clear indication of what happened and what was first released; that is, whatever went out first should stay unsullied. Publishers feel very strongly about this. Naturally it makes sense to do a certain amount of correction; for example, if you have a clinical medical journal and you, the publisher, publish an x-ray upside down by accident, it would be ridiculous to leave it that way permanently if you can fix it.
Interviewer: How exactly would you handle that sort of situation?
Karen: In this kind of situation, where you may have published an illustration, a table or a graph upside down, I would probably make the correction and insert a link to a box that would give the history of what has happened. However, if it's the author coming back and saying that he's had second thoughts on this, or that he doesn't agree with it, or he would really like to change that, I think that most publishers and editors would say that he can't go in and change the text, but that a link could be placed in the text which would connect to a box with the author's annotations indicating that he has had second thoughts.
Interviewer: Does that then become a new as opposed to a revised article?
Karen: That's right, it becomes a new article, or comments on the article, or something else. I'm not personally in favour of continuous revision of an article because it does not seem to make sound sense. I think you can append things to an article, but not totally change it.
Part 2: Archiving and Digital Preservation
Karen Hunter next discusses issues involved in the preservation and archiving of text and image-based documents. Hunter was asked to address this question from a commercial point of view, and to answer concerns related to both technical and social issues.
Interviewer: How do you see your company's role in preserving its electronic publications?
Karen: We have been working quite hard to focus on the question of preservation and digital archiving because in our minds it is the critical question which will determine the speed at which our library customers decide to go from print and electronic to electronic only. The questions that keep coming back, then, are: what is the archiving process; who is going to do the preserving; who is going to be responsible for it; and what kinds of actions do we need to pursue. Because it is both something we take seriously in our office and something we are regularly asked about by our customers, we started to put some significant time and effort into the issue about a year and a half ago, especially in terms of making policies and decisions.
Now, let me preface this somewhat because the things I want to say immediately have to do with our electronic journals for the most part (as opposed to books or other kinds of publications), especially in terms of the electronic journals that we make available through our Science Direct service. Science Direct has two different basic modes. We have a Science Direct on site where we physically deliver the files to the library. We have been doing that for about five years now. In effect, that takes care of the archiving question for those libraries; that is, since it is they who own the files, it is up to them whether they want to preserve the files, for how long they want to preserve them, and what they want to do with them. On the other hand, for the online files the question was what were we going to do with them; that is, would we take responsibility for archiving them and, if so, in what way and how. We decided that that is what we would do -- take on the responsibility of archiving and do so to such an extent that we would make it part of our license. Essentially, what we are saying is that we assume responsibility, and the public and legal commitment for the archiving. However, we also say that if for some reason we are bought out by someone who has no interest in doing this, or we decide in ten years that there is no economic reason for us to continue to offer files that are fifteen or twenty years old, we say in our license that we will turn the files over to one or more repositories selected by a board of library advisors.
Now, internally, it is easy to make a commitment, but it's another thing to make sure you live up to it, and so we have been organizing workshops to try to identify the problems associated with actually maintaining and migrating the files to ensure that everything that has to be saved is being saved, and is being saved in ways that make the files retrievable and reusable. In terms of these file repositories, to explain a little bit more, we have two. The first is what we call our electronic warehouse. This is our basic internal server, on which we have all the raw materials from which we can generate any of the journal products or any of the other products that we have. On the other hand, we have our distribution server which we developed with our sister company Lexis-Nexis in Dayton, Ohio. What we had to do was decide which of these really was the archives, the internal production facility or our online distribution facility. It was fairly unanimous among the people internally that it is the internal production facility wherein we have to have everything maintained and maintained permanently.
The next series of answers stemmed from questions regarding whether Elsevier had moved to an SGML document process, and what exactly its archiving process preserves: content alone, or content and presentation.
Karen: We generate both SGML and PDF files, keeping both of them in the warehouse as well as at Lexis-Nexis. When we render it in HTML we do so on the fly and put in all the links on the fly as well. The fact that we do everything on the fly means that what we are really saving is the raw material, not the printed product nor the online product as it appears on any given date. Essentially, the online product that you get today will not be the same one that you will have tomorrow.
In our minds, we are not preserving the look and feel of the online product forever; we will have the PDF file for as long as PDF is a viable technology and we are doing print, but if you ask me how long we are going to do print, the short answer is that I simply do not know.
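Karen's description of generating the HTML and its links "on the fly" from stored source material can be sketched in miniature. The markup format, function names, and link resolver below are hypothetical illustrations, not Elsevier's actual system; the point is only the architecture she describes: the archive keeps the structured raw material, while presentation and links are computed at request time and can change from day to day.

```python
# Illustrative sketch only (invented format and names): the archive stores
# structured source; HTML presentation and cross-links are generated at
# request time, so presentation can evolve while the stored source is fixed.
import re
import html

def render_article(source_xml: str, link_resolver) -> str:
    """Render a stored structured source to HTML, inserting links on the fly."""
    # Extract title and body from a minimal, hypothetical XML-like source.
    title = re.search(r"<title>(.*?)</title>", source_xml, re.S).group(1)
    body = re.search(r"<body>(.*?)</body>", source_xml, re.S).group(1)
    # Replace citation placeholders with whatever link targets are current today.
    body = re.sub(
        r"\[cite:(\w+)\]",
        lambda m: f'<a href="{link_resolver(m.group(1))}">{m.group(1)}</a>',
        body,
    )
    return f"<h1>{html.escape(title)}</h1>\n<p>{body}</p>"

source = "<article><title>On Archiving</title><body>See [cite:Smith1998].</body></article>"
print(render_article(source, lambda ref: f"https://example.org/resolve/{ref}"))
```

Because the anchor tags are computed each time, the rendered page retrieved tomorrow may differ from today's, exactly as Karen notes, even though the archived source is unchanged.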
Part 3: Metadata and the Archival Process
Karen: In addition to these processes and issues there are other things that we have decided to do but have yet to put into effect. For example, a question raised by one of our customers was this: if he wanted to go back and look at a journal from two years ago, would he be able to see who was on the editorial board at that time? The answer to his query, at the moment, would be no, because we only maintain the front matter information (the editorial board, information about authors, copyright statements, etc.) on a current basis. As long as we have the paper around, we can always refer the person in question to the print copy, but in anticipation of a time when we won't have the paper, we will have to keep a separate metadata file; that is, a historical file of things such as editors and editorial boards, or anything else that might be unique to any given journal.
Interviewer: When you are looking at preservation, what sort of time period do you have in mind; that is, how far are you looking into the future, or how do you deal with that question at all?
Karen: Well, if you realistically ask whether we will still want our files up in a hundred years, for us the issue becomes a question of whether we will still be in business or whether somebody else will be running things. So, the long and the short of it is that we have not really discussed it at length as a company right now. The working presumption at the moment is simply that we need to keep these things available for quite a long while.
Interviewer: Just to make sure that I clearly understand what you are saying, I would like to take a step back for a moment. When you were talking about preserving at the production end or the content end only and not on the presentation end, I assume in your view that this approach would cut down on some of the migration and emulation issues.
Karen: It will cut down on some of them, yes, because we're saying that we have to keep the raw material usable and move it along so that it stays usable in whatever presentation software that we are going to use, but we don't have to maintain the same presentation software.
Part 4: Archiving and Partnerships / Business Relationships
The following answers were offered in response to questions concerning the possibility of partners, such as libraries and archiving centres, sharing in the preservation role and how that might work, and also to questions about technological and social obstacles to preserving scholarly publications.
Karen: One of the things we have just started doing, although I have to talk more with those of our onsite customers who get all of our journals (such as the University of Toronto, Ohio Link in the United States, and a couple in Asia and southern Europe), is to ask them whether they would be willing to take on a formal commitment as archives for that material.
It's interesting, when you get into archiving discussions, especially within the last two years or so, to note how strong an emotional component there is. This whole issue is not just a logical one; that is, if you go logically down the path, what makes economic sense is not the only thing that makes sense of the situation. Most libraries don't have the money to do large-scale electronic archiving, but they want to do it. The heads of libraries who are sitting there feel torn. Traditionally, it's been their role; it's something they feel very strongly is a library's cultural role in terms of maintaining things. However, they don't trust us, Elsevier in particular or commercial publishers in general. And this, I believe, is for justifiable reasons; that is, we are businesses and we could go out of business. It's not unreasonable that they're worried. However, having said that, they don't know what to do, because they don't have the money to stand up and say "we'll do it," and at the same time they don't want us doing it. So if you follow down a logical line of discussion, there's no question that the least expensive strategy is to let the publisher do it where the publisher is willing. Then you are not replicating the cost of migration and storage and everything else. That is clearly the least expensive way for the whole industry to do things; however, it isn't necessarily comforting or satisfactory.
In addition, I've been having discussions with other potential partners and I've been talking with the Library of Congress as one possible partner. I was recently approached by Yale University (they are, by the way, an exception as there are a few of the major universities that are saying they do want to take on some archiving responsibility), and they have asked to run an experiment. As such, we are going to meet at the end of this month to look at what that means, what they want to do, and what's involved. In connection with this type of initiative, I am going to look for similar partners in Europe and in Asia. My goal would be to have maybe three or four national libraries, or something similar to national libraries, and then maybe one or two other major research institutions that are in one way or another already making some kind of commitment in terms of archiving.
Interview with the National Library of Canada
The following is a summary transcription of a teleconference held in August between members of the Electronic Text Centre and staff from the National Library of Canada. Present at this interview were Alan Burk, Andy Pope and James Kerr from the Electronic Text Centre, and from the National Library, Tom Delsey, Director General of Corporate Policy and Communications; Alison Bullock, Preservation and Copying Services; Elizabeth Martin, Collection Development Policy and Planning Officer; and John Stegenga, Project Leader, National Electronic Archive on Online Publications.
The summary below is not a literal transcription. Some parts of the interview were not included. For the included parts, changes were made to the wording, grammatical structures, and order of comments in order to make the interview more readable but not to change or distort the content.
During the course of the interview, frequent reference is made to a document published by the National Library of Canada: Networked Electronic Publications Policy and Guidelines (National Library of Canada, Electronic Collections Coordinating Group, October 1998; http://www.nlc-bnc.ca/pubs/irm/eneppg.htm). Reading this paper first will be helpful in fully understanding the comments of the interviewers and respondents.
The National Library has a mandate to preserve the Canadian published heritage, as evidenced in its mission statement: "The National Library of Canada is a federal institution located in Ottawa, established by Parliament in 1953, whose main role is to acquire, preserve and promote the published heritage of Canada for all Canadians, both now and in the years to come." (Mission Statement and Mandate of the National Library of Canada, http://www.nlc-bnc.ca/about/emandate.htm) Given this mandate, the National Library believes that it must obtain electronic copies of published materials selected for its permanent collections in order to ensure their preservation, and that preservation cannot be left to the publishers. In carrying out this mandate, the National Library faces a problem: the legal deposit regulations as spelled out under the National Library Act do not now extend to electronic publications (Networked Electronic Publications Policy and Guidelines). The legal deposit system gives the National Library the authority to acquire publications for its collections.
The interview below indicates some of the electronic preservation challenges the National Library and others are facing, as well as the commitment the National Library has towards preserving Canada's published heritage.
Questions for the National Library:
- What is the status of the 1998 report, Networked Electronic Publications Policy and Guidelines? Does that document reflect the National Library's current thinking?
- Has a funding stream been established for the long-term and short-term preservation of electronic publications?
- On page 6 of Networked Electronic Publications Policy and Guidelines, there is the statement that the task of collecting electronic publications by the National Library will be limited in two ways: first, through the assignment of an electronic collecting status; and secondly, through more restrictive collection criteria than for non-electronic publications. Could you talk to these distinctions? And, are they actually being used?
- Some types of electronic objects or collections of objects offer more of a preservation challenge than others. It might be helpful to look at some examples: first, whether the National Library would likely attempt to preserve them, and second, some of the issues involved. Examples include: the SchoolNet site, which is a relatively complex collection of Canadian sites, most of which are served from a SchoolNet server; NRC journals; electronic university calendars in the context of a move away from print production; and, finally, dynamically produced electronic content.
- What is the possibility of having partners share in the National Library's preservation role, such as libraries and commercial publishers; and, how might that work?
- What are the appropriate levels of preservation given your resources and mandate? (In a recent article on preservation of electronic scientific serials, William Arms refers to three levels of preservation: conservation of the look and feel of the publication; preservation of the digital object and access to that object; and preservation of the content of a publication.)
Interviewer: Hopefully the letter that I sent you gave you a context for what we're trying to do. Let me reiterate that we are conducting a series of interviews with some key stakeholders involved with the prevailing issues surrounding electronic publication and preservation of materials in Canada and internationally. We had a chance to interview Elsevier about a week ago, and we are now very pleased that the National Library has agreed to be interviewed as well.
We are hoping that you will be able to shed some light on what the National Library is doing in preparation for the very challenging task of preserving Canada's heritage. Given that preface, then, perhaps we can jump right in with the status of Networked Electronic Publications Policy and Guidelines.
Tom: Okay, well maybe Libby can update us on the status of that report and what has happened since the report was issued in 1998.
Libby: Well, to answer your basic question, does the document reflect the National Library's current thinking, the short answer to that is "yes." I should put this in context though; that is, when this policy was developed, the committee that developed it deliberately took the view that we would create a policy that was not restricted to what we could do technically or to the resources we have. As such, the implementation of it is a work in progress. We have a working group of operational people reviewing the policy on a regular basis, bringing up issues and problems associated with implementation, and of course most of these problems relate to the lack of the necessary human resources and the technical infrastructure to support what we would like to do. The direction, however, remains the same.
Tom: Allison, did you want to just add a little bit about some of the things that have been flagged in those working groups?
Allison: Yes. I sat on the working group for a while and one of the main things that we had identified as a central problem with the policy was the lack of resources that would be necessary to fully implement the policy. Essentially, we were concerned about the three collecting levels because we are mostly dealing with the archives level; however, the feeling is that those collecting levels are still valid and that we should strive to acquire original Canadiana more comprehensively than we currently do. We are also investigating access options, trying to put in a range of access options, rather than just either the restricted site or free access which is what we have at the moment. There's a lot of work that needs to be done in terms of preservation and procedures for preserving these publications. The upshot of the review, then, was that we decided the policy won't be revisited at this point. Having said that, we also realized that we do need to talk to publishers about things like significant change and enhanced functionality, especially regarding the business of capturing snapshots and different versions of things. In addition, the national and international programs branch is looking at national site licensing, and we're trying to improve the technical support through an Electronic Publications Management Systems project.
Interviewer: In trying to find funding streams for electronic preservation: do you have an idea of the potential numbers of objects that you are going to be archiving?
Tom: Maybe John can talk about volume, what we are dealing with currently plus any projections we may have.
John: We did a project last year in which we tried to estimate the resources necessary for acquiring all the publications out there which fall within our mandate as Canadian publications or publications of Canadian interest. It was a rather challenging exercise because there are no solid figures; everyone makes a guess at it, and so we joined in and guessed away. We estimated, on a very conservative level, that approximately 8,000 new electronic publications that we are interested in -- not web sites, but electronic publications -- come on stream every year. However, we had a counter-estimate that approximately 60,000 Canadian electronic publications come on stream every year. You can take it from there.
Tom: In addition to that, since 1995, when we actually began building the electronic collection, the collection has doubled each year. For example, the first year brought in approximately 500 or something like that. The second year we brought in 1,000 more; the following year we brought in 2,000 more; and, it has continued to follow that kind of pattern. However, it is worth noting that this only represents what we've actually been able to get our hands on and as such probably only represents a portion of what has been produced on a national scale.
John: Right, and we know from conventional publishing in the legal deposit area that we bring in approximately 30,000 new titles of conventionally published material per year; that is, hard-copy materials, which could be CD-ROMs, paper, etc. This may give you an indication as to where these figures would balance out after a couple of years of electronic publication. My impression, and it's simply an impression, is that the process of publishing conventional materials is slower than it is for electronic publication. Internet publishers can produce their works proportionately faster, thus making more new titles each year.
Interviewer: I'd like to ask for clarification on something that is stated in Networked Electronic Publications Policy and Guidelines. You mention in that 1998 report that the status of the legal depository arrangements for electronic publications was unclear. Has that been clarified?
Tom: No, not really. To the extent that we've had a legal opinion on this from Canadian Heritage, the lawyers were certainly doubtful that the current wording in the act would be sufficient to allow us to exercise legal deposit on publications that were only issued in a networked form. Essentially, the lawyers were of the opinion that there are too many references in the Act to copies and there are inferences there which, from a legal point of view, imply distribution of multiple copies in a more or less conventional sense. From a legal point of view, the understanding is that the definition of publication in the Copyright Act is actually restricted to things that are released and distributed in multiple copies, and excludes communication to the public (like telecommunication) as a form of publication. As such, the opinion we have is that to extend legal deposit to networked publications would require changes to the Act itself. That's why, for the time being at least, we have decided to deal with publishers on a voluntary basis. This was part of the motivation for having the consultation with online publishers back in January; that is, to test the climate for the possibility of having a voluntary, non-legislated program that would be systematic and comprehensive, but not necessarily involving legislation. The outcome of this process was that a clear message came back that the publishers would be more willing to deal with it on a voluntary basis as opposed to having it legislated.
Interviewer: If I could ask you to speculate a little, why might that be, in your opinion?
Tom: Well, my speculation would be that it would give them a little more control; that is, they would be able to establish some of the terms. Even with legal deposit we have to remember that legal deposit only provides for the deposit of a physical copy here, it gives us no copyright rights to it. We've been able to lend materials from the collection, because lending is not a prohibited use under copyright, and we've been able to do the amount of copying that would be justified under fair dealing or now under the library exceptions. Beyond that, legal deposit, as it exists now and certainly as it would if it were revised to include electronic publication, would only cover the deposit, it would not deal with access. Again, we have to look at the whole picture, and we need to be talking to the publishers on a voluntary basis about access, acquisition and preservation.
To go back to funding, for a moment, perhaps John can start out by discussing what is funded and how it is funded right now, and then we can talk about what isn't funded.
John: In my area, the acquisitions/intake area, funding for electronic publications is integrated into the salary and operational budget that exists for the processing of any other materials as well.
Tom: Maybe Allison has a few words to say as well about preservation funding more generally and then specifically about funding for electronic publication.
Allison: We don't have any funding stream established for the long-term preservation of electronic publications, or a separate funding stream for conventional publications. As I mentioned, the Electronic Publications Management Systems project is designed to improve our technical infrastructure, but we also recognize that the money set aside for that is not sufficient to allow us to develop a system that will manage these publications over the long term. As a result, we're almost invariably going to have to build on off-the-shelf solutions to deal with these publications in the longer term.
Tom: And, just to round out the funding issue, let me say that our base funding, the money that is in our regular budget, is not sufficient to deal with any of the scaling-up that is required to handle electronic publications. We have managed to do what we have done so far simply by mainstreaming, to the extent possible, through acquisition activity. As far as the actual storage and processing of the electronic publications go, we're basically using a web server for storing that collection, and there's not much in the way of software around it to facilitate the handling, the archiving, and so on. All of that requires very heavy manual intervention.
Interviewer: I assume you're mainly dealing with static documents?
Tom: There are a fair number of journals and serials. What we do in our arrangement with acquisitions is receive updates, although we archive each stage of the documentation. As such, if it's a conventional serial that's just having issues added to it, each issue will get archived, but if it's a publication that's being updated in a different way, sort of more like a looseleaf publication or something like that, we archive all of the versions and iterations. However, we don't replace previous iterations with current ones, so we're capturing the dynamic nature that way through snapshots essentially.
Allison: The HTML publications are mirrored and we just collect periodic snapshots. One of the problems has been determining what constitutes a significant change. Sometimes they'll replace an ad or something of that sort, and it requires that our acquisitions staff go and look at the publication to determine what it is that has changed. It's definitely one of the big challenges as publications move more and more towards an electronic format; that is, how we will distinguish different versions and define which versions to capture.
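The "significant change" problem Allison describes, distinguishing an ad swap from a real content change, could in principle be partly automated by hashing a normalized copy of each snapshot. This is a hypothetical sketch, not the National Library's actual procedure; the volatile-element patterns (ad blocks, update stamps) are invented for illustration and would have to be tailored per publication.

```python
# Hypothetical sketch: strip elements assumed volatile (ads, date stamps)
# before hashing, and archive a new snapshot only when the normalized
# content hash changes. Not an actual National Library procedure.
import hashlib
import re

def normalized_hash(html_text: str) -> str:
    """Hash the page with assumed-volatile elements removed."""
    text = re.sub(r'<div class="ad">.*?</div>', "", html_text, flags=re.S)
    text = re.sub(r"Last updated:.*", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_new_snapshot(previous_html: str, current_html: str) -> bool:
    return normalized_hash(previous_html) != normalized_hash(current_html)

old = '<p>Issue 3 contents</p><div class="ad">Buy now!</div>'
new_ad_only = '<p>Issue 3 contents</p><div class="ad">Sale today!</div>'
new_content = '<p>Issue 4 contents</p><div class="ad">Sale today!</div>'
print(needs_new_snapshot(old, new_ad_only))  # ad swap only -> False
print(needs_new_snapshot(old, new_content))  # content changed -> True
```

A scheme like this only shifts the judgment: staff would still have to decide, per title, which elements count as volatile, which is exactly the editorial question Allison raises.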
Tom: To continue with regard to funding, as I said, we need investment in infrastructure. This was flagged as a requirement for the Treasury Board a couple of years ago in one of our annual reference level updates, and we did get some temporary funding, which started last year, continued this year, and will continue into next year; but, that was less than half of what we had identified as the need to begin the investment in the platform for an electronic publications management system, EPMS. We have had another opportunity this year though to identify funding needs related specifically to capital investment and IT, and so we have rolled some of these costs for the electronic collection into that as well as some new costs associated with the digital library initiative that we've started. So, of course we don't have answers back on that yet, as it is funding that would start next fiscal year, if it comes through. That is, I guess, just another stream that we're trying to tap.
Interviewer: I'd like to ask one follow up question regarding funding. I wonder if you can really get a handle on your long term funding needs for your preservation and migration strategies, possibly even emulation strategies. Is that something that's going to become clearer over time or do you feel like you already have a fairly good understanding and appreciation of the long term requirements?
Tom: As we were discussing these questions, we made a distinction between archiving in the technical sense and preservation, and we weren't quite sure whether when you said preservation you were always meaning preservation the way we understand the term, as opposed to just simple storage of a document as it comes in.
Interviewer: I take it by preservation that you're referring to the sorts of things that are discussed in Allison's publication, 'Preservation of Digital Information'?
Allison: Yes we are, and I would say that no we don't have a good sense of what this is going to cost over the very long term. There's one study that came out of Britain, for example, about how much it would cost to preserve an electronic publication in comparison with a conventional publication which we used for estimating these costs. However, I suspect that some of the answers to those sorts of questions are going to come out of the work that's being done by Cedars, by the National Library of Australia, and so on. As we start to work with the publications we'll have a better sense of what we have to do and how much it's going to cost us.
Interviewer: One thing that struck me in terms of the preservation requirements that you describe in your publication is that it's extraordinarily ambitious in that you talk about not merely preserving content, but a whole series of other levels. When we spoke to Elsevier, they were fairly clear that the most they were going to hope for was preserving content. However, in effect you are saying that you aim to preserve presentation, functionality, authenticity, and even the bit stream. Some of those seem so expensive that I guess we're having a hard time getting a handle on how anyone could ever do that.
Allison: You're talking about the network note? (www.nlc-bnc.ca/publications/netnotes/notes60.htm) In that particular publication I was trying to describe the breakdown of various components which we are able to preserve very easily with a book, but will not be able to preserve so easily with electronic publication. Our ultimate goal is to preserve the look and the feel of the publication, but I don't think there's any sense that we're going to be able to do that for absolutely every publication. What's interesting, and Tom was mentioning this earlier, is that when we talked to publishers they were very concerned about maintaining the look and the feel of any publication.
Interviewer: Do you have any means of indicating to the public at this time what is being preserved? Secondly, what are your criteria for evaluating archiving status within your holdings?
Libby: For a great deal of our collections, we envision assigning a collecting status. The selection criteria will operate in tandem, so each title will carry an actual tag indicating to the user immediately what the National Library's preservation and access commitment to that particular publication is. However, even though at the moment we're not indicating that collecting status at all, we still consider the concept a valid one.
The other part of your question asked how the collecting or selection criteria are more restrictive for electronic publications than for other formats. Essentially, we discussed the various ways that we could restrict what we collected, and the discussion came down on the side of putting the emphasis on original publications -- those issued only in electronic format -- as being the ones of most concern in terms of our mandate and the publications most at risk of being lost; that is, once they were gone there would be no backup in terms of content. So those are the major restrictions -- we said we would aim to be comprehensive only in terms of original works, and we said that we wouldn't necessarily collect all editions or versions of one particular title. This is very different from our approach to other formats, where we try to collect everything. It's really the nature of the beast, so to speak, that determined our approach to these publications, along with the recognition that even now we're very ambitious in what we say we're going to do; had we not restricted it in these ways, we would never get anywhere and would be endlessly frustrated.
Tom: The reality though is a little different than what is in the policy and maybe John can just talk about the kind of things that are coming in and their relationship to printed versions or other versions.
John: Often when we go out and acquire materials that are electronic publications we run across this situation of duplication with conventionally published materials. We respond as closely as possible to the collection policy, but there are sometimes factors or arguments that enter into the picture which cause us to do things that deviate from the collection policy.
Tom: I think the bottom line is that in the electronic collection there is a greater proportion of material that actually does have a print or some other counterpart to it than the policy would lead you to believe, but as John says, that's just the nature of the acquisitions activity and the negotiations with publishers: everything that goes around, comes around.
Libby: Our plans are really to develop services and access to a wide range of publications. For example, we could use the linked status to act as a gateway or portal, with the National Library actually selecting the publications and organizing them around a subject. That was the thinking behind the policy -- that access was something that we wanted to develop, not just physical collections.
Interviewer: Where you collect a particular publication or document that has no metadata associated with it, what are you looking at as a kind of minimal cataloguing or metadata standard? Let's, for instance, say an English professor creates an electronic document in his office and puts it up on his local web site. You decide to collect it. Would you use MARC for the metadata or some other metadata approach?
Allison: Primarily, we looked at the list of metadata elements that were proposed by the Cedars project in Britain. You're probably familiar with them, there are 90 elements, some of which are already duplicated in the MARC records, so we pared them down to a set of probably 20 or so. In the context of this electronic publications management systems project, we tried to define something that we felt was manageable. We're certainly going to have to go after the publishers to give us more metadata than they currently do; otherwise it will be far too much legwork to try and figure out some of this stuff on our own.
Interviewer: So are you recommending a kind of an element set, or are you envisioning something different?
Allison: We hadn't reached that point. What we were trying to do was define a set that we thought would be manageable within this project in terms of disseminating that nationally. We haven't taken that extra step at this point though.
Tom: Alan, maybe I should clarify that we do keep metadata stored with, or as the access mechanism to, the electronic collection; that's what this is, and an upgrade of that metadata would be defined by this subset of Cedars. But once the electronic document is actually stored, we also create a full MARC record for it that goes into the Amicus database and is linked to the electronic document. So we're doing those things in parallel right now. There's a set of metadata that's created during the acquisitions process and is enhanced somewhat during the archiving and storage of the document. Then we move over to conventional cataloguing once the thing is in place and create the MARC data as well.
Allison: In terms of your question about MARC records and so on, one of the things that we've been doing in this project is trying to decide how much of it needs to rest in an electronic management system, how much would reside in the MARC record, where the overlaps are, and where it would best be managed; that is, whether it would best be managed by our Amicus system or by a separate EPMS.
Interviewer: Do you think that we are going to be seeing more publications that are dynamically produced and that as such are going to be increasingly more difficult to archive?
Allison: Yes. We talked to Statistics Canada, for example, and they're talking about custom publications. This definitely raises the question of what the National Library would collect as the publication if each one is different and created in response to a particular need.
Tom: And I think we will see the same thing with the educational publishers, because the technology allows them to customize things to specific audiences, or even to a specific course that a professor is giving. I think the nature of the market, especially the 'infotainment' market, is going to change things such that we'll have much more complex, functional kinds of documents, but they will still fall within the category of articles in a lot of cases, especially if they have any kind of educational angle to them.
You had also asked in that fourth question specifically about NRC journals and university calendars -- maybe John can address those issues as well as maybe going back to speak briefly about SchoolNet.
John: With SchoolNet we have an agreement with Industry Canada wherein approximately once a year they take a copy of all the new and updated SchoolNet sites, put these sites on a CD-ROM, and send it to us; we then take the year to load these SchoolNet sites onto our server. We run into some problems with that because there are often missing links and we often can't go back to the publisher of origin, or that publisher of origin is no longer identifiable, or sometimes during the course of the year these SchoolNet sites change and we can only capture them on a snapshot basis. From the very start, on the other hand, we have archived all fourteen of the NRC journals, two of which are part of an experiment to see whether public access to them has an impact on their revenue stream.
Interviewer: The Text Centre at the University of New Brunswick has responsibility for producing the University calendars. This is carried out using an SGML encoding and database approach, with the Web presentation being dynamically constructed. As such, the calendars would be very hard to archive without our involvement and cooperation.
Tom: This leads tidily into your other question about partnerships as well. Would this be a case where a partnership deal could be worked out?
Interviewer: I was struck by a couple of publications and some comments made by Clifford Lynch concerning how there's going to have to be a number of data centers established if we're going to be able to archive the world's heritage. It's a very challenging problem so I'm quite sure that a number of people and institutions will be necessary to carry it through.
John: Does the university in fact have any responsibility or recognize any need to archive those dynamic things on its own?
Interviewer: I don't think the university does, but the library definitely has a role in that regard.
Tom: This is a very interesting point concerning the difference in attitude between the university and the library. Interesting because it makes me wonder whether the university thinks of the calendar as a publication, as something that needs to be stored or archived at all.
Interviewer: My guess would be that they're still producing a print copy and that the Text Centre is basically producing a file that can produce both print and electronic. As such, if they thought about archival copies at all it would be in terms of the print.
Tom: And I think as the organizations that we're dealing with begin to move more towards electronic publishing as the main way they do things -- rather than as either a secondary way or one that's always backed up by print -- it seems to me that they're going to have to deal with questions relating to both short term and long term archiving. As such, if we're looking for solutions regarding the handling of these more complex and dynamic documents, I would suspect it's not going to be just the National Library that's looking for those solutions. There are going to be publishers themselves who will also be searching, and hopefully we'll be able to share solutions, because administratively, legally and from a business point of view, everybody's going to have to deal with these problems. As a result, we've begun to have tentative discussions about partnerships and other kinds of cooperative endeavours. Maybe John or Allison could talk a little bit as well about some of those and then I can follow up with some of the discussions I've had with CANCOPY and PWGSC (Public Works and Government Services Canada) and Heritage.
John: We had issued a letter to a handful of private, online and commercial publishers to engage them in a dialogue on the question of archiving their electronic publications. We recognized that there were a lot of challenges inherent in such an offer, if you will. We don't necessarily claim to have all the answers by any means, but together I think it's a step forward and typifies the type of partnership that we're seeing emerging. We have another project as well that we've done some work on, called the Electronic Clearing House. In that project we are engaging publishers to deposit the electronic versions of their texts, which eventually come out in print on paper, as if it were a repository of electronic manuscripts. In this model, the National Library will keep these electronic manuscripts and serve as a clearing house so that alternate format producers can come in and borrow these electronic manuscripts in order to make alternate formats that serve the disabled community in Canada. Those would be some examples of partnerships.
Tom: As an extension to that, I've had one discussion with Fred Wardle from CANCOPY, who had already been meeting with people in Canadian Heritage and with PWGSC in the context of developing a copyright clearance function -- in the case of CANCOPY for private sector publications, and PWGSC for federal government publications. Although this was a very tentative first discussion, they do seem to be interested, possibly over the mid to longer term. Some arrangements could be made, for example, where CANCOPY would serve a clearance function and act as the collector of royalties for the publishers who are in the CANCOPY stable, as it were. As such, CANCOPY would link to them, and the user, having cleared the rights, would actually access the documents at the publisher's host site. What CANCOPY sees, and a number of the publishers acknowledge, is that the publications won't be maintained as archival files forever, and so there seems to be the possibility that the National Library could act as a kind of third party there. At the consultation with online publishers there was certainly a great deal of reluctance to have the National Library act as a kind of broker and provide access to publications unless the publisher clearly had no more interest in them and was willing to allow it. They seem very reluctant to have the National Library play an access role even if there were tracking and remuneration involved.
Interview with David Seaman
The following is a transcription summary of an in-person interview of David Seaman by Alan Burk and Andy Pope from the Electronic Text Centre at the University of New Brunswick's Libraries. The interview was held in Fredericton, New Brunswick on the 25th day of August, 2000. This is not a literal transcription. Some parts of the interview were not included. For the included parts, changes were made to the wording, grammatical structures, and order of comments in order to make the interview more readable but not to change or distort the content.
David Seaman is Founding Director of the Electronic Text Centre at the University of Virginia. He is an expert on humanities computing, and writes, consults and speaks frequently on aspects of the field. Information about David's work has appeared in a number of serials, including The Economist and The Chronicle of Higher Education (for more information about David, see: http://etext.lib.virginia.edu/staff/dms8f.html).
The Electronic Text Centre at the University of Virginia, world renowned as a scholarly publishing and research center, has published tens of thousands of XML and SGML texts and digital images. The Centre's Web pages have tens of thousands of users every day.
Question for David Seaman and the Electronic Text Centre at the University of Virginia
- We would like to talk with you about issues related to the archiving or preserving of scholarly sites, using the University of Virginia as an example. We are also interested in the questions of version and edition control for a scholarly site and its publications.
As a preface to this interview summary, here is a brief exchange between Alan Burk and David Seaman which articulates clearly why it is that Seaman was interviewed and how he sees both his own situation in terms of these interviews and how he sees the issues in general.
Interviewer: With our interviews we wanted to question a range of electronic publishers and stake-holders from a large commercial scholarly publisher to a National Library to a university-based publishing center and archives. Given your extensive experience with scholarly publishing and humanities computing as Director of the Electronic Text Centre at the University of Virginia, we thought you might be able to shed some light on the difficult archiving and maintenance issues challenging managers of scholarly Web sites and those committed to their long term survival.
David: When I saw this list of questions what struck me first of all was that I really did not have anything particularly stunning to say about them which, probably given what I do, suggests that these are among the more critical things for someone like HSSFC to be examining. These two questions regarding version control and preservation of digital files are two of the most unexamined areas, and if you were talking about almost anything else I suspect -- cataloguing, collection development, user services, markup and data creation, methodologies and best practices -- you'd be snowed under with URLs to look at, software to evaluate, people to talk with and so forth. As such, the more I look at these, the more it strikes me that these two issues are in need of fairly immediate and preferably national attention.
Part 1: Version Control
Version Control is a critical issue for managers of scholarly web sites. There is not only the question of creating and formulating policies for managing changes to the site and its objects, but also dealing with the concerns of the scholarly community as they relate to the reliability and permanence of scholarly digital objects.
David: The question you have here about fluidity regarding a document's instability or dynamism, that is, whether it's a negative or a positive thing, is an important one. We at the University of Virginia's Etext Centre build content to data tagging standards precisely so that it is future-proofed; that is, it is ready to be turned into something else when the landscape moves on. For example, the TEI texts that we worked on here at UNB last week for the Summer Institute (http://www.lib.unb.ca/Texts/SGML_course/Aug99/index.html) have already been turned into eBooks by the Electronic Text Centre here at the University of New Brunswick. We created these with the knowledge that there were non-web, non-HTML browser mechanisms for delivering them, because eBooks are with us now, but files created through the Institute three years ago are equally amenable to eBook-ing should anyone want to sit down and do it. In this regard one can see how fluidity is essential; that is, the landscape moves, and if your data is not malleable and nimble then it's going to get stuck. This is why some specialists regard fluidity as actually being a survival mechanism; that is, if it is not fluid, you are done for.
Another issue related to this is what happens when revisions or corrections are made in this fluid environment. Generally speaking, a computer program can tell you precisely what the change was. It may not be able to tell you who made it, but you can know what it was; so, administratively, we can control versioning even if we don't date stamp or what-have-you. Where it's usually much less evident is on the front end. In this instance it's very often the case that a user coming to a piece of scholarship does not necessarily know when that item was created or when it was last revised. They may know when it was first created; the title page may say, for example, that it was 1997, but they have no real confidence that it hasn't been tinkered with since.
Interviewer: So, you need some sort of revision history?
David: You need some sort of a revision statement that is part of the primary act of reading and using that document; that is, not something that is buried away in the background, but something that is up-front and will easily and obviously tell you when the thing was last changed. Even though this type of material should be very 'in your face' information, it's usually at the bottom of the prime page. The issue here is that one reacts differently to information pages that were last revised several years previous. On the whole, I don't think that we do a very good job of letting our users know how fixed some of that content is. It would be difficult, for example, in much of the available online scholarship to have an author write to you in an ad hoc way and say, could somebody go in and just tidy this phrase up or fix this fact. And maybe this is not even a bad idea. However, it's also not hard to imagine a journal article that goes up and is commented on after the fact. The commentary is critical, but correct. The author makes the changes to accommodate the criticism, and then I come along and read the criticism, go off to the same URL, read the article, and can't understand what on earth the guy's complaining about, because the fixes have been made. Some of these things are software controllable. There are software packages that will do version control for you. However, I suspect that the software is only a tool, and so, before we make good use of it, we really need to incorporate, somewhat more firmly, protocols for responsible versioning.
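Seaman's point that "a computer program can tell you precisely what the change was" can be sketched with Python's standard difflib; the sample passages and version labels below are invented for illustration, not drawn from any real publication:

```python
import difflib

# Two versions of a (hypothetical) article paragraph, split into lines.
v1997 = ["The study surveyed 40 journals.", "Results were inconclusive."]
v2000 = ["The study surveyed 60 journals.", "Results were inconclusive."]

def revision_report(old, new):
    """Return unified-diff lines describing exactly what changed."""
    return list(difflib.unified_diff(old, new,
                                     fromfile="1997", tofile="2000",
                                     lineterm=""))

report = revision_report(v1997, v2000)
# Lines beginning "-" show the old reading, "+" the revised one.
```

A report like this, surfaced as an up-front revision statement rather than buried in the background, would let a reader see exactly which passages changed between dated versions.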
Interviewer: What might be the effects of the software in terms of how that might look to the user; that is, how would they recognize its use?
David: What I was thinking of, and I don't know this field particularly well, was not so much software that would be evident to the user, but more so to the people who are maintaining the data. There's software that, for example, will set very elaborate permissions on files, or even pieces of files for different audiences. It will allow you to login as the author and change some things but not others, or it will allow you to make sure that several people are not working on a file simultaneously.
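The maintenance-side safeguard Seaman mentions, making sure that several people are not working on a file simultaneously, can be illustrated with a minimal advisory lock. The file names are invented, and real version-control systems of the period (RCS, CVS) implement this far more elaborately:

```python
import os
import tempfile

def acquire_lock(path):
    """Atomically create a lock file; fail if another editor holds it."""
    try:
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def release_lock(path):
    """Remove the lock file so the next editor can proceed."""
    os.remove(path + ".lock")

# Hypothetical document under shared maintenance.
doc = os.path.join(tempfile.mkdtemp(), "essay.xml")
first = acquire_lock(doc)    # first editor gets the lock
second = acquire_lock(doc)   # a second editor is refused
release_lock(doc)
```

The O_CREAT|O_EXCL combination is what makes the check-and-create step atomic, so two editors cannot both believe they hold the lock.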
Part 2: Digital Preservation and Archiving
Preservation and the archiving of scholarly electronic publications involve a number of issues, including social and technical considerations. In this section, David Seaman covers a number of crucial ideas and arguments and gives his opinions on these matters, based on his experiences as Director of the University of Virginia's Electronic Text Centre.
Interviewer: Could you talk about some issues relating to the archiving or preserving of scholarly sites, perhaps using the University of Virginia as an example?
David: Electronic archiving long term is probably the least examined of the processes in most digital library operations that I'm aware of. We are still too often living in a honeymoon period where we figure we'll worry about this next year. However, when you do come to incorporate archiving firmly into your operation then there are at least some overriding points that I can talk to.
The first is that it's critical that the people who do the archiving are the people who have preservation skills; that is, librarians. Computer centers are very good at not losing files in the short term, but they have no real tradition of long term archiving in the way that a library conceives it. When you get to the point where you're migrating simply from device to device, or perhaps from format to format, the overwhelming critical piece is some reliable metadata describing what those objects are. We talk routinely in courses that teach about building standards-based data, SGML and XML data being excellent examples, of how important the metadata piece of those items will be for short term retrieval and searching. But, it's equally important when you come to move or manipulate those files that you have a system in place that allows you, as you create the files, to say in a very controlled way what you need to say about them; that there is administrative metadata that may not be something that benefits the user coming to those materials, but is critical when you come to move them.
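The administrative metadata Seaman describes, recorded in a controlled way at creation time so that objects can later be moved and verified, might look like the following minimal sketch. The element names here are invented for illustration; they are not the Cedars set or any library standard:

```python
import hashlib
import json

def admin_metadata(name, payload, source_format):
    """Record what a file is and a checksum to verify it after migration."""
    return {
        "identifier": name,
        "format": source_format,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }

record = admin_metadata("essay001.sgml", b"<essay>...</essay>", "SGML")
sidecar = json.dumps(record)  # stored alongside the object itself

def verify(payload, record):
    """After moving the file to new storage, confirm it arrived intact."""
    return hashlib.sha256(payload).hexdigest() == record["sha256"]
```

The point of the sketch is simply that migration from device to device is only safe when the object carries a record of what it is and a way to check that it is unchanged.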
Interviewer: Could you comment on who is doing archiving for scholarly Web sites?
David: At the moment, in a purely local sense, archiving has been done in a very ad hoc way and as such it's very much up to the whims of the centre's sense of itself. I suspect, however, that it's not going to be done well until, as it is with so many other things that we rely on, it's actually done by the people in our systems with the professional expertise. It's perhaps comparable to something like cataloguing with no professional cataloguing help. Any of us can fill in metadata templates and do Dublin Core and do TEI headers and so forth; but, without the professional expertise of our metadata specialists our online files would be much poorer. I suspect archiving and preservation is going to go in much the same way. It's perfectly possible for somebody like me, who is running a digital operation in isolation, to do something in our library with saving bits and pieces of stuff, but it doesn't really rise to the level of professional archiving and preservation until our preservation departments inform what we do.
Interviewer: Have you been approached by anyone within the university or from another organization to join in a coordinated archiving effort?
David: Not really, no. When we are approached it tends to be from a purely technological angle; that is to say that we've dramatically improved and coordinated the way that we manage the equipment on which our materials reside. For example, there are disk farms that are shared and are managed and backed up centrally by our computer departments. The library contracts this support, sort of. As such, we can predict the availability of space for incoming data with great facility; however, I haven't had anybody contact me about archiving, either from the library world or from the commercial one. It's very rare that someone comes to me with a piece of software that is designed precisely for these tasks. So, no, I don't know offhand of anybody doing anything very serious.
Interviewer: For the long and short term, then, who is going to have to take responsibility for the archiving of publications off of scholarly sites such as yours? Will you have to take responsibility?
David: I think the library eventually has to do it, in part because libraries are best equipped to do it. We have a tradition of not losing things, which is not necessarily true of most of the rest of the world, and it's certainly not true of publishers. It's just not in their business model to save things, even if those things may be very valuable. It is, on the other hand, within the mandate of libraries to do so. It takes resources and some really hard work to come up with national and international minimum standards and the usual array of best practices. As such, it's hard to imagine a community outside the information management and library community that could do this and that really has the requisite background in other media. There are some media-specific challenges with archiving, but I would turn to the people who have done the same things with books, with LP records, and with video and audio tapes. The medium is certainly part of the problem, but I am sure that many of the skills you learn working in other areas translate, as they have in so many other library disciplines.
Interviewer: You mentioned it would certainly be a good idea if there were policies indicating what's worth archiving and not worth archiving, do you know if anyone's done anything like that?
David: I certainly don't know of a document that spells those things out. We make those decisions, however, in our own operations. We may not call it archiving, but we certainly make decisions about which faculty web sites move from being run from a faculty account to ours. We make a clear distinction for ourselves, at least, between content that is suited to that sort of delivery and content that really ought to be run by that faculty member. In some cases it's simply us working with the data to maintain it. The faculty member doesn't necessarily know or care what's happened, because it doesn't affect the member's immediate needs or uses of it. In many cases, for larger, more ambitious projects, it's a matter of working with that person to explain what you're doing. Eventually they do some of it themselves. But it's hard to do this, because it's typically not the case that a scholar working on a scholarly edition or a set of essays is going to be focused on the library-specific needs of that data over the long term. The scholar is caught up in doing the scholarly work and gives very little thought to what he can do to make his data play well with other data like it. Our job, then, is often to provide faculty with some help and some project guidance. At best, what this means is that they do what we want; at worst, it means that they give us something that we can work with that has at least had some input from us. Certainly one of the things you can do as a means of investment is to provide them with a server and a support structure that is under the control of the library; however, what usually happens if you don't do that is that they show up anyway, late in the project. They then say: here, archive this, it is too important just to be on my site any longer, you've got to look after it for me. That's really where it gets expensive. So we'd much rather be involved from the beginning.
Politically it doesn't do the library any harm either to be seen as the place that steps up to the plate to take that responsibility and be a partner in those things.
Interviewer: Have you had any situations where you've been left with a scholarly work or something that a faculty member has done and now the faculty member is gone, retired, so that in effect the site is static?
David: Yes, it happens most immediately of course with projects that are not faculty projects, but are graduate student projects. We don't see it as our mission to archive every student project. However, there certainly are occasional ones, even undergraduate projects, that are perhaps worthwhile. It's certainly possible with some of those projects that their shelf life is not forever. Consequently, we re-evaluate them periodically to see whether they are still worth being a part of our live digital offerings. On the other hand, we've never thrown away physical books just because we think they are out of date, so there may be other values to those sites. Perhaps, going back to your earlier question, in five years' time they may be ideal examples of what someone was doing five years ago. In those cases we lock them down. We don't work on them any longer except in small ways. On occasion, if they are looking particularly dated but are still live intellectual content, we may go in and tart up the look and feel a little bit, but they're still essentially dormant. It's amazing, in fact, how often we've looked at our web statistics and discovered that something that we haven't touched for three or four years is still getting huge amounts of use. One way of deciding if something is worth keeping, then, is by popular vote.
Ultimately, the point is that although we don't have a policy statement on this in writing, we do make decisions about what we are prepared to invest in. One of the things that gives you the luxury of being able to say this doesn't look like something we really want to move into the library is that a faculty member, completely independent of us, can run it from their own account. From that account, they can put up material like course listings, syllabi, or rough drafts of papers they want to share with colleagues. These are things that in past years you would circulate and photocopy; they're prepublications in a very real sense. We don't have any role really in brokering that sort of stuff; but, when it rises above that level, there is at least the opportunity for the library to be a repository. Dissertations and theses are of course another large area where things that are not ready for prime time for many years suddenly become signed off on in a very literal sense and inevitably become our responsibility.
It seems to me, then, that there's an opportunity for the library to step up and be the institution that does say, this is what we do, this is our role, and we can do it better than anybody else.
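Seaman's "popular vote" criterion, consulting web statistics to decide what is still worth keeping live, amounts to little more than a hit count per page. A toy sketch, with invented log lines standing in for a real server log:

```python
from collections import Counter

# Hypothetical access-log requests; real logs would be read from a file.
log = [
    "GET /projects/whitman/intro.html",
    "GET /projects/whitman/intro.html",
    "GET /projects/student99/home.html",
    "GET /projects/whitman/intro.html",
]

# Tally requests by path and find the most heavily used page.
hits = Counter(line.split()[1] for line in log)
most_used, count = hits.most_common(1)[0]
```

A dormant project that still ranks highly in such a tally is, by this criterion, earning its continued place in the live collection.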
As evidenced by our summary papers and the comments made by those we interviewed, issues surrounding text fluidity and the archiving and preserving of electronic scholarly objects are important and complex. They call for systematic investigations by North American stakeholders and the international community. The problems are complex for many reasons: the ever-increasing number of publications; the many different types of scholarly electronic objects, from journal articles to scholarly sites; the number of different formats which are being used to represent the data, some of which are proprietary; and the different types of hardware these objects reside on, hardware which may cease to exist within short spans of time. This is a partial list and does not even include the larger social issues that may be an even more formidable barrier to the development of a national level solution.
Given the scope of this study and the resources available, we do not feel that we are in a position to make recommendations on an agenda for the preservation and archiving of Canada's scholarly electronic publications. However, there are several broad recommendations that we can make based on our research to date:
- On October 2, a General Stakeholders Meeting on Research Data Archiving, Management and Access was held in Ottawa, sponsored by the Social Sciences and Humanities Research Council and the National Archives. Its purpose was to make recommendations to the federal government on the structure of a new national data archiving facility. Data in this context is meant in a very broad sense, encompassing a variety of media, including texts. If HSSFC is to have a stake in an emerging national policy on preservation, it should ensure that it is involved in this ongoing consultation and also work with the National Library and the National Archives, who are developing their own programs.
- Five Centres at five universities are investigating the possibility of working in collaboration to create a network of Research E-Text services across Canada, somewhat along the lines of the Oxford Text Archive. The initiative would play both archiving and repository roles. Working with HSSFC, SSHRC, the National Library, CARL and other organizations, this initiative could provide a powerful infrastructure for preserving Canada's scholarly electronic texts and objects. This collaboration might want to consider aligning servers (say, three or four) across institutions along format lines. For example, one location and server would deal with ASCII and XML/SGML encoded text-like files; another with proprietary and related files, such as PDF, PostScript, and WordPerfect; a third with hyperlinked pages; and a fourth with collections of multimedia objects. A distributed archiving model would allow sites to specialize their services and software for managing, distributing, and preserving their respective electronic objects.
- Designing methodologies to deal with text fluidity and the preservation of scholarly digital objects must be carried out at the international level. In reviewing and addressing these questions, HSSFC must also be tied into that international community, working with such groups as NISO (the National Information Standards Organization).
- Metadata for managing, tracking, and presenting version control information needs to be designed by the international community. This is crucial if there is to be any confidence in electronic publications as a scholarly vehicle. Metadata should also be designed for managing the peer review of electronic scholarly publications and sites. HSSFC could take a lead in this if sufficient resources were made available.
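To make the version-control recommendation concrete, the sketch below shows what a minimal version-metadata trail for an electronic publication might look like. It is purely illustrative: the field names and structure are hypothetical, not a proposed standard, and any real scheme would be designed by the international standards community as recommended above.

```python
from dataclasses import dataclass
from typing import Optional
import hashlib

@dataclass
class VersionRecord:
    """One entry in a hypothetical version-control metadata trail
    for an electronic publication. All field names are illustrative."""
    version: str                 # e.g. "1.1"
    date_issued: str             # date this version was published
    checksum: str                # fingerprint of the archived file
    supersedes: Optional[str]    # version identifier this one replaces

def fingerprint(content: bytes) -> str:
    """Derive a checksum so later readers can verify that the archived
    text has not drifted from the version that was peer reviewed."""
    return hashlib.sha256(content).hexdigest()

# A two-stage trail: the revised text explicitly supersedes the original,
# so each stage of the fluid text is preserved and identifiable.
v1 = VersionRecord("1.0", "1999-06-15", fingerprint(b"original text"), None)
v2 = VersionRecord("1.1", "2000-03-02", fingerprint(b"revised text"), "1.0")
```

The point of the sketch is that version control is a metadata problem: each archived snapshot carries its own identifier, date, and fingerprint, plus a pointer to the version it replaces, so the evolution of a fluid text can be reconstructed.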
Journal Articles
- Alexander, Adrian, and Marilu Goodyear. "Changing the Role of Research Libraries in Scholarly Communication." The Journal of Electronic Publishing 5.3 (March 2000) : 31 pars. 13 Sept. 2000
- Arms, William Y. "Preservation of Scientific Serials: Three Current Examples." The Journal of Electronic Publishing 5.2 (December 1999) : 47 pars. 13 Sept. 2000
- Bearman, David. "Reality and Chimeras in the Preservation of Electronic Records." D-Lib Magazine 5.4 (April 1999) : 15 pars. 13 Sept. 2000
- Beebe, Linda, and Barbara Myers. "The Unsettled State of Archiving." The Journal of Electronic Publishing 4.4 (June 1999) : 95 pars. 13 Sept. 2000
- Bernbom, Gerry, Joan Lippincott, and Fynnette Eaton. "Working Together: New Collaborations Among Information Professionals." CAUSE/EFFECT 22.2 (1999) : 27 pars. 13 Sept. 2000
- Day, Michael. "Metadata for Digital Preservation: An Update." Ariadne 22 (December 1999) : 39 pars. 13 Sept. 2000
- Duranti, Luciana, and Heather MacNeil. "The Protection of the Integrity of Electronic Records: An Overview of the UBC-MAS Research Project." Archivaria 42 (Fall 1996). 29 Nov. 2000
- Hedstrom, Margaret. "The Role of National Initiatives in Digital Preservation." RLG DigiNews 2.5 (October 1998) : 11 pars. 13 Sept. 2000
<http://www.rlg.org/preserv/diginews/diginews2-5.html#feature2>
- Hodge, Gail M. "Best Practices for Digital Archiving: An Information Life Cycle Approach." D-Lib Magazine 6.1 (January 2000) : 63 pars. 13 Sept. 2000
- Moore, Reagan, Chaitan Baru, Arcot Rajasekar, Bertram Ludaescher, Richard Marciano, Michael Wan, Wayne Schroeder, and Amarnath Gupta. "Collection-Based Persistent Digital Archives - Part 1." D-Lib Magazine 6.3 (March 2000) : 50 pars. 13 Sept. 2000
- Moore, Reagan, Chaitan Baru, Arcot Rajasekar, Bertram Ludaescher, Richard Marciano, Michael Wan, Wayne Schroeder, and Amarnath Gupta. "Collection-Based Persistent Digital Archives - Part 2." D-Lib Magazine 6.4 (April 2000) : 37 pars. 13 Sept. 2000
<http://www.dlib.org/dlib/april00/moore/04moore-pt2.html>
- Payette, Sandra and Carl Lagoze. "Value-Added Surrogates for Distributed Content: Establishing a Virtual Control Zone." D-Lib Magazine 6.6 (June 2000) : 53 pars. 14 Sept. 2000
<http://www.dlib.org/dlib/june00/payette/06payette.html>
- Phillips, Margaret. "Ensuring Long-Term Access to Online Publications." The Journal of Electronic Publishing 4.4 (June 1999) : 48 pars. 14 Sept. 2000
- Smith, Abby. "Preservation in the Future Tense." CLIR Issues 3 (May/June 1998) : 46 pars. 14 Sept. 2000
- Thomas, Timothy. "Archives in a New Paradigm of Scientific Publishing: Physical Review Online." D-Lib Magazine 3.5 (May 1998) : 25 pars. 14 Sept. 2000
Papers and Conference Proceedings
- Eiteljorg, Harrison. "Preserving Electronic Data: an Active Archival Process." Electronic Media Special Interest Group. 1st Session. San Diego, California. (June 13, 1997) 20 Sept. 2000
<http://palimpsest.stanford.edu/sg/emg/he_paper.html>
- Greenstein, D. "A framework for sharing digital preservation practice." Digital Library Federation (18 April 2000) 14 Sept. 2000
<http://www.clir.org/diglib/preserve/pracshare.htm>
- Greenstein, D, and D. Marcum. "Minimum Criteria for an archival repository of digital scholarly journals." Digital Library Federation. (17 April 2000) 20 Sept. 2000
- Hedstrom, M., and S. Montgomery. "Digital Preservation Needs and Requirements in RLG Member Institutions." A study commissioned by the Research Libraries Group. RLG (December 1998). 20 Sept. 2000
- Lamont, Melissa. "Here Today, Gone Tomorrow? Preserving Electronic Government Information into the Future." 63rd IFLA General Conference, Conference Programme and Proceedings, 31 August - 5 September 1997. 20 Sept. 2000
- Treloar, Andrew. "Scholarly Publishing and the Fluid World Wide Web." Paper delivered at the AUUG '95 and Asia-Pacific World Wide Web Conference.
Reports and Miscellaneous Publications
- Bullock, Alison. Preservation of Digital Information: Issues and Current Status. Network Notes #60. Information Technology Services. National Library of Canada (April 22, 1999) 14 Sept. 2000
- Electronic Collections Coordinating Group. Networked Electronic Publications Policy and Guidelines. National Library of Canada (October 1998) 29 Nov. 2000
- Hedstrom, Margaret. "Digital Preservation: A Time Bomb for Digital Libraries." (1997) : 38 pars. 13 Sept. 2000
- Hodge, G. and B. C. Carroll. "Digital Electronic Archiving: the State of the Art and the State of the Practice." A Report Sponsored by International Council for Scientific and Technical Information. Information Policy Committee and CENDI. Information International Associates Inc. (Date Created: 26 Apr 1999) 20 Sept. 2000
- Lynch, Clifford. "Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust." Authenticity in a Digital Environment. Council on Library and Information Resources. Washington, D.C. (May 2000) 20 Sept. 2000
- Matthews, G., A. Poulter, and E. Blagg. "Preservation of Digital Materials: Policy and Strategy for the UK." JISC/NPO Studies on the Preservation of Electronic Materials. British Library Research and Innovation Centre, 1997. ISBN: 0-7123-3313-4, ISSN: 1366-8218. British Library Research and Innovation Report 41. 20 Sept. 2000
- Rothenberg, Jeff. "Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation." A Report to the Council on Library and Information Resources. Commission on Preservation and Access. (January 1999) 20 Sept. 2000
- Rothenberg, J. and T. Bikson. "Carrying Authentic, Understandable and Usable Digital Records Through Time." Report to the Dutch National Archives and Ministry of the Interior. RAND-Europe. (Date Created: 6 Aug 1999) 20 Sept. 2000
- Rothenberg, Jeff. "An Experiment in Using Emulation to Preserve Digital Publications." RAND-Europe / The Koninklijke Bibliotheek, Den Haag. (Date Created: April 2000) 20 Sept. 2000
- Smith, Wendy. "Future Access - Long Term Preservation of Australian Electronic Legal Publications." A paper prepared for ELI 97. Standards and Issues in the Electronic Publication and Dissemination of Legal Information. College of Law, Sydney. (5 April 1997)
Initiatives and Projects
- Arts and Humanities Data Service
- "The Arts and Humanities Data Service (AHDS) is funded by the Joint Information Systems Committee of the Higher Education Funding Councils in the UK to collect, preserve, and encourage re-use of high-quality data resources which result from or support research and teaching in the arts and humanities. The AHDS Executive provides access to this page on digital preservation and AHDS publications on the creation, preservation and use of digital resources. A study on "Digital Collections: a strategic policy framework for the creation and preservation of digital resources" was undertaken for the UK Digital Archiving Working Group by the AHDS Executive."
- The Cedars Project
- Under the overall direction of CURL (the Consortium of University Research Libraries), and with funding from JISC/CEI through the eLib programme, the CEDARS project aims to address the strategic, methodological, and practical issues of digital preservation and to provide guidance for libraries on best practice. The project will produce guidelines for developing digital collection management policies and for preserving different classes of digital resources, together with an analysis of the cost implications of digital preservation, and will run pilot projects to test and promote its chosen preservation strategy. The CEDARS Data Preservation Strategies Working Group is looking at issues that include the strategies available for migration, emulation, and data refreshing for a variety of materials, including CD-ROMs. The CEDARS project home page provides links to project information and a glossary of commonly used terms.
- The Cedars Project Team has now developed an outline specification for metadata to ensure long-term preservation for digital materials. This outline is now available for public consultation and the project team invites individuals or organisations interested or involved in long-term preservation of digital resources to read and provide feedback on this outline document.
- European Commission on Preservation and Access
- EPIC is a gateway for information on work directed at the preservation of the documentary heritage in Europe: first of all paper-based materials, but also sound, film, photographs, and digital archives. The focus is on preservation as an integral element of collection management. It deals with collections rather than single items, and with issues like mass conservation, conversion, or substitution rather than traditional restoration techniques. However, because successful preservation management relies on sound practical knowledge and research, due attention is also given to technical aspects.
- The InterPARES Project
- The InterPARES Project is a major international research initiative in which archival scholars, computer engineering scholars, national archival institutions, and private industry representatives are collaborating to develop the theoretical and methodological knowledge required for the permanent preservation of authentic records created in electronic systems.
- (PADI) The National Library of Australia's Preserving Access to Digital Information
- The National Library of Australia's Preserving Access to Digital Information (PADI) initiative aims to provide mechanisms that will help to ensure that information in digital form is managed with appropriate consideration for preservation and future access.
- Its objectives are:
- 1. to facilitate the development of strategies and guidelines for the preservation of access to digital information;
- 2. to develop and maintain a web site for information and promotion purposes;
- 3. to actively identify and promote relevant activities; and
- 4. to provide a forum for cross-sectoral cooperation on activities promoting the preservation of access to digital information.
- The PADI web site is a subject gateway to digital preservation resources. It has an associated discussion list for the exchange of news and ideas about digital preservation issues.
- Research Libraries Group (RLG, US): The RLG Preservation Program (PRESERV)
- "A not-for-profit corporation of US institutions devoted to improving access to information that supports research and learning." Its preservation service, PRESERV, is a series of projects and activities that support local institutions in their efforts to preserve, and thereby improve access to, endangered research materials. Current projects include digital archiving, preservation and reformatting, digital image capture, preservation issues of metadata, and preserving magnetic media. PRESERV focuses on achieving collaborative and innovative solutions to common preservation problems by inventing and testing new models for cooperation. Working groups established to achieve PRESERV's strategic plan include those on Digital Archiving, Preservation and Reformatting Information, Digital Image Capture, Preservation Issues of Metadata, and Preserving Magnetic Media.
- UKOLN: The UK Office for Library and Information Networking <http://www.ukoln.ac.uk/>
- UKOLN, based at the University of Bath, is a national focus of expertise in digital information management. It provides policy, research and awareness services to the UK library, information and cultural heritage communities. They have published a wide variety of papers and reports dealing with digital preservation needs and the long term preservation of electronic materials.