Local Guidelines for uploading master files
Our cultural heritage materials are scanned at high resolution, resulting in very large master files. Past practice for long-term storage has been to back up master files to an independent, secure file server outside the IR. Access to this server is heavily restricted, transferring files is cumbersome, and it can be difficult to track down a particular file because the relationship between the master file and the IR record is not always obvious. Furthermore, we receive a number of patron requests for copies of master-quality files, which makes prompt retrieval of these masters a priority for supporting user needs.
To better manage our master files, we are now transferring them to the IR environment for long-term management. In implementing this new storage procedure, we will limit access to masters to library staff only, and a custom ingest script has been developed to add master files to existing records (in lieu of re-ingesting the entire collection). Although this adds a very large amount of data to the IR environment, we do not anticipate any impact on IR performance, since the masters are hidden from the public and the search function searches metadata and extracted text, not the actual files.
Systematically linking master files directly to their metadata using the IR infrastructure (versus storing TIFFs loosely on a backup server)
Providing direct access to master versions for archivists on an as-needed basis
Including masters in the DuraCloud preservation strategy (masters are deemed higher-risk assets, as they are difficult and costly to re-create if lost or damaged)
For more information, please see Presentation slides: Digital Preservation: Tales from the Precipice between theory and practice
Preference is for sustainable file formats that retain the greatest quality (note 1). It is also recommended to embed key metadata within the digital file itself. Please see more on embedding photometadata.
| Media | Preferred |
| --- | --- |
| Audio | WAV and/or Broadcast Wave @ 24-bit depth, 96 kHz sampling rate |
| Video (note 2) | Motion JPEG 2000 (ISO/IEC 15444-3) (*.mj2), or AVI (uncompressed, motion JPEG) (*.avi), or QuickTime Movie (uncompressed, motion JPEG) (*.mov); 4:4:4 data sampling method |
| Images / Text (note 3) | TIFF (uncompressed), or JPEG2000 (lossless) (*.jp2); 24-bit RGB color depth; 300 ppi. Note: resolution rates may vary depending on physical source (1) |
Note 1: Sustainability of Digital Formats: Planning for Library of Congress Collections. Caroline R. Arms and Carl Fleischhauer. http://www.digitalpreservation.gov/formats/
Note 2: Video or moving image "master" formats may vary greatly depending on a number of factors. These may include: quality of source files, sustainability of source formats (e.g. codecs), evaluation of content for preservation (is access quality sufficient?), etc. Each collection will need to be assessed in order to recommend a "master" version. For more information please see Audio-Video processing workflow
Note 3: Text-based materials are typically scanned as TIFF images (which become the "master" quality files), and access versions are typically OCR’d PDFs converted from those TIFF files.
Batch ingests are performed by system admins, but require some initial data gathering and arrangement by collection curators.
1. Arrange digital files: master files may be organized into object subfolders (e.g. many master files per item, such as the pages of a book) or stored within one main directory (e.g. many single TIFF files, such as individual photographs).
2. Prepare mapfile. A mapfile should contain two columns of data, matching the digital ID to the URI (handle).
Example Mapfile Contents:
3. Create a Lighthouse ticket to request a batch ingest from the DSpace programmers. Describe the arrangement of files (e.g. are files arranged in subfolders?), and provide the mapfile or its location on the server, along with the collection URL where these files should be added.
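A minimal sketch of sanity-checking a mapfile before attaching it to the ticket. The layout is an assumption: one whitespace-separated pair per line, digital ID first and handle second. The function names and the sample IDs and handles are hypothetical, and the format the ingest script actually expects should be confirmed with the DSpace programmers.

```python
def parse_mapfile(text):
    """Parse mapfile text into a list of (digital_id, handle) pairs.

    Assumed layout (not confirmed by the ingest script): one pair per line,
    two whitespace-separated columns, digital ID first and handle second.
    """
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        columns = line.split()
        if len(columns) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(columns)}"
            )
        pairs.append((columns[0], columns[1]))
    return pairs


def find_duplicate_ids(pairs):
    """Return digital IDs that appear on more than one line, sorted."""
    seen, duplicates = set(), set()
    for digital_id, _handle in pairs:
        if digital_id in seen:
            duplicates.add(digital_id)
        seen.add(digital_id)
    return sorted(duplicates)


# Hypothetical sample data for illustration only.
SAMPLE = """\
photo_0001 123456789/101
photo_0002 123456789/102
"""

if __name__ == "__main__":
    pairs = parse_mapfile(SAMPLE)
    print(pairs)
    print("duplicate ids:", find_duplicate_ids(pairs))
```

Rejecting malformed lines early keeps a typo in the mapfile from surfacing later as a missing master.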
Mapfiles can be prepared in a number of ways. The examples below provide different methods for identifying master files and their corresponding digital object IDs. This information can later be used to validate that the master ingest was successful.
If the collection was created following current file management practices, then the data for a mapfile already exists as part of the item metadata record (Method 1). This may not be the case for legacy collections or collections created by vendors; please see Methods 2 or 3 for such cases. Methods 2 and 3 also capture data at the file level, which can help in troubleshooting cases where the total number of files ingested does not match the total number of master files on the server.
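Run command: DIR *.tif /s /b /a-d >output.txt (/s includes subfolders, /b lists bare paths only, /a-d excludes directories; the listing is saved to output.txt)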
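For environments where the DIR command is not available, a rough Python counterpart is sketched here. It lists TIFF files recursively and also counts them per folder, which is useful later when validating the ingest. The function names are illustrative, the match is on names ending in .tif (case-insensitive), and treating each folder as one object is an assumption that only holds for collections arranged in object subfolders.

```python
import os
from collections import Counter


def list_tiffs(root):
    """Return sorted full paths of every *.tif file under root (recursive)."""
    paths = []
    for folder, _subfolders, filenames in os.walk(os.path.abspath(root)):
        for name in filenames:
            if name.lower().endswith(".tif"):
                paths.append(os.path.join(folder, name))
    return sorted(paths)


def count_by_folder(paths):
    """Count files per containing folder (assumed here to be one object each)."""
    return Counter(os.path.dirname(path) for path in paths)


if __name__ == "__main__":
    tiffs = list_tiffs(".")
    # One path per line, similar to the bare listing DIR /b produces.
    for path in tiffs:
        print(path)
    for folder, count in sorted(count_by_folder(tiffs).items()):
        print(folder, count)
```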
Overview: Once the ingest is complete, verify that all masters were uploaded. Compare the master counts from the curation task with the directory listing of digital files taken from the server. If the counts do not match, investigate the differences and re-run the ingest for any missing masters.
| DSPACE | | SERVER |
| --- | --- | --- |
| Count of master files by item | Is equal to | Count of digital files by object |
Identify which handles report mismatched counts of master files: look in the directory and compare it to the bundle in DSpace to see what is missing. Create a new mapfile for the missing TIFFs and send it to the programmer for ingest.
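The comparison described above could be sketched as follows. Both inputs are assumptions about how the data has been gathered: one dictionary maps each handle to the master count reported by the curation task, the other maps each handle to the file count from the server directory listing (joined to handles through the mapfile). The handles and counts shown are hypothetical.

```python
def find_mismatches(dspace_counts, server_counts):
    """Return {handle: (dspace_count, server_count)} for every handle whose
    counts differ; a handle missing from one side is counted as 0 there."""
    mismatches = {}
    for handle in sorted(set(dspace_counts) | set(server_counts)):
        in_dspace = dspace_counts.get(handle, 0)
        on_server = server_counts.get(handle, 0)
        if in_dspace != on_server:
            mismatches[handle] = (in_dspace, on_server)
    return mismatches


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    dspace = {"123456789/101": 12, "123456789/102": 1}
    server = {"123456789/101": 14, "123456789/102": 1, "123456789/103": 3}
    for handle, (in_dspace, on_server) in find_mismatches(dspace, server).items():
        print(f"{handle}: DSpace has {in_dspace}, server has {on_server}")
```

Only the handles returned here need a follow-up mapfile, which keeps the re-ingest limited to the missing masters.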
Example: Results from Curation Task:
The task, file counts was completed with the status: Success.
total count = 47 files
Tip: copy and paste the above data into Excel and use “Text to Columns” to parse out the handle (item identifier) from the count.
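As an alternative to Excel, the same parsing could be scripted. This is only a sketch: the per-item line layout (a handle followed by its count, such as "123456789/101 = 12 files") is an assumption, because the exact output of the curation task is not reproduced here. The total line follows the "total count = N files" form shown in the example results. All handles and per-item counts below are hypothetical.

```python
import re

# Assumed per-item line: a handle (prefix/suffix) followed by its count,
# e.g. "123456789/101 = 12 files". This layout is hypothetical.
ITEM_LINE = re.compile(r"(\d[\d.]*/\d+)\D+?(\d+)")
# The total line follows the format shown in the example results.
TOTAL_LINE = re.compile(r"total count\s*=\s*(\d+)")


def parse_task_output(text):
    """Return ({handle: count}, reported_total or None) from task output."""
    counts, total = {}, None
    for line in text.splitlines():
        total_match = TOTAL_LINE.search(line)
        if total_match:
            total = int(total_match.group(1))
            continue
        item_match = ITEM_LINE.search(line)
        if item_match:
            counts[item_match.group(1)] = int(item_match.group(2))
    return counts, total


if __name__ == "__main__":
    # Hypothetical output for illustration only.
    sample = (
        "123456789/101 = 12 files\n"
        "123456789/102 = 35 files\n"
        "total count = 47 files\n"
    )
    counts, total = parse_task_output(sample)
    for handle, count in counts.items():
        print(handle, count)
    print("sum matches reported total:", sum(counts.values()) == total)
```

Checking that the per-item counts add up to the reported total is a quick way to catch lines the pattern failed to parse.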
The final step is to delete the master files (and the Project Folder, if this ingest was a one-time upload). This is an important step, as it frees up needed space on the project server for processing new collections.
Suitable for single file upload or when there is a very small number of master files to be added.
1. Log into DSpace (must have collection admin privileges).
2. Navigate to the item.
3. Under “Context”, select “Edit item”.
4. At the Edit Item screen, select “Item bitstreams”.
5. Select “upload a new bitstream” (located at the bottom of the list).
6. In the “Upload a new bitstream” screen:
   - change Bundle type to Masters
   - browse to add the file (see recommended file types above)
   - press the Upload button
You should receive the on-screen notice “The new bitstream was successfully uploaded.” and the file will appear under the Bundle: MASTER group in the Bitstream window.