
Batch ingest steps

Page history last edited by Monica 5 years, 5 months ago

Pre-Ingest Steps
Ingest for new items (Files + Metadata)
Ingests to add files to existing items (Files only)
Bitstream mapfile (masters or other supplemental files; TXT format)
Re-Ingest / replacement requests
Metadata-only Batch Imports

Pre-Ingest Steps

Applicable to any ingest type.

Confirm filenames follow best practices (no spaces or special characters) and match project conventions (use a consistently defined prefix/suffix syntax). See File naming conventions
Validate that the total number of items in the metadata spreadsheet equals the total number of digital files; if there are multiple files per item, validate at the digital object level (dc.identifier.digital = the filenames’ prefix number).
Confirm access file types are appropriate for the document type:

dc.type	dc.format.medium	bitstream format
Image	photographs, etc.	*.jpg / *.jp2
Text	pamphlets, etc.	*.pdf

Investigate any odd pairing (for example, why would a text-based document be a JPG instead of a PDF?).
However, there are legitimate exceptions for seemingly mismatched pairings. For example, a book of photographs may be classified as the “Image” document type and use the “PDF” file format for easier display of content. In such cases, an explanation is usually included in the dc.description field.
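
The pairing check above can be sketched in Python. This is only an illustration: the `EXPECTED_EXTENSIONS` mapping repeats the two rows of the table, and the function name and the (dc.type, filename) input shape are assumptions, not part of any DSpace tool. Flagged rows are candidates for review, not errors, since legitimate exceptions exist.

```python
from pathlib import PurePath

# Expected access-file extensions per dc.type, copied from the table above.
EXPECTED_EXTENSIONS = {
    "Image": {".jpg", ".jp2"},
    "Text": {".pdf"},
}


def odd_pairings(rows):
    """Return (dc_type, filename) pairs whose extension is unexpected.

    `rows` is an iterable of (dc_type, filename) tuples, for example read
    from the metadata spreadsheet. Types missing from EXPECTED_EXTENSIONS
    are skipped rather than flagged.
    """
    flagged = []
    for dc_type, filename in rows:
        expected = EXPECTED_EXTENSIONS.get(dc_type)
        if expected is None:
            continue
        if PurePath(filename).suffix.lower() not in expected:
            flagged.append((dc_type, filename))
    return flagged
```

For each flagged row, check whether dc.description explains the choice before changing anything.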

Confirm digital files tie out

There is at least one access file per item in the metadata spreadsheet.
Where both JPEG and JPEG 2000 are used, total *.jpg = total *.jp2.
If master versions are available, there is a corresponding derivative file per master (if there are multiple masters per access version, tie out at the object level).
Tip: create a directory listing (cmd) and then use an Excel pivot table to summarize by document type vs. file extension. Investigate anything odd. See tips: Extract filenames and file sizes
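
As an alternative to the directory listing plus pivot table, the tie-out can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that a JPEG and its JPEG 2000 counterpart share the same filename stem.

```python
from collections import Counter
from pathlib import Path, PurePath


def list_files(folder):
    """Return the names of all files under `folder`, including subfolders."""
    return [p.name for p in Path(folder).rglob("*") if p.is_file()]


def extension_counts(filenames):
    """Count files per lower-cased extension, e.g. {'.jpg': 500, '.jp2': 500}."""
    return Counter(PurePath(name).suffix.lower() for name in filenames)


def unmatched_stems(filenames, ext_a=".jpg", ext_b=".jp2"):
    """Return stems that have a file in one format but not in the other."""
    stems_a = {PurePath(n).stem for n in filenames
               if PurePath(n).suffix.lower() == ext_a}
    stems_b = {PurePath(n).stem for n in filenames
               if PurePath(n).suffix.lower() == ext_b}
    return sorted(stems_a ^ stems_b)
```

Calling `extension_counts(list_files(folder))` gives the per-extension totals, and `unmatched_stems` names the files to investigate when the totals differ.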

Confirm any items without masters are clearly documented as such in the dc.digitization.specifications field (entered as a separate value, “No master version”).

For new collections, please create the collection in DSpace prior to the ingest request. Please consult with DSS staff before creating new top-level IR communities.

For any new metadata fields, request an update to the metadata registry before ingest. We recommend reviewing the existing list of available fields before requesting a new one.
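
The filename and item-count checks listed at the start of these pre-ingest steps can also be scripted. The sketch below is an assumption-laden illustration: the allowed character set (ASCII letters, digits, dot, underscore, hyphen) stands in for the project's actual naming conventions, and the function names are hypothetical.

```python
import re

# Assumption: "no spaces or special characters" is read as ASCII letters,
# digits, dot, underscore and hyphen only. Adjust to the project's rules.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def unsafe_filenames(filenames):
    """Return filenames containing spaces or other disallowed characters."""
    return [name for name in filenames if not SAFE_NAME.fullmatch(name)]


def items_without_files(digital_ids, filenames):
    """Return dc.identifier.digital values with no file starting with them.

    This is a plain prefix match, so an ID such as 'abc1' would also match
    'abc10_01.jpg'; IDs of uniform length avoid that ambiguity.
    """
    return [digital_id for digital_id in digital_ids
            if not any(name.startswith(digital_id) for name in filenames)]
```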

Ingest for new items (Files + Metadata)

General

Use the Simple Archive Format (SAF) for batch ingests. Detailed steps are provided in Google Docs (staff access only).
Only COPY files to server folders. This avoids permission issues that may be inherited from source folders.
Batch ingests are restricted to one collection per job. If you have items to ingest that span multiple collections, you will need to request/create separate ingest ZIP packages.
If the zipped SAF is less than 400MB, it can be ingested via the UI.
For larger ingests, please request a command line ingest (programmer).
When requesting a programmer ingest:

Provide the collection URL for the ingest in the Lighthouse ticket (i.e., where in the IR do you want your images/files to appear?). Please feel free to discuss options for IR structure with DSS staff.
Include a short description of the ingest type (e.g. ingest of 500 new items, all PDFs, or a mixture of masters and access versions, etc.)
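
The 400MB rule above is easy to check before choosing a route. The sketch below is hypothetical and assumes "400MB" means 400 × 1024 × 1024 bytes; the page does not say which definition the UI uses, so leave some margin near the limit.

```python
from pathlib import Path

# Assumption: "400MB" is interpreted as 400 MiB.
UI_LIMIT_BYTES = 400 * 1024 * 1024


def ingest_route(size_bytes, limit=UI_LIMIT_BYTES):
    """Return 'ui' for packages under the limit, else 'command line'."""
    return "ui" if size_bytes < limit else "command line"


def route_for_zip(zip_path):
    """Look up the size of a zipped SAF package and pick the route."""
    return ingest_route(Path(zip_path).stat().st_size)
```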

Bitstream description mapfile (Optional; XML format)

Bitstream descriptions are optional. They may be used when more than one access type is ingested (mp3, pdf, jpg, multiple PDFs, etc.).
The mapfile should have two columns, with headers "filename" and "description" (lower case). Include the full filename with extension.
Though there is no set limit on the length of the description field, this text will appear next to the files displayed on the item page. Shorter text is better.

Note! Save this mapfile in XML format (save as “XML spreadsheet”, NOT “XML data”) and use the filename “descriptions.xml”.

Example:

TIP: A bitstream mapfile can easily be created using EXIFTOOL or simple DIR commands.
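
The two-column listing can also be drafted with a short Python script. The sketch below only produces CSV text with the "filename" and "description" headers. It does not write the XML spreadsheet itself, so the result still needs to be opened in Excel and saved as “XML spreadsheet” under the name descriptions.xml. The labels and function names are hypothetical.

```python
import csv
import io

# Hypothetical short labels per extension; replace with project wording.
LABELS = {
    "pdf": "Transcript (PDF)",
    "mp3": "Audio recording (MP3)",
    "jpg": "Image (JPEG)",
}


def describe_by_extension(name):
    """Return a short description based on the file extension, or ''."""
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return LABELS.get(extension, "")


def description_rows(filenames, describe=describe_by_extension):
    """Build rows with the lower-case headers the mapfile expects."""
    rows = [("filename", "description")]
    rows.extend((name, describe(name)) for name in sorted(filenames))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text for opening in Excel."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```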

Primary bitstream mapfile (Optional; TXT format)

May be used when there are multiple access files and you wish to designate one file to always appear first in the list of bitstreams.
Primary bitstream mapfiles should be saved in *.txt format.
No header, just a list of filenames including extension.
Save with the filename “primary.txt”.

Example
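
As a hypothetical illustration of the format described above (no header, one filename with extension per line), the body of primary.txt could be generated as follows. The filenames in the usage note are invented.

```python
from pathlib import PurePath


def missing_extension(filenames):
    """Return names that lack a file extension."""
    return [name for name in filenames if not PurePath(name).suffix]


def primary_mapfile_text(filenames):
    """Build the primary.txt body: no header, one filename per line."""
    bad = missing_extension(filenames)
    if bad:
        raise ValueError(f"filenames need an extension: {bad}")
    return "".join(f"{name}\n" for name in filenames)
```

For example, `primary_mapfile_text(["item001.pdf", "item002.pdf"])` yields two lines, which can then be written out with `Path("primary.txt").write_text(...)`.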

Ingests to add files to existing items (Files only)

Use this type of ingest when supplemental files need to be added to existing items, for example adding audio files for an oral history or Shepherd performance where the initial item contains only a PDF of the transcript or of the performance program.
Create a Lighthouse ticket to request this type of ingest.
Provide the programmer with the location of the files on the server, the mapfile and (optionally) a bitstream description mapfile.
Indicate whether you wish the files to be added to the ORIGINAL bundle for public display or the MASTER bundle for preservation.
NOTE! If requesting master files, provide a mapfile for master only items. See Preserving Master Files in the IR

Bitstream mapfile (masters or other supplemental files; TXT format)

A mapfile should contain two columns of data, matching the digital ID to the URI handle ID (not the full web address).

Example Mapfile Contents:

Mapfiles should be saved in *.txt format (tab delimited).
The first column should be the digital ID, followed by the handle.
No headers.
Save the mapfile with a distinct name, such as “mapfile” dash “CollectionName/ID” (e.g. mapfile-HAAA.txt or mapfile-1911-36136.txt).
The mapfile should ONLY contain data for items with associated files for ingest.
Please prepare a mapfile for EACH collection. For communities composed of multiple collections, each sub-collection must have a separate mapfile.
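
The mapfile rules above lend themselves to a quick pre-submission check. The sketch below is illustrative only. It assumes handles take the numeric prefix/suffix form suggested by the example filename (such as 1911/36136), and the function name is hypothetical.

```python
import re

# Assumption: handles look like "1911/36136" (numeric prefix and suffix),
# not a full web address such as http://hdl.handle.net/1911/36136.
HANDLE = re.compile(r"^\d+/\d+$")


def mapfile_problems(text):
    """Return (line_number, message) pairs for lines that break the rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 2:
            problems.append((number, "expected two tab-separated columns"))
            continue
        digital_id, handle = (column.strip() for column in columns)
        if not digital_id:
            problems.append((number, "missing digital id"))
        if not HANDLE.fullmatch(handle):
            problems.append((number, "handle should look like 1911/36136"))
    return problems
```

A header row fails the handle check, so the "no headers" rule is covered as well.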

Re-Ingest / replacement requests

If you need to replace pre-existing files, provide a bitstream mapfile (see steps above).
If you also need to modify related metadata, use the metadata batch import tool (see steps below).

Note on impacts of re-ingest: As of 10/18/2012, statistics data is preserved at the item level. The new re-import code preserves date.accessioned, date.available, description.provenance, and any item mappings to multiple collections. However, it does not preserve bitstream statistics, since these are unique to the individual file that is being replaced.

Metadata-only Batch Imports

Metadata-only ingests are made via the UI.
When creating a metadata import spreadsheet, include columns for ID and Collection. See detailed guidelines at Metadata batch process
Use the double pipe symbol (||) to separate multiple values.
Files may then be added manually to the newly created items.
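
A metadata import spreadsheet with double-pipe-separated values can be drafted in Python. The sketch below is an illustration under assumptions: the lower-case column names "id" and "collection", and the "+" marker for new items, follow the usual DSpace batch metadata CSV convention. The collection handle and metadata values in the usage note are invented, so confirm the exact layout against the Metadata batch process guidelines.

```python
import csv
import io

SEPARATOR = "||"  # double pipe between multiple values in one cell


def join_values(values):
    """Join non-empty values into one cell, trimming stray whitespace."""
    return SEPARATOR.join(v.strip() for v in values if v.strip())


def metadata_csv(rows, fieldnames):
    """Serialize row dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A row might look like `{"id": "+", "collection": "1911/00000", "dc.subject": join_values(["Houston", "Texas"])}`, where every value shown is hypothetical.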
