Batch ingest steps

Page history last edited by Monica 5 years, 5 months ago Saved with comment


 

Pre-Ingest Steps

applicable to any ingest type

 

  • Confirm filenames follow best practices (no spaces or special characters) and match project conventions (consistent use of the project's defined prefix/suffix syntax). See File naming conventions

  • Validate that the total number of items in the metadata spreadsheet equals the total number of digital files. If there are multiple files per item, validate at the digital object level (dc.identifier.digital = the filenames’ prefix number).

  • Confirm access file types are appropriate for document type

 

dc.type    dc.format.medium     bitstream format
Image      photographs, etc.    *.jpg / *.jp2
Text       pamphlets, etc.      *.pdf

 

    • Investigate any odd pairing (for example, why would a text-based document be a JPG instead of a PDF?).

    • However, there are legitimate exceptions for seemingly mismatched pairings. For example, a book of photographs may be classified as the “Image” document type and use the “PDF” file format for easier display of content. In such cases, an explanation is usually included in the dc.description field.

 

  • Confirm digital files tie out:

    • There is at least one access file per item in the metadata spreadsheet.

    • In cases where both JPEG and JPEG 2000 are used, total *.jpg = total *.jp2.

    • If master versions are available, there is a corresponding derivative file per master (if there are multiple masters per access version, tie out at the object level).

    • Tip: create a directory listing (cmd), then use an Excel pivot table to summarize by document type vs. file extension. Investigate anything odd. See tips: Extract filenames and file sizes

 

  • Confirm any items without masters are clearly documented as such in the dc.digitization.specifications field (entered as a separate value, “No master version”).

 

  • For new collections, please create the collection in DSpace prior to the ingest request. Please consult with DSS staff before creating new top-level IR communities.
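
The filename and tie-out checks above can be sketched in Python as an alternative to the directory listing and pivot table. This is only a minimal sketch: the allowed-character pattern, the list of access extensions, the rule that a filename begins with its digital ID followed by "_", "-" or the extension, and all function names are assumptions for illustration, not part of the documented workflow.

```python
import re
from collections import Counter
from pathlib import PurePath

# Assumed convention: letters, digits, underscore, hyphen, and one extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")

# Assumed access formats, taken from the table of document types above.
ACCESS_EXTS = (".jpg", ".jp2", ".pdf")


def unsafe_filenames(filenames):
    """Return filenames that contain spaces or special characters."""
    return [name for name in filenames if not SAFE_NAME.match(name)]


def count_by_extension(filenames):
    """Count files per lower-cased extension (stands in for the pivot table)."""
    return Counter(PurePath(name).suffix.lower() for name in filenames)


def belongs_to(filename, digital_id):
    """Assumed rule: the file stem is the id, or the id plus '_' or '-' and more."""
    stem = PurePath(filename).stem
    return stem == digital_id or stem.startswith((digital_id + "_", digital_id + "-"))


def items_without_access_file(digital_ids, filenames, access_exts=ACCESS_EXTS):
    """Return the digital ids that have no access file at all."""
    access = [n for n in filenames if PurePath(n).suffix.lower() in access_exts]
    return [d for d in digital_ids if not any(belongs_to(n, d) for n in access)]
```

Comparing count_by_extension(files)[".jpg"] with count_by_extension(files)[".jp2"] covers the JPEG vs. JPEG 2000 tie-out; anything returned by the other two functions needs investigation before the ingest request.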

 

 

Ingest for new items (Files + Metadata)

 

General 

 

  • Use the Simple Archive Format (SAF) for batch ingests. Detailed steps are provided in Google Docs (staff access only).
  • Only COPY files to server folders. This avoids permission issues that may be inherited from source folders.
  • Batch ingests are restricted to one collection per job. If you have items to ingest that span multiple collections, you will need to request/create separate ingest ZIP packages.
  • If the zipped SAF is less than 400MB, it can be ingested via the UI.

  • For larger ingests, please request a command-line ingest (programmer).
  • When requesting a programmer ingest:
    • Provide the collection URL for the ingest in the Lighthouse ticket (i.e., where in the IR do you want your images/files to appear?). Please feel free to discuss options for IR structure with DSS staff.

    • Include a short description of the ingest type (e.g., ingest of 500 new items, all PDFs, or a mixture of masters and access versions).
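
As a rough illustration of the size rule above, the following Python sketch decides whether a zipped SAF package can go through the UI or needs a programmer ingest. It assumes "400MB" means 400 * 1024 * 1024 bytes; the function names are hypothetical.

```python
import os

# Assumption: "400MB" is read as 400 * 1024 * 1024 bytes.
UI_LIMIT_BYTES = 400 * 1024 * 1024


def ingest_route(size_bytes, limit=UI_LIMIT_BYTES):
    """Return "ui" for packages under the limit, otherwise "command line"."""
    return "ui" if size_bytes < limit else "command line"


def ingest_route_for_file(zip_path):
    """Look up the size of a zipped SAF package on disk and pick a route."""
    return ingest_route(os.path.getsize(zip_path))
```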

 

Bitstream description mapfile (Optional; XML format)

  • Bitstream descriptions are optional. They may be used when more than one access type is ingested (mp3, pdf, jpg, multiple PDFs, etc.).

  • The mapfile should have two columns, with the headers "filename" and "description" (lower case). Include the full filename with extension.
  • Although there is no set limit on the length of the description field, this text appears next to the files displayed on the item page, so shorter text is better.

 

Note! Save this mapfile in XML format (save as “XML Spreadsheet”, NOT “XML Data”) and use the filename “descriptions.xml”.

 

Example:

 

 

TIP: A bitstream mapfile can easily be created using ExifTool or simple DIR commands.
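
As an alternative to ExifTool or DIR, a short Python sketch can draft the two columns. It produces tab-delimited text to paste into a spreadsheet, which would still need to be saved as “XML Spreadsheet” by hand; it does not write descriptions.xml itself. The describe_by_extension rule and its labels are hypothetical examples.

```python
import csv
import io


def description_rows(filenames, describe):
    """Build mapfile rows: the header first, then one row per file."""
    rows = [("filename", "description")]
    rows.extend((name, describe(name)) for name in sorted(filenames))
    return rows


def to_tab_delimited(rows):
    """Render rows as tab-delimited text that pastes cleanly into a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def describe_by_extension(name):
    """Hypothetical rule: derive a short description from the file extension."""
    labels = {"pdf": "Transcript (PDF)", "mp3": "Audio recording (MP3)"}
    return labels.get(name.rsplit(".", 1)[-1].lower(), "")
```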

 

Primary bitstream mapfile (Optional; TXT format)

  • May be used when an item has multiple access files and you wish to designate one file to always appear first in the list of bitstreams.
  • Save the primary bitstream mapfile in *.txt format.
  • No header, just a list of filenames including extension.
  • Save with the filename “primary.txt”.
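
A minimal Python sketch of producing such a file follows. It assumes the primary file for each item is the one with a preferred extension (PDF here), which is only one possible selection rule; the function names are hypothetical.

```python
def primary_lines(filenames, preferred_ext=".pdf"):
    """Return the filenames to list in primary.txt, in sorted order."""
    return [n for n in sorted(filenames) if n.lower().endswith(preferred_ext)]


def write_primary(filenames, path="primary.txt", preferred_ext=".pdf"):
    """Write one filename per line, with no header, and return the lines."""
    lines = primary_lines(filenames, preferred_ext)
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(line + "\n" for line in lines)
    return lines
```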

 

Example

 

 

 

Ingests to add files to existing items (Files only)

  • Use when supplemental files need to be added to existing items. Examples include adding audio files for an oral history or Shepherd performance where the initial item contains only a PDF of the transcript or a PDF of the performance program.
  • Create a Lighthouse ticket to request this type of ingest.
  • Provide the programmer with the location of the files on the server, the mapfile, and (optionally) a bitstream description mapfile.
  • Indicate whether you wish the files to be added to the ORIGINAL bundle for public display or the MASTER bundle for preservation.
  • NOTE! If requesting master files, provide a mapfile for master only items. See Preserving Master Files in the IR

 

Bitstream mapfile (masters or other supplemental files; TXT format)

 

  • A mapfile should contain two columns of data, matching the digital ID to the URI handle ID (not the full web address).

               Example Mapfile Contents:

               

  • Mapfiles should be saved in *.txt format (tab-delimited).

  • The first column should be the digital ID, followed by the handle.

  • No headers.

  • Save the mapfile with a distinct name, such as “mapfile” dash “CollectionName/ID” (e.g., mapfile-HAAA.txt or mapfile-1911-36136.txt).

  • The mapfile should ONLY contain data for items with associated files for ingest.

  • Please prepare a mapfile for EACH collection. For communities comprising multiple collections, each sub-collection must have a separate mapfile.
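
The mapfile rules above can be sketched in Python as follows. The sketch assumes a handle has the form "prefix/suffix" with a numeric prefix and that a full web address ends with that handle; the function names and the digital ID used in any call are hypothetical.

```python
import re

# Assumption: a handle is "prefix/suffix" with a numeric (possibly dotted) prefix.
HANDLE = re.compile(r"^\d+(\.\d+)*/\S+$")


def to_handle(value):
    """Reduce a full handle URL to 'prefix/suffix'; pass bare handles through."""
    value = value.strip()
    if "://" in value:
        # Keep only the last two path segments of the web address.
        value = "/".join(value.rstrip("/").split("/")[-2:])
    if not HANDLE.match(value):
        raise ValueError(f"not a handle: {value!r}")
    return value


def mapfile_text(pairs):
    """Render (digital id, handle) pairs as tab-delimited lines, no header."""
    return "".join(
        f"{digital_id}\t{to_handle(handle)}\n" for digital_id, handle in pairs
    )
```

Writing the returned text to a file named per collection (for example mapfile-HAAA.txt) keeps one mapfile per collection, as required above.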

 

 

Re-Ingest / replacement requests

  • If you need to replace pre-existing files, provide a bitstream mapfile (see steps above).
  • If you also need to modify related metadata, use the metadata batch import tool (see steps below).

 

Note on impacts of re-ingest: As of 10/18/2012, statistics data is preserved at the item level. The new re-import code preserves date.accessioned, date.available, description.provenance, and any item mappings to multiple collections. However, it does not preserve bitstream statistics, since these are unique to the individual file that is being replaced.

 

 

Metadata-only Batch Imports

  • Metadata-only ingests are made via the UI.
  • When creating a metadata import spreadsheet, include columns for ID and Collection. See detailed guidelines at Metadata batch process

  • Use the double pipe symbol (||) to separate multiple values.

  • Files may then be added manually to the newly created items.
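
A minimal Python sketch of building such a spreadsheet as CSV, with multiple values joined by the double pipe, is shown below. The lower-case header spellings "id" and "collection", the field names passed in, and the function names are assumptions; follow the Metadata batch process guidelines for the exact columns.

```python
import csv
import io

SEPARATOR = "||"  # double pipe between multiple values in one cell


def join_values(values):
    """Join multiple values for one field, skipping blank entries."""
    return SEPARATOR.join(v.strip() for v in values if v and v.strip())


def metadata_csv(records, fields):
    """Render records as CSV text with the id and collection columns first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection", *fields])
    for record in records:
        writer.writerow(
            [record["id"], record["collection"]]
            + [join_values(record.get(field, [])) for field in fields]
        )
    return buffer.getvalue()
```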

 

 
