
Shepherd Recital programs: Post Processing Guidelines

Page history last edited by Monica 2 years, 2 months ago

 

This guideline covers key tasks such as PDF creation, quality review, progress tracking and file management. It is recommended to perform these tasks on a weekly basis.

 

Software/Tools Needed:

Basic command line (DOS), ExifTool, ImageMagick, Adobe Acrobat, Adobe Bridge, Adobe XML metadata template, Microsoft Excel, Google Sheets (tracking spreadsheet)

 


 


 

QC Review

 

Checklist

  • TIFFs are scanned at project specifications

  • Images are not distorted

  • Images follow the project file-naming convention

  • Completeness: all pages and all programs for the period are included

 

 

Update Tracking Spreadsheet

  • Open the Google tracking spreadsheet and go to the second tab, SCAN STATUS.

  • Calculate the scan rate per batch by entering the number of TIFFs in the current batch (PROCESS folder in Windows Explorer) and the hours worked (per Google Calendar).

  • Enter the total size of the TIFF files in GB. This helps monitor the total space used by TIFF files on the local workstation.

  • If any volumes were completed, update Date Completed in the INVENTORY tab.

    • Go to the bottom of this page for next steps.
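The two figures entered above (number of TIFFs and total GB) can also be gathered with a short script instead of being read off Windows Explorer. A minimal Python sketch, assuming the folder path is passed in (the path in the usage line is hypothetical) and that GB means 1024³ bytes, as Windows Explorer reports it; the hours worked still come from Google Calendar:

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def tiff_stats(folder):
    """Return (number of TIFFs, total size in GB) for every TIFF under folder.

    Subfolders are searched too. GB here means 1024**3 bytes.
    """
    tiffs = [p for p in Path(folder).rglob("*")
             if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES]
    total_bytes = sum(p.stat().st_size for p in tiffs)
    return len(tiffs), total_bytes / 1024**3


def scan_rate(tiff_count, hours_worked):
    """Images scanned per hour for one batch."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return tiff_count / hours_worked
```

Usage: count, gb = tiff_stats(r"C:\Shepherd\PROCESS"), then scan_rate(count, hours) with the hours from the calendar. Because the search includes subfolders, the same function can also report the size of the TIFF-backup folder.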

 

 

Visual Image Review

  • Open Adobe Bridge.

  • Navigate to the RAW folder and confirm it is empty. If any TIFFs remain, investigate why: they may still need to be cropped, or may simply need to be moved.

  • Navigate to the PROCESS folder.

  • Visually scan through the images, checking for any blank, distorted, or cropped pages.

  • If any errors are found, move all related pages of that pamphlet to the RAW folder (to be processed at the next session) and/or rescan any missing pages as necessary.

     

Extract EXIF data (ExifTool)

Capture a snapshot of the images before any further edits. This provides an inventory of the digital assets with their key characteristics (resolution, size, etc.) and a quick check of file-naming conventions.

 

  • Go to the SHEPHERD folder.

  • Run exif.bat (tip: double-click the file). This produces an output file, e.g. exifdata-date.csv.

  • Move the output file to the LabScan folder (tip: hold Shift while dragging the file).

 

Example exif.bat

ref: ISO Date and Time Formats (W3C-DTF), http://www.w3.org/TR/NOTE-datetime/ (dates as YYYY-MM-DD)
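The contents of exif.bat are not reproduced on this page. As a hedged sketch of an equivalent, not a copy of that file, the snapshot could be produced from Python, assuming the exiftool executable is on the PATH; image_dir and out_dir are placeholders for the folders used above, and the output name uses the W3C-DTF YYYY-MM-DD date form:

```python
import subprocess
from datetime import date
from pathlib import Path


def snapshot_name(day):
    """Output filename for a given day in the form exifdata-YYYY-MM-DD.csv."""
    # date.isoformat() yields YYYY-MM-DD, the W3C-DTF date form.
    return f"exifdata-{day.isoformat()}.csv"


def exif_snapshot(image_dir, out_dir, day=None):
    """Run ExifTool recursively over image_dir and save its CSV output.

    Assumes the exiftool executable is on the PATH.
    """
    day = day or date.today()
    out_path = Path(out_dir) / snapshot_name(day)
    with open(out_path, "wb") as out:
        # -csv: one row per file, tags as columns; -r: recurse into subfolders
        subprocess.run(["exiftool", "-csv", "-r", str(image_dir)],
                       stdout=out, check=True)
    return out_path
```

Naming the file by ISO date keeps successive snapshots in chronological order when sorted by name.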

 

Check filename syntax

  • Open the ExifTool output file in Microsoft Excel (tip: double-click the file).

  • Parse the filenames into prefix and suffix (page number) sections (tip: use the Text to Columns function with underscore as the delimiter).

  • Check filename length: per the file-naming convention, filenames should be 14 characters long. Tip: insert a column to the right of the filename column and use the =LEN() function, then use Filter on that column to quickly review the lengths. Investigate any length not equal to 14.

  • Create a pivot table to count filenames per prefix ID (see steps below).

  • Investigate any prefix ID with more than 10 associated files. This may indicate that more than one performance occurred on the same day and the filenames do not reflect that (e.g. a missing alpha character).

 

 

Create a Pivot Table[2]

1. Click any single cell inside the data set.

2. On the Insert tab, click PivotTable.

A dialog box appears. Excel automatically selects the data for you. The default location for a new pivot table is New Worksheet.

3. Click OK.

4. Drag fields. The PivotTable Field List appears at the right side of the new worksheet:

  • Drag Prefix to Row Labels

  • Drag Filenames to Values
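The filename checks and the pivot-table count above can also be expressed in a few lines of code. A minimal Python sketch, assuming the names come from the FileName column of the ExifTool CSV and that the prefix is everything before the first underscore; the 14-character length and the limit of 10 files per prefix are taken from the steps above:

```python
from collections import Counter

EXPECTED_LENGTH = 14        # filename length required by the naming convention
MAX_FILES_PER_PREFIX = 10   # more than this may hide two performances on one day


def split_filename(filename):
    """Split at the first underscore into (prefix, page-number suffix)."""
    prefix, _, suffix = filename.partition("_")
    return prefix, suffix


def check_filenames(filenames):
    """Return (wrong-length names, files per prefix, prefixes over the limit)."""
    wrong_length = [f for f in filenames if len(f) != EXPECTED_LENGTH]
    # Same counts as the pivot table: Prefix as rows, count of filenames as values.
    per_prefix = Counter(split_filename(f)[0] for f in filenames)
    over_limit = {p: n for p, n in per_prefix.items() if n > MAX_FILES_PER_PREFIX}
    return wrong_length, per_prefix, over_limit
```

For example, with hypothetical names such as 960112a_01.tif (14 characters), twelve files sharing the prefix 960112a would be reported in over_limit, and a name like 960113_01.tif (13 characters) in wrong_length.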

 

 

PDF Processing

 

 

Organize TIFFs into subfolders (CLI + Excel)

Storing TIFFs in subfolders makes file management easier as the number of files grows. This step is also a prerequisite for using ImageMagick commands to automate combining files into single PDFs.

 

PART I: Get Data

 

  • Open a Windows Explorer window.

  • Go to the PROCESS folder.

  • Run dir.bat (tip: double-click the file). This produces an output file, directory.txt.

  • Open the output file (tip: double-click the file). It contains a list of filenames with their full paths.
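The contents of dir.bat are not shown here. As a rough Python sketch of the same result (an assumption, not a copy of the batch file), the following lists the TIFFs directly inside the PROCESS folder, sorted by name, and writes their full paths to directory.txt:

```python
from pathlib import Path


def write_directory_listing(process_dir, out_name="directory.txt"):
    """Write the full path of each TIFF directly inside process_dir, one per line.

    Returns the paths written, sorted by name.
    """
    folder = Path(process_dir).resolve()
    tiffs = sorted(p for p in folder.iterdir()
                   if p.is_file() and p.suffix.lower() in {".tif", ".tiff"})
    (folder / out_name).write_text("".join(f"{p}\n" for p in tiffs),
                                   encoding="utf-8")
    return tiffs
```

Because only .tif and .tiff files are listed, rerunning it does not pick up directory.txt itself.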

 

 

PART II: Parse Data

 

  • Open the template subfolders.xls (stored in the top-level project folder).

  • Follow the steps provided in the spreadsheet. See the detailed screenshots here.

 

Example of final parsed data:

 

PART III: Sort Data

 

1. List object identifiers

Get a list of object identifiers by using a pivot table to sort data by prefix number.

  • Open the template subfolders.xls (stored in the top-level project folder).

  • Go to the tab labeled “table” in subfolders.xls.

  • Right-click the pivot table.

  • Select “Refresh”.
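The pivot table in this step reduces the parsed filenames to one row per object identifier (prefix). A minimal Python sketch of the same reduction, assuming each line of directory.txt is a full Windows path and that the prefix is the part of the filename before the first underscore:

```python
from collections import Counter
from pathlib import PureWindowsPath


def object_identifiers(listing_lines):
    """Return {prefix: file count}, sorted by prefix, from full TIFF paths."""
    counts = Counter()
    for line in listing_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in directory.txt
        name = PureWindowsPath(line).name      # e.g. 960112a_01.tif
        counts[name.partition("_")[0]] += 1    # prefix before the first "_"
    return dict(sorted(counts.items()))
```

PureWindowsPath parses backslash paths the same way on any operating system. The number of keys returned should match the number of rows in the refreshed pivot table.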

 

Example of summarized data (using Pivot Table):

 

2. Batch create subfolders

     basic command: mkdir directoryname

 

  • Open the template subfolders.xls (stored in the top-level project folder).

  • Go to the tab labeled “table”.

  • Go to the middle section of the worksheet (highlighted below).

  • NOTE: Make sure formulas are populated for each row in the pivot table. You may need to copy the formulas down for new rows.
    • Excel tip: to select a range of cells, select the first cell in the range, then hold the Shift key and press End followed by the Down Arrow key.
  • Copy the commands.

 

 

  • Open mkdir.bat in Notepad. mkdir.bat should be saved in the PROCESS folder.

  • Replace its contents with the new commands (see figure above; copy only the commands, not the headers). In Notepad, choose Edit > Select All, then press Ctrl+V.

  • Save the changes to mkdir.bat.
  • Run mkdir.bat (tip: in Windows Explorer, double-click the .bat file).
  • NOTE: Confirm that the folders were created by opening the PROCESS folder and counting the subfolders shown. The total number of subfolders should match the number of rows in the pivot table.
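The “table” worksheet turns each pivot-table row into one mkdir line. A minimal Python sketch of the same generation, assuming each subfolder is named after its prefix and that mkdir.bat runs from inside the PROCESS folder, so relative folder names are enough; write_batch_file is a hypothetical helper that saves the lines with Windows (CRLF) line endings in place of the Notepad copy-and-paste:

```python
def mkdir_commands(prefixes):
    """One 'mkdir <prefix>' line per distinct object identifier, sorted."""
    return [f"mkdir {prefix}" for prefix in sorted(set(prefixes))]


def write_batch_file(path, commands):
    """Save commands as a Windows batch file with CRLF line endings."""
    # newline="\r\n" makes Python translate each "\n" written into CRLF.
    with open(path, "w", encoding="ascii", newline="\r\n") as bat:
        bat.write("\n".join(commands) + "\n")
```

Sorting and de-duplicating the prefixes means the number of lines always equals the number of subfolders to expect.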

 

3. Move files into their respective subfolders

     basic command: move oldpath\file newpath\file

 

  • Open the template subfolders.xls (stored in the top-level project folder).

  • Go to the tab labeled “filenames”.

  • Go to the right section of the worksheet, labeled "MOVE" (highlighted below).

  • NOTE: Make sure formulas are populated for each row that has a corresponding filename (left side of the worksheet). You may need to copy the formulas down for new filenames.
    • Excel tip: to select a range of cells, select the first cell in the range, then hold the Shift key and press End followed by the Down Arrow key.
  • Copy the commands.

 

  • Open move.bat in Notepad.

  • Replace its contents with the new commands (see figure above; copy only the commands, not the headers). In Notepad, choose Edit > Select All, then press Ctrl+V.

  • Save the changes to move.bat.
  • Run move.bat (tip: in Windows Explorer, double-click the .bat file).
  • NOTE: Confirm that all files have been moved to their subfolders by opening the PROCESS folder and checking that no TIFFs remain outside a subfolder.
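Each row of the MOVE section turns one path from directory.txt into a move command. A minimal Python sketch of that rule, assuming each TIFF moves into a subfolder named after its prefix within the same folder; the example path is hypothetical, and the paths are quoted so that folder names containing spaces still work:

```python
from pathlib import PureWindowsPath


def move_command(tiff_path):
    """Build a move command that puts a TIFF into its prefix subfolder."""
    old = PureWindowsPath(tiff_path)
    prefix = old.name.partition("_")[0]      # e.g. 960112a from 960112a_01.tif
    new = old.parent / prefix / old.name     # same folder, one level down
    # Quotes keep paths with spaces intact in cmd.exe.
    return f'move "{old}" "{new}"'


def move_commands(listing_lines):
    """move commands for every non-blank line of directory.txt."""
    return [move_command(line.strip()) for line in listing_lines if line.strip()]
```

For example, move_command(r"C:\Shepherd\PROCESS\960112a_01.tif") targets C:\Shepherd\PROCESS\960112a\960112a_01.tif.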

 

 

Batch create PDF files (IM)

Automate the creation of simple PDFs using ImageMagick commands.

basic command: convert *.tif foldername.pdf

 

Note: In ImageMagick 7 and later, add magick before convert, e.g. magick convert *.tif foldername.pdf

 

  • Open the template subfolders.xls (stored in the top-level project folder).

  • Go to the tab labeled “table”.

  • Go to the far right section of the worksheet (highlighted below).

  • NOTE: Make sure formulas are populated for each row in the pivot table. You may need to copy the formulas down for new rows.
    • Excel tip: to select a range of cells, select the first cell in the range, then hold the Shift key and press End followed by the Down Arrow key.
  • Copy the commands.

 

 

  • Open createPDF.bat, found in the PDF folder, in Notepad.

    • createPDF.bat should always be stored in, and run from, the PDF folder.
  • Replace its contents with the new commands (see figure above; copy only the commands, not the headers). In Notepad, choose Edit > Select All, then press Ctrl+V.

  • Save the changes to createPDF.bat.
  • Run createPDF.bat (tip: in Windows Explorer, double-click the .bat file).
    • Tip: the command window closes when the batch job is complete.

  • NOTE: Confirm that all PDF files have been created by comparing the number of PDFs to the number of subfolders in the PROCESS folder.
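The far right section of the worksheet builds one ImageMagick command per subfolder. A minimal Python sketch of that command, assuming the PROCESS folder is reachable from the PDF folder as ..\PROCESS (a hypothetical relative path; substitute the real one) and that ImageMagick expands the *.tif wildcard itself, since cmd.exe does not:

```python
from pathlib import PureWindowsPath


def create_pdf_command(prefix, process_dir, use_magick=True):
    """Build one createPDF.bat line combining a subfolder's TIFFs into <prefix>.pdf.

    process_dir is the PROCESS folder as seen from the PDF folder, where the
    batch file runs. use_magick selects the ImageMagick 7 form 'magick convert'.
    """
    tool = "magick convert" if use_magick else "convert"
    pattern = PureWindowsPath(process_dir) / prefix / "*.tif"
    return f'{tool} "{pattern}" {prefix}.pdf'
```

For example, create_pdf_command("960112a", r"..\PROCESS") gives magick convert "..\PROCESS\960112a\*.tif" 960112a.pdf, written into the PDF folder where the batch file runs.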

 

 

Batch OCR PDFs (Acrobat)

  • Open Adobe Acrobat

  • Tools>Action Wizard>select action: Batch OCRd PDFs

  • Status is shown in a small blue window in the lower right corner.

 

Batch Reduce PDF File Size (Acrobat)

  • Open Adobe Acrobat

  • Tools>Action Wizard>select action: Batch Reduce PDF Filesize

  • Status is shown in a small blue window in the lower right corner.

 

Embed metadata (Bridge/XML template)

Batch embed general description, copyright, and source metadata in all PDFs.

  • Open Adobe Bridge.

  • Navigate to the PDF folder (tip: filter by PDF file type).

  • Edit>Select All to select all PDF files.

  • Tools>Replace Metadata>select the Shepherd template.

  • Watch the status bar in the lower left corner; the operation is complete when the spinning wheel disappears.

  • Check a sample of PDFs to confirm the metadata was embedded.

 

Status bar

 

Note[1]

Append will add values from the template to fields that are empty. Existing information is not replaced.

Replace adds values from the template to empty fields AND replaces existing values in fields.
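The difference between the two modes, as the note above describes it, can be modeled with plain dictionaries. This is a conceptual sketch only: the field names are hypothetical, and Bridge itself may handle fields that are absent from the template differently from this model:

```python
def append_metadata(existing, template):
    """Append: template values fill only empty or missing fields."""
    merged = dict(existing)
    for field, value in template.items():
        if not merged.get(field):  # empty string or absent
            merged[field] = value
    return merged


def replace_metadata(existing, template):
    """Replace: template values fill empty fields and overwrite existing ones."""
    return {**existing, **template}
```

With Append, a PDF that already has a title keeps it; with Replace, the template's title wins.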

 

Example of Embedded Metadata for PDFs

 

File Management and tracking updates

  • MOVE the TIFF subfolders from the PROCESS folder to the TIFF-backup folder

 

 

  • Update the tracking spreadsheet with the size of the TIFF-backup folder (confirm that the total shown in the Google tracking worksheet matches the folder).

    • Monitor the overall size of this folder, because space on the Indus computer is limited.

    • FUTURE TASK: transfer TIFFs to an external drive. Guidelines forthcoming.

      • Completed 12-2015 through 1996 programs.

 

 

  • Update the tracking spreadsheet with PDF filenames and selected technical data.

    • PART A 
      • Open a command window in the SHEPHERD folder.
      • Type the command exiftool -csv -r PDF>pdf.csv and press Enter. When the command is complete, the number of files read is shown on screen.
      • Open pdf.csv in Excel (tip: double-click the file).
      • Copy the following columns to the Google spreadsheet: FileName, FileSize, CreateDate, PageCount.
      • Check for duplicate filenames by copying the formula in column E. Investigate any duplicates.
      • NOTE: this data can later be used to confirm completeness (e.g. when a book is completed, compare the total number of PDFs with the physical count of programs).
    • PART B
      • Additional tracking, to summarize status by academic year (box):
      • In the Google tracking spreadsheet, PDF List tab, copy the formulas for the columns Year, Month, Volume, and Online (highlighted in orange).
      • Then, in the pivot table, update the range in the Report Editor to include the new rows.
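The duplicate-filename check in PART A can also be run directly on pdf.csv. A minimal Python sketch, assuming pdf.csv is the ExifTool -csv output described above, with a header row and one row per PDF:

```python
import csv
from collections import Counter

TRACKED_COLUMNS = ["FileName", "FileSize", "CreateDate", "PageCount"]


def read_pdf_rows(csv_file):
    """Keep only the tracked columns from ExifTool's -csv output."""
    reader = csv.DictReader(csv_file)
    return [{col: row.get(col, "") for col in TRACKED_COLUMNS} for row in reader]


def duplicate_filenames(rows):
    """Filenames that appear more than once, sorted."""
    counts = Counter(row["FileName"] for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)
```

Usage: with open("pdf.csv", newline="", encoding="utf-8") as f: rows = read_pdf_rows(f), then duplicate_filenames(rows). Because -r recurses into subfolders, the same filename stored in two subfolders of the PDF folder would be reported here.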

 

 

  • MOVE the exifdata file to the Fonlibstor\LabScan\Shepherd folder

  • MOVE the PDFs from the PDF folder to the Fonlibstor\LabScan\Shepherd folder

 

 

  • Double-check that the RAW and PDF folders on the Indus computer contain no TIFFs or PDFs (only *.bat files should remain).

  • Email Monica that the PDFs are in the LabScan folder, ready for the next phase (minimal metadata preparation and IR ingest).
    • Include notes about any special file-naming issues that arose in this batch.
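The check that only *.bat files remain in the RAW and PDF folders can be scripted as well. A minimal Python sketch, assuming the folder paths are passed in (the paths in the usage line are hypothetical):

```python
from pathlib import Path


def leftover_files(folder, allowed_suffixes=(".bat",)):
    """Names of files in folder that are not batch files (should be none)."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() not in allowed_suffixes)
```

Usage: leftover_files(r"C:\Shepherd\RAW") and leftover_files(r"C:\Shepherd\PDF") should both return an empty list before the email is sent.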

 

  • If any volumes were completed, update Date Completed in the INVENTORY tab.

    • When a volume is complete (and before sending the box to the archives), compare the physical programs with the PDF files.

    • Investigate any missing documents.

    • Form: Shepherd physical inventory check

 

  • A complete list of PDF filenames is provided in the Google tracking spreadsheet.

 

 

Appendix

 

Adobe settings

One-time setup of software preferences and actions

 

Acrobat: Confirm software preferences

  • Edit>Preferences>Convert to PDF>TIFF, use settings below:

 

 

Acrobat: Create Action: Batch OCRd PDFs

  • Tools>Action Wizard>Create new action

  • Start with>folder on my computer>select PDF folder

  • Steps>Recognize Text (using OCR); confirm the options are English language and Searchable Image (Exact)

  • Save to>Same folder as start and check Overwrite existing files

 

 

 

 

 

Acrobat: Create Action: Batch Reduce PDF Filesize

  • Tools>Action Wizard>Create new action

  • Start with>folder on my computer>select PDF folder

  • Steps>Document Processing>Reduce File Size

  • Save to>Same folder as start and check Overwrite existing files

 

 

 

Bridge: Set up XML metadata template

  • Tools>Create Metadata Template

  • Enter data as shown in the screenshot below

 

Other resources

Steps to batch move files or folders https://digitalriceprojects.pbworks.com/w/page/61452473/Steps%20to%20batch%20move%20files

 

Embedded Image and PDF Metadata

https://digitalriceprojects.pbworks.com/w/page/50636422/Embedded%20Image%20Metadata

 

How to Write a Batch File

http://www.wikihow.com/Write-a-Batch-File

 


Comments (1)

Monica said

at 1:21 pm on Jul 22, 2014

tinyurl for this page: http://tinyurl.com/mazd6od
