This document provides key characteristics and tips for creating high-quality PDFs from a printed book chapter, journal article or thesis.
Although it is preferable to place the original born-digital document online, this is not always possible: there may be no digital version, the electronic version may be obsolete or hard to convert, or copyright concerns may prevent using a pre-existing electronic version. In these cases, and others, materials may need to be scanned from a physical hard copy (owned by the library) before they can be placed online in the institutional repository.
Process overview
The process is based on testing of local scanning equipment and shared practices from the field. Pages are scanned as TIFF images, then converted to PDF and run through OCR. Scanning high-quality page images and then downsampling produces a much higher quality PDF than scanning directly to PDF.
The primary purpose is reuse and readability of the text. Text can also be exported from the PDF and converted to alternative formats for possible future migration.
No master TIFF page images are retained long term. This is NOT an archival scanning method. (For an archival approach, see Quality control checks for archival images.)
Software needed: Photoshop, Adobe Acrobat Professional
Stages
The following is intended as a checklist. Step-by-step instructions are written for each piece of equipment; contact DSS staff for details.
Scanning
-
Scan each page as a single uncompressed TIFF file (see Filenaming below)
-
Lock page dimensions so all images are exactly the same width x height (do not include borders)
-
Scan text and line art pages at 600 ppi, B/W
-
Scan pages with any color artwork or photographs at 400 ppi or higher, 24-bit depth (color)
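As a rough planning aid, the settings above determine the pixel dimensions and uncompressed size of each page image. The Python sketch below works through that arithmetic. The 6 x 9 inch page size and the function name are illustrative assumptions, not values from this procedure, and real TIFF files will be slightly larger because of header data.

```python
import math

def scan_estimate(width_in, height_in, ppi, bits_per_pixel):
    """Return (width_px, height_px, approx_bytes) for one uncompressed page image."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # Each row is padded to a whole number of bytes (relevant for 1-bit B/W scans).
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return width_px, height_px, bytes_per_row * height_px

# Hypothetical 6 x 9 inch page (assumed size, for illustration only).
bw = scan_estimate(6, 9, ppi=600, bits_per_pixel=1)      # text / line art
color = scan_estimate(6, 9, ppi=400, bits_per_pixel=24)  # color artwork or photos

print(bw)     # (3600, 5400, 2430000)
print(color)  # (2400, 3600, 25920000)
```

For this assumed page size, the B/W scan comes to roughly 2.4 MB per page and the color scan to roughly 26 MB per page.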
Post editing
In Photoshop, as needed.
-
Straighten pages (Analysis>Ruler tool | Image>Image rotation>arbitrary)
-
Remove any noise such as black lines, shadows, etc. (Marquee tool + Delete; Fill contents = white, normal)
-
Adjust any bent or warped text caused by the curve of the book (Edit>Transform>Skew)
-
Restore any faded text (Magic Wand or Filter>Other>Minimum; Radius = 1 or 2 pixels)
For color pages
-
Crop any photographs and save them as separate TIFF files (they will be re-inserted as the last step)
-
Remove any text bleed-through by converting the page to B/W (Image>Mode>Bitmap; Method = 50% Threshold)
-
Follow steps for text above if needed
-
Save TIFF as RGB (Image>Mode>RGB)
-
Remove any moire pattern on the image insert (Filter>Blur>Gaussian Blur; Radius = 1 or 2 pixels)
-
Remove image from page (creates box for insert) and insert cropped image (File>Place)
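To illustrate why a 50% threshold removes bleed-through: faint text showing through from the reverse side usually scans as light gray, so it falls on the white side of the cutoff, while the printed text is dark enough to stay black. Below is a minimal Python sketch of the idea, using made-up 8-bit gray values (0 = black, 255 = white) and an assumed cutoff of 128. It is not Photoshop's actual implementation.

```python
def threshold_50(gray_values, cutoff=128):
    """Map 8-bit gray values to pure black (0) or white (255) around a mid-gray cutoff."""
    # Assumption: values at or above the cutoff become white; Photoshop's handling
    # of the exact midpoint may differ.
    return [255 if v >= cutoff else 0 for v in gray_values]

# Hypothetical row of pixels: paper, bleed-through, bleed-through, text, text, paper.
row = [250, 190, 175, 40, 20, 245]
print(threshold_50(row))  # [255, 255, 255, 0, 0, 255]
```

Bleed-through that scans darker than mid-gray would survive this step and need manual cleanup.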
Examples of normalizing page images
Create PDF
In Adobe Acrobat X Pro
-
Create>Combine Files into a Single PDF
-
Remove any bookmarks
-
Confirm all pages are included and in the proper order
-
OCR: View>Tools>Recognize Text; confirm the language matches the text AND use Searchable Image (NOT Searchable Image (Exact)); this will automatically straighten pages
-
Optimize: View>Tools>Document Processing; 60-80%
In Adobe Acrobat Pro DC
Search for "Optimize PDF" or "Recognize Text" to find the same menus. Shortcuts (buttons) have been set up in the quick toolbar.
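The check that all pages are included and in the proper order can be partly automated before the files are combined. The following Python sketch assumes page files end in an underscore and a page number before the extension (for example `smithj_001.tif`, a hypothetical name following the thesis pattern under Filenaming). The function name and the pattern are illustrative only.

```python
import re

def find_missing_pages(filenames):
    """Return the sorted page numbers and any gaps, based on each file's numeric suffix."""
    pages = []
    for name in filenames:
        # Assumed pattern: <prefix>_<page number>.<extension>
        match = re.search(r"_(\d+)\.[^.]+$", name)
        if match is None:
            raise ValueError(f"no page number found in {name!r}")
        pages.append(int(match.group(1)))
    pages.sort()
    expected = set(range(pages[0], pages[-1] + 1)) if pages else set()
    missing = sorted(expected - set(pages))
    return pages, missing

# Hypothetical scan folder listing with page 3 not yet scanned.
files = ["smithj_001.tif", "smithj_002.tif", "smithj_004.tif", "smithj_005.tif"]
pages, missing = find_missing_pages(files)
print(missing)  # [3]
```

Zero-padding the page number (001, 002, ...) also keeps the files in page order when they are sorted by name.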
Filenaming
-
Label files sequentially (this helps when combining TIFF files during conversion to PDF); the suffix number may match the actual printed page number
-
Do not use spaces or special characters ( % \ / @ ! # )
-
Limit length to less than 32 characters
Examples:
-
Thesis = Author Last Name plus first initial as prefix, plus underscore then page number
-
Journal article PDF filename = condensed title with dashes between words: Long-and-short-of-it.pdf
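The naming rules and examples above can be sketched in Python as follows. The function names, the lowercase and three-digit zero-padded page number conventions, the sample author "Smith, J", and the choice to count the extension toward the 32-character limit are all assumptions made for illustration; adjust them to local practice.

```python
import re

# Spaces plus the special characters listed above (the list may not be exhaustive).
FORBIDDEN = re.compile(r"[\s%\\/@!#]")
MAX_LENGTH = 32  # names must be shorter than this (extension included, by assumption)

def is_valid_filename(name):
    """Check a name against the rules above: no spaces or special characters, under 32 characters."""
    return len(name) < MAX_LENGTH and FORBIDDEN.search(name) is None

def thesis_page_name(last_name, first_initial, page, ext="tif"):
    """Build a thesis page file name: last name + first initial, underscore, page number."""
    # Assumptions: lowercase letters and a 3-digit zero-padded page number.
    return f"{last_name}{first_initial}_{page:03d}.{ext}".lower()

def article_pdf_name(title):
    """Build a journal article PDF name from a condensed title, with dashes between words."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "-".join(words) + ".pdf"

print(thesis_page_name("Smith", "J", 12))        # smithj_012.tif
print(article_pdf_name("Long and short of it"))  # Long-and-short-of-it.pdf
print(is_valid_filename("my thesis #1.tif"))     # False
```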
Resources
Royster, Paul, "The Art of Scanning" (2011). Digital Commons@ University of Nebraska-Lincoln. http://digitalcommons.unl.edu/ir_information/67