
PDF-A file guidelines

Last edited by Monica 10 years, 6 months ago.

PDF/A file creation steps for the Commencement Digitization project, Spring 2011

UPDATE (8/2012): we have switched to simple OCR'd PDFs.

Current practice is to use simple OCR'd PDFs. We found that the PDF/A format limits ease of reuse and repurposing (such as highlighting or extracting text). Since IR content is open access and many documents are in the public domain, there is no real need to lock down texts. Although these functions can be restored by removing the PDF/A setting, that would require the user to have software capable of removing it (e.g. X), which is cost-prohibitive for the end user and is extra work.

 

See also: How to Remove PDF/A Information from a file

 

Guidelines for creating PDF/A files in Adobe Acrobat X Professional.

 

 



 

 

Step 1: Set up the PDF

  • Open Adobe Acrobat X Professional.
  • Go to the "Create" menu and choose "Combine Files into a Single PDF".
  • This brings up the Add Files box. Select the TIFFs to be combined (from \\fonlibstor.rice.edu\Projects\Commence\Images\MasterTiffs) and drag and drop them into the box.
    • Alternative: click the Add Files link at the top left of the box, navigate to the MasterTiffs folder, and select all the files you want to combine. Then click Open to add the TIFFs to the Add Files box.
  • Make sure the files are in the correct order in the Acrobat window.
    • If the files were named correctly, the pages should sort into the correct order automatically.
    • The exception is an item whose file names include a letter in the page numbering, such as c for the front cover or f for frontmatter. Those pages may need to be arranged manually in Acrobat to display in the correct order.
  • Leave the conversion setting on default file size (the middle-sized icon at the bottom of the screen) and click "Combine Files".
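Why the c and f pages end up out of place can be seen with a plain character-by-character sort: digits come before letters, so c001 and f001 land after the numbered pages. Below is a minimal Python sketch of a sort key that puts them first. It assumes a hypothetical naming pattern, `<root>_<optional letter><page number>.tif` (e.g. `wrc00123_c001.tif`), and an assumed section order of front cover, then frontmatter, then body pages. The project's real convention may differ, and the helper names are illustrative only.

```python
import re
from typing import List, Tuple

# Hypothetical page-file pattern: <root>_<optional letter><page number>.tif
# e.g. wrc00123_c001.tif (front cover), wrc00123_f001.tif (frontmatter),
# wrc00123_0001.tif (body page). Adjust to the project's actual convention.
PAGE_RE = re.compile(r"_(?P<letter>[a-z]?)(?P<num>\d+)\.tiff?$", re.IGNORECASE)

# Assumed section order: front cover, frontmatter, then plain numbered pages.
SECTION_RANK = {"c": 0, "f": 1, "": 2}


def page_sort_key(filename: str) -> Tuple[int, str, int]:
    """Sort key of (section rank, letter, page number) for one TIFF name."""
    match = PAGE_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    letter = match.group("letter").lower()
    # Letters not listed in SECTION_RANK sort after the body pages.
    rank = SECTION_RANK.get(letter, len(SECTION_RANK))
    return rank, letter, int(match.group("num"))


def order_pages(filenames: List[str]) -> List[str]:
    """Return the file names in display order."""
    return sorted(filenames, key=page_sort_key)
```

For example, `order_pages(["wrc00123_0002.tif", "wrc00123_f001.tif", "wrc00123_0001.tif", "wrc00123_c001.tif"])` returns the cover, the frontmatter page, then pages 0001 and 0002, which is the order to reproduce by hand in Acrobat.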

 

Step 2: OCR text recognition

  • Go to Tools > Recognize Text > In This File.
    • You can also get there via View > Tools > Recognize Text.
  • Leave the settings as: "All pages", Language: English, PDF Output Style: Searchable, Downsample to 600 dpi.
  • Click OK; Acrobat will process each page.

 

 

Step 3: Rights metadata

 

Step 3.A – Embed Rights metadata within the PDF file

  • While in the PDF document, go to File > Properties, then click the Additional Metadata button.
  • The default screen that comes up is for Descriptive metadata.
  • Select "Copyrighted" in the pull-down menu and enter the following text block in the "Copyright Notice" box.

Published by Rice University at scholarship.rice.edu.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. (http://creativecommons.org/licenses/by-nc-sa/3.0/).

See the item's full record online at scholarship.rice.edu for the specific item's full citation.

  • Click OK twice to embed the metadata.
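Behind the Additional Metadata dialog, these fields are stored as XMP metadata inside the PDF. Assuming Acrobat follows the usual Adobe File Info mapping ("Copyrighted" sets `xmpRights:Marked` to True, and the Copyright Notice text goes into `dc:rights`), the embedded rights block should look roughly like the fragment built by this Python sketch. It can serve as a reference when checking what was embedded; `rights_description` is a hypothetical helper, not part of Acrobat.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP for RDF, Dublin Core and XMP Rights.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XMP_RIGHTS = "http://ns.adobe.com/xap/1.0/rights/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize with the conventional prefixes instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("dc", DC), ("xmpRights", XMP_RIGHTS)):
    ET.register_namespace(prefix, uri)


def rights_description(notice: str) -> str:
    """Build an rdf:Description marking the file as copyrighted with a notice."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    # Assumed mapping: "Copyrighted" in the pull-down -> xmpRights:Marked = True.
    ET.SubElement(desc, f"{{{XMP_RIGHTS}}}Marked").text = "True"
    # Assumed mapping: Copyright Notice -> dc:rights, a language alternative.
    rights = ET.SubElement(desc, f"{{{DC}}}rights")
    alt = ET.SubElement(rights, f"{{{RDF}}}Alt")
    li = ET.SubElement(alt, f"{{{RDF}}}li", {f"{{{XML_NS}}}lang": "x-default"})
    li.text = notice
    return ET.tostring(desc, encoding="unicode")
```

In practice the full three-paragraph text block from the step above would be passed as `notice`.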

 

Step 3.B – Add rights metadata as a separate page at the end of each PDF (OBSOLETE)

  •  Open the Page Thumbnails in Adobe Acrobat – the stacked papers icon in the left side gray navigation bar. That will show you the pages you have in the PDF you are creating.
  • Using Windows Explorer or My Computer, open the Derivatives folder (located in \\fonlibstor.rice.edu\Projects\Commence\Images\Derivatives) and drag the PDF called “Published by Rice University at scholarship.pdf” into Acrobat at the end of your page thumbnails. This adds the Rights metadata page to the end of your PDF. 

  

 

Step 4: "Save As" PDF/A

  • File menu > Save As > More Options > PDF/A.
    • Next to the Save settings is a Settings button. Click it when PDF/A is selected; it should show PDF/A-1b as the setting.
    • Click the Settings button every time, or the document may not save as PDF/A. This appears to be a bug in Acrobat.
  • Name the file with the root file name (wrc#####) and save it to \\fonlibstor.rice.edu\Projects\Commence\Images\Derivatives. Click the Save button.
    • When we did this in TechServices, we got an error box saying "The file PreflightLib.dll is missing or corrupt." The problem appears to have been solved by Marcus logging in as administrator on each computer and performing the same task; afterward, everything worked fine.
  • Confirm the format: once the file has finished saving, a blue bar will appear across the top of each page in the PDF with the message: "The file you have opened complies with the PDF/A standard and has been opened read-only to prevent modification."
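The blue bar is Acrobat's confirmation. As a rough spot check outside Acrobat, the PDF/A claim can also be read from the file's XMP metadata, where PDF/A files record their part and conformance level with the `pdfaid` schema. The Python sketch below assumes the metadata is stored uncompressed, as PDF/A-1 requires, and simply searches the raw bytes. It only reports what the file claims and does not validate compliance (a dedicated validator such as veraPDF does that). The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# PDF/A files declare their part and conformance level in XMP with the
# pdfaid schema, either as attributes (pdfaid:part="1") or as elements
# (<pdfaid:part>1</pdfaid:part>). Both forms are matched here.
PART_RE = re.compile(
    rb"""pdfaid:part\s*=\s*["'](\d)["']|<pdfaid:part>\s*(\d)\s*</pdfaid:part>"""
)
CONF_RE = re.compile(
    rb"""pdfaid:conformance\s*=\s*["']([A-Za-z])["']"""
    rb"""|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>"""
)


def _matched_value(match: "re.Match") -> str:
    # Exactly one of the two alternatives' groups is set.
    return next(g for g in match.groups() if g is not None).decode("ascii")


def pdfa_claim(data: bytes) -> Optional[str]:
    """Return the claimed PDF/A flavor (e.g. "PDF/A-1b"), or None if absent.

    Assumes the XMP metadata is uncompressed; this reads the claim only and
    does not check that the file actually complies.
    """
    part = PART_RE.search(data)
    level = CONF_RE.search(data)
    if part is None or level is None:
        return None
    return f"PDF/A-{_matched_value(part)}{_matched_value(level).lower()}"


def pdfa_claim_of_file(path: str) -> Optional[str]:
    """Read a PDF from disk and report its PDF/A claim."""
    return pdfa_claim(Path(path).read_bytes())
```

For a file saved with the PDF/A-1b setting, `pdfa_claim_of_file` should return `"PDF/A-1b"`; `None` means no pdfaid claim was found.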

 

  • Update the Google Tracking spreadsheet to show that you made the derivative PDF file. Update the "Derivatives created by -- name" and "Derivatives created -- date" columns.
  • After the PDF/A is created, then the item is ready for Metadata Creation.
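The naming and save-location rule in Step 4 can be sketched as a small Python helper that builds the derivative's full path from a root file name. The guidelines show the root name as both wrc##### and wrc####, so this sketch assumes only "wrc followed by digits" and does not enforce a digit count; `derivative_path` is a hypothetical helper. `PureWindowsPath` is used so the UNC path is handled the same way on any operating system.

```python
import re
from pathlib import PureWindowsPath

# Derivatives share named in Step 4.
DERIVATIVES_DIR = PureWindowsPath(
    r"\\fonlibstor.rice.edu\Projects\Commence\Images\Derivatives"
)

# Assumption: a root file name is "wrc" followed by digits (count not enforced).
ROOT_RE = re.compile(r"wrc\d+")


def derivative_path(root: str) -> PureWindowsPath:
    """Return where the derivative PDF for this root file name is saved."""
    if not ROOT_RE.fullmatch(root):
        raise ValueError(f"not a root file name: {root!r}")
    return DERIVATIVES_DIR / f"{root}.pdf"
```

For example, `derivative_path("wrc00123")` gives `\\fonlibstor.rice.edu\Projects\Commence\Images\Derivatives\wrc00123.pdf`, while a name that already has an extension, such as `wrc00123.pdf`, raises `ValueError`.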

 


 

OBSOLETE: Directions for creating files in Adobe Acrobat Professional 8.0 (obsolete as of 5/2011)

 

Open Adobe Acrobat Professional 8.0. 

Select File > Create PDF > From Multiple Files…

 

Click “Add Files”, then:

    • navigate to the correct folder (from MasterFiles) for the object being worked on,

    • select the TIFFs to be combined, and

    • make sure they are in the correct order in the Acrobat window. If the files were named correctly, the pages should sort into the correct order automatically.

    • Leave the conversion setting on “Default File Size” and click “Next.”

 

Leave “Merge files into a single PDF” selected and click on “Create” at the bottom-right hand corner of the window.

 

 

Step 2: Rights metadata

Step 2.A – Embed Rights metadata within the PDF file:

  • While in the PDF document, go to File menu, to Properties, to Additional Metadata.
  • The default screen which comes up will be for Descriptive metadata.
  • Select "Copyrighted" in the pull down menu and enter the following text in the "Copyright Notice" box, then click OK. 

Published by Rice University at scholarship.rice.edu.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. (http://creativecommons.org/licenses/by-nc-sa/3.0/)

See the item's full record online at scholarship.rice.edu for the specific item's full citation.

 Step 2.B – Add rights metadata as a separate page at the end of each PDF.

  •  Open the Page Thumbnails in Adobe Acrobat – the stacked papers icon in the left side navigation. That will show you the pages you have in the PDF you are creating.
  • Open the Derivatives folder (\\fonlibstor.rice.edu\Projects\Commence\Images\Derivatives) and drag the PDF called “Published by Rice University at scholarship.pdf” into Acrobat at the end of your page thumbnails. This adds this Rights metadata page to the end of your PDF.   

 

Step 3: Save

  • Click the “Save” button in the lower right-hand corner.

  • Save directly to \\fonlibstor.rice.edu\Projects\Commence\....

  • “File name”: only the identifier number (wrc####).

  • “Save as type”: PDF/A.

  • Next to the Save settings is a Settings button. Click it when PDF/A is selected; it should show PDF/A-1b as the setting.

 

 

 

Step 4: After saving, conduct OCR Text Recognition.

  • Go to Document > OCR Text Recognition > Recognize Text, and let it process all the pages (this takes about as long as making the PDF from the TIFFs). Save again.

 


 

Overview of PDF/A files with OCR from TIFFs

  • PDF/A is an archival version of PDF, standardized as the ISO 19005 family of standards.

  • We will be scanning materials, saving them as TIFFs, and then making PDF/A files for user access.

  • As of March 2011, PDF/A is in version 1 (PDF/A-1); version 2 is in development.

  • Each version of the format defines conformance levels. We will be using PDF/A-1, conformance level “b”, known as PDF/A-1b.

    • PDF/A-1b has the objective of ensuring reliable reproduction of the visual appearance of the document. PDF/A-1a includes all the requirements of PDF/A-1b and additionally requires that document structure be included (also known as being "tagged"/"Tagged PDF"), with the objective of ensuring that document content can be searched and repurposed. PDF/A-1a also requires Unicode character maps.

    • The requirements for Level A conformance place greater responsibilities on writers preparing conforming files, but these requirements allow for a higher level of document preservation service and confidence over time. Level A conformance also facilitates the accessibility of conforming files for physically impaired users.

    • According to the specification, the following terms are recommended when referring to the ISO 19005-1:2005 specification without using its full ISO name:

      • PDF/A – a synonym for the ISO 19005 family of standards

      • PDF/A-1 – a synonym for ISO 19005-1

      • PDF/A-1a – a synonym for ISO 19005-1 Level A conformance

      • PDF/A-1b – a synonym for ISO 19005-1 Level B conformance

  • Excerpted from http://en.wikipedia.org/wiki/PDF/A, on March 4, 2011.

  • For further information on PDF/A, see http://www.pdfa.org/doku.php?id=pdfa:en:articles
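The recommended terms listed above follow a regular pattern: part number, then conformance level. The Python sketch below expresses that as a lookup table with hypothetical helper names. It covers only the four terms listed, and composes PDF/A-1b from part 1 and level B, the combination used for this project.

```python
# Recommended short names for ISO 19005-1, as listed in the overview above.
SHORT_NAMES = {
    "PDF/A": "ISO 19005 family of standards",
    "PDF/A-1": "ISO 19005-1",
    "PDF/A-1a": "ISO 19005-1 Level A conformance",
    "PDF/A-1b": "ISO 19005-1 Level B conformance",
}


def short_name(part: int, level: str) -> str:
    """Compose a short name from a part number and conformance letter."""
    return f"PDF/A-{part}{level.lower()}"


def iso_designation(name: str) -> str:
    """Look up the ISO designation for one of the recommended short names."""
    try:
        return SHORT_NAMES[name]
    except KeyError:
        raise ValueError(f"no recommended designation for {name!r}") from None
```

For example, `iso_designation(short_name(1, "B"))` returns `"ISO 19005-1 Level B conformance"`.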

 

 
