How-to: Update PDF Metadata Using pdftk

I often download and read scientific journal articles on my Kobo eReader. Unfortunately, the PDF metadata for such articles often lacks title information. As a result, the searchable title shown for the article on the eReader is something ugly, like an uninterpretable string of digits. An easy way to fix this is to use the open-source tool pdftk.

A look at the man pages for pdftk via man pdftk tells us everything we need to know:

Figure (1): Manual for pdftk.

A closer look at the man pages reveals that we need to dump the PDF metadata, edit the dump in a text editor, and then update the original PDF with the new metadata.

As an example, take Tsunami Propagation from a Finite Source (Carrier 2005). Once you’ve downloaded it, you can dump its metadata to a text file with:

pdftk ~/Downloads/cmes.2005.010.113-2.pdf dump_data_utf8 output ~/Downloads/cmes.2005.utf

Opening ~/Downloads/cmes.2005.utf, you’ll see a number of fields, one of which looks like the following:

InfoBegin
InfoKey: Title
InfoValue: main.dvi

If you change the InfoValue here in ~/Downloads/cmes.2005.utf to your desired name, e.g.,

InfoValue: Carrier 2005: Tsunami Propagation from a Finite Source

and then call

pdftk ~/Downloads/cmes.2005.010.113-2.pdf update_info_utf8 ~/Downloads/cmes.2005.utf output ~/Downloads/cmes_updated.pdf

then the new file, ~/Downloads/cmes_updated.pdf, has the correct metadata and is ready for reading on an eReader! The original PDF is left untouched.
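If you do this often, the text-editor step can be scripted. The following is a minimal sketch, not part of pdftk itself: `set_title` is a hypothetical helper name, and it assumes the dump lists each `InfoKey: Title` line immediately before its `InfoValue:` line, as in the excerpt above. It reads a dump on standard input and writes the edited dump to standard output.

```shell
# Hypothetical helper (not a pdftk feature): replace the Title value in a
# pdftk metadata dump read from stdin, writing the edited dump to stdout.
# Assumes "InfoKey: Title" is directly followed by its "InfoValue:" line.
set_title() {
  # Pass the title through the environment so awk does not interpret
  # backslashes in it as escape sequences.
  NEW_TITLE="$1" awk '
    /^InfoKey: Title$/        { in_title = 1; print; next }
    in_title && /^InfoValue:/ { print "InfoValue: " ENVIRON["NEW_TITLE"]; in_title = 0; next }
                              { in_title = 0; print }
  '
}

# Example usage with the files from this post (the edited dump's file name
# is an assumption chosen for illustration):
#   set_title "Carrier 2005: Tsunami Propagation from a Finite Source" \
#     < ~/Downloads/cmes.2005.utf > ~/Downloads/cmes.2005.edited.utf
#   pdftk ~/Downloads/cmes.2005.010.113-2.pdf update_info_utf8 \
#     ~/Downloads/cmes.2005.edited.utf output ~/Downloads/cmes_updated.pdf
```

If the dump has no Title entry at all, this sketch leaves it unchanged, so in that case you would still need to add the InfoBegin, InfoKey, and InfoValue lines by hand.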