Creating a single PDF from web-based epaper readers

Updated to work again (2016-11-08)


I have an epaper subscription for the newspaper der Standard. The iPad app is good, the Android one good enough, but the web client for the desktop is an abomination.
You get to click through JPEG images that are too small to read the text, and if you are interested in an article, you have to click a button, which opens a single-page PDF. After reading, you go back to the awful web client, click the next-page button, and then the PDF button again.

There must be a better way, I thought, and so I made one.

Using Firefox’s developer tools (F12) and the ‘Scratchpad’, I poked around the JavaScript code that opens the single-page PDF documents.
In a small JavaScript snippet, I managed to reconstruct the URL under which the PDF documents are stored, and I found an array that holds the names of all the documents. From these I generate a shell script, which downloads each of the files into a new folder. Afterwards, the script runs pdfunite, which combines all of the pages into a single document.
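To illustrate the idea, here is a minimal sketch of how such a script could be generated. It is not the code from the userscript: the base URL, the `pages` array, the `page_NN.pdf` output names and the function name `buildDownloadScript` are all made up for this example, and the real web client stores its data differently.

```javascript
// Hypothetical inputs: in the real web client these values would be read
// from the page's own JavaScript; everything below is made up for illustration.
const baseUrl = "http://example.invalid/download";
const pages = [
  { folder: "SECTION_A", file: "iPad_PDF_1.pdf" },
  { folder: "SECTION_A", file: "iPad_PDF_2.pdf" },
  { folder: "SECTION_B", file: "iPad_PDF_1.pdf" }, // numbering restarts per folder
];

// Build the text of a bash script that downloads every page and merges them.
function buildDownloadScript(baseUrl, pages, userAgent) {
  const outDir = pages[0].folder; // name the output folder after the first section
  // Pad to at least two digits so that "10" sorts after "09" in the shell glob.
  const width = Math.max(2, String(pages.length).length);
  const lines = [
    "#!/bin/bash",
    `agent="${userAgent}"`,
    `mkdir -p "${outDir}"`,
  ];
  pages.forEach((page, index) => {
    // Continuous numbering across folders keeps the pages in reading order.
    const number = String(index + 1).padStart(width, "0");
    const url = `${baseUrl}/${page.folder}/${page.file}`;
    lines.push(`wget "${url}" -U "\${agent}" -O "${outDir}/page_${number}.pdf"`);
  });
  lines.push(`cd "${outDir}" || exit 1`);
  lines.push("find . -type f -size 0 -delete"); // drop failed (empty) downloads
  lines.push("pdfunite page_*.pdf alles.pdf");
  return lines.join("\n") + "\n";
}

const script = buildDownloadScript(baseUrl, pages, "Mozilla/5.0 (hypothetical)");
console.log(script);
```

The sketch assumes the user agent string contains no double quotes. In a browser, `navigator.userAgent` could be passed in as the third argument.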
Using Greasemonkey, this JavaScript code is executed whenever you open the web client. The generated shell script is then offered as a download via Blob(), so it can be run from the command line.
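The download step could look roughly like the following sketch. Again, this is not taken from the userscript: the function names, the MIME type and the file name `download-epaper.sh` are my own assumptions. It only relies on the standard `Blob`, `URL.createObjectURL` and DOM APIs.

```javascript
// Wrap the generated script text in a Blob so the browser can offer it as a file.
function makeScriptBlob(scriptText) {
  return new Blob([scriptText], { type: "text/x-shellscript" });
}

// Trigger a download by clicking a temporary link that points at the Blob.
// This part needs a browser DOM (for example inside a userscript).
function downloadBlob(blob, fileName) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = fileName;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL a little later, so the download has time to start.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}

const blob = makeScriptBlob("#!/bin/bash\necho hello\n");
if (typeof document !== "undefined") {
  downloadBlob(blob, "download-epaper.sh"); // hypothetical file name
}
```

Once the file is saved, it can be started with `bash download-epaper.sh` from the folder it was downloaded to.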

If you are interested, here is a link to the Greasemonkey userscript: GitHub Gist. Feel free to adapt the code to your needs!

The JavaScript generates a shell script like this one (annotated here, the real file has no comments; hashes removed for copyright reasons):


#!/bin/bash
agent="Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" # use the same user agent as the browser (generated by JS) to avoid detection

mkdir -p DERSTANDARD_20160409_4_WI_2 # create a folder; I am using part of the URL, as it is quite descriptive

wget "http://derstandard.diginews-service.apa.at/download/{SECRET 1}/{SECRET 2}/DERSTANDARD_20160409_4_WI_2/iPad_PDF_1.pdf" -U "${agent}" -O DERSTANDARD_20160409_4_WI_2/iPad_PDF_01.pdf

# ... (as you can see, not all pages of the newspaper reside in the same folder, and the page number resets; in the output, I carry on with the numbering so the pages stay in order)

wget "http://derstandard.diginews-service.apa.at/download/{SECRET 1}/{SECRET 2}/DERSTANDARD_20160409_Karrieren_4/iPad_PDF_14.pdf" -U "${agent}" -O DERSTANDARD_20160409_4_WI_2/iPad_PDF_78.pdf

cd DERSTANDARD_20160409_4_WI_2 || exit 1 # stop if the folder is missing
find . -type f -size 0 -delete # delete failed (empty) downloads, as they interfere with pdfunite
pdfunite iPad_PDF_*.pdf alles.pdf # create a single PDF file from all downloaded pages