Ok, so I want to download an embedded PDF/document in order to physically print it. The website lets me view it as much as I want, but asks me to fork over 25 USD plus tax to download the document.
Obviously, I don't want to do that, so I tried to download the embedded document via Inspect Element. But the weird thing is that it's not actually loading a PDF, but really small pictures of each page.
So, my question is basically how can I download this document in order to print it?
FOR LINUX, COMPLETE AND WORKING
- Install xdotool and AutoKey.
- In Firefox, get Save Screenshot: https://addons.mozilla.org/en-US/firefox/addon/savescreenshot/ Then, in Firefox's shortcut settings, add Ctrl+1 as the hotkey to capture the visible page.
- Create a script for AutoKey in Python; mine is:

import time
import subprocess

# ask for the page count via AutoKey's built-in dialog
pages = dialog.input_dialog(title='', message='Number of pages:', default='5').data
time.sleep(1)
for k in range(1, int(pages)):
    subprocess.run(["xdotool", "key", "ctrl+1"])  # plugin's hotkey: screenshot the current page
    time.sleep(2)
    subprocess.run(["xdotool", "click", "1"])     # mouse click on the next-page button
    time.sleep(2)
subprocess.run(["xdotool", "key", "ctrl+1"])  # screenshot the last page
- At the bottom of the AutoKey window, set a hotkey to launch the script (I set it to Home).
- Open OP's page and, via Inspect Element, find the link to the embed. It's https://www.sbcaplanroom.com/preview/2477/12610/200647
- Press F11 to go fullscreen and make the whole page fit on screen.
- Place the mouse pointer over the next-page button, so each click advances a page.
- Launch the AutoKey script via the Home key.
- Enter the number of pages.
- Watch it work.
- Open the screenshots directory in XnView and select them all. Locate its Batch Convert tool; in the Actions tab, add a crop action and adjust it to the pages' margins. ATTENTION: the last page should be handled differently; you can open it in XnView and crop it alone.
- Use any tool to stitch them back together into a PDF. I've used PDF Arranger: https://github.com/pdfarranger/pdfarranger But a user below said it crashed on a 600-something-page document. (A scripted alternative to the crop-and-stitch steps is sketched right below.)
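If you'd rather script the crop-and-stitch steps than click through XnView and PDF Arranger, here is a minimal Pillow sketch. It assumes the screenshots sort into page order and share one crop box; the directory name and crop coordinates are placeholders, so measure your own margins first.

import glob
from PIL import Image

CROP_BOX = (100, 50, 1820, 1030)  # (left, top, right, bottom) in pixels; placeholder values

# crop every screenshot with the same box and normalize to RGB
pages = [Image.open(f).crop(CROP_BOX).convert("RGB")
         for f in sorted(glob.glob("screenshots/*.png"))]

# Pillow writes a multi-page PDF straight from the image list
pages[0].save("document.pdf", save_all=True, append_images=pages[1:])

Remember the last screenshot usually needs a different crop box, so you may want to handle it separately, same as in the XnView step.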
Compatible with any system? AFAIK AutoHotkey is Windows-only.
Damn, you are right.
I've found this Q&A thread about Linux alternatives: https://unix.stackexchange.com/a/533392 I'd need to look into it.
Good god, XnView and its plugins aren't cross-platform either.
Update: AutoKey works. Its scripts are written in Python.
They say the OP has magic powers!
OP, I did it: https://files.catbox.moe/6eofj6.pdf
I will edit my reply with Linux specifics.
My link was updated with a slightly better PDF. Comparison at max zoom: https://files.catbox.moe/5q3v4b.png A person with a 4K display could do better, but that's what my screen is capable of.
Either way, it was a fun puzzle for my entry-level knowledge of Linux/Python/macros, and I feel I'll use this method a couple of times myself. Hope someone makes use of it.
Holy shit dude, you are awesome, thanks a lot!
You are welcome 😉
👏 👏 👏
Fuck you and thank you, Mr. Spez ☺
Okay so, PDF documents are basically already "a collection of images." This website is clearly making it an extra step harder by loading the images individually as you browse the document. You could manually save/download all the images and use a tool to turn them back into a PDF. I haven't heard of a tool that does this automatically, but it should be possible for a web scraper to make the GET requests sequentially and then stitch the PDF back together, along the lines of the sketch below.
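A rough sketch of that sequential-GET idea, assuming the viewer fetches page images from a numbered URL; the pattern below is made up, so read the real one from the Network tab in DevTools first.

import requests

# hypothetical URL pattern -- check DevTools for the real one
PAGE_URL = "https://www.sbcaplanroom.com/preview/2477/12610/200647/page-{}.jpg"

for n in range(1, 1000):
    resp = requests.get(PAGE_URL.format(n), timeout=30)
    if resp.status_code == 404:
        break  # ran past the last page
    resp.raise_for_status()
    with open(f"page-{n:03d}.jpg", "wb") as f:
        f.write(resp.content)

The saved images can then be stitched into a PDF the same way as in the crop-and-stitch sketch above.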
Just as a tiny nitpick: PDFs can be just a collection of images. If there's text, I would hope it's embedded as text or vectors, which is what usually happens when you export a document from a program (one that isn't complete BS). So text, vector graphics, etc. should usually not be images (unless you scan a document as a PDF, which I hate with a passion, though I can see that people prefer a single PDF over a collection of image files). Even then, you could OCR the images/PDF to make it searchable. Anyway, in this case it does seem to be a collection of images.
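For the OCR part, ocrmypdf can add a searchable text layer to an image-only PDF. A one-liner sketch, assuming Tesseract is installed on the system and the file names are placeholders:

import ocrmypdf

# adds an invisible text layer on top of the page images
ocrmypdf.ocr("document.pdf", "document-searchable.pdf")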
To semi-automate downloading all the pages, try JDownloader 2. You'll probably need to paste the URL into JDownloader manually for it to find all the downloadable bits on the page.