This tutorial shows how to fetch JPEG files (scores) from Flash-based websites using a Mac (users of other systems can use similar techniques).
Put together with lots of help from others (thank you) by Generoso.
This includes sites such as The Juilliard Manuscript Collection and others that host wonderful manuscripts. These sites use Zoomify. On a Mac (and surely on many other computers as well), it is possible to download these JPEGs by installing a few programs.
Python Imaging Library (PIL) on Mac OS X Snow Leopard using Python 2.7
Here is how to build PIL. You must have the Apple Developer Tools installed for this to work; they are located on your Snow Leopard install disc. I am assuming you're using the Python 2.7 distribution from www.python.org. Download the linked requirements below and save them in your /tmp directory, or copy them there manually.
If you wish to use the Finder to copy them, press Command-Shift-G to get the "Go to Folder" dialog and enter /tmp to open a Finder window for that directory. For the rest of this tutorial, however, you will need to know how to use Terminal.app.
Before we can build PIL, we need to build both libjpeg and libfreetype. We will start with libjpeg. Follow these commands. A $ prompt means the command can be run as any ordinary (non-super) user; a # prompt means you must be logged in as a superuser or root-level account. (You can copy the text after each $ sign, one command at a time, and paste it into Terminal.)
Next up, we need to build libfreetype.
Finally, before we build PIL, make sure you're using the correct python.
If you don't see 2.7 in the output from the which command, try:
If you need to use python2.7 instead of python, please replace python with python2.7 in all places going forward. Without further ado, let's get to the final step and build PIL.
Ok, let's test to see if it's installed correctly.
Python 2.7 (r27:82508, Jul 3 2010, 21:12:11)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Image
>>>
As long as you get another empty prompt you have successfully installed and built PIL. Enjoy using it.
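The same check can be run as a small script instead of typing at the interactive prompt. This is only a sketch: the function name find_image_module is made up for illustration, and the second import form (from PIL import Image) is an assumption covering installs that expose PIL as a package rather than as a top-level Image module.

```python
def find_image_module():
    """Return the PIL Image module, or None if PIL cannot be imported."""
    try:
        import Image  # top-level module, as in the interactive session above
        return Image
    except ImportError:
        pass
    try:
        from PIL import Image  # assumption: package-style install
        return Image
    except ImportError:
        return None


if __name__ == "__main__":
    image = find_image_module()
    if image is None:
        print("PIL could not be imported by this python")
    else:
        # Creating a small blank image exercises the compiled core of PIL.
        tile = image.new("RGB", (256, 256), "white")
        print("PIL imported; test image size is %dx%d" % tile.size)
```

If the script reports that PIL could not be imported, the python on your path is probably not the one PIL was installed for.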
Save the dezoomify script with a plain-text editor such as TextMate (not Word) as dezoomify.py, and remember where you saved it. In a terminal, you either have to "cd" (change directory) to the directory holding dezoomify.py or give the full path to the script. For example, I saved mine in Applications, in a folder I named Python, so its full path is /Applications/Python/dezoomify.py.
How are Zoomify objects arranged?
Zoomify objects present on one page (let's say www.site.org/gallery/zoomify_page_1.html) draw on resources in another part of the site to construct the image. These resources are the image folders, which hold the image tiles, and an XML file named ImageProperties.xml.
The image folders and the XML file are contained in a "base directory". This is an entirely separate web directory, and could even be on a different website! Let's say our example base directory is www.site.org/images/zoomify/1/. The XML file is then located at www.site.org/images/zoomify/1/ImageProperties.xml, and image tiles are at www.site.org/images/zoomify/1/TileGroup0/0-0-0.jpg and so on.
So, the base directory is the important location. Dezoomify will get all the data and images from there, and the display page is just the gateway page that Dezoomify uses to find the base directory to make your life easier.
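The layout described above can be sketched in a few lines of Python. This is an illustration, not code taken from dezoomify.py: the function names are hypothetical, the http:// scheme is added to the example address, and the reading of a tile name as zoom level, column, row is an assumption based on the usual Zoomify layout (the text above only shows the example name 0-0-0.jpg).

```python
def join_url(base, *parts):
    """Join URL pieces with single slashes, tolerating a trailing slash on base."""
    return "/".join([base.rstrip("/")] + [part.strip("/") for part in parts])


def properties_url(base_dir):
    """Address of the XML file inside the base directory."""
    return join_url(base_dir, "ImageProperties.xml")


def tile_url(base_dir, zoom, column, row, tile_group=0):
    """Address of one tile; the zoom-column-row naming is an assumption."""
    folder = "TileGroup%d" % tile_group
    name = "%d-%d-%d.jpg" % (zoom, column, row)
    return join_url(base_dir, folder, name)


if __name__ == "__main__":
    base = "http://www.site.org/images/zoomify/1/"
    print(properties_url(base))
    print(tile_url(base, 0, 0, 0))
```

Everything hangs off the base directory, which is why it is the only address Dezoomify really needs.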
For example, the display page http://www.juilliardmanuscriptcollection.org/composers.php#/hires/BEET/BEET_KREU gets its resources from:
The XML file: http://juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001/ImageProperties.xml
The image tiles: http://juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001/TileGroup0/2-0-0.jpg
To determine the base directory manually
You need the Firefox browser and the Firebug plugin. Open Firebug on the Zoomify page (click the little insect logo in the bottom-right of your screen) and navigate to the "Net" panel. Reloading the page will cause the panel to fill with the requested files, one of which is the XML info file (it will say "GET ImageProperties.xml", and hovering over it will pop up the full URL). From that you can easily get the base directory: it is the XML URL without the /ImageProperties.xml on the end.
The Base directory: http://juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001
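If you prefer not to trim the address by hand, the same step can be sketched in Python. The function name base_directory is hypothetical; it simply removes the /ImageProperties.xml ending from the URL copied out of Firebug.

```python
def base_directory(xml_url):
    """Strip the trailing /ImageProperties.xml from the XML file's URL."""
    suffix = "/ImageProperties.xml"
    if not xml_url.endswith(suffix):
        raise ValueError("not an ImageProperties.xml URL: %s" % xml_url)
    return xml_url[:-len(suffix)]


if __name__ == "__main__":
    xml_url = ("http://juilliardmanuscriptcollection.org/zoomify/"
               "BEET_KREU/BEET_KREU_PF_001/ImageProperties.xml")
    print(base_directory(xml_url))
```

The result is the address to pass to dezoomify.py with -i together with the -b flag.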
Here are some of the parameters and what they do:
python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -s -o c:\output.jpg
python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -z 3 -o c:\output.jpg
After you have set up all of the above, open Terminal, copy the command into it, then press Return. As before, either "cd" (change directory) to the directory holding the dezoomify.py script or give the full path to the script.
Mine would be as follows for the first page (this is the one I use, as it shows where my dezoomify.py file is). You can copy the text; it is all on one line:
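You run the script by typing a command like the examples below into the command line (Terminal on a Mac, or the Windows or Linux equivalent), not into the Python interactive prompt. If your prompt begins with ">>>", you are already in Python and need to exit first. The examples use a Windows-style output path (c:\output.jpg); on a Mac, use a path such as /tmp/outputfile.jpg instead.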
To download an image from a page containing a Zoomify object:
python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -o c:\output.jpg
To download an image from a page containing a Zoomify object, saving at 90% quality:
python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -q 90 -o c:\output.jpg
To download from the base URL, add the -b flag but keep the -i parameter (-b only modifies how -i is interpreted; it does not replace it):
python dezoomify.py -i http://fotothek.slub-dresden.de/zooms/df/dk/0001000/df_dk_0001708 -b -o c:\output.jpg
To download from pages containing Zoomify objects, but using a list of page URLs:
python dezoomify.py -i c:\list.txt -l -o c:\output.jpg
To download from pages containing Zoomify objects, but using a list of base URLs:
python dezoomify.py -i c:\list.txt -l -b -o c:\output.jpg
To get the image at a different zoom level (0 is lowest, highest depends on the image), add -z <level> eg "-z 3":
python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -z 3 -o c:\output.jpg
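Why the highest zoom level depends on the image can be illustrated with a short sketch. It assumes the usual Zoomify pyramid, in which each lower level is half the size of the one above it until the whole image fits in a single tile, and a tile size of 256 pixels. Neither detail is stated in this tutorial, and the function name zoom_levels is made up for illustration.

```python
import math


def zoom_levels(width, height, tile_size=256):
    """Return [(columns, rows), ...] of tiles per zoom level, level 0 first.

    Assumes each lower level halves the image until it fits in one tile.
    """
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    levels = []
    while True:
        columns = int(math.ceil(width / float(tile_size)))
        rows = int(math.ceil(height / float(tile_size)))
        levels.append((columns, rows))
        if columns == 1 and rows == 1:
            break
        # Halve for the next lower level, never dropping below one pixel.
        width, height = max(1, width // 2), max(1, height // 2)
    levels.reverse()
    return levels


if __name__ == "__main__":
    levels = zoom_levels(2000, 1500)
    print("highest zoom level: %d" % (len(levels) - 1))
```

Under these assumptions, a 2000 by 1500 pixel image has four levels, so -z 3 would be its highest zoom, while a larger image would accept higher values.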
To save the tiles as they are downloaded (in the same directory as the output), add -s:
python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -s -o c:\output.jpg
To batch download these files, manually make a list of URLs or base directories in a text file. Dezoomify will then fetch all of the pages (84 in this case) one after the other. For instance, the base directories would go from:
http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001 to http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_084 .
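Typing 84 nearly identical lines by hand is tedious, so the list can also be generated. The sketch below assumes the page numbers are always three digits and run without gaps, as in the example above; the script name make_list.py and the function names are hypothetical.

```python
import sys


def base_directories(prefix, first, last, digits=3):
    """Build base directory URLs such as <prefix>001 up to <prefix>084."""
    return [prefix + str(number).zfill(digits)
            for number in range(first, last + 1)]


def list_file_text(urls):
    """One URL per line, which is the layout the list option expects."""
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    prefix = ("http://www.juilliardmanuscriptcollection.org/"
              "zoomify/BEET_KREU/BEET_KREU_PF_")
    sys.stdout.write(list_file_text(base_directories(prefix, 1, 84)))
```

Saved as make_list.py, it could be run as python make_list.py > /tmp/listfile.txt to produce the list file in one step.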
Hint: When you first open TextEdit, your file will be saved as .rtf (Rich Text Format, which allows colors, styling and all that jazz). If you want a regular .txt file, click 'Format' in the menu bar and choose 'Make Plain Text'. When you save, the file will then be in plain text .txt format, in whatever encoding you choose in the save dialog (e.g. Western, UTF-8). Save it as listfile.txt.
Use the list "-l" argument. The list is simply a text file with each URL on a new line.
python /Applications/Python/dezoomify.py -i /tmp/listfile.txt -l -b -d -o /tmp/outputfile.jpg
Q: What would I add to the script to make it beep (or make a sound) when it finishes?
A: Untested, but you could try this:
import Carbon.Snd
Carbon.Snd.SysBeep(1)
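If the Carbon module is not available in your Python (it was removed in Python 3), a more portable sketch is to write the terminal bell character instead. This is equally untested against dezoomify.py itself: the function name beep is made up, where you call it from is up to you, and whether you actually hear anything depends on the bell settings of your terminal.

```python
import sys


def beep(stream=None):
    """Write the ASCII bell character (BEL) to a terminal stream."""
    if stream is None:
        stream = sys.stdout
    stream.write("\a")
    stream.flush()


if __name__ == "__main__":
    beep()
```

Calling beep() as the last step of a script makes Terminal sound its alert once the work is done.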