Music from Flash-Based Websites

This tutorial explains how to fetch JPEG files (scores) from Flash-based websites using a Mac (users of other systems can use similar techniques).

Put together with lots of help from others (Thank you) by: Generoso

This includes sites such as The Juilliard Manuscript Collection and others that hold wonderful manuscripts. These sites use Zoomify. On a Mac (and certainly on many other computers as well) it is possible to download these JPEGs by installing a few programs.

1 MacOS X 10.6.x, Python2.7 and PIL
2 Copy-paste the code (dezoomify.py) into a file on your computer
3 Determine the base directory
4 Parameters:
5 The proper command is:
6 Examples
7 To Batch Download
8 Beep when finished

MacOS X 10.6.x, Python2.7 and PIL

Python Imaging Library (PIL) on Mac OS X Snow Leopard using Python 2.7

Here is how to build PIL. You must have the Apple Developer tools installed in order for this to work. They are located on your Snow Leopard install disc. I am assuming you're using the Python 2.7 distribution from www.python.org. Download the linked requirements below and save them in your /tmp directory or copy them there manually.

If you wish to use the Finder to copy them, you'll need to press Command-Shift-G to get the "Go to folder" dialog. Enter /tmp to open a Finder window for that directory. For the rest of this tutorial, however, you will need to know how to use Terminal.app.

Before we can build PIL we will need to build both libjpeg and libfreetype. We will start with libjpeg. Follow these commands. $ means the user can be any non-super user. # means that the user must be logged in as a super user or root level account. (You can copy the text one-by-one after each $ sign and paste into Terminal.)

$ cd /tmp
$ tar xzvf jpegsrc.v7.tar.gz
$ cd jpeg-7
$ env CFLAGS="-O -g -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64" LDFLAGS="-arch i386 -arch x86_64" ./configure --disable-dependency-tracking
<when that completes>
$ make
<if there are no errors>
$ sudo make install

Next up, we need to build libfreetype.

$ cd /tmp
$ tar xzvf freetype-2.4.4.tar.gz
$ cd freetype-2.4.4
$ env CFLAGS="-O -g -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64" LDFLAGS="-arch i386 -arch x86_64" ./configure --disable-dependency-tracking
<when that completes>
$ make
<if there are no errors>
$ sudo make install

Finally, before we build PIL, make sure you're using the correct Python.

$ which python
<should be something like '/Library/Frameworks/Python.framework/Versions/2.7/bin/python'>

If you don't see 2.7 in the output from the which command, try:

$ which python2.7
<as long as you don't see '-bash: python2.7: command not found', you'll likely be ok>
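Another way to confirm which interpreter you are running is to ask Python itself. The following is a minimal sketch; the function name is_python_27 is made up for this illustration and is not part of dezoomify.py.

```python
import sys


def is_python_27(version_info=sys.version_info):
    """Return True when the given version tuple describes Python 2.7.x."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Print the version of whichever interpreter ran this file.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Is this Python 2.7? %s" % is_python_27())
```

Save it under any name (for example check_version.py) and run it with the same command you intend to use for the rest of the tutorial, python or python2.7.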

If you need to use python2.7 instead of python, please replace python with python2.7 in all places going forward. Without further ado, let's get to the final step and build PIL.

$ cd /tmp
$ tar xzvf Imaging-1.1.7.tar.gz
$ cd Imaging-1.1.7
$ python setup.py build
<when that completes and if there are no errors>
$ sudo python setup.py install

Ok, let's test to see if it's installed correctly.

$ python

Python 2.7 (r27:82508, Jul 3 2010, 21:12:11)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Image
>>>

As long as you get another empty prompt you have successfully installed and built PIL. Enjoy using it.
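The same check can be run without the interactive prompt. This is only a sketch: can_import is a hypothetical helper name, and "Image" is the module name used in the interactive test above (PIL 1.1.7 as built in this tutorial).

```python
def can_import(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "Image" is the same module imported at the >>> prompt above.
    if can_import("Image"):
        print("PIL is installed")
    else:
        print("PIL was not found; revisit the build steps above")
```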

Copy-paste the code (dezoomify.py) into a file on your computer

Save the dezoomify script with a plain-text editor such as TextMate (not Word) as dezoomify.py. Remember where you save it. In a terminal, you have to "cd" (change directory) to the directory holding the dezoomify.py script, or give the full path to the script, as in the two alternatives below. I saved mine in Applications, in a folder I named Python (this path appears in my own command later on).

$ cd /home/littlebeethoven/progs/
$ python dezoomify.py -i....

$ python /home/littlebeethoven/progs/dezoomify.py -i....

Determine the base directory

How are Zoomify objects arranged?

Zoomify objects present on one page (let's say www.site.org/gallery/zoomify_page_1.html ) draw on resources in another part of the site to construct the image. These resources are:

The Zoomify tiles, little square images that are pieced together to make the image you see in the Flash applet. They are divided into folders (TileGroups) of 256 tiles.
The ImageProperties.xml file, which holds vital information used in constructing the image (e.g. width and height).
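To make the role of ImageProperties.xml concrete, here is a rough sketch that reads the width and height from such a file and works out how many tiles span the image at full resolution. The sample XML is invented for illustration; the attribute names WIDTH, HEIGHT and TILESIZE are what Zoomify files typically contain, but treat them as an assumption and check a real file. The function names are hypothetical and are not taken from dezoomify.py.

```python
import math
import xml.etree.ElementTree as ET

# Invented sample; a real file comes from the site's base directory.
SAMPLE_XML = '<IMAGE_PROPERTIES WIDTH="2203" HEIGHT="3290" TILESIZE="256" />'


def read_image_properties(xml_text):
    """Return (width, height, tile_size) read from ImageProperties.xml text."""
    root = ET.fromstring(xml_text)
    width = int(root.attrib["WIDTH"])
    height = int(root.attrib["HEIGHT"])
    # Assumption: fall back to 256 pixels if TILESIZE is absent.
    tile_size = int(root.attrib.get("TILESIZE", 256))
    return width, height, tile_size


def full_resolution_grid(width, height, tile_size):
    """Return (columns, rows) of tiles needed to cover the full-size image."""
    columns = int(math.ceil(width / float(tile_size)))
    rows = int(math.ceil(height / float(tile_size)))
    return columns, rows


if __name__ == "__main__":
    w, h, ts = read_image_properties(SAMPLE_XML)
    print("Image is %d x %d pixels, tile size %d" % (w, h, ts))
    print("Full-resolution grid: %d columns x %d rows" % full_resolution_grid(w, h, ts))
```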

The image folders and the XML file are contained in a "base directory". This is an entirely separate web directory, and could even be on a different website! Let's say our example base directory is www.site.org/images/zoomify/1/. The XML file is then located at www.site.org/images/zoomify/1/ImageProperties.xml, and image tiles are at www.site.org/images/zoomify/1/TileGroup0/0-0-0.jpg and so on.
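The layout just described can be sketched as two small helpers that build the XML and tile addresses from a base directory. The function names are hypothetical, and the reading of the tile file name as zoom-column-row is an assumption inferred from names such as 0-0-0.jpg, not something this tutorial confirms.

```python
def properties_url(base_directory):
    """Return the address of ImageProperties.xml inside the base directory."""
    return base_directory.rstrip("/") + "/ImageProperties.xml"


def tile_url(base_directory, tile_group, zoom, column, row, extension="jpg"):
    """Return the address of one tile.

    Assumption: the three numbers in the file name are zoom, column, row.
    """
    return "%s/TileGroup%d/%d-%d-%d.%s" % (
        base_directory.rstrip("/"), tile_group, zoom, column, row, extension)


if __name__ == "__main__":
    base = "http://www.site.org/images/zoomify/1/"  # example base directory from the text
    print(properties_url(base))
    print(tile_url(base, 0, 0, 0, 0))
```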

So, the base directory is the important location. Dezoomify will get all the data and images from there, and the display page is just the gateway page that Dezoomify uses to find the base directory to make your life easier.

The display page http://www.juilliardmanuscriptcollection.org/composers.php#/hires/BEET/BEET_KREU

Gets resources from:

The XML file: http://juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001/ImageProperties.xml

The image tiles: http://juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001/TileGroup0/2-0-0.jpg

To determine the base directory manually, you need the Firefox browser and the Firebug plugin. Open Firebug on the Zoomify page (click the little insect logo in the bottom-right of your screen) and navigate to the "Net" panel. Reloading the page will cause the panel to fill with the requested files, one of which is the XML info file (it will say "GET ImageProperties.xml", and hovering on it will pop up the full URL). From that you can get the base directory easily (it's the XML URL without the /ImageProperties.xml on the end).

The Base directory: http://juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001
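Stripping the file name from the XML URL can also be done in a couple of lines of Python. This is just an illustrative sketch; base_directory_from_xml_url is a made-up name.

```python
SUFFIX = "/ImageProperties.xml"


def base_directory_from_xml_url(xml_url):
    """Return the base directory: the XML URL without /ImageProperties.xml."""
    if not xml_url.endswith(SUFFIX):
        raise ValueError("not an ImageProperties.xml URL: %r" % xml_url)
    return xml_url[:-len(SUFFIX)]


if __name__ == "__main__":
    xml_url = ("http://juilliardmanuscriptcollection.org/zoomify/"
               "BEET_KREU/BEET_KREU_PF_001/ImageProperties.xml")
    print(base_directory_from_xml_url(xml_url))
```

The result has no slash at the end, which is the form the "-b" flag expects.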

Parameters:

Here are some of the parameters and what they do:

-b If you set the "-b" flag, the script will take the "-i" parameter as the raw base directory of the Zoomify file structure. (There is no slash at the end of the URL.)
-d If you set the "-d" flag, the script will tell you how it is doing as it runs.
-e The "-e" flag sets the extension of the files we will be collecting and tiling; in all cases I have seen, this is the default, jpg.
-i The image locations are given with the "-i" parameter.
-l If you set the "-l" flag, the script takes as an input a local file containing a list of URLs, one on each line. Each of these is interpreted as before (you can give a simple page URL or a base URL with the -b parameter). If you give a list, you currently cannot specify a unique file name for each output file; they will have a numeric suffix to differentiate them.
-o The "-o" parameter sets the output file. When the script is done, the file should exist where you asked for it.
-q The "-q" parameter allows you to set the quality that Python Imaging Library (PIL) http://www.pythonware.com/products/pil/ saves the image at. The default is 75, which is more than adequate for most images. Images with fine, sharp detail may need a higher quality. It is not recommended to exceed 95, as this will produce a huge image with no noticeable quality improvement. 100% quality is included in PIL mostly for testing purposes, not for actual use.
-s To save the tiles as they are downloaded (in the same directory as the output), add -s

python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -s -o c:\output.jpg

-z To get the image at a different zoom level (0 is lowest, highest depends on the image), add -z <level> eg "-z 3":

python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -z 3 -o c:\output.jpg

The proper command is:

After you have set up all of the above, open Terminal, paste in the following command and press Return.

$ python dezoomify.py -i http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001 -b -d -o /tmp/outputfile_001.jpg

If you are in a terminal, you have to "cd" (change directory) to the directory holding the dezoomify.py script, or give the full path to the script. eg.

$ cd /home/yourname/progs/
$ python dezoomify.py -i....

$ python /home/yourname/progs/dezoomify.py -i....

Mine would then be the following (for the 1st page). This is the one I use, as it shows where my dezoomify.py file is. You can copy the text; it is all on one line:

$ python /Applications/Python/dezoomify.py -i http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001 -b -d -o /tmp/outputfile_001.jpg
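If you would rather assemble this command from Python (for example, to reuse it for other pages), a sketch along the following lines may help. It only uses the flags described in the Parameters section (-i, -b, -d, -o); build_command is a hypothetical helper, and the paths are the ones from my own setup above.

```python
def build_command(script_path, base_directory, output_path, python="python"):
    """Return the dezoomify command as a list of arguments.

    -b treats the -i value as a base directory, -d reports progress,
    and -o names the output file, as described under Parameters.
    """
    return [python, script_path,
            "-i", base_directory,
            "-b", "-d",
            "-o", output_path]


if __name__ == "__main__":
    command = build_command(
        "/Applications/Python/dezoomify.py",
        "http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001",
        "/tmp/outputfile_001.jpg")
    # Show the command; to actually run it you could pass the list
    # to subprocess.call(command) from the standard library.
    print(" ".join(command))
```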

Examples

You run the script by writing a command like the examples below into the Windows or Linux (or whatever) command line, and not into the Python interactive prompt. If your prompt begins ">>>", you are in Python already, and need to exit.

To download an image from a page containing a Zoomify object:

python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -o c:\output.jpg

To download an image from a page containing a Zoomify object, saving at 90% quality:

python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -q 90 -o c:\output.jpg

To download from the base URL (add the -b flag, but don't remove the -i parameter, -b just modifies the intent of -i, and doesn't replace it):

python dezoomify.py -i http://fotothek.slub-dresden.de/zooms/df/dk/0001000/df_dk_0001708 -b -o  c:\output.jpg

To download from pages containing Zoomify objects, but using a list of page URLs:

python dezoomify.py  -i c:\list.txt -l -o c:\output.jpg

To download from pages containing Zoomify objects, but using a list of base URLs:

python dezoomify.py  -i c:\list.txt -l -b -o c:\output.jpg

To get the image at a different zoom level (0 is lowest, highest depends on the image), add -z <level> eg "-z 3":

python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -z 3 -o c:\output.jpg

To save the tiles as they are downloaded (in the same directory as the output), add -s

python dezoomify.py -i http://www.bl.uk/onlinegallery/onlineex/evancoll/a/zoomify72459.html -s -o c:\output.jpg

To Batch Download

Manually make a list of URLs/base directories in a text file in order to batch download these files. The script then automatically gets all of the 84 pages (in this case) one after the other. For instance, the base directories would go from:

http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_001 to http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_084 .
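Typing 84 lines by hand is tedious, so the list could also be generated with a few lines of Python. This sketch assumes the pages are numbered consecutively with three digits (001 to 084), as the two URLs above suggest; page_urls and write_list are hypothetical names, and /tmp/listfile.txt is simply the file name used later in this section.

```python
BASE = "http://www.juilliardmanuscriptcollection.org/zoomify/BEET_KREU/BEET_KREU_PF_"


def page_urls(first=1, last=84):
    """Return the base directories for pages first..last, zero-padded to 3 digits."""
    return ["%s%03d" % (BASE, number) for number in range(first, last + 1)]


def write_list(path, urls):
    """Write one URL per line to a plain-text file."""
    with open(path, "w") as handle:
        handle.write("\n".join(urls))


if __name__ == "__main__":
    write_list("/tmp/listfile.txt", page_urls())
    print("Wrote %d URLs to /tmp/listfile.txt" % len(page_urls()))
```

A file written this way is already plain text, so the TextEdit conversion described in the hint is not needed for it.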

Hint: When you first open TextEdit, your text file will be saved as .rtf (Rich Text Format, which allows colors, styling and all that jazz). If you want just a regular .txt file, click 'Format' in the menu bar and choose 'Make Plain Text'. Now when you save your text file it will be in plain-text .txt format, in whatever encoding you choose in the save dialog (e.g. Western, UTF-8, etc.). Save it as listfile.txt.

Use the list "-l" argument. The list is simply a text file with each URL on a new line.

python /Applications/Python/dezoomify.py -i /tmp/listfile.txt -l -b -d -o /tmp/outputfile.jpg

Beep when finished

Q: What would I add to the script to make it beep (or make a sound) when it finishes?

A: Untested, but you could try this:

import Carbon.Snd
Carbon.Snd.SysBeep(1)
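A different approach, which does not rely on the Carbon module, is to write the terminal "bell" character. This is a sketch of an alternative, not the answer given above, and it assumes your terminal is set to play a sound for the bell; the function name beep is made up here.

```python
import sys


def beep(stream=None):
    """Write the ASCII bell character, which most terminals turn into a sound."""
    if stream is None:
        stream = sys.stdout
    stream.write("\a")
    stream.flush()


if __name__ == "__main__":
    beep()
```

The call to beep() would go at the point where the script finishes its work.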
