1) DIFFER is a tool that analyzes and compares digital image file formats like TIFF, JPEG, JP2 and DjVu to identify properties, validate formats, detect differences and glitches.
2) It wraps a range of existing tools and compares files using techniques such as hashing, PSNR and pixel detection.
3) The tool can be used to help set digital preservation standards and quality control for file conversion and master files.
Bedrich Vychodil DIFFER
1. DIFFER
Determinator of Image File Format propERties
Lecture: 2012 Future Perfect, 26 MAR, 2012
Lecturer: Bedrich Vychodil
Web: www.nkp.cz, www.ndk.cz
Contact: bedrich@gmail.com, bedrich.vychodil@nkp.cz
Digital Preservation Standards Department
The National Library of the Czech Republic
2. Klementinum (built 1653–1726)
3. Overview
1992: Take-off: pilot project under UNESCO
2005: Award: UNESCO/Jikji Memory of the World Prize
2011: Current state: ~10,000,000 pages
2011-14: Our goal: ~26,000,000 pages
2011-16: Google: ~20,000,000 pages (200,000 books)
5. Migration from JPEG to JP2
Difference between layers
Deviation scale: black = minimum, white = maximum
Images compared: JPEG, JPEG2000
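A deviation image of this kind can be sketched as a per-pixel absolute difference, rescaled so that the smallest deviation maps to black and the largest to white. This is only an illustration under stated assumptions: the slides do not describe how the deviation layer was produced, and the function name `deviation_map` and the nested-list grayscale representation are hypothetical.

```python
def deviation_map(a, b):
    """Per-pixel absolute difference of two equal-sized grayscale images.

    Images are nested lists of 8-bit values (an assumption for this sketch).
    The result is rescaled so the minimum deviation is 0 (black) and the
    maximum deviation is 255 (white).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")

    diff = [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)

    # Identical deviation everywhere (e.g. identical images): all black.
    if hi == lo:
        return [[0] * len(row) for row in diff]

    return [[round((d - lo) * 255 / (hi - lo)) for d in row] for row in diff]


# Tiny made-up 2x2 images standing in for a JPEG and its JP2 migration.
jpeg = [[10, 20], [30, 40]]
jp2 = [[10, 22], [20, 40]]
print(deviation_map(jpeg, jp2))  # [[0, 51], [255, 0]]
```

Rescaling to the observed range makes even small deviations visible, at the cost of hiding their absolute size; a fixed scale would be needed to compare deviation images across files.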
6. JPEG2000 Reference Chart
Parameter | Master Copy | Production Master Copy | Production Master Copy
Used for | Books, periodicals, maps, manuscripts | Books, periodicals | Maps, manuscripts
Conversion software used | Kakadu | Kakadu | Kakadu
File format | Part 1 (.jp2) | Part 1 (.jp2) | Part 1 (.jp2)
Lossy or lossless | Lossless | Lossy | Lossy
Typical compression | 1:2 to 1:3 | 1:20 to 1:30 | 1:8 to 1:10
Tiling | 4096x4096 | 1024x1024 | 1024x1024
Progression order | RPCL | RPCL | RPCL
Number of decomposition levels | 5 in one profile; 5 or 6 in the other two /6 layers for over-sized material/
Number of quality layers | 1 | 12 /logarithmic/ | 12 /logarithmic/
Code block size (xcb = ycb) | 6 | 6 | 6
Transformation | 5-3 reversible | 9-7 irreversible | 9-7 irreversible
Precinct size | 256x256 for first two decomp. levels, 128x128 for lower levels | same | same
Regions of Interest | No | No | No
Code block size | 64x64 | 64x64 | 64x64
TLM markers | Yes “R” | Yes “R” | Yes “R”
Bypass | Yes | Yes | Yes
ICC profiles | Yes | ? | Yes
Metadata | Embedded as XMP metadata in JP2 XML box | Embedded as XMP metadata in JP2 XML box | Embedded as XMP metadata in JP2 XML box
Cuse_sop=yes, Cuse_eph=yes | Greatly limits the impact of bit flipping, as it limits the damage to a single block in the JPEG 2000 file | ? | ?
7. Kakadu Command-lines
Master Copy:
kdu_compress -i example.tif -o example.jp2 "Cblk={64,64}" Corder=RPCL "Stiles={4096,4096}" \
  "Cprecincts={256,256},{128,128}" ORGtparts=R Creversible=yes Clayers=1 Clevels=5 \
  "Cmodes={BYPASS}" -double_buffering Cuse_sop=yes Cuse_eph=yes
Production Master Copy, compression ratio 1:8:
kdu_compress -i example.tif -o example.jp2 "Cblk={64,64}" Corder=RPCL "Stiles={1024,1024}" \
  "Cprecincts={256,256},{128,128}" ORGtparts=R -rate 3 Clayers=12 Clevels=5 \
  "Cmodes={BYPASS}"
Production Master Copy, compression ratio 1:20:
kdu_compress -i example.tif -o example.jp2 "Cblk={64,64}" Corder=RPCL "Stiles={1024,1024}" \
  "Cprecincts={256,256},{128,128}" ORGtparts=R -rate 1.2 Clayers=12 Clevels=5 \
  "Cmodes={BYPASS}"
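The `-rate` values in the Kakadu command lines follow from the target compression ratio: `-rate` is given in bits per pixel, so for an uncompressed 24-bit RGB source a 1:8 ratio corresponds to 24 / 8 = 3 and a 1:20 ratio to 24 / 20 = 1.2. A minimal sketch of that arithmetic, assuming a 24-bit source (the helper name `rate_for_ratio` is hypothetical):

```python
def rate_for_ratio(ratio, bits_per_pixel=24):
    """Bits per pixel that yield a 1:ratio compression of the source.

    bits_per_pixel=24 assumes an 8-bit-per-channel RGB source image.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return bits_per_pixel / ratio


for ratio in (8, 20, 30):
    print(f"1:{ratio} -> -rate {rate_for_ratio(ratio):g}")
# 1:8 -> -rate 3
# 1:20 -> -rate 1.2
# 1:30 -> -rate 0.8
```

For a grayscale or 48-bit source the same ratio would need a different `-rate`, which is why the bit depth is a parameter rather than a constant.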
8. Differences in rendering /24bits, RGB, 300 PPI/
Viewers compared: Photoshop CS5 (v.12.0 x64), KDU_show (v.6.4.1), IrfanView (v.4.27)
TIFF, no compression: 123 MB
JP2 lossless: 21.5 MB
JP2 1:8: 11.5 MB
JP2 1:20: 4.6 MB
JP2 1:30: 3.0 MB
9. Differences in rendering /24bits, RGB, 600 PPI/
Viewers compared: Photoshop CS5 (v.12.0 x64), KDU_show (v.6.4.1), IrfanView (v.4.27)
TIFF, no compression: 215 MB
JP2 lossless: 28.3 MB
JP2 1:8: 6.7 MB
JP2 1:20: 2.7 MB
JP2 1:30: 1.8 MB
10. PROJECT - tool wrapper
DIFFER (Determinator of Image File Format propERties)
11. WHAT IT DOES
TIFF, JPEG, JP2, DjVu, (PNG, PDF)
Identification
Characterization
Validation
Visual comparison
Numerical comparison
Detection of glitches
JP2 profile validator
12. WHAT IS IN IT
JHOVE (JSTOR/Harvard Object Validation Environment)
Identifies, extracts technical metadata, and validates files
ExifTool (Read, Write and Edit Meta Information!)
Identifies and extracts technical metadata
KDU_expand (part of the Kakadu toolkit)
Identifies and extracts technical metadata and properties from JP2
DJVUDUMP
Extracts internal structure of DjVu files
DROID (Digital Record Object Identification)
Identifies files
FFIdent (tool wrapper)
Identifies files
FITS (File Information Tool Set)
Identifies, validates, and extracts technical metadata
NLNZ MTD Extraction Tool (tool wrapper)
Identifies and extracts technical metadata
PRONOM (The technical registry PRONOM)
Identifies files
Jpylyzer (by van der Knijff)
JP2 validator, properties extractor, and file structure checker
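The wrapper idea, running several external tools on one file and collecting their results in a uniform shape, can be sketched as below. This is not DIFFER's implementation: the slides give no command lines for the wrapped tools, so the sketch uses the Python interpreter itself as a stand-in command, and the names `run_tool` and `run_all` are hypothetical. In a real setup the command lists would be replaced by the actual invocations of tools such as JHOVE, ExifTool or Jpylyzer.

```python
import subprocess
import sys


def run_tool(argv, timeout=60):
    """Run one external tool and return its result as a uniform dict."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Tool not installed, not executable, or took too long.
        return {"tool": argv[0], "ok": False, "output": "", "error": str(exc)}
    return {
        "tool": argv[0],
        "ok": proc.returncode == 0,
        "output": proc.stdout,
        "error": proc.stderr,
    }


def run_all(commands):
    """Run every named command and key the results by that name."""
    return {name: run_tool(argv) for name, argv in commands.items()}


# Stand-in commands so the sketch runs without any preservation tools installed.
commands = {
    "stand_in_identifier": [sys.executable, "-c", "print('format: JP2')"],
    "missing_tool": ["no-such-tool-xyz", "example.jp2"],
}
report = run_all(commands)
for name, result in report.items():
    print(name, "ok" if result["ok"] else "failed")
```

Returning a failure record instead of raising keeps one missing or crashing tool from hiding the results of the others, which matters when the point is to compare what several tools say about the same file.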
13. DIFFER – Finds Differences
Hash: is equal
PSNR: infinity
14. DIFFER – Finds Differences
Hash: is not equal
PSNR: 26.14 dB
15. DIFFER – Finds Differences
Hash: is not equal
PSNR: 16.76 dB
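The two checks on these slides, hash equality and PSNR, can be sketched as follows. PSNR is 10 · log10(MAX² / MSE), so identical pixel data (MSE = 0) gives infinity, and lower values mean larger differences. This is a minimal sketch rather than DIFFER's code: the slides do not name the hash function, so SHA-256 is an assumption, and images are represented here as flat lists of 8-bit sample values.

```python
import hashlib
import math


def pixel_hash(pixels):
    """SHA-256 over the raw 8-bit samples (the hash choice is an assumption)."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def psnr(a, b, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of samples")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical data
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up sample data standing in for decoded images.
original = [0, 64, 128, 255]
copy = list(original)
altered = [0, 64, 128, 250]

print(pixel_hash(original) == pixel_hash(copy), psnr(original, copy))
print(pixel_hash(original) == pixel_hash(altered), round(psnr(original, altered), 2))
```

Hashing the decoded pixel data rather than the file bytes is a deliberate choice in this sketch: a lossless TIFF to JP2 migration changes every byte of the file while leaving the pixels, and therefore the hash, unchanged.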
16. DIFFER – Pixels Detection
Hash: is not equal
Highlighted pixels: cyan, magenta, yellow
17. DIFFER – Glitches Detection
18. DIFFER – Glitches Detection
19. DIFFER – Corrupted file Detection
20. DIFFER – Corrupted file Detection
21. DIFFER – JP2 profile validator
Profiles: Master Copy Profile, Production Master Copy Profile, User Test Profile
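Profile validation can be thought of as comparing the properties extracted from a JP2 file against the expected values of a profile. The sketch below uses a few Master Copy values from the JPEG2000 reference chart; the property names, the dict representation and the function `validate_profile` are hypothetical and not taken from DIFFER.

```python
# Expected values taken from the Master Copy column of the reference chart.
# The key names are illustrative assumptions.
MASTER_COPY_PROFILE = {
    "transformation": "5-3 reversible",
    "progression_order": "RPCL",
    "tiling": (4096, 4096),
    "quality_layers": 1,
    "code_block_size": (64, 64),
}


def validate_profile(properties, profile):
    """Return (name, expected, actual) for every property that deviates."""
    return [
        (name, expected, properties.get(name))
        for name, expected in profile.items()
        if properties.get(name) != expected
    ]


# Properties as they might come out of an extraction tool for one file.
extracted = {
    "transformation": "9-7 irreversible",
    "progression_order": "RPCL",
    "tiling": (4096, 4096),
    "quality_layers": 1,
    "code_block_size": (64, 64),
}

for name, expected, actual in validate_profile(extracted, MASTER_COPY_PROFILE):
    print(f"{name}: expected {expected!r}, found {actual!r}")
```

An empty result means the file conforms to the profile; a missing property is reported with `None` as the found value.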
22. Follow-up Study
Web Service – JAVA
Google Summer of Code: http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2012/home
Open Source: https://github.com/moravianlibrary/differ
MSSIM (Multi Structural SIMilarity index)
Lossless vs. Lossy for Master Copy
Digital Images Production and QC
23. Questions…?