DIFFER
Determinator of Image File
Format propERties
Lecture:     2012 Future Perfect, 26 MAR, 2012
Lecturer:    Bedrich Vychodil
Web:         www.nkp.cz, www.ndk.cz
Contact:     bedrich@gmail.com
             bedrich.vychodil@nkp.cz
            Digital Preservation Standards Department
            The National Library of the Czech Republic
Klementinum (built 1653–1726)




Overview


1992    Take-off     Pilot project under UNESCO
2005    Award        UNESCO/Jikji
                     Memory of the World Prize

2011    Current state ~10,000,000 pages

2011-14 Our goal     ~26,000,000 pages
2011-16 Google       ~20,000,000 pages
                                              (200,000 books)




Compression Ratio TEST
Format groups: Scan (TIFF), MC (master copy: JPEG), UC (user copy: DjVu), MC/UC (JPEG2000)

Comparison % per test image:
Format                     A, 8-bit Gray   A, 24-bit RGB   B, 8-bit Gray   B, 24-bit RGB
BMP                        100%            100%            100%            100%
TIFF                       100%            100%            100%            100%
TIFF LZW                   4.30%           0.27%           0.42%           0.88%
PNG                        2.83%           0.21%           0.19%           0.60%
JPEG (12)                  1.81%           0.96%           1.12%           0.76%
JPEG (11)                  1.20%           0.76%           0.90%           0.55%
DJV photo MAX              1.05%           0.85%           0.85%           0.55%
DJV photo preset           0.25%           0.38%           0.38%           0.20%
DJV manuscript             0.06%           0.01%           0.01%           0.02%
JP2 (0)                    2.45%           0.71%           0.70%           0.71%
JP2 (1:1)                  2.28%           1.03%           1.05%           0.86%
JP2 (1:10)                 1.15%           0.38%           1.05%           0.37%
JP2 (1:25)                 0.46%           0.15%           0.46%           0.15%
JPM photo                  0.41%           0.14%           0.41%           0.14%
JPM standard/good          0.13%           0.05%           0.08%           0.05%
JPM standard/low           0.09%           0.05%           0.08%           0.04%

Summary rows (values in the order given on the slide):
File size compared to TIFF:  100%, 100%, 22.97%, 15.70%, 0.66%, 14.36%, 5.17%, 0.54%, 18.47%, 0.78%, 0.14%
Storage gain:                0.0%, 0.0%, 77.0%, 84.3%, 91.2%, 85.6%, 94.8%, 99.5%, 81.5%, 93.0%, 98.0%
Number of layers:            1, 1, 1, 1, 1, 1, 3, 1, 3




                                BMP                TIFF PNG
                                                    (LZW)
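In the summary rows, the storage gain is the complement of the relative file size: a file at 22.97% of the TIFF size saves 77.0%, one at 5.17% saves 94.8%. A minimal sketch of that arithmetic, assuming sizes are given as a percentage of the uncompressed TIFF:

```python
def storage_gain(size_pct_of_tiff: float) -> float:
    """Storage gain in percent for a file whose size is expressed as a
    percentage of the uncompressed TIFF (assumed relation: 100 - size)."""
    if not 0.0 <= size_pct_of_tiff <= 100.0:
        raise ValueError("size must be between 0 and 100 percent of the TIFF")
    return 100.0 - size_pct_of_tiff


# Example pairs from the "File size compare to TIFF" / "Storage gain" rows.
for size in (22.97, 15.70, 14.36, 5.17):
    print(f"{size:5.2f}% of TIFF -> {storage_gain(size):.1f}% saved")
```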


Migration from JPEG to JP2
Difference between layers (deviation map: black = minimum, white = maximum)




JPEG      JPEG2000
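A deviation map like the one on this slide can be sketched as a per-pixel absolute difference, stretched so the smallest deviation is black and the largest is white. This assumes both images are already decoded into equally sized 8-bit grayscale rows (decoding JPEG and JP2 is out of scope), and the min-max stretch is an assumption about how the slide's map was produced:

```python
def deviation_map(a, b):
    """Per-pixel absolute difference of two equally sized grayscale images
    (lists of rows), stretched so min deviation -> 0 (black) and
    max deviation -> 255 (white)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    diff = [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    if hi == lo:  # uniform deviation (e.g. identical images): all black
        return [[0] * len(row) for row in diff]
    scale = 255 / (hi - lo)
    return [[round((d - lo) * scale) for d in row] for row in diff]


jpeg_pixels = [[10, 20], [30, 40]]
jp2_pixels = [[10, 22], [35, 40]]
print(deviation_map(jpeg_pixels, jp2_pixels))
```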
JPEG2000 Reference Chart
                                Master Copy           Production Master Copy   Production Master Copy
Used for                        Books, periodicals,   Books, periodicals       Maps, manuscripts
                                maps, manuscripts
Conversion software used        Kakadu                Kakadu                   Kakadu
File format                     Part 1 (.jp2)         Part 1 (.jp2)            Part 1 (.jp2)
Lossy or lossless               Lossless              Lossy                    Lossy
Typical compression             1:2 to 1:3            1:20 to 1:30             1:8 to 1:10
Tiling                          4096x4096             1024x1024                1024x1024
Progression order               RPCL                  RPCL                     RPCL
Number of decomposition levels  5 or 6                5                        5 or 6
                                6 if oversized                                 6 if oversized
Number of quality layers        1                     12 (logarithmic)         12 (logarithmic)
Code block size                 64x64 (xcb = ycb = 6) 64x64 (xcb = ycb = 6)    64x64 (xcb = ycb = 6)
Transformation                  5-3 reversible        9-7 irreversible         9-7 irreversible
Precinct size                   256x256 for the first two decomposition levels,
                                128x128 for lower levels (all three profiles)
Regions of Interest             No                    No                       No
TLM markers                     Yes ("R")             Yes ("R")                Yes ("R")
Bypass                          Yes                   Yes                      Yes
ICC profiles                    Yes                   ?                        Yes
Metadata                        Embedded as XMP metadata in the JP2 XML box (all three profiles)
SOP/EPH markers                 Cuse_sop=yes          ?                        ?
                                Cuse_eph=yes
SOP/EPH markers greatly limit the impact of bit flipping, as they limit the damage to a single block in the JPEG 2000 file.
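Two chart entries can be related with a little arithmetic. Reading the chart's value 6 as the base-2 exponent of the code-block side (the chart also lists 64x64), and noting that each of the $N_L$ decomposition levels halves the image, the smallest resolution of an image $W$ pixels wide is $\lceil W/2^{N_L} \rceil$. For a hypothetical 20,000-pixel-wide map, a sixth level halves that thumbnail, which is one plausible reading of why oversized material gets six levels:

```latex
\text{code-block side} = 2^{xcb} = 2^{6} = 64 \quad\Rightarrow\quad 64 \times 64 \text{ samples}

W_{\min} = \left\lceil \frac{W}{2^{N_L}} \right\rceil, \qquad
\left\lceil \frac{20000}{2^{5}} \right\rceil = 625, \qquad
\left\lceil \frac{20000}{2^{6}} \right\rceil = 313
```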

Kakadu Command Lines
Master Copy
 kdu_compress -i example.tif -o example.jp2 "Cblk={64,64}" Corder=RPCL "Stiles={4096,4096}" \
   "Cprecincts={256,256},{128,128}" ORGtparts=R Creversible=yes Clayers=1 Clevels=5 \
   "Cmodes={BYPASS}" -double_buffering Cuse_sop=yes Cuse_eph=yes

Production Master Copy
 Compression ratio 1:8
 kdu_compress -i example.tif -o example.jp2 "Cblk={64,64}" Corder=RPCL "Stiles={1024,1024}" \
   "Cprecincts={256,256},{128,128}" ORGtparts=R -rate 3 Clayers=12 Clevels=5 \
   "Cmodes={BYPASS}"

 Compression ratio 1:20
 kdu_compress -i example.tif -o example.jp2 "Cblk={64,64}" Corder=RPCL "Stiles={1024,1024}" \
   "Cprecincts={256,256},{128,128}" ORGtparts=R -rate 1.2 Clayers=12 Clevels=5 \
   "Cmodes={BYPASS}"
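Kakadu's -rate is given in bits per pixel, so for a 24-bit RGB image -rate 3 corresponds to 24/3 = 1:8 and -rate 1.2 to 24/1.2 = 1:20. A minimal sketch that assembles the commands above from a target ratio; the helper name `kdu_compress_args` and its parameters are hypothetical, only flags already shown above are used, and the Master Copy variant leaves out -double_buffering:

```python
import shlex


def kdu_compress_args(src, dst, *, tile, ratio=None, bits_per_pixel=24,
                      layers=12, levels=5):
    """Build a kdu_compress argument list mirroring the commands above.

    ratio=None gives the lossless Master Copy variant; otherwise the -rate
    value (bits per pixel) is bits_per_pixel / ratio, e.g. 24 / 8 = 3.
    """
    args = ["kdu_compress", "-i", src, "-o", dst,
            "Cblk={64,64}", "Corder=RPCL", f"Stiles={{{tile},{tile}}}",
            "Cprecincts={256,256},{128,128}", "ORGtparts=R"]
    if ratio is None:
        # Lossless master: reversible transform, one quality layer,
        # SOP/EPH markers as in the Master Copy command.
        args += ["Creversible=yes", "Clayers=1", f"Clevels={levels}",
                 "Cmodes={BYPASS}", "Cuse_sop=yes", "Cuse_eph=yes"]
    else:
        if ratio <= 0:
            raise ValueError("ratio must be positive")
        rate = bits_per_pixel / ratio  # target bit rate in bits per pixel
        args += ["-rate", f"{rate:g}", f"Clayers={layers}",
                 f"Clevels={levels}", "Cmodes={BYPASS}"]
    return args


# Print a shell-ready 1:20 production command (nothing is executed).
print(shlex.join(kdu_compress_args("example.tif", "example.jp2",
                                   tile=1024, ratio=20)))
```

For an 8-bit grayscale image, pass `bits_per_pixel=8`; the same ratio then yields a proportionally lower -rate.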



Differences in rendering (24-bit RGB, 300 PPI)
Rendered in: Photoshop CS5 (v.12.0 x64), KDU_show (v.6.4.1), IrfanView (v.4.27)

TIFF, no compression   123 MB
JP2 lossless            21.5 MB
JP2 1:8                 11.5 MB
JP2 1:20                 4.6 MB
JP2 1:30                 3.0 MB



Differences in rendering (24-bit RGB, 600 PPI)
Rendered in: Photoshop CS5 (v.12.0 x64), KDU_show (v.6.4.1), IrfanView (v.4.27)

TIFF, no compression   215 MB
JP2 lossless            28.3 MB
JP2 1:8                  6.7 MB
JP2 1:20                 2.7 MB
JP2 1:30                 1.8 MB
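The JP2 labels on the two rendering slides name the nominal compression setting. Dividing the uncompressed TIFF size by each JP2 size gives the ratio actually achieved against that TIFF, which may differ from the nominal one. A small sketch using the 300 PPI figures (123 MB TIFF); the helper name is hypothetical:

```python
def achieved_ratio(reference_mb: float, compressed_mb: float) -> float:
    """Return N such that the compressed file is 1:N of the reference size."""
    if reference_mb <= 0 or compressed_mb <= 0:
        raise ValueError("sizes must be positive")
    return reference_mb / compressed_mb


# File sizes from the 300 PPI slide (MB); the reference is the uncompressed TIFF.
sizes_300ppi = {"JP2 lossless": 21.5, "JP2 1:8": 11.5, "JP2 1:20": 4.6, "JP2 1:30": 3.0}
for label, size in sizes_300ppi.items():
    print(f"{label:<12} 1:{achieved_ratio(123, size):.1f}")
```

The same call with 215 MB as the reference applies to the 600 PPI slide.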


PROJECT - tool wrapper




DIFFER
 (Determinator of Image File
    Format propERties)
WHAT IT DOES
    Formats: TIFF, JPEG, JP2, DjVu (PNG, PDF)
    Identification
    Characterization
    Validation
    Visual comparison
    Numerical comparison
    Detection of glitches
    JP2 profile validator
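DIFFER delegates identification to bundled tools such as DROID, which match files against format signatures. The idea can be illustrated with a minimal magic-byte check for the formats listed above; this is a sketch of signature-based identification, not DIFFER's own code, and the function names are hypothetical:

```python
# Leading byte signatures ("magic numbers") of the formats DIFFER handles.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JP2"),  # JP2 signature box
    (b"\xff\x4f\xff\x51", "JPEG 2000 codestream"),  # SOC + SIZ markers
    (b"AT&TFORM", "DjVu"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
]


def identify(header: bytes) -> str:
    """Return the format whose signature the header starts with."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify it."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

A signature match only says what a file claims to be; validation and characterization still need the dedicated tools.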
WHAT IS IN IT
         JHOVE (JSTOR/Harvard Object Validation Environment)
  Identifies files, extracts technical metadata, and validates files
         ExifTool (Read, Write and Edit Meta Information!)
  Identifies files and extracts technical metadata
         KDU_expand (part of Kakadu)
  Identifies JP2 files and extracts their technical metadata and properties
         DJVUDUMP
  Extracts the internal structure of DjVu files
         DROID (Digital Record Object Identification)
  Identifies files
         FFIdent (tool wrapper)
  Identifies files
         FITS (File Information Tool Set)
  Identifies and validates files and extracts technical metadata
         NLNZ Metadata Extraction Tool (tool wrapper)
  Identifies files and extracts technical metadata
         PRONOM (technical registry)
  Identifies files
         Jpylyzer (by van der Knijff)
  JP2 validator, properties extractor, and file structure checker
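As a tool wrapper, DIFFER runs these programs and gathers their output. A minimal sketch of that pattern for two of the tools, using only their plain invocations (`exiftool -j` for JSON, `jpylyzer <file>` for its XML report); the options DIFFER actually passes are not given in the slides, and the helper names are hypothetical:

```python
import shutil
import subprocess


def build_commands(path):
    """Default command lines for two of the bundled tools."""
    return {
        "exiftool": ["exiftool", "-j", path],  # metadata as JSON
        "jpylyzer": ["jpylyzer", path],        # JP2 validation report (XML)
    }


def run_tools(commands, timeout=120):
    """Run each available tool and collect its output; skip missing tools."""
    results = {}
    for name, cmd in commands.items():
        if shutil.which(cmd[0]) is None:
            results[name] = {"available": False, "returncode": None, "output": ""}
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        results[name] = {"available": True, "returncode": proc.returncode,
                         "output": proc.stdout}
    return results
```

Calling `run_tools(build_commands("example.jp2"))` returns one entry per tool, so a missing installation shows up as `available: False` instead of an exception.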
DIFFER – Finds Differences



HASH IS EQUAL
PSNR: INFINITY
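Equal hashes mean identical data, so the mean squared error is zero and PSNR is infinite. A minimal sketch, assuming the hash is taken over the decoded pixel data (the slides do not say whether DIFFER hashes pixels or files); the function names are hypothetical:

```python
import hashlib
import math


def pixel_digest(pixels: bytes) -> str:
    """SHA-256 digest of raw decoded pixel data (assumed input)."""
    return hashlib.sha256(pixels).hexdigest()


def hash_check(a: bytes, b: bytes):
    """Return (hashes_equal, psnr). Equal digests mean identical samples,
    so MSE is 0 and PSNR is infinite; otherwise PSNR must be computed
    from the actual pixel differences (None here)."""
    if pixel_digest(a) == pixel_digest(b):
        return True, math.inf
    return False, None


print(hash_check(b"\x10\x20\x30", b"\x10\x20\x30"))  # (True, inf)
print(hash_check(b"\x10\x20\x30", b"\x10\x20\x31"))  # (False, None)
```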
DIFFER – Finds Differences


HASH IS NOT EQUAL
PSNR: 26.14 dB
DIFFER – Finds Differences


HASH IS NOT EQUAL
PSNR: 16.76 dB
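When the hashes differ, PSNR quantifies how far apart the images are: $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, with MAX = 255 for 8-bit samples, so lower values such as 16.76 dB indicate larger differences than 26.14 dB. A minimal sketch over flattened sample sequences; whether DIFFER computes PSNR per channel or over all samples is not stated, so flattening is an assumption:

```python
import math


def psnr(a, b, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally long sequences
    of samples (e.g. flattened 8-bit channel values)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical data, the "HASH IS EQUAL" case
    return 10 * math.log10(max_value ** 2 / mse)


print(round(psnr([0, 0, 0, 0], [0, 0, 0, 255]), 2))  # 6.02
```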
DIFFER – Pixels Detection
Panels: CYAN, MAGENTA, YELLOW
HASH IS NOT EQUAL
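The slide shows cyan, magenta, and yellow panels but does not say what each colour marks. The sketch below therefore only locates differing pixels and paints them in a single highlight colour (cyan here, a hypothetical choice), assuming both images are decoded into rows of RGB tuples:

```python
def differing_pixels(a, b):
    """Return (row, col) coordinates where the RGB tuples of a and b differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [(r, c)
            for r, (ra, rb) in enumerate(zip(a, b))
            for c, (pa, pb) in enumerate(zip(ra, rb))
            if pa != pb]


def highlight(a, b, colour=(0, 255, 255)):
    """Copy of image a with every differing pixel painted in one colour."""
    out = [list(row) for row in a]
    for r, c in differing_pixels(a, b):
        out[r][c] = colour
    return out


img_a = [[(0, 0, 0), (1, 1, 1)], [(5, 5, 5), (9, 9, 9)]]
img_b = [[(0, 0, 0), (1, 1, 2)], [(5, 5, 5), (9, 9, 9)]]
print(differing_pixels(img_a, img_b))
```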




DIFFER – Glitches Detection




DIFFER – Glitches Detection




DIFFER – Corrupted file Detection




DIFFER – Corrupted file Detection
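The slides do not say how DIFFER detects corrupted files; its bundled validators (JHOVE, Jpylyzer) do the thorough checks. One simple heuristic for truncated JPEGs is sketched below: a complete JPEG stream starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9). Trailing garbage or appended data can make it misfire, so it is a warning rather than a verdict:

```python
def jpeg_looks_truncated(data: bytes) -> bool:
    """Heuristic truncation check: True if the JPEG stream does not end
    with the EOI marker (FF D9), ignoring trailing zero padding."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    return not data.rstrip(b"\x00").endswith(b"\xff\xd9")


complete = b"\xff\xd8\xff\xe0...image data...\xff\xd9"
cut_off = complete[:-6]
print(jpeg_looks_truncated(complete), jpeg_looks_truncated(cut_off))
```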




DIFFER – JP2 profile validator
Profiles: MASTER COPY PROFILE, PRODUCTION MASTER COPY PROFILE, USER TEST PROFILE
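A profile validator compares the properties extracted from a JP2 file against the expected values of a profile. A minimal sketch using Master Copy values from the JPEG2000 Reference Chart; the property names are hypothetical, and extracting them (e.g. from Jpylyzer or KDU_expand output) is left out:

```python
# Expected Master Copy values, taken from the JPEG2000 Reference Chart.
MASTER_COPY_PROFILE = {
    "transformation": "5-3 reversible",
    "progression_order": "RPCL",
    "tile_size": (4096, 4096),
    "quality_layers": 1,
    "code_block_size": (64, 64),
    "decomposition_levels": {5, 6},  # a set lists all accepted values
}


def validate(properties, profile):
    """Return a list of (property, expected, found) mismatches."""
    problems = []
    for key, expected in profile.items():
        found = properties.get(key)
        if isinstance(expected, set):
            ok = found in expected
        else:
            ok = found == expected
        if not ok:
            problems.append((key, expected, found))
    return problems
```

An empty list means the file conforms; otherwise each tuple names the failing property, which maps naturally onto a per-profile report.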




Follow-up Study
  Web service (Java)
  Google Summer of Code
 http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2012/home
  Open Source
 https://github.com/moravianlibrary/differ
  MSSIM (Mean Structural SIMilarity index)
  Lossless vs. lossy for the Master Copy
  Digital image production and QC
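MSSIM is commonly defined (Wang et al., 2004) as the mean of the structural similarity index over $M$ local windows $x_j, y_j$. Unlike PSNR it is bounded and equals 1 for identical images. The standard definition, with $L = 255$ for 8-bit data:

```latex
\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
                           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 L)^2,\; C_2 = (K_2 L)^2,\; K_1 = 0.01,\; K_2 = 0.03

\mathrm{MSSIM}(X, Y) = \frac{1}{M} \sum_{j=1}^{M} \mathrm{SSIM}(x_j, y_j)
```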

Questions…?
