United States Patent: (10) Patent No.: (45) Date of Patent
[Cover figure (timeline): Turn on App (point camera at subject surface) -> Gathering image frames from camera -> Watermark Read -> Augmented Reality (AR); Image Modification -> Watermark Detection]
US 9,684,941 B2
Page 2

(56) References Cited

U.S. PATENT DOCUMENTS

7,116,781 B2 10/2006 Rhoads
7,289,643 B2 10/2007 Brunk et al.
7,359,526 B2 4/2008 Nister
7,443,537 B2 10/2008 Reed
7,616,807 B2 11/2009 Zhang et al.
7,715,446 B2 5/2010 Rhoads et al.
7,761,326 B2 7/2010 Miyaoku et al.
8,107,721 B2 1/2012 Beardsley et al.
8,243,980 B2 8/2012 Rhoads et al.
8,565,815 B2 10/2013 Rhoads et al.
8,762,852 B2 6/2014 Davis et al.
8,831,279 B2 9/2014 Rodriguez et al.
8,855,712 B2 10/2014 Lord et al.
8,886,222 B1 11/2014 Rodriguez et al.
9,008,353 B2 4/2015 Aller
9,269,022 B2 2/2016 Rhoads
9,398,210 B2 7/2016 Stach et al.
2001/0040979 A1 11/2001 Davidson et al.
2003/0012410 A1 1/2003 Nawab et al.
2004/0128512 A1* 7/2004 Sharma ............ G06Q 20/3823 (713/176)
2006/0031684 A1 2/2006 Sharma et al.
2006/0280246 A1 12/2006 Alattar et al.
2008/0174570 A1 7/2008 Jobs et al.
2008/0252727 A1 10/2008 Brown et al.
2010/0232727 A1 9/2010 Engedal
2010/0322469 A1 12/2010 Sharma
2012/0154633 A1* 6/2012 Rodriguez ............ 348/231.99
2012/0210233 A1* 8/2012 Davis ............ G06Q 30/0201 (715/727)
2012/0300979 A1* 11/2012 Pirchheim ............ G06T 7/2033 (382/103)
2014/0028850 A1* 1/2014 Keating ............ G06T 19/006 (348/158)
2014/0037137 A1 2/2014 Broaddus ............ G06T 7/004 (382/103)
2014/0044304 A1 2/2014 Rhoads
2014/0071268 A1 3/2014 Lord et al.
2014/0112524 A1 4/2014 Bai et al.
2014/0375810 A1 12/2014 Rodriguez
2016/0189381 A1 6/2016 Rhoads

OTHER PUBLICATIONS

Azuma, "A Survey of Augmented Reality," Presence: Teleoperators and Virtual Environments 6, 4, Aug. 1997, pp. 355-385.
Kato et al., "Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System," Proc. of IWAR 99, San Francisco, CA, Oct. 20-21, 1999, pp. 85-94.
Lepetit et al., "Fully Automated and Stable Registration for Augmented Reality Applications," Proc. 2nd IEEE/ACM Int'l Symposium on Mixed and Augmented Reality, Oct. 2003, pp. 93-102.
Reitmayr, "Going Out: Robust Model-based Tracking for Outdoor Augmented Reality," Proc. 5th IEEE/ACM Int. Symp. on Mixed and Augmented Reality, 2006, pp. 109-118.
Rekimoto, "Matrix: A Realtime Object Identification and Registration Method for Augmented Reality," Proc. of Asia Pacific Computer Human Interaction, Jul. 1998, pp. 63-68.
Genc, "Marker-less Tracking for AR: A Learning-Based Approach," Proc. 1st IEEE/ACM Int. Symp. on Mixed and Augmented Reality, Aug. 2002, pp. 295-304.
Shi et al., "Good Features to Track," Proc. 1994 IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR 1994), Jun. 1994, pp. 593-600.
Skrypnyk, Iryna et al., "Scene Modelling, Recognition and Tracking with Invariant Image Features," Proc. of the Third IEEE and ACM International Symposium on Mixed and Augmented Reality, 2004, 10 pages.
U.S. Appl. No. 61/719,920, filed Oct. 29, 2012.
U.S. Appl. No. 61/749,767, filed Jan. 7, 2013.
Wang, S.; Wu, A. Y.; Rosenfeld, A., "Image Approximation from Gray Scale Medial Axes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, Nov. 1981, pp. 687-696.
David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
David Lowe, "Object Recognition from Local Scale-Invariant Features," International Conference on Computer Vision, Corfu, Greece, Sep. 1999, pp. 1150-1157.
Bonato et al., "Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection," IEEE Trans. on Circuits and Systems for Video Tech., vol. 18, no. 12, 2008.
Se et al., "Vision Based Modeling and Localization for Planetary Exploration Rovers," Proc. of Int. Astronautical Congress (IAC), Oct. 2004.
Mikolajczyk et al., "Performance Evaluation of Local Descriptors," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615-1630, 2005.
Bay et al., "SURF: Speeded Up Robust Features," Eur. Conf. on Computer Vision (1), pp. 404-417, 2006.
Chen et al., "Efficient Extraction of Robust Image Features on Mobile Devices," Proc. of the 6th IEEE and ACM Int. Symp. on Mixed and Augmented Reality, 2007.
Takacs et al., "Outdoors Augmented Reality on Mobile Phone Using Loxel-Based Visual Feature Organization," ACM Int. Conf. on Multimedia Information Retrieval, Oct. 2008.
Klein et al., "Parallel Tracking and Mapping on a Camera Phone," 8th IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2009), Oct. 19-22, 2009.
Benhimane et al., "Homography-Based 2D Visual Tracking and Servoing," The International Journal of Robotics Research, vol. 26, no. 7, pp. 661-676, Jul. 2007.
Ruiz, P. E. L. de Teruel, and L. Fernandez, "Practical Planar Metric Rectification," Proc. BMVC 2006.
Pirchheim et al., "Homography-Based Planar Mapping and Tracking for Mobile Phones," IEEE International Symposium on Mixed and Augmented Reality, 2011.
Koz, "Watermarking for 3D Representations," Thesis, Graduate School of Natural and Applied Sciences of Middle East Technical University, Aug. 2007.
Paucher et al., "Location-Based Augmented Reality on Mobile Phones," IEEE Conf. on Computer Vision and Pattern Recognition, Jun. 13, 2010, pp. 9-16.
Rohs, "Using Camera-Equipped Mobile Phones for Interacting with Real-World Objects," Advances in Pervasive Computing, 2004.

* cited by examiner
U.S. Patent Jun. 20, 2017 Sheet 1 of 5 US 9,684,941 B2
[FIG. 1: Input Signals -> Fingerprint Calculator / Watermark Embedder (100, 102) -> Database Entry (104) -> Database Organization (106); Identification Network]

[FIG. 2: Receiver (110) -> Fingerprint Calculator / Watermark Extractor (112) -> Database Search (114) -> Metadata Databases; Output Device (116)]
[FIG. 3: Cell phone: Processor; Memory (Op. Sys., UI, SW Modules, etc.); Display / Touchscreen; Physical UI]
[FIG. 5 (flowchart): Capture Image Frames -> Identify Image Key Points -> Resolve Image Pose by reference to the Key Points]

[FIG. 6 (timeline): Pose Detection -> Updated Pose for Graphics; Image Modification -> Watermark Detection]

[FIG. 8]
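The flow of FIG. 5 (capture image frames, identify key points, resolve pose from the key points) can be illustrated, for a planar subject surface, by fitting a homography to matched key points. The sketch below uses the standard direct linear transform (DLT); it is an illustrative stand-in under that assumption, not necessarily the pose method of this disclosure.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of H mapping src (x, y) points to dst (x, y) points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints per correspondence on the 9 entries of H
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)      # null-space vector = flattened H, up to scale
    return h / h[2, 2]
```

With four or more non-degenerate correspondences the homography, and hence the pose of the camera relative to the planar surface, is determined up to scale.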
DETERMINING POSE FOR USE WITH DIGITAL WATERMARKING, FINGERPRINTING AND AUGMENTED REALITY

APPLICATION FIELD

This application claims the benefit of U.S. Provisional Patent Application No. 61/719,920, filed Oct. 29, 2012. This application is also related to U.S. Provisional Patent Application No. 61/749,767, filed Jan. 7, 2013. Each of the above patent documents is hereby incorporated herein by reference in its entirety.

TECHNICAL FIELD

This disclosure relates to digital signal processing, image rendering (including Raster Image Processing for print), image recognition, data signal detection, and computer-generated graphics in conjunction with live image capture and recognition (e.g., Augmented Reality).

BACKGROUND AND SUMMARY

There are a variety of ways to encode machine readable information on objects, and in particular, on printed objects. Conventional visible data carriers for printed media include various forms of bar codes, including monochrome (e.g., black and white) 1D and 2D bar codes, as well as newer higher density codes that use additional colors to carry data. One example of higher density bar codes is data glyphs, which are marks (e.g., forward and back slash marks) printed at higher resolution. When viewed from a distance, glyph codes can appear as a uniform tone, and as such, can be printed in the background around other visual information.

In these types of data carriers, the elementary units (bars or data glyph marks) are independent of other visual information and convey auxiliary data. A mark or arrangement of marks is a pattern that corresponds to an auxiliary data symbol. To read the data from a printed object, the object is first optically scanned with an image sensor, converting light to an electronic signal. The electronic signal is then analyzed to detect the elements of the mark and convert them to data.

Digital watermarking is a machine readable code in which the data is hidden within an image, leveraging human visibility models to minimize visual impact on the image. For certain types of applications where image information is sparse, the auxiliary data signal can still be applied to the printed object with minimal visual impact by inserting imperceptible structures having spatial frequency and color beyond the range of human visual perception. The auxiliary data signal can be conveyed by printing ink structures or modifying existing structures with changes that are too small to see or that use colors that are difficult to discern. As such, digital watermarking techniques provide the flexibility of hiding data within image content, as well as inserting data in parts of a printed object where there is little or no other visual information.

These types of visible and hidden data carriers are useful for applications where there is a need to convey variable digital data in the printed object. Hidden data carriers increase the capacity of printed media to convey visual and machine readable information in the same area. Even printed objects, or portions of objects (such as logos, pictures or graphics on a document or package) that appear identical are transformed into variable data carriers.

For some applications, it is possible to identify an image using a one-to-many pattern matching scheme. Images to be uniquely identified are enrolled in a reference database, along with metadata. In image fingerprinting schemes, image features are stored in the reference database. Then, to recognize an image, suspect images, or their features, are matched with corresponding images or features in the reference database. Once matched, the reference database can provide associated digital data stored with the image.

Data carrying signals and matching schemes may be used together to leverage the advantages of both. In particular, for applications where maintaining the aesthetic value or the information content of the image is important, a combination of digital watermarking and image fingerprinting can be used.

Combinations of watermarks and fingerprints for content identification and related applications are described in assignee's U.S. Patent Publications 20060031684 and 20100322469, which are each hereby incorporated by reference in its entirety. Watermarking, fingerprinting and content recognition technologies are also described in assignee's U.S. Patent Publication 20060280246 and U.S. Pat. Nos. 6,122,403, 7,289,643, 6,614,914, and 6,590,996, which are each hereby incorporated by reference in its entirety.

In many applications, it is advantageous to insert auxiliary data in a printed object in a way that does not impact the other visual information on the object, yet still enables the data to be reliably retrieved from an image captured of the object. To achieve this, a technique has been developed to exploit the gap between the limit of human visual perception and the limit of an image sensor. The gamut of human visual perception and the gamut of an image sensor are defined in terms of characteristics of the rendered output, including spatial resolution or spatial frequency and color. Each gamut is a multi-dimensional space expressed in terms of these characteristics. The gap between the gamut of human and sensor perception is a multidimensional space that our data insertion schemes exploit to insert auxiliary data without impacting other visual information on the object.

This multi-dimensional gap is a 5-dimensional space (2 spatial + 3 color) or higher (spatial/color shapes, frequencies, distributions) where our methods insert: (1) uniform texture watermarks (independent of content but controlled for visibility), and (2) content-based watermarks where the content is used as a reference framework. As a reference, the content is either altered in a measurable but imperceptible way, or used (e.g., edges) to locate and orient an underlying variation that is intended to keep the content unchanged.

Digital printing is becoming increasingly more advanced, enabling greater flexibility and control over the image characteristics used for data insertion when preparing an image for printing. The process of preparing a digital image for printing encompasses conversion of an image by a Raster Image Processor, Raster Image Processing, halftoning, and other pre-print image processing. Background on these processes is provided below.

Along with advances in printing, the gamut of even widely used image sensors is becoming greater. For hidden data insertion, the challenge is to insert the data in the human-sensor perception gap so that it can be widely detected across many consumer devices. Of course, for certain security applications, more expensive printers and image scanners can be designed to insert security features and expand the gamut of the scanning equipment used to detect such features. This is useful to detect security features and/or tampering with such features.
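The one-to-many fingerprint identification scheme described above (enroll image features with metadata, then match suspect features against the reference database) can be sketched as follows. The block-mean fingerprint and Hamming-distance matcher here are illustrative assumptions only, not the feature scheme of any cited patent.

```python
import numpy as np

def fingerprint(image, grid=8):
    """Binary fingerprint: block means thresholded at the global mean."""
    h, w = image.shape
    img = image[: h - h % grid, : w - w % grid]
    means = img.reshape(grid, img.shape[0] // grid,
                        grid, img.shape[1] // grid).mean(axis=(1, 3))
    return (means > means.mean()).ravel()

class ReferenceDatabase:
    """Enroll fingerprints with metadata; identify by nearest Hamming match."""
    def __init__(self):
        self.entries = []                     # list of (fingerprint, metadata)

    def enroll(self, image, metadata):
        self.entries.append((fingerprint(image), metadata))

    def identify(self, suspect, max_distance=8):
        fp = fingerprint(suspect)
        best = min(self.entries, key=lambda e: np.count_nonzero(e[0] != fp))
        if np.count_nonzero(best[0] != fp) <= max_distance:
            return best[1]                    # matched: return stored metadata
        return None
```

A real system would use robust features and an indexed search rather than a linear scan, but the enroll/match structure is the same.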
However, the human-device perception gap is smaller for more widely deployed sensors, such as those commonly used in mobile devices like smart phones and tablet PCs.

Our data insertion methods exploit the gap more effectively through data insertion in the process of preparing a digital image for printing. Additional control over the process of inserting auxiliary data is achieved by implementing the process in the Raster Image Processor (RIP).

A raster image processor (RIP) is a component used in a printing system which produces a raster image, also known as a bitmap. The bitmap is then sent to a printing device for output. The input may be a page description in a high-level page description language such as PostScript, Portable Document Format, XPS, or another bitmap of higher or lower resolution than the output device. In the latter case, the RIP applies either smoothing or interpolation algorithms to the input bitmap to generate the output bitmap.

Raster image processing is the process and the means of turning vector digital information, such as a PostScript file, into a high-resolution raster image. A RIP can be implemented either as a software component of an operating system or as a firmware program executed on a microprocessor inside a printer, though for high-end typesetting, standalone hardware RIPs are sometimes used. Ghostscript and GhostPCL are examples of software RIPs. Every PostScript printer contains a RIP in its firmware.

Half-toning is a process of converting an input image into halftone structures used to apply ink to a medium. The digital representation of a halftone image is sometimes referred to as a binary image or bitmap, as each elementary image unit or pixel in the image corresponds to the presence, or not, of ink. Of course, there are more variables that can be controlled at a particular spatial location, such as various color components (CMYK and spot colors). Some advanced printers can control other attributes of the ink placement, such as its density or spatial depth or height.

This half-toning process is typically considered to be part of the RIP or Raster Image Processing. In some printing technologies, these halftone structures take the form of clustered dots (clustered dot half-toning). In others, the halftone structures take the form of noise-like dot patterns (e.g., stochastic screens, blue noise masks, etc.).

Our patent literature provides several techniques for digital watermarking in the halftone process. Examples of these techniques are detailed in U.S. Pat. Nos. 6,694,041 and 6,760,464, which are each hereby incorporated herein by reference in its entirety.

New printing techniques enable very fine structures to be created in the RIP which will appear visually identical to the eye. For example, a 50% gray can be created with a conventional clustered dot screen pattern at 150 lines per inch, or exactly the same visual effect can be created with a much higher frequency line structure such as a stochastic screen. Usually, these two structures are not mixed on one page, as they have very different dot gain characteristics and require different corrections. However, our methods are able to correct for the mechanical dot gain, so that the two patterns appear identical when they appear on the same page. See, in particular, our prior work in dot gain correction, printer calibration, and compensating for printer and scanner effects, in U.S. Pat. Nos. 6,700,995, 7,443,537, and U.S. Patent Publication 20010040979, which are each hereby incorporated herein by reference in its entirety.

Mobile devices have a capture resolution of much greater than 150 lpi (the resolution of newer phones, such as the iPhone 4, is about 600 lpi or better), so they can be used to distinguish between these two types of patterns. One particular example is an image that appears as a uniform texture, yet a watermark pattern is inserted into it by modulating the line screen frequency and direction according to a watermark signal pattern. In particular, the locations of a watermark pattern are printed using a higher frequency line pattern at a first direction (e.g., vertical screen angle). The other locations are printed with a lower frequency line pattern in another direction (e.g., diagonal screen angle). The watermark signal is modulated into the image by selection of a higher frequency screen at an arrangement of spatial locations that form the watermark signal pattern. When printed, these locations look similar to surrounding locations. However, when scanned, the sensor sees these locations as being different, and the watermark pattern in the resulting electronic image is easier to detect.

This approach allows a whole set of new messaging techniques to be used in the range between 150 lpi and 600 lpi, where 2 spatial dimensions and 3 dimensions of color information can be inserted. This information can be a watermark, barcode or any other signaling mechanism.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the creation of a content recognition system using fingerprints and watermarks.

FIG. 2 is a block diagram illustrating the content identification process.

FIG. 3 is a diagram of a cell phone, which may be used in some content recognition systems.

FIG. 4 is a diagram showing image capture of a subject surface.

FIG. 5 is a block diagram of resolving pose information from captured imagery.

FIG. 6 is a timeline associated with resolving pose information to aid digital watermark detection.

FIG. 7 is a diagram of an Augmented Reality system providing a video overlay in a device display that corresponds to a watermarked area on a subject surface.

FIG. 8 shows a subject area including a watermarked area having different watermarked areas.

DETAILED DESCRIPTION

The process of digital watermark insertion includes generating a watermark signal, and then using that signal to modulate characteristics of the image in the human-sensor gap. As described above, this process is preferably conducted at the RIP stage to enable control over the image representation used to control application of ink to a print media.

In prior work, several methods for generating the watermark signal, and for detecting the watermark signal in images captured of printed objects, are detailed. Please see U.S. Pat. Nos. 6,614,914 and 6,590,996, which are incorporated by reference. Therefore, for this discussion, the focus is on techniques used within the RIP to insert the watermark signal.

In one implementation, the watermark signal is generated as an array of watermark signal elements. These elements are mapped to spatial locations within an image block, called a tile. This tile is then replicated (e.g., tiled in a regular, contiguous array of blocks in two dimensions) across the area of the host image in which the watermark signal is to be inserted. At a spatial location where there is image content in the host image, the watermark signal element is used to modify the host image content at that location to carry the watermark signal element, subject to constraints set for perceptual masking.
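The tiling and perceptual-masking scheme just described can be sketched as follows; the function names and the simple additive, per-pixel gain model are assumptions for illustration, not the implementation in the cited patents.

```python
import numpy as np

def insert_tiled_watermark(host, tile, mask_gain):
    """Replicate `tile` contiguously over `host` and add it, gated by `mask_gain` (0..1)."""
    h, w = host.shape
    th, tw = tile.shape
    reps = (-(-h // th), -(-w // tw))            # ceiling division
    wm = np.tile(tile, reps)[:h, :w]             # contiguous 2-D tiling, cropped to host
    return host + mask_gain * wm

# Example: a 4x4 tile of +/-1 signal elements; full strength where masking
# allows it, suppressed (gain 0) elsewhere, per the perceptual constraints.
rng = np.random.default_rng(7)
tile = rng.choice([-1.0, 1.0], size=(4, 4))
host = np.full((16, 16), 128.0)
gain = np.zeros((16, 16))
gain[:, 8:] = 1.0                                # watermark only the right half
marked = insert_tiled_watermark(host, tile, gain)
```

Setting the gain to zero at flat locations corresponds to the "not applied" case; a real embedder would derive the gain from a visibility model of the host content.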
These constraints enable the watermark signal to be increased or decreased (possibly to zero), depending on perceptual masking and desired watermark signal strength. Conversely, where there is no image content, the watermark signal element is either not applied, or it can be asserted as a texture, using colors and spatial resolution that make it difficult to discern. As such, for every location in the watermark signal mapping, there is an opportunity for watermark modulation.

As noted in more examples below, the watermark signal need not be mapped to a uniform array of blocks. One alternative is to use feature points in the image to form a spatial reference for insertion of a data signal.

The watermark signal can be comprised of a single data component, or more than one component, as detailed in U.S. Pat. No. 6,614,914. One component is a direct sequence spread spectrum modulated data signal. This component is generated by applying error correction coding (convolutional coding) to a data signal, which produces an error correction coded data signal. This signal is then modulated onto pseudorandom carrier signals to produce a spread spectrum modulated signal, which is then mapped to locations in a tile. This is one example of watermark signal generation, and there are many others.

Above, a method of inserting a watermark signal by varying the print structure within the RIP to modulate the watermark into an image is illustrated. A specific example is given for varying the density and direction or angle of print primitives (e.g., line structures or dots) used to print a particular color in an image having a uniform tone. A data signal pattern may also be introduced by varying the halftone screen type for different regions of an image. Print structures can vary among a set of screening types, including noise-like (e.g., stochastic screens) and structured (clustered dot, line screens, etc.). This approach is not limited to watermark modulation of images with uniform tones, as it applies to inserting watermarks into various types of image content.

Some embodiment examples include the following:

a) Choose the angle in colorspace of watermark signal modulation (e.g., in the ab plane of Lab colorspace) to be different at different regions throughout the image. In one class of digital watermark embodiments, these regions correspond to watermark signal elements, and an arrangement of the regions forms a spatial watermark signal pattern of watermark signal elements. Data may be modulated into the pattern using spread spectrum modulation as noted above, or other data modulation schemes. The arrangement, orientation and shape of these regions may be designed to convey alternative data code signals. Multiple data signals may be interleaved for different spatial locations, as well as different directions in color space.

b) Choose the spatial frequency of watermark signal modulation to be different at different regions throughout the image. Similar data insertion as mentioned for section a) also applies to this section b).

c) Use the edges of the image content to define a signal along the edges in an imperceptible set of dimensions (color & spatial frequency). In this case, the edges detected in the image are used as a reference for the watermark signal. Thus, rather than being arranged in a pre-determined array of blocks or regions, the watermark signal is inserted along the direction of the edge. Along this edge, the watermark signal can have a regular pattern or structure to facilitate detection. The watermark signal is detected by first finding the edges and then detecting the watermark signal relative to these edges (e.g., by correlating the image signal with the regular pattern of the data signal).

d) Use the edges of the content to define a signal perpendicular to the edges in an imperceptible set of dimensions (color & spatial frequency). As in the previous example, the edges provide a reference orientation and location of the watermark signal.

e) Use higher dimensional shapes/patterns of color and spatial variations where pixels separated spatially may still be close in either spatial or color patterns. This reduces sensitivity to geometric distortions.

In some embodiments, those higher frequency spatial/color variations are designed to take advantage of lower resolution devices to generate shifts in image characteristics that can be measured. The data signal elements are inserted to exploit the Bayer pattern of RGB sensors to enhance a desired data signal that would otherwise be imperceptible. These signal elements are designed to induce distortion (e.g., aliasing, or a color shift) in the image captured through the sensor of the printed object. This distortion at the data signal locations enhances the pattern because the shift in signal characteristics at these locations increases the data signal at these locations relative to surrounding image content and noise. For example, aliasing caused by capturing a high frequency screen region with a lower frequency sensor creates a detectable data signal element at that region.

A similar effect can also be achieved by modulating ink height using a printer that is capable of controlling the height of ink deposited at a particular location. These printers enable control over height of ink by building up ink at a particular print location. This is useful for authentication or copy protection applications.

The height of the structure can be used to carry information by viewing at an angle with a device such as a fixed focus (or Lytro) camera.

The height variations can also be designed to cause color changes that are used to carry information. When the print is viewed normally, these height variations would be imperceptible if the pigment is opaque. This information can be a watermark, barcode or any other signaling mechanism.

The above methods apply to a variety of print primitives, and are not limited to particular line screens or clustered dot structures. With control over the RIP, the shape, spatial frequency, and orientation of structures can be specifically designed to exploit sensor geometries and Modulation Transfer Function (MTF) characteristics to cause discrimination between local regions of an image. For example, small lines slanted left and right at different spatial frequencies, or solid dots vs. tiny dot clusters which contain the same ink density on the physical object, but differ in color after acquisition through a class of image sensors (such as those sensors widely used in smartphone cameras). Some regions may use a form of noise-like dot pattern (e.g., stochastic screening), while others use a shape with a particular structure, like a clustered dot or line screen. The dot gain varies with the number of edges (perimeter) of the print structures, so the amount of dot gain correction is also adapted based on the print structure. For example, in the example above where some regions are printed with high frequency line structures and others with lower frequency, the line widths in the high frequency structure have to be reduced more than the line widths in the lower frequency structure to compensate for dot gain.

Another approach that can be implemented within the RIP is to transform the image into a form for printing so that it has carefully controlled noise characteristics.
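The direct sequence spread spectrum component described above (an error correction coded data signal, modulated onto pseudorandom carriers and mapped to locations in a tile) can be sketched as follows. A repetition code stands in for the convolutional coding, and the shuffle-based location mapping is an illustrative assumption.

```python
import numpy as np

def spread_spectrum_tile(payload_bits, tile_size=16, chips_per_bit=8,
                         repeat=4, seed=42):
    """Build one watermark tile from a payload via spread-spectrum chips."""
    rng = np.random.default_rng(seed)
    coded = np.repeat(payload_bits, repeat)          # repetition code stands in for ECC
    symbols = 2 * coded - 1                          # bits {0,1} -> symbols {-1,+1}
    carriers = rng.choice([-1, 1], size=(len(symbols), chips_per_bit))
    chips = (symbols[:, None] * carriers).ravel()    # modulate each coded bit
    tile = np.zeros(tile_size * tile_size)
    locs = rng.permutation(tile.size)[: chips.size]  # pseudorandom tile locations
    tile[locs] = chips
    return tile.reshape(tile_size, tile_size), carriers, locs

def correlate_bit(tile, carriers, locs, bit_index, chips_per_bit=8):
    """Recover one coded bit by correlating its chips with its carrier."""
    sel = locs[bit_index * chips_per_bit:(bit_index + 1) * chips_per_bit]
    return int(tile.ravel()[sel] @ carriers[bit_index] > 0)
```

The correlation at the detector is the same operation, in miniature, as detecting the watermark pattern against surrounding image content and noise.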
The noise characteristics can be set globally across an image to indicate the presence of a watermark. The noise itself can comprise digital data, such as a spread spectrum modulated data signal. Alternatively, the RIP can generate an image with a pattern of regions that are detectable based on distinguishable noise characteristics. The arrangement of this pattern can be used as a reference signal to provide the location and orientation of a watermark signal inserted in the image.

The watermark may also be conveyed using a reversible image transform or detailed image characterization, by manipulating the image through either transform coefficients or through local noise manipulations in a detectable yet imperceptible way. One form of reversible transform is the grayscale medial axis transform applied separately to the color directions. See, in particular, "Image Approximation from Gray Scale Medial Axes" by Wang, S.; Wu, A. Y.; Rosenfeld, A., in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, November 1981, pp. 687-696.

A stochastic modeling approach that allows for detectable manipulations is the Markov Random Field (MRF) model, which can be used to define local pixel relationships that convey watermark signal data elements. The MRF manipulation is particularly interesting because it can be designed to have particular noise properties that might be exploited at the detector. See "How to Generate Realistic Images Using Gated MRFs," Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton, Department of Computer Science, University of Toronto.

SIFT Description

SIFT is an acronym for Scale-Invariant Feature Transform, a computer vision technology developed by David Lowe and described in various of his papers, including "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110; and "Object Recognition from Local Scale-Invariant Features," International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157, as well as in U.S. Pat. No. 6,711,293.

SIFT works by identification and description, and subsequent detection, of local image features. The SIFT features are local and based on the appearance of the object at particular interest points, and are invariant to image scale, rotation and affine transformation. They are also robust to changes in illumination, noise, and some changes in viewpoint. In addition to these properties, they are distinctive, relatively easy to extract, allow for correct object identification with low probability of mismatch, and are straightforward to match against a (large) database of local features. Object description by a set of SIFT features is also robust to partial occlusion; as few as 3 SIFT features from an object can be enough to compute location and pose.

The technique starts by identifying local image features [...] a scale-localized Laplacian transform of the image. The difference of Gaussians approach is an approximation of such a Laplacian operation, expressed in a pyramid setting.

The above procedure typically identifies many keypoints that are unsuitable, e.g., due to having low contrast (thus being susceptible to noise), or due to having poorly determined locations along an edge (the Difference of Gaussians function has a strong response along edges, yielding many candidate keypoints, but many of these are not robust to noise). These unreliable keypoints are screened out by performing a detailed fit on the candidate keypoints to nearby data for accurate location, scale, and ratio of principal curvatures. This rejects keypoints that have low contrast, or are poorly located along an edge.

More particularly, this process starts by, for each candidate keypoint, interpolating nearby data to more accurately determine keypoint location. This is often done by a Taylor expansion with the keypoint as the origin, to determine a refined estimate of maxima/minima location.

The value of the second-order Taylor expansion can also be used to identify low contrast keypoints. If the contrast is less than a threshold (e.g., 0.03), the keypoint is discarded.

To eliminate keypoints having strong edge responses but that are poorly localized, a variant of a corner detection procedure is applied. Briefly, this involves computing the principal curvature across the edge, and comparing it to the principal curvature along the edge. This is done by solving for the eigenvalues of a second order Hessian matrix.

Once unsuitable keypoints are discarded, those that remain are assessed for orientation, by a local image gradient function. Magnitude and direction of the gradient are calculated for every pixel in a neighboring region around a keypoint in the Gaussian blurred image (at that keypoint's scale). An orientation histogram with 36 bins is then compiled, with each bin encompassing ten degrees of orientation. Each pixel in the neighborhood contributes to the histogram, with the contribution weighted by its gradient's magnitude and by a Gaussian with σ 1.5 times the scale of the keypoint. The peaks in this histogram define the keypoint's dominant orientation. This orientation data allows SIFT to achieve rotation robustness, since the keypoint descriptor can be represented relative to this orientation.

From the foregoing, plural keypoints at different scales are identified, each with corresponding orientations. This data is invariant to image translation, scale and rotation. 128-element descriptors are then generated for each keypoint, allowing robustness to illumination and 3D viewpoint.

This operation is similar to the orientation assessment procedure just reviewed. The keypoint descriptor is computed as a set of orientation histograms on (4x4) pixel neighborhoods. The orientation histograms are relative to the keypoint orientation, and the orientation data comes from the Gaussian image closest in scale to the keypoint's scale. As before, the contribution of each pixel is weighted by the
termed keypoints—in a reference image. This is done by 55 gradient magnitude, and by a Gaussian with O 1.5 times the
convolving the image with Gaussian blur filters at different scale of the keypoint. Histograms contain 8 bins each, and
scales (resolutions), and determining differences between each descriptor contains a 4x4 array of 16 histograms around
Successive Gaussian-blurred images. Keypoints are those the keypoint. This leads to a SIFT feature vector with
image features having maxima or minima of the difference (4x4x8=128 elements). This vector is normalized to
of Gaussians occurring at multiple scales. (Each pixel in a 60 enhance invariance to changes in illumination.
difference-of-Gaussian frame is compared to its eight neigh The foregoing procedure is applied to training images to
bors at the same scale, and corresponding pixels in each of compile a reference database. An unknown image is then
the neighboring scales (e.g., nine other scales). If the pixel processed as above to generate keypoint data, and the
value is a maximum or minimum from all these pixels, it is closest-matching image in the database is identified by a
selected as a candidate keypoint. 65 Euclidian distance-like measure. (A “best-bin-first algo
(It will be recognized that the just-described procedure is rithm is typically used instead of a pure Euclidean distance
a blob-detection method that detects space-scale extrema of calculation, to achieve several orders of magnitude speed
US 9,684,941 B2
improvement.) To avoid false positives, a "no match" output is produced if the distance score for the best match is close—e.g., within 25%—to the distance score for the next-best match.

To further improve performance, an image may be matched by clustering. This identifies features that belong to the same reference image—allowing unclustered results to be discarded as spurious. A Hough transform can be used—identifying clusters of features that vote for the same object pose.

An article detailing a particular hardware embodiment for performing the SIFT procedure, suitable for implementation in a next-generation cell phone, is Bonato et al., "Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection," IEEE Trans. on Circuits and Systems for Video Tech., Vol. 18, No. 12, 2008.

An alternative hardware architecture for executing SIFT techniques is detailed in Se et al., "Vision Based Modeling and Localization for Planetary Exploration Rovers," Proc. of Int. Astronautical Congress (IAC), October 2004.

While SIFT is perhaps the most well-known technique for generating robust local descriptors, there are others, which may be more or less suitable—depending on the application. These include GLOH (cf. Mikolajczyk et al., "Performance Evaluation of Local Descriptors," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 27, No. 10, pp. 1615-1630, 2005) and SURF (cf. Bay et al., "SURF: Speeded Up Robust Features," Eur. Conf. on Computer Vision (1), pp. 404-417, 2006; Chen et al., "Efficient Extraction of Robust Image Features on Mobile Devices," Proc. of the 6th IEEE and ACM Int. Symp. on Mixed and Augmented Reality, 2007; and Takacs et al., "Outdoors Augmented Reality on Mobile Phone Using Loxel-Based Visual Feature Organization," ACM Int. Conf. on Multimedia Information Retrieval, October 2008).

Watermarking and Fingerprinting System Configurations

FIG. 1 is a block diagram illustrating the creation of a content recognition system using fingerprints and watermarks. The digitized input image/video/audio signals 100 are input to the fingerprint calculator/watermark embedder 102, which computes multiple fingerprints for each content item to be uniquely recognized, and also watermarks the content item. In a database entry process 104, the fingerprints are entered and stored in a database, along with additional information, such as metadata for the content item, and a digital master copy for use as needed (see Patent Application Publication 20100322469 for a description of techniques involving use of original content in watermark detection and in determining location within content). A database organization process 106 in a database system sorts and arranges the fingerprints in a data structure, such as a tree structure, to enable fast searching and matching. This database itself may be distributed over an array of computers in an identification network (108). This network receives queries to identify or recognize content items based on a stream of fingerprints and/or watermarks from a requesting device, such as a user's handheld mobile device or other computing device (a node in a network of monitoring devices).

US Patent Publication No. 20060280246 includes the following description:

Fingerprinting is a method of identifying multimedia content by deriving a number or set of numbers that uniquely identify that content. The fingerprint may be fragile, such as a secure hash (e.g., SHA, MD5, etc.), or robust. In the case of a robust fingerprint, the fingerprint is expected to remain relatively the same despite processing distortion due to broadcasting, compression, geometrical distortion, etc.

One form of robust hash for video is a waveform constructed from statistics of each frame in the video, as shown at 160 and 162 in FIG. 6 of the '684 publication. These statistics can be represented compactly as a vector of the changes in the statistic from frame to frame in a video sequence, such as a Group of Pictures (GOP) in a video coding format like MPEG. Examples of the statistics that can be used for the fingerprint include the frame average for luminance, and the variance. For compressed streams, fingerprints can be extracted from the compressed data, such as the DCT coefficients in the I-frames, the motion vectors, etc.

FIG. 2 is a block diagram illustrating the content identification process. Incoming signals 109 are captured in a receiver 110. This includes still or video image capture, in which images are captured and digitized with an image sensor like a camera or other image capture device, as well as ambient audio capture by microphone. It also includes receipt of audio, image or video content in a broadcast or transmission channel, including broadcast stream or file transfer. The recognition process may be invoked as part of systematic Internet monitoring or broadcast monitoring of content signals, in-home audience measurement, batch database searching and content indexing, or user requests for content recognition and metadata searching. The fingerprint calculator/watermark extractor 112 computes fingerprints and/or watermarks for incoming content items and issues them to a database, for database search for matching fingerprints and data lookup for watermark-based identifiers 114. The fingerprint matches found in the search process, and the watermark identifiers, provide content identification (a number or some other form of index for metadata lookup), which, in turn, enables lookup of metadata corresponding to the content identification in one or more metadata databases. The metadata is then returned to device 116 for display/output or further processing. This may involve returning metadata to the device that requested the database search, or to some other device to which the search results are directed (e.g., a user's home device, or a monitoring system's data collection database in which the metadata and recognition events are aggregated and compiled for electronic report generation).

AR Exploitation

Sometimes watermark detection needs properly aligned image data to establish a proper registration for reliable payload recovery. Suitable image alignment is difficult to achieve in many mobile environments. For example, and with reference to FIG. 4, a smartphone captures imagery of a subject surface (e.g., a magazine, newspaper, object, etc.). The pose between the smartphone's video camera and the subject surface (sometimes referred to as "image pose") changes as a user positions the phone to capture video. In this context, pose can include perspective angle, scale, rotation and translation.

I have developed methods and systems to accurately estimate capture-geometry distortion and to modify imagery prior to watermark detection. This can be used in connection with augmented reality overlays to provide rich user experiences. But it all starts with determining the correct relative pose.

As an initial overview, and with reference to FIG. 5, captured image frames are analyzed to identify key points. These key points can be tracked over time to resolve relative image geometry, including pose. The captured imagery can be modified according to the resolved geometry to remove any distortion introduced by relative camera positioning, including, e.g., removing rotation, perspective angle, scale,
etc. The watermark detector can analyze the modified, captured imagery in search of a previously hidden digital watermark.

Our methods can be implemented by many suitable electronic devices. One example is a portable device including a video camera, e.g., a smartphone, tablet, pad, etc. With reference to FIG. 6, software (e.g., a smartphone App) is enabled on the portable device. (One example of the software may include a modified version of Digimarc's Digimarc Discover application. From Digimarc's website: "Digimarc Discover uses multiple content identification technologies—digital watermarking, audio fingerprinting and QR code and barcode detection—to give smartphones the ability to see, hear and engage with all forms of media. Consumers simply launch the Digimarc Discover app and point their phone at the content of interest—an ad, article, package, retail sign, etc.—and are instantly connected to a menu of optional experiences such as learn more, view a video, launch an app, map directions, share via social media, save for later or make a purchase.")

Image data, e.g., video frames captured by the device's video camera, is gathered and provided to a pose detector or detection process to determine the pose of the camera relative to a depicted subject surface. Captured imagery can be modified to remove any distortion, e.g., scale, perspective, translation, rotation. The modified imagery is analyzed for hidden digital watermarking. Once detected, the digital watermarking can serve as a backbone for an augmented reality (AR) experience. For example, the watermarking may include a link to obtain video. The video can be overlaid in a device display area. In some cases, the video can be overlaid in an image display area spatially corresponding to the subject surface that includes digital watermarking (FIG. 7). Updated pose information can be provided to ensure that the overlaid graphics or video continue to be positioned where intended, e.g., the video can continue to be played in the intended spatial area, even as the camera moves relative to the object's surface.

Positioning and tracking of overlay graphics and video can be enhanced, e.g., by tracking and mapping image frames or features within the image frames. For example, a keyframe-based SLAM system, as discussed in Klein, et al., "Parallel Tracking and Mapping on a Camera Phone," Mixed and Augmented Reality (ISMAR 2009), 8th IEEE International Symposium on, 19-22 Oct. 2009, which is hereby incorporated by reference in its entirety, could be used. Other approaches are possible as well. For example, a Homography, e.g., as discussed in Benhimane et al., "Homography-based 2d visual tracking and servoing," The International Journal of Robotics Research, Vol. 26, No. 7, pages 661-676, July 2007, could be used to represent a transform between key points in different image frames. The Benhimane paper is hereby incorporated herein by reference in its entirety. In noisy imagery, we've found that 20-60 key points are sufficient. Of course, more or fewer key points could be used, with varying degrees of success.

Multiple pose Homographies can be constructed, e.g., between I1 and I2, I2 and I3, I3 and I4, and so on. Given at least four (4) views (e.g., frames) of the subject surface, and corresponding pose Homographies between the frames, a cost function can be utilized to find the pose information that best fits a current frame. I prefer to use between 4-10 homographies with a cost function; however, additional homographies may be used as well. The techniques (including the cost function in section 2.2.1) described in Pirchheim, et al., "Homography-Based Planar Mapping and Tracking for Mobile Phones," IEEE International Symposium on Mixed and Augmented Reality, 2011, could be used to find such pose information. The Pirchheim paper is hereby incorporated herein by reference in its entirety. The Homography that minimizes the cost function can be used to provide pose information.

Pirchheim's Section 2.2.1 states:

"2.2.1 Cost Function and Parameterization

In the following we describe the mathematical formulation of the optimization scheme given in A. Ruiz, P. E. L. de Teruel, and L. Fernandez, Practical planar metric rectification, in Proc. BMVC 2006, for completeness. We define the scene plane to be located in the canonical position z=0, corresponding to the (x; y) plane. Thus, points on the plane have a z-coordinate equal to zero and can be written as (x; y; 0; 1) in homogeneous coordinates.

The unknowns in the optimization are the camera poses Pi relative to this plane. Under the assumption that all world points are located on the plane, camera poses can easily be re-formulated as 2D homographies by eliminating the third column of the pose matrix Pi:

Hi = [pi,1 pi,2 pi,4]   (1)
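The column-elimination step quoted above can be illustrated with a short numerical sketch. This is not code from the patent or from the Pirchheim paper; it is a minimal NumPy illustration with hypothetical pose values, showing that for points on the plane z=0, projecting with the 3x4 pose matrix P gives the same result as mapping the 2D plane point with the 3x3 homography H obtained by dropping P's third column:

```python
import numpy as np

def pose_to_homography(P):
    """Given a 3x4 pose/projection matrix P and the scene plane z=0,
    return the 3x3 homography mapping plane points (x, y, 1) to image
    coordinates, obtained by eliminating the third column of P."""
    P = np.asarray(P, dtype=float)
    assert P.shape == (3, 4)
    return P[:, [0, 1, 3]]  # keep columns 1, 2 and 4; drop column 3

# Hypothetical pose: rotation about the z-axis plus a translation.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.1, -0.2, 2.0])
P = np.hstack([R, t[:, None]])  # 3x4 pose matrix [R | t]

H = pose_to_homography(P)

# A point on the plane z=0, in homogeneous coordinates:
X = np.array([0.5, 0.7, 0.0, 1.0])   # 3D point (x, y, 0, 1)
x_plane = np.array([0.5, 0.7, 1.0])  # same point as a 2D plane point

# Because the point's z-coordinate is zero, P's third column contributes
# nothing, and the two mappings agree exactly:
assert np.allclose(P @ X, H @ x_plane)
```

The same identity is what lets Pirchheim et al. optimize over 3x3 homographies instead of full 3x4 poses when all tracked points are assumed to lie on a single plane.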