United States Patent: (10) Patent No.: (45) Date of Patent

USOO9684941B2
(12) United States Patent (10) Patent No.: US 9,684.941 B2

Filer (45) Date of Patent: Jun. 20, 2017
(54) DETERMINING POSE FOR USE WITH (56) References Cited
DIGITAL WATERMARKING,
FNGERPRINTING AND AUGMENTED U.S. PATENT DOCUMENTS
REALITY
6,122.403 A 9, 2000 Rhoads
6,590,996 B1 7/2003 Reed et al.
(71) Applicant: Digimarc Corporation, Beaverton, OR 6,614,914 B1 9, 2003 Rhoads et al.
(US) 6,694,041 B1 2/2004 Brunk
6,700,995 B2 3/2004 Reed
(72) Inventor: Tomas Filler, Tigard, OR (US) 6,711,293 B1 3/2004 Lowe
6,760,464 B2 7/2004 Brunk
(73) Assignee: Digimarc Corporation, Beaverton, OR (Continued)
(US)
OTHER PUBLICATIONS
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35 Pirchheim et al., “Homography-Based Planar Mapping and Track
U.S.C. 154(b) by 438 days. ing for Mobile Phones', IEEE International Symposium on Mixed
and Augmented Reality 2011, Oct. 26-29, 2001, pp. 27-36.*
(21) Appl. No.: 13/789,126 (Continued)
(22) Filed: Mar. 7, 2013
Primary Examiner — Matthew Bella
(65) Prior Publication Data
Assistant Examiner — Jose M Torres
US 2014/O119593 A1 May 1, 2014 (74) Attorney, Agent, or Firm — Digimarc Corporation
Related U.S. Application Data
(57) ABSTRACT
(60) Provisional application No. 61/719,920, filed on Oct.
29, 2012. Image recognition and augmented reality experiences utilize
auxiliary data extracted from an image or video, or image
(51) Int. C.
fingerprints, or a combination of both. One claim recites a
G06K 9/00 (2006.01) method comprising: receiving a plurality of imagery frames
G06T I/O (2006.01) captured by a device sensor; identifying a plurality of key
G06T 7/73 (2017.01) points in each of the plurality of imagery frames; resolving
G06T 7/246 (2017.01) image pose by utilizing relative relationships of the key
(52) U.S. C. points between different frames of the plurality of frames:
CPC ............ G06T 1/0064 (2013.01); G06T 7/246 modifying imagery of at least one of the plurality of frames
(2017.01); G06T 7/73 (2017.01); G06T based on resolved image pose; and deriving a fingerprint
2201/0065 (2013.01); G06T 2207/10016 from modified imagery. Of course other claims and combi
(2013.01); G06T 2207/30244 (2013.01) nations are provided as well.
(58) Field of Classification Search
None
See application file for complete search history. 17 Claims, 5 Drawing Sheets
time ----->
Turn on App Gathering image Augmented Reality
Watermark Read (AR)
(point Camera at frames from
subject surface) Carea
Updated Pose for

Pose Detection - . . . . . . . Graphics
Watermark
image Modification:mammammammamx Detection
US 9,684.941 B2
Page 2
(56) References Cited Genc, Marker-less Tracking for AR: A Learning-Based Approach,
Proc. 1st IEEE/ACM Int. Symp. on Mixed and Augmented Reality,
U.S. PATENT DOCUMENTS Aug. 2002, pp. 295-304.
Shi et al. (Jun. 1994) “Good features to track.” Proc. 1994 IEEE
7,116,781 B2 10/2006 Rhoads Comp. Sci. Soc. Conf. on Computer Vision and Pattern Recognition
7.289,643 B2 10/2007 Brunk et al. (CVPR 1994), pp. 593-600.
7,359,526 B2 4/2008 Nister
7,443,537 B2 10/2008 Reed Skrypnyk, Iryna et al. “Scene Modelling. Recognition and Tracking
7,616,807 B2 11/2009 Zhang et al. with Invariant Image Features,” 2004, Proceedings of the Third
7,715,446 B2 5, 2010 Rhoads et al. IEEE and ACM International Symposium on Mixed and Augmented
7,761,326 B2 7/2010 Miyaoku et al. Reality, 10 pages.
8, 107,721 B2 1/2012 Beardsley et al. U.S. Appl. No. 61/719,920, filed Oct. 29, 2012.
8,243,980 B2 8, 2012 Rhoads et al. U.S. Appl. No. 61/749,767, filed Jan. 7, 2013.
8,565,815 B2 10/2013 Rhoads et al. Wang, S.; Wu, A.Y.; Rosenfeld, A., in IEEE Transactions on Pattern
8,762,852 B2 6, 2014 Davis et al.
8,831,279 B2 9/2014 Rodriguez et al. Analysis and Machine Intelligence, vol. PAMI-3, Nov. 1981, p.
8,855,712 B2 10/2014 Lord et al. 687-696.
8,886.222 B1 11/2014 Rodriguez et al. David Lowe. “Distinctive Image Features from Scale-Invariant
9,008,353 B2 4/2015 Aller Keypoints.”.
9,269,022 B2 2/2016 Rhoads David Lowe, "Object Recognition from Local Scale-Invariant Fea
9,398,210 B2 7, 2016 Stach et al.
2001/004O979 A1 1 1/2001 Davidson et al. tures.” International Conference on Computer Vision, Corfu,
2003, OO12410 A1 1/2003 Nawab et al. Greece (Sep. 1999), pp. 1150-1157.
2004/O1285.12 A1* 7/2004 Sharma .............. G06Q 20/3823 Bonato etal, "Parallel Hardware Architecture for Scale and Rotation
713, 176 Invariant Feature Detection.” IEEE Trans on Circuits and Systems
2006, OO31684 A1 2/2006 Sharma et al. for Video Tech, vol. 18, No. 12, 2008.
2006/028O246 A1 12/2006 Alattar et al. Seet al. “Vision Based Modeling and Localization for Planetary
2008/0174570 A1 7/2008 JobS et al. Exploration Rovers.” Proc. of Int. Astronautical Congress (IAC),
2008/0252727 A1 10/2008 Brown et al. Oct. 2004.
2010, O2327 27 A1 9/2010 Engedal Mikolajczyk et al., “Performance Evaluation of Local Descriptors.”
2010/0322469 A1 12/2010 Sharma
2012/0154633 A1* 6/2012 Rodriguez ............... 348,231.99 IEEE Trans. Pattern Anal. Mach. Intell., vol. 27. No. 10, pp.
2012/0210233 A1* 8, 2012 Davis ................. G06Q 30/0201 1615-1630, 2005.
71.5/727 Bay et al. SURF: Speeded Up Robust Features, Eur, Conf. on
2012,030.0979 A1* 11, 2012 Pirchheim ............. GO6T 7,2033 Computer Vision (1), pp. 404-417, 2006.
382,103 Chen et al. “Efficient Extraction of Robust Image Features on
2014/002885.0 A1* 1/2014 Keating ................ GO6T 19,006 Mobile Devices.” Proc. of the 6th IEEE and ACM Int. Symp. on
348,158 Mixed and Augmented Reality, 2007.
2014/0037137 A1 2/2014 Broaddus ................ GO6T 7,004 Takacs et al. “Outdoors Augmented Reality on Mobile Phone Using
382,103 Loxel-Based Visual Feature Organization.” ACM Int. Conf. on
2014/0044304 A1 2/2014 Rhoads Multimedia Information Retrieval, Oct. 2008.
2014f0071268 A1 3/2014 Lord et al. Klein, et al., “Parallel Tracking and Mapping on a camera phone.”
2014/O112524 A1 4/2014 Bai et al. Mixed and Augmented Reality, ISMAR 2009, 8th IEEE Interna
2014/037581.0 A1 12/2014 Rodriguez tional Symposium on Oct. 19-22, 2009.
2016,0189381 A1 6, 2016 Rhoads Benhimane et al., “Homography-based 2d visual tracking and servo
ing.” The International Journal of Robotics Research, vol. 26, No.
OTHER PUBLICATIONS 7, pp. 661-676, Jul. 2007.
Ruiz, P. E. L. de Teruel, and L. Fernandez. Practical planar metric
AZuma, “A Survey of Augmented Reality.” Presence: Teleoperators rectification. In Proc. BMVC 2006.
and Virtual Environments 6, 4, Aug. 1997, pp. 355-385. Pirchheim, et al., “Homography-Based Planar Mapping and Track
Kato et al., “Marker Tracking and HMD Calibration for a Video ing for Mobile Phones.” IEEE International Symposium on Mixed
Based Augmented Reality Conferencing System.” Proc. Of IWAR and Augmented Reality, 2011.
99, San Francisco, CA, Oct. 20-21, 1999, pp. 85-94. Koz, “Watermarking for 3D Representations.” A Thesis Submitted
Lepetit et al. (Oct. 2003) “Fully Automated and Stable Registration to the Graduate School of Natural and Applied Sciences of Middle
for Augmented Reality Applications.” Proc. 2. Sup,nd IEEE/ACM East Technical University, Aug. 2007.
Int'l Symposium on Mixed and Augmented Reality, pp. 93-102. Paucher et al., “Location-Based Augmented Reality on Mobile
Reitmayr, "Going Out: Robust Model-based Tracking for Outdoor Phones.” IEEE Conf. on Computer Vision and Pattern Recognition,
Augmented Reality.” Proc. 5. Sup,th IEEE/ACM Int. Symp. on Jun. 13, 2010, pp. 9-16.
Mixed and Augmented Reality, 2006, pp. 109-118. Rohs, “Using Camera-Equipped Mobile Phones for Interacting with
Rekimoto, Matrix: A Realtime Object Identification and Registra Real-World Objects.” Advances in Pervasive Computing, 2004.
tion Method for Augmented Reality, Proc. of Asia Pacific Computer
Human Interaction, pp. 63-68, Jul. 1998. * cited by examiner
U.S. Patent Jun. 20, 2017 Sheet 1 of 5 US 9,684.941 B2
ingerprin
input Calculatoff Database Database
Signals Waternark Entry Organization
Embedder
100 O2 104. O6
identification
Network
Fig. 1
ingerprint
Captured Calculatorf Database
Receiver Waternark Search
Extracter
110 12 A
Output
device 116
Metadata
i............................................ Databases
Fig. 2
CEL PHONE
Processor
Display MEMORY:
Op. SyS.
U
TouchSCreen SW Modules
EtC,
Physical U.
40
Fig. 3
Video Capture Pose FG 4.
Subject surface (e.g., magazine,

newspaper, object, etc.)
Capture image
Frames
identify image
Key Points FG 5
Resolve image
Pose by reference
to the Kev Points
Modify imagery Watermark

Detector
F.G. 6
time --~~~-8-
turn on App Gathering image Watermark Read Augmented Reality

(point camera at frames from (AR)
subject surface) Carea
Pose Detection ... ... -- a-- a--ax a-a-as saw war wax ex Lar as *** * -> Updated Pose for
Graphics
'--------------------------------------- '------------------ “
: t :
image Modification --> Watermark is a ~ as a r - x
Detection
Video overlay area

corresponding to
Watermarked
(WM) Area
U.S. Patent Jun. 20, 2017 Sheet S of 5 US 9,684.941 B2
FG. 8
US 9,684.941 B2
1. 2
DETERMINING POSE FOR USE WITH For some applications, it is possible to identify an image
DIGITAL WATERMARKING, using a one-to-many pattern matching scheme. Images to be
FINGERPRINTING AND AUGMENTED uniquely identified are enrolled in a reference database,
REALITY along with metadata. In image fingerprinting schemes,
image features are stored in the reference database. Then, to
APPLICATION FIELD recognize an image, Suspect images, or its features, are
matched with corresponding images or features in the ref
This application claims the benefit of U.S. Provisional erence database. Once matched, the reference database can
Patent Application No. 61/719,920, filed Oct. 29, 2012. This provide associated digital data stored with the image.
application is also related to U.S. Provisional Patent Appli 10 Data carrying signals and matching schemes may be used
cation No. 61/749,767, filed Jan. 7, 2013. Each of the above together to leverage the advantages of both. In particular, for
patent documents is hereby incorporated herein by reference applications where maintaining the aesthetic value or the
in its entirety. information content of the image is important, a combina
tion of digital watermarking and image fingerprinting can be
TECHNICAL FIELD 15 used.
Combinations of watermarks and fingerprints for content
This disclosure relates to digital signal processing, image identification and related applications are described in
rendering (including Raster Image Processing for print), assignees U.S. Patent Publications 2006003.1684 and
image recognition, data signal detection, and computer 20100322469, which are each hereby incorporated by ref
generated graphics in conjunction with live image capture erence in its entirety. Watermarking, fingerprinting and
and recognition (e.g., Augmented Reality). content recognition technologies are also described in
assignee's U.S. Patent Publication 20060280246 and U.S.
BACKGROUND AND SUMMARY Pat. Nos. 6,122,403, 7,289,643, 6,614,914, and 6,590,996
which are each hereby incorporated by reference in its
There are a variety of ways to encode machine readable 25 entirety.
information on objects, and in particular, on printed objects. In many applications, it is advantageous to insert auxiliary
Conventional visible data carriers for printed media include data in printed object in a way that does not impact the other
various forms of bar codes, including monochrome (e.g., visual information on the object, yet still enables the data to
black and white) 1D and 2D bar codes, as well as newer be reliably retrieved from an image captured of the object.
higher density codes that use additional colors to carry data. 30 To achieve this, a technique to exploit the gap between the
One example of higher density bar codes are data glyphs, limit of human visual perception and the limit of an image
which are marks (e.g., forward and back slash marks) sensor has been developed. The gamut of human visual
printed at higher resolution. When viewed from a distance, perception and the gamut of an image sensor are defined in
glyph codes can appear as a uniform tone, and as Such, can terms of characteristics of the rendered output, including
be printed in the background around other visual informa 35 spatial resolution or spatial frequency and color. Each gamut
tion. is a multi-dimensional space expressed in terms of these
In these types of data carriers, the elementary units (bars characteristics. The gap between the gamut of human and
or data glyph marks) are independent of other visual infor sensor perception is a multidimensional space that our data
mation and convey auxiliary data. A mark or arrangement of insertion schemes exploit to insert auxiliary data without
marks is a pattern that corresponds to an auxiliary data 40 impacting other visual information on the object.
symbol. To read the data from a printed object, the object is This multi-dimensional gap is a 5-dimensional space (2
first optically scanned with an image sensor, converting light spatial+3 color) or higher (spatial/color shapes, frequencies,
to an electronic signal. The electronic signal is then analyzed distributions) where our methods insert:
to detect the elements of the mark and convert them to data. (1) uniform texture watermarks (independent of con
Digital watermarking is a machine readable code in which 45 tent but controlled for visibility), and
the data is hidden within an image leveraging human vis (2) content-based watermarks where the content is used as
ibility models to minimize visual impact on the image. For a reference framework. As a reference, the content is either
certain types of applications where image information is altered in a measurable but imperceptible way or used (e.g.,
sparse, the auxiliary data signal can still be applied to the edges) to locate and orient an underlying variation that is
printed object with minimal visual impact by inserting 50 intended to keep the content unchanged.
imperceptible structures having spatial frequency and color Digital printing is becoming increasingly more advanced,
beyond the range of human visual perception. The auxiliary enabling greater flexibility and control over the image
data signal can be conveyed by printing ink structures or characteristics used for data insertion when preparing an
modifying existing structures with changes that are too small image for printing. The process of preparing a digital image
to see or use colors that are difficult to discern. As such, 55 for printing encompasses conversion of an image by a Raster
digital watermarking techniques provide the flexibility of Image Processor, Raster Image Processing, halftoning, and
hiding data within image content, as well as inserting data in other pre-print image processing. Background on these
parts of a printed object where there is little or no other processes is provided below.
visual information. Along with advances in printing, the gamut of even
These types of visible and hidden data carriers are useful 60 widely used image sensors is becoming greater. For hidden
for applications where there is a need to convey variable data insertion, the challenge is to insert the data in the
digital data in the printed object. Hidden data carriers human-sensor perception gap so that it can be widely
increase the capacity of printed media to convey visual and detected across many consumer devices. Of course, for
machine readable information in the same area. Even printed certain security applications, more expensive printers and
objects, or portions of objects (such as logos, pictures or 65 image scanners can be designed to insert security features
graphics on a document or package) that appear identical are and expand the gamut of the Scanning equipment used to
transformed into variable data carriers. detect such features. This is useful to detect security features
US 9,684.941 B2
3 4
and/or tampering with Such features. However, the human between these two types of patterns. One particular example
device perception gap is Smaller for more widely deployed is an image that appears as a uniform texture, yet a water
sensors, such as those commonly used in mobile devices like mark pattern is inserted into it by modulating the line Screen
smart phones and tablet PCs. frequency and direction according to a watermark signal
Our data insertion methods exploit the gap more effec pattern. In particular, the locations of a watermark pattern
tively through data insertion in the process of preparing a are printed using a higher frequency line pattern at first
digital image for printing. Additional control over the pro direction (e.g., vertical Screen angle). The other locations are
cess of inserting auxiliary data is achieved by implementing printed with a lower frequency line pattern in another
the process in the Raster Image Processor (RIP). direction (e.g., diagonal screen angle). The watermark signal
A raster image processor (RIP) is a component used in a 10 is modulated into the image by selection of a higher fre
printing system which produces a raster image also known quency Screen at an arrangement of spatial locations that
as a bitmap. The bitmap is then sent to a printing device for form the watermark signal pattern. When printed, these
output. The input may be a page description in a high-level locations look similar to Surrounding locations. However,
page description language such as PostScript, Portable when Scanned, the sensor sees these locations as being
Document Format, XPS or another bitmap of higher or 15 different, and the watermark pattern in the resulting elec
lower resolution than the output device. In the latter case, the tronic image is easier to detect.
RIP applies either Smoothing or interpolation algorithms to This approach allows a whole set of new messaging
the input bitmap to generate the output bitmap. techniques to be used in the range between 150 lpi and 600
Raster image processing is the process and the means of lpi where 2 spatial dimensions and 3 dimensions of color
turning vector digital information Such as a PostScript file information can be inserted. This information can be water
into a high-resolution raster image. A RIP can be imple mark, barcode or any other signaling mechanism.
mented either as a software component of an operating
system or as a firmware program executed on a micropro BRIEF DESCRIPTION OF THE DRAWINGS
cessor inside a printer, though for high-end typesetting,
standalone hardware RIPs are sometimes used. Ghostscript 25 FIG. 1 is a block diagram illustrating the creation of a
and GhostPCL are examples of software RIPs. Every Post content recognition system using fingerprints and water
Script printer contains a RIP in its firmware. marks.
Half-toning is a process of converting an input image into FIG. 2 is a block diagram illustrating the content identi
halftone structures used to apply ink to a medium. The fication process.
digital representation of a halftone image is sometimes 30 FIG. 3 is a diagram of a cell phone, which may be used
referred to as a binary image or bitmap, as each elementary in Some content recognition systems.
image unit or pixel in the image corresponds to the presence, FIG. 4 is a diagram showing image capture of a subject
or not, of ink. Of course, there are more variables that can Surface.
be controlled at particular spatial location, Such as various FIG. 5 is a block diagram of resolving pose information
color components (CMYK and spot colors). Some advanced 35 from captured imagery.
printers can control other attributes of the ink placement, FIG. 6 is a timeline associated with resolving pose infor
Such as its density or spatial depth or height. mation to aid digital watermark detection.
This half-toning process is typically considered to be part FIG. 7 is a diagram of an Augmented Reality system
of the RIP or Raster Image Processing. In some printing providing a video overlay in a device display that corre
technologies, these halftone structures take the form of 40 sponds to a watermarked area on a Subject Surface.
clustered dots (clustered dot half-toning). In others, the FIG. 8 shows a Subject area including a watermarked area
halftone structures take the form of noise-like dot patterns having different watermarked areas.
(e.g., stochastic screens, blue noise masks, etc.).
Our patent literature provides several techniques for digi DETAILED DESCRIPTION
tal watermarking in the halftone process. Examples of these 45
techniques are detailed in U.S. Pat. Nos. 6,694,041 and The process of digital watermark insertion includes gen
6,760.464, which are each hereby incorporated herein by erating a watermark signal, and then using that signal to
reference in its entirety. modulate characteristics of the image in the human-sensor
New printing techniques enable very fine structures to be gap. As described above, this process is preferably con
created in the RIP which will appear visually identical to the 50 ducted at the RIP stage to enable control over the image
eye. For example a 50% gray can be created with a con representation used to control application of ink to a print
ventional clustered dot screen pattern at 150 lines per inch, media.
or exactly the same visual effect can be created with a much In prior work, several methods for generating the water
higher frequency line structure Such as a stochastic screen. mark signal, and for detecting the watermark signal in
Usually, these two structures are not mixed on one page, as 55 images captured of printed objects, are detailed. Please see
they have very different dot gain characteristics and require U.S. Pat. Nos. 6,614,914 and 6,590,996, which are incor
different corrections. However, our methods are able to porated by reference. Therefore, for this discussion, the
correct for the mechanical dot gain, so that the two patterns focus is on techniques used within the RIP to insert the
appear identical when they appear on the same page. See, in watermark signal.
particular, our prior work in dot gain correction, printer 60 In one implementation, the watermark signal is generated
calibration, and compensating for printer and scanner as an array of watermark signal elements. These elements
effects, in U.S. Pat. Nos. 6,700,995, 7,443,537, and U.S. are mapped to spatial locations within an image block, called
Patent Publication 20010040979, which are each hereby a tile. This tile is then replicated (e.g., tiled in a regular,
incorporated herein by reference in its entirety. contiguous array of blocks in two dimensions) across the
Mobile devices have a capture resolution of much greater 65 area of the host image in which the watermark signal is to
than 150 lpi (resolution of newer phones, such as iPhone 4 be inserted. At a spatial location where there is image
is about 600 lpi or better), so they can be used to distinguish content in the host image, the watermark signal element is
US 9,684.941 B2
5 6
used to modify the host image content at that location to the edges and then detecting the watermark signal relative to
carry the watermark signal element, Subject to constraints set these edges (e.g., by correlating the image signal with the
for perceptual masking. These constraints enable the water regular pattern of the data signal).
mark signal to be increased or decreased (possibly to Zero), d) Use the edges of the content to define a signal perpen
depending on perceptual masking, and desired watermark dicular to the edges in an imperceptible set of dimensions
signal strength. Conversely, where there is no image content, (color & spatial frequency). As in the previous example, the
the watermark signal element is either not applied, or it can edges provide a reference orientation and location of the
be asserted as a texture, using colors and spatial resolution watermark signal.
that make it difficult to discern. As such, for every location e) Use higher dimensional shapes/patterns of color and
in the watermark signal mapping, there is an opportunity for 10 spatial variations where pixels separated spatially may still
watermark modulation. be close in either spatial or color patterns. This reduces
As noted in more examples below, the watermark signal sensitivity to geometric distortions.
need not be mapped to a uniform array of blocks. One In some embodiments, those higher frequency spatial/
alternative is use feature points in the image to form a spatial color variations are designed to take advantage of lower
reference for insertion of a data signal. 15 resolution devices to generate shifts in image characteristics
The watermark signal can be comprised of a single data that can be measured. The data signal elements are inserted
component, or more than one component, as detailed in U.S. to exploit the Bayer pattern of RGB sensors to enhance a
Pat. No. 6,614.914. One component is a direct sequence desired data signal that would otherwise be imperceptible.
spread spectrum modulated data signal. This component is These signal elements are designed to induce distortion
generated by applying error correction coding (convolu (e.g., aliasing, or a color shift) in the image captured through
tional coding) to a data signal, which produces an error the sensor of the printed object. This distortion at the data
correction coded data signal. This signal is than modulated signal locations enhances the pattern because the shift in
onto pseudorandom carrier signals to produce a spread signal characteristics at these locations increases the data
spectrum modulated signal, which is then mapped to loca signal at these locations relative to Surrounding image
tions in a tile. This is one example of watermark signal 25 content and noise. For example, aliasing caused by capturing
generation, and there are many others. a high frequency screen region with a lower frequency
Above, a method of inserting a watermark signal by sensor creates a detectable data signal element at that region.
varying the print structure within the RIP to modulate the A similar effect can also be achieved by modulating ink
watermark into an image is illustrated. A specific example is height using a printer that is capable of controlling the height
given for varying the density and direction or angle of print 30 of ink deposited at a particular location. These printers
primitives (e.g., line structures or dots) used to print a enable control over height of ink by building up ink at a
particular color in an image having a uniform tone. A data particular print location. This is useful for authentication or
signal pattern may also be introduced by varying the half copy protection applications.
tone screen type for different regions of an image. Print The height of the structure can be used to carry informa
structures can vary among a set of screening types, including 35 tion by viewing at an angle with a device Such as a fixed
noise like (e.g., stochastic screens) and structured (clustered focus (or Lytro) camera.
dot, line screens, etc.). This approach is not limited to The height variations can also be designed to cause color
watermark modulation of images with uniform tones, as it changes that are used to carry information. When the print
applies to inserting watermarks into various types of image is viewed normally, these height variations would be imper
COntent. 40 ceptible if the pigment is opaque. This information can be
Some embodiment examples—include the following: watermark, barcode or any other signaling mechanism.
a) Choose the angle in colorspace of watermark signal The above methods apply to variety of print primitives,
modulation (e.g., in the ab plane of Lab colorspace) to be and are not limited to particular line screens or clustered dot
different at different regions throughout the image. In one structures. With control over the RIP, the shape, spatial
class of digital watermark embodiments, these regions cor 45 frequency, and orientation of structures can be specifically
respond to watermark signal elements, and an arrangement designed to exploit sensor geometries and Modulation
of the regions forms a spatial watermark signal pattern of Transfer Function (MTF) characteristics to cause discrimi
watermark signal elements. Data may be modulated into the nation between local regions of an image. For example,
pattern using spread spectrum modulation as noted above, or small lines slanted left and right at different spatial frequen
other data modulation schemes. The arrangement, orienta 50 cies. Or solid dots vs. tiny dot clusters which contain the
tion and shape of these regions may be designed to convey same ink density on physical object, but differ in color after
alternative data code signals. Multiple data signals may be acquisition through a class of image sensors (such as those
interleaved for different spatial locations, as well as different sensors widely used in Smartphone cameras). Some regions
directions in color space. may use a form of noise like dot pattern (e.g., stochastic
b) Choose the spatial frequency of watermark signal modu 55 screening), while others use a shape with particular struc
lation to be different at different regions throughout the ture, like a clustered dot or line screen. The dot gain varies
image. Similar data insertion as mentioned for section a) with number of edges (perimeter) of the print structures, so
also applies to this section b). the amount of dot gain correction is also adapted based on
c) Use the edges of the image content to define a signal along the print structure. For example, in the example above where
the edges in an imperceptible set of dimensions (color & 60 Some regions are printed with high frequency line structures
spatial frequency). In this case, the edges detected in the and others with lower frequency, the line widths in the high
image are used as a reference for the watermark signal. frequency structure have to be reduced more than the line
Thus, rather than being arranged in a pre-determined array widths in the lower frequency structure to compensate for
of blocks or regions, the watermark signal is inserted along dot gain.
the direction of the edge. Along this edge, the watermark 65 Another approach that can be implemented within the RIP
signal can have a regular pattern or structure to facilitate is to transform the image into a form for printing so that it
detection. The watermark signal is detected by first finding has carefully controlled noise characteristics. The noise
US 9,684.941 B2
7 8
characteristics can be set globally across an image to indi a scale-localized Laplacian transform of the image. The
cate the presence of a watermark. The noise itself can difference of Gaussians approach is an approximation of
comprise digital data, such as a spread spectrum modulated Such Laplacian operation, expressed in a pyramid setting.)
data signal. Alternatively, the RIP can generate an image The above procedure typically identifies many keypoints
with a pattern of regions that are detectable based on 5 that are unsuitable, e.g., due to having low contrast (thus
distinguishable noise characteristics. The arrangement of being Susceptible to noise), or due to having poorly deter
this pattern can be used as a reference signal to provide the mined locations along an edge (the Difference of Gaussians
location and orientation of a watermark signal inserted in the function has a strong response along edges, yielding many
image. candidate keypoints, but many of these are not robust to
The watermark may also be conveyed using a reversible 10 noise). These unreliable keypoints are screened out by
image transform or detailed image characterization by performing a detailed fit on the candidate keypoints to
manipulating the image through either transform coefficients nearby data for accurate location, Scale, and ratio of prin
or through local noise manipulations in a detectable yet cipal curvatures. This rejects keypoints that have low con
imperceptible way. One form of reversible transform is the trast, or are poorly located along an edge.
grayscale medial axis transform applied separately to the 15 More particularly this process starts by for each candi
color directions. See, in particular, Image approximation date keypoint interpolating nearby data to more accurately
from gray scale medial axes by Wang, S.; Wu, A. Y.: determine keypoint location. This is often done by a Taylor
Rosenfeld, A., in IEEE Transactions on Pattern Analysis and expansion with the keypoint as the origin, to determine a
Machine Intelligence, vol. PAMI-3, November 1981, p. refined estimate of maxima/minima location.
687-696. The value of the second-order Taylor expansion can also
A stochastic modeling approach that allows for detectable be used to identify low contrast keypoints. If the contrast is
manipulations is the Markov Random Field (MRF) model less than a threshold (e.g., 0.03), the keypoint is discarded.
that can be used to define local pixel relationships that To eliminate keypoints having strong edge responses but
convey watermark signal data elements. The MRF manipu that are poorly localized, a variant of a corner detection
lation is particularly interesting because it can be designed 25 procedure is applied. Briefly, this involves computing the
to have particular noise properties that might be exploited at principal curvature across the edge, and comparing to the
the detector. See, How to generate realistic images using principal curvature along the edge. This is done by Solving
gated MRF's Marc'Aurelio Ranzato Volodymyr Minih Geof for eigenvalues of a second order Hessian matrix.
frey E. Hinton, Department of Computer Science, Univer Once unsuitable keypoints are discarded, those that
sity of Toronto 30 remain are assessed for orientation, by a local image gradi
SIFT Description ent function. Magnitude and direction of the gradient are
SIFT is an acronym for Scale-Invariant Feature Trans calculated for every pixel in a neighboring region around a
form, a computer vision technology developed by David keypoint in the Gaussian blurred image (at that keypoints
Lowe and described in various of his papers including scale). An orientation histogram with 36 bins is then com
“Distinctive Image Features from Scale-Invariant Key 35 piled with each bin encompassing ten degrees of orienta
points.” International Journal of Computer Vision, 60, 2 tion. Each pixel in the neighborhood contributes to the
(2004), pp. 91-110; and "Object Recognition from Local histogram, with the contribution weighted by its gradients
Scale-Invariant Features.” International Conference on magnitude and by a Gaussian with O 1.5 times the scale of
Computer Vision, Corfu, Greece (September 1999), pp. the keypoint. The peaks in this histogram define the key
1150-1157, as well as in U.S. Pat. No. 6,711,293. 40 points dominant orientation. This orientation data allows
SIFT works by identification and description—and sub SIFT to achieve rotation robustness, since the keypoint
sequent detection—of local image features. The SIFT fea descriptor can be represented relative to this orientation.
tures are local and based on the appearance of the object at From the foregoing, plural keypoints are different scales
particular interest points, and are invariant to image scale, are identified—each with corresponding orientations. This
rotation and affine transformation. They are also robust to 45 data is invariant to image translation, Scale and rotation. 128
changes in illumination, noise, and some changes in view element descriptors are then generated for each keypoint,
point. In addition to these properties, they are distinctive, allowing robustness to illumination and 3D viewpoint.
relatively easy to extract, allow for correct object identifi This operation is similar to the orientation assessment
cation with low probability of mismatch and are straight procedure just-reviewed. The keypoint descriptor is com
forward to match against a (large) database of local features. 50 puted as a set of orientation histograms on (4x4) pixel
Object description by a set of SIFT features is also robust to neighborhoods. The orientation histograms are relative to
partial occlusion; as few as 3 SIFT features from an object the keypoint orientation and the orientation data comes from
can be enough to compute location and pose. the Gaussian image closest in scale to the keypoints scale.
The technique starts by identifying local image features— As before, the contribution of each pixel is weighted by the
termed keypoints—in a reference image. This is done by 55 gradient magnitude, and by a Gaussian with O 1.5 times the
convolving the image with Gaussian blur filters at different scale of the keypoint. Histograms contain 8 bins each, and
scales (resolutions), and determining differences between each descriptor contains a 4x4 array of 16 histograms around
Successive Gaussian-blurred images. Keypoints are those the keypoint. This leads to a SIFT feature vector with
image features having maxima or minima of the difference (4x4x8=128 elements). This vector is normalized to
of Gaussians occurring at multiple scales. (Each pixel in a 60 enhance invariance to changes in illumination.
difference-of-Gaussian frame is compared to its eight neigh The foregoing procedure is applied to training images to
bors at the same scale, and corresponding pixels in each of compile a reference database. An unknown image is then
the neighboring scales (e.g., nine other scales). If the pixel processed as above to generate keypoint data, and the
value is a maximum or minimum from all these pixels, it is closest-matching image in the database is identified by a
selected as a candidate keypoint. 65 Euclidian distance-like measure. (A “best-bin-first algo
(It will be recognized that the just-described procedure is rithm is typically used instead of a pure Euclidean distance
a blob-detection method that detects space-scale extrema of calculation, to achieve several orders of magnitude speed
US 9,684.941 B2
10
improvement.) To avoid false positives, a “no match’ output One form of robust hash for video is a waveform con
is produced if the distance score for the best match is structed from statistics of each frame in the video as shown
close—e.g., 25%—to the distance score for the next-best in FIGS. 160 and 162 in FIG. 6 of the 684 publication.
match. These statistics can be representing compactly as a vector of
To further improve performance, an image may be the changes in the statistic from frame to frame in a video
matched by clustering. This identifies features that belong to sequence, such as a Group of Pictures (GOP) in a video
the same reference image—allowing unclustered results to coding format like MPEG. Examples of the statistics that
be discarded as spurious. A Hough transform can be used— can be used for the fingerprint include the frame average for
identifying clusters of features that vote for the same object 10
luminance and the variance. For compressed streams, fin
pose. gerprints can be extracted from the compressed data, Such as
An article detailing a particular hardware embodiment for the DCT coefficients in the I-frames, the motion vectors, etc.
performing the SIFT procedure, suitable for implementation FIG. 2 is a block diagram illustrating the content identi
in a next generation cell phone, is Bonato et al., “Parallel fication process. Incoming signals 109 are captured in a
Hardware Architecture for Scale and Rotation Invariant 15 receiver 110. This includes still or video image capture in
Feature Detection.” IEEE Trans on Circuits and Systems for which images are captured and digitized with an image
Video Tech, Vol. 18, No. 12, 2008. sensor like a camera or other image capture device, as well
An alternative hardware architecture for executing SIFT as ambient audio capture by microphone. It also includes
techniques is detailed in Se et al., “Vision Based Modeling receipt of audio, image or video content in a broadcast or
and Localization for Planetary Exploration Rovers.” Proc. of transmission channel, including broadcast stream or file
Int. Astronautical Congress (IAC), October, 2004. transfer. The recognition process may be invoked as part of
While SIFT is perhaps the most well-known technique for a systematic Internet monitoring or broadcast monitoring of
generating robust local descriptors, there are others, which content signals, in home audience measurement, batch data
may be more or less Suitable—depending on the application. base searching and content indexing, or user requests for
These include GLOH (c.f., Mikolajczyk et al., “Performance 25 content recognition and metadata searching. The fingerprint
Evaluation of Local Descriptors.” IEEE Trans. Pattern Anal. calculator/watermark extracter 112 computes fingerprints
Mach. Intell. Vol. 27, No. 10, pp. 1615-1630, 2005) and and/or watermarks for incoming content items and issues
SURF (c.f., Bay et al. SURF: Speeded Up Robust Features.” them to a database for database search for matching finger
Eur. Conf. on ComputerVision (1), pp. 404-417, 2006; Chen prints and data look up for watermark based identifiers 114.
et al., “Efficient Extraction of Robust Image Features on 30 The fingerprint matches found in the search process and
Mobile Devices.” Proc. of the 6th IEEE and ACM Int. Symp. watermark identifiers provide content identification (a num
On Mixed and Augmented Reality, 2007; and Takacs et al. ber or some other form of index for metadata lookup), which
“Outdoors Augmented Reality on Mobile Phone Using in turn, enables look up of metadata corresponding to the
Loxel-Based Visual Feature Organization.” ACM Int. Conf. content identification in one or more metadata databases.
on Multimedia Information Retrieval, October 2008). 35 The metadata is then returned to device 116 for display/
Watermarking and Fingerprinting System Configurations output or further processing. This may involve returning
FIG. 1 is a block diagram illustrating the creation of a metadata to a device that requested the database search or
content recognition system using fingerprints and water some other device to which the search results are directed
marks. The digitized input image/video/audio signals 100 (e.g., a user's home device, or a monitoring system's data
are input to the fingerprint calculator/watermark embedder 40 collection database in which the metadata and recognition
102, which computes multiple fingerprints for each content events are aggregated and compiled for electronic report
item to be uniquely recognized, and also watermarks the generation).
content item. In a database entry process 102, the finger AR Exploitation
prints are entered and stored in a database, along with Sometimes watermark detection needs properly aligned
additional information, such as metadata for the content 45 image data to establish a proper registration for reliable
item, a digital master copy for use as needed (see Patent payload recovery. Suitable image alignment is difficult to
Application Publication 20100322469 for description of achieve in many mobile environments. For example, and
techniques involving use of original content in watermark with reference to FIG. 4, a Smartphone captures imagery of
detection and determining location within content). A data a subject Surface (e.g., a magazine, newspaper, object, etc.).
base organization process 106 in a database system sorts and 50 The pose relative to the Smartphone's video camera and the
arranges the fingerprints in a data structure. Such as a tree Subject Surface (sometimes referred to as "image pose”)
structure to enable fast searching and matching. This data changes as a user positions the phone to capture video. In
base itself may be distributed over an array of computers in this context, pose can include perspective angle, Scale,
an identification network (108). This network receives que rotation and translation.
ries to identify or recognize content items based on a stream 55 I have developed methods and systems to accurately
of fingerprints and/or watermarks from a requesting device, estimate geometry capture distortion and modify imagery
Such as a users handheld mobile device or other computing prior to watermark detection. This can be used in connection
device (node in a network of monitoring devices). with augmented reality overlays to provide rich user expe
US Patent Publication No. 2006028.0246 includes the riences. But it all starts with determining the correct relative
following description: 60 pose.
Fingerprinting is a method of identifying multimedia As an initial overview, and with reference to FIG. 5,
content by deriving a number or set of numbers that uniquely captured image frames are analyzed to identify key points.
identify that content. The fingerprint may be fragile, such as These key points can be tracked over time to resolve relative
a secure hash (e.g., SHA, MD5, etc.) or robust. In the case image geometry including pose. The captured imagery can
of a robust fingerprint, the fingerprint is expected to remain 65 be modified according to the resolved geometry to remove
relatively the same despite processing distortion due to any distortion introduced by relative camera positioning
broadcasting, compression, geometrical distortion, etc. including, e.g., removing rotation, perspective angle, Scale,
US 9,684.941 B2
11 12
etc. The watermark detector can analyze the modified, al, "Homography-based 2d visual tracking and servoing.”
captured imagery in search of a previously hidden digital The International Journal of Robotics Research, Vol. 26, No.
watermark. 7, pages 661-676, July 2007, could be used to represent a
Our methods can be implemented by many suitable transform between key points in different image frames. The
electronic devices. One example is a portable device includ Benhimane paper is hereby incorporated herein by reference
ing a video camera, e.g., Such as a Smartphone, tablet, pad, in its entirety. In noisy imagery, we've found that 20-60 key
etc. With reference to FIG. 6, software (e.g., a smartphone points are sufficient. Of course, more or less key points could
App) is enabled on the portable device. (One example of the be used with varying degrees of Success.
software may include a modified version of Digimarc's Multiple pose Homographies can be constructed, e.g.,
Digimarc Discover application. From Digimarc's website: 10 between I1 and I2, I2 and I3, I3 and I4, and so on. Given at
“Digimarc Discover uses multiple content identification least four (4) views (e.g., frames) of the Subject Surface, and
technologies—digital watermarking, audio fingerprinting corresponding pose Homographies between the frames, a
and QR code and barcode detection—to give Smartphones cost function can be utilized to find pose information that
the ability to see, hear and engage with all forms of media. best fits a current frame. I prefer to use between 4-10
Consumers simply launch the Digimarc Discover app and 15 homographies with a cost function; however, additional
point their phone at the content of interest—an ad, article, homographies may be used as well. The techniques (includ
package, retail sign, etc.—and are instantly connected to a ing the cost function in section 2.2.1) described in Pirch
menu of optional experiences such as learn more, view a heim, et al., “Homography-Based Planar Mapping and
Video, launch an app, map directions, share via Social media, Tracking for Mobile Phones.” IEEE International Sympo
save for later or make a purchase.) sium on Mixed and Augmented Reality, 2011, could be used
Image data, e.g., video frames captured by the device's to find such pose information. The Pirchheim paper is
Video camera is gathered and provided to a pose detector or hereby incorporated herein by reference in its entirety. The
detection process to determine pose of the camera relative to Homography that minimizes the cost function can be used to
a depicted Subject Surface. Captured imagery can be modi provide pose information.
fied to remove any distortion, e.g., Scale, perspective, trans 25 Pirchheim’s Section 2.2.1 states:
lation, rotation. The modified imagery is analyzed for hidden “2.2.1 Cost Function and Parameterization
digital watermarking. Once detected, the digital watermark In the following we describe the mathematical formula
ing can serve as a backbone for an augmented reality (AR) tion of the optimization scheme given in A. Ruiz, P. E. L.
experience. For example, the watermarking may include a de Teruel, and L. Fernandez. Practical planar metric recti
link to obtain video. The video can be overlaid in a device 30 fication. In Proc. BMVC 2006, 2006 for completeness. We
display area. In some cases, the video can be overlaid in define the scene plane to be located in the canonical position
image display area spatially corresponding to the subject Z=0 corresponding to the (x:y) plane. Thus, points on the
surfaces that includes digital watermarking (FIG. 7). plane have a Z-coordinate equal Zero and can be written as
Updated pose information can be provided to ensure that the (X; y; 0, 1) in homogeneous coordinates.
overlaid graphics or video continue to be positioned where 35 The unknowns in the optimization are the camera poses Pi
intended, e.g., the video can continue to be played in the relative to this plane. Under the assumption that all world
intended spatial area, even as the camera moves relative to points are located on the plane, camera poses can easily be
the object's surface. re-formulated as 2D homographies by eliminating the third
Positioning and tracking of overlay graphics and video column of the pose matrix Pi:
can be enhanced e.g., by tracking and mapping image frames 40
or features with the image frames. For example, a keyframe
based SLAM system as discussed in Klein, et al., “Parallel ii. X
(1)
Tracking and Mapping on a camera phone. Mixed and
Augmented Reality, ISMAR 2009, 8th IEEE International
Symposium on 19-22 Oct. 2009, which is hereby incorpo
rated by reference in its entirety, could be used. Other
45 1 ten solar 1
tracking Such as natural feature tracking or marker-based

systems, etc. could be used as well for the position and The resulting pose homographies have the following
tracking of overlay graphics, video and other AR features. important property based on the observation that their first
Butlets go back and even further discuss pose detection. 50 and second columns are ortho-normal vectors, where r and
Imagery (video) frames are captured with a device sensor, rare the first and second column of R respectively:
e.g., a camera. A first image frame I is analyzed to detect
"key points'. A key point generally represents a robust
image characteristic. Some examples of key points include, r 1 O ... (2)
e.g., a feature corner or other characteristic, an area having 55
one or more (locally) large non-Zero derivatives, etc. Other C. C = r clan- 1
features as discussed above under the SIFT section can be
used as well. Homography matrices can be constructed
representing key points from I relative to another image Additionally, given a pose homography C and the
frame I. (Of course, it is not necessary for frames I and I 60 homography H. mapping from camera C, to C, the cor
to be adjacently located frames. In fact, there is some benefit responding pose homography C. can be computed as fol
for frames to have some sufficient distance between them to lows:
have a representable difference in rotation, Scale, translation,
perspective, etc. Additionally, homography can be estimated C2-H2C. (3)
from an image pair itself (e.g., two images), instead of from 65 C must observe the constraint (2). Moreover, by substi
two (2) or more sets of corresponding keypoints.) For tuting (3) into (2) we obtain the following additional con
example, the EMS homography described in Benhimane et straint for C:
US 9,684.941 B2
13 14
based on one or more pose parameters, and then refine the
1 O (4) estimate by using additional parameters.
Captured image data can be modified to remove or modify
de-icini co- 1 | distortion based on the pose information. Watermark detec
tion can be carried out on the modified imagery.
The pose information need not be perfect, but provides
We can formulate the constraint as a cost function on C pose information that preferably gets the pose detection in
by enforcing that the off-diagonal entries are 0 and the the ball park for watermark detection. For example, the
diagonal entries have the same value. Thus, we define the digital watermarking detection currently used in the Digi
following cost function for one homography H: 10 marc Discover application currently can produce reads with
a perspective angle of up to +30-35%.
Successful watermark detection can launch an AR expe
(11 (1.2 (5) rience as discussed above. A watermark payload bit (or bits)
(HIC) (HC) = a1.2 a2.2 ... . can also be used to trigger an announcement to a user that
15 an AR overlay is about to launch and/or offer, e.g., the user
(6)
a chance to cancel or proceed with the AR experience.
e;(C) = (a12/a11) + (a2.2/a1.1 - 1). The pose detector can continue to detect pose information
(e.g., based on minimizing a cost function associated with
The resulting cost function (6) exploits well-known pose Homographies) from captured imagery long after a
orthogonality constraints over the image of the absolute watermark has been detected. This may provide the AR
conic R. I. Hartley and A. Zisserman. Multiple View system with continuing pose information as the AR experi
Geometry in Computer Vision. Cambridge University Press, ence continues. This continuing pose information can be
second edition, 2004 and holds for any homography H, provided to the AR system to help determine relative posi
mapping from the reference camera to another camera i. For tioning of any overlay graphics relative to captured imagery.
a set of cameras C, all connected with individual homog 25 A potentially more accurate approach is to provide base
raphies H, to a reference camera C, we construct a cost line orientation information from digital watermark detec
function by adding up individual costs, obtaining a single tion. For example, Successful watermark detection may also
cost function for the unknown reference camera pose C provide image orientation information. Indeed, digital
watermarking many include orientation attributes (see, e.g.,
30 U.S. Pat. Nos. 8,243,980; 7,116,781 and 6,614,914; which
are each hereby incorporated herein by reference in its
e(C)=Xe;(C1).
i
(7) entirety) that are helpful to identify the original rotation,
scale and translation of the imagery when the watermark
was inserted. This base-line orientation information can be
Overall, the whole problem of estimating all camera poses 35 used by an AR system, e.g., for transforming captured
Ci can be reduced to finding one camera pose C1 that imagery for display on a device screen to accommodate for
minimizes the total cost function (7). A homography H. relative capture device pose. (Watermark orientation infor
between two cameras has 8 degrees of freedom because it is mation can also be used to update or reset pose information
defined up to Scale. By fixing the unknown plane and being calculated by a Pose Detector.)
allowing the second camera C to move freely, the first 40 Watermark information can be used to modify or remove
camera C1 has only 2 degrees of freedom left. Ruiz unwanted rotation, Scaling or translation, essentially restor
et al. . . . propose to fix the camera position and vary the ing the image to the state in which it was watermarked. This
camera tilt (X-axis) and roll (Z-axis) angles but remain vague restored image content allows for reliable digital fingerprint
concerning the valid 2DOF parameter range. Geometrically, analysis. Consider the possibilities.
we interpret the parameterization as depicted in FIG. 4. 45 Having access to the original image when embedding
Plane and reference camera are defined to be located in watermarking, a watermark embedder can analyze image
canonical position, the plane aligning with the world (x, y) areas and based, e.g., on color, luminance, texture and/or
plane and the reference camera located at position (0, 0, -1) coefficient information, can calculate a fingerprint of the
Such that world and camera coordinate systems align. We area. For example, and with reference to FIG. 8, areas 1-6
assume that the plane rotates and the camera stays fixed. The 50 are separately fingerprinted. This information can be stored
first rotation around the X-axis lets the plane move along a in association with a digital watermark that is embedded in
circle aligned with the (y,z) camera plane. The second the areas.
rotation lets the plane move along another circle aligned A watermark detector later encounters imagery depicting
with the (x:y) camera plane. Avoiding the plane to be rotated areas 1-6. If the watermark is redundantly encoded in areas
behind the camera, we define (-7L/2; L/2) as range for the 55 1-6 (e.g., the same watermark is placed in each area), the
X-rotation parameter. For the Z-rotation parameter we define detector might have trouble determining whether it detected
- L/2, TL/2) as the valid range to avoid solution symmetry.” the watermark from area 3 vs. area 1 vs. area 4, and so on.
The above mentioned papers: i) A. Ruiz, P. E. L. de This may matter if a different AR experience is intended for
Teruel, and L. Fernandez. Practical planar metric rectifica different areas on the subjects surface.
tion. In Proc. BMVC 2006, 2006, and ii) R. I. Hartley and 60 Since the imagery is restored to its original or near
A. Zisserman, Multiple View Geometry in ComputerVision, original form, the watermark detector, or a unit cooperating
Cambridge University Press, second edition, 2004, are each with the watermark detector, may compute a corresponding
hereby incorporated herein by reference in their entireties. digital fingerprint of the detection area. This can be com
There are many refinements. For example, different pared to the original fingerprint (created at embedding) to
homographies can be created for different pose parameters, 65 determine the location of the watermark detection area, e.g.,
e.g., separate out image translation or group together scale does the fingerprint correspond to areas 1 or 3 or 4. In one
and rotation, etc. Also, a first pose estimate can be provided example the fingerprint calculation process uses coefficients
US 9,684.941 B2
15 16
of a linear projection. When a watermark is read, the processing to compute signatures, including various data
watermark detector (or software/device) cooperating with structure representations of the signatures as explained
the detector, may communicate the watermark payload to a above. In turn, the data structure signals in memory are
registry. This registry may include the original fingerprint transformed for manipulation during searching, Sorting,
information that the detector can use to determine the digital reading, writing and retrieval. The signals are also trans
watermark read location. Knowing the location of a detec formed for capture, transfer, storage, and output via display
tion block can be important in Some applications where the or audio transducer (e.g., speakers).
spatial position of the watermark on a surface is used by an While reference has been made to mobile devices (like
AR system (e.g., overlaying video only over certain areas of cell phones) and embedded systems, it will be recognized
a photograph that contains multiple watermark areas or 10 that this technology finds utility with all manner of
blocks). devices—both portable and fixed. PDAs, organizers, por
The area or block position alternatively can be included in table music players, desktop computers, wearable comput
a watermark payload. For example, an ID or other indicator ers, servers, etc., can all make use of the principles detailed
may indicate the location, or relative location of the water herein. Particularly contemplated cell phones include the
marked area. 15 Apple iPhone, and cell phones following Google's Android
System and Components specification (e.g., the G1 phone, manufactured for T-Mobile
It is envisioned that the above processes, systems and by HTC Corp.). The term “cell phone' should be construed
system components can be implemented in a variety of to encompass all Such devices, even those that are not
computing environments and devices. It is specifically con strictly-speaking cellular, nor telephones. (Details of an
templated that the processes and components will be imple iPhone, including its touch interface, are provided in pub
mented within devices and across multiple devices. For lished patent application 20080174570, which is hereby
example, signal capture, signature calculation and database incorporated herein by reference.)
entry and organization are performed on a set of devices to The design of cell phones and other computers that can be
construct a recognition system, and signal capture, signature employed to practice the methods of the present disclosure
calculation and database search and retrieval are performed 25 are familiar to the artisan. In general terms, each includes
on another set of devices, which may be distinct or overlap. one or more processors, one or more memories (e.g. RAM),
The computing environments used to implement the pro storage (e.g., a disk or flash memory), a user interface
cesses and system components encompass a broad range (which may include, e.g., a keypad, a TFT LCD or OLED
from general purpose, programmable computing devices to display screen, touch or other gesture sensors, a camera or
specialized circuitry, and devices including a combination of 30 other optical sensor, a microphone, etc., together with Soft
both. The processes and system components may be imple ware instructions for providing a graphical user interface), a
mented as instructions for computing devices, including battery, and an interface for communicating with other
general purpose processor instructions for a variety of devices (which may be wireless, such as GSM, CDMA,
programmable processors, including microprocessors, Digi W-CDMA, CDMA2000, TDMA, EV-DO, HSDPA, WiFi,
tal Signal Processors, electronic processors, etc. These 35 WiMax, or Bluetooth, and/or wired, such as through an
instructions may be implemented as Software, firmware, etc. Ethernet local area network, a T-1 internet connection, etc.).
These instructions can also be converted to various forms of An exemplary cell phone that can be used to practice part or
processor circuitry, including programmable logic devices, all of the detailed arrangements is shown in FIG. 3.
application specific circuits, including digital, analog and The processor can be a special purpose hardware device,
mixed analog/digital circuitry. Execution of the instructions 40 or may be implemented by a programmable device execut
can be distributed among processors and/or made parallel ing Software instructions read from a memory or storage, or
across processors within a device or across a network of by combinations thereof. (The ARM series of CPUs, using
devices. Transformation of content signal data may also be a 32-bit RISC architecture developed by Arm, Limited, is
distributed among different processor and memory devices. used in many cell phones.) References to “processor should
The computing devices include, e.g., one or more pro 45 thus be understood to refer to functionality, rather than any
cessors, one or more memories (including computer read particular form of implementation.
able media), input devices, output devices, and communi In addition to implementation by dedicated hardware, or
cation among these components (in some cases referred to as Software-controlled programmable hardware, the processor
a bus). For software/firmware, instructions are read from can also comprise a field programmable gate array, such as
computer readable media, Such as optical, electronic or 50 the Xilinx Virtex series device. Alternatively the processor
magnetic storage media via a communication bus, interface may include one or more digital signal processing cores,
circuit or network and executed on one or more processors. such as Texas Instruments TMS320 series devices.
The above processing of content signals includes trans Software instructions for implementing the detailed func
forming of these signals in various physical forms. Images tionality can be readily authored by artisans, from the
and video (forms of electromagnetic waves traveling 55 descriptions provided herein.
through physical space and depicting physical objects) may Typically, devices for practicing the detailed methods
be captured from physical objects using cameras or other include operating system Software that provides interfaces to
capture equipment, or generated by a computing device. hardware devices and general purpose functions, and also
Similarly, audio pressure waves traveling through a physical include application software that can be selectively invoked
medium may be captured using an audio transducer (e.g., 60 to perform particular tasks desired by a user. Known browser
microphone) and converted to an electronic signal (digital or Software, communications Software, and media processing
analog form). While these signals are typically processed in software can be adapted for uses detailed herein. Some
electronic and digital form to implement the components embodiments may be implemented as embedded sys
and processes described above, they may also be captured, tems—a special purpose computer system in which the
processed, transferred and stored in other physical forms, 65 operating system software and the application Software is
including electronic, optical, magnetic and electromagnetic indistinguishable to the user (e.g., as is commonly the case
wave forms. The content signals are transformed during in basic cell phones). The functionality detailed in this
US 9,684.941 B2
17 18
specification can be implemented in operating system soft encode and decode digital data signals, calculate relative
ware, application Software and/or as embedded system soft attributes of source signals from different Sources, etc.
Wae. The above methods, instructions, and hardware operate on
Different of the functionality can be implemented on reference and Suspect signal components. As signals can be
different devices. For example, in a system in which a cell represented as a sum of signal components formed by
phone communicates with a server at a remote service projecting the signal onto basis functions, the above meth
provider, different tasks can be performed exclusively by ods generally apply to a variety of signal types. The Fourier
one device or the other, or execution can be distributed transform, for example, represents a signal as a sum of the
between the devices. For example, extraction of signatures signals projections onto a set of basis functions.
10
from a test image on a cell phone, and searching of a The particular combinations of elements and features in
database for corresponding reference images on a remote the above-detailed embodiments are exemplary only; the
server, is one architecture, but there are many others. For interchanging and Substitution of these teachings with other
example, information about reference images may be stored teachings in this and the incorporated-by-reference patents/
on the cell phone—allowing the cell phone to capture a test 15
applications are also contemplated.
image, generate signatures, and compare against Stored
signature data structures for reference images—all without What is claimed is:
reliance on externals devices. Thus, it should be understood 1. A method comprising:
that description of an operation as being performed by a receiving imagery captured by a device sensor, in which
particular device (e.g., a cell phone) is not limiting but the device sensor comprises an image or video camera;
exemplary; performance of the operation by another device analyzing the imagery to decode digital watermarking
(e.g., a remote server), or shared between devices, is also therefrom, in which the digital watermarking comprises
expressly contemplated. (Moreover, more than two devices orientation attributes;
may commonly be employed. E.g., a service provider may
refer some tasks, functions or operations, to servers dedi 25 determining orientation information from the orientation
cated to such tasks.) attributes, the orientation information being associated
In like fashion, data can be stored anywhere: local device, with a capture position of the device sensor relative to
remote device, in the cloud, distributed, etc. an imaged subject;
Operations need not be performed exclusively by specifi providing the orientation information for use by an aug
cally-identifiable hardware. Rather, some operations can be 30 mented reality (AR) system; and
referred out to other services (e.g., cloud computing), which using one or more configured processors, refining the
attend to their execution by still further, generally anony orientation information determined from the orienta
mous, Systems. Such distributed systems can be large scale tion attributes with image homography, in which the
(e.g., involving computing resources around the globe), or orientation information is utilized as a starting pose for
local (e.g., as when a portable device identifies nearby 35 the image homography.
devices through Bluetooth communication, and involves one 2. The method of claim 1 further comprising: identifying
or more of the nearby devices in an operation.) a plurality of key points in the imagery, in which the key
CONCLUDING REMARKS points represent robust image characteristics that are inde
40 pendent of the digital watermarking, in which said refining
Having described and illustrated the principles of the utilizes pose homographies associated with the key points.
technology with reference to specific implementations, it 3. The method of claim 2 in which said refining further
will be recognized that the technology can be implemented minimizes a cost function associated with the pose homog
in many other, different, forms. To provide a comprehensive raphies.
disclosure without unduly lengthening the specification, 45 4. The method of claim 1 in which the AR system overlays
applicants incorporate by reference the patents and patent graphics or video on a display Screen in an area correspond
applications referenced above, in their entireties. ing to a spatial area in which the digital watermarking was
The methods, processes, and systems described above detected from the imagery.
may be implemented in hardware, software or a combination 5. The method of claim 1 in which the AR system utilizes
of hardware and software. For example, the signal process 50
the orientation information to transform the imagery for
ing operations described above may be implemented as display on a display screen.
instructions stored in a memory and executed in a program
mable computer (including both software and firmware 6. The method of claim 1 in which the digital watermark
instructions), implemented as digital logic circuitry in a ing comprises a payload, in which said method further
special purpose digital circuit, or combination of instruc 55 comprises initiating an announcement associated with the
tions executed in one or more processors and digital logic AR system utilizing at least a portion of the payload.
circuit modules. The methods and processes described 7. The method of claim 1 further comprising:
above may be implemented in programs executed from a modifying the imagery according to the orientation infor
system's memory (a computer readable medium, Such as an mation to yield modified imagery; and
electronic, optical or magnetic storage device). The meth 60
ods, instructions and circuitry operate on electronic signals, deriving a fingerprint from the modified imagery, the
or signals in other electromagnetic forms. These signals fingerprint comprising a hash of the modified imagery.
further represent physical signals like image signals (e.g., 8. The method of claim 7 further comprising providing the
light waves in the visible spectrum) captured in image fingerprint to the AR system to help facilitate display of
sensors. These electromagnetic signal representations are 65 graphics or video on a display screen corresponding to a
transformed to different states as detailed above to detect spatial area of modified imagery from which the fingerprint
signal attributes, perform pattern recognition and matching, was derived from.
US 9,684.941 B2
19 20
9. A mobile device comprising: 13. The mobile device of claim 10 in which the encoded
a device sensor comprising an image or video camera; signal comprises a payload, in which at least a portion of the
memory storing instructions; payload is used to initiate an announcement associated with
one or more processors configured by the instructions for: said AR system.
5 14. The mobile device of claim 10 in which said instruc
analyzing imagery captured by said device sensor to tions comprise instructions to configure said one or more
extract an encoded signal therefrom, in which the processors for:
encoded signal comprises orientation attributes; transforming the imagery according to the orientation
determining orientation information from the orienta information to yield modified imagery; and
tion attributes, the orientation information being 10
extracting a fingerprint from the modified imagery, the
associated with a capture position of said device fingerprint comprising a hash of the modified imagery.
sensor relative to an imaged subject; 15. The mobile device of claim 14 further comprising a
providing the orientation information for use by an display Screen, and instructions to configure said one or
augmented reality (AR) system; and more processors for providing the fingerprint to said AR
system to help facilitate display of graphics or video on said
adapting the orientation information determined from 15 display Screen corresponding to a spatial area of modified
the orientation attributes with image homography, in imagery from which the fingerprint was derived from.
which the orientation information is utilized as a 16. The mobile device of claim 9 in which said instruc
starting pose for the image homography. tions comprise instructions to configure said one or more
10. The mobile device of claim 9 further comprising the processors for: identifying a plurality of key points in the
AR system. imagery, in which the key points represent robust image
11. The mobile device of claim 10 in which said AR characteristics that are independent of the encoded signal, in
system is configured for overlaying graphics or video on a which said adapting utilizes pose homographies associated
display screen in an area corresponding to a spatial area in with the key points.
which the encoded signal was detected from the imagery. 17. The mobile device of claim 16 in which said instruc
12. The mobile device of claim 10 in which said mobile 25 tions comprise instructions to configure said one or more
device comprises a display Screen, and in which said AR processors for minimizing a cost function associated with
system is configured for utilizing the orientation information the pose homographies.
to transform the imagery for display on said display screen. k k k k k

United States Patent: (10) Patent No.: (45) Date of Patent

Uploaded by

Copyright:

Available Formats

United States Patent: (10) Patent No.: (45) Date of Patent

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

United States Patent: (10) Patent No.: (45) Date of Patent

Uploaded by

Copyright:

Available Formats

USOO9684941B2

(12) United States Patent (10) Patent No.: US 9,684.941 B2

Updated Pose for

Video Capture Pose FG 4.

Subject surface (e.g., magazine,

Modify imagery Watermark

turn on App Gathering image Watermark Read Augmented Reality

Video overlay area

tracking Such as natural feature tracking or marker-based

You might also like