
Automatically Detect and Recognize Text in Natural Images

AKULA PAVAN VEERA VENKATESH – U17EC183

K. V. K. RAMA CHANDRA SETTY – U17EC218

S. GANESH – U17EC216
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

BHARATH INSTITUTE OF HIGHER EDUCATION AND RESEARCH

173, Agharam Road, Selaiyur, Tambaram, Chennai – 600073

Abstract

We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect text in many fonts and languages.

Introduction

Detecting text in natural images, as opposed to scans of printed pages, faxes and business cards, is an important step for a number of Computer Vision applications, such as computerized aid for the visually impaired, automatic geocoding of businesses, and robotic navigation in urban environments. Retrieving text in both indoor and outdoor environments provides contextual clues for a wide variety of vision tasks. Moreover, it has been shown that the performance of image retrieval algorithms depends critically on the performance of their text detection modules. For example, two book covers of similar design but with different text prove to be virtually indistinguishable without detecting and OCRing the text. The problem of text detection was considered in a number of recent studies [1, 2, 3, 4, 5, 6, 7]. Two competitions (the Text Location Competition at ICDAR 2003 [8] and ICDAR 2005 [9]) have been held in order to assess the state of the art. The results of these competitions demonstrate that there is still room for improvement (the winner of the ICDAR 2005 text location competition achieved recall = 67% and precision = 62%). This work deviates from the previous ones by defining a suitable image operator whose output enables fast and dependable detection of text. We call this operator the Stroke Width Transform (SWT), since it transforms the image data from containing color values per pixel to containing the most likely stroke width. The resulting system is able to detect text regardless of its scale, direction, font and language. When applied to images of natural scenes, the success rates of OCR drop drastically, as shown in Figure 11.

There are several reasons for this. First, the majority of OCR engines are designed for scanned text and so depend on a segmentation that correctly separates text from background pixels. While this is usually simple for scanned text, it is much harder in natural images. Second, natural images exhibit a wide range of imaging conditions, such as color noise, blur and occlusions. Finally, while the page layout for traditional OCR is simple and structured, in natural images it is much harder, because there is far less text and less overall structure, with high variability in both geometry and appearance.

Figure 1: The SWT converts the image (a) from containing gray values to an array containing likely stroke widths for each pixel (b). This information suffices for extracting the text by measuring the width variance in each component, as shown in (c), because text tends to maintain a fixed stroke width. This sets it apart from other image elements such as foliage. The detected text is shown in (d).

One feature that separates text from other elements of a scene is its nearly constant stroke width. This can be utilized to recover regions that are likely to contain text. In this work, we leverage this fact. We show that a local image operator combined with geometric reasoning can be used to recover text reliably. The main contribution of this work is showing how to compute the stroke width for each pixel. Figure 1c shows that the operator output can be utilized to separate text from other high-frequency content of a scene. Using logical and flexible geometric reasoning, places with similar stroke width can be grouped together into bigger components that are likely to be words. This reasoning also allows the algorithm to distinguish between text and arbitrary drawings, as shown in Figure 2. Note that we do not require the stroke width to be constant throughout a letter, but allow slow bounded variations instead.

The method suggested here differs from previous approaches in that it does not look for a separating feature per pixel, like gradient or color. Instead, we collect enough information to enable smart grouping of pixels: in our approach, a pixel gradient is only important if it has a corresponding opposing gradient. This verification greatly reduces the amount of detected pixels, as a stroke forces the co-occurrence of many similarly matched pairs in a small region. Another notable difference of our approach from previous work is the absence of a scanning window over a multiscale pyramid, required by several other approaches [e.g. 3, 4, 25]. Instead, we perform a bottom-up integration of information, merging pixels of similar stroke width into connected components, which allows us to detect letters across a wide range of scales in the same image. Since we do not use a filter bank of a few discrete orientations, we can detect strokes (and, consequently, text lines) of any direction. Additionally, we do not use any language-specific filtering mechanisms, such as an OCR filtering stage [3] or statistics of gradient directions in a candidate window pertaining to a certain alphabet. This allows us to come up with a truly multilingual text detection algorithm.

Figure 2: Detected text in natural images.

Not every application of text detection requires a subsequent step of character recognition, but when such a step is required, a successful text segmentation step greatly improves recognition performance. Several previous detection algorithms [3, 18, 19] rely on classifying image regions and therefore do not produce the segmentation mask required for subsequent OCR; our method carries enough information for text segmentation, and so a good mask is readily available for the detected text.

Optical character recognition (OCR) is one of the most important fields in the pattern recognition world; it is able to recognize handwritten characters, poorly organized characters and machine printed characters.
Generally, a character recognition system consists of five major tasks (Fig. 1): pre-processing, segmentation, feature extraction, classification and recognition. Pre-processing includes thresholding and normalizing the size and aspect ratio of the images. Thresholding is applied to the images to remove noise and to separate foreground from background [1]. Image segmentation is the next critical step; it clusters the pixels of the image in order to provide information about its surface. Feature extraction, which comes after image segmentation, is applied in order to extract all the useful characteristics of the character features and reduce the number of errors in the character recognition process.

Researchers are interested in developing OCR systems due to numerous potential applications in business and industry, and the usage of OCR varies across application areas. Banking is one of the widely known areas where OCR is used for the automatic processing of cheques and other forms: the writing on a cheque can be scanned and recognized instantly to reduce wait times in banks. OCR systems enable form processing tools to extract and read the relevant information from paper-based forms. Medical professionals have to deal with a large volume of forms containing important information about patients; it is useful to keep up with all this information by putting it into a central database digitally, so that it can be accessed efficiently as required. Large-scale digitization projects need efficient OCR systems to convert millions of printed books and documents into digital archives. Digital archives provide searchable access to the content and easy backup facilities, and eliminate the need for physical storage of printed documents. OCR technology is also widely used in many other fields, such as mail sorting, education, finance and government or private offices. It automates the reading of addresses on letters and parcels for efficient mail disbursement. It facilitates the digital archiving of conference proceedings and journals to make them available for on-line access. Invoice imaging tools help many businesses keep track of financial records. In offices, it simplifies the collection of data from printed documents for analysis and further usage. In short, OCR technology has revolutionized the document management process in a wide range of industries by turning a scanned document image into a computer-readable text document.

Replication of the human reading process with the help of machines has been an old area of research in the fields of pattern recognition and machine learning [Cle65]. In spite of this early start, machine reading, or optical character recognition (OCR), of degraded text (broken or merged characters and the presence of noise) without human intervention remains an elusive goal. This thesis introduces a new segmentation-free OCR approach using a combination of artificial neural networks (ANNs) and hidden Markov models (HMMs) for degraded text recognition. In addition, it provides novel applications of ANNs and HMMs in document image analysis and recognition. The thesis also contributes to the field of cognitive psychology by presenting new psychophysical experiments to determine the impact of overall word shape and the importance of letter positions during the visual recognition of words.
OCR systems transform a two-dimensional image of text, which could contain machine-printed or handwritten text, ideally in any script, from its image representation into machine-readable text. OCR systems usually work in a pipeline, and there are several steps before the actual text recognition takes place [Bre08]. A typical OCR system comprises preprocessing, layout analysis, character recognition and language modeling.
Preprocessing normally includes binarization, noise removal, skew correction and, optionally, script and orientation detection. Layout analysis identifies text columns, text blocks, text lines and the reading order of the page. Character recognition is responsible for recognizing the text contained in each text line. Statistical language modeling improves the text recognition results by integrating prior knowledge about the language, vocabulary and domain of the document. Figure 1.1 shows a block diagram of a typical OCR system.

Figure 1.1: A block diagram of a typical OCR system, showing the intermediate processing steps of the OCR process.

With some exceptions, most OCR methods are based on segmentation-based character recognition approaches, in which words are segmented into isolated characters and the characters are recognized individually [DTJT96]. However, in the case of degraded and low-resolution text, character segmentation is problematic, and the performance of character segmentation significantly affects character recognition accuracy. Sophisticated character segmentation techniques have been proposed in the past by many researchers [CL96, SREB11], but achieving a good segmentation under various kinds of degradation is still a hard problem. The database has been developed from a diverse collection of document images from scientific, legal and technical backgrounds.
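As a concrete illustration of the binarization step mentioned above, here is a minimal NumPy sketch of Otsu's global threshold. The function names are ours for illustration; a production system would normally use a library implementation (e.g., OpenCV).

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image.

    Otsu's method picks the threshold that maximizes the
    between-class variance of the fore/background split.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    # Foreground (dark text) becomes True, background False.
    return gray < otsu_threshold(gray)
```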

The combination of HMMs and ANNs has also been used effectively in various text and speech recognition problems, and is generally referred to as the hybrid recognition approach [BM94]. The development of hybrid systems using HMMs and ANNs usually requires addressing two design issues. The first issue concerns the architectural aspects of the ANNs and HMMs, i.e., how the structure of the HMMs (discrete or continuous, character- or word-level HMMs) should be selected, and what kind of ANN paradigm (multilayer perceptrons, radial basis function or time delay neural networks, etc.) should be used. The second issue concerns how these two modules can best be trained together. In the past, many hybrid HMM/ANN systems have been presented using different kinds of HMM and ANN structures. In most of the hybrid systems, ANNs are used to augment the HMMs either as an approximation of the probability density function or as a neural vector quantizer [BKW+99, MAGD01, KKS00, ECBGMZM11]. Some other hybrid approaches use ANNs to obtain observation probabilities for the HMMs [SGH95, BLNB95, KA98]. These hybrid approaches either require combined training criteria for the ANNs and HMMs [ECBGMZM11], or they use complex neural network architectures like time delay neural networks (TDNNs) or space displacement neural networks (SDNNs) [BLNB95].

This thesis addresses the OCR problem from various perspectives. The main objective of this thesis is to provide segmentation-free character recognition of degraded document images. This is done by providing a novel and straightforward combination of artificial neural networks and hidden Markov models to recognize entire text lines without character segmentation. ANNs are best known for their discriminative learning capabilities and are good at static pattern classification, whereas HMMs are a powerful statistical tool for learning the dynamic, time-varying properties of a given input signal. By combining these two paradigms, the complementary properties of ANNs and HMMs are used with a view to improving character recognition accuracy in degraded text recognition.

The thesis introduces a simple training mechanism in which the ANNs and HMMs are trained separately on a common set of text line images. During recognition, the separately trained modules are combined to provide segmentation-free text recognition of a complete text line. The proposed method employs a line-scanning mechanism using multilayer perceptrons (MLPs) in order to extract discriminative features along with contextual information from the input text line [RSB12]. The features basically represent the posterior probabilities of each character class contained in the text line. These posterior-probability features are then used by the already trained HMMs to provide segmentation-free character recognition of a complete text line. Interestingly, the proposed approach also realizes, to some extent, several cognitive reading strategies, such as eye fixations, serial scanning of text lines, neural processing, activation of letter units and contextual analysis during OCR. A performance comparison of the proposed method with existing OCR systems and techniques resulted in a 30% reduction in error rate compared to Google's Tesseract OCR system [Smi13] and a 43% reduction in error compared to the OCRopus OCR system [Bre08], which are the best open source OCR systems available today.
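To make the division of labor concrete, here is a minimal sketch of the decoding idea under our own simplifying assumptions: a trained MLP is available as a callable returning per-class posteriors for a window of the line image, and a flat first-order transition matrix stands in for the character HMMs. This illustrates the general hybrid approach, not the thesis implementation.

```python
import numpy as np

def viterbi(log_post, log_trans, log_prior):
    """Most likely class sequence from per-frame log-posteriors
    (T x C) under a first-order transition model (C x C)."""
    T, C = log_post.shape
    delta = log_prior + log_post[0]
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # scores[i, j]: prev i -> cur j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_post[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def recognize_line(line_img, mlp_posteriors, log_trans, log_prior,
                   win=16, step=4):
    """Slide a window along the text line, collect MLP class
    posteriors per frame, then decode the frame sequence."""
    frames = [line_img[:, x:x + win]
              for x in range(0, line_img.shape[1] - win + 1, step)]
    post = np.stack([mlp_posteriors(f) for f in frames])   # (T, C)
    return viterbi(np.log(post + 1e-12), log_trans, log_prior)
```

Here `mlp_posteriors`, `win` and `step` are hypothetical names and values chosen for the sketch.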
Additionally, this thesis evaluates the use of HMM-based segmentation-free OCR approaches on low-resolution screen-rendered text and on degraded text line images. Recognition of low-resolution text is an interesting OCR problem due to its applications and the occurrence of low-resolution text in screen-shots, images and videos [RSB11]. Recognition of low-resolution text poses many challenges to traditional OCR methods due to touching characters and very small text size: for example, in low-resolution screen-rendered text, the x-height of most lowercase characters may be only four pixels and their width only two pixels. Due to the anti-aliased rendering process, recognition of low-resolution text shares the same character segmentation problem: the anti-aliased characters are mostly connected to each other, and it is difficult to segment them using common character segmentation methods. Therefore, segmentation-free text recognition approaches like HMMs are more suitable for this problem. In this work, HMM-based OCR approaches are applied to the recognition of isolated screen-rendered characters and screen-rendered text lines using simple pixel-based features. The performance of the proposed method on low-resolution text recognition is benchmarked against existing state-of-the-art OCR systems. Evaluation results show that the HMM-based approaches reach the performance of other systems in the recognition of low-resolution text. The proposed HMM-based approaches were also evaluated on degraded text line images, but there they are unable to give good character recognition accuracy.

In addition to OCR, the thesis introduces novel applications of ANNs in document image preprocessing. In this work, ANNs are used for the extraction of discriminative features from the raw pixel values of input images. The ANN-based discriminative feature analysis is applied to the orientation detection and script recognition problems [RSB10b, RBSB09]. Orientation detection and script recognition are two important preprocessing steps in OCR. Orientation detection detects and corrects the deviation of the document's orientation angle from the horizontal direction. Script recognition determines the written script on the page for the application of an appropriate character recognition algorithm; it is necessary when the OCR system does not have prior knowledge of the language on the page, or when the page is written in more than one script. These useful applications of ANNs in document image preprocessing provide the basis for the development of a combined ANN/HMM OCR technique, in which ANNs are used as a tool to obtain discriminative features from input text line images.

Another contribution of this thesis is an investigation into the cognitive reading process using psychophysical experiments. Human reading is a nearly analogous cognitive process to OCR that involves decoding printed symbols into meanings. Psychologists are interested in how readers extract visual information, in what writing is and how it relates to speech and meaning, and in whether a word is recognized through its constituent characters or through its holistic shape. Even more interestingly, how are humans able to read words even in permuted or jumbled forms? For many years, researchers in the fields of cognitive psychology, neurophysiology and linguistics have extensively studied these questions, and a large number of theories and reading models exist that explain different aspects of visual word recognition, or reading.

This thesis provides a hypothesis about the visual recognition of words and permuted non-words [RSB10c]. The hypothesis states that the reading of words and of permuted non-words are two distinct mental processes, and that humans use different strategies in handling permuted non-words as compared to normal words. The hypothesis is presented in the context of dual-route theories of word recognition, and it is observed that dual-route theory is consistent in explaining the hypothesis. The hypothesis is tested on three orthographically dissimilar languages, Urdu, German and English, by conducting psychophysical experiments in the visual recognition of words and permuted non-words. The outcomes of the experiments lead to many interesting insights into the reading process that can be used to provide the basis for the development of a robust OCR system able to recognize an almost innumerable variety of text.

LITERATURE SURVEY

The method proposed in this work (eGFTA) is based on considering some new features and discarding some others, in order to achieve better recognition in comparison with other available techniques. The most accurate classifier identified was the support vector machine, although it takes a longer processing time for a single character compared with a neural network classifier. Another feature extraction methodology, aimed at historical document recognition, proposed a technique for the character recognition of historical fonts [3]. This technique uses features such as the centre of mass, dividing the images into sub-images with sub-pixel accuracy, on various handwritten and machine-printed historical documents. An SVM hierarchical classification method with higher granularity is applied to the extracted features in order to avoid confusion between similar characters, especially in the Greek alphabet. The experimental results on two separate databases of old Christian documents from Greece show high accuracy on both the handwritten and the typewritten databases, 94.51% and 97.71% respectively. The experimental results on modern characters indicate better accuracy in comparison with four other feature extraction techniques.

Statistical and signal processing methods are the most used texture analysis methods because they are easy to implement and reliable for different types of textures and applications [4]. Signal processing methods convert or filter the image into a new-format image, and description values are then extracted directly from the new-format image. These methods extract features oriented to machine vision, and thus the features are not understandable by human vision [5]. Also, when document images are converted from one form to another, noise affects the new form, and physical properties such as pixel values and relationships are neglected. Moreover, some disadvantages appear depending on the method used. Ding et al. claimed that the relationship between the accuracy rate of the Gabor filter and the number of classes is inversely proportional: the accuracy rate decreases with increasing class number. Furthermore, Tuceryan and Jain [4] and Petrou [6] noted that signal processing methods consume long processing times.
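For illustration, a small sketch of the signal-processing style of feature extraction discussed above: a bank of Gabor filters at a few frequencies and orientations, with the mean response energy used as a texture feature. The parameter choices here are arbitrary examples, not values from the cited works.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(freq, theta, sigma=4.0, size=21):
    """Real Gabor kernel: a sinusoid at the given spatial frequency
    and orientation, windowed by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr)

def gabor_features(img, freqs=(0.1, 0.2), n_orient=4):
    """Mean filter-response energy per (frequency, orientation) pair."""
    feats = []
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, theta=k * np.pi / n_orient)
            resp = convolve2d(img.astype(float), kern, mode='same')
            feats.append(np.mean(np.abs(resp)))
    return np.array(feats)
```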
The Gray Level Co-occurrence Matrix (GLCM) is a global feature extraction method proposed by Haralick et al. [7], and it has been applied to many texture feature extraction tasks. GLCM is based on a matrix that shows the distribution of co-occurring gray values in the selected image: denoting the GLCM by C and an n × m image by I, the entry C(i, j) counts the pixel pairs (x, y) whose gray level values are i and j. The final result is therefore a matrix that describes the occurrence of every pair of gray-scale values. The number of features derived from this matrix is 36.

One of the techniques that has been applied in this field, named EDMS, is a feature extraction method for optical font recognition of Arabic calligraphy script images, proposed by Bilal Bataineh et al. [8]. The results of the proposed method were compared with the Gray Level Co-occurrence Matrix (GLCM) method.

The proposed technique is based on generating two matrices, named EDM1 and EDM2. In the first matrix, each cell corresponds to a position based on the pixel neighbourhood association, the positions ranging from 0 to 315 degrees. By calculating the occurrences of EDM1 values, the relationships between pixel values can be determined, keeping in mind that in an edge image each pixel is related to two pixels. The second matrix, which is 3 × 3, is considered an edge direction matrix and contains the relationship representation of each pixel. By measuring the occurrence of each value in EDM2, the most important pixel relationships are identified.

Using these two matrices, various features can be extracted, such as Homogeneity, Contrast, Angular Second Moment (ASM), Entropy, Energy and Correlation at the 0°, 45°, 90° and 135° angles, giving 28 features in total. Three different classifiers were used for the purpose of classification.

According to the experimental results, EDMS performed better than the GLCM method and reached a considerable performance of 97.85% when applying a decision tree as the classifier. Since the EDMS method applied to Arabic calligraphy script images shows a considerable improvement compared to GLCM, it was chosen for the purpose of this comparative study of feature extraction methods.
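A minimal sketch of the GLCM construction described above, together with a few of the Haralick-style statistics (ASM, contrast, homogeneity, entropy). The offset convention is our own: the four standard angles 0°, 45°, 90° and 135° correspond to offsets such as (1, 0), (1, −1), (0, −1) and (−1, −1).

```python
import numpy as np

def glcm(img, dx, dy, levels=256):
    """C[i, j] counts how often a pixel with gray level i has a
    neighbor at offset (dx, dy) with gray level j."""
    h, w = img.shape
    C = np.zeros((levels, levels), dtype=np.int64)
    ys, xs = np.mgrid[0:h, 0:w]
    yt, xt = ys + dy, xs + dx
    ok = (yt >= 0) & (yt < h) & (xt >= 0) & (xt < w)
    np.add.at(C, (img[ys[ok], xs[ok]], img[yt[ok], xt[ok]]), 1)
    return C

def glcm_features(C):
    """A few Haralick-style statistics from a normalized GLCM."""
    P = C / C.sum()
    i, j = np.indices(P.shape)
    return {
        "ASM": (P ** 2).sum(),                       # Angular Second Moment
        "contrast": ((i - j) ** 2 * P).sum(),
        "homogeneity": (P / (1.0 + np.abs(i - j))).sum(),
        "entropy": -(P[P > 0] * np.log2(P[P > 0])).sum(),
    }
```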
Recently, data embedding in images has drawn tremendous interest, using either lossy or lossless techniques. Although lossy techniques can allow a large hiding capacity, the host image cannot be recovered with high fidelity. Some applications require exact recovery of the host image; in medicine, for example, patient data can be embedded without affecting the medical image. In general, lossless data hiding techniques suffer from limited capacity, as the host image must be kept intact. In this paper a lossless embedding technique is proposed, in which image histograms are analyzed to identify the embedding capacity of different image types. Histogram maxima and minima are used in the embedding capacity estimation. The proposed technique gives a hiding capacity that can reach up to 50% of the host image size for images with large homochromatic regions (cartoon-like). In fact, our study showed that the embedding capacity is affected not only by the host image size but also by its histogram distribution. The data embedding and extraction are performed using simple processing operations that can save on power consumption in wireless devices.

In this study a lossless data embedding technique for 256-color palletized images has been proposed. The embedding capacity is based on the image histogram and the number of unused colors. The stego image quality is not affected, as the color values remain the same and only the used indices are changed. Histogram analysis is performed in order to understand the capacity potential of different image types, and the unused colors in the palette are used to optimize the embedding capacity. Capacities of more than 30% and 50% of the host image size have been obtained for type-3 and type-1 images respectively. The extraction of the embedded data does not require the original host image at the decoder side; only the stego image is required. It should be noted that the C-array can be extracted from the palette of the stego image. The proposed technique can be applied in non-secure applications, such as embedding personal data in medical images, or in copyright applications.

In this paper, we have presented a simple and efficient reversible data-embedding method for digital images. We explored the redundancy in the digital content to achieve reversibility. Both the payload capacity limit and the visual quality of the embedded images are among the best in the literature.

Another work implements a prediction-based reversible steganographic scheme based on image inpainting. In this scheme, the reference pixels are adaptively selected according to the distribution characteristics of the image content. Image inpainting based on partial differential equations is used to complete the prediction process from the reference pixels, and the histogram of the prediction error is shifted to embed the secret bits reversibly. During the extraction procedure, the same reference pixels can be exploited to conduct the prediction, which guarantees the lossless recovery of the cover image. (Keywords: reversible steganography, image inpainting, prediction-error histogram.)

According to the chosen reference pixels, the PDE-based inpainting algorithm using the fast numerical model can generate the prediction image. Using the adaptive strategy for choosing the reference pixels and the inpainting predictor, the accuracy of the prediction was high, and larger numbers of embeddable pixels were acquired. Thus, the embedded secret data can be extracted from the stego image correctly.
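The histogram maxima/minima mechanism above can be sketched as follows. This is a simplified single-peak version under our own assumptions: an empty (zero-count) bin exists above the peak, and the message length is transmitted separately.

```python
import numpy as np

def hs_embed(img, bits):
    """Reversible histogram-shifting embed (simplified sketch).

    p: histogram peak; z: first empty bin above p. Values in (p, z)
    shift up by 1 to free bin p+1; each peak pixel then carries one
    bit: stay at p for 0, move to p+1 for 1.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    p = int(hist.argmax())
    z = p + 1 + int(np.flatnonzero(hist[p + 1:] == 0)[0])
    out = img.astype(np.int32)
    out[(out > p) & (out < z)] += 1                # free the bin p+1
    flat = out.ravel()
    sites = np.flatnonzero(img.ravel() == p)       # capacity = len(sites)
    n = min(len(bits), len(sites))
    flat[sites[:n]] += np.asarray(bits[:n])        # 0 stays at p, 1 -> p+1
    return out.astype(np.uint8), p, z

def hs_extract(stego, p, z, n):
    """Read n bits back and undo the shift to restore the image."""
    flat = stego.astype(np.int32).ravel()
    sites = np.flatnonzero((flat == p) | (flat == p + 1))[:n]
    bits = (flat[sites] == p + 1).astype(int)
    out = stego.astype(np.int32)
    out[(out > p) & (out <= z)] -= 1               # p+1 -> p, shifted -> back
    return bits, out.astype(np.uint8)
```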
A novel lossless (reversible) data embedding (hiding) technique is presented. The technique provides high embedding capacities, allows complete recovery of the original host signal, and introduces only a small distortion between the host and the image bearing the embedded data. The capacity of the scheme depends on the statistics of the host image. For typical images, the scheme offers adequate capacity to address most applications. In applications requiring higher capacities, the scheme can be modified to adjust the embedding parameters to meet the capacity requirements, thus trading off intermediate distortion for increased capacity. In such scenarios, the G-LSB embedding proposed in the current paper is significantly advantaged over conventional LSB embedding techniques because it offers finer-grain scalability along the capacity-distortion curve. The performance of the algorithm, and of its extensions, is rigorously tested on representative images and compared with earlier methods. The proposed algorithm is shown to outperform bit-plane compression and RS embedding methods, especially in the moderate- to high-distortion regions.

This paper proposes a pixel prediction method for PEE-based reversible data hiding schemes based on the minimum rate criterion, which establishes the consistency in essence between the two steps of PEE-based reversible data hiding schemes. Previous PEE methods treat the two steps independently: they either focus on pixel prediction to obtain a sharp PE histogram, or aim at histogram modification to enhance the embedding performance for a given PE histogram. Correspondingly, a novel optimized histogram modification scheme is presented to achieve the optimal embedding performance on the generated PE sequence. Experiments demonstrate that the proposed method significantly outperforms the previous state-of-the-art counterparts in terms of both the prediction accuracy and the final embedding performance. Some theoretical analysis and proofs may be covered in our future work.

In reversible data hiding techniques, the values of the host data are modified according to some particular rules, and the original host content can be perfectly restored after extraction of the hidden data on the receiver side. In this paper, the optimal rule of value modification under a payload-distortion criterion is found by using an iterative procedure, and a practical reversible data hiding scheme is proposed. The secret data, as well as the auxiliary information used for content recovery, are carried by the differences between the original pixel values and the corresponding values estimated from the neighbors. Here, the estimation errors are modified according to the optimal value transfer rule. Also, the host image is divided into a number of pixel subsets, and the auxiliary information of a subset is always embedded into the estimation errors of the next subset. A receiver can successfully extract the embedded secret data and recover the original content in the subsets in inverse order; this way, a good reversible data hiding performance is achieved.

The payload-distortion performance of the proposed scheme is excellent: for smooth host images, it significantly outperforms the previous reversible data hiding methods. The optimal transfer mechanism proposed in this work is independent of the generation of the available cover values; in other words, it gives a new rule of value modification and can be used on various cover values. If a smarter prediction method is exploited to make the estimation errors closer to zero, a better performance can be achieved, but the computational complexity due to the prediction will be higher. The combination of the optimal transfer mechanism with other kinds of available cover data deserves further investigation in the future.

A scheme is proposed to implement commutative video encryption and watermarking during the advanced video coding process. In H.264/AVC compression, the intra-prediction mode, the motion vector difference and the signs of the discrete cosine transform (DCT) coefficients are encrypted, while the DCT coefficients' amplitudes are watermarked adaptively. To avoid the watermarking operation affecting the decryption operation, a traditional watermarking algorithm is modified. The encryption and watermarking operations are commutative: the watermark can be extracted from the encrypted videos, and the encrypted videos can be re-watermarked. This scheme embeds the watermark without exposing the confidentiality of the video content, and provides a solution for signal processing in the encrypted domain. Additionally, it increases the operational efficiency, since the encrypted video can be watermarked without decryption. These properties make the scheme a good choice for secure media transmission or distribution.

In this paper, a commutative video watermarking and encryption scheme based on the H.264/AVC codec is presented. The video is watermarked and/or encrypted during the H.264/AVC compression process. The modified watermarking algorithm makes the watermarking and encryption operations commutative. The scheme remains secure against present attacks, is efficient in implementation, remains imperceptible, and is robust against recompression to some extent. These properties make the scheme a choice for secure video transmission or distribution. In future work, the encryption algorithm's security against newer attacks, such as the replacement attack or the Said attack, will be evaluated, and some means will be taken to improve the watermark's robustness against attacks such as collusion or desynchronization.

In this paper a commutative watermarking and ciphering scheme for digital images is presented. The commutative property of the proposed method allows one to cipher a watermarked image without interfering with the embedded signal, or to watermark an encrypted image while still allowing perfect deciphering. Both operations are performed in a parametric transform domain: the Tree Structured Haar transform. The key dependence of the adopted transform domain increases the security of the overall system; in fact, without knowledge of the generating key it is not possible to extract any useful information from the ciphered-watermarked image. Experimental results show the effectiveness of the proposed scheme.

In this work we have presented a commutative watermarking and encryption system, based on a layered scheme and on a key-dependent transform domain. Although the proposed method presents some similarities with [12, 13], there are important differences from the cited works. First of all, here we proposed to modify the same wavelet coefficients with both the AES encryption and the watermarking, achieving the commutative property in a different way. Moreover, the key-dependent Tree Structured Haar transform domain improves the overall security of the system. The proposed method grants the authenticity of the transmitted data, thanks to the watermarking technique, and privacy, obtained through the encryption procedure. The security system is extremely flexible, since the decryption and the watermark extraction can be performed simultaneously or in different stages. Experimental tests have shown the effectiveness of the proposed method.
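The commutativity that these schemes rely on is easy to see when encryption and watermarking touch disjoint parts of each pixel. The following is a toy sketch of ours, not any of the cited schemes: XOR stream encryption on the upper seven bit planes and a watermark written into the LSB plane commute, and the watermark remains readable after decryption.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)

key = rng.integers(0, 256, img.shape, dtype=np.uint8) & 0xFE  # LSB left alone
mark = rng.integers(0, 2, img.shape, dtype=np.uint8)          # 1-bit watermark

encrypt = lambda x: x ^ key                # stream cipher on upper 7 bits
watermark = lambda x: (x & 0xFE) | mark    # overwrite the LSB plane

a = watermark(encrypt(img))
b = encrypt(watermark(img))
assert np.array_equal(a, b)                # the two operations commute
assert np.array_equal(encrypt(a) & 1, mark)  # watermark survives decryption
```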
This work proposes a novel scheme of commutative reversible data hiding and encryption. In the encryption part, the gray values of two neighboring pixels are masked by the same pseudo-random bits. In the data-hiding part, the additional data are embedded into various bit planes in a reversible manner, and a parameter optimization method based on a capacity-distortion criterion is used to ensure good performance. Because the data space used to accommodate the additional data is not affected by the encryption operation, data embedded in the plain (or encrypted) domain can be extracted from the encrypted (or plain) domain, and the way of inserting and extracting additional data in the plain domain is the same as in the encrypted domain. Furthermore, the original image can be recovered without any error from an image containing additional data.

In this work, a novel commutative reversible data hiding and encryption scheme is proposed, which consists of image encryption, data embedding, and data-extraction/image-recovery parts. In the encryption part, the gray values of two neighboring pixels belonging to the same block are masked by the same pseudo-random bits. With an original image or an encrypted version, a data hider can generate a data space that is not affected by encryption and embed additional data into various bit planes of this invariant space in a reversible manner. Also, a capacity-distortion criterion is employed to find the optimal parameter values and ensure good performance. Using this scheme, data embedded in the plain (or encrypted) domain can be extracted from the encrypted (or plain) domain, the way of inserting and extracting additional data is the same in both domains, and the original image can be perfectly recovered from an image containing additional data.

This work proposes a novel reversible data hiding scheme for encrypted images. After encrypting the entire data of an uncompressed image with a stream cipher, additional data can be embedded into the image by modifying a small proportion of the encrypted data. With an encrypted image containing additional data, one may first decrypt it using the encryption key, and the decrypted version is similar to the original image. According to the data-hiding key, with the aid of the spatial correlation in natural images, the embedded data can be successfully extracted and the original image perfectly recovered.

In this work, a novel reversible data hiding scheme for encrypted images with low computational complexity is proposed, consisting of image encryption, data embedding and data-extraction/image-recovery phases. The data of the original image are entirely encrypted by a stream cipher. Although a data hider does not know the original content, he can embed additional data into the encrypted image by modifying a part of the encrypted data. With an encrypted image containing embedded data, a receiver may first decrypt it using the encryption key, and the decrypted version is similar to the original image. According to the data-hiding key, with the aid of the spatial correlation in natural images, the embedded data can be correctly extracted and the original image perfectly recovered. Although someone with knowledge of the encryption key can obtain a decrypted image and detect the presence of hidden data using LSB-steganalytic methods, without the data-hiding key it is still impossible to extract the additional data and recover the original image. To ensure correct data extraction and perfect image recovery, we may let the block side length be a large value, such as 32, or introduce an error correction mechanism before data hiding to protect the additional data, at a cost in payload.

This letter proposes an improved version of Zhang's reversible data hiding method in encrypted images. The original work partitions an encrypted image into blocks, and each block carries one bit by flipping three LSBs of a set of pre-defined pixels. The data extraction and image recovery are achieved by examining the block smoothness. Zhang's work did not fully exploit the pixels in calculating the smoothness of each block and did not consider the pixel correlations at the borders of neighboring blocks; these two issues can reduce the correctness of data extraction. This letter adopts a better scheme for measuring the smoothness of blocks, and uses a side-match scheme to further decrease the error rate of the extracted bits. The experimental results reveal that the proposed method offers better performance than Zhang's work: for example, when the block size is set to 8×8, the error rate on the Lena image is 0.34% for the proposed method, significantly lower than the 1.21% of Zhang's work.

This letter proposes improved data extraction and image recovery strategies based on Zhang's work. We used a new algorithm to better estimate the smoothness of image blocks. The extraction and recovery of blocks are performed according to the descending order of the absolute smoothness difference between the two candidate blocks. The side-match technique is employed to further reduce the error rate. The experimental results show that the proposed method effectively improves Zhang's method, especially when the block size is small.

This work proposes lossless, reversible, and combined data hiding schemes for ciphertext images encrypted by public-key cryptosystems with probabilistic and homomorphic properties. In the lossless scheme, the ciphertext pixels are replaced with new values to embed the additional data into several LSB-planes of the ciphertext pixels by multi-layer wet paper coding. The embedded data can then be extracted directly from the encrypted domain, and the data embedding operation does not affect the decryption of the original plaintext image. In the reversible scheme, a preprocessing step is used to shrink the image histogram before image encryption, so that the modification of the encrypted image for data embedding cannot cause any pixel oversaturation in the plaintext domain. Although a slight distortion is introduced, the embedded data can be extracted and the original image recovered from the directly decrypted image. Owing to the compatibility between the lossless and reversible schemes, the data embedding operations of the two manners can be performed simultaneously in one encrypted image. With the combined technique, a receiver may extract one part of the embedded data before decryption, and extract the other part of the embedded data and recover the original plaintext image after decryption.
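A sketch of the block-smoothness test at the heart of the Zhang-style extraction discussed above. The fluctuation measure and the three-LSB flip follow the description; the two pixel-set masks are assumed to come from the data-hiding key.

```python
import numpy as np

def fluctuation(block):
    """Smoothness of a decrypted block: sum of absolute differences
    between interior pixels and their 4-neighbors. The correct bit
    hypothesis yields the smoother block."""
    b = block.astype(np.int32)
    c = b[1:-1, 1:-1]
    return (np.abs(c - b[:-2, 1:-1]) + np.abs(c - b[2:, 1:-1]) +
            np.abs(c - b[1:-1, :-2]) + np.abs(c - b[1:-1, 2:])).sum()

def extract_bit(block, set0, set1):
    """Try both hypotheses: un-flip the three LSBs of pixel set S0
    (bit 0) or of set S1 (bit 1); keep whichever looks smoother."""
    h0, h1 = block.copy(), block.copy()
    h0[set0] ^= 0b00000111          # undo the flip assumed for bit 0
    h1[set1] ^= 0b00000111          # undo the flip assumed for bit 1
    return (0, h0) if fluctuation(h0) <= fluctuation(h1) else (1, h1)
```

The side-match refinement then extends `fluctuation` across block borders so that neighboring, already-recovered blocks also vote on the hypothesis.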
This work proposes lossless, reversible, and combined data hiding schemes for ciphertext images produced by public-key cryptography with probabilistic and homomorphic properties. In the lossless scheme, the ciphertext pixel values are replaced with new values to place the additional data in the LSB-planes of the ciphertext pixels. The embedded data can thus be extracted directly from the encrypted domain, and the data embedding operation does not influence the decryption of the original plaintext image. In the reversible scheme, a histogram-shrinking preprocessing step is performed before encryption, and half of the ciphertext pixel values are modified for data embedding. On the receiver side, the additional data can be extracted from the plaintext domain and, although a slight distortion is present in the decrypted image, the original plaintext image can be recovered without error. Owing to the compatibility of the two schemes, the data embedding operations of the lossless and the reversible schemes can both be performed in one encrypted image; in this way, the receiver can extract one part of the embedded data in the encrypted domain, and extract the other part and recover the original plaintext image in the plaintext domain.

For several years, the protection of multimedia data has been growing in importance. The protection of this multimedia data can be done with encryption or data hiding algorithms, and to decrease the transmission time, data compression is necessary. A more recent problem is to combine compression, encryption and data hiding in a single step; so far, few solutions have been proposed to combine image encryption and compression, for example. Nowadays, a new challenge consists in embedding data in encrypted images. Since the entropy of an encrypted image is maximal, the embedding step, which behaves like noise, is not possible using standard data hiding algorithms. A new idea is to apply reversible data hiding algorithms to encrypted images so as to remove the embedded data before the image decryption. Recent reversible data hiding methods have been proposed with high capacity, but these methods are not applicable to encrypted images. In this paper we propose an analysis of the local standard deviation of the marked encrypted images in order to remove the embedded data during the decryption step. We have applied our method to various images, and we show and analyze the obtained results.

In conclusion, with our proposed reversible data hiding method for encrypted images we are able to embed data in encrypted images and then to decrypt the image and rebuild the original image by removing the hidden data. In this paper, we detailed all the steps of the proposed method and illustrated the method with schemes. We presented and analyzed various results by showing the plots of the local standard deviations. In the proposed method, the embedding factor is 1 bit per 16 pixels; this small value of the embedding factor is due only to having to choose between two values for each block during the decryption. In the future, we plan to improve this method by increasing the payload, at the price of increased complexity.

This work proposes a novel scheme for separable reversible data hiding in encrypted images. In the first phase, a content owner encrypts the original uncompressed image using an encryption key. Then, a data hider may compress the least significant bits of the encrypted image using a data-hiding key to create a sparse space to accommodate some additional data. With an encrypted image containing additional data, a receiver who has the data-hiding key can extract the additional data even though he does not know the image content. If the receiver has the encryption key, he can decrypt the received data to obtain an image similar to the original one, but cannot extract the additional data. If the receiver has both the data-hiding key and the encryption key, he can extract the additional data and recover the original content without any error, by exploiting the spatial correlation in natural images, when the amount of additional data is not too large.

In this paper, a novel scheme for separable reversible data hiding in encrypted images is proposed, consisting of image encryption, data embedding and data-extraction/image-recovery phases. In the first phase, the content owner encrypts the original uncompressed image using an encryption key. Although a data hider does not know the original content, he can compress the least significant bits of the encrypted image using a data-hiding key to create a sparse space to accommodate the additional data. With an encrypted image containing additional data, the receiver may extract the additional data using only the data-hiding key, or obtain an image similar to the original one using only the encryption key. When the receiver has both keys, he can extract the additional data and recover the original content without any error, by exploiting the spatial correlation in natural images, if the amount of additional data is not too large. If the lossless compression method in [1] or [2] is used on the encrypted image containing embedded data, the additional data can still be extracted and the original content recovered, since the lossless compression does not change the content of the encrypted image containing embedded data. However, the lossy compression method in [3], compatible with encrypted images generated by pixel permutation, is not suitable here, since the encryption is performed by a bit-XOR operation. In the future, a comprehensive combination of image encryption and data hiding compatible with lossy compression deserves further investigation.

This correspondence proposes a framework of reversible data hiding (RDH) in an encrypted JPEG bit stream. Unlike existing RDH methods for encrypted spatial-domain images, the proposed method aims at encrypting a JPEG bit stream into a properly organized structure, and embedding a secret message into the encrypted bit stream by slightly modifying the JPEG stream. We identify usable bits suitable for data hiding so that the encrypted bit stream carrying secret data can be correctly decoded. The secret message bits are encoded with error correction codes to achieve perfect data extraction and image recovery. The encryption and embedding are controlled by an encryption key and an embedding key respectively. If a receiver has both keys, the secret bits can be extracted by analyzing the blocking artifacts of neighboring blocks, and the original bit stream perfectly recovered. If the receiver only has the encryption key, he or she can still decode the bit stream to obtain an image of good quality without extracting the hidden data.

In this correspondence, we propose an RDH framework for encrypted JPEG bit streams. The original JPEG bit stream is properly encrypted to hide the image content, with the bit stream structure preserved. The secret message bits are encoded with ECC and embedded into the encrypted bit stream by modifying the appended bits corresponding to the AC coefficients. By using the encryption and embedding keys, the receiver can extract the embedded data and perfectly restore the original image. When the embedding key is absent, the original image can be approximately recovered with satisfactory quality without extracting the hidden data.
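Most of the spatial-domain schemes above rest on one property of stream-cipher (bit-XOR) encryption: it is bit-local, so a bit flipped in the encrypted domain reappears as exactly the same flipped bit after decryption, where the spatial correlation of natural images can reveal it. A toy demonstration of our own:

```python
import numpy as np

def stream_encrypt(img, seed):
    """Bit-XOR cipher with a keyed keystream: involutive and bit-local."""
    ks = np.random.default_rng(seed).integers(0, 256, img.shape,
                                              dtype=np.uint8)
    return img ^ ks

rng = np.random.default_rng(7)
img = rng.integers(0, 256, (8, 8), dtype=np.uint8)

enc = stream_encrypt(img, seed=42)
enc_marked = enc.copy()
enc_marked[0, 0] ^= 1            # data hider flips one LSB, encrypted domain

dec = stream_encrypt(enc_marked, seed=42)   # receiver decrypts
diff = dec ^ img
assert diff[0, 0] == 1 and diff.sum() == 1  # exactly that LSB changed
```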

TEXT DETECTION ALGORITHM

In this section we describe the algorithm. We first define the notion of the Stroke Width Transform (3.1), then describe the mechanism for grouping pixels into letter candidates (3.2), and finally describe the mechanism for grouping letters into the bigger constructs of words and lines and the line filtering (3.3). The flowchart of the algorithm is shown in Figure 3.2.

Figure 3.1: Implementation of the SWT. (a) A typical stroke; the pixels of the stroke in this example are darker than the background pixels. (b) p is a pixel on the boundary of the stroke; searching in the direction of the gradient at p, the corresponding pixel on the other side of the stroke is found. (c) Each pixel along the ray is assigned the found width of the stroke unless it already has a lower value.

3.1. The Stroke Width Transform

The Stroke Width Transform (SWT) is a local image operator which computes, for each pixel, the width of the most likely stroke containing that pixel. The output of the SWT is an image of size equal to that of the input image, where each element contains the stroke width associated with the pixel. We define a stroke to be a contiguous part of an image that forms a band of nearly constant width, as depicted in Figure 3.1(a). We do not assume to know the actual width of the stroke; rather, we recover it. The initial value of each element of the SWT is set to ∞. In order to recover strokes, we first compute edges in the image using the Canny edge detector and consider the gradient direction dp of each edge pixel p (Fig. 3.1b). If p lies on a stroke boundary, dp is roughly perpendicular to the orientation of the stroke, so we continue a step forward along the ray p + n·dp (n > 0) until another edge pixel is found. If the gradient direction at that pixel is roughly opposite to dp, each pixel along the ray is assigned the found width of the stroke unless it already has a lower value.

Figure 3.2: The flowchart of the algorithm.

3.2. Finding letter candidates

Neighboring pixels with similar stroke width are merged into connected components, which form the letter candidates. Several learned filtering rules then remove unlikely candidates. We eliminate components whose stroke width variance is too large; as Figure 1(c) suggests, this test suffices to reject elements such as foliage, which appears in both city and rural images and is otherwise hard to distinguish from text. We also constrain the ratio between the diameter of a component and its median stroke width, eliminate components whose bounding box is too small or too large (following our training set, we limit the height to be between 10 and 300 pixels), and eliminate components whose bounding box contains too many other components (this rejects frames). Limiting the height rather than the width enables us to detect connected writing, such as handwriting and Arabic fonts, where the small letters in a word may be joined, and makes the tests robust to imperfections of the edge map. The thresholds of these tests were learned on the training set of [8] by optimizing performance, representing the annotated letters by components obtained using the Otsu algorithm [20], so that 99% of the letters in the training set are detected. The remaining components are the letter candidates; we next describe how these are grouped into lines of text.
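The following is a condensed sketch of the SWT pass described in Section 3.1, assuming a precomputed Canny edge map and image gradients (e.g., from OpenCV or SciPy). The paper's second pass (median filtering along rays) and the dark-on-light/light-on-dark duality are omitted for brevity.

```python
import numpy as np

def swt(edges, gx, gy, max_width=100):
    """Stroke Width Transform (simplified single-direction pass).

    edges : boolean Canny edge map; gx, gy : image gradients.
    From each edge pixel p, walk along its gradient direction until
    another edge pixel q is met; if q's gradient roughly opposes p's,
    stamp the ray with the stroke width ||p - q|| (keeping minima).
    """
    h, w = edges.shape
    out = np.full((h, w), np.inf)
    norm = np.hypot(gx, gy) + 1e-9
    dx, dy = gx / norm, gy / norm
    for y0, x0 in zip(*np.nonzero(edges)):
        ray = [(y0, x0)]
        for n in range(1, max_width):
            x = int(round(x0 + dx[y0, x0] * n))
            y = int(round(y0 + dy[y0, x0] * n))
            if not (0 <= x < w and 0 <= y < h):
                break
            ray.append((y, x))
            if edges[y, x]:
                # Opposite gradient within ~pi/6 marks a stroke crossing.
                dot = dx[y0, x0] * dx[y, x] + dy[y0, x0] * dy[y, x]
                if dot < -np.cos(np.pi / 6):
                    width = np.hypot(x - x0, y - y0)
                    for yy, xx in ray:
                        out[yy, xx] = min(out[yy, xx], width)
                break
    return out
```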
3.3. Grouping letters into text lines

Finding such groups is a significant filtering mechanism, as single letters do not usually appear in images, and this reasoning allows us to remove randomly scattered noise. An important cue for text is that it appears in a linear form. Text on a line is expected to have similarities, including similar stroke width, letter width, height, and spaces between the letters and words. Including this reasoning proves to be both straightforward and valuable. For example, a lamp post next to a car wheel would not be mistaken for the combination of letters "O" and "I", as the post is much higher than the wheel. We consider each pair of letter candidates for the possibility of belonging to the same text line.

Two letter candidates should have similar stroke width (the ratio between their median stroke widths has to be less than 2.0). The height ratio of the letters must not exceed 2.0 (due to the difference between capital and lower case letters). The distance between letters must not exceed three times the width of the wider one. Additionally, the average colors of candidates for pairing are compared, as letters in the same word are typically expected to be written in the same color. All parameters were learned by optimizing performance on the training set, as described in Section 3.2. At the next step of the algorithm, the candidate pairs determined above are clustered together into chains. Initially, each chain consists of a single pair of letter candidates. Two chains can be merged together if they share one end and have similar direction. The process ends when no chains can be merged.

Each produced chain of sufficient length (at least 3 letters in our experiments) is considered to be a text line. Finally, text lines are broken into separate words, using a heuristic that computes a histogram of horizontal distances between consecutive letters and estimates the distance threshold that separates intra-word letter distances from inter-word letter distances. While the problem in general does not require this step, we do it in order to compare our results with the ones on the ICDAR 2003 database [8]; for our own database [26] we do not employ this step, as we have marked whole text lines.

Experiments

In order to provide a baseline comparison, we ran our algorithm on the publicly available dataset in [24]. It was used in the two most recent text detection competitions: ICDAR 2003 [8] and ICDAR 2005 [9]. Although several text detection works have been published after the competitions, no one has claimed to achieve better results on this database; moreover, the ICDAR dataset remains the most widely used benchmark for text detection in natural scenes. Many other works remain impossible to compare to, due to the unavailability of their custom datasets. The ICDAR dataset contains 258 images in the training set and 251 images in the test set. The images are full-color and vary in size from 307×93 to 1280×960 pixels. Algorithms are compared with respect to the f-measure, which is in itself a combination of two measures, precision and recall. We follow [8] and describe these here for completeness' sake.

Figure 4.1: Text detection results on several images from the ICDAR test set. Notice the low number of false positives.

The output of each algorithm is a set of rectangles designating bounding boxes for detected words. This set is called the estimate (see Fig. 6). A set of ground truth boxes, called the targets, is provided in the dataset. The match m_p between two rectangles is defined as the area of intersection divided by the area of the minimum bounding box containing both rectangles. This number has the value one for identical rectangles and zero for rectangles that have no intersection. For each estimated rectangle, the closest match was found in the set of targets, and vice versa. Hence, the best match m(r; R) for a rectangle r in a set of rectangles R is defined by

m(r; R) = max { m_p(r, r') : r' in R }.

Table 1: Performance comparison of text detection algorithms. For more details on the ICDAR 2003 and ICDAR 2005 text detection competitions, as well as the participating algorithms, see [8] and [9]. *The algorithm is not published.

Then the definitions of precision and recall are

precision = ( sum over r_e in E of m(r_e; T) ) / |E|,   recall = ( sum over r_t in T of m(r_t; E) ) / |T|,

where T and E are the sets of ground-truth and estimated rectangles, respectively. The standard f-measure was used to combine the precision and recall figures into a single measure of quality. The relative weights of these are controlled by a parameter alpha, which we set to 0.5 to give equal weight to precision and recall:

f = 1 / ( alpha / precision + (1 - alpha) / recall ).

The comparison between the precision, recall and f-measure of the different algorithms tested on the ICDAR database is shown in Table 1. In order to determine the importance of the stroke width information (Section 3.1) and of the geometric filtering (Section 3.2), we additionally ran the algorithm on the test set in two more configurations: configuration #1 had all stroke width values less than ∞ set to 5 (changing this constant did not affect the results significantly), and configuration #2 had the geometric filtering turned off. In both cases, precision and recall dropped (p = 0.66, r = 0.55 in configuration #1; p = 0.65, r = 0.5 in configuration #2). This shows the importance of the information provided by the SWT. In Figure 7 we show typical cases where text was not detected; these are due to strong highlights, transparency of the text, sizes that are out of bounds, excessive blur, and curved baselines.

In order to compare our results with [7], we implemented the comparison measures proposed there. Our algorithm's performance is as follows: the Word Recall rate is 79.04% and the Stroke Precision is 79.59% (since our definition of a stroke is different from that of [7], we counted connected components inside and outside the ground truth rectangles). Additionally, we counted Pixel Precision, the number of pixels inside ground truth rectangles divided by the total number of detected pixels; this ratio is 90.39%, which outperforms the results shown in [7]. In addition to providing results on the ICDAR database, we propose a new benchmark database for text detection in natural images [26]. The database, which will be made freely downloadable from our website, consists of 307 color images of sizes ranging from 1024×1360 to 1024×768. This database is much harder than ICDAR, due to the presence of vegetation and of repeating patterns, such as windows, that are virtually indistinguishable from text without OCR. Our algorithm's performance on this database is as follows: precision 0.54, recall 0.42, f-measure 0.47. Again, in measuring these values we followed the methodology described in [8]. Since one of the byproducts of our algorithm is a letter mask, this mask can be used as a text segmentation mask. In order to evaluate the usability of the text segmentation produced by our algorithm, we presented an off-the-shelf OCR package with several natural images containing text and, additionally, with the binarized images representing the text-background segmentation. The results of the OCR in both cases are shown in Figure 11.
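The match and f-measure definitions above translate directly into code; rectangles are assumed here to be (x0, y0, x1, y1) tuples.

```python
def match_mp(r1, r2):
    """Area of intersection / area of the minimum bounding box:
    1 for identical rectangles, 0 for disjoint ones."""
    ix = max(0, min(r1[2], r2[2]) - max(r1[0], r2[0]))
    iy = max(0, min(r1[3], r2[3]) - max(r1[1], r2[1]))
    bx = max(r1[2], r2[2]) - min(r1[0], r2[0])
    by = max(r1[3], r2[3]) - min(r1[1], r2[1])
    return (ix * iy) / (bx * by)

def best_match(r, rects):
    """m(r; R): best match of r against a set of rectangles."""
    return max((match_mp(r, r2) for r2 in rects), default=0.0)

def evaluate(estimates, targets, alpha=0.5):
    precision = sum(best_match(e, targets) for e in estimates) / len(estimates)
    recall = sum(best_match(t, estimates) for t in targets) / len(targets)
    f = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f
```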

Conclusion

In this work we have shown how to leverage the idea of recovering stroke width for text detection. We define the notion of a stroke and derive an efficient algorithm to compute it, producing a new image feature. Once recovered, it provides a feature that has proven to be reliable and flexible for text detection. Unlike previous features used for text detection, the proposed SWT combines dense estimation (computed at every pixel) with non-local scope (the stroke width depends on information contained in sometimes very far apart pixels). Compared to the most recent available tests, our algorithm reached first place and was about 15 times faster than the speed reported there. The feature was dominant enough to be used by itself, without the need for the actual character recognition step used in some previous works [3]; this allows us to apply the method to many languages and fonts. There are several possible extensions of this work. The grouping of letters can be improved by considering the directions of the recovered strokes, which may allow the detection of curved text lines as well. We intend to explore these directions in the future.

References

[1] J. Liang, D. Doermann, H. Li, "Camera-based analysis of text and documents: a survey", International Journal on Document Analysis and Recognition, vol. 7, no. 2-3, pp. 84-104, 2005.

[2] K. Jung, K. I. Kim, A. K. Jain, "Text information extraction in images and video: a survey", Pattern Recognition, vol. 37, no. 5, pp. 977-997, 2004.

[3] X. Chen, A. Yuille, "Detecting and Reading Text in Natural Scenes", Computer Vision and Pattern Recognition (CVPR), pp. 366-373, 2004.

[4] R. Lienhart, A. Wernicke, "Localizing and Segmenting Text in Images and Videos", IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 4, pp. 256-268, April 2002.

[5] A. Jain, B. Yu, "Automatic Text Location in Images and Video Frames", Pattern Recognition, vol. 31, no. 12, pp. 2055-2076, 1998.

[6] H.-K. Kim, "Efficient automatic text location method and content-based indexing and structuring of video database", Journal of Visual Communication and Image Representation, vol. 7, no. 4, pp. 336-344, 1996.

[7] K. Subramanian, P. Natarajan, M. Decerbo, D. Castañòn, "Character-Stroke Detection for Text-Localization and Extraction", International Conference on Document Analysis and Recognition (ICDAR), 2005.

[8] "ICDAR 2003 robust reading competitions", Proceedings of the Seventh International Conference on Document Analysis and Recognition, pp. 682-687, 2003.

[9] "ICDAR 2005 text locating competition results", Proceedings of the Eighth International Conference on Document Analysis and Recognition, pp. 80-84, 2005.

[10] L. J. Quackenbush, "A Review of Techniques for Extracting Linear Features from Imagery", Photogrammetric Engineering & Remote Sensing, vol. 70, no. 12, pp. 1383-1392, December 2004.

[11] P. Doucette, P. Agouris, A. Stefanidis, "Automated Road Extraction from High Resolution Multispectral Imagery", Photogrammetric Engineering & Remote Sensing, vol. 70, no. 12, pp. 1405-1416, December 2004.

[12] A. Baumgartner, C. Steger, H. Mayer, W. Eckstein, H. Ebner, "Automatic road extraction based on multi-scale, grouping, and context", Photogrammetric Engineering & Remote Sensing, vol. 65, no. 7, pp. 777-785, 1999.

[13] C. Kirbas, F. Quek, "A review of vessel extraction techniques and algorithms", ACM Computing Surveys, vol. 36, no. 2, pp. 81-121, 2004.
