
Robust Image Hashing via CP Decomposition and DCT for Copy Detection

Published: 25 April 2024

Abstract

Copy detection is a key task in image copyright protection. This article proposes a robust image hashing algorithm based on CP decomposition and the discrete cosine transform (DCT) for copy detection. The first contribution is the construction of a third-order tensor from low-frequency coefficients in the DCT domain. Since the low-frequency DCT coefficients contain most of the image energy, they reflect the basic visual content of the image and are less disturbed by noise. Hence, building the third-order tensor from the low-frequency DCT coefficients ensures the robustness of our algorithm. The other contribution is the application of CP decomposition to the third-order tensor to learn a short binary hash. Because the factor matrices learned by CP decomposition preserve the topology of the original tensor, the binary hash derived from them achieves good discrimination. Extensive experiments and comparisons validate the effectiveness and advantages of our algorithm. The results demonstrate that our algorithm outperforms several baseline algorithms in classification and copy detection, and it is also better than some baseline algorithms with regard to hash length and computational time.

1 Introduction

In the digital era, it is convenient to obtain digital images via mobile devices, e.g., tablet computers and smartphones. Meanwhile, online social platforms, such as Instagram, Little Red Book, and Oasis, are popular and have led to explosive growth of digital images on the Internet [1, 2, 3, 4]. How to efficiently store and manage massive numbers of images has become an important issue in current data science research. For example, many users post digital images on social platforms. During this process, digital images may undergo manipulations, e.g., image compression and format conversion, that do not change image content. Hence, after these manipulations, the processed images have contents similar to the original. This means that many image copies exist on the Internet. Figure 1 presents an example with three copy versions of a digital image, in which (a) is a digital image, and (b), (c), and (d) are its copies generated by brightness adjustment (BA), JPEG compression (JC), and text addition (TA), respectively. An efficient technique is expected to accurately detect all image copies, e.g., all three copies of Figure 1(a). Therefore, it is imperative to devise efficient methods for detecting image copies. This article introduces a robust hashing algorithm for image copy detection.
Fig. 1. Three copy versions of a digital image.
Image hashing [5, 6, 7, 8, 9] is an effective technology for the efficient management of massive image collections. It can not only reduce storage space but also reduce the computational complexity of similarity search. Motivated by these advantages, researchers have applied image hashing algorithms to many scenarios, such as authentication, copy detection, retrieval, tampering detection, and quality evaluation [10, 11, 12, 13]. Generally, an image hashing algorithm should satisfy two basic performance indicators, namely robustness and discrimination [14, 15, 16, 17]. Robustness requires a hashing algorithm to map images with similar visual contents to similar hashes. Discrimination, also called anti-collision [5, 12], requires a hashing algorithm to map images with different visual contents to different hashes. The two indicators constrain each other, and most existing image hashing algorithms do not achieve a desirable balance between them; thus their performance in copy detection is unsatisfactory. To handle this problem, we introduce a new hashing algorithm via CP decomposition and DCT for image copy detection. The contributions of this work are outlined as follows:
(1) A third-order tensor is constructed by using low-frequency coefficients in the DCT domain. As the low-frequency DCT coefficients contain most of the image energy, they can reflect the basic visual content of the image and are less disturbed by noise. Hence, the third-order tensor construction with the low-frequency DCT coefficients can guarantee the robustness of our algorithm.
(2) CP decomposition is applied to the third-order tensor for learning a short binary hash. The CP decomposition can decompose a third-order tensor into three factor matrices. As the factor matrices learned from the CP decomposition can preserve the topology of the original tensor, the binary hash derived from the factor matrices can reach good discrimination.
Extensive experiments and comparisons are conducted on public image datasets. The results show that our algorithm achieves better classification and copy detection performance than some baseline algorithms. Moreover, our algorithm also outperforms several baseline algorithms with regard to hash length and computational time. The rest of this article is organized as follows. Related work is introduced in Section 2. Our algorithm is elaborated in Section 3. Experiments and comparisons are discussed in Sections 4 and 5, respectively. Finally, conclusions are given in Section 6.

2 Related Work

In the literature, many robust hashing algorithms have been reported for different purposes. Based on the type of feature extraction technique, these algorithms can be divided into three categories. Typical techniques of each category are reviewed below.

2.1 Statistical Features-based Hashing Algorithms

These image hashing algorithms exploit different statistical features to derive a hash, such as moments, histograms, variance, skewness, and kurtosis. For example, Zhao et al. [6] extracted the amplitude and phase of image blocks with Zernike moments to produce an image hash. Based on Zernike moment theory, Ouyang et al. [2] combined global features of quaternion Zernike moments (QZMs) and local SIFT feature points to form a hash. This scheme exhibits strong security and is suitable for image authentication. Hosny et al. [18] introduced an algorithm via quaternion polar complex exponential transform (QPCET) moments, which is robust to some content-preserving operations. Tang et al. [19] employed the histogram of the color vector angle (CVA) matrix obtained by ring partition to design a hashing algorithm; this hashing can resist rotation. In addition to moments and histograms, other statistical features have also been employed in image hashing research. For instance, Tang et al. [20] utilized stable statistical features, i.e., the mean, variance, skewness, and kurtosis of image rings, to produce a hash. This hashing ensures rotation robustness effectively, but its discrimination needs to be strengthened.

2.2 Transform Domain-based Hashing Algorithms

These image hashing algorithms extract features in the transform domain to form a hash. Frequently used techniques include the discrete Fourier transform (DFT) [21], the discrete wavelet transform (DWT) [22, 23], and the DCT [24, 25]. To improve robustness, Wang et al. [26] developed an algorithm based on Watson's visual model in the DCT domain. This algorithm performs well in resisting blurring and noise. Tang et al. [22] employed low-frequency coefficients in the DCT domain to form a hash. This hashing reaches preferable robustness, but its discrimination needs to be strengthened. To improve discrimination, Laradji et al. [27] employed the quaternion discrete Fourier transform (QFT) to extract a hash. This hashing has good discrimination, but its robustness is not ideal. In another work, Yan et al. [28] designed a new hashing scheme for tamper detection using the quaternion Fourier-Mellin transform (QFMT). This scheme has good discrimination because it captures color information well via the QFMT.

2.3 Data Dimensionality Reduction-based Hashing Algorithms

These hashing algorithms utilize low-dimensional data features to produce an image hash. For example, Tang et al. [29] employed Tucker decomposition (TD) to derive a hash from a third-order tensor in the spatial domain. This algorithm reaches good discrimination. In another work, Tang et al. [30] employed multidimensional scaling (MDS) to calculate a hash, but the classification performance of this algorithm needs to be enhanced. To improve classification performance, Qin et al. [31] combined singular value decomposition (SVD), the Canny operator, and CVA to produce a hash. This algorithm demonstrates good classification, but its calculation speed is slow. To speed up computation, Liang et al. [32] used a feature map (FM) and 2D PCA to construct a hashing algorithm with high computational efficiency. Tang et al. [33] employed quaternion SVD (QSVD) to compute a hash via the Euclidean distances between singular values. The classification performance of these two algorithms needs further improvement. Recently, Liang et al. [34] employed a saliency map (SM) and isometric mapping (Isomap) to form a hash. This algorithm has competitive classification and copy detection performance. In another work, Huang et al. [35] used locality-preserving projection (LPP) and Gabor filtering (GF) to form a hash. This algorithm has good classification performance and high security.
Apart from the three categories mentioned above, other techniques can also be employed for image hashing. For example, to enhance rotation robustness, Li et al. [36] introduced an algorithm via random GF and lattice vector quantization (LVQ), which can effectively resist rotation. Huang et al. [37] produced a hash via a random walk (RW) on Zigzag blocks of the image. This scheme offers remarkable security. Zhao et al. [38] combined quaternion features of cool and warm tones with 3D spatial angle features to produce a hash. Table 1 summarizes the core techniques of some hashing algorithms. The above review demonstrates that substantial progress has been made. Nevertheless, most algorithms fail to obtain the desired classification performance, so their performance in copy detection applications is not yet satisfactory. To handle this problem, we introduce a new image hashing algorithm via CP decomposition and DCT for image copy detection.
Table 1. Core Techniques of Some Hashing Algorithms

Algorithm   Core techniques   Year
[2]         QZMs + SIFT       2016
[18]        QPCET             2018
[27]        QFT               2013
[28]        QFMT              2016
[29]        TD                2018
[31]        SVD + CVA         2018
[32]        FM + 2D PCA       2022
[34]        SM + Isomap       2023
[35]        LPP + GF          2023

3 Our Algorithm

Our algorithm includes four steps. The block diagram of our algorithm is displayed in Figure 2. First, the input image is resized and filtered to obtain a pre-processed image. Second, local features are generated from the pre-processed image via the DCT. Third, a third-order tensor is constructed from the local features in the DCT domain. Finally, CP decomposition is applied to the third-order tensor, and the factor matrices of the CP decomposition are employed to construct a short binary hash. The following sections elaborate on these four steps.
Fig. 2. Block diagram of our algorithm.

3.1 Pre-processing

The first step is the pre-processing operation, which consists of resizing the input image to \(F\times F\) using bicubic interpolation and filtering the resized image with Gaussian low-pass filtering using a 3×3 convolution mask. The filtering template is defined below.
\begin{equation} T_g=\frac{T^{(1)}(i,j)}{\sum _i\sum _jT^{(1)}(i,j)}, \end{equation}
(1)
where \(T^{(1)}(i,j)\) is defined as below.
\begin{equation} T^{(1)}(i,j)=e^{\frac{-(i^2+j^2)}{2\sigma ^2}}, \end{equation}
(2)
where \(\sigma\) is the standard deviation.
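For illustration, this pre-processing step can be sketched in Python as below. This is a minimal sketch assuming the OpenCV library; the original implementation is in MATLAB, the function name `preprocess` is ours, and the value of the standard deviation is an assumption since the article does not fix it.

```python
# Minimal Python sketch of the pre-processing step (Section 3.1), assuming OpenCV.
# The sigma value below is an assumption; the article does not specify it.
import cv2

def preprocess(image, F=512, sigma=1.0):
    """Resize to F x F with bicubic interpolation, then apply a 3x3 Gaussian low-pass filter."""
    resized = cv2.resize(image, (F, F), interpolation=cv2.INTER_CUBIC)
    # cv2.GaussianBlur builds the normalized 3x3 kernel of Equations (1)-(2) internally.
    return cv2.GaussianBlur(resized, (3, 3), sigma)
```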

3.2 Local Feature Extraction via DCT

We select the low-frequency DCT coefficients of the pre-processed image in the RGB color space as local features. Since the low-frequency DCT coefficients contain most of the image energy, they can reflect the basic visual content of the image and are less disturbed by noise [1, 39]. Specifically, the red, green, and blue components of the pre-processed image in RGB color space are first extracted. Then, the three color components are partitioned into non-overlapping blocks with a size of \(S\times S\) . Note that the S value is selected to ensure that the remainder of \(F/S\) is zero. Therefore, there are \(N={(F/S)}^2\) blocks for each color component. The DCT feature extraction in the red component is explained below.
Let \(\mathbf {R}_i\) denote the ith block of the red component, labeled from top to bottom and left to right. The two-dimensional DCT is then applied to the block \(\mathbf {R}_i\) . The 2D DCT is computed by the following formula.
\begin{equation} \begin{aligned}D_i(u,v)=a(u)a(v)\sum _{l=0}^{S-1}\sum _{n=0}^{S-1}R_i(l,n)\cos \left[\frac{(2l+1)u\pi }{2S}\right] \cos \left[\frac{(2n+1)v\pi }{2S}\right] \end{aligned}, \end{equation}
(3)
where \(R_i(l,n)\) represents the pixel value in the \((l+1)\) -th row and the \((n+1)\) -th column of \(\mathbf {R}_i\) , \(D_i(u,v)\) represents the DCT coefficients in the \((u+1)\) -th row and the \((v+1)\) -th column ( \(u,v=0,1, \ldots ,S-1\) ), and \(a(u)\) is defined as follows:
\[\begin{eqnarray} a(u)= \left\lbrace \begin{array}{ll} \sqrt {1/S}, & \text{if} \quad u = 0 \\ \sqrt {2/S}, & \text{otherwise} \end{array} \right. . \end{eqnarray}\]
(4)
The DCT coefficients are scanned in zigzag order to obtain a one-dimensional sequence of DCT coefficients. As the high-frequency DCT coefficients contain only a small amount of image information and are susceptible to noise, the first K elements of the zigzag-scanned DCT sequence of the block \(\mathbf {R}_i\) are selected as the feature vector \(\mathbf {x}_i=[x_{1,i},x_{2,i}, \ldots ,x_{K,i}]^{\rm {T}}\) . The vectors of all image blocks are arranged to obtain a feature matrix \(\mathbf {X}_R\) of size \(K\times N\) as follows:
\[\begin{eqnarray} \mathbf {X}_R=[\mathbf {x}_1,\mathbf {x}_2, \ldots ,\mathbf {x}_N] = \begin{bmatrix} x_{1,1}& x_{1,2} & \cdots & x_{1,N} \\ x_{2,1} & x_{2,2}& \cdots & x_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{K,1} & x_{K,2} & \cdots & x_{K,N} \\ \end{bmatrix}. \end{eqnarray}\]
(5)
Similarly, the green and blue components are processed in the same way and thus two local feature matrices \(\mathbf {X}_G\) and \(\mathbf {X}_B\) are constructed.
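As an illustration of this step, a minimal Python sketch of the block-wise DCT feature extraction is given below (the original implementation is in MATLAB). The helper names `zigzag_indices` and `dct_features` are ours, and the raster ordering of blocks is an assumption.

```python
# Python sketch of the local feature extraction of Section 3.2 (assumed helper names).
import numpy as np
from scipy.fft import dctn

def zigzag_indices(S):
    """(row, col) positions of an S x S block in zigzag scan order."""
    return sorted(((i, j) for i in range(S) for j in range(S)),
                  key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))

def dct_features(channel, S=32, K=32):
    """Return the K x N feature matrix of Equation (5) for one color component."""
    F = channel.shape[0]
    low_freq = zigzag_indices(S)[:K]            # keep the first K low-frequency positions
    columns = []
    for r in range(0, F, S):                    # block ordering assumed to be raster order
        for c in range(0, F, S):
            block = channel[r:r + S, c:c + S].astype(float)
            coeffs = dctn(block, norm='ortho')  # 2D DCT of Equation (3) with the a(u) scaling of Equation (4)
            columns.append([coeffs[i, j] for (i, j) in low_freq])
    return np.array(columns).T                  # K x N, with N = (F/S)^2
```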

3.3 Tensor Construction

After the local feature extraction, three feature matrices in the DCT domain are generated, i.e., \(\mathbf {X}_R\) , \(\mathbf {X}_G\) , and \(\mathbf {X}_B\) . To create a robust third-order tensor, the three feature matrices are partitioned into non-overlapping blocks, which are then stacked to form a third-order tensor. Suppose that the block size is \(D\times D\) . For simplicity, the D value is selected to ensure that the remainders of \(K/D\) and \(N/D\) are both zero. Thus, there are \(M=3\times (K/D)\times (N/D)\) blocks in total. The blocks of each feature matrix are stacked from left to right and top to bottom, and the stacking order of the feature matrices is \(\mathbf {X}_R\) , \(\mathbf {X}_G\) , and \(\mathbf {X}_B\) . Finally, a third-order tensor of size \(D\times D\times M\) is constructed. To visualize the tensor construction process, a block diagram is given in Figure 3. Note that the third-order tensor is constructed by using the low-frequency DCT coefficients. As these DCT coefficients are less disturbed by noise, hash calculation from the third-order tensor can ensure the robustness of our algorithm.
Fig. 3. Block diagram of tensor construction.
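A possible Python sketch of this stacking step is shown below, reusing the (assumed) feature matrices produced in Section 3.2; the function name is ours.

```python
# Python sketch of the tensor construction of Section 3.3.
import numpy as np

def build_tensor(X_R, X_G, X_B, D=32):
    """Stack D x D sub-blocks of the three feature matrices into a D x D x M tensor."""
    slices = []
    for X in (X_R, X_G, X_B):                   # stacking order: red, green, blue
        K, N = X.shape
        for r in range(0, K, D):                # blocks taken left to right, top to bottom
            for c in range(0, N, D):
                slices.append(X[r:r + D, c:c + D])
    return np.stack(slices, axis=-1)            # D x D x M, with M = 3 * (K/D) * (N/D)
```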

3.4 CP Decomposition

CP (CANDECOMP/PARAFAC) decomposition [40, 41] is an efficient data dimensionality reduction technique that can learn low-dimensional features from an input tensor. At present, CP decomposition has been employed in image fusion [42], feature extraction [43], data analysis [44], and so on. In general, CP decomposition decomposes a third-order tensor into three factor matrices. The detailed calculation steps are as follows.
For a third-order tensor \(\mathbf {X}\in \mathbb {R}^{D\times D\times M}\) , suppose that its three factor matrices are denoted by \(\mathbf {A}\) , \(\mathbf {B}\) , and \(\mathbf {C}\) , respectively, where \(\mathbf {A} \in \mathbb {R}^{D \times P}\) , \(\mathbf {B} \in \mathbb {R}^{D \times P}\) , and \(\mathbf {C} \in \mathbb {R}^{M \times P}\) . Thus, CP decomposition is represented as below.
\[\begin{eqnarray} \mathbf {X} \approx [\![ \mathbf {A}, \mathbf {B}, \mathbf {C} ]\!] \equiv \sum _{p=1}^{P} \mathbf {a}_{p} \circ \mathbf {b}_{p} \circ \mathbf {c}_{p}, \end{eqnarray}\]
(6)
in which \(\circ\) denotes the outer product of vectors, P is a positive integer representing the decomposition rank, \(\equiv\) denotes definitional equivalence, \([\![ \quad ]\!]\) is a concise notation for the CP decomposition, and \(\mathbf {a}_p\in \mathbf {A}\) , \(\mathbf {b}_p\in \mathbf {B}\) , and \(\mathbf {c}_p\in \mathbf {C}\) are column vectors. In an element-wise manner, the CP decomposition can be written as below.
\[\begin{eqnarray} x_{ijk}\approx \sum _{p=1}^Pa_{ip}b_{jp}c_{kp}, \end{eqnarray}\]
(7)
where \(a_{ip}\) , \(b_{jp}\) , and \(c_{kp}\) are the elements of \(\mathbf {A}\) , \(\mathbf {B}\) , and \(\mathbf {C}\) , respectively.
Note that the factor matrices obtained from CP decomposition have the same number of column vectors. Specifically, the factor matrix \(\mathbf {A}\) is composed of \(\mathbf {a}_p=[a_{1,p},a_{2,p}, \ldots ,a_{D,p}]^{\rm {T}} \quad (p=1, 2, \ldots , P)\) , the factor matrix \(\mathbf {B}\) is composed of \(\mathbf {b}_p=[b_{1,p},b_{2,p}, \ldots ,b_{D,p}]^{\rm {T}} \quad (p=1, 2, \ldots , P)\) and the factor matrix \(\mathbf {C}\) is composed of \(\mathbf {c}_p=[c_{1,p},c_{2,p}, \ldots ,c_{M,p}]^{\rm {T}} \quad (p=1, 2, \ldots , P)\) . The formal definitions are as follows:
\[\begin{eqnarray} \mathbf {A}=[\mathbf {a}_1,\mathbf {a}_2, \ldots ,\mathbf {a}_P], \end{eqnarray}\]
(8)
\[\begin{eqnarray} \mathbf {B}=[\mathbf {b}_1,\mathbf {b}_2, \ldots ,\mathbf {b}_P], \end{eqnarray}\]
(9)
\[\begin{eqnarray} \mathbf {C}=[\mathbf {c}_1,\mathbf {c}_2, \ldots ,\mathbf {c}_P]. \end{eqnarray}\]
(10)
If the column vectors of \(\mathbf {A}\) , \(\mathbf {B}\) , and \(\mathbf {C}\) are normalized and their norms are absorbed into the weight vector \(\lambda \in \mathbb {R}^P\) , the CP decomposition is written as below.
\[\begin{eqnarray} \mathbf {X}\approx [\![ \lambda ;\mathbf {A},\mathbf {B},\mathbf {C}]\!] \equiv \sum _{p=1}^P\lambda _p \mathbf {a}_p\circ \mathbf {b}_p\circ \mathbf {c}_p. \end{eqnarray}\]
(11)
In practice, the alternating least squares (ALS) method [45, 46, 47] is used to achieve the CP decomposition by solving an optimization problem, which is mathematically expressed as below.
\[\begin{eqnarray} \min _{\lambda _p \mathbf {a}_p \mathbf {b}_p \mathbf {c}_p}\left\Vert \mathbf {X}-\sum _{p=1}^P\lambda _p \mathbf {a}_p\circ \mathbf {b}_p\circ \mathbf {c}_p\right\Vert . \end{eqnarray}\]
(12)
The ALS method can find an optimal value by iterative calculation. More details of the ALS method can be found in the reference [48]. Figure 4 illustrates the schematic diagram of CP decomposition.
Fig. 4. Schematic diagram of CP decomposition.
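For readers who wish to reproduce this step, a hedged sketch is given below using the open-source tensorly library, whose parafac function implements CP decomposition with an ALS solver; the article itself does not prescribe a specific library, so the call below is only one possible realization.

```python
# Python sketch of the rank-P CP decomposition of Equations (6)-(12), assuming tensorly.
import tensorly as tl
from tensorly.decomposition import parafac

def cp_factors(tensor, P=1):
    """Decompose a D x D x M tensor into factor matrices A (D x P), B (D x P), and C (M x P)."""
    weights, factors = parafac(tl.tensor(tensor), rank=P, normalize_factors=True)
    A, B, C = factors
    return A, B, C
```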
As the factor matrices are low-dimensional representations which can preserve the topology of the original tensor, image hash derived from the factor matrices can reach good discrimination. Details of hash generation from the factor matrices are explained as follows. To convert each factor matrix into a vector, the means of the elements in the rows of the matrix are calculated. Thus, the feature vectors \(\mathbf {y}^{\rm {(A)}}\) , \(\mathbf {y}^{\rm {(B)}}\) , and \(\mathbf {y}^{\rm {(C)}}\) of \(\mathbf {A}\) , \(\mathbf {B}\) , and \(\mathbf {C}\) are obtained as follows:
\[\begin{eqnarray} \mathbf {y}^{\rm {(A)}}= \left[y_1^{\rm {(A)}},y_2^{\rm {(A)}}, \ldots ,y_D^{\rm {(A)}}\right]^{\rm {T}}, \end{eqnarray}\]
(13)
\[\begin{eqnarray} \mathbf {y}^{\rm {(B)}}=\left[y_1^{\rm {(B)}},y_2^{\rm {(B)}}, \ldots ,y_D^{\rm {(B)}}\right]^{\rm {T}}, \end{eqnarray}\]
(14)
\[\begin{eqnarray} \mathbf {y}^{\rm {(C)}}=\left[y_1^{\rm {(C)}},y_2^{\rm {(C)}}, \ldots ,y_M^{\rm {(C)}}\right]^{\rm {T}}, \end{eqnarray}\]
(15)
where \(y_i^{\rm {(A)}}\) , \(y_i^{\rm {(B)}}\) , and \(y_i^{\rm {(C)}}\) represent the means of the ith rows of \(\mathbf {A}\) , \(\mathbf {B}\) , and \(\mathbf {C}\) , respectively.
To reduce the storage space required by our hash, the mean of the elements of each feature vector is computed. Then, \(\mathbf {y}^{\rm {(A)}}\) , \(\mathbf {y}^{\rm {(B)}}\) , and \(\mathbf {y}^{\rm {(C)}}\) are converted to binary sequences by comparing their elements with these mean values. The detailed calculations are defined as follows:
\[\begin{eqnarray} h_i^{\rm {(A)}}= \left\lbrace \begin{array}{ll} 1, & \text{if} \quad y_i^{\rm {(A)}}\gt t^{\rm {(A)}} \\ 0, & \text{otherwise} \end{array} \right. , \end{eqnarray}\]
(16)
\[\begin{eqnarray} h_i^{\rm {(B)}}= \left\lbrace \begin{array}{ll} 1, & \text{if} \quad y_i^{\rm {(B)}}\gt t^{\rm {(B)}} \\ 0, & \text{otherwise} \end{array} \right. , \end{eqnarray}\]
(17)
\[\begin{eqnarray} h_i^{\rm {(C)}}= \left\lbrace \begin{array}{ll} 1, & \text{if} \quad y_i^{\rm {(C)}}\gt t^{\rm {(C)}} \\ 0, & \text{otherwise} \end{array} \right. , \end{eqnarray}\]
(18)
in which \(t^{\rm {(A)}}\) , \(t^{\rm {(B)}}\) , and \(t^{\rm {(C)}}\) represent the means of \(\mathbf {y}^{\rm {(A)}}\) , \(\mathbf {y}^{\rm {(B)}}\) , and \(\mathbf {y}^{\rm {(C)}}\) , respectively. Finally, our image hash \(\mathbf {h}\) is constructed by concatenating the binary sequences of \(\mathbf {y}^{\rm {(A)}}\) , \(\mathbf {y}^{\rm {(B)}}\) , and \(\mathbf {y}^{\rm {(C)}}\) .
\[\begin{eqnarray} \mathbf {h}= \left[h_1^{\rm {(A)}},h_2^{\rm {(A)}}, \ldots ,h_D^{\rm {(A)}},h_1^{\rm {(B)}},h_2^{\rm {(B)}}, \ldots ,h_D^{\rm {(B)}},h_1^{\rm {(C)}},h_2^{\rm {(C)}}, \ldots ,h_M^{\rm {(C)}}\right]. \end{eqnarray}\]
(19)
Hence, the hash length of our algorithm is \(L=2D+M\) bits.
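The quantization of Equations (13)-(19) can be sketched as follows; the helper names are ours, and the factor matrices are assumed to be NumPy arrays.

```python
# Python sketch of hash generation from the CP factor matrices (Equations (13)-(19)).
import numpy as np

def binarize(v):
    """Quantize a vector against its own mean, as in Equations (16)-(18)."""
    return (v > v.mean()).astype(np.uint8)

def cp_hash(A, B, C):
    """Row means of the factor matrices, thresholded and concatenated (Equation (19))."""
    y_A, y_B, y_C = A.mean(axis=1), B.mean(axis=1), C.mean(axis=1)
    return np.concatenate([binarize(y_A), binarize(y_B), binarize(y_C)])   # length 2D + M bits
```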

3.5 Hash Similarity Evaluation

Since our image hash is a compact sequence of binary bits, the Hamming distance is employed to analyze the similarity of two given hashes. Assume that \(\mathbf {h}_1\) and \(\mathbf {h}_2\) are two hash sequences. Then, their Hamming distance can be expressed as below.
\[\begin{eqnarray} d_H(\mathbf {h}_1,\mathbf {h}_2)=\sum _{f=1}^L\vert h_1(f)-h_2(f)\vert , \end{eqnarray}\]
(20)
where \(h_1(f)\) and \(h_2(f)\) represent the fth bit values of the two given hashes \(\mathbf {h}_1\) and \(\mathbf {h}_2\) , respectively. In general, a smaller \(d_H\) implies more similar hash sequences.
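For binary hashes stored as NumPy arrays, the Hamming distance of Equation (20) reduces to counting differing bits, as in this small sketch:

```python
import numpy as np

def hamming_distance(h1, h2):
    """Number of differing bits between two binary hashes (Equation (20))."""
    return int(np.sum(h1 != h2))
```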

3.6 Pseudo-code Description

Our algorithm contains four steps: pre-processing, local feature extraction via DCT, tensor construction, and CP decomposition. To improve readability, the pseudo-code of our algorithm is described in Algorithm 1.
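The pseudo-code itself is omitted here; as an illustrative substitute, the four steps can be composed from the Python sketches above. This composition is ours, not the article's Algorithm 1, and it assumes an RGB image array.

```python
def image_hash(image, F=512, S=32, K=32, D=32, P=1):
    """Illustrative end-to-end pipeline: pre-processing, block DCT features, tensor construction, CP hash."""
    img = preprocess(image, F)
    channels = [img[:, :, k].astype(float) for k in range(3)]   # assumed R, G, B channel order
    X_R, X_G, X_B = (dct_features(ch, S, K) for ch in channels)
    tensor = build_tensor(X_R, X_G, X_B, D)
    A, B, C = cp_factors(tensor, P)
    return cp_hash(A, B, C)                                     # 2*D + M = 88 bits with these defaults
```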

4 Experimental Results

The experimental parameters are listed below. The selected image size is \(512\times 512\) , and the \(\mathbf {R}\) , \(\mathbf {G}\) , and \(\mathbf {B}\) components are divided into \(32\times 32\) non-overlapping blocks. Thus, for each component, a total of 256 blocks are obtained. For each block, the first 32 low-frequency DCT coefficients are extracted to construct the feature matrix. Then, three feature matrices of size \(32\times 256\) are obtained. To construct a third-order tensor, the three feature matrices are divided into \(32\times 32\) non-overlapping blocks, giving 24 blocks in total. Next, a third-order tensor of size \(32\times 32\times 24\) is constructed by stacking these blocks. The rank of the CP decomposition is 1. Briefly, our parameters are set as follows: \(F=512\) , \(S=32\) , \(N=(F/S)^2=256\) , \(K=32\) , \(D=32\) , and \(P=1\) . Therefore, \(M=3\times (32/32)\times (256/32)= 24\) and our hash length is \(L=2D+M=88\) bits. Our algorithm is implemented in MATLAB R2018b. The adopted computer has an Intel i7 8700 CPU with a main frequency of 3.20 GHz and a memory size of 8 GB.

4.1 Robustness Analysis

To analyze the robustness of our algorithm, we use the open Berkeley dataset [49] to construct a new dataset with 25,800 pairs of similar color images. Specifically, the Berkeley dataset contains 300 color images with sizes of 481×321 and 321×481. Typical images from the Berkeley dataset are shown in Figure 5. To create visually similar images, StirMark [50], MATLAB, and Photoshop are utilized. In the experiments, 12 operations are used for testing robustness, of which nine are single attacks and three are combinational attacks.
Fig. 5. Typical images in Berkeley dataset.
The parameter settings of the nine single operations are as follows:
Gamma Correction (GC): Four \(\gamma\) values of 1.25, 1.1, 0.9, and 0.75;
BA: Four magnitudes of ±20 and ±10;
Speckle Noise (SN): Nine variances ranging from 0.01 to 0.09 with a step size of 0.01;
Salt and Pepper Noise (SPN): Nine densities ranging from 0.01 to 0.09 with a step size of 0.01;
Contrast Adjustment (CA): Four magnitudes of ±20 and ±10;
Gaussian Low-pass filtering (GLF): Nine standard deviations ranging from 0.9 to 2.5 with a step size of 0.2;
Image Scaling (IS): Five ratio values of 2.0, 1.5, 1.1, 0.9, and 0.75;
JC: Eight quality factors ranging from 30 to 100 with a step size of 10;
Watermark Embedding (WE): Eight strengths ranging from 30 to 100 with a step size of 10.
The parameter settings of the three combinational operations are detailed below. Combinational operation 1 (CO1) consists of rotation (10 rotation angles: ±1 \(^{\circ }\) , ±2 \(^{\circ }\) , ±3 \(^{\circ }\) , ±4 \(^{\circ }\) , and ±5 \(^{\circ }\) ), cropping, and rescaling. CO1 is provided by the well-known tool StirMark [50]. This operation first conducts rotation. Since the rotation introduces padded regions around the four corners of the image, CO1 uses cropping to remove the padded regions and then resizes the cropped image to the size of the original image. More details of CO1 can be found in [50]. Combinational operation 2 (CO2) is SPN+JC, where the density of SPN is 0.05 and the quality factor of JC ranges from 30 to 100 with a step size of 10. Combinational operation 3 (CO3) is BA+JC, where the magnitude of BA is 20 and the quality factor of JC ranges from 30 to 100 with a step size of 10. CO2 and CO3 both use JC because it is a commonly used operation in practice. In summary, there are 86 different operations for each original image. Therefore, there are \(300 \times 86=25,800\) pairs of similar images for robustness analysis.
To assess the quality of these similar images, two well-known image quality assessment (IQA) metrics, SSIM [51] and PSNR [52], are used for quality evaluation. Note that SSIM and PSNR are full-reference IQA metrics, which require the size of a reference image to equal the size of its original image. As the sizes of the images attacked by IS are changed, they are resized to the sizes of their original images by bicubic interpolation before quality evaluation. Table 2 presents the statistical results of SSIM and PSNR on the Berkeley dataset. Clearly, except for the CO1 results, all other SSIM means are bigger than 0.7 and all other PSNR means are bigger than 22 dB. In addition, the Std. Dev. results are small. This illustrates that these attacked images are visually similar to their original images. The CO1 results are smaller than those of the other operations because CO1 involves rotation and cropping, which are not well handled by SSIM and PSNR.
Table 2. Statistical Results of SSIM and PSNR (dB) Based on Berkeley Dataset

Operation   Mean (SSIM)   Std. Dev. (SSIM)   Mean (PSNR)   Std. Dev. (PSNR)
BA          0.9696        0.0108             26.9132       3.7959
CA          0.9805        0.0066             33.6820       1.4228
GC          0.9452        0.0309             25.6641       3.6833
JC          0.9562        0.0313             25.6279       0.4663
SN          0.7920        0.1285             24.8308       3.2624
SPN         0.7004        0.1466             23.0555       2.9075
GLF         0.9217        0.0475             29.6256       3.3004
WE          0.9542        0.0400             22.9630       2.1918
IS          0.9346        0.0515             24.7304       1.2437
CO1         0.5955        0.1630             16.2169       2.7083
CO2         0.9198        0.0398             23.1965       0.6217
CO3         0.8951        0.0422             28.7454       1.1595
Hamming distances between the 25,800 pairs of similar images are calculated. Figure 6 displays the mean values of the Hamming distances under different operations, in which the abscissa indicates the parameter value of each operation and the ordinate is the mean Hamming distance. In Figure 6, the mean Hamming distances for all operations are smaller than 1.0, except for GC, CO1, and CO2. For GC, only three means are slightly bigger than 1.0. For CO1, all means are bigger than 2, but they are all smaller than 7. For CO2, the eight means are around 1.0 and only two values are slightly bigger than 1.0. These small mean values illustrate that our algorithm generates similar hash sequences for similar image pairs. Table 3 lists the statistical results of the Hamming distances under different operations on the Berkeley dataset. If the threshold of the Hamming distance is set to 7, our algorithm can correctly detect all similar images when CO1 is excluded. Even when CO1 is included, our algorithm still reaches a correct detection rate of 97.26%. This high correct detection rate verifies the robustness of our algorithm.
Table 3. Statistical Results of Hamming Distances Based on Berkeley Dataset

Operation   Max   Min   Mean     Std. Dev.
BA          7     0     0.7075   1.0055
CA          4     0     0.4966   0.7486
GC          7     0     0.8133   1.0737
JC          3     0     0.3854   0.6439
SN          7     0     0.6289   0.9414
SPN         4     0     0.3318   0.5903
GLF         3     0     0.0996   0.3233
WE          5     0     0.4258   0.7631
IS          3     0     0.3833   0.6137
CO1         28    0     4.3913   3.2689
CO2         5     0     0.9895   1.0436
CO3         3     0     0.2025   0.4590
Fig. 6. Robustness performance on the Berkeley dataset.

4.2 Discrimination Test

To assess the discrimination of our algorithm, the VOC2012 dataset [53] is employed. This dataset includes 17,125 different color images, and some typical images are shown in Figure 7. In the experiment, the Hamming distance between the hash codes of each pair of images is computed. The total number of distances thus reaches \(C_{17125}^2=17125\times (17125-1)/2=146,624,250\) . Figure 8 displays the distribution of the 146,624,250 Hamming distances, where the x-axis is the Hamming distance and the y-axis denotes the frequency of each Hamming distance. The results show that the smallest Hamming distance is 0 and the largest is 57. In addition, the mean Hamming distance of different images is 27.2452, which is much bigger than the largest mean Hamming distance of similar images (4.3913). Consequently, these results intuitively show that our algorithm is discriminative.
Fig. 7. Typical images of VOC2012 dataset.
Fig. 8. Distribution of 146,624,250 Hamming distances.
To quantify the discrimination of our algorithm, Table 4 lists the correct recognition rate and the false detection rate under different thresholds. In Table 4, the discrimination is expressed as the false detection rate of different images and the robustness is represented by the correct recognition rate of similar images. Obviously, as the threshold value increases, the robustness increases and the discrimination decreases. Based on the results in Table 4, a proper threshold value can be set for a practical application according to its performance requirements.
Table 4. Detection Performances Under Different Thresholds

Threshold   Correct recognition rate of similar images   False detection rate of different images
3           89.58%                                       0.005%
4           93.10%                                       0.016%
5           95.02%                                       0.038%
6           96.36%                                       0.080%
7           97.26%                                       0.160%

4.3 Selection of Block Size

The popular receiver operating characteristic (ROC) graph [54] is adopted. In the ROC graph, the horizontal axis presents the False Positive Rate (FPR) and the vertical axis presents the True Positive Rate (TPR). Their calculation formulas are defined below.
\[\begin{eqnarray} P_{\rm {{FPR}}}(d_H\le T)=\frac{N_f}{N_d}, \end{eqnarray}\]
(21)
\[\begin{eqnarray} P_{\rm {{TPR}}}(d_H\le T)=\frac{N_t}{N_s}, \end{eqnarray}\]
(22)
where \(N_d\) is the number of different images, \(N_f\) is the number of different images incorrectly distinguished, \(N_s\) is the number of similar images, and \(N_t\) is the number of similar images successfully identified. Note that \(P_{\rm {{FPR}}}\) and \(P_{\rm {{TPR}}}\) correspond to discrimination and robustness, respectively. A group of points with coordinates \((P_{\rm {{FPR}}}, P_{\rm {{TPR}}})\) is calculated using a set of thresholds, and these points are used to plot an ROC curve. According to the meanings of \(P_{\rm {{FPR}}}\) and \(P_{\rm {{TPR}}}\) , an ROC curve near the top-left corner indicates better classification than one far away from that corner.
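Given the Hamming distances of the similar-image pairs and the different-image pairs, the ROC points of Equations (21)-(22) can be computed as in the following sketch; the function and variable names are ours.

```python
import numpy as np

def roc_points(d_similar, d_different, thresholds):
    """(FPR, TPR) pairs over thresholds, following Equations (21) and (22)."""
    d_s, d_d = np.asarray(d_similar), np.asarray(d_different)
    points = []
    for T in thresholds:
        tpr = float(np.mean(d_s <= T))   # N_t / N_s
        fpr = float(np.mean(d_d <= T))   # N_f / N_d
        points.append((fpr, tpr))
    return points
```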
In this section, the datasets used are identical to those in Sections 4.1 and 4.2. The selection of the block size \(S \times S\) is discussed. In the experiments, only the S value is varied and the other parameters are kept the same. The candidate sizes are \(S=16\) , \(S=32\) , and \(S=64\) . Figure 9 shows the ROC curves for different S values. It can be seen that the ROC curve of \(S=16\) is slightly nearer to the top-left corner than the curves of \(S=32\) and \(S=64\) , indicating slightly better classification performance. The area under the ROC curve (AUC) is also calculated; it ranges from 0 to 1, and a greater AUC value indicates superior classification performance. Table 5 compares the AUC values, hash lengths, and running times of different S values. The AUCs of \(S=16\) , \(S=32\) , and \(S=64\) are 0.99976, 0.99967, and 0.99958, respectively. Clearly, \(S=16\) has the largest AUC value, but the AUC differences among these S values are small. For example, the AUC difference between \(S=16\) and \(S=32\) is 0.00009, while the AUC difference between \(S=16\) and \(S=64\) is 0.00018. The hash lengths of \(S=16\) , \(S=32\) , and \(S=64\) are 160, 88, and 70 bits, and the running times are 0.226, 0.096, and 0.074 seconds, respectively. Considering the AUC, hash length, and running time together, \(S=32\) is preferable for our algorithm.
Table 5. Performances of Different S Values

S    AUC       Hash (bit)   Time (s)
16   0.99976   160          0.226
32   0.99967   88           0.096
64   0.99958   70           0.074
Fig. 9. ROC curves of different S values.

4.4 Selection of K Value

In our algorithm, the K value determines the number of low-frequency DCT coefficients used for local feature extraction. To examine the effect of the K value, we select K from the set \(\left\lbrace 32, 64, 128, 256 \right\rbrace\) and keep the other parameters constant. Figure 10 displays the ROC curves of different K values. The curve comparison shows that \(K=32\) is significantly superior to \(K=64\) , \(K=128\) , and \(K=256\) because its curve is much nearer the top-left corner. This indicates that \(K=32\) achieves better classification performance than the other K values. In addition, the AUCs of different K values are calculated, and Table 6 shows that the AUCs of \(K=32\) , \(K=64\) , \(K=128\) , and \(K=256\) are 0.99967, 0.99757, 0.99640, and 0.99629, respectively. Obviously, \(K=32\) has the largest value. As to hash length and running time, the hash lengths of \(K=32\) , \(K=64\) , \(K=128\) , and \(K=256\) are 88, 112, 160, and 256 bits, and the running times are 0.096, 0.103, 0.107, and 0.113 seconds, respectively. The hash length of \(K=32\) is shorter than those of the other K values, and the running time varies only slightly with K. Taken together, the overall performance of our algorithm is best when \(K=32\) .
Table 6. Performances of Different K Values

K     AUC       Hash (bit)   Time (s)
32    0.99967   88           0.096
64    0.99757   112          0.103
128   0.99640   160          0.107
256   0.99629   256          0.113
Fig. 10. ROC curves of different K values.

4.5 Selection of P Value

In our algorithm, the P value is the rank of the CP decomposition. This section discusses the effect of the P value on hash performance. Since the rank of the CP decomposition is not set greater than the order of the tensor, the candidate ranks are \(P=1\) , \(P=2\) , and \(P=3\) . Similarly, we only change the P value and keep the other parameters constant. Figure 11 illustrates the ROC curves of different P values. Compared with the ROC curves of the other P values, the curve of \(P=1\) is nearest to the top-left corner, implying that \(P=1\) provides better classification than the other P values. In addition, the AUC values of \(P=1\) , \(P=2\) , and \(P=3\) are 0.99967, 0.99794, and 0.89141, respectively. As to the running time, the results of \(P=1\) , \(P=2\) , and \(P=3\) are 0.096, 0.121, and 0.114 seconds, respectively. Table 7 presents the performance comparison of different P values. Taken together, the overall performance of our algorithm is best when \(P=1\) .
Table 7. Performance of Different P Values

P   AUC       Time (s)
1   0.99967   0.096
2   0.99794   0.121
3   0.89141   0.114
Fig. 11. ROC curves of different P values.

4.6 Selection of Color Space

To validate our choice of color space, several common color spaces are compared: the HSV, RGB, YCbCr, CIE L*a*b*, and HSI spaces. In the experiment, the three components of each color space are used to construct the tensor, and the other parameters are kept constant. Figure 12 displays the ROC curves of these color spaces, with the local parts of the curves enlarged for detail. Figure 12 shows that the curve nearest to the top-left corner is given by the RGB space, and thus the AUC of the RGB space is the largest. In addition, the running times of the CIE L*a*b*, RGB, YCbCr, HSV, and HSI spaces are 0.151, 0.096, 0.128, 0.116, and 0.129 seconds, respectively, so the time varies only slightly. Table 8 summarizes the AUCs and running times of the different color spaces. Taken together, the overall performance of our algorithm is best when the RGB color space is selected.
Table 8. Performances of Different Color Spaces

Color space   AUC       Time (s)
RGB           0.99967   0.096
CIE L*a*b*    0.99592   0.151
YCbCr         0.99758   0.128
HSV           0.98968   0.116
HSI           0.94894   0.129
Fig. 12. ROC curves of different color spaces.

5 Performance Comparisons

To demonstrate the superiority of our algorithm, we compare it with several advanced algorithms, including the GF-LVQ algorithm [36], the TD algorithm [29], the MDS algorithm [30], the RW algorithm [37], and the QSVD algorithm [33]. These algorithms are reported in prestigious journals or conferences, and all of them have been used as compared algorithms in many articles. In addition, the QSVD, TD, and MDS algorithms also utilize dimensionality reduction techniques, namely SVD, TD, and MDS. For a fair comparison, all images are converted to the size of 512×512 before they are input to the compared algorithms, and the similarity metrics and parameter settings of the compared algorithms follow their source articles. The main parameters of our algorithm are \(S=32\) , \(K=32\) , and \(P=1\) .

5.1 Classification Performance

The image datasets utilized in Sections 4.1 and 4.2 are adopted to test classification performance. Specifically, 25,800 pairs of similar images are exploited in the robustness analysis and 17,125 images are used in the discrimination test. The ROC graph is also utilized for visual comparison. Figure 13 presents the ROC curves of all algorithms in the same ROC graph for easy comparison. Obviously, the curve of our algorithm is closest to the upper-left corner of the graph. This means that our algorithm achieves better classification performance than the compared algorithms. To further illustrate this, the AUC values of the different algorithms are listed in Table 9. The AUC of our algorithm is 0.99967, while the AUCs of the GF-LVQ, TD, MDS, RW, and QSVD algorithms are 0.97027, 0.99807, 0.97587, 0.96161, and 0.99825, respectively. Clearly, the AUC of our algorithm is bigger than those of these advanced algorithms. For more quantitative results, a TPR comparison at FPR \(\approx 0.01\) is presented in Table 10; our TPR is again bigger than those of the compared algorithms. Our algorithm achieves this competitive classification performance for the following reasons. The third-order tensor construction with the low-frequency DCT coefficients guarantees robustness because these coefficients contain most of the image energy and are less disturbed by noise. Moreover, since the factor matrices of the CP decomposition preserve the topology of the original tensor, the binary hash derived from the factor matrices provides our algorithm with good discrimination.
Table 9. Performance Comparison

Algorithm   AUC       Length (bit)   Time (s)
GF-LVQ      0.97027   120            0.241
TD          0.99807   96             0.147
MDS         0.97587   900            0.132
RW          0.96161   144            0.039
QSVD        0.99825   640            0.352
Our         0.99967   88             0.096
Table 10. TPR Comparison

Algorithm               GF-LVQ    TD        MDS       RW        QSVD      Our
TPR when FPR ≈ 0.01     0.84164   0.96391   0.86993   0.75646   0.99102   0.99417
Fig. 13. ROC curves of different algorithms.

5.2 Performance of Time and Hash Storage

Time and hash storage are also two critical performance indicators. The computational times of the GF-LVQ, TD, MDS, RW, QSVD, and our algorithm are 0.241, 0.147, 0.132, 0.039, 0.352, and 0.096 seconds, respectively. Our algorithm is slower than the RW algorithm but faster than all the other compared algorithms. This can be understood as follows. The 2D DCT and the CP decomposition are the main calculations of our algorithm. Since the block size is small, the computational cost of the block-based DCT is low. In addition, as the size of the third-order tensor is small, the computational cost of the CP decomposition is also low. Therefore, our algorithm has a low computational cost. As to hash storage, our hash length is 88 bits, while the hash lengths of the GF-LVQ, TD, MDS, RW, and QSVD algorithms are 120, 96, 900, 144, and 640 bits, respectively. Evidently, our algorithm has the shortest hash length. Performances of time and hash storage are presented in Table 9. In summary, our algorithm exhibits clear advantages with regard to time and hash storage.

5.3 Copy Detection Performance

To further demonstrate our advantage, copy detection experiments are also conducted for comparison. To create a copy detection dataset, Wang's dataset [55] is used. In the experiments, 10 images randomly selected from the 1,000 color images of Wang's dataset are employed as query images. To simulate image copy detection, 18 digital operations are performed on each query image, resulting in 18 image copies per query. These image copies are generated by the following operations: JC (compression factor: 30, 50, 80), logo insertion (LI) (weight: 0.2, size: \(66\times 70\) ), TA (text: copyright 2023), mosaic (parameter: 5, 10), additive white Gaussian noise (AWGN) (variance: 0.01), CA (parameter: 20), GLF (standard deviation: 0.3), BA (parameter: 20), GC ( \(\gamma\) : 0.75), SPN (density: 0.02), SN (density: 0.02), CO1 (angle: 1°, 5°), and IS (ratio: 0.5, 0.75). Thus, there are 180 image copies in total, and the copy image dataset contains 1,180 images.
The precision-recall (PR) graph is employed to check the detection performance of the different algorithms. Specifically, the precision and recall values under different thresholds are computed, and the resulting points with coordinates (recall, precision) are used to plot the PR curve. The area under the PR curve (PRAUC) is calculated as a quantitative metric for comparison; its range is [0, 1], and a bigger PRAUC means higher accuracy in detecting image copies. Figure 14 presents the PRAUC comparison of the different algorithms. The PRAUC of our algorithm is 0.99144, while the PRAUCs of the GF-LVQ, TD, MDS, RW, and QSVD algorithms are 0.60946, 0.64472, 0.74353, 0.82583, and 0.98455, respectively. The PRAUC of our algorithm is bigger than those of the compared algorithms, so our algorithm performs better than these baselines in copy detection. The reason is that our algorithm has better classification than the compared algorithms, which reduces classification errors during copy detection.
Fig. 14. PRAUC comparison.
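For reference, precision and recall at a given Hamming-distance threshold can be computed per query as in the sketch below; this is our own formulation, under the assumption that the true copies of each query are known, and sweeping the threshold yields the points of the PR curve.

```python
import numpy as np

def precision_recall(d_query, is_copy, T):
    """Precision and recall at Hamming-distance threshold T for one query image.

    d_query: distances from the query hash to every database hash.
    is_copy: boolean mask marking the true copies of the query in the database."""
    retrieved = np.asarray(d_query) <= T
    relevant = np.asarray(is_copy, dtype=bool)
    tp = int(np.sum(retrieved & relevant))
    precision = tp / max(int(np.sum(retrieved)), 1)
    recall = tp / max(int(np.sum(relevant)), 1)
    return precision, recall
```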

6 Conclusions

This article has presented a new hashing algorithm via CP decomposition and DCT for copy detection. A crucial contribution is the third-order tensor construction with low-frequency coefficients in the DCT domain. Since the low-frequency DCT coefficients contain most of the image energy, they reflect the basic visual content of the image and are less disturbed by noise. Hence, the third-order tensor construction with the low-frequency DCT coefficients ensures the robustness of our algorithm. Another key contribution is the application of CP decomposition to the third-order tensor for learning a short binary hash. As the factor matrices learned from the CP decomposition preserve the topology of the original tensor, the binary hash derived from the factor matrices reaches good discrimination. Extensive experiments have been conducted and prove the effectiveness of our algorithm. Performance comparisons have shown that our algorithm outperforms several baseline algorithms in classification and copy detection. In addition, our algorithm has low costs in computational time and storage.

Acknowledgments

Many thanks to the referees for their good suggestions.

References

[1]
Shiguang Liu and Ziqing Huang. 2019. Efficient image hashing with geometric invariant vector distance for copy detection. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 4 (2019), 1–22.
[2]
Junlin Ouyang, Xingzi Wen, Jianxun Liu, and Jinjun Chen. 2016. Robust hashing based on quaternion Zernike moments for image authentication. ACM Transactions on Multimedia Computing, Communications, and Applications 12, 4s (2016), 1–13.
[3]
Haozhe Chen, Hang Zhou, Jie Zhang, Dongdong Chen, Weiming Zhang, Kejiang Chen, Gang Hua, and Nenghai Yu. 2023. Perceptual hashing of deep convolutional neural networks for model copy detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 3, Article 123 (2023), 20 pages. DOI:
[4]
Zhaoda Ye and Yuxin Peng. 2019. Sequential cross-modal hashing learning via multi-scale correlation mining. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 4 (2019), 1–20.
[5]
Xiaoping Liang, Zhenjun Tang, Zhixin Li, Mengzhu Yu, Hanyun Zhang, and Xiangquan Zhang. 2023. Robust hashing via global and local invariant features for image copy detection. ACM Transactions on Multimedia Computing, Communications and Applications 20, 1 (2023), 1–22.
[6]
Yan Zhao, Shuozhong Wang, Xinpeng Zhang, and Heng Yao. 2013. Robust hashing for image authentication using zernike moments and local features. IEEE Transactions on Information Forensics and Security 8, 1 (2013), 55–63. DOI:
[7]
Zhenjun Tang, Ziqing Huang, Heng Yao, Xianquan Zhang, Lv Chen, and Chunqiang Yu. 2018. Perceptual image hashing with weighted DWT features for reduced-reference image quality assessment. The Computer Journal 61, 11 (2018), 1695–1709.
[8]
Xiaoping Liang, Zhenjun Tang, Ziqing Huang, Xianquan Zhang, and Shichao Zhang. 2023. Efficient hashing method using 2D-2D PCA for image copy detection. IEEE Transactions on Knowledge and Data Engineering 35, 4 (2023), 3765–3778. DOI:
[9]
Chuan Qin, Enli Liu, Guorui Feng, and Xinpeng Zhang. 2021. Perceptual image hashing for content authentication based on convolutional neural network with multiple constraints. IEEE Transactions on Circuits and Systems for Video Technology 31, 11 (2021), 4523–4537.
[10]
Xiaoping Liang, Zhenjun Tang, Xianquan Zhang, Mengzhu Yu, and Xinpeng Zhang. 2024. Robust hashing with local tangent space alignment for image copy detection. IEEE Transactions on Dependable and Secure Computing (2024), 1–13. https://ieeexplore.ieee.org/document/10226258
[11]
Qi Shen and Yan Zhao. 2020. Perceptual hashing for color image based on color opponent component and quadtree structure. Signal Processing 166 (2020), 107244. https://www.sciencedirect.com/science/article/pii/S0165168419302907
[12]
Ziqing Huang and Shiguang Liu. 2021. Perceptual hashing with visual content understanding for reduced-reference screen content image quality assessment. IEEE Transactions on Circuits and Systems for Video Technology 31, 7 (2021), 2808–2823. DOI:
[13]
Ziqing Huang and Shiguang Liu. 2021. Perceptual image hashing with texture and invariant vector distance for copy detection. IEEE Transactions on Multimedia 23 (2021), 1516–1529. https://ieeexplore.ieee.org/abstract/document/9107479
[14]
Xiaoping Liang, Zhenjun Tang, Xiaolan Xie, Jingli Wu, and Xianquan Zhang. 2021. Robust and fast image hashing with two-dimensional PCA. Multimedia Systems 27, 3 (2021), 389–401.
[15]
Yan Zhao and Xiaoran Yuan. 2020. Perceptual image hashing based on color structure and intensity gradient. IEEE Access 8 (2020), 26041–26053. https://ieeexplore.ieee.org/abstract/document/8977470
[16]
Chuan Qin, Yecen Hu, Heng Yao, Xintao Duan, and Liping Gao. 2019. Perceptual image hashing based on weber local binary pattern and color angle representation. IEEE Access 7 (2019), 45460–45471. https://ieeexplore.ieee.org/abstract/document/8675975
[17]
Chuan Qin, Xueqin Chen, Xiangyang Luo, Xinpeng Zhang, and Xingming Sun. 2018. Perceptual image hashing via dual-cross pattern encoding and salient structure detection. Information Sciences 423 (2018), 284–302. https://www.sciencedirect.com/science/article/pii/S0020025517302013
[18]
Khalid M. Hosny, Yasmeen M. Khedr, Walid I. Khedr, and Ehab R. Mohamed. 2018. Robust color image hashing using quaternion polar complex exponential transform for image authentication. Circuits, Systems, and Signal Processing 37, 12 (2018), 5441–5462.
[19]
Zhenjun Tang, Xuelong Li, Xianquan Zhang, Shichao Zhang, and Yumin Dai. 2018. Image hashing with color vector angle. Neurocomputing 308 (2018), 147–158. https://www.sciencedirect.com/science/article/pii/S0925231218304971
[20]
Zhenjun Tang, Xianquan Zhang, Xianxian Li, and Shichao Zhang. 2016. Robust image hashing with ring partition and invariant vector distance. IEEE Transactions on Information Forensics and Security 11, 1 (2016), 200–214.
[21]
Junlin Ouyang, Gouenou Coatrieux, and Huazhong Shu. 2015. Robust hashing for image authentication using quaternion discrete Fourier transform and log-polar transform. Digital Signal Processing 41 (2015), 98–109. https://www.sciencedirect.com/science/article/pii/S1051200415000810
[22]
Zhenjun Tang, Fan Yang, Liyan Huang, and Xianquan Zhang. 2014. Robust image hashing with dominant DCT coefficients. Optik 125, 18 (2014), 5102–5107.
[23]
R. Venkatesan, S. Koon, Mariusz Jakubowski, and P. Moulin. 2000. Robust image hashing. In Proceedings of the International Conference on Image Processing (ICIP). 664–666. DOI:
[24]
Chuan Qin, Xueqin Chen, Jing Dong, and Xinpeng Zhang. 2016. Perceptual image hashing with selective sampling for salient structure features. Displays 45 (2016), 26–37. https://www.sciencedirect.com/science/article/pii/S0141938216301020
[25]
Jiri Fridrich and Miroslav Goljan. 2000. Robust hash functions for digital watermarking. In Proceedings of the International Conference on Information Technology: Coding and Computing. 178–183.
[26]
Xiaofeng Wang, Kemu Pang, Xiaorui Zhou, Yang Zhou, Lu Li, and Jianru Xue. 2015. A visual model-based perceptual image hash for content authentication. IEEE Transactions on Information Forensics and Security 10, 7 (2015), 1336–1349.
[27]
Issam H. Laradji, Lahouari Ghouti, and El-Hebri Khiari. 2013. Perceptual hashing of color images using hypercomplex representations. In Proceedings of the 2013 IEEE International Conference on Image Processing. 4402–4406.
[28]
Caiping Yan, Chi-Man Pun, and Xiaochen Yuan. 2016. Quaternion-based image hashing for adaptive tampering localization. IEEE Transactions on Information Forensics and Security 11, 12 (2016), 2664–2677.
[29]
Zhenjun Tang, Lv Chen, Xian Quan Zhang, and Shichao Zhang. 2019. Robust image hashing with tensor decomposition. IEEE Transactions on Knowledge and Data Engineering 31, 3 (2019), 549–560.
[30]
Zhenjun Tang, Ziqing Huang, Xianquan Zhang, and Huan Lao. 2017. Robust image hashing with multidimensional scaling. Signal Processing 137 (2017), 240–250. https://www.sciencedirect.com/science/article/pii/S0165168417300646
[31]
Chuan Qin, Meihui Sun, and Chinchen Chang. 2018. Perceptual hashing for color images based on hybrid extraction of structural features. Signal Processing 142 (2018), 194–205. https://www.sciencedirect.com/science/article/pii/S0165168417302621
[32]
Xiaoping Liang, Zhenjun Tang, Sheng Li, Chunqiang Yu, and Xianquan Zhang. 2022. A novel hashing scheme via image feature map and 2D PCA. IET Image Processing 16, 12 (2022), 3225–3236.
[33]
Zhenjun Tang, Mengzhu Yu, Heng Yao, Hanyun Zhang, Chunqiang Yu, and Xianquan Zhang. 2021. Robust image hashing with singular values of quaternion SVD. The Computer Journal 64, 11 (2021), 1656–1671.
[34]
Xiaoping Liang, Zhenjun Tang, Jingli Wu, Zhixin Li, and Xinpeng Zhang. 2023. Robust image hashing with isomap and saliency map for copy detection. IEEE Transactions on Multimedia 25 (2023), 1085–1097. https://ieeexplore.ieee.org/document/9665342
[35]
Ziqing Huang, Zhenjun Tang, Xianquan Zhang, Linlin Ruan, and Xinpeng Zhang. 2023. Perceptual image hashing with locality preserving projection for copy detection. IEEE Transactions on Dependable and Secure Computing 20, 1 (2023), 463–477. DOI:
[36]
Yuenan Li, Zheming Lu, Ce Zhu, and Xiamu Niu. 2011. Robust image hashing based on random Gabor filtering and dithered lattice vector quantization. IEEE Transactions on Image Processing 21, 4 (2011), 1963–1980.
[37]
Xi Huang, Xiaoguang Liu, Gang Wang, and Ming Su. 2016. A robust image hashing with enhanced randomness by using random walk on zigzag blocking. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA. IEEE, 14–18.
[38]
Yan Zhao and Shuai Liu. 2021. Robust image hashing based on cool and warm hue and space angle. Security and Communication Networks 3803481 (2021). https://www.hindawi.com/journals/scn/2021/3803481/
[39]
Ziqing Huang and Shiguang Liu. 2018. Robustness and discrimination oriented hashing combining texture and invariant vector distance. In Proceedings of the 26th ACM International Conference on Multimedia. 1389–1397.
[40]
Frank L. Hitchcock. 1927. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6, 1-4 (1927), 164–189.
[41]
Frank L. Hitchcock. 1928. Multiple invariants and generalized rank of a p-way matrix or tensor. Journal of Mathematics and Physics 7, 1-4 (1928), 39–79.
[42]
Yang Xu, Zebin Wu, Jocelyn Chanussot, Pierre Comon, and Zhihui Wei. 2019. Nonlocal coupled tensor CP decomposition for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing 58, 1 (2019), 348–362.
[43]
Rafał Zdunek, Krzysztof Fonał, and Andrzej Wołczowski. 2019. Linked CP tensor decomposition algorithms for shared and individual feature extraction. Signal Processing: Image Communication 73 (2019), 37–52. https://www.sciencedirect.com/science/article/pii/S092359651831035X
[44]
Miguel A. Veganzones, Jeremy E. Cohen, Rodrigo Cabral Farias, Jocelyn Chanussot, and Pierre Comon. 2015. Nonnegative tensor CP decomposition of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing 54, 5 (2015), 2577–2588.
[45]
J. Douglas Carroll and Jih-Jie Chang. 1970. Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika 35, 3 (1970), 283–319.
[46]
Yang Xu, Zebin Wu, Jocelyn Chanussot, Pierre Comon, and Zhihui Wei. 2020. Nonlocal coupled tensor CP decomposition for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing 58, 1 (2020), 348–362. DOI:
[47]
Pieter M. Kroonenberg and Jan De Leeuw. 1980. Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45, 1 (1980), 69–97.
[48]
Dimitri Nion and Lieven De Lathauwer. 2008. An enhanced line search scheme for complex-valued tensor decompositions. Application in DS-CDMA. Signal Processing 88, 3 (2008), 749–755.
[49]
David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th IEEE International Conference on Computer Vision (ICCV). 416–423.
[50]
Fabien A. P. Petitcolas. 2000. Watermarking schemes evaluation. IEEE Signal Processing Magazine 17, 5 (2000), 58–64.
[51]
Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
[52]
Zhenjun Tang, Zhiyuan Chen, Zhixin Li, Bineng Zhong, Xianquan Zhang, and Xinpeng Zhang. 2023. Unifying dual-attention and siamese transformer network for full-reference image quality assessment. ACM Transactions on Multimedia Computing, Communications and Applications 19, 6 (2023), 1–24.
[53]
Mark Everingham, Luc Van Gool, Chris Williams, John Winn, Andrew Zisserman, Yusuf Aytar, and Ali Eslami. 2012. The PASCAL Visual Object Classes (VOC) Challenge 2012 Dataset. Retrieved April 1, 2020 from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
[54]
Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 8 (2006), 861–874. DOI:
[55]
James Wang, Jia Li, and Gio Wiederhold. 2001. SIMPLIcity: Semantics-sensitive integrated matching for picture LIbraries. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 9 (2001), 947–963. DOI:

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 7, July 2024, 973 pages. EISSN: 1551-6865. DOI: 10.1145/3613662. Editor: Abdulmotaleb El Saddik.
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 April 2024
Online AM: 01 March 2024
Accepted: 25 February 2024
Revised: 21 January 2024
Received: 10 October 2023
Published in TOMM Volume 20, Issue 7


Author Tags

  1. CP decomposition
  2. tensor construction
  3. image hashing
  4. copy detection
  5. dimensionality reduction


Funding Sources

  • National Natural Science Foundation of China
  • Guangxi Natural Science Foundation
  • Guangxi “Bagui Scholar” Team for Innovation and Research
  • Guangxi Talent Highland Project of Big Data Intelligence and Application
  • Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing
