1 Introduction
The word ‘steganography’ is a combination of two words, ‘steganos’ (cover or protection) and ‘graphy’ (to write). Thus, steganography literally means covered writing [
167,
258].
Digital steganography is concerned with hiding any type of data in any type of cover medium (image, audio, video, text) in such a way that no third party will suspect its existence [
74]. It is in this sense that steganography is different from cryptography, which aims to protect the content of a message rather than its existence [
167,
352]. For steganography one needs a cover medium (e.g., an image), a secret message to hide inside this cover, and a hiding procedure. Additionally, to increase security, one can encrypt the hidden information.
Steganalysis is concerned with detecting the presence of hidden information. It goes no further than binary classification: given an image, it tells the user whether or not steganography was used. Forensic steganalysis goes a step further and is concerned with deriving any extra information about the hidden message (e.g., message length, embedding scheme used, and ultimately its content). See Figure
1 for a graphical summary of what makes up the process of steganography and the process of (forensic) steganalysis.
Steganography can be applied in many situations, including secure transmission of classified documents and securing online banking [
74,
260]. But steganography can also be used for criminal purposes. In the non-digital era the Nazis wrote over cover objects with invisible ink [
49,
135,
200]. More recently, reported uses of digital steganography include the preparation of the 9/11 attacks (as US officials and many articles claim), a child pornography network, and the communication of 10 Russian spies in America (according to the FBI) [
74]. Given its malicious use, it is important to develop steganalysis methods to actively look for the hidden messages within digital media.
Every medium (video, audio, text, and image files) has its own steganography methods. Images are the most popular [
169] as they possess a high degree of redundancy [
352] and have a widespread dissemination [
92]. This review will restrict itself to image steganography and image steganalysis.
Existing reviews of steganography and steganalysis have several shortcomings that prevent them from providing a clear and complete overview. Many were published in or before 2014; therefore, developments of the past decade are not covered [
22,
41,
49,
80,
120,
155,
238,
297]. Other reviews have fewer than 60 references, which is too few to provide a complete overview [
64,
76,
80,
158,
303]. The remaining reviews have shortcomings too. First, they often go into individual implementations of methodologies, which fails to give the reader an overview of the different types of implementation within each approach [
89,
92,
171,
262,
318,
330,
354]. Additionally, many review papers only cover some approaches (e.g., only the spatial domain or only one approach from each domain), which inherently fails to give an overview of the entire field [
74,
76,
89,
92,
157,
187,
203,
330,
354,
387]. Lastly, very few papers provide a full overview of both steganography and (forensic) steganalysis, failing to show how these two are connected and influence each other [
76,
82,
92,
157,
166,
169,
171,
187,
203,
244,
271,
314,
318,
354,
387]. Our article aims to address all of the above shortcomings by providing the reader with an overview of the complete steganography and steganalysis fields of research. It gives a comprehensive review by including many (though not all) sources about both steganography and steganalysis and linking them conceptually throughout the text and empirically through the performance-based topological sorting graphs.
Figure
2 shows our structural division of image steganography and steganalysis approaches. We divide steganography into two general approaches: adaptive approaches and static approaches. Adaptive approaches distinguish themselves through an additional statistical analysis to determine the location of the embedding. Both general approaches mostly use one of three families of embedding techniques (henceforth called domains). The spatial domain approaches (
Least Significant Bit (LSB),
Pixel Value Differencing (PVD) and
Bit-Plane Complexity Segmentation (BPCS)) hide messages directly in the intensity values of pixels, while transform domain approaches (
Discrete Cosine Transform (DCT),
Discrete Wavelet Transform (DWT),
Integer Wavelet Transform (IWT) and
Discrete Fourier Transform (DFT)) first transform the image and then hide the message in the calculated coefficients. Deep learning approaches (
Convolutional Neural Network (CNN) and
Generative Adversarial Network (GAN)) use self-learned ways to embed a message. Steganalysis approaches, on the other hand, can be differentiated by whether they are steganography-method specific or whether they can generalise over multiple steganography methods. Approaches that are not method specific are visual steganalysis, where one visually analyses an image, and universal steganalysis, where several methods are detected by one algorithm. Method-specific approaches are targeted steganalysis, which is designed to detect one method, and signature steganalysis, which looks for signature-like patterns that can appear in images after embedding. This article is structured in accordance with Figure
2, where we will first go into the static steganography approaches, after which the statistical analysis of adaptive approaches will be discussed. The steganalysis part of this review is ordered by complexity (from least to most complex). Hence, visual steganalysis will first be discussed, and then the article will go into signature-based steganalysis, targeted steganalysis and finally universal steganalysis.
3 Steganography Methods
For steganography, four characteristics are used to evaluate a method [
19,
42,
43,
74,
157,
297,
352]. These parameters are:
—
Embedding capacity: the amount of data that can be hidden inside an image expressed in bits per pixel [
171], where higher is better.
—
Robustness: the degree of alteration to an image that a secret message can survive, where the higher this parameter, the better the hiding scheme.
—
Security: the resistance a steganographic image shows towards steganalysis techniques (tested through an attack-resistance measure), where higher resistance signifies higher security.
—
Imperceptibility: the degree to which the existence of a hidden message cannot be detected through inspection of the quality of the steganographic image alone, where higher is better.
Across sources, imperceptibility is most often measured by means of a PSNR score, though SSIM scores have more recently been used as well. Research suggests that SSIM is a better measure of imperceptibility than PSNR [
329].
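To make these measures concrete, the following is a minimal sketch of how PSNR and SSIM are typically computed between a cover and a stego image; the 8-bit maximum of 255 and the use of scikit-image for SSIM are our assumptions, not prescriptions from the cited works.

```python
import numpy as np
from skimage.metrics import structural_similarity  # assumes scikit-image is installed

def psnr(cover: np.ndarray, stego: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means less visible distortion."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim(cover: np.ndarray, stego: np.ndarray) -> float:
    """Structural Similarity index in [-1, 1]; 1 means structurally identical."""
    return structural_similarity(cover, stego, data_range=255)
```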
Security is often measured by applying different steganalysis schemes to the stego content produced by a specific steganographic scheme. If the scheme is not detected (often) by the steganalysis algorithm, it is deemed to be secure.
All steganography approaches discussed in this section have the possibility to increase security by encrypting the secret information [
258]. However, this research focuses on steganography and steganalysis only and will not consider cryptography.
3.1 Spatial Domain Approaches
A spatial domain approach is one that directly alters the intensity of pixels. The most fundamental method is LSB steganography, but since its discovery more complex and adaptive models have been developed [
65,
157]. Spatial domain approaches have high embedding capacity [
155] as well as low computational complexity [
17]. The major drawbacks of spatial domain approaches are a lack in robustness [
17,
155] as well as limited security [
17].
The three approaches found most often in the literature are LSB, PVD and BPCS. Though other approaches for hiding information in the spatial domain exist (e.g., spread spectrum [
100,
247]), they will not be discussed in this article, as their impact is limited.
3.1.1 LSB.
LSB steganography is a message-hiding approach that directly changes the bits that are least significant to the colour of a pixel: the last bit(s). More specifically, it replaces the values of existing bits with the binary value of the message [
372]. The LSB approach is the most conventional [
235,
330] and easiest to implement [
166,
200,
330] steganographic approach. However, LSB has low robustness [
166,
330] and the lowest security of all approaches [
166,
330].
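As an illustration, the following is a minimal sketch of LSB replacement on an 8-bit grayscale numpy array; real tools additionally store a length header, use stego keys, and handle colour channels, all of which are omitted here.

```python
import numpy as np

def lsb_embed(cover: np.ndarray, message: bytes) -> np.ndarray:
    """Replace the LSB of each pixel, scanning left to right, top to bottom."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = cover.flatten()  # flatten() returns a copy, so the cover is untouched
    if bits.size > flat.size:
        raise ValueError("message does not fit in cover")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # clear the LSB, then set it
    return flat.reshape(cover.shape)

def lsb_extract(stego: np.ndarray, n_bytes: int) -> bytes:
    """Read back the first n_bytes * 8 LSBs."""
    bits = stego.flatten()[:n_bytes * 8] & 1
    return np.packbits(bits).tobytes()
```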
Differences in LSB approaches exist and are mainly concerned with which bits’ value to alter. The most ‘basic’ method—
LSB replacement (LSBR)—changes bit values starting in the top-left corner of the image and proceeding rightward and downward [
120]. Alternatively, Stego-key-directed LSB (sometimes called Random LSB, though it is far from random) uses a user-specified stego key to distribute the message over the image [
248,
259]. Another alteration is called
LSB matching (LSBM), which aims to ‘match’ the binary information to the pixels, so that as few pixels as possible are altered [
94,
216,
233,
251,
321,
335]. Edge LSB methods hide more of the secret message in the (sharpest) edges of the image [
46,
57,
159,
385,
413]. Other methods determine the number of bits to be hidden in a pixel based on its grayscale value, which increases the embedding capacity [
220]. Some methods give the user of a tool control over where the embedding takes place [
168].
To improve imperceptibility, the
Optimal Pixel Adjustment Process (OPAP) was added to basic methods, which checks whether the change to a pixel falls within ranges considered not to decrease imperceptibility [
40]. Using a weight matrix was found to increase embedding capacity, security and imperceptibility [
45,
275,
375,
376], while directing embedding based on characteristic values per block was found to not decrease imperceptibility [
54]. Alternatively, shuffling [
398] or upscaling [
164] an image before embedding increases imperceptibility [
398]. Altering the image after LSB embedding was found to increase security [
117], while ensuring that the Local Binary Pattern of an image is preserved [
39] improves robustness.
For colour images, clever analysis of the use of
Red, Green and Blue (RGB) can help to increase imperceptibility [
343]. Alternatively, the use of one of the RGB values to encode whether a hidden message is present in the other two colour channels was implemented [
133]. Colour space conversion (to
Hue Saturation Intensity (HSI)) before hiding a message was also found useful [
260].
3.1.2 PVD.
To embed using the PVD approach, a cover image is partitioned into blocks of two consecutive non-overlapping pixels [
396]. For each block, the difference between the two pixels (the
difference value (DV)) is calculated and then classified into a specific set of ranges. Subsequently, the DV is replaced by part of the secret message, where the DV range dictates how many bits are hidden, ensuring more information is hidden in edges [
412]. Results are more imperceptible than LSB [
157,
166,
330], show a larger embedding capacity [
157,
166,
330] and provide higher security [
330].
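The following is a simplified sketch of PVD embedding for a single pixel pair, using a commonly cited range table; handling of the falling-off-boundary problem discussed below is deliberately omitted, so this is illustrative rather than a faithful reproduction of any cited scheme.

```python
# Wider ranges (edges) hold more bits; smooth areas hold fewer.
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def pvd_embed_pair(p1: int, p2: int, bitstream: str) -> tuple[int, int, int]:
    """Embed the next n bits into one pixel pair; returns the new pair and n."""
    d = p2 - p1
    lo, hi = next((l, h) for l, h in RANGES if l <= abs(d) <= h)
    n = (hi - lo + 1).bit_length() - 1         # bits this pair can hold (width is a power of 2)
    b = int(bitstream[:n], 2)                  # assumes at least n bits remain
    d_new = (lo + b) if d >= 0 else -(lo + b)  # new difference value encodes the bits
    m = d_new - d
    p1_new = p1 - m // 2                       # spread the change over both pixels
    p2_new = p2 + (m - m // 2)
    return p1_new, p2_new, n
```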
Later alterations were mainly made to the selection of blocks. Allowing difference calculation between different directional neighbours (e.g., horizontal, vertical and/or diagonal) was found effective [
47,
134,
224,
291,
349,
359,
412]. Changing the size of the (sometimes overlapping) PVD blocks was shown to effectively improve results [
53,
225,
Alternatively, eliminating the falling-off-boundary problem, one of the major flaws of PVD [
46,
319], improves security [
53,
322]. Combining PVD with LSB approaches was also found effective [
156,
189,
397], where some use both methods within the blocks [
156,
189], while others use the DV to select which method to apply [
397]. Similarly, changing the embedding process based on local contrast of the pixel pair can optimise security and imperceptibility [
278]. Improving the selection of how to change the remainder of the PVD was also tried [
382]. Preservation of the natural histogram of an image increases security [
53,
446]. Others combine PVD with different methods to reduce distortion in the steganographic image [
320]. PVD has been paired with IWT [
349] and LSB [
156,
189,
397].
3.1.3 BPCS.
BPCS was inspired by the human visual system, which cannot perceive any shape information if it is in a very complex binary pattern [
173]. For BPCS, an image is divided into bit-planes [
266,
351]. Each bit-plane is segmented into blocks and each block's complexity is computed, which determines the noisy blocks [
269]. These noisy blocks are then used to hide the secret information. For BPCS, the information to be hidden should be a complex binary pattern; if it is not, it needs to be transformed into one [
269], or it risks being easily detected. BPCS shows good embedding capacity [
351] and provides high imperceptibility [
351] but low security [
269,
357].
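A sketch of the border-complexity measure that BPCS commonly uses to decide whether a block is noisy is given below; the square block size and a threshold of roughly 0.3 are conventional choices, not requirements of the cited works.

```python
import numpy as np

def block_complexity(block: np.ndarray) -> float:
    """Fraction of adjacent bit pairs that differ, out of the maximum possible.

    For an n x n binary block there are 2 * n * (n - 1) horizontal and vertical
    adjacencies in total; a checkerboard pattern scores 1.0 (maximally noisy).
    """
    changes = (np.sum(block[:, 1:] != block[:, :-1]) +
               np.sum(block[1:, :] != block[:-1, :]))
    n = block.shape[0]
    return changes / (2 * n * (n - 1))

# Blocks with complexity above a threshold (commonly around 0.3) are treated as
# noisy and replaced by (suitably complex) message blocks.
```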
Increasing the robustness of BPCS with respect to lossy compression is possible [
273,
351]. Enhancing imperceptibility (and in some cases security) is possible by finding the most suitable method to categorize regions as noisy [
266,
267,
357,
358]. Security can be improved by using all bit-planes [
267]. Alternatively, maintaining the complexity distribution of the cover, so that the BPCS signature is not embedded, was found to increase security [
268]. BPCS was paired with IWT [
309].
3.2 Transform Domain-based Approaches
Transform domain approaches are those that encode the message in the transform coefficients. These approaches are more complex than the spatial ones. Alterations made in the transform domain are generally more robust [
157] and secure [
17]. However, the embedding capacity is lower [
157]. Computationally, these approaches are considerably more complex than spatial ones [
17,
74].
The four most used transforms to hide information in an image are DCT, DWT, DFT and IWT. Though there are more transforms that can be used for information hiding, these will not be discussed in this article.
3.2.1 DCT.
DCT transforms a signal into its elementary frequency components [
74]. It essentially breaks an image down into a summation of sinusoids of varying magnitudes and frequencies. DCT is the most frequently used transform domain method [
74]. For DCT embedding, one first breaks the image into blocks (usually 8x8); afterwards each block is centered around 0 by subtracting 128 [
380]. DCT is applied to each block, followed by compression of the block using a quantization table. Alterations to the resulting quantized coefficients can then hide a message. In general, DCT-based approaches have higher imperceptibility [
166] and robustness [
330] than spatial approaches but less embedding capacity [
166,
330].
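The following sketch shows the blockwise pipeline just described, with JSteg-style LSB substitution on the non-zero quantized AC coefficients standing in for the many embedding rules in the cited works; the use of scipy and row-major coefficient order are our simplifications.

```python
import numpy as np
from scipy.fft import dctn  # 2-D type-II DCT, as used in JPEG

def embed_block(block: np.ndarray, bits: list[int], q: np.ndarray) -> np.ndarray:
    """Embed bits into the quantized DCT coefficients of one 8x8 block."""
    coeffs = dctn(block.astype(np.float64) - 128.0, norm="ortho")  # centre, then DCT
    quant = np.round(coeffs / q).astype(np.int64)                  # q: quantization table
    flat = quant.flatten()
    for i in range(1, flat.size):        # skip the DC coefficient at index 0
        if not bits:
            break
        if flat[i] not in (0, 1):        # JSteg-style: leave 0s and 1s untouched
            flat[i] = (flat[i] & ~1) | bits.pop(0)
    return flat.reshape(8, 8)            # decoding inverts quantization and the DCT
```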
Flexibility was added to DCT embedding by hiding different amounts depending on the values of the quantized DCT image [
15,
17,
90,
201,
264,
274,
305,
454]. Adding a (pseudo)randomness to the order of embedding was also found effective [
75,
264,
295,
392]. Ensuring that hiding starts with the least significant coefficients, working up to the most significant ones, was found effective [
264,
323,
345]. Having variable block sizes or randomly selecting an 8x8 block was found to optimize DCT embedding [
305,
346]. Clustering innocent (majority) and guilty (minority) actors together to determine where to use stego was found effective [
181].
Security can be increased in many ways. An example is opting to model the cover and then adhering to rules from this cover model [
67,
68,
118,
325,
326]. Another example is applying
Syndrome Trellis Coding (STC) to find the optimal embedding solution for each DCT block [
381] or, alternatively, through the use of hypothesis testing [
34]. Security can also be increased through ensuring feature vectors do not change because of embedding [
192] or through distortion minimization procedures for DCT [
151], sometimes with side information [
190,
317]. Additionally, using matrix encoding to hide information in the non-zero coefficients of the quantization table can increase security [
392]. Another example is aiming for the lowest amount of modifications while keeping flexibility in embedding locations [
295]. Another popular approach (Steghide [
142]) increases security by swapping DCT coefficients [
112]. Finally, security can be increased by embedding one message into multiple images [
182].
Optimization of imperceptibility can also be done in several ways. Examples are optimizing the segmentation process [
264] and embedding the message during quantization, not afterwards [
272]. Finally, one can postprocess the steganographic image to increase imperceptibility [
348].
To optimize security and robustness, [
106] opted to embed during compression, where only the rounding of the most uncertain (close to 0.5) coefficients is altered [
106]. Security and embedding capacity were improved by embedding in the difference between two DCT coefficients [
9]. DCT has been paired with LSB [
307].
3.2.2 DWT.
DWT is a hierarchical decomposition of an image, which is based on small waves (wavelets) of limited duration and varying frequency. It allows for the detection of the areas that are most effective for embedding [
166]. To create a DWT, the sum and difference of each pair of horizontally neighbouring pixels are stored; all sums go in the left half of the image, while all differences go in the right half [
120]. Vertically the same principle is applied. The result is called the first-order 2-D Haar-DWT [
120]. Four sub-bands are created, which can be used to determine where in the image information could be effectively hidden [
12]. Compared to DCT, DWT is more robust to compression [
166] and has higher security [
166] but has a lower embedding capacity [
90].
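The construction described above can be written down directly; the following sketch computes one level of the 2-D Haar-DWT on an even-sized grayscale array, with averaging as one common normalization choice.

```python
import numpy as np

def haar_dwt_2d(img: np.ndarray) -> np.ndarray:
    """One level of the 2-D Haar DWT: sums go left/top, differences go right/bottom."""
    x = img.astype(np.float64)
    # Horizontal pass: pairwise sums in the left half, differences in the right half.
    h = np.concatenate([(x[:, ::2] + x[:, 1::2]) / 2,
                        (x[:, ::2] - x[:, 1::2]) / 2], axis=1)
    # Vertical pass: the same principle applied to the rows.
    v = np.concatenate([(h[::2, :] + h[1::2, :]) / 2,
                        (h[::2, :] - h[1::2, :]) / 2], axis=0)
    return v  # the quadrants hold the four sub-bands, LL (top-left) through HH (bottom-right)
```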
Different approaches use different DWT coefficients and differ in how much they hide in each [
56]. Security can be increased by applying Huffman encoding—which assigns shorter binary codes to more frequent symbols—to the image [
263] and/or the secret message [
263,
306]. Different numbers of wavelets have been used; some embed in the first-order 2-D Haar-wavelet [
56,
132], while others use more or less than two wavelet transforms [
2,
202,
290,
373]. Alternatively, only particular wavelets can be processed multiple times [
12]. Imperceptibility was increased by using a Singular Value Decomposition (SVD) on each of the sub-bands [
88,
132]. Some use the adjacent DWT coefficient differences or the difference value between blocks to embed in an image [
202]. Imperceptibility was also increased by using Diamond Encoding to dictate where to embed [
8]. Embedding only in the DWT of skin tones in an image was found to increase imperceptibility [
48]. DWT approaches have been paired with DCT [
124,
308,
327] and LSB [
21,
163].
3.2.3 IWT.
IWT approaches use the same transform as DWT approaches, with the difference that the IWT maps integers to integers instead of to continuous values [
254]. Due to this discrete method, IWT avoids the floating-point precision problems that DWT suffers from [
166]. When comparing imperceptibility using IWT and DWT, it was shown that IWT is superior [
261,
312]; IWT also shows high security [
166].
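The integer-to-integer property is easiest to see in the lifting form of the Haar transform, sketched below for signed integer arrays; this illustrates why the IWT is exactly invertible and free of floating-point error, and is not any specific cited scheme.

```python
import numpy as np

def int_haar(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Forward integer Haar step (lifting): integers in, integers out."""
    d = a - b            # detail coefficient
    s = b + (d // 2)     # approximation: floor((a + b) / 2), kept as an integer
    return s, d

def int_haar_inverse(s: np.ndarray, d: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Exact inverse of the lifting step, so no precision is ever lost."""
    b = s - (d // 2)
    a = b + d
    return a, b
```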
As for DWT approaches, differences exist in which sub-bands are used. Some use only one predefined sub-band (mostly HH) [
377], and some decide which to use adaptively based on the length of the message [
254]. Imperceptibility was increased by matching blocks of the image to the message so that as few pixels as possible are altered [
261]. Edge-based IWT embedding was also tried [
1]. Minimisation of the embedding error after embedding, using the
Optimum Pixel Adjustment (OPA) algorithm, was also tried [
261]. Security can be improved by using chaotic maps before embedding [
377]. Others used a metaheuristic optimization algorithm to determine the most efficient pixels to hide information in, which was found to increase security [
138]. Alternatively, security and imperceptibility can be increased by using a
Genetic Algorithm (GA) to determine which coefficients to alter so that the highest imperceptibility is achieved [
292,
316]. Diamond Encoding to dictate where to hide in an image [
436] was shown to improve imperceptibility and embedding capacity. IWT has been paired with LSB [
91,
377], BPCS [
309] and PVD [
349].
3.2.4 DFT.
DFT is computationally the most complex of the transform approaches [
92]. In DFT an image is decomposed into its sine and cosine components [
172]—or the phase and magnitude representation [
136]. The magnitude components are most often used, as spatial shifts do not affect them [
136]. DFT steganography is the least used transform domain approach, due to a rounding error that decreases security [
135]. Embedding information in the DFT domain yields lower imperceptibility [
166], security [
166] and embedding capacity [
166].
The addition of randomization to the embedding location increases imperceptibility [
3,
59]. Adding obfuscation was found to increase security [
310]. DFT has been combined with DCT [
136].
3.3 Deep Learning Approaches
Deep learning approaches use a deep network to hide information. In general, these approaches are more robust [
166] and imperceptible [
166]. Currently, the most used DL methods for image steganography are CNNs [
33], though GANs are also used. The main advantage of GANs is that they can generate images, which eliminates the need for a cover image. Other approaches (e.g., Monte Carlo Tree Search [
255]) are also used but will not be further discussed in this review.
Currently, one of the factors that hampers further improvement of these schemes is the lack of realistic datasets. BOSSbase is currently the most commonly used [
126], yet this dataset does not go beyond grayscale images [
314] and is created by one steganography scheme, HUGO [
280], with the same embedding rate for all images. Though a newer database (ALASKA) was created, it is also not optimal as only three embedding schemes were used, all of which are (adaptive) transform methods [
66]. It was ensured that all images have one ‘steganalysis difficulty’ level by altering the embedding rate [
66].
3.3.1 CNN.
CNNs are particularly suitable for images. CNNs can be more powerful than other methods because they do not require expert knowledge of steganography [
33]. Another advantage is that they can be retrained without needing to be redesigned, e.g., for a new tool. CNNs generally show high security [
33].
The major difference between methods lies in the architecture used. Some use vanilla CNNs [
13,
14,
239,
250,
334]; others use residual blocks [
399]. Up-convolutional and deconvolutional functionalities have also been added [
84,
435], as well as adversarial embedding [
20,
226,
367]. Others opted to increase the reversibility of neural-network-based schemes by employing
Invertible Neural Networks (INNs) [
232,
409], while exploiting autoencoders has also been researched [
29,
353]. Others use a
Neural Style Transfer (NST) network for embedding in the style features of an image [
116]. However, not everyone discloses details of their model architecture [
364].
Some of these methods fall short on one of the evaluation criteria: robustness [
13,
84], embedding capacity [
250], security [
13,
364] or imperceptibility [
13,
364,
399,
435]. Others excel on these evaluation criteria: robustness [
29,
239,
250,
353,
364,
399,
409,
435], embedding capacity [
14,
84,
232,
239,
353,
409], security [
14,
20,
116,
226,
239,
250,
334,
353,
367,
399,
435] or imperceptibility [
84,
116,
232,
250,
353,
409].
3.3.2 GAN.
GANs combine two or more neural networks that have to compete with each other. A generator creates something (here, a steganographic image), while a discriminator tries to detect whether it is genuine [
123]. For steganography the discriminator network is given steganographic images and cover images, which it must classify as such [
33]. The use of a discriminator should lead to more imperceptible and secure steganography, as the training process stops once it can do no better than random guessing [
33]. All advantages of CNNs over classical approaches previously mentioned apply to GANs as well.
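A schematic training step for such a steganographic GAN is sketched below in PyTorch; the generator and discriminator modules, the loss weighting, and the absence of a message extractor are all our simplifications of the cited designs.

```python
import torch
import torch.nn.functional as F

# `gen` maps (cover, message) -> stego; `disc` outputs P(image is stego) in (0, 1).
# Both are assumed to be ordinary torch.nn.Module CNNs (hypothetical here).
def gan_step(gen, disc, opt_g, opt_d, cover, message):
    stego = gen(cover, message)
    zeros = torch.zeros(cover.size(0), 1)
    ones = torch.ones(cover.size(0), 1)

    # Discriminator: label covers 0 and stego images 1, minimise cross-entropy.
    opt_d.zero_grad()
    d_loss = (F.binary_cross_entropy(disc(cover), zeros) +
              F.binary_cross_entropy(disc(stego.detach()), ones))
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator while staying close to the cover.
    opt_g.zero_grad()
    g_loss = (F.binary_cross_entropy(disc(stego), zeros) +  # security term
              F.mse_loss(stego, cover))                     # imperceptibility term
    g_loss.backward()
    opt_g.step()
```

Training stops once the discriminator's accuracy settles at chance level, which is exactly the stopping criterion described above.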
Different GAN architectures were tried, where a number of algorithms need a cover image for embedding [
44,
140,
214,
246,
249,
361,
370,
415,
439,
440,
453], while others generate their own [
149,
213,
337,
379,
390,
425,
451,
452]. Changes to the GAN model were made for steganography. For example, inception modules [
440] were used to reduce computational complexity and overfitting. Alternatively, use of the
Regular-Singular Method (RS method) realizes lossless data embedding through an invertible noise addition and a discrimination function [
44]. Others have adopted a U-Net-based GAN architecture [
415]. Another extension is the use of a CycleGAN architecture [
249], which essentially uses two complete GANs instead of one. Some exploit a
Deep Convolutional GAN (DCGAN) [
149], use more than one discriminator [
337] or add a noise layer network to the architecture [
453]. Another way the basic GAN structure was improved is through the use of both residual connections and dense connections [
439]. Alternatively, addition of a channel attention module [
361] or of a mutual information mechanism alongside a hierarchical gradient decay [
390] was also tried. Some use a GAN to hide information into contours rather than full images, to later (through another GAN) create a full steganographic image [
451]. Others made use of existing diffusion models [
425] or existing generative models [
452] for their embedding schemes.
A difference between methods is their performance on the four criteria. With regard to robustness, some methods performed very well [
214,
249,
425,
440,
453]. Some excelled on the security criterion [
140,
149,
213,
214,
246,
249,
337,
361,
379,
390,
415,
425,
439,
440,
451,
453], while [
370] did not. For the embedding capacity criterion, [
44,
361,
390,
439,
452] performed very well as opposed to [
140,
149,
453]. For imperceptibility [
44,
249,
337,
361,
390,
453] perform well, while [
213] did not. It should be noted that many DL approaches are not (completely) reversible. One example [
440] created a hiding scheme that does not recover exactly the same information. Other methods lack an extractor algorithm altogether [
214,
337]. Such schemes are generally not usable for communicating secret information.
3.4 Adaptive Approaches
Adaptive steganography approaches adaptively apply a spatial or transform domain approach. Adaptability means that the cover image features are used in the embedding process [
166]. It generally takes statistical features into account for embedding [
49], where the statistical analysis dictates where to embed. The method is characterized by selection of which pixels should be used to embed into, based on the statistical features [
157]. Typically a cost function assigns a cost to changing each pixel, and the embedding minimizes the total cost [
328]. Adaptive approaches show good robustness [
49] as well as imperceptibility. Steganalysis struggles with adaptive methods, as they confine the modifications to the hard-to-model regions of the cover [
110], and thus security is generally high.
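To illustrate the cost-minimization idea, the toy sketch below derives a per-pixel cost map from local variance and embeds only in the cheapest (most textured) pixels; actual schemes such as HUGO, WOW and UNIWARD use far more elaborate cost functions and syndrome-trellis coding (see Section 3.2.1) instead of this greedy selection, which a receiver could not invert on its own.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def embed_adaptive(cover: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Flip the LSBs of the pixels whose modification cost is lowest."""
    x = cover.astype(np.float64)
    local_mean = uniform_filter(x, size=3)
    variance = uniform_filter(x ** 2, size=3) - local_mean ** 2  # local texture
    cost = 1.0 / (variance + 1e-6)        # smooth regions are expensive to touch
    order = np.argsort(cost, axis=None)   # cheapest (hardest-to-model) pixels first
    stego = cover.flatten()               # flatten() copies, so the cover is untouched
    idx = order[:bits.size]
    stego[idx] = (stego[idx] & 0xFE) | bits
    return stego.reshape(cover.shape)
```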
One difference between adaptive approaches is the method they use after the statistical analysis, where most methods limit themselves to a domain and not a single method. Examples are spatial [
96,
134,
143,
156,
188,
189,
211,
212,
218,
220,
235,
237,
257,
259,
261,
265,
266,
280,
311,
315,
328,
333,
359,
366,
386,
404,
413,
447] and transform [
23,
58,
81,
131,
144,
150,
165,
261,
276,
293,
304,
305,
323,
448]. Some can be applied to both domains: [
97,
148,
219,
250].
With regard to the statistical analysis of which pixels to embed into, differences exist. Some use the difference value between blocks of pixels (2x2, 3x1, etc.) to determine where and how much to embed [
134,
156,
188,
189,
235,
359,
413].
Adaptive Phase Modulation (APM) [
58] hides more or all of the information in edges—and thus in noisy parts [
204,
385]—or uses adaptive block sizes within an image [
304,
305,
416]. Converting the secret message to symbols that are then hidden in busier areas was found to increase security [
447]. Others decide the amounts to embed based on analysis of blocks or bit-planes [
237,
266]. Model-based methods—which model the cover image and adhere to that model's rules during embedding—were found to increase security [
328,
333].
Some well-known algorithms focus on
distortion minimization (DM). HUGO [
280] does so by creating a weighted norm between higher-order statistics of pixel differences, with higher weights given to sparsely populated bins. UNIWARD [
148] computes costs in the wavelet domain, basing them on the relative changes of the wavelet coefficients in the HH subband [
148], while WOW [
143] keeps cost low by assessing post-embedding changes in all directions. Other approaches that focus on (sometimes asymmetric) distortion optimization have also been tried [
31,
96,
97,
144,
256]. Though critical review of DM methods has shown them to suffer from security flaws, a solution was given in [
196].
Some [
129–
131] implemented uniform embedding, which is based on the idea that by uniformly spreading the embedding alterations to the DCT coefficients, statistical changes can be minimized, which increases security [
131]. Additions to [
131,
148] spread the payload between the luminance and chrominance components to increase security [
360]. Using the intrinsic energy of the image to direct embedding [
150] improves security. Preserving colour channel correlations has also been shown to increase security [
366,
386]. Imperceptibility was increased by calculating error ratios for different hiding patterns and choosing the pattern with the lowest error rate for embedding [
315]. It was shown that determining less predictable hiding places [
211] or clustering modification directions during embedding enhances security [
212]. Alternatively, using the first colour channel and the stego key to direct embedding was found to be effective [
259]. Security was found to increase when dividing a message over several images based on image texture features [
219]. Letting the modification magnitude be determined by the quality factors of JPEG compression [
448] increases security. Increasing robustness so that steganographic messages can survive JPEG compression was effective [
449]. Employing the wet paper model to minimize the channel error rate was found to increase security and massively increase robustness [
434]. Another scheme ‘resets’ the last few DCT coefficients and randomly hides bits in these reset coefficients [
323]. Using gray code bit-planes rather than natural ones increases security [
265]. Clustering embedding positions to then smooth the image increases security [
404]. Machine learning was also used to determine embedding positions [
165], amongst which are CNNs [
250,
311], GA [
23,
24,
168,
253,
276,
293,
294,
331,
383,
388,
426], reinforcement learning [
256,
365], fuzzy neural networks [
81], and
Particle Swarm Optimization (PSO) [
81,
218,
257,
261].
4 Steganalysis Methods
Steganalysis is concerned with detection of hidden information (i.e., binary classification), whereas forensic steganalysis aims to provide further information on the hidden message. Steganalysis has received significantly less attention from academia than steganography [
74]. One of the biggest challenges within steganalysis is the
Cover Source Mismatch (CSM): a steganalysis scheme trained on material from one source loses accuracy on material from another [
119,
198]. Papers have aimed to detect parts of the image preprocessing pipelines to aid in resolving CSM [
284]. When interpreting the results of individual papers, it is important to know whether CSM is present.
To preserve an overview of the different approaches, forensic steganalysis approaches will not be discussed in separate sections. This was decided because most forensic steganalysis approaches are similar to those taken for steganalysis, so separate sections would duplicate information within the review and give the reader less of an overview of the possible approaches. However, to allow differentiation between the two research areas, all forensic steganalysis approaches will also be mentioned separately.
Within this review an overview of the performance of steganalysis methods is given, where the reported performance in the literature is summarised in tables, which can be found in the supplementary material. A summary of all results is given in topological sorting graphs.
4.1 Visual Image Inspection
This is the simplest form of steganalysis [
99], where one visually inspects the image with the naked eye [
169]. As most new embedding schemes do not result in perceptible quality degradation, visual inspection has a questionable reliability whilst also being hard to automate [
99]. Visual inspection has become less reliable over time [
169]. However, when the parts of the image with no embedded information are removed, visual inspection can detect steganographic content [
169]. For visual inspection several features can be used: file size differences, number of unique colours or the distribution of 1s and 0s [
169]. However, it only works for the simplest steganography algorithms; moving from basic LSB to a slightly improved LSB already makes visual inspection ineffective [
169,
395].
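Two of the features named above can be computed in a few lines, sketched below: amplifying the LSB plane to full contrast (one way to remove the parts of the image that carry no embedded information) and counting unique colours; as noted, both only expose the simplest schemes.

```python
import numpy as np

def lsb_plane(img: np.ndarray) -> np.ndarray:
    """Amplify the least significant bit-plane to full contrast for visual inspection."""
    return (img & 1) * 255  # uniform noise in smooth areas can suggest embedding

def unique_colour_count(rgb: np.ndarray) -> int:
    """Stego palettes often contain unusually many near-duplicate colours."""
    return np.unique(rgb.reshape(-1, rgb.shape[-1]), axis=0).shape[0]
```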
4.2 Signature Detection
In signature-based steganalysis the tool itself is analysed to see whether it would create a specific signature, which is then looked for in suspected images. An advantage of signature detection is its simplicity, alongside the promising results for sequentially embedded messages. However, [
271] showed that signature-based steganalysis is hard to automate and that the reliability of these schemes is often questionable. When a signature is found, it guarantees neither the use of steganography nor of a specific tool [
18] due to the existence of many false positives. For signature steganalysis a signature can be two different things.
First, it can be a piece of information that is added by a tool [
169]; this is usually easily detected and provides information on the tool used. Examples include Jpegx, which adds 5B 3B 31 53 00 [
139,
169], and Hiderman, which hides the string CDN alongside the hidden information [
169]. Other possible signatures, though less significant because they are naturally more prevalent, are changes in metadata [
Examples include Outguess and VSL, which always set the quality factor (QF) to 75 and 100, respectively [
5].
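Detecting this first kind of signature amounts to a byte search, as in the sketch below; the two entries in the table come from the examples above and are illustrative, not an exhaustive signature database.

```python
# Known appended-byte signatures (from the examples above; illustrative only).
SIGNATURES = {
    "Jpegx": bytes.fromhex("5B3B315300"),
    "Hiderman": b"CDN",
}

def scan_for_signatures(path: str) -> list[str]:
    """Return the names of all tools whose signature occurs in the file."""
    with open(path, "rb") as f:
        data = f.read()
    return [tool for tool, sig in SIGNATURES.items() if sig in data]
```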
Alternatively, a signature can be a pattern within the image that is repeated by a stego tool during embedding [
169,
170]. When found, this pattern can help to detect steganography while also identifying the tool used [
162]. This type of signature differs from the previous type in that it was not purposefully hidden but occurs due to the degradation or unusual characteristics that appear during embedding. To find this pattern, it is necessary to create numerous cover–stego pairs and compare these until a pattern emerges [
161]. One example is Hide and Seek, whose steganographic image palette entries are all divisible by 4 [
162,
169], while Gifshuffle produces images that contain a randomized palette as the signature [
99]. YASS [
346] has a signature because the embedding host block location was not completely randomized [
210]. A signature specific to BPCS steganography is present in the complexity histogram where a valley is found [
267].
Through the use of fuzzy rules, signatures were found [
324] for YASS [
346], MB [
325] and PQ [
106]. Another signature is the
close colour pair (CCP): after LSB embedding, an image holds more CCPs [
111]. An alternative is the count of the neighbouring colours rather than the CCP [
393]. F5 [
392] was shown to add a characteristic header field [
18]. Embedding in JPEG images through spatial methods also leaves a signature as the quantization table will differ from the standard one [
99,
103]. A signature can also be found within the features of the colour spaces of an image [
126]. Alternatively, some have aimed to generalise the search for signatures so that extraction of signatures was automated [
18].
4.3 Targeted Steganalysis
Targeted steganalysis uses probabilistic models to differentiate between steganographic and cover images [
169]. Extensive analysis of methods shows which statistical properties change due to embedding [
169]; therefore, one needs an in-depth understanding of the hiding scheme [
169]. Thus, targeted steganalysis is method specific in nature. These method-specific schemes typically result in accurate detection [
271].
4.3.1 LSB.
One of the most used approaches for LSB steganalysis is histogram analysis, which constructs a histogram of the questioned (difference) image [
35,
36,
60,
113,
115,
174,
217,
228,
402,
403,
438,
444,
445]. Another approach investigates the changes that occur to colour pairs after LSB embedding, i.e., two consecutive colours, where one colour is present in two pairs [
105,
175,
177,
363]. More general analysis of differences between neighbouring pixels was also implemented [
111,
393,
442]. Alternatively, one can base detection on a
Weighted Stego (WS) image with three components: cover pixel prediction, least-squares weighting, and bias correction [
26,
101,
180,
429]. Some have used
similarity weights (SWs) between pixels (and channels) for detection [
340,
341]. The seventh and eighth bit-planes, where statistical differences occur after embedding, were also found helpful [
178,
179]. Use of higher-order statistics to derive a detection equation improved results [
85]. In [
450] a relation was found between the gradient energy forms and the embedded message’s length. Additionally, combining several separate methods into one can strengthen detection [
25]. It was also shown that entropy and joint entropy can be combined for successful LSB steganalysis [
62].
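Many of these histogram-based detectors build on the classic chi-square attack on Pairs of Values (which reappears in Sections 4.3.4 and 4.4); a sketch of the standard test for 8-bit grayscale images is given below, with the usual expected-count-above-5 validity rule as our assumption.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_attack(img: np.ndarray) -> float:
    """Probability that LSB replacement has equalized the pair-of-value bins."""
    hist = np.bincount(img.flatten(), minlength=256).astype(np.float64)
    even, odd = hist[0::2], hist[1::2]
    expected = (even + odd) / 2.0      # full embedding drives both bins to their mean
    mask = expected > 5                # keep only bins where the test is valid
    stat = np.sum((even[mask] - expected[mask]) ** 2 / expected[mask])
    dof = int(mask.sum()) - 1
    return float(1.0 - chi2.cdf(stat, dof))  # near 1.0 suggests an embedded message
```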
One of the benchmarking methods is RS steganalysis. It divides image pixel groups based on whether noise increases or decreases after flipping the LSB; the occurrence of both situations can then be used to detect LSB [
102]. RS was improved by using different (sometimes dynamic) masks [
175,
177]. Inspired by RS, [
87] found that LSB embedding heavily influences the statistics of sample pairs of signals, which was further improved in [
176,
231]. Additionally, cardinality changes within subsets of pixels can be used for steganalysis [
86]. It was shown that embedding in stego and cover images gives different alteration rates [
152].
The search for an optimal feature set for steganalysis—one with the richest features at the lowest dimensionality, so-called
Rich Models (RMs)—has been extensively researched. For RMs some of the foundational work can be found in [
78,
107], though many different feature sets were exploited [
6,
28,
38,
63]. Some feature extraction methods and domains are more popular, e.g., using information from residuals [
121,
424] or, alternatively, feature extraction from regions that are more likely to be used for embedding [
368,
369]. Some RM methods do not focus on the extraction of features but on the optimization of the classification afterwards [
208,
234]. One alternative to RM is
Model-Based (MB) detection, where cover images are modelled to test the suspected images against this model [
69,
70,
72,
95].
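The flavour of such feature sets can be conveyed with a small sketch: one high-pass residual, truncation, and a co-occurrence histogram, which is the basic recipe behind rich models; the filter, truncation threshold, and co-occurrence order below are arbitrary small choices, whereas real rich models combine dozens of filters into tens of thousands of features.

```python
import numpy as np

def rm_like_features(img: np.ndarray, T: int = 2) -> np.ndarray:
    """Truncated-residual co-occurrence features in the spirit of rich models."""
    x = img.astype(np.int64)
    resid = x[:, 2:] - 2 * x[:, 1:-1] + x[:, :-2]  # one second-order high-pass filter
    resid = np.clip(resid, -T, T) + T              # truncate to [-T, T], shift to [0, 2T]
    base = 2 * T + 1
    # Horizontal co-occurrences of 3 consecutive residuals -> base**3 histogram bins.
    codes = resid[:, :-2] * base**2 + resid[:, 1:-1] * base + resid[:, 2:]
    return np.bincount(codes.flatten(), minlength=base**3) / codes.size
```

A classifier (in the cited works often an ensemble classifier) is then trained on these feature vectors for cover and stego images.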
Some have opted to use machine learning for LSB steganalysis. These methods use different architectures and types of machine learning, amongst which are vanilla neural networks [
252] and CNNs with different specifications [
7,
79,
298–
300,
313,
374,
391,
400,
401,
405,
408,
414,
417–
419,
427,
441],
dynamic evolving neural fuzzy inference system (DENFIS) [
229], support vector machine recursive feature elimination [
229], a Siamese CNN [
419], residual blocks [
405], federated learning [
414] and ensemble classifiers [
108,
407].
Forensic LSB steganalysis has been applied by [
26,
85,
101,
102,
113,
176,
180,
231,
363,
427,
429,
444,
445,
450]. Additional information in the form of message length (or embedding rate) [
26,
85,
101,
102,
113,
176,
180,
231,
429,
444,
445,
450], pixel-level detection [
427] and threshold used for embedding [
363] has been investigated.
4.3.2 PVD.
Most PVD steganalysis methods are histogram-based methods, where the specific histogram each one uses differs [
215,
431]. Inspired by this, a scatter plot based on theoretical and observed frequencies of the DV between pixel pairs was used [
223] to show correlations. The WS image approach has been applied as well [
437].
Within PVD steganalysis, additional information has also been gathered from the images that are analysed. Forensic PVD steganalysis in the form of calculating an embedded message length (or embedding rate) has been applied by [
431,
437].
4.3.3 BPCS.
For BPCS a wavelet characteristic function has been implemented [
230]. An alternative effective feature for steganalysis is the bit-plane-block complexity difference sequence [
362].
4.3.4 DCT.
Stegdetect [
296], a commercial tool, can detect DCT embedding by exploiting a version of
Pairs of Values (POVs) [
395]. Similarly, histogram-based methods [
104,
160,
205,
206,
430,
443] and the adjacency matrix of DCT coefficients can be used for steganalysis [
199]. Model-based methods, which model the cover medium so that differences between this model and a suspected image can be analysed, have been applied [
5,
30,
301,
302,
371,
455]. [
394] adapted several higher-order statistical LSB steganalysis methods [
87,
101,
102,
105] to the DCT domain.
RMs have also been applied to DCT steganalysis [
4,
147,
193,
285,
288,
347]. Many RM methods extract the features through use of a Markov process [
51,
114,
282,
339], while others derive them (amongst others) from histograms [
52,
77], the wavelet domain [
241,
242,
456] or the cosine domain [
98,
146,
195,
227,
245,
281]. Some RM-based methods do not focus on feature extraction but on the optimization of the classification method afterwards [
128,
183,
197,
286]. Additionally, determining the most dominant features to reduce dimensionality has also been applied [
243].
Recently, machine learning has been applied to DCT steganalysis. These methods include CNNs with different specifications [
55,
153,
406,
423,
432], applying transfer learning to existing architectures [
420,
422], different detectors for different quality factors [
421] and PSO [
336].
Forensic DCT steganalysis has been applied by [
5,
104,
199,
281,
371,
394,
430,
443]. Additional information in the form of message length (or embedding rate) [
104,
371,
394,
430,
443], which embedding method was used [
5,
281] and the change rate [
199] has been investigated.
4.3.5 Other Approach-specific Steganalysis Methods.
No methods were found that are targeted only at DWT-, DFT-, IWT-, CNN- or GAN-based steganography methods.
4.4 Universal Steganalysis
Within this review, a universal (or blind) approach is categorized as such when it is able to detect schemes from at least two of the three domains (spatial, transform and deep learning). A method that detects, for example, different DWT steganographic schemes has been categorized as targeted. Note that this sometimes differs from other sources’ classification of universal steganalysis approaches. Consequently, some research that was presented as universal steganalysis is regarded as targeted in this review.
One of the first universal steganalysis approaches [
395] revolved around POVs, which are pixel values, quantized DCT coefficients or palette indices that get mapped to each other through LSB flipping. This distribution can then be used for steganalysis [
395]. A more general analysis of the differences between neighbouring pixels has also been found effective [
127,
207]. Since then, many other approaches have been implemented. Histogram analysis was found useful [
137], and the seventh and eighth bit-planes, where statistical differences occur between stego and cover images, have also been exploited [
10]. Using compressive sensing to distinguish stego from cover has also been implemented [
277]. Fusing several individual methods has shown its merit as well [
185].
RMs are the most researched universal detection scheme [
11,
37,
125,
145,
194,
221,
384,
411], where some used (amongst others) features from histogram analysis [
61,
83,
209], from wavelet domain [
93,
122,
236,
240,
338,
410] and from Markov chains [
141,
332,
355]. Analysis of the number of changes to a file based on extracted features was successful [
287].
Recently, machine learning has been applied to universal steganalysis. Different architectures were implemented: regression trees [
378], CNNs [
389] and a CNN with residual blocks [
27,
344], which was further investigated in [
32].
Some research has not focused on the construction of new methods but on further testing of existing methods [
175,
184,
186].
Forensic universal steganalysis has been applied by few researchers [
287,
344,
378]; additional information was looked for in the form of number of embedding changes [
287,
378] and detecting the embedding method that was used [
344].
4.5 Summary of Steganalysis Performance per Steganography Method
In this section, topological sorting diagrams depict the relative performance of the previously discussed steganalysis approaches on some of the most-targeted steganography methods. The performance of each method is taken from its publication; tables showing how these results are reported in the literature can be found in the supplementary materials. It is important to realize that these diagrams do not depict numerical differences but show a hierarchy based on reported results. That is, when one method is shown to outperform another, it is placed above it, but the distance between two methods does not depict a numerical difference in their performance. Caution should always be exercised towards these results, as the ordering is based purely on the results as reported. Not all results were obtained on the same dataset: most studies created their own steganography dataset and reported results on it. Reported results can therefore vary across datasets, which could alter the topological sorting diagrams. Still, the diagrams give a good impression of the relative performance of different steganalysis schemes.
Within all topological sorting depictions in this section, three different types of relationships are distinguished: method A can outperform method B, method A can be comparable to method B, and method A can be outperformed by (underperform) method B. Interesting to note is the difference in the way results are reported: in some cases one method claims to be comparable to several others, even though those others were found to outperform or underperform each other, which suggests that being comparable to all of them is not possible. This could be due to differences in datasets but also to the interpretation of ‘comparable’ and ‘outperforms’. Some researchers call a 0.05% accuracy increase ‘outperforms’, while others call this a ‘comparable’ result.
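The ordering itself is a standard topological sort over the reported ‘outperforms’ relations; the sketch below shows the construction with Python's standard graphlib on placeholder method names, where a CycleError would signal exactly the kind of contradiction discussed below (none were found).

```python
from graphlib import TopologicalSorter

# Hypothetical "A outperforms B" claims harvested from papers (placeholders only).
# TopologicalSorter maps each node to its predecessors, i.e., the methods it beats.
outperforms = {
    "method_A": {"method_B", "method_C"},
    "method_B": {"method_D"},
    "method_C": {"method_D"},
}

# static_order() lists methods from most outperformed to top of the hierarchy;
# it raises CycleError if the reported results contradict each other.
print(list(TopologicalSorter(outperforms).static_order()))
```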
The spatial domain steganography methods that have the most reported results (HILL, S-UNIWARD, HUGO and WOW) are shown in Figure
3. The transform-based steganography methods with the most reported results (JSteg, JPHide&Seek, Outguess, nsF5, MB, F5, J-UNIWARD, StegHide and UED) are shown in Figures
4 and
5.
A few notable observations can be made from the different topological sorting graphs. First of all, the steganalysis methods for which performance is reported (sometimes alongside that of others) differ greatly per steganography method, as does their place in the ranking (which is rarely the same). From this, we can conclude that no steganalysis method exists that is applicable to the majority of the steganography methods.
Another observation is the number of methods that do not compare their performance with any other (in the figures these have no outgoing arrows). Therefore, these methods cannot be compared and their relative performance is unknown. Thus, it is strongly recommended for future research to always compare new schemes with previous ones. By doing so, relative performance can be more readily judged, which increases the potential usefulness of a new publication.
It is noticeable that some steganography schemes have papers comparing many steganalysis schemes against each other, which results in a form of ranking in the topological sorting diagrams. However, for other schemes (e.g., StegHide or JPHide), comparisons in papers are more often one on one, with no later additional comparisons. This makes assessment of relative performance difficult and results in topological sorting diagrams that are not very informative.
Lastly, it is interesting to note that no contradictions within the orderings were found. That is, no method that was reported to outperform another was later found not to outperform that method. This indicates that the relative performance of the methods did not differ despite differences in settings, which strengthens the results depicted in the topological sorting diagrams.
5 Discussion and Practical Implications
Steganography has been researched much more widely than its counterpart, steganalysis. The most popular steganography methods are, as can be seen from the topological sorting graphs, mostly (adaptive) transform domain methods. This is likely because these methods score far better on the four evaluation criteria for steganography than older and more basic schemes like LSBR and LSBM. Newer steganography methods, especially the deep-learning-based ones, show even greater potential. At the same time, these newer methods suffer from one great downside compared to the more established methods: they are often non-reversible. As a result, despite their good performance, they are in their current form not usable in practice (if no one can extract the hidden information, there is no use for the steganographic method). Therefore, steganography research should focus on enabling the practical use of newly developed schemes by ensuring reversibility at all times. Without this reversibility, the potential of new schemes cannot be fully realized, stopping them from finding practical applications.
For steganalysis most effort has been put into the construction of Rich Models. Though these models have achieved good performance on the steganography scheme they were designed for, they cannot easily generalise to more methods (as can be seen in Figures
3 through
5). Machine-learning-based steganalysis has the potential to overcome this generalisation problem; however, more research needs to be conducted for it to reach this potential. This future research area could take steganalysis from a mainly academic study area to it reaching practical applications, for example, within law enforcement agencies.
For practical applications of steganalysis it is of importance to focus on two separate goals. First, universal detection methods that truly detect the presence of most steganography methods are needed. Currently, steganalysis methods are shown to not generalise to more methods, as can be seen from Figures
3 through
5, where no single method comes out on top in most of the topological sorting graphs. Second, forensic steganalysis should move further towards the extraction of messages than is currently aimed for, as the mere detection of the presence of steganography does not answer most of the questions a practical application of steganalysis would raise. If these next steps are not taken, steganalysis will continue to lack connection to the institutions and practitioners that could use its breakthroughs in their work and apply this research in practice. When these next steps are taken, the practical usability of (forensic) steganalysis (e.g., within law enforcement) can greatly improve.
With this survey a clear basis for future research into steganography and steganalysis is provided.