On Detection of Median Filtering in Digital Images
ABSTRACT
In digital image forensics, it is generally accepted that intentional manipulations of the image content are
most critical and hence numerous forensic methods focus on the detection of such ‘malicious’ post-processing.
However, it is also beneficial to know as much as possible about the general processing history of an image,
including content-preserving operations, since they can affect the reliability of forensic methods in various ways.
In this paper, we present a simple yet effective technique to detect median filtering, a widely used denoising
and smoothing operator, in digital images. As a great variety of forensic methods rely on some kind of linearity
assumption, the detection of non-linear median filtering is of particular interest. The effectiveness of our method
is backed by experimental evidence from a large image database.
Keywords: digital forensics, median filter, processing history, image processing
1. INTRODUCTION
Digital image forensics has recently become a widely studied stream of research in multimedia security.
Ubiquitous digital imaging devices and sophisticated editing software gave rise to the need for forensic toolboxes that
can blindly assess the authenticity of digital images without access to the source image or source device1, 2 or the
aid of an auxiliary watermark signal.3 When reasoning about the authenticity of digital images, it is necessary
to have at least a rough working definition of what constitutes a manipulation and what is considered to be a
‘legitimate’ post-processing.4 It is generally accepted that intentional manipulations of the image content (e.g.,
copy & paste operations or image splicing) are more critical and hence numerous forensic methods focus on the
detection of such ‘malicious’ post-processing. However, it is also beneficial to know as much as possible about the
general processing history of an image, including content-preserving operations, such as compression,5 contrast
enhancement,6 sharpening,7 and denoising.
Even though such image processing primitives typically do not harm the authentic value of an image, they are
of interest in a forensic examination of an image since they can affect forensic methods in various ways. First, the
actual state of an image prior to manipulation may influence the set of tools we are using to analyze the image or
our interpretation of the evidence derived from these tools. This is related to the field of steganalysis, where, for
instance, the choice of a suitable spatial-domain detector should be made conditional to the cover properties.8
Second, certain post-processing steps may interfere with or diminish subtle traces of previous manipulations and
thus decrease the reliability of forensic methods.
In the course of this paper, we shall focus on the median filter, a well-known denoising and smoothing
operator.9 In line with what was mentioned above, we believe that the detection of median filtered images is
of particular interest since a great variety of image forensic techniques rely on some kind of linearity assumption.
Because median filtering is a highly non-linear operation, it is likely to affect the reliability of these methods. A
typical example is the detection of resampling,10 which employs a local linear predictor of pixel intensities and
was shown to be vulnerable to median filtering.11
The rest of this paper is organized as follows: Starting from a short review of basic properties of the median
filter in Sect. 2, we will center on the so-called streaking artifacts in Sect. 3 and show how this characteristic can
actually be used to detect median filtering in bitmap images. Since forensic methods are generally desired to be
robust against lossy post-compression, Sect. 4 will focus on detection of median filtering after JPEG compression.
Both sections are underpinned by detailed experimental results from a large database of images. Finally, Sect. 5
concludes the paper.
Further author information: matthias.kirchner@inf.tu-dresden.de, fridrich@binghamton.edu
2. MEDIAN FILTERED IMAGES
Given a set of random variables X = (X1 , X2 , . . . , XN ), the order statistics X(1) ≤ X(2) ≤ · · · ≤ X(N ) are
random variables, defined by sorting the values of Xi in an increasing order. The median value is then given as
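\[
\operatorname{median}(X) =
\begin{cases}
X_{(K+1)} = X_{(m)}, & \text{for } N = 2K + 1, \\
\tfrac{1}{2} \left( X_{(K)} + X_{(K+1)} \right), & \text{for } N = 2K,
\end{cases}
\tag{1}
\]
where m = K + 1 is the median rank. The median is considered to be a robust estimator of the location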
parameter of a distribution and has found numerous applications in smoothing and denoising, especially for
signals contaminated by impulsive noise.9
For a grayscale input image with intensity values xi,j , the two-dimensional median filter is defined as
\[
y_{i,j} = \operatorname*{median}_{(r,s) \in W} \left( x_{i+r,j+s} \right) ,
\]
where W is a window over which the filter is applied. For the rest of this paper, we assume symmetric square
windows of size M × M with M = 2L + 1, i.e., the median rank m equals m = (M² + 1)/2. This is probably
also the most widely used form of this filter.
In order to describe some characteristics of median filtered images and compare the median filter to other
filters, it is useful to study the output distribution of the median filter. Due to its non-linearity, however,
theoretical analysis of the general relation between the input and output distribution of the median filter is
highly non-trivial. For this reason, it is often assumed that the input samples are i.i.d. The general cumulative
distribution function (CDF) FY for output samples yi,j and i.i.d. input samples xi,j with CDF FX is given by12
\[
F_Y(y) = \sum_{k=m}^{M^2} \binom{M^2}{k} \, [F_X(y)]^{k} \, [1 - F_X(y)]^{M^2 - k} .
\]
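As a small numerical illustration of this expression, the following sketch evaluates the output CDF for a given value of the input CDF. It assumes i.i.d. input samples and an odd window length M, as in the text; the function name `median_output_cdf` is hypothetical and chosen only for this example.

```python
import math

def median_output_cdf(fx, M):
    """Output CDF F_Y(y) of an M x M median filter, given fx = F_X(y).

    Assumes i.i.d. input samples and an odd window length M.
    """
    n = M * M             # number of samples in the window (M^2)
    m = (n + 1) // 2      # median rank m = (M^2 + 1) / 2
    return sum(math.comb(n, k) * fx**k * (1.0 - fx)**(n - k)
               for k in range(m, n + 1))

# The median of the input distribution is preserved by the filter ...
print(median_output_cdf(0.5, 3))
# ... while the output CDF is steeper around it than the input CDF.
print(median_output_cdf(0.4, 3), median_output_cdf(0.6, 3))
```

The steeper CDF around the median reflects the smoothing effect of the filter: output samples are more concentrated than input samples.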
A special yet interesting case is the sample median of i.i.d. input samples following a normal distribution,
xi,j ∼ N(µ, σ), which was shown to asymptotically (as M → ∞) follow a normal distribution again,13, 14
\[
y_{i,j} \sim \mathcal{N}(\mu, \sigma_m) , \quad \text{where } \sigma_m = \sqrt{\frac{\pi}{2}} \cdot \frac{\sigma}{M} .
\]
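The asymptotic standard deviation can be checked with a simple Monte Carlo experiment. The sketch below is only an illustration under the i.i.d. Gaussian assumption stated above; the helper names, the number of trials and the seed are arbitrary choices made for this example. Since the result is asymptotic, small deviations for small windows are to be expected.

```python
import math
import random
import statistics

def asymptotic_median_std(M, sigma=1.0):
    """sigma_m = sqrt(pi / 2) * sigma / M for an M x M window."""
    return math.sqrt(math.pi / 2.0) * sigma / M

def simulated_median_std(M, sigma=1.0, trials=5000, seed=0):
    """Empirical standard deviation of the median of M*M i.i.d. N(0, sigma) samples."""
    rng = random.Random(seed)
    medians = [statistics.median(rng.gauss(0.0, sigma) for _ in range(M * M))
               for _ in range(trials)]
    return statistics.pstdev(medians)

for M in (3, 5):
    print(M, asymptotic_median_std(M), simulated_median_std(M))
```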
Since, in filtered images, pixels in a close neighborhood originate from overlapping windows, they are corre-
lated to some extent and thus the joint distribution of adjacent pixels is generally of interest. For an M × M
median filter with i.i.d. input FX (x), Liao et al.15 derive an expression for the bivariate distribution of two
output pixels yp and yq (H pixels window overlap), FY (yp , yq ). The formula, which can be found in Appendix A,
highlights how cumbersome the theoretical description of median filtered images can become even under the
unrealistic assumption of i.i.d. pixel intensities.
For this reason, many studies in the literature have focused on more specific features of interest when analyzing
the median filter. As such, the median filter was found to preserve edges better than, for instance, the moving
average filter.16 It is also known that median filtered images exhibit regions of constant or nearly constant
intensities.17 A further stream of research addresses the so-called roots of the median—signals which are invariant
to median filtering—as well as the convergence of arbitrary signals to such roots.18
3. STREAKING ARTIFACTS
One of the main differences between the median filter and other types of linear and non-linear filters is that,
for an odd filter dimension, its output samples are directly drawn from the set of input samples, cf. Eq. (1).
For discrete-valued signals, this means, in particular, that no rounding to integers has to be performed after
filtering. Because of overlapping filter windows, there exists a non-zero probability that the output pixels in a
certain neighborhood originate from the same position of the input image. This effect is called streaking and
was quantitatively analyzed by Bovik.17 For continuous-valued i.i.d. input samples, he derived expressions for
the probability that two pixels with a certain distance have equal intensity. While being a function of the filter
size, it turns out that these probabilities are independent of the actual distribution of the input. Tables with
probabilities for different filter sizes and pixel distances can be found in the original publication.17
Obviously, the presence of such a specific ‘probability pattern’ would be a very strong indication of previous
median filtering. However, while the reported distribution-independence of streaking artifacts in continuous-
valued i.i.d. signals is based on the zero probability of two input samples being equal, typical digital images
have discrete-valued pixel intensities drawn from a finite alphabet. Here, the streaking probabilities become
distribution-dependent because the quantized intensities can a priori be equal-valued. The probability that two
integer grayscale output pixels yp , yq have equal intensity can generally be written as
\[
P_0 = \Pr(y_p = y_q) = \sum_i \Big[ F_Y(i, i) - F_Y(i - 1, i) - F_Y(i, i - 1) + F_Y(i - 1, i - 1) \Big] , \tag{2}
\]
where, for i.i.d. input samples, FY is the joint distribution given in Eq. (10) in the appendix. Figure 1 demon-
strates the distribution-dependence by plotting the probability P0 for two adjacent output pixels (for instance
the horizontal neighbors yi,j and yi,j+1 ) as a function of the standard deviation of quantized i.i.d. Gaussian input
samples. As expected, P0 increases considerably for median filtered images. Due to its larger support, the
5 × 5 filter results in stronger artifacts than the 3 × 3 filter. Similar graphs can be obtained for two output pixels
that are not directly adjacent but still originate from overlapping windows.
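The distribution-dependence of P0 can also be observed empirically. The following sketch is a simulation rather than an evaluation of Eq. (2): it draws quantized i.i.d. Gaussian images, applies a 3 × 3 median filter without border padding, and measures the fraction of horizontally adjacent pixels with equal intensity. The mean of 128, the image size, the seed and all function names are assumptions made only for this illustration.

```python
import random
import statistics

def median_filter(x, M=3):
    """M x M median filter for a list-of-lists image, 'valid' region only (no padding)."""
    L = M // 2
    H, W = len(x), len(x[0])
    return [[statistics.median(x[i + r][j + s]
                               for r in range(-L, L + 1)
                               for s in range(-L, L + 1))
             for j in range(L, W - L)]
            for i in range(L, H - L)]

def equal_neighbor_rate(y):
    """Empirical P0: fraction of horizontally adjacent pixel pairs with equal value."""
    pairs = [(row[j], row[j + 1]) for row in y for j in range(len(row) - 1)]
    return sum(a == b for a, b in pairs) / len(pairs)

def simulate(sigma, size=64, seed=0):
    """Return (P0 of the original, P0 of its 3 x 3 median filtered version)."""
    rng = random.Random(seed)
    x = [[round(rng.gauss(128.0, sigma)) for _ in range(size)] for _ in range(size)]
    return equal_neighbor_rate(x), equal_neighbor_rate(median_filter(x, 3))

for sigma in (2, 5, 10, 20):
    p_orig, p_med = simulate(sigma)
    print(f"sigma={sigma:2d}  original P0={p_orig:.3f}  3x3 median P0={p_med:.3f}")
```

Because the window size is odd, every output value is one of the input values, so no rounding step is needed after filtering.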
\[
d^{(k,l)}_{i,j} = y_{i,j} - y_{i+k,j+l} , \tag{3}
\]
as a detection statistic to distinguish filtered from non-filtered images. While we expect median filtered images
to result in ratios ρ ≫ 1, non-saturated originals will yield rather small ratios ρ ≈ 1. Note that a strong relative
increase is indeed characteristic for the median filter, since, in contrast to conventional linear and non-linear
smoothers, no rounding to integer values has to be performed after filtering.
∗ A description of the image database is given in Sect. 3.2.
[Figure 1: plot of P0; curves: original, 3 × 3 median, 5 × 5 median.]
Figure 1. Streaking probabilities P0 (direct vertical or horizontal neighbors) for quantized i.i.d. Gaussian
input samples with variance σ².
[Figure 2: density plot; curves: original, 3 × 3 median, 5 × 5 median.]
Figure 2. Density estimates for relative frequencies h̃0^(1,0); samples from 6500 original images and their
3 × 3 and 5 × 5 median filtered versions, respectively.
Apparently, strong saturation effects in the original image will render the detection of median filtered images
by means of ρ unreliable. To obtain a more robust discriminating feature, we divide the image under investigation
into the set B of non-overlapping blocks of dimension B × B. By determining ρb as the ratio of histogram bins
h0 and h1 from the b-th block, the influence of saturated image blocks can be reduced by taking the weighted
median
\[
\hat{\rho} = \operatorname*{median}_{b \in \mathcal{B}} \left( w_b \, \rho_b \right) , \tag{5}
\]
as a robust detection feature. Here, the weights wb function as an attenuation factor for saturation effects. In
the course of this paper, we set wb to
\[
w_b = 1 - \frac{h_0}{B^2 - B} , \tag{6}
\]
giving less weight to strongly saturated blocks.
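The block-based feature can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation. It assumes that h0 and h1 count the first-order differences equal to 0 and +1 within a block (the exact bin definition is not restated in this section), and it skips blocks with h1 = 0, where the ratio is undefined; both choices, as well as the function names, are assumptions. Note that B² − B is the number of difference pairs in a block only for horizontal or vertical lags such as (k, l) = (1, 0).

```python
import statistics

def block_feature(block, k=1, l=0):
    """Weighted ratio w_b * rho_b for one B x B block, or None if h1 == 0."""
    B = len(block)
    h0 = h1 = 0
    for i in range(max(0, -k), B - max(0, k)):
        for j in range(max(0, -l), B - max(0, l)):
            d = block[i][j] - block[i + k][j + l]   # first-order difference with lag (k, l)
            h0 += (d == 0)
            h1 += (d == 1)      # assumption: h1 counts differences equal to +1
    if h1 == 0:
        return None             # assumption: ignore blocks with an undefined ratio
    w = 1.0 - h0 / (B * B - B)  # attenuation weight for saturated blocks
    return w * h0 / h1

def rho_hat(image, B=64, k=1, l=0):
    """Median of the weighted block ratios over all non-overlapping B x B blocks."""
    H, W = len(image), len(image[0])
    feats = []
    for i in range(0, H - B + 1, B):
        for j in range(0, W - B + 1, B):
            block = [row[j:j + B] for row in image[i:i + B]]
            f = block_feature(block, k, l)
            if f is not None:
                feats.append(f)
    return statistics.median(feats) if feats else float("nan")

# Toy 4 x 4 image treated as a single block (B = 4).
toy = [[1] * 4, [1] * 4, [0] * 4, [0] * 4]
print(rho_hat(toy, B=4))
```

A completely constant block yields h0 = B² − B and therefore a weight of zero, which is how saturated regions are suppressed.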
[Figure 3: ROC curves, 3 × 3 median; curves: ρ̂ (block), ρ (full).]
Figure 3. Detection results for 3 × 3 median filtering. ROC curves for ρ and ρ̂ (B = 64). The block based
approach is more robust to false alarms.
[Figure 4: ROC curves, 3 × 3 median; curves: full image, B = 256, B = 32.]
Figure 4. Detection results for 3 × 3 median filtering. ROC curves for ρ and ρb for varying block sizes.
Smaller blocks are less robust to local image characteristics.
[Figure 5: two ROC panels (3 × 3 median, 5 × 5 median); horizontal axis: false positive rate (0 to 0.1); vertical axis: true positive rate (0.9 to 1); curves: B = 256, B = 64, B = 16.]
Figure 5. Detection results for 3 × 3 (left) and 5 × 5 (right) median filtering, respectively. ROC curves for different
block sizes B. Vertical differences with lag (k, l) = (1, 0). Smaller block sizes increase detection performance. (Note the
different scaling of the axes compared to all other figures.)
The opposite is true when we consider the robust estimate ρ̂ over all B × B blocks of the image. The weighted
median effectively attenuates too strong influences of local image characteristics, such as saturation. Figure 5
compares ROC curves for ρ̂ for the detection of 3 × 3 (left) and 5 × 5 (right) median filtering, respectively, for
varying block sizes. Observe how the detectability grows with decreasing block size B. However, while the
advantage of using blocks with B = 64 over those of dimension 256 × 256 is considerable, the performance gain
for smaller block sizes is diminishing. Our experiments showed that B = 64 is a reasonable compromise between
detectability and the computing time needed to determine the median over all blocks.
In general, we found that there is a fair amount of variation in the values of ρ̂, mainly reflecting the different
sources of our test images. Besides the already mentioned saturation effects, for instance, the noise level of the
digital camera can be a very influential factor. Figure 6 gives an idea of how ρ̂ is distributed over our test database
by plotting the values for all original and equivalent 3 × 3 median filtered images. The best distinction between
originals and filtered images was found for the approximately 1600 images corresponding to the rightmost part
of Fig. 6. These images are from Ker’s ‘gold standard’ image set,8 which was built from RAW camera images
with all denoising switched off. As a consequence, ρ̂ is particularly low for the originals. A visual inspection of
those originals with a relatively large ρ̂-value revealed the presence of large homogeneous regions. Despite the
generally very good detection performance, this might indicate that the weighting factors wb could be further
refined in future work.
[Figure 6: scatter plot of ρ̂; series: 3 × 3 median, originals.]
Figure 6. ρ̂-values for each of the 6500 original images (solid squares) and their 3 × 3 median filtered versions
(gray crosses), B = 64. The variation in the measured values reflects the different sources of the images.
Table 1. Minimum average decision error Pe of median filtering detectors ρ and ρ̂ for
filter sizes 3 × 3 and 5 × 5. The results were obtained for 6500 never-compressed images
and (k, l) = (1, 0).
Since streaking artifacts are believed to be a specific characteristic of median filtered images, our detection
feature should be well able to distinguish median filtered images from those processed with alternative smoothers.
Figure 7 reports exemplary detection results for this scenario. More specifically, it depicts the ROC curves for the
discrimination between 3 × 3 median filtered images and images processed with a 3 × 3 mean filter, as well as
a 3 × 3 binomial filter. The curves indicate that streaking is indeed characteristic of the median
filter, although the comparison with the mean filter gives slightly worse performance.
Note that we also investigated alternative lags in the computation of the first-order differences. Here, differ-
ences between pixels from a direct neighborhood, i.e., |k|, |l| ≤ 1, generally discriminated better than larger lags,
whereas horizontal/vertical differences always yielded preferable detection results. As expected, we could
not find a considerable deviation from the above reported results among all of the direct horizontal and vertical
differences (k, l) ∈ {(1, 0), (0, 1), (−1, 0), (0, −1)}.
In general, all of the above reported results indicate that our relatively simple detection feature based on
the ratio of first-order difference-histogram bins is a very reliable measure to detect median filtering in bitmap
images. Table 1 summarizes our findings in terms of the minimum average decision error under the assumption
of equal priors and equal costs,
\[
P_e = \min \tfrac{1}{2} \left( P_{FP} + (1 - P_{TP}) \right) , \tag{7}
\]
where PFP and PTP denote the false positive and true positive rates, respectively. However, bitmap images are
admittedly only half of the overall picture. In the following, we will therefore discuss why ρ̂
is not robust to JPEG compression and present an alternative approach.
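The minimum average decision error of Eq. (7) can be estimated from two sets of detector outputs by sweeping a decision threshold. The sketch below is only an illustration: it assumes a scalar feature for which larger values indicate filtering (as for the ratio feature above), and the function name and the toy scores are made up for this example.

```python
def min_decision_error(scores_original, scores_filtered):
    """Minimum over thresholds t of (P_FP + (1 - P_TP)) / 2.

    An image is declared 'filtered' when its score exceeds t.
    """
    thresholds = [float("-inf")] + sorted(set(scores_original) | set(scores_filtered))
    best = 1.0
    for t in thresholds:
        p_fp = sum(s > t for s in scores_original) / len(scores_original)
        p_tp = sum(s > t for s in scores_filtered) / len(scores_filtered)
        best = min(best, 0.5 * (p_fp + (1.0 - p_tp)))
    return best

# Hypothetical feature values for four originals and four filtered images.
originals = [0.9, 1.0, 1.1, 1.3]
filtered = [1.2, 2.0, 3.0, 4.0]
print(min_decision_error(originals, filtered))
```

The quadratic loop over thresholds and samples keeps the sketch short; for large test sets, sorting the scores once and updating the rates incrementally would be the more efficient choice.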
[Figure 7: ROC curves, 3 × 3 median, B = 64; curves: 3 × 3 mean, 3 × 3 binomial.]
Figure 7. Detection results for 3 × 3 median filtering, taking images obtained by 3 × 3 mean filtering and 3 × 3
binomial filtering as ‘originals’ (B = 64). Strong streaking artifacts are not present in linearly smoothed images.
[Figure 8: ROC curves, 3 × 3 median, B = 64; curves: JPEG 100, JPEG 95, JPEG 90.]
Figure 8. Detection results for 3 × 3 median filtering after JPEG post-compression with varying compression
qualities (B = 64). JPEG qualities below 100 render median filtering undetectable.
Since median filtering affects, in particular, small absolute differences, it is feasible to limit the range of considered
differences to |δn|, ..., |δ0| ≤ T. This also helps keep the dimensionality of the model low, which is important for
practical implementation.19 These transition probabilities are then taken to form a D-dimensional feature vector
F, D = 2 · (2T + 1)^(n+1), by averaging the four horizontal/vertical and the four diagonal matrices, respectively:
\[
\mathbf{F} = \left( \mathbf{F}^{(h/v)}, \mathbf{F}^{(d)} \right) \quad \text{with elements} \quad
\begin{aligned}
F^{(h/v)}_{\delta_n,\dots,\delta_0} &= \tfrac{1}{4} \left( M^{(1,0)}_{\delta_n,\dots,\delta_0} + M^{(-1,0)}_{\delta_n,\dots,\delta_0} + M^{(0,1)}_{\delta_n,\dots,\delta_0} + M^{(0,-1)}_{\delta_n,\dots,\delta_0} \right) \\
F^{(d)}_{\delta_n,\dots,\delta_0} &= \tfrac{1}{4} \left( M^{(1,1)}_{\delta_n,\dots,\delta_0} + M^{(-1,-1)}_{\delta_n,\dots,\delta_0} + M^{(1,-1)}_{\delta_n,\dots,\delta_0} + M^{(-1,1)}_{\delta_n,\dots,\delta_0} \right)
\end{aligned}
\; . \tag{9}
\]
Note that, in some sense, the SPAM features can be seen as a generalization of the ρ-based approach from the
previous section, where first-order differences are modeled as a zeroth-order Markov chain.
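To make the construction of the feature vector more concrete, the following sketch estimates the transition probabilities along the eight directions and averages them as in Eq. (9). It is a simplified illustration, not the reference SPAM implementation: the ordering of the differences within a chain, the normalization over truncated chains only, and all function names are assumptions made for this example.

```python
from itertools import product

def transition_probs(image, k, l, n=2, T=3):
    """Estimate Pr(d_n | d_{n-1}, ..., d_0) for differences along direction (k, l).

    Only chains with all absolute differences <= T are counted (a simplification).
    """
    H, W = len(image), len(image[0])
    counts = {}
    for i in range(H):
        for j in range(W):
            ie, je = i + (n + 1) * k, j + (n + 1) * l   # last pixel of the chain
            if not (0 <= ie < H and 0 <= je < W):
                continue
            chain = tuple(image[i + s * k][j + s * l]
                          - image[i + (s + 1) * k][j + (s + 1) * l]
                          for s in range(n + 1))
            if all(abs(d) <= T for d in chain):
                counts[chain] = counts.get(chain, 0) + 1
    totals = {}
    for chain, c in counts.items():
        totals[chain[:-1]] = totals.get(chain[:-1], 0) + c
    return [counts.get(chain, 0) / totals[chain[:-1]] if chain[:-1] in totals else 0.0
            for chain in product(range(-T, T + 1), repeat=n + 1)]

def spam_features(image, n=2, T=3, use_diagonal=True):
    """Averaged horizontal/vertical (and optionally diagonal) transition probabilities."""
    hv = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    dg = [(1, 1), (-1, -1), (1, -1), (-1, 1)]

    def averaged(directions):
        mats = [transition_probs(image, k, l, n, T) for k, l in directions]
        return [sum(vals) / len(mats) for vals in zip(*mats)]

    features = averaged(hv)
    if use_diagonal:
        features += averaged(dg)
    return features

flat = [[7] * 8 for _ in range(8)]
print(len(spam_features(flat)))   # 2 * (2*3 + 1)**3 = 686
```

For a second-order model with T = 3, the vector has 686 elements, or 343 when the diagonal part is dropped.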
Once the features have been calculated, they are fed into a suitable classifier trained on a set of original
(JPEG) images and median filtered images.
using five-fold cross-validation. In the standard setup, we use SPAM features corresponding to a second-order
Markov model, extracted from images of size 512 × 512 with a threshold T = 3.
Figure 9 shows the ROC curves for the detection of 3×3 median filtering using SPAM features and soft-margin
SVMs as described above. More specifically, the top left graph depicts the results for JPEG post-compression
qualities in the range of {90, 80, 70}. The detection is very reliable for moderate JPEG compression.
As expected, the detection performance decreases when lowering the JPEG quality because stronger quantization
increases the number of small absolute first-order differences in the original images.
The detection results obtained from the SPAM classifier are subject to a number of different parameter
choices. The threshold T controls the magnitude of the maximum absolute differences which are to be taken
into consideration. To see how this parameter influences the detection performance, the top right graph of Fig. 9
shows the ROC curves for a fixed JPEG compression with quality 80 and T ∈ {1, 2, 3}. While the results for
T = 3 and T = 2 are comparable (with T = 2 giving slightly worse results), the detectability considerably drops
when setting T = 1. This is again expected because a lower threshold limits the classifier’s models of original and
filtered images, giving rise to ambiguities. Similar results were obtained for different post-compression qualities
as well.
Since median filtering is expected to affect horizontal/vertical and diagonal differences in a similar manner,
we further tested whether the diagonal features F(d) provide some additional information beyond what is already
captured by the horizontal/vertical features F(h/v) . As indicated by the bottom left graph of Fig. 9, for a
fixed JPEG post-compression quality 80, the detectability is indeed more or less invariant to keeping/removing
the diagonal features. Table 2 presents the condensed results in terms of the minimum decision error Pe for
varying JPEG qualities and suggests that analyzing the averaged horizontal and vertical transition probabilities
is sufficient as long as T > 1.
In our last experiment, we investigate the influence of the size of the analyzed image (region). Because the
SPAM features are basically an estimate of the transition matrix of first-order differences, it is to be expected
that these features become more robust to local variations as the image size increases (also see the discussion of
the block size in Sect. 3.2). Such local variations might be, for instance, caused by saturated or highly textured
regions. The bottom right graph of Fig. 9 shows the ROC curves for a JPEG compression quality 80 and images
of size 512 × 512 and 1024 × 1024, respectively. In line with our expectations, a better performance is obtained
for the larger images. Again, corresponding results were obtained for different JPEG compression qualities.
† The cropping is done for reasons of keeping the computing time for the feature extraction practicable. Since the
images are cropped from the center part of the full size image, there might be a bias towards less saturated images
(assuming that saturation mostly occurs in the uppermost part of an image). On the other hand, the relatively high
dimensionality of the SPAM features should be generally well able to cope with saturation effects.
[Figure 9: four ROC panels; vertical axis: true positive rate.]
Figure 9. SPAM results for the detection of 3 × 3 median filtering with JPEG post-compression. ROC curves for different
post-compression qualities (top left; T = 3, 512 × 512 images, horizontal/vertical and diagonal features); as well as curves
for a fixed JPEG quality 80 and varying parameter settings: threshold T (top right), with or without diagonal features
(bottom left), and image size (bottom right). Detection performance decreases with decreasing JPEG quality, SPAM
threshold T and image size. It is however more or less invariant to ignoring the diagonal SPAM features.
While all of the above findings for the 3 × 3 median filter also hold for larger filter dimensions, the detection
performance generally increases considerably. This is so because larger filters have a stronger smoothing effect and
thus affect the distribution of small absolute differences more severely. To demonstrate the gain in detectability
for a larger filter, Fig. 10 reports some sample results for a 5 × 5 filter. An almost perfect discrimination between
original (JPEG) images and filtered images was obtained for different JPEG compression qualities in the range
{90, 80, 70}. The zoomed-in part reveals that the detection performance decreases only slightly when lowering the
JPEG quality (also see Tab. 2). In agreement with the results from Fig. 9, only the horizontal/vertical features
were used for classification. However, as indicated by the right graph of Fig. 10, T should still be chosen to be
at least T = 2. Otherwise, feature vectors with a dimensionality that is too low make the detection unreliable.
Finally, it is worth noting that the first-order SPAM features by and large yield results in line with the
presented findings for the second-order features, however at a generally reduced detectability. For the sake of
brevity, we refrain from reporting them separately.
Figure 10. SPAM results for the detection of 5 × 5 median filtering with JPEG post-compression. ROC curves for
different post-compression qualities (left; T = 3, 512 × 512 images, horizontal/vertical features only) and for a fixed JPEG
quality 70 and varying threshold settings (right). Detection performance is generally better compared to 3 × 3 filtering.
On a more general level, it has to be mentioned that, contrary to the streaking-based approach of Sect. 3.1,
the SPAM features do not solely capture the specific effects of median filtering. After JPEG compression,
other linear and non-linear smoothers will have a similar (more or less strong) impact on the distribution
of the first-order differences. While this increases the ambiguity in determining the concrete pre-processing
history, especially when the JPEG quality becomes lower, it might be argued that after strong-enough
compression it is sufficient to know that an image has been smoothed before because typical filter characteristics
are suppressed by JPEG artifacts anyway. On a more positive note, SPAM features
may evolve into a general-purpose smoothing detector, which is a subject of future research.
5. CONCLUDING REMARKS
In this paper, we have investigated the detection of median filtering in digital images. In the broader framework
of digital image forensics, we see this endeavor as a contribution to the problem of determining the general
processing history of digital images. While the application of ‘classical’ image processing primitives for denoising,
sharpening, or contrast enhancement does typically not per se harm the authentic value of an image, it is still of
high interest to learn as much as possible about what exactly has happened to an image and to make informed
decisions based on this knowledge. As such knowledge is desirable not only in forensics but also in steganalysis
and watermarking, we regard our methods as valuable instruments in various fields of multimedia security.
The presented findings of this paper are twofold. For uncompressed images, the analysis of the so-called
streaking artifacts17 in median filtered images has proven to be a reliable measure for discriminating between
filtered and non-filtered images. A perfect detection was achieved for false positive rates as low as 1.8 % (3 × 3
median, B = 64). While the detector is of splendid simplicity—relying on a single feature derived from histogram
bins of the first-order difference image—it turned out that it is not applicable to images that are JPEG compressed
after filtering. In the post-compression scenario, we therefore turned to a more complex detector based on the
recently introduced SPAM features,19 combined with support vector machines. Here, depending on the size of
the median filter, a very reliable detection was possible even for JPEG qualities as low as 70 (Pe = 1.1 % for the
5 × 5 median, T = 3).
In the case of median pre-compression (i.e., median filtering of already JPEG compressed images), which
was not explicitly discussed in the paper, we found that the variation of the ρ̂-values between different original
images is reduced to some extent. After the second compression, the SPAM features are still able to detect
median filtering reliably. In fact, a low pre-compression quality can even increase the detector’s performance.
As to the limitations, we have to note that JPEG compression does a good job in obfuscating the actual type
of smoothing applied to the image before compression. While median filtering is generally well detectable with the SPAM
features, experiments showed that, contrary to the analysis of streaking artifacts in uncompressed images, it
is not possible to distinguish between the median filter and other smoothers. While this could also be turned
into an advantage by considering the SPAM features as a general-purpose smoothing detector, alternative or
additional features should be explored that allow tracking down further particularities of the median filter. A
possible candidate is, for instance, the median filter’s relatively good edge-preserving property compared to linear
smoothers.
Table 2. Minimum average decision error Pe of median filtering detectors based on SPAM for filter sizes 3 × 3 and 5 × 5.
All results were obtained on a set of approximately 3250 JPEG images of dimension 512 × 512. The detectors’ performance
is broken down by the JPEG post-compression quality and the feature vector dimension (the threshold T, with or without
diagonal features F(d)).
ACKNOWLEDGMENTS
The authors thank Jan Kodovský for his help with setting up the SVMs. Matthias Kirchner gratefully
acknowledges a doctorate scholarship from Deutsche Telekom Stiftung, Bonn, Germany. Part of this research was
accomplished while the first author was a visiting scholar at SUNY Binghamton. Jessica Fridrich was supported
by an NSF award CNF-0830528.