
Received 7 February 2024, accepted 23 April 2024, date of publication 26 April 2024, date of current version 8 May 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3394250

Performance Comparison and Visualization of AI-Generated-Image Detection Methods
DAEEOL PARK, HYUNSIK NA, AND DAESEON CHOI (Member, IEEE)
Department of Software, Soongsil University, Seoul 07027, South Korea
Corresponding author: Daeseon Choi (sunchoi@ssu.ac.kr)
This work was supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) Grant funded by the Korea Government (MSIT) (Robust AI and Distributed Attack Detection for Edge AI Security) under Grant 2021-0-00511.

ABSTRACT Recent advancements in artificial intelligence (AI) have revolutionized the field of image
generation. This has concurrently escalated social problems and concerns related to AI image generation,
underscoring the necessity for an effective AI-generated-image detection method. Therefore, numerous
methods for detecting AI-generated images have been developed, but there remains a need for research
comparing the effectiveness of and visualizing these detection methods. In this study, we classify
AI-generated-image detection methods by the image features they use and compare their generalization
performance in detecting AI-generated images of different types. We selected five AI-generated-image
detection methods for performance evaluation and selected vision transformer as an additional method
for comparison. We use two types of training datasets, i.e., ProGAN and latent diffusion; combine
existing AI-generated-image test datasets into a diverse test dataset; and divide them into three types of
generative models, i.e., generative adversarial network (GAN), diffusion, and transformer, to evaluate the
comprehensive performance of the detection methods. We also analyze their detection performance on
images with data augmentation, considering scenarios that make it difficult to detect AI-generated images.
Grad-CAM and t-SNE are used to visualize the detection area and data distribution of each detection
method. As a result, we determine that artifact-feature-based detection performs well on GAN and real
images, whereas image-encoder-feature-based detection performs well on diffusion and transformer images.
In summary, our research analyzes the comparative detection performance of various AI-generated-image
detection methods, identifies their limitations, and suggests directions for further research.

INDEX TERMS Generative AI, AI-generated-image detection, synthetic-image detection, performance comparison, GAN, diffusion model, transformer, Grad-CAM, t-SNE.

The associate editor coordinating the review of this manuscript and approving it for publication was Yue Zhang.

I. INTRODUCTION
Since the emergence of artificial intelligence (AI) image generative models, numerous advancements have been made in the field. The introduction of the generative adversarial network (GAN) [1] marked the start of AI image generation, with various GAN models having been developed to create images of diverse subjects such as faces, artwork, and landscapes. Subsequently, the diffusion model [2] was introduced, enabling the generation of higher-quality images compared to those generated by the GAN model. The emergence of AI image-generation models utilizing the transformer structure [3] found in language processing models has facilitated the creation of superior-quality images that meet the needs of users. These AI image generative models have progressed over time, heightening the seeming authenticity and naturalness of AI-generated images and rendering them increasingly indiscernible from real pictures. Therefore, as AI images gain ground, a range of social concerns have arisen as a result of the misuse of AI images. When ''Théâtre D'opéra Spatial,'' a painting that was awarded first prize in the digital category of the Colorado State Art Competition 2022, was revealed to be generated by Midjourney [4], an AI image generative model, concerns were raised about the extent to which AI-generated images should be regarded as creative works, if at all, and the rightful ownership of AI-generated-image copyrights [5]. On a more serious note, a social media user posted a fabricated photo of an explosion
near the Pentagon in the United States [6], causing the U.S. stock market to drop within a few minutes. Later, it was revealed that the photo was created by AI, demonstrating the potential for AI-generated images to cause significant real-life problems.

To address ethical concerns and potential damages arising from AI-generated images, there is a need for technology that can distinguish between real and AI-generated images. Consequently, numerous studies have been conducted on AI-generated-image detection. These studies involve training a deep learning model using both real and AI-generated images to identify features unique to AI-generated images. AI-generated-image detection research has resulted in various detection methods and the use of AI-generated features for detection. However, research on the most effective methods and feature detection models to detect AI-generated images has thus far been deficient. It is also unclear which is the optimal method for discerning images generated by different types of generative models. Furthermore, there is a need for analysis of how detection methods distinguish between real images and AI-generated images.

In this study, we evaluated and compared AI-generated-image detection methods based on their generalization performance on various AI-generated images, divided into GAN-generated, diffusion-generated, and transformer-generated images. We selected five detection methods that provide a pretrained model or training code, and vision transformer (ViT) [7] as an additional detection method for comparison. For each method, we trained the model using two distinct training datasets: ProGAN-generated images and latent-diffusion-generated images. We then tested the models on separate GAN-generated, diffusion-generated, and transformer-generated image test datasets to determine which method has the strongest generalization performance on each type of generated image. By combining multiple test datasets from existing AI-generated-image detection research literature, we constructed a rich test dataset that includes a total of 23 AI-generated-image datasets and 3 real-image datasets. To gain further insight, we conducted tests of detection performance on JPEG compression quality and Gaussian blur to analyze the robustness of each method to JPEG-compressed images and Gaussian-blurred images. We also performed visualizations using Grad-CAM [8] and t-SNE [9] to see what regions the detection method detects in the image and how it classifies the image in feature space.

The main contributions of our work are as follows:
• We built various kinds of generative-model test datasets with three different generative model structures to compare existing AI-generated-image detection methods.
• We divided the AI-generated-image detection methods into three groups based on the features they use and analyzed the differences in detection performance for each group.
• We compared the robustness of AI-generated-image detection methods to image augmentation to investigate their performance in scenarios that disrupt AI-generated-image detection.
• Our Grad-CAM analysis revealed that the presence of learned features in the region in which detection is performed is more crucial than the region itself. On the other hand, the t-SNE analysis showed how the generated-image data are distributed and how each model detects images using the artifact method and image-encoder methods.

The remaining parts of the paper are structured as follows: Section II describes the image-generation models used to create the images that constitute the test datasets used to test the detection methods and discusses related research on AI-generated-image detection focusing on performance comparison and data augmentation. Section III introduces the six detection methods used in our study, and Section IV presents the construction of the training and test datasets. Section V describes the training setting and test setting of each method, and Section VI shows the detection performance results. Section VII contains Grad-CAM and t-SNE visualizations of the detection methods. Section VIII discusses the results, and finally, Section IX concludes our study and outlines plans and recommendations for future work.

II. RELATED WORK
A. AI IMAGE GENERATIVE MODEL
An AI image generative model is a model that can generate images using either img-to-img translation, which generates a new image in a different field while maintaining the concept of the original image, or text-to-img translation, which generates a corresponding image from a text prompt that describes the desired image. AI image generative models can be classified into three different categories based on their structural design, specifically: GAN, diffusion, and transformer.

1) GAN-BASED MODEL
GAN [1] is a structured deep learning model consisting of a generator and a discriminator. The generator produces an image that tries to fool the discriminator, whereas the discriminator tries to determine whether the image is real or fake. This adversarial method by which the generator and discriminator learn from each other improves the quality of production from the generator, resulting in a better-quality image.
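The adversarial objective described above can be made concrete with a short sketch. The snippet below is an illustrative minimal training step only, not the code of any of the cited models; the `generator`, `discriminator`, optimizers, and latent dimension are placeholder assumptions.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def gan_step(generator, discriminator, g_opt, d_opt, real_images, latent_dim=128):
    """One illustrative adversarial update: the discriminator learns to separate
    real from generated images, while the generator learns to fool it."""
    batch = real_images.size(0)
    z = torch.randn(batch, latent_dim)

    # Discriminator update: real images -> label 1, generated images -> label 0.
    fake_images = generator(z).detach()
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake_images), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: make the discriminator label generated images as real.
    g_loss = bce(discriminator(generator(z)), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```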
Since its advent, GAN has been used to develop numerous models for generating images. CycleGAN [10] transforms an image into another with a different art style or in a different domain while maintaining the concept of the original image. The model accomplishes this by introducing cycle consistency loss, which allows the conversion to proceed only long enough to turn the converted image back into the original image. Gradually, GAN models were improved to be able to generate high-resolution images. ProGAN [11] has demonstrated that it is more effective to generate high-resolution images by gradually adding layers
FIGURE 1. Sample images from test datasets used for our study.

to create larger images from smaller ones. On the other hand, BigGAN [12] reliably generates high-resolution images by applying orthogonal regularization to the generator with two to four times more parameters than that for the former GAN model. A GAN model capable of generating photorealistic images from a semantic layout has already been introduced: GauGAN [13] leverages spatially adaptive normalization for increased visual fidelity.

Several GAN models for generating face images have also been constructed. StarGAN [14] was proposed with a framework that enables multidomain conversion through a single model incorporating domain classification loss to generate diverse facial styles. With the addition of a mapping network and a style encoder module, StarGANv2 [15] is then able to express various styles under a specific domain, effectively converting not only human faces but also animal faces. Meanwhile, StyleGAN [16] is designed to be a style-based generator with a mapping network that generates facial images with detailed attribution. The subsequent version, StyleGAN2 [17], enhances generator normalization to eliminate artifacts in StyleGAN-produced images and advance the model. StyleGAN3 [18] was then proposed to solve the texture-sticking problem of StyleGAN2 from a signal processing perspective. Some GAN models have advanced beyond 2D image generation and can generate 3D images. For example, Chan et al. proposed EG3D [19], a tri-plane-based 3D GAN framework that enables multi-view face image generation.

2) DIFFUSION-BASED MODEL
A diffusion model generates an image by gradually adding noise to the input image through a diffusion process to create a noisy image and then learning parameters to restore the original image from the noisy image through a reverse process. This model was introduced by Wang et al. based on a denoising diffusion probabilistic model (DDPM) [2], which has been mentioned earlier. Subsequently, the ablated diffusion model (ADM) was introduced, enhancing the structural design of the DDPM and adding classifier guidance to produce class-fidelity images.
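The forward (noising) process just described has a simple closed form, sketched below. This is a generic DDPM illustration; the schedule length and beta range are assumed values, not the settings of any model discussed in this survey.

```python
import torch

# Illustrative DDPM forward process: x_t is a progressively noisier version of x_0.
# Schedule values are assumptions; real models tune beta_start, beta_end, and T.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # A denoising network is trained to predict `noise` from (xt, t); sampling
    # then reverses the process step by step to recover an image.
    return xt, noise
```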
The aforementioned diffusion models have a disadvantage in that the diffusion process is performed in pixel space, resulting in a long generation time. The latent diffusion model (LDM) [20] solves this problem by performing the diffusion process on a latent space extracted through an autoencoder, resulting in a shorter execution time. The open-source model Stable Diffusion [21], which was based on LDM, was then trained using the LAION dataset [22] and merged with the CLIP:ViT-L/14 [23] encoder to use text prompts as generation conditions. The model has since been fine-tuned and utilized in various ways. Stable Diffusion v2 [24] is an official fine-tuned model trained to produce higher-resolution images and uses CLIP:ViT-H as its encoder. Stable Diffusion v1.5 Realistic Vision v2.0 [25], also a fine-tuned model, produces photorealistic images and is available on the Civitai [26] website, where fine-tuned models of Stable Diffusion are collected. Lexica [27] is another fine-tuned model of Stable Diffusion that generates smooth and glossy art images. The GLIDE [28] model enhances the classifier guidance of ADM by utilizing classifier-free and CLIP guidance to achieve photorealism in conditional images. Subsequently, the IF [29] model, consisting of a frozen T5 [30] encoder and a cascade of one base diffusion module and two super-resolution diffusion modules, was introduced. IF is a state-of-the-art model that produces image quality exceeding those of previous diffusion-based image generative models.

3) TRANSFORMER-BASED MODEL
In the image generative domain, the transformer model converts image pixels into tokens, which it then inputs into the transformer structure. Using an attention mechanism [3] that comprehends the complex connections between the tokens, the model generates new images by predicting and generating pixel tokens. DALL·E [31] is an autoregressive model that learns transformers by processing text and image tokens as a single stream based on their structure. DALL·E outperformed several GAN models in terms of Frechet inception distance [32] and inception score [33], which are used as
metrics to evaluate the generated images. In an attempt to make an open-source version of DALL·E, the model DALL·E Mini [34], trained with fewer parameters and images, was developed. Subsequently, Ramesh et al. proposed DALL·E 2 [35], a two-stage model consisting of a prior and decoder to better understand the representation of the text encoder, producing images that understood text captions better than DALL·E does. Taming Transformers [36] is a model that transforms an image into a codebook, a set of sequences that are passed through a transformer and then processed to create a new image.

4) STRUCTURE-UNKNOWN MODEL
Midjourney [4], which is considered state-of-the-art among image generative models, currently has no publicly known model structure. Midjourney is presently a paid subscription service that provides image generation in a variety of art styles and super-resolution.

B. PERFORMANCE COMPARISON IN AI-GENERATED-IMAGE DETECTION RESEARCH STUDIES
In other AI-generated-image detection studies, the typical approach is to compare the performance of the proposed method to those of existing detection methods. Past comparisons of the performance of the detection methods selected for our research study are as follows. A research study on CNNDetection [37] evaluated its performance on eleven synthesis-model-generated images, including those from six GAN models, and compared it with those of the methods studied by Zhang et al. [38]. A study on the no-down method [39] tested its detection performance on images generated by five GAN models, four diffusion models, and three transformer models and compared it with those of other detection methods [37], [38], [40]. The research study that introduced the patch selection model (PSM) [41] combined multiple generative models by family and evaluated the performance of the proposed model on each family with those of other detection methods [37], [39], [40]. Similarly, the study that developed the deep image fingerprint (DIF) [42] tested its detection performance on images generated by seven GAN models, three diffusion models, two transformer models, and Midjourney and compared it with the performance of two rule-based methods [43], [44] and artifact-based methods [37], [39]. A research study that explored universal fake image detection (UFD) tested its performance against those of other detection methods [37], [38], [40], [45] on images generated by six GAN models, three diffusion models, one transformer model, and five other generative models.

C. DATA AUGMENTATION IN AI-GENERATED-IMAGE DETECTION
The two most prevalent data augmentations employed in AI-generated-image detection research are based on Gaussian blur and JPEG compression. These data augmentation techniques have been utilized in multiple studies to improve the generalized detection performance of AI-generated-image detection models. Xuan et al. [46] found that applying Gaussian blur and Gaussian noise to an image dataset prior to training reduced low-level noise in the images and helped the detection model learn intrinsic features. On the other hand, Wang et al. [37] showed that training on JPEG-compressed images can generalize a model to detect GAN-model-generated images and make it more robust to detect JPEG-compressed images.

III. AI-GENERATED-IMAGE DETECTION METHODS
In this section, we present the detection methods used in our study. We selected five AI-generated-image detection methods that provide training code or pretrained model weights from the official Github repository. Then, we divided the AI-generated-image detection methods into three categories based on the image features that they use.

A. ARTIFACT-FEATURE-BASED DETECTION
Artifact-feature-based detection identifies common artifacts in AI-generated images. To create a model based on this detection method, a deep learning network is trained on both real and AI-generated images. CNNDetection [37] is a widely published study that applied this approach to the detection of AI-generated images. A pretrained ResNet50 model [47] was trained using the ProGAN training dataset [37] into a binary classifier that distinguishes real images from AI-generated images, and its detection performance was evaluated on eleven CNN-based models for image generation, including six GAN models. Furthermore, the incorporation of probabilistic data augmentation techniques such as Gaussian blur and JPEG compression led to enhanced generalization performance in detecting images generated from various generative models. Subsequently, the no-down detection study [39] applied the no-down architecture [48], which does not perform downsampling or use pooling in the first layer of the ResNet50, to achieve higher detection performance. This architecture prevents the suppression of artifacts and allows for further calculation of the noise residual, enabling the model to learn the more comprehensive features of an AI-generated image. The detection model that utilizes both the global and local features of an image by using the patch selection module (PSM) [41] was proposed by Ju et al. This model passes the entire input image through a ResNet50 model to extract a global feature map, resizes the selected local patches using a patch selection module, and passes them through the ResNet50 model again to extract local features. The global and local features are then fused using an attention-based feature fusion module to determine whether the image is real or AI-generated using a binary classifier.

Furthermore, in addition to the convolutional neural network (CNN) model employed thus far, we utilized the ViT model to evaluate the detection performance on images generated by AI models with a transformer structure. ViT is a state-of-the-art model in image classification and applies the
self-attention mechanism used in natural language processing
to images, dividing them into patches and generating
sequences as input to a transformer.
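The artifact-feature recipe described in this subsection amounts to fine-tuning a pretrained backbone as a real/fake classifier. The sketch below is an illustrative reimplementation in that spirit, not the released code of CNNDetection or any other compared method; the single-logit head and optimizer settings are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch of an artifact-feature detector: an ImageNet-pretrained ResNet50 whose
# final layer is replaced by a single real/fake logit (>0 means AI-generated).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 1)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """images: (B, 3, H, W) tensor; labels: 1.0 for AI-generated, 0.0 for real."""
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels.float())
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```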

B. SPECTRUM-FEATURE-BASED DETECTION
Spectrum-feature-based detection is a technique for identi-
fying distinct patterns that appear when a set of generated
images is transformed into spectrum space by a fast Fourier
transform. Sergey et al. introduced the deep image fingerprint
(DIF) [42], which employs the inductive bias of CNN
to extract fingerprints from generated images. DIF is a
rule-based method that obtains an artificial fingerprint by
filtering an image through a U-Net [49] high-pass filter and
classifying it as either real or AI-generated by comparing the
correlation between the fingerprint and the residuals extracted
from the image. The DIF model has exhibited generalized
detection performance even when trained on only a small
number of AI-generated images, as few as 512.
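Spectrum-feature approaches start from the kind of transform sketched below: averaging the log-magnitude Fourier spectrum over a set of images, in which many generators leave periodic, grid-like peaks that real photographs lack. This is a generic illustration of the spectrum-space view only, not the DIF fingerprint pipeline itself.

```python
import numpy as np

def mean_log_spectrum(images):
    """images: iterable of 2-D grayscale arrays (H x W), all the same size.
    Returns the averaged, centered log-magnitude spectrum of the set."""
    acc, count = None, 0
    for img in images:
        f = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
        log_mag = np.log1p(np.abs(f))
        acc = log_mag if acc is None else acc + log_mag
        count += 1
    return acc / max(count, 1)
```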

C. IMAGE-ENCODER-FEATURE-BASED DETECTION
Image-encoder-based detection uses the feature space of
an image-to-text encoder such as CLIP [23], which has
been trained on a large number of image–text pair datasets.
Ojha et al. [50] discussed the challenge of classifying
both GAN and diffusion-generated images using models that were simply trained on real and AI-generated images. Therefore, they proposed a universal fake image detector (UFD) that utilizes the feature space of CLIP:ViT-L/14, which has learned from 400 million diverse images, to detect AI-generated images from various domains. The UFD model uses the CLIP model as a backbone and learns only the added linear classification layer. This model generates a feature bank for real and AI-generated images based on training images. Then, it classifies an input image using the feature bank, assessing if the input image is closer to a real or AI-generated image by calculating the cosine distance with the features inputted into the image encoder.
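The feature-bank comparison described above can be sketched as follows. The snippet assumes the OpenAI CLIP package (github.com/openai/CLIP) is installed; the bank construction and nearest-neighbour decision rule are illustrative assumptions rather than the released UFD code.

```python
import torch
import clip  # assumes the OpenAI CLIP package is installed
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

@torch.no_grad()
def embed(path):
    """Unit-normalized CLIP:ViT-L/14 image embedding, so dot products are cosines."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    feat = model.encode_image(image).float()
    return feat / feat.norm(dim=-1, keepdim=True)

@torch.no_grad()
def classify(path, real_bank, fake_bank):
    """real_bank / fake_bank: (N, D) unit-normalized embeddings of training images.
    The image is labelled by whichever bank contains its closest neighbour."""
    q = embed(path)
    sim_real = (q @ real_bank.T).max()
    sim_fake = (q @ fake_bank.T).max()
    return "ai-generated" if sim_fake > sim_real else "real"
```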
IV. DATASETS
A. TRAINING DATASETS
We selected two different training datasets: the ProGAN training dataset provided in the CNNDetection study [37] to test on GAN-generated images, and the LDM training dataset provided by Corvi et al. [53] to test on diffusion-generated, transformer-generated, and Midjourney-generated images. The ProGAN training dataset comprises 364K ProGAN-generated images with LSUN images and 364K real images from the LSUN dataset, all with dimensions of 256 × 256 and in PNG format. From the 364K ProGAN-generated images and 364K real images, 4K each were used as the validation set. The LDM training dataset contains 200K latent-diffusion-generated images in PNG format and 200K real images from the LSUN and COCO datasets in JPEG format. From the 200K LDM-generated images and 200K real images, 20K each were used as the validation set. To match the size of the LDM-generated images, we rescaled and center-cropped the real images down to dimensions of 256 × 256.
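The real-image preprocessing just described can be expressed with standard torchvision transforms; the interpolation mode is an assumption.

```python
from torchvision import transforms

# Rescale the shorter side to 256 and center-crop to 256 x 256, matching the
# LDM-generated image size described above.
real_image_preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
])
```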
TABLE 1. Specific information about different types of test datasets used in performance comparison.

B. TEST DATASETS
In our study, the test dataset for a single generative model was not composed of a 50–50 split between real and AI-generated images. Instead, the real-image dataset consisted exclusively of real images, whereas the AI-generated-image datasets consisted exclusively of AI-generated images. To see why, suppose that the Stable Diffusion test dataset had a 1:1 ratio of real and AI-generated images: an overall detection accuracy of 75% could then mean, for example, either 100% accuracy on real images and 50% on Stable-Diffusion-generated images, or 80% on real images and 70% on Stable-Diffusion-generated images. By organizing the test dataset in this manner, we could more accurately identify the detection performance of the detection model on real images and on each of the AI-generated images. A sampling of the test images used in our study is shown in Fig. 1, and specific details of the test datasets are provided in Table 1. The ''+'' sign in the Images column of Table 1 signifies that the different test datasets under the Source column for that row are combined.

Our test dataset consisted of 23 AI-generated-image datasets and 3 real-image datasets. The AI-generated-image datasets included images generated by 10 GAN models, 8 diffusion models, 4 transformer models, and Midjourney. All images in the test dataset were in PNG format, except for the real images and Lexica [27], which were in JPEG format.
FIGURE 2. Framework for performance comparison of AI-generated-image detection methods.

Depending on the structure of the generative model, the test dataset was divided into three parts.

1) GAN-GENERATED-IMAGE TEST DATASET
The GAN-generated-image test dataset consisted of 39.3K images generated by ProGAN [11], BigGAN [12], CycleGAN [10], GauGAN [13], StarGAN [14], StarGAN2 [15], StyleGAN [16], StyleGAN2 [17], StyleGAN3 [18], and EG3D [19]. The dataset was based on the test dataset used in the CNNDetection study [37]. We also used test images generated by StarGAN2, StyleGAN, StyleGAN3, and EG3D from other detection studies [51], [52], [53] for the supplementary GAN-generated images to evaluate the performance of the models against those of the most recent GAN models.

2) DIFFUSION-GENERATED-IMAGE TEST DATASET
The diffusion-generated-image test dataset contained 32K images generated by LDM [20], Stable Diffusion-V1 (SD-V1) [21], Stable Diffusion-V2 (SD-V2) [24], Stable Diffusion-V1 Realistic Vision V2.0 (SD-V1 RV 2.0) [25], Lexica [27], ADM [58], GLIDE [28], and IF [29]. This dataset was based on the test datasets used by Corvi et al. [53] and Lu et al. [52], albeit expanded with images provided in other studies on various detection methods [50], [51] and the DiffusionDB dataset [54] to increase the number of images and vary the image domains. Lexica images were obtained from the official website via crawling [27]. We conducted experiments on several Stable-Diffusion-based fine-tuned model datasets to evaluate whether detection models trained on LDM-generated images can also properly detect images generated by a fine-tuned Stable Diffusion model that has the same structure as that of the LDM.

3) TRANSFORMER-GENERATED-IMAGE TEST DATASET
The test dataset provided by Corvi et al. [53] was also used as the base of the transformer-generated-image test dataset, which consisted of images generated by DALL·E Mini [34], DALL·E 2 [35], and Taming Transformers [36]. We used additional images from DALL·E Mini and DALL·E 2 provided in the DIF research study [42], and images from DALL·E [31] provided in the UFD study [50] test dataset. In addition to transformer-generated images, we also included Midjourney-generated images in the transformer test dataset, for a total of 13K images.

4) REAL-IMAGE TEST DATASET
The real-image test dataset is composed of the CC3M [55] validation dataset, COCO [56] test dataset, and ImageNet [57] test dataset, comprising a total of 39.3K images. The CC3M, COCO, and ImageNet datasets are widely used image datasets consisting of images from a variety of domains. When the real dataset is used with one of the aforementioned AI-generated-image datasets (GAN, diffusion, or transformer), the number of real images is adjusted to match the total number of images in the AI-generated-image dataset, resulting in a 1:1 ratio. Thus, the GAN-generated-image test dataset was paired with a real-image test dataset comprising 12.4K CC3M, 13.3K COCO, and 13.6K ImageNet images. The diffusion-generated-image test dataset was paired with a real-image test dataset comprising 10.7K images from each of the CC3M, COCO, and ImageNet datasets. The transformer-generated-image test dataset was paired with a real-image test dataset comprising 4.3K images from each of the CC3M, COCO, and ImageNet datasets. This is the most effective way to determine the performance of the detection method, because the total accuracy will be 50% even if the detection model classifies all images as real or as AI-generated.

V. COMPARISON SETTINGS
In this section, we describe the comparison-experiment settings for each method. The performance comparison framework of our study is illustrated in Fig. 2. Each detection
method was obtained from the official Github repository. If a pretrained model that was trained on the same dataset we used for training was available in the official repository, we selected it for the test. For CNNDetection and UFD, we applied the ProGAN pretrained model. CNNDetection provided several pretrained ProGAN models: models pretrained on images augmented with Gaussian blur and JPEG compression, each applied with a 50% probability, and a model pretrained on images augmented with a 10% probability. We selected the model pretrained on images with a 10% probability because it performed better in the CNNDetection study [37]. For the no-down detection method, we used both the ProGAN pretrained model and LDM pretrained model. The methods without a pretrained model were trained on the ProGAN training dataset and LDM training dataset, respectively; the model trained on ProGAN-generated images was tested on GAN-generated images, and the model trained on LDM-generated images was tested on diffusion and transformer-generated images. Then, we analyzed the performance against each test dataset, using accuracy as the metric.

A. TRAINING SETTINGS
The goal of our study is to analyze the extent to which each method can achieve its highest generalization performance. Therefore, for optimal training, we set most of the training parameters to their default values in the training code rather than using the same parameters among the different methods. However, we applied the same data augmentation parameter, which significantly affects performance. The CNNDetection, PSM, and UFD models, which accounted for Gaussian blur and JPEG compression as data augmentations, were set to use the same Gaussian blur and JPEG compression probabilities of 10%, Gaussian blur sigma value range of 0 to 3, and JPEG compression quality range of 30 to 100 when training. For CNNDetection [37] and UFD [50], which provided multiple base model architectures, we used the model architectures that were used in their respective introductory studies. We used ResNet50 for CNNDetection, and ViT-L/14 for UFD. For the ViT, we used an ImageNet-pretrained ViT-B/16 model, which has a 16 × 16 patch size, and added the classification head to the fine-tuning configuration as a binary classifier for detecting real and AI-generated images. More detailed training settings for the detection methods are provided in Appendix A.
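The shared augmentation policy described above (a 10% probability each of Gaussian blur with sigma in [0, 3] and JPEG re-compression with quality in [30, 100]) can be sketched with PIL as follows; this is an illustration of the policy, not the exact augmentation code of any of the compared methods.

```python
import io
import random
from PIL import Image, ImageFilter

def train_augment(img: Image.Image, p_blur: float = 0.1, p_jpeg: float = 0.1) -> Image.Image:
    """Illustrative training-time augmentation with the parameter ranges used in
    this study: blur sigma drawn from [0, 3], JPEG quality drawn from [30, 100]."""
    if random.random() < p_blur:
        sigma = random.uniform(0.0, 3.0)
        # Pillow's `radius` argument is the standard deviation of the Gaussian kernel.
        img = img.filter(ImageFilter.GaussianBlur(radius=sigma))
    if random.random() < p_jpeg:
        quality = random.randint(30, 100)
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)  # round-trip through JPEG
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img
```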
B. TEST SETTING
When conducting our tests, we used mostly the default values from the existing test codes of the detection methods but applied the same batch size of 1 to all methods. Although UFD was capable of determining the optimal threshold that provides the highest accuracy for a given dataset, we used a threshold of 0.5 to be fair to the other methods. The detailed test settings for all detection methods are provided in Appendix B.

In a real-world scenario, repeated saving and downloading of AI-generated images on social media can cause image quality degradation through JPEG compression. This makes it more difficult for detection models to recognize AI-generated images. Therefore, we conducted additional tests to analyze the robustness of each detection method to JPEG compression. We compressed test dataset images using the Python Imaging Library (PIL) [59] with JPEG qualities of 50, 70, and 90. In addition, a hostile adversary could manipulate an AI-generated image to evade detection. We considered a scenario where an adversary applies Gaussian blur to AI-generated images. Therefore, we applied Gaussian blur using the OpenCV library [60] to the test datasets with sigma values of 1, 2, and 3, respectively, and then evaluated the performance of the detection methods. With regard to the models for testing JPEG-compressed and Gaussian-blurred images, as described earlier, we tested the methods involving JPEG compression and Gaussian blur as augmentation techniques using a model trained with image augmentation and the other methods, ViT and DIF, using a model trained without image augmentation.
from the existing test codes of the detection methods but
applied the same batch size of 1 to all methods. Although VI. COMPARISON RESULTS
UFD was capable of determining the optimal threshold that The comparison results for the detection performance of the
provides the highest accuracy for a given dataset, we used selected detection methods are shown in Table 2, 3, and 4.
a threshold of 0.5 to be fair to the other methods. The The accuracy of a detection model on the AI-generated
detailed test settings for all detection methods are provided image dataset is the percentage of images detected by the
in Appendix B. model as AI-generated, whereas the accuracy of the detection

VOLUME 12, 2024 62615


D. Park et al.: Performance Comparison and Visualization of AI-Generated-Image Detection Methods

TABLE 2. GAN-generated-image and real-image detection accuracies of detection methods trained on ProGAN training dataset.

model on the real-image dataset is the percentage of all images that the model correctly detected as real. The total accuracy is the number of correctly detected images divided by the total image dataset size. In the tables, ''CD'' denotes CNNDetection, ''Nd'' denotes no-down, all accuracies are rounded to the second digit after the decimal point, and the highest values are bolded.

A. RESULTS ON GAN-GENERATED IMAGES
The detection performance of each method trained with the ProGAN dataset on the GAN-generated images is reported in Table 2. First, the accuracies on the ProGAN images, which were generated by the same model used to generate the training images, were greater than 99% for all tested methods except for DIF. This shows that most of the methods are good at detecting images generated by the GAN models that were also used to generate the images on which the detection models were trained. The no-down method performed well on most GAN-generated-image datasets that were not included in the training, exhibiting the best detection performance on the images generated by BigGAN, StarGAN, StarGAN2, StyleGAN, and StyleGAN2, and on the real images. Because all artifact detection methods demonstrated a performance greater than 90% on images generated by StarGAN, we can assume that ProGAN and StarGAN have similar artifacts. PSM performed slightly better than CNNDetection on the GAN-generated-image test dataset, but not as well as the no-down method. Among the six detection methods, ViT demonstrated the lowest detection accuracy on the GAN-generated images, suggesting that, as a detection model, it lacks generalization performance on GAN-generated images. Interestingly, DIF achieved the highest detection accuracy, at 99.30%, for EG3D, but had a weak performance of less than 30% on the COCO and ImageNet real images. The weak detection performance of DIF on real images can be assumed to have been due to the insufficient configuration of the ProGAN training dataset to separate the spectrum features of real and AI-generated images. With the exception of DIF, all five methods correctly classified the real images with accuracies greater than 95%. The images generated by CycleGAN and GauGAN were detected most accurately by the UFD model, achieving accuracies of 98.03% and 99.70%, respectively. However, UFD had a performance of approximately only 50% on images generated using models that are mainly for face images, such as StyleGAN2 and StyleGAN3. StyleGAN3 is the GAN model whose images were the hardest to detect; on these images, CNNDetection achieved the highest detection rate of 72.03%, higher than that of any other model. In total, when trained on the ProGAN training dataset, the no-down method outperformed all other methods on both GAN-generated and real images. Given that all methods have accuracies greater than 65% on the GAN-generated images, it can be concluded that a detection model trained only on ProGAN-generated images can detect other GAN-generated images to some extent and that the images generated by different GAN models share similar features.
TABLE 3. Diffusion-generated-image and real-image detection accuracies of detection methods trained on LDM training dataset.

model, the no-down method exhibited the highest accuracy, which was greater than 99%, on images generated by SD-V1, SD-V2, and SD-V1 RV 2.0, whereas UFD had the highest accuracy, at 84.75%, on images generated by Lexica. DIF demonstrated superior performance on images generated by LDM and SD-V1 but underperformed compared to the other methods on images generated by the fine-tuned models SD-V2, SD-V1 RV 2.0, and Lexica. We notice that the detection performance of the detection methods decreases significantly on images generated by fine-tuned models such as Lexica that have a significantly different image style from that of the original Stable Diffusion model (SD-V1). UFD performed generally well across the different diffusion generative models, with accuracies of 89.43% on the ADM-generated images, 94.73% on the GLIDE-generated images, and 70.63% on the IF-generated images, compared to the other five methods, which all resulted in performance levels worse than 50% on the images generated by ADM, GLIDE, and IF. Overall, UFD achieved the highest accuracy, reaching 86.66%, on the diffusion-generated images. However, its performance on real images, at 80.53%, was slightly lower than those of the other methods, which exceeded 95%, with the no-down method exhibiting the highest accuracy, at 99.9%. The performance of the models in detecting differences between the real and AI-generated images is, to some extent, degraded, because the image-encoder features of the real and AI-generated images become less distinct when the detection model is trained on the LDM training dataset than when trained on the ProGAN training dataset. CNNDetection achieved the highest total accuracy, which was 85.7%. The diffusion-generated-image dataset that resulted in the lowest detection rate was the IF-generated-image dataset, with an average accuracy of 19.98%. A comparison between the detection accuracies of the models on images generated by GAN and diffusion revealed that all artifact methods exhibited a decrease in performance. This implies that images produced by various diffusion models have fewer shared artifact features compared to those generated by GAN models.

C. RESULTS ON TRANSFORMER-GENERATED AND MIDJOURNEY-GENERATED IMAGES
The detection performance of each method trained with the LDM dataset on the transformer-generated and Midjourney-generated images is outlined in Table 4. UFD outperformed all other methods with high detection accuracies greater than 98% on images generated by DALL·E Mini, DALL·E, and Taming Transformers, and 84.20% on images generated by DALL·E 2. The no-down method exhibited a performance of 71.5% and 80.2% on images generated by DALL·E Mini and DALL·E, respectively, but a low detection accuracy of 2.22% on images generated by DALL·E 2. None of the detection methods successfully detected the images generated by Midjourney, which is a state-of-the-art generative model. UFD had the highest success rate, at 56.04%, whereas ViT had the lowest success rate, at 5.43%. DIF exhibited the weakest performance on images generated by transformer and Midjourney. It can be inferred that the spectrum features of diffusion-generated and transformer-generated images are quite dissimilar. The highest accuracy on the transformer-generated and Midjourney-generated images combined was 80.86% for UFD, whereas the highest real accuracy was 99.85% for the no-down method, which was similar to the results on the diffusion-generated images. Of the six methods, UFD had the highest overall accuracy. UFD showed good performance on the images generated by the transformer, which has a completely different structure from the LDM with which the training images were generated; however, its performance on real images was lower than those of the other methods. Other methods based on artifact features have shown limited detection performance on images generated by DALL·E Mini, DALL·E, and Taming Transformers, but not on images generated by DALL·E 2.
TABLE 4. Transformer-generated-image, Midjourney-generated-image, and real-image detection accuracies of detection methods trained on LDM training dataset.

FIGURE 3. Comparison of total detection accuracies on JPEG-compressed images for each test dataset.

D. RESULTS ON JPEG-COMPRESSED IMAGES
Fig. 3 illustrates the detection performance of the different methods on JPEG-compressed images.

1) GAN-GENERATED IMAGES
First, we can see from Fig. 3(a) that, for most of the detection methods working on the GAN-generated images, the detection rate decreased as the JPEG compression quality decreased. In particular, ViT exhibited the largest decrease in accuracy, with a difference of 29.93% between no JPEG compression and 50%-quality JPEG compression. DIF performed better when the JPEG compression quality was 90% than on the original images, but the overall accuracy was in the 50% range, and thus, this detection method can hardly be considered robust to JPEG compression. For the JPEG-compressed GAN-generated images, most of the tested methods exhibited decreased performance.

2) DIFFUSION-GENERATED IMAGES
From Fig. 3(b), we can see that, for the diffusion-generated images, the detection accuracies of CNNDetection, PSM, ViT, DIF, and UFD decreased as the JPEG compression quality decreased. The detection performance of the no-down method on the JPEG-compressed images slightly increased, by 1.44% at 50%-quality JPEG compression, compared to its detection performance on the original images. On the other hand, UFD exhibited a detection accuracy of 74.14% on the images with 50%-quality JPEG compression, corresponding to a 9.46% loss of detection accuracy compared to that on the original images, the largest decrease among the tested methods.

3) TRANSFORMER-GENERATED AND MIDJOURNEY-GENERATED IMAGES
According to Fig. 3(c), which illustrates the detection performance of the tested methods on transformer-generated images, all methods except ViT and UFD performed better when the JPEG compression quality was set to 50% than on the original images. Although the tested methods had limited detection performance on the transformer-generated images, it is feasible to conclude that the JPEG compression decreased the image quality and facilitated the recognition of comprehensive features in the AI-generated images. As a result, the no-down method emerged as the most robust to JPEG compression, with the least degradation
FIGURE 4. Comparison of total detection accuracies on Gaussian-blurred images for each test dataset.

in detection performance on JPEG-compressed diffusion-generated, transformer-generated, and Midjourney-generated images. CNNDetection was the next most robust to JPEG-compressed images. The same JPEG augmentations we applied to CNNDetection and PSM were also applied to UFD via training. However, UFD did not perform as well as the other two methods on the JPEG-compressed images.

E. RESULTS ON GAUSSIAN-BLURRED IMAGES
Fig. 4 illustrates the variation in total accuracy for different values of Gaussian blur sigma.

1) GAN-GENERATED IMAGES
Fig. 4(a), illustrating the result of applying Gaussian blur to the GAN-generated images, shows that the differences in the detection performance on the original and at sigma 3 were 9.48% for PSM and 24.06% for UFD, indicating significant decreases in the detection performance. By contrast, the other methods did not exhibit significant decreases in performance. The performance of CNNDetection, no-down, ViT, and DIF did not degrade much as Gaussian blur intensity was increased.

2) DIFFUSION-GENERATED IMAGES
Fig. 4(b) shows the detection results on the diffusion-generated images after Gaussian blur was applied. DIF exhibited a significant decrease in performance of 17.15% even when only a weak blur with a sigma value of 1 was applied, whereas UFD demonstrated a detection performance that was 20.14% worse at a Gaussian blur sigma value of 1 than on the original image. On the other hand, when the detection performance of the no-down method on the original images was compared with that at a Gaussian blur sigma value of 3, it was found to have increased rather than decreased. At a sigma value of 3, CNNDetection, PSM, and ViT exhibited 2.39%, 9.49%, and 3.03% lower detection performance, respectively, than on the original. For DIF, the detection performance at a sigma value of 3 was 24.84% worse than on the original, and for UFD, it was 32.04% worse.

3) TRANSFORMER-GENERATED AND MIDJOURNEY-GENERATED IMAGES
The results for the transformer-generated and Midjourney-generated images in Fig. 4(c) show an 11.94% decrease in performance for DIF and a 31.47% decrease in performance for UFD at a sigma value of 3 compared to their performance on the original. Similar to what it exhibited on the Gaussian-blurred diffusion-generated images, UFD showed a significant drop in performance on the transformer-generated images with a weak Gaussian blur of sigma value 1. There were no noteworthy performance differences for PSM and ViT across various sigma values. On the other hand, CNNDetection showed a 5.13% better performance at sigma value 3 than on the original, whereas the no-down method showed 5.33% worse performance at sigma value 1 than on the original, but improved at sigma values 2 and 3, resulting in a 6.43% improvement in detection performance at sigma 3. Overall, increases in the Gaussian blur intensity had the least impact on the detection performance of CNNDetection and the no-down method. By contrast, UFD was found to be weak on Gaussian-blurred images.

VII. VISUALIZATION
A. GRAD-CAM
To identify which parts of the image are detected as artifacts by artifact-feature detection, we performed a Grad-CAM [8] visualization of CNNDetection and the no-down method, two of the best-performing artifact-feature methods in this study. Grad-CAM is a visualization method that uses gradients to determine the weight of a layer, revealing the specific areas of an image on which the model has focused. We set the target layer of Grad-CAM to the last layer of ResNet50, which is the structure of both CNNDetection and the no-down method.
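The Grad-CAM setup just described (target layer = the last ResNet50 block) can be sketched with the widely used pytorch-grad-cam package; the package choice, the stand-in ImageNet weights, and the input size are assumptions rather than the authors' exact visualization code.

```python
import numpy as np
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAM  # assumes pytorch-grad-cam is installed
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

# An ImageNet-pretrained ResNet50 stands in for the detector here; in practice the
# fine-tuned detector weights would be loaded instead.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

img = Image.open("example.png").convert("RGB")          # hypothetical input image
rgb = np.array(img.resize((224, 224))) / 255.0           # float RGB for overlay
input_tensor = preprocess(img).unsqueeze(0)

# Target the last convolutional block of ResNet50, as described above.
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
heatmap = cam(input_tensor=input_tensor, targets=[ClassifierOutputTarget(0)])[0]
overlay = show_cam_on_image(rgb, heatmap, use_rgb=True)  # H x W x 3 visualization
```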
Fig. 5, 6, and 7 display the Grad-CAM visualization
FIGURE 5. Grad-CAM visualization and detection results on GAN test dataset.

results. The first and fourth columns show an original image from each dataset, whereas the second and fifth columns show the Grad-CAM results for CNNDetection, and the third and sixth columns show the Grad-CAM results for the no-down method. Below each Grad-CAM image, we show the visualization results on how the method classified that image, coloring it green if it was classified correctly and red if it was classified incorrectly.

In Fig. 5, which shows the Grad-CAM results for the classification of GAN-generated images by CNNDetection and the no-down method trained on the ProGAN training dataset, we can see that CNNDetection and the no-down method have different areas of focus in the image. CNNDetection concentrated on a few larger areas, whereas the no-down method concentrated on several smaller areas. Given that the no-down method subtracts a few pooling layers from its model structure, it can be assumed that this type of Grad-CAM image appears because it captures larger features than the original ResNet50 structure. In Fig. 6 and 7, the Grad-CAM results for the classification of the
FIGURE 6. Grad-CAM visualization and detection results on diffusion test dataset.

FIGURE 7. Grad-CAM visualization and detection results on transformer test dataset.

diffusion-generated, transformer-generated, and Midjourney-generated images by the two methods trained on the LDM training dataset more clearly show the difference in the focus of the two methods. Generally, CNNDetection focused
on parts of the object in the image and background, First, in Fig. 8(a), which illustrates the t-SNE results for the
whereas the no-down method focused along the edges of GAN-generated images as classified by the CNNDetection
the object. Neither method showed specific difference in and UFD methods, we can see that the cluster of ProGAN
correct or incorrect classification based on the area of images used for training CNNDetection is located far from
focus. On the other hand, although the images generated the cluster of real images in the feature space. We can also see
by the LDM and ADM are both bird images, and both that the unseen GAN-generated image has a boundary that
methods focused on similar regions, the LDM-generated image was correctly classified, whereas the ADM-generated image was incorrectly classified. Thus, it can be concluded that the presence of artifacts in the detection regions of the AI-generated images used for training is more important than the selection of image regions by the detection method. In Figs. 5 and 7, although the original images from CC3M, COCO, and ImageNet are the same, the Grad-CAM results for the two methods are different. Furthermore, CNNDetection trained on ProGAN-generated images detected the COCO real image incorrectly as an AI-generated image, whereas CNNDetection trained on LDM-generated images detected it correctly as a real image. This demonstrates that detection results can vary depending on the training dataset.

B. T-SNE
To see how the artifact-feature method and the image-encoder-feature method divide the generated images differently in feature space, we visualized them using t-SNE [9]. T-SNE is a dimensionality-reduction technique that enables the visualization of data clustering. We chose CNNDetection as the artifact-feature method and UFD as the image-encoder-feature method to visualize using t-SNE, and we extracted feature values from the layer preceding the classification layer. CNNDetection is composed of a ResNet50 structure, with a feature-value size of [2048]. By contrast, UFD is composed of a CLIP:ViT/L-14 structure, with a feature-value size of [768]. The images used for the t-SNE visualization were 600 real images and 600 AI-generated images from each of the three test datasets, i.e., GAN-generated, diffusion-generated, and transformer-generated images, for a total of 1200 images per experiment. The datasets for t-SNE visualization were constructed for each generative-model structure as follows: for the 600 real images, 200 images from each of the three real-image datasets were obtained; for the GAN-generated images, 60 images from each of the 10 GAN-generated-image datasets; for the diffusion-generated images, 75 images from each of the eight diffusion-generated-image datasets; and for the transformer-generated and Midjourney-generated images, 120 images from each of the four transformer-generated-image datasets and the Midjourney-generated-image dataset. The t-SNE results for the GAN-generated images used feature values from the model trained on the ProGAN training dataset, whereas the t-SNE results for the diffusion-generated, transformer-generated, and Midjourney-generated images used feature values from the model trained on the LDM training dataset. The t-SNE results of each dataset for the CNNDetection and UFD methods are shown in Fig. 8.

In Fig. 8(a), CNNDetection clusters the ProGAN-generated images used for training and separates them from the real images to some extent. Fig. 8(a) also demonstrates that, in t-SNE, UFD is able to cluster each GAN model more effectively than CNNDetection; it can be observed that the StarGAN-generated and StarGAN2-generated images are clustered in the upper-right corner, whereas the StyleGAN3-generated and EG3D-generated images are also clustered at the top. Based on the detection performance of UFD on the GAN-generated images, as outlined in Table 2, it can be concluded that clustering quality does not always correlate with detection performance. However, owing to the nature of UFD, which detects AI-generated images by comparing the cosine distance to the feature bank of real images with the cosine distance to the feature bank of AI-generated images, clustering quality tends to affect performance to some extent. In Fig. 8(b), we can notice that CNNDetection trained on the LDM-generated-image training dataset separates the LDM-generated images from the real images, whereas the images generated by other diffusion models are evenly distributed with the real images. On the other hand, the t-SNE result for UFD in Fig. 8(b) shows that the LDM-generated images are more scattered, whereas the Lexica-generated images are distinctly clustered apart from the real images. Fig. 8(c) illustrates that, for both CNNDetection and UFD, the Taming-Transformers-generated images are highly clustered. Furthermore, for CNNDetection, the Midjourney-generated images are mixed with the real images, whereas for UFD, the Midjourney-generated images are clustered mostly separately from the real images. Overall, in Fig. 8, the distribution of real images shows that CNNDetection leads to more clustering to one side than does UFD.

In summary, CNNDetection clustered well on images generated by ProGAN, LDM, and Taming Transformers, whereas UFD clustered well on images from a wider range of generative models. In addition, UFD leads to better clustering for GAN-generated images than for diffusion-generated and transformer-generated images. This is because GAN-generated images are often limited to a specific domain. CNNDetection is a method for detecting GAN-generated images that performs well by identifying images that share similar artifacts with the AI-generated images used for training. This is achieved by learning to widen the gap between real and AI-generated images in the feature space. On the other hand, the UFD image-encoder method uses the CLIP image encoder to represent images in a feature space aligned with text. This enables the detection of generated images that resemble the domain of the AI-generated images used for training. Thus, it appears that images generated by diffusion, transformer, and Midjourney models, which were not included in the training, could be detected with a certain degree of generality.
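As a concrete illustration of this procedure, the following minimal sketch (our own illustration, not the released code of either method) extracts penultimate-layer features with a torchvision ResNet50 standing in for the CNNDetection backbone and projects them to two dimensions with scikit-learn's t-SNE; the file lists real_paths and fake_paths are hypothetical placeholders.

import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from sklearn.manifold import TSNE
from torchvision import models, transforms

# Preprocessing roughly matching a ResNet50 test pipeline (illustrative values).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Drop the classification layer so the model outputs the 2048-D penultimate features.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(image_paths):
    """Return an (N, 2048) array of features from the layer before the classifier."""
    feats = []
    for path in image_paths:
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        feats.append(backbone(x).squeeze(0).numpy())
    return np.stack(feats)

# real_paths and fake_paths would list the 600 real and 600 generated images
# sampled as described above (hypothetical file lists).
# feats = extract_features(real_paths + fake_paths)
# emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)
# emb can then be scatter-plotted per source dataset to produce plots like Fig. 8.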


FIGURE 8. T-SNE visualization of CNNDetection and UFD methods on different generative-model image datasets.
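For reference, the nearest-neighbor rule attributed to UFD in the preceding analysis can be summarized by the following sketch; this is a simplified reading of the mechanism described above, not the released implementation, and the feature banks are assumed to hold L2-normalized CLIP ViT-L/14 image features.

import numpy as np

def min_cosine_distance(query, bank):
    """Smallest cosine distance between an L2-normalized query vector and the
    rows of an L2-normalized feature bank."""
    return float((1.0 - bank @ query).min())

def is_generated(query_feat, real_bank, fake_bank):
    """Label an image as AI-generated if its nearest neighbor (in cosine
    distance) lies in the bank of generated-image features rather than in the
    bank of real-image features."""
    q = query_feat / np.linalg.norm(query_feat)
    return min_cosine_distance(q, fake_bank) < min_cosine_distance(q, real_bank)

# real_bank and fake_bank would be (N, 768) arrays of normalized CLIP features
# built from reference real and AI-generated images (hypothetical names).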


VIII. DISCUSSION

A. AI-GENERATED-IMAGE DETECTION PERFORMANCE
With regard to performance on GAN-generated images, the detection methods based on artifact features demonstrated mostly good detection performance, among which the no-down method exhibited overwhelmingly superior detection performance with a total accuracy of 94.54%. UFD, which detects based on image-encoder features, showed good performance on images generated by various unseen GAN models, especially CycleGAN and GauGAN, but had weak performance on images generated by StyleGAN2 and StyleGAN3. For images generated by diffusion models, CNNDetection and the no-down method performed well on images generated by latent diffusion and Stable Diffusion models, whereas UFD performed well on images generated by unseen models such as ADM, GLIDE, and IF. Excluding images generated by latent diffusion and Stable Diffusion V1, DIF did not properly detect diffusion-generated images, classifying almost all of them as real images. Compared with GAN models, diffusion models generate images from a wider variety of domains, which are expected to share fewer spectrum features. ViT, which was used as an additional detection method in our study, did not perform as well as the other methods as an AI-generated-image detection model. Because the inductive bias of the CNN structure is not present in the transformer structure [7], we estimate that, for proper performance, the model needs to be trained on a larger number of images than was present in the dataset we used for training the detection models.

In addition, we found that even though the detection methods were trained on LDM-generated images, the detection performance decreases significantly for images generated by fine-tuned Stable Diffusion models such as Lexica, which is essentially the same as an LDM. As more open-source image generative models are fine-tuned, detecting images generated by these models will become progressively more challenging. Therefore, the detection model should be able to capture features that are intrinsic to the generated image and that do not change when the generative model is fine-tuned. Artifact-based methods were mostly good at detecting images from generative models that were not significantly different from latent diffusion, which is the model that generated the images on which they were trained, but were not as good at detecting generated images from state-of-the-art generative models with significantly different structures, such as IF and Midjourney. UFD, an image-encoder-based method, performed reasonably well on the diffusion-generated-image datasets, except on the SD-V2-generated images, where its performance only just exceeded 70%. In particular, UFD showed superior performance on untrained images, exceeding 84% on all transformer-generated images. However, UFD had the limitation of performing worse than the artifact methods on real images. It can be concluded that the artifact method is suitable for detecting AI-generated images that share similar features with the training images, yielding very high accuracies, whereas the image-encoder method exhibited good performance that generalizes to most generated images. For further research on trusted AI-generated-image detection models, it is possible to create an ensemble model that combines the advantages of the artifact and image-encoder methods.

B. ROBUSTNESS TO DATA AUGMENTATION
The change in detection performance under JPEG compression and Gaussian blur varied slightly depending on the detection method and the structure of the generative model that produced the input image. DIF and ViT, which did not include JPEG compression and Gaussian blur as data augmentations during training, generally performed worse than the other methods on augmented images. As the quality of JPEG compression decreased, the detection performance on GAN-generated images mostly decreased; however, the no-down method exhibited the most consistent detection performance on diffusion-generated and transformer-generated images, followed by CNNDetection. When Gaussian blur was applied, CNNDetection and the no-down method showed the most stable performance on images generated by the three different generative-model structures. By contrast, UFD showed a significant decrease in performance when detecting Gaussian-blurred images. To ensure the robustness of a detection method against attacks that attempt to evade detection using JPEG compression and Gaussian blur, it is necessary to use JPEG compression and Gaussian blur as data augmentation during the training process.


C. VISUALIZATION OF DETECTION METHODS
Based on the Grad-CAM results for CNNDetection and the no-down method, we noticed that even though they were trained on the same dataset, the focus areas on the input images differed depending on the model structure. Furthermore, given that the detection results are only weakly related to the focus areas on the input images, it can be inferred that the presence of artifacts in the detection areas of the AI-generated images used to train the detection models matters more than the selection of the focus areas that affect the detection results. With regard to detection performance, we found that the same image can be classified correctly or incorrectly depending on the training dataset of the detection method, showing the importance of how the training dataset of an AI-generated-image detection method is configured.

After that, we used t-SNE visualization to investigate the difference in the feature-space distributions of real and AI-generated images between artifact detection and image-encoder detection. Because the distributions of the ProGAN and LDM images used for training were far from the distributions of the real images, CNNDetection was well able to detect images with similar artifacts among the AI-generated images that were not used for training, i.e., images whose distributions are close to those of the AI-generated images that were used for training. This signifies that the artifact method assumes that the AI-generated image being classified has artifacts similar to those of the images on which it was trained. This can be seen as a limitation when detecting images from new generative models. On the other hand, the t-SNE results for UFD indicate that the CLIP image encoder enables AI-generated images that were not included in the training to cluster by model in the feature space to some extent. Although the degree of clustering was not always proportional to the detection performance, it was possible to demonstrate some degree of generalized detection performance across generative models. A higher-performing detector could probably be created if CLIP can exhibit this level of clustering performance on AI-generated images without being trained on them and if a dataset with a larger number of AI-generated images can be built.

IX. CONCLUSION AND FUTURE WORK
In our study, we compared the detection performance of six AI-generated-image detection methods on a total of 23 AI-generated-image datasets consisting of GAN-generated, diffusion-generated, transformer-generated, and Midjourney-generated images. We categorized the six detection methods into three groups, i.e., artifact, spectrum, and image encoder, based on the image features used for detection, and analyzed the effectiveness of each detection method in detecting AI-generated images according to its category. We then tested how detection performance changed when JPEG compression and Gaussian blur were applied to the images. We also used visualization techniques to show the differences between artifact-feature-based detection methods in terms of their focus regions on an image, and between detection methods using different features in terms of their distribution of images in feature space. Subsequently, we correlated these differences with detection performance. In summary, our analysis focused on the objective evaluation of the detection performance of AI-generated-image detection methods and proposed future directions for them.

Detecting AI-generated images is becoming increasingly challenging owing to the continual development of image generative models that produce better and higher-quality images. Therefore, in our next study, we will collect images from state-of-the-art generative models, such as Stable Diffusion XL [61] and DALL·E 3 [62], and evaluate how well additional AI-generated-image detection methods detect them. After that, we will use the LIME [63] visualization technique to analyze the areas that a detection method focuses on with more precision.

APPENDIX A
TRAINING SETTING
We used a single RTX A5000 GPU as the computing environment for training our selected detection methods. The training parameters for the five detection methods, i.e., excluding the no-down method, are shown in Table 5. For the no-down method, models pretrained on ProGAN-generated images and LDM-generated images were obtained instead. We used the same parameters for training the models on the ProGAN-generated-image training dataset and the LDM-generated-image training dataset. We referred to the post [64] by Hugging Face for the code and parameters for fine-tuning ViT. As described in Section V-A, most of the parameters of each method adopted the default values from the original training code, except for the data-augmentation parameter. Additionally, DIF has a parameter that sets how many real and AI-generated images are used for training; its default value is 512 images. However, given that it may not perform well when trained with fewer images than those used for the other methods, we set this parameter to 360K for the ProGAN-generated-image training dataset and 180K for the LDM-generated-image training dataset and conducted additional training and testing. In these additional tests, the performance was higher with 360K and 180K images than with only 512 images. Therefore, we adjusted the number of training samples for DIF to match the number of training-dataset images used by the other methods instead of simply following the default value.

TABLE 5. Training parameters for detection methods used in our study.

APPENDIX B
TEST SETTING
After training for a set number of epochs per method, the model at the epoch with the best evaluation results on the validation set was used for the test. Given that DIF has no validation set in its training process, we tested all of its models every 5 epochs and then used the model with the best results. When testing the detection models on the images generated by GAN, diffusion, transformer, and Midjourney models, we used the same test parameters for each method, which are as follows (a sketch of the corresponding preprocessing is given after this list):
• CNNDetection: Batch size 1, No resize, No crop, No JPEG compression, No Gaussian blur.
• No-down: Batch size 1, No resize, No crop, No JPEG compression, No Gaussian blur.
• Patch Selection Module: Batch size 1, No resize, No crop, No JPEG compression, No Gaussian blur.
• Vision Transformer: Batch size 1, 224 × 224 resize (Interpolation Mode: Bilinear), No crop.
• Deep Image Fingerprint: Batch size 1, No resize, 256 × 256 center crop.
• Universal Fake Detect: Batch size 1, No resize, 224 × 224 center crop, No JPEG compression, No Gaussian blur.


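The per-method test-time preprocessing listed above can be expressed, for example, as torchvision transforms; the mapping below is our own sketch of those settings, not configuration shipped with any of the methods, and method-specific normalization is omitted.

from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Test-time preprocessing corresponding to the parameter list above
# (batch size 1 is handled by the data loader, not by the transforms).
TEST_TRANSFORMS = {
    "CNNDetection":           transforms.ToTensor(),   # no resize, no crop
    "No-down":                transforms.ToTensor(),
    "Patch Selection Module": transforms.ToTensor(),
    "Vision Transformer":     transforms.Compose([
        transforms.Resize((224, 224), interpolation=InterpolationMode.BILINEAR),
        transforms.ToTensor(),
    ]),
    "Deep Image Fingerprint": transforms.Compose([
        transforms.CenterCrop(256),
        transforms.ToTensor(),
    ]),
    "Universal Fake Detect":  transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ]),
}

# Example usage: x = TEST_TRANSFORMS["Universal Fake Detect"](pil_image).unsqueeze(0)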
REFERENCES
[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., vol. 27, 2014, pp. 1–9.
[2] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proc. NIPS, vol. 33. Vancouver, BC, Canada: Curran Associates, 2020, pp. 6840–6851.
[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11.
[4] (2023). Midjourney. Accessed: Sep. 08, 2023. [Online]. Available: https://www.midjourney.com/
[5] K. Roose. (2022). An AI-Generated Picture Won an Art Prize. Artists Aren't Happy. Accessed: Sep. 22, 2023. [Online]. Available: https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html
[6] R. Andrew, B. Ross Sorkin, S. Warner, M. Kessler, L. de la Merced, E. Hirsch, and E. Livni. (2023). An AI-Generated Spoof Rattles the Markets. Accessed: Sep. 22, 2023. [Online]. Available: https://www.nytimes.com/2023/05/23/business/ai-picture-stock-market.html
[7] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16×16 words: Transformers for image recognition at scale," 2020, arXiv:2010.11929.
[8] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618–626.
[9] L. Van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, no. 11, pp. 1–22, 2008.
[10] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2242–2251.
[11] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," 2017, arXiv:1710.10196.
[12] A. Brock, J. Donahue, and K. Simonyan, "Large scale GAN training for high fidelity natural image synthesis," 2018, arXiv:1809.11096.
[13] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2332–2341.
[14] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, "StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8789–8797.
[15] Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha, "StarGAN v2: Diverse image synthesis for multiple domains," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8185–8194.
[16] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4396–4405.
[17] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of StyleGAN," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8107–8116.
[18] T. Karras, M. Aittala, S. Laine, E. Härkönen, J. Hellsten, J. Lehtinen, and T. Aila, "Alias-free generative adversarial networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 852–863.
[19] E. R. Chan, C. Z. Lin, M. A. Chan, K. Nagano, B. Pan, S. de Mello, O. Gallo, L. Guibas, J. Tremblay, S. Khamis, T. Karras, and G. Wetzstein, "Efficient geometry-aware 3D generative adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 16123–16133.
[20] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 10674–10685.
[21] CompVis. (2023). Stable-Diffusion. Accessed: Aug. 03, 2023. [Online]. Available: https://github.com/CompVis/stable-diffusion
[22] C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, P. Schramowski, S. Kundurthy, K. Crowson, L. Schmidt, R. Kaczmarczyk, and J. Jitsev, "LAION-5B: An open large-scale dataset for training next generation image-text models," in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 25278–25294.
[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning transferable visual models from natural language supervision," in Proc. Int. Conf. Mach. Learn., vol. 139, 2021, pp. 8748–8763.
[24] Stability-AI. (2023). StableDiffusion. Accessed: Aug. 03, 2023. [Online]. Available: https://github.com/Stability-AI/stablediffusion
[25] Realistic Vision V2.0. Accessed: Sep. 29, 2023. [Online]. Available: https://civitai.com/models/4201?modelVersionId=29460
[26] (2023). Civitai. Accessed: Sep. 29, 2023. [Online]. Available: https://civitai.com/models
[27] (2023). Lexica. Accessed: Aug. 17, 2023. [Online]. Available: https://lexica.art/
[28] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, "GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models," 2021, arXiv:2112.10741.
[29] (2023). DeepFloyd IF. Accessed: Oct. 22, 2023. [Online]. Available: https://github.com/deep-floyd/IF
[30] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," J. Mach. Learn. Res., vol. 21, no. 1, pp. 5485–5551, 2020.
[31] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-shot text-to-image generation," in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2021, pp. 8821–8831.
[32] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–10.
[33] T. Hinz, M. Fisher, O. Wang, and S. Wermter, "Improved techniques for training single-image GANs," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 1–10.
[34] B. Dayma, S. Patil, P. Cuenca, K. Saifullah, T. Abraham, P. L. Khac, L. Melas, and R. Ghosh. (2021). DALL·E Mini. [Online]. Available: https://github.com/borisdayma/dalle-mini
[35] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical text-conditional image generation with CLIP latents," 2022, arXiv:2204.06125.
[36] P. Esser, R. Rombach, and B. Ommer, "Taming transformers for high-resolution image synthesis," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 12868–12878.


[37] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, "CNN-generated images are surprisingly easy to spot... for now," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8692–8701.
[38] X. Zhang, S. Karaman, and S.-F. Chang, "Detecting and simulating artifacts in GAN fake images," in Proc. IEEE Int. Workshop Inf. Forensics Secur. (WIFS), Dec. 2019, pp. 1–6.
[39] D. Gragnaniello, D. Cozzolino, F. Marra, G. Poggi, and L. Verdoliva, "Are GAN generated images easy to detect? A critical analysis of the state-of-the-art," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2021, pp. 1–6.
[40] L. Chai, D. Bau, S.-N. Lim, and P. Isola, "What makes fake images detectable? Understanding properties that generalize," in Computer Vision—ECCV. Cham, Switzerland: Springer, 2020, pp. 103–120.
[41] Y. Ju, S. Jia, L. Ke, H. Xue, K. Nagano, and S. Lyu, "Fusing global and local features for generalized AI-synthesized image detection," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2022, pp. 3465–3469.
[42] S. Sinitsa and O. Fried, "Deep image fingerprint: Towards low budget synthetic image detection and model lineage analysis," 2023, arXiv:2303.10762.
[43] F. Marra, D. Gragnaniello, L. Verdoliva, and G. Poggi, "Do GANs leave artificial fingerprints?" in Proc. IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), Mar. 2019, pp. 506–511.
[44] M. Joslin and S. Hao, "Attributing and detecting fake images generated by known GANs," in Proc. IEEE Secur. Privacy Workshops (SPW), May 2020, pp. 8–14.
[45] L. Nataraj, T. Manhar Mohammed, S. Chandrasekaran, A. Flenner, J. H. Bappy, A. K. Roy-Chowdhury, and B. S. Manjunath, "Detecting GAN generated fake images using co-occurrence matrices," 2019, arXiv:1903.06836.
[46] X. Xuan, B. Peng, W. Wang, and J. Dong, "On the generalization of GAN image forensics," in Biometric Recognition. Cham, Switzerland: Springer, 2019, pp. 134–141.
[47] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[48] M. Boroumand, M. Chen, and J. Fridrich, "Deep residual network for steganalysis of digital images," IEEE Trans. Inf. Forensics Security, vol. 14, no. 5, pp. 1181–1193, May 2019.
[49] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Lecture Notes in Computer Science. Cham, Switzerland: Springer, 2015, pp. 234–241.
[50] U. Ojha, Y. Li, and Y. Jae Lee, "Towards universal fake image detectors that generalize across generative models," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 24480–24489.
[51] X. Guo, X. Liu, Z. Ren, S. Grosz, I. Masi, and X. Liu, "Hierarchical fine-grained image forgery detection and localization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 3155–3165.
[52] Z. Lu, D. Huang, L. Bai, J. Qu, C. Wu, X. Liu, and W. Ouyang, "Seeing is not always believing: Benchmarking human and model perception of AI-generated images," 2023, arXiv:2304.13023.
[53] R. Corvi, D. Cozzolino, G. Zingarini, G. Poggi, K. Nagano, and L. Verdoliva, "On the detection of synthetic images generated by diffusion models," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Jun. 2023, pp. 1–5.
[54] Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau, "DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models," 2022, arXiv:2210.14896.
[55] P. Sharma, N. Ding, S. Goodman, and R. Soricut, "Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 2556–2565.
[56] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Computer Vision—ECCV. Cham, Switzerland: Springer, 2014, pp. 740–755.
[57] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[58] P. Dhariwal and A. Nichol, "Diffusion models beat GANs on image synthesis," in Advances in Neural Information Processing Systems, vol. 34. Red Hook, NY, USA: Curran Associates, 2021, pp. 8780–8794.
[59] P. Umesh, "Image processing in Python," CSI Commun., vol. 23, 2012.
[60] G. Bradski, "The OpenCV library," Dr. Dobb's J. Softw. Tools, 2000.
[61] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach, "SDXL: Improving latent diffusion models for high-resolution image synthesis," 2023, arXiv:2307.01952.
[62] (2023). DALL·E 3. Accessed: Oct. 10, 2023. [Online]. Available: https://openai.com/dall-e-3
[63] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?' Explaining the predictions of any classifier," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 1135–1144.
[64] N. Raw. (2023). Fine-Tune ViT for Image Classification With Transformers. Accessed: Jun. 19, 2023. [Online]. Available: https://huggingface.co/blog/fine-tune-vit

DAEEOL PARK received the B.S. degree in software from Soongsil University, South Korea, in 2023, where he is currently pursuing the M.S. degree in software convergence. His research interests include generative models and AI security.

HYUNSIK NA is currently pursuing the bachelor's degree with Soongsil University, South Korea. He is conducting research as an Undergraduate Researcher with the AI Security Laboratory, Department of Software, Soongsil University. His research interests include AI trustworthiness, AI security, AI robustness, edge AI, data privacy, and computer vision.

DAESEON CHOI (Member, IEEE) received the B.S. degree in computer science from Dongguk University, South Korea, in 1995, the M.S. degree in computer science from Pohang Institute of Science and Technology, South Korea, in 1997, and the Ph.D. degree in computer science from Korea Advanced Institute of Science and Technology, South Korea, in 2009. He was a Professor with the Department of Medical Information, Kongju National University, South Korea, from September 2015 to August 2020. He is currently a Professor with the Department of Software, Soongsil University, South Korea. His research interests include identity management and information security.
