1 Introduction
Social media has evolved into an important platform for disseminating news and images about public incidents [49, 65]. There are more than 5 billion users on social media [18, 29]. Social media users publish a large amount of data about public events; during the COVID-19 pandemic, social media served as a crucial tool for data collection, providing valuable insights into unfolding situations [44, 70]. These crowdsourced images may contain critical information about public incidents, e.g., road accidents, crime scenes, and violent scenes [83]. Such images are increasingly used in different applications, e.g., scene analysis, scene reconstruction, and image forensics [81]. Utilizing these social media images can significantly help investigators explore the unfolding situations that might have led to an incident [2].
Trustworthiness of crowdsourced social media images has hitherto been a prime issue [50, 74]. An image shared on social media may be modified to introduce misleading information, which could result in serious consequences, including social and political unrest [19, 57]. The modifications in an image can be of various types and may occur within the picture itself or in the information posted with the image. For example, an image may be original while its metadata has been tampered with (e.g., the date or city when/where the image was taken); in such cases, the image itself is genuine but the metadata or image description is modified [46]. Figure 1 shows some examples of misleading social media images. These images are legitimate but described in a misleading way. In some cases, changes are only intended to improve the image's appearance, which does not categorize it as a fake [68]. Therefore, changes need to be extensively analyzed from multiple perspectives before an image is categorized as fake. We define fake as a context-sensitive parameter: an image that is fake in one context may not be fake in another.
Detecting untrustworthy images has traditionally been addressed using image processing and information retrieval techniques [15, 31, 62, 78]. For instance, a neural-network-based approach has been proposed to detect fake images [33]. It utilizes residual signals from chrominance components, such as YCbCr and Lab, to acquire robust deep representations through a carefully designed convolutional neural network (CNN). Some solutions are based on image forensics [15, 78]. An image forensic model is proposed in Reference [78] to learn intrinsic features of an image. Some state-of-the-art approaches also leverage metadata along with the image content to discover changes in an image [73]. These approaches are usually costly in terms of computational power. Some lightweight service-based approaches have recently been proposed to deal with crowdsourced images [23, 48]. For instance, a preliminary service-based trust framework is proposed in Reference [3], which uses users' comments on a social media post to assess the credibility of the image. Another work considers the credibility of users' stances embedded in the comments on a social media post to determine the credibility of an image [1]. However, the credibility of images may not be completely assessed based only on users' stances: fake posts on social media can receive supportive comments from other users [20], and the stance of credible users may also be biased [84]. To address these limitations, we propose to assess trust in social media images using a lightweight objective framework that determines changes and updates in the images. In this respect, we leverage an image's meta-information to discover and characterize changes in the image. It is worth mentioning that some subtle changes in the colors and shades of an image are not always reflected in the image metadata. Therefore, we consider only those changes that may be reflected in the image metadata.
An image is a well-defined entity described by its visual components and metadata. We abstract an image as a service having functional and non-functional attributes. Functional attributes relate to the actions required to capture an image, e.g., pressing the shutter and switching between picture/video modes. We represent an image's metadata as its non-functional attributes. An image's non-functional attributes are usually reflective of the content within the picture [21, 38]. Editing an image may make it inconsistent with its non-functional attributes [8, 28]. For instance, changing the background of a picture may make it inconsistent with the GPS attributes in the image metadata [24]. An attempt to hide the facts in the metadata may create discrepancies among the non-functional attributes [27]. These discrepancies are not straightforward to identify, as they are embedded systematically in the non-functional attributes. The qualitative nature of numerous non-functional attributes presents another challenge, as the discrepancies among them are predominantly semantic in nature. They can be identified by examining the semantic meaning of attributes and the relationships among them. To address these challenges, we need a quantitative representation of non-functional attributes that is also reflective of their semantics. A framework is then required to compute semantic differences among non-functional attributes and to translate these discrepancies into modifications.
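As a minimal illustration of this idea, the sketch below computes a semantic difference between two attributes as one minus the cosine similarity of their vectors. The toy 3-dimensional embeddings are hypothetical stand-ins for vectors produced by a pretrained word-embedding model; only the computation itself is the point.

```python
import math

# Hypothetical toy embeddings for a few non-functional attributes.
# In practice these vectors would come from a pretrained embedding model.
EMBEDDINGS = {
    "sydney": [0.9, 0.2, 0.1],
    "nsw":    [0.8, 0.3, 0.1],
    "tokyo":  [0.1, 0.9, 0.4],
}

def cosine_similarity(u, v):
    """Semantic closeness of two attribute vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def semantic_difference(attr_a, attr_b):
    """Difference score; larger means the attributes are more semantically distant."""
    return 1.0 - cosine_similarity(EMBEDDINGS[attr_a], EMBEDDINGS[attr_b])
```

With such a score, attribute pairs whose difference exceeds what is typical for their attribute types can be flagged as candidate discrepancies.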
We propose Exif2Vec, a novel approach that leverages only the non-functional attributes of images to detect changes in social media images. It is worth mentioning that most social media platforms remove public access to an uploaded image's metadata. We propose this framework from the social media owners' perspective, which allows us to assume that the metadata is available with the image. An image's metadata comprises many useful attributes that inform about the image and its creation. We primarily focus on the spatio-temporal and contextual attributes. The non-functional attributes are usually correlated with each other. For instance, GPS location is a non-functional attribute that is correlated with the country name and city name. Correlated attributes can be changed collectively to introduce a systematic change. To capture these changes, we explore relationships among non-functional attributes. A pre-trained word embedding model is then employed to create a distributed representation of the attributes. The distributed representation places semantically similar attributes close to each other in a high-dimensional space [7]. Afterwards, we isolate consistent and inconsistent pairs of attributes based on a similarity distance between two attributes. For instance, "Sydney-NSW" is a consistent pair of attributes, because the distance between Sydney and NSW is comparable to that of other city-state pairs. We further inspect the inconsistent pairs to discover whether multiple inconsistencies lead to the same misperception. For instance, the name of a country, city, state, and the GPS coordinates can be changed collectively such that all attributes remain consistent with each other. The identified inconsistencies are then leveraged to quantify the severity of changes in an image. Below, we summarize our main contributions:
— We propose Exif2Vec, a novel framework designed to detect changes in social media images relying solely on the metadata and the posted information accompanying the image, whereas the state of the art utilizes metadata in conjunction with the image content.
— We employ a word-embeddings-based approach, specifically Word2Vec, to generate vector embeddings for an image's non-functional attributes. These embeddings are then used to discover discrepancies in the non-functional attributes.
— We propose a novel framework to translate discrepancies in an image's non-functional attributes into the severity of changes in the image.
— The proposed Exif2Vec is validated on two datasets, i.e., a context-based dataset and an image verification corpus. Results reflect up to 80% effectiveness of the proposed approach.
The proposed approach adeptly discovers a typical set of changes that manifest in the non-functional attributes. However, its coverage could be further enhanced by considering alterations intrinsic to the image itself, such as shifts in tones, color intensity, or distortions. It is important to note that the proposed approach is not intended to compete with image-processing-based methods but rather to offer a complementary perspective. The main rationale behind the proposed approach is rooted in the fact that certain manipulations or alterations may leave detectable traces in the metadata. Leveraging this information allows for a computationally less expensive analysis compared to traditional image-content-based methods. By focusing on metadata, we aim to provide an efficient and scalable solution that complements existing approaches. While we acknowledge that this approach may have its limitations, particularly in cases where sophisticated image content manipulations are employed, we believe it offers unique advantages in scenarios where computational resources or time constraints are a significant factor. It is important to highlight that the effectiveness of the proposed framework might be influenced by the extent to which non-functional attributes are available. In scenarios where such attributes are scarce, an alternative method involves extracting meta-information from the social media post. While this alternative might exhibit reduced precision due to potential uncertainties regarding the accuracy of the meta-information, it still presents valuable insights.
2 Motivating Scenario
Social media has become an important tool for sharing news and information related to public incidents. However, many viral social media posts have turned out to be fake in recent years [40]. A recent study by the Massachusetts Institute of Technology (MIT) reveals that a fake post spreads substantially faster than real ones [76]. More importantly, most fake images are accompanied by manipulated text and image metadata [16, 46, 60]. Indeed, even an original image may be accompanied by an incorrect description. Many of these posts contain deepfake text generated by deep learning models [53]. An image's metadata may reflect discrepancies with this deepfake text. Therefore, analyzing the text shared with an image together with the image metadata may reveal changes in an image and its description.
Our motivating scenario involves a depiction of a plane crash that occurred in New York in 2009. Illustrated in Figure 2, the scene captures the evacuation of U.S. Airways Flight 1549 as it rests on the surface of the Hudson River. Intriguingly, this particular image was erroneously attributed to the missing Malaysian aircraft MH370 by Cable News Network in 2014. In this misleading post, the image itself is original but the post contains a false claim. Numerous cutting-edge solutions depend on image processing techniques to detect dubious content in social media images. These approaches focus on the visual content within the image, which can potentially lead to overlooking alterations that extend beyond the image itself. In response to this limitation, we put forth an approach for recognizing untrustworthy images that relies exclusively on an image's metadata along with the associated textual information. Exploring the image metadata may reveal inconsistencies between the metadata and the posted information. These inconsistencies may reflect the trust of an image. For instance, in Figure 2, the image shared by Cable News Network is unedited, and the metadata is reflective of the manipulations in the image description. The majority of social media platforms restrict public access to image metadata. We tackle this issue from the standpoint of social media owners, which permits us to operate under the assumption that metadata accompanies the image. The image metadata, as depicted in Figure 2, provides evidence that the image is linked to an entirely separate incident.
Figure 2 shows a very simple example of a cheap fake (a naive way to introduce changes) in which the inconsistencies between the metadata and the image description are straightforward to identify. However, in real-world scenarios, changes are embedded very systematically in images and their metadata. Image metadata is often changed to make it consistent with the changes in an image [75]. For instance, if the background of an image is modified, the GPS tags in the metadata can be changed to make them consistent with the manipulated background in the picture. These types of changes are relatively hard to detect. Therefore, a framework is required to find inconsistencies in an image's metadata that can reflect the trust of the image.
3 Related Work
Images in crowdsourced social media environments can be untrustworthy [5]. Conventional strategies for detecting deceptive images primarily rely on the integration of image processing and machine learning methodologies, as highlighted by Patel et al. [51]. A multi-modal approach (which uses multiple content and context types such as text, visual, statistical, user profile, and network propagation) is proposed in Reference [59] that utilizes new and upgraded models to detect fake social media images. A deep learning approach is proposed in Reference [35] for detecting fake images by using contrastive loss. In the study detailed in Reference [58], convolutional neural networks are employed to detect counterfeit images disseminated through various social media platforms. The work described in Reference [79] employs a combination of traditional digital forensics techniques and artificial intelligence methods to uncover instances of image manipulation. Additionally, a block-oriented methodology for detecting copy-move forgery is put into operation, as elucidated in Reference [42]. In the publication by Guo et al. [30], two methods for identifying counterfeit colorized images are proposed: one based on histograms, the other relying on feature encoding. In the work documented by Tanaka et al. [61], a method utilizing robust hashing for the detection of counterfeit images is introduced; it is demonstrated to have high fake-detection accuracy even when multiple manipulation techniques are applied. Deep-learning methods for image forensics are surveyed in Reference [17]. Another strategy involves utilizing image metadata to identify fake WhatsApp images [38]. However, this particular approach focuses solely on a limited set of spatio-temporal attributes from the image source. The image processing methods mentioned above exhibit impressive accuracy in identifying fraudulent elements within images; however, they demand substantial computational resources, as indicated by Dang et al. [22]. The importance of computationally less intensive solutions for different social media applications is highlighted in Reference [4]. In this article, we propose that a subset of trust in images can be derived by using only an image's metadata.
Some recent studies claim that a subset of trust in social media images can be derived using comments on a post [1, 3, 82]. A crowdsourced image service trust model is proposed in Reference [3], in which the trustworthiness of an image service is measured based on the users' stance. Textual features of social media images, i.e., comments, and metadata, e.g., spatio-temporal information, are utilized to gather the trust rate of the image service. A stance- and credibility-based crowdsourced image service trust model is proposed in Reference [1]. The model considers various indicators such as the stance embedded in the images' comments and their metadata, e.g., time, along with the users' credibility. It models the interactions between commenters and sub-comments using different language-based models [71]. These approaches are unable to capture modifications in an image, because misleading content on social media may receive positive comments from other users [20]. Moreover, comments from credible users can be biased [84]. We propose a more objective approach that examines modifications and updates in an image to determine its likelihood of being fake.
Image metadata acts as an indicator of modifications made to an image. Our method for detecting forgery hinges solely upon the analysis of image metadata. In recent times, a multitude of approaches have arisen to uncover instances of image manipulation by drawing upon the insights provided by metadata information. A recent work utilizes image metadata and an ELA processor to detect forgery in images [73]. The proposed system relies on a neural network capable of identifying and processing image regions using a specific approach. Another work performs image provenance analysis based only on the metadata [10]. The image provenance tree implicitly informs about modifications in an image. These solutions are based on a limited use of metadata along with other computer-vision-based approaches. Our proposed approach is different in the sense that it is based entirely on image metadata to ascertain the trust of an image service.
Image non-functional attributes are predominantly qualitative in nature, encompassing features such as city, country, and state. Detecting alterations within an image necessitates the identification of irregularities present in these non-functional attributes. These irregularities can be ascertained by exploring the semantic variations inherent in these attributes. Numerous cutting-edge studies have focused on extracting the semantic meanings behind such qualitative keywords.
Latent Semantic Analysis (LSA) stands out as a widely employed technique for identifying dissimilarities between two documents. This method finds practical utility across a spectrum of domains. An illustrative example can be found in Reference [14], where LSA is harnessed for text summarization: a summarization approach is introduced that leverages frequent item sets to encapsulate latent concepts inherent in the analyzed documents. Through the effective utilization of LSA, the authors condense the potentially redundant assemblage of item sets into a succinct compilation of uncorrelated concepts. The summarization process then selects sentences that encompass these latent concepts while minimizing redundancy. LSA relies on the preexisting vocabulary within the documents it analyzes; out-of-vocabulary words, which are not present in the training data, pose a challenge, as LSA cannot effectively represent or analyze such words without additional preprocessing or techniques.
We introduce a novel way of discovering changes in an image's non-functional attributes using a word-embeddings-based approach. In this respect, we use pretrained models to create vector embeddings of the non-functional attributes. The vector embeddings are then exploited to find inconsistencies among the attributes. Word embeddings are widely used in different domains [32, 45]. For instance, a word-embedding-based solution to search similar records in databases is presented in Reference [26]. Word embedding is applied to Twitter data in Reference [34] to monitor natural disasters. A context-sensitive word embedding approach is presented in Reference [52]. Dynamic embeddings are developed in Reference [55] to capture how the meanings of words change over time. Word-embeddings-based models create vectors for qualitative non-functional attributes. However, some non-functional attributes are quantitative in nature, e.g., GPS coordinates and GMT offset. It is a challenge to generate embeddings for these quantitative attributes. A novel approach is proposed in Reference [80] to transform GPS coordinates into real-valued vectors. Another study, Kazemi et al.'s work on time2vec embeddings [39], furnishes vectorized representations of temporal data. Building upon these methodologies, we employ analogous techniques to convert quantitative non-functional attributes into real-valued vectors.
6 Experimentation and Results
We conduct a set of experiments to evaluate the effectiveness of detecting changes using non-functional attributes of images. We assess the performance of the proposed framework on two real datasets consisting of fact-checked social media images. We report the effectiveness in terms of four standard metrics: accuracy, precision, recall, and F-score.
6.1 Dataset and Experiment Setup
We evaluate the proposed framework on two real datasets: a context-based image dataset and a fact-checked image verification corpus (combined with another fake news dataset).
Context-based dataset. This dataset consists of 2,800 images. These images are collected from different social media platforms (e.g., Twitter, Facebook, Instagram, etc.) and copyright free image repositories (e.g., Shutterstock, Unsplash, Pexels, etc.). The images are collected for five different application domains, i.e., road accidents, crime scenes, violent scenes, natural disasters and public gatherings. We extract spatio-temporal and contextual information available with these images. Hence, this dataset contains three features/columns for each image, i.e., spatial attributes, temporal attributes and contextual attributes.
Injecting Changes: This dataset consists of original images. To test our framework on this dataset, we systematically introduce different types of changes among the non-functional attributes of these images. We leverage ChatGPT to inject changes in the non-functional attributes. ChatGPT has undergone comprehensive training on a wide range of data, encompassing real image metadata. This extensive training enables the model to possess a deep understanding of the characteristics and information associated with image metadata. ChatGPT is invoked using the ChatGPT API. Instructions are provided to ChatGPT to create different types of variations between the metadata of different versions of an image. In this respect, different combinations of changes in the spatial, temporal, and contextual attributes are introduced. The alterations made by ChatGPT to image metadata are not purely synthetic, as ChatGPT is trained on a vast dataset derived from real-world sources; its responses and modifications are grounded in genuine linguistic patterns and context. Consequently, any adjustments it makes to image metadata are informed by real-world language usage and semantics. Figure 9 shows one example of changes injected by ChatGPT in an image's metadata. Changes in spatio-temporal attributes are injected in a way that misleads the viewer toward a different incident. Contextual attributes are changed in a way that creates a misperception about the image. Moreover, different levels of complexity are considered while injecting these changes. For instance, in comparatively simple cases, only a few attributes are changed to introduce a change, whereas in relatively complex cases, many attributes are changed collectively to create a consistent change in the image's non-functional attributes. Spatio-temporal changes constitute 50% of the overall modifications, whereas the rest of the modifications involve contextual attributes along with modified spatio-temporal features. Moreover, the modifications are introduced in a way that creates both consistent and inconsistent changes. Consistent changes are 25% of the total introduced changes.
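The distinction between consistent and inconsistent injected changes can be mimicked, purely for illustration, by a small helper. In the actual setup the changes are generated by prompting ChatGPT; the city-state mapping and the function below are hypothetical.

```python
import copy

# Hypothetical city->state mapping used to keep an injected spatial change
# internally consistent.
CITY_STATE = {"Sydney": "NSW", "Melbourne": "Victoria", "Brisbane": "Queensland"}

def inject_spatial_change(metadata, new_city, consistent=True):
    """Return a modified copy of the metadata with an injected spatial change.

    A consistent change also rewrites the correlated state attribute;
    an inconsistent change updates the city but leaves the state untouched.
    """
    modified = copy.deepcopy(metadata)
    modified["city"] = new_city
    if consistent:
        modified["state"] = CITY_STATE[new_city]
    return modified

original = {"city": "Sydney", "state": "NSW", "date": "2019-03-11"}
consistent_change = inject_spatial_change(original, "Melbourne", consistent=True)
inconsistent_change = inject_spatial_change(original, "Melbourne", consistent=False)
```

Here the inconsistent variant leaves a detectable city-state mismatch, while the consistent variant changes the correlated attributes collectively, which is the harder case for detection.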
Image Verification Corpus. The second dataset in consideration is a well-established image verification corpus [13]. This dataset is continually evolving and comprises both fake and authentic social media images. Notably, it contains more than 2,500 fact-checked viral social media posts. The dataset is composed of tweets and provides tweet IDs, image URLs, and the information shared with each tweet. It is a generalized dataset containing images related to multiple contexts, i.e., the Nepal earthquake, the Boston Marathon bombings, Hurricane Sandy, and so on. The images in this dataset are labeled as either fake or real. We use the Python Pillow library to extract metadata from images: we import the Python Imaging Library module from Pillow to work with images and their metadata, and extract specific metadata fields by accessing the keys of the metadata dictionary. We use the tweet IDs and the authentication information of a Twitter developer account to fetch tweet details. We fetch spatial and temporal keywords from the textual information using the spaCy \(en\_core\_web\_sm\) model with the timexy pipe. We further simplify the data, retaining only the image ID, spatial, temporal, contextual, and label columns.
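A minimal sketch of this metadata-extraction step is shown below. Pillow's `Image.getexif()` returns a mapping from numeric EXIF tag IDs to values; the sample EXIF dictionary here is a hypothetical stand-in for that mapping, and the tag-name table is a small subset of the standard EXIF tags (271 = Make, 306 = DateTime, 34853 = GPSInfo).

```python
# Hypothetical EXIF mapping, shaped like the result of PIL Image.getexif().
SAMPLE_EXIF = {
    271: "Canon",
    306: "2009:01:15 15:30:00",
    34853: {1: "N", 2: (40.0, 46.0, 55.0), 3: "W", 4: (74.0, 0.0, 21.0)},
}

# Subset of the standard EXIF tag-ID -> name table (cf. PIL.ExifTags.TAGS).
TAG_NAMES = {271: "Make", 306: "DateTime", 34853: "GPSInfo"}

def extract_spatiotemporal(exif):
    """Keep only the spatio-temporal fields used as non-functional attributes."""
    named = {TAG_NAMES[k]: v for k, v in exif.items() if k in TAG_NAMES}
    return {
        "temporal": named.get("DateTime"),
        "spatial": named.get("GPSInfo"),
    }
```

In a full pipeline the `SAMPLE_EXIF` dictionary would be replaced by the mapping returned for each downloaded image.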
We combine this dataset with another dataset related to fake news, i.e., source-based fake news classification [9]. This dataset contains fact-checked news posted by politicians, news channels, newspaper websites, and common civilians. We consider only those samples that have an image associated with the news. Since a lot of text is available with each news item, we filter out the unnecessary text using the KeyBERT model to fetch the 20 most relevant keywords from the text of the news. These keywords serve as the image's contextual attributes and are also used afterwards to compile a context-based corpus. We use the spaCy library and the timexy tool to get "Date," "Time," and "GPE" information from the text. Afterwards, we classify the text into spatial, temporal, and contextual categories. This data acquisition methodology lays a robust foundation for the subsequent stages of analysis and interpretation. Dataset Characteristics: The three datasets (context-based dataset, image verification corpus, and fake news dataset) each include images that have been fact-checked and sorted into two categories: fake and real. Figure 10 shows the class distributions in the datasets.
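The spatial/temporal/contextual classification step can be sketched as a simple routing of entity labels. spaCy's NER (with the timexy pipe) emits labels such as GPE for places and DATE/TIME for temporal expressions; the entity list below is a hypothetical example of such output.

```python
# Hypothetical (text, label) pairs as produced by spaCy NER with timexy.
ENTITIES = [
    ("Kathmandu", "GPE"),
    ("25 April 2015", "DATE"),
    ("earthquake", "EVENT"),
    ("11:56 am", "TIME"),
]

def categorize(entities):
    """Split extracted entities into spatial, temporal, and contextual attributes."""
    buckets = {"spatial": [], "temporal": [], "contextual": []}
    for text, label in entities:
        if label == "GPE":
            buckets["spatial"].append(text)
        elif label in ("DATE", "TIME"):
            buckets["temporal"].append(text)
        else:
            buckets["contextual"].append(text)
    return buckets
```

Everything that is neither spatial nor temporal falls through to the contextual column, which matches the three-column layout described above.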
6.2 Generating Attribute Embeddings
We use a pretrained Word2Vec model to generate a distributed representation of attribute embeddings. The model is pretrained using Continuous Bag of Words (CBOW) [77]. Attribute embeddings are generated for each individual image. The experiments reveal that similar distance analogies exist between many attribute pairs. For instance, Figure 11(a) shows attribute embeddings generated for a sample image selected from the dataset. "Altona" and "Victoria" are the spatial attributes of the image. It is evident from the figure that a distance relationship exists between "Altona" and "Victoria" similar to that between "Parramatta" and "NSW." Therefore, "Altona-Victoria" is a consistent pair of attributes. The pairs for which these analogies do not exist are labeled as inconsistent attributes.
To justify the use of word embeddings for image metadata, it is important to first explore the relationships between different metadata tags. We achieved significant success in finding similar distance relationships among many attributes, and discovered many new relationships in attributes related to road accidents and crime scenes. For instance, Figure 12(a) shows that pedal-bicycle is related to accelerator-motorbike. Similarly, Figure 12(b) shows that shooting-gun and stabbing-knife are related to each other. These relationships are useful in finding inconsistent keywords in an image description. For instance, the keyword "right-hand driving" is inconsistent with countries in which left-hand driving is followed. These inconsistencies reflect on the trust of an image. Figure 11(b) shows more examples of similar distance relationships identified by the pre-trained model.
6.3 Effectiveness
We report the performance of the proposed approach in terms of accuracy, precision, recall, and F-score. Accuracy illustrates the correct identification of modifications. Precision, recall, and F-score indicate how well consistent and inconsistent changes are identified: precision reflects the fraction of changes predicted as consistent/inconsistent that truly are, while recall reflects the fraction of the actual consistent/inconsistent changes that are correctly identified. We test the proposed framework in terms of accuracy and run-time efficiency.
6.3.1 Accuracy.
We report the accuracy as the percentage of correctly discovered changes. We separately report the accuracy for attributes having strong and weak relationships. Figure 13 shows the accuracy of the proposed framework for three different types of changes, i.e., inconsistent changes, consistent changes, and significant changes. The accuracy in determining inconsistent changes is relatively higher compared to consistent and significant changes, because inconsistent changes are injected naively, whereas consistent and significant changes are introduced more systematically and are harder to discover.
We conduct our experiments a total of 100 times to ensure a thorough assessment of the consistency of our results. Figure 14 presents the standard deviation of the % accuracy, providing a visual representation of the variability observed across multiple trials. This not only strengthens the robustness of our findings but also enhances the trustworthiness of the proposed framework. The incorporation of measures of variability, including error bars and standard deviation, allows for a more nuanced interpretation of our results.
Vector Size: Figure 13 shows the accuracy of the pretrained Word2Vec model on the context-sensitive dataset for a vector size of 300. We also illustrate the trend in accuracy for varying vector sizes in Figures 15(a) and 15(b). Specifically, Figure 15(a) reports the accuracy on the context-based dataset, whereas Figure 15(b) shows the accuracy on the general image verification corpus. It is evident from the figures that the accuracy increases with the vector size, because a larger vector size better embeds the context in the vectors.
Context Window Size: The context window size influences the resulting embeddings. In some cases, the context of a keyword is defined by multiple words. For instance, the term "traffic signal" contains two words; its context cannot be completely defined by using only one word. The context window size represents the number of neighboring words used collectively to generate an attribute embedding. For instance, for the statement "the quick brown fox," a context window of two yields samples such as (the, quick) and (the, brown); sliding one word further produces (quick, the), (quick, brown), (quick, fox), and so on. A larger window size can give higher accuracy due to more available training examples, but also results in longer training time. We repeat the experiments for multiple context window sizes to observe the effect on accuracy. Figures 16(a) and 16(b) show the change in accuracy as the context window size changes. Figure 16(a) reports the accuracy on the context-based dataset, whereas Figure 16(b) shows the accuracy on the general image verification corpus. It is evident from the figures that the accuracy increases when the context window is increased by one or two words, but starts decreasing if the context window is increased further. There is a trade-off between context and accuracy: enlarging the context window better introduces the context of a term, but a large context window loses discrepancies among different sub-words of a keyword. Due to this trade-off, the accuracy of determining consistent changes drops once the context window size grows beyond this range.
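The sliding-window sampling described above can be sketched in a few lines. The function below enumerates (target, context) training pairs for a given window size, mirroring how Word2Vec walks over the text.

```python
def context_pairs(sentence, window):
    """Generate (target, context) training pairs for a given window size."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        # Neighbors within `window` positions on either side of the target.
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, words[j]))
    return pairs
```

For "the quick brown fox" with a window of two, the first target "the" yields the pairs (the, quick) and (the, brown), matching the example in the text.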
Comparison with State-of-the-Art. We evaluate the efficacy of our proposed word-embeddings-based framework by contrasting its performance against two established baselines: LSA and Term Frequency-Inverse Document Frequency (TF-IDF). We chose these baselines because they share the goal of depicting semantic relationships among words, aiming to capture nuanced meanings and contextual nuances. Furthermore, we offer a comparative analysis with a state-of-the-art method for detecting fake images on WhatsApp [38].
Latent Semantic Analysis: LSA represents the contextual meaning of words through statistical computations applied to a large corpus of text. It is widely used to quantify the semantic similarity between two sets of sentences or documents [25, 41]. The effectiveness of LSA is reported in terms of the identified % dissimilarity in the non-functional attributes. The proposed Exif2Vec performs better than LSA.
TF-IDF: We additionally conduct a comparative analysis of our proposed approach’s accuracy against that of TF-IDF. TF-IDF quantifies the importance or relevance of string representations (words, phrases, lemmas, etc.) in a document among a collection of documents. We use TF-IDF to measure the similarity between an attribute and the entire set of other attributes within an image. This approach helps us uncover any inconsistencies among these attributes.
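The TF-IDF baseline can be sketched as follows: each attribute, together with its textual context, is treated as a small "document," and pairwise cosine similarity over plain TF-IDF vectors flags attributes that share no vocabulary with the rest. The token lists are hypothetical; this is an illustration of the baseline, not the tuned setup used in the experiments.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute plain TF-IDF vectors for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # smoothed IDF
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# One "document" per attribute: the attribute plus its textual context.
docs = [
    ["sydney", "harbour", "australia"],
    ["nsw", "australia", "state"],
    ["tokyo", "japan", "city"],
]
vocab, vecs = tfidf_vectors(docs)
```

Because TF-IDF only counts shared terms, attributes with overlapping context ("sydney" and "nsw" via "australia") score above attributes with none ("sydney" and "tokyo"), which is the count-based behavior the comparison in the text refers to.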
We compare the performance of Exif2Vec on two metrics: accuracy and precision. Exif2Vec performs better than LSA and TF-IDF, as shown in Figure 17. One major reason is that LSA and TF-IDF are count-based models in which similar terms have the same counts across different documents, whereas Exif2Vec is a prediction-based model, i.e., given the vector of an attribute, it predicts the context attribute vectors. State-of-the-Art: We benchmark our proposed framework against a cutting-edge method specialized in identifying fake images circulated on WhatsApp [38]. We choose this work for the comparison because it also considers spatio-temporal features of an image to assess the image's trustworthiness. The model relies on three features: (1) image content-based features, (2) temporal features using the timestamps at which images were shared, and (3) social context features based on the users who shared the images. These features are then fed into different machine learning models (Logistic Regression (LR), Random Forest Classifier (RFC), Decision Tree (DT), Support Vector Machine (SVM), and Artificial Neural Networks (ANN)) to predict whether the input is fake or not. Figure 18(a) reflects that the proposed Exif2Vec performs equally well without relying on image-content-based features.
6.3.2 Precision, Recall, and F-score.
We report the accuracy of determining consistent and inconsistent changes in terms of precision, recall, and F-score. Precision indicates the performance in avoiding false positives, whereas recall indicates the performance in avoiding false negatives. Precision is calculated by dividing the number of true consistent/inconsistent changes by the total number of changes predicted as consistent/inconsistent. The precision for consistent changes is relatively better, as shown in Table 2. Recall refers to the percentage of the total consistent/inconsistent changes correctly classified by the proposed approach. The F-score is the harmonic mean of precision and recall. Results reveal a significant recall for both consistent and inconsistent changes.
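For reference, the three metrics follow directly from the true/false positive and false negative counts of one class of changes. The counts below are hypothetical and serve only to show the computation.

```python
def scores(tp, fp, fn):
    """Precision, recall, and F-score from confusion-matrix counts
    for one class of changes (e.g., consistent changes)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts for the "consistent change" class.
p, r, f = scores(tp=40, fp=10, fn=5)
```

With these counts, precision is 40/50 = 0.8, recall is 40/45, and the F-score is their harmonic mean.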
Run-time Efficiency: We train Exif2Vec on a vocabulary of 50,000 words downloaded from Wikipedia using a wikidump. The training is hosted on a server with 32 GB of RAM and a 4 GHz Core i9 CPU with two GPUs of 25 GB memory each. It takes around three days to train Exif2Vec using four workers/threads. Figure 18(b) shows the variation in the run-time of training Exif2Vec as the number of threads changes.