1 Introduction
Social media has evolved into an important platform for disseminating news and images about public incidents [49, 65]. There are more than 5 billion users on social media [18, 29]. Social media users publish a large amount of data about public events; during the COVID-19 pandemic, social media served as a crucial tool for data collection, providing valuable insights into unfolding situations [44, 70]. These crowdsourced images may contain critical information about public incidents, e.g., road accidents, crime scenes, and violent scenes [83]. Such images are increasingly used in different applications, e.g., scene analysis, scene reconstruction, and image forensics [81]. Utilizing these social media images can significantly help investigators explore the unfolding situations that might have led to an incident [2].
Trustworthiness of crowdsourced social media images has hitherto been a prime issue [50, 74]. An image shared on social media may be modified to introduce misleading information, which could result in serious consequences, including social and political unrest [19, 57]. The modifications in an image can be of various types and may occur within the picture itself or in the information posted with the image. For example, an image may be original while its metadata has been tampered with (e.g., the date or city when/where the image was taken); in such cases, the image itself is genuine but the metadata or image description is modified [46]. Figure 1 shows some examples of misleading social media images. These images are legitimate but described in a misleading way. In some cases, changes are only intended to improve the image's appearance, which does not categorize it as a fake [68]. Therefore, changes need to be extensively analyzed from multiple perspectives before an image is categorized as fake. We define fake as a context-sensitive parameter: an image that is fake in one context may not be fake in another.
Detecting untrustworthy images has traditionally been addressed using image processing and information retrieval techniques [15, 31, 62, 78]. For instance, a neural-network-based approach has been proposed to detect fake images [33]. It utilizes residual signals from chrominance components, such as YCbCr and Lab, to acquire robust deep representations through a carefully designed convolutional neural network (CNN). Some solutions are based on image forensics [15, 78]. An image forensic model is proposed in Reference [78] to learn intrinsic features of an image. Some state-of-the-art approaches also leverage metadata along with the image content to discover changes in an image [73]. These approaches are usually costly in terms of computational power. Some lightweight service-based approaches have recently been proposed to deal with crowdsourced images [23, 48]. For instance, a preliminary service-based trust framework is proposed in Reference [3], which uses users' comments on a social media post to assess the credibility of the image. Another work considers the credibility of users' stances embedded in the comments on a social media post to determine the credibility of an image [1]. However, the credibility of images may not be completely assessed based only on users' stances: fake posts on social media can receive supportive comments from other users [20], and the stance of credible users may also be biased [84]. To address these limitations, we propose to assess trust in social media images using a lightweight objective framework that determines changes and updates in the images. In this respect, we leverage an image's meta-information to discover and characterize changes in the image. It is worth mentioning that some subtle changes in the colors and shades of an image are not always reflected in the image metadata. Therefore, we consider only those changes that may be reflected in the image metadata.
An image is a well-defined entity described by its visual components and metadata. We abstract an image as a service having functional and non-functional attributes. Functional attributes relate to the actions required to capture an image, e.g., pressing the shutter and switching between picture/video modes. We represent an image's metadata as its non-functional attributes. An image's non-functional attributes are usually reflective of the content within the picture [21, 38]. Editing an image may make it inconsistent with its non-functional attributes [8, 28]. For instance, changing the background of a picture may make it inconsistent with the GPS attributes in the image metadata [24]. An attempt to hide the facts in the metadata may create discrepancies among the non-functional attributes [27]. These discrepancies are not straightforward to identify, as they are embedded systematically in the non-functional attributes. The qualitative nature of numerous non-functional attributes presents another challenge, as the discrepancies among them are predominantly semantic in nature. They can be identified by examining the semantic meaning of attributes and the relationships among them. To address these challenges, we need a quantitative representation of non-functional attributes that is also reflective of their semantics. A framework is then required to compute semantic differences among non-functional attributes and to translate these discrepancies into modifications.
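As a minimal illustration of this idea, the sketch below computes a semantic difference between two attributes as one minus the cosine similarity of their vectors. The toy 3-dimensional embeddings are hypothetical stand-ins for vectors produced by a pretrained word-embedding model; only the computation itself is the point.

```python
import math

# Hypothetical toy embeddings for a few non-functional attributes.
# In practice these vectors would come from a pretrained embedding model.
EMBEDDINGS = {
    "sydney": [0.9, 0.2, 0.1],
    "nsw":    [0.8, 0.3, 0.1],
    "tokyo":  [0.1, 0.9, 0.4],
}

def cosine_similarity(u, v):
    """Semantic closeness of two attribute vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def semantic_difference(attr_a, attr_b):
    """Difference score; larger means the attributes are more semantically distant."""
    return 1.0 - cosine_similarity(EMBEDDINGS[attr_a], EMBEDDINGS[attr_b])
```

With such a score, attribute pairs whose difference exceeds what is typical for their attribute types can be flagged as candidate discrepancies.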
We propose Exif2Vec, a novel approach that leverages only the non-functional attributes of images to detect changes in social media images. It is worth mentioning that most social media platforms remove public access to an uploaded image's metadata. We propose this framework from the social media owners' perspective, which allows us to assume that the metadata is available with the image. An image's metadata comprises many useful attributes that inform about the image and its creation. We primarily focus on the spatio-temporal and contextual attributes. The non-functional attributes are usually correlated with each other. For instance, GPS location is a non-functional attribute that is correlated with the country name and city name. Correlated attributes can be changed collectively to introduce a systematic change. To capture these changes, we explore relationships among non-functional attributes. A pre-trained word embedding model is then employed to create a distributed representation of the attributes. The distributed representation places semantically similar attributes close to each other in a high-dimensional space [7]. Afterwards, we isolate consistent and inconsistent pairs of attributes based on a similarity distance between two attributes. For instance, "Sydney-NSW" is a consistent pair of attributes, because the distance between Sydney and NSW is comparable to that of other city-state pairs. We further inspect the inconsistent pairs to discover whether multiple inconsistencies lead to the same misperception. For instance, the name of a country, city, state, and the GPS coordinates can be changed collectively such that all attributes remain consistent with each other. The identified inconsistencies are then leveraged to quantify the severity of changes in an image. Below, we summarize our main contributions:
— We propose Exif2Vec, a novel framework designed to detect changes in social media images relying solely on the metadata and the posted information accompanying the image, whereas the state of the art utilizes metadata in conjunction with the image content.
— We employ a word-embeddings-based approach, specifically Word2Vec, to generate vector embeddings for an image's non-functional attributes. These embeddings are then used to discover discrepancies in the non-functional attributes.
— We propose a novel framework to translate discrepancies in an image's non-functional attributes into the severity of changes in the image.
— The proposed Exif2Vec is validated on two datasets, i.e., a context-based dataset and an image verification corpus. Results reflect up to 80% effectiveness of the proposed approach.
The proposed approach adeptly discovers a typical set of changes that manifest in the non-functional attributes. However, its coverage could be further enhanced by considering alterations intrinsic to the image itself, such as shifts in tones, color intensity, or distortions. It is important to note that the proposed approach is not intended to compete with image-processing-based methods but rather to offer a complementary perspective. The main rationale behind the proposed approach is rooted in the fact that certain manipulations or alterations may leave detectable traces in the metadata. Leveraging this information allows for a computationally less expensive analysis compared to traditional image-content-based methods. By focusing on metadata, we aim to provide an efficient and scalable solution that complements existing approaches. While we acknowledge that this approach may have its limitations, particularly in cases where sophisticated image content manipulations are employed, we believe it offers unique advantages in scenarios where computational resources or time constraints are a significant factor. It is important to highlight that the effectiveness of the proposed framework might be influenced by the extent to which non-functional attributes are available. In scenarios where such attributes are scarce, an alternative method involves extracting meta-information from the social media post. While this alternative might exhibit reduced precision due to potential uncertainties regarding the accuracy of the meta-information, it still presents valuable insights.
2 Motivating Scenario
Social media has become an important tool for sharing news and information related to public incidents. However, many viral social media posts have turned out to be fake in recent years [40]. A recent study by the Massachusetts Institute of Technology (MIT) reveals that a fake post spreads substantially faster than real ones [76]. More importantly, most fake images are accompanied by manipulated text and image metadata [16, 46, 60]. Indeed, even an original image may be accompanied by an incorrect description. Many of these posts contain deepfake text generated by deep learning models [53]. An image's metadata may reflect discrepancies with this deepfake text. Therefore, analyzing the text shared with an image together with the image metadata may reveal changes in an image and its description.
Our motivating scenario involves a depiction of a plane crash that occurred in New York in 2009. Illustrated in Figure 2, the scene captures the evacuation of U.S. Airways Flight 1549 as it rests on the surface of the Hudson River. Intriguingly, this particular image was erroneously attributed to the missing Malaysian aircraft MH370 by Cable News Network in 2014. In this misleading post, the image itself is original but the post contains a false claim. Numerous cutting-edge solutions depend on image processing techniques to detect dubious content in social media images. These approaches focus on the visual content within the image, which can potentially lead to overlooking alterations that extend beyond the image itself. In response to this limitation, we put forth an approach for recognizing untrustworthy images that relies exclusively on an image's metadata along with the associated textual information. Exploring the image metadata may reveal inconsistencies between the metadata and the posted information. These inconsistencies may reflect the trust of an image. For instance, in Figure 2, the image shared by Cable News Network is unedited, and the metadata is reflective of the manipulations in the image description. The majority of social media platforms restrict public access to image metadata. We tackle this issue from the standpoint of social media owners, which permits us to operate under the assumption that metadata accompanies the image. The image metadata, as depicted in Figure 2, provides evidence that the image is linked to an entirely separate incident.
Figure 2 shows a very simple example of a cheap fake (a naive way to introduce changes) in which the inconsistencies between the metadata and the image description are straightforward to identify. However, in real-world scenarios, changes are embedded very systematically in images and their metadata. Image metadata is often changed to make it consistent with the changes in an image [75]. For instance, if the background of an image is modified, the GPS tags in the metadata can be changed to make them consistent with the manipulated background in the picture. These types of changes are relatively hard to detect. Therefore, a framework is required to find inconsistencies in an image's metadata that can reflect the trust of the image.
3 Related Work
Images in crowdsourced social media environments can be untrustworthy [5]. Conventional strategies for detecting deceptive images primarily rely on the integration of image processing and machine learning methodologies, as highlighted by Patel et al. [51]. A multi-modal approach (which uses multiple content and context types such as text, visual, statistical, user profile, and network propagation) is proposed in Reference [59] that utilizes new and upgraded models to detect fake social media images. A deep learning approach is proposed in Reference [35] for detecting fake images by using contrastive loss. In the study detailed in Reference [58], convolutional neural networks are employed to detect counterfeit images disseminated through various social media platforms. The work described in Reference [79] employs a combination of traditional digital forensics techniques and artificial intelligence methods to uncover instances of image manipulation. Additionally, a block-oriented methodology for detecting copy-move forgery is put into operation, as elucidated in Reference [42]. In the publication by Guo et al. [30], two methods for identifying counterfeit colorized images are proposed: one based on histograms, the other relying on feature encoding. In the work documented by Tanaka et al. [61], a method utilizing robust hashing for the detection of counterfeit images is introduced; it is demonstrated to have high fake-detection accuracy even when multiple manipulation techniques are applied. Deep-learning methods for image forensics are surveyed in Reference [17]. Another strategy involves utilizing image metadata to identify fake WhatsApp images [38]. However, this particular approach focuses solely on a limited set of spatio-temporal attributes from the image source. The image processing methods mentioned above exhibit impressive accuracy in identifying fraudulent elements within images; however, they demand substantial computational resources, as indicated by Dang et al. [22]. The importance of computationally less intensive solutions for different social media applications is highlighted in Reference [4]. In this article, we propose that a subset of trust in images can be derived by using only an image's metadata.
Some recent studies claim that a subset of trust in social media images can be derived using comments on a post [1, 3, 82]. A crowdsourced image service trust model is proposed in Reference [3], in which the trustworthiness of an image service is measured based on the users' stance. Textual features of social media images, i.e., comments, and metadata, e.g., spatio-temporal information, are utilized to gather the trust rate of the image service. A stance- and credibility-based crowdsourced image service trust model is proposed in Reference [1]. The model considers various indicators such as the stance embedded in the images' comments and their metadata, e.g., time, along with the users' credibility. It models the interactions between commenters and sub-comments using different language-based models [71]. These approaches are unable to capture modifications in an image, because misleading content on social media may receive positive comments from other users [20]. Moreover, comments from credible users can be biased [84]. We propose a more objective approach that examines modifications and updates in an image to determine its likelihood of being fake.
Image metadata acts as an indicator of modifications made to an image. Our method for detecting forgery hinges solely upon the analysis of image metadata. In recent times, a multitude of approaches have arisen to uncover instances of image manipulation by drawing upon the insights provided by metadata information. A recent work utilizes image metadata and an ELA processor to detect forgery in images [73]. The proposed system relies on a neural network capable of identifying and processing image regions using a specific approach. Another work performs image provenance analysis based only on the metadata [10]. The image provenance tree implicitly informs about modifications in an image. These solutions are based on a limited use of metadata along with other computer-vision-based approaches. Our proposed approach is different in the sense that it is based entirely on image metadata to ascertain the trust of an image service.
Image non-functional attributes are predominantly qualitative in nature, encompassing features such as city, country, and state. Detecting alterations within an image necessitates the identification of irregularities present in these non-functional attributes. These irregularities can be ascertained by exploring the semantic variations inherent in these attributes. Numerous cutting-edge studies have focused on extracting the semantic meanings behind such qualitative keywords.
Latent Semantic Analysis (LSA) stands out as a widely employed technique for identifying dissimilarities between two documents. This method finds practical utility across a spectrum of domains. An illustrative example can be found in Reference [14], where LSA is harnessed for text summarization: a summarization approach is introduced that leverages frequent item sets to encapsulate latent concepts inherent in the analyzed documents. Through the effective utilization of LSA, the authors condense the potentially redundant assemblage of item sets into a succinct compilation of uncorrelated concepts. The summarization process then selects sentences that encompass these latent concepts while minimizing redundancy. LSA relies on the preexisting vocabulary within the documents it analyzes; out-of-vocabulary words, which are not present in the training data, pose a challenge, as LSA cannot effectively represent or analyze such words without additional preprocessing or techniques.
We introduce a novel way of discovering changes in an image's non-functional attributes using a word-embeddings-based approach. In this respect, we use pretrained models to create vector embeddings of the non-functional attributes. The vector embeddings are then exploited to find inconsistencies among the attributes. Word embeddings are widely used in different domains [32, 45]. For instance, a word-embedding-based solution to search similar records in databases is presented in Reference [26]. Word embedding is applied to Twitter data in Reference [34] to monitor natural disasters. A context-sensitive word embedding approach is presented in Reference [52]. Dynamic embeddings are developed in Reference [55] to capture how the meanings of words change over time. Word-embeddings-based models create vectors for qualitative non-functional attributes. However, some non-functional attributes are quantitative in nature, e.g., GPS coordinates and GMT offset. It is a challenge to generate embeddings for these quantitative attributes. A novel approach is proposed in Reference [80] to transform GPS coordinates into real-valued vectors. Another study, Kazemi et al.'s work on time2vec embeddings [39], furnishes vectorized representations of temporal data. Building upon these methodologies, we employ analogous techniques to convert quantitative non-functional attributes into real-valued vectors.
6 Experimentation and Results
We conduct a set of experiments to evaluate the effectiveness of detecting changes using non-functional attributes of images. We assess the performance of the proposed framework on two real datasets consisting of fact-checked social media images. We report the effectiveness in terms of four standard metrics: accuracy, precision, recall, and F-score.
6.1 Dataset and Experiment Setup
We evaluate the proposed framework on two real datasets: a context-based image dataset and a fact-checked image verification corpus (combined with another fake news dataset).
Context-based dataset. This dataset consists of 2,800 images. These images are collected from different social media platforms (e.g., Twitter, Facebook, Instagram, etc.) and copyright free image repositories (e.g., Shutterstock, Unsplash, Pexels, etc.). The images are collected for five different application domains, i.e., road accidents, crime scenes, violent scenes, natural disasters and public gatherings. We extract spatio-temporal and contextual information available with these images. Hence, this dataset contains three features/columns for each image, i.e., spatial attributes, temporal attributes and contextual attributes.
Injecting Changes: This dataset consists of original images. To test our framework on this dataset, we systematically introduce different types of changes among the non-functional attributes of these images. We leverage ChatGPT to inject changes in the non-functional attributes. ChatGPT has undergone comprehensive training on a wide range of data, encompassing real image metadata. This extensive training enables the model to possess a deep understanding of the characteristics and information associated with image metadata. ChatGPT is invoked using the ChatGPT API. Instructions are provided to ChatGPT to create different types of variations between the metadata of different versions of an image. In this respect, different combinations of changes in the spatial, temporal, and contextual attributes are introduced. The alterations made by ChatGPT to image metadata are not purely synthetic, as ChatGPT is trained on a vast dataset derived from real-world sources; its responses and modifications are grounded in genuine linguistic patterns and context. Consequently, any adjustments it makes to image metadata are informed by real-world language usage and semantics. Figure 9 shows one example of changes injected by ChatGPT in an image's metadata. Changes in spatio-temporal attributes are injected in a way that misleads the viewer toward a different incident. Contextual attributes are changed in a way that creates a misperception about the image. Moreover, different levels of complexity are considered while injecting these changes. For instance, in comparatively simple cases, only a few attributes are changed to introduce a change, whereas in relatively complex cases, many attributes are changed collectively to create a consistent change in the image's non-functional attributes. Spatio-temporal changes constitute 50% of the overall modifications, whereas the rest of the modifications involve contextual attributes along with modified spatio-temporal features. Moreover, the modifications are introduced in a way that creates both consistent and inconsistent changes. Consistent changes are 25% of the total introduced changes.
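The distinction between consistent and inconsistent injected changes can be mimicked, purely for illustration, by a small helper. In the actual setup the changes are generated by prompting ChatGPT; the city-state mapping and the function below are hypothetical.

```python
import copy

# Hypothetical city->state mapping used to keep an injected spatial change
# internally consistent.
CITY_STATE = {"Sydney": "NSW", "Melbourne": "Victoria", "Brisbane": "Queensland"}

def inject_spatial_change(metadata, new_city, consistent=True):
    """Return a modified copy of the metadata with an injected spatial change.

    A consistent change also rewrites the correlated state attribute;
    an inconsistent change updates the city but leaves the state untouched.
    """
    modified = copy.deepcopy(metadata)
    modified["city"] = new_city
    if consistent:
        modified["state"] = CITY_STATE[new_city]
    return modified

original = {"city": "Sydney", "state": "NSW", "date": "2019-03-11"}
consistent_change = inject_spatial_change(original, "Melbourne", consistent=True)
inconsistent_change = inject_spatial_change(original, "Melbourne", consistent=False)
```

Here the inconsistent variant leaves a detectable city-state mismatch, while the consistent variant changes the correlated attributes collectively, which is the harder case for detection.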
Image Verification Corpus. The second dataset in consideration is a well-established image verification corpus [13]. This dataset is continually evolving and comprises both fake and authentic social media images. Notably, it contains more than 2,500 fact-checked viral social media posts. The dataset is composed of tweets and provides tweet IDs, image URLs, and the information shared with each tweet. It is a generalized dataset containing images related to multiple contexts, i.e., the Nepal earthquake, the Boston Marathon bombings, Hurricane Sandy, and so on. The images in this dataset are labeled as either fake or real. We use the Python Pillow library to extract metadata from images: we import the Python Imaging Library module from Pillow to work with images and their metadata, and extract specific metadata fields by accessing the keys of the metadata dictionary. We use the tweet IDs and the authentication information of a Twitter developer account to fetch tweet details. We fetch spatial and temporal keywords from the textual information using the spaCy \(en\_core\_web\_sm\) model with the timexy pipe. We further simplify the data, retaining only the image ID, spatial, temporal, contextual, and label columns.
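A minimal sketch of this metadata-extraction step is shown below. Pillow's `Image.getexif()` returns a mapping from numeric EXIF tag IDs to values; the sample EXIF dictionary here is a hypothetical stand-in for that mapping, and the tag-name table is a small subset of the standard EXIF tags (271 = Make, 306 = DateTime, 34853 = GPSInfo).

```python
# Hypothetical EXIF mapping, shaped like the result of PIL Image.getexif().
SAMPLE_EXIF = {
    271: "Canon",
    306: "2009:01:15 15:30:00",
    34853: {1: "N", 2: (40.0, 46.0, 55.0), 3: "W", 4: (74.0, 0.0, 21.0)},
}

# Subset of the standard EXIF tag-ID -> name table (cf. PIL.ExifTags.TAGS).
TAG_NAMES = {271: "Make", 306: "DateTime", 34853: "GPSInfo"}

def extract_spatiotemporal(exif):
    """Keep only the spatio-temporal fields used as non-functional attributes."""
    named = {TAG_NAMES[k]: v for k, v in exif.items() if k in TAG_NAMES}
    return {
        "temporal": named.get("DateTime"),
        "spatial": named.get("GPSInfo"),
    }
```

In a full pipeline the `SAMPLE_EXIF` dictionary would be replaced by the mapping returned for each downloaded image.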
We combine this dataset with another dataset related to fake news, i.e., source-based fake news classification [9]. This dataset contains fact-checked news posted by politicians, news channels, newspaper websites, and common civilians. We consider only those samples that have an image associated with the news. Since a lot of text is available with each news item, we filter out the unnecessary text using the KeyBERT model to fetch the 20 most relevant keywords from the text of the news. These keywords serve as the image's contextual attributes and are also used afterwards to compile a context-based corpus. We use the spaCy library and the timexy tool to get "Date," "Time," and "GPE" information from the text. Afterwards, we classify the text into spatial, temporal, and contextual categories. This data acquisition methodology lays a robust foundation for the subsequent stages of analysis and interpretation. Dataset Characteristics: The three datasets (context-based dataset, image verification corpus, and fake news dataset) each include images that have been fact-checked and sorted into two categories: fake and real. Figure 10 shows the class distributions in the datasets.
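The spatial/temporal/contextual classification step can be sketched as a simple routing of entity labels. spaCy's NER (with the timexy pipe) emits labels such as GPE for places and DATE/TIME for temporal expressions; the entity list below is a hypothetical example of such output.

```python
# Hypothetical (text, label) pairs as produced by spaCy NER with timexy.
ENTITIES = [
    ("Kathmandu", "GPE"),
    ("25 April 2015", "DATE"),
    ("earthquake", "EVENT"),
    ("11:56 am", "TIME"),
]

def categorize(entities):
    """Split extracted entities into spatial, temporal, and contextual attributes."""
    buckets = {"spatial": [], "temporal": [], "contextual": []}
    for text, label in entities:
        if label == "GPE":
            buckets["spatial"].append(text)
        elif label in ("DATE", "TIME"):
            buckets["temporal"].append(text)
        else:
            buckets["contextual"].append(text)
    return buckets
```

Everything that is neither spatial nor temporal falls through to the contextual column, which matches the three-column layout described above.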
6.2 Generating Attribute Embeddings
We use a pretrained Word2Vec model to generate a distributed representation of attribute embeddings. The model is pretrained using Continuous Bag of Words (CBOW) [77]. Attribute embeddings are generated for each individual image. The experiments reveal that similar distance analogies exist between many attribute pairs. For instance, Figure 11(a) shows attribute embeddings generated for a sample image selected from the dataset. "Altona" and "Victoria" are the spatial attributes of the image. It is evident from the figure that a distance relationship exists between "Altona" and "Victoria" similar to that between "Parramatta" and "NSW." Therefore, "Altona-Victoria" is a consistent pair of attributes. The pairs for which these analogies do not exist are labeled as inconsistent attributes.
To justify the use of word embeddings for image metadata, it is important to first explore the relationships between different metadata tags. We achieved significant success in finding similar distance relationships among many attributes, and discovered many new relationships in attributes related to road accidents and crime scenes. For instance, Figure 12(a) shows that pedal-bicycle is related to accelerator-motorbike. Similarly, Figure 12(b) shows that shooting-gun and stabbing-knife are related to each other. These relationships are useful in finding inconsistent keywords in an image description. For instance, the keyword "right-hand driving" is inconsistent with countries in which left-hand driving is followed. These inconsistencies reflect on the trust of an image. Figure 11(b) shows more examples of similar distance relationships identified by the pre-trained model.
6.3 Effectiveness
We report the performance of the proposed approach in terms of accuracy, precision, recall, and F-score. Accuracy illustrates the correct identification of modifications. Precision, recall, and F-score indicate how well consistent and inconsistent changes are identified: precision reflects the fraction of changes predicted as consistent/inconsistent that truly are, while recall reflects the fraction of the actual consistent/inconsistent changes that are correctly identified. We test the proposed framework in terms of accuracy and run-time efficiency.
6.3.1 Accuracy.
We report the accuracy as the percentage of correctly discovered changes. We separately report the accuracy for attributes having strong and weak relationships. Figure 13 shows the accuracy of the proposed framework for three different types of changes, i.e., inconsistent changes, consistent changes, and significant changes. The accuracy in determining inconsistent changes is relatively higher compared to consistent and significant changes, because inconsistent changes are injected naively, whereas consistent and significant changes are introduced more systematically and are harder to discover.
We conduct our experiments a total of 100 times to ensure a thorough assessment of the consistency of our results. Figure 14 presents the standard deviation of the % accuracy, providing a visual representation of the variability observed across multiple trials. This not only strengthens the robustness of our findings but also enhances the trustworthiness of the proposed framework. The incorporation of measures of variability, including error bars and standard deviation, allows for a more nuanced interpretation of our results.
Vector Size: Figure 13 shows the accuracy of the pretrained Word2Vec model on the context-sensitive dataset for a vector size of 300. We also illustrate the trend in accuracy for varying vector sizes in Figures 15(a) and 15(b). Specifically, Figure 15(a) reports the accuracy on the context-based dataset, whereas Figure 15(b) shows the accuracy on the general image verification corpus. It is evident from the figures that the accuracy increases with the vector size, because a larger vector size better embeds the context in the vectors.
Context Window Size: The context window size influences the resulting embeddings. In some cases, the context of a keyword is defined by multiple words. For instance, the term "traffic signal" contains two words; its context cannot be completely defined by using only one word. The context window size represents the number of neighboring words used collectively to generate an attribute embedding. For instance, for the statement "the quick brown fox," a context window of two yields samples such as (the, quick) and (the, brown); sliding one word further produces (quick, the), (quick, brown), (quick, fox), and so on. A larger window size can give higher accuracy due to more available training examples, but also results in longer training time. We repeat the experiments for multiple context window sizes to observe the effect on accuracy. Figures 16(a) and 16(b) show the change in accuracy as the context window size changes. Figure 16(a) reports the accuracy on the context-based dataset, whereas Figure 16(b) shows the accuracy on the general image verification corpus. It is evident from the figures that the accuracy increases when the context window is increased by one or two words, but starts decreasing if the context window is increased further. There is a trade-off between context and accuracy: enlarging the context window better introduces the context of a term, but a large context window loses discrepancies among different sub-words of a keyword. Due to this trade-off, the accuracy of determining consistent changes drops once the context window size grows beyond this range.
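The sliding-window sampling described above can be sketched in a few lines. The function below enumerates (target, context) training pairs for a given window size, mirroring how Word2Vec walks over the text.

```python
def context_pairs(sentence, window):
    """Generate (target, context) training pairs for a given window size."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        # Neighbors within `window` positions on either side of the target.
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, words[j]))
    return pairs
```

For "the quick brown fox" with a window of two, the first target "the" yields the pairs (the, quick) and (the, brown), matching the example in the text.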
Comparison with State-of-the-Art. We evaluate the efficacy of our proposed word-embeddings-based framework by contrasting its performance against two established baselines: LSA and Term Frequency-Inverse Document Frequency (TF-IDF). We chose these baselines because they share the goal of depicting semantic relationships among words, aiming to capture nuanced meanings and contextual nuances. Furthermore, we offer a comparative analysis with a state-of-the-art method for detecting fake images on WhatsApp [38].
Latent Semantic Analysis: LSA represents the contextual meaning of words through statistical computations applied to a large corpus of text. It is widely used to quantify the semantic similarity between two sets of sentences or documents [25, 41]. The effectiveness of LSA is reported in terms of the identified % dissimilarity in the non-functional attributes. The proposed Exif2Vec performs better than LSA.
TF-IDF: We additionally conduct a comparative analysis of our proposed approach’s accuracy against that of TF-IDF. TF-IDF quantifies the importance or relevance of string representations (words, phrases, lemmas, etc.) in a document among a collection of documents. We use TF-IDF to measure the similarity between an attribute and the entire set of other attributes within an image. This approach helps us uncover any inconsistencies among these attributes.
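The TF-IDF baseline can be sketched as follows: each attribute, together with its textual context, is treated as a small "document," and pairwise cosine similarity over plain TF-IDF vectors flags attributes that share no vocabulary with the rest. The token lists are hypothetical; this is an illustration of the baseline, not the tuned setup used in the experiments.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute plain TF-IDF vectors for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # smoothed IDF
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# One "document" per attribute: the attribute plus its textual context.
docs = [
    ["sydney", "harbour", "australia"],
    ["nsw", "australia", "state"],
    ["tokyo", "japan", "city"],
]
vocab, vecs = tfidf_vectors(docs)
```

Because TF-IDF only counts shared terms, attributes with overlapping context ("sydney" and "nsw" via "australia") score above attributes with none ("sydney" and "tokyo"), which is the count-based behavior the comparison in the text refers to.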
We compare the performance of Exif2Vec on two metrics: accuracy and precision. Exif2Vec performs better than LSA and TF-IDF, as shown in Figure 17. One major reason is that LSA and TF-IDF are count-based models in which similar terms have the same counts across different documents, whereas Exif2Vec is a prediction-based model, i.e., given the vector of an attribute, it predicts the context attribute vectors. State-of-the-Art: We benchmark our proposed framework against a cutting-edge method specialized in identifying fake images circulated on WhatsApp [38]. We choose this work for the comparison because it also considers spatio-temporal features of an image to assess the image's trustworthiness. The model relies on three features: (1) image content-based features, (2) temporal features using the timestamps at which images were shared, and (3) social context features based on the users who shared the images. These features are then fed into different machine learning models (Logistic Regression (LR), Random Forest Classifier (RFC), Decision Tree (DT), Support Vector Machine (SVM), and Artificial Neural Networks (ANN)) to predict whether the input is fake or not. Figure 18(a) reflects that the proposed Exif2Vec performs equally well without relying on image-content-based features.
6.3.2 Precision, Recall, and F-score.
We report the accuracy of determining consistent and inconsistent changes in terms of precision, recall, and F-score. Precision indicates the performance in avoiding false positives, whereas recall indicates the performance in avoiding false negatives. Precision is calculated by dividing the number of true consistent/inconsistent changes by the total number of changes predicted as consistent/inconsistent. The precision for consistent changes is relatively better, as shown in Table 2. Recall refers to the percentage of the total consistent/inconsistent changes correctly classified by the proposed approach. The F-score is the harmonic mean of precision and recall. Results reveal a significant recall for both consistent and inconsistent changes.
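For reference, the three metrics follow directly from the true/false positive and false negative counts of one class of changes. The counts below are hypothetical and serve only to show the computation.

```python
def scores(tp, fp, fn):
    """Precision, recall, and F-score from confusion-matrix counts
    for one class of changes (e.g., consistent changes)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts for the "consistent change" class.
p, r, f = scores(tp=40, fp=10, fn=5)
```

With these counts, precision is 40/50 = 0.8, recall is 40/45, and the F-score is their harmonic mean.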
Run-time Efficiency: We train Exif2Vec on a vocabulary of 50,000 words downloaded from Wikipedia using a wikidump. The training is hosted on a server with 32 GB of RAM and a 4 GHz Core i9 CPU with two GPUs of 25 GB memory each. It takes around three days to train Exif2Vec using four workers/threads. Figure 18(b) shows the variation in the run-time of training Exif2Vec as the number of threads changes.