Search | arXiv e-print repository

doi 10.1007/s11042-024-19734-3

Differentially Processed Optimized Collaborative Rich Text Editor

Authors: Nishtha Jatana, Mansehej Singh, Charu Gupta, Geetika Dhand, Shaily Malik, Pankaj Dadheech, Nagender Aneja, Sandhya Aneja

Abstract: A collaborative real-time text editor is an application that allows multiple users to edit a document simultaneously and merge their contributions automatically. It can be made collaborative by implementing a conflict resolution algorithm either on the client side (in peer-to-peer collaboration) or on the server side (when using web sockets and a central server to monitor state changes). Although… ▽ More A collaborative real-time text editor is an application that allows multiple users to edit a document simultaneously and merge their contributions automatically. It can be made collaborative by implementing a conflict resolution algorithm either on the client side (in peer-to-peer collaboration) or on the server side (when using web sockets and a central server to monitor state changes). Although web sockets are ideal for real-time text editors, using multiple collaborative editors on one connection can create problems. This is because a single web connection cannot monitor which user is collaborating on which application state, leading to unnecessary network queries and data being delivered to the wrong state. To address this issue, the current solution is to open multiple web socket connections, with one web socket per collaboration application. However, this can add significant overhead proportional to the number of apps utilized. In this study, we demonstrate an algorithm that enables using a single web socket for multiple collaborative applications in a collaborative editor. Our method involves modifying the socket's code to track which application's shared state is being worked on and by whom. This allows for the simultaneous collaboration of multiple states in real-time, with infinite users, without opening a different socket for each application. Our optimized editor showed an efficiency improvement of over 96% in access time duration. This approach can be implemented in other collaborative editors and web applications with similar architecture to improve performance and eliminate issues arising from network overload. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Journal ref: Multimedia Tools and Applications (2024)

arXiv:2403.12618 [pdf, other]

NewsCaption: Named-Entity aware Captioning for Out-of-Context Media

Authors: Anurag Singh, Shivangi Aneja

Abstract: With the increasing influence of social media, online misinformation has grown to become a societal issue. The motivation for our work comes from the threat caused by cheapfakes, where an unaltered image is described using a news caption in a new but false-context. The main challenge in detecting such out-of-context multimedia is the unavailability of large-scale datasets. Several detection method… ▽ More With the increasing influence of social media, online misinformation has grown to become a societal issue. The motivation for our work comes from the threat caused by cheapfakes, where an unaltered image is described using a news caption in a new but false-context. The main challenge in detecting such out-of-context multimedia is the unavailability of large-scale datasets. Several detection methods employ randomly selected captions to generate out-of-context training inputs. However, these randomly matched captions are not truly representative of out-of-context scenarios due to inconsistencies between the image description and the matched caption. We aim to address these limitations by introducing a novel task of out-of-context caption generation. In this work, we propose a new method that generates a realistic out-of-context caption given visual and textual context. We also demonstrate that the semantics of the generated captions can be controlled using the textual context. We also evaluate our method against several baselines and our method improves over the image captioning baseline by 6.2% BLUE-4, 2.96% CiDEr, 11.5% ROUGE, and 7.3% METEOR △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2312.08459 [pdf, other]

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

Abstract: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coh… ▽ More We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coherent motion sequences. We propose a new latent diffusion model for this task, operating in the expression space of neural parametric head models, to synthesize audio-driven realistic head sequences. In the absence of a dataset with corresponding NPHM expressions to audio, we optimize for these correspondences to produce a dataset of temporally-optimized NPHM expressions fit to audio-video recordings of people talking. To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads, representing a significant advancement in the field of audio-driven 3D animation. Notably, our approach stands out in its ability to generate plausible motion sequences that can produce high-fidelity head animation coupled with the NPHM shape space. Our experimental results substantiate the effectiveness of FaceTalk, consistently achieving superior and visually natural motion, encompassing diverse facial expressions and styles, outperforming existing methods by 75% in perceptual user study evaluation. △ Less

Submitted 17 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Paper Video: https://youtu.be/7Jf0kawrA3Q Project Page: https://shivangi-aneja.github.io/projects/facetalk/

Journal ref: CVPR 2024

arXiv:2309.09816 [pdf]

Application of Novel PACS-based Informatics Platform to Identify Imaging Based Predictors of CDKN2A Allelic Status in Glioblastomas

Authors: Niklas Tillmanns, Jan Lost, Joanna Tabor, Sagar Vasandani, Shaurey Vetsa, Neelan Marianayagam, Kanat Yalcin, E. Zeynep Erson-Omay, Marc von Reppert, Leon Jekel, Sara Merkaj, Divya Ramakrishnan, Arman Avesta, Irene Dixe de Oliveira Santo, Lan Jin, Anita Huttner, Khaled Bousabarah, Ichiro Ikuta, MingDe Lin, Sanjay Aneja, Bernd Turowski, Mariam Aboian, Jennifer Moliterno

Abstract: Gliomas with CDKN2A mutations are known to have worse prognosis but imaging features of these gliomas are unknown. Our goal is to identify CDKN2A specific qualitative imaging biomarkers in glioblastomas using a new informatics workflow that enables rapid analysis of qualitative imaging features with Visually AcceSAble Rembrandtr Images (VASARI) for large datasets in PACS. Sixty nine patients under… ▽ More Gliomas with CDKN2A mutations are known to have worse prognosis but imaging features of these gliomas are unknown. Our goal is to identify CDKN2A specific qualitative imaging biomarkers in glioblastomas using a new informatics workflow that enables rapid analysis of qualitative imaging features with Visually AcceSAble Rembrandtr Images (VASARI) for large datasets in PACS. Sixty nine patients undergoing GBM resection with CDKN2A status determined by whole-exome sequencing were included. GBMs on magnetic resonance images were automatically 3D segmented using deep learning algorithms incorporated within PACS. VASARI features were assessed using FHIR forms integrated within PACS. GBMs without CDKN2A alterations were significantly larger (64% vs. 30%, p=0.007) compared to tumors with homozygous deletion (HOMDEL) and heterozygous loss (HETLOSS). Lesions larger than 8 cm were four times more likely to have no CDKN2A alteration (OR: 4.3; 95% CI:1.5-12.1; p<0.001). We developed a novel integrated PACS informatics platform for the assessment of GBM molecular subtypes and show that tumors with HOMDEL are more likely to have radiographic evidence of pial invasion and less likely to have deep white matter invasion or subependymal invasion. These imaging features may allow noninvasive identification of CDKN2A allele status. △ Less

Submitted 18 September, 2023; originally announced September 2023.

Comments: 23 pages, 5 figures

arXiv:2309.05053 [pdf]

A Large Open Access Dataset of Brain Metastasis 3D Segmentations with Clinical and Imaging Feature Information

Authors: Divya Ramakrishnan, Leon Jekel, Saahil Chadha, Anastasia Janas, Harrison Moy, Nazanin Maleki, Matthew Sala, Manpreet Kaur, Gabriel Cassinelli Petersen, Sara Merkaj, Marc von Reppert, Ujjwal Baid, Spyridon Bakas, Claudia Kirsch, Melissa Davis, Khaled Bousabarah, Wolfgang Holler, MingDe Lin, Malte Westerhoff, Sanjay Aneja, Fatima Memon, Mariam S. Aboian

Abstract: Resection and whole brain radiotherapy (WBRT) are the standards of care for the treatment of patients with brain metastases (BM) but are often associated with cognitive side effects. Stereotactic radiosurgery (SRS) involves a more targeted treatment approach and has been shown to avoid the side effects associated with WBRT. However, SRS requires precise identification and delineation of BM. While… ▽ More Resection and whole brain radiotherapy (WBRT) are the standards of care for the treatment of patients with brain metastases (BM) but are often associated with cognitive side effects. Stereotactic radiosurgery (SRS) involves a more targeted treatment approach and has been shown to avoid the side effects associated with WBRT. However, SRS requires precise identification and delineation of BM. While many AI algorithms have been developed for this purpose, their clinical adoption has been limited due to poor model performance in the clinical setting. Major reasons for non-generalizable algorithms are the limitations in the datasets used for training the AI network. The purpose of this study was to create a large, heterogenous, annotated BM dataset for training and validation of AI models to improve generalizability. We present a BM dataset of 200 patients with pretreatment T1, T1 post-contrast, T2, and FLAIR MR images. The dataset includes contrast-enhancing and necrotic 3D segmentations on T1 post-contrast and whole tumor (including peritumoral edema) 3D segmentations on FLAIR. Our dataset contains 975 contrast-enhancing lesions, many of which are sub centimeter, along with clinical and imaging feature information. We used a streamlined approach to database-building leveraging a PACS-integrated segmentation workflow. △ Less

Submitted 11 September, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: 12 pages, 2 figures, 1 table

arXiv:2306.00838 [pdf, other]

The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI

Authors: Ahmed W. Moawad, Anastasia Janas, Ujjwal Baid, Divya Ramakrishnan, Rachit Saluja, Nader Ashraf, Leon Jekel, Raisa Amiruddin, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Sanjay Aneja, Syed Muhammad Anwar, Timothy Bergquist, Evan Calabrese, Veronica Chiang, Verena Chung, Gian Marco Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Juan Eugenio Iglesias, Zhifan Jiang , et al. (206 additional authors not shown)

Abstract: The translation of AI-generated brain metastases (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated internationally compiled real-world datasets. This study presents the results of the segmentation challenge and chara… ▽ More The translation of AI-generated brain metastases (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated internationally compiled real-world datasets. This study presents the results of the segmentation challenge and characterizes the challenging cases that impacted the performance of the winning algorithms. Untreated brain metastases on standard anatomic MRI sequences (T1, T2, FLAIR, T1PG) from eight contributed international datasets were annotated in stepwise method: published UNET algorithms, student, neuroradiologist, final approver neuroradiologist. Segmentations were ranked based on lesion-wise Dice and Hausdorff distance (HD95) scores. False positives (FP) and false negatives (FN) were rigorously penalized, receiving a score of 0 for Dice and a fixed penalty of 374 for HD95. Eight datasets comprising 1303 studies were annotated, with 402 studies (3076 lesions) released on Synapse as publicly available datasets to challenge competitors. Additionally, 31 studies (139 lesions) were held out for validation, and 59 studies (218 lesions) were used for testing. Segmentation accuracy was measured as rank across subjects, with the winning team achieving a LesionWise mean score of 7.9. Common errors among the leading teams included false negatives for small lesions and misregistration of masks in space.The BraTS-METS 2023 challenge successfully curated well-annotated, diverse datasets and identified common errors, facilitating the translation of BM segmentation across varied clinical environments and providing personalized volumetric reports to patients undergoing BM treatment. △ Less

Submitted 17 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2212.01406 [pdf, other]

doi 10.1145/3588432.3591566

ClipFace: Text-guided Editing of Textured 3D Morphable Models

Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

Abstract: We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable model of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a… ▽ More We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable model of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a self-supervised generative model to jointly synthesize expressive, textured, and articulated faces in 3D. We enable high-quality texture generation for 3D faces by adversarial self-supervised training, guided by differentiable rendering against collections of real RGB images. Controllable editing and manipulation are given by language prompts to adapt texture and expression of the 3D morphable model. To this end, we propose a neural network that predicts both texture and expression latent codes of the morphable model. Our model is trained in a self-supervised fashion by exploiting differentiable rendering and losses based on a pre-trained CLIP model. Once trained, our model jointly predicts face textures in UV-space, along with expression parameters to capture both geometry and texture changes in facial expressions in a single forward pass. We further show the applicability of our method to generate temporally changing textures for a given animation sequence. △ Less

Submitted 24 April, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Paper Video: https://youtu.be/toGOQqFuNmA Project website: https://shivangi-aneja.github.io/projects/clipface/

Journal ref: SIGGRAPH 2023

arXiv:2211.14505 [pdf]

doi 10.11591/ijai.v11.i4.pp1252-1260

Predictive linguistic cues for fake news: a societal artificial intelligence problem

Authors: Sandhya Aneja, Nagender Aneja, Ponnurangam Kumaraguru

Abstract: Media news are making a large part of public opinion and, therefore, must not be fake. News on web sites, blogs, and social media must be analyzed before being published. In this paper, we present linguistic characteristics of media news items to differentiate between fake news and real news using machine learning algorithms. Neural fake news generation, headlines created by machines, semantic inc… ▽ More Media news are making a large part of public opinion and, therefore, must not be fake. News on web sites, blogs, and social media must be analyzed before being published. In this paper, we present linguistic characteristics of media news items to differentiate between fake news and real news using machine learning algorithms. Neural fake news generation, headlines created by machines, semantic incongruities in text and image captions generated by machine are other types of fake news problems. These problems use neural networks which mainly control distributional features rather than evidence. We propose applying correlation between features set and class, and correlation among the features to compute correlation attribute evaluation metric and covariance metric to compute variance of attributes over the news items. Features unique, negative, positive, and cardinal numbers with high values on the metrics are observed to provide a high area under the curve (AUC) and F1-score. △ Less

Submitted 26 November, 2022; originally announced November 2022.

Journal ref: IAES International Journal of Artificial Intelligence (IJ-AI), Vol. 11, No. 4, December 2022, pp. 1252~1260

arXiv:2209.11359 [pdf, other]

CUTS: A Deep Learning and Topological Framework for Multigranular Unsupervised Medical Image Segmentation

Authors: Chen Liu, Matthew Amodio, Liangbo L. Shen, Feng Gao, Arman Avesta, Sanjay Aneja, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

Abstract: Segmenting medical images is critical to facilitating both patient diagnoses and quantitative research. A major limiting factor is the lack of labeled data, as obtaining expert annotations for each new set of imaging data and task can be labor intensive and inconsistent among annotators. We present CUTS, an unsupervised deep learning framework for medical image segmentation. CUTS operates in two s… ▽ More Segmenting medical images is critical to facilitating both patient diagnoses and quantitative research. A major limiting factor is the lack of labeled data, as obtaining expert annotations for each new set of imaging data and task can be labor intensive and inconsistent among annotators. We present CUTS, an unsupervised deep learning framework for medical image segmentation. CUTS operates in two stages. For each image, it produces an embedding map via intra-image contrastive learning and local patch reconstruction. Then, these embeddings are partitioned at dynamic granularity levels that correspond to the data topology. CUTS yields a series of coarse-to-fine-grained segmentations that highlight features at various granularities. We applied CUTS to retinal fundus images and two types of brain MRI images to delineate structures and patterns at different scales. When evaluated against predefined anatomical masks, CUTS improved the dice coefficient and Hausdorff distance by at least 10% compared to existing unsupervised methods. Finally, CUTS showed performance on par with Segment Anything Models (SAM, MedSAM, SAM-Med2D) pre-trained on gigantic labeled datasets. △ Less

Submitted 25 June, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: Accepted to the 27th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2024)

arXiv:2207.14534 [pdf, other]

ACM Multimedia Grand Challenge on Detecting Cheapfakes

Authors: Shivangi Aneja, Cise Midoglu, Duc-Tien Dang-Nguyen, Sohail Ahmed Khan, Michael Riegler, Pål Halvorsen, Chris Bregler, Balu Adsumilli

Abstract: Cheapfake is a recently coined term that encompasses non-AI (``cheap'') manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alterati… ▽ More Cheapfake is a recently coined term that encompasses non-AI (``cheap'') manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset. △ Less

Submitted 29 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2107.05297

arXiv:2206.12685 [pdf]

doi 10.11591/ijai.v11.i3.pp961-968

Defense against adversarial attacks on deep convolutional neural networks through nonlocal denoising

Authors: Sandhya Aneja, Nagender Aneja, Pg Emeroylariffion Abas, Abdul Ghani Naim

Abstract: Despite substantial advances in network architecture performance, the susceptibility of adversarial attacks makes deep learning challenging to implement in safety-critical applications. This paper proposes a data-centric approach to addressing this problem. A nonlocal denoising method with different luminance values has been used to generate adversarial examples from the Modified National Institut… ▽ More Despite substantial advances in network architecture performance, the susceptibility of adversarial attacks makes deep learning challenging to implement in safety-critical applications. This paper proposes a data-centric approach to addressing this problem. A nonlocal denoising method with different luminance values has been used to generate adversarial examples from the Modified National Institute of Standards and Technology database (MNIST) and Canadian Institute for Advanced Research (CIFAR-10) data sets. Under perturbation, the method provided absolute accuracy improvements of up to 9.3% in the MNIST data set and 13% in the CIFAR-10 data set. Training using transformed images with higher luminance values increases the robustness of the classifier. We have shown that transfer learning is disadvantageous for adversarial machine learning. The results indicate that simple adversarial examples can improve resilience and make deep learning easier to apply in various applications. △ Less

Submitted 25 June, 2022; originally announced June 2022.

Journal ref: IAES International Journal of Artificial Intelligence, Vol. 11, No. 3, September 2022, pp. 961~968, ISSN: 2252-8938

arXiv:2112.15523 [pdf]

Transfer learning for cancer diagnosis in histopathological images

Authors: Sandhya Aneja, Nagender Aneja, Pg Emeroylariffion Abas, Abdul Ghani Naim

Abstract: Transfer learning allows us to exploit knowledge gained from one task to assist in solving another but relevant task. In modern computer vision research, the question is which architecture performs better for a given dataset. In this paper, we compare the performance of 14 pre-trained ImageNet models on the histopathologic cancer detection dataset, where each model has been configured as a naive m… ▽ More Transfer learning allows us to exploit knowledge gained from one task to assist in solving another but relevant task. In modern computer vision research, the question is which architecture performs better for a given dataset. In this paper, we compare the performance of 14 pre-trained ImageNet models on the histopathologic cancer detection dataset, where each model has been configured as a naive model, feature extractor model, or fine-tuned model. Densenet161 has been shown to have high precision whilst Resnet101 has a high recall. A high precision model is suitable to be used when follow-up examination cost is high, whilst low precision but a high recall/sensitivity model can be used when the cost of follow-up examination is low. Results also show that transfer learning helps to converge a model faster. △ Less

Submitted 31 December, 2021; originally announced December 2021.

Journal ref: IAES International Journal of Artificial Intelligence (IJ-AI), Vol. 11, No. 1, March 2022, pp. 129~136

arXiv:2112.12546 [pdf, other]

Collaborative adversary nodes learning on the logs of IoT devices in an IoT network

Authors: Sandhya Aneja, Melanie Ang Xuan En, Nagender Aneja

Abstract: Artificial Intelligence (AI) development has encouraged many new research areas, including AI-enabled Internet of Things (IoT) network. AI analytics and intelligent paradigms greatly improve learning efficiency and accuracy. Applying these learning paradigms to network scenarios provide technical advantages of new networking solutions. In this paper, we propose an improved approach for IoT securit… ▽ More Artificial Intelligence (AI) development has encouraged many new research areas, including AI-enabled Internet of Things (IoT) network. AI analytics and intelligent paradigms greatly improve learning efficiency and accuracy. Applying these learning paradigms to network scenarios provide technical advantages of new networking solutions. In this paper, we propose an improved approach for IoT security from data perspective. The network traffic of IoT devices can be analyzed using AI techniques. The Adversary Learning (AdLIoTLog) model is proposed using Recurrent Neural Network (RNN) with attention mechanism on sequences of network events in the network traffic. We define network events as a sequence of the time series packets of protocols captured in the log. We have considered different packets TCP packets, UDP packets, and HTTP packets in the network log to make the algorithm robust. The distributed IoT devices can collaborate to cripple our world which is extending to Internet of Intelligence. The time series packets are converted into structured data by removing noise and adding timestamps. The resulting data set is trained by RNN and can detect the node pairs collaborating with each other. We used the BLEU score to evaluate the model performance. Our results show that the predicting performance of the AdLIoTLog model trained by our method degrades by 3-4% in the presence of attack in comparison to the scenario when the network is not under attack. AdLIoTLog can detect adversaries because when adversaries are present the model gets duped by the collaborative events and therefore predicts the next event with a biased event rather than a benign event. We conclude that AI can provision ubiquitous learning for the new generation of Internet of Things. △ Less

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2112.09151 [pdf, other]

TAFIM: Targeted Adversarial Attacks against Facial Image Manipulations

Authors: Shivangi Aneja, Lev Markhasin, Matthias Niessner

Abstract: Face manipulation methods can be misused to affect an individual's privacy or to spread disinformation. To this end, we introduce a novel data-driven approach that produces image-specific perturbations which are embedded in the original images. The key idea is that these protected images prevent face manipulation by causing the manipulation model to produce a predefined manipulation target (unifor… ▽ More Face manipulation methods can be misused to affect an individual's privacy or to spread disinformation. To this end, we introduce a novel data-driven approach that produces image-specific perturbations which are embedded in the original images. The key idea is that these protected images prevent face manipulation by causing the manipulation model to produce a predefined manipulation target (uniformly colored output image in our case) instead of the actual manipulation. In addition, we propose to leverage differentiable compression approximation, hence making generated perturbations robust to common image compression. In order to prevent against multiple manipulation methods simultaneously, we further propose a novel attention-based fusion of manipulation-specific perturbations. Compared to traditional adversarial attacks that optimize noise patterns for each image individually, our generalized model only needs a single forward pass, thus running orders of magnitude faster and allowing for easy integration in image processing stacks, even on resource-constrained devices like smartphones. △ Less

Submitted 25 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: (ECCV 2022 Paper) Video: https://youtu.be/11VMOJI7tKg Project Page: https://shivangi-aneja.github.io/projects/tafim/

arXiv:2107.05297 [pdf, other]

MMSys'21 Grand Challenge on Detecting Cheapfakes

Authors: Shivangi Aneja, Cise Midoglu, Duc-Tien Dang-Nguyen, Michael Alexander Riegler, Paal Halvorsen, Matthias Niessner, Balu Adsumilli, Chris Bregler

Abstract: Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration… ▽ More Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse} of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset. △ Less

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2104.02830 [pdf, other]

IndoFashion : Apparel Classification for Indian Ethnic Clothes

Authors: Pranjal Singh Rajput, Shivangi Aneja

Abstract: Cloth categorization is an important research problem that is used by e-commerce websites for displaying correct products to the end-users. Indian clothes have a large number of clothing categories both for men and women. The traditional Indian clothes like "Saree" and "Dhoti" are worn very differently from western clothes like t-shirts and jeans. Moreover, the style and patterns of ethnic clothes… ▽ More Cloth categorization is an important research problem that is used by e-commerce websites for displaying correct products to the end-users. Indian clothes have a large number of clothing categories both for men and women. The traditional Indian clothes like "Saree" and "Dhoti" are worn very differently from western clothes like t-shirts and jeans. Moreover, the style and patterns of ethnic clothes have a very different distribution from western outfits. Thus the models trained on standard cloth datasets fail miserably on ethnic outfits. To address these challenges, we introduce the first large-scale ethnic dataset of over 106k images with 15 different categories for fine-grained classification of Indian ethnic clothes. We gathered a diverse dataset from a large number of Indian e-commerce websites. We then evaluate several baselines for the cloth classification task on our dataset. In the end, we obtain 88.43% classification accuracy. We hope that our dataset would foster research in the development of several algorithms such as cloth classification, landmark detection, especially for ethnic clothes. △ Less

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2103.15446 [pdf]

doi 10.1145/3393822.3432314

Deep Image Compositing

Authors: Shivangi Aneja, Soham Mazumder

Abstract: In image editing, the most common task is pasting objects from one image to the other and then eventually adjusting the manifestation of the foreground object with the background object. This task is called image compositing. But image compositing is a challenging problem that requires professional editing skills and a considerable amount of time. Not only these professionals are expensive to hire… ▽ More In image editing, the most common task is pasting objects from one image to the other and then eventually adjusting the manifestation of the foreground object with the background object. This task is called image compositing. But image compositing is a challenging problem that requires professional editing skills and a considerable amount of time. Not only these professionals are expensive to hire, but the tools (like Adobe Photoshop) used for doing such tasks are also expensive to purchase making the overall task of image compositing difficult for people without this skillset. In this work, we aim to cater to this problem by making composite images look realistic. To achieve this, we are using Generative Adversarial Networks (GANS). By training the network with a diverse range of filters applied to the images and special loss functions, the model is able to decode the color histogram of the foreground and background part of the image and also learns to blend the foreground object with the background. The hue and saturation values of the image play an important role as discussed in this paper. To the best of our knowledge, this is the first work that uses GANs for the task of image compositing. Currently, there is no benchmark dataset available for image compositing. So we created the dataset and will also make the dataset publicly available for benchmarking. Experimental results on this dataset show that our method outperforms all current state-of-the-art methods. △ Less

Submitted 29 March, 2021; originally announced March 2021.

Comments: ESSE 2020: Proceedings of the 2020 European Symposium on Software Engineering

Journal ref: In Proceedings of the 2020 European Symposium on Software Engineering (pp. 101-104) 2020

arXiv:2101.06278 [pdf, other]

COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning

Authors: Shivangi Aneja, Chris Bregler, Matthias Nießner

Abstract: Despite the recent attention to DeepFakes, one of the most prevalent ways to mislead audiences on social media is the use of unaltered images in a new but false context. To address these challenges and support fact-checkers, we propose a new method that automatically detects out-of-context image and text pairs. Our key insight is to leverage the grounding of image with text to distinguish out-of-c… ▽ More Despite the recent attention to DeepFakes, one of the most prevalent ways to mislead audiences on social media is the use of unaltered images in a new but false context. To address these challenges and support fact-checkers, we propose a new method that automatically detects out-of-context image and text pairs. Our key insight is to leverage the grounding of image with text to distinguish out-of-context scenarios that cannot be disambiguated with language alone. We propose a self-supervised training strategy where we only need a set of captioned images. At train time, our method learns to selectively align individual objects in an image with textual claims, without explicit supervision. At test time, we check if both captions correspond to the same object(s) in the image but are semantically different, which allows us to make fairly accurate out-of-context predictions. Our method achieves 85% out-of-context detection accuracy. To facilitate benchmarking of this task, we create a large-scale dataset of 200K images with 450K textual captions from a variety of news websites, blogs, and social media posts. The dataset and source code is publicly available at https://shivangi-aneja.github.io/projects/cosmos/. △ Less

Submitted 21 April, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: Video : https://youtu.be/riI3Cl2xy10

arXiv:2009.04682 [pdf, other]

Network Traffic Analysis based IoT Device Identification

Authors: Rajarshi Roy Chowdhury, Sandhya Aneja, Nagender Aneja, Emeroylariffion Abas

Abstract: Device identification is the process of identifying a device on Internet without using its assigned network or other credentials. The sharp rise of usage in Internet of Things (IoT) devices has imposed new challenges in device identification due to a wide variety of devices, protocols and control interfaces. In a network, conventional IoT devices identify each other by utilizing IP or MAC addresse… ▽ More Device identification is the process of identifying a device on Internet without using its assigned network or other credentials. The sharp rise of usage in Internet of Things (IoT) devices has imposed new challenges in device identification due to a wide variety of devices, protocols and control interfaces. In a network, conventional IoT devices identify each other by utilizing IP or MAC addresses, which are prone to spoofing. Moreover, IoT devices are low power devices with minimal embedded security solution. To mitigate the issue in IoT devices, fingerprint (DFP) for device identification can be used. DFP identifies a device by using implicit identifiers, such as network traffic (or packets), radio signal, which a device used for its communication over the network. These identifiers are closely related to the device hardware and software features. In this paper, we exploit TCP/IP packet header features to create a device fingerprint utilizing device originated network packets. We present a set of three metrics which separate some features from a packet which contribute actively for device identification. To evaluate our approach, we used publicly accessible two datasets. We observed the accuracy of device genre classification 99.37% and 83.35% of accuracy in the identification of an individual device from IoT Sentinel dataset. However, using UNSW dataset device type identification accuracy reached up to 97.78%. △ Less

Submitted 10 September, 2020; originally announced September 2020.

Journal ref: International Conference on Big Data and Internet of Things (BDIOT2020), August 22-24, 2020

arXiv:2007.16011 [pdf, other]

Neural Machine Translation model for University Email Application

Authors: Sandhya Aneja, Siti Nur Afikah Bte Abdul Mazid, Nagender Aneja

Abstract: Machine translation has many applications such as news translation, email translation, official letter translation etc. Commercial translators, e.g. Google Translation lags in regional vocabulary and are unable to learn the bilingual text in the source and target languages within the input. In this paper, a regional vocabulary-based application-oriented Neural Machine Translation (NMT) model is pr… ▽ More Machine translation has many applications such as news translation, email translation, official letter translation etc. Commercial translators, e.g. Google Translation lags in regional vocabulary and are unable to learn the bilingual text in the source and target languages within the input. In this paper, a regional vocabulary-based application-oriented Neural Machine Translation (NMT) model is proposed over the data set of emails used at the University for communication over a period of three years. A state-of-the-art Sequence-to-Sequence Neural Network for ML -> EN and EN -> ML translations is compared with Google Translate using Gated Recurrent Unit Recurrent Neural Network machine translation model with attention decoder. The low BLEU score of Google Translation in comparison to our model indicates that the application based regional models are better. The low BLEU score of EN -> ML of our model and Google Translation indicates that the Malay Language has complex language features corresponding to English. △ Less

Submitted 20 July, 2020; originally announced July 2020.

Comments: International Conference on Natural Language Processing (ICNLP 2020), July 11-13, 2020

Journal ref: International Conference on Natural Language Processing (ICNLP 2020), July 11-13, 2020

arXiv:2006.11863 [pdf, other]

Generalized Zero and Few-Shot Transfer for Facial Forgery Detection

Authors: Shivangi Aneja, Matthias Nießner

Abstract: We propose Deep Distribution Transfer(DDT), a new transfer learning approach to address the problem of zero and few-shot transfer in the context of facial forgery detection. We examine how well a model (pre-)trained with one forgery creation method generalizes towards a previously unseen manipulation technique or different dataset. To facilitate this transfer, we introduce a new mixture model-base… ▽ More We propose Deep Distribution Transfer(DDT), a new transfer learning approach to address the problem of zero and few-shot transfer in the context of facial forgery detection. We examine how well a model (pre-)trained with one forgery creation method generalizes towards a previously unseen manipulation technique or different dataset. To facilitate this transfer, we introduce a new mixture model-based loss formulation that learns a multi-modal distribution, with modes corresponding to class categories of the underlying data of the source forgery method. Our core idea is to first pre-train an encoder neural network, which maps each mode of this distribution to the respective class labels, i.e., real or fake images in the source domain by minimizing wasserstein distance between them. In order to transfer this model to a new domain, we associate a few target samples with one of the previously trained modes. In addition, we propose a spatial mixup augmentation strategy that further helps generalization across domains. We find this learning strategy to be surprisingly effective at domain transfer compared to a traditional classification or even state-of-the-art domain adaptation/few-shot learning methods. For instance, compared to the best baseline, our method improves the classification accuracy by 4.88% for zero-shot and by 8.38% for the few-shot case transferred from the FaceForensics++ to Dessa dataset. △ Less

Submitted 21 June, 2020; originally announced June 2020.

Comments: Project page: https://shivangi-aneja.github.io/ddt/

arXiv:1910.13334 [pdf]

doi 10.1016/j.prro.2020.06.001

NCI Workshop on Artificial Intelligence in Radiation Oncology: Training the Next Generation

Authors: John Kang, Reid F. Thompson, Sanjay Aneja, Constance Lehman, Andrew Trister, James Zou, Ceferino Obcemea, Issam El Naqa

Abstract: Artificial intelligence (AI) is about to touch every aspect of radiotherapy from consultation, treatment planning, quality assurance, therapy delivery, to outcomes modeling. There is an urgent need to train radiation oncologists and medical physicists in data science to help shepherd AI solutions into clinical practice. Poorly trained personnel may do more harm than good when attempting to apply r… ▽ More Artificial intelligence (AI) is about to touch every aspect of radiotherapy from consultation, treatment planning, quality assurance, therapy delivery, to outcomes modeling. There is an urgent need to train radiation oncologists and medical physicists in data science to help shepherd AI solutions into clinical practice. Poorly trained personnel may do more harm than good when attempting to apply rapidly developing and complex technologies. As the amount of AI research expands in our field, the radiation oncology community needs to discuss how to educate future generations in this area. The National Cancer Institute (NCI) Workshop on AI in Radiation Oncology (Shady Grove, MD, April 4-5, 2019) was the first (https://dctd.cancer.gov/NewsEvents/20190523_ai_in_radiation_oncology.htm) of two data science workshops in radiation oncology hosted by the NCI in 2019. During this workshop, the Training and Education Working Group was formed by volunteers among the invited attendees. Its members represent radiation oncology, medical physics, radiology, computer science, industry, and the NCI. In this perspective article written by members of the Training and Education Working Group, we provide and discuss Action Points relevant for future trainees interested in radiation oncology AI: (1) creating AI awareness and responsible conduct; (2) implementing a practical didactic curriculum; (3) creating a publicly available database of training resources; and (4) accelerate learning and funding opportunities. Together, these Action Points can facilitate the translation of AI into clinical practice. △ Less

Submitted 5 July, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 1 page, 4 tables. Workshop paper from NCI Workshop on Artificial Intelligence in Radiation Oncology. Accepted for publication by Practical Radiation Oncology

arXiv:1909.08774 [pdf, other]

doi 10.1109/ICAIT47043.2019.8987286

Transfer Learning using CNN for Handwritten Devanagari Character Recognition

Authors: Nagender Aneja, Sandhya Aneja

Abstract: This paper presents an analysis of pre-trained models to recognize handwritten Devanagari alphabets using transfer learning for Deep Convolution Neural Network (DCNN). This research implements AlexNet, DenseNet, Vgg, and Inception ConvNet as a fixed feature extractor. We implemented 15 epochs for each of AlexNet, DenseNet 121, DenseNet 201, Vgg 11, Vgg 16, Vgg 19, and Inception V3. Results show th… ▽ More This paper presents an analysis of pre-trained models to recognize handwritten Devanagari alphabets using transfer learning for Deep Convolution Neural Network (DCNN). This research implements AlexNet, DenseNet, Vgg, and Inception ConvNet as a fixed feature extractor. We implemented 15 epochs for each of AlexNet, DenseNet 121, DenseNet 201, Vgg 11, Vgg 16, Vgg 19, and Inception V3. Results show that Inception V3 performs better in terms of accuracy achieving 99% accuracy with average epoch time 16.3 minutes while AlexNet performs fastest with 2.2 minutes per epoch and achieving 98\% accuracy. △ Less

Submitted 18 September, 2019; originally announced September 2019.

Journal ref: IEEE International Conference on Advances in Information Technology (ICAIT), ICAIT - 2019

arXiv:1902.01926 [pdf]

doi 10.1109/IOTAIS.2018.8600824

IoT Device Fingerprint using Deep Learning

Authors: Sandhya Aneja, Nagender Aneja, Md Shohidul Islam

Abstract: Device Fingerprinting (DFP) is the identification of a device without using its network or other assigned identities including IP address, Medium Access Control (MAC) address, or International Mobile Equipment Identity (IMEI) number. DFP identifies a device using information from the packets which the device uses to communicate over the network. Packets are received at a router and processed to ex… ▽ More Device Fingerprinting (DFP) is the identification of a device without using its network or other assigned identities including IP address, Medium Access Control (MAC) address, or International Mobile Equipment Identity (IMEI) number. DFP identifies a device using information from the packets which the device uses to communicate over the network. Packets are received at a router and processed to extract the information. In this paper, we worked on the DFP using Inter Arrival Time (IAT). IAT is the time interval between the two consecutive packets received. This has been observed that the IAT is unique for a device because of different hardware and the software used for the device. The existing work on the DFP uses the statistical techniques to analyze the IAT and to further generate the information using which a device can be identified uniquely. This work presents a novel idea of DFP by plotting graphs of IAT for packets with each graph plotting 100 IATs and subsequently processing the resulting graphs for the identification of the device. This approach improves the efficiency to identify a device DFP due to achieved benchmark of the deep learning libraries in the image processing. We configured Raspberry Pi to work as a router and installed our packet sniffer application on the Raspberry Pi . The packet sniffer application captured the packet information from the connected devices in a log file. We connected two Apple devices iPad4 and iPhone 7 Plus to the router and created IAT graphs for these two devices. We used Convolution Neural Network (CNN) to identify the devices and observed the accuracy of 86.7%. △ Less

Submitted 17 January, 2019; originally announced February 2019.

Showing 1–24 of 24 results for author: Aneja, S