Search | arXiv e-print repository

Enhancing material property prediction with ensemble deep graph convolutional networks

Authors: Chowdhury Mohammad Abid Rahman, Ghadendra Bhandari, Nasser M Nasrabadi, Aldo H. Romero, Prashnna K. Gyawali

Abstract: Machine learning (ML) models have emerged as powerful tools for accelerating materials discovery and design by enabling accurate predictions of properties from compositional and structural data. These capabilities are vital for developing advanced technologies across fields such as energy, electronics, and biomedicine, potentially reducing the time and resources needed for new material exploration… ▽ More Machine learning (ML) models have emerged as powerful tools for accelerating materials discovery and design by enabling accurate predictions of properties from compositional and structural data. These capabilities are vital for developing advanced technologies across fields such as energy, electronics, and biomedicine, potentially reducing the time and resources needed for new material exploration and promoting rapid innovation cycles. Recent efforts have focused on employing advanced ML algorithms, including deep learning - based graph neural network, for property prediction. Additionally, ensemble models have proven to enhance the generalizability and robustness of ML and DL. However, the use of such ensemble strategies in deep graph networks for material property prediction remains underexplored. Our research provides an in-depth evaluation of ensemble strategies in deep learning - based graph neural network, specifically targeting material property prediction tasks. By testing the Crystal Graph Convolutional Neural Network (CGCNN) and its multitask version, MT-CGCNN, we demonstrated that ensemble techniques, especially prediction averaging, substantially improve precision beyond traditional metrics for key properties like formation energy per atom ($ΔE^{f}$), band gap ($E_{g}$) and density ($ρ$) in 33,990 stable inorganic materials. These findings support the broader application of ensemble methods to enhance predictive accuracy in the field. △ Less

Submitted 26 July, 2024; originally announced July 2024.

Comments: 9 pages, 6 figures, 2 tables

arXiv:2407.15964 [pdf, other]

FDWST: Fingerphoto Deblurring using Wavelet Style Transfer

Authors: David Keaton, Amol S. Joshi, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: The challenge of deblurring fingerphoto images, or generating a sharp fingerphoto from a given blurry one, is a significant problem in the realm of computer vision. To address this problem, we propose a fingerphoto deblurring architecture referred to as Fingerphoto Deblurring using Wavelet Style Transfer (FDWST), which aims to utilize the information transmission of Style Transfer techniques to de… ▽ More The challenge of deblurring fingerphoto images, or generating a sharp fingerphoto from a given blurry one, is a significant problem in the realm of computer vision. To address this problem, we propose a fingerphoto deblurring architecture referred to as Fingerphoto Deblurring using Wavelet Style Transfer (FDWST), which aims to utilize the information transmission of Style Transfer techniques to deblur fingerphotos. Additionally, we incorporate the Discrete Wavelet Transform (DWT) for its ability to split images into different frequency bands. By combining these two techniques, we can perform Style Transfer over a wide array of wavelet frequency bands, thereby increasing the quality and variety of sharpness information transferred from sharp to blurry images. Using this technique, our model was able to drastically increase the quality of the generated fingerphotos compared to their originals, and achieve a peak matching accuracy of 0.9907 when tasked with matching a deblurred fingerphoto to its sharp counterpart, outperforming multiple other state-of-the-art deblurring and style transfer techniques. △ Less

Submitted 22 July, 2024; originally announced July 2024.

Comments: Accepted by IJCB 2024

arXiv:2407.15730 [pdf, other]

Neural-based Video Compression on Solar Dynamics Observatory Images

Authors: Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva

Abstract: NASA's Solar Dynamics Observatory (SDO) mission collects extensive data to monitor the Sun's daily activity. In the realm of space mission design, data compression plays a crucial role in addressing the challenges posed by limited telemetry rates. The primary objective of data compression is to facilitate efficient data management and transmission to work within the constrained bandwidth, thereby… ▽ More NASA's Solar Dynamics Observatory (SDO) mission collects extensive data to monitor the Sun's daily activity. In the realm of space mission design, data compression plays a crucial role in addressing the challenges posed by limited telemetry rates. The primary objective of data compression is to facilitate efficient data management and transmission to work within the constrained bandwidth, thereby ensuring that essential information is captured while optimizing the utilization of available resources. This paper introduces a neural video compression technique that achieves a high compression ratio for the SDO's image data collection. The proposed approach focuses on leveraging both temporal and spatial redundancies in the data, leading to a more efficient compression. In this work, we introduce an architecture based on the Transformer model, which is specifically designed to capture both local and global information from input images in an effective and efficient manner. Additionally, our network is equipped with an entropy model that can accurately model the probability distribution of the latent representations and improves the speed of the entropy decoding step. The entropy model leverages a channel-dependent approach and utilizes checkerboard-shaped local and global spatial contexts. By combining the Transformer-based video compression network with our entropy model, the proposed compression algorithm demonstrates superior performance over traditional video codecs like H.264 and H.265, as confirmed by our experimental results. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.14972 [pdf, other]

ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition

Authors: Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei, Nasser M. Nasrabadi

Abstract: Aiming to enhance Face Recognition (FR) on Low-Quality (LQ) inputs, recent studies suggest incorporating synthetic LQ samples into training. Although promising, the quality factors that are considered in these works are general rather than FR-specific, \eg, atmospheric turbulence, resolution, \etc. Motivated by the observation of the vulnerability of current FR models to even small Face Alignmen… ▽ More Aiming to enhance Face Recognition (FR) on Low-Quality (LQ) inputs, recent studies suggest incorporating synthetic LQ samples into training. Although promising, the quality factors that are considered in these works are general rather than FR-specific, \eg, atmospheric turbulence, resolution, \etc. Motivated by the observation of the vulnerability of current FR models to even small Face Alignment Errors (FAE) in LQ images, we present a simple yet effective method that considers FAE as another quality factor that is tailored to FR. We seek to improve LQ FR by enhancing FR models' robustness to FAE. To this aim, we formalize the problem as a combination of differentiable spatial transformations and adversarial data augmentation in FR. We perturb the alignment of the training samples using a controllable spatial transformation and enrich the training with samples expressing FAE. We demonstrate the benefits of the proposed method by conducting evaluations on IJB-B, IJB-C, IJB-S (+4.3\% Rank1), and TinyFace (+2.63\%). \href{https://github.com/msed-Ebrahimi/ARoFace}{https://github.com/msed-Ebrahimi/ARoFace} △ Less

Submitted 20 July, 2024; originally announced July 2024.

Comments: European Conference on Computer Vision (ECCV 2024)

arXiv:2403.16258 [pdf, other]

Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis

Authors: Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi

Abstract: While replacing Gaussian decoders with a conditional diffusion model enhances the perceptual quality of reconstructions in neural image compression, their lack of inductive bias for image data restricts their ability to achieve state-of-the-art perceptual levels. To address this limitation, we adopt a non-isotropic diffusion model at the decoder side. This model imposes an inductive bias aimed at… ▽ More While replacing Gaussian decoders with a conditional diffusion model enhances the perceptual quality of reconstructions in neural image compression, their lack of inductive bias for image data restricts their ability to achieve state-of-the-art perceptual levels. To address this limitation, we adopt a non-isotropic diffusion model at the decoder side. This model imposes an inductive bias aimed at distinguishing between frequency contents, thereby facilitating the generation of high-quality images. Moreover, our framework is equipped with a novel entropy model that accurately models the probability distribution of latent representation by exploiting spatio-channel correlations in latent space, while accelerating the entropy decoding step. This channel-wise entropy model leverages both local and global spatial contexts within each channel chunk. The global spatial context is built upon the Transformer, which is specifically designed for image compression tasks. The designed Transformer employs a Laplacian-shaped positional encoding, the learnable parameters of which are adaptively adjusted for each channel cluster. Our experiments demonstrate that our proposed framework yields better perceptual quality compared to cutting-edge generative-based codecs, and the proposed entropy model contributes to notable bitrate savings. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2401.12340 [pdf, other]

doi 10.1109/TAES.2023.3337768

Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation

Authors: Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao

Abstract: Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation n… ▽ More Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation network has been previously proposed in the literature for effective ATR annotation. Although this method demonstrates great potential for ATR, it severely suffers from lower annotation performance, higher Fréchet Inception Distance (FID) score, and the presence of visual artifacts in the synthetic images. To address these issues, we propose a hybrid contrastive learning base unpaired domain translation (H-CUT) network that achieves a significantly lower FID score. It incorporates both attention and entropy to emphasize the domain-specific region, a noisy feature mixup module to generate high variational synthetic negative patches, and a modulated noise contrastive estimation (MoNCE) loss to reweight all negative patches using optimal transport for better performance. Our proposed contrastive learning and cycle-consistency-based TTL (C3TTL) framework consists of two H-CUT networks and two classifiers. It simultaneously optimizes cycle-consistency, MoNCE, and identity losses. In C3TTL, two H-CUT networks have been employed through a bijection mapping to feed the reconstructed source domain images into a pretrained classifier to guide the optimal target domain classifier. Extensive experimental analysis conducted on three ATR datasets demonstrates that the proposed C3TTL method is effective in annotating civilian and military vehicles, as well as ship targets. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: This Paper is Accepted in IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS. This Arxiv version is an older version than the reviewed version

arXiv:2401.03037 [pdf, other]

doi 10.1109/TBIOM.2023.3349218

CATFace: Cross-Attribute-Guided Transformer with Self-Attention Distillation for Low-Quality Face Recognition

Authors: Niloufar Alipour Talemi, Hossein Kashiani, Nasser M. Nasrabadi

Abstract: Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to the obscured facial details. Nevertheless, it is often feasible to make predictions about specific soft biometric (SB) attributes, such as gender, and baldness even in dealing with low-quality images. In this paper, we propose a novel multi-b… ▽ More Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to the obscured facial details. Nevertheless, it is often feasible to make predictions about specific soft biometric (SB) attributes, such as gender, and baldness even in dealing with low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that effectively highlights crucial facial regions, such as landmarks by aligning low-quality images with those of their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified quality-invariant feature representation in unconstrained environments. We conduct extensive experiments on various FR benchmarks varying in quality. Experimental results demonstrate the superiority of our FR method compared to state-of-the-art FR studies. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted in IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM), 2024

arXiv:2311.02855 [pdf, other]

doi 10.1109/TAES.2023.3332056

Neural-based Compression Scheme for Solar Image Data

Authors: Ali Zafari, Atefeh Khoshkhahtinat, Jeremy A. Grajeda, Piyush M. Mehta, Nasser M. Nasrabadi, Laura E. Boucheron, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva

Abstract: Studying the solar system and especially the Sun relies on the data gathered daily from space missions. These missions are data-intensive and compressing this data to make them efficiently transferable to the ground station is a twofold decision to make. Stronger compression methods, by distorting the data, can increase data throughput at the cost of accuracy which could affect scientific analysis… ▽ More Studying the solar system and especially the Sun relies on the data gathered daily from space missions. These missions are data-intensive and compressing this data to make them efficiently transferable to the ground station is a twofold decision to make. Stronger compression methods, by distorting the data, can increase data throughput at the cost of accuracy which could affect scientific analysis of the data. On the other hand, preserving subtle details in the compressed data requires a high amount of data to be transferred, reducing the desired gains from compression. In this work, we propose a neural network-based lossy compression method to be used in NASA's data-intensive imagery missions. We chose NASA's SDO mission which transmits 1.4 terabytes of data each day as a proof of concept for the proposed algorithm. In this work, we propose an adversarially trained neural network, equipped with local and non-local attention modules to capture both the local and global structure of the image resulting in a better trade-off in rate-distortion (RD) compared to conventional hand-engineered codecs. The RD variational autoencoder used in this work is jointly trained with a channel-dependent entropy model as a shared prior between the analysis and synthesis transforms to make the entropy coding of the latent code more effective. Our neural image compression algorithm outperforms currently-in-use and state-of-the-art codecs such as JPEG and JPEG-2000 in terms of the RD performance when compressing extreme-ultraviolet (EUV) data. As a proof of concept for use of this algorithm in SDO data analysis, we have performed coronal hole (CH) detection using our compressed images, and generated consistent segmentations, even at a compression rate of $\sim0.1$ bits per pixel (compared to 8 bits per pixel on the original data) using EUV data from SDO. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems (TAES). arXiv admin note: text overlap with arXiv:2210.06478

arXiv:2309.13137 [pdf, other]

Trading-off Mutual Information on Feature Aggregation for Face Recognition

Authors: Mohammad Akyash, Ali Zafari, Nasser M. Nasrabadi

Abstract: Despite the advances in the field of Face Recognition (FR), the precision of these methods is not yet sufficient. To improve the FR performance, this paper proposes a technique to aggregate the outputs of two state-of-the-art (SOTA) deep FR models, namely ArcFace and AdaFace. In our approach, we leverage the transformer attention mechanism to exploit the relationship between different parts of two… ▽ More Despite the advances in the field of Face Recognition (FR), the precision of these methods is not yet sufficient. To improve the FR performance, this paper proposes a technique to aggregate the outputs of two state-of-the-art (SOTA) deep FR models, namely ArcFace and AdaFace. In our approach, we leverage the transformer attention mechanism to exploit the relationship between different parts of two feature maps. By doing so, we aim to enhance the overall discriminative power of the FR system. One of the challenges in feature aggregation is the effective modeling of both local and global dependencies. Conventional transformers are known for their ability to capture long-range dependencies, but they often struggle with modeling local dependencies accurately. To address this limitation, we augment the self-attention mechanism to capture both local and global dependencies effectively. This allows our model to take advantage of the overlapping receptive fields present in corresponding locations of the feature maps. However, fusing two feature maps from different FR models might introduce redundancies to the face embedding. Since these models often share identical backbone architectures, the resulting feature maps may contain overlapping information, which can mislead the training process. To overcome this problem, we leverage the principle of Information Bottleneck to obtain a maximally informative facial representation. This ensures that the aggregated features retain the most relevant and discriminative information while minimizing redundant or misleading details. To evaluate the effectiveness of our proposed method, we conducted experiments on popular benchmarks and compared our results with state-of-the-art algorithms. The consistent improvement we observed in these benchmarks demonstrates the efficacy of our approach in enhancing FR performance. △ Less

Submitted 22 September, 2023; originally announced September 2023.

Comments: Accepted to 22$^{nd}$ IEEE International Conference on Machine Learning and Applications 2023 (ICMLA)

arXiv:2309.10799 [pdf, other]

Multi-Context Dual Hyper-Prior Neural Image Compression

Authors: Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Mohammad Akyash, Hossein Kashiani, Nasser M. Nasrabadi

Abstract: Transform and entropy models are the two core components in deep image compression neural networks. Most existing learning-based image compression methods utilize convolutional-based transform, which lacks the ability to model long-range dependencies, primarily due to the limited receptive field of the convolution operation. To address this limitation, we propose a Transformer-based nonlinear tran… ▽ More Transform and entropy models are the two core components in deep image compression neural networks. Most existing learning-based image compression methods utilize convolutional-based transform, which lacks the ability to model long-range dependencies, primarily due to the limited receptive field of the convolution operation. To address this limitation, we propose a Transformer-based nonlinear transform. This transform has the remarkable ability to efficiently capture both local and global information from the input image, leading to a more decorrelated latent representation. In addition, we introduce a novel entropy model that incorporates two different hyperpriors to model cross-channel and spatial dependencies of the latent representation. To further improve the entropy model, we add a global context that leverages distant relationships to predict the current latent more accurately. This global context employs a causal attention mechanism to extract long-range information in a content-dependent manner. Our experiments show that our proposed framework performs better than the state-of-the-art methods in terms of rate-distortion performance. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE 22$^nd$ International Conference on Machine Learning and Applications 2023 (ICMLA) - Selected for Oral Presentation

arXiv:2309.10791 [pdf, other]

Multi-spectral Entropy Constrained Neural Compression of Solar Imagery

Authors: Ali Zafari, Atefeh Khoshkhahtinat, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva

Abstract: Missions studying the dynamic behaviour of the Sun are defined to capture multi-spectral images of the sun and transmit them to the ground station in a daily basis. To make transmission efficient and feasible, image compression systems need to be exploited. Recently successful end-to-end optimized neural network-based image compression systems have shown great potential to be used in an ad-hoc man… ▽ More Missions studying the dynamic behaviour of the Sun are defined to capture multi-spectral images of the sun and transmit them to the ground station in a daily basis. To make transmission efficient and feasible, image compression systems need to be exploited. Recently successful end-to-end optimized neural network-based image compression systems have shown great potential to be used in an ad-hoc manner. In this work we have proposed a transformer-based multi-spectral neural image compressor to efficiently capture redundancies both intra/inter-wavelength. To unleash the locality of window-based self attention mechanism, we propose an inter-window aggregated token multi head self attention. Additionally to make the neural compressor autoencoder shift invariant, a randomly shifted window attention mechanism is used which makes the transformer blocks insensitive to translations in their input domain. We demonstrate that the proposed approach not only outperforms the conventional compression algorithms but also it is able to better decorrelates images along the multiple wavelengths compared to single spectral compression. △ Less

Submitted 10 October, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE 22$^{nd}$ International Conference on Machine Learning and Applications 2023 (ICMLA)

arXiv:2309.10784 [pdf, other]

Context-Aware Neural Video Compression on Solar Dynamics Observatory

Authors: Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva

Abstract: NASA's Solar Dynamics Observatory (SDO) mission collects large data volumes of the Sun's daily activity. Data compression is crucial for space missions to reduce data storage and video bandwidth requirements by eliminating redundancies in the data. In this paper, we present a novel neural Transformer-based video compression approach specifically designed for the SDO images. Our primary objective i… ▽ More NASA's Solar Dynamics Observatory (SDO) mission collects large data volumes of the Sun's daily activity. Data compression is crucial for space missions to reduce data storage and video bandwidth requirements by eliminating redundancies in the data. In this paper, we present a novel neural Transformer-based video compression approach specifically designed for the SDO images. Our primary objective is to efficiently exploit the temporal and spatial redundancies inherent in solar images to obtain a high compression ratio. Our proposed architecture benefits from a novel Transformer block called Fused Local-aware Window (FLaWin), which incorporates window-based self-attention modules and an efficient fused local-aware feed-forward (FLaFF) network. This architectural design allows us to simultaneously capture short-range and long-range information while facilitating the extraction of rich and diverse contextual representations. Moreover, this design choice results in reduced computational complexity. Experimental results demonstrate the significant contribution of the FLaWin Transformer block to the compression performance, outperforming conventional hand-engineered video codecs such as H.264 and H.265 in terms of rate-distortion trade-off. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE 22$^{nd}$ International Conference on Machine Learning and Applications 2023 (ICMLA) - Selected for Oral Presentation

arXiv:2308.10392 [pdf, other]

Towards Generalizable Morph Attack Detection with Consistency Regularization

Authors: Hossein Kashiani, Niloufar Alipour Talemi, Mohammad Saeed Ebrahimi Saadabadi, Nasser M. Nasrabadi

Abstract: Though recent studies have made significant progress in morph attack detection by virtue of deep neural networks, they often fail to generalize well to unseen morph attacks. With numerous morph attacks emerging frequently, generalizable morph attack detection has gained significant attention. This paper focuses on enhancing the generalization capability of morph attack detection from the perspecti… ▽ More Though recent studies have made significant progress in morph attack detection by virtue of deep neural networks, they often fail to generalize well to unseen morph attacks. With numerous morph attacks emerging frequently, generalizable morph attack detection has gained significant attention. This paper focuses on enhancing the generalization capability of morph attack detection from the perspective of consistency regularization. Consistency regularization operates under the premise that generalizable morph attack detection should output consistent predictions irrespective of the possible variations that may occur in the input space. In this work, to reach this objective, two simple yet effective morph-wise augmentations are proposed to explore a wide space of realistic morph transformations in our consistency regularization. Then, the model is regularized to learn consistently at the logit as well as embedding levels across a wide range of morph-wise augmented images. The proposed consistency regularization aligns the abstraction in the hidden layers of our model across the morph attack images which are generated from diverse domains in the wild. Experimental results demonstrate the superior generalization and robustness performance of our proposed method compared to the state-of-the-art studies. △ Less

Submitted 20 August, 2023; originally announced August 2023.

Comments: Accepted to the IEEE International Joint Conference on Biometrics (IJCB), 2023

arXiv:2308.09234 [pdf, other]

Deep Boosting Multi-Modal Ensemble Face Recognition with Sample-Level Weighting

Authors: Sahar Rahimi Malakshan, Mohammad Saeed Ebrahimi Saadabadi, Nima Najafzadeh, Nasser M. Nasrabadi

Abstract: Deep convolutional neural networks have achieved remarkable success in face recognition (FR), partly due to the abundant data availability. However, the current training benchmarks exhibit an imbalanced quality distribution; most images are of high quality. This poses issues for generalization on hard samples since they are underrepresented during training. In this work, we employ the multi-model… ▽ More Deep convolutional neural networks have achieved remarkable success in face recognition (FR), partly due to the abundant data availability. However, the current training benchmarks exhibit an imbalanced quality distribution; most images are of high quality. This poses issues for generalization on hard samples since they are underrepresented during training. In this work, we employ the multi-model boosting technique to deal with this issue. Inspired by the well-known AdaBoost, we propose a sample-level weighting approach to incorporate the importance of different samples into the FR loss. Individual models of the proposed framework are experts at distinct levels of sample hardness. Therefore, the combination of models leads to a robust feature extractor without losing the discriminability on the easy samples. Also, for incorporating the sample hardness into the training criterion, we analytically show the effect of sample mining on the important aspects of current angular margin loss functions, i.e., margin and scale. The proposed method shows superior performance in comparison with the state-of-the-art algorithms in extensive experiments on the CFP-FP, LFW, CPLFW, CALFW, AgeDB, TinyFace, IJB-B, and IJB-C evaluation datasets. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: 2023 IEEE International Joint Conference on Biometrics (IJCB)

arXiv:2308.09230 [pdf, other]

CCFace: Classification Consistency for Low-Resolution Face Recognition

Authors: Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Hossein Kashiani, Nasser M. Nasrabadi

Abstract: In recent years, deep face recognition methods have demonstrated impressive results on in-the-wild datasets. However, these methods have shown a significant decline in performance when applied to real-world low-resolution benchmarks like TinyFace or SCFace. To address this challenge, we propose a novel classification consistency knowledge distillation approach that transfers the learned classifier… ▽ More In recent years, deep face recognition methods have demonstrated impressive results on in-the-wild datasets. However, these methods have shown a significant decline in performance when applied to real-world low-resolution benchmarks like TinyFace or SCFace. To address this challenge, we propose a novel classification consistency knowledge distillation approach that transfers the learned classifier from a high-resolution model to a low-resolution network. This approach helps in finding discriminative representations for low-resolution instances. To further improve the performance, we designed a knowledge distillation loss using the adaptive angular penalty inspired by the success of the popular angular margin loss function. The adaptive penalty reduces overfitting on low-resolution samples and alleviates the convergence issue of the model integrated with data augmentation. Additionally, we utilize an asymmetric cross-resolution learning approach based on the state-of-the-art semi-supervised representation learning paradigm to improve discriminability on low-resolution instances and prevent them from forming a cluster. Our proposed method outperforms state-of-the-art approaches on low-resolution benchmarks, with a three percent improvement on TinyFace while maintaining performance on high-resolution benchmarks. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: 2023 IEEE International Joint Conference on Biometrics (IJCB)

arXiv:2308.07243 [pdf, other]

AAFACE: Attribute-aware Attentional Network for Face Recognition

Authors: Niloufar Alipour Talemi, Hossein Kashiani, Sahar Rahimi Malakshan, Mohammad Saeed Ebrahimi Saadabadi, Nima Najafzadeh, Mohammad Akyash, Nasser M. Nasrabadi

Abstract: In this paper, we present a new multi-branch neural network that simultaneously performs soft biometric (SB) prediction as an auxiliary modality and face recognition (FR) as the main task. Our proposed network named AAFace utilizes SB attributes to enhance the discriminative ability of FR representation. To achieve this goal, we propose an attribute-aware attentional integration (AAI) module to pe… ▽ More In this paper, we present a new multi-branch neural network that simultaneously performs soft biometric (SB) prediction as an auxiliary modality and face recognition (FR) as the main task. Our proposed network named AAFace utilizes SB attributes to enhance the discriminative ability of FR representation. To achieve this goal, we propose an attribute-aware attentional integration (AAI) module to perform weighted integration of FR with SB feature maps. Our proposed AAI module is not only fully context-aware but also capable of learning complex relationships between input features by means of the sequential multi-scale channel and spatial sub-modules. Experimental results verify the superiority of our proposed network compared with the state-of-the-art (SoTA) SB prediction and FR methods. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted to $30^{th}$ IEEE International Conference on Image Processing (ICIP 2023) as an oral presentation

arXiv:2308.02620 [pdf, other]

doi 10.1109/ICIP49359.2023.10222816

Frequency Disentangled Features in Neural Image Compression

Authors: Ali Zafari, Atefeh Khoshkhahtinat, Piyush Mehta, Mohammad Saeed Ebrahimi Saadabadi, Mohammad Akyash, Nasser M. Nasrabadi

Abstract: The design of a neural image compression network is governed by how well the entropy model matches the true distribution of the latent code. Apart from the model capacity, this ability is indirectly under the effect of how close the relaxed quantization is to the actual hard quantization. Optimizing the parameters of a rate-distortion variational autoencoder (R-D VAE) is ruled by this approximated… ▽ More The design of a neural image compression network is governed by how well the entropy model matches the true distribution of the latent code. Apart from the model capacity, this ability is indirectly under the effect of how close the relaxed quantization is to the actual hard quantization. Optimizing the parameters of a rate-distortion variational autoencoder (R-D VAE) is ruled by this approximated quantization scheme. In this paper, we propose a feature-level frequency disentanglement to help the relaxed scalar quantization achieve lower bit rates by guiding the high entropy latent features to include most of the low-frequency texture of the image. In addition, to strengthen the de-correlating power of the transformer-based analysis/synthesis transform, an augmented self-attention score calculation based on the Hadamard product is utilized during both encoding and decoding. Channel-wise autoregressive entropy modeling takes advantage of the proposed frequency separation as it inherently directs high-informational low-frequency channels to the first chunks and conditions the future chunks on it. The proposed network not only outperforms hand-engineered codecs, but also neural network-based codecs built on computation-heavy spatially autoregressive entropy models. △ Less

Submitted 4 August, 2023; originally announced August 2023.

Comments: Accepted to 30$^{th}$ IEEE International Conference on Image Processing (ICIP 2023)

arXiv:2306.04000 [pdf, other]

A Quality Aware Sample-to-Sample Comparison for Face Recognition

Authors: Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Ali Zafari, Moktari Mostofa, Nasser M. Nasrabadi

Abstract: Currently available face datasets mainly consist of a large number of high-quality and a small number of low-quality samples. As a result, a Face Recognition (FR) network fails to learn the distribution of low-quality samples since they are less frequent during training (underrepresented). Moreover, current state-of-the-art FR training paradigms are based on the sample-to-center comparison (i.e.,… ▽ More Currently available face datasets mainly consist of a large number of high-quality and a small number of low-quality samples. As a result, a Face Recognition (FR) network fails to learn the distribution of low-quality samples since they are less frequent during training (underrepresented). Moreover, current state-of-the-art FR training paradigms are based on the sample-to-center comparison (i.e., Softmax-based classifier), which results in a lack of uniformity between train and test metrics. This work integrates a quality-aware learning process at the sample level into the classification training paradigm (QAFace). In this regard, Softmax centers are adaptively guided to pay more attention to low-quality samples by using a quality-aware function. Accordingly, QAFace adds a quality-based adjustment to the updating procedure of the Softmax-based classifier to improve the performance on the underrepresented low-quality samples. Our method adaptively finds and assigns more attention to the recognizable low-quality samples in the training datasets. In addition, QAFace ignores the unrecognizable low-quality samples using the feature magnitude as a proxy for quality. As a result, QAFace prevents class centers from getting distracted from the optimal direction. The proposed method is superior to the state-of-the-art algorithms in extensive experimental results on the CFP-FP, LFW, CPLFW, CALFW, AgeDB, IJB-B, and IJB-C datasets. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV23)

arXiv:2305.13886 [pdf, other]

Deep Transductive Transfer Learning for Automatic Target Recognition

Authors: Shoaib M. Sami, Nasser M. Nasrabadi, Raghuveer Rao

Abstract: One of the major obstacles in designing an automatic target recognition (ATR) algorithm, is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDAR). Therefore, automatically annotating these images is essential to build a robust classifier in the target domain based on the labeled images of the… ▽ More One of the major obstacles in designing an automatic target recognition (ATR) algorithm, is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDAR). Therefore, automatically annotating these images is essential to build a robust classifier in the target domain based on the labeled images of the source domain. Transductive transfer learning is an effective way to adapt a network to a new target domain by utilizing a pretrained ATR network in the source domain. We propose an unpaired transductive transfer learning framework where a CycleGAN model and a well-trained ATR classifier in the source domain are used to construct an ATR classifier in the target domain without having any labeled data in the target domain. We employ a CycleGAN model to transfer the mid-wave infrared (MWIR) images to visible (VIS) domain images (or visible to MWIR domain). To train the transductive CycleGAN, we optimize a cost function consisting of the adversarial, identity, cycle-consistency, and categorical cross-entropy loss for both the source and target classifiers. In this paper, we perform a detailed experimental analysis on the challenging DSIAC ATR dataset. The dataset consists of ten classes of vehicles at different poses and distances ranging from 1-5 kilometers on both the MWIR and VIS domains. In our experiment, we assume that the images in the VIS domain are the unlabeled target dataset. We first detect and crop the vehicles from the raw images and then project them into a common distance of 2 kilometers. Our proposed transductive CycleGAN achieves 71.56% accuracy in classifying the visible domain vehicles in the DSIAC ATR dataset. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 10 pages, 5 figures

Journal ref: SPIE Defense & Commercial Sensing 2023, Conference 12521, Automatic target recognition XXXIII, Orlando, Florida

arXiv:2210.10182 [pdf, other]

Landmark Enforcement and Style Manipulation for Generative Morphing

Authors: Samuel Price, Sobhan Soleymani, Nasser M. Nasrabadi

Abstract: Morph images threaten Facial Recognition Systems (FRS) by presenting as multiple individuals, allowing an adversary to swap identities with another subject. Morph generation using generative adversarial networks (GANs) results in high-quality morphs unaffected by the spatial artifacts caused by landmark-based methods, but there is an apparent loss in identity with standard GAN-based morphing metho… ▽ More Morph images threaten Facial Recognition Systems (FRS) by presenting as multiple individuals, allowing an adversary to swap identities with another subject. Morph generation using generative adversarial networks (GANs) results in high-quality morphs unaffected by the spatial artifacts caused by landmark-based methods, but there is an apparent loss in identity with standard GAN-based morphing methods. In this paper, we propose a novel StyleGAN morph generation technique by introducing a landmark enforcement method to resolve this issue. Considering this method, we aim to enforce the landmarks of the morph image to represent the spatial average of the landmarks of the bona fide faces and subsequently the morph images to inherit the geometric identity of both bona fide faces. Exploration of the latent space of our model is conducted using Principal Component Analysis (PCA) to accentuate the effect of both the bona fide faces on the morphed latent representation and address the identity loss issue with latent domain averaging. Additionally, to improve high frequency reconstruction in the morphs, we study the train-ability of the noise input for the StyleGAN2 model. △ Less

Submitted 18 October, 2022; originally announced October 2022.

arXiv:2210.06478 [pdf, other]

doi 10.1109/ICMLA55696.2022.00035

Attention-Based Generative Neural Image Compression on Solar Dynamics Observatory

Authors: Ali Zafari, Atefeh Khoshkhahtinat, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Daniel da Silva, Michael S. F. Kirk

Abstract: NASA's Solar Dynamics Observatory (SDO) mission gathers 1.4 terabytes of data each day from its geosynchronous orbit in space. SDO data includes images of the Sun captured at different wavelengths, with the primary scientific goal of understanding the dynamic processes governing the Sun. Recently, end-to-end optimized artificial neural networks (ANN) have shown great potential in performing image… ▽ More NASA's Solar Dynamics Observatory (SDO) mission gathers 1.4 terabytes of data each day from its geosynchronous orbit in space. SDO data includes images of the Sun captured at different wavelengths, with the primary scientific goal of understanding the dynamic processes governing the Sun. Recently, end-to-end optimized artificial neural networks (ANN) have shown great potential in performing image compression. ANN-based compression schemes have outperformed conventional hand-engineered algorithms for lossy and lossless image compression. We have designed an ad-hoc ANN-based image compression scheme to reduce the amount of data needed to be stored and retrieved on space missions studying solar dynamics. In this work, we propose an attention module to make use of both local and non-local attention mechanisms in an adversarially trained neural image compression network. We have also demonstrated the superior perceptual quality of this neural image compressor. Our proposed algorithm for compressing images downloaded from the SDO spacecraft performs better in rate-distortion trade-off than the popular currently-in-use image compression codecs such as JPEG and JPEG2000. In addition we have shown that the proposed method outperforms state-of-the art lossy transform coding compression codec, i.e., BPG. △ Less

Submitted 4 May, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to IEEE 21$^{st}$ International Conference on Machine Learning and Applications 2022 (ICMLA) - Selected for Oral Presentation

arXiv:2209.08130 [pdf, other]

Robust Ensemble Morph Detection with Domain Generalization

Authors: Hossein Kashiani, Shoaib Meraj Sami, Sobhan Soleymani, Nasser M. Nasrabadi

Abstract: Although a substantial amount of studies is dedicated to morph detection, most of them fail to generalize for morph faces outside of their training paradigm. Moreover, recent morph detection methods are highly vulnerable to adversarial attacks. In this paper, we intend to learn a morph detection model with high generalization to a wide range of morphing attacks and high robustness against differen… ▽ More Although a substantial amount of studies is dedicated to morph detection, most of them fail to generalize for morph faces outside of their training paradigm. Moreover, recent morph detection methods are highly vulnerable to adversarial attacks. In this paper, we intend to learn a morph detection model with high generalization to a wide range of morphing attacks and high robustness against different adversarial attacks. To this aim, we develop an ensemble of convolutional neural networks (CNNs) and Transformer models to benefit from their capabilities simultaneously. To improve the robust accuracy of the ensemble model, we employ multi-perturbation adversarial training and generate adversarial examples with high transferability for several single models. Our exhaustive evaluations demonstrate that the proposed robust ensemble model generalizes to several morphing attacks and face datasets. In addition, we validate that our robust ensemble model gain better robustness against several adversarial attacks while outperforming the state-of-the-art studies. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: Accepted in IJCB 2022

arXiv:2209.07001 [pdf, other]

Pose Attention-Guided Profile-to-Frontal Face Recognition

Authors: Moktari Mostofa, Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Nasser M. Nasrabadi

Abstract: In recent years, face recognition systems have achieved exceptional success due to promising advances in deep learning architectures. However, they still fail to achieve expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. We instead propo… ▽ More In recent years, face recognition systems have achieved exceptional success due to promising advances in deep learning architectures. However, they still fail to achieve expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. We instead propose a new approach to utilize pose as an auxiliary information via an attention mechanism. In this paper, we hypothesize that pose attended information using an attention mechanism can guide contextual and distinctive feature extraction from profile faces, which further benefits a better representation learning in an embedded domain. To achieve this, first, we design a unified coupled profile-to-frontal face recognition network. It learns the mapping from faces to a compact embedding subspace via a class-specific contrastive loss. Second, we develop a novel pose attention block (PAB) to specially guide the pose-agnostic feature extraction from profile faces. To be more specific, PAB is designed to explicitly help the network to focus on important features along both channel and spatial dimension while learning discriminative yet pose invariant features in an embedding subspace. To validate the effectiveness of our proposed method, we conduct experiments on both controlled and in the wild benchmarks including Multi-PIE, CFP, IJBC, and show superiority over the state of the arts. △ Less

Submitted 14 September, 2022; originally announced September 2022.

Comments: 10 pages, 5 figures, Accepted at IJCB, 2022

arXiv:2209.03456 [pdf, other]

Information Maximization for Extreme Pose Face Recognition

Authors: Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Sobhan Soleymani, Moktari Mostofa, Nasser M. Nasrabadi

Abstract: In this paper, we seek to draw connections between the frontal and profile face images in an abstract embedding space. We exploit this connection using a coupled-encoder network to project frontal/profile face images into a common latent embedding space. The proposed model forces the similarity of representations in the embedding space by maximizing the mutual information between two views of the… ▽ More In this paper, we seek to draw connections between the frontal and profile face images in an abstract embedding space. We exploit this connection using a coupled-encoder network to project frontal/profile face images into a common latent embedding space. The proposed model forces the similarity of representations in the embedding space by maximizing the mutual information between two views of the face. The proposed coupled-encoder benefits from three contributions for matching faces with extreme pose disparities. First, we leverage our pose-aware contrastive learning to maximize the mutual information between frontal and profile representations of identities. Second, a memory buffer, which consists of latent representations accumulated over past iterations, is integrated into the model so it can refer to relatively much more instances than the mini-batch size. Third, a novel pose-aware adversarial domain adaptation method forces the model to learn an asymmetric mapping from profile to frontal representation. In our framework, the coupled-encoder learns to enlarge the margin between the distribution of genuine and imposter faces, which results in high mutual information between different views of the same identity. The effectiveness of the proposed model is investigated through extensive experiments, evaluations, and ablation studies on four benchmark datasets, and comparison with the compelling state-of-the-art algorithms. △ Less

Submitted 7 September, 2022; originally announced September 2022.

Comments: INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB 2022)

arXiv:2209.01199 [pdf, other]

Revisiting Outer Optimization in Adversarial Training

Authors: Ali Dabouei, Fariborz Taherkhani, Sobhan Soleymani, Nasser M. Nasrabadi

Abstract: Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the o… ▽ More Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus, it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT including robust overfitting and high sensitivity to hyperparameter settings. △ Less

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2206.13740 [pdf, other]

GAN-based Super-Resolution and Segmentation of Retinal Layers in Optical coherence tomography Scans

Authors: Paria Jeihouni, Omid Dehzangi, Annahita Amireskandari, Ali Rezai, Nasser M. Nasrabadi

Abstract: In this paper, we design a Generative Adversarial Network (GAN)-based solution for super-resolution and segmentation of optical coherence tomography (OCT) scans of the retinal layers. OCT has been identified as a non-invasive and inexpensive modality of imaging to discover potential biomarkers for the diagnosis and progress determination of neurodegenerative diseases, such as Alzheimer's Disease (… ▽ More In this paper, we design a Generative Adversarial Network (GAN)-based solution for super-resolution and segmentation of optical coherence tomography (OCT) scans of the retinal layers. OCT has been identified as a non-invasive and inexpensive modality of imaging to discover potential biomarkers for the diagnosis and progress determination of neurodegenerative diseases, such as Alzheimer's Disease (AD). Current hypotheses presume the thickness of the retinal layers, which are analyzable within OCT scans, can be effective biomarkers. As a logical first step, this work concentrates on the challenging task of retinal layer segmentation and also super-resolution for higher clarity and accuracy. We propose a GAN-based segmentation model and evaluate incorporating popular networks, namely, U-Net and ResNet, in the GAN architecture with additional blocks of transposed convolution and sub-pixel convolution for the task of upscaling OCT images from low to high resolution by a factor of four. We also incorporate the Dice loss as an additional reconstruction loss term to improve the performance of this joint optimization task. Our best model configuration empirically achieved the Dice coefficient of 0.867 and mIOU of 0.765. △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: 5 pages,7 figures

arXiv:2206.05277 [pdf, other]

Superresolution and Segmentation of OCT scans using Multi-Stage adversarial Guided Attention Training

Authors: Paria Jeihouni, Omid Dehzangi, Annahita Amireskandari, Ali Dabouei, Ali Rezai, Nasser M. Nasrabadi

Abstract: Optical coherence tomography (OCT) is one of the non-invasive and easy-to-acquire biomarkers (the thickness of the retinal layers, which is detectable within OCT scans) being investigated to diagnose Alzheimer's disease (AD). This work aims to segment the OCT images automatically; however, it is a challenging task due to various issues such as the speckle noise, small target region, and unfavorabl… ▽ More Optical coherence tomography (OCT) is one of the non-invasive and easy-to-acquire biomarkers (the thickness of the retinal layers, which is detectable within OCT scans) being investigated to diagnose Alzheimer's disease (AD). This work aims to segment the OCT images automatically; however, it is a challenging task due to various issues such as the speckle noise, small target region, and unfavorable imaging conditions. In our previous work, we have proposed the multi-stage & multi-discriminatory generative adversarial network (MultiSDGAN) to translate OCT scans in high-resolution segmentation labels. In this investigation, we aim to evaluate and compare various combinations of channel and spatial attention to the MultiSDGAN architecture to extract more powerful feature maps by capturing rich contextual relationships to improve segmentation performance. Moreover, we developed and evaluated a guided mutli-stage attention framework where we incorporated a guided attention mechanism by forcing an L-1 loss between a specifically designed binary mask and the generated attention maps. Our ablation study results on the WVU-OCT data-set in five-fold cross-validation (5-CV) suggest that the proposed MultiSDGAN with a serial attention module provides the most competitive performance, and guiding the spatial attention feature maps by binary masks further improves the performance in our proposed network. Comparing the baseline model with adding the guided-attention, our results demonstrated relative improvements of 21.44% and 19.45% on the Dice coefficient and SSIM, respectively. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Comments: 5 pages,conference

arXiv:2112.05827 [pdf, other]

Quality-Aware Multimodal Biometric Recognition

Authors: Sobhan Soleymani, Ali Dabouei, Fariborz Taherkhani, Seyed Mehdi Iranmanesh, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: We present a quality-aware multimodal recognition framework that combines representations from multiple biometric traits with varying quality and number of samples to achieve increased recognition accuracy by extracting complimentary identification information based on the quality of the samples. We develop a quality-aware framework for fusing representations of input modalities by weighting their… ▽ More We present a quality-aware multimodal recognition framework that combines representations from multiple biometric traits with varying quality and number of samples to achieve increased recognition accuracy by extracting complimentary identification information based on the quality of the samples. We develop a quality-aware framework for fusing representations of input modalities by weighting their importance using quality scores estimated in a weakly-supervised fashion. This framework utilizes two fusion blocks, each represented by a set of quality-aware and aggregation networks. In addition to architecture modifications, we propose two task-specific loss functions: multimodal separability loss and multimodal compactness loss. The first loss assures that the representations of modalities for a class have comparable magnitudes to provide a better quality estimation, while the multimodal representations of different classes are distributed to achieve maximum discrimination in the embedding space. The second loss, which is considered to regularize the network weights, improves the generalization performance by regularizing the framework. We evaluate the performance by considering three multimodal datasets consisting of face, iris, and fingerprint modalities. The efficacy of the framework is demonstrated through comparison with the state-of-the-art algorithms. In particular, our framework outperforms the rank- and score-level fusion of modalities of BIOMDATA by more than 30% for true acceptance rate at false acceptance rate of $10^{-4}$. △ Less

Submitted 10 December, 2021; originally announced December 2021.

Comments: IEEE Transactions on Biometrics, Behavior, and Identity Science

arXiv:2111.14943 [pdf, other]

Morph Detection Enhanced by Structured Group Sparsity

Authors: Poorya Aghdaie, Baaria Chaudhary, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: In this paper, we consider the challenge of face morphing attacks, which substantially undermine the integrity of face recognition systems such as those adopted for use in border protection agencies. Morph detection can be formulated as extracting fine-grained representations, where local discriminative features are harnessed for learning a hypothesis. To acquire discriminative features at differe… ▽ More In this paper, we consider the challenge of face morphing attacks, which substantially undermine the integrity of face recognition systems such as those adopted for use in border protection agencies. Morph detection can be formulated as extracting fine-grained representations, where local discriminative features are harnessed for learning a hypothesis. To acquire discriminative features at different granularity as well as a decoupled spectral information, we leverage wavelet domain analysis to gain insight into the spatial-frequency content of a morphed face. As such, instead of using images in the RGB domain, we decompose every image into its wavelet sub-bands using 2D wavelet decomposition and a deep supervised feature selection scheme is employed to find the most discriminative wavelet sub-bands of input images. To this end, we train a Deep Neural Network (DNN) morph detector using the decomposed wavelet sub-bands of the morphed and bona fide images. In the training phase, our structured group sparsity-constrained DNN picks the most discriminative wavelet sub-bands out of all the sub-bands, with which we retrain our DNN, resulting in a precise detection of morphed images when inference is achieved on a probe image. The efficacy of our deep morph detector which is enhanced by structured group lasso is validated through experiments on three facial morph image databases, i.e., VISAPP17, LMA, and MorGAN. △ Less

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.02692 [pdf, other]

Human Age Estimation from Gene Expression Data using Artificial Neural Networks

Authors: Salman Mohamadi, Gianfranco. Doretto, Nasser M. Nasrabadi, Donald A. Adjeroh

Abstract: The study of signatures of aging in terms of genomic biomarkers can be uniquely helpful in understanding the mechanisms of aging and developing models to accurately predict the age. Prior studies have employed gene expression and DNA methylation data aiming at accurate prediction of age. In this line, we propose a new framework for human age estimation using information from human dermal fibroblas… ▽ More The study of signatures of aging in terms of genomic biomarkers can be uniquely helpful in understanding the mechanisms of aging and developing models to accurately predict the age. Prior studies have employed gene expression and DNA methylation data aiming at accurate prediction of age. In this line, we propose a new framework for human age estimation using information from human dermal fibroblast gene expression data. First, we propose a new spatial representation as well as a data augmentation approach for gene expression data. Next in order to predict the age, we design an architecture of neural network and apply it to this new representation of the original and augmented data, as an ensemble classification approach. Our experimental results suggest the superiority of the proposed framework over state-of-the-art age estimation methods using DNA methylation and gene expression data. △ Less

Submitted 4 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: 8 pages, 5 figures, This paper is accepted to 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

arXiv:2111.01965 [pdf]

Adversarially Perturbed Wavelet-based Morphed Face Generation

Authors: Kelsey O'Haire, Sobhan Soleymani, Baaria Chaudhary, Poorya Aghdaie, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: Morphing is the process of combining two or more subjects in an image in order to create a new identity which contains features of both individuals. Morphed images can fool Facial Recognition Systems (FRS) into falsely accepting multiple people, leading to failures in national security. As morphed image synthesis becomes easier, it is vital to expand the research community's available data to help… ▽ More Morphing is the process of combining two or more subjects in an image in order to create a new identity which contains features of both individuals. Morphed images can fool Facial Recognition Systems (FRS) into falsely accepting multiple people, leading to failures in national security. As morphed image synthesis becomes easier, it is vital to expand the research community's available data to help combat this dilemma. In this paper, we explore combination of two methods for morphed image generation, those of geometric transformation (warping and blending to create morphed images) and photometric perturbation. We leverage both methods to generate high-quality adversarially perturbed morphs from the FERET, FRGC, and FRLL datasets. The final images retain high similarity to both input subjects while resulting in minimal artifacts in the visual domain. Images are synthesized by fusing the wavelet sub-bands from the two look-alike subjects, and then adversarially perturbed to create highly convincing imagery to deceive both humans and deep morph detectors. △ Less

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2111.01325 [pdf, other]

Attribute-Based Deep Periocular Recognition: Leveraging Soft Biometrics to Improve Periocular Recognition

Authors: Veeru Talreja, Nasser M. Nasrabadi, Matthew C. Valenti

Abstract: In recent years, periocular recognition has been developed as a valuable biometric identification approach, especially in wild environments (for example, masked faces due to COVID-19 pandemic) where facial recognition may not be applicable. This paper presents a new deep periocular recognition framework called attribute-based deep periocular recognition (ADPR), which predicts soft biometrics and i… ▽ More In recent years, periocular recognition has been developed as a valuable biometric identification approach, especially in wild environments (for example, masked faces due to COVID-19 pandemic) where facial recognition may not be applicable. This paper presents a new deep periocular recognition framework called attribute-based deep periocular recognition (ADPR), which predicts soft biometrics and incorporates the prediction into a periocular recognition algorithm to determine identity from periocular images with high accuracy. We propose an end-to-end framework, which uses several shared convolutional neural network (CNN)layers (a common network) whose output feeds two separate dedicated branches (modality dedicated layers); the first branch classifies periocular images while the second branch predicts softn biometrics. Next, the features from these two branches are fused together for a final periocular recognition. The proposed method is different from existing methods as it not only uses a shared CNN feature space to train these two tasks jointly, but it also fuses predicted soft biometric features with the periocular features in the training step to improve the overall periocular recognition performance. Our proposed model is extensively evaluated using four different publicly available datasets. Experimental results indicate that our soft biometric based periocular recognition approach outperforms other state-of-the-art methods for periocular recognition in wild environments. △ Less

Submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted to be published in WACV 2022

arXiv:2110.12338 [pdf, other]

Quality Map Fusion for Adversarial Learning

Authors: Uche Osahor, Nasser M. Nasrabadi

Abstract: Generative adversarial models that capture salient low-level features which convey visual information in correlation with the human visual system (HVS) still suffer from perceptible image degradations. The inability to convey such highly informative features can be attributed to mode collapse, convergence failure and vanishing gradients. In this paper, we improve image quality adversarially by int… ▽ More Generative adversarial models that capture salient low-level features which convey visual information in correlation with the human visual system (HVS) still suffer from perceptible image degradations. The inability to convey such highly informative features can be attributed to mode collapse, convergence failure and vanishing gradients. In this paper, we improve image quality adversarially by introducing a novel quality map fusion technique that harnesses image features similar to the HVS and the perceptual properties of a deep convolutional neural network (DCNN). We extend the widely adopted l2 Wasserstein distance metric to other preferable quality norms derived from Banach spaces that capture richer image properties like structure, luminance, contrast and the naturalness of images. We also show that incorporating a perceptual attention mechanism (PAM) that extracts global feature embeddings from the network bottleneck with aggregated perceptual maps derived from standard image quality metrics translate to a better image quality. We also demonstrate impressive performance over other methods. △ Less

Submitted 23 October, 2021; originally announced October 2021.

arXiv:2110.09374 [pdf, other]

Ortho-Shot: Low Displacement Rank Regularization with Data Augmentation for Few-Shot Learning

Authors: Uche Osahor, Nasser M. Nasrabadi

Abstract: In few-shot classification, the primary goal is to learn representations from a few samples that generalize well for novel classes. In this paper, we propose an efficient low displacement rank (LDR) regularization strategy termed Ortho-Shot; a technique that imposes orthogonal regularization on the convolutional layers of a few-shot classifier, which is based on the doubly-block toeplitz (DBT) mat… ▽ More In few-shot classification, the primary goal is to learn representations from a few samples that generalize well for novel classes. In this paper, we propose an efficient low displacement rank (LDR) regularization strategy termed Ortho-Shot; a technique that imposes orthogonal regularization on the convolutional layers of a few-shot classifier, which is based on the doubly-block toeplitz (DBT) matrix structure. The regularized convolutional layers of the few-shot classifier enhances model generalization and intra-class feature embeddings that are crucial for few-shot learning. Overfitting is a typical issue for few-shot models, the lack of data diversity inhibits proper model inference which weakens the classification accuracy of few-shot learners to novel classes. In this regard, we broke down the pipeline of the few-shot classifier and established that the support, query and task data augmentation collectively alleviates overfitting in networks. With compelling results, we demonstrated that combining a DBT-based low-rank orthogonal regularizer with data augmentation strategies, significantly boosts the performance of a few-shot classifier. We perform our experiments on the miniImagenet, CIFAR-FS and Stanford datasets with performance values of about 5\% when compared to state-of-the-art △ Less

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2108.05948

Deep adversarial attack on target detection systems

Authors: Uche M. Osahor, Nasser M. Nasrabadi

Abstract: Target detection systems identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as a background or a potential target pixel. Deep Convolutional Neural Network (DCNN) classifiers have proven to be successful tools for computer vision applications. However,prior research confirms that even state of the art classifi… ▽ More Target detection systems identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as a background or a potential target pixel. Deep Convolutional Neural Network (DCNN) classifiers have proven to be successful tools for computer vision applications. However,prior research confirms that even state of the art classifier models are susceptible to adversarial attacks. In this paper, we show how to generate adversarial infrared images by adding small perturbations to the targets region to deceive a DCNN-based target detector at remarkable levels. We demonstrate significant progress in developing visually imperceptible adversarial infrared images where the targets are visually recognizable by an expert but a DCNN-based target detector cannot detect the targets in the image. △ Less

Submitted 29 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: Trying to improve the paper

arXiv:2108.04353 [pdf, other]

Tasks Structure Regularization in Multi-Task Learning for Improving Facial Attribute Prediction

Authors: Fariborz Taherkhani, Ali Dabouei, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: The great success of Convolutional Neural Networks (CNN) for facial attribute prediction relies on a large amount of labeled images. Facial image datasets are usually annotated by some commonly used attributes (e.g., gender), while labels for the other attributes (e.g., big nose) are limited which causes their prediction challenging. To address this problem, we use a new Multi-Task Learning (MTL)… ▽ More The great success of Convolutional Neural Networks (CNN) for facial attribute prediction relies on a large amount of labeled images. Facial image datasets are usually annotated by some commonly used attributes (e.g., gender), while labels for the other attributes (e.g., big nose) are limited which causes their prediction challenging. To address this problem, we use a new Multi-Task Learning (MTL) paradigm in which a facial attribute predictor uses the knowledge of other related attributes to obtain a better generalization performance. Here, we leverage MLT paradigm in two problem settings. First, it is assumed that the structure of the tasks (e.g., grouping pattern of facial attributes) is known as a prior knowledge, and parameters of the tasks (i.e., predictors) within the same group are represented by a linear combination of a limited number of underlying basis tasks. Here, a sparsity constraint on the coefficients of this linear combination is also considered such that each task is represented in a more structured and simpler manner. Second, it is assumed that the structure of the tasks is unknown, and then structure and parameters of the tasks are learned jointly by using a Laplacian regularization framework. Our MTL methods are compared with competing methods for facial attribute prediction to show its effectiveness. △ Less

Submitted 17 August, 2021; v1 submitted 29 July, 2021; originally announced August 2021.

arXiv:2108.04352 [pdf, other]

Attribute Guided Sparse Tensor-Based Model for Person Re-Identification

Authors: Fariborz Taherkhani, Ali Dabouei, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: Visual perception of a person is easily influenced by many factors such as camera parameters, pose and viewpoint variations. These variations make person Re-Identification (ReID) a challenging problem. Nevertheless, human attributes usually stand as robust visual properties to such variations. In this paper, we propose a new method to leverage features from human attributes for person ReID. Our mo… ▽ More Visual perception of a person is easily influenced by many factors such as camera parameters, pose and viewpoint variations. These variations make person Re-Identification (ReID) a challenging problem. Nevertheless, human attributes usually stand as robust visual properties to such variations. In this paper, we propose a new method to leverage features from human attributes for person ReID. Our model uses a tensor to non-linearly fuse identity and attribute features, and then forces the parameters of the tensor in the loss function to generate discriminative fused features for ReID. Since tensor-based methods usually contain a large number of parameters, training all of these parameters becomes very slow, and the chance of overfitting increases as well. To address this issue, we propose two new techniques based on Structural Sparsity Learning (SSL) and Tensor Decomposition (TD) methods to create an accurate and stable learning problem. We conducted experiments on several standard pedestrian datasets, and experimental results indicate that our tensor-based approach significantly improves person ReID baselines and also outperforms state of the art methods. △ Less

Submitted 29 July, 2021; originally announced August 2021.

arXiv:2108.01569 [pdf, ps, other]

Deep GAN-Based Cross-Spectral Cross-Resolution Iris Recognition

Authors: Moktari Mostofa, Salman Mohamadi, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: In recent years, cross-spectral iris recognition has emerged as a promising biometric approach to establish the identity of individuals. However, matching iris images acquired at different spectral bands (i.e., matching a visible (VIS) iris probe to a gallery of near-infrared (NIR) iris images or vice versa) shows a significant performance degradation when compared to intraband NIR matching. Hence… ▽ More In recent years, cross-spectral iris recognition has emerged as a promising biometric approach to establish the identity of individuals. However, matching iris images acquired at different spectral bands (i.e., matching a visible (VIS) iris probe to a gallery of near-infrared (NIR) iris images or vice versa) shows a significant performance degradation when compared to intraband NIR matching. Hence, in this paper, we have investigated a range of deep convolutional generative adversarial network (DCGAN) architectures to further improve the accuracy of cross-spectral iris recognition methods. Moreover, unlike the existing works in the literature, we introduce a resolution difference into the classical cross-spectral matching problem domain. We have developed two different techniques using the conditional generative adversarial network (cGAN) as a backbone architecture for cross-spectral iris matching. In the first approach, we simultaneously address the cross-resolution and cross-spectral matching problem by training a cGAN that jointly translates cross-resolution as well as cross-spectral tasks to the same resolution and within the same spectrum. In the second approach, we design a coupled generative adversarial network (cpGAN) architecture consisting of a pair of cGAN modules that project the VIS and NIR iris images into a low-dimensional embedding domain to ensure maximum pairwise similarity between the feature vectors from the two iris modalities of the same subject. △ Less

Submitted 3 August, 2021; originally announced August 2021.

Comments: 20 pages, 12 figures, Accepted to IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE (T-BIOM). arXiv admin note: text overlap with arXiv:2010.11689

arXiv:2107.13742 [pdf, other]

Profile to Frontal Face Recognition in the Wild Using Coupled Conditional GAN

Authors: Fariborz Taherkhani, Veeru Talreja, Jeremy Dawson, Matthew C. Valenti, Nasser M. Nasrabadi

Abstract: In recent years, with the advent of deep-learning, face recognition has achieved exceptional success. However, many of these deep face recognition models perform much better in handling frontal faces compared to profile faces. The major reason for poor performance in handling of profile faces is that it is inherently difficult to learn pose-invariant deep representations that are useful for profil… ▽ More In recent years, with the advent of deep-learning, face recognition has achieved exceptional success. However, many of these deep face recognition models perform much better in handling frontal faces compared to profile faces. The major reason for poor performance in handling of profile faces is that it is inherently difficult to learn pose-invariant deep representations that are useful for profile face recognition. In this paper, we hypothesize that the profile face domain possesses a latent connection with the frontal face domain in a latent feature subspace. We look to exploit this latent connection by projecting the profile faces and frontal faces into a common latent subspace and perform verification or retrieval in the latent domain. We leverage a coupled conditional generative adversarial network (cpGAN) structure to find the hidden relationship between the profile and frontal images in a latent common embedding subspace. Specifically, the cpGAN framework consists of two conditional GAN-based sub-networks, one dedicated to the frontal domain and the other dedicated to the profile domain. Each sub-network tends to find a projection that maximizes the pair-wise correlation between the two feature domains in a common embedding feature subspace. The efficacy of our approach compared with the state-of-the-art is demonstrated using the CFP, CMU Multi-PIE, IJB-A, and IJB-C datasets. Additionally, we have also implemented a coupled convolutional neural network (cpCNN) and an adversarial discriminative domain adaptation network (ADDA) for profile to frontal face recognition. We have evaluated the performance of cpCNN and ADDA and compared it with the proposed cpGAN. Finally, we have also evaluated our cpGAN for reconstruction of frontal faces from input profile faces contained in the VGGFace2 dataset. △ Less

Submitted 29 July, 2021; originally announced July 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2005.02166

arXiv:2106.15686 [pdf, other]

Attention Aware Wavelet-based Detection of Morphed Face Images

Authors: Poorya Aghdaie, Baaria Chaudhary, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: Morphed images have exploited loopholes in the face recognition checkpoints, e.g., Credential Authentication Technology (CAT), used by Transportation Security Administration (TSA), which is a non-trivial security concern. To overcome the risks incurred due to morphed presentations, we propose a wavelet-based morph detection methodology which adopts an end-to-end trainable soft attention mechanism… ▽ More Morphed images have exploited loopholes in the face recognition checkpoints, e.g., Credential Authentication Technology (CAT), used by Transportation Security Administration (TSA), which is a non-trivial security concern. To overcome the risks incurred due to morphed presentations, we propose a wavelet-based morph detection methodology which adopts an end-to-end trainable soft attention mechanism . Our attention-based deep neural network (DNN) focuses on the salient Regions of Interest (ROI) which have the most spatial support for morph detector decision function, i.e, morph class binary softmax output. A retrospective of morph synthesizing procedure aids us to speculate the ROI as regions around facial landmarks , particularly for the case of landmark-based morphing techniques. Moreover, our attention-based DNN is adapted to the wavelet space, where inputs of the network are coarse-to-fine spectral representations, 48 stacked wavelet sub-bands to be exact. We evaluate performance of the proposed framework using three datasets, VISAPP17, LMA, and MorGAN. In addition, as attention maps can be a robust indicator whether a probe image under investigation is genuine or counterfeit, we analyze the estimated attention maps for both a bona fide image and its corresponding morphed image. Finally, we present an ablation study on the efficacy of utilizing attention mechanism for the sake of morph detection. △ Less

Submitted 22 July, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: IJCB 2021

arXiv:2106.13178 [pdf, other]

Differential Morph Face Detection using Discriminative Wavelet Sub-bands

Authors: Baaria Chaudhary, Poorya Aghdaie, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: Face recognition systems are extremely vulnerable to morphing attacks, in which a morphed facial reference image can be successfully verified as two or more distinct identities. In this paper, we propose a morph attack detection algorithm that leverages an undecimated 2D Discrete Wavelet Transform (DWT) for identifying morphed face images. The core of our framework is that artifacts resulting from… ▽ More Face recognition systems are extremely vulnerable to morphing attacks, in which a morphed facial reference image can be successfully verified as two or more distinct identities. In this paper, we propose a morph attack detection algorithm that leverages an undecimated 2D Discrete Wavelet Transform (DWT) for identifying morphed face images. The core of our framework is that artifacts resulting from the morphing process that are not discernible in the image domain can be more easily identified in the spatial frequency domain. A discriminative wavelet sub-band can accentuate the disparity between a real and a morphed image. To this end, multi-level DWT is applied to all images, yielding 48 mid and high-frequency sub-bands each. The entropy distributions for each sub-band are calculated separately for both bona fide and morph images. For some of the sub-bands, there is a marked difference between the entropy of the sub-band in a bona fide image and the identical sub-band's entropy in a morphed image. Consequently, we employ Kullback-Liebler Divergence (KLD) to exploit these differences and isolate the sub-bands that are the most discriminative. We measure how discriminative a sub-band is by its KLD value and the 22 sub-bands with the highest KLD values are chosen for network training. Then, we train a deep Siamese neural network using these 22 selected sub-bands for differential morph attack detection. We examine the efficacy of discriminative wavelet sub-bands for morph attack detection and show that a deep neural network trained on these sub-bands can accurately identify morph imagery. △ Less

Submitted 24 June, 2021; originally announced June 2021.

Comments: CVPR Biometrics Workshop 2021

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 1425-1434

arXiv:2106.11354 [pdf, other]

FDeblur-GAN: Fingerprint Deblurring using Generative Adversarial Network

Authors: Amol S. Joshi, Ali Dabouei, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: While working with fingerprint images acquired from crime scenes, mobile cameras, or low-quality sensors, it becomes difficult for automated identification systems to verify the identity due to image blur and distortion. We propose a fingerprint deblurring model FDeblur-GAN, based on the conditional Generative Adversarial Networks (cGANs) and multi-stage framework of the stack GAN. Additionally, w… ▽ More While working with fingerprint images acquired from crime scenes, mobile cameras, or low-quality sensors, it becomes difficult for automated identification systems to verify the identity due to image blur and distortion. We propose a fingerprint deblurring model FDeblur-GAN, based on the conditional Generative Adversarial Networks (cGANs) and multi-stage framework of the stack GAN. Additionally, we integrate two auxiliary sub-networks into the model for the deblurring task. The first sub-network is a ridge extractor model. It is added to generate ridge maps to ensure that fingerprint information and minutiae are preserved in the deblurring process and prevent the model from generating erroneous minutiae. The second sub-network is a verifier that helps the generator to preserve the ID information during the generation process. Using a database of blurred fingerprints and corresponding ridge maps, the deep network learns to deblur from the input blurry samples. We evaluate the proposed method in combination with two different fingerprint matching algorithms. We achieved an accuracy of 95.18% on our fingerprint database for the task of matching deblurred and ground truth fingerprints. △ Less

Submitted 21 June, 2021; originally announced June 2021.

Comments: 8 Pages, Accepted in IJCB Conference

arXiv:2106.08565 [pdf, other]

Detection of Morphed Face Images Using Discriminative Wavelet Sub-bands

Authors: Poorya Aghdaie, Baaria Chaudhary, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: This work investigates the well-known problem of morphing attacks, which has drawn considerable attention in the biometrics community. Morphed images have exposed face recognition systems' susceptibility to false acceptance, resulting in dire consequences, especially for national security applications. To detect morphing attacks, we propose a method which is based on a discriminative 2D Discrete W… ▽ More This work investigates the well-known problem of morphing attacks, which has drawn considerable attention in the biometrics community. Morphed images have exposed face recognition systems' susceptibility to false acceptance, resulting in dire consequences, especially for national security applications. To detect morphing attacks, we propose a method which is based on a discriminative 2D Discrete Wavelet Transform (2D-DWT). A discriminative wavelet sub-band can highlight inconsistencies between a real and a morphed image. We observe that there is a salient discrepancy between the entropy of a given sub-band in a bona fide image, and the same sub-band's entropy in a morphed sample. Considering this dissimilarity between these two entropy values, we find the Kullback-Leibler divergence between the two distributions, namely the entropy of the bona fide and the corresponding morphed images. The most discriminative wavelet sub-bands are those with the highest corresponding KL-divergence values. Accordingly, 22 sub-bands are selected as the most discriminative ones in terms of morph detection. We show that a Deep Neural Network (DNN) trained on the 22 discriminative sub-bands can detect morphed samples precisely. Most importantly, the effectiveness of our algorithm is validated through experiments on three datasets: VISAPP17, LMA, and MorGAN. We also performed an ablation study on the sub-band selection. △ Less

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2102.03710 [pdf, other]

HGAN: Hybrid Generative Adversarial Network

Authors: Seyed Mehdi Iranmanesh, Nasser M. Nasrabadi

Abstract: In this paper, we present a simple approach to train Generative Adversarial Networks (GANs) in order to avoid a \textit {mode collapse} issue. Implicit models such as GANs tend to generate better samples compared to explicit models that are trained on tractable data likelihood. However, GANs overlook the explicit data density characteristics which leads to undesirable quantitative evaluations and… ▽ More In this paper, we present a simple approach to train Generative Adversarial Networks (GANs) in order to avoid a \textit {mode collapse} issue. Implicit models such as GANs tend to generate better samples compared to explicit models that are trained on tractable data likelihood. However, GANs overlook the explicit data density characteristics which leads to undesirable quantitative evaluations and mode collapse. To bridge this gap, we propose a hybrid generative adversarial network (HGAN) for which we can enforce data density estimation via an autoregressive model and support both adversarial and likelihood framework in a joint training manner which diversify the estimated density in order to cover different modes. We propose to use an adversarial network to \textit {transfer knowledge} from an autoregressive model (teacher) to the generator (student) of a GAN model. A novel deep architecture within the GAN formulation is developed to adversarially distill the autoregressive model information in addition to simple GAN training approach. We conduct extensive experiments on real-world datasets (i.e., MNIST, CIFAR-10, STL-10) to demonstrate the effectiveness of the proposed HGAN under qualitative and quantitative evaluations. The experimental results show the superiority and competitiveness of our method compared to the baselines. △ Less

Submitted 6 February, 2021; originally announced February 2021.

arXiv:2101.08827 [pdf, other]

doi 10.1109/LGRS.2021.3049711

Unsupervised Pixel-wise Hyperspectral Anomaly Detection via Autoencoding Adversarial Networks

Authors: Sertac Arisoy, Nasser M. Nasrabadi, Koray Kayabol

Abstract: We propose a completely unsupervised pixel-wise anomaly detection method for hyperspectral images. The proposed method consists of three steps called data preparation, reconstruction, and detection. In the data preparation step, we apply a background purification to train the deep network in an unsupervised manner. In the reconstruction step, we propose to use three different deep autoencoding adv… ▽ More We propose a completely unsupervised pixel-wise anomaly detection method for hyperspectral images. The proposed method consists of three steps called data preparation, reconstruction, and detection. In the data preparation step, we apply a background purification to train the deep network in an unsupervised manner. In the reconstruction step, we propose to use three different deep autoencoding adversarial network (AEAN) models including 1D-AEAN, 2D-AEAN, and 3D-AEAN which are developed for working on spectral, spatial, and joint spectral-spatial domains, respectively. The goal of the AEAN models is to generate synthesized hyperspectral images (HSIs) which are close to real ones. A reconstruction error map (REM) is calculated between the original and the synthesized image pixels. In the detection step, we propose to use a WRX-based detector in which the pixel weights are obtained according to REM. We compare our proposed method with the classical RX, WRX, support vector data description-based (SVDD), collaborative representation-based detector (CRD), adaptive weight deep belief network (AW-DBN) detector and deep autoencoder anomaly detection (DAEAD) method on real hyperspectral datasets. The experimental results show that the proposed approach outperforms other detectors in the benchmark. △ Less

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2101.02637 [pdf, other]

A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset

Authors: Domenick Poster, Matthew Thielke, Robert Nguyen, Srinivasan Rajaraman, Xing Di, Cedric Nimpa Fondje, Vishal M. Patel, Nathaniel J. Short, Benjamin S. Riggan, Nasser M. Nasrabadi, Shuowen Hu

Abstract: Thermal face imagery, which captures the naturally emitted heat from the face, is limited in availability compared to face imagery in the visible spectrum. To help address this scarcity of thermal face imagery for research and algorithm development, we present the DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF). With over 500,000 images from 395 subjects, the ARL-VTF dataset… ▽ More Thermal face imagery, which captures the naturally emitted heat from the face, is limited in availability compared to face imagery in the visible spectrum. To help address this scarcity of thermal face imagery for research and algorithm development, we present the DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF). With over 500,000 images from 395 subjects, the ARL-VTF dataset represents, to the best of our knowledge, the largest collection of paired visible and thermal face images to date. The data was captured using a modern long wave infrared (LWIR) camera mounted alongside a stereo setup of three visible spectrum cameras. Variability in expressions, pose, and eyewear has been systematically recorded. The dataset has been curated with extensive annotations, metadata, and standardized protocols for evaluation. Furthermore, this paper presents extensive benchmark results and analysis on thermal face landmark detection and thermal-to-visible face verification by evaluating state-of-the-art models on the ARL-VTF dataset. △ Less

Submitted 7 January, 2021; originally announced January 2021.

arXiv:2012.05959 [pdf, other]

Super-resolution Guided Pore Detection for Fingerprint Recognition

Authors: Syeda Nyma Ferdous, Ali Dabouei, Jeremy Dawson, Nasser M Nasrabadi

Abstract: Performance of fingerprint recognition algorithms substantially rely on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high res… ▽ More Performance of fingerprint recognition algorithms substantially rely on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution which necessitates a model that enhances the image quality of the conventional 500 ppi legacy fingerprints preserving the fine details. To find a solution for recovering pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines both super-resolution and pore detection networks. Our modified single image Super-Resolution Generative Adversarial Network (SRGAN) framework helps to reliably reconstruct high-resolution fingerprint samples from low-resolution ones assisting the pore detection network to identify pores with a high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all the subjects, we have integrated features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add ridge reconstruction loss, utilizing ridge patterns to make the best use of extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. High recognition accuracy of the synthesized samples that is close to the accuracy achieved using the original high-resolution images validate the effectiveness of our proposed model. △ Less

Submitted 10 December, 2020; originally announced December 2020.

Report number: SR:NEW01

Journal ref: ICPR: International Conference on Pattern Recognition 2021

arXiv:2012.03790 [pdf, other]

Matching Distributions via Optimal Transport for Semi-Supervised Learning

Authors: Fariborz Taherkhani, Hadi Kazemi, Ali Dabouei, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: Semi-Supervised Learning (SSL) approaches have been an influential framework for the usage of unlabeled data when there is not a sufficient amount of labeled data available over the course of training. SSL methods based on Convolutional Neural Networks (CNNs) have recently provided successful results on standard benchmark tasks such as image classification. In this work, we consider the general se… ▽ More Semi-Supervised Learning (SSL) approaches have been an influential framework for the usage of unlabeled data when there is not a sufficient amount of labeled data available over the course of training. SSL methods based on Convolutional Neural Networks (CNNs) have recently provided successful results on standard benchmark tasks such as image classification. In this work, we consider the general setting of SSL problem where the labeled and unlabeled data come from the same underlying probability distribution. We propose a new approach that adopts an Optimal Transport (OT) technique serving as a metric of similarity between discrete empirical probability measures to provide pseudo-labels for the unlabeled data, which can then be used in conjunction with the initial labeled data to train the CNN model in an SSL manner. We have evaluated and compared our proposed method with state-of-the-art SSL algorithms on standard datasets to demonstrate the superiority and effectiveness of our SSL algorithm. △ Less

Submitted 21 October, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2012.01542 [pdf, other]

Mutual Information Maximization on Disentangled Representations for Differential Morph Detection

Authors: Sobhan Soleymani, Ali Dabouei, Fariborz Taherkhani, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: In this paper, we present a novel differential morph detection framework, utilizing landmark and appearance disentanglement. In our framework, the face image is represented in the embedding domain using two disentangled but complementary representations. The network is trained by triplets of face images, in which the intermediate image inherits the landmarks from one image and the appearance from… ▽ More In this paper, we present a novel differential morph detection framework, utilizing landmark and appearance disentanglement. In our framework, the face image is represented in the embedding domain using two disentangled but complementary representations. The network is trained by triplets of face images, in which the intermediate image inherits the landmarks from one image and the appearance from the other image. This initially trained network is further trained for each dataset using contrastive representations. We demonstrate that, by employing appearance and landmark disentanglement, the proposed framework can provide state-of-the-art differential morph detection performance. This functionality is achieved by the using distances in landmark, appearance, and ID domains. The performance of the proposed framework is evaluated using three morph datasets generated with different methodologies. △ Less

Submitted 2 December, 2020; originally announced December 2020.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV 2021)

arXiv:2012.01541 [pdf, other]

Differential Morphed Face Detection Using Deep Siamese Networks

Authors: Sobhan Soleymani, Baaria Chaudhary, Ali Dabouei, Jeremy Dawson, Nasser M. Nasrabadi

Abstract: Although biometric facial recognition systems are fast becoming part of security applications, these systems are still vulnerable to morphing attacks, in which a facial reference image can be verified as two or more separate identities. In border control scenarios, a successful morphing attack allows two or more people to use the same passport to cross borders. In this paper, we propose a novel di… ▽ More Although biometric facial recognition systems are fast becoming part of security applications, these systems are still vulnerable to morphing attacks, in which a facial reference image can be verified as two or more separate identities. In border control scenarios, a successful morphing attack allows two or more people to use the same passport to cross borders. In this paper, we propose a novel differential morph attack detection framework using a deep Siamese network. To the best of our knowledge, this is the first research work that makes use of a Siamese network architecture for morph attack detection. We compare our model with other classical and deep learning models using two distinct morph datasets, VISAPP17 and MorGAN. We explore the embedding space generated by the contrastive loss using three decision making frameworks using Euclidean distance, feature difference and a support vector machine classifier, and feature concatenation and a support vector machine classifier. △ Less

Submitted 4 December, 2020; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: MultiMedia FORensics in the WILD (MMForWILD 2020)

Showing 1–50 of 102 results for author: Nasrabadi, N M