Search | arXiv e-print repository

AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

Authors: Junho Park, Kyeongbo Kong, Suk-Ju Kang

Abstract: Recently, there has been a significant amount of research conducted on 3D hand reconstruction to use various forms of human-computer interaction. However, 3D hand reconstruction in the wild is challenging due to extreme lack of in-the-wild 3D hand datasets. Especially, when hands are in complex pose such as interacting hands, the problems like appearance similarity, self-handed occclusion and dept… ▽ More Recently, there has been a significant amount of research conducted on 3D hand reconstruction to use various forms of human-computer interaction. However, 3D hand reconstruction in the wild is challenging due to extreme lack of in-the-wild 3D hand datasets. Especially, when hands are in complex pose such as interacting hands, the problems like appearance similarity, self-handed occclusion and depth ambiguity make it more difficult. To overcome these issues, we propose AttentionHand, a novel method for text-driven controllable hand image generation. Since AttentionHand can generate various and numerous in-the-wild hand images well-aligned with 3D hand label, we can acquire a new 3D hand dataset, and can relieve the domain gap between indoor and outdoor scenes. Our method needs easy-to-use four modalities (i.e, an RGB image, a hand mesh image from 3D label, a bounding box, and a text prompt). These modalities are embedded into the latent space by the encoding phase. Then, through the text attention stage, hand-related tokens from the given text prompt are attended to highlight hand-related regions of the latent embedding. After the highlighted embedding is fed to the visual attention stage, hand-related regions in the embedding are attended by conditioning global and local hand mesh images with the diffusion-based pipeline. In the decoding phase, the final feature is decoded to new hand images, which are well-aligned with the given hand mesh image and text prompt. As a result, AttentionHand achieved state-of-the-art among text-to-hand image generation models, and the performance of 3D hand mesh reconstruction was improved by additionally training with hand images generated by AttentionHand. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.17261 [pdf, other]

Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation

Authors: Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang

Abstract: We present an Encoder-Decoder Attention Transformer, EDAFormer, which consists of the Embedding-Free Transformer (EFT) encoder and the all-attention decoder leveraging our Embedding-Free Attention (EFA) structure. The proposed EFA is a novel global context modeling mechanism that focuses on functioning the global non-linearity, not the specific roles of the query, key and value. For the decoder, w… ▽ More We present an Encoder-Decoder Attention Transformer, EDAFormer, which consists of the Embedding-Free Transformer (EFT) encoder and the all-attention decoder leveraging our Embedding-Free Attention (EFA) structure. The proposed EFA is a novel global context modeling mechanism that focuses on functioning the global non-linearity, not the specific roles of the query, key and value. For the decoder, we explore the optimized structure for considering the globality, which can improve the semantic segmentation performance. In addition, we propose a novel Inference Spatial Reduction (ISR) method for the computational efficiency. Different from the previous spatial reduction attention methods, our ISR method further reduces the key-value resolution at the inference phase, which can mitigate the computation-performance trade-off gap for the efficient semantic segmentation. Our EDAFormer shows the state-of-the-art performance with the efficient computation compared to the existing transformer-based semantic segmentation models in three public benchmarks, including ADE20K, Cityscapes and COCO-Stuff. Furthermore, our ISR method reduces the computational cost by up to 61% with minimal mIoU performance degradation on Cityscapes dataset. The code is available at https://github.com/hyunwoo137/EDAFormer. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2405.10284 [pdf, other]

doi 10.3390/axioms13050323

Quantum Vision Transformers for Quark-Gluon Classification

Authors: Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu

Abstract: We introduce a hybrid quantum-classical vision transformer architecture, notable for its integration of variational quantum circuits within both the attention mechanism and the multi-layer perceptrons. The research addresses the critical challenge of computational efficiency and resource constraints in analyzing data from the upcoming High Luminosity Large Hadron Collider, presenting the architect… ▽ More We introduce a hybrid quantum-classical vision transformer architecture, notable for its integration of variational quantum circuits within both the attention mechanism and the multi-layer perceptrons. The research addresses the critical challenge of computational efficiency and resource constraints in analyzing data from the upcoming High Luminosity Large Hadron Collider, presenting the architecture as a potential solution. In particular, we evaluate our method by applying the model to multi-detector jet images from CMS Open Data. The goal is to distinguish quark-initiated from gluon-initiated jets. We successfully train the quantum model and evaluate it via numerical simulations. Using this approach, we achieve classification performance almost on par with the one obtained with the completely classical architecture, considering a similar number of parameters. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 14 pages, 8 figures. Published in MDPI Axioms 2024, 13(5), 323

MSC Class: 68Q12 (Primary) 81P68; 68T07 (Secondary)

Journal ref: Axioms 2024, 13(5), 323

arXiv:2402.14361 [pdf, other]

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to div… ▽ More Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to diversified data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. Overall, OpenTab leverages table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Utilizing the intermediate data derived from the SQL executions, it conducts grounded inference to produce accurate response. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system. △ Less

Submitted 12 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted by ICLR 2024

arXiv:2402.00776 [pdf, other]

doi 10.3390/axioms13030187

Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics

Authors: Eyup B. Unlu, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva

Abstract: Models based on vision transformer architectures are considered state-of-the-art when it comes to image classification tasks. However, they require extensive computational resources both for training and deployment. The problem is exacerbated as the amount and complexity of the data increases. Quantum-based vision transformer models could potentially alleviate this issue by reducing the training a… ▽ More Models based on vision transformer architectures are considered state-of-the-art when it comes to image classification tasks. However, they require extensive computational resources both for training and deployment. The problem is exacerbated as the amount and complexity of the data increases. Quantum-based vision transformer models could potentially alleviate this issue by reducing the training and operating time while maintaining the same predictive power. Although current quantum computers are not yet able to perform high-dimensional tasks yet, they do offer one of the most efficient solutions for the future. In this work, we construct several variations of a quantum hybrid vision transformer for a classification problem in high energy physics (distinguishing photons and electrons in the electromagnetic calorimeter). We test them against classical vision transformer architectures. Our findings indicate that the hybrid models can achieve comparable performance to their classical analogues with a similar number of parameters. △ Less

Submitted 19 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures. Published version in a special issue "Computational Aspects of Machine Learning and Quantum Computing"

MSC Class: 68Q12; 81P68

Journal ref: Axioms v. 13, no 3, (2024) 187

arXiv:2311.18744 [pdf, other]

doi 10.3390/axioms13030188

$\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks

Authors: Zhongtian Dong, Marçal Comajoan Cara, Gopal Ramesh Dahale, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu

Abstract: This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity… ▽ More This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples. △ Less

Submitted 20 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 13 pages, 7 figures

Journal ref: Axioms 2024, 13 (3), 188

arXiv:2311.18672 [pdf, other]

doi 10.3390/axioms13030160

A Comparison Between Invariant and Equivariant Classical and Quantum Graph Neural Networks

Authors: Roy T. Forestano, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu

Abstract: Machine learning algorithms are heavily relied on to understand the vast amounts of data from high-energy particle collisions at the CERN Large Hadron Collider (LHC). The data from such collision events can naturally be represented with graph structures. Therefore, deep geometric methods, such as graph neural networks (GNNs), have been leveraged for various data analysis tasks in high-energy physi… ▽ More Machine learning algorithms are heavily relied on to understand the vast amounts of data from high-energy particle collisions at the CERN Large Hadron Collider (LHC). The data from such collision events can naturally be represented with graph structures. Therefore, deep geometric methods, such as graph neural networks (GNNs), have been leveraged for various data analysis tasks in high-energy physics. One typical task is jet tagging, where jets are viewed as point clouds with distinct features and edge connections between their constituent particles. The increasing size and complexity of the LHC particle datasets, as well as the computational models used for their analysis, greatly motivate the development of alternative fast and efficient computational paradigms such as quantum computation. In addition, to enhance the validity and robustness of deep networks, one can leverage the fundamental symmetries present in the data through the use of invariant inputs and equivariant layers. In this paper, we perform a fair and comprehensive comparison between classical graph neural networks (GNNs) and equivariant graph neural networks (EGNNs) and their quantum counterparts: quantum graph neural networks (QGNNs) and equivariant quantum graph neural networks (EQGNN). The four architectures were benchmarked on a binary classification task to classify the parton-level particle initiating the jet. Based on their AUC scores, the quantum networks were shown to outperform the classical networks. However, seeing the computational advantage of the quantum networks in practice may have to wait for the further development of quantum technology and its associated APIs. △ Less

Submitted 21 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 15 pages, 7 figures, 3 appendices

Journal ref: Axioms 13 (2024) 160

arXiv:2311.14464 [pdf, other]

Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation

Authors: Loh Sher En Jessica, Naheed Anjum Arafat, Wei Xian Lim, Wai Lee Chan, Adams Wai Kin Kong

Abstract: Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a… ▽ More Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a common method in practical CFD applications. Specifically, the input nodes in these GNN methods have very limited information about any object immersed in the simulation domain and its surrounding environment. Also, the cell characteristics of the mesh such as cell volume, face surface area, and face centroid are not included in the message-passing operations in the GNN methods. To address these weaknesses, this work proposes two novel geometric representations: Shortest Vector (SV) and Directional Integrated Distance (DID). Extracted from the mesh, the SV and DID provide global geometry perspective to each input node, thus removing the need to collect this information through message-passing. This work also introduces the use of Finite Volume Features (FVF) in the graph convolutions as node and edge attributes, enabling its message-passing operations to adjust to different nodes. Finally, this work is the first to demonstrate how residual training, with the availability of low-resolution data, can be adopted to improve the flow field prediction accuracy. Experimental results on two datasets with five different state-of-the-art GNN methods for CFD indicate that SV, DID, FVF and residual training can effectively reduce the predictive error of current GNN-based methods by as much as 41%. △ Less

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.05383 [pdf]

Improving Hand Recognition in Uncontrolled and Uncooperative Environments using Multiple Spatial Transformers and Loss Functions

Authors: Wojciech Michal Matkowski, Xiaojie Li, Adams Wai Kin Kong

Abstract: The prevalence of smartphone and consumer camera has led to more evidence in the form of digital images, which are mostly taken in uncontrolled and uncooperative environments. In these images, criminals likely hide or cover their faces while their hands are observable in some cases, creating a challenging use case for forensic investigation. Many existing hand-based recognition methods perform wel… ▽ More The prevalence of smartphone and consumer camera has led to more evidence in the form of digital images, which are mostly taken in uncontrolled and uncooperative environments. In these images, criminals likely hide or cover their faces while their hands are observable in some cases, creating a challenging use case for forensic investigation. Many existing hand-based recognition methods perform well for hand images collected in controlled environments with user cooperation. However, their performance deteriorates significantly in uncontrolled and uncooperative environments. A recent work has exposed the potential of hand recognition in these environments. However, only the palmar regions were considered, and the recognition performance is still far from satisfactory. To improve the recognition accuracy, an algorithm integrating a multi-spatial transformer network (MSTN) and multiple loss functions is proposed to fully utilize information in full hand images. MSTN is firstly employed to localize the palms and fingers and estimate the alignment parameters. Then, the aligned images are further fed into pretrained convolutional neural networks, where features are extracted. Finally, a training scheme with multiple loss functions is used to train the network end-to-end. To demonstrate the effectiveness of the proposed algorithm, the trained model is evaluated on NTU-PI-v1 database and six benchmark databases from different domains. Experimental results show that the proposed algorithm performs significantly better than the existing methods in these uncontrolled and uncooperative environments and has good generalization capabilities to samples from different domains. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.13345 [pdf, other]

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

Authors: Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, Mohan Kankanhalli

Abstract: The wide-ranging applications of large language models (LLMs), especially in safety-critical domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper proposes an efficient tool to audit the LLM's adversarial robustness via a prompt-based adversarial attack (PromptAttack). PromptAttack converts adversarial textual attacks into an attack prompt that can cause the vi… ▽ More The wide-ranging applications of large language models (LLMs), especially in safety-critical domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper proposes an efficient tool to audit the LLM's adversarial robustness via a prompt-based adversarial attack (PromptAttack). PromptAttack converts adversarial textual attacks into an attack prompt that can cause the victim LLM to output the adversarial sample to fool itself. The attack prompt is composed of three important components: (1) original input (OI) including the original sample and its ground-truth label, (2) attack objective (AO) illustrating a task description of generating a new sample that can fool itself without changing the semantic meaning, and (3) attack guidance (AG) containing the perturbation instructions to guide the LLM on how to complete the task by perturbing the original sample at character, word, and sentence levels, respectively. Besides, we use a fidelity filter to ensure that PromptAttack maintains the original semantic meanings of the adversarial examples. Further, we enhance the attack power of PromptAttack by ensembling adversarial examples at different perturbation levels. Comprehensive empirical results using Llama2 and GPT-3.5 validate that PromptAttack consistently yields a much higher attack success rate compared to AdvGLUE and AdvGLUE++. Interesting findings include that a simple emoji can easily mislead GPT-3.5 to make wrong predictions. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2306.04634 [pdf, other]

On the Reliability of Watermarks for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

Abstract: As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked… ▽ More As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a 1e-5 false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors. △ Less

Submitted 1 May, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 9 pages in the main body. Published at ICLR 2024. Code is available at https://github.com/jwkirchenbauer/lm-watermarking

arXiv:2305.17445 [pdf, other]

Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing

Authors: Julia Kaiwen Lau, Kelvin Kai Wen Kong, Julian Hao Yong, Per Hoong Tan, Zhou Yang, Zi Qian Yong, Joshua Chern Wey Low, Chun Yong Chong, Mei Kuan Lim, David Lo

Abstract: Recent studies have proposed the use of Text-To-Speech (TTS) systems to automatically synthesise speech test cases on a scale and uncover a large number of failures in ASR systems. However, the failures uncovered by synthetic test cases may not reflect the actual performance of an ASR system when it transcribes human audio, which we refer to as false alarms. Given a failed test case synthesised fr… ▽ More Recent studies have proposed the use of Text-To-Speech (TTS) systems to automatically synthesise speech test cases on a scale and uncover a large number of failures in ASR systems. However, the failures uncovered by synthetic test cases may not reflect the actual performance of an ASR system when it transcribes human audio, which we refer to as false alarms. Given a failed test case synthesised from TTS systems, which consists of TTS-generated audio and the corresponding ground truth text, we feed the human audio stating the same text to an ASR system. If human audio can be correctly transcribed, an instance of a false alarm is detected. In this study, we investigate false alarm occurrences in five popular ASR systems using synthetic audio generated from four TTS systems and human audio obtained from two commonly used datasets. Our results show that the least number of false alarms is identified when testing Deepspeech, and the number of false alarms is the highest when testing Wav2vec2. On average, false alarm rates range from 21% to 34% in all five ASR systems. Among the TTS systems used, Google TTS produces the least number of false alarms (17%), and Espeak TTS produces the highest number of false alarms (32%) among the four TTS systems. Additionally, we build a false alarm estimator that flags potential false alarms, which achieves promising results: a precision of 98.3%, a recall of 96.4%, an accuracy of 98.5%, and an F1 score of 97.3%. Our study provides insight into the appropriate selection of TTS systems to generate high-quality speech to test ASR systems. Additionally, a false alarm estimator can be a way to minimise the impact of false alarms and help developers choose suitable test inputs when evaluating ASR systems. The source code used in this paper is publicly available on GitHub at https://github.com/julianyonghao/FAinASRtest. △ Less

Submitted 18 July, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: 13 pages, Accepted at ISSTA2023

arXiv:2211.08420 [pdf, other]

doi 10.1103/PhysRevD.107.055018

Is the Machine Smarter than the Theorist: Deriving Formulas for Particle Kinematics with Symbolic Regression

Authors: Zhongtian Dong, Kyoungchul Kong, Konstantin T. Matchev, Katia Matcheva

Abstract: We demonstrate the use of symbolic regression in deriving analytical formulas, which are needed at various stages of a typical experimental analysis in collider phenomenology. As a first application, we consider kinematic variables like the stransverse mass, $M_{T2}$, which are defined algorithmically through an optimization procedure and not in terms of an analytical formula. We then train a symb… ▽ More We demonstrate the use of symbolic regression in deriving analytical formulas, which are needed at various stages of a typical experimental analysis in collider phenomenology. As a first application, we consider kinematic variables like the stransverse mass, $M_{T2}$, which are defined algorithmically through an optimization procedure and not in terms of an analytical formula. We then train a symbolic regression and obtain the correct analytical expressions for all known special cases of $M_{T2}$ in the literature. As a second application, we reproduce the correct analytical expression for a next-to-leading order (NLO) kinematic distribution from data, which is simulated with a NLO event generator. Finally, we derive analytical approximations for the NLO kinematic distributions after detector simulation, for which no known analytical formulas currently exist. △ Less

Submitted 15 November, 2022; originally announced November 2022.

Comments: 15 pages, 13 figures, 8 tables

arXiv:2210.01680 [pdf, other]

New Machine Learning Techniques for Simulation-Based Inference: InferoStatic Nets, Kernel Score Estimation, and Kernel Likelihood Ratio Estimation

Authors: Kyoungchul Kong, Konstantin T. Matchev, Stephen Mrenna, Prasanth Shyamsundar

Abstract: We propose an intuitive, machine-learning approach to multiparameter inference, dubbed the InferoStatic Networks (ISN) method, to model the score and likelihood ratio estimators in cases when the probability density can be sampled but not computed directly. The ISN uses a backend neural network that models a scalar function called the inferostatic potential $\varphi$. In addition, we introduce new… ▽ More We propose an intuitive, machine-learning approach to multiparameter inference, dubbed the InferoStatic Networks (ISN) method, to model the score and likelihood ratio estimators in cases when the probability density can be sampled but not computed directly. The ISN uses a backend neural network that models a scalar function called the inferostatic potential $\varphi$. In addition, we introduce new strategies, respectively called Kernel Score Estimation (KSE) and Kernel Likelihood Ratio Estimation (KLRE), to learn the score and the likelihood ratio functions from simulated data. We illustrate the new techniques with some toy examples and compare to existing approaches in the literature. We mention en passant some new loss functions that optimally incorporate latent information from simulations into the training procedure. △ Less

Submitted 3 February, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: 36 pages, 10 figures. Submission to SciPost

Report number: FERMILAB-PUB-22-741-SCD

arXiv:2110.14363 [pdf, other]

VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Authors: Mucong Ding, Kezhi Kong, Jingling Li, Chen Zhu, John P Dickerson, Furong Huang, Tom Goldstein

Abstract: Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a… ▽ More Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021

arXiv:2110.03396 [pdf, other]

AnoSeg: Anomaly Segmentation Network Using Self-Supervised Learning

Authors: Jouwon Song, Kyeongbo Kong, Ye-In Park, Seong-Gyun Kim, Suk-Ju Kang

Abstract: Anomaly segmentation, which localizes defective areas, is an important component in large-scale industrial manufacturing. However, most recent researches have focused on anomaly detection. This paper proposes a novel anomaly segmentation network (AnoSeg) that can directly generate an accurate anomaly map using self-supervised learning. For highly accurate anomaly segmentation, the proposed AnoSeg… ▽ More Anomaly segmentation, which localizes defective areas, is an important component in large-scale industrial manufacturing. However, most recent researches have focused on anomaly detection. This paper proposes a novel anomaly segmentation network (AnoSeg) that can directly generate an accurate anomaly map using self-supervised learning. For highly accurate anomaly segmentation, the proposed AnoSeg considers three novel techniques: Anomaly data generation based on hard augmentation, self-supervised learning with pixel-wise and adversarial losses, and coordinate channel concatenation. First, to generate synthetic anomaly images and reference masks for normal data, the proposed method uses hard augmentation to change the normal sample distribution. Then, the proposed AnoSeg is trained in a self-supervised learning manner from the synthetic anomaly data and normal data. Finally, the coordinate channel, which represents the pixel location information, is concatenated to an input of AnoSeg to consider the positional relationship of each pixel in the image. The estimated anomaly map can also be utilized to improve the performance of anomaly detection. Our experiments show that the proposed method outperforms the state-of-the-art anomaly detection and anomaly segmentation methods for the MVTec AD dataset. In addition, we compared the proposed method with the existing methods through the intersection over union (IoU) metric commonly used in segmentation tasks and demonstrated the superiority of our method for anomaly segmentation. △ Less

Submitted 7 October, 2021; originally announced October 2021.

Comments: 10 pages, 17 figures

arXiv:2107.08792 [pdf, other]

Selective Focusing Learning in Conditional GANs

Authors: Kyeongbo Kong, Kyunghun Kim, Woo-Jin Song, Suk-Ju Kang

Abstract: Conditional generative adversarial networks (cGANs) have demonstrated remarkable success due to their class-wise controllability and superior quality for complex generation tasks. Typical cGANs solve the joint distribution matching problem by decomposing two easier sub-problems: marginal matching and conditional matching. From our toy experiments, we found that it is the best to apply only conditi… ▽ More Conditional generative adversarial networks (cGANs) have demonstrated remarkable success due to their class-wise controllability and superior quality for complex generation tasks. Typical cGANs solve the joint distribution matching problem by decomposing two easier sub-problems: marginal matching and conditional matching. From our toy experiments, we found that it is the best to apply only conditional matching to certain samples due to the content-aware optimization of the discriminator. This paper proposes a simple (a few lines of code) but effective training methodology, selective focusing learning, which enforces the discriminator and generator to learn easy samples of each class rapidly while maintaining diversity. Our key idea is to selectively apply conditional and joint matching for the data in each mini-batch. We conducted experiments on recent cGAN variants in ImageNet (64x64 and 128x128), CIFAR-10, and CIFAR-100 datasets, and improved the performance significantly (up to 35.18% in terms of FID) without sacrificing diversity. △ Less

Submitted 8 July, 2021; originally announced July 2021.

Comments: 14 pages, 9 figures, spotlight presented at the ICML 2021 Workshop on Subset Selection in ML

arXiv:2107.07041 [pdf, other]

Mitigating Memorization in Sample Selection for Learning with Noisy Labels

Authors: Kyeongbo Kong, Junggi Lee, Youngchul Kwak, Young-Rae Cho, Seong-Eun Kim, Woo-Jin Song

Abstract: Because deep learning is vulnerable to noisy labels, sample selection techniques, which train networks with only clean labeled data, have attracted a great attention. However, if the labels are dominantly corrupted by few classes, these noisy samples are called dominant-noisy-labeled samples, the network also learns dominant-noisy-labeled samples rapidly via content-aware optimization. In this stu… ▽ More Because deep learning is vulnerable to noisy labels, sample selection techniques, which train networks with only clean labeled data, have attracted a great attention. However, if the labels are dominantly corrupted by few classes, these noisy samples are called dominant-noisy-labeled samples, the network also learns dominant-noisy-labeled samples rapidly via content-aware optimization. In this study, we propose a compelling criteria to penalize dominant-noisy-labeled samples intensively through class-wise penalty labels. By averaging prediction confidences for the each observed label, we obtain suitable penalty labels that have high values if the labels are largely corrupted by some classes. Experiments were performed using benchmarks (CIFAR-10, CIFAR-100, Tiny-ImageNet) and real-world datasets (ANIMAL-10N, Clothing1M) to evaluate the proposed criteria in various scenarios with different noise rates. Using the proposed sample selection, the learning process of the network becomes significantly robust to noisy labels compared to existing methods in several noise types. △ Less

Submitted 8 July, 2021; originally announced July 2021.

Comments: 14 pages, 9 figures, spotlight presented at the ICML 2021 Workshop on Subset Selection in ML

arXiv:2107.06869 [pdf, other]

Core-set Sampling for Efficient Neural Architecture Search

Authors: Jae-hun Shim, Kyeongbo Kong, Suk-Ju Kang

Abstract: Neural architecture search (NAS), an important branch of automatic machine learning, has become an effective approach to automate the design of deep learning models. However, the major issue in NAS is how to reduce the large search time imposed by the heavy computational burden. While most recent approaches focus on pruning redundant sets or developing new search methodologies, this paper attempts… ▽ More Neural architecture search (NAS), an important branch of automatic machine learning, has become an effective approach to automate the design of deep learning models. However, the major issue in NAS is how to reduce the large search time imposed by the heavy computational burden. While most recent approaches focus on pruning redundant sets or developing new search methodologies, this paper attempts to formulate the problem based on the data curation manner. Our key strategy is to search the architecture using summarized data distribution, i.e., core-set. Typically, many NAS algorithms separate searching and training stages, and the proposed core-set methodology is only used in search stage, thus their performance degradation can be minimized. In our experiments, we were able to save overall computational time from 30.8 hours to 3.5 hours, 8.8x reduction, on a single RTX 3090 GPU without sacrificing accuracy. △ Less

Submitted 8 July, 2021; originally announced July 2021.

Comments: 8 pages, 2 figures, spotlight presented at the ICML 2021 Workshop on Subset Selection in ML

arXiv:2107.05274 [pdf, other]

TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation

Authors: Bingzhi Chen, Yishu Liu, Zheng Zhang, Guangming Lu, Adams Wai Kin Kong

Abstract: Accurate segmentation of organs or lesions from medical images is crucial for reliable diagnosis of diseases and organ morphometry. In recent years, convolutional encoder-decoder solutions have achieved substantial progress in the field of automatic medical image segmentation. Due to the inherent bias in the convolution operations, prior models mainly focus on local visual cues formed by the neigh… ▽ More Accurate segmentation of organs or lesions from medical images is crucial for reliable diagnosis of diseases and organ morphometry. In recent years, convolutional encoder-decoder solutions have achieved substantial progress in the field of automatic medical image segmentation. Due to the inherent bias in the convolution operations, prior models mainly focus on local visual cues formed by the neighboring pixels, but fail to fully model the long-range contextual dependencies. In this paper, we propose a novel Transformer-based Attention Guided Network called TransAttUnet, in which the multi-level guided attention and multi-scale skip connection are designed to jointly enhance the performance of the semantical segmentation architecture. Inspired by Transformer, the self-aware attention (SAA) module with Transformer Self Attention (TSA) and Global Spatial Attention (GSA) is incorporated into TransAttUnet to effectively learn the non-local interactions among encoder features. Moreover, we also use additional multi-scale skip connections between decoder blocks to aggregate the upsampled features with different semantic scales. In this way, the representation ability of multi-scale context information is strengthened to generate discriminative features. Benefitting from these complementary components, the proposed TransAttUnet can effectively alleviate the loss of fine details caused by the stacking of convolution layers and the consecutive sampling operations, finally improving the segmentation quality of medical images. Extensive experiments on multiple medical image segmentation datasets from different imaging modalities demonstrate that the proposed method consistently outperforms the state-of-the-art baselines. Our code and pre-trained models are available at: https://github.com/YishuLiu/TransAttUnet. △ Less

Submitted 8 July, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

arXiv:2105.10126 [pdf, other]

Deep-Learned Event Variables for Collider Phenomenology

Authors: Doojin Kim, Kyoungchul Kong, Konstantin T. Matchev, Myeonghun Park, Prasanth Shyamsundar

Abstract: The choice of optimal event variables is crucial for achieving the maximal sensitivity of experimental analyses. Over time, physicists have derived suitable kinematic variables for many typical event topologies in collider physics. Here we introduce a deep learning technique to design good event variables, which are sensitive over a wide range of values for the unknown model parameters. We demonst… ▽ More The choice of optimal event variables is crucial for achieving the maximal sensitivity of experimental analyses. Over time, physicists have derived suitable kinematic variables for many typical event topologies in collider physics. Here we introduce a deep learning technique to design good event variables, which are sensitive over a wide range of values for the unknown model parameters. We demonstrate that the neural networks trained with our technique on some simple event topologies are able to reproduce standard event variables like invariant mass, transverse mass, and stransverse mass. The method is automatable, completely general, and can be used to derive sensitive, previously unknown, event variables for other, more complex event topologies. △ Less

Submitted 21 May, 2021; originally announced May 2021.

Comments: 6 pages, 7 figures

Report number: FERMILAB-PUB-21-244-QIS, MI-TH-2111

arXiv:2103.16851 [pdf, other]

Attention Map-guided Two-stage Anomaly Detection using Hard Augmentation

Authors: Jou Won Song, Kyeongbo Kong, Ye In Park, Suk-Ju Kang

Abstract: Anomaly detection is a task that recognizes whether an input sample is included in the distribution of a target normal class or an anomaly class. Conventional generative adversarial network (GAN)-based methods utilize an entire image including foreground and background as an input. However, in these methods, a useless region unrelated to the normal class (e.g., unrelated background) is learned as… ▽ More Anomaly detection is a task that recognizes whether an input sample is included in the distribution of a target normal class or an anomaly class. Conventional generative adversarial network (GAN)-based methods utilize an entire image including foreground and background as an input. However, in these methods, a useless region unrelated to the normal class (e.g., unrelated background) is learned as normal class distribution, thereby leading to false detection. To alleviate this problem, this paper proposes a novel two-stage network consisting of an attention network and an anomaly detection GAN (ADGAN). The attention network generates an attention map that can indicate the region representing the normal class distribution. To generate an accurate attention map, we propose the attention loss and the adversarial anomaly loss based on synthetic anomaly samples generated from hard augmentation. By applying the attention map to an image feature map, ADGAN learns the normal class distribution from which the useless region is removed, and it is possible to greatly reduce the problem difficulty of the anomaly detection task. Additionally, the estimated attention map can be used for anomaly segmentation because it can distinguish between normal and anomaly regions. As a result, the proposed method outperforms the state-of-the-art anomaly detection and anomaly segmentation methods for widely used datasets. △ Less

Submitted 31 March, 2021; originally announced March 2021.

arXiv:2103.04436

Insta-RS: Instance-wise Randomized Smoothing for Improved Robustness and Accuracy

Authors: Chen Chen, Kezhi Kong, Peihong Yu, Juan Luque, Tom Goldstein, Furong Huang

Abstract: Randomized smoothing (RS) is an effective and scalable technique for constructing neural network classifiers that are certifiably robust to adversarial perturbations. Most RS works focus on training a good base model that boosts the certified robustness of the smoothed model. However, existing RS techniques treat every data point the same, i.e., the variance of the Gaussian noise used to form the… ▽ More Randomized smoothing (RS) is an effective and scalable technique for constructing neural network classifiers that are certifiably robust to adversarial perturbations. Most RS works focus on training a good base model that boosts the certified robustness of the smoothed model. However, existing RS techniques treat every data point the same, i.e., the variance of the Gaussian noise used to form the smoothed model is preset and universal for all training and test data. This preset and universal Gaussian noise variance is suboptimal since different data points have different margins and the local properties of the base model vary across the input examples. In this paper, we examine the impact of customized handling of examples and propose Instance-wise Randomized Smoothing (Insta-RS) -- a multiple-start search algorithm that assigns customized Gaussian variances to test examples. We also design Insta-RS Train -- a novel two-stage training algorithm that adaptively adjusts and customizes the noise level of each training example for training a base model that boosts the certified robustness of the instance-wise Gaussian smoothed model. Through extensive experiments on CIFAR-10 and ImageNet, we show that our method significantly enhances the average certified radius (ACR) as well as the clean data accuracy compared to existing state-of-the-art provably robust classifiers. △ Less

Submitted 16 September, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

Comments: We plan to make major modifications to this paper including rewriting the entire text, rewriting the proofs and adding experiments. Given that the paper will be completely different, we decided to take this paper down temporarily

arXiv:2102.08098 [pdf, other]

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Authors: Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. Ronny Huang, Tom Goldstein

Abstract: Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always port… ▽ More Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always portable to new architectures. This paper presents GradInit, an automated and architecture agnostic method for initializing neural networks. GradInit is based on a simple heuristic; the norm of each network layer is adjusted so that a single step of SGD or Adam with prescribed hyperparameters results in the smallest possible loss value. This adjustment is done by introducing a scalar multiplier variable in front of each parameter block, and then optimizing these variables using a simple numerical scheme. GradInit accelerates the convergence and test performance of many convolutional architectures, both with or without skip connections, and even without normalization layers. It also improves the stability of the original Transformer architecture for machine translation, enabling training it without learning rate warmup using either Adam or SGD under a wide range of learning rates and momentum coefficients. Code is available at https://github.com/zhuchen03/gradinit. △ Less

Submitted 24 November, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: NeurIPS 2021, fixing typos

arXiv:2011.10684 [pdf, other]

SHOT-VAE: Semi-supervised Deep Generative Models With Label-aware ELBO Approximations

Authors: Hao-Zhe Feng, Kezhi Kong, Minghao Chen, Tianye Zhang, Minfeng Zhu, Wei Chen

Abstract: Semi-supervised variational autoencoders (VAEs) have obtained strong results, but have also encountered the challenge that good ELBO values do not always imply accurate inference results. In this paper, we investigate and propose two causes of this problem: (1) The ELBO objective cannot utilize the label information directly. (2) A bottleneck value exists and continuing to optimize ELBO after this… ▽ More Semi-supervised variational autoencoders (VAEs) have obtained strong results, but have also encountered the challenge that good ELBO values do not always imply accurate inference results. In this paper, we investigate and propose two causes of this problem: (1) The ELBO objective cannot utilize the label information directly. (2) A bottleneck value exists and continuing to optimize ELBO after this value will not improve inference accuracy. On the basis of the experiment results, we propose SHOT-VAE to address these problems without introducing additional prior knowledge. The SHOT-VAE offers two contributions: (1) A new ELBO approximation named smooth-ELBO that integrates the label predictive loss into ELBO. (2) An approximation based on optimal interpolation that breaks the ELBO value bottleneck by reducing the margin between ELBO and the data likelihood. The SHOT-VAE achieves good performance with a 25.30% error rate on CIFAR-100 with 10k labels and reduces the error rate to 6.11% on CIFAR-10 with 4k labels. △ Less

Submitted 8 December, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

Comments: 12 pages, 6 figures, Accepted for presentation at AAAI2021

arXiv:2010.09891 [pdf, other]

Robust Optimization as Data Augmentation for Large-scale Graphs

Authors: Kezhi Kong, Guohao Li, Mucong Ding, Zuxuan Wu, Chen Zhu, Bernard Ghanem, Gavin Taylor, Tom Goldstein

Abstract: Data augmentation helps neural networks generalize better by enlarging the training set, but it remains an open question how to effectively augment graph data to enhance the performance of GNNs (Graph Neural Networks). While most existing graph regularizers focus on manipulating graph topological structures by adding/removing edges, we offer a method to augment node features for better performance… ▽ More Data augmentation helps neural networks generalize better by enlarging the training set, but it remains an open question how to effectively augment graph data to enhance the performance of GNNs (Graph Neural Networks). While most existing graph regularizers focus on manipulating graph topological structures by adding/removing edges, we offer a method to augment node features for better performance. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. By making the model invariant to small fluctuations in input data, our method helps models generalize to out-of-distribution samples and boosts model performance at test time. FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks. FLAG is also highly flexible and scalable, and is deployable with arbitrary GNN backbones and large-scale datasets. We demonstrate the efficacy and stability of our method through extensive experiments and ablation studies. We also provide intuitive observations for a deeper understanding of our method. △ Less

Submitted 29 March, 2022; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted at CVPR 2022

arXiv:2010.07092 [pdf, other]

Data Augmentation for Meta-Learning

Authors: Renkun Ni, Micah Goldblum, Amr Sharaf, Kezhi Kong, Tom Goldstein

Abstract: Conventional image classifiers are trained by randomly sampling mini-batches of images. To achieve state-of-the-art performance, practitioners use sophisticated data augmentation schemes to expand the amount of training data available for sampling. In contrast, meta-learning algorithms sample support data, query data, and tasks on each training step. In this complex sampling scenario, data augment… ▽ More Conventional image classifiers are trained by randomly sampling mini-batches of images. To achieve state-of-the-art performance, practitioners use sophisticated data augmentation schemes to expand the amount of training data available for sampling. In contrast, meta-learning algorithms sample support data, query data, and tasks on each training step. In this complex sampling scenario, data augmentation can be used not only to expand the number of images available per class, but also to generate entirely new classes/tasks. We systematically dissect the meta-learning pipeline and investigate the distinct ways in which data augmentation can be integrated at both the image and class levels. Our proposed meta-specific data augmentation significantly improves the performance of meta-learners on few-shot classification benchmarks. △ Less

Submitted 22 June, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: 15 pages, 3 figures, Accepted to ICML 2021

arXiv:2010.01810 [pdf, other]

Painting Outside as Inside: Edge Guided Image Outpainting via Bidirectional Rearrangement with Progressive Step Learning

Authors: Kyunghun Kim, Yeohun Yun, Keon-Woo Kang, Kyeongbo Kong, Siyeong Lee, Suk-Ju Kang

Abstract: Image outpainting is a very intriguing problem as the outside of a given image can be continuously filled by considering as the context of the image. This task has two main challenges. The first is to maintain the spatial consistency in contents of generated regions and the original input. The second is to generate a high-quality large image with a small amount of adjacent information. Conventiona… ▽ More Image outpainting is a very intriguing problem as the outside of a given image can be continuously filled by considering as the context of the image. This task has two main challenges. The first is to maintain the spatial consistency in contents of generated regions and the original input. The second is to generate a high-quality large image with a small amount of adjacent information. Conventional image outpainting methods generate inconsistent, blurry, and repeated pixels. To alleviate the difficulty of an outpainting problem, we propose a novel image outpainting method using bidirectional boundary region rearrangement. We rearrange the image to benefit from the image inpainting task by reflecting more directional information. The bidirectional boundary region rearrangement enables the generation of the missing region using bidirectional information similar to that of the image inpainting task, thereby generating the higher quality than the conventional methods using unidirectional information. Moreover, we use the edge map generator that considers images as original input with structural information and hallucinates the edges of unknown regions to generate the image. Our proposed method is compared with other state-of-the-art outpainting and inpainting methods both qualitatively and quantitatively. We further compared and evaluated them using BRISQUE, one of the No-Reference image quality assessment (IQA) metrics, to evaluate the naturalness of the output. The experimental results demonstrate that our method outperforms other methods and generates new images with 360°panoramic characteristics. △ Less

Submitted 9 November, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Paper accepted in WACV 2021

arXiv:2008.04147 [pdf, ps, other]

Knowledge Distillation-aided End-to-End Learning for Linear Precoding in Multiuser MIMO Downlink Systems with Finite-Rate Feedback

Authors: Kyeongbo Kong, Woo-Jin Song, Moonsik Min

Abstract: We propose a deep learning-based channel estimation, quantization, feedback, and precoding method for downlink multiuser multiple-input and multiple-output systems. In the proposed system, channel estimation and quantization for limited feedback are handled by a receiver deep neural network (DNN). Precoder selection is handled by a transmitter DNN. To emulate the traditional channel quantization,… ▽ More We propose a deep learning-based channel estimation, quantization, feedback, and precoding method for downlink multiuser multiple-input and multiple-output systems. In the proposed system, channel estimation and quantization for limited feedback are handled by a receiver deep neural network (DNN). Precoder selection is handled by a transmitter DNN. To emulate the traditional channel quantization, a binarization layer is adopted at each receiver DNN, and the binarization layer is also used to enable end-to-end learning. However, this can lead to inaccurate gradients, which can trap the receiver DNNs at a poor local minimum during training. To address this, we consider knowledge distillation, in which the existing DNNs are jointly trained with an auxiliary transmitter DNN. The use of an auxiliary DNN as a teacher network allows the receiver DNNs to additionally exploit lossless gradients, which is useful in avoiding a poor local minimum. For the same number of feedback bits, our DNN-based precoding scheme can achieve a higher downlink rate compared to conventional linear precoding with codebook-based limited feedback. △ Less

Submitted 22 March, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: 6 pages, 4 figures, submitted to IEEE Transactions on Vehicular Technology

arXiv:2008.02500 [pdf]

doi 10.1109/IJCB48548.2020.9304907

Gender and Ethnicity Classification based on Palmprint and Palmar Hand Images from Uncontrolled Environment

Authors: Wojciech Michal Matkowski, Adams Wai Kin Kong

Abstract: Soft biometric attributes such as gender, ethnicity or age may provide useful information for biometrics and forensics applications. Researchers used, e.g., face, gait, iris, and hand, etc. to classify such attributes. Even though hand has been widely studied for biometric recognition, relatively less attention has been given to soft biometrics from hand. Previous studies of soft biometrics based… ▽ More Soft biometric attributes such as gender, ethnicity or age may provide useful information for biometrics and forensics applications. Researchers used, e.g., face, gait, iris, and hand, etc. to classify such attributes. Even though hand has been widely studied for biometric recognition, relatively less attention has been given to soft biometrics from hand. Previous studies of soft biometrics based on hand images focused on gender and well-controlled imaging environment. In this paper, the gender and ethnicity classification in uncontrolled environment are considered. Gender and ethnicity labels are collected and provided for subjects in a publicly available database, which contains hand images from the Internet. Five deep learning models are fine-tuned and evaluated in gender and ethnicity classification scenarios based on palmar 1) full hand, 2) segmented hand and 3) palmprint images. The experimental results indicate that for gender and ethnicity classification in uncontrolled environment, full and segmented hand images are more suitable than palmprint images. △ Less

Submitted 6 August, 2020; originally announced August 2020.

Comments: Accepted in the International Joint Conference on Biometrics (IJCB 2020), scheduled for Sep 28-Oct 1, 2020

arXiv:1911.12514 [pdf]

doi 10.1109/TIFS.2019.2945183

Palmprint Recognition in Uncontrolled and Uncooperative Environment

Authors: Wojciech Michal Matkowski, Tingting Chai, Adams Wai Kin Kong

Abstract: Online palmprint recognition and latent palmprint identification are two branches of palmprint studies. The former uses middle-resolution images collected by a digital camera in a well-controlled or contact-based environment with user cooperation for commercial applications and the latter uses high-resolution latent palmprints collected in crime scenes for forensic investigation. However, these tw… ▽ More Online palmprint recognition and latent palmprint identification are two branches of palmprint studies. The former uses middle-resolution images collected by a digital camera in a well-controlled or contact-based environment with user cooperation for commercial applications and the latter uses high-resolution latent palmprints collected in crime scenes for forensic investigation. However, these two branches do not cover some palmprint images which have the potential for forensic investigation. Due to the prevalence of smartphone and consumer camera, more evidence is in the form of digital images taken in uncontrolled and uncooperative environment, e.g., child pornographic images and terrorist images, where the criminals commonly hide or cover their face. However, their palms can be observable. To study palmprint identification on images collected in uncontrolled and uncooperative environment, a new palmprint database is established and an end-to-end deep learning algorithm is proposed. The new database named NTU Palmprints from the Internet (NTU-PI-v1) contains 7881 images from 2035 palms collected from the Internet. The proposed algorithm consists of an alignment network and a feature extraction network and is end-to-end trainable. The proposed algorithm is compared with the state-of-the-art online palmprint recognition methods and evaluated on three public contactless palmprint databases, IITD, CASIA, and PolyU and two new databases, NTU-PI-v1 and NTU contactless palmprint database. The experimental results showed that the proposed algorithm outperforms the existing palmprint recognition methods. △ Less

Submitted 27 November, 2019; originally announced November 2019.

Comments: Accepted in the IEEE Transactions on Information Forensics and Security

arXiv:1910.03213 [pdf, other]

doi 10.1016/j.imavis.2019.05.005

A Study on Wrist Identification for Forensic Investigation

Authors: Wojciech Michal Matkowski, Frodo Kin Sun Chan, Adams Wai Kin Kong

Abstract: Criminal and victim identification based on crime scene images is an important part of forensic investigation. Criminals usually avoid identification by covering their faces and tattoos in the evidence images, which are taken in uncontrolled environments. Existing identification methods, which make use of biometric traits, such as vein, skin mark, height, skin color, weight, race, etc., are consid… ▽ More Criminal and victim identification based on crime scene images is an important part of forensic investigation. Criminals usually avoid identification by covering their faces and tattoos in the evidence images, which are taken in uncontrolled environments. Existing identification methods, which make use of biometric traits, such as vein, skin mark, height, skin color, weight, race, etc., are considered for solving this problem. The soft biometric traits, including skin color, gender, height, weight and race, provide useful information but not distinctive enough. Veins and skin marks are limited to high resolution images and some body sites may neither have enough skin marks nor clear veins. Terrorists and rioters tend to expose their wrists in a gesture of triumph, greeting or salute, while paedophiles usually show them when touching victims. However, wrists were neglected by the biometric community for forensic applications. In this paper, a wrist identification algorithm, which includes skin segmentation, key point localization, image to template alignment, large feature set extraction, and classification, is proposed. The proposed algorithm is evaluated on NTU-Wrist-Image-Database-v1, which consists of 3945 images from 731 different wrists, including 205 pairs of wrist images collected from the Internet, taken under uneven illuminations with different poses and resolutions. The experimental results show that wrist is a useful clue for criminal and victim identification. Keywords: biometrics, criminal and victim identification, forensics, wrist. △ Less

Submitted 8 October, 2019; originally announced October 2019.

Journal ref: Image and Vision Computing, vol. 88, August 2019, pp 96-112

arXiv:1905.11163 [pdf]

doi 10.1109/ICIP.2019.8803125

Giant Panda Face Recognition Using Small Dataset

Authors: Wojciech Michal Matkowski, Adams Wai Kin Kong, Han Su, Peng Chen, Rong Hou, Zhihe Zhang

Abstract: Giant panda (panda) is a highly endangered animal. Significant efforts and resources have been put on panda conservation. To measure effectiveness of conservation schemes, estimating its population size in wild is an important task. The current population estimation approaches, including capture-recapture, human visual identification and collection of DNA from hair or feces, are invasive, subjecti… ▽ More Giant panda (panda) is a highly endangered animal. Significant efforts and resources have been put on panda conservation. To measure effectiveness of conservation schemes, estimating its population size in wild is an important task. The current population estimation approaches, including capture-recapture, human visual identification and collection of DNA from hair or feces, are invasive, subjective, costly or even dangerous to the workers who perform these tasks in wild. Cameras have been widely installed in the regions where pandas live. It opens a new possibility for non-invasive image based panda recognition. Panda face recognition is naturally a small dataset problem, because of the number of pandas in the world and the number of qualified images captured by the cameras in each encounter. In this paper, a panda face recognition algorithm, which includes alignment, large feature set extraction and matching is proposed and evaluated on a dataset consisting of 163 images. The experimental results are encouraging. △ Less

Submitted 27 May, 2019; originally announced May 2019.

Comments: Accepted in the IEEE 2019 International Conference on Image Processing (ICIP 2019), scheduled for 22-25 September 2019 in Taipei, Taiwan

Journal ref: 2019 IEEE International Conference on Image Processing (ICIP)

arXiv:1902.07057 [pdf, other]

doi 10.1145/3300061.3300118

Towards Touch-to-Access Device Authentication Using Induced Body Electric Potentials

Authors: Zhenyu Yan, Qun Song, Rui Tan, Yang Li, Adams Wai Kin Kong

Abstract: This paper presents TouchAuth, a new touch-to-access device authentication approach using induced body electric potentials (iBEPs) caused by the indoor ambient electric field that is mainly emitted from the building's electrical cabling. The design of TouchAuth is based on the electrostatics of iBEP generation and a resulting property, i.e., the iBEPs at two close locations on the same human body… ▽ More This paper presents TouchAuth, a new touch-to-access device authentication approach using induced body electric potentials (iBEPs) caused by the indoor ambient electric field that is mainly emitted from the building's electrical cabling. The design of TouchAuth is based on the electrostatics of iBEP generation and a resulting property, i.e., the iBEPs at two close locations on the same human body are similar, whereas those from different human bodies are distinct. Extensive experiments verify the above property and show that TouchAuth achieves high-profile receiver operating characteristics in implementing the touch-to-access policy. Our experiments also show that a range of possible interfering sources including appliances' electromagnetic emanations and noise injections into the power network do not affect the performance of TouchAuth. A key advantage of TouchAuth is that the iBEP sensing requires a simple analog-to-digital converter only, which is widely available on microcontrollers. Compared with existing approaches including intra-body communication and physiological sensing, TouchAuth is a low-cost, lightweight, and convenient approach for authorized users to access the smart objects found in indoor environments. △ Less

Submitted 15 February, 2019; originally announced February 2019.

Comments: 16 pages, accepted to the 25th Annual International Conference on Mobile Computing and Networking (MobiCom 2019), October 21-25, 2019, Los Cabos, Mexico

arXiv:1505.03246 [pdf]

doi 10.5121/ijcsit.2015.7209

Prefix-based Labeling Annotation for Effective XML Fragmentation

Authors: Kok-Leong Koong, Su-Cheng Haw, Lay-Ki Soon, Samini Subramaniam

Abstract: XML is gradually employed as a standard of data exchange in web environment since its inception in the 90s until present. It serves as a data exchange between systems and other applications. Meanwhile the data volume has grown substantially in the web and thus effective methods of storing and retrieving these data is essential. One recommended way is physically or virtually fragments the large chu… ▽ More XML is gradually employed as a standard of data exchange in web environment since its inception in the 90s until present. It serves as a data exchange between systems and other applications. Meanwhile the data volume has grown substantially in the web and thus effective methods of storing and retrieving these data is essential. One recommended way is physically or virtually fragments the large chunk of data and distributes the fragments into different nodes. Fragmentation design of XML document contains of two parts: fragmentation operation and fragmentation method. The three fragmentation operations are Horizontal, Vertical and Hybrid. It determines how the XML should be fragmented. This paper aims to give an overview on the fragmentation design consideration and subsequently, propose a fragmentation technique using number addressing. △ Less

Submitted 13 May, 2015; originally announced May 2015.

Comments: 12 pages, invited extension from conference paper. International Journal of Computer Science & Information Technology (IJCSIT), Vol 7, No 2, April 2015

ACM Class: H.2.4

Showing 1–35 of 35 results for author: Kong, K