Search | arXiv e-print repository

Error Detection and Constraint Recovery in Hierarchical Multi-Label Classification without Prior Knowledge

Authors: Joshua Shay Kricheli, Khoa Vo, Aniruddha Datta, Spencer Ozgur, Paulo Shakarian

Abstract: Recent advances in Hierarchical Multi-label Classification (HMC), particularly neurosymbolic-based approaches, have demonstrated improved consistency and accuracy by enforcing constraints on a neural model during training. However, such work assumes the existence of such constraints a priori. In this paper, we relax this strong assumption and present an approach based on Error Detection Rules (EDR) that allow for learning explainable rules about the failure modes of machine learning models. We show that these rules are not only effective in detecting when a machine learning classifier has made an error but can also be leveraged as constraints for HMC, thereby allowing the recovery of explainable constraints even if they are not provided. We show that our approach is effective in detecting machine learning errors and recovering constraints, is noise tolerant, and can function as a source of knowledge for neurosymbolic models on multiple datasets, including a newly introduced military vehicle recognition dataset.

Submitted 21 July, 2024; originally announced July 2024.
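The link between error detection and hierarchy constraints described in this abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: the two-level label hierarchy, the toy predictions, and the single candidate rule "the fine prediction's parent must equal the coarse prediction". It is not the paper's EDR learning algorithm; it only shows how one such rule could be scored as an error detector by its precision and recall.

```python
from typing import Callable, List, NamedTuple, Tuple

# Hypothetical two-level label hierarchy: fine class -> coarse class.
PARENT = {"sparrow": "bird", "eagle": "bird", "salmon": "fish", "trout": "fish"}


class Prediction(NamedTuple):
    fine_pred: str    # fine-grained head's prediction
    coarse_pred: str  # coarse-grained head's prediction
    fine_true: str    # ground-truth fine label


def violates_hierarchy(p: Prediction) -> bool:
    """Candidate rule: flag a sample whose two predictions contradict the hierarchy."""
    return PARENT[p.fine_pred] != p.coarse_pred


def rule_quality(preds: List[Prediction],
                 rule: Callable[[Prediction], bool]) -> Tuple[float, float]:
    """Precision and recall of `rule` as a detector of fine-grained misclassifications."""
    flagged = [rule(p) for p in preds]
    errors = [p.fine_pred != p.fine_true for p in preds]
    true_pos = sum(f and e for f, e in zip(flagged, errors))
    precision = true_pos / max(sum(flagged), 1)
    recall = true_pos / max(sum(errors), 1)
    return precision, recall


# Hypothetical toy predictions.
preds = [
    Prediction("sparrow", "bird", "sparrow"),  # correct, consistent
    Prediction("eagle", "fish", "salmon"),     # wrong, inconsistent: caught
    Prediction("trout", "bird", "trout"),      # correct fine label, inconsistent: false alarm
    Prediction("salmon", "fish", "trout"),     # wrong but consistent: missed
    Prediction("eagle", "fish", "trout"),      # wrong, inconsistent: caught
    Prediction("sparrow", "bird", "eagle"),    # wrong but consistent: missed
]
print(rule_quality(preds, violates_hierarchy))  # (0.666..., 0.5) on this toy data
```

The third sample shows the limit of a single rule: it only says the two heads disagree, not which one is wrong, so a correct fine label can still be flagged. Combining several conditions while keeping precision acceptable is the general direction the abstract describes; the paper itself should be consulted for the actual procedure.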

arXiv:2406.00307

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Authors: Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le

Abstract: Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning departs from how humans naturally perceive from a first-person perspective, leading to a lack of interpretability in reasoning; and (2) learning is limited in capturing the inherent fine-grained relationships between the two modalities. In this paper, we take inspiration from human perception and explore a compositional approach for egocentric video representation. We introduce HENASY (Hierarchical ENtities ASsemblY), which includes a spatiotemporal token grouping mechanism to explicitly assemble dynamically evolving scene entities through time and model their relationships for video representation. By leveraging compositional structure understanding, HENASY possesses strong interpretability via visual grounding with free-form text queries. We further explore a suite of multi-grained contrastive losses to facilitate entity-centric understanding. This comprises three alignment types: video-narration, noun-entity, and verb-entities alignment. Our method demonstrates strong interpretability in both quantitative and qualitative experiments, while maintaining competitive performance on five downstream tasks via zero-shot transfer or as video/text representation, including video/text retrieval, action recognition, multi-choice query, natural language query, and moments query.

Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: under submission
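The multi-grained contrastive losses mentioned in the HENASY abstract build on the standard contrastive (InfoNCE) objective. As a rough illustration only, the sketch below implements a generic symmetric InfoNCE loss for a batch of paired video and narration embeddings, where pair i is the positive for row i and every other item in the batch is a negative. The cosine similarity, the temperature of 0.07, and the toy embeddings are assumptions; HENASY's noun-entity and verb-entities terms are not modelled.

```python
import math
from typing import List

Vector = List[float]


def _normalise(v: Vector) -> Vector:
    """Scale a (non-zero) vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross_entropy_diagonal(matrix: List[Vector]) -> float:
    """Mean cross-entropy where row i's target is column i (log-sum-exp for stability)."""
    total = 0.0
    for i, row in enumerate(matrix):
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(matrix)


def symmetric_info_nce(video: List[Vector], text: List[Vector],
                       temperature: float = 0.07) -> float:
    """Average of the video->text and text->video contrastive losses."""
    v = [_normalise(x) for x in video]
    t = [_normalise(x) for x in text]
    sim = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    sim_transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (_cross_entropy_diagonal(sim) + _cross_entropy_diagonal(sim_transposed))


video = [[1.0, 0.0], [0.0, 1.0]]
print(symmetric_info_nce(video, video))                     # near 0: pairs aligned
print(symmetric_info_nce(video, [[0.0, 1.0], [1.0, 0.0]]))  # about 14.3: pairs swapped
```

Computing the loss in both directions treats video and text symmetrically, so neither modality's embedding is trained only as a retrieval target.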

arXiv:2405.19277

Deep Latent Variable Modeling of Physiological Signals

Authors: Khuong Vo

Abstract: A deep latent variable model is a powerful method for capturing complex distributions. These models assume that underlying, unobserved structures are present within the data. In this dissertation, we explore high-dimensional problems related to physiological monitoring using latent variable models. First, we present a novel deep state-space model to generate electrical waveforms of the heart using optically obtained signals as inputs. This can enable clinical diagnosis of heart disease through simple assessment with wearable devices. Second, we present a brain signal modeling scheme that combines the strengths of probabilistic graphical models and deep adversarial learning. The structured representations can provide interpretability and encode inductive biases to reduce the data complexity of neural oscillations. The efficacy of the learned representations is further studied in epilepsy seizure detection formulated as an unsupervised learning problem. Third, we propose a framework for the joint modeling of physiological measures and behavior. Existing methods for combining multiple sources of brain data are limited. Direct analysis of the relationship between different types of physiological measures usually does not involve behavioral data. Our method can identify the unique and shared contributions of brain regions to behavior and can be used to discover new functions of brain regions.
The success of these innovative computational methods would allow the translation of biomarker findings across species and provide insight into neurocognitive analysis in numerous biological studies and clinical diagnoses, as well as emerging consumer applications.

Submitted 12 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: PhD thesis

arXiv:2403.11376

ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation

Authors: Minh Tran, Winston Bounsavy, Khoa Vo, Anh Nguyen, Tri Nguyen, Ngan Le

Abstract: Amodal Instance Segmentation (AIS) presents a challenging task as it involves predicting both visible and occluded parts of objects within images. Existing AIS methods rely on a bidirectional approach, encompassing both the transition from amodal features to visible features (amodal-to-visible) and from visible features to amodal features (visible-to-amodal). We observe that using amodal features through the amodal-to-visible transition can confuse the visible features, due to the extra information about occluded/hidden segments that is not present in the visible display. Consequently, the quality of the visible features is compromised during the subsequent visible-to-amodal transition. To tackle this issue, we introduce ShapeFormer, a decoupled Transformer-based model with a visible-to-amodal transition. It facilitates the explicit relationship between output segmentations and avoids the need for amodal-to-visible transitions. ShapeFormer comprises three key modules: (i) a Visible-Occluding Mask Head for predicting visible segmentation with occlusion awareness, (ii) a Shape-Prior Amodal Mask Head for predicting amodal and occluded masks, and (iii) a Category-Specific Shape Prior Retriever that provides shape prior knowledge. Comprehensive experiments and extensive ablation studies across various AIS benchmarks demonstrate the effectiveness of our ShapeFormer. The code is available at: https://github.com/UARK-AICV/ShapeFormer

Submitted 17 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted to IJCNN2024

arXiv:2311.11879

Periodic orbits in general Glass networks

Authors: Huy K. Vo

Abstract: Glass networks are piecewise linear ODE systems that model interacting systems with 'switching points': the underlying dynamics change qualitatively when a certain variable passes over a threshold. One of the most well-studied classes of models of the original Glass network is the cyclic attractor in the orthants (a sequence of orthants where the flow from one orthant to another is unanimous), which was first defined and analysed by Glass and Pasternack in 1978. In that paper, the authors gave a complete classification of the topological features of the flow in a full-rank cyclic attractor, which is a cyclic attractor that cannot be contained in any sub-cube in the graph of orthants. In this paper, we will extend the definition of cyclic attractor to one generalisation of the Glass network, one that allows for multiple switching points in each variable, and give a complete classification of the topological features of the flow for any cyclic attractor, both in the extended network and the original network, including non-full-rank ones. We will show that in any cyclic attractor, there is either a unique and asymptotically stable periodic orbit, or all periodic orbits are degenerate.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 16 pages, 5 figures
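The orthant-to-orthant flow described in this abstract can be made concrete with a small sketch of the original, single-threshold Glass network; it does not cover the multi-threshold generalisation studied in the paper. The three-variable negative feedback loop x1 → x2 → x3 ⊣ x1, the threshold at 0, and the focal points at ±1 are illustrative assumptions. Inside each orthant, dx/dt = −x + f has the closed-form solution x(t) = f + (x0 − f)e^(−t), so the sketch jumps directly from one threshold crossing to the next instead of integrating numerically.

```python
import math
from typing import Dict, List, Tuple

State = Tuple[bool, ...]  # True where x_i > 0 (single threshold at 0)


def focal(state: State) -> Tuple[float, float, float]:
    """Focal point of the current orthant for the hypothetical loop x1 -> x2 -> x3 -| x1."""
    s1, s2, s3 = state
    return (-1.0 if s3 else 1.0,   # x3 represses x1
            1.0 if s1 else -1.0,   # x1 activates x2
            1.0 if s2 else -1.0)   # x2 activates x3


def step(x: List[float], state: State) -> Tuple[List[float], State]:
    """Jump exactly to the next threshold crossing using the closed-form flow."""
    f = focal(state)
    # Only coordinates whose focal value lies across the threshold can reach 0;
    # solving 0 = f + (x0 - f) * exp(-t) gives t = ln((x0 - f) / -f).
    times: Dict[int, float] = {
        k: math.log((x[k] - f[k]) / -f[k])
        for k in range(3) if (f[k] > 0) != state[k]
    }
    k = min(times, key=times.get)  # first coordinate to cross
    decay = math.exp(-times[k])
    new_x = [f[j] + (x[j] - f[j]) * decay for j in range(3)]
    new_x[k] = 0.0  # land exactly on the switching surface
    new_state = tuple(not s if j == k else s for j, s in enumerate(state))
    return new_x, new_state


def return_points(x0: Tuple[float, float, float], cycles: int) -> List[Tuple[float, float]]:
    """(x1, x2) each time the orbit re-enters the all-positive orthant through x3 = 0."""
    x, state = list(x0), tuple(v > 0 for v in x0)
    hits: List[Tuple[float, float]] = []
    while len(hits) < cycles:
        x, state = step(x, state)
        if state == (True, True, True):
            hits.append((x[0], x[1]))
    return hits


hits = return_points((0.9, 0.2, 0.4), cycles=20)
print(hits[-1])  # approaches (0.618..., 0.381...)
```

For this symmetric loop the orbit cycles through six orthants, and a short calculation shows that the return map to the face x3 = 0 of the all-positive orthant is linear-fractional with fixed point ((√5 − 1)/2, (3 − √5)/2) ≈ (0.618, 0.382). The recorded return points therefore settle there, which corresponds to the "unique and asymptotically stable periodic orbit" case of the classification.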

arXiv:2311.00729

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

Authors: Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le

Abstract: Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning with a closed-set setting on large training data, recent zero-shot TAD methods showcase the promising open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot TAD methods are limited in how they construct the strong relationship between the two interdependent tasks of localization and classification, and in how they adapt ViL models to video understanding. In this work, we present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter, a CLIP-based module, generates semantic embeddings from text and frame inputs for each temporal unit. Additionally, we enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on THUMOS14 and ActivityNet-1.3 datasets demonstrate our approach's superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.

Submitted 4 November, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.03923

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Authors: Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le

Abstract: Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with 3D knowledge from TSDF using an enhanced Hungarian-based feature-matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free open-vocabulary 3D segmentation without requiring additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods highlight Open-Fusion's superiority. Furthermore, it seamlessly combines the strengths of region-based VLFM and TSDF, facilitating real-time 3D scene comprehension that includes object concepts and open-world semantics. We encourage readers to view the demos on our project page: https://uark-aicv.github.io/OpenFusion

Submitted 5 October, 2023; originally announced October 2023.
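Open-Fusion builds on TSDF reconstruction. The per-voxel update at the heart of classic TSDF fusion (Curless and Levoy, 1996) is a running weighted average of truncated signed distances, sketched below for a single voxel. The 5 cm truncation distance, the unit observation weight, the weight cap of 64, and the sign convention (positive in front of the observed surface) are illustrative assumptions. Open-Fusion's region embeddings, confidence maps, and Hungarian-based matching are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    tsdf: float = 1.0    # normalised truncated signed distance in [-1, 1]; starts as "empty"
    weight: float = 0.0  # accumulated confidence of the estimate


def truncate(signed_distance: float, trunc: float) -> float:
    """Clamp a signed distance (metres) to [-trunc, trunc] and normalise to [-1, 1]."""
    return max(-1.0, min(1.0, signed_distance / trunc))


def integrate(voxel: Voxel, signed_distance: float, trunc: float = 0.05,
              obs_weight: float = 1.0, max_weight: float = 64.0) -> None:
    """Fuse one observation (positive = in front of the surface) into a voxel."""
    if signed_distance < -trunc:
        return  # far behind the observed surface: occluded, so leave the voxel unchanged
    d = truncate(signed_distance, trunc)
    total = voxel.weight + obs_weight
    voxel.tsdf = (voxel.weight * voxel.tsdf + obs_weight * d) / total
    voxel.weight = min(total, max_weight)  # cap so the map can still adapt to change


voxel = Voxel()
integrate(voxel, 0.02)  # 2 cm in front of the surface -> normalised 0.4
integrate(voxel, 0.04)  # 4 cm in front of the surface -> normalised 0.8
print(voxel.tsdf, voxel.weight)  # about 0.6 and 2.0
```

Capping the weight keeps the average responsive to new observations when a scene changes, and skipping voxels far behind the surface avoids overwriting geometry the current view cannot see.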

arXiv:2309.15375

PPG-to-ECG Signal Translation for Continuous Atrial Fibrillation Detection via Attention-based Deep State-Space Modeling

Authors: Khuong Vo, Mostafa El-Khamy, Yoojin Choi

Abstract: Photoplethysmography (PPG) is a cost-effective and non-invasive technique that utilizes optical methods to measure cardiac physiology. PPG has become increasingly popular in health monitoring and is used in various commercial and clinical wearable devices. Compared to electrocardiography (ECG), PPG does not provide substantial clinical diagnostic value, despite the strong correlation between the two. Here, we propose a subject-independent attention-based deep state-space model (ADSSM) to translate PPG signals to corresponding ECG waveforms. The model is not only robust to noise but also data-efficient by incorporating probabilistic prior knowledge. To evaluate our approach, 55 subjects' data from the MIMIC-III database were used in their original form, and then modified with noise, mimicking real-world scenarios. Our approach proved effective, as evidenced by the PR-AUC of 0.986 achieved when inputting the translated ECG signals into an existing atrial fibrillation (AFib) detector. ADSSM enables the integration of ECG's extensive knowledge base and PPG's continuous measurement for early diagnosis of cardiovascular disease.

Submitted 12 June, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted to 46th IEEE EMBC

arXiv:2308.16262

Causal Strategic Learning with Competitive Selection

Authors: Kiet Q. H. Vo, Muneeb Aadil, Siu Lun Chau, Krikamol Muandet

Abstract: We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much prior work focuses on studying a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of a selection procedure by which agents are not only evaluated, but also selected. When each decision maker unilaterally selects agents by maximising their own utility, we show that the optimal selection rule is a trade-off between selecting the best agents and providing incentives to maximise the agents' improvement. Furthermore, this optimal selection rule relies on incorrect predictions of agents' outcomes. Hence, we study the conditions under which a decision maker's optimal selection rule will not lead to deterioration of agents' outcome nor cause unjust reduction in agents' selection chance. To that end, we provide an analytical form of the optimal selection rule and a mechanism to retrieve the causal parameters from observational data, under certain assumptions on agents' behaviour. Secondly, when there are multiple decision makers, the interference between selection rules introduces another source of biases in estimating the underlying causal parameters. To address this problem, we provide a cooperative protocol which all decision makers must collectively adopt to recover the true causal parameters. Lastly, we complement our theoretical results with simulation studies.
Our results highlight not only the importance of causal modeling as a strategy to mitigate the effect of gaming, as suggested by previous work, but also the need for a benevolent regulator to enable it.

Submitted 3 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Added more discussions on assumptions and the algorithm, and expanded the Conclusion

arXiv:2307.16169

doi 10.24132/CSRN.3301.9

StarSRGAN: Improving Real-World Blind Super-Resolution

Authors: Khoa D. Vo, Len T. Bui

Abstract: The aim of blind super-resolution (SR) in computer vision is to improve the resolution of an image without prior knowledge of the degradation process that caused the image to be low-resolution. The State of the Art (SOTA) model Real-ESRGAN has advanced perceptual loss and produced visually compelling outcomes using more complex degradation models to simulate real-world degradations. However, there is still room to improve the super-resolved quality of Real-ESRGAN by implementing recent techniques. This research paper introduces StarSRGAN, a novel GAN model designed for blind super-resolution tasks that utilizes five different architectures. Our model achieves new SOTA performance, roughly 10% better on the MANIQA and AHIQ measures, as demonstrated by experimental comparisons with Real-ESRGAN. In addition, as a compact version, StarSRGAN Lite provides approximately 7.5 times faster reconstruction speed (real-time upsampling from 540p to 4K) while retaining nearly 90% of the image quality, thereby facilitating the development of a real-time SR experience for future research. Our code is released at https://github.com/kynthesis/StarSRGAN.

Submitted 30 July, 2023; originally announced July 2023.

Comments: 11 pages, 7 figures, 2 tables, accepted for oral presentation at WSCG 2023

arXiv:2305.06044

Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

Authors: Nhat-Hao Pham, Khanh-Linh Vo, Mai Anh Vu, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen

Abstract: Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the effects of various missing data methods on the correlation plot, focusing on two common missing patterns: random and monotone. We aim to provide practical strategies and recommendations for researchers and practitioners in creating and analyzing the correlation plot. Our experimental results suggest that while imputation is commonly used for missing data, using imputed data for plotting the correlation matrix may lead to a significantly misleading inference of the relation between the features. We recommend using DPER, a direct parameter estimation approach, for plotting the correlation matrix based on its performance in the experiments.

Submitted 5 September, 2023; v1 submitted 10 May, 2023; originally announced May 2023.
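The abstract's warning about plotting correlations from imputed data can be reproduced on a deliberately tiny example. The sketch below compares two simple baselines, pairwise deletion (use only rows where both values are observed) and column-mean imputation; it does not implement DPER, the direct parameter estimation method the authors recommend. The data are hypothetical: y equals x exactly, but its two extreme values are missing (`None`).

```python
import math
from typing import List, Optional

Column = List[Optional[float]]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length, fully observed columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def pairwise_complete(xs: Column, ys: Column) -> float:
    """Correlation using only the rows where both values are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


def mean_imputed(xs: Column, ys: Column) -> float:
    """Correlation after replacing each missing value with its column's observed mean."""
    def fill(col: Column) -> List[float]:
        observed = [v for v in col if v is not None]
        m = sum(observed) / len(observed)
        return [m if v is None else v for v in col]
    return pearson(fill(xs), fill(ys))


x = [1, 2, 3, 4, 5, 6]
y = [None, 2, 3, 4, 5, None]
print(pairwise_complete(x, y))  # 1.0
print(mean_imputed(x, y))       # about 0.53
```

Mean imputation adds points that carry no covariance with x while still counting toward the variance of x, which shrinks the apparent relationship; this is one concrete way imputed data can mislead a correlation plot.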

arXiv:2212.06206

Contextual Explainable Video Representation: Human Perception-based Understanding

Authors: Khoa Vo, Kashu Yamazaki, Phong X. Nguyen, Phat Nguyen, Khoa Luu, Ngan Le

Abstract: Video understanding is a growing field and a subject of intense research, which includes many interesting tasks for understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, which is difficult due to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black box to extract visual representation, our approach aims to extract the most contextual information with an explainable mechanism. As we observe, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is crucial to design a contextual explainable video representation extraction that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.

Submitted 17 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: Accepted in Asilomar Conference 2022

arXiv:2212.05136

CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection

Authors: Hyekang Kevin Joo, Khoa Vo, Kashu Yamazaki, Ngan Le

Abstract: Video anomaly detection (VAD), commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature, is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video. In this paper, we first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique. We then model temporal dependencies and nominate the snippets of interest by leveraging our proposed Temporal Self-Attention (TSA). The ablation study confirms the effectiveness of TSA and ViT features. Extensive experiments show that our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on three commonly-used benchmark datasets in the VAD problem (UCF-Crime, ShanghaiTech Campus, and XD-Violence). Our source code is available at https://github.com/joos2010kj/CLIP-TSA.

Submitted 3 July, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: Published at the 30th IEEE International Conference on Image Processing (IEEE ICIP 2023)

arXiv:2211.15103

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Authors: Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le

Abstract: Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling. Following the human perception process, where the scene is effectively understood by decomposing it into visual (e.g. human, animal) and non-visual components (e.g. action, relations) under the mutual influence of vision and language, we first propose a visual-linguistic (VL) feature. In the proposed VL feature, the scene is modeled by three modalities: (i) a global visual environment; (ii) local visual main agents; and (iii) linguistic scene elements. We then introduce an autoregressive Transformer-in-Transformer (TinT) to simultaneously capture the semantic coherence of intra- and inter-event contents within a video. Finally, we present a new VL contrastive loss function to guarantee that the learned embedding features match the caption semantics. Comprehensive experiments and extensive ablation studies on ActivityNet Captions and YouCookII datasets show that the proposed Visual-Linguistic Transformer-in-Transformer (VLTinT) outperforms prior state-of-the-art methods on accuracy and diversity. Source code is made publicly available at: https://github.com/UARK-AICV/VLTinT.

Submitted 15 February, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Accepted to AAAI 2023 Oral

arXiv:2210.17058

Antiferromagnetism and chiral-d wave superconductivity in a honeycomb lattice close to Mott state

Authors: Chien-Peng Ho, Khee-Kyun Voo

Abstract: The antiferromagnetism (AFM) and chiral-d wave superconductivity (SC) in a honeycomb lattice close to an antiferromagnetic (AF) Mott state at half band filling are studied with a t-J model and slave boson mean field theory. The order parameters and single particle dispersion relations at different band filling fractions are investigated. It is found that the AFM enhances the SC, and leads to nodal single particle dispersion relations at two band filling fractions. These unexpected nodal dispersion relations arising from a nodeless chiral-d wave superconducting order are discussed. A comparison between AF chiral-d wave states and AF extended-s wave states is also given to highlight the pertinent features in chiral-d wave superconducting states. This study may be related to honeycomb lattice materials such as the In₃Cu₂VO₉ compound.

Submitted 31 October, 2022; originally announced October 2022.

Journal ref: Physica B 648 (2023) 414408

arXiv:2210.06323

AISFormer: Amodal Instance Segmentation with Transformer

Authors: Minh Tran, Khoa Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan Le

Abstract: Amodal Instance Segmentation (AIS) aims to segment the region of both visible and possibly occluded parts of an object instance. While Mask R-CNN-based AIS approaches have shown promising results, they are unable to model high-level feature coherence due to the limited receptive field. The most recent transformer-based models show impressive performance on vision tasks, even better than Convolutional Neural Networks (CNN). In this work, we present AISFormer, an AIS framework with a Transformer-based mask head. AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries. Specifically, AISFormer contains four modules: (i) feature encoding: extract ROI and learn both short-range and long-range visual features; (ii) mask transformer decoding: generate the occluder, visible, and amodal mask query embeddings by a transformer decoder; (iii) invisible mask embedding: model the coherence between the amodal and visible masks; and (iv) mask predicting: estimate output masks including occluder, visible, amodal, and invisible. We conduct extensive experiments and ablation studies on three challenging benchmarks, i.e., KINS, D2SA, and COCOA-cls, to evaluate the effectiveness of AISFormer. The code is available at: https://github.com/UARK-AICV/AISFormer

Submitted 17 March, 2024; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to BMVC2022
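The mask types AISFormer predicts obey simple set relations: an instance's visible mask lies inside its amodal mask, and the invisible (occluded) part is the amodal mask minus the visible one. The sketch below computes these relations directly on hypothetical boolean grids, together with an occlusion ratio; AISFormer instead learns this coherence through query embeddings, so this is only a reference for what the invisible-mask target means.

```python
from typing import List

Mask = List[List[bool]]


def visible_within_amodal(amodal: Mask, visible: Mask) -> bool:
    """Sanity check: every visible pixel must also belong to the amodal mask."""
    return all(a or not v for ra, rv in zip(amodal, visible) for a, v in zip(ra, rv))


def invisible_mask(amodal: Mask, visible: Mask) -> Mask:
    """Occluded part of the instance: in the amodal mask but not in the visible mask."""
    return [[a and not v for a, v in zip(ra, rv)] for ra, rv in zip(amodal, visible)]


def occlusion_ratio(amodal: Mask, visible: Mask) -> float:
    """Fraction of the amodal area that is hidden (0.0 for an empty amodal mask)."""
    area = sum(map(sum, amodal))
    hidden = sum(map(sum, invisible_mask(amodal, visible)))
    return hidden / area if area else 0.0


# Hypothetical 2x3 masks.
amodal = [[True, True, True], [True, True, False]]
visible = [[True, False, False], [True, True, False]]
print(invisible_mask(amodal, visible))   # [[False, True, True], [False, False, False]]
print(occlusion_ratio(amodal, visible))  # 0.4
```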

arXiv:2210.02578 [pdf, other]

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation

Authors: Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le

Abstract: Temporal action proposal generation (TAPG) is a challenging task that requires localizing action intervals in an untrimmed video. Intuitively, we humans perceive an action through the interactions between actors, relevant objects, and the surrounding environment. Despite the significant progress of TAPG, the vast majority of existing methods ignore this principle of human perception by applying a backbone network to a given video as a black box. In this paper, we propose to model these interactions with a multi-modal representation network, namely, the Actors-Objects-Environment Interaction Network (AOE-Net). Our AOE-Net consists of two modules, i.e., perception-based multi-modal representation (PMR) and a boundary-matching module (BMM). Additionally, we introduce an adaptive attention mechanism (AAM) in PMR to focus only on main actors (or relevant objects) and model the relationships among them. The PMR module represents each video snippet by a visual-linguistic feature, in which main actors and the surrounding environment are represented by visual information, whereas relevant objects are depicted by linguistic features obtained through an image-text model. The BMM module processes the sequence of visual-linguistic features as its input and generates action proposals. Comprehensive experiments and extensive ablation studies on the ActivityNet-1.3 and THUMOS-14 datasets show that our proposed AOE-Net outperforms previous state-of-the-art methods with remarkable performance and generalization for both TAPG and temporal action detection.
To prove the robustness and effectiveness of AOE-Net, we further conduct an ablation study on egocentric videos, i.e., the EPIC-KITCHENS 100 dataset. Source code is available upon acceptance.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted for publication in International Journal of Computer Vision

arXiv:2208.02845 [pdf, other]

Decision SincNet: Neurocognitive models of decision making that predict cognitive processes from neural signals

Authors: Qinhua Jenny Sun, Khuong Vo, Kitty Lui, Michael Nunez, Joachim Vandekerckhove, Ramesh Srinivasan

Abstract: Human decision-making behavior is observed with choice-response time (RT) data during psychological experiments. Drift-diffusion models of these data consist of a Wiener first-passage time (WFPT) distribution and are described by cognitive parameters: drift rate, boundary separation, and starting point. These estimated parameters are of interest to neuroscientists because they can be mapped to features of the cognitive processes of decision making (such as speed, caution, and bias) and related to brain activity. The observed patterns of RT also reflect the trial-to-trial variability of cognitive processes mediated by neural dynamics. We adapted a SincNet-based shallow neural network architecture to fit the drift-diffusion model using EEG signals on every experimental trial. The model consists of a SincNet layer, a depthwise spatial convolution layer, and two separate FC layers that predict drift rate and boundary for each trial in parallel. The SincNet layer parametrized its kernels so as to directly learn the low and high cutoff frequencies of the bandpass filters that are applied to the EEG data to predict drift and boundary parameters. During training, model parameters were updated by minimizing the negative log-likelihood of the WFPT distribution given the trial RT. We developed a separate decision SincNet model for each participant performing a two-alternative forced-choice task.
Our results showed that single-trial estimates of drift and boundary performed better at predicting RTs than the median estimates in both the training and test data sets, suggesting that our model can successfully use EEG features to estimate meaningful single-trial diffusion model parameters. Furthermore, the shallow SincNet architecture identified time windows of information processing related to evidence accumulation and caution, as well as the EEG frequency bands that reflect these processes within each participant.

Submitted 16 August, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

Comments: This paper was accepted as an oral presentation at IEEE WCCI 2022 (IJCNN 2022), under the session Neurodynamics and computational Neuroscience. This paper is published in International Joint Conference on Neural Networks (IJCNN) Proceedings 2022
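Both ingredients named in the Decision SincNet abstract above can be sketched in a few lines of plain Python: a band-pass kernel parametrized only by its two cutoff frequencies, and the WFPT negative log-likelihood used as the training loss. This is an illustrative sketch under stated assumptions, not the authors' implementation: the kernel follows the published SincNet form, g[n] = 2 f2 sinc(2 pi f2 n) - 2 f1 sinc(2 pi f1 n) with a Hamming window and cutoffs in cycles per sample; the density is the large-time series of Navarro and Fuss (2009) for a process with unit diffusion coefficient; and the function names, the fixed relative starting point `w`, the non-decision time `t0`, and the series length are hypothetical choices.

```python
import math


def _lowpass_tap(f, n):
    """Tap n of an ideal low-pass filter: 2 f sinc(2 pi f n), with f in cycles/sample."""
    if n == 0:
        return 2.0 * f
    return math.sin(2.0 * math.pi * f * n) / (math.pi * n)


def sinc_bandpass_kernel(f1, f2, length=65):
    """Hamming-windowed band-pass kernel whose only free parameters are f1 < f2."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the kernel has a centre tap")
    half = (length - 1) // 2
    kernel = []
    for i in range(length):
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        n = i - half
        # Band-pass = low-pass(f2) - low-pass(f1), then windowed.
        kernel.append((_lowpass_tap(f2, n) - _lowpass_tap(f1, n)) * window)
    return kernel


def wfpt_lower_density(t, v, a, w, n_terms=100):
    """First-passage time density at the lower boundary (large-time series).

    v: drift rate, a: boundary separation, w: relative starting point z / a.
    More terms are needed as t / a**2 approaches zero.
    """
    if t <= 0.0:
        return 0.0
    u = t / (a * a)  # time rescaled to a unit boundary separation
    series = sum(
        k * math.exp(-((k * math.pi) ** 2) * u / 2.0) * math.sin(k * math.pi * w)
        for k in range(1, n_terms + 1)
    )
    return (math.pi / (a * a)) * math.exp(-v * a * w - v * v * t / 2.0) * series


def wfpt_nll(rts, choices, drifts, boundaries, w=0.5, t0=0.0):
    """Negative log-likelihood of per-trial (RT, choice) pairs.

    choice 1 = upper boundary, 0 = lower boundary; drifts and boundaries
    hold one (e.g. network-predicted) value per trial.
    """
    total = 0.0
    for rt, choice, v, a in zip(rts, choices, drifts, boundaries):
        t = rt - t0
        if choice == 1:
            # Upper-boundary density: mirror the process (v -> -v, w -> 1 - w).
            dens = wfpt_lower_density(t, -v, a, 1.0 - w)
        else:
            dens = wfpt_lower_density(t, v, a, w)
        # The truncated series can return ~0 or tiny negative values; guard log().
        total -= math.log(max(dens, 1e-300))
    return total
```

In a trained model, `drifts` and `boundaries` would be the per-trial outputs of the two FC heads; here they are plain lists. The large-time series converges slowly for very short decision times, which is why Navarro and Fuss pair it with a small-time expansion for that regime.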

arXiv:2206.12972 [pdf, other]

VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

Authors: Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le

Abstract: In this paper, we leverage the human perceiving process, which involves vision and language interaction, to generate a coherent paragraph description of untrimmed videos. We propose vision-language (VL) features consisting of two modalities: (i) a vision modality to capture the global visual content of the entire scene and (ii) a language modality to extract descriptions of scene elements, covering both human and non-human objects (e.g., animals, vehicles) and visual and non-visual elements (e.g., relations, activities). Furthermore, we propose to train our VLCap with a contrastive learning VL loss. Experiments and ablation studies on the ActivityNet Captions and YouCookII datasets show that our VLCap outperforms existing SOTA methods on both accuracy and diversity metrics.

Submitted 6 August, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

Comments: accepted by The 29th IEEE International Conference on Image Processing (IEEE ICIP) 2022

arXiv:2205.06218 [pdf, other]

Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets

Authors: Kenny T. R. Voo, Liming Jiang, Chen Change Loy

Abstract: This paper performs a comprehensive analysis of datasets for occlusion-aware face segmentation, a task that is crucial for many downstream applications. The collection and annotation of such datasets are time-consuming and labor-intensive. Although some efforts have been made in synthetic data generation, the naturalistic aspect of data remains less explored. In our study, we propose two occlusion generation techniques: Naturalistic Occlusion Generation (NatOcc), for producing high-quality naturalistic synthetic occluded faces, and Random Occlusion Generation (RandOcc), a more general method for generating synthetic occluded data. We empirically show the effectiveness and robustness of both methods, even for unseen occlusions. To facilitate model evaluation, we present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild, featuring both careful alignment preprocessing and an in-the-wild setting for robustness testing. We further conduct a comprehensive analysis on a newly introduced segmentation benchmark, offering insights for future exploration.

Submitted 12 May, 2022; originally announced May 2022.

Comments: CVPR 2022 Workshop on Vision Datasets Understanding. Code and Datasets: https://github.com/kennyvoo/face-occlusion-generation

arXiv:2203.08942 [pdf, other]

doi 10.1109/ACCESS.2021.3110973

ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation

Authors: Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le

Abstract: Temporal action proposal generation (TAPG) aims to estimate the temporal intervals of actions in untrimmed videos, a task that is challenging yet plays an important role in many video analysis and understanding tasks. Despite great achievements in TAPG, most existing works ignore the human perception of interaction between agents and the surrounding environment by applying a deep learning model as a black box to untrimmed videos to extract visual representations. Capturing these interactions between agents and the environment is therefore beneficial and can potentially improve the performance of TAPG. In this paper, we propose a novel framework named Agent-Aware Boundary Network (ABN), which consists of two sub-networks: (i) an Agent-Aware Representation Network to obtain both agent-agent and agent-environment relationships in the video representation, and (ii) a Boundary Generation Network to estimate the confidence score of temporal intervals. In the Agent-Aware Representation Network, the interactions between agents are expressed through a local pathway, which operates at a local level to focus on the motions of agents, whereas the overall perception of the surroundings is expressed through a global pathway, which operates at a global level to perceive agent-environment effects.
Comprehensive evaluations on the 20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone networks (i.e., C3D, SlowFast, and Two-Stream) show that our proposed ABN robustly outperforms state-of-the-art methods on TAPG regardless of the employed backbone network. We further examine the proposal quality by feeding the proposals generated by our method into temporal action detection (TAD) frameworks and evaluating their detection performance. The source code is available at https://github.com/vhvkhoa/TAPG-AgentEnvNetwork.git.

Submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted in the journal of IEEE Access Vol. 9

arXiv:2110.11474 [pdf, other]

AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation

Authors: Khoa Vo, Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le

Abstract: Humans typically perceive the establishment of an action in a video through the interaction between an actor and the surrounding environment. An action only starts when the main actor in the video begins to interact with the environment, and it ends when the main actor stops the interaction. Despite the great progress in temporal action proposal generation, most existing works ignore this fact and leave their models to learn to propose actions as a black box. In this paper, we attempt to simulate this human ability by proposing the Actor Environment Interaction (AEI) network to improve the video representation for temporal action proposal generation. AEI contains two modules, i.e., perception-based visual representation (PVR) and a boundary-matching module (BMM). PVR represents each video snippet by taking human-human relations and human-environment relations into consideration using the proposed adaptive attention mechanism. Then, the video representation is taken by BMM to generate action proposals. AEI is comprehensively evaluated on the ActivityNet-1.3 and THUMOS-14 datasets, on temporal action proposal and detection tasks, with two boundary-matching architectures (i.e., CNN-based and GCN-based) and two classifiers (i.e., Unet and P-GCN). Our AEI robustly outperforms the state-of-the-art methods with remarkable performance and generalization for both temporal action proposal generation and temporal action detection.

Submitted 24 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: Accepted in BMVC 2021 (Oral Session)

arXiv:2106.05914 [pdf, ps, other]

Matrix power means and new characterizations of operator monotone functions

Authors: Trung Hoa Dinh, Cong Trinh Le, The Van Nguyen, Bich Khue Vo

Abstract: For positive definite matrices $A$ and $B$, the Kubo-Ando matrix power mean is defined as $$ P_\mu(p, A, B) = A^{1/2}\left(\frac{1+(A^{-1/2}BA^{-1/2})^p}{2}\right)^{1/p} A^{1/2}\quad (p \ge 0). $$ In this paper, for $0\le p \le 1 \le q$, we show that if one of the following inequalities \begin{align*} f(P_\mu(p, A, B)) \le f(P_\mu(1, A, B)) \le f(P_\mu(q, A, B)) \end{align*} holds for any positive definite matrices $A$ and $B$, then the function $f$ is operator monotone on $(0, \infty)$. We also study the inverse problem for non-Kubo-Ando matrix power means with the powers $1/2$ and $2$. As a consequence, we establish new characterizations of operator monotone functions with the non-Kubo-Ando matrix power means.

Submitted 13 July, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: 13 pages, some typos are fixed

MSC Class: 47A63; 47A64; 47A56; 46E05; 15B48
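The power mean defined in the abstract above can be evaluated directly through spectral decompositions, since every matrix power in it acts on a symmetric positive definite matrix. The sketch below is our own illustration, not code from the paper: it uses a small cyclic Jacobi eigensolver so that it runs without third-party packages, reads the `1` in the formula as the identity matrix, and handles only p > 0 (the p = 0 case is a limit). Two easy checks follow from the formula: p = 1 gives the arithmetic mean (A + B)/2, and for commuting (here diagonal) A and B the mean reduces entrywise to the scalar power mean ((a^p + b^p)/2)^(1/p).

```python
import math


def eigh_sym(a, tol=1e-20, max_sweeps=60):
    """Cyclic Jacobi eigendecomposition of a small real symmetric matrix.

    Returns (eigenvalues, v) where the eigenvectors are the columns of v.
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # right-multiply by the rotation (columns p, q)
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # left-multiply a by its transpose (rows p, q)
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def mat_func(a, f):
    """f(A) = V diag(f(eigenvalues)) V^T for a symmetric matrix A."""
    w, v = eigh_sym(a)
    n = len(a)
    fw = [f(x) for x in w]
    return [[sum(v[i][k] * fw[k] * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def power_mean(p, a, b):
    """A^(1/2) ((I + C^p) / 2)^(1/p) A^(1/2) with C = A^(-1/2) B A^(-1/2)."""
    if p <= 0:
        raise ValueError("this sketch only covers p > 0")
    n = len(a)
    a_half = mat_func(a, math.sqrt)
    a_neg_half = mat_func(a, lambda x: 1.0 / math.sqrt(x))
    c = matmul(matmul(a_neg_half, b), a_neg_half)
    c = [[(c[i][j] + c[j][i]) / 2.0 for j in range(n)] for i in range(n)]  # undo rounding asymmetry
    c_p = mat_func(c, lambda x: max(x, 0.0) ** p)
    mid = [[(float(i == j) + (c_p[i][j] + c_p[j][i]) / 2.0) / 2.0 for j in range(n)]
           for i in range(n)]
    root = mat_func(mid, lambda x: max(x, 0.0) ** (1.0 / p))
    return matmul(matmul(a_half, root), a_half)
```

For instance, `power_mean(1.0, A, B)` should agree with `(A + B) / 2` up to rounding, which makes a convenient check for any implementation of the formula.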

arXiv:2103.05073 [pdf, other]

Offboard 3D Object Detection from Point Cloud Sequences

Authors: Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, Dragomir Anguelov

Abstract: While current 3D object recognition research mostly focuses on the real-time, onboard scenario, there are many offboard use cases of perception that are largely under-explored, such as using machines to automatically generate high-quality 3D labels. Existing 3D object detectors fail to satisfy the high-quality requirement for offboard uses due to the limited input and speed constraints. In this paper, we propose a novel offboard 3D object detection pipeline using point cloud sequence data. Observing that different frames capture complementary views of objects, we design the offboard detector to make use of the temporal points through both multi-frame object detection and novel object-centric refinement models. Evaluated on the Waymo Open Dataset, our pipeline named 3D Auto Labeling shows significant gains compared to the state-of-the-art onboard detectors and our offboard baselines. Its performance is even on par with human labels verified through a human label study. Further experiments demonstrate the application of auto labels for semi-supervised learning and provide extensive analysis to validate various design choices.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 18 pages, 7 figures, 19 tables

arXiv:2102.12173 [pdf]

Deep learning-based framework for cardiac function assessment in embryonic zebrafish from heart beating videos

Authors: Amir Mohammad Naderi, Haisong Bu, Jingcheng Su, Mao-Hsiang Huang, Khuong Vo, Ramses Seferino Trigo Torres, J. -C. Chiao, Juhyun Lee, Michael P. H. Lau, Xiaolei Xu, Hung Cao

Abstract: Zebrafish is a powerful and widely used model system for a host of biological investigations, including cardiovascular studies and genetic screening. Zebrafish are readily assessable during developmental stages; however, the current methods for quantification and monitoring of cardiac functions mostly involve tedious manual work and inconsistent estimations. In this paper, we developed and validated a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) based on a U-net deep learning model for automated assessment of cardiovascular indices, such as ejection fraction (EF) and fractional shortening (FS), from microscopic videos of wildtype and cardiomyopathy mutant zebrafish embryos. Our approach yielded favorable performance, with accuracy above 90% compared with manual processing. We used only black-and-white regular microscopic recordings with frame rates of 5-20 frames per second (fps); thus, the framework could be widely applicable with any laboratory resources and infrastructure. Most importantly, the automatic feature holds promise to enable efficient, consistent, and reliable processing and analysis of large numbers of videos, which can be generated by diverse collaborating teams.

Submitted 24 February, 2021; originally announced February 2021.
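The two indices named in the ZACAF abstract above have standard definitions: FS = (EDD - ESD) / EDD from end-diastolic and end-systolic ventricular diameters, and EF = (EDV - ESV) / EDV from the corresponding volumes. The sketch below assumes, hypothetically, that the U-net masks have already been reduced to per-frame ventricle dimensions, that end-diastole and end-systole are the frames with the largest and smallest values, and that volume is approximated by a prolate ellipsoid; none of these choices is taken from the paper.

```python
import math


def fractional_shortening(diameters):
    """FS (%) from per-frame ventricular diameters.

    End-diastole is taken as the largest diameter and end-systole as the
    smallest, which assumes the clip spans at least one full beat.
    """
    edd, esd = max(diameters), min(diameters)
    return 100.0 * (edd - esd) / edd


def ellipsoid_volume(length, width):
    """Prolate-ellipsoid volume (pi / 6) * L * W**2 (a hypothetical volume model)."""
    return math.pi / 6.0 * length * width ** 2


def ejection_fraction(volumes):
    """EF (%) from per-frame ventricular volumes (end-diastolic vs end-systolic)."""
    edv, esv = max(volumes), min(volumes)
    return 100.0 * (edv - esv) / edv


# Usage with made-up per-frame widths at a fixed length of 2.0 (arbitrary units).
volumes = [ellipsoid_volume(2.0, w) for w in (1.0, 0.75, 0.5, 0.75)]
ef = ejection_fraction(volumes)
```

Extracting the per-frame length and width from each segmentation mask (for example by fitting an ellipse) is the step that links this arithmetic to the U-net output and is not shown.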

arXiv:1904.01961 [pdf, ps, other]

Two trace inequalities for operator functions

Authors: Trung Hoa Dinh, Minh Toan Ho, Cong Trinh Le, Bich Khue Vo

Abstract: In this paper we show that for a non-negative operator monotone function $f$ on $[0, \infty)$ such that $f(0)= 0$ and for any positive semidefinite matrices $A$ and $B$, $$ Tr((A-B)(f(A)-f(B))) \le Tr(|A-B|f(|A-B|)). $$ When the function $f$ is operator convex on $[0, \infty)$, the inequality is reversed.

Submitted 2 April, 2019; originally announced April 2019.

Comments: 7 pages, final version, to be published in MIA (2019)

MSC Class: 46L51; 47A30
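The trace inequality in the abstract above is easy to probe numerically. The sketch below (plain Python, with a small Jacobi eigensolver so it needs no third-party packages) evaluates the gap between the two sides on random positive semidefinite matrices, using f(t) = sqrt(t) as the operator monotone example and f(t) = t^2 as the operator convex one. The helper names are our own, and a numerical check of this kind illustrates the statement but proves nothing.

```python
import math
import random


def eigh_sym(a, tol=1e-20, max_sweeps=60):
    """Cyclic Jacobi eigendecomposition; eigenvectors are the columns of v."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # right-multiply by the rotation (columns p, q)
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # left-multiply a by its transpose (rows p, q)
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def mat_func(a, f):
    """f(A) = V diag(f(eigenvalues)) V^T for a symmetric matrix A."""
    w, v = eigh_sym(a)
    n = len(a)
    fw = [f(x) for x in w]
    return [[sum(v[i][k] * fw[k] * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def trace_gap(a, b, f):
    """rhs - lhs of Tr((A-B)(f(A)-f(B))) <= Tr(|A-B| f(|A-B|))."""
    n = len(a)
    d = [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]
    fa, fb = mat_func(a, f), mat_func(b, f)
    prod = matmul(d, [[fa[i][j] - fb[i][j] for j in range(n)] for i in range(n)])
    lhs = sum(prod[i][i] for i in range(n))
    # |D| shares eigenvectors with D, so Tr(|D| f(|D|)) = sum of |l| f(|l|).
    eig_d, _ = eigh_sym(d)
    rhs = sum(abs(lam) * f(abs(lam)) for lam in eig_d)
    return rhs - lhs


def random_psd(n, rng):
    m = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return matmul(m, [list(col) for col in zip(*m)])  # M M^T is positive semidefinite


def sqrt_nonneg(x):
    # Operator monotone on [0, inf) with f(0) = 0; max() absorbs rounding below zero.
    return math.sqrt(max(x, 0.0))


def square(x):
    # Operator convex on [0, inf) with f(0) = 0.
    return x * x
```

A non-negative `trace_gap(a, b, sqrt_nonneg)` corresponds to the stated inequality, and a non-positive `trace_gap(a, b, square)` to its reversal.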

arXiv:1902.06050 [pdf, other]

doi 10.1504/IJCVR.2019.102286

Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media

Authors: Khuong Vo, Tri Nguyen, Dang Pham, Mao Nguyen, Minh Truong, Trung Mai, Tho Quan

Abstract: Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. In particular, as social media channels (e.g., social networks or forums) have become significant sources for brands to observe user opinions about their products, this task has become increasingly crucial. However, when applying it to real data obtained from social media, we notice that there is a high volume of short and informal messages posted by users on those channels. Existing works, especially those using deep learning approaches, have many difficulties handling this kind of data. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed to combine the typical deep learning technique of Convolutional Neural Networks with domain knowledge. The combination is used for additional training data augmentation and a more reasonable loss function. In this work, we further improve our architecture with various substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules to the learning process. These enhancements, specifically aimed at handling short and informal messages, yield a significant improvement in performance in experiments on real datasets.

Submitted 20 December, 2019; v1 submitted 16 February, 2019; originally announced February 2019.

Comments: A Preprint of an article accepted for publication by Inderscience in IJCVR on September 2018

Journal ref: International Journal of Computational Vision and Robotics, 2019 Vol.9 No.5, pp.458 - 485

arXiv:1806.09793 [pdf]

doi 10.1007/978-3-319-64471-4_25

A NoSQL Data-based Personalized Recommendation System for C2C e-Commerce

Authors: Khanh Dang, Khuong Vo, Josef Küng

Abstract: With the considerable development of customer-to-customer (C2C) e-commerce in recent years, there is a big demand for an effective recommendation system that suggests suitable websites for users to sell their items according to their specified needs. Nonetheless, e-commerce recommendation systems are mostly designed for business-to-customer (B2C) websites, where the systems offer consumers the products that they might like to buy. Almost none of the related research works focus on choosing selling sites for target items. In this paper, we introduce an approach that recommends selling websites based upon the item's description, category, and desired selling price. This approach employs NoSQL data-based machine learning techniques for building and training topic models and classification models. The trained models can then be used to rank the websites dynamically with respect to the user's needs. The experimental results with real-world datasets from Vietnamese C2C websites demonstrate the effectiveness of our proposed method.

Submitted 26 June, 2018; originally announced June 2018.

Comments: Accepted to DEXA 2017

arXiv:1806.08760 [pdf]

doi 10.1007/978-3-319-69456-6_14

Combination of Domain Knowledge and Deep Learning for Sentiment Analysis

Authors: Khuong Vo, Dang Pham, Mao Nguyen, Trung Mai, Tho Quan

Abstract: The emerging technique of deep learning has been widely applied in many different areas. However, when adopted in a specific domain, this technique should be combined with domain knowledge to improve efficiency and accuracy. In particular, when analyzing the applications of deep learning in sentiment analysis, we found that the current approaches suffer from the following drawbacks: (i) the existing works have not paid much attention to the importance of different types of sentiment terms, which is an important concept in this area; and (ii) the loss function currently employed does not reflect well the degree of error of sentiment misclassification. To overcome these problems, we propose to combine domain knowledge with deep learning. Our proposal includes using sentiment scores, learnt by quadratic programming, to augment training data, and introducing a penalty matrix to enhance the cross-entropy loss function. In experiments, we achieved a significant improvement in classification results.

Submitted 15 February, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

Comments: Accepted to MIWAI 2017

arXiv:1712.07731 [pdf, ps, other]

doi 10.1080/03081087.2017.1307914

Some inequalities for operator (p,h)-convex functions

Authors: Trung Hoa Dinh, Khue TB Vo

Abstract: Let $p$ be a positive number and $h$ a function on $\mathbb{R}^+$ satisfying $h(xy) \ge h(x) h(y)$ for any $x, y \in \mathbb{R}^+$. A non-negative continuous function $f$ on $K (\subset \mathbb{R}^+)$ is said to be {\it operator $(p,h)$-convex} if \begin{equation*} f ([\alpha A^p + (1-\alpha)B^p]^{1/p}) \leq h(\alpha)f(A) +h(1-\alpha)f(B) \end{equation*} holds for all positive semidefinite matrices $A, B$ of order $n$ with spectra in $K$, and for any $\alpha \in (0,1)$. In this paper, we study properties of operator $(p,h)$-convex functions and prove Jensen and Hansen-Pedersen type inequalities for them. We also give some equivalent conditions for a function to be operator $(p,h)$-convex. In applications, we obtain a Choi-Davis-Jensen type inequality for operator $(p,h)$-convex functions and a relation between operator $(p,h)$-convex functions and operator monotone functions.

Submitted 20 December, 2017; originally announced December 2017.

Journal ref: Linear and Multilinear Algebra, 2017
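As a quick sanity check on the definition in the abstract above (a worked special case of our own, not a result quoted from the paper): taking $p = 1$ and $h(\alpha) = \alpha$, which satisfies $h(xy) = xy = h(x)h(y)$, turns the defining inequality into ordinary operator convexity.

```latex
% Special case p = 1, h(\alpha) = \alpha of operator (p,h)-convexity
f\bigl(\alpha A + (1-\alpha) B\bigr) \le \alpha f(A) + (1-\alpha) f(B),
\qquad \sigma(A), \sigma(B) \subset K,\quad \alpha \in (0,1).
```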

arXiv:1710.09332 [pdf, ps, other]

doi 10.1016/j.aml.2017.02.009

Regularity bounds for a Gevrey criterion in a kernel-based regularization of the Cauchy problem of elliptic equations

Authors: Khoa Anh Vo, The Hung Tran

Abstract: This Note derives regularity bounds for a Gevrey criterion when the Cauchy problem of elliptic equations is solved by regularization. When regularization is used, checking such a criterion is known to be problematic, despite its importance in engineering settings. Overcoming this impediment therefore helps improve the use of some regularization methods in real-world applications. This work also considers the presence of power-law nonlinearities.

Submitted 25 October, 2017; originally announced October 2017.

Journal ref: Applied Mathematics Letters, 2017

arXiv:1701.03942 [pdf, other]

doi 10.1145/2908131.2908165

Can We Find Documents in Web Archives without Knowing their Contents?

Authors: Khoi Duy Vo, Tuan Tran, Tu Ngoc Nguyen, Xiaofei Zhu, Wolfgang Nejdl

Abstract: Recent advances in preservation technologies have led to an increasing number of Web archive systems and collections. These collections are valuable for exploring the past of the Web, but their value can only be uncovered with effective access and exploration mechanisms. Ideal search and ranking methods must be robust to the high redundancy and the temporal noise of contents, as well as scalable to the huge amount of data archived. Despite several attempts at Web archive search, facilitating access to Web archives remains a challenging problem. In this work, we conduct a first analysis of different ranking strategies that exploit evidence from metadata instead of the full content of documents. We perform a first study comparing the usefulness of non-content evidence for Web archive search, where the evidence is mined only from the metadata of file headers, links, and URL strings. Based on these findings, we propose a simple yet surprisingly effective learning model that combines multiple sources of evidence to distinguish "good" from "bad" search results. We conduct quantitative as well as qualitative empirical experiments to confirm the validity of our proposed method, as a first step towards better ranking in Web archives that takes metadata into account.

Submitted 14 January, 2017; originally announced January 2017.

Comments: Published via ACM to Websci 2015

ACM Class: H.3.1

arXiv:1601.01851 [pdf, other]

doi 10.1016/j.aml.2016.02.009

A Note on Iterations-based Derivations of High-order Homogenization Correctors for Multiscale Semi-linear Elliptic Equations

Authors: Khoa Vo, Adrian Muntean

Abstract: This Note presents a simple and efficient procedure to derive the structure of high-order corrector estimates for the homogenization limit applied to a semi-linear elliptic equation posed in perforated domains. Our working technique relies on monotone iterations combined with formal two-scale homogenization asymptotics. It can be adapted to handle more complex scenarios, including, for instance, nonlinearities posed at the boundary of perforations and the vectorial case, when the model equations are coupled only through the nonlinear production terms.

Submitted 8 January, 2016; originally announced January 2016.

Comments: 1 figure

MSC Class: 35B27; 35C20; 76M30; 35B09

arXiv:0809.2122 [pdf, ps, other]

doi 10.1016/j.physe.2008.09.018

Dimensionality reduction in translational noninvariant wave guides

Authors: Khee-Kyun Voo

Abstract: A scheme to reduce translational noninvariant quasi-one-dimensional wave guides into singly or multiply connected one-dimensional (1D) lines is proposed. It is meant to simplify the analysis of wave guides while preserving their low-energy properties. Guides comprising uniform-cross-section sections and discontinuities such as bends and branching junctions are considered. The uniform sections are treated as 1D lines, and the discontinuities are described by sets of equations connecting the wave functions on the lines. The procedures to derive the equations and to solve the reduced systems are illustrated by examples, and the scheme is found to apply when the discontinuities are far apart and the energy is low. When the scheme applies, it may substantially simplify the analysis of a wave guide, and hence it may find uses in the study of related problems, such as quantum wire networks.

Submitted 11 September, 2008; originally announced September 2008.

Comments: 17 pages, 5 figures

arXiv:cond-mat/0609214 [pdf, ps, other]

doi 10.1103/PhysRevB.74.155306

Localized states in the continuum in low-dimensional systems

Authors: Khee-Kyun Voo, C. S. Chu

Abstract: It is shown in this paper that, for open systems, states which are localized in space, discrete in energy, and embedded in the continuum of extended states can be sustained by low-dimensional and channeled leads. These states have an origin different from that of analogous states discussed by J. von Neumann and E. Wigner [Phys. Z., vol. 30, 465 (1929)]. A few representative systems are discussed. These states cause, for example, infinitely sharp Fano resonances in transport when they are marginally destroyed.

Submitted 9 September, 2006; originally announced September 2006.

Comments: 7 pages, 4 figures. To appear in Phys. Rev. B

arXiv:cond-mat/0603192 [pdf, ps, other]

doi 10.1016/j.physc.2004.10.011

An Alternative Interpretation of the Magnetic Penetration Depth Data on Pr(2-x)Ce(x)CuO(4-y) and La(2-x)Ce(x)CuO(4-y)

Authors: Khee-Kyun Voo, Wen Chin Wu

Abstract: We have revisited the magnetic penetration depth data on the electron-doped cuprates Pr(2-x)Ce(x)CuO(4-y) and La(2-x)Ce(x)CuO(4-y). It is proposed that the transition between nodal-gap-like and nodeless-gap-like behaviors upon electron doping [see, e.g., M. Kim et al., Phys. Rev. Lett. 91, 87001 (2003)] can be due to scattering of the quasiparticles in the d-wave superconducting state by an incipient or weak antiferromagnetic spin-density wave. This conjecture is supported by inelastic neutron scattering and angle-resolved photoemission experiments on some closely related electron-doped cuprates.

Submitted 19 March, 2006; v1 submitted 8 March, 2006; originally announced March 2006.

Comments: 6 pages, 5 figures

Journal ref: Physica C 417, 103-109 (2005)

arXiv:cond-mat/0602575 [pdf, ps, other]

doi 10.1103/PhysRevB.73.035307

Connecting wave functions at a three-leg junction of one-dimensional channels

Authors: Khee-Kyun Voo, Shu-Chuan Chen, Chi-Shung Tang, Chon-Saar Chu

Abstract: We propose a scheme to connect the wave functions on different one-dimensional branches of a three-leg junction (Y-junction). Our scheme differs from that due to Griffith [Trans. Faraday Soc. 49, 345 (1953)] in that ours can model differences in the widths of the quasi-one-dimensional channels in different systems. We test our scheme by comparing results from a doubly-connected one-dimensional system and a related quasi-one-dimensional system, and we find good agreement. Therefore, our scheme may be useful in constructing one-dimensional effective theories for (multiply-connected) quasi-one-dimensional systems.

Submitted 23 February, 2006; originally announced February 2006.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. B 73, 035307 (2006)

arXiv:cond-mat/0309298 [pdf, ps, other]

doi 10.1016/j.physb.2003.09.262

Phases and Density of States in a Generalized Su-Schrieffer-Heeger Model

Authors: Khee-Kyun Voo, Chung-Yu Mou

Abstract: Self-consistent solutions to a generalized Su-Schrieffer-Heeger model on a two-dimensional square lattice are investigated. Away from half-filling, spatially inhomogeneous phases are found. These phases may have topological structures in the flux order, large-unit-cell bond order, or localized bipolarons, or they may simply be short-range ordered and glassy. They share the universal feature of always possessing a gap at the Fermi level.

Submitted 11 September, 2003; originally announced September 2003.

Comments: 11 pages, 5 figures

Journal ref: Physica B 344, 224-230 (2004).

arXiv:cond-mat/0308149 [pdf, ps, other]

doi 10.1016/j.physc.2004.09.007

Temperature Effect and Fermi Surface Investigation in the Scanning Tunneling Microscopy of Bi$_2$Sr$_2$CaCu$_2$O$_8$

Authors: K. -K. Voo, W. C. Wu, H. -Y. Chen, C. -Y. Mou

Abstract: Based on a Fermi liquid picture, the temperature effect on the impurity-induced spatial modulation of the local density of states (LDOS) is investigated for the d-wave superconductor Bi$_2$Sr$_2$CaCu$_2$O$_8$, in the context of scanning tunneling microscopy (STM). It is found that a stripe-like structure exists even in the normal state due to a local-nesting mechanism, which is different from the octet scattering mechanism proposed by McElroy et al. [Nature 422, 592 (2003)] for the d-wave superconducting ($d$SC) state. The normal-state spectra, when Fourier-transformed into reciprocal space, can reveal information about the entire Fermi surface at a single measuring bias, in contrast to the point-wise tracing proposed by McElroy et al. This may serve as another way to check the reality of Landau quasiparticles in the normal state. We have also revisited the spectra in the $d$SC state and pointed out that, due to the Umklapp symmetry of the lattice, there should exist additional peaks in reciprocal space, which have yet to be found experimentally.

Submitted 7 August, 2003; originally announced August 2003.

Comments: 5 pages, 5 figures

arXiv:cond-mat/0304675 [pdf, ps, other]

Comment on ``Relating atomic-scale electronic phenomena to wave-like quasiparticle states in superconducting Bi$_2$Sr$_2$CaCu$_2$O$_{8+δ}$''

Authors: Khee-Kyun Voo, Hong-Yi Chen, W. C. Wu

Abstract: Comment on ``Relating atomic-scale electronic phenomena to wave-like quasiparticle states in superconducting Bi$_2$Sr$_2$CaCu$_2$O$_{8+δ}$''

Submitted 29 April, 2003; originally announced April 2003.

Comments: 1 page, 1 figure

arXiv:cond-mat/0302473 [pdf, ps, other]

doi 10.1103/PhysRevB.68.012505

Defect and anisotropic gap induced quasi-one-dimensional modulation of local density of states in YBa$_2$Cu$_3$O$_{7-δ}$

Authors: Khee-Kyun Voo, Hong-Yi Chen, W. C. Wu

Abstract: Motivated by a recent angle-resolved photoemission spectroscopy (ARPES) measurement showing that superconducting YBa$_2$Cu$_3$O$_{7-δ}$ (YBCO) exhibits a $d_{x^2-y^2} + s$-symmetry gap, we show possible quasi-one-dimensional modulations of the local density of states in YBCO. These anisotropic-gap- and defect-induced stripe structures are most conspicuous at higher biases and arise from the nesting effect associated with a Fermi liquid. Observation of these spectra by scanning tunneling microscopy (STM) would unify the picture among STM, ARPES, and inelastic neutron scattering for YBCO.

Submitted 24 February, 2003; originally announced February 2003.

Comments: 4 pages, 4 figures

arXiv:cond-mat/0006312 [pdf, ps, other]

Spin Excitation in d-wave Superconductors : A Fermi Liquid Picture

Authors: K. -K. Voo, H. -Y. Chen, W. C. Wu

Abstract: A detailed study of the Inelastic Neutron Scattering (INS) spectra of the high-$T_c$ cuprates based on the Fermi liquid (FL) picture is given. We focus on the issue of the transformation between the commensurate and incommensurate (IC) excitation driven by frequency or temperature. For La$_{2-x}$Sr$_x$CuO$_4$ (LSCO), the condition of small $Δ(0)/v_F a$ (where $a$ is the lattice constant, henceforth set to 1) can simultaneously reproduce the IC peaks that always exist in both the superconducting (SC) and normal states, and their fixed location under changes in temperature or frequency. For YBa$_2$Cu$_3$O$_{6+x}$ (YBCO), a moderate $Δ(0)/v_F a$ and the proximity of the van Hove singularity (vHS) at ${\bar M}=(0,π)$ to the Fermi level can reproduce the frequency- and temperature-driven shifting of the IC peaks in the SC state, and the vanishing of the IC peak in the normal state. The commensurate peak is found to be more appropriately described as a random phase approximation (RPA) effect. We attribute the conditional peak-shifting behavior to a refined consideration of the nesting effect, which was previously overlooked. As a result, both the data on LSCO and the recent data on YBCO (on YBa$_2$Cu$_3$O$_{6.7}$ by Arai et al. and YBa$_2$Cu$_3$O$_{6.85}$ by Bourges et al.) can be reasonably reconciled within a FL picture.
We also point out that the one-dimensional-like data by Mook et al. on a detwinned and more underdoped sample, YBa$_2$Cu$_3$O$_{6.6}$, could be due to a gap anisotropy effect discussed by Rendell and Carbotte, and we suggest a way of clarifying this.

Submitted 12 January, 2001; v1 submitted 21 June, 2000; originally announced June 2000.

Comments: 13 pages, 13 figures, revised version

arXiv:cond-mat/9911321 [pdf, ps, other]

doi 10.1016/S0921-4534(00)01505-7

Commensurate and Incommensurate Spin Fluctuations in YBa_2Cu_3O_{6+y}

Authors: K. -K. Voo, W. C. Wu

Abstract: We present an interpretation of the recent neutron data on the commensurate and incommensurate spin fluctuations found in YBa_2Cu_3O_{6+y}, based on a special configuration of the electronic dispersion and intervention from the d_{x^2-y^2}-wave superconducting phase. The observed switchover between the commensurate and incommensurate fluctuation spectra upon a change in frequency or temperature is naturally accounted for within this scenario.

Submitted 19 November, 1999; originally announced November 1999.

Comments: 4 pages and 4 figs

Showing 1–43 of 43 results for author: Vo, K