-
Determination of time dependent source terms for Stokes systems in unbounded domains
Authors:
Adel Blouza,
Léo Glangetas,
Yavar Kian,
Van-Sang Ngo
Abstract:
This article is devoted to the analysis of inverse source problems for Stokes systems in unbounded domains where the corresponding velocity flow is observed on a surface. Our main objective is to study the unique determination of a general class of time-dependent and vector-valued source terms with potentially unknown divergence. Taking into account the challenges inherent in this class of inverse source problems, we aim to identify the most precise conditions that ensure their resolution. Motivated by various fluid motion problems, we explore several classes of boundary measurements. Our proofs are based on different arguments, including unique continuation properties for Stokes systems, approximate controllability, complex analysis, and the use of explicit harmonic functions and explicit solutions for Stokes systems. This analysis is complemented by a reconstruction algorithm and examples of numerical computations.
Submitted 31 July, 2024;
originally announced July 2024.
-
Brain Tumor Segmentation in MRI Images with 3D U-Net and Contextual Transformer
Authors:
Thien-Qua T. Nguyen,
Hieu-Nghia Nguyen,
Thanh-Hieu Bui,
Thien B. Nguyen-Tat,
Vuong M. Ngo
Abstract:
This research presents an enhanced approach for precise segmentation of brain tumor masses in magnetic resonance imaging (MRI) using an advanced 3D U-Net model combined with a Contextual Transformer (CoT). By extending the CoT architecture to a 3D format, the proposed model integrates it smoothly with the base model to exploit the complex contextual information found in MRI scans, emphasizing how elements depend on each other across an extended spatial range. The proposed model synchronizes tumor mass characteristics from CoT, mutually reinforcing feature extraction and facilitating precise capture of detailed tumor mass structures, including location, size, and boundaries. Experimental results show strong segmentation performance relative to current state-of-the-art approaches, with Dice scores of 82.0%, 81.5%, and 89.0% for Enhancing Tumor, Tumor Core, and Whole Tumor, respectively, on BraTS2019.
Submitted 11 July, 2024;
originally announced July 2024.
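The Dice scores above compare predicted and ground-truth masks region by region. A minimal sketch of that metric, assuming flat lists of integer voxel labels and the usual BraTS label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor); the helper names are illustrative and are not taken from the paper:

```python
# Assumed BraTS label convention: 1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor.
REGIONS = {
    "ET": {4},        # Enhancing Tumor
    "TC": {1, 4},     # Tumor Core
    "WT": {1, 2, 4},  # Whole Tumor
}

def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as a perfect match
    return 2.0 * intersection / total

def region_dice(pred_labels, true_labels):
    """Dice score per tumor region, each region being a union of labels."""
    return {
        name: dice_score([int(l in labels) for l in pred_labels],
                         [int(l in labels) for l in true_labels])
        for name, labels in REGIONS.items()
    }
```

Because the regions are nested (ET inside TC inside WT), a single wrong voxel label can affect several of the three scores at once.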
-
Automating Attendance Management in Human Resources: A Design Science Approach Using Computer Vision and Facial Recognition
Authors:
Bao-Thien Nguyen-Tat,
Minh-Quoc Bui,
Vuong M. Ngo
Abstract:
Haar Cascade is a cost-effective and user-friendly machine-learning-based algorithm for detecting objects in images and videos. Unlike deep learning algorithms, which typically require significant resources and high computing costs, it uses simple image processing techniques such as edge detection and Haar features that are easy to understand and implement. By combining Haar Cascade with OpenCV2 on an embedded computer such as the NVIDIA Jetson Nano, this system can accurately detect faces and match them against a database for attendance tracking. The system aims to achieve several specific objectives that set it apart from existing solutions. It leverages Haar Cascade, enriched with carefully selected Haar features such as Haar-like wavelets, and employs advanced edge detection techniques. These techniques enable precise face detection and matching in both images and videos, contributing to high accuracy and robust performance. In doing so, the system minimizes manual intervention and reduces errors, thereby strengthening accountability. Additionally, the integration of OpenCV2 and the NVIDIA Jetson Nano optimizes processing efficiency, making the system suitable for resource-constrained environments. It caters to a diverse range of educational institutions, including schools, colleges, and vocational training centers, and to various workplace settings such as small businesses, offices, and factories. ... The system's affordability and efficiency democratize attendance management technology, making it accessible to a broader audience. Consequently, it has the potential to transform attendance tracking and management practices, ultimately leading to heightened productivity and accountability. In conclusion, this system represents a groundbreaking approach to attendance tracking and management...
Submitted 21 May, 2024;
originally announced May 2024.
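Haar-like features are typically evaluated in constant time from an integral image, as in the Viola-Jones detector on which OpenCV's Haar cascade classifiers are based. A minimal sketch of that computation on a grayscale image stored as nested lists; the function names are illustrative and this is not the system's code:

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1], padded with a leading zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle (edge) Haar-like feature: left half minus right half; w should be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

The integral image is built once per frame; after that every rectangle sum, and hence every Haar-like feature, costs a fixed number of lookups regardless of rectangle size, which is what keeps cascades cheap on devices like the Jetson Nano.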
-
Graph-Based Optimisation of Network Expansion in a Dockless Bike Sharing System
Authors:
Mark Roantree,
Niamh Murphi,
Dinh Viet Cuong,
Vuong Minh Ngo
Abstract:
Bike-sharing systems (BSSs) are deployed in over a thousand cities worldwide and play an important role in many urban transportation systems. BSSs alleviate congestion, reduce pollution and promote physical exercise. It is essential to explore the spatiotemporal patterns of bike-sharing demand, as well as the factors that influence these patterns, in order to optimise system operational efficiency. In this study, an optimised geo-temporal graph is constructed using trip data from Moby Bikes, a dockless BSS operator. The process of optimising the graph unveiled prime locations for erecting new stations during future expansions of the BSS. The Louvain algorithm, a community detection technique, is employed to uncover usage patterns at different levels of temporal granularity. The community detection results reveal largely self-contained sub-networks that exhibit similar usage patterns at their respective levels of temporal granularity. Overall, this study reinforces that BSSs are intrinsically spatiotemporal systems, with community presence driven by spatiotemporal dynamics. These findings may aid operators in improving redistribution efficiency.
Submitted 28 March, 2024;
originally announced April 2024.
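The Louvain method repeatedly moves nodes between communities, and then merges communities into super-nodes, whenever doing so increases the modularity Q of the partition. A minimal sketch of that objective for an undirected weighted graph (for example, stations linked by trip counts; this encoding is an assumption, not the paper's geo-temporal graph construction):

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: dict mapping node -> community id.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ],
    where m is the total edge weight, L_c the weight inside c, d_c the total degree of c.
    """
    m = 0.0
    intra = defaultdict(float)   # L_c
    degree = defaultdict(float)  # d_c
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            intra[community[u]] += w
    if m == 0:
        return 0.0
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, splitting them into two communities gives Q = 5/14, while putting every node in one community gives Q = 0. For real networks a library implementation such as NetworkX's `louvain_communities` would normally be used rather than hand-writing the optimiser.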
-
Charting Ethical Tensions in Multispecies Technology Research through Beneficiary-Epistemology Space
Authors:
Steve Benford,
Clara Mancini,
Alan Chamberlain,
Eike Schneiders,
Simon Castle-Green,
Joel Fischer,
Ayse Kucukyilmaz,
Guido Salimbeni,
Victor Ngo,
Pepita Barnard,
Matt Adams,
Nick Tandavanitj,
Ju Row Farr
Abstract:
While ethical challenges are widely discussed in HCI, far less is reported about the ethical processes that researchers routinely navigate. We reflect on a multispecies project that negotiated an especially complex ethical approval process. Cat Royale was an artist-led project to create an artwork that engages audiences in exploring trust in autonomous systems. The artwork took the form of a robot that played with three cats. Gaining ethical approval required an extensive dialogue with three Institutional Review Boards (IRBs) covering computer science, veterinary science, and animal welfare, raising tensions around the welfare of the cats, perceived benefits and appropriate methods, and reputational risk to the University. To reveal these tensions, we introduce beneficiary-epistemology space, which makes explicit who benefits from research (humans or animals) and the underlying epistemologies. Positioning projects and IRBs in this space can help clarify tensions and highlight opportunities to recruit additional expertise.
Submitted 23 February, 2024;
originally announced February 2024.
-
Designing Multispecies Worlds for Robots, Cats, and Humans
Authors:
Eike Schneiders,
Steve Benford,
Alan Chamberlain,
Clara Mancini,
Simon Castle-Green,
Victor Ngo,
Ju Row Farr,
Matt Adams,
Nick Tandavanitj,
Joel Fischer
Abstract:
We reflect on the design of a multispecies world centred around a bespoke enclosure in which three cats and a robot arm coexist for six hours a day during a twelve-day installation, as part of an artist-led project. In this paper, we present the project's design process, encompassing various interconnected components, including the cats, the robot and its autonomous systems, the custom end-effectors and robot attachments, the diverse roles of the humans-in-the-loop, and the custom-designed enclosure. We then provide a detailed account of key moments during the deployment and discuss the design implications for future multispecies systems. Specifically, we argue that designing the technology and its interactions is not sufficient; it is equally important to consider the design of the 'world' in which the technology operates. Finally, we highlight the necessity of human involvement in areas such as breakdown recovery and animal welfare, and in the role of audience.
Submitted 23 February, 2024;
originally announced February 2024.
-
Remark on the Entropy Production of Adaptive Run-and-Tumble Chemotaxis
Authors:
Minh D. N. Nguyen,
Phuc H. Pham,
Khang V. Ngo,
Van H. Do,
Shengkai Li,
Trung V. Phan
Abstract:
Chemotactic active particles, such as bacteria and cells, exhibit an adaptive run-and-tumble motion, giving rise to complex emergent behaviors in response to external chemical fields. This motion is generated by the conversion of internal chemical energy into self-propulsion, allowing each agent to sustain a steady state far from thermal equilibrium and perform work. The rate of entropy production serves as an indicator of how far these agents operate from thermal equilibrium, providing a measure for estimating the maximum obtainable power. Here we present a general framework for calculating, from first principles, the entropy production rate created by such a population of agents, using the minimal model of bacterial adaptive chemotaxis, as they execute the most basic collective action: mass transport.
Submitted 27 January, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
Machine Learning-Based Intrusion Detection: Feature Selection versus Feature Extraction
Authors:
Vu-Duc Ngo,
Tuan-Cuong Vuong,
Thien Van Luong,
Hung Tran
Abstract:
The Internet of Things (IoT) plays an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakage. To effectively prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed; these often rely on either feature extraction or feature selection to reduce the dimension of the input data before it is fed into machine learning models. This keeps the detection complexity low enough for real-time operation, which is vital in any intrusion detection system. This paper provides a comprehensive comparison between these two feature reduction methods for intrusion detection in terms of various performance metrics, namely precision, recall, detection accuracy, and runtime complexity, on the modern UNSW-NB15 dataset for both binary and multiclass classification. In general, the feature selection method provides not only better detection performance but also lower training and inference time than its feature extraction counterpart, especially as the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as K = 4. Additionally, feature extraction is less sensitive to changes in the number of reduced features K than feature selection, and this holds for both binary and multiclass classification. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Tab. 14 at the end of Section IV.
Submitted 4 July, 2023;
originally announced July 2023.
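The comparison above is reported in terms of precision, recall, and accuracy for both binary and multiclass classification. A minimal sketch of those metrics, assuming macro-averaging over classes (the abstract does not say which averaging the paper uses, and the function name is illustrative):

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) over every class that appears."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return accuracy, sum(precisions) / len(classes), sum(recalls) / len(classes)
```

The same function covers the binary case (labels 0 = normal, 1 = attack) and the multiclass case (one label per attack category). Many intrusion detection papers instead report precision and recall for the attack class only, which can give different numbers on imbalanced data such as UNSW-NB15.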
-
Improving Estimation of the Koopman Operator with Kolmogorov-Smirnov Indicator Functions
Authors:
Van A. Ngo,
Yen Ting Lin,
Danny Perez
Abstract:
It has become common to perform kinetic analysis using approximate Koopman operators that transform high-dimensional time series of observables into ranked dynamical modes. Key to the practical success of the approach is the identification of a set of observables that form a good basis in which to expand the slow relaxation modes. Good observables are, however, difficult to identify a priori, and sub-optimal choices can lead to significant underestimation of characteristic timescales. Leveraging the representation of slow dynamics in terms of a Hidden Markov Model (HMM), we propose a simple and computationally efficient clustering procedure to infer surrogate observables that form a good basis for slow modes. We apply the approach to an analytically solvable model system, as well as to three protein systems of different complexities. We consistently demonstrate that the inferred indicator functions can significantly improve the estimation of the leading eigenvalues of the Koopman operators and correctly identify key states and transition timescales of stochastic systems, even when good observables are not known a priori.
Submitted 9 June, 2023;
originally announced June 2023.
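To see why a poor basis underestimates timescales, recall that a Koopman eigenvalue λ at lag τ corresponds to an implied relaxation time t = -τ / ln λ, so an underestimated λ2 gives a shorter t (for reversible dynamics, the variational principle guarantees that a finite basis does not overestimate λ2). A minimal sketch with hard two-state indicator functions, where the finite-basis Koopman estimate reduces to a row-normalised transition matrix; the state assignment is taken as given, and the paper's Kolmogorov-Smirnov clustering is not reproduced:

```python
import math

def transition_matrix(traj, n_states, lag=1):
    """Row-normalised transition counts between discrete states at the given lag.

    With hard cluster indicator functions as observables, this matrix is the
    finite-basis estimate of the Koopman (transfer) operator at that lag.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for i in range(len(traj) - lag):
        counts[traj[i]][traj[i + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def implied_timescale_two_state(traj, lag=1, dt=1.0):
    """Relaxation time t = -lag * dt / ln(lambda_2) for a two-state model."""
    T = transition_matrix(traj, 2, lag)
    lambda_2 = T[0][0] + T[1][1] - 1.0  # the non-unit eigenvalue of a 2x2 stochastic matrix
    if not 0.0 < lambda_2 < 1.0:
        raise ValueError("no slow relaxation is resolved at this lag")
    return -lag * dt / math.log(lambda_2)
```

For more than two states, λ2 would come from a general eigenvalue solver rather than the 2x2 trace identity used here.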
-
Fast and Efficient Malware Detection with Joint Static and Dynamic Features Through Transfer Learning
Authors:
Mao V. Ngo,
Tram Truong-Huu,
Dima Rabadi,
Jia Yi Loo,
Sin G. Teo
Abstract:
In malware detection, dynamic analysis extracts the runtime behavior of malware samples in a controlled environment, while static analysis extracts features using reverse engineering tools. The former faces the challenges of anti-virtualization and evasive behavior of malware samples, and the latter faces the challenge of code obfuscation. To tackle these drawbacks, prior works proposed detection models that aggregate dynamic and static features, thus leveraging the advantages of both approaches. However, simply concatenating dynamic and static features leads to imbalanced contributions to the performance of malware detection models, because the feature vectors have heterogeneous dimensions. Moreover, dynamic analysis is time-consuming and requires a secure environment, leading to detection delays and high costs for maintaining the analysis infrastructure. In this paper, we first introduce a method of constructing aggregated features by concatenating latent features, learned through deep learning, with equally contributing dimensions. We then develop a knowledge distillation technique to transfer knowledge learned from aggregated features by a teacher model to a student model trained only on static features, and use the trained student model to detect new malware samples. We carry out extensive experiments with a dataset of 86709 samples, including both benign and malware samples. The experimental results show that the teacher model trained on aggregated features constructed by our method outperforms state-of-the-art models, with an improvement of up to 2.38% in detection accuracy. The distilled student model not only achieves high accuracy (97.81%), comparable to that of the teacher model, but also significantly reduces the detection time (from 70046.6 ms to 194.9 ms) without requiring dynamic analysis.
Submitted 24 November, 2022;
originally announced November 2022.
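The teacher-to-student transfer described above is a form of knowledge distillation. A minimal sketch of the widely used Hinton-style distillation loss for one sample follows; the paper's exact loss, temperature, and weighting are not given in the abstract, so `temperature=4.0` and `alpha=0.5` are illustrative assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * cross-entropy on the hard label
    + (1 - alpha) * T^2 * KL(teacher_T || student_T) on temperature-softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * kl
```

In the setting described, the teacher logits would come from a model on aggregated static and dynamic features, while the student logits come from a model that sees static features only (for example, class 0 = benign and 1 = malware, a hypothetical encoding). The T² factor keeps the soft-target gradients on a scale comparable to the hard-label term as the temperature changes.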
-
Deep Neural Network-Based Detector for Single-Carrier Index Modulation NOMA
Authors:
Toan Gian,
Vu-Duc Ngo,
Tien-Hoa Nguyen,
Trung Tan Nguyen,
Thien Van Luong
Abstract:
In this paper, a deep neural network (DNN)-based detector is proposed for an uplink single-carrier index modulation non-orthogonal multiple access (SC-IM-NOMA) system, in which users share the same set of subcarriers to transmit data modulated by the subcarrier index modulation technique. More specifically, SC-IM-NOMA users simultaneously transmit their SC-IM data at different power levels, which are then exploited at the receiver to perform successive interference cancellation (SIC) multi-user detection. Existing detectors designed for SC-IM-NOMA, such as the joint maximum-likelihood (JML) detector and the maximum-likelihood SIC-based (ML-SIC) detector, suffer from high computational complexity. To address this issue, we propose a DNN-based detector whose structure relies on model-based SIC to jointly detect both the M-ary symbols and the index bits of all users after being trained with sufficient simulated data. Simulation results demonstrate that the proposed DNN-based detector attains near-optimal error performance with significantly lower runtime complexity than the existing hand-crafted detectors.
Submitted 20 September, 2022;
originally announced September 2022.
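The SIC principle on which the proposed detector's structure relies can be seen in a toy two-user power-domain NOMA example. The receiver first detects the high-power user's symbol, treating the other user as interference, then subtracts it and detects the low-power user. The sketch assumes BPSK symbols, one known real channel gain, and no index modulation, so it illustrates model-based SIC only and is not the paper's SC-IM-NOMA detector:

```python
import math

def bpsk_decide(value):
    """Hard BPSK decision on a real-valued sample."""
    return 1 if value >= 0 else -1

def sic_detect(y, h, p_strong, p_weak):
    """Two-user SIC on y = h * (sqrt(p_strong) * x1 + sqrt(p_weak) * x2) + noise.

    p_weak is not needed for hard BPSK decisions; it is kept to mirror the signal model.
    """
    z = y / h                                    # equalise the known channel gain
    x_strong = bpsk_decide(z)                    # weak user treated as interference
    residual = z - math.sqrt(p_strong) * x_strong  # cancel the detected strong signal
    x_weak = bpsk_decide(residual)
    return x_strong, x_weak
```

If the first decision is wrong, the cancellation adds interference instead of removing it, so errors propagate to the second user. This cost of sequential hard decisions is one reason joint ML or learned detectors are studied.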
-
Deep Learning-Based Signal Detection for Dual-Mode Index Modulation 3D-OFDM
Authors:
Dang-Y Hoang,
Tien-Hoa Nguyen,
Vu-Duc Ngo,
Trung Tan Nguyen,
Nguyen Cong Luong,
Thien Van Luong
Abstract:
In this paper, we propose a deep learning-based signal detector called DuaIM-3DNet for dual-mode index modulation-based three-dimensional (3D) orthogonal frequency division multiplexing (DM-IM-3D-OFDM). DM-IM-3D-OFDM is a subcarrier index modulation scheme that conveys data bits via both dual-mode 3D constellation symbols and the indices of active subcarriers. This scheme therefore achieves better error performance than existing IM schemes when the conventional maximum likelihood (ML) detector is used; that detector, however, suffers from high computational complexity, especially as the system parameters increase. To address this fundamental issue, we propose using a deep neural network (DNN) at the receiver to jointly and reliably detect both the symbols and the index bits of DM-IM-3D-OFDM under Rayleigh fading channels in a data-driven manner. Simulation results demonstrate that our proposed DNN detector achieves near-optimal performance at significantly lower runtime complexity than the ML detector.
Submitted 20 September, 2022;
originally announced September 2022.
-
Long-Short History of Gradients is All You Need: Detecting Malicious and Unreliable Clients in Federated Learning
Authors:
Ashish Gupta,
Tie Luo,
Mao V. Ngo,
Sajal K. Das
Abstract:
Federated learning offers a framework for training a machine learning model in a distributed fashion while preserving the privacy of the participants. As the server cannot govern the clients' actions, nefarious clients may attack the global model by sending malicious local gradients. Meanwhile, there may also be unreliable clients that are benign but each hold a portion of low-quality training data (e.g., blurred or low-resolution images), and thus may appear similar to malicious clients. A defense mechanism therefore needs to perform a three-fold differentiation, which is much more challenging than the conventional (two-fold) case. This paper introduces MUD-HoG, a novel defense algorithm that addresses this challenge in federated learning using a long-short history of gradients, and treats the detected malicious and unreliable clients differently. In addition, we can distinguish between targeted and untargeted attacks among malicious clients, unlike most prior works, which consider only one type of attack. Specifically, we take into account sign-flipping, additive-noise, label-flipping, and multi-label-flipping attacks under a non-IID setting. We evaluate MUD-HoG against six state-of-the-art methods on two datasets. The results show that MUD-HoG outperforms all of them in terms of accuracy, precision, and recall in the presence of a mixture of multiple (four) types of attackers as well as unreliable clients. Moreover, unlike most prior works, which can tolerate only a low population of harmful users, MUD-HoG can work with and successfully detect a wide range of malicious and unreliable clients, up to 47.5% and 10% of the total population, respectively. Our code is open-sourced at https://github.com/LabSAINT/MUD-HoG_Federated_Learning.
Submitted 14 August, 2022;
originally announced August 2022.
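The value of a long gradient history can be illustrated with one hypothetical ingredient: accumulate each client's gradients over all rounds and flag clients whose accumulated direction points away from the coordinate-wise median, which is the signature of a sign-flipping attacker. The sketch below is a toy built on assumptions, not the published MUD-HoG algorithm (see the linked repository for that), and the function names and zero threshold are illustrative:

```python
import math

def coordinate_median(vectors):
    """Coordinate-wise median of equal-length vectors."""
    med = []
    for coords in zip(*vectors):
        s = sorted(coords)
        n = len(s)
        med.append(s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2]))
    return med

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def flag_sign_flippers(histories, threshold=0.0):
    """histories: client id -> list of per-round gradient vectors.

    Sums each client's gradients over all rounds (a 'long' history) and flags
    clients whose sum has cosine similarity below `threshold` with the median sum.
    """
    long_hist = {cid: [sum(c) for c in zip(*grads)] for cid, grads in histories.items()}
    med = coordinate_median(list(long_hist.values()))
    return sorted(cid for cid, g in long_hist.items() if cosine(g, med) < threshold)
```

Summing over many rounds averages out round-to-round noise, so a consistent sign flip stands out even when single-round gradients look plausible. Using the median rather than the mean keeps the reference direction robust while attackers remain a minority.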
-
Generalized BER of MCIK-OFDM with Imperfect CSI: Selection combining GD versus ML receivers
Authors:
Vu-Duc Ngo,
Thien Van Luong,
Nguyen Cong Luong,
Minh-Tuan Le,
Thi Thanh Huyen Le,
Xuan-Nam Tran
Abstract:
This paper analyzes the bit error rate (BER) of multicarrier index keying orthogonal frequency division multiplexing (MCIK-OFDM) with selection combining (SC) diversity reception. In particular, we propose a generalized framework to derive the BER for both the low-complexity greedy detector (GD) and the maximum likelihood (ML) detector. Based on this framework, closed-form expressions for the BERs of MCIK-OFDM with SC using either the ML detector or the GD are derived in the presence of imperfect channel state information (CSI). An asymptotic analysis is presented to gain insight into the effects of different CSI conditions on the BERs of these two detectors. More importantly, we theoretically identify opportunities for using the GD instead of the ML detector under each specific CSI uncertainty; these depend on the number of receive antennas and the M-ary modulation size. Finally, extensive simulation results are provided to validate our theoretical expressions and analysis.
Submitted 28 July, 2022;
originally announced July 2022.
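The greedy detector discussed above avoids the ML detector's joint search over all index patterns and symbols. It takes the K subcarriers with the largest received energy as active and then detects each M-ary symbol separately. A minimal sketch for one subblock follows. It assumes per-subcarrier selection combining (keeping the receive branch with the largest channel gain), perfect CSI, and noiseless test data, and it does not model the imperfect-CSI case the paper analyses. The function names are illustrative:

```python
def selection_combine(ys, hs):
    """Per subcarrier, keep the receive branch whose channel gain has the largest magnitude.

    ys, hs: one list per receive antenna, each with one entry per subcarrier.
    """
    y_sc, h_sc = [], []
    for i in range(len(ys[0])):
        best = max(range(len(hs)), key=lambda r: abs(hs[r][i]))
        y_sc.append(ys[best][i])
        h_sc.append(hs[best][i])
    return y_sc, h_sc

def greedy_detect(y, h, k, constellation):
    """Greedy detection: k highest-energy subcarriers are active, then per-subcarrier ML symbol decisions."""
    energies = [abs(v) ** 2 for v in y]
    active = sorted(sorted(range(len(y)), key=lambda i: energies[i], reverse=True)[:k])
    symbols = [min(constellation, key=lambda s: abs(y[i] - h[i] * s) ** 2) for i in active]
    return active, symbols
```

For example, with QPSK points `[1+1j, -1+1j, -1-1j, 1-1j]` and two of four subcarriers active, the index decision is a single energy sort instead of a search over every index pattern combined with every symbol pair. The GD's weakness is that a deep fade on an active subcarrier can make it look inactive, which is why its BER relative to ML depends on diversity and CSI quality.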
-
Enhancing Diversity of OFDM with Joint Spread Spectrum and Subcarrier Index Modulations
Authors:
Vu-Duc Ngo,
Thien Van Luong,
Nguyen Cong Luong,
Mai Xuan Trang,
Minh-Tuan Le,
Thi Thanh Huyen Le,
Xuan-Nam Tran
Abstract:
This paper proposes a novel spread spectrum and sub-carrier index modulation (SS-SIM) scheme, which is integrated into the orthogonal frequency division multiplexing (OFDM) framework to enhance diversity over conventional IM schemes. In particular, the resulting scheme, called SS-SIM-OFDM, jointly employs spread spectrum and sub-carrier index modulations to form a precoding vector, which is then used to spread an M-ary complex symbol across all active sub-carriers. As a result, the proposed scheme enables transmission over three signal domains: SS indices, sub-carrier indices, and a single M-ary symbol. For practical implementations, two reduced-complexity near-optimal detectors are proposed, whose complexities depend less on the M-ary modulation size. Then, the bit error probability and its upper bound are analyzed to gain insight into the diversity gain, which is shown to be strongly affected by the order of sub-carrier indices. Based on this observation, we propose two novel sub-carrier index mapping methods, which significantly increase the diversity gain of SS-SIM-OFDM. Finally, simulation results show that our scheme achieves better error performance than the benchmarks, at the cost of lower spectral efficiency than classical OFDM and OFDM-IM, which can carry multiple M-ary symbols.
Submitted 28 July, 2022;
originally announced July 2022.
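For context, conventional OFDM-IM maps index bits to a set of active sub-carriers, often with the combinatorial number system. The sketch below shows that standard mapping only. The abstract states that SS-SIM-OFDM uses two new index mapping methods chosen to increase diversity, and this sketch does not reproduce them. The function names are illustrative:

```python
from math import comb

def index_bits(n, k):
    """Index bits carried by choosing k active sub-carriers out of n: floor(log2 C(n, k))."""
    return comb(n, k).bit_length() - 1

def bits_to_active_indices(z, n, k):
    """Map an integer z in [0, C(n, k)) to a unique set of k active sub-carrier indices.

    Uses the combinatorial number system: z = C(c_k, k) + ... + C(c_1, 1)
    with n > c_k > ... > c_1 >= 0.
    """
    if not 0 <= z < comb(n, k):
        raise ValueError("z out of range")
    active = []
    for i in range(k, 0, -1):
        c = n - 1
        while comb(c, i) > z:  # find the largest c with C(c, i) <= z
            c -= 1
        active.append(c)
        z -= comb(c, i)
    return sorted(active)
```

With n = 4 and k = 2, only 2 of the C(4, 2) = 6 possible patterns are addressable by whole bits, so z ranges over 0..3 and maps to {0,1}, {0,2}, {1,2}, {0,3}. Which patterns are kept, and how they are ordered, is the design freedom that index mapping methods exploit.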
-
Hydrostatic limit of the Navier-Stokes-alpha model
Authors:
Léo Glangetas,
Van-Sang Ngo,
El Mehdi Said
Abstract:
In this paper we study the hydrostatic limit of the Navier-Stokes-alpha model in a very thin striped domain. We derive some Prandtl-type limit equations for this model and we prove the global well-posedness of the limit system for small initial conditions in an appropriate analytic function space.
Submitted 4 October, 2021;
originally announced October 2021.
-
Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach
Authors:
Mao V. Ngo,
Tie Luo,
Tony Q. S. Quek
Abstract:
Advances in deep neural networks (DNNs) have significantly enhanced real-time detection of anomalous data in IoT applications. However, the complexity-accuracy-delay dilemma persists: complex DNN models offer higher accuracy, but typical IoT devices can barely afford the computational load, and the remedy of offloading it to the cloud incurs long delays. In this paper, we address this challenge by proposing an adaptive anomaly detection scheme with hierarchical edge computing (HEC). Specifically, we first construct multiple anomaly detection DNN models of increasing complexity and associate each of them with a corresponding HEC layer. Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved using a reinforcement learning policy network. We also incorporate a parallelism policy training method to accelerate training by taking advantage of distributed models. We build an HEC testbed using real IoT devices, and implement and evaluate our contextual-bandit approach with both univariate and multivariate IoT datasets. Compared with both baseline and state-of-the-art schemes, our adaptive approach strikes the best accuracy-delay tradeoff on the univariate dataset, and achieves the best accuracy and F1-score on the multivariate dataset with only negligibly longer delay than the best (but inflexible) scheme.
Submitted 9 August, 2021;
originally announced August 2021.
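The abstract formulates per-sample model selection as a contextual bandit solved with a reinforcement learning policy network. As a much simpler stand-in, the sketch below uses a tabular epsilon-greedy bandit over discretised contexts, with one arm per HEC layer. The class name, the context encoding, and the reward (correctness minus a hypothetical delay penalty) are all assumptions, not the paper's design:

```python
import random

class EpsilonGreedyModelSelector:
    """Tabular epsilon-greedy contextual bandit: one arm per detection model / HEC layer."""

    def __init__(self, n_models, epsilon=0.1, seed=0):
        self.n_models = n_models
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, arm) -> number of times the arm was chosen
        self.values = {}  # (context, arm) -> running mean reward

    def select(self, context):
        """Explore with probability epsilon, otherwise pick the best-known arm for this context."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_models)
        return max(range(self.n_models), key=lambda a: self.values.get((context, a), 0.0))

    def update(self, context, arm, reward):
        """Incrementally update the mean reward of (context, arm)."""
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.values.get(key, 0.0)
        self.values[key] = old + (reward - old) / n

def reward(correct, delay_ms, delay_weight=0.01):
    """Hypothetical reward: 1 for a correct detection, minus a linear delay penalty."""
    return (1.0 if correct else 0.0) - delay_weight * delay_ms
```

Untried arms start with an estimated reward of 0, so with a small epsilon the selector can settle early on a cheap local model. A learned policy network, as in the paper, can instead generalise across continuous contexts that a lookup table cannot.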
-
Enhanced nonlinear quantum metrology with weakly coupled solitons and particle losses
Authors:
Alexander Alodjants,
Dmitriy Tsarev,
The Vinh Ngo,
Ray-Kuang Lee
Abstract:
The estimation of physical parameters with Heisenberg sensitivity and beyond is one of the crucial problems of current quantum metrology. However, unavoidable losses are commonly believed to be the main obstacle when applying fragile quantum states. To address lossy quantum metrology, we offer an interferometric procedure for phase parameter estimation at the Heisenberg (up to 1/N) and super-Heisenberg (up to 1/N^3) scaling levels in the framework of the linear and nonlinear metrology approaches, respectively. The heart of our setup is the novel soliton Josephson junction (SJJ) system, which provides the formation of the quantum probe, i.e., the entangled Fock (N00N-like) state, beyond the superfluid-Mott insulator quantum phase transition point. We illustrate that such states are close to the optimal ones even with moderate losses. The enhancement of phase estimation accuracy remains feasible for both the linear and nonlinear metrologies with SJJs, and allows further improvement of current experiments performed with atomic condensate solitons with a mesoscopic number of particles.
Submitted 7 August, 2021;
originally announced August 2021.
-
Commuting Varieties and Cohomological Complexity
Authors:
Nham V. Ngo,
Paul D. Levy,
Klemen Šivic
Abstract:
In this paper we determine, for all $r$ sufficiently large, the irreducible component(s) of maximal dimension of the variety of commuting $r$-tuples of nilpotent elements of $\mathfrak{gl}_n$. Our main result is that in characteristic $\neq 2,3$, this nilpotent commuting variety has dimension $(r+1)\lfloor \frac{n^2}{4}\rfloor$ for $n\geq 4$, $r\geq 7$. We use this to find the dimension of the (ordinary) $r$-th commuting varieties of $\mathfrak{gl}_n$ and $\mathfrak{sl}_n$ for the same range of values of $r$ and $n$.
Our principal motivation is the connection between nilpotent commuting varieties and cohomological complexity of finite group schemes, which we exploit in the last section of the paper to obtain explicit values for complexities of a large family of modules over the $r$-th Frobenius kernel $({\rm GL}_n)_{(r)}$. These results indicate an inequality between the complexities of a rational $G$-module $M$ when restricted to $G_{(r)}$ or to $G(\mathbb F_{p^r})$; we subsequently establish this inequality for every simple algebraic group $G$ defined over an algebraically closed field of good characteristic, significantly extending a result of Lin and Nakano.
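As a quick worked example, the dimension formula $(r+1)\lfloor n^2/4\rfloor$ stated above can be evaluated directly; for $n=4$, $r=7$ it gives $8\cdot 4=32$. The helper below is hypothetical and simply encodes the stated formula, refusing inputs outside the range ($n\geq 4$, $r\geq 7$) for which the abstract claims it; the characteristic condition ($\neq 2,3$) is noted but not checked.

```python
def nilpotent_commuting_dim(n: int, r: int) -> int:
    """(r + 1) * floor(n^2 / 4): the dimension stated for the variety of commuting
    r-tuples of nilpotent elements of gl_n, claimed for n >= 4, r >= 7
    in characteristic != 2, 3 (the characteristic is not checked here)."""
    if n < 4 or r < 7:
        raise ValueError("formula is only claimed for n >= 4 and r >= 7")
    return (r + 1) * (n * n // 4)


for n in (4, 5, 6):
    print(n, nilpotent_commuting_dim(n, 7))  # 32, 48, 72
```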
Submitted 4 April, 2022; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Sampling by Divergence Minimization
Authors:
Ameer Dharamshi,
Vivian Ngo,
Jeffrey S. Rosenthal
Abstract:
We introduce a Markov Chain Monte Carlo (MCMC) method that is designed to sample from target distributions with irregular geometry using an adaptive scheme. In cases where targets exhibit non-Gaussian behaviour, we propose that adaptation should be regional rather than global. Our algorithm minimizes the information projection component of the Kullback-Leibler (KL) divergence between the proposal and target distributions to encourage proposals that are distributed similarly to the regional geometry of the target. Unlike traditional adaptive MCMC, this procedure rapidly adapts to the geometry of the target's current position as it explores the surrounding space, without the need for many preexisting samples. The divergence minimization algorithms are tested on target distributions with irregularly shaped modes, and we provide results demonstrating the effectiveness of our methods.
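To make the information projection concrete, here is a minimal sketch, assuming a one-dimensional Gaussian proposal centred at the chain's current position and a simple grid search over its scale. The function names and the candidate grid are hypothetical, and the paper's actual adaptation scheme is not reproduced. The sketch only shows how $\mathrm{KL}(q\,\|\,p)$ can be estimated from an unnormalised target: the unknown normalising constant merely shifts every estimate by the same $\log Z$, so it does not affect which scale wins.

```python
import math
import random


def reverse_kl_estimate(log_target, mu, sigma, z_samples):
    """Monte Carlo estimate of KL(q || p) - log Z for q = N(mu, sigma^2).

    Uses the reparameterisation x = mu + sigma * z with fixed standard normals z,
    so estimates for different sigma share the same randomness.
    """
    total = 0.0
    for z in z_samples:
        x = mu + sigma * z
        log_q = -math.log(sigma) - 0.5 * z * z - 0.5 * math.log(2 * math.pi)
        total += log_q - log_target(x)
    return total / len(z_samples)


def fit_local_scale(log_target, current, candidates, n=2000, seed=0):
    """Pick the proposal scale at `current` with the smallest estimated KL(q || p)."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return min(candidates, key=lambda s: reverse_kl_estimate(log_target, current, s, zs))
```

For an unnormalised $\mathcal{N}(0, 2^2)$ target, `fit_local_scale(lambda x: -x * x / 8.0, 0.0, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])` selects a scale of 2.0. Reusing the same standard normals for every candidate (common random numbers) keeps the comparison between scales from being dominated by Monte Carlo noise.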
Submitted 6 May, 2022; v1 submitted 2 May, 2021;
originally announced May 2021.
-
Structural Textile Pattern Recognition and Processing Based on Hypergraphs
Authors:
Vuong M. Ngo,
Sven Helmer,
Nhien-An Le-Khac,
M-Tahar Kechadi
Abstract:
The humanities, like many other areas of society, are currently undergoing major changes in the wake of digital transformation. However, to make collections of digitised material in this area easily accessible, we often still lack adequate search functionality. For instance, digital archives for textiles offer keyword search, which is fairly well understood, and arrange their content following a certain taxonomy, but search functionality at the level of thread structure is still missing. To facilitate clustering and search, we introduce an approach for recognising similar weaving patterns in textile archives based on their structures. We first represent textile structures using hypergraphs and extract multisets of k-neighbourhoods describing weaving patterns from these graphs. Then, the resulting multisets are clustered using various distance measures and various clustering algorithms (K-Means for simplicity and hierarchical agglomerative algorithms for precision). We evaluate the different variants of our approach experimentally, showing that it can be implemented efficiently (with linear complexity), and demonstrate its quality in querying and clustering datasets containing large textile samples. As, to the best of our knowledge, this is the first practical approach for explicitly modelling complex and irregular weaving patterns usable for retrieval, we aim to establish a solid baseline.
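A minimal sketch of the multiset-based comparison, assuming a much simpler representation than the paper's hypergraphs: the weave is a binary grid (1 where the warp thread lies over the weft), and each "k-neighbourhood" is the k x k window starting at a cell, with wraparound because weaving patterns repeat. The multiset Jaccard distance shown is one possible distance measure, not necessarily one of those used by the authors, and the function names are hypothetical.

```python
from collections import Counter


def k_neighbourhoods(grid, k):
    """Multiset of all k x k windows of a repeating weave (wraps at the edges)."""
    h, w = len(grid), len(grid[0])
    windows = Counter()
    for i in range(h):
        for j in range(w):
            window = tuple(
                tuple(grid[(i + di) % h][(j + dj) % w] for dj in range(k))
                for di in range(k)
            )
            windows[window] += 1
    return windows


def multiset_jaccard_distance(a: Counter, b: Counter) -> float:
    """1 - |A intersect B| / |A union B| with multiset (min/max count) semantics."""
    inter = sum((a & b).values())
    union = sum((a | b).values())
    return 1.0 - inter / union if union else 0.0


plain = [[1, 0], [0, 1]]
twill = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
print(multiset_jaccard_distance(k_neighbourhoods(plain, 2), k_neighbourhoods(twill, 2)))
```

Because a distance between multisets is all that K-Means-style or agglomerative clustering needs, the same pairwise distances could be fed to any clustering routine.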
Submitted 20 March, 2021;
originally announced March 2021.
-
Learning Hidden Chemistry with Deep Neural Networks
Authors:
Tien-Cuong Nguyen,
Van-Quyen Nguyen,
Van-Linh Ngo,
Quang-Khoat Than,
Tien-Lam Pham
Abstract:
We demonstrate a machine learning approach designed to extract hidden chemistry/physics to facilitate new materials discovery. In particular, we propose a novel method for learning latent knowledge from material structure data, in which machine learning models are developed to estimate the likelihood that an atom can be paired with a chemical environment in an observed material. For this purpose, we trained deep neural networks that take information from the atom of interest and its environment to estimate this likelihood. The models were then used to build recommendation systems that can suggest a list of atoms for an environment within a structure. The center atom of that environment was then replaced with the various recommended atoms to generate new structures. Based on these recommendations, we also propose a method for measuring dissimilarity between atoms and, through hierarchical cluster analysis and visualization using the multidimensional scaling algorithm, illustrate that this dissimilarity can reflect the chemistry of the elements. Finally, our models were applied to the discovery of new structures in the well-known magnetic material Nd$_2$Fe$_{14}$B. Our models propose 108 new structures, 71 of which are confirmed by first-principles calculations to converge to local-minimum-energy structures with formation energy less than 0.1 eV.
Submitted 31 July, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Mesoscopic quantum superposition states of weakly-coupled matter-wave solitons
Authors:
Dmitriy Tsarev,
Alexander Alodjants,
The Vinh Ngo,
Ray-Kuang Lee
Abstract:
Josephson junctions (JJs) are at the heart of modern quantum technologies and metrology. In this work we establish quantum features of an atomic soliton Josephson junction (SJJ) device, which consists of two weakly-coupled condensates with negative scattering length. The condensates are trapped in a double-well potential and elongated in one dimension. Starting with classical field theory, we map for the first time a two-soliton problem onto the effective two-mode Hamiltonian and perform a second quantization procedure. Compared to the conventional bosonic Josephson junction (BJJ) condensate system, we show that the SJJ model in the quantum domain exhibits unusual features due to its effective nonlinear strength, which is proportional to the square of the total particle number, $N^2$. A novel self-tuning effect for the effective tunneling parameter, which depends on the particle number and rapidly vanishes as the JJ population imbalance increases, is also demonstrated in the SJJ model. The formation of an entangled Fock state superposition is predicted for the quantum SJJ model, revealing dominant $N00N$-state components at the "edges" for $n=0, N$ particle number. We have shown that the obtained quantum state is more resistant to the loss of a few particles from the condensates if tiny components of entangled Fock states are present in the vicinity of the major $N00N$-state component. This peculiarity of the quantum SJJ model establishes an important difference from its semiclassical analogue obtained in the framework of the Hartree approach.
Submitted 26 November, 2020;
originally announced November 2020.
-
Bose-Einstein condensate soliton qubit states for metrological applications
Authors:
The Vinh Ngo,
Dmitriy Tsarev,
Ray-Kuang Lee,
Alexander Alodjants
Abstract:
By utilizing Bose-Einstein condensate solitons, optically manipulated and trapped in a double-well potential and coupled through the nonlinear Josephson effect, we propose novel quantum metrology applications with two soliton qubit states. In addition to steady-state solutions in different scenarios, phase space analysis in terms of population imbalance and phase difference variables is also performed to demonstrate macroscopic quantum self-trapping regimes. Schrödinger-cat states, maximally path-entangled ($N00N$) states, and macroscopic soliton qubits are predicted and exploited to study the distinguishability of the obtained macroscopic states in the framework of the binary (non-orthogonal) state discrimination problem. For arbitrary phase estimation in the framework of the linear quantum metrology approach, these macroscopic soliton states are shown to have a scaling up to the Heisenberg limit (HL). Examples are given for HL estimation of the angular frequency between the ground and first excited macroscopic states of the condensate, which opens new perspectives for current frequency standards technologies.
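For reference, the standard textbook form of an $N00N$ state and the two scalings it is usually compared against can be written as follows; this is generic linear-metrology background rather than a result derived from the soliton model above.

```latex
|\psi_{N00N}\rangle = \frac{1}{\sqrt{2}}\left(|N,0\rangle + e^{iN\phi}\,|0,N\rangle\right),
\qquad
\Delta\phi_{\mathrm{SQL}} \propto \frac{1}{\sqrt{N}},
\qquad
\Delta\phi_{\mathrm{HL}} \propto \frac{1}{N}.
```

The relative phase of an ideal $N00N$ state accumulates $N$ times faster than that of a single particle, which is why such a probe can reach the Heisenberg limit rather than the standard quantum limit (SQL) of $N$ independent particles.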
Submitted 26 November, 2020;
originally announced November 2020.
-
Coordinated Container Migration and Base Station Handover in Mobile Edge Computing
Authors:
Mao V. Ngo,
Tie Luo,
Hieu T. Hoang,
Tony Q. S. Quek
Abstract:
Offloading computationally intensive tasks from mobile users (MUs) to a virtualized environment, such as containers on a nearby edge server, can significantly reduce processing time and hence end-to-end (E2E) delay. However, when users are mobile, such containers need to be migrated to other edge servers located closer to the MUs to keep the E2E delay low. Meanwhile, the mobility of MUs necessitates handover among base stations in order to keep the wireless connections between MUs and base stations uninterrupted. In this paper, we address the joint problem of container migration and base-station handover by proposing a coordinated migration-handover mechanism, with the objective of achieving low E2E delay and minimizing service interruption. The mechanism determines the optimal destinations and timing for migration and handover in a coordinated manner, along with a delta checkpoint technique that we propose. We implement a testbed edge computing system with our proposed coordinated migration-handover mechanism, and evaluate the performance using real-world applications implemented with Docker containers (an industry standard). The results demonstrate that our mechanism achieves 30%-40% lower service downtime and 13%-22% lower E2E delay compared to other mechanisms. Our work is instrumental in offering a smooth user experience in mobile edge computing.
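The general idea behind a delta checkpoint can be sketched as follows, assuming the container's memory image is available as a byte string and is compared page by page against a base checkpoint that is already present at the destination. The page size and function names are hypothetical; this is not the authors' implementation and does not use Docker's or CRIU's actual checkpoint interfaces.

```python
PAGE = 4096  # hypothetical page size in bytes


def make_delta(base: bytes, new: bytes, page: int = PAGE):
    """Return (length of new image, {offset: page bytes}) for pages that changed."""
    delta = {}
    for off in range(0, len(new), page):
        chunk = new[off:off + page]
        if base[off:off + page] != chunk:  # unchanged pages are not transferred
            delta[off] = chunk
    return len(new), delta


def apply_delta(base: bytes, delta_info, page: int = PAGE) -> bytes:
    """Rebuild the new image at the destination from the base checkpoint and a delta."""
    length, delta = delta_info
    out = bytearray(base[:length].ljust(length, b"\0"))  # truncate or zero-pad to size
    for off, chunk in delta.items():
        out[off:off + len(chunk)] = chunk
    return bytes(out)


base = bytes(10000)
new = bytearray(base)
new[5000] = 1  # touch a single byte in the second page
info = make_delta(base, bytes(new))
print(sorted(info[1]))  # only the page starting at offset 4096 is sent
```

Only pages that differ from the base image cross the network, so the amount of state copied during migration shrinks when most memory is unchanged since the last checkpoint.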
Submitted 11 September, 2020;
originally announced September 2020.
-
Hydrostatic approximation of the 2D primitive equations in a thin strip
Authors:
Nacer Aarach,
Van-Sang Ngo
Abstract:
We prove the global well-posedness of the 2D non-rotating primitive equations with no-slip boundary conditions in a thin strip of width $\varepsilon$ for small data that are analytic in the tangential direction. We also prove that the hydrostatic limit (as $\varepsilon \to 0$) is a Prandtl-like system for the velocity coupled with a transport-diffusion equation for the temperature.
Submitted 26 February, 2023; v1 submitted 29 June, 2020;
originally announced June 2020.
-
Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing
Authors:
Mao V. Ngo,
Tie Luo,
Hakima Chaouchi,
Tony Q. S. Quek
Abstract:
Advances in deep neural networks (DNN) greatly bolster real-time detection of anomalous IoT data. However, IoT devices can hardly afford complex DNN models, and offloading anomaly detection tasks to the cloud incurs long delays. In this paper, we propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems to solve this problem, for both univariate and multivariate IoT data. First, we construct multiple anomaly detection DNN models with increasing complexity, and associate each model with a layer in HEC from bottom to top. Then, we design an adaptive scheme to select one of these models on the fly, based on the contextual information extracted from each input. The model selection is formulated as a contextual bandit problem characterized by a single-step Markov decision process, and is solved using a reinforcement learning policy network. We build an HEC testbed, implement our proposed approach, and evaluate it using real IoT datasets. The demo shows that our proposed approach significantly reduces detection delay (e.g., by 71.4% for the univariate dataset) without sacrificing accuracy, compared to offloading detection tasks to the cloud. We also compare it with other baseline schemes and demonstrate that it achieves the best accuracy-delay tradeoff. Our demo is also available online: https://rebrand.ly/91a71
Submitted 15 April, 2020;
originally announced April 2020.
-
Crop Knowledge Discovery Based on Agricultural Big Data Integration
Authors:
Vuong M. Ngo,
M-Tahar Kechadi
Abstract:
Nowadays, agricultural data can be generated through various sources, such as the Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, agricultural laboratories, farmers, government agencies and agribusinesses. The analysis of this big data enables farmers, companies and agronomists to extract valuable business and scientific knowledge, improving their operational processes and product quality. However, before this data can be analysed, the different data sources need to be normalised, homogenised and integrated into a unified data representation. In this paper, we propose an agricultural data integration method using a constellation schema, which is designed to be flexible enough to incorporate other datasets and big data models. We also apply several methods to extract knowledge with a view to improving crop yield; these include finding suitable quantities of soil properties, herbicides and insecticides that both increase crop yield and protect the environment.
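A constellation (or galaxy) schema is a set of fact tables that share dimension tables. A minimal sketch using Python's built-in `sqlite3` module shows how two facts (yields and chemical treatments) can be joined through their shared dimensions; all table names, columns and values here are hypothetical toy data, not the paper's schema or datasets.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_field (field_id INTEGER PRIMARY KEY, soil_ph REAL);
CREATE TABLE dim_crop  (crop_id  INTEGER PRIMARY KEY, name TEXT);
-- Two fact tables share the same dimensions: a constellation schema.
CREATE TABLE fact_yield (field_id INTEGER REFERENCES dim_field,
                         crop_id  INTEGER REFERENCES dim_crop,
                         season INTEGER, tonnes_per_ha REAL);
CREATE TABLE fact_treatment (field_id INTEGER REFERENCES dim_field,
                             crop_id  INTEGER REFERENCES dim_crop,
                             season INTEGER, product TEXT, litres_per_ha REAL);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dim_field VALUES (?, ?)", [(1, 6.5), (2, 7.2)])
    con.executemany("INSERT INTO dim_crop VALUES (?, ?)", [(1, "wheat"), (2, "barley")])
    con.executemany("INSERT INTO fact_yield VALUES (?, ?, ?, ?)",
                    [(1, 1, 2019, 8.0), (2, 1, 2019, 6.0), (1, 2, 2019, 5.0)])
    con.executemany("INSERT INTO fact_treatment VALUES (?, ?, ?, ?, ?)",
                    [(1, 1, 2019, "herbicide", 1.5), (2, 1, 2019, "herbicide", 3.0),
                     (1, 2, 2019, "insecticide", 0.5)])
    return con


def yield_vs_product(con: sqlite3.Connection, product: str):
    """Join the two fact tables on their shared dimension keys."""
    return con.execute("""
        SELECT c.name, t.litres_per_ha, y.tonnes_per_ha
        FROM fact_yield AS y
        JOIN fact_treatment AS t
          ON t.field_id = y.field_id AND t.crop_id = y.crop_id AND t.season = y.season
        JOIN dim_crop AS c ON c.crop_id = y.crop_id
        WHERE t.product = ?
        ORDER BY t.litres_per_ha
    """, (product,)).fetchall()
```

With this toy data, `yield_vs_product(build_demo_warehouse(), "herbicide")` returns the two wheat rows ordered by herbicide dose. Adding another fact table (for example, soil samples) would only require reusing the existing dimension keys.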
Submitted 10 March, 2020;
originally announced March 2020.
-
Data Warehouse and Decision Support on Integrated Crop Big Data
Authors:
V. M. Ngo,
N. A. Le-Khac,
M. T. Kechadi
Abstract:
In recent years, precision agriculture has become very popular. The introduction of modern information and communication technologies for collecting and processing agricultural data is revolutionising agricultural practices. This began some time ago (in the early 20th century) and is driven by the low cost of collecting data about everything, from field information such as seed, soil, fertiliser and pests, to weather data and drone and satellite images. In particular, agricultural data mining today is considered a Big Data application in terms of volume, variety, velocity and veracity. Hence it leads to challenges in processing vast amounts of complex and diverse information to extract useful knowledge for farmers, agronomists and other businesses. It is a key foundation for establishing a crop intelligence platform, which will enable efficient resource management and high-quality agronomic decision making and recommendations. In this paper, we designed and implemented a continental-level agricultural data warehouse (ADW). ADW is characterised by its (1) flexible schema; (2) data integration from real agricultural multi-datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage capacity; (6) security; (7) governance and monitoring; (8) consistency, availability and partition tolerance; (9) cloud compatibility. We also evaluate the performance of ADW and present some complex queries to extract and return necessary knowledge about crop management.
Submitted 12 April, 2021; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Linear Programming Contractor for Interval Distribution State Estimation Using RDM Arithmetic
Authors:
VietCuong Ngo,
Wenchuan Wu
Abstract:
State estimation (SE) of distribution networks relies heavily on pseudo-measurements, which introduce significant errors, because real-time measurements are insufficient. Interval SE models are commonly used, where the true values of system states are assumed to lie within the estimated ranges. However, conventional interval SE algorithms cannot account for the correlations of the same interval variables in different constraint terms, which results in overly conservative estimation results. In this paper, we propose a Linear Programming (LP) Contractor algorithm that uses relative distance measure (RDM) interval operations to solve this problem. In the proposed model, measurement errors are assumed to be bounded within given sets, thus converting the state variables to RDM variables. In this case, the SE model is non-convex, and the credibility of the solution cannot be guaranteed. Therefore, each nonlinear measurement equation in the model is transformed into a pair of linear inequalities using the mean value theorem. The SE model is finally reformulated as a linear programming contractor that iteratively narrows the upper and lower bounds of the system state variables. Numerical tests on IEEE three-phase distribution networks show that the proposed method outperforms the conventional interval constraint propagation, modified Krawczyk-operator and optimization-based interval SE methods.
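The dependency problem that motivates RDM arithmetic can be seen in a few lines. In RDM form each interval variable is written as $x = \underline{x} + \alpha(\overline{x}-\underline{x})$ with $\alpha\in[0,1]$, so repeated occurrences of the same variable share one $\alpha$. The sketch below, with hypothetical function names, contrasts this with conventional interval subtraction; it grid-samples $\alpha$ only to illustrate the idea and is not the paper's LP contractor.

```python
from itertools import product


def interval_sub(a, b):
    """Conventional interval subtraction: operands are treated as independent."""
    return (a[0] - b[1], a[1] - b[0])


def rdm_range(expr, intervals, steps=20):
    """Approximate range of expr over RDM variables x_i = lo_i + alpha_i * (hi_i - lo_i).

    Each alpha_i is grid-sampled on [0, 1]; every occurrence of x_i inside expr
    shares the same alpha_i, which removes the dependency problem.
    The grid only approximates the true range in general.
    """
    alphas = [i / steps for i in range(steps + 1)]
    values = []
    for combo in product(alphas, repeat=len(intervals)):
        x = [lo + a * (hi - lo) for a, (lo, hi) in zip(combo, intervals)]
        values.append(expr(x))
    return min(values), max(values)


print(interval_sub((1.0, 2.0), (1.0, 2.0)))          # (-1.0, 1.0): overly wide
print(rdm_range(lambda x: x[0] - x[0], [(1.0, 2.0)]))  # (0.0, 0.0): exact
```

Similarly, conventional interval multiplication of $[-1,1]$ by itself gives $[-1,1]$, whereas the RDM range of $x\cdot x$ is $[0,1]$. This over-widening, compounded over many constraints, is what leads to the overly conservative interval SE results described above.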
Submitted 27 March, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
Adaptive Anomaly Detection for IoT Data in Hierarchical Edge Computing
Authors:
Mao V. Ngo,
Hakima Chaouchi,
Tie Luo,
Tony Q. S. Quek
Abstract:
Advances in deep neural networks (DNN) greatly bolster real-time detection of anomalous IoT data. However, IoT devices can barely afford complex DNN models due to limited computational power and energy supply. While one can offload anomaly detection tasks to the cloud, doing so incurs long delays and requires large bandwidth when thousands of IoT devices stream data to the cloud concurrently. In this paper, we propose an adaptive anomaly detection approach for hierarchical edge computing (HEC) systems to solve this problem. Specifically, we first construct three anomaly detection DNN models of increasing complexity, and associate them with the three layers of HEC from bottom to top, i.e., IoT devices, edge servers, and cloud. Then, we design an adaptive scheme to select one of the models based on the contextual information extracted from input data, to perform anomaly detection. The selection is formulated as a contextual bandit problem and is characterized by a single-step Markov decision process, with the objective of achieving high detection accuracy and low detection delay simultaneously. We evaluate our proposed approach using a real IoT dataset, and demonstrate that it reduces detection delay by 84% while maintaining almost the same accuracy as compared to offloading detection tasks to the cloud. In addition, our evaluation shows that it outperforms other baseline schemes.
Submitted 10 January, 2020;
originally announced January 2020.
-
Approximate controllability of second grade fluids
Authors:
Van-Sang Ngo,
Geneviève Raugel
Abstract:
This paper deals with the controllability of second grade fluids, a class of non-Newtonian fluids of differential type, on a two-dimensional torus. Using the method of Agrachev-Sarychev [1], [2] and of Shirikyan [26], we prove that the system of second grade fluids is approximately controllable by a finite-dimensional control force.
Submitted 7 October, 2019;
originally announced October 2019.
-
Nonequilibrium path-ensemble averages for symmetric protocols
Authors:
Trung Hai Nguyen,
Van Ngo,
João Paulo Castro Zerba,
Sergei Noskov,
David D. L. Minh
Abstract:
According to the nonequilibrium work relations, path-ensembles generated by irreversible processes in which a system is driven out of equilibrium according to a predetermined protocol may be used to compute equilibrium free energy differences and expectation values. Estimation has previously been improved by considering data collected from the reverse process, which starts in equilibrium in the final thermodynamic state of the forward process and is driven according to the time-reversed protocol. Here, we develop a theoretically rigorous statistical estimator for nonequilibrium path-ensemble averages specialized for symmetric protocols, in which forward and reverse processes are identical. The estimator is tested with a number of model systems: a symmetric 1D potential, an asymmetric 1D potential, the unfolding of deca-alanine, separating a host-guest system, and translocating a potassium ion through a gramicidin A ion channel. When reconstructing free energies using data from symmetric protocols, the new estimator outperforms existing rigorous unidirectional and bidirectional estimators, converging more quickly and resulting in smaller error. However, in most cases, using the bidirectional estimator with data from a forward and reverse pair of asymmetric protocols outperforms the corresponding symmetric protocol and estimator with the same amount of simulation time. Hence, the new estimator is only recommended when the bidirectional estimator is not feasible or is expected to perform poorly. The symmetric estimator has similar performance to a unidirectional protocol of half the length and twice the number of trajectories.
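The abstract benchmarks the new estimator against existing unidirectional estimators. For orientation, one standard unidirectional estimator of a free energy difference is the Jarzynski equality, $\Delta F = -k_BT\ln\langle e^{-W/k_BT}\rangle$, averaged over forward trajectories. The sketch below implements only that baseline (the function name is hypothetical), using a log-sum-exp shift for numerical stability; it is not the symmetric-protocol estimator developed in the paper.

```python
import math


def jarzynski_free_energy(works, kT=1.0):
    """Unidirectional Jarzynski estimate: dF = -kT * ln < exp(-W / kT) >."""
    scaled = [-w / kT for w in works]
    top = max(scaled)  # log-sum-exp shift avoids overflow/underflow
    log_mean = top + math.log(sum(math.exp(s - top) for s in scaled) / len(scaled))
    return -kT * log_mean


print(jarzynski_free_energy([2.0, 2.0, 2.0]))  # 2.0 when every trajectory does equal work
```

By Jensen's inequality the estimate never exceeds the mean work. The exponential average is dominated by rare low-work trajectories, which is why such estimators can converge slowly and why data from reverse or symmetric protocols can help.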
Submitted 5 August, 2019;
originally announced August 2019.
-
Designing and Implementing Data Warehouse for Agricultural Big Data
Authors:
Vuong M. Ngo,
Nhien-An Le-Khac,
M-Tahar Kechadi
Abstract:
In recent years, precision agriculture, which uses modern information and communication technologies, has become very popular. Raw and semi-processed agricultural data are usually collected through various sources, such as the Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, farmers and agribusinesses. Moreover, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, agricultural data mining is considered a Big Data application in terms of volume, variety, velocity and veracity. It is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. In this paper, we designed and implemented a continental-level agricultural data warehouse by combining Hive, MongoDB and Cassandra. Our data warehouse's capabilities include: (1) flexible schema; (2) data integration from real agricultural multi-datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage capacity; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability and partition tolerance; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse.
Submitted 29 May, 2019;
originally announced May 2019.
-
Using Entity Relations for Opinion Mining of Vietnamese Comments
Authors:
P. T. Nguyen,
L. T. Le,
V. M. Ngo,
P. M. Nguyen
Abstract:
In this paper, we propose several novel techniques to extract and mine opinions from Vietnamese customer reviews of a number of products traded on e-commerce sites in Vietnam. The assessment is based on customers' emotional level toward specific products such as mobile phones and laptops. We exploit product features because customers are highly interested in them and many such products are available in the Vietnamese e-commerce market. From this, customers' likes and dislikes regarding the exploited products can be identified.
Submitted 16 May, 2019;
originally announced May 2019.
-
Machine Learning based English Sentiment Analysis
Authors:
T. N. T. Tran,
L. K. N. Nguyen,
V. M. Ngo
Abstract:
Sentiment analysis, or opinion mining, aims to determine customers' attitudes, judgments, and opinions about a product or a service. Such a system helps manufacturers and service providers learn how satisfied customers are with their products or services, so that they can make appropriate adjustments. We use a popular machine learning method, the Support Vector Machine (SVM), combined with the Waikato Environment for Knowledge Analysis (WEKA) library, to build a Java web program that analyzes the sentiment of English comments about one of four types of women's products: dresses, handbags, shoes, and rings. We developed and tested our system with a training set of 300 comments and a test set of 400 comments. The system achieves precision, recall, and F-measure of 89.3%, 95.0%, and 92.1% for positive comments; 97.1%, 78.5%, and 86.8% for negative comments; and 76.7%, 86.2%, and 81.2% for neutral comments.
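The reported F-measures can be cross-checked against the precision and recall figures. Assuming the balanced F-measure (F1), i.e. the harmonic mean F = 2PR / (P + R), which the abstract does not state explicitly, this short Python sketch recomputes each class's F value:

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F %) as listed in the abstract above
reported = {
    "positive": (89.3, 95.0, 92.1),
    "negative": (97.1, 78.5, 86.8),
    "neutral": (76.7, 86.2, 81.2),
}

for label, (p, r, f_reported) in reported.items():
    # Print the recomputed value rounded to one decimal next to the reported one.
    print(f"{label}: F = {f1(p, r):.1f}% (reported {f_reported}%)")
```

Each recomputed value rounds to the reported figure, which is consistent with the F1 assumption.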
Submitted 16 May, 2019;
originally announced May 2019.
-
Detecting Vietnamese Opinion Spam
Authors:
T. H. H. Duong,
T. D. Vu,
V. M. Ngo
Abstract:
Recently, Vietnamese Natural Language Processing has been researched by experts in academia and industry. However, existing papers have focused only on information classification or extraction from documents. Nowadays, with the rapid development of e-commerce websites, forums, and social networks, products, people, organizations, and wonders have become targets of comments and reviews from online communities. Many people use these reviews to make decisions, whereas other people or organizations use reviews to mislead readers. Therefore, it is necessary to detect such deceptive behavior in reviews. In this paper, we study this problem and propose a method for classifying Vietnamese reviews as spam or non-spam. The accuracy of our method is up to 90%.
Submitted 9 May, 2019;
originally announced May 2019.
-
A Similarity Measure for Weaving Patterns in Textiles
Authors:
Sven Helmer,
Vuong M. Ngo
Abstract:
We propose a novel approach for measuring the similarity between weaving patterns that can provide similarity-based search functionality for textile archives. We represent textile structures using hypergraphs and extract multisets of k-neighborhoods from these graphs. The resulting multisets are then compared using Jaccard coefficients, Hamming distances, and cosine measures. We evaluate the different variants of our similarity measure experimentally, showing that it can be implemented efficiently and illustrating its quality by using it to cluster and query a dataset containing more than a thousand textile samples.
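To make the comparison step concrete, the following Python sketch shows two of the measures named above, generalized to multisets: a Jaccard coefficient computed as the sum of minimum counts over the sum of maximum counts, and the cosine of the count vectors. It assumes each k-neighborhood has already been encoded as a hashable signature string; the signature format (`over-under` and so on) and the example patterns are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import sqrt

def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Generalized Jaccard: sum of min counts over sum of max counts."""
    keys = a.keys() | b.keys()
    inter = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return inter / union if union else 1.0  # two empty multisets are identical

def multiset_cosine(a: Counter, b: Counter) -> float:
    """Cosine of the count vectors of two multisets."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical k-neighborhood signatures extracted from two weaving patterns.
pattern_1 = Counter(["over-under", "over-under", "over-over", "under-under"])
pattern_2 = Counter(["over-under", "over-over", "over-over"])

print(multiset_jaccard(pattern_1, pattern_2))  # 0.4
print(multiset_cosine(pattern_1, pattern_2))
```

Treating the k-neighborhoods as multisets rather than sets means a pattern that repeats a local structure many times is distinguished from one that contains it only once.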
Submitted 10 October, 2018;
originally announced October 2018.
-
Discovering Latent Information By Spreading Activation Algorithm For Document Retrieval
Authors:
Vuong M. Ngo
Abstract:
Syntactic search relies on keywords contained in a query to find suitable documents. So documents that do not contain the keywords but do contain information related to the query are not retrieved. Spreading activation is an algorithm for finding latent information in a query by exploiting relations between nodes in an associative or semantic network. However, the classical spreading activation algorithm uses all relations of a node in the network, which adds unsuitable information to the query. In this paper, we propose a novel approach for semantic text search, called query-oriented constrained spreading activation, that uses only relations related to the content of the query to find truly related information. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8% better than syntactic search and search using classical constrained spreading activation, respectively.
KEYWORDS: Information Retrieval, Ontology, Semantic Search, Spreading Activation
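The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of relation-constrained spreading activation, not the authors' implementation. It assumes a semantic network stored as (head, relation, tail) triples, a multiplicative decay, a firing threshold, and a hop limit; the function names, parameter values, and the small example network are all hypothetical.

```python
from collections import defaultdict

def build_network(triples):
    """Adjacency list of typed edges, traversable in both directions."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((relation, head))
    return graph

def constrained_spread(graph, seeds, allowed_relations,
                       decay=0.5, threshold=0.1, max_hops=2):
    """Spread activation from the query's seed nodes, following only
    edges whose relation type occurs in the query."""
    activation = {node: 1.0 for node in seeds}
    frontier = set(seeds)
    for _ in range(max_hops):
        next_frontier = set()
        for node in frontier:
            for relation, neighbour in graph[node]:
                if relation not in allowed_relations:
                    continue  # constraint: ignore relations absent from the query
                spread = activation[node] * decay
                if spread < threshold:
                    continue  # too weak to fire
                if spread > activation.get(neighbour, 0.0):
                    activation[neighbour] = spread
                    next_frontier.add(neighbour)
        frontier = next_frontier
    # Latent concepts: activated nodes that were not in the query itself.
    return {n: a for n, a in activation.items() if n not in seeds}

triples = [
    ("Hanoi", "locatedIn", "Vietnam"),
    ("Da Nang", "locatedIn", "Vietnam"),
    ("Vietnam", "memberOf", "ASEAN"),
    ("Thailand", "memberOf", "ASEAN"),
]
net = build_network(triples)
# Query "cities located in Vietnam": seed entity Vietnam, relation locatedIn.
print(constrained_spread(net, {"Vietnam"}, {"locatedIn"}))
# {'Hanoi': 0.5, 'Da Nang': 0.5}
```

Allowing `memberOf` as well would also activate `ASEAN` and then `Thailand`, which is the kind of unrelated expansion the relation constraint is meant to avoid.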
Submitted 29 July, 2018;
originally announced August 2018.
-
Opinion Spam Recognition Method for Online Reviews using Ontological Features
Authors:
L. H. Nguyen,
N. T. H. Pham,
V. M. Ngo
Abstract:
Nowadays, many people use opinions on social media to decide whether to buy products or services. Opinion spam detection is a hard problem because fake reviews can be written by organizations as well as individuals for different purposes. They write fake reviews to mislead readers or automated detection systems, either promoting target products or damaging their reputations. In this paper, we propose a new approach using a knowledge-based ontology to detect opinion spam with high accuracy (higher than 75%). Keywords: Opinion spam, Fake review, E-commerce, Ontology.
Submitted 29 July, 2018;
originally announced July 2018.
-
Combining Named Entities with WordNet and Using Query-Oriented Spreading Activation for Semantic Text Search
Authors:
Vuong M. Ngo,
Tru H. Cao,
Tuan M. V. Le
Abstract:
Purely keyword-based text search is not satisfactory because named entities and WordNet words are also important elements in defining the content of a document or a query in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. Words in WordNet also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Moreover, there are related concepts that do not appear in a query but can bring out the meaning of the query if they are added. We propose an ontology-based generalized Vector Space Model for semantic text search. It exploits ontological features of named entities and WordNet words, and develops a query-oriented spreading activation algorithm to expand queries. In addition, it combines and utilizes the advantages of different ontologies for semantic annotation and searching. Experiments on a benchmark dataset show that, in terms of the MAP measure, our model is 42.5% better than the purely keyword-based model, and 32.3% and 15.9% better than the models using only WordNet or only named entities, respectively.
Keywords: semantic search, spreading activation, ontology, named entity, WordNet.
Submitted 20 July, 2018;
originally announced July 2018.
-
Exploring Combinations of Ontological Features and Keywords for Text Retrieval
Authors:
Tru H. Cao,
Khanh C. Le,
Vuong M. Ngo
Abstract:
Named entities have been considered and combined with keywords to enhance information retrieval performance. However, there is not yet a formal and complete model that takes into account entity names, classes, and identifiers together. Our work explores various adaptations of the traditional Vector Space Model that combine different ontological features with keywords in different ways. The results show that the proposed models perform better than keyword-based Lucene and have advantages for both text retrieval and the representation of documents and queries.
Submitted 20 July, 2018;
originally announced July 2018.
-
A Generalized Vector Space Model for Ontology-Based Information Retrieval
Authors:
Vuong M. Ngo,
Tru H. Cao
Abstract:
Named entities (NEs) are objects referred to by names, such as people, organizations, and locations. Named entities and keywords are important to the meaning of a document. We propose a generalized vector space model that combines named entities and keywords. In the model, we take into account different ontological features of named entities, namely, aliases, classes, and identifiers. Moreover, we use entity classes to represent the latent information of interrogative words in Wh-queries, which is ignored in traditional keyword-based searching. We have implemented and tested the proposed model on a TREC dataset, as presented and discussed in the paper.
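As an illustration of placing keywords and entity aliases, classes, and identifiers in one vector space, and of letting a Wh-word stand for an entity class, here is a minimal Python sketch. The `kw:`/`alias:`/`class:`/`id:` feature prefixes, the Wh-word table, the unweighted counts, and the example annotations are assumptions for illustration; the abstract does not describe the model's actual weighting scheme.

```python
from collections import Counter
from math import sqrt

# Hypothetical mapping from interrogative words to the entity class they ask for.
WH_CLASS = {"who": "Person", "where": "Location", "when": "Date"}

def features(keywords, entities):
    """Sparse vector over keyword and named-entity features.

    `entities` holds (alias, class, identifier) triples, assumed to come
    from an upstream entity annotator that is not shown here.
    """
    vec = Counter(f"kw:{w.lower()}" for w in keywords)
    for alias, cls, ident in entities:
        vec[f"alias:{alias.lower()}"] += 1
        vec[f"class:{cls}"] += 1
        vec[f"id:{ident}"] += 1
    return vec

def query_features(keywords, entities):
    """Like `features`, but a Wh-word becomes the entity class it asks for."""
    plain = [w for w in keywords if w.lower() not in WH_CLASS]
    vec = features(plain, entities)
    for w in keywords:
        if w.lower() in WH_CLASS:
            vec[f"class:{WH_CLASS[w.lower()]}"] += 1
    return vec

def cosine(a, b):
    """Cosine of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

microsoft = ("Microsoft", "Company", "E1")
q = query_features(["Who", "founded"], [microsoft])
d1 = features(["founded"], [("Bill Gates", "Person", "E2"), microsoft])
d2 = features(["founded"], [microsoft, ("Redmond", "Location", "E3")])
print(round(cosine(q, d1), 3), round(cosine(q, d2), 3))  # 0.845 0.676
```

Without the Wh-word mapping, the two documents would score the same here; mapping *who* to the class `Person` favours the document that mentions a person.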
Submitted 20 July, 2018;
originally announced July 2018.
-
Semantic Document Clustering on Named Entity Features
Authors:
Tru H. Cao,
Vuong M. Ngo,
Dung T. Hong,
Tho T. Quan
Abstract:
Keyword-based information processing has limitations due to its simple treatment of words. In this paper, we introduce named entities, which are key elements defining document semantics and in many cases are of concern to users, into document clustering. First, the traditional keyword-based vector space model is adapted with vectors defined over spaces of entity names, types, name-type pairs, and identifiers, instead of keywords. Then, hierarchical document clustering can be performed using a similarity measure defined as the cosine of the vectors representing documents. Experimental results are presented and discussed. Clustering documents by named-entity information could be useful for managing web-based learning materials with respect to related objects.
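The clustering step can be sketched in Python as follows. The abstract specifies cosine similarity over entity-based vectors but not the linkage criterion or stopping rule, so average linkage and a similarity threshold `min_sim` are assumptions here, as are the example vectors over name-type pairs.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def agglomerative(docs: dict, min_sim: float) -> list:
    """Average-linkage agglomerative clustering of document vectors.

    Repeatedly merges the two most similar clusters and stops when no
    pair of clusters has an average cosine similarity of at least `min_sim`.
    """
    def link(c1, c2):
        sims = [cosine(docs[x], docs[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    clusters = [[name] for name in docs]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < min_sim:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so position i is unaffected
    return clusters

# Hypothetical documents as vectors over entity name-type pairs.
docs = {
    "d1": Counter({"Hanoi/City": 2, "Vietnam/Country": 1}),
    "d2": Counter({"Hanoi/City": 1, "Vietnam/Country": 1}),
    "d3": Counter({"Paris/City": 2, "France/Country": 1}),
    "d4": Counter({"Paris/City": 1, "France/Country": 2}),
}
print(agglomerative(docs, min_sim=0.5))  # [['d1', 'd2'], ['d3', 'd4']]
```

Recording the merge order instead of stopping at a threshold would yield the full dendrogram; the threshold simply cuts it at one level.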
Submitted 20 July, 2018;
originally announced July 2018.
-
Ontology-Based Query Expansion with Latently Related Named Entities for Semantic Text Search
Authors:
Vuong M. Ngo,
Tru H. Cao
Abstract:
Traditional information retrieval systems represent documents and queries by keyword sets. However, the content of a document or a query is mainly defined by both the keywords and the named entities occurring in it. Named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. Moreover, the meaning of a query may imply latent named entities that are related to the apparent ones in the query. We propose an ontology-based generalized vector space model for semantic text search. It exploits ontological features of named entities and their latently related ones to reveal the semantics of documents and queries. We also propose a framework for combining different ontologies to take advantage of their complementary strengths for semantic annotation and searching. Experiments on a benchmark dataset show that our model achieves better search quality than other models.
Submitted 15 July, 2018;
originally announced July 2018.
-
Discovering Latent Concepts and Exploiting Ontological Features for Semantic Text Search
Authors:
Vuong M. Ngo,
Tru H. Cao
Abstract:
Named entities and WordNet words are important in defining the content of a text in which they occur. Named entities have ontological features, namely, their aliases, classes, and identifiers. WordNet words also have ontological features, namely, their synonyms, hypernyms, hyponyms, and senses. Those features of concepts may be hidden from their textual appearance. Moreover, there are related concepts that do not appear in a query but can bring out the meaning of the query if they are added. Traditional constrained spreading activation algorithms use all relations of a node in the network, which adds unsuitable information to the query. In contrast, we use only relations represented in the query. We propose an ontology-based generalized Vector Space Model for semantic text search. It discovers relevant latent concepts in a query by relation-constrained spreading activation. In addition, to represent a word having more than one possible direct sense, it combines the most specific common hypernym of the remaining undisambiguated multi-senses with the form of the word. Experiments on a benchmark dataset show that, in terms of the MAP measure, the retrieval performance of our model is 41.9% and 29.3% better than that of the purely keyword-based model and the traditional constrained spreading activation model, respectively.
Submitted 15 July, 2018;
originally announced July 2018.
-
Semantic Search by Latent Ontological Features
Authors:
Tru H. Cao,
Vuong M. Ngo
Abstract:
Both named entities and keywords are important in defining the content of a text in which they occur. In particular, people often use named entities in information search. However, named entities have ontological features, namely, their aliases, classes, and identifiers, which are hidden from their textual appearance. We propose ontology-based extensions of the traditional Vector Space Model that explore different combinations of those latent ontological features with keywords for text retrieval. Our experiments on benchmark datasets show that the proposed models achieve better search quality than the purely keyword-based model, and demonstrate their advantages for both text retrieval and the representation of documents and queries.
Submitted 15 July, 2018;
originally announced July 2018.
-
WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features
Authors:
Vuong M. Ngo,
Tru H. Cao,
Tuan M. V. Le
Abstract:
Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings, in general, improves search performance. In this paper, we survey WordNet-based information retrieval systems, which employ a word sense disambiguation method to process queries and documents. The problem is that in many cases a word has more than one possible direct sense, and picking only one of them may give a wrong sense for the word. Moreover, the previous systems use only word forms to represent word senses and their hypernyms. We propose a novel approach that uses the most specific common hypernym of the remaining undisambiguated multi-senses of a word, as well as combined WordNet features to represent word meanings. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 17.7% better than the lexical search, and at least 9.4% better than all surveyed search systems using WordNet.
Keywords: Ontology, word sense disambiguation, semantic annotation, semantic search.
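To make the idea of a most specific common hypernym concrete, the Python sketch below walks up a toy hypernym hierarchy (hypothetical, not real WordNet data), returns the deepest node shared by all remaining senses of a word, and pairs it with the word form as one possible illustrative representation. It assumes a single-inheritance tree; WordNet allows some synsets to have more than one hypernym, which this sketch does not handle.

```python
# Toy hypernym links (child -> parent); hypothetical, not actual WordNet data.
HYPERNYM = {
    "tank.vehicle": "military_vehicle",
    "military_vehicle": "vehicle",
    "vehicle": "artifact",
    "tank.container": "container",
    "container": "artifact",
    "artifact": "entity",
}

def hypernym_path(sense: str) -> list:
    """The sense followed by its hypernyms up to the root."""
    path = [sense]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def most_specific_common_hypernym(senses: list) -> str:
    """Deepest node shared by the hypernym paths of all remaining senses."""
    paths = [hypernym_path(s) for s in senses]
    common = set(paths[0]).intersection(*paths[1:])
    # In a tree, each path runs from the sense upwards, so the first
    # shared node met on any path is the most specific one.
    return next(node for node in paths[0] if node in common)

def annotate(word: str, senses: list) -> str:
    """Represent an undisambiguated word by its form plus the common hypernym."""
    return f"{word}|{most_specific_common_hypernym(senses)}"

print(annotate("tank", ["tank.vehicle", "tank.container"]))  # tank|artifact
```

Compared with forcing a choice between the two senses, the shared hypernym `artifact` is less specific but cannot be wrong for either sense.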
Submitted 15 July, 2018;
originally announced July 2018.
-
An Efficient Data Warehouse for Crop Yield Prediction
Authors:
Vuong M. Ngo,
Nhien-An Le-Khac,
M-Tahar Kechadi
Abstract:
Nowadays, precision agriculture combined with modern information and communications technologies is becoming more common in agricultural activities such as automated irrigation systems, precision planting, variable-rate application of nutrients and pesticides, and agricultural decision support systems. In the latter, crop management data analysis, based on machine learning and data mining, focuses mainly on how to efficiently forecast and improve crop yield. In recent years, raw and semi-processed agricultural data have usually been collected using sensors, robots, satellites, weather stations, farm equipment, farmers, and agribusinesses, while the Internet of Things (IoT) promises to connect objects and devices in the agricultural ecosystem wirelessly. Agricultural data typically capture information about farming entities and operations. Every farming entity encapsulates an individual farming concept, such as field, crop, seed, soil, temperature, humidity, pest, and weed. Agricultural datasets are spatial, temporal, complex, heterogeneous, non-standardized, and very large. In particular, agricultural data are considered Big Data in terms of volume, variety, velocity, and veracity. Designing and developing a data warehouse for precision agriculture is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomic decision making and recommendations. Some of the requirements for such an agricultural data warehouse are privacy, security, and real-time access among its stakeholders (e.g., farmers, farm equipment manufacturers, agribusinesses, co-operative societies, customers, and possibly government agencies). However, there are currently very few reports in the literature that focus on the design of efficient data warehouses with a view to enabling Agricultural Big Data analysis and data mining. In this paper ...
Submitted 26 June, 2018;
originally announced July 2018.
-
On the influence of gravity on density-dependent incompressible periodic fluids
Authors:
Van-Sang Ngo,
Stefano Scrobogna
Abstract:
The present work is devoted to the analysis of density-dependent incompressible fluids in a 3D torus when the Froude number $\varepsilon$ goes to zero. We consider the very general case where the initial data do not have zero horizontal average, where the smoothing effect acts on the velocity but not on the density, and where resonant phenomena can occur on the domain. We explicitly determine the limit system as $\varepsilon \to 0$ and prove its global well-posedness. Finally, we prove that for large initial data, the density-dependent incompressible fluid system is globally well-posed, provided that $\varepsilon$ is small enough.
Submitted 7 February, 2018;
originally announced February 2018.