-
A Multivocal Review of MLOps Practices, Challenges and Open Issues
Authors:
Beyza Eken,
Samodha Pallewatta,
Nguyen Khoi Tran,
Ayse Tosun,
Muhammad Ali Babar
Abstract:
With the increasing trend of Machine Learning (ML) enabled software applications, the paradigm of ML Operations (MLOps) has gained tremendous attention of researchers and practitioners. MLOps encompasses the practices and technologies for streamlining the resources and monitoring needs of operationalizing ML models. Software development practitioners need access to the detailed and easily understa…
▽ More
With the increasing trend of Machine Learning (ML) enabled software applications, the paradigm of ML Operations (MLOps) has gained tremendous attention of researchers and practitioners. MLOps encompasses the practices and technologies for streamlining the resources and monitoring needs of operationalizing ML models. Software development practitioners need access to the detailed and easily understandable knowledge of MLOps workflows, practices, challenges and solutions to effectively and efficiently support the adoption of MLOps. Whilst the academic and industry literature on the MLOps has been growing rapidly, there have been relatively a few attempts at systematically synthesizing and analyzing the vast amount of existing literature of MLOps for improving ease of access and understanding. We conducted a Multivocal Literature Review (MLR) of 150 relevant academic studies and 48 gray literature to provide a comprehensive body of knowledge on MLOps. Through this MLR, we identified the emerging MLOps practices, adoption challenges and solutions related to various areas, including development and operation of complex pipelines, managing production at scale, managing artifacts, and ensuring quality, security, governance, and ethical aspects. We also report the socio-technical aspect of MLOps relating to diverse roles involved and collaboration practices across them through the MLOps lifecycle. We assert that this MLR provides valuable insights to researchers and practitioners seeking to navigate the rapidly evolving landscape of MLOps. We also identify the open issues that need to be addressed in order to advance the current state-of-the-art of MLOps.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities
Authors:
David Stap,
Eva Hasler,
Bill Byrne,
Christof Monz,
Ke Tran
Abstract:
Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what is the impact of fine-tuning on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. We perform an…
▽ More
Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what is the impact of fine-tuning on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. We perform an extensive translation evaluation on the LLaMA and Falcon family of models with model size ranging from 7 billion up to 65 billion parameters. Our results show that while fine-tuning improves the general translation quality of LLMs, several abilities degrade. In particular, we observe a decline in the ability to perform formality steering, to produce technical translations through few-shot examples, and to perform document-level translation. On the other hand, we observe that the model produces less literal translations after fine-tuning on parallel data. We show that by including monolingual data as part of the fine-tuning data we can maintain the abilities while simultaneously enhancing overall translation quality. Our findings emphasize the need for fine-tuning strategies that preserve the benefits of LLMs for machine translation.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
UCCIX: Irish-eXcellence Large Language Model
Authors:
Khanh-Tung Tran,
Barry O'Sullivan,
Hoang D. Nguyen
Abstract:
The development of Large Language Models (LLMs) has predominantly focused on high-resource languages, leaving extremely low-resource languages like Irish with limited representation. This work presents UCCIX, a pioneering effort on the development of an open-source Irish-based LLM. We propose a novel framework for continued pre-training of LLMs specifically adapted for extremely low-resource langu…
▽ More
The development of Large Language Models (LLMs) has predominantly focused on high-resource languages, leaving extremely low-resource languages like Irish with limited representation. This work presents UCCIX, a pioneering effort on the development of an open-source Irish-based LLM. We propose a novel framework for continued pre-training of LLMs specifically adapted for extremely low-resource languages, requiring only a fraction of the textual data typically needed for training LLMs according to scaling laws. Our model, based on Llama 2-13B, outperforms much larger models on Irish language tasks with up to 12% performance improvement, showcasing the effectiveness and efficiency of our approach. We also contribute comprehensive Irish benchmarking datasets, including IrishQA, a question-answering dataset, and Irish version of MT-bench. These datasets enable rigorous evaluation and facilitate future research in Irish LLM systems. Our work aims to preserve and promote the Irish language, knowledge, and culture of Ireland in the digital era while providing a framework for adapting LLMs to other indigenous languages.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Authors:
Kim Hoang Tran,
Phuc Vuong Do,
Ngoc Quoc Ly,
Ngan Le
Abstract:
Sports videos pose complex challenges, including cluttered backgrounds, camera angle changes, small action-representing objects, and imbalanced action class distribution. Existing methods for detecting actions in sports videos heavily rely on global features, utilizing a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuance…
▽ More
Sports videos pose complex challenges, including cluttered backgrounds, camera angle changes, small action-representing objects, and imbalanced action class distribution. Existing methods for detecting actions in sports videos heavily rely on global features, utilizing a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuances of the scene and struggle with detecting actions that occupy a small portion of the frame. In particular, they face difficulties when dealing with action classes involving small objects, such as balls or yellow/red cards in soccer, which only occupy a fraction of the screen space. To address these challenges, we introduce a novel approach that analyzes and models scene entities using an adaptive attention mechanism. Particularly, our model disentangles the scene content into the global environment feature and local relevant scene entities feature. To efficiently extract environmental features while considering temporal information with less computational cost, we propose the use of a 2D backbone network with a time-shift mechanism. To accurately capture relevant scene entities, we employ a Vision-Language model in conjunction with the adaptive attention mechanism. Our model has demonstrated outstanding performance, securing the 1st place in the SoccerNet-v2 Action Spotting, FineDiving, and FineGym challenge with a substantial performance improvement of 1.6, 2.0, and 1.3 points in avg-mAP compared to the runner-up methods. Furthermore, our approach offers interpretability capabilities in contrast to other deep learning models, which are often designed as black boxes. Our code and models are released at: https://github.com/Fsoft-AIC/unifying-global-local-feature.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Uniform $\mathcal{C}^k$ Approximation of $G$-Invariant and Antisymmetric Functions, Embedding Dimensions, and Polynomial Representations
Authors:
Soumya Ganguly,
Khoa Tran,
Rahul Sarkar
Abstract:
For any subgroup $G$ of the symmetric group $\mathcal{S}_n$ on $n$ symbols, we present results for the uniform $\mathcal{C}^k$ approximation of $G$-invariant functions by $G$-invariant polynomials. For the case of totally symmetric functions ($G = \mathcal{S}_n$), we show that this gives rise to the sum-decomposition Deep Sets ansatz of Zaheer et al. (2018), where both the inner and outer function…
▽ More
For any subgroup $G$ of the symmetric group $\mathcal{S}_n$ on $n$ symbols, we present results for the uniform $\mathcal{C}^k$ approximation of $G$-invariant functions by $G$-invariant polynomials. For the case of totally symmetric functions ($G = \mathcal{S}_n$), we show that this gives rise to the sum-decomposition Deep Sets ansatz of Zaheer et al. (2018), where both the inner and outer functions can be chosen to be smooth, and moreover, the inner function can be chosen to be independent of the target function being approximated. In particular, we show that the embedding dimension required is independent of the regularity of the target function, the accuracy of the desired approximation, as well as $k$. Next, we show that a similar procedure allows us to obtain a uniform $\mathcal{C}^k$ approximation of antisymmetric functions as a sum of $K$ terms, where each term is a product of a smooth totally symmetric function and a smooth antisymmetric homogeneous polynomial of degree at most $\binom{n}{2}$. We also provide upper and lower bounds on $K$ and show that $K$ is independent of the regularity of the target function, the desired approximation accuracy, and $k$.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
How good are my search strings? Reflections on using an existing review as a quasi-gold standard
Authors:
Huynh Khanh Vi Tran,
Jürgen Börstler,
Nauman Bin Ali,
Michael Unterkalmsteiner
Abstract:
Background: Systematic literature studies (SLS) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, ie, finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims at raising awareness on the issues related to search string construction and on se…
▽ More
Background: Systematic literature studies (SLS) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, ie, finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims at raising awareness on the issues related to search string construction and on search validation using a quasi-gold standard (QGS). Furthermore, we aim at providing guidelines for search string validation. Method: We use a recently completed tertiary study as a case and complement our findings with the observations from other researchers studying and advancing EBSE. Results: We found that the issue of assessing QGS quality has not seen much attention in the literature, and the validation of automated searches in SLSs could be improved. Hence, we propose to extend the current search validation approach by the additional analysis step of the automated search validation results and provide recommendations for the QGS construction. Conclusion: In this paper, we report on new issues which could affect search completeness in SLSs. Furthermore, the proposed guideline and recommendations could help researchers implement a more reliable search strategy in their SLSs.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Assessing test artifact quality -- A tertiary study
Authors:
Huynh Khanh Vi Tran,
Michael Unterkalmsteiner,
Jürgen Börstler,
Nauman bin Ali
Abstract:
Context: Modern software development increasingly relies on software testing for an ever more frequent delivery of high quality software. This puts high demands on the quality of the central artifacts in software testing, test suites and test cases. Objective: We aim to develop a comprehensive model for capturing the dimensions of test case/suite quality, which are relevant for a variety of perspe…
▽ More
Context: Modern software development increasingly relies on software testing for an ever more frequent delivery of high quality software. This puts high demands on the quality of the central artifacts in software testing, test suites and test cases. Objective: We aim to develop a comprehensive model for capturing the dimensions of test case/suite quality, which are relevant for a variety of perspectives. Method: We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts. Results: We identified 49 relevant secondary studies. Of these 49 studies, less than half did some form of quality appraisal of the included primary studies and only 3 took into account the quality of the primary study when synthesizing the results. We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated. We also provide a comprehensive model of test case/suite quality with definitions for the quality attributes and measurements based on findings in the literature and ISO/IEC 25010:2011. Conclusion: The test artifact quality model presented in the paper can be used to support test artifact quality assessment and improvement initiatives in practice. Furtherm Information and Software Technology 139 (2021): 106620ore, the model can also be used as a framework for documenting context characteristics to make research results more accessible for research and practice.
△ Less
Submitted 14 February, 2024;
originally announced February 2024.
-
Reducing LLM Hallucinations using Epistemic Neural Networks
Authors:
Shreyas Verma,
Kien Tran,
Yusuf Ali,
Guangyu Min
Abstract:
Reducing and detecting hallucinations in large language models is an open research problem. In this project, we attempt to leverage recent advances in the field of uncertainty estimation to reduce hallucinations in frozen large language models. Epistemic neural networks have recently been proposed to improve output joint distributions for large pre-trained models. ENNs are small networks attached…
▽ More
Reducing and detecting hallucinations in large language models is an open research problem. In this project, we attempt to leverage recent advances in the field of uncertainty estimation to reduce hallucinations in frozen large language models. Epistemic neural networks have recently been proposed to improve output joint distributions for large pre-trained models. ENNs are small networks attached to large, frozen models to improve the model's joint distributions and uncertainty estimates. In this work, we train an epistemic neural network on top of the Llama-2 7B model combined with a contrastive decoding feature enhancement technique. We are the first to train an ENN for the next token prediction task and explore the efficacy of this method in reducing hallucinations on the TruthfulQA dataset. In essence, we provide a method that leverages a pre-trained model's latent embeddings to reduce hallucinations.
△ Less
Submitted 24 December, 2023;
originally announced December 2023.
-
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Authors:
Khiem Vinh Tran,
Hao Phu Phan,
Kiet Van Nguyen,
Ngan Luu Thuy Nguyen
Abstract:
In recent years, Visual Question Answering (VQA) has gained significant attention for its diverse applications, including intelligent car assistance, aiding visually impaired individuals, and document image information retrieval using natural language queries. VQA requires effective integration of information from questions and images to generate accurate answers. Neural models for VQA have made r…
▽ More
In recent years, Visual Question Answering (VQA) has gained significant attention for its diverse applications, including intelligent car assistance, aiding visually impaired individuals, and document image information retrieval using natural language queries. VQA requires effective integration of information from questions and images to generate accurate answers. Neural models for VQA have made remarkable progress on large-scale datasets, with a primary focus on resource-rich languages like English. To address this, we introduce the ViCLEVR dataset, a pioneering collection for evaluating various visual reasoning capabilities in Vietnamese while mitigating biases. The dataset comprises over 26,000 images and 30,000 question-answer pairs (QAs), each question annotated to specify the type of reasoning involved. Leveraging this dataset, we conduct a comprehensive analysis of contemporary visual reasoning systems, offering valuable insights into their strengths and limitations. Furthermore, we present PhoVIT, a comprehensive multimodal fusion that identifies objects in images based on questions. The architecture effectively employs transformers to enable simultaneous reasoning over textual and visual data, merging both modalities at an early model stage. The experimental findings demonstrate that our proposed model achieves state-of-the-art performance across four evaluation metrics. The accompanying code and dataset have been made publicly accessible at \url{https://github.com/kvt0012/ViCLEVR}. This provision seeks to stimulate advancements within the research community, fostering the development of more multimodal fusion algorithms, specifically tailored to address the nuances of low-resource languages, exemplified by Vietnamese.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering
Authors:
Tam Minh Vo,
Khiem Vinh Tran
Abstract:
Recent studies have provided empirical evidence of the wide-ranging potential of Generative Pre-trained Transformer (GPT), a pretrained language model, in the field of natural language processing. GPT has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems, yielding exceptional performance across various tasks. However, the current research landscape co…
▽ More
Recent studies have provided empirical evidence of the wide-ranging potential of Generative Pre-trained Transformer (GPT), a pretrained language model, in the field of natural language processing. GPT has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems, yielding exceptional performance across various tasks. However, the current research landscape concerning GPT's application in Vietnamese remains limited. This paper aims to address this gap by presenting an implementation of GPT-2 for community-based question answering specifically focused on COVID-19 related queries in Vietnamese. We introduce a novel approach by conducting a comparative analysis of different Transformers vs SOTA models in the community-based COVID-19 question answering dataset. The experimental findings demonstrate that the GPT-2 models exhibit highly promising outcomes, outperforming other SOTA models as well as previous community-based COVID-19 question answering models developed for Vietnamese.
△ Less
Submitted 31 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Multimodal Graph Learning for Modeling Emerging Pandemics with Big Data
Authors:
Khanh-Tung Tran,
Truong Son Hy,
Lili Jiang,
Xuan-Son Vu
Abstract:
Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework called MGL4MEP that integrates temporal graph…
▽ More
Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework called MGL4MEP that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, cheap cost, and more potential information indicators.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
Robust-MBFD: A Robust Deep Learning System for Motor Bearing Faults Detection Using Multiple Deep Learning Training Strategies and A Novel Double Loss Function
Authors:
Khoa Tran,
Lam Pham,
Hai-Canh Vu
Abstract:
This paper presents a comprehensive analysis of motor bearing fault detection (MBFD), which involves the task of identifying faults in a motor bearing based on its vibration. To this end, we first propose and evaluate various machine learning based systems for the MBFD task. Furthermore, we propose three deep learning based systems for the MBFD task, each of which explores one of the following tra…
▽ More
This paper presents a comprehensive analysis of motor bearing fault detection (MBFD), which involves the task of identifying faults in a motor bearing based on its vibration. To this end, we first propose and evaluate various machine learning based systems for the MBFD task. Furthermore, we propose three deep learning based systems for the MBFD task, each of which explores one of the following training strategies: supervised learning, semi-supervised learning, and unsupervised learning. The proposed machine learning based systems and deep learning based systems are evaluated, compared, and then they are used to identify the best model for the MBFD task. We conducted extensive experiments on various benchmark datasets of motor bearing faults, including those from the American Society for Mechanical Failure Prevention Technology (MFPT), Case Western Reserve University Bearing Center (CWRU), and the Condition Monitoring of Bearing Damage in Electromechanical Drive Systems from Paderborn University (PU). The experimental results on different datasets highlight two main contributions of this study. First, we prove that deep learning based systems are more effective than machine learning based systems for the MBFD task. Second, we achieve a robust and general deep learning based system with a novel loss function for the MBFD task on several benchmark datasets, demonstrating its potential for real-life MBFD applications.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Filling the Holes on 3D Heritage Object Surface based on Automatic Segmentation Algorithm
Authors:
Sinh Van Nguyen,
Son Thanh Le,
Minh Khai Tran,
Le Thanh Sach
Abstract:
Reconstructing and processing the 3D objects are popular activities in the research field of computer graphics, image processing and computer vision. The 3D objects are processed based on the methods like geometric modeling, a branch of applied mathematics and computational geometry, or the machine learning algorithms based on image processing. The computation of geometrical objects includes proce…
▽ More
Reconstructing and processing the 3D objects are popular activities in the research field of computer graphics, image processing and computer vision. The 3D objects are processed based on the methods like geometric modeling, a branch of applied mathematics and computational geometry, or the machine learning algorithms based on image processing. The computation of geometrical objects includes processing the curves and surfaces, subdivision, simplification, meshing, holes filling, reconstructing, and refining the 3D surface objects on both point cloud data and triangular mesh. While the machine learning methods are developed using deep learning models. With the support of 3D laser scan devices and Lidar techniques, the obtained dataset is close to original shape of the real objects. Besides, the photography and its application based on the modern techniques in recent years help us collect data and process the 3D models more precise. This article proposes an improved method for filling holes on the 3D object surface based on an automatic segmentation. Instead of filling the hole directly as the existing methods, we now subdivide the hole before filling it. The hole is first determined and segmented automatically based on computation of its local curvature. It is then filled on each part of the hole to match its local curvature shape. The method can work on both 3D point cloud surfaces and triangular mesh surface. Comparing to the state of the art methods, our proposed method obtained higher accuracy of the reconstructed 3D objects.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management
Authors:
Nguyen Khoi Tran,
Samodha Pallewatta,
M. Ali Babar
Abstract:
With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Ado…
▽ More
With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Adopting SSC metadata requires organisations to procure or develop a Software Supply Chain Metadata Management system (SCM2), a suite of software tools for performing life cycle activities of SSC metadata documents such as creation, signing, distribution, and consumption. Selecting or developing an SCM2 is challenging due to the lack of a comprehensive domain model and architectural blueprint to aid practitioners in navigating the vast design space of SSC metadata terminologies, frameworks, and solutions. This paper addresses the above-mentioned challenge by presenting an empirically grounded Reference Architecture (RA) comprising of a domain model and an architectural blueprint for SCM2 systems. Our proposed RA is constructed systematically on an empirical foundation built with industry-driven and peer-reviewed SSC security frameworks. Our theoretical evaluation, which consists of an architectural mapping of five prominent SSC security tools on the RA, ensures its validity and applicability, thus affirming the proposed RA as an effective framework for analysing existing SCM2 solutions and guiding the engineering of new SCM2 systems.
△ Less
Submitted 8 June, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Safe Stabilizing Control for Polygonal Robots in Dynamic Elliptical Environments
Authors:
Kehan Long,
Khoa Tran,
Melvin Leok,
Nikolay Atanasov
Abstract:
This paper addresses the challenge of safe navigation for rigid-body mobile robots in dynamic environments. We introduce an analytic approach to compute the distance between a polygon and an ellipse, and employ it to construct a control barrier function (CBF) for safe control synthesis. Existing CBF design methods for mobile robot obstacle avoidance usually assume point or circular robots, prevent…
▽ More
This paper addresses the challenge of safe navigation for rigid-body mobile robots in dynamic environments. We introduce an analytic approach to compute the distance between a polygon and an ellipse, and employ it to construct a control barrier function (CBF) for safe control synthesis. Existing CBF design methods for mobile robot obstacle avoidance usually assume point or circular robots, preventing their applicability to more realistic robot body geometries. Our work enables CBF designs that capture complex robot and obstacle shapes. We demonstrate the effectiveness of our approach in simulations highlighting real-time obstacle avoidance in constrained and dynamic environments for both mobile robots and multi-joint robot arms.
△ Less
Submitted 30 April, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Test-Case Quality -- Understanding Practitioners' Perspectives
Authors:
Huynh Khanh Vi Tran,
Nauman Bin Ali,
Jürgen Börstler,
Michael Unterkalmsteiner
Abstract:
Background: Test-case quality has always been one of the major concerns in software testing. To improve test-case quality, it is important to better understand how practitioners perceive the quality of test-cases. Objective: Motivated by that need, we investigated how practitioners define test-case quality and which aspects of test-cases are important for quality assessment. Method: We conducted s…
▽ More
Background: Test-case quality has always been one of the major concerns in software testing. To improve test-case quality, it is important to better understand how practitioners perceive the quality of test-cases. Objective: Motivated by that need, we investigated how practitioners define test-case quality and which aspects of test-cases are important for quality assessment. Method: We conducted semi-structured interviews with professional developers, testers and test architects from a multinational software company in Sweden. Before the interviews, we asked participants for actual test cases (written in natural language) that they perceive as good, normal, and bad respectively together with rationales for their assessment. We also compared their opinions on shared test cases and contrasted their views with the relevant literature. Results: We present a quality model which consists of 11 test-case quality attributes. We also identify a misalignment in defining test-case quality among practitioners and between academia and industry, along with suggestions for improving test-case quality in industry. Conclusion: The results show that practitioners' background, including roles and working experience, are critical dimensions of how test-case quality is defined and assessed.
△ Less
Submitted 28 September, 2023;
originally announced September 2023.
-
License Plate Recognition Based On Multi-Angle View Model
Authors:
Dat Tran-Anh,
Khanh Linh Tran,
Hoai-Nam Vu
Abstract:
In the realm of research, the detection/recognition of text within images/videos captured by cameras constitutes a highly challenging problem for researchers. Despite certain advancements achieving high accuracy, current methods still require substantial improvements to be applicable in practical scenarios. Diverging from text detection in images/videos, this paper addresses the issue of text dete…
▽ More
In the realm of research, the detection/recognition of text within images/videos captured by cameras constitutes a highly challenging problem for researchers. Despite certain advancements achieving high accuracy, current methods still require substantial improvements to be applicable in practical scenarios. Diverging from text detection in images/videos, this paper addresses the issue of text detection within license plates by amalgamating multiple frames of distinct perspectives. For each viewpoint, the proposed method extracts descriptive features characterizing the text components of the license plate, specifically corner points and area. Concretely, we present three viewpoints: view-1, view-2, and view-3, to identify the nearest neighboring components facilitating the restoration of text components from the same license plate line based on estimations of similarity levels and distance metrics. Subsequently, we employ the CnOCR method for text recognition within license plates. Experimental results on the self-collected dataset (PTITPlates), comprising pairs of images in various scenarios, and the publicly available Stanford Cars Dataset, demonstrate the superiority of the proposed method over existing approaches.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
Addressing the Scalability Bottleneck of Semantic Technologies at Bosch
Authors:
Diego Rincon-Yanez,
Mohamed H. Gad-Elrab,
Daria Stepanova,
Kien Trung Tran,
Cuong Chu Xuan,
Baifan Zhou,
Evgeny Karlamov
Abstract:
At the heart of smart manufacturing is real-time semi-automatic decision-making. Such decisions are vital for optimizing production lines, e.g., reducing resource consumption, improving the quality of discrete manufacturing operations, and optimizing the actual products, e.g., optimizing the sampling rate for measuring product dimensions during production. Such decision-making relies on massive in…
▽ More
At the heart of smart manufacturing is real-time semi-automatic decision-making. Such decisions are vital for optimizing production lines, e.g., reducing resource consumption, improving the quality of discrete manufacturing operations, and optimizing the actual products, e.g., optimizing the sampling rate for measuring product dimensions during production. Such decision-making relies on massive industrial data thus posing a real-time processing bottleneck.
△ Less
Submitted 19 September, 2023;
originally announced September 2023.
-
Robust-MBDL: A Robust Multi-branch Deep Learning Based Model for Remaining Useful Life Prediction and Operational Condition Identification of Rotating Machines
Authors:
Khoa Tran,
Hai-Canh Vu,
Lam Pham,
Nassim Boudaoud
Abstract:
In this paper, a Robust Multi-branch Deep learning-based system for remaining useful life (RUL) prediction and condition operations (CO) identification of rotating machines is proposed. In particular, the proposed system comprises main components: (1) an LSTM-Autoencoder to denoise the vibration data; (2) a feature extraction to generate time-domain, frequency-domain, and time-frequency based feat…
▽ More
In this paper, a Robust Multi-branch Deep learning-based system for remaining useful life (RUL) prediction and condition operations (CO) identification of rotating machines is proposed. In particular, the proposed system comprises main components: (1) an LSTM-Autoencoder to denoise the vibration data; (2) a feature extraction to generate time-domain, frequency-domain, and time-frequency based features from the denoised data; (3) a novel and robust multi-branch deep learning network architecture to exploit the multiple features. The performance of our proposed system was evaluated and compared to the state-of-the-art systems on two benchmark datasets of XJTU-SY and PRONOSTIA. The experimental results prove that our proposed system outperforms the state-of-the-art systems and presents potential for real-life applications on bearing machines.
△ Less
Submitted 14 December, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Authors:
Seamless Communication,
Loïc Barrault,
Yu-An Chung,
Mariano Cora Meglioli,
David Dale,
Ning Dong,
Paul-Ambroise Duquenne,
Hady Elsahar,
Hongyu Gong,
Kevin Heffernan,
John Hoffman,
Christopher Klaiber,
Pengwei Li,
Daniel Licht,
Jean Maillard,
Alice Rakotoarison,
Kaushik Ram Sadagopan,
Guillaume Wenzek,
Ethan Ye,
Bapi Akula,
Peng-Jen Chen,
Naji El Hachem,
Brian Ellis,
Gabriel Mejia Gonzalez,
Justin Haaheim
, et al. (43 additional authors not shown)
Abstract:
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…
▽ More
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication
△ Less
Submitted 24 October, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022
Authors:
Hai Long Trieu,
Song Kiet Bui,
Tan Minh Tran,
Van Khanh Tran,
Hai An Nguyen
Abstract:
We present our systems participated in the VLSP 2022 machine translation shared task. In the shared task this year, we participated in both translation tasks, i.e., Chinese-Vietnamese and Vietnamese-Chinese translations. We build our systems based on the neural-based Transformer model with the powerful multilingual denoising pre-trained model mBART. The systems are enhanced by a sampling method fo…
▽ More
We present our systems participated in the VLSP 2022 machine translation shared task. In the shared task this year, we participated in both translation tasks, i.e., Chinese-Vietnamese and Vietnamese-Chinese translations. We build our systems based on the neural-based Transformer model with the powerful multilingual denoising pre-trained model mBART. The systems are enhanced by a sampling method for backtranslation, which leverage large scale available monolingual data. Additionally, several other methods are applied to improve the translation quality including ensembling and postprocessing. We achieve 38.9 BLEU on ChineseVietnamese and 38.0 BLEU on VietnameseChinese on the public test sets, which outperform several strong baselines.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
-
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering
Authors:
Khiem Vinh Tran,
Kiet Van Nguyen,
Ngan Luu Thuy Nguyen
Abstract:
Visual Question Answering (VQA) is an intricate and demanding task that integrates natural language processing (NLP) and computer vision (CV), capturing the interest of researchers. The English language, renowned for its wealth of resources, has witnessed notable advancements in both datasets and models designed for VQA. However, there is a lack of models that target specific countries such as Vie…
▽ More
Visual Question Answering (VQA) is an intricate and demanding task that integrates natural language processing (NLP) and computer vision (CV), capturing the interest of researchers. The English language, renowned for its wealth of resources, has witnessed notable advancements in both datasets and models designed for VQA. However, there is a lack of models that target specific countries such as Vietnam. To address this limitation, we introduce a transformer-based Vietnamese model named BARTPhoBEiT. This model includes pre-trained Sequence-to-Sequence and bidirectional encoder representation from Image Transformers in Vietnamese and evaluates Vietnamese VQA datasets. Experimental results demonstrate that our proposed model outperforms the strong baseline and improves the state-of-the-art in six metrics: Accuracy, Precision, Recall, F1-score, WUPS 0.0, and WUPS 0.9.
△ Less
Submitted 28 July, 2023;
originally announced July 2023.
-
ARIST: An Effective API Argument Recommendation Approach
Authors:
Son Nguyen,
Cuong Tran Manh,
Kien T. Tran,
Tan M. Nguyen,
Thu-Trang Nguyen,
Kien-Tuan Ngo,
Hieu Dinh Vo
Abstract:
Learning and remembering to use APIs are difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers'…
▽ More
Learning and remembering to use APIs are difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers' expectations when they define and use API methods. To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARIST, the LMs and the recommending features are used to suggest the promising candidates identified by PA. Meanwhile, PA navigates the LMs and the features working on the set of the valid candidates which satisfy syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our evaluation on a large dataset of real-world projects shows that ARIST improves the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently-used libraries. For general argument recommendation task, i.e., recommending arguments for every method call, ARIST outperforms the baseline approaches by up to 125% top-1 accuracy. Moreover, for newly-encountered projects, ARIST achieves more than 60% top-3 accuracy when evaluating on a larger dataset. For working/maintaining projects, with a personalized LM to capture developers' coding practice, ARIST can productively rank the expected arguments at the top-1 position in 7/10 requests.
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
-
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Authors:
Kim Hoang Tran,
Anh Duy Le Dinh,
Tien Phat Nguyen,
Thinh Phan,
Pha Nguyen,
Khoa Luu,
Donald Adjeroh,
Gianfranco Doretto,
Ngan Hoang Le
Abstract:
Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories and struggles with unseen objects. To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach, requiring less prior information. However, current GMOT methods often rely on initial bounding boxes and struggle t…
▽ More
Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories and struggles with unseen objects. To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach, requiring less prior information. However, current GMOT methods often rely on initial bounding boxes and struggle to handle variations in factors such as viewpoint, lighting, occlusion, and scale, among others. Our contributions commence with the introduction of the \textit{Referring GMOT dataset} a collection of videos, each accompanied by detailed textual descriptions of their attributes. Subsequently, we propose $\mathtt{Z-GMOT}$, a cutting-edge tracking solution capable of tracking objects from \textit{never-seen categories} without the need of initial bounding boxes or predefined categories. Within our $\mathtt{Z-GMOT}$ framework, we introduce two novel components: (i) $\mathtt{iGLIP}$, an improved Grounded language-image pretraining, for accurately detecting unseen objects with specific characteristics. (ii) $\mathtt{MA-SORT}$, a novel object association approach that adeptly integrates motion and appearance-based matching strategies to tackle the complex task of tracking objects with high similarity. Our contributions are benchmarked through extensive experiments conducted on the Referring GMOT dataset for GMOT task. Additionally, to assess the generalizability of the proposed $\mathtt{Z-GMOT}$, we conduct ablation studies on the DanceTrack and MOT20 datasets for the MOT task. Our dataset, code, and models are released at: https://fsoft-aic.github.io/Z-GMOT.
△ Less
Submitted 13 June, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
-
FairDP: Certified Fairness with Differential Privacy
Authors:
Khang Tran,
Ferdinando Fioretto,
Issa Khalil,
My T. Thai,
NhatHai Phan
Abstract:
This paper introduces FairDP, a novel mechanism designed to achieve certified fairness with differential privacy (DP). FairDP independently trains models for distinct individual groups, using group-specific clipping terms to assess and bound the disparate impacts of DP. Throughout the training process, the mechanism progressively integrates knowledge from group models to formulate a comprehensive…
▽ More
This paper introduces FairDP, a novel mechanism designed to achieve certified fairness with differential privacy (DP). FairDP independently trains models for distinct individual groups, using group-specific clipping terms to assess and bound the disparate impacts of DP. Throughout the training process, the mechanism progressively integrates knowledge from group models to formulate a comprehensive model that balances privacy, utility, and fairness in downstream tasks. Extensive theoretical and empirical analyses validate the efficacy of FairDP and improved trade-offs between model utility, privacy, and fairness compared with existing methods.
△ Less
Submitted 21 August, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval
Authors:
Trung-Nghia Le,
Tam V. Nguyen,
Minh-Quan Le,
Trong-Thuan Nguyen,
Viet-Tham Huynh,
Trong-Le Do,
Khanh-Duy Le,
Mai-Khiem Tran,
Nhat Hoang-Xuan,
Thang-Long Nguyen-Ho,
Vinh-Tiep Nguyen,
Tuong-Nghiem Diep,
Khanh-Duy Ho,
Xuan-Hieu Nguyen,
Thien-Phuc Tran,
Tuan-Anh Yang,
Kim-Phat Tran,
Nhu-Vinh Hoang,
Minh-Quang Nguyen,
E-Ro Nguyen,
Minh-Khoi Nguyen-Nhat,
Tuan-An To,
Trung-Truc Huynh-Le,
Nham-Tan Nguyen,
Hoang-Chau Luong
, et al. (8 additional authors not shown)
Abstract:
3D object retrieval is an important yet challenging task that has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC chall…
▽ More
3D object retrieval is an important yet challenging task that has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval. Despite the increased difficulty, we believe this task can potentially drive useful applications in practice and facilitate more intuitive interactions with 3D objects. Five groups participated in our competition, submitting a total of 114 runs. While the results obtained in our competition are satisfactory, we note that the challenges presented by this task are far from fully solved. As such, we provide insights into potential areas for future research and improvements. We believe we can help push the boundaries of 3D object retrieval and facilitate more user-friendly interactions via vision-language technologies. https://aichallenge.hcmus.edu.vn/textanimar
△ Less
Submitted 9 August, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval
Authors:
Trung-Nghia Le,
Tam V. Nguyen,
Minh-Quan Le,
Trong-Thuan Nguyen,
Viet-Tham Huynh,
Trong-Le Do,
Khanh-Duy Le,
Mai-Khiem Tran,
Nhat Hoang-Xuan,
Thang-Long Nguyen-Ho,
Vinh-Tiep Nguyen,
Nhat-Quynh Le-Pham,
Huu-Phuc Pham,
Trong-Vu Hoang,
Quang-Binh Nguyen,
Trong-Hieu Nguyen-Mau,
Tuan-Luc Huynh,
Thanh-Danh Le,
Ngoc-Linh Nguyen-Ha,
Tuong-Vy Truong-Thuy,
Truong Hoai Phong,
Tuong-Nghiem Diep,
Khanh-Duy Ho,
Xuan-Hieu Nguyen,
Thien-Phuc Tran
, et al. (9 additional authors not shown)
Abstract:
The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this…
▽ More
The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this end, we introduce a novel SHREC challenge track that focuses on retrieving relevant 3D animal models from a dataset using sketch queries and expedites accessing 3D models through available sketches. Furthermore, a new dataset named ANIMAR was constructed in this study, comprising a collection of 711 unique 3D animal models and 140 corresponding sketch queries. Our contest requires participants to retrieve 3D models based on complex and detailed sketches. We receive satisfactory results from eight teams and 204 runs. Although further improvement is necessary, the proposed task has the potential to incentivize additional research in the domain of 3D object retrieval, potentially yielding benefits for a wide range of applications. We also provide insights into potential areas of future research, such as improving techniques for feature extraction and matching and creating more diverse datasets to evaluate retrieval performance. https://aichallenge.hcmus.edu.vn/sketchanimar
△ Less
Submitted 9 August, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Active Membership Inference Attack under Local Differential Privacy in Federated Learning
Authors:
Truc Nguyen,
Phung Lai,
Khang Tran,
NhatHai Phan,
My T. Thai
Abstract:
Federated learning (FL) was originally regarded as a framework for collaborative learning among clients with data privacy protection through a coordinating server. In this paper, we propose a new active membership inference (AMI) attack carried out by a dishonest server in FL. In AMI attacks, the server crafts and embeds malicious parameters into global models to effectively infer whether a target…
▽ More
Federated learning (FL) was originally regarded as a framework for collaborative learning among clients with data privacy protection through a coordinating server. In this paper, we propose a new active membership inference (AMI) attack carried out by a dishonest server in FL. In AMI attacks, the server crafts and embeds malicious parameters into global models to effectively infer whether a target data sample is included in a client's private training data or not. By exploiting the correlation among data features through a non-linear decision boundary, AMI attacks with a certified guarantee of success can achieve severely high success rates under rigorous local differential privacy (LDP) protection; thereby exposing clients' training data to significant privacy risk. Theoretical and experimental results on several benchmark datasets show that adding sufficient privacy-preserving noise to prevent our attack would significantly damage FL's model utility.
△ Less
Submitted 24 July, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
EVJVQA Challenge: Multilingual Visual Question Answering
Authors:
Ngan Luu-Thuy Nguyen,
Nghia Hieu Nguyen,
Duong T. D Vo,
Khanh Quoc Tran,
Kiet Van Nguyen
Abstract:
Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addi…
▽ More
Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addition, there is no multilingual dataset targeting the visual content of a particular country with its own objects and cultural characteristics. To address the weakness, we provide the research community with a benchmark dataset named EVJVQA, including 33,000+ pairs of question-answer over three languages: Vietnamese, English, and Japanese, on approximately 5,000 images taken from Vietnam for evaluating multilingual VQA systems or models. EVJVQA is used as a benchmark dataset for the challenge of multilingual visual question answering at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). This task attracted 62 participant teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLUE on the private test set. The multilingual QA systems proposed by the top 2 teams use ViT for the pre-trained vision model and mT5 for the pre-trained language model, a powerful pre-trained language model based on the transformer architecture. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore the multilingual models or systems for visual question answering systems. We released the challenge on the Codalab evaluation system for further research.
△ Less
Submitted 17 April, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
-
ViHOS: Hate Speech Spans Detection for Vietnamese
Authors:
Phu Gia Hoang,
Canh Duc Luu,
Khanh Quoc Tran,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
The rise in hateful and offensive language directed at other users is one of the adverse side effects of the increased use of social networking platforms. This could make it difficult for human moderators to review tagged comments filtered by classification systems. To help address this issue, we present the ViHOS (Vietnamese Hate and Offensive Spans) dataset, the first human-annotated corpus cont…
▽ More
The rise in hateful and offensive language directed at other users is one of the adverse side effects of the increased use of social networking platforms. This could make it difficult for human moderators to review tagged comments filtered by classification systems. To help address this issue, we present the ViHOS (Vietnamese Hate and Offensive Spans) dataset, the first human-annotated corpus containing 26k spans on 11k comments. We also provide definitions of hateful and offensive spans in Vietnamese comments as well as detailed annotation guidelines. Besides, we conduct experiments with various state-of-the-art models. Specifically, XLM-R$_{Large}$ achieved the best F1-scores in Single span detection and All spans detection, while PhoBERT$_{Large}$ obtained the highest in Multiple spans detection. Finally, our error analysis demonstrates the difficulties in detecting specific types of spans in our data for future research.
Disclaimer: This paper contains real comments that could be considered profane, offensive, or abusive.
△ Less
Submitted 26 January, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
Proof of Swarm Based Ensemble Learning for Federated Learning Applications
Authors:
Ali Raza,
Kim Phuc Tran,
Ludovic Koehl,
Shujun Li
Abstract:
Ensemble learning combines results from multiple machine learning models in order to provide a better and optimised predictive model with reduced bias, variance and improved predictions. However, in federated learning it is not feasible to apply centralised ensemble learning directly due to privacy concerns. Hence, a mechanism is required to combine results of local models to produce a global mode…
▽ More
Ensemble learning combines results from multiple machine learning models in order to provide a better and optimised predictive model with reduced bias, variance and improved predictions. However, in federated learning it is not feasible to apply centralised ensemble learning directly due to privacy concerns. Hence, a mechanism is required to combine results of local models to produce a global model. Most distributed consensus algorithms, such as Byzantine fault tolerance (BFT), do not normally perform well in such applications. This is because, in such methods predictions of some of the peers are disregarded, so a majority of peers can win without even considering other peers' decisions. Additionally, the confidence score of the result of each peer is not normally taken into account, although it is an important feature to consider for ensemble learning. Moreover, the problem of a tie event is often left un-addressed by methods such as BFT. To fill these research gaps, we propose PoSw (Proof of Swarm), a novel distributed consensus algorithm for ensemble learning in a federated setting, which was inspired by particle swarm based algorithms for solving optimisation problems. The proposed algorithm is theoretically proved to always converge in a relatively small number of steps and has mechanisms to resolve tie events while trying to achieve sub-optimum solutions. We experimentally validated the performance of the proposed algorithm using ECG classification as an example application in healthcare, showing that the ensemble learning model outperformed all local models and even the FL-based global model. To the best of our knowledge, the proposed algorithm is the first attempt to make consensus over the output results of distributed models trained using federated learning.
△ Less
Submitted 2 January, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Differentiable Physics-based Greenhouse Simulation
Authors:
Nhat M. Nguyen,
Hieu T. Tran,
Minh V. Duong,
Hanh Bui,
Kenneth Tran
Abstract:
We present a differentiable greenhouse simulation model based on physical processes whose parameters can be obtained by training from real data. The physics-based simulation model is fully interpretable and is able to do state prediction for both climate and crop dynamics in the greenhouse over very a long time horizon. The model works by constructing a system of linear differential equations and…
▽ More
We present a differentiable greenhouse simulation model based on physical processes whose parameters can be obtained by training from real data. The physics-based simulation model is fully interpretable and is able to do state prediction for both climate and crop dynamics in the greenhouse over very a long time horizon. The model works by constructing a system of linear differential equations and solving them to obtain the next state. We propose a procedure to solve the differential equations, handle the problem of missing unobservable states in the data, and train the model efficiently. Our experiment shows the procedure is effective. The model improves significantly after training and can simulate a greenhouse that grows cucumbers accurately.
△ Less
Submitted 21 November, 2022;
originally announced November 2022.
-
A Comparative Study of Question Answering over Knowledge Bases
Authors:
Khiem Vinh Tran,
Hao Phu Phan,
Khang Nguyen Duc Quach,
Ngan Luu-Thuy Nguyen,
Jun Jo,
Thanh Tam Nguyen
Abstract:
Question answering over knowledge bases (KBQA) has become a popular approach to help users extract information from knowledge bases. Although several systems exist, choosing one suitable for a particular application scenario is difficult. In this article, we provide a comparative study of six representative KBQA systems on eight benchmark datasets. In that, we study various question types, propert…
▽ More
Question answering over knowledge bases (KBQA) has become a popular approach to help users extract information from knowledge bases. Although several systems exist, choosing one suitable for a particular application scenario is difficult. In this article, we provide a comparative study of six representative KBQA systems on eight benchmark datasets. In that, we study various question types, properties, languages, and domains to provide insights on where existing systems struggle. On top of that, we propose an advanced mapping algorithm to aid existing models in achieving superior results. Moreover, we also develop a multilingual corpus COVID-KGQA, which encourages COVID-19 research and multilingualism for the diversity of future AI. Finally, we discuss the key findings and their implications as well as performance guidelines and some future improvements. Our source code is available at \url{https://github.com/tamlhp/kbqa}.
△ Less
Submitted 15 November, 2022;
originally announced November 2022.
-
Speech-to-Speech Translation For A Real-world Unwritten Language
Authors:
Peng-Jen Chen,
Kevin Tran,
Yilin Yang,
Jingfei Du,
Justine Kao,
Yu-An Chung,
Paden Tomasello,
Paul-Ambroise Duquenne,
Holger Schwenk,
Hongyu Gong,
Hirofumi Inaguma,
Sravya Popuri,
Changhan Wang,
Juan Pino,
Wei-Ning Hsu,
Ann Lee
Abstract:
We study speech-to-speech translation (S2ST) that translates speech from one language into another language and focuses on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release. First, we present efforts on creating…
▽ More
We study speech-to-speech translation (S2ST) that translates speech from one language into another language and focuses on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release. First, we present efforts on creating human annotated data, automatically mining data from large unlabeled speech datasets, and adopting pseudo-labeling to produce weakly supervised data. On the modeling, we take advantage of recent advances in applying self-supervised discrete representations as target for prediction in S2ST and show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, in model training. Finally, we release an S2ST benchmark set to facilitate future research in this field. The demo can be found at https://huggingface.co/spaces/facebook/Hokkien_Translation .
△ Less
Submitted 11 November, 2022;
originally announced November 2022.
-
Heterogeneous Randomized Response for Differential Privacy in Graph Neural Networks
Authors:
Khang Tran,
Phung Lai,
NhatHai Phan,
Issa Khalil,
Yao Ma,
Abdallah Khreishah,
My Thai,
Xintao Wu
Abstract:
Graph neural networks (GNNs) are susceptible to privacy inference attacks (PIAs), given their ability to learn joint representation from features and edges among nodes in graph data. To prevent privacy leakages in GNNs, we propose a novel heterogeneous randomized response (HeteroRR) mechanism to protect nodes' features and edges against PIAs under differential privacy (DP) guarantees without an un…
▽ More
Graph neural networks (GNNs) are susceptible to privacy inference attacks (PIAs), given their ability to learn joint representation from features and edges among nodes in graph data. To prevent privacy leakages in GNNs, we propose a novel heterogeneous randomized response (HeteroRR) mechanism to protect nodes' features and edges against PIAs under differential privacy (DP) guarantees without an undue cost of data and model utility in training GNNs. Our idea is to balance the importance and sensitivity of nodes' features and edges in redistributing the privacy budgets since some features and edges are more sensitive or important to the model utility than others. As a result, we derive significantly better randomization probabilities and tighter error bounds at both levels of nodes' features and edges departing from existing approaches, thus enabling us to maintain high data utility for training GNNs. An extensive theoretical and empirical analysis using benchmark datasets shows that HeteroRR significantly outperforms various baselines in terms of model utility under rigorous privacy protection for both nodes' features and edges. That enables us to defend PIAs in DP-preserving GNNs effectively.
△ Less
Submitted 10 November, 2022;
originally announced November 2022.
-
Does Joint Training Really Help Cascaded Speech Translation?
Authors:
Viet Anh Khoa Tran,
David Thulke,
Yingbo Gao,
Christian Herold,
Hermann Ney
Abstract:
Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation from the automatic speech recognition system still remain. To mitigate these problems, recently, people turn their attention to direct data and propose various joint training methods.…
▽ More
Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation from the automatic speech recognition system still remain. To mitigate these problems, recently, people turn their attention to direct data and propose various joint training methods. In this work, we seek to answer the question of whether joint training really helps cascaded speech translation. We review recent papers on the topic and also investigate a joint training criterion by marginalizing the transcription posterior probabilities. Our findings show that a strong cascaded baseline can diminish any improvements obtained using joint training, and we suggest alternatives to joint training. We hope this work can serve as a refresher of the current speech translation landscape, and motivate research in finding more efficient and creative ways to utilize the direct data for speech translation.
△ Less
Submitted 24 November, 2022; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context
Authors:
Lam Pham,
Dusan Salovic,
Anahid Jalali,
Alexander Schindler,
Khoa Tran,
Canh Vu,
Phu X. Nguyen
Abstract:
In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of Mobile…
▽ More
In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture which leverages residual-inception architectures and multiple kernels. Given the novel residual-inception (NRI) model, we further evaluate the trade off between the model complexity and the model accuracy performance. Finally, we evaluate whether sound events occurring in a sound scene recording can help to improve ASC accuracy, then indicate how a sound scene context is well presented by combining both sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, 2022 Task 1. The experimental results on several different ASC challenges highlight two main achievements; the first is to propose robust, general, and low complexity ASC systems which are suitable for real-life applications on a wide range of edge devices and mobiles; the second is to propose an effective visualization method for comprehensively presenting a sound scene context.
△ Less
Submitted 16 October, 2022;
originally announced October 2022.
-
PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation
Authors:
Ishan Jindal,
Alexandre Rademaker,
Khoi-Nguyen Tran,
Huaiyu Zhu,
Hiroshi Kanayama,
Marina Danilevsky,
Yunyao Li
Abstract:
Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, the existing SRL evaluation scripts do not consider the full effect of this erro…
▽ More
Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, the existing SRL evaluation scripts do not consider the full effect of this error propagation aspect. They either evaluate arguments independent of predicate sense (CoNLL09) or do not evaluate predicate sense at all (CoNLL05), yielding an inaccurate SRL model performance on the argument classification task. In this paper, we address key practical issues with existing evaluation scripts and propose a more strict SRL evaluation metric PriMeSRL. We observe that by employing PriMeSRL, the quality evaluation of all SoTA SRL models drops significantly, and their relative rankings also change. We also show that PriMeSRLsuccessfully penalizes actual failures in SoTA SRL models.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.
-
Using Anomaly Detection to Detect Poisoning Attacks in Federated Learning Applications
Authors:
Ali Raza,
Shujun Li,
Kim-Phuc Tran,
Ludovic Koehl
Abstract:
Adversarial attacks such as poisoning attacks have attracted the attention of many machine learning researchers. Traditionally, poisoning attacks attempt to inject adversarial training data in order to manipulate the trained model. In federated learning (FL), data poisoning attacks can be generalized to model poisoning attacks, which cannot be detected by simpler methods due to the lack of access…
▽ More
Adversarial attacks such as poisoning attacks have attracted the attention of many machine learning researchers. Traditionally, poisoning attacks attempt to inject adversarial training data in order to manipulate the trained model. In federated learning (FL), data poisoning attacks can be generalized to model poisoning attacks, which cannot be detected by simpler methods due to the lack of access to local training data by the detector. State-of-the-art poisoning attack detection methods for FL have various weaknesses, e.g., the number of attackers has to be known or not high enough, working with i.i.d. data only, and high computational complexity. To overcome above weaknesses, we propose a novel framework for detecting poisoning attacks in FL, which employs a reference model based on a public dataset and an auditor model to detect malicious updates. We implemented a detector based on the proposed framework and using a one-class support vector machine (OC-SVM), which reaches the lowest possible computational complexity O(K) where K is the number of clients. We evaluated our detector's performance against state-of-the-art (SOTA) poisoning attacks for two typical applications of FL: electrocardiograph (ECG) classification and human activity recognition (HAR). Our experimental results validated the performance of our detector over other SOTA detection methods.
△ Less
Submitted 9 May, 2023; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Sockeye 3: Fast Neural Machine Translation with PyTorch
Authors:
Felix Hieber,
Michael Denkowski,
Tobias Domhan,
Barbara Darques Barros,
Celina Dong Ye,
Xing Niu,
Cuong Hoang,
Ke Tran,
Benjamin Hsu,
Maria Nadejde,
Surafel Lakew,
Prashant Mathur,
Anna Currey,
Marcello Federico
Abstract:
Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a further streamlined codebase. This enables broader experimentation with faster iteration, efficient training of stronger and faster models, and the flexibility to move new ideas quickly from research to…
▽ More
Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a further streamlined codebase. This enables broader experimentation with faster iteration, efficient training of stronger and faster models, and the flexibility to move new ideas quickly from research to production. When running comparable models, Sockeye 3 is up to 126% faster than other PyTorch implementations on GPUs and up to 292% faster on CPUs. Sockeye 3 is open source software released under the Apache 2.0 license.
△ Less
Submitted 2 August, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Remote Sensing Image Classification using Transfer Learning and Attention Based Deep Neural Network
Authors:
Lam Pham,
Khoa Tran,
Dat Ngo,
Jasmin Lampert,
Alexander Schindler
Abstract:
The task of remote sensing image scene classification (RSISC), which aims at classifying remote sensing images into groups of semantic categories based on their contents, has taken the important role in a wide range of applications such as urban planning, natural hazards detection, environment monitoring,vegetation mapping, or geospatial object detection. During the past years, research community…
▽ More
The task of remote sensing image scene classification (RSISC), which aims at classifying remote sensing images into groups of semantic categories based on their contents, has taken the important role in a wide range of applications such as urban planning, natural hazards detection, environment monitoring,vegetation mapping, or geospatial object detection. During the past years, research community focusing on RSISC task has shown significant effort to publish diverse datasets as well as propose different approaches to deal with the RSISC challenges. Recently, almost proposed RSISC systems base on deep learning models which prove powerful and outperform traditional approaches using image processing and machine learning. In this paper, we also leverage the power of deep learning technology, evaluate a variety of deep neural network architectures, indicate main factors affecting the performance of a RSISC system. Given the comprehensive analysis, we propose a deep learning based framework for RSISC, which makes use of the transfer learning technique and multihead attention scheme. The proposed deep learning framework is evaluated on the benchmark NWPU-RESISC45 dataset and achieves the best classification accuracy of 94.7% which shows competitive to the state-of-the-art systems and potential for real-life applications.
△ Less
Submitted 20 June, 2022;
originally announced June 2022.
-
ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems
Authors:
Nguyen Khoi Tran,
Bushra Sabir,
M. Ali Babar,
Nini Cui,
Mehran Abolhasan,
Justin Lipman
Abstract:
Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fai…
▽ More
Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information of ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about circulated ML assets' provenance without relying on a third party, which is vulnerable to insider threats and presents a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine to leverage blockchain transactions and smart contracts for managing ML provenance information and introduce a user-driven provenance capturing mechanism to integrate existing scripts and tools to ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assessed ProML's security against a threat model of a distributed ML workflow.
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
Vietnamese Hate and Offensive Detection using PhoBERT-CNN and Social Media Streaming Data
Authors:
Khanh Q. Tran,
An T. Nguyen,
Phu Gia Hoang,
Canh Duc Luu,
Trong-Hop Do,
Kiet Van Nguyen
Abstract:
Society needs to develop a system to detect hate and offense to build a healthy and safe environment. However, current research in this field still faces four major shortcomings, including deficient pre-processing techniques, indifference to data imbalance issues, modest performance models, and lacking practical applications. This paper focused on developing an intelligent system capable of addres…
▽ More
Society needs to develop a system to detect hate and offense to build a healthy and safe environment. However, current research in this field still faces four major shortcomings, including deficient pre-processing techniques, indifference to data imbalance issues, modest performance models, and lacking practical applications. This paper focused on developing an intelligent system capable of addressing these shortcomings. Firstly, we proposed an efficient pre-processing technique to clean comments collected from Vietnamese social media. Secondly, a novel hate speech detection (HSD) model, which is the combination of a pre-trained PhoBERT model and a Text-CNN model, was proposed for solving tasks in Vietnamese. Thirdly, EDA techniques are applied to deal with imbalanced data to improve the performance of classification models. Besides, various experiments were conducted as baselines to compare and investigate the proposed model's performance against state-of-the-art methods. The experiment results show that the proposed PhoBERT-CNN model outperforms SOTA methods and achieves an F1-score of 67,46% and 98,45% on two benchmark datasets, ViHSD and HSD-VLSP, respectively. Finally, we also built a streaming HSD application to demonstrate the practicality of our proposed system.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Mod2Dash: A Framework for Model-Driven Dashboards Generation
Authors:
Liuyue Jiang,
Nguyen Khoi Tran,
M. Ali Babar
Abstract:
The construction of an interactive dashboard involves deciding on what information to present and how to display it and implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, pr…
▽ More
The construction of an interactive dashboard involves deciding on what information to present and how to display it and implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, practitioners have to implement this implicit design manually by coding or configuring it on a dashboard platform. This paper proposes Mod2Dash, a software framework that enables practitioners to capture their dashboard designs as models and generate operational dashboards automatically from these models. The framework also provides a GUI-driven customization approach for practitioners to fine-tune the auto-generated dashboards and update their models. With these abilities, Mod2Dash enables practitioners to rapidly prototype and deploy dashboards for both operational and research purposes. We evaluated the framework's effectiveness in a case study on cyber security visualization for decision support. A proof-of-concept of Mod2Dash was employed to model and reconstruct 31 diverse real-world cyber security dashboards. A human-assisted comparison between the Mod2Dash-generated dashboards and the baseline dashboards shows a close matching, indicating the framework's effectiveness for real-world scenarios.
△ Less
Submitted 15 May, 2022;
originally announced May 2022.
-
The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation
Authors:
Tobias Domhan,
Eva Hasler,
Ke Tran,
Sony Trenous,
Bill Byrne,
Felix Hieber
Abstract:
Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of Neural Machine Translation models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with re…
▽ More
Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of Neural Machine Translation models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with respect to automatic quality metrics in prior work, we show that it can fail to select the right set of output words, particularly for semantically non-compositional linguistic phenomena such as idiomatic expressions, leading to reduced translation quality as perceived by humans. Trading off latency for quality by increasing the size of the allowed set is often not an option in real-world scenarios. We propose a model of vocabulary selection, integrated into the neural translation model, that predicts the set of allowed output words from contextualized encoder representations. This restores translation quality of an unconstrained system, as measured by human evaluations on WMT newstest2020 and idiomatic expressions, at an inference latency competitive with alignment-based selection using aggressive thresholds, thereby removing the dependency on separately trained alignment models.
△ Less
Submitted 13 May, 2022;
originally announced May 2022.
-
A Framework for Automating Deployment and Evaluation of Blockchain Network
Authors:
Nguyen Khoi Tran,
M. Ali Babar,
Andrew Walters
Abstract:
Blockchain network deployment and evaluation have become prevalent due to the demand for private blockchains by enterprises, governments, and edge computing systems. Whilst a blockchain network's deployment and evaluation are driven by its architecture, practitioners still need to learn and carry out many repetitive and error-prone activities to transform architecture into an operational blockchai…
▽ More
Blockchain network deployment and evaluation have become prevalent due to the demand for private blockchains by enterprises, governments, and edge computing systems. Whilst a blockchain network's deployment and evaluation are driven by its architecture, practitioners still need to learn and carry out many repetitive and error-prone activities to transform architecture into an operational blockchain network and evaluate it. Greater efficiency could be gained if practitioners focus solely on the architecture design, a valuable and hard-to-automate activity, and leave the implementation steps to an automation framework. This paper proposes an automation framework called NVAL (Network Deployment and Evaluation Framework), which can deploy and evaluate blockchain networks based on their architecture specifications. The key idea of NVAL is reusing and combining the existing automation scripts and utilities of various blockchain types to deploy and evaluate incoming blockchain network architectures. We propose a novel meta-model to capture blockchain network architectures as computer-readable artefacts and employ a state-space search approach to plan and conduct their deployment and evaluation. An evaluative case study shows that NVAL successfully combines seven deployment and evaluation procedures to deploy 65 networks with 12 different architectures and generate 295 evaluation datasets whilst incurring a negligible processing time overhead.
△ Less
Submitted 24 July, 2022; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Lightweight Transformer in Federated Setting for Human Activity Recognition
Authors:
Ali Raza,
Kim Phuc Tran,
Ludovic Koehl,
Shujun Li,
Xianyi Zeng,
Khaled Benzaidi
Abstract:
Human activity recognition (HAR) is a machine learning task with important applications in healthcare especially in the context of home care of patients and older adults. HAR is often based on data collected from smart sensors, particularly smart home IoT devices such as smartphones, wearables and other body sensors. Deep learning techniques like convolutional neural networks (CNNs) and recurrent…
▽ More
Human activity recognition (HAR) is a machine learning task with important applications in healthcare especially in the context of home care of patients and older adults. HAR is often based on data collected from smart sensors, particularly smart home IoT devices such as smartphones, wearables and other body sensors. Deep learning techniques like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used for HAR, both in centralized and federated settings. However, these techniques have certain limitations: RNNs cannot be easily parallelized, CNNs have the limitation of sequence length, and both are computationally expensive. Moreover, in home healthcare applications the centralized approach can raise serious privacy concerns since the sensors used by a HAR classifier collect a lot of highly personal and sensitive data about people in the home. In this paper, to address some of such challenges facing HAR, we propose a novel lightweight (one-patch) transformer, which can combine the advantages of RNNs and CNNs without their major limitations, and also TransFed, a more privacy-friendly, federated learning-based HAR classifier using our proposed lightweight transformer. We designed a testbed to construct a new HAR dataset from five recruited human participants, and used the new dataset to evaluate the performance of the proposed HAR classifier in both federated and centralized settings. Additionally, we use another public dataset to evaluate the performance of the proposed HAR classifier in centralized setting to compare it with existing HAR classifiers. The experimental results showed that our proposed new solution outperformed state-of-the-art HAR classifiers based on CNNs and RNNs, whiling being more computationally efficient.
△ Less
Submitted 4 November, 2022; v1 submitted 1 October, 2021;
originally announced October 2021.
-
Deep hierarchical reinforcement agents for automated penetration testing
Authors:
Khuong Tran,
Ashlesha Akella,
Maxwell Standen,
Junae Kim,
David Bowman,
Toby Richer,
Chin-Teng Lin
Abstract:
Penetration testing the organised attack of a computer system in order to test existing defences has been used extensively to evaluate network security. This is a time consuming process and requires in-depth knowledge for the establishment of a strategy that resembles a real cyber-attack. This paper presents a novel deep reinforcement learning architecture with hierarchically structured agents cal…
▽ More
Penetration testing the organised attack of a computer system in order to test existing defences has been used extensively to evaluate network security. This is a time consuming process and requires in-depth knowledge for the establishment of a strategy that resembles a real cyber-attack. This paper presents a novel deep reinforcement learning architecture with hierarchically structured agents called HA-DRL, which employs an algebraic action decomposition strategy to address the large discrete action space of an autonomous penetration testing simulator where the number of actions is exponentially increased with the complexity of the designed cybersecurity network. The proposed architecture is shown to find the optimal attacking policy faster and more stably than a conventional deep Q-learning agent which is commonly used as a method to apply artificial intelligence in automatic penetration testing.
△ Less
Submitted 14 September, 2021;
originally announced September 2021.
-
A Faster Algorithm for Quickest Transshipments via an Extended Discrete Newton Method
Authors:
Miriam Schlöter,
Martin Skutella,
Khai Van Tran
Abstract:
The Quickest Transshipment Problem is to route flow as quickly as possible from sources with supplies to sinks with demands in a network with capacities and transit times on the arcs. It is of fundamental importance for numerous applications in areas such as logistics, production, traffic, evacuation, and finance. More than 25 years ago, Hoppe and Tardos presented the first (strongly) polynomial-t…
▽ More
The Quickest Transshipment Problem is to route flow as quickly as possible from sources with supplies to sinks with demands in a network with capacities and transit times on the arcs. It is of fundamental importance for numerous applications in areas such as logistics, production, traffic, evacuation, and finance. More than 25 years ago, Hoppe and Tardos presented the first (strongly) polynomial-time algorithm for this problem. Their approach, as well as subsequently derived algorithms with strongly polynomial running time, are hardly practical as they rely on parametric submodular function minimization via Megiddo's method of parametric search. The main contribution of this paper is a considerably faster algorithm for the Quickest Transshipment Problem that instead employs a subtle extension of the Discrete Newton Method. This improves the previously best known running time of $\tilde{O}(m^4k^{14})$ to $\tilde O(m^2k^5+m^3k^3+m^3n)$, where $n$ is the number of nodes, $m$ the number of arcs, and $k$ the number of sources and sinks.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
Gait analysis with curvature maps: A simulation study
Authors:
Khac Chinh Tran,
Marc Daniel,
Jean Meunier
Abstract:
Gait analysis is an important aspect of clinical investigation for detecting neurological and musculoskeletal disorders and assessing the global health of a patient. In this paper we propose to focus our attention on extracting relevant curvature information from the body surface provided by a depth camera. We assumed that the 3D mesh was made available in a previous step and demonstrated how curv…
▽ More
Gait analysis is an important aspect of clinical investigation for detecting neurological and musculoskeletal disorders and assessing the global health of a patient. In this paper we propose to focus our attention on extracting relevant curvature information from the body surface provided by a depth camera. We assumed that the 3D mesh was made available in a previous step and demonstrated how curvature maps could be useful to assess asymmetric anomalies with two simple simulated abnormal gaits compared with a normal one. This research set the grounds for the future development of a curvature-based gait analysis system for healthcare professionals.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.