1. Introduction
User authentication technology has been studied for many years and has a variety of applications. Representative examples include account access to financial services, secure access to networks and physical spaces in corporate environments, retail and payment systems, and smart home user identification [1,2,3]. User authentication technologies can be divided into the following categories. Knowledge-based authentication relies on information the user has memorized, such as passwords, personal identification numbers (PINs), and security questions. Possession-based methods use a physical element owned by the user, such as a smart card, USB token, or smartphone; in such cases, there is a risk of loss. Biometric methods use an individual’s biological characteristics, including fingerprints, face, iris, vein patterns, and voice. These features are unique, secure, and easy to use, but they are relatively expensive to implement and maintain. Dynamic, secure code-based authentication uses a one-time password (OTP) that can be used only once and is often delivered through SMS, apps, or hardware tokens. This method is highly secure and can be implemented on various devices; however, it may reduce user convenience. Recent research has tended to adopt multi-factor authentication methods to strengthen security while providing convenience to users.
The key features of user authentication technology include security, convenience, and personal information protection. User authentication technologies must provide strong security to reliably verify the identity of a user and prevent unauthorized access. Security is linked to the accuracy of the authentication method, a crucial criterion when evaluating the quality and performance of user authentication technology. Security can be enhanced using secure algorithms, encryption technologies, multi-factor authentication, and biometrics. In addition, user authentication should provide a convenient and easy experience for users. Convenience can be improved through faster access, fewer clicks or inputs, and user-friendly interfaces. Finally, user authentication technology must adequately protect user privacy and provide transparency regarding the collected data. Based on these features, fingerprint recognition and passwords are mainly used in user authentication. Fingerprint recognition is the easiest and most accurate technology, with a recognition rate of approximately 99%; however, it requires the user to perform a separate action. Passwords, being combinations of letters and numbers, always carry the risk of hacking.
Recently, user authentication methods based on biometric signals, such as electrocardiography (ECG), photoplethysmography (PPG), electromyography (EMG), and electroencephalography (EEG) signals, have been studied. Biometric signals are simple to acquire and enable user-unconscious authentication; therefore, signal-based authentication is a highly scalable technology that can overcome the shortcomings of traditional methods. Additionally, signals can be acquired continuously for user authentication, and the approach is highly secure: because bio-signals are generated unconsciously and each individual’s signals contain unique information, a high recognition rate can be obtained.
User authentication based on ECG signals is the most widely studied method [4,5,6]. There are three main applications: the authentication of patients in a hospital (HOS), the authentication of people at a building entrance (security check), and continuous authentication for personal usage (wearable device) [7]. In the algorithm proposed in [7], a machine learning-based ECG authentication mechanism can be applied. Methods for implementing an embedded authentication system that considers user convenience have also been studied [8,9]. When tested on 90 individuals, the system achieved 99.54% accuracy for QRS complex identification [8]; however, these results were obtained only under stable conditions. When implemented on an Artix-7 FPGA, the entire design occupies 1712 slices (5%) and 978.7 KB of memory and dissipates 31.75 mW of total chip dynamic power. User authentication using PPG signals has also been proposed. Since PPG sensors can be attached to the wrist, much research is being conducted on wearable devices [10]. The system proposed in [10] can continuously authenticate users based on their cardiac characteristics with a high accuracy of over 90%. However, distinguishing between users with PPG signals is difficult; therefore, accuracy remains a considerable problem [11,12]. When using EEG signals, wearing a sensor to acquire the signals is inconvenient, but the accuracy is high, although it is significantly affected by the environmental conditions under which the EEG is acquired [13]. EEG signals offer a more secure biometric approach for user authentication [14], with classification based on an artificial neural network (ANN).
An authentication method using EMG signals has also been proposed; however, its accuracy remains at approximately 90% [15]. Compared with existing fingerprint recognition methods, much improvement is still needed for practical applications. However, EMG sensors, like PPG sensors, can be produced in the form of watches, making them suitable for wearable applications. Additionally, several EMG sensors have been developed for human-machine interface (HMI) applications. A representative example is the MYO armband developed by Thalmic Labs (Kitchener, ON, Canada). Using these armbands, EMG-based personal identification based on the continuous wavelet transform (CWT) and convolutional neural networks (CNNs) achieved an identification accuracy exceeding 90% for 21 subjects [16].
However, in the case of EMG signals, the user-specific characteristic features are far less pronounced than those of other biological signals; therefore, accuracy becomes a crucial factor in the field of user authentication [17,18]. Additionally, signals obtained from the same person tend to vary with muscle fatigue and measurement conditions. This variation lowers the authentication rate, making practical application difficult. To address this issue, complex algorithms and multiple biometric authentication technologies have been proposed; however, these complex algorithms tend to be hardware-intensive [19]. Moreover, when calculations are performed on a server system, security vulnerabilities arise due to transmission time and the risk of hacking during the transmission process. Consequently, interest in on-device artificial intelligence (AI) has recently increased, leading to the development and utilization of processing units (PUs) designed for AI [20,21,22,23,24]. The focus of this study is on the characteristics of an authentication system that includes the development of a wearable device using an EMG sensor and direct authentication through an on-device neural network implemented on an edge device.
From a hardware deployment perspective, AI hardware is largely divided into microcontroller unit (MCU)-based acceleration and digital accelerators. Representative examples of MCU-based acceleration include Hello Edge, which focuses on keyword spotting, and IoTNet [21]. In these cases, the computational workload is approximately 40 mega floating-point operations (MFLOPs). An MCU performs operations sequentially according to its instruction set and therefore requires long execution times. In contrast, digital accelerators, including FPGAs, can perform parallel operations, making them suitable for high-speed computation. FPGA-based deep learning accelerators generally exhibit a calculation capacity above 50 giga floating-point operations (GFLOPs). However, since an FPGA is a hardware implementation, it has limited resources for network configuration. To address this constraint, the network size must be reduced while maintaining accuracy [25].
In this study, a hardware structure that uses fewer resources to enable the production of wearable devices is proposed. It maintains an accuracy above 99% by applying an innovative optimization algorithm. First, to effectively extract the features of EMG signals, various time-domain and time-frequency analysis methods were compared; based on this comparison, the empirical mode decomposition (EMD) method was selected and applied. Next, for classification, a neural network incorporating both a Siamese model and an LSTM model was used. The Siamese model can easily categorize differences during the recognition process. It consists of twin CNNs that perform feature extraction. In the case of the CNNs, hardware usage is minimized by using a simple structure for approximate feature extraction: while most Siamese models use 4 to 10 CNN layers to achieve high accuracy, this study employs only 2 CNN layers [26]. Errors that occur in a simple Siamese model can be precisely classified using the LSTM, which improves accuracy by efficiently distinguishing between content that should be retained for a long time and content that is less important. The overall structure is called the SSiamese-LSTM, which emphasizes the simple Siamese structure. For the proposed network, the hardware resource requirements are analyzed using an FPGA. For accuracy evaluation, EMG measurements from Chosun University were used, and data from 100 individuals were compared. Thus, a high-accuracy AI system using the SSiamese-LSTM for user authentication can be developed for edge devices. This system will be integrated with the proposed EMG sensor in the form of a watch and used for user authentication.
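To make the structure concrete, the following minimal Keras sketch reflects the description above: a twin two-layer CNN encoder shared between both inputs, an element-wise difference, and an LSTM classifier. The segment length, channel count, filter sizes, and LSTM width are illustrative assumptions, not the authors' exact hyperparameters.

```python
# Minimal Keras sketch of the SSiamese-LSTM idea; hyperparameters are assumed.
from tensorflow import keras
from tensorflow.keras import layers

def make_encoder(input_shape):
    """Twin CNN branch: only two Conv1D layers, per the simple Siamese design."""
    inp = keras.Input(shape=input_shape)
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(32, 5, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
    return keras.Model(inp, x, name="twin_cnn")

input_shape = (512, 1)               # assumed EMG segment length and channels
encoder = make_encoder(input_shape)  # one set of weights, shared by both branches

in_a = keras.Input(shape=input_shape)
in_b = keras.Input(shape=input_shape)

# Element-wise difference of the two embeddings, kept as a sequence so the
# LSTM can weigh which parts of the difference matter most.
diff = layers.Subtract()([encoder(in_a), encoder(in_b)])
x = layers.LSTM(32)(diff)
out = layers.Dense(2, activation="softmax")(x)  # same person / different person

model = keras.Model([in_a, in_b], out, name="ssiamese_lstm")
model.summary()
```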
Section 2 discusses the EMG signal acquisition method for the user authentication system. Section 3 provides an in-depth exploration of the SSiamese-LSTM network. Section 4 presents the testing and analysis results of the network discussed in Section 3. Section 5 presents the overall hardware structure and the verification process using high-level synthesis (HLS) for hardware deployment. Finally, Section 6 presents the conclusions, discussions, and future research directions.
4. Testing and Analysis Results of the Proposed SSiamese-LSTM Network
The training dataset consisted of EMG data from 80 individuals. Additionally, the learning process used validation data comprising 25% of the total training data. Categorical cross-entropy over the two classes was used as the loss function for training; the two classes were separated using one-hot encoding, which converts categorical variables into binary vectors. An adaptive moment estimation (Adam) optimizer, which builds on adaptive gradient methods such as AdaGrad, was used in the learning process. The batch size, an important hyperparameter for the learning process, was set to 512 after optimization.
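Continuing from the model sketch above, the training setup described here can be expressed as follows; the dummy arrays stand in for the real (non-public) EMG pair dataset so the snippet runs end to end, and the epoch count is an assumption.

```python
import numpy as np
from tensorflow import keras

# Placeholder data: real training used EMG pairs from 80 individuals.
pairs_a = np.random.randn(1024, 512, 1).astype("float32")
pairs_b = np.random.randn(1024, 512, 1).astype("float32")
labels = keras.utils.to_categorical(np.random.randint(0, 2, size=1024), 2)

model.compile(
    optimizer="adam",                  # Adam, as stated in the text
    loss="categorical_crossentropy",   # two one-hot-encoded classes
    metrics=["accuracy"],
)
model.fit(
    [pairs_a, pairs_b], labels,
    validation_split=0.25,  # 25% of the training data held out for validation
    batch_size=512,         # optimized batch size reported in the text
    epochs=20,              # illustrative; the epoch count is not stated
)
```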
The parameters are updated through the learning process, and the accuracy, recall, and precision of user authentication applications can be calculated. These evaluation indicators can be derived from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values [34]:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}.$$
In this paper, cases where the same individual is recognized are assumed to be “positive”. Using these parameters, the F1 score, which is often used for comparisons between neural networks, is defined as follows:
$$F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
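As a quick worked example, the indicators above can be reproduced from raw counts; the counts in the example call are placeholders, not the study's actual confusion-matrix values.

```python
# Standard classification metrics from TP/TN/FP/FN counts (placeholder values).
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)       # "positive" = same individual recognized
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

print(classification_metrics(tp=120, tn=118, fp=2, fn=0))
```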
The accuracy of the training data, including the validation data, was approximately 99.17%, while the recall of the training data was approximately 100%. The precision of the training data, including the validation data, was approximately 98.4%. This analysis was performed based on actual measured EMG data from 80 individuals of various age groups. The F1 score, a crucial metric for assessment, was calculated to be approximately 0.992.
Figure 4 depicts the confusion matrix, showing the relationship between the values predicted by the model (x-axis) and those observed (y-axis).
Figure 5 shows the receiver operating characteristic (ROC) curve. The x-axis of the ROC curve represents the false positive rate (FPR), and the y-axis represents the true positive rate (TPR). The area under the ROC curve, known as the area under the curve (AUC), is used to evaluate the overall classification performance of the model. The AUC value for this model is 1, indicating excellent performance.
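For reference, an ROC curve and AUC of this kind are commonly computed as sketched below; the arrays are dummy stand-ins, and in practice y_score would be the model's "same person" softmax probability on held-out pairs.

```python
# Hypothetical ROC/AUC computation with scikit-learn (dummy data shown).
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.random.randint(0, 2, size=200)   # placeholder ground truth
y_score = np.random.rand(200)                # placeholder probabilities

fpr, tpr, _ = roc_curve(y_true, y_score)     # FPR on x-axis, TPR on y-axis
print("AUC =", auc(fpr, tpr))                # 1.0 indicates perfect separation
```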
These results show extremely high accuracy compared to previous results for user authentication using biometric signals; the comparison is summarized in Table 3 [35,36,37]. Although direct comparison is challenging due to the variety of biometric signals, the differing numbers of acquisition channels, the diverse signal interpretation methods, and the different neural network models used, the proposed method achieved higher accuracy than existing methods. This outcome is significant because it was accomplished with limited hardware resources and approaches the accuracy levels of commercial systems. It is particularly noteworthy because it relies on EMG data collected across diverse environmental conditions. Additionally, using the same or similar datasets with different neural networks resulted in accuracies of 92.5% [38], 94% [25], and 94.35% [39], respectively, demonstrating that the proposed network has relatively higher accuracy.
To ensure high accuracy in user authentication using EMG signals, a time–frequency domain analysis was performed using the EMD method, and embedding was carried out using a simple Siamese model. Classification was then performed using an LSTM model, which emphasizes the key parts to remember. Additionally, the goal is to minimize hardware resources for wearable applications. A user authentication system suitable for wearable devices ensures high accuracy and can be utilized in applications such as access control for doors, system logins, and financial transactions. In this context, a multimodal method would help push accuracy toward 100%; however, it requires additional hardware resources. Simpler alternatives include having the user repeat the authentication action or reducing the number of enrolled users.
Based on this learned model, an accuracy analysis was performed using a completely new EMG dataset of 20 individuals. Because a simple Siamese structure and an additional LSTM model are used, the adaptability to new data is relatively poor compared with the basic Siamese model. The accuracy, recall, and precision for the 20 participants are 86.3%, 80%, and 91.5%, respectively, and the F1 score is approximately 0.854. When the system is applied to an actual wearable device, the hardware is configured using the existing model, and the learned parameters are transferred to the device. In this case, fine-tuning is performed separately for each user to ensure individualization and higher accuracy. Fine-tuning is the process of adjusting the weights of a pre-trained model to enhance its performance on a specific task or dataset; it therefore enables high accuracy even with a short learning process. After fine-tuning with the existing learned parameters, the accuracy, recall, and precision for the 20 individuals are 99.4%, 100%, and 98.8%, respectively, and the F1 score is 0.994. These results demonstrate that the proposed neural network can be easily applied to new users and that accuracy improves further when fine-tuning is applied. From a security perspective, such personalized models are desirable.
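A sketch of this per-user fine-tuning step is shown below, continuing from the earlier model sketch. The weight file name, the choice to freeze the twin CNN, the learning rate, and the dummy arrays are all assumptions for illustration.

```python
import numpy as np
from tensorflow import keras

# Reuse the transmitted pre-trained parameters (hypothetical file name).
model.load_weights("ssiamese_lstm_pretrained.weights.h5")

# Optionally freeze the coarse twin-CNN extractor and adapt only the LSTM head.
model.get_layer("twin_cnn").trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Placeholder pair set standing in for data collected from the new user.
new_a = np.random.randn(128, 512, 1).astype("float32")
new_b = np.random.randn(128, 512, 1).astype("float32")
new_labels = keras.utils.to_categorical(np.random.randint(0, 2, size=128), 2)

model.fit([new_a, new_b], new_labels, batch_size=64, epochs=5)  # short adaptation
```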
5. Hardware Deployment
On-device AI systems refer to technologies that run AI models directly on the device itself. Unlike cloud-based AI systems, these systems operate without sending data to external servers. On-device AI systems do not transmit the user’s data externally but process the data on the device itself, thereby protecting privacy. This reduces the risk of personal information infringement by avoiding the exposure of users’ sensitive data to external entities.
Additionally, cloud-based AI systems transmit data to external servers for processing and then report the results back to the device, which may cause delays due to the data transfer time. In contrast, on-device AI systems execute models directly on the device, significantly reducing latency. Furthermore, on-device AI systems can execute models even when they are not connected to a network, reducing network dependency.
Finally, on-device AI systems are more energy-efficient than cloud-based systems. Because data are processed on the device, the bandwidth load for data transmission is reduced, which can lower energy consumption and promote environmental friendliness. Due to these advantages, the market for on-device AI is expanding.
The hardware used to implement on-device AI primarily consists of mobile devices or embedded systems. Mobile devices typically use processors such as Qualcomm’s Snapdragon together with graphics processing units (GPUs), neural processing units (NPUs), or dedicated AI accelerators. Embedded systems may use application-specific integrated circuits (ASICs), digital signal processors (DSPs), and FPGAs. Hardware for on-device AI primarily aims at low power consumption and low latency; for mobile devices, battery life and thermal management must also be considered.
In this study, an FPGA was employed to implement on-device AI. When compared to a GPU, an FPGA offers several advantages. Firstly, while the GPU relies on instruction-based operations, leading to longer execution times, the FPGA allows for parallel computation, thus reducing latency. Additionally, the FPGA enables a reduction in the number of computations, resulting in a significant decrease in power consumption. Lastly, on-device AI implemented with an FPGA can be readily transitioned to digital-based ASICs, offering further cost advantages.
Hardware deployment using an FPGA is conducted through high-level synthesis (HLS). HLS offers a high-level abstraction for hardware design, enabling software engineers to design hardware effectively. This is particularly beneficial when engineers need to convert software containing complex algorithms, such as deep learning models, into hardware. Initially, the software code is written in a high-level language (e.g., C or C++), typically describing the algorithm’s behavior and using abstract data structures. HLS tools then convert the high-level source code into the Verilog hardware description language (HDL) for hardware design. Within Verilog HDL, the network is transformed through synthesis into digital logic components such as NAND gates and NOR gates. While writing extensive logic directly in Verilog HDL can be time-consuming, HLS significantly reduces hardware development time and enhances user convenience [40]. The SSiamese-LSTM network, initially implemented using Keras, can be translated into digital logic using HLS for hardware deployment.
For further convenience in neural network design, high-level synthesis for machine learning (hls4ml) simplifies hardware deployment by configuring and combining basic utilities corresponding to each layer, such as 1-dimensional convolutional, batch normalization, max-pooling, and LSTM layers [41]. Other hardware implementation methods with automatic generation include Vivado SDAccel and the NVIDIA deep learning accelerator (NVDLA), whose core directly converts a neural network written in Caffe or similar frameworks into a register-transfer-level (RTL) design [42]. In this paper, hls4ml served as the primary hardware deployment method.
First, in the hardware deployment, the 32-bit floating-point numbers used in Keras are converted to fixed-point numbers for high computational accuracy and efficiency. Although quantized Keras (QKeras) could assist with quantization, it was not considered in this paper due to potential accuracy degradation. The syntax for fixed-point numbers is “ap_fixed<9,4>” with a sign bit; here, “9” denotes the total number of bits, and “4” the number of bits in the integer part, including the sign bit. Bit precision directly affects accuracy and hardware resource usage. Since the objective of this study is high accuracy, high bit precision is employed for hardware deployment: through numerical profiling, hardware deployment is performed using “ap_fixed<33,8>”.
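As a toy illustration of what “ap_fixed<33,8>” means numerically (33 total bits, 8 integer bits including the sign, hence 25 fractional bits), the helper below quantizes a float to that grid. Note that HLS truncates by default (AP_TRN); rounding is used here only for readability.

```python
# Emulate the value grid of ap_fixed<33,8>: saturate, then quantize.
def to_ap_fixed(x, total_bits=33, int_bits=8):
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # clamp to the representable range
    return q / scale

print(to_ap_fixed(3.14159265))  # representable to a resolution of 2**-25
print(to_ap_fixed(1000.0))      # saturates near the <33,8> maximum (~128)
```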
Figure 6 illustrates the bit precision of the parameters used in each layer by Keras and the bit precision of the implemented hardware.
For hardware deployment, the additional settings required by hls4ml are the optimization strategy and the reuse factor. The optimization strategy offers “latency” for reducing response time and “resource” for minimizing hardware usage; for this FPGA implementation, the resource strategy was chosen. The reuse factor means that the same hardware is reused for multiple calculations. A high reuse factor hurts latency but is an effective way to reduce hardware resource usage; an excessive reuse factor may increase memory usage and therefore requires caution. In this study, the reuse factor was set to 300 after various comparisons. Additionally, for merged layers that lack a defined reuse factor, one is added and applied accordingly.
For the FPGA chipset, “xcku5p-ffva676-3-e” was chosen due to its relatively abundant internal memory. The proposed neural network architecture comprises configuring units for each layer and driving them in block form, which involves the heavy use of first-in-first-out buffers (FIFOs). Both the CNN and LSTM layers require significant memory usage due to their behavioral characteristics. In an FPGA, the internal memory is divided into relatively small-sized block random access memory (BRAM) and relatively large-sized ultra RAM (URAM).
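Putting these settings together, a minimal hls4ml conversion sketch might look like the following; the output directory and configuration granularity are illustrative choices, and only the precision, strategy, reuse factor, and part number come from the text.

```python
# Convert the Keras model to an HLS project with the settings discussed above.
import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity="model")
config["Model"]["Precision"] = "ap_fixed<33,8>"   # fixed-point format from profiling
config["Model"]["Strategy"] = "Resource"          # minimize hardware usage
config["Model"]["ReuseFactor"] = 300              # reuse hardware across calculations

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls_ssiamese_lstm",
    part="xcku5p-ffva676-3-e",     # FPGA chipset named in the text
)
hls_model.compile()                # C simulation for bit-accurate checking
# hls_model.build(synth=True)      # launches the full HLS synthesis run
```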
The accuracy, recall, and precision of the implemented hardware for the EMG data from 80 individuals are approximately 99%, 100%, and 98.1%, respectively. The F1 score is approximately 0.99. The decrease in the F1 score through hardware implementation was approximately 0.002, which is an exceedingly small value.
Table 4 shows the device utilization for the FPGA chipset “xcku5p-ffva676-3-e”.
The implemented device utilization indicates that a significant amount of BRAM is consumed by the FIFOs connecting the many layers and by the buffers storing intermediate results of the CNNs and LSTMs. To optimize this, the #pragma-based methods that HLS provides for efficient hardware implementation should be utilized. First, for FIFOs, data packing can be applied using the syntax “#pragma HLS DATA_PACK variable=a”. Data packing reduces overall memory usage and improves processing speed by mapping data types to appropriately sized bundles of registers. Second, for the CNNs and LSTMs, the hardware can be forced to use URAM instead of BRAM for intermediate calculation values by using the syntax “#pragma HLS RESOURCE variable=a core=RAM_S2P_URAM”. This forcibly allocates the resources to URAM, enabling memory optimization. The final device utilization results are presented in Table 5. Thus, it can be confirmed that the proposed SSiamese-LSTM model can be implemented in an FPGA through the redistribution of resources.
Regarding operation speed for the proposed hardware deployment: a high-performance Intel Core i7-7700HQ CPU takes approximately 2.5 ms to process one data sample, whereas the proposed structure takes approximately 7 ms. The proposed hardware computes somewhat more slowly because of its limited resources. However, real-time user authentication is still possible with the proposed hardware because there is no need to transmit data to a separate server.
For the gate count, when finally implemented in IC form, “yosys” is used to compute the gate count based on a 2-input NAND gate [43]. Yosys is an open-source hardware synthesis tool used to synthesize and optimize digital circuits written in Verilog, VHDL, and other hardware description languages; it facilitates physical implementation by mapping designs to ASIC technology. Accordingly, the design synthesizes to a total of 4,969,698 NAND gates. This represents a medium-sized AI system of a size that can be applied to wearable devices. For example, assuming that a 2-input NAND gate occupies 6 µm² in a 90 nm technology, the IC could be manufactured with an area of approximately 30 mm² (4,969,698 × 6 µm² ≈ 29.8 mm²).
Figure 7 shows a schematic diagram of a user authentication system configured as a wearable watch using the proposed SSiamese-LSTM structure, including an EMG sensor, an analog-to-digital converter (ADC), and the neural network implemented in digital logic on a single printed circuit board (PCB).
6. Conclusions and Discussions
In conclusion, this study explored user authentication using EMG signals, focusing on their ease of acquisition and potential for wearable applications such as wristwatches. Despite these advantages, the reported accuracy of EMG-based authentication is approximately 90%, inferior to traditional methods such as fingerprint recognition. To address this gap, novel approaches were introduced: EMD feature extraction for time–frequency domain analysis and the SSiamese-LSTM model. The proposed neural network effectively combines the strengths of the Siamese model for efficient embedding and the LSTM model for improved classification.
The primary goal of this study was to reduce hardware resource consumption while maintaining high accuracy, making the system suitable for wearable devices. The proposed network structure, optimized using Keras, was validated with EMG data from 100 individuals collected from Chosun University. The model achieved an accuracy of 99.17%, a recall of 100%, a precision of 98.4%, and an F1 score of 0.992 while maintaining low resource requirements suitable for wearable devices implemented via FPGAs.
Maintaining accuracy and managing memory efficiently are essential for FPGA implementation. The model was implemented on the “xcku5p-ffva676-3-e” FPGA chipset through various optimizations. The obtained accuracy was 99%, almost identical to that of the Keras implementation. The hardware achieved a latency of 7 ms per data sample, confirming that real-time user authentication is possible. The optimized neural network can be implemented on an FPGA for edge devices and easily converted to digital logic. Additionally, manufacturing the proposed neural network as an IC requires 4,969,698 2-input NAND gates, constituting a medium-sized AI system; using a 90 nm process, it can be manufactured in approximately 30 mm².
The proposed research has several advantages, including high accuracy, low resource requirements, and suitability for real-time user authentication on wearable devices. However, it also has some limitations, such as slightly slower processing speed due to limited resources and the need for further integration and optimization for practical applications.
Future work will focus on integrating the entire circuit into hardware, including the fabricated EMG sensor and the proposed SSiamese-LSTM model, within a single PCB system. The goal is to produce the system in the form of a wearable watch. Addressing the current limitations, such as optimizing processing speed and further reducing resource consumption, will be critical in enhancing user authentication systems for various applications.