In this section, we analyze our findings, presenting the performance of the supervised deep learning model for species classification and an evaluation of the overall output of the integrated end-to-end bat monitoring system.
4.1. Species Classification Using Supervised Learning
The convolutional neural network (CNN) model, tailored for the classification of labeled bat species, performed well with the MSFB features, achieving an average test accuracy of 97.5% (standard deviation 0.9%) and a mean F1-score of 0.9578 (standard deviation 0.02). The low standard deviations underscore the model's consistency across runs. Given the dataset's imbalanced nature, a detailed class-wise evaluation is needed to gauge the model's capacity for feature extraction and differentiation across species. The precision, recall, and F1-scores for each class, shown in
Figure 6, are uniformly high across all species, with the exception of Rousettus aegyptiacus. The confusion matrix in
Figure 7 details the misclassification errors. During model optimization, the training and validation loss curves showed no signs of overfitting.
A pairwise ROC analysis was also conducted, with the outcomes shown in
Figure 8. This analysis reaffirms our initial observations, with area under the curve (AUC) scores ranging from 0.97 to 1.0 and most class pairs achieving 1.0. These findings underscore the proposed model's superior or comparable efficacy relative to contemporary alternatives, while it remains significantly more efficient in terms of computational resource requirements. Regarding the misclassifications, particularly those concerning the R. aegyptiacus class, environmental noise is a significant contributor to prediction inaccuracies. Variation within species, attributed to differences in foraging behavior and habitat conditions (open versus closed environments), also accounts for some misclassifications, suggesting that the calls of a given species may differ noticeably across ecological contexts. Enriching our dataset to cover a broader spectrum of within-species echolocation call variation should therefore enhance the classifier's accuracy.
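The pairwise AUC values in such an analysis can be reproduced from the classifier's output probabilities using the rank-sum (Mann-Whitney U) formulation of the ROC AUC. The following is a minimal NumPy sketch of that computation for one class pair; it assumes untied scores and is illustrative rather than our evaluation code, and the function name is ours.

```python
import numpy as np

def pairwise_auc(scores, labels, pos_class, neg_class):
    """One-vs-one ROC AUC: restrict to the two classes of the pair,
    then use the Mann-Whitney U statistic, which equals the probability
    that a positive sample outranks a negative one."""
    mask = np.isin(labels, [pos_class, neg_class])
    s, y = scores[mask], (labels[mask] == pos_class)
    n_pos, n_neg = y.sum(), (~y).sum()
    # Rank all scores in ascending order (1-based ranks).
    order = np.argsort(s)
    ranks = np.empty(len(s))
    ranks[order] = np.arange(1, len(s) + 1)
    # U = (rank sum of positives) - n_pos(n_pos + 1)/2
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For the full pairwise analysis, this function would be called once per species pair, using the probability the model assigns to the first class of the pair as the score.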
Adapting the original CNN model to TensorFlow Lite and TensorRT formats for deployment on edge devices necessitates an evaluation of the potential impact on model accuracy and efficiency. Converting the model to TensorFlow Lite (TF Lite) and TensorRT incorporates techniques such as post-training quantization and model pruning, which are essential steps to optimize the model for deployment on embedded systems [
10]. For example, TensorRT optimizes the original TF model by scanning the graphs and optimizing the subgraphs [
61]. After the optimizations by TensorRT, the resulting model can only run on NVIDIA GPUs. TensorFlow Lite models are generated using the TF Lite converter tool, which takes in the standard model and performs specific quantization optimizations and pruning to generate a TFLite model file (.tflite) [
62]. This model runs on a CPU and can leverage GPU delegates to speed up execution. Both frameworks convert data from a higher-precision format (such as float32) to a lower-precision format (such as int8), which modifies the weights of the model and can yield a model up to 4× smaller than the original [
10]. These modifications, while instrumental in reducing the computational and memory footprint of the model, may lead to a slight decrease in model accuracy [
63]. As detailed in
Figure 9, we conducted a 10-fold cross-validation to compare the performance of the embedded models against the original TensorFlow model, as depicted in
Table 4. The comparative analysis reveals a slight decrease in accuracy and an increase in F1-scores following conversion to the embedded formats, indicating that the translation to TensorFlow Lite and TensorRT did not significantly compromise, and in some aspects even improved, the model's performance. In terms of model size, conversion to TensorFlow Lite yielded a 12.4% reduction, while the TensorRT model was 12.8% smaller.
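The float32-to-int8 conversion underlying both toolchains can be illustrated with a standalone sketch of affine (scale/zero-point) quantization. This mirrors the idea behind post-training quantization but is not the actual TF Lite or TensorRT implementation; the function names are ours.

```python
import numpy as np

def quantize_int8(w):
    """Affine post-training quantization: map a float32 tensor onto
    int8 so that each value reconstructs as (q - zero_point) * scale."""
    w = np.asarray(w, dtype=np.float32)
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0
    if scale == 0.0:          # constant tensor; avoid division by zero
        scale = 1.0
    zero_point = round(-128 - lo / scale)
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Reconstruct approximate float32 values from the int8 tensor."""
    return (q.astype(np.float32) - zero_point) * scale
```

int8 storage is one quarter the size of float32, which is the source of the up-to-4× size reduction, and the round-trip error is bounded by one quantization step.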
In our exploration of a semi-supervised learning approach, we evaluated our CNN model within the semi-supervised generative adversarial network (SGAN) framework, adjusting the ratios of labeled to unlabeled (synthesized) data. The outcomes, including F1 scores and accuracy for varying data proportions, are systematically displayed in
Figure 10. We first benchmarked performance with our fully supervised CNN model. Both F1 score and accuracy decrease with a 50% blend of labeled and generated data, and decline further when labeled data is limited to 25%, in line with theoretical expectations. Currently, within each training epoch, the discriminator's weights are trained twice, separately on real and synthesized images, whereas the generator's weights are updated only once. Equalizing the generator's training frequency is therefore one avenue for improving the SGAN model's performance. In addition, hyper-parameter optimization strategies, such as Bayesian optimization, hold promise for refining the SGAN parameters, and data augmentation techniques, including temporal and speed perturbation, could further improve the model's semi-supervised learning efficacy.
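As a concrete example of the proposed augmentation, speed perturbation can be implemented by resampling the waveform by a small rate factor. The sketch below uses simple linear interpolation with NumPy; production pipelines would more likely use librosa or torchaudio, and the function name is ours.

```python
import numpy as np

def speed_perturb(signal, rate):
    """Resample a 1-D waveform by `rate` via linear interpolation.
    rate > 1 shortens (speeds up) the call; rate < 1 stretches it.
    Note that resampling also shifts pitch, so rates should stay
    within the species' natural call-frequency variation."""
    n_out = int(round(len(signal) / rate))
    old_t = np.arange(len(signal))
    new_t = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_t, old_t, signal)
```

Small rate factors (e.g., 0.9-1.1) are typical, so the perturbed calls remain plausible examples of the same species.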
4.2. Web Platform
In this study, we have developed a comprehensive end-to-end program utilizing our supervised CNN model for real-time bat species detection. Presently, the system features a web-based application interface designed to facilitate interactive user engagement with the detection data via signals received from edge devices. The resulting system, which connects the edge node to the web platform, is highlighted in
Figure 11. The following sections detail the implementation and operational functionality of the web application component.
4.2.1. Database Management System
We have elected to utilize the document-oriented MongoDB NoSQL database system [
64] for data management purposes. Interfacing with the database within the application server is facilitated through the use of the Mongoose library. This setup allows for efficient storage and retrieval of detection data, which is categorized according to three primary attributes: the geographic location of the detection (latitude and longitude coordinates), the timestamp of the detection (formatted according to the ISO 8601-1:2019 standard), and the scientific name of the detected bat species.
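For illustration, a single detection document with these three attributes could be constructed as follows. The sketch is in Python with hypothetical field names; the actual schema is defined in JavaScript through Mongoose.

```python
from datetime import datetime, timezone

def make_detection(lat, lon, species, when=None):
    """Build one detection record with the three stored attributes:
    geographic location, ISO 8601 timestamp, and the scientific name
    of the detected species. Field names are illustrative, not the
    production schema."""
    when = when or datetime.now(timezone.utc)
    return {
        "location": {"lat": lat, "lon": lon},
        "timestamp": when.isoformat(timespec="seconds"),
        "species": species,
    }
```

Storing timestamps as ISO 8601 strings keeps them both human-readable and lexicographically sortable in chronological order.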
4.2.2. Map Interface of Bat Detections
The development of the website’s front-end was accomplished using a combination of JavaScript (ES12), HTML 5.0, CSS 3, and Bootstrap 4 [
65], ensuring a design that is both responsive and mobile-friendly, as depicted in
Figure 12. The backend features an ExpressJS-based web server [
66].
The web platform’s homepage is characterized by a map interface facilitated through the Google Maps API, which displays markers for each detection event. These markers are accompanied by a color-coded legend to aid in data visualization, as illustrated in
Figure 13. Users have the capability to interact with the map, selecting markers to obtain detailed information on specific detection events. Additionally, the interface offers a toggle feature, allowing users to switch between the marker view and a heatmap representation. This heatmap functionality enables an effective visualization of bat population density across different regions, enhancing the user’s ability to interpret and analyze detection data.
Additionally, the web platform offers users the functionality to filter detections based on the species of bat and the date of detection, enhancing the specificity of the data displayed. Users can also limit the number of detections shown on the interface. For those preferring a different data presentation format, an option is provided to display the detections in tabular form. The average response time for the tested web pages is 827 milliseconds, ensuring smooth transitions and accessibility for the end user.
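The species, date, and limit filters described above reduce to a straightforward query. In production this is a MongoDB query issued through Mongoose; the Python sketch below, over hypothetical records with the stored attributes, shows the equivalent logic, relying on the fact that ISO 8601 timestamps sort chronologically as strings.

```python
def filter_detections(detections, species=None, start=None, end=None,
                      limit=None):
    """Filter detection records by species and ISO 8601 date range,
    then cap the number returned, mirroring the web UI controls."""
    out = [d for d in detections
           if (species is None or d["species"] == species)
           and (start is None or d["timestamp"] >= start)
           and (end is None or d["timestamp"] <= end)]
    return out[:limit] if limit is not None else out
```

The same predicate maps directly onto a MongoDB `find` filter with `$gte`/`$lte` range operators and a result limit.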
4.2.3. Dashboard Interface of Bat Detections
The dashboard section of the web application is designed to present a comprehensive summary of the bat detections, aggregating data stored within the database. To develop this interactive dashboard, we employed the Plotly JavaScript library, chosen for its robust feature set that includes versatile filter controls integrated with the charts. These controls enable users to interactively customize their data visualization experience, such as by selecting specific time ranges for line charts or choosing to include or exclude particular bat species from the displayed chart. Further visualizations of our front-end interface are provided in the GitHub repository.
4.3. Edge Device Analysis
In our study, candidate edge devices for deploying our model were rigorously assessed against key performance metrics: latency, power consumption (measured as current drawn), and CPU utilization. Latency measurements, sorted in ascending order, are depicted in
Figure 14, captured within a 10 min interval. The figure shows the aggregate duration, encompassing both preprocessing and CNN model inference times. Owing to the compact nature of our model, the inference phase constitutes less than 10% of the total processing time. Since 3 s audio segments are processed for spectrogram analysis, all evaluated edge devices met our requirements: analyzing each audio segment and transmitting the result over the LoRaWAN protocol allows real-time classification of the subsequent audio segments in the buffer. The TensorFlow Lite model executed on an RPi 400 was the most efficient, achieving the lowest average latency of 0.39 s, while the RPi 3B+ recorded a marginally higher total time of 0.57 s running the identical model. The Jetson Nano, despite its TensorRT optimization, exhibited a longer total processing time. This suggests that, contrary to the typical expectation that GPU- and TPU-equipped microcomputers excel at machine learning tasks, our model is compact and efficient enough that it does not significantly benefit from these advanced computational resources. For devices with even tighter computational constraints, savings can be made in the audio-to-spectrogram conversion: one can use the STFT-based spectrogram directly, avoiding the additional mel filters required to generate the MSFB, or adopt faster spectrogram algorithms such as nnAudio, which can be four times faster than the librosa package for the same input signal [
67].
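The latency figures combine two stages that can be timed separately with a monotonic clock. A minimal sketch, with stub functions standing in for the real spectrogram preprocessing and CNN inference:

```python
import time

def measure_latency(segment, preprocess, infer):
    """Return (preprocess_s, inference_s) for one audio segment.
    Real-time operation requires their sum to stay below the 3 s
    segment length so buffered segments are not delayed."""
    t0 = time.perf_counter()
    features = preprocess(segment)
    t1 = time.perf_counter()
    infer(features)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

On our devices, preprocessing dominates the total: inference accounts for less than 10% of the combined time.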
We also tested the current drawn by each edge device from a fixed 5 V voltage source. The current drawn by each device over a 10 min window is summarized in
Figure 15. The current was measured using a digital USB ammeter (Yocto-Amp [
68]) connected in series with the edge devices. The measured values, in mA, are read via a USB serial interface. The CPU-based RPi 3B+, followed by the RPi 400, consumes the least current, making the RPi 3B+ the most suitable for deployment in a power-constrained remote setting. The Google Coral running the TPU consumed the most current, with a mean of 789 mA. A Kruskal-Wallis test shows a statistically significant difference in current consumption among the five test cases, with a
p-value reported as 0.0 (i.e., below numerical precision). The maximum current drawn by each device was compared against its hardware datasheet and found to be below the recommended maximum.
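The Kruskal-Wallis statistic behind this comparison is computed from rank sums of the pooled current samples. A minimal NumPy sketch, without the tie correction that a full implementation such as scipy.stats.kruskal applies:

```python
import numpy as np

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic from rank sums (no tie correction).
    Pools all samples, ranks them, and compares per-group rank sums."""
    pooled = np.concatenate(groups)
    n = len(pooled)
    order = np.argsort(pooled)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)      # 1-based ranks
    h, i = 0.0, 0
    for g in groups:
        r = ranks[i:i + len(g)].sum()       # rank sum of this group
        h += r * r / len(g)
        i += len(g)
    return 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
```

Under the null hypothesis, H approximately follows a chi-squared distribution with k − 1 degrees of freedom, from which the p-value is read.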
In a battery-powered configuration utilizing the RPi 3B+ with an average current draw of 528 mA from a 5 V source, the operational power demand of the system is approximately 2.64 W. Employing a 50 Ah commercial power bank as the energy reserve, the effective output is calculated to be 250 Wh (50 Ah × 5 V), projecting the operational longevity of our edge device on this power supply to be around 94 h (250/2.64). Considering the variable nature of CPU usage, a conservative estimate adjusts the expected battery life to 50% of this duration [
69], translating to an operational range between 47 and 94 h without the need for manual intervention. Previous work [
70] has shown that integrating an RPi with a 12 V 50 W solar panel with a solar charger and a 12 V battery can facilitate continuous operation in the field, devoid of frequent maintenance needs.
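The runtime projection above follows from a two-line energy calculation, sketched here for completeness; the derating factor represents the conservative 50% adjustment for variable CPU load.

```python
def runtime_hours(current_a, voltage_v, capacity_ah, derating=1.0):
    """Projected battery life: energy reserve (Wh) divided by load
    power (W), optionally derated for variable CPU load."""
    power_w = current_a * voltage_v         # e.g., 0.528 A * 5 V = 2.64 W
    energy_wh = capacity_ah * voltage_v     # e.g., 50 Ah * 5 V = 250 Wh
    return energy_wh / power_w * derating
```

With the RPi 3B+ figures, 250 Wh / 2.64 W gives roughly 94.7 h, or about 47 h after the 50% derating.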
Figure 16 shows the CPU utilization percentages for various edge devices, revealing that the average utilization across all devices does not exceed 12%, with the RPi 3B+ peaking at under 75% utilization. Additionally, temperature measurements for these devices confirmed adherence to operational norms set forth in their respective datasheets. A detailed analysis of CPU utilization, captured over a five-minute interval while the CNN model operates on these devices, is illustrated in
Figure 17. Initial fluctuations observed across all devices can be attributed to the operating system’s memory allocation processes and the setup phase. Notably, when running the TensorRT model, the Jetson Nano exhibits more pronounced variations in CPU utilization compared to the TensorFlow Lite (TFLite) model, offering insights into its differential impact on power usage. The RPi 3B+ experiences a significant utilization spike at the simulation’s onset and another at approximately the halfway mark. Despite its lower overall power consumption, selecting the RPi 3B+ necessitates consideration of these initial power surges.
A detailed breakdown of the system’s total cost, amounting to
$712.75 at the time of purchase, is provided for reference. The RPi 3B+ was selected for this analysis due to its low current requirements and affordable cost. This cost encompasses the following components: the LoRa Raspberry Pi Gateway with Enclosure (
$199.95), the SparkFun LoRa Gateway—1-Channel (ESP32) (
$34.95), the Pettersson M500-384 USB Ultrasound Microphone (
$342.00), the Adafruit 6 V 6 W Solar Panel (
$69.00), a Solar Lithium Ion/Polymer charger (
$17.50), a Lithium Ion Polymer Battery (
$9.95), and the Raspberry Pi 3B+ (
$40.00) [
63]. This cost analysis offers a comprehensive view of the financial requirements for implementing such a system. In comparison, the SonoBat 4 (version 3.1.7p) software suite itself costs
$680 for the Universal package and
$1536 for the North American version [
7], while the SM4BAT bat recorder system by Kaleidoscope costs
$999 [
8].