1 Introduction

1.1 Background

The ability to communicate effectively is a fundamental skill in human development. It is especially integral to social skill development and mental health, and parents particularly value their child's ability to communicate [1]. Specific language impairment (SLI) is a communication disorder that interferes with language development in the absence of other physical and mental disabilities that could impact development, such as intellectual disability, autism, or hearing impairment. SLI is currently diagnosed in approximately 7% of preschool children [2]. It is more common in boys than girls, and recent evidence points to a strong genetic link, with 50–70% of children with SLI having at least one family member with the condition [3]. A child with SLI often has a history of delayed expressive language development, presenting symptoms such as difficulties in sentence construction, vocabulary development, making conversation, understanding and following grammatical rules, and understanding spoken instructions [4, 5].

Many children with SLI carry persistent language difficulties into adulthood. Individuals with SLI can struggle to develop and maintain relationships in their daily, academic, and social lives. Early diagnosis that allows evidence-based management is therefore very important to minimize these impacts. Identification and treatment of SLI should begin as early as possible, but individuals can still benefit from a diagnosis at any age, even in adulthood [6]. When a diagnosis of SLI is suspected, for example by a doctor, teacher, or parent, the individual is referred to a speech and language therapist. The type of evaluation depends on the child's age but can include direct observation of the child, interviews, questionnaires completed by parents and/or teachers, psychometric assessments, and specialist language ability tests. This process can be long and complex, and there are global inequities in access to speech and language specialists. Moreover, published work indicates that language ability tests alone can be inaccurate for SLI diagnosis [7,8,9]. Thus, it is essential to consider other pathways to enable rapid and accurate diagnoses.

1.2 Motivation

Specific language impairment (SLI) is prevalent worldwide, with life-long impacts. Early detection of speech disorders is critical to facilitate the most effective treatment. However, diagnosis is often delayed because it requires access to speech specialists, language therapists, and comprehensive assessments. Machine learning can be a powerful tool for early diagnosis. The current work proposes a novel, highly accurate handcrafted learning model with a low time burden.

1.3 Our method

Deep networks readily attain high classification results because their architectures are effective [10,11,12]. In a deep architecture, the most useful features are extracted across multiple levels/layers and classified using fine-tuned weights. In this work, we combine the multilevel concept of deep networks with handcrafted feature extractors. The handcrafted feature generation is divided into two sub-branches to obtain deep levels, and statistical and textural features are extracted from these multiple levels. Simple statistical features are also extracted directly from the signal.

A novel graph-based local extractor is proposed to create textural features. A chemical graph is used to model this textural extractor because graph-based models, especially graph neural networks (GNNs) [13], are popular in the literature and yield high classification performance. Many graphs could be generated, but our objective was to use a nature-inspired model; we therefore turned to chemical graphs. Moreover, between 2019 and 2021, the entire world faced the COVID-19 pandemic. We therefore searched PubChem [14] for COVID-19-related chemicals and used a molecule investigated for treating COVID-19, favipiravir, to propose a novel chemical structure-based feature extractor.

Favipiravir is among the simplest chemical structures used for this purpose; earlier graph patterns have generally relied on chained structures. This work investigates the feature extraction ability of a COVID-19-related chemical structure by proposing a new chemical graph-based local feature extractor, and favipiravir was selected precisely because of its simple molecular structure. The first molecular graph-based feature extraction models were published in 2021. Aydemir et al. [15] presented a molecular graph-based pattern using the melamine molecule and introduced a major depression detection model for EEG signals based on this melamine pattern. Barua et al. [16] proposed a graph-based textural feature extraction function using the graph of the aspirin molecule to detect Parkinson's disease (PD). Inspired by these works, we propose a new molecular graph-based feature extraction function to classify vowels. The levels of feature extraction are created using WPD [17], and the presented favipiravir-based learning model also uses an iterative feature selector (INCA) [18] to choose the optimal feature vector automatically.

In summary, the molecular graph of favipiravir is proposed in this study as the basis of a textural extractor. Because the favipiravir molecule has a simple structure, the resulting pattern can capture subtle changes in the signal and generate clinically meaningful texture features that support high classification performance.

1.4 Novelties and contributions

The novel aspects of this paper are given below:

  • A new nature-inspired graph-based feature extraction function has been proposed.

  • A multilevel hybrid feature extraction-based architecture has been proposed, and the results are validated by deploying two validation strategies: tenfold cross-validation (CV) and leave-one-subject-out (LOSO) CV.

  • To the best of our knowledge, we are the first team to use a chemical graph pattern for vowel classification.

    The major contributions of the presented favipiravir-based handcrafted learning model are given below:

  • A novel graph-based textural feature extraction function is proposed, and this function uses the molecular graph of a COVID-19 drug (favipiravir) [19]. The feature generation ability of this molecular structure is investigated using vowel signals.

  • The proposed favipiravir pattern-based handcrafted learning network uses WPD to create wavelet sub-bands, from which high-level features are extracted. The network is developed on a public vowel dataset and achieves the highest classification performance reported for it (see Table 9).

  • We calculated classification results by deploying tenfold CV and LOSO CV. As a result, we achieved 98.86% classification accuracy using LOSO CV, demonstrating our proposal's robust classification ability.

2 Literature review

Many methods have been presented in the literature for different diagnosis systems [16, 20, 21]. In particular, computer-aided diagnosis (CAD) systems for SLI are being developed using machine learning-based methods. We summarize the currently reported SLI CAD systems below.

Kaushik et al. [22] proposed an SLI detection method named SLINet, based on a 2D convolutional neural network, and evaluated it on the Laboratory of Artificial Neural Network dataset [23], which consists of 98 subjects (54 SLI, 44 control). They reported an accuracy of 99.09% with tenfold cross-validation. Gray [24] presented an approach for SLI detection in children in which test/retest reliability and diagnostic accuracy were used to evaluate SLI diagnosis. For this purpose, non-word repetition, digit span, the Kaufman Assessment Battery for Children, and the Structured Photographic Expressive Language Test II were administered to 44 preschool children (22 SLI, 22 control); specificity and sensitivity of 100.0% and 95.00%, respectively, were reported. Armon-Lotem and Meir [25] proposed a method for SLI identification in bilingual (Hebrew, Russian) children using forward digit span, non-word repetition, and sentence repetition tests. In total, 230 mono- and bilingual children (175 control, 55 SLI) were evaluated, and accuracy, sensitivity, and specificity of 94.00%, 80.00%, and 97.00%, respectively, were achieved for both languages. Slogrove and Haar [26] applied mel-frequency cepstral coefficients of voice signals for SLI detection and attained an accuracy of 99.00% using a random forest classifier. Reddy et al. [27] used glottal source features, mel-frequency cepstral coefficients, and a feed-forward neural network for SLI detection in children; their study used speech signals of 54 SLI patients and 44 control children and reported an accuracy of 98.82% with feature selection. Oliva et al. [28] proposed an SLI detection method using machine learning techniques applied to linguistic measures (mean length of utterance, ungrammatical sentences, correct use of articles, verbs, clitics, and theme arguments, and the proportion of ditransitive structures) of 24 SLI children and 24 control children, reporting sensitivity and specificity of 97.00% and 100.0%, respectively.

3 Material and methods

3.1 Dataset

This dataset was collected by the Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague [29, 30]. It consists of 54 children with SLI and 44 healthy children (98 participants in total). Speech and language therapists and specialists from Motol Hospital classified the children as healthy or SLI. Data from the healthy Czech children were collected between 2003 and 2005 and comprise 15 boys and 29 girls aged 4 to 12 years. Data from the Czech children with SLI were collected between 2009 and 2013 and comprise 35 boys and 19 girls aged 6 to 12 years. The distribution of the subjects according to gender and class is given in Table 1.

Table 1 Distribution of subjects according to gender and class

Some audio recordings were made in the classroom or in the language therapist's room, with ambient noise present; this natural environment was kept to capture the normal behavior of the children. Sound signals were acquired with a SONY digital Dictaphone (fs = 16 kHz), and an MD SONY MZ-N710 (fs = 44.1 kHz) was used to record the healthy subjects at 16-bit resolution, in stereo mode, in WAV format. The subjects with SLI were recorded on a computer via a SHURE lavalier microphone (fs = 44.1 kHz, 16-bit resolution, mono mode). Words and phrases were selected in collaboration with clinical psychologists and speech and language therapists, and each expression voiced by a child was recorded as a separate file. The details of the sound files used in this work are given in Table 2.

Table 2 Details of sound files used in this work

3.2 Methods

This work presents a handcrafted learning method to automatically detect speech disorders using vowels. The primary objective of the proposed favipiravir pattern-based model is to achieve high classification performance with low time complexity. The proposed model involves three steps: favipiravir pattern- and WPD-based feature creation, iterative feature selection using INCA, and classification using an SVM [31] with tenfold and LOSO CV strategies. The block diagram of the proposed favipiravir pattern learning model is given in Fig. 1.

Fig. 1
figure 1

Illustration of the proposed favipiravir pattern-based learning model. Note: FV1, FV2, …, FV31 are the feature vectors; Concat = feature concatenation.

The input to the proposed favipiravir pattern-based learning model is the vowel dataset together with the sub-bands of the WPD. A four-level WPD is used to generate 30 sub-bands, and the symlet 4 wavelet filter is chosen to decompose the signals [32]. Two feature extraction functions are used in the feature extraction step: (i) the favipiravir pattern and (ii) a statistical feature extractor. Using these two functions, statistical and textural features are extracted from the original vowel signal and the wavelet coefficients (see Fig. 1). The favipiravir pattern and the statistical extractor generate 256 and 40 features, respectively, which are concatenated to obtain 256 + 40 = 296 features per signal. Hence, each feature extraction step shown in Fig. 1 (right-hand side) extracts 296 features. We use 31 signals (the original vowel signal and 30 wavelet frequency bands), so the proposed feature extraction model (FV1, FV2, …, FV31) creates \(296\times 31=9176\) features from a vowel. INCA then chooses the most informative of these 9176 features, and the selected feature vector is classified using an SVM classifier.
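As a quick check of this feature budget, the short Python snippet below (illustrative only; the original implementation is in MATLAB) reproduces the sub-band and feature counts:

```python
# Feature-budget check for the proposed architecture (illustrative only).
levels = 4
subbands = sum(2 ** level for level in range(1, levels + 1))   # 2 + 4 + 8 + 16 = 30
signals = subbands + 1                                          # 30 sub-bands + original vowel
features_per_signal = 256 + 40                                  # favipiravir pattern + statistical
total_features = signals * features_per_signal
print(subbands, signals, features_per_signal, total_features)   # 30 31 296 9176
```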

More detailed information about the proposed model is explained below.

3.2.1 Feature generation

Feature generation is the key step of the proposed model. Its main purpose is to extract features at both low and high levels, and WPD [17] is used to achieve this. WPD is a tree-based wavelet transformation model: the wavelet transform is applied to both the low-pass and the high-pass filter bands, creating a wavelet tree in which \(2^{n}\) sub-bands are generated at level n. The proposed multilevel feature creation step uses two feature extractors, the favipiravir pattern (which generates textural features) and a statistical extractor, both of which generate features at low and high levels.

Step 1: Apply WPD to the vowel signal to generate four decomposition levels using symlet 4 as the mother wavelet function. Hence, 30 sub-bands are generated.
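A minimal sketch of Step 1 is given below, assuming the PyWavelets library as a stand-in for the MATLAB wavelet toolbox used in the original implementation; collecting the nodes of levels one to four yields the 30 sub-bands.

```python
# Sketch of Step 1: four-level wavelet packet decomposition with symlet 4.
import numpy as np
import pywt

def wpd_subbands(vowel, wavelet="sym4", max_level=4):
    """Return the 30 wavelet packet sub-bands (levels 1-4) of a 1-D vowel signal."""
    wp = pywt.WaveletPacket(data=vowel, wavelet=wavelet, maxlevel=max_level)
    return [node.data
            for level in range(1, max_level + 1)
            for node in wp.get_level(level, order="natural")]

# Example with a random stand-in signal instead of a real vowel recording.
bands = wpd_subbands(np.random.randn(16000))
print(len(bands))  # 30
```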

Step 2: Extract 20 statistical features, namely average, mode, median, Higuchi fractal dimension, standard deviation, maximum, minimum, Lyapunov exponent [33], skewness, kurtosis, variance, fuzzy entropy [34], log entropy [35], Shannon entropy [36], sure entropy [37], norm entropy, Renyi entropy [38], mean absolute deviation, range, and root mean square.

These 20 features are computed on the signal and on the absolute values of the signal to obtain 40 statistical features. The statistical feature generator is denoted \(sfg(.)\), and the mathematical model of statistical feature generation is given below.

$$fs^{1} = sfg\left( v \right)$$
(1)
$$fs^{k + 1} = sfg\left( {SB_{k} } \right), \;k \in \left\{ {1,2, \ldots ,30} \right\}$$
(2)

Herein, \(v\) is the original vowel signal, \(fs^{k}\) is the kth extracted statistical feature vector with a length of 40, and \(SB_{k}\) is the kth wavelet frequency band. Using Eqs. (1) and (2), 31 statistical feature vectors are obtained.
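A reduced Python sketch of \(sfg(.)\) is shown below; for brevity, only ten of the twenty listed statistics are implemented (the entropy, Higuchi, and Lyapunov measures are omitted), and each statistic is computed on the signal and on its absolute values as in Eqs. (1)–(2).

```python
# Reduced sketch of the statistical feature generator sfg(.).
import numpy as np
from scipy import stats

def sfg(x):
    def ten_stats(s):
        return [np.mean(s), np.median(s), np.std(s), np.max(s), np.min(s),
                stats.skew(s), stats.kurtosis(s), np.var(s),
                np.mean(np.abs(s - np.mean(s))),   # mean absolute deviation
                np.sqrt(np.mean(s ** 2))]          # root mean square
    # Each statistic is computed on the signal and on its absolute values.
    return np.array(ten_stats(x) + ten_stats(np.abs(x)))

print(sfg(np.random.randn(1024)).shape)  # (20,) here; (40,) with all twenty statistics
```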

Step 3: Generate textural features using the proposed favipiravir pattern. The detailed explanations of this extractor are given below.

Favipiravir is an antiviral agent that inhibits RNA viruses, and its molecular formula is C5H4FN3O2. In this work, the feature generation ability of favipiravir is investigated, and a local pattern based on its structure is proposed. The textural feature extraction is defined in Eqs. (3)–(4).

$$ft^{1} = fp\left( v \right)$$
(3)
$$ft^{k + 1} = fp\left( {SB_{k} } \right),\; k \in \left\{ {1,2, \ldots ,30} \right\}$$
(4)

where \(ft^{k}\) is the kth textural feature vector with a length of 256, and the function \(fp(.)\) denotes the proposed favipiravir pattern. The steps of the proposed \(fp(.)\) are:

Step 3.1: Divide the signal into overlapping windows of length 42.

$$b\left(j\right)=\mathrm{signal}\left(i+j-1\right),\; i\in \left\{1,2,\dots ,\mathrm{length}-41\right\},\; j\in \left\{1,2,\dots ,42\right\}$$
(5)

where \(b\) is the overlapping block of length 42, \(\mathrm{signal}\) is the input signal, and \(\mathrm{length}\) is the length of the input signal.

Step 3.2: Reshape each overlapping block into a 7 × 6 matrix to model the favipiravir pattern [19]. The signals in this work are one-dimensional, whereas the favipiravir pattern is two-dimensional (see Fig. 2), so a vector-to-matrix transformation is applied. The minimum matrix size needed to model favipiravir is 7 × 6, which is therefore used. A directed graph is then overlaid on this matrix to extract binary features. The modeled pattern and the underlying chemical structure are depicted in Fig. 2.

Fig. 2
figure 2

Chemical structure of favipiravir and its presented pattern. A directed graph is used in this work, and each edge is numbered to extract the bits
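The windowing of Eq. (5) and the vector-to-matrix transformation of Step 3.2 can be sketched as follows; the row-major fill order used here is only an assumption for illustration, since the exact assignment of samples to graph nodes is defined by Fig. 2.

```python
# Sketch of Eq. (5) and Step 3.2: overlapping 42-sample blocks reshaped to 7 x 6.
import numpy as np

def overlapping_blocks(signal, width=42):
    for i in range(len(signal) - width + 1):
        b = signal[i:i + width]   # block b of Eq. (5)
        yield b.reshape(7, 6)     # matrix on which the directed graph is overlaid

first = next(overlapping_blocks(np.random.randn(1000)))
print(first.shape)  # (7, 6)
```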

Step 3.3: Generate 14 bits using the favipiravir pattern. The numbered edges (depicted in Fig. 2) and the signum function are used to extract the bits. In the binary feature extraction phase, each edge's initial and terminal values are used as inputs to the signum function. The mathematical definition of the signum function is given in Eq. (6).

$$\vartheta \left( {{\text{fp}},{\text{sp}}} \right) = \left\{ \begin{gathered} 0,\;fp - sp < 0 \hfill \\ 1,\;fp - sp \ge 0 \hfill \\ \end{gathered} \right.$$
(6)

where \(\vartheta (.,.)\) represents the signum function, \(fp\) is the first parameter (the edge's initial value), and \(sp\) is the second parameter (the edge's terminal value). Using the signum function, 14 bits are extracted.
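A two-line sketch of the comparator in Eq. (6), with worked values, is given below.

```python
# Comparator of Eq. (6): the bit is 1 when fp - sp >= 0, otherwise 0.
def theta(fp, sp):
    return 1 if fp - sp >= 0 else 0

print(theta(0.7, 0.2), theta(0.2, 0.7))  # 1 0
```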

Step 3.4: Create two bit groups, named left and right. The left group is formed from the first seven bits, and the remaining seven bits form the right group. The grouping is defined in Eqs. (7) and (8).

$${\text{left}}\left( r \right) = {\text{bit}}\left( r \right), r \in \left\{ {1,2, \ldots ,7} \right\}$$
(7)
$${\text{right}}\left( r \right) = {\text{bit}}\left( {r + 7} \right)$$
(8)

Step 3.5: Construct the map signals using left and right bit categories.

$${\text{map}}^{{{\text{left}}}} \left( j \right) = \mathop \sum \limits_{r = 1}^{7} left\left( r \right) \times 2^{r - 1}$$
(9)
$${\text{map}}^{{{\text{right}}}} \left( j \right) = \mathop \sum \limits_{r = 1}^{7} {\text{right}}\left( r \right) \times 2^{r - 1}$$
(10)

where \({\mathrm{map}}^{\mathrm{left}}\) and \({\mathrm{map}}^{\mathrm{right}}\) are the generated feature maps using the left and right bits, respectively.

Step 3.6: Extract histograms of these map signals. Each map signal is coded with seven bits; therefore, the length of each histogram is \(2^{7} = 128\).

Step 3.7: Merge the two extracted histograms to obtain a feature vector of length 256.
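A compact Python sketch of the full \(fp(.)\) function (Steps 3.1–3.7) is given below. The edge list is a hypothetical placeholder, since the true 14 directed edges are those numbered on the favipiravir graph in Fig. 2; everything else follows the steps above.

```python
# Sketch of the favipiravir pattern fp(.), Steps 3.1-3.7.
# EDGES is a hypothetical placeholder: the real 14 directed edges are the ones
# numbered on the 7 x 6 favipiravir graph in Fig. 2.
import numpy as np

EDGES = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 2)), ((1, 2), (2, 2)),
         ((2, 2), (2, 3)), ((2, 3), (3, 3)), ((3, 3), (3, 4)), ((3, 4), (4, 4)),
         ((4, 4), (4, 5)), ((4, 5), (5, 5)), ((5, 5), (5, 4)), ((5, 4), (6, 4)),
         ((6, 4), (6, 3)), ((6, 3), (6, 2))]

def theta(fp, sp):                       # comparator of Eq. (6)
    return 1 if fp - sp >= 0 else 0

def favipiravir_pattern(signal):
    """Return the 256-bin textural feature vector of a 1-D signal."""
    hist_left, hist_right = np.zeros(128), np.zeros(128)
    for i in range(len(signal) - 41):
        block = signal[i:i + 42].reshape(7, 6)                     # Steps 3.1-3.2
        bits = [theta(block[a], block[b]) for a, b in EDGES]       # Step 3.3: 14 bits
        left, right = bits[:7], bits[7:]                           # Step 3.4
        map_left = sum(bit << r for r, bit in enumerate(left))     # Step 3.5, Eq. (9)
        map_right = sum(bit << r for r, bit in enumerate(right))   # Step 3.5, Eq. (10)
        hist_left[map_left] += 1                                   # Step 3.6
        hist_right[map_right] += 1
    return np.concatenate([hist_left, hist_right])                 # Step 3.7: 256 values

print(favipiravir_pattern(np.random.randn(2000)).shape)  # (256,)
```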

Step 4: Merge the extracted textural and statistical features to obtain the final feature vector.

$${\text{fvl}} = {\text{concat}}\left( {fs^{1} ,ft^{1} , fs^{2} ,ft^{2} , \ldots ,fs^{31} ,ft^{31} } \right)$$
(11)

where \(fvl\) is the final feature vector of length 9176.

3.2.2 Feature selection

Tuncer et al. [18] proposed an improved version of neighborhood component analysis (NCA) named iterative neighborhood component analysis (INCA), which adds an iteration range and a loss value to NCA. NCA is a weight-based feature selector derived from the k-nearest neighbor algorithm; the most valuable features are selected according to the generated weights. In this work, the iteration range is 500 to 1000, and the SVM classifier is used as the loss value generator. Thus, 501 candidate feature vectors are formed, the SVM computes 501 misclassification rates, and the feature vector with the minimum misclassification rate is selected as optimal.
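A simplified sketch of this selection loop is given below; ANOVA F-scores (f_classif) are used as a stand-in for the NCA feature weights of the original method, so this is an approximation of INCA rather than a faithful reimplementation, and it is computationally expensive for large feature matrices.

```python
# Simplified INCA-style selector (approximation: F-scores replace NCA weights).
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def inca_select(X, y, start=500, stop=1000, cv=10):
    scores, _ = f_classif(X, y)                 # stand-in for NCA feature weights
    order = np.argsort(scores)[::-1]            # rank features by importance
    best_idx, best_err = None, np.inf
    for k in range(start, stop + 1):            # 501 candidate feature vectors
        idx = order[:k]
        acc = cross_val_score(SVC(kernel="poly", degree=3), X[:, idx], y, cv=cv).mean()
        err = 1.0 - acc                         # SVM as the loss value generator
        if err < best_err:
            best_err, best_idx = err, idx
    return best_idx                             # indices of the optimal feature vector
```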

Step 5: Apply INCA to the 9176 generated features using the parameters given above to select the most informative 856 features.

3.2.3 Classification

An SVM classifier is used for classification. To choose the most appropriate classifier, the Classification Learner tool of MATLAB2021a is used. The Cubic SVM classifier is configured as follows: a third-degree polynomial kernel, one-vs-one coding, a box constraint level of one, and automatic kernel scale. The classifier is developed using tenfold cross-validation and is used both as the error rate calculator (within INCA) and as the final classifier.

Step 6: Classify the selected features using the Cubic SVM (CSVM) classifier [39].
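In scikit-learn terms, a configuration comparable to the Cubic SVM described above could look as follows; this is a stand-in for MATLAB's Classification Learner settings, with the automatic kernel scale approximated by gamma="scale".

```python
# Approximation of the Cubic SVM settings: third-degree polynomial kernel,
# box constraint C = 1, one-vs-one coding, kernel scale approximated by gamma="scale".
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cubic_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, C=1.0, gamma="scale", decision_function_shape="ovo"),
)

# Tenfold cross-validated accuracy on selected features X_sel with labels y
# (X_sel and y are assumed to come from the feature selection step above):
# accuracy = cross_val_score(cubic_svm, X_sel, y, cv=10).mean()
```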

4 Performance analysis

We have proposed a favipiravir pattern and statistical feature extraction-based vowel classification model. Performance analysis of this model is given in this section.

4.1 Experimental setup

The proposed favipiravir pattern was implemented on a laptop with 16 GB of RAM, an Intel Core i7-11370H processor with a 3.30 GHz clock, and the Windows 10 Professional operating system. We used the MATLAB2021a programming environment. Our proposed favipiravir-based learning method is a parametric learning model, and the parameters used in this work are tabulated in Table 3.

Table 3 Details of parameters used in various steps of the proposed favipiravir pattern-based learning model

4.2 Results

Commonly used classification performance metrics, namely precision, recall, accuracy, and F1-score [40, 41], are used to evaluate the performance of our proposed model. Table 4 presents the results and confusion matrix obtained using our proposed method with the tenfold CV strategy.

Table 4 Results and confusion matrix obtained for our proposed favipiravir pattern-based model with tenfold CV

Tenfold cross-validation has been used to develop the model. The fold-wise results obtained are tabulated in Table 5.

Table 5 Fold-wise results (%) obtained for our proposed favipiravir pattern-based model

It can be noted from Table 5 that 100% classification performance was attained for folds two, three, four, five, six, and eight.

The performance obtained with LOSO CV is shown in Table 6. Our model obtained high performance using LOSO CV, which justifies the accuracy and robustness of the proposed model.

Table 6 Results of our proposed model by deploying LOSO CV

4.3 Time complexity analysis

Our proposed favipiravir model is a handcrafted learning model. The details of the time complexity calculation of our method are given in Table 7. Big theta (\(\Theta\)) notation is used to indicate the time complexity of our proposed model.

Table 7 Details of time complexity for our proposed favipiravir pattern

Table 7 demonstrates that the proposed model has low time complexity. Deep learning models, in contrast, generally carry a much higher computational burden.

4.4 Ablation of the proposed model

We have proposed a hybrid feature extraction-based vowel classification model that uses WPD, the favipiravir pattern, statistical feature extraction, and INCA to generate the final feature vector. To analyze the contribution of each component, we evaluated two reduced models: the first uses only statistical feature generation, and the second uses only favipiravir pattern-based feature extraction. The classification results of these models are given in this section as an ablation study. Overviews of the two reduced methods are shown in Fig. 3.

Fig. 3
figure 3

The presented methods for ablation: a statistical feature extraction-based model, and b favipiravir pattern-based textural feature extraction model

The accuracy obtained using only these models with tenfold CV is given in Table 8.

Table 8 Accuracy (%) obtained for the models

It can be seen from Table 8 that both feature types (statistical and textural) are valuable for classification, and combining them in our model increases the classification accuracy. Our proposed system selected 856 features, and the chosen features are shown in Fig. 4.

Fig. 4
figure 4

Distribution of the selected features. Our proposed model chose 856 features: 694 are textural features generated by the favipiravir pattern, and 162 are extracted by the statistical feature extractor

Moreover, we used a four-level WPD transformation, which generates 30 (\(=2^{4+1}-2\)) wavelet sub-bands since WPD is a tree-based model. The classification accuracies obtained at various WPD levels are shown in Fig. 5.

Fig. 5
figure 5

Classification accuracies obtained by our proposed model at various levels of WPD

It can be noted from the above that level four is the most appropriate WPD level, yielding the highest classification accuracy.

5 Discussion

This work presents a novel handcrafted learning model that combines WPD, the favipiravir pattern, and INCA. The model uses WPD, the favipiravir pattern, and a statistical feature extractor in the feature generation step. The most valuable 856 features are selected from the 9176 generated features using INCA. The plot of misclassification rates obtained for the various numbers of features selected by INCA is shown in Fig. 6.

Fig. 6
figure 6

Misclassification values obtained for the various features selected using INCA

It can be noted from Fig. 6 that the minimum misclassification rate of 0.0013 is obtained using 856 features.

In the classification phase, decision tree (DT), linear discriminant (LD) [42], naïve Bayes (NB) [43], linear SVM (LSVM) [44], quadratic SVM (QSVM) [45], Cubic SVM (CSVM) [39], Gaussian SVM (GSVM) [39], k nearest neighbor (kNN) [46, 47], bagged tree (BT) [48], and multilayer perceptron (MLP) [49] classifiers were used. The plot of accuracies (%) obtained using these classifiers is shown in Fig. 7.

Fig. 7
figure 7

Plot of accuracy (%) obtained using ten shallow classifiers

It can be noted from Fig. 7 that CSVM is the best classifier, yielding an accuracy of 99.83% in detecting SLI from the vowels.

A summary of the comparison of the proposed model with other published works on automated SLI detection using the same vowel dataset [29] is given in Table 9.

Table 9 Summary of comparison of the proposed model with other published works using the same database

The main advantages of our proposed system are given below:

  • A graph-based local feature extractor called the favipiravir pattern is presented. Using the proposed favipiravir (a drug for COVID-19) pattern, the feature extraction ability of this molecule is explored on a vowel dataset.

  • A multilevel feature generator is presented using the favipiravir pattern, the statistical extractor, and WPD. The presented feature generator can be used for other healthcare applications.

  • A highly accurate handcrafted learning method is presented.

  • Our proposed model has attained superior results compared to other state-of-the-art methods (see Table 9).

  • Robust results are obtained using tenfold CV and LOSO CV.

  • Our presented favipiravir pattern-based model is easy to implement. Thus, researchers can implement this model to detect various diseases using other physiological signals.

The limitation of this work is that a relatively small vowel dataset was used to develop the model. In the future, we plan to use larger datasets to develop an even more accurate and robust automated model.

6 Conclusions and future works

SLI is a common condition globally, and early diagnosis is crucial to allow early initiation of effective treatment and the best possible outcomes. However, current diagnostic pathways are time-consuming, expensive, and tedious. The prime objective of this work is to detect SLI using an accurate handcrafted learning method. The proposed method uses a graph-based feature extractor, the favipiravir pattern; favipiravir is an antiviral drug used in efforts to treat COVID-19, and here we investigate a different use of its structure, namely graph-based feature extraction. To evaluate the usefulness of favipiravir in machine learning, a textural extractor is proposed based on its molecular structure, and the resulting favipiravir pattern-based learning method is developed to detect SLI automatically from a vowel dataset.

We proposed a handcrafted learning architecture and used effective methods within it to obtain high classification performance. Our main objective is to present a simple, highly accurate signal classification model. Thus, a multilevel hybrid feature extraction method is proposed using WPD, statistical feature extraction, and the proposed favipiravir pattern. INCA is used to choose the top features, and the selected features are classified using an SVM classifier with tenfold and LOSO CV strategies.

Our developed model achieved an accuracy, precision, recall, and F1-score of 99.87 ± 0.18%, 99.86 ± 0.19%, 99.88 ± 0.18%, and 99.87 ± 0.19%, respectively, using SVM with a tenfold cross-validation strategy. These results justify the use of our proposed favipiravir molecular model for the accurate detection of SLI. Our proposal attained 98.86% classification accuracy with LOSO CV. The proposed favipiravir-based model and architecture achieved the best classification performance compared with other state-of-the-art models.

Our model is accurate and was developed using a public database. In future work, we plan to use larger vowel datasets to detect SLI. We also intend to develop a new molecular structure-based deep model (CNN-like, since CNNs rely on convolution) to detect SLI accurately. This novel automated SLI detection system can potentially assist clinicians and educators in the accurate and early diagnosis of SLI. Furthermore, various desktop/mobile applications can be developed by collecting larger databases and employing new-generation deep learning models to automatically diagnose SLI.