The most popular techniques used to extract features characterizing EMG signals are autoregressive modeling, higher order frequency moments, multi-resolution wavelet analysis, the maximum eigenvalues of the time-varying covariance patterns between EMG channels, signal variance values computed in overlapping windows, techniques based on fractal analysis, principal component analysis, independent component analysis, non-negative matrix factorization, the short-time Fourier transform and orthogonal fuzzy neighborhood discriminant analysis.
We use time domain features in this work, assuming that such features are better at describing the underlying muscle orchestration than frequency domain features. By muscle activation pattern, we refer to the orchestration of EMG activity onsets in eight muscles of the arm and shoulder, namely the right and left extensor digitorum communis (EDC), flexor carpi radialis (FCR), rhomboideus and trapezius, during a golf swing. Onset sequences across EMG channels are compared visually, both between swings of the same player and between swings of different players.
Channel-specific statistics of the filtered EMG signal, pairwise correlations between the filtered EMG dynamics of various muscles during the swing, the details of EMG signal peaks in each muscle and the concurrency of peak positions are computed as predictive features for the random forest model. Using the computed features, the regression task seeks to estimate the target attribute value (club head speed or ball carry distance), and the classification task seeks to detect an effective shot, where effectiveness corresponds to being above the personal average of the player in question.
3.1. Muscle Activation Sequence Profiling
We expect to obtain some insights into golf swing patterns by studying muscle activation profiles, characterized by EMG signals recorded from different muscles.
Figure 3 presents two examples of EMG signals recorded from two golf players. The X axis shows the sample number (the sampling frequency is 1000 samples/s), while the Y axis reflects the absolute value of the EMG signal amplitude. Since noise is always present in EMG signals, some preprocessing is needed to obtain a robust characterization of the muscle activation sequence. Therefore, in addition to the raw EMG signal, information resulting from signal preprocessing is also shown in Figure 3.
We define a muscle activation profile as a vector $\mathbf{d}$ of time differences between the start of swing-related activation in the reference muscle and the other seven muscles. Thus, $\mathbf{d}$ consists of seven components $d_1, \dots, d_7$. Channel 8 (left trapezius) was used as the reference for activation times in this work. The starts of swing-related activation, marked by dashed vertical lines ending with a circle in Figure 3, are to be determined in all eight channels to obtain the components of $\mathbf{d}$. This is not an easy task, since rather substantial activation can be observed in some muscles even before the golf swing starts (see Figure 3). To detect the beginning of swing-related activation in the different channels, the following processing steps are performed.
Filter the signal $s_i(t)$ in each channel $i$ by four sliding Gaussian filters $g_1$, $g_2$, $g_3$ and $g_4$ of the following widths: 256, 512, 1024 and 2048 samples. The resulting filtered signals $s_i^{g_1}(t)$, $s_i^{g_2}(t)$, $s_i^{g_3}(t)$ and $s_i^{g_4}(t)$ are shown in Figure 3 by the red, green, blue and black curves.
For each channel $i$, determine a threshold value $\theta_i$:

$$\theta_i = \frac{\alpha}{N_i} \sum_{t=1}^{t_i^{\max} - \delta} s_i^{g_4}(t),$$

where $N_i = t_i^{\max} - \delta$ means the number of elements in the sum, $\alpha$ and $\delta$ are parameters selected experimentally and $t_i^{\max}$ stands for the position of the maximum value of the $i$-th channel signal filtered by the $g_4$ filter. The selected values of $\alpha$ and $\delta$ worked well in our study. The determined thresholds are shown on the Y axis in Figure 3.
The start of swing-related activation in the $i$-th channel, $a_i$, is then determined from three time moments $t_{g_2}$, $t_{g_3}$ and $t_{g_4}$, where $s_i^{g_2}(t_{g_2}) \geq \theta_i$, $s_i^{g_3}(t_{g_3}) \geq \theta_i$ and $s_i^{g_4}(t_{g_4}) \geq \theta_i$ hold for the first time.
Define $\lambda$ as the maximum difference in activation time between the reference channel and all remaining channels:

$$\lambda = \max_{i = 1, \dots, 7} \left| a_i - a_8 \right|.$$

Inspect and post-process the results manually if $\lambda < \lambda_{\min}$ or $\lambda > \lambda_{\max}$, where the global $\lambda_{\min}$ and $\lambda_{\max}$ are determined experimentally from all EMG channels and swings of all participants.
The determined $a_i$ values constitute the components $d_i = a_i - a_8$, $i = 1, \dots, 7$, of the muscle activation profile $\mathbf{d}$. In Figure 3, the determined $a_i$ values are shown by dashed vertical lines with a circle on the top.
3.2. EMG Features for Predictive Modeling
Pre-processing of an EMG signal usually corresponds to full-wave rectification, followed by filtering of the rectified signal to obtain a smooth linear envelope of the EMG. Most software packages for EMG analysis in sports biomechanics use a fourth-order Butterworth filter [55]. We apply a second-order Butterworth filter for both high- and low-pass filtering. To eliminate the phase lag resulting from a second-order filter, the signal is filtered again in the reverse direction, which effectively converts the filter into a fourth-order one. Low-pass filtering is applied after demeaning the high-pass filtering result; the cut-off frequencies are 20 Hz for the high-pass and 400 Hz for the low-pass filter. An example of the filtered signal is provided in the bottom part of Figure 4. The Butterworth high-pass filter with a corner frequency of 20 Hz is also recommended in [56] for general use.
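A minimal MATLAB sketch of this zero-phase filtering chain is given below; the variable names are illustrative, while the filter order and cut-off frequencies follow the text.

    % Zero-phase band-pass filtering of a rectified EMG channel (illustrative sketch).
    fs  = 1000;                                % sampling frequency, samples/s
    emg = abs(raw_emg(:));                     % full-wave rectification
    [bh, ah] = butter(2, 20 / (fs/2), 'high'); % 2nd-order high-pass, 20 Hz cut-off
    [bl, al] = butter(2, 400 / (fs/2), 'low'); % 2nd-order low-pass, 400 Hz cut-off
    hp  = filtfilt(bh, ah, emg);               % forward-backward filtering cancels phase lag
    hp  = hp - mean(hp);                       % demean the high-pass result
    env = filtfilt(bl, al, hp);                % low-pass the demeaned signal (effectively 4th order)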
Peak detection, which is at the core of feature extraction here, is done using a MATLAB function
ipeak from the package
ipeak7.zip (Version 7.1), created and maintained by Prof. Thomas C. O’Haver [
57]. A program window with the
ipeak function result for the filtered EMG signal from a single channel is displayed in
Figure 5. To find peaks and extract a swing segment from the surface EMG signal, the following steps are performed.
Normalize the rectified signal $s_i(t)$ in each channel $i$ to the peak activation level (see Figure 4):

$$\bar{s}_i(t) = \frac{s_i(t)}{\max_t s_i(t)}.$$
Filter the normalized signal of each channel $i$ using the high- and low-pass Butterworth filters (the result is shown as the black curve in the bottom part of Figure 4).
In each channel $i$, detect all peaks in the filtered signal using the ipeak function (with maximum peak density parameter PeakD = 2), and retain the two highest peaks, denoting them in the order of appearance as $p_i^1$ and $p_i^2$. In Figure 5, Peak # 10 would correspond to $p_i^1$, and Peak # 11 would correspond to $p_i^2$.
Pool the absolute time-based positions of the retained peaks from all channels into a single vector of positions $\mathbf{t}$. Enumerate all possible $i$ values into two identical lists $C_1$ and $C_2$ of channel indexes.
Check for outliers among the elements of the vector $\mathbf{t}$ using the box-plot rule, finding values exceeding the 1.5 inter-quartile range above/below the upper/lower quartile. If outliers are found, remove the numbers of the corresponding channels from $C_1$ and/or $C_2$.
Compute the swing start ($t_s$) and end ($t_e$) times by using the position and width of the marginal $p_k^1$ and $p_k^2$ peaks (the earliest retained first peak and the latest retained second peak):

$$t_s = \mathrm{pos}(p_k^1) - \mathrm{width}(p_k^1), \qquad t_e = \mathrm{pos}(p_k^2) + \mathrm{width}(p_k^2),$$

where $k$ is the channel number of the marginal peak and the functions $\mathrm{pos}(\cdot)$ and $\mathrm{width}(\cdot)$ return the position and width of the peak.
Retain only the segment from $t_s$ to $t_e$.
If outliers were found (in Step # 5), proceed from Step # 3.
Convert the absolute position of each peak within the signal into a relative position within the swing by taking into account the swing duration ($t_e - t_s$).
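The sketch below mimics these segmentation steps on a matrix of filtered channel signals; MATLAB's built-in findpeaks is used as a stand-in for the ipeak function actually employed, and the variable names and the exact form of the start/end rule are our assumptions.

    % Illustrative swing segmentation from peak positions (findpeaks instead of ipeak).
    % S: filtered, normalized EMG of one swing, one column per channel.
    nCh  = size(S, 2);
    pos1 = nan(nCh, 1);  pos2 = nan(nCh, 1);  w1 = nan(nCh, 1);  w2 = nan(nCh, 1);
    for i = 1:nCh
        [pks, locs, wds] = findpeaks(S(:, i));       % all peaks in channel i
        [~, order] = sort(pks, 'descend');
        keep = sort(locs(order(1:2)));               % two highest peaks, in order of appearance
        kw   = wds(ismember(locs, keep));
        pos1(i) = keep(1);  pos2(i) = keep(2);  w1(i) = kw(1);  w2(i) = kw(2);
    end
    allPos = [pos1; pos2];                           % pooled peak positions
    q   = quantile(allPos, [0.25 0.75]);             % box-plot outlier rule
    bad = allPos < q(1) - 1.5*diff(q) | allPos > q(2) + 1.5*diff(q);
    ok1 = ~bad(1:nCh);  ok2 = ~bad(nCh+1:end);       % channels kept in the two lists
    idx1 = find(ok1);   idx2 = find(ok2);
    [tS, k1] = min(pos1(ok1));  [tE, k2] = max(pos2(ok2));
    tStart = tS - w1(idx1(k1));                      % earliest first peak minus its width
    tEnd   = tE + w2(idx2(k2));                      % latest second peak plus its width
    relPos1 = (pos1 - tStart) / (tEnd - tStart);     % relative peak positions within the swing
    relPos2 = (pos2 - tStart) / (tEnd - tStart);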
After extracting the swing segment, the feature vector is determined by the feature set, where the very first feature is always sex (0, woman; 1, man), and the remaining features are related to: (1) channel statistics and correlations; (2) properties of the first peak; (3) properties of the second peak; (4) properties derived from both peaks.
Table 1 presents a list of the feature sets used in this work. The statistics used to enrich Feature Sets # 1, 2, 5, 6, 8, 9, 11, 12, 15–22 with various aspects of the distribution in question encompass the following 12 characteristics: minimum, maximum, mean, median, lower quartile ($Q_1$), upper quartile ($Q_3$), standard deviation, inter-quartile range, lower range (median $-$ $Q_1$), upper range ($Q_3$ $-$ median), skewness and kurtosis. Since there are 4 channels on one side and the EMG dynamics of each channel is compressed into 12 characteristics, Feature Sets # 1, 2 (ChanStats) contain 48 (4 × 12) features plus an extra feature for the duration in seconds; therefore, 49 features in total.
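For concreteness, a small helper of the kind sketched below (a hypothetical function, not from the paper) could compute the 12 characteristics for one channel:

    % Twelve descriptive statistics summarizing one channel's filtered EMG dynamics.
    function s = channel_stats(x)
        q1 = quantile(x, 0.25);  q3 = quantile(x, 0.75);  med = median(x);
        s = [min(x), max(x), mean(x), med, q1, q3, std(x), q3 - q1, ...
             med - q1, q3 - med, skewness(x), kurtosis(x)];
    end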
The main properties of a peak, estimated using the
ipeak function (see the top part of
Figure 5), are the following: position, height, width, area and fitting error. All peak properties are used in Feature Sets # 7, 10, 13, 14, while Feature Sets # 8, 9, 11, 12, 15–22 are calculated using peak position only. Feature Sets # 13, 14 are based on a mathematical operation between the properties of the first and second peaks, where Feature Set # 13 uses subtraction and Feature Set # 14 uses division of the corresponding properties. Feature Sets # 7, 10, 13, 14 are constructed considering 5 properties for each of the 8 channels; therefore, they contain 40 (5 × 8) features in total.
Feature Sets # 3–6, 8, 9, 11, 12, 15–22 are derived from all possible channel pairs. Having 8 channels, we obtain 28 pairs. Correlation-related feature sets are based on an arithmetical operation between the Spearman and Pearson correlations for each pair of channels. Feature Sets # 3, 5 (containing CorrDelta in their name) consider subtraction, and Feature Sets # 4, 6 (containing CorrRatio in their name) consider the division of correlations for the corresponding channel pair.
Peak position-based Feature Sets # 8, 9, 11, 12, 15–22 (containing Sync in their name) evaluate peak position concurrency for each pair of channels. The concurrency is measured here as a signed difference for Feature Sets # 8, 11, 15–18 or an absolute difference for Feature Sets # 9, 12, 19–22 (containing Abs in their name). Feature Sets # 8, 9, 11, 12 are constructed using the difference between the positions of the 1st and 2nd peaks separately. These separate concurrencies are further combined by the basic arithmetical operations: subtraction in Feature Sets # 15, 19; addition in Feature Sets # 16, 20; multiplication in Feature Sets # 17, 21; division in Feature Sets # 18, 22.
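A compact sketch of the correlation-based pair features is given below; the use of MATLAB's corr function and the variable names are our assumptions.

    % Illustrative CorrDelta- and CorrRatio-style features for all channel pairs.
    % S: filtered EMG of the swing segment, one column per channel.
    pairs = nchoosek(1:size(S, 2), 2);           % 28 pairs for 8 channels
    corrDelta = zeros(size(pairs, 1), 1);
    corrRatio = zeros(size(pairs, 1), 1);
    for p = 1:size(pairs, 1)
        a = S(:, pairs(p, 1));  b = S(:, pairs(p, 2));
        rS = corr(a, b, 'Type', 'Spearman');     % rank correlation
        rP = corr(a, b, 'Type', 'Pearson');      % linear correlation
        corrDelta(p) = rS - rP;                  % subtraction-based feature
        corrRatio(p) = rS / rP;                  % division-based feature
    end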
3.3. Prediction Using Random Forest
Regression models seek to predict the club head speed or the ball carry distance of a shot. Binary classification models try to detect personal effectiveness, i.e., whether a shot by the player in question is effective or ineffective in terms of speed or distance. Regression has a continuous target attribute, while detection has a discrete one. Discretization into labels for the detection task is done with respect to the player-specific average: a shot above the personal average club head speed or ball carry distance is considered effective, and a shot below the personal average is considered ineffective.
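A minimal sketch of this player-specific labelling, assuming numeric player IDs and illustrative variable names, is:

    % Label each shot as effective (1) if it exceeds the player's own average speed.
    labels = zeros(size(speed));
    for s = unique(playerID)'
        idx = (playerID == s);
        labels(idx) = speed(idx) > mean(speed(idx));   % 1 = effective, 0 = ineffective
    end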
We used random forest (RF) [
58] for predictive modeling. The MATLAB port of the original RF algorithm was obtained from [
59]. RF can be used for both classification and regression. In classification, RF is a committee of decision trees, where the final output is derived from voting (see
Figure 6). In regression, voting by a committee is replaced with averaging. Random forests have been successfully applied in a variety of fields [
60]. The core idea of RF is to combine many (
B in total) decision trees, built using different bootstrap samples of the original dataset and a random subset (of predetermined size
q) of features
(see
Figure 6). RF is known to be robust against over-fitting, and as the number of trees increases, the generalization error converges to a limit [
58]. Low bias and low correlation are essential for the robust generalization performance of the ensemble. To get low bias, trees are unpruned (grown to the maximum depth). To achieve the low correlation of trees, randomization is applied. RF is constructed in the following steps:
Choose the forest size B as the number of trees to grow and the subspace size q as the number of features to provide for each tree node.
Draw a bootstrap sample (a random sample with replacement) of the learning dataset, which generally results in approximately 63% unique observations for training, thus leaving approximately 37% for testing as the out-of-bag (OOB) dataset for that particular tree.
Grow an unpruned tree using the bootstrap sample. When growing a tree, at each node, q variables are randomly selected out of the p available.
Repeat Steps 2 and 3 until the size of the forest reaches B.
Because of the “repeated measures” aspect of the golf data, where each player is represented by several swings, the sampling part of the RF had to be modified to ensure that all swings of each player are either included in a bootstrap sample or left aside as OOB. The list of parameters passed to the external interface function (C++ source code used to build MATLAB mex files) was expanded by an additional indexList parameter, containing the subject's unique ID for each swing. The implemented modification of bootstrap sampling converts leave-one-out validation into leave-one-subject-out validation. Furthermore, the RF setting that allows stratified sampling was configured in the classification task to preserve the sex ratio of the full dataset in each drawn bootstrap sample.
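A MATLAB-level sketch of the idea behind this modification (not the authors' C++ implementation) is shown below: the bootstrap sample is drawn over subjects, so all swings of a drawn player enter the in-bag set together.

    % Subject-level bootstrap: all swings of a player are in-bag or OOB together.
    subjects = unique(indexList);                  % indexList: subject ID per swing
    nSubj    = numel(subjects);
    drawn    = subjects(randi(nSubj, nSubj, 1));   % sample subjects with replacement
    bagIdx   = [];
    for s = drawn(:)'
        bagIdx = [bagIdx; find(indexList == s)];   % repeated subjects keep their multiplicity
    end
    oobIdx = find(~ismember(indexList, drawn));    % swings of never-drawn subjects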
3.4. Decision-Level Fusion
Individual RFs were built independently on bootstrap sets, and the decisions of these individual experts were combined in a meta-learner fashion. RF was used in this work both as a base learner and as a meta learner. This implies that outputs from RF models from the first stage are treated as inputs (meta-features) for another RF in the second stage. The set of meta-features was slightly expanded by always including sex and intentionally avoiding removal of sex in the decision optimization stage.
For the classification task, the input to the meta-learner is the difference between class a posteriori probabilities obtained from the base-learner. Given a trained RF, this difference is estimated as:

$$m(\mathbf{x}) = \frac{1}{L} \sum_{i=1}^{L} \left( p_1^{i}(\mathbf{x}) - p_2^{i}(\mathbf{x}) \right),$$

where $\mathbf{x}$ is the object being classified, $L$ is the number of trees $T_i$ in the random forest for which observation $\mathbf{x}$ is OOB, $q$ is a class label and $p_q^{i}(\mathbf{x})$ stands for the $q$-th class frequency in the leaf node into which $\mathbf{x}$ falls in the $i$-th tree $T_i$ of the forest:

$$p_q^{i}(\mathbf{x}) = \frac{n_q}{\sum_{j=1}^{Q} n_j},$$

where $Q$ is the number of classes and $n_q$ is the number of training data from class $q$ falling into the same leaf node of $T_i$ as $\mathbf{x}$.
For the regression task, the input to the meta-learner is the averaged OOB prediction of the base-learner. Given a trained RF, this estimate is provided by:

$$\hat{y}(\mathbf{x}) = \frac{1}{L} \sum_{i=1}^{L} y_i(\mathbf{x}),$$

where $\mathbf{x}$ is the data instance represented by the feature set in question, $L$ is the number of trees $T_i$ in the random forest for which the observation $\mathbf{x}$ is OOB and $y_i(\mathbf{x})$ corresponds to the predicted value of the target attribute.
Meta-features were also investigated by performing permutation-based variable importance analysis. The mean decrease in accuracy (for the detection task) or the mean increase in root mean squared error (for the regression task) were considered here as variable importance measures. Each meta-feature is processed by permuting its values several times, and the mean difference in RF performance on OOB data is estimated.
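As an illustration only, analogous OOB meta-features and permutation importances can be obtained with MATLAB's built-in TreeBagger, used here as a stand-in for the RF port employed in the study:

    % OOB prediction as a meta-feature and permutation-based importance (regression case).
    rf = TreeBagger(500, X, y, 'Method', 'regression', ...
                    'OOBPrediction', 'on', 'OOBPredictorImportance', 'on');
    metaFeature = oobPredict(rf);                     % averaged OOB prediction per swing
    importance  = rf.OOBPermutedPredictorDeltaError;  % importance of each (meta-)feature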
3.5. Decision Optimization
In ensemble learning, non-generative ensemble or decision optimization [61,62] refers to model selection through tuning of the meta-learner and/or selection of meta-features. The RF parameter B was fixed throughout the experiments, but the parameter q was selected in a grid-search (for individual RFs) or brute-force (for the combiner RF) fashion. All features were used for the individual RFs. Decision optimization to search for a subset of meta-features was done only for the RF combiner, through a specific variant of an evolutionary algorithm. Each generated meta-feature subset was evaluated with all possible q values, and the best evaluation, summarized using the cost of the log-likelihood ratio ($C_{llr}$, see Section 3.6.1 for details), is returned as the result of the fitness function.
We have chosen angle-modulated differential evolution (AMDE) as the decision optimization technique for wrapper-based meta-feature selection in the meta-learner model. AMDE was initially proposed in [63] as a homomorphous mapping technique enabling the differential evolution algorithm to operate within a binary space. In [64], AMDE was used to select features for a support vector machine, while in [65], AMDE was applied for decision optimization of a meta-learner RF.
The angle modulation (AM) technique uses a generating function to transform a 4-dimensional continuous-valued space into a larger binary-valued n-dimensional Hamming space (n is the total number of individual feature sets), which corresponds to convenient tuning of a feature inclusion mask by optimizing 4 continuous parameters. The transformation is achieved using the following trigonometric function:

$$g(x) = \sin\!\left( 2\pi (x - a) \, b \cos\!\left( 2\pi (x - a) \, c \right) \right) + d,$$

where the coefficients a, b, c and d are the four optimized parameters determining the shape of the generating function (a, horizontal shift; b, maximal frequency of the sin function; c, maximal frequency of the cos function; d, vertical shift) and x is an element from a set (of size n) of evenly-spaced values, for example, starting from 0.1 with a step of 0.1 until 0.1n is reached. We generate a binary selection vector $\mathbf{v} = (v_1, \dots, v_n)$ by:

$$v_k = \begin{cases} 1, & \text{if } g(x_k) \geq 0, \\ 0, & \text{otherwise,} \end{cases}$$

where $k = 1, \dots, n$ and the $g(x_k)$ value is calculated using the generating function above. By proceeding with $k$, a vector $\mathbf{v}$ of length n is generated. The vector $\mathbf{v}$ represents the potential solution within the binary problem space; in our case, a binary mask of the meta-features to select.
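A short MATLAB sketch of this mapping is given below; the coefficient values are arbitrary examples (in practice they are tuned by differential evolution), and n = 22 assumes one bit per individual feature set of Table 1.

    % Angle-modulation mapping: four continuous parameters generate an n-bit selection mask.
    n = 22;                          % number of individual feature sets (assumed example)
    x = 0.1 : 0.1 : 0.1*n;           % evenly-spaced sampling points, as in the text
    a = 0;  b = 1;  c = 1;  d = 0;   % example coefficients (optimized by DE in practice)
    g = sin(2*pi*(x - a) .* b .* cos(2*pi*(x - a) .* c)) + d;
    mask = g >= 0;                   % 1 selects the corresponding meta-feature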