Figure 1.
Images of the 12 empathetic and 12 non-empathetic advertisements in experiment one.
Figure 2.
Images of the 4 empathetic and 4 non-empathetic advertisements in experiment two.
Figure 3.
Flow chart of an MFCC feature extraction method.
Figure 4.
Raw signal in the time domain.
Figure 5.
Raw signal in the frequency domain.
Figure 6.
Audio signal in the time domain after the pre-emphasis filter process.
Figure 7.
Audio signal in the frequency domain after the pre-emphasis filter process.
Figure 8.
The power spectrum (periodogram) of the audio signal computed after the STFT process.
Figure 9.
The shapes of the mel filter and the inverse mel filter.
Figure 10.
The mel spectrogram after the DCT process.
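Figures 3–10 trace the stages of MFCC extraction: pre-emphasis, framing and windowing, STFT power spectrum, mel filtering, and a final DCT. A minimal NumPy/SciPy sketch of that pipeline follows; the sample rate, frame size, hop length, filter count, and pre-emphasis coefficient are common defaults and assumptions, not values taken from the paper:

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import get_window

def mfcc_like(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Sketch of the MFCC pipeline: pre-emphasis -> framing/windowing ->
    STFT power spectrum -> mel filter bank -> log -> DCT."""
    # Pre-emphasis filter (0.97 is a common choice, assumed here)
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(emphasized) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * get_window("hamming", n_fft, fftbins=True)
    # Power spectrum (periodogram) of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT to decorrelate -> cepstral coefficients
    log_mel = np.log(power @ fbank.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

Each row of the returned matrix holds the cepstral coefficients for one frame, matching the per-frame features illustrated in the flow chart of Figure 3.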
Figure 11.
The flow chart of the second algorithm.
Figure 12.
The spectrogram color map of the audio split into RGB images: (a) the default MFCC color image; (b) the red-channel image; (c) the green-channel image; (d) the blue-channel image.
Figure 13.
The flow chart of the third algorithm.
Figure 14.
The flow chart of the three methods for empathy evaluation in audio signal processing.
Figure 15.
The flow chart of three methods for empathy evaluation in audio signal processing.
Figure 16.
Three sets of visual images formed from the feature values of the audio signal using different kernel functions.
Table 1.
Summary of the features of audio.
| Category | Features |
|---|---|
| Spectral | Spectral centroid, spectral roll-off, spectral flux, MFCCs |
| Temporal | Zero-crossing profile, key clarity, harmonic change, musical mode |
| Rhythmic | Beat histogram, average tempo (BPM) |
Table 2.
Average error (p < 0.001) of the characteristic values of the acoustic and physical elements (coefficients F1–F8 and F10) used to compare the empathetic and non-empathetic labels in experiment one.
| Label | F1 Mean | F1 Std | F2 Mean | F2 Std | F3 Mean | F3 Std | F4 Mean | F4 Std | F5 Mean | F5 Std |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-empathetic | 1.11 | 0.41 | −92.85 | 0.68 | 44.63 | 0.89 | −153.48 | 0.98 | 107.12 | 1.16 |
| Empathetic | −30.97 | 0.68 | −127.33 | 1.16 | 71.69 | 1.49 | −172.1 | 1.75 | 112.66 | 1.94 |

| Label | F6 Mean | F6 Std | F7 Mean | F7 Std | F8 Mean | F8 Std | F10 Mean | F10 Std |
|---|---|---|---|---|---|---|---|---|
| Non-empathetic | −198.14 | 1.2 | 76.23 | 1.41 | −145.67 | 1.32 | −118.32 | 1.34 |
| Empathetic | −240.14 | 2.12 | 81.7 | 2.41 | 114.56 | 2.22 | −138.58 | 2.19 |
Table 3.
Average error (p < 0.001) of the characteristic values of the acoustic and physical elements (coefficients F1–F8 and F10) used to compare the empathetic and non-empathetic labels in experiment two.
| Label | F1 Mean | F1 Std | F2 Mean | F2 Std | F3 Mean | F3 Std | F4 Mean | F4 Std | F5 Mean | F5 Std |
|---|---|---|---|---|---|---|---|---|---|---|
| Empathetic | −240.02 | 1.84 | 133.42 | 0.55 | −32.75 | 0.42 | 16.39 | 0.27 | −12.2 | 0.32 |
| Non-empathetic | −351.08 | 2.51 | 121.99 | 0.78 | −13.71 | 0.91 | 23.8 | 0.35 | −1.69 | 0.46 |

| Label | F6 Mean | F6 Std | F7 Mean | F7 Std | F8 Mean | F8 Std | F10 Mean | F10 Std |
|---|---|---|---|---|---|---|---|---|
| Empathetic | 11.9 | 0.23 | −8.47 | 0.18 | −2.52 | 0.17 | −1.87 | 0.16 |
| Non-empathetic | 5.28 | 0.2 | −1.73 | 0.21 | 1.13 | 0.13 | −2.92 | 0.16 |
Table 4.
Average error (p < 0.001) of the characteristic values of the audio and physical elements (low, middle, and high power; low, middle, and high frequency) used to compare the empathetic and non-empathetic labels in experiment one.
| Label | R-High Mean | R-High Std | G-High Mean | G-High Std | B-High Mean | B-High Std | R-Middle Mean | R-Middle Std |
|---|---|---|---|---|---|---|---|---|
| Non-empathetic | 48.8 | 0.1 | 119.1 | 0.2 | 137.6 | 0.07 | 61.6 | 0.45 |
| Empathetic | 48.1 | 0.06 | 118 | 0.2 | 137.9 | 0.03 | 59 | 0.35 |

| Label | G-Middle Mean | G-Middle Std | B-Middle Mean | B-Middle Std | R-Low Mean | R-Low Std | B-Low Mean | B-Low Std |
|---|---|---|---|---|---|---|---|---|
| Non-empathetic | −174.6 | 0.38 | 118.7 | 0.3 | 98.3 | 0.4 | 94.6 | 0.2 |
| Empathetic | −171.6 | 0.36 | 120.6 | 0.2 | 96.6 | 0.4 | 95.5 | 0.2 |
Table 5.
Average error (p < 0.001) of the characteristic values of the audio and physical elements (low, middle, and high power; low, middle, and high frequency) used to compare the empathetic and non-empathetic labels in experiment two.
| Label | R-High Mean | R-High Std | G-High Mean | G-High Std | B-High Mean | B-High Std | R-Middle Mean | R-Middle Std |
|---|---|---|---|---|---|---|---|---|
| Non-empathetic | 67.1 | 0.78 | 119.1 | 0.2 | 117.7 | 0.53 | 68.3 | 0.76 |
| Empathetic | 60.0 | 0.62 | 118.0 | 0.2 | 122.3 | 0.43 | 61.7 | 0.6 |

| Label | G-Middle Mean | G-Middle Std | B-Middle Mean | B-Middle Std | R-Low Mean | R-Low Std | B-Low Mean | B-Low Std |
|---|---|---|---|---|---|---|---|---|
| Non-empathetic | 168.5 | 0.63 | 116.4 | 0.51 | 65.7 | 0.78 | 116.7 | 0.53 |
| Empathetic | 164.9 | 0.52 | 120.6 | 0.41 | 58.6 | 0.63 | 121.2 | 0.43 |
Table 6.
Average error (p < 0.001) of the characteristic values of the audio and physical element features (low-power mean, low-power std, middle-power mean, middle-power std, high-power mean, and high-power std) used to compare the empathetic and non-empathetic labels in experiment one.
| Label | Low-Power Mean (mean) | Low-Power Mean (std) | Low-Power Std (mean) | Low-Power Std (std) | Middle-Power Mean (mean) | Middle-Power Mean (std) |
|---|---|---|---|---|---|---|
| Non-empathetic | 2371.9 | 29.2 | 9412.2 | 123.1 | 74.87 | 3.47 |
| Empathetic | 1587.4 | 25.8 | 6086.1 | 106.4 | 92.36 | 4.97 |

| Label | Middle-Power Std (mean) | Middle-Power Std (std) | High-Power Mean (mean) | High-Power Mean (std) | High-Power Std (mean) | High-Power Std (std) |
|---|---|---|---|---|---|---|
| Non-empathetic | 213.7 | 10.8 | 0.86 | 0.05 | 4.31 | 0.33 |
| Empathetic | 227.6 | 12.8 | 1.48 | 0.09 | 7.61 | 0.47 |
Table 7.
Average error (p < 0.001) of the characteristic values of the audio and physical element features (low-power mean, low-power std, middle-power mean, middle-power std, high-power mean, and high-power std) used to compare the empathetic and non-empathetic labels in experiment two.
| Label | Low-Power Mean (mean) | Low-Power Mean (std) | Low-Power Std (mean) | Low-Power Std (std) | Middle-Power Mean (mean) | Middle-Power Mean (std) |
|---|---|---|---|---|---|---|
| Non-empathetic | 268.5 | 3.19 | 7429.3 | 71.82 | 0.067 | 0.003 |
| Empathetic | 133.9 | 12.67 | 1531 | 19.19 | 0.199 | 0.007 |

| Label | Middle-Power Std (mean) | Middle-Power Std (std) | High-Power Mean (mean) | High-Power Mean (std) | High-Power Std (mean) | High-Power Std (std) |
|---|---|---|---|---|---|---|
| Non-empathetic | 0.33 | 0.01 | 36.22 | 0.56 | 237 | 3.37 |
| Empathetic | 0.86 | 0.03 | 36.77 | 0.46 | 237.5 | 3.94 |
Table 8.
Training and test accuracy of the BT, DT, MLP, RF, KNN, and SVM classifiers trained on the feature values obtained by the three audio extraction methods.
| Classifier | Training Accuracy | Test Accuracy |
|---|---|---|
| Extraction feature of method 1 | | |
| AdaBoost | 66.284% | 54.212% |
| Decision tree | 73.244% | 64.956% |
| Random forest | 76.53% | 70.791% |
| SVM | 74% | 66.5% |
| KNN | 65.3% | 66.2% |
| MLP | 55.3% | 55.1% |
| Extraction feature of method 2 | | |
| AdaBoost | 72.92% | 68.5% |
| Decision tree | 77.842% | 72.66% |
| Random forest | 78.732% | 72.721% |
| SVM | 61.2% | 58.5% |
| KNN | 70.2% | 61.7% |
| MLP | 58.5% | 50.5% |
| Extraction feature of method 3 | | |
| AdaBoost | 78.092% | 78.125% |
| Decision tree | 83.824% | 81.196% |
| Random forest | 99.173% | 86.171% |
| SVM | 75.3% | 62.3% |
| KNN | 71.2% | 60.5% |
| MLP | 67.823% | 67.9% |
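A comparison like the one in Table 8 can be sketched with scikit-learn. Synthetic data stands in for the features extracted by the three methods, and the classifiers use default hyperparameters, so the accuracies will not match the table:

```python
# Hypothetical sketch of the train/test accuracy comparison in Table 8.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the extracted audio features
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}
results = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    # Record training and test accuracy, as in Table 8
    results[name] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```

A large gap between training and test accuracy (e.g. random forest's 99.173% vs. 86.171% for method 3) is the usual sign of overfitting, which this loop makes easy to spot.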
Table 9.
Evaluation results of the AdaBoost model with learning rates of 0.1, 0.4, and 0.8.
| Model | Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Learning rate of 0.1 | Non-empathetic | 0.70 | 0.78 | 0.74 | 2299 |
| | Empathetic | 0.75 | 0.66 | 0.70 | 2017 |
| | Avg/total | 0.73 | 0.72 | 0.72 | 4316 |
| Learning rate of 0.4 | Non-empathetic | 0.76 | 0.80 | 0.78 | 2299 |
| | Empathetic | 0.78 | 0.74 | 0.76 | 2017 |
| | Avg/total | 0.77 | 0.77 | 0.77 | 4316 |
| Learning rate of 0.8 | Non-empathetic | 0.79 | 0.77 | 0.78 | 2299 |
| | Empathetic | 0.77 | 0.79 | 0.78 | 2017 |
| | Avg/total | 0.78 | 0.78 | 0.78 | 4316 |
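Per-label precision, recall, F1-score, and support tables of this kind can be produced with scikit-learn's `classification_report`. The sketch below sweeps the AdaBoost learning rate over the three values in Table 9, using synthetic data and a hypothetical split; the same loop shape applies to the tree counts, maximum depths, and iteration counts swept in Tables 10–12:

```python
# Hedged sketch of a hyperparameter sweep with per-label metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report

# Synthetic stand-in for the extracted audio features
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=1)

reports = {}
for lr in (0.1, 0.4, 0.8):  # the learning rates evaluated in Table 9
    model = AdaBoostClassifier(learning_rate=lr, random_state=1).fit(X_tr, y_tr)
    # output_dict=True returns per-label precision/recall/F1/support,
    # the same quantities tabulated above
    reports[lr] = classification_report(y_te, model.predict(X_te), output_dict=True)
```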
Table 10.
Evaluation results of the random forest model with 10, 40, and 80 trees.
| Model | Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Number of trees: 10 | Non-empathetic | 0.82 | 0.88 | 0.85 | 2299 |
| | Empathetic | 0.87 | 0.81 | 0.84 | 2017 |
| | Avg/total | 0.85 | 0.84 | 0.84 | 4316 |
| Number of trees: 40 | Non-empathetic | 0.85 | 0.87 | 0.86 | 2299 |
| | Empathetic | 0.87 | 0.84 | 0.86 | 2017 |
| | Avg/total | 0.86 | 0.86 | 0.86 | 4316 |
| Number of trees: 80 | Non-empathetic | 0.86 | 0.87 | 0.86 | 2299 |
| | Empathetic | 0.87 | 0.85 | 0.86 | 2017 |
| | Avg/total | 0.86 | 0.86 | 0.86 | 4316 |
Table 11.
Evaluation results of the decision tree model with maximum depths of 10, 50, and 100.
| Model | Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Maximum depth of 10 | Non-empathetic | 0.82 | 0.81 | 0.81 | 2299 |
| | Empathetic | 0.81 | 0.82 | 0.81 | 2017 |
| | Avg/total | 0.81 | 0.81 | 0.81 | 4316 |
| Maximum depth of 50 | Non-empathetic | 0.81 | 0.82 | 0.81 | 2299 |
| | Empathetic | 0.81 | 0.80 | 0.81 | 2017 |
| | Avg/total | 0.81 | 0.81 | 0.81 | 4316 |
| Maximum depth of 100 | Non-empathetic | 0.81 | 0.81 | 0.81 | 2299 |
| | Empathetic | 0.81 | 0.80 | 0.81 | 2017 |
| | Avg/total | 0.81 | 0.81 | 0.81 | 4316 |
Table 12.
Evaluation results of the MLP model with 200, 350, and 500 training iterations.
| Model | Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Iterations: 200 | Non-empathetic | 0.62 | 0.93 | 0.75 | 2299 |
| | Empathetic | 0.86 | 0.42 | 0.56 | 2017 |
| | Avg/total | 0.74 | 0.68 | 0.65 | 4316 |
| Iterations: 350 | Non-empathetic | 0.62 | 0.95 | 0.75 | 2299 |
| | Empathetic | 0.88 | 0.41 | 0.56 | 2017 |
| | Avg/total | 0.75 | 0.68 | 0.65 | 4316 |
| Iterations: 500 | Non-empathetic | 0.62 | 0.95 | 0.75 | 2299 |
| | Empathetic | 0.88 | 0.41 | 0.56 | 2017 |
| | Avg/total | 0.75 | 0.68 | 0.65 | 4316 |
Table 13.
Evaluation results of the NuSVC, SVC, and LinearSVC variants of the SVM model.
| Model | Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| svm.NuSVC | 0 | 0.61 | 0.73 | 0.67 | 2299 |
| | 1 | 0.64 | 0.51 | 0.57 | 2017 |
| | Avg/total | 0.58 | 0.52 | 0.47 | 4316 |
| svm.SVC | 0 | 0.77 | 0.77 | 0.72 | 2299 |
| | 1 | 0.71 | 0.75 | 0.67 | 2017 |
| | Avg/total | 0.74 | 0.76 | 0.69 | 4316 |
| svm.LinearSVC | 0 | 0.66 | 0.69 | 0.68 | 2299 |
| | 1 | 0.66 | 0.63 | 0.64 | 2017 |
| | Avg/total | 0.58 | 0.66 | 0.66 | 4316 |
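The three models in Table 13 are scikit-learn's `svm.NuSVC`, `svm.SVC`, and `svm.LinearSVC`: NuSVC replaces the penalty parameter C with an upper bound `nu` on the fraction of margin errors, while LinearSVC is a liblinear-based implementation restricted to a linear kernel. A minimal comparison sketch on synthetic data follows; the paper's features and any tuned parameters are assumptions here:

```python
# Hedged sketch comparing the three SVM variants of Table 13.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn import svm

# Synthetic stand-in for the extracted audio features
X, y = make_classification(n_samples=400, n_features=8, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

variants = {
    "svm.NuSVC": svm.NuSVC(nu=0.5),          # nu bounds the margin-error fraction
    "svm.SVC": svm.SVC(kernel="rbf"),        # standard C-SVM with RBF kernel
    "svm.LinearSVC": svm.LinearSVC(max_iter=5000),  # linear kernel only
}
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in variants.items()}
```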