In the past few decades, analysis of heart sound signals (i.e. the
phonocardiogram or PCG), especially for automated heart sound segmentation
and classification, has been widely studied and has been reported to have
Physiol. Meas. 37 (2016) 2181
1. Introduction
Cardiovascular diseases (CVDs) continue to be the leading cause of morbidity and mortality
worldwide. An estimated 17.5 million people died from CVDs in 2012, representing 31% of
all global deaths (WHO 2015). One of the first steps in evaluating the cardiovascular system
in clinical practice is physical examination. Auscultation of the heart sounds is an essential
part of the physical examination and may reveal many pathologic cardiac conditions such as
arrhythmias, valve disease, heart failure, and more. Heart sounds provide important initial
clues in disease evaluation, serve as a guide for further diagnostic examination, and thus play
an important role in the early detection of CVDs.
During the cardiac cycle, the heart first experiences electrical activation, which then leads
to mechanical activity in the form of atrial and ventricular contractions. This in turn forces
blood between the chambers of the heart and around the body, as a result of the opening and
closure of the heart valves. This mechanical activity, and the sudden start or stop of the flow
of blood within the heart, gives rise to vibrations of the entire cardiac structure (Leatham
1975). These vibrations are audible on the chest wall, and listening for specific heart sounds
can give an indication of the health of the heart. An audio recording (or graphical) time series
representation of the resultant sounds, transduced at the chest surface, is known as a heart
sound recording or phonocardiogram (PCG).
Figure 1. Phonocardiograms (above) from normal and abnormal heart sounds with
pressure diagrams (below). Red indicates aortic pressure, green ventricular pressure
and blue atrial pressure. Reproduced under the CC BY-SA 3.0 license and adapted from
Madhero (2010).
Four locations are most often used to listen to and transduce the heart sounds, which are
named according to the positions in which the valves can be best heard (Springer 2016):
• Aortic area—centred at the second right intercostal space.
• Pulmonic area—in the second intercostal space along the left sternal border.
• Tricuspid area—in the fourth intercostal space along the left sternal edge.
• Mitral area—at the cardiac apex, in the fifth intercostal space on the midclavicular line.
Fundamental heart sounds (FHSs) usually include the first (S1) and second (S2) heart
sounds (Leatham 1975). S1 occurs at the beginning of isovolumetric ventricular contraction,
when already closed mitral and tricuspid valves suddenly reach their elastic limit due to the
rapid increase in pressure within the ventricles. S2 occurs at the beginning of diastole with
the closure of the aortic and pulmonic valves (see figure 1). While the FHSs are the most recognizable
sounds of the heart cycle, the mechanical activity of the heart may also cause other
audible sounds, such as the third heart sound (S3), the fourth heart sound (S4), systolic ejec-
tion click (EC), mid-systolic click (MC), the diastolic sound or opening snap (OS), as well as
heart murmurs caused by turbulent, high-velocity flow of blood.
The spectral properties of heart sounds and PCG recording artifacts have been well described
(Leatham 1975). The upper panel of figure 2 shows the frequency distribution examples of dif-
ferent components in heart sound (A from a normal heart sound and B from a heart sound with
S3 component, both recorded at the tricuspid area). As shown, the S1–S4 components overlap
with each other in the frequency domain. Similarly, murmurs and artifacts from respiration
and other non-physiological events also overlap significantly. Arrows indicate (theoretical)
typical frequency regions for each type of heart sound: S1 for 10–140 Hz (energy concentration
usually in low frequencies of 25–45 Hz), S2 for 10–200 Hz (energy concentration usually
in low frequencies of 55–75 Hz) and S3 and S4 for 20–70 Hz. Murmurs span diverse
frequency ranges and, depending on their nature, can be as high as 600 Hz. Respiration
usually has a frequency range of 200–700 Hz (Tilkian and Conover 2001). This makes the
separation of heart sounds from each other, and from abnormal sounds or artifacts, impossible
in the frequency domain. The morphological similarity of the noise to normal and abnormal
heart sounds makes identification of the latter also extremely difficult in the time domain. The
lower panel of figure 2 shows the sound pressure levels for different frequency ranges.
Figure 2. General spectral regions for different heart sounds, and other physiological
sounds during heart sound recordings. Adapted from Leatham (1975) and Springer
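The overlap described above can be checked directly from the quoted ranges. The following is a minimal sketch, assuming only the band limits stated in the text; murmurs are omitted because only their upper limit (about 600 Hz) is given, and the function names are illustrative rather than taken from any library.

```python
# Approximate frequency ranges (Hz) quoted in the text above.
BANDS_HZ = {
    "S1": (10, 140),
    "S2": (10, 200),
    "S3/S4": (20, 70),
    "respiration": (200, 700),
}


def overlap_hz(a, b):
    """Return the width in Hz of the overlap between two (low, high) bands."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return max(0, high - low)


def overlapping_pairs(bands):
    """List every pair of components whose bands share a nonzero range."""
    names = sorted(bands)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            width = overlap_hz(bands[first], bands[second])
            if width > 0:
                pairs.append((first, second, width))
    return pairs


if __name__ == "__main__":
    for first, second, width in overlapping_pairs(BANDS_HZ):
        print(f"{first} and {second} share {width} Hz")
```

With these ranges, every pair among S1, S2 and S3/S4 overlaps, which is why a fixed band-pass filter alone cannot isolate one fundamental heart sound from another.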
Automated analysis of the heart sound in clinical applications usually consists of three
steps, shown in figure 3: pre-processing, segmentation and classification. Over the past few
decades, methods for automated segmentation and classification of heart sounds have been
widely studied. Many methods have demonstrated potential to accurately detect pathologies
in clinical applications. Unfortunately, comparisons between techniques have been hindered
by the lack of high-quality, rigorously validated, and standardized databases of heart sound
signals obtained from a variety of healthy and pathological conditions. In many cases, both
experimental and clinical data are collected at considerable expense, but only analyzed once
by their collectors and then filed away indefinitely, because funding climates change, and collaborators
move on. Moreover, the activation energy needed to document data for external use, and to
store and share data in a semi-permanent manner, is rarely available at the end of a research
Figure 3. Typical three steps for automated analysis of heart sound in clinical
The PhysioNet/Computing in Cardiology Challenge 2016 (PhysioNet/CinC Challenge
2016) attempts to address some of these issues by assembling the research community to
contribute multiple promising databases (Clifford et al 2016). Prior to the PhysioNet/CinC
Challenge 2016 there were only three public heart sound databases available: (i) The Michigan
heart sound and murmur database (UMHS), (ii) The PASCAL database (Bentley et al 2011)
and (iii) The Cardiac Auscultation of Heart Murmurs database (eGeneralMedical). These
three databases can be summarized as follows:
• The Michigan heart sound and murmur database (MHSDB) was provided by the
University of Michigan Health System. It includes only 23 heart sound recordings with
a total time length of 1496.8 s and is available from
• The PASCAL database comprises 176 recordings for heart sound segmentation and 656
recordings for heart sound classification. Although the number of recordings is relatively
large, the recordings have a limited length of 1 s to 30 s. They also have a
limited frequency range below 195 Hz due to the applied low-pass filter, which removes
many of the useful heart sound components for clinical diagnosis. It is available from
• The Cardiac Auscultation of Heart Murmurs database, provided by eGeneral Medical
Inc., includes 64 recordings. It is not open and requires payment for access from: www.
It is important to note that these three databases are limited in the number of recordings, their length
or their signal frequency range. In addition, two of these databases are intended to teach medical
students auscultation, and therefore comprise high-quality recordings of very pronounced
murmurs, not often seen in real-world recordings. In the PhysioNet/CinC Challenge 2016,
a large collection of heart sound recordings was obtained from different real-world clinical
and nonclinical environments (such as in-home visits). The data include not only clean heart
sounds but also very noisy recordings, providing authenticity to the challenge. The data were
recorded from both normal subjects and pathological patients, and from both children and
adults. The data were also recorded from different locations, depending on the individual
protocols used for each data set. However, they were generally recorded at the four common
recording locations of aortic area, pulmonic area, tricuspid area and mitral area. Although a
limited portion of the data has been held back for test purposes (Challenge scoring), much of
the hidden test data will be released on PhysioNet after the conclusion of the Challenge and
subsequent special issue in the Journal Physiological Measurement. The purpose of this paper
is to provide a detailed description of the heart sound data that comprise the training and test
sets for the PhysioNet/CinC Challenge 2016, and to help researchers improve their algorithms
in the Official Phase of the Challenge.
Table 1 details the composition of the assembled heart sound database. There are a total of nine
heart sound databases collected independently by seven different research teams from seven
countries and three continents, over a period of more than a decade. As a result, the hardware,
recording locations, data quality and patient types differ substantially, and the methods for
identifying gold standard diagnoses also vary. A description of each composite database is
now given. The acoustic data were saved in either the text format or the .wav format.
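As an illustration of reading the two storage formats, the sketch below uses only the Python standard library. It assumes 16 bit mono PCM for the .wav files (the quantization reported for most of the databases) and one amplitude value per line for the text files; the text layout is an assumption, since it is not specified here, and both function names are hypothetical.

```python
import sys
import wave
from array import array


def read_wav_mono16(path):
    """Read a 16-bit mono PCM .wav file; return (sample_rate_hz, samples)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit mono PCM only")
        sample_rate_hz = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data are stored little-endian
    return sample_rate_hz, samples


def read_text_signal(path):
    """Read one amplitude value per line (assumed layout) from a text file."""
    with open(path) as text_file:
        return [float(line) for line in text_file if line.strip()]
```

Returning the sample rate alongside the samples matters because the constituent databases were recorded at different rates.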
Table 1. Detailed profiles for the assembled heart sound databases for the 2016 PhysioNet/CinC Challenge.
Database | Subject type | Subjects | Age (years) | Gender (F/M) | Recording position | Recording state | Recordings | Recording length (s) | Signals | Sample rate | Sensor | Sensor bandwidth
MITHSDB | Normal | 38 | Unknown | Unknown | Nine different recording positions | Recorded in-home visits or in hospital, uncontrolled | 117 | 33 ± 5 | One PCG, one ECG | 44 100 Hz | Welch Allyn Meditron electronic stethoscope | 20 Hz–20 kHz
MITHSDB | MVP | 37 | | | | | 134 | | | | |
MITHSDB | Benign | 34 | | | | | 118 | | | | |
MITHSDB | AD | 5 | | | | | 17 | | | | |
MITHSDB | MPC | 7 | | | | | 23 | | | | |
AADHSDB | Normal | 121 | Unknown | 58/93 | Tricuspid area | Rest | 544 | 8 | One PCG | 4000 Hz | 3M Littmann E4000 | 20 Hz–1000 Hz
AADHSDB | CAD | 30 | | | | | 151 | | | | |
AUTHHSDB | Normal | 11 | 29 ± 8 | 5/6 | Apex | Rest | 11 | 47 ± 25 | Two PCGs | 4000 Hz | AUDIOSCOPE | Unknown
AUTHHSDB | MR | 17 | 75 ± 7 | 12/5 | Auscultation positions | | 17 | 60 ± 30 | | | |
AUTHHSDB | AS | 17 | 76 ± 10 | 11/6 | | | 17 | 43 ± 21 | | | |
TUTHSDB | Normal | 28 | Unknown | Unknown | Four typical | Rest | 174 | 15 | One PCG | 4000 Hz | Unknown | Unknown
TUTHSDB | Pathologic | 16 | | | | | | | | | |
UHAHSDB | Normal: NHC | 19 | 18–40 | Unknown | Unknown | Rest | 19 | 14 ± 5 | One PCG | 8000 Hz | Prototype (Infral Corporation) | Unknown
UHAHSDB | Normal: MARS500 | 6 | Unknown | Unknown | | | 20 | 10 ± 3 | | | |
UHAHSDB | Pathologic | 30 | 44–90 | 10/20 | | | 40 | 16 ± 9 | | | |
DLUTHSDB | Normal | 174 | 25 ± 3 | 2/172 | Multi-position at chest | Rest or after exercise | 338 | 209 ± 78 | PCG, ECG, PPG and RESP | 800 Hz–22 050 Hz | MLT201 or piezoelectric sensor | Unknown
DLUTHSDB | CAD | 335 | 60 ± 12 | 227/108 | Mitral | Rest | 335 | 17 ± 12 | One PCG | 8000 Hz | 3M Littmann | 1–1000 Hz
SUAHSDB | Normal | 79 | 56 ± 16 | 69/43 | Apex | Rest | 81 | 33 ± 5 | One PCG | 8000 Hz | JABES electronic | 20–1000 Hz
SUAHSDB | Pathologic | 33 | | | | | 33 | | | | |
SSHHSDB | Normal | 12 | Unknown | Unknown | 2nd intercostal | Rest | 12 | 36 ± 12 | One PCG | 8000 Hz | Unknown | Unknown
SSHHSDB | Pathological | 23 | | | | | 23 | | | | |
SUFHSDB | Fetal | 116 | - | - | Maternal abdomen | Rest | 119 | 90 | One PCG | 8000 Hz, 44 100 Hz | JABES electronic | 20–1000 Hz
SUFHSDB | Maternal | 109 | 29 ± 6 | 109/0 | Unknown | Rest | 92 | 90 | One PCG | | |
Total | - | 1297 | - | - | - | - | 2435 | - | - | - | - | -
Note: MIT: Massachusetts Institute of Technology, AAD: Aalborg University, AUTH: Aristotle University of Thessaloniki, TUT: K N Toosi University of Technology, UHA: University of Haute Alsace, DLUT: Dalian University of Technology,
SU: Shiraz University, SSH: Skejby Sygehus Hospital, MVP: mitral valve prolapse, Benign: innocent or benign murmurs, AD: aortic disease, MPC: miscellaneous pathological conditions, CAD: coronary artery disease, MR: mitral regurgitation,
AS: aortic stenosis, PCG: phonocardiogram, ECG: electrocardiogram, PPG: photoplethysmogram, RESP: respiratory.
The Aalborg University heart sounds database (AADHSDB) was contributed by Schmidt et al
(2010a, 2010b, 2015). Heart sound recordings were made from the 4th intercostal space at the
left sternal border on the chest of subjects using a Littmann E4000 electronic stethoscope (3M,
Maplewood, Minnesota). The frequency response of the stethoscope was 20–1000 Hz. The
sample rate was 4000 Hz with 16 bit quantization. A total of 151 subjects were recorded, all of them
patients referred for coronary angiography at the Cardiology Department at Aalborg
Hospital, Denmark. The aim of the study was diagnosis of coronary artery disease (CAD)
from heart sounds; however, in the current database normal and abnormal are defined based on whether
the patient has a heart valve defect, either identified in the patient record or identified by a clear
systolic or diastolic murmur. A total of 30 subjects had a heart valve defect and were defined
as abnormal. Patients were asked to breathe normally during the heart sound acquisition and
between one and six PCG recordings were collected from each subject, resulting in a total of
695 recordings. Most of the recordings have a fixed time length of 8 s while a few recordings
have a time length less than 8 s.
The Aristotle University of Thessaloniki heart sounds database (AUTHHSDB) was contrib-
uted by Papadaniil and Hadjileontiadis (2014). Heart sounds were recorded in the first Cardiac
Clinic of Papanikolaou General Hospital in Thessaloniki, Greece, using AUDIOSCOPE, a
custom-made electronic stethoscope that records signals amplified and unfiltered. The sample
rate was 4000 Hz with 16 bit quantization. Forty-five subjects were enrolled within an age
range of 18–90 years; in particular, 11 normal subjects, 17 patients with aortic stenosis (AS)
and 17 patients with mitral regurgitation (MR). The diagnosis and the severity of the heart
valve diseases were determined by the doctors, based on the echocardiogram of the patient.
The recordings were made at the auscultation position of the chest where the murmur
is best heard for each valve dysfunction, while the normal heart sounds were recorded from
the apex. Each subject gave one PCG recording (total 45 recordings) and the recordings had
varied time length from 10 s to 122 s (mean ± SD: 50 ± 26 s).
The K N Toosi University of Technology heart sounds database (TUTHSDB) was contrib-
uted by Naseri and Homaeinezhad (2013) and Naseri et al (2013). It includes a total of 28
healthy volunteers and 16 patients with different types of valve diseases. The actual diagnoses
were determined by echocardiography prior to recording of PCG signals. PCG signals were
recorded by using an electronic stethoscope (3M Littmann 3200) at four different locations
(not simultaneously): pulmonic, aortic, tricuspid and apex at a sampling rate of 4000 Hz with
16 bit amplitude resolution for exactly 15 s each. Two subjects only had 3 PCG recordings,
resulting in a total of 174 PCG recordings.
The University of Haute Alsace heart sounds database (UHAHSDB) was contributed by
Moukadem et al (2011, 2013). Heart sound signals were recorded using prototype stetho-
scopes produced by Infral Corporation (Strasbourg, France). The sample rate was 8000 Hz
with 16 bit quantization. The dataset contains a total of 79 PCG recordings, including 39 normal
sounds and 40 pathological cardiac sounds. The normal sound recordings were separated into
two sub-files: ‘NHC’ (19 recordings) and ‘MARS500’ (20 recordings). ‘NHC’ recordings
were collected from 19 normal subjects, aged from 18 to 40 years. The recording length
varied from 7 s to 29 s (mean ± SD: 14 ± 5 s). ‘MARS500’ recordings were collected from
The Dalian University of Technology heart sounds database (DLUTHSDB) was contributed
by Tang et al (2010a, 2010b, 2012) and Li et al (2011). Subjects included 174 healthy vol-
unteers (2 female and 172 male, aged from 4 to 35 years, mean ± SD: 25 ± 3 years) and
335 CAD patients (227 female and 108 male, aged from 10 to 88 years, mean ± SD: 60 ±
12 years). Heart sounds from the CAD patients were recorded in the Second Hospital of
Dalian Medical University using an electronic stethoscope (3M Littmann). CAD patients were
confirmed based on the cardiologist’s diagnosis. Only PCG signals were available and all of
them were collected from the mitral position at the chest. Data were saved in the .wav format
using a sampling rate of 8000 Hz with 16 bit quantization. Each patient provided one PCG
recording and there were a total of 335 recordings. The recording length varied from about
3 s to 98 s (mean ± SD: 17 ± 12 s). Heart sound signals from the healthy volunteers were
recorded using a microphone sensor (MLT201, ADinstrument, Australia) or a piezoelectric
sensor (Xinhangxingye Technology Co. Ltd, China) at the Biomedical Engineering Lab in
DLUT, China. Each subject contributed one or several recordings and a total of 338 record-
ings were collected. Recordings included either a single channel (PCG) or several channels
(PCG combined with ECG, photoplethysmogram or respiratory signals). ECG signals were
the standard lead-II ECG. Photoplethysmogram signals were recorded from the carotid artery
or finger. Respiratory signals were collected using an MLT1132 belt transducer (ADinstrument,
Australia) to record chest movement. The recording lengths varied from about 27.5 s to 312.5 s
(mean ± SD: 209 ± 78 s). Various sampling rates were used (800 Hz, 1000 Hz, 2000 Hz,
3000 Hz, 4000 Hz, 8000 Hz or 22 050 Hz) depending on different research aims. All 338
recordings from the healthy volunteers could be separated into two sub-types: recordings dur-
ing rest (218 recordings) where the subjects were in peaceful calm states, and recordings dur-
ing non-resting states (120 recordings). Non-resting recordings were collected immediately
after step climbing (116 recordings), during cycles of breath holding (3 recordings), and after
bike cycling (1 recording).
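Because the healthy-volunteer recordings use several sampling rates, an analysis pipeline would typically bring them to one common rate first. The sketch below only computes the rational up/down factors that a polyphase resampler (for example `scipy.signal.resample_poly(x, up, down)`) would need; the 2000 Hz target is an assumption chosen for illustration, not a rate prescribed by the database.

```python
from fractions import Fraction

# Sampling rates (Hz) listed above for the healthy-volunteer recordings.
SOURCE_RATES_HZ = [800, 1000, 2000, 3000, 4000, 8000, 22050]


def resample_factors(source_hz, target_hz):
    """Return (up, down) such that source_hz * up / down == target_hz."""
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator


def resampled_length(n_samples, source_hz, target_hz):
    """Number of output samples after rational resampling, rounded up."""
    up, down = resample_factors(source_hz, target_hz)
    return -(-n_samples * up // down)  # integer ceiling division


if __name__ == "__main__":
    target_hz = 2000  # assumed common rate, for illustration only
    for rate in SOURCE_RATES_HZ:
        print(rate, resample_factors(rate, target_hz))
```

Note that recordings sampled below the target (800 Hz and 1000 Hz here) would be upsampled, which adds no information above their original Nyquist frequency.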
The Shiraz University adult heart sounds database (SUAHSDB) was contributed by
Samieinasab and Sameni (2015). This database was constructed using recordings made from
79 healthy subjects and 33 patients (total 69 female and 43 male, aged from 16 to 88 years,
mean ± SD: 56 ± 16 years). The JABES digital electronic stethoscope (GS Technology Co.
Ltd, South Korea) was used, placed on the chest, commonly above the apex region of the
heart. The Audacity cross-platform audio software was used for recording and editing the
signals on a PC. The subjects were asked to relax and breathe normally during the recording
session. The database consists of 114 recordings (each subject/patient had one heart sound
signal but one healthy subject had three), resulting in 81 normal recordings and 33 pathological
recordings. The recording length varied from approximately 30 s to 60 s (mean ± SD:
33 ± 5 s). The sampling rate was 8000 Hz with 16 bit quantization except for three recordings
at 44 100 Hz and one at 384 000 Hz. The data were recorded in wideband mode of the digital
stethoscope, with a frequency response of 20 Hz–1 kHz.
The Skejby Sygehus Hospital heart sounds database (SSHHSDB) was assembled from patients
referred to Skejby Sygehus Hospital, Denmark. It comprises 35 recordings from 12 normal
subjects and 23 pathological patients with heart valve defects. All recordings were obtained from
the second intercostal space just to the right of the sternum. The recording length varied from approximately
15 s to 69 s (mean ± SD: 36 ± 12 s) and the sampling rate was 8000 Hz.
The Shiraz University fetal heart sounds database (SUFHSDB) was also contributed by
Samieinasab and Sameni (2015). This database was constructed using recordings made from
109 pregnant women (mothers aged from 16 to 47 years, mean ± SD: 29 ± 6 years with
BMI from 19.5 to 38.9, mean ± SD: 29.2 ± 4.0). The JABES digital electronic stethoscope
(GS Technology Co. Ltd, South Korea) was used, and placed on the lower maternal abdo-
men as described in Samieinasab and Sameni (2015). In the case of twins (seven cases) the
data were collected twice according to the locations advised by the expert gynecologist. The
Audacity cross-platform audio software was used for recording and editing the signals on a
PC. In total, 99 subjects had one signal recorded, three subjects had two and seven cases of
twins were recorded individually, resulting in 119 total recordings. The average duration of
each record was about 90 s. The sampling rate was generally 8000 Hz with 16 bit quantiza-
tion and a few recordings were sampled at 44 100 Hz. The data were recorded in wideband
mode of the digital stethoscope, with a frequency response of 20 Hz–1 kHz. In most cases (91
subjects), the heart sounds of the mothers were also recorded before each fetal PCG recording
session. As a result, a total of 92 maternal heart sound recordings (90 subjects had one
recording and one had two) are also available in the dataset.
Note that since the PhysioNet/CinC Challenge 2016 was focused on adult heart sounds,
this SUFHSDB dataset was excluded from the Challenge, but it has been included in the
online database. The inclusion of this dataset in the open-access database was provided to
enable researchers to test single channel fetal, maternal, and environmental noise separation
algorithms, although it is not part of the Challenge described in this article.
The segmentation of the FHSs is a first step in the automatic analysis of heart sounds. The
accurate localization of the FHSs is a prerequisite for the identification of the systolic or
diastolic regions, allowing the subsequent classification of pathological situations in these
regions (Liang et al 1997b, Springer et al 2014, Springer 2016). S1 is initiated by the closure
of the atrioventricular valves at the beginning of the systole and occurs immediately after the
R-peak (ventricular depolarization) of the ECG. S2 is initiated by the closure of the semilunar
valves at the beginning of the diastole and occurs approximately at the end-T-wave of the ECG
(the end of ventricular repolarization). The time order of these features in ECG and PCG is
shown in figure 4 (Springer 2016). In clinical practice, the criteria adopted by the cardiologist
to annotate the beginning and the ending of S1 and S2 sounds were defined as follows: the
beginning of S1 is the start of the high frequency vibration due to mitral closure, the beginning
of S2 is the start of the high frequency vibration due to aortic closure, and the endings of
S1 and S2 are annotated by the end of the high frequency vibrations (Moukadem et al 2013).
Figure 4. Example of an ECG-labelled PCG, with the ECG, PCG and four states of
the heart cycle (S1, systole, S2 and diastole) shown. The R-peak and end-T-wave are
labelled as references for defining the approximate positions of S1 and S2 respectively.
Mid-systolic clicks, typical of mitral valve prolapse, can be seen in the systole states.
Adapted from Springer (2016).
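Once the start and end of S1 and S2 have been annotated for each beat, the four states of the heart cycle follow directly. The sketch below is one hypothetical way to turn such annotations into a per-sample label sequence; the integer coding (1 = S1, 2 = systole, 3 = S2, 4 = diastole) and the function name are assumptions made for illustration.

```python
def label_states(n_samples, sample_rate_hz, cycles):
    """Label each sample: 1 = S1, 2 = systole, 3 = S2, 4 = diastole.

    `cycles` is a time-ordered list of (s1_start, s1_end, s2_start, s2_end)
    annotations in seconds. Samples before the first S1 stay 0 (unlabelled).
    """
    labels = [0] * n_samples

    def fill(start_s, end_s, state):
        first = max(0, round(start_s * sample_rate_hz))
        last = min(n_samples, round(end_s * sample_rate_hz))
        for index in range(first, last):
            labels[index] = state

    for position, (s1_start, s1_end, s2_start, s2_end) in enumerate(cycles):
        fill(s1_start, s1_end, 1)
        fill(s1_end, s2_start, 2)
        fill(s2_start, s2_end, 3)
        # Diastole lasts until the next S1, or the end of the recording.
        if position + 1 < len(cycles):
            next_s1 = cycles[position + 1][0]
        else:
            next_s1 = n_samples / sample_rate_hz
        fill(s2_end, next_s1, 4)
    return labels
```

A label sequence of this kind is a convenient reference when scoring a segmentation algorithm sample by sample or state by state.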
Many methods of heart sound segmentation have been studied over the past few decades.
The typical methods can be classified into four types: envelope-based methods, which use
a variety of techniques to construct the envelope of the heart sound and then segment it;
feature-based methods, which calculate features of the heart sounds to segment the signal;
machine learning methods; and hidden Markov model (HMM) methods, the current
state of the art. We give a brief summary of these four types of heart sound segmentation
methods below. The size of the database of subjects and recordings used in each study, as well as the
numerical results, is also presented (see table 2).
Table 2.
Author | Database type | Subjects | Recordings | Recording length | Cycles | Sample rate (Hz) | Sensitivity (%) | Positive predictivity (%) | Accuracy (%)
Envelope-based method
Liang et al (1997a) | Normal and | - | 37 | Each 7–12 s | 515 | 11 025 | - | - | 93
Liang et al (1997b) | Normal and | - | 77 | Each 6–13 s | 1165 | 11 025 | - | - | 93
Moukadem et al (2013) | Normal | - | 80 | Each 6–12 s | - | 8000 | 96 | 95 | -
Moukadem et al (2013) | Pathological | | | | | | 97 | 95 | -
Sun et al (2014) | Normal | 45 | - | Total 600 s | - | 44 100 | - | - | 96.69
Sun et al (2014) | Pathological | 76 | | Total 7730 s | | | | |
Sun et al (2014) | MHSDB | - | 23 | Total 1497 s | | | | |
Choi and Jiang (2008) | Normal | - | - | - | 500 | - | - | - | 100
Choi and Jiang (2008) | Pathological | | | | | | - | - | 88.2
Yan et al (2010) | Normal and | - | 9 | Each <5 s | - | - | - | - | 99.0
Ari et al (2008) | Normal and | 71 | 71 | - | 357 | Varied | - | - | 97.47
Feature-based method
Naseri and Homaeinezhad (2013) | Pathological | - | - | Total 42 min | - | 4000 | 99.00 | 98.60 | -
Kumar et al (2006) | Pathological | 55 | 55 | Each <120 s | 7530 | 44 100 | 97.95 | 98.20 | -
Varghees and Ramachandran (2014) | Normal and pathological | - | 64 | Each <10 s | 701 | Varied | 99.43 | 93.56 | -
Pedrosa et al (2014) | Pathological adults and | 72 | 72 | Each 60 s | - | - | 89.2 | 98.6 | -
Vepa et al (2008) | Normal and | - | - | - | 166 | - | - | - | 84.0
Papadaniil and Hadjileontiadis (2014) | Normal and pathological | 43 | 43 | - | 2602 | 44 100 | - | - | 83.05
Gharehbaghi et al (2011) | Normal and pathological | 120 | 120 | Each 10 s | 1976 | 44 100 | - | - | S1: 97, S2: 94
Machine learning method
Oskiper and Watrous (2002) | Normal | 30 | - | Each 20 s | - | - | - | - | S1: 96.2
Sepehri et al (2010) | Normal and | 60 | 120 | Total 1200 s | - | - | - | - | 93.6
Chen et al (2009) | Normal | - | 27 | Each 30 s | 997 | - | 92.1 | 88.4 | -
Gupta et al (2007) | Normal and | - | 41 | - | 340 | 8000 | - | - | 90.29
Tang et al (2012) | Normal | 3 | 3 | - | 565 | 2000 | - | - | S1: 94.9, S2: 95.9
Tang et al (2012) | Pathological | 23 | 23 | | | | | |
Rajan et al (2006) | Normal and | 42 | 42 | Each 13 s | - | - | - | - | 90.5
HMM methods
Gamero and Watrous, | Normal | 80 | 80 | Each 20 s | - | 11 000 | 95 | 97 | -
Ricke et al (2005) | - | 9 | 9 | - | - | 997 | - | - | 98
Gill et al (2005) | Normal | 17 | 44 | Each 30–60 s | - | 4000 | S1: 98.6, S2: 98.3 | S1: 96.9, S2: 96.5 | -
The Shannon energy envelope is the most commonly used envelope for PCG envelope extraction. Liang et al
proposed a normalized average Shannon energy envelope (Liang et al 1997a), which emphasizes
the medium-intensity sounds while attenuating the low-intensity components. The
performance of this method was evaluated using 515 PCG cycles from 37 recordings acquired
from children with murmurs and achieved 93% accuracy for PCG segmentation. Another
study from Liang et al employed wavelet decomposition before estimation of the Shannon
envelope and segmented heart sound into four parts: S1, systole, S2 and diastole (Liang et al
1997b). This method was evaluated using 1165 cardiac cycles and resulted in an improved
accuracy from 84% (without wavelet decomposition) to 93% (with wavelet decomposition)
on a set of 77 noisy recordings including both normal and abnormal heart sounds. Moukadem
et al proposed a method to calculate the Shannon energy envelope of the local spectrum
calculated by the S-transform for each sample of heart sound signal. This method was evaluated
on 40 normal and 40 pathological heart sound recordings. The sensitivity and positive
predictivity were both at least 95% for normal and pathological heart sound segmentation
(Moukadem et al 2013).
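The normalized average Shannon energy mentioned above is commonly written as E = -(1/N) Σ x² log(x²) over a short window of the amplitude-normalized signal, followed by subtracting the mean and dividing by the standard deviation across windows. The sketch below follows that common formulation; the window and hop lengths are left as parameters because they are not stated here, and the code is an illustration rather than the implementation of Liang et al.

```python
import math


def shannon_energy_envelope(signal, window, hop):
    """Normalized average Shannon energy, one value per analysis window."""
    peak = max(abs(value) for value in signal)
    if peak == 0 or len(signal) < window:
        raise ValueError("signal must be non-zero and at least one window long")
    normalized = [value / peak for value in signal]

    energies = []
    for start in range(0, len(normalized) - window + 1, hop):
        total = 0.0
        for value in normalized[start:start + window]:
            squared = value * value
            if squared > 0.0:  # x^2 * log(x^2) tends to 0 as x tends to 0
                total -= squared * math.log(squared)
        energies.append(total / window)

    # Standardize across windows (zero mean, unit standard deviation).
    mean = sum(energies) / len(energies)
    spread = math.sqrt(sum((e - mean) ** 2 for e in energies) / len(energies))
    if spread == 0:
        return [0.0] * len(energies)
    return [(e - mean) / spread for e in energies]
```

Because x² log(x²) vanishes both at x = 0 and at |x| = 1, a burst at half the peak amplitude yields a larger envelope value than a burst at full amplitude, which is the medium-intensity emphasis described in the text.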
Envelope extraction based on Hilbert transform can be divided into two aspects: (1) the
envelope is the decimated signal of the real part of a complex analytic signal, and (2) the
instantaneous frequency is the derivative of the imaginary part of complex analytic signal. Sun
et al proposed an automatic segmentation method based on Hilbert transform (Sun et al 2014).
This method considered the characteristics of envelopes near the peaks of S1, the peaks of S2,
the transmission points T12 from S1 to S2, and the transmission points T21 from S2 to S1.
It was validated using 7730 s of heart sounds from pathological patients, 600 s from normal
subjects, and 1496.8 s from the Michigan MHSDB database. For the sounds where S1 cannot be
separated from S2, an average accuracy of 96.69% was achieved. For the sounds where S1 can
be separated from S2, an average accuracy of 97.37% was achieved.
Jiang and Choi proposed an envelope extraction method named cardiac sound character-
istic waveform (CSCW) (Jiang and Choi 2006). However, they only reported the example
figures without reporting any quantitative results. In their following study, they compared this
CSCW method with two other popular envelope-based methods: Shannon energy and Hilbert
transform envelopes, and found the CSCW method to be superior to both of these, conclud-
ing that their method led to more accurate segmentation results: 100% and 88.2% on normal
and pathological patients respectively, as compared to 78.2% and 89.4% for the Shannon
energy envelope and 51.4% and 47.3% for the Hilbert transform envelope (Choi and Jiang
2008). However, these results were only evaluated on 500 selected cardiac cycles without a
split between their training and test sets. Yan et al also used a similar characteristic moment
waveform envelope method for segmenting heart sound (Yan et al 2010). This method was
only evaluated on a small dataset of 9 recordings and reported an accuracy of 99.0%, again
without a train-test split.
A simple squared-energy envelope was proposed by Ari et al (2008). It is primarily based
on the use of frequency content present in the signal, calculation of energy in time windows
and timing relations of signal components. It was shown to have a better performance than
Shannon energy envelope when employing a threshold-based detection method. Testing on a
total of 357 cycles from 71 recordings showed a segmentation accuracy of 97.47% (without
a train-test split).
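To make the idea of a windowed energy envelope with threshold-based detection concrete, here is a minimal sketch. It covers only the energy-in-time-windows and thresholding steps; the frequency-content and timing-relation rules of Ari et al are not reproduced, and the threshold ratio of 0.25 is an arbitrary assumption.

```python
def windowed_energy(signal, window):
    """Sum of squared amplitudes in consecutive non-overlapping windows."""
    return [
        sum(value * value for value in signal[start:start + window])
        for start in range(0, len(signal) - window + 1, window)
    ]


def detect_bursts(energies, threshold_ratio=0.25):
    """Return (first_window, last_window) index pairs for runs of windows
    whose energy exceeds threshold_ratio times the maximum window energy."""
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    bursts = []
    start = None
    for index, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = index
        elif energy <= threshold and start is not None:
            bursts.append((start, index - 1))
            start = None
    if start is not None:
        bursts.append((start, len(energies) - 1))
    return bursts
```

A fixed threshold of this kind is sensitive to loud murmurs and noise, which is one reason the published methods add further rules before labelling a burst as S1 or S2.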
Naseri and Homaeinezhad used frequency- and amplitude-based features, and then employed a
synthetic decision making algorithm for heart sound segmentation (Naseri and Homaeinezhad
2013). The proposed method was applied to 52 PCG signals gathered from patients with dif-
ferent valve diseases and achieved an average sensitivity of 99.00% and positive predictivity
of 98.60%. Kumar et al proposed a detection method based on a high frequency feature,
which is extracted from the heart sound using the fast wavelet decomposition (Kumar et al
2006). This feature is physiologically motivated by the accentuated pressure differences found
across heart valves, both in native and prosthetic valves. The method was validated on patients
with mechanical and bioprosthetic heart valve implants in different locations, as well as with
patients with native valves, and achieved an averaged sensitivity of 97.95% and positive pre-
dictivity of 98.20%.
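Sensitivity and positive predictivity are conventionally computed as TP/(TP + FN) and TP/(TP + FP) respectively. The sketch below shows these two ratios together with one possible way of counting true positives, namely matching each detection to the nearest unmatched reference annotation within a tolerance. The matching rule and the tolerance value are assumptions; the studies summarized here do not all use the same criterion.

```python
def match_detections(detected_s, reference_s, tolerance_s):
    """Greedy one-to-one matching of detections to reference times (seconds).

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = sorted(reference_s)
    true_positives = 0
    for time_s in sorted(detected_s):
        candidates = [r for r in unmatched if abs(r - time_s) <= tolerance_s]
        if candidates:
            closest = min(candidates, key=lambda r: abs(r - time_s))
            unmatched.remove(closest)
            true_positives += 1
    false_positives = len(detected_s) - true_positives
    false_negatives = len(unmatched)
    return true_positives, false_positives, false_negatives


def sensitivity(true_positives, false_negatives):
    """Fraction of reference sounds that were detected."""
    return true_positives / (true_positives + false_negatives)


def positive_predictivity(true_positives, false_positives):
    """Fraction of detections that correspond to a reference sound."""
    return true_positives / (true_positives + false_positives)
```

Reported figures are only comparable across studies when the matching tolerance and the test data are also comparable, which is part of the motivation for a shared database.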
Varghees and Ramachandran used an instantaneous phase feature from the analytical sig-
nal after calculating the Shannon entropy (Varghees and Ramachandran 2014). This method
is a fairly straightforward approach that does not use any search-back steps. It was tested
using both clean and noisy PCG signals with both normal and pathological heart sounds (701
cycles), and achieved an average sensitivity of 99.43% and positive predictivity of 93.56%
without a train-test split. Pedrosa et al used periodic component features from the analysis sig-
nal of the autocorrelation function to segment heart sound signal (Pedrosa et al 2014). Their
method was tested on 72 recordings and had sensitivity and positive predictivity of 89.2% and
98.6% respectively.
Unlike using the absolute amplitude or frequency characteristics of heart sounds, Nigam
and Priemer used complexity-based features by utilizing the underlying complexity of the
dynamical heart sound for PCG segmentation and this method showed good performance on
the synthetic data (Nigam and Priemer 2005). However, this study did not provide any quantitative
results for evaluation. Vepa et al also used complexity-based features for heart sound
segmentation, which combined energy-based and simplicity-based features computed from
multi-level wavelet decomposition coefficients (Vepa et al 2008). The method was evaluated
on only 166 cycles and achieved an accuracy of 84.0%.
Papadaniil and Hadjileontiadis employed kurtosis-based features alongside ensemble
empirical mode decomposition to select non-Gaussian intrinsic mode functions (IMFs), and
then detected the start and end positions of heart sounds within the selected IMFs (Papadaniil
and Hadjileontiadis 2014). The method was tested on 11 normal subjects and 32 pathological
patients, and achieved an accuracy of 83.05%. In addition, an ECG-referred pediatric heart
sound segmentation method was proposed in Gharehbaghi et al (2011). This algorithm was
applied to 120 recordings from normal and pathological children, containing a total of 1976 cardiac
cycles, and achieved an accuracy of 97% for S1 and 94% for S2.
Neural network technology is widely used as a typical machine learning method for heart
sound segmentation. Oskiper and Watrous proposed a time-delay neural network method for
detecting the S1 sound (Oskiper and Watrous 2002). The method consists of a single hidden
layer network, with time-delay links connecting the hidden units to the time-frequency energy
coefficients of Morlet wavelet decomposition. The results tested on 30 healthy subjects (with-
out a train-test split) showed an accuracy of 96.2%. Sepehri et al used a multi-layer perceptron
neural network classifier for heart sound segmentation, which paid special attention to the
physiological effects of respiration on pediatric heart sounds (Sepehri et al 2010). A total of
823 cycles from 40 recordings of normal children and 80 recordings of children with con-
genital heart diseases were tested and an accuracy of 93.6% was achieved when splitting the
recordings equally between training and test datasets.
K-means clustering is another widely used method. Chen et al used a K-means clustering
and a threshold method to identify the heart sounds, achieving 92.1% sensitivity and 88.4%
positive predictivity tested on 27 recordings from healthy subjects (Chen et al 2009). Gupta
et al also used K-means clustering combined with homomorphic filtering to segment heart
sounds into single cardiac cycles (S1-systole-S2-diastole) (Gupta et al 2007). This method
was tested on 340 cycles and achieved an accuracy of 90.29%. Tang et al employed dynamic
clustering for segmenting heart sounds (Tang et al 2012). In this method, the heart sound
signal was first separated into cardiac cycles based on the instantaneous cycle frequency and
then was decomposed into time-frequency atoms, and finally the atoms of heart sounds were
clustered in time-frequency plane allowing the classification of S1 and S2. The results tested
on 25 subjects showed an accuracy of 94.9% for S1 and 95.9% for S2.
Rajan et al developed an unsupervised segmentation method by first using Morlet wavelet
decomposition to obtain a time-scale representation of the heart sounds and then using an
energy profile of the time-scale representation and a singular value decomposition technique
to identify heart sound segments (Rajan et al 2006). This method was tested on a dataset of
42 adult patients and achieved an accuracy of 90.5%.
used the average duration of these sounds and autocorrelation analysis of systolic and diastolic
durations to derive Gaussian distributions for the expected duration of each of the four states,
i.e. S1, systole, S2 and diastole. The employed features were the homomorphic envelope and
three frequency band features (25–50, 50–100 and 100–150 Hz). These features, along with
the hand-labelled positions of the states, were used to derive Gaussian distribution-based
emission probabilities for the HMM. The duration distributions were then incorporated into
the forward and backward paths of the Viterbi algorithm. The results on the separate test set
were 98.8% sensitivity and 98.6% positive predictivity.
Based on Schmidt et al’s work (Schmidt et al 2010a), Springer et al used the HSMM
method and extended it with the use of logistic regression for emission probability estimation,
to address the problem of accurate segmentation of noisy, real-world heart sound recordings
(Springer et al 2016). Meanwhile, a modified Viterbi algorithm for decoding the most-likely
sequence of states was also implemented. It was evaluated on a large dataset of 10 172 s of
heart sounds recorded from 112 patients and achieved an average F1 score of 95.63% on a
separate test dataset, significantly improving upon the highest score of 86.28% achieved by
the other reported methods in the literature when evaluated on the same test data. Therefore,
this method is regarded as the state-of-the-art method in heart sound segmentation studies.
The automated classification of pathology in heart sounds has been described in the literature
for over 50 years, but accurate diagnosis remains a significant challenge. Gerbarg et al (1963)
were the first to publish on the automatic classification of pathology in heart sounds (specifically
to aid the identification of children with rheumatic heart disease), using a threshold-based
method. The typical methods for heart sound classification can be grouped into four
categories: (1) artificial neural network-based classification; (2) support vector machine-based
classification; (3) HMM-based classification and (4) clustering-based classification. The current
prominent works in this field are summarized in table 3. The important notes about the
evaluation of the method, such as whether the data was split into training and test sets, are also
reported. For relative brevity, only the notable studies with sizeable datasets are summarized
in detail below.
The artificial neural network (ANN) is the most widely used machine learning-based approach
for heart sound classification. Unless auto-associative in nature, ANN classifiers require dis-
criminative signal features as inputs. Relatively little work has been performed on optimizing
network architectures in this context. Typical signal features include: wavelet features, time,
frequency and complexity-based features and time-frequency features.
Wavelet-based features are most widely employed in ANN approaches to classification of
heart sounds. Akay et al combined wavelet features with an ANN for the automatic detection
of CAD patients (Akay et al 1994). They computed four features (mean, variance, skewness
and kurtosis) of the extracted coefficients of wavelet transform from the diastolic period of
heart cycles. These features, alongside physical characteristics (sex, age, weight, blood pressure),
were fed into a fuzzy neural network, and a sensitivity of 85% and a specificity of 89%
on a separate test set of 82 recordings were reported. Liang and Hartimo (1998) employed
wavelet packet decomposition with the aim of differentiating between pathological and innocent
murmurs in children when using ANN classification. Eight nodes of the wavelet packet
tree were selected automatically using an information-based cost function. The cost function
values then served as the feature vector. With a 65/20 patient train/test split they achieved
80% sensitivity and 90% specificity on the test data. Uguz (2012a) employed an ANN with the
features from a discrete wavelet transform and a fuzzy logic approach to perform three-class
classification: normal, pulmonary stenosis, and mitral stenosis. With a 50/50 train/test split
of a dataset of 120 subjects, they reported 100% sensitivity, 95.24% specificity, and 98.33%
average accuracy for the three classes.
Table 3. Summary of the previous heart sound classification works.
| Author | Database | Recording length | Method | Features | Se (%) | Sp (%) | Acc (%) | Notes on database |
|---|---|---|---|---|---|---|---|---|
| Akay et al (1994) | 42 normal and 72 CAD patients | Each 10 cycles | ANN | Wavelet | 85 | 89 | 86 | 30 training, 82 test |
| Liang and Hartimo (1998) | 40 normal and 45 pathological | Each 7–12 s | ANN | Wavelet | 80 | 90 | 85 | 65 training, 20 test |
| Uguz (2012a) | 40 normal, 40 pulmonary and 40 mitral stenosis | - | ANN | Wavelet | 100 | 95.24 | 98.33 | 50–50 train-test split |
| Bhatikar et al (2005) | 88 innocent murmurs and 153 pathological | Each 10–15 s | ANN | Frequency | 83 | 90 | - | 188 training, 53 test |
| Sepehri et al (2008) | 36 normal and 54 pathological | Each 10 s | ANN | Frequency | 95 | 93.33 | - | 40 training, 50 test |
| Ahlstrom et al (2006) | 7 normal, 23 AS and 6 MR | Each 12 cycles | ANN | Complexity | - | - | 86 | Cross-validation |
| De Vos and Blanckenberg (2007) | 113 normal and 50 pathological | Each 6 cycles | ANN | Time-frequency | 90 | 96.5 | - | Cross-validation |
| Uguz (2012b) | 40 normal, 40 pulmonary and 40 mitral stenosis | - | ANN | Time-frequency | 90.48 | 97.44 | 95 | 50–50 train-test split |
| Ari et al (2010) | 64 patients (normal and pathological) | Each 8 cycles | SVM | Wavelet | - | - | 86.72 | 50–50 train-test split |
| Zheng et al (2015) | 40 normal and 67 pathological | - | SVM | Wavelet | 93.48 | 98.55 | 97.17 | Cross-validation |
| Patidar et al (2015) | Total 4628 heart cycles, 626 normal and 4002 pathological | - | SVM | Wavelet | 98.8 | 99.3 | 98.9 | 80% training, 20% test |
| Maglogiannis et al (2009) | 38 normal and 160 heart valve disease patients | - | SVM | Frequency | 87.5 | 94.74 | 91.43 | Cross-validation |
| Gharehbaghi et al (2015) | 30 normal, 26 innocent and 30 AS | Each 10 s | SVM | Frequency | 86.4 | 89.3 | - | 50–50 train-test split |
| Wang et al (2007) | 20 normal and 21 murmurs patients | - | HMM | Signal amplitude, STFT and MFCC | ⩾95.2 | ⩾95.3 | - | No separate training and test |
| Chauhan et al (2008) | 20 normal and | - | HMM | Signal amplitude, | - | - | 99.21 | No separate training |
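The four statistical features described above for wavelet coefficients (mean, variance, skewness and kurtosis) can be sketched as follows. This is a minimal illustration that assumes population (biased) central moments and non-excess kurtosis. The cited studies do not state which estimator they used, and the function name is hypothetical. The wavelet decomposition itself is outside the scope of the sketch, so any sequence of coefficients can be passed in.

```python
def moment_features(coeffs):
    """Return (mean, variance, skewness, kurtosis) of a coefficient sequence."""
    n = len(coeffs)
    if n < 2:
        raise ValueError("need at least two coefficients")
    mean = sum(coeffs) / n
    # Population central moments (dividing by n): an assumption of this sketch.
    m2 = sum((c - mean) ** 2 for c in coeffs) / n
    m3 = sum((c - mean) ** 3 for c in coeffs) / n
    m4 = sum((c - mean) ** 4 for c in coeffs) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant input")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis (3 for a Gaussian)
    return mean, m2, skewness, kurtosis


if __name__ == "__main__":
    # Stand-in values; in practice these would be diastolic wavelet coefficients.
    print(moment_features([1.0, 2.0, 3.0, 4.0, 5.0]))
```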
Bhatikar et al (2005) used the fast Fourier transform (FFT) to extract the energy spectrum
features in frequency domain, and then used these as inputs to an ANN. Using a separate
test set of 53 patients they reported 83% sensitivity and 90% specificity when differentiating
between innocent and pathological murmurs. Sepehri et al (2008) identified the five frequency
bands that led to the greatest difference in spectral energy between normal and pathological
recordings and used the spectral energy in these bands as the input features for the ANN.
Reported results on 50 test records were 95% sensitivity and 93.33% specificity for a binary
classification. Ahlstrom et al (2006) assessed a range of non-linear complexity-based features
that had not previously been used for murmur classification. They included up to 207 features
and finally selected 14 features to present to an ANN. They reported 86% classification accuracy
for a three-class problem: normal, AS and MR.
De Vos and Blanckenberg (2007) used time-frequency features and extracted the energy in
12 frequency bins at 10 equally-spaced time intervals over each heart cycle to present to an
ANN. They reported a sensitivity and specificity of 90% and 96.5% respectively on 163 test
patients (aged between 2 months and 16 years). Uguz (2012b) also used time-frequency features as
inputs to an ANN. Using a total of 120 heart sound recordings, split 50/50 into train/test, the study
reported 90.48% sensitivity, 97.44% specificity and 95% accuracy for a three-class classification
problem (normal, pulmonary and mitral stenosis heart valve diseases).
A number of researchers have applied a support vector machine (SVM) approach to the heart
sound classification in recent years. Since SVMs are another form of supervised machine
learning, the features chosen are rather similar to those based on ANN approaches.
Wavelet-based features are therefore widely employed in SVM-based methods. Ari et al
(2010) used a least square SVM (LSSVM) method for classification of normal and abnormal
heart sounds based on the wavelet features. The performance of the proposed method was evaluated
on 64 recordings comprising normal and pathological cases. The LSSVM was trained
and tested on a 50/50 split (32 patients in each set) and the authors reported an 86.72% accuracy
on their test dataset. Zheng et al (2015) decomposed heart sounds using wavelet packets and
then extracted the energy fraction and sample entropy as features for the SVM input. Tested on
40 normal and 67 pathological patients, they reported a 97.17% accuracy, 93.48% sensitivity
and 98.55% specificity. Patidar et al (2015) investigated the use of the tunable-Q wavelet
transform as an input to LSSVM with varying kernel functions. Testing on a dataset of 4628 cycles
from 163 heart sound recordings (from an unknown number of patients), they reported a 98.8%
sensitivity and 99.3% specificity. However, the data were not stratified by patient (i.e. the
training and test sets did not contain mutually exclusive patients), so the results are likely to reflect overfitting to their data.
Maglogiannis et al (2009) used Shannon energy and frequency features from four frequency
bands (50–250, 100–300, 150–350, 200–400 Hz) to develop an automated diagnosis system for
the identification of heart valve diseases based on an SVM classifier. Testing on 38 normal and 160
heart valve disease patients they reported an 87.5% sensitivity, 94.74% specificity and 91.43%
accuracy. Gharehbaghi et al (2015) used frequency band power over varying length frames during
systole as input features and used a growing-time SVM (GTSVM) to classify pathological and
innocent murmurs. When using a 50/50 train/test split (from a total of 30 patients with AS, 26
with innocent murmurs and 30 normals), they reported 86.4% sensitivity and 89.3% specificity.
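Shannon energy, mentioned above as a feature, is commonly defined for a normalised sample x as -x² log(x²), averaged over short frames to form an envelope. The sketch below follows that common definition. The frame length, hop size and function name are assumptions for illustration and are not the settings used by Maglogiannis et al.

```python
import math


def shannon_energy_envelope(signal, frame_len, hop):
    """Average Shannon energy per frame of a peak-normalised signal."""
    peak = max(abs(x) for x in signal)
    if peak == 0:
        raise ValueError("signal is all zeros")
    norm = [x / peak for x in signal]
    envelope = []
    for start in range(0, len(norm) - frame_len + 1, hop):
        total = 0.0
        for x in norm[start:start + frame_len]:
            sq = x * x
            if sq > 0.0:  # x^2 * log(x^2) tends to 0 as x tends to 0
                total += sq * math.log(sq)
        envelope.append(-total / frame_len)
    return envelope


if __name__ == "__main__":
    # Hypothetical samples; medium amplitudes contribute more than the peak.
    print(shannon_energy_envelope([1.0, 0.5, 0.5, 0.5, 0.5], frame_len=2, hop=2))
```

Because -x² log(x²) is zero at both x = 0 and |x| = 1, this measure emphasises medium-intensity samples relative to the squared-amplitude energy.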
HMM methods are not only widely employed for heart sound segmentation, but are also used
for pathology classification of heart sounds. In the case of classifying pathology, the posterior
probability of the heart sound signal or the extracted features given a trained HMM can be
used to differentiate between healthy and pathological recordings.
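The idea of scoring a recording under class-specific HMMs can be sketched with the scaled forward algorithm. The example below assumes discrete observation symbols and equal class priors, so that choosing the model with the highest likelihood is equivalent to choosing the highest posterior. The two toy models, their parameters and the symbol coding (0 = quiet, 1 = heart sound, 2 = murmur-like) are hypothetical and are not taken from any cited study.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        raise ValueError("empty observation sequence")
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def classify(obs, models):
    """Return the label of the model under which obs is most likely."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical two-state models: (start, transition, emission).
_START = [0.5, 0.5]
_TRANS = [[0.7, 0.3], [0.6, 0.4]]
MODELS = {
    "normal": (_START, _TRANS, [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]]),
    "abnormal": (_START, _TRANS, [[0.50, 0.05, 0.45], [0.05, 0.90, 0.05]]),
}

if __name__ == "__main__":
    print(classify([1, 2, 2, 2, 1, 2, 2, 2], MODELS))
```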
Wang et al (2007) used a combination of HMM and mel-frequency cepstral coefficients
(MFCCs) to classify heart sound signals. The feature extraction was performed using three
methods: time-domain feature, short-time Fourier transforms (STFT) and MFCCs. Testing on
20 normal and 21 abnormal patients with murmurs they reported a sensitivity of 95.2% and a
specificity of 95.3%. In a subsequent study, they also used MFCCs to extract representative fea-
tures and developed a HMM-based method for heart sound classification (Chauhan et al 2008).
The method was applied to 1381 cycles of real and simulated, normal and abnormal heart sounds
and they reported an accuracy of 99.21%. However, both studies failed to make use of a separate
test set when evaluating their classification methods and the methods are likely to be highly over-
trained. Saracoglu (2012) applied a HMM in an unconventional manner, by fitting an HMM to
the frequency spectrum extracted from entire heart cycles. The exact classification procedure of
using the HMMs is unclear, but it is thought that they trained four HMMs, and then evaluated the
posterior probability of the features given each model to classify the recordings. They optimized
the HMM parameters and PCA-based feature selection on a training set and reported 95% sen-
sitivity, 98.8% specificity and 97.5% accuracy on a test dataset of 60 recordings.
In summary, although HMM-based approaches are regarded as the state-of-the-art heart
sound segmentation method, their potential to classify heart sounds has not yet been ade-
quately demonstrated.
A number of researchers have made use of the k-nearest neighbours (kNN) algorithm
to classify pathology in heart sounds. Bentley et al (1998) showed that discrete wavelet
transform features outperformed morphological features (time and frequency features from
S1 and S2) when performing heart sound classification using such a method. They used a
binary kNN classifier and reported 100% and 87% accuracy when detecting pathology in
patients with heart valve disease and prosthetic heart valves respectively on an unspecified
sized database. Quiceno-Manrique et al (2010) used a simple kNN classifier with features
from various time-frequency representations on a subset of 16 normal and 6 pathological
patients. They reported 98% accuracy for discriminating between normal and pathologic
beats. However, the kNN classifier parameters were optimized on the test set, indicating a
likelihood of over-training. Avendano-Valencia et al (2010) also employed time-frequency
features and kNN approach for classifying normal and murmur patients. In order to extract the
most relevant time-frequency features, two specific approaches for dimensionality reduction
were presented in their method: feature extraction by linear decomposition, and tiling parti-
tion of the time-frequency plane. The experiments were carried out using 26 normal and 19
pathological recordings and they reported an average accuracy of 99.0% when using 11-fold
cross-validation with grid-based dimensionality reduction.
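A kNN classifier of the kind discussed above can be sketched in a few lines. The example assumes Euclidean distance and majority voting. The feature vectors, labels and function name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    if not 0 < k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors.
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = ["normal"] * 3 + ["abnormal"] * 3
    print(knn_predict(xs, ys, (0.5, 0.5)))
```

To avoid the over-training problem noted above, the value of k should be chosen on the training data (for example by cross-validation), not on the test set.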
Heart sound recordings (from nine independent databases) sourced from seven contributing
research groups described in section 2 (with the exception of the SUFHSDB since it was
from fetal and maternal heart sounds), were used in the Challenge, resulting in eight inde-
pendent heart sound databases. Four of the databases were divided into both training and
test sets with a 70-30 training-test split. The other four databases were exclusively assigned
to either training or test set with the consideration of balancing the data as much as possible
between categories. The Challenge training set includes data from six databases, with file
names prefixed alphabetically a through f. Training sets a through e were provided before the
official phase, and training set f was added after the official phase began. Together they contain
a total of 3153 heart sound recordings from 764 subjects/patients, lasting from 5 s to just over
120 s. The Challenge test set also includes data from six databases (b through e, plus g and i),
containing a total of 1277 heart sound recordings from 308 subjects/patients, lasting from
6 s to 104 s. The total number of recordings created for the Challenge was 4430, which differs
from the 2435 reported in table 1. This is because the 338 recordings from
normal subjects in the DLUTHSDB are generally longer than 100 s and each recording was
segmented into several relatively short recordings. All recordings were resampled to 2000 Hz
using an anti-alias filter and provided in .wav format. Each recording contains only one PCG
lead, except for training set a, which also contains a simultaneously recorded ECG (2016).
Figure 5. Example of a heart sound recording segment with good signal quality (A)
and poor signal quality (B).
In each of the databases, each recording begins with the same letter followed by a sequen-
tial, but random number. Files from the same patient are unlikely to be numerically adjacent.
The training and test sets have each been divided so that they are two sets of mutually exclu-
sive populations (i.e. no recordings from the same subject/patient were in both training and
test sets). Moreover, there are four collected databases that have been semi-randomly placed
exclusively in either the training or test sets (to ensure there are ‘novel’ recording types and
to reduce over-fitting on the recording methods). Databases a and f are found exclusively in
the training set and g and i are exclusively found in the test set. The test set is unavailable to
the public and will remain private for the purpose of scoring. (In the future, as more data are
added, we may release all the data to the public.) Participants may note the existence of a vali-
dation dataset in the data folder. This data is a copy of 301 recordings from the training set,
and is used to validate uploaded entries before their evaluation on the test set.
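The requirement that training and test sets contain mutually exclusive populations can be illustrated with a patient-wise split. The sketch below is not the procedure used to build the Challenge sets. It assumes each recording is tagged with a patient identifier and assigns whole patients, never individual recordings, to one side of the split. All names and the 30% test fraction are illustrative.

```python
import random


def split_by_patient(records, test_fraction=0.3, seed=0):
    """Split (recording_id, patient_id) pairs so that no patient is shared.

    Returns (train, test) lists of the original pairs.
    """
    patients = sorted({pid for _, pid in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[1] not in test_patients]
    test = [r for r in records if r[1] in test_patients]
    return train, test


if __name__ == "__main__":
    # Hypothetical data: 25 recordings from 10 patients.
    recs = [(f"rec{i}", f"p{i % 10}") for i in range(25)]
    train, test = split_by_patient(recs)
    print(len(train), len(test))
```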
In both training and test sets, heart sound recordings were divided into two types: normal
and abnormal recordings. The normal recordings were from healthy subjects and the abnormal
ones were from patients with a confirmed cardiac diagnosis. The patients were noted to suffer
from a variety of illnesses (which are not provided here on a case-by-case basis but are detailed
in an online appendix to this article for the training
set data), but typically they have heart valve defects or CAD. Heart valve defects
include MVP, MR, aortic regurgitation, AS and valvular surgery. All the recordings from the
patients were generally labeled as abnormal. We do not provide a more specific classification
for these abnormal recordings. Please note that both training and test sets are unbalanced, i.e.
the number of normal recordings does not equal that of abnormal ones. Challengers will have
to consider this when they train and test their algorithms.
Figure 6. (A) An example of the state labels of a heart sound segment with automatically
generated annotations (using Springer’s segmentation algorithm) and (B) the same data
and annotations after hand-correction.
In addition, to facilitate the challengers in training their algorithms to identify low signal
quality recordings, we provided the labels for ‘unsure’ recordings with poor signal quality
in all training data. We also provided reference annotations for the four heart sound states
(S1, systole, S2 and diastole) for each beat for all recordings that did not belong to the ‘unsure’
type. The reference annotations were obtained by using Springer’s segmentation algorithm
(Springer et al 2016) and subsequently manually reviewing and correcting each beat label,
resulting in a total of 84 425 beats in training set and 32 440 beats in test set after hand cor-
rection. Figure 6 illustrates an example where the automatic segmentation algorithm out-
puts the wrong annotation and the corresponding correct annotation from hand-correction.
Table 4 summarizes the number of patients and recordings, the recording percentages and
time lengths, the percentages of hand corrected recordings and heart beats, as well as the
corresponding number of hand corrected recordings/beats for each database, for both training
and test sets. As shown in table 4, 20.7% of the recordings in the training set and 15.3% of the
recordings in the test set required hand correction, with corresponding percentages of hand
corrected heart beats at 11.7% and 10.9% respectively.
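Table 4. Summary of the training and test sets used in 2016 PhysioNet/CinC Challenge. Proportions of recordings are in %, recording lengths are in s, and beat counts are after hand correction.
| Challenge set | Sub-set | Data source | # Patients | # Recordings | Abnormal (%) | Normal (%) | Unsure (%) | Length min (s) | Length median (s) | Length max (s) | Hand corrected recordings (%) | Hand corrected beats (%) | # Beats min | # Beats median | # Beats max | # Beats total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | Training-a | MITHSDB | 121 | 409 | 67.5 | 28.4 | 4.2 | 9.3 | 35.6 | 36.5 | 28.9 | 11.6 | 12 | 37 | 78 | 14 559 |
| Training | Training-b | AADHSDB | 106 | 490 | 14.9 | 60.2 | 24.9 | 5.3 | 8 | 8 | 32.9 | 25.9 | 4 | 9 | 15 | 3353 |
| Training | Training-c | AUTHHSDB | 31 | 31 | 64.5 | 22.6 | 12.9 | 9.6 | 44.4 | 122.0 | 67.7 | 31.5 | 15 | 67 | 143 | 1808 |
| Training | Training-d | UHAHSDB | 38 | 55 | 47.3 | 47.3 | 5.5 | 6.6 | 12.3 | 48.5 | 56.4 | 19.5 | 6 | 14 | 72 | 853 |
| Training | Training-e | DLUTHSDB | 356 | 2054 | 7.1 | 86.7 | 6.2 | 8.1 | 21.1 | 101.7 | 13.7 | 9.7 | 4 | 27 | 174 | 59 593 |
| Training | Training-f | SUAHSDB | 112 | 114 | 27.2 | 68.4 | 4.4 | 29.4 | 31.7 | 59.6 | 35.1 | 16.9 | 7 | 39 | 75 | 4259 |
| Training | Total/ | | 764 | 3153 | 18.1 | 73.0 | 8.8 | 5.3 | 20.8 | 122.0 | 20.7 | 11.7 | 4 | 26 | 174 | 84 425 |
| Test | Test-b | AADHSDB | 45 | 205 | 15.6 | 48.8 | 35.6 | 6.3 | 8 | 8 | 35.6 | 33.7 | 6 | 9 | 16 | 1269 |
| Test | Test-c | AUTHHSDB | 14 | 14 | 64.3 | 28.6 | 7.1 | 19.3 | 54.4 | 86.9 | 28.6 | 24.7 | 32 | 57 | 107 | 853 |
| Test | Test-d | UHAHSDB | 17 | 24 | 45.8 | 45.8 | 8.3 | 6.1 | 11.4 | 17.1 | 37.5 | 19.7 | 7 | 11 | 24 | 260 |
| Test | Test-e | DLUTHSDB | 153 | 883 | 6.7 | 86.4 | 6.9 | 8.1 | 21.8 | 103.6 | 11.4 | 8.8 | 3 | 28 | 169 | 26 724 |
| Test | Test-g | TUTHSDB | 44 | 116 | 18.1 | 81.9 | 0 | 15 | 15 | 15 | 0 | 0 | 9 | 18 | 29 | 2048 |
| Test | Test-i | SSHHSDB | 35 | 35 | 60 | 34.3 | 5.7 | 15.0 | 31.7 | 68.8 | 22.9 | 26.4 | 18 | 36 | 59 | 1286 |
| Test | Total/ | | 308 | 1277 | 12.0 | 77.1 | 10.9 | 6.1 | 17.7 | 103.6 | 15.3 | 10.9 | 3 | 24 | 169 | 32 440 |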
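Table 5. Rules for determining the classification result of current recording from
Challenger’s algorithm. The last three columns give the Challenger report result.
| Reference label: diagnosis | Reference label: signal quality | Percentages of recordings | Reported abnormal | Reported unsure | Reported normal |
|---|---|---|---|---|---|
| Abnormal (1) | Good (1) | wa1 | Aa1 | Aq1 | An1 |
| Abnormal (1) | Poor (0) | wa2 | Aa2 | Aq2 | An2 |
| Normal (−1) | Good (1) | wn1 | Na1 | Nq1 | Nn1 |
| Normal (−1) | Poor (0) | wn2 | Na2 | Nq2 | Nn2 |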
Table 6. Numbers of raw and selected recordings for each database in the training set.
The overall score is computed based on the number of recordings classified as normal, abnor-
mal or unsure, in each of the two reference categories. Table 5 shows the rules for determining
the classification result of current recording from Challenger’s algorithm (Clifford et al 2016).
The modified sensitivity (Se) and specificity (Sp) are defined as:
\[
Se = \frac{wa_1 \times Aa_1}{Aa_1 + Aq_1 + An_1} + \frac{wa_2 \times (Aa_2 + Aq_2)}{Aa_2 + Aq_2 + An_2} \qquad (1)
\]
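Equation (1) can be evaluated directly from the counts defined in table 5. The sketch below covers the modified sensitivity only. It assumes that the weights wa1 and wa2 are supplied as fractions (for example 0.8 and 0.2) and that a term is dropped when its group contains no recordings. The function name, argument layout and example counts are hypothetical.

```python
def modified_sensitivity(wa1, wa2, good, poor):
    """Modified sensitivity of equation (1).

    good: (Aa1, Aq1, An1) counts for abnormal recordings of good quality
    poor: (Aa2, Aq2, An2) counts for abnormal recordings of poor quality
    """
    Aa1, Aq1, An1 = good
    Aa2, Aq2, An2 = poor
    n_good = Aa1 + Aq1 + An1
    n_poor = Aa2 + Aq2 + An2
    # Assumption: an empty group contributes nothing (avoids division by zero).
    term_good = wa1 * Aa1 / n_good if n_good else 0.0
    # For poor-quality recordings, an 'unsure' report also counts as correct.
    term_poor = wa2 * (Aa2 + Aq2) / n_poor if n_poor else 0.0
    return term_good + term_poor


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(modified_sensitivity(0.8, 0.2, good=(70, 10, 20), poor=(5, 10, 5)))
```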
As a basic starting point for the Challenge we provided a benchmark classifier that relied
on relatively obvious parameters extracted from the heart sound segmentation code. For the
pending competition results in the 2016 PhysioNet/CinC Challenge, challengers can refer to
Clifford et al (2016). Here we briefly describe the approach for training and testing the code
on the Challenge training data only.
Since both training and test sets are unbalanced, a balanced heart sound database was first
selected from the training set. (Otherwise, without prior probabilities on the illness, a prevalence
bias would be created.) Table 6 summarizes the numbers of the raw heart sound recordings in
the training set, and the numbers of the selected recordings for each training database.
Springer’s segmentation code (Springer et al 2016) was used to segment each selected heart
sound recording to generate the time durations for the four states: S1, systole, S2 and diastole.
Twenty features were extracted from the position information of the four states as follows:
1. m_RR: mean value of RR intervals
2. sd_RR: standard deviation (SD) of RR intervals
3. m_IntS1: mean value of S1 intervals
4. sd_IntS1: SD of S1 intervals
5. m_IntS2: mean value of S2 intervals
6. sd_IntS2: SD of S2 intervals
7. m_IntSys: mean of systolic intervals
8. sd_IntSys: SD of systolic intervals
9. m_IntDia: mean of diastolic intervals
10. sd_IntDia: SD of diastolic intervals
11. m_Ratio_SysRR: mean of the ratio of systolic interval to RR of each heart beat
12. sd_Ratio_SysRR: SD of the ratio of systolic interval to RR of each heart beat
13. m_Ratio_DiaRR: mean of ratio of diastolic interval to RR of each heart beat
14. sd_Ratio_DiaRR: SD of ratio of diastolic interval to RR of each heart beat
15. m_Ratio_SysDia: mean of the ratio of systolic to diastolic interval of each heart beat
16. sd_Ratio_SysDia: SD of the ratio of systolic to diastolic interval of each heart beat
17. m_Amp_SysS1: mean of the ratio of the mean absolute amplitude during systole to that
during the S1 period in each heart beat
18. sd_Amp_SysS1: SD of the ratio of the mean absolute amplitude during systole to that
during the S1 period in each heart beat
19. m_Amp_DiaS2: mean of the ratio of the mean absolute amplitude during diastole to that
during the S2 period in each heart beat
20. sd_Amp_DiaS2: SD of the ratio of the mean absolute amplitude during diastole to that
during the S2 period in each heart beat
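The interval-based features in the list above (features 1 to 16) can be sketched as follows. This is not the benchmark code. It assumes that the segmentation output has already been converted to one tuple of state durations per beat, that the RR interval equals the sum of the four state durations, and that the SD is the sample standard deviation. The amplitude features (17 to 20) are omitted because they need the signal itself.

```python
from statistics import mean, stdev

STATE_NAMES = ("IntS1", "IntSys", "IntS2", "IntDia")


def interval_features(beats):
    """Compute mean/SD interval features from per-beat state durations.

    beats: list of (s1, systole, s2, diastole) durations, one tuple per
    heart beat, all in the same time unit (hypothetical input layout).
    """
    if len(beats) < 2:
        raise ValueError("at least two beats are needed to compute an SD")
    # Assumption: the RR interval of a beat is the sum of its four states.
    rr = [sum(b) for b in beats]
    series = {"RR": rr}
    for i, name in enumerate(STATE_NAMES):
        series[name] = [b[i] for b in beats]
    series["Ratio_SysRR"] = [b[1] / r for b, r in zip(beats, rr)]
    series["Ratio_DiaRR"] = [b[3] / r for b, r in zip(beats, rr)]
    series["Ratio_SysDia"] = [b[1] / b[3] for b in beats]
    feats = {}
    for name, values in series.items():
        feats[f"m_{name}"] = mean(values)
        feats[f"sd_{name}"] = stdev(values)  # sample SD (assumption)
    return feats


if __name__ == "__main__":
    # Two hypothetical beats, durations in seconds.
    for key, value in interval_features(
        [(0.1, 0.2, 0.1, 0.4), (0.1, 0.2, 0.1, 0.6)]
    ).items():
        print(f"{key}: {value:.4f}")
```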
Logistic regression (LR) allows the identification of the impact of multiple independent vari-
ables in predicting the membership of one of the multiple dependent categories. Binary logis-
tic regression (BLR) is an extension of linear regression, to address the fact that the latter
struggles with dichotomous problems. This difficulty is overcome by applying a mathematical
transformation of the output of the classifier, transforming it into a bounded value between 0
and 1 more appropriate for binary predictions.
In the current study, the output variable Y is a positive (1, abnormal) or negative (−1, normal)
classification for heart sound recording.
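The bounding transformation described above is the logistic function, p = 1/(1 + exp(-z)), applied to a linear predictor z. The sketch below shows the transformation and a mapping to the 1/−1 output coding. The 0.5 decision threshold and the function names are assumptions made for illustration.

```python
import math


def logistic(z):
    """Map a real-valued linear predictor z to a probability between 0 and 1."""
    # Two algebraically equivalent branches keep exp() from overflowing.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(z, threshold=0.5):
    """Return 1 (abnormal) or -1 (normal) using an assumed probability threshold."""
    return 1 if logistic(z) >= threshold else -1


if __name__ == "__main__":
    for z in (-2.0, 0.0, 2.0):
        print(z, round(logistic(z), 3), classify(z))
```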
All 20 features were tested using forward likelihood ratio selection, with candidate features
considered in order of likelihood. If the accuracy of the model differed significantly from that of the
model prior to the addition of a feature, the newly added feature was included in the model. The forward
selection was terminated when a newly added feature did not significantly improve the normal/
abnormal classification results. In this way, correlated predictors are unlikely to be included
in the model, but an optimal combination of features is not guaranteed. Moreover, we note
that the features we have chosen are by no means likely to include the most useful features.
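The general shape of forward selection can be sketched as a greedy loop. The example below is a simplified stand-in and not the likelihood ratio procedure used here: it accepts any scoring function, adds the candidate that improves the score most, and stops when the best improvement falls below a fixed threshold, which takes the place of a significance test. The toy utilities and all names are hypothetical.

```python
def forward_select(candidates, score, min_gain):
    """Greedy forward selection.

    candidates: iterable of feature names
    score: callable taking a list of feature names, returning a model score
    min_gain: smallest improvement treated as worthwhile (stand-in for a
              significance test)
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Try each remaining feature and keep the one with the largest gain.
        gain, feature = max(
            ((score(selected + [f]) - best, f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if gain < min_gain:
            break  # no remaining feature improves the model enough
        selected.append(feature)
        remaining.remove(feature)
        best = score(selected)
    return selected


if __name__ == "__main__":
    # Toy additive utilities, invented for illustration only.
    utility = {"sd_RR": 0.10, "m_IntS2": 0.06, "m_RR": 0.005}
    print(forward_select(utility, lambda fs: sum(utility[f] for f in fs), 0.01))
```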
6.4. Feature results comparison between the selected balanced data from training set
Table 7 shows the average values of all 20 features for normal and abnormal heart sound
recordings on the selected balanced data from training set. The Kolmogorov-Smirnov test
for verifying the normal distribution of all features was applied using the SPSS Statistics
19 software package (SPSS Inc., USA). The results showed that only the sd_Ratio_DiaRR
feature exhibited Gaussian distributions in both normal and abnormal groups. Therefore, the
group t test was performed for the sd_Ratio_DiaRR feature and a Wilcoxon rank sum test
was performed for the other 19 features to test the statistical differences between the two groups.
The results showed that 13 features exhibited statistical differences between the two groups
whereas 7 features did not exhibit statistically significant differences.
Table 8. BLR results (equation (3)) of the Aa, An, Na and Nn numbers and the three
indices (Se, Sp and Score) for all selected balanced training database: 472 abnormal and
472 normal recordings.
Aa An Na Nn Se Sp Score
293 179 141 331 0.62 0.70 0.66
Equation (3) shows the derived BLR prediction formula with the corresponding regression
coefficients for normal/abnormal heart sound recording classification on all selected balanced
data from the training set. Seven features were identified as predictive features:
sd_RR, sd_IntS1, m_IntS2, sd_IntS2, sd_IntSys, m_IntDia and sd_Ratio_SysDia.
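\[
\begin{aligned}
z = w^{T} X = {} & 0.062 - 0.013 \times \mathrm{sd\_RR} + 0.067 \times \mathrm{sd\_IntS1} - 0.032 \times \mathrm{m\_IntS2} \\
& + 0.041 \times \mathrm{sd\_IntS2} + 0.058 \times \mathrm{sd\_IntSys} \\
& + 0.002 \times \mathrm{m\_IntDia} + 0.035 \times \mathrm{sd\_Ratio\_SysDia}
\end{aligned}
\qquad (3)
\]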
Table 8 provides the results of Aa, An, Na and Nn numbers and the three evaluation metrics
(Se, Sp and Score) defined in section 5.3. Using equation (3), the normal/abnormal classifica-
tion results were 0.62 for Se, 0.70 for Sp and a Challenge Score of 0.66 on the training data.
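Equation (3) and the table 8 metrics can be reproduced with a short sketch. The coefficients are copied from equation (3) and the counts from table 8. The feature units, the function names and the definition of the Score as the mean of Se and Sp are assumptions. The Score definition belongs to section 5.3 and is not restated here, although the mean agrees with the tabulated values.

```python
# Regression coefficients and intercept copied from equation (3).
INTERCEPT = 0.062
COEFFS = {
    "sd_RR": -0.013,
    "sd_IntS1": 0.067,
    "m_IntS2": -0.032,
    "sd_IntS2": 0.041,
    "sd_IntSys": 0.058,
    "m_IntDia": 0.002,
    "sd_Ratio_SysDia": 0.035,
}


def linear_predictor(features):
    """Return z = w^T X for a dict holding the seven selected features."""
    return INTERCEPT + sum(w * features[name] for name, w in COEFFS.items())


def two_class_metrics(Aa, An, Na, Nn):
    """Se, Sp and their mean for a normal/abnormal confusion matrix.

    Assumption: no recordings are reported as unsure, and the Score is the
    arithmetic mean of Se and Sp.
    """
    se = Aa / (Aa + An)
    sp = Nn / (Na + Nn)
    return se, sp, (se + sp) / 2


if __name__ == "__main__":
    se, sp, score = two_class_metrics(Aa=293, An=179, Na=141, Nn=331)
    print(f"Se={se:.2f}  Sp={sp:.2f}  Score={score:.2f}")
```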
We also used both a K = 10-fold cross validation, stratified by patient, and a leave-one-out
(database) cross validation, stratified by database, to test the performance of the BLR model on
all selected balanced training data. This is important to note, since training on data from a
patient and then reporting on test data that include the same patient will give a falsely inflated
accuracy. Similarly, leaving out each database in turn provides a deeper understanding
of which databases can result in heavy biases, and may help provide a more accurate
estimate of the out-of-sample accuracy of the algorithm. Tables 9 and 10 show the corresponding
results from 10-fold cross validation and leave-one-out cross validation. Note that the results
are subject to statistical variation because of the subsampling. We also note that the average
running time on the training set used 5.26% of quota and 5.22% of quota on the hidden test
set using Matlab 2016a. We note that this classification algorithm is not intended to provide
a sensible way to classify the recordings, but rather to illustrate how a simple algorithm can
achieve basic results, but that the results will also vary highly based on which databases are
used to train and test the classifiers. We also note that improving the segmentation algorithm
may be key to improving the results of any given classifier. Finally, we note that our classifier
did not attempt to label any recordings as unknown or unreadable. Any useful algorithm must
endeavor to do so, since the intention is for this algorithm to be used at the source of recording,
where a re-recording can be triggered in the event that an automated algorithm is likely to fail.
Differentiating abnormality from noise is often a difficult but critical issue in biomedical signal
analysis, as we have noted in previous competitions (Clifford and Moody 2012).
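The leave-one-out (database) cross validation described above can be sketched as a generator that holds out one group at a time. The example is illustrative and is not the benchmark code. It assumes each sample carries a group label, and it then summarises the per-database Se values from table 10 using the sample standard deviation, which reproduces the tabulated mean and SD after rounding.

```python
from statistics import mean, stdev


def leave_one_group_out(samples):
    """Yield (held_out_group, train, test) for each distinct group.

    samples: list of (item, group) pairs, e.g. (recording, database name).
    """
    groups = sorted({group for _, group in samples})
    for held_out in groups:
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        yield held_out, train, test


# Per-database sensitivities copied from table 10 (training-a to training-f).
TABLE10_SE = [0.24, 0.81, 0.86, 0.15, 0.73, 0.97]

if __name__ == "__main__":
    # Hypothetical samples tagged with their source database.
    data = [("r1", "a"), ("r2", "a"), ("r3", "b"), ("r4", "c")]
    for group, train, test in leave_one_group_out(data):
        print(group, len(train), len(test))
    print(round(mean(TABLE10_SE), 2), round(stdev(TABLE10_SE), 2))
```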
The public release of the heart sound database has many potential benefits to a wide range
of users. First, those who lack access to well-characterized real clinical signals may benefit
from access to these data for developing prototype algorithms. The availability of these data
can encourage researchers from a variety of backgrounds to develop innovative methods
to tackle problems in heart sound signal processing that they might not otherwise have
Table 9. K = 10-fold cross validation results for all selected balanced training database:
472 abnormal and 472 normal recordings.
K-fold (10-fold) cross validation on the selected balanced training set
Fold    Aa   An   Na   Nn   Se     Sp     Score
1       30   17   13   34   0.64   0.72   0.68
2       25   22   18   29   0.53   0.62   0.57
3       30   17   16   32   0.64   0.67   0.65
4       31   17   14   33   0.65   0.70   0.67
5       31   16   11   36   0.66   0.77   0.71
6       30   17   16   31   0.64   0.66   0.65
7       21   26   18   29   0.45   0.62   0.53
8       29   18   16   31   0.62   0.66   0.64
9       30   18   10   38   0.63   0.79   0.71
10      30   17   14   33   0.64   0.70   0.67
Mean    29   19   15   33   0.61   0.69   0.65
SD       3    3    3    3   0.07   0.06   0.06
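The Se, Sp and Score columns can be recomputed from the four counts. The sketch below assumes that Aa and An are abnormal recordings classified as abnormal and normal respectively, that Na and Nn are the corresponding counts for normal recordings, and that Score is the unweighted mean of Se and Sp. These readings are inferred from the tabulated values and are not restated in this section; the function name `fold_metrics` is hypothetical.

```python
def fold_metrics(aa, an, na, nn):
    """Return (Se, Sp, Score) from the four confusion counts of one fold.

    Assumes both classes are present, so neither denominator is zero.
    """
    se = aa / (aa + an)      # sensitivity: fraction of abnormal recordings detected
    sp = nn / (na + nn)      # specificity: fraction of normal recordings detected
    score = (se + sp) / 2    # assumed here to be the plain average of Se and Sp
    return se, sp, score


# Fold 1 of Table 9: Aa = 30, An = 17, Na = 13, Nn = 34.
se, sp, score = fold_metrics(30, 17, 13, 34)
print(f"{se:.2f} {sp:.2f} {score:.2f}")  # prints "0.64 0.72 0.68"
```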
Table 10. Balanced leave-one-out cross validation results for all training databases:
472 abnormal and 472 normal recordings.
Leave-one-out cross validation on the balanced training set
Excluded database   Aa    An   Na   Nn    Se     Sp     Score
Training-a          28    89   23   94    0.24   0.80   0.52
Training-b          84    20   91   13    0.81   0.13   0.47
Training-c           6     1    1    6    0.86   0.86   0.86
Training-d           4    23    5   22    0.15   0.81   0.48
Training-e         134    49   69  114    0.73   0.62   0.68
Training-f          33     1   30    4    0.97   0.12   0.54
Mean                 -     -    -    -    0.63   0.56   0.59
SD                   -     -    -    -    0.34   0.34   0.15
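As a rough sketch of the procedure behind Table 10, the code below holds out one database at a time and then summarizes the per-database results. The helper `leave_one_database_out` and the lower-case database keys are hypothetical. The Mean and SD rows appear to be consistent with the mean and the sample (n - 1) standard deviation of the rounded per-database values; that is an inference from the numbers, not something stated in the text.

```python
from statistics import mean, stdev


def leave_one_database_out(databases):
    """Yield (held_out_name, train_records, test_records) for each database.

    `databases` maps a database name to its list of recordings.
    """
    for held_out in databases:
        train = [rec for name, recs in databases.items()
                 if name != held_out for rec in recs]
        yield held_out, train, list(databases[held_out])


# Se, Sp and Score per excluded database, copied from Table 10 (rounded values).
table10 = {
    "training-a": (0.24, 0.80, 0.52),
    "training-b": (0.81, 0.13, 0.47),
    "training-c": (0.86, 0.86, 0.86),
    "training-d": (0.15, 0.81, 0.48),
    "training-e": (0.73, 0.62, 0.68),
    "training-f": (0.97, 0.12, 0.54),
}

# One (mean, sample SD) pair per column: Se, Sp, Score.
summary = [(round(mean(col), 2), round(stdev(col), 2))
           for col in zip(*table10.values())]
print(summary)
```

The large SD of Se and Sp across databases, compared with the 10-fold results, reflects how much the outcome depends on which database is excluded from training.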
An additional benefit is that the data can be re-evaluated with new advances in machine
learning and signal processing as they become available. The public data are also essential
resources for developers and evaluators who need to test their algorithms with realistic data
and to perform these tests repeatedly and reproducibly on a public platform.
In addition, these databases have value in medical and biomedical engineering education
by providing well-documented heart sound recordings from both healthy subjects and patients
with a variety of clinically significant diseases. By making well-characterized clinical data
available to educational institutions, these databases will make it possible to answer numerous
physiological or pathological questions without the need to develop a new set of reference data.
The availability of open-source, state-of-the-art signal processing algorithms for heart sound
segmentation provided for the competition, together with the open-source classification
algorithms subsequently provided by competitors, is likely to give impetus to the field and raise the
benchmark for FDA approval and diagnostic performance of industrial systems (Goldberger
et al 2000). We hope that this new heart sound database will help realize these benefits and
their often-unanticipated rewards to those with an interest in heart sound signal processing.
We wish to thank the providers of the heart sound databases described in this paper and made
available for the competition:
• The MITHSDB was provided by Prof John Guttag and Dr Zeeshan Syed from MIT.
• The AADHSDB was provided by Dr Samuel E Schmidt from Aalborg University.
• The AUTHHSDB was provided by Dr Chrysa D Papadaniil from Aristotle University of
• The TUTHSDB was provided by Dr Hosein Naseri from K N Toosi University of
• The UHAHSDB was provided by Dr Ali Moukadem from University of Haute Alsace.
• The DLUTHSDB was provided by Dr Hong Tang from Dalian University of Technology.
• The SUAHSDB and SUFHSDB were provided by Dr Reza Sameni from Shiraz University
and annotated by Dr Mohammad Reza Samieinasab from Isfahan University of Medical
Sciences. The two datasets were recorded as part of the MS thesis of Ms Maryam
Samieinasab at Shiraz University. The authors would like to thank Dr M Hosseiniasl and
Ms Nasihatkon from Shiraz Hafez Hospital, for their valuable assistance during fetal
PCG recordings.
• The SSHHSDB was provided by the company of Medicom Innovation Partner and
Mr Bjørn Knud Andersen at
This work was supported by the National Institutes of Health (NIH) grants R01-EB001659 from
the National Institute of Biomedical Imaging and Bioengineering (NIBIB) and R01GM104987
from the National Institute of General Medical Sciences.
