Movement Recognition using
the Accelerometer in Smartphones
Sian Lun LAU¹, Klaus DAVID¹
¹Chair for Communication Technology,
University of Kassel,
Wilhelmshöher Allee 73, Kassel, 34119, Germany
Tel: +49 561 804 6446, Fax: +49 561 804 6360, Email: comtec@uni-kassel.de
Abstract: The area of activity recognition is essential for context-aware systems.
Previous and current investigations demonstrate that the accelerometer is suitable for
accurate movement and activity recognition. Since smartphones are used by people
in their daily lives, they can be seen as an attractive sensor device for the purpose of
activity recognition. In our work, experiments have been carried out to investigate
the suitability of the built-in accelerometer by comparing the influence that
classification algorithms, features, and the combination of sampling rates and
window sizes for feature extraction have on the classification accuracy. The
results obtained indicate that smartphones similar to the test device provide good accuracy in
recognizing common movements.
Keywords: context-awareness, activity recognition, classification, smartphone.
1. Introduction
In the past decades, much work has been carried out in the area of situation and activity
recognition. The different approaches can be summarized as follows: different
sensors are attached to the human body and its surroundings, and the acquired information is
processed, analyzed, and evaluated to derive further information. In the area of
context-awareness, this derived information is known as context. A context-aware system can
“understand” the situation of a user from the available contexts and can then adapt its
service behaviour accordingly.
Previous work explored the possibilities of using wearable sensors to recognize
movement and activities. While the results showed great potential, in our opinion the
placing of multiple sensors on the human body is simply not practical for the everyday user.
For example, in [1] and [2] multiple sensors were placed all over the body. Some of the
work was carried out in laboratory environments [2][3][4]. In order to enable such
approaches to be used in our daily lives, the sensor devices have to be non-obtrusive for the
users targeted. In the near future, we expect some of these prototypes to be made into
miniature wearable sensors. However, even at present there are already some commercially
available products such as mobile phones and especially smartphones that have the
potential to be used as suitable sensor devices.
Mobile phones are common items in most countries today [5]. Their widespread usage
is advantageous for the realisation of smart adaptive systems, such as context-aware
systems. The number of mobile phones that have built-in sensors and multiple connectivity
options has been increasing over the past years, making these mobile phones a
potential interface and mediator between users and context-aware systems [6]. For example,
one can collect accelerometer sensor values and ambient sound from a user’s surroundings
and send this information via Wireless LAN or a 3G network to a server for processing and
recording. In contrast to a few years ago, it is now possible to realise this
approach using a commercially available mobile phone such as a Nokia N95 or an Apple
iPhone. These mobile phones, also commonly called smartphones, provide advanced
functionality beyond voice calls and short message services.
The recognition tasks performed using a smartphone can be used in various application
scenarios. For example, the system can log a user’s movements throughout his daily life.
This can be a useful method to monitor patients or elderly people living alone. The
movement contexts can be used to derive higher activity contexts such as the event a person
is currently attending or the activity he is undertaking. These contexts can then be applied
in context-aware systems to provide adaptive services to the user.
In this paper, we will investigate the following issues:
1. How suitable are the accelerometers found in common smartphones in the area of
movement recognition as compared to approaches using dedicated accelerometers?
2. What are the possible algorithms and features that can provide accurate movement
recognition using smartphones?
3. Which sampling rate and window size combination for feature extraction will provide
better movement recognition using smartphones?
The structure of this paper is as follows: In Section 2, a summary of related work will
be presented. Section 3 will elaborate on the approach and idea, and Section 4 will present
the experiments performed. The results will be discussed in Section 5 and the paper will be
concluded in Section 6.
2. Related work
Earlier work investigated the use of accelerometers as the main sensor data source for
activity recognition. Work such as [1], [2], [3], [4], and [7] demonstrated that the usage of
dedicated accelerometers can provide good results in the area of activity recognition. The
different investigations have one thing in common: multiple accelerometers were placed on
different parts of the body, either wired or wireless, and users were required to perform
designated movements. The recorded accelerometer values were processed and the resulting
features were evaluated using classification algorithms for potential recognition.
The ideas were expanded with the inclusion of additional sensor information. For
example, in [8] a heart rate monitor was coupled with data taken from five accelerometers
to detect physical activities. The team at Intel Research in Seattle and the University of
Washington used a multi-modal sensor board (MSB) that combined an accelerometer with audio,
temperature, IR/visible/high-frequency light, humidity, barometric pressure and digital
compass sensors [9]. They investigated the classification of physical activities with
multiple MSBs. The group in [10] used a tri-axial accelerometer together with a wearable
camera to recognize human activity. Note that the dedicated accelerometers used in the
above work are capable of sampling rates above 100Hz and of measuring accelerations of up to
±10G. In our tests, we managed to obtain sampling rates of around 60Hz-70Hz using a Nokia
N95 8GB. Most of the above investigations used accelerometer sampling rates from 30Hz
to 50Hz, except for [1] and [2], which sampled the accelerometer data at 76.25Hz and 93Hz
respectively.
The above research has obtained good results with recognition accuracies between 83%
and 99%. These investigations have shown that by analyzing the accelerometer data,
systems were able to provide good recognition. However, these setups required multiple
accelerometers to be worn and observed. For a typical real user, it can be rather obtrusive
and troublesome to wear multiple sensors or sensor boards at specific positions. This factor
also partly motivated us to use a smartphone as a non-obtrusive sensor source. The
inclusion of other sensors did give a slight improvement in accuracy, but two issues
remain outstanding. Firstly, not all additions of different sensors improved accuracy
significantly. Secondly, additional sensors also mean more data to be processed and
computed. Therefore, an appropriate selection of sensors that provide good recognition will
make for the ideal solution.
Recently, the three-dimensional accelerometer integrated in smartphones was also
investigated as a potential sensor for movement recognition. In [11], the accelerometer of a
Nokia N95 was used as a step counter. The results showed that such smartphones can
provide accurate step-counts comparable to some of the commercial, dedicated step counter
products, provided the phone is firmly attached to the body. The DiaTrace project [12] uses
a mobile phone with accelerometers for physical activity monitoring. The proof-of-concept
prototype obtained a recognition accuracy of >95% for the activity types resting, walking,
running, cycling and car driving. Brezmes et al. also used the accelerometer data collected
with a Nokia N95, together with the K-nearest neighbour algorithm, to detect common
movements [13]. These investigations mentioned that the accelerometer sampling rates of
smartphones are generally lower than those of dedicated sensors. The obtained results also
showed the potential of smartphones being used for activity recognition. However, no
comparison was performed to identify suitable algorithms, sampling rates, and features.
The results above have laid down a good foundation for the work here. For activity
recognition using a commercial product such as smartphones, one of the many challenges is
to find out the relevant criteria such as algorithm and feature extraction methods that will
enable successful recognition and utilization of such context information.
3. Approach
We would like to investigate the implementation of movement recognition using
smartphones based on the above mentioned approaches and experiences. The usage of a
smartphone as a sensor device has the following advantages:
1. The available sensors are built-in. As long as the desired context can be derived and
recognized from the data of the built-in sensors, the users are not required to use
external sensors in order to collect needed information. Where the need arises,
additional sensors and devices can be interfaced to smartphones to complement the
built-in sensors. The flexibility and readiness of the smartphone as a
sensor device are seen as advantages.
2. The smartphones of today have many properties that enable context-aware related
implementations. Most smartphones have relatively high processing power and
sufficient memory for data processing tasks. They also contain more than adequate
storage space to store data and computed information. The smartphones also provide
communication possibilities that allow information exchange between user and external
services. The smartphone itself can be seen as a small computing device with common
connectivity integrated.
3. A smartphone is likely to be with a user during his daily activities. It can be seen as a
natural choice of a non-obtrusive device. The chances of users feeling awkward or
uncomfortable will be much lower as compared to approaches that affect the usage
habits of the users.
4. Most smartphones also have relatively long operation durations. For an average user,
under normal usage patterns (some daily phone conversations and text messages), a
smartphone should have at least a day’s operation time before a recharge is required.
With proper management of sensor data polling, sensor data collection over a whole day is
achievable.
In this paper, experiments were carried out to investigate movement recognition with
supervised classification algorithms using a smartphone with a built-in accelerometer. As
mentioned earlier, the accelerometers in smartphones are not able to produce high sampling
rates such as those in [1] and [2]. Therefore, applicable combinations of sampling rates and
window sizes for feature extraction should be investigated. The details on the selection of
algorithms, features, sampling rates and window sizes are presented in the next section.
It is also important to know how the location of the smartphone will affect the
recognition accuracy. From the study carried out in [5], it is observed that a large number of
men keep their mobile phones in their trouser pockets, while women keep theirs in their bags.
Previous work showed that positions around the waist or slightly below the waist (e.g. the
trouser pocket) give promising results. Therefore, for our first implementation, we expect the
user to carry out his daily activity with his smartphone kept in a regular position, such as his
trouser pocket or at his waist.
The idea of movement recognition on smartphones should provide value-added benefits
especially for the user. On one hand, it can be used in a context-aware environment to
provide adaptive services. On the other hand, the benefits can also further motivate users to
actively interact with and use the system. Similarly, the selected recognition methods
should also provide good accuracy. Repeatedly misrecognized contexts will eventually
cause users to stop using the system.
Based on the above requirements, experiments were carried out to investigate the issues
mentioned in Section 1. The following sections will elaborate the experiments carried out
and the obtained results.
4. Experiments
The goal of the experiments is to enable movement detection using accelerometer data
collected from a smartphone. A Nokia N95 8GB smartphone was chosen as the test device.
It runs the Symbian S60 3rd Edition FP1 operating system and has a built-in triaxial
accelerometer. The highest sampling rate achieved when the readings of all three axes were
recorded to the phone’s mass storage was around 60Hz-70Hz. The recording was
performed using a script written in Python for S60.
In our experiments we have chosen five common movements that are typically also used in the
references of Section 2 and that can be observed in daily life: walking, standing,
sitting, walking up the stairs and walking down the stairs. This choice of movements
facilitates a comparison with previous work, i.e. an assessment of how suitable smartphones
are for measuring movement acceleration.
A Nokia N800 Internet Tablet was used to manually annotate the movements
performed. The test person was required to select a corresponding movement annotation
using a dedicated application on the N800 before the movement was carried out. This gave
an approximation of the duration of each annotated movement. The data was later combined
using a script to create an annotated record of movements with the respective accelerometer
readings. According to [12], a sampling rate of 32Hz is considered sufficient for
the recognition of body movements using an accelerometer. Therefore, the accelerometer
raw data was processed to produce simulated samples with sampling rates of 5Hz, 10Hz,
20Hz and 40Hz. The lower sampling rates were chosen to investigate whether they
can produce equally good recognition accuracies.
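The annotation and resampling steps can be illustrated with a minimal Python sketch. The paper does not state how the simulated sampling rates were produced, so linear interpolation onto a uniform time grid is an assumption here, as are the function names `label_samples` and `resample` and the tuple layout `(t, x, y, z)` with strictly increasing timestamps in seconds.

```python
from bisect import bisect_right

def label_samples(samples, annotations):
    """Attach a movement label to each (t, x, y, z) sample.

    annotations: list of (start_time, label) sorted by start_time. Each label
    is assumed to hold until the next annotation starts.
    """
    starts = [start for start, _ in annotations]
    labelled = []
    for t, x, y, z in samples:
        i = bisect_right(starts, t) - 1
        if i >= 0:  # drop samples recorded before the first annotation
            labelled.append((t, x, y, z, annotations[i][1]))
    return labelled

def resample(samples, rate_hz):
    """Linearly interpolate (t, x, y, z) samples onto a uniform time grid."""
    step = 1.0 / rate_hz
    t0, t_end = samples[0][0], samples[-1][0]
    n = int((t_end - t0) * rate_hz + 1e-9) + 1  # number of output samples
    out, j = [], 0
    for k in range(n):
        t = t0 + k * step
        # advance to the raw interval [samples[j], samples[j + 1]] containing t
        while j + 1 < len(samples) - 1 and samples[j + 1][0] <= t:
            j += 1
        (ta, *a), (tb, *b) = samples[j], samples[j + 1]
        w = (t - ta) / (tb - ta)
        out.append((t, *(va + w * (vb - va) for va, vb in zip(a, b))))
    return out
```

For example, `resample(raw, 20)` turns a recording taken at roughly 60Hz-70Hz into a 20Hz series, which can then be passed to `label_samples` together with the annotation start times.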
The sliding window technique was used for feature extraction. Features were
computed from these samples using different window sizes. The selected window sizes were 5,
10, 20, 40 and 80 samples per window. The combinations of sampling rates and
window sizes represent windows equivalent to 0.5, 1, 2 and 4 seconds. The
experiments used windows with 50% overlap, as performed in work such as [1] and [7].
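A short Python sketch of the windowing may help to make the relation between sampling rate, window size and window duration concrete. The helper names are hypothetical, and the filtering of rate and size pairs to the four stated durations is our reading of the text, not a detail confirmed by it.

```python
def sliding_windows(samples, window_size, overlap=0.5):
    """Yield consecutive windows of `window_size` samples.

    With overlap=0.5 each window shares half of its samples with the
    previous window.
    """
    step = max(1, int(window_size * (1 - overlap)))
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]

def window_seconds(window_size, rate_hz):
    """Duration in seconds covered by one window."""
    return window_size / rate_hz

RATES_HZ = (5, 10, 20, 40)
WINDOW_SIZES = (5, 10, 20, 40, 80)
TARGET_SECONDS = (0.5, 1, 2, 4)

# Assumption: only rate/size pairs matching one of the stated durations are used.
combos = [(rate, size) for rate in RATES_HZ for size in WINDOW_SIZES
          if window_seconds(size, rate) in TARGET_SECONDS]
```

Under this reading, 20Hz with 80 samples and 10Hz with 40 samples both describe 4-second windows, while 40Hz with 80 samples describes a 2-second window.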
The accelerometer data and the respective annotations were recorded in two settings.
In both settings, the test person performed the selected actions in a sequence (e.g. sitting,
walking, standing, walking, stairs up and walking). In the first setting (S1), the smartphone
was placed in the right trouser pocket at a fixed position. For the second setting (S2), the
smartphone was placed in the same pocket as in S1 but without a fixed position. In this
case, the phone might turn a little when the test person moves around.
The samples collected in each setting were classified to see how well
recognition can be achieved in each case. Additionally, data from both settings were
combined to investigate whether the recognition accuracy is greatly affected. The
accuracy is determined by calculating the percentage of the number of correctly classified
instances with respect to the total number of instances.
The following sub-sections will elaborate how the classifications were carried out.
4.1 Selection of features
Some common features used in previous work are mean, standard deviation, energy of the
Fast Fourier Transform (FFT), or correlation. We think it is of interest to understand
whether it is still possible to obtain comparably good results with a minimum number of
features. Therefore, we have restricted our selection to the following four features: the
mean and standard deviation of the accelerometer raw data, and the mean and standard
deviation of the FFT components in the frequency domain.
Three combinations of features were chosen in the classification evaluation process.
The combination C1 included only the mean and standard deviation of the accelerometer
values, computed for each single axis and for all three axes together. For combination C2,
only the mean and standard deviation of the FFT coefficients, again per axis and for all
three axes, were taken into consideration. The combination C3 included all four features for
each axis and for all three axes.
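The three feature combinations can be sketched in Python as follows. This is an illustration under assumptions: the paper does not specify whether the population or sample standard deviation was used, nor which FFT components entered the statistics. The sketch uses the population standard deviation and the magnitudes of all DFT bins, computed with a naive transform to stay within the standard library. All function names are hypothetical.

```python
import cmath
from statistics import mean, pstdev

def dft_magnitudes(values):
    """Magnitudes of the DFT of a real sequence (naive O(n^2) transform)."""
    n = len(values)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(values)))
            for k in range(n)]

def axis_features(values):
    """Return the four features of one axis within one window."""
    mags = dft_magnitudes(values)
    return {"mean": mean(values), "std": pstdev(values),
            "fft_mean": mean(mags), "fft_std": pstdev(mags)}

COMBINATIONS = {
    "C1": ("mean", "std"),
    "C2": ("fft_mean", "fft_std"),
    "C3": ("mean", "std", "fft_mean", "fft_std"),
}

def feature_vector(window, combination="C1", axes=(0, 1, 2)):
    """Build the feature vector of one window of (x, y, z) samples.

    axes selects which axes contribute, e.g. (2,) for the Z-axis only.
    """
    vector = []
    for axis in axes:
        feats = axis_features([sample[axis] for sample in window])
        vector.extend(feats[name] for name in COMBINATIONS[combination])
    return vector
```

With all three axes, C1 and C2 yield six values per window and C3 yields twelve; restricting `axes` to a single axis gives two or four values respectively.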
4.2 Classification
In order to verify the practicability of the above decisions, the sensor data was classified
using selected supervised classification methods. Decision Tree (DT), Bayesian Network
(BN), Naïve Bayes (NB), K Nearest Neighbour (KNN) and Support Vector Machine
(SVM) were chosen because previous work showed good results using these algorithms.
In addition to these classifiers, we also added a rule-based learner (Jrip) and Sequential
Minimal Optimization (SMO) for the purpose of comparison.
The classification evaluations were carried out using the 10-fold cross-validation
method. This provides an estimate of how well the models will perform in general. The
above classifiers are available in the Weka Toolkit [14]. Each combination of features
was calculated using different combinations of sampling rate and window size. The
produced features together with the respective movement annotations were then analysed
and evaluated using Weka. The classification evaluation tests were carried out on a virtual
machine with a 2.5GHz CPU and 1GB RAM. In the next section, the results will be
presented and discussed.
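To illustrate the evaluation procedure, the following Python sketch runs a k-fold cross-validation with a simple nearest-neighbour classifier and reports the percentage of correctly classified instances. It is not the Weka implementation used in the experiments: the fold assignment here is a plain shuffled split without stratification, the neighbour count k=1 is an assumption (the paper does not state it), and the function names are hypothetical.

```python
import random
from collections import Counter
from math import dist

def knn_predict(train, query, k=1):
    """Majority label among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, folds=10, k=1, seed=0):
    """Percentage of correctly classified instances over all folds.

    data: list of (feature_vector, label) pairs.
    """
    data = list(data)
    random.Random(seed).shuffle(data)
    correct = 0
    for f in range(folds):
        # every folds-th instance forms the test fold, the rest is training data
        test = data[f::folds]
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return 100.0 * correct / len(data)
```

Each instance is used for testing exactly once, so the returned value corresponds to the accuracy definition given in Section 4.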
5. Results and discussion
The results were analyzed in the following manner: Firstly, the results for both settings
were compared to see which classifiers gave the best recognition accuracies. Secondly, a
comparison of the influence of sampling rates and window sizes was made among the best
results. Thirdly, an evaluation was carried out in which the built models were tested
against an additional test set. The first and second analyses were based on the results of
the 10-fold cross-validation evaluations.
5.1 Classification evaluation and influence of features
The classification results showed that C1 and C3 gave the best recognition accuracies, up to
99.27% (Setting S1) and 96.59% (Setting S2) for all five movements. These best
results were obtained with the combination of all three axes and with the Z-axis alone. The
latter measures the vertical movement of the test person. Therefore, these two choices
provided better results than using only the X- or Y-axis.
By comparing the best results (e.g. as shown in Table 1), we observed that the
combination C1 generally gave very good recognition accuracy for all movements. Among
all classifiers, KNN gave the better accuracy in many combinations. Other algorithms such
as DT, NB, and BN also gave relatively good results.
Table 1: The result of the classifier evaluation for setting S1
Combination     Algorithm   Sampling Rate (Hz)   Window Size (samples)   Accuracy (%)
C1 - all axes   KNN         20                   80                      99.27
C1 - all axes   NB          20                   80                      97.81
C1 - all axes   DT          20                   80                      97.08
C3 - all axes   KNN         20                   80                      95.62
C3 - all axes   DT          20                   80                      95.62
C3 - all axes   BN          20                   80                      95.62
C1 - all axes   KNN         10                   40                      95.42
C1 - all axes   DT          10                   40                      95.42
Table 2: The result of the classifier evaluation for combined settings
Combination     Algorithm   Sampling Rate (Hz)   Window Size (samples)   Accuracy (%)
C1 - all axes   KNN         20                   80                      96.52
C1 - all axes   Jrip        20                   80                      93.53
C1 - all axes   BN          20                   80                      93.53
C1 - all axes   KNN         20                   40                      93.19
C1 - all axes   NB          20                   80                      93.03
C3 - Z-axis     KNN         20                   80                      93.03
C3 - all axes   Jrip        20                   80                      92.54
For the case where data from both settings were combined, the movement recognition
accuracies were slightly lower. In Table 2, KNN gave rather high accuracy. Again, the
combination C1 gave the best accuracies. The rule learner Jrip also performed relatively
well.
As observed above, the combination C1 gave better results than C2 and C3. In other
words, the mean and standard deviation of accelerometer data are features that can produce
good recognition. Adding the FFT features also gave good results, although with slightly
lower accuracy than the combination C1. Processing and recognition
running on the smartphone itself are potentially achievable when the selected features and
algorithms are simple and efficient. In this way, remote processing and evaluation can be
excluded.
5.2 Influence of sampling rates and window sizes
It was observed that the best sampling rate and window size combinations are
sampling rates of 20Hz with 80 samples per window, 10Hz with 40 samples
per window and 40Hz with 80 samples per window. The highest accuracy for the above
combinations was 99.27%. Regardless of whether setting S1, S2 or the mixture of both was
used, 20Hz with 80 samples worked best with algorithms such as KNN, BN, NB and Jrip. The
combinations of 20Hz with 40 samples per window and 5Hz with 10 samples per
window (2 seconds each) also gave good accuracy (93.19% and 91.16% respectively) when
evaluated with the KNN algorithm.
In cases where accelerometers are not able to produce higher sampling rate, we expect
the maximum accuracy to be lower. In our experiments, it was possible to obtain an
accuracy of up to 94.26% (Table 3).
Table 3: The result of the classifier evaluation with sampling rate of 5Hz for setting S1
Combination     Algorithm   Sampling Rate (Hz)   Window Size (samples)   Accuracy (%)
C1 - all axes   DT          5                    20                      94.26
C3 - all axes   Jrip        5                    20                      92.62
C3 - all axes   DT          5                    20                      92.62
C1 - all axes   KNN         5                    20                      91.80
C1 - all axes   KNN         5                    10                      91.16
The evaluation also indicated that lower sampling rates are able to provide good
accuracies. It is possible to obtain >90% accuracy with sampling rates from 5Hz to 20Hz.
5.3 Accuracy of the built models
The above built models and identified combinations of sampling rate, window size and
algorithms were tested against a new set of test data. A similar sequence of movements was
recorded using the same smartphone and was evaluated using the models built from
experiment setting S1. The results are shown in Table 4. The recognition accuracy of the
built models against the test data was acceptable (up to 91.95%). In most cases, movements
going up and down the stairs had lower accuracy. Other movements were almost 100% for
the best results.
Table 4: The result of the classifier evaluation using test data against the built model from setting S1
Combination     Algorithm   Sampling Rate (Hz)   Window Size (samples)   Accuracy (%)
C1 - Z-axis     KNN         10                   40                      91.95
C1 - Z-axis     KNN         20                   80                      90.70
C1 - Z-axis     NB          20                   80                      90.70
C1 - Z-axis     NB          10                   40                      89.66
C3 - Z-axis     KNN         20                   80                      89.53
C3 - all axes   KNN         10                   40                      88.64
C3 - Z-axis     KNN         10                   40                      88.50
For this evaluation, it is observed that the combination C1 with KNN and BN using
sampling rates of 10Hz and 20Hz and window lengths of 4 seconds gave the best accuracy.
This gave an estimation of how well these combinations will perform in a real
implementation.
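The per-movement observations in this section (lower accuracy for the stairs movements, almost 100% for the others) rely on breaking the overall accuracy down by movement. A minimal Python sketch of such a breakdown is given below. The function name and the label strings in the example are hypothetical and do not reproduce the actual test data.

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Return ({label: accuracy in %}, overall accuracy in %)."""
    totals, hits = Counter(true_labels), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            hits[truth] += 1
    by_class = {label: 100.0 * hits[label] / n for label, n in totals.items()}
    overall = 100.0 * sum(hits.values()) / len(true_labels)
    return by_class, overall

# Illustrative labels only: one "stairs up" window is mistaken for "walking".
truth = ["walking", "walking", "stairs up", "stairs up", "sitting"]
guess = ["walking", "walking", "walking", "stairs up", "sitting"]
by_class, overall = per_class_accuracy(truth, guess)
```

In this toy example the overall accuracy is 80%, while the breakdown shows that the errors are confined to the "stairs up" movement.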
6. Conclusions
In our work we investigated the possibility of using the accelerometer sensor in a Nokia
N95 8GB smartphone to recognize common movements, i.e. walking, standing, sitting,
walking up and down the stairs. From the results of the evaluation, it is observed that such a
smartphone can provide accurate movement recognition. In contrast to the higher rates used
in previous work, sampling rates of 10Hz and 20Hz are sufficient to achieve good accuracy
(>90%) with the combination of only the mean and standard deviation of the accelerometer
values. The
combination of sampling rates with window sizes of 2 and 4 seconds gave higher accuracies
than smaller window sizes. In cases where the phone is placed in a trouser pocket, even if
the position is not firmly fixed, it is possible to obtain accuracy up to 96.5% using the
algorithm KNN.
The experiments gave promising results. It will be interesting to
implement classification algorithms such as KNN and DT on the smartphone for
learning and real-time recognition. Additional movements should also be investigated in
order to cover more aspects of daily activities.
Acknowledgement
The authors would like to acknowledge the German Federal Ministry of Education and
Research (BMBF) for funding the project MATRIX (Förderkennzeichen 01BS0802). The
authors are responsible for the content of the publication.
This research has been supported by the VENUS project, which is a research project of
Kassel University, funded by the State of Hesse as part of the program for excellence in
research and development (LOEWE). For additional information please go to:
http://www.iteg.uni-kassel.de/.
References
[1] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” Pervasive 2004, pp.
1–17, April 2004.
[2] N. Kern, B. Schiele, and A. Schmidt, “Recognizing context for annotating a live life recording,” Personal
Ubiquitous Comput., vol. 11, no. 4, pp. 251–263, 2007.
[3] K. V. Laerhoven and O. Cakmakci, “What shall we teach our pants?,” in ISWC ’00: Proceedings of the
4th IEEE International Symposium on Wearable Computers, (Washington, DC, USA), p. 77, IEEE
Computer Society, 2000.
[4] J. Mantyjarvi, J. Himberg, and T. Seppanen, “Recognizing human motion with multiple acceleration
sensors,” in Systems, Man, and Cybernetics, 2001 IEEE International Conference on, vol. 2, pp. 747–752,
2001.
[5] Y. Cui, J. Chipchase, and F. Ichikawa, “A cross culture study on phone carrying and physical
personalization,” in HCI (10) (N. M. Aykin, ed.), vol. 4559 of Lecture Notes in Computer Science, pp.
483–492, Springer, 2007.
[6] C. Frank, P. Bolliger, F. Mattern, and W. Kellerer, “The sensor internet at work: Locating everyday items
using mobile phones,” Pervasive and Mobile Computing, vol. 4, pp. 421–447, June 2008.
[7] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, “Activity recognition from accelerometer data,”
American Association for Artificial Intelligence, 2005.
[8] E. M. Tapia, S. S. Intille, W. Haskell, K. Larson, J. Wright, A. King, and R. Friedman, “Real-Time
recognition of physical activities and their intensities using wireless accelerometers and a heart rate
monitor,” in Wearable Computers, 2007 11th IEEE International Symposium on, pp. 37–40, 2007.
[9] J. Lester, T. Choudhury, and G. Borriello, “A practical approach to recognizing physical activities,” in
Proceedings of the 4th International Conference, PERVASIVE 2006, Dublin, Ireland, May 7-10, 2006.
[10] Y. Cho, Y. Nam, Y. Choi, and W. Cho, “SmartBuckle: human activity recognition using a 3-axis
accelerometer and a wearable camera,” in Proceedings of the 2nd International Workshop on Systems
and Networking Support for Health Care and Assisted Living Environments, (Breckenridge, Colorado),
pp. 1–3, ACM, 2008.
[11] M. Mladenov and M. Mock, “A step counter service for java-enabled devices using a built-in
accelerometer,” in CAMS ’09: Proceedings of the 1st International Workshop on Context-Aware
Middleware and Services, (New York, NY, USA), pp. 1–5, ACM, 2009.
[12] G. Bieber, J. Voskamp, and B. Urban, “Activity recognition for everyday life on mobile phones,” in HCI
(6) (C. Stephanidis, ed.), vol. 5615 of Lecture Notes in Computer Science, pp. 289–296, Springer, 2009.
[13] T. Brezmes, J. Gorricho, and J. Cotrina, “Activity recognition from accelerometer data on a mobile
phone,” in Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part
II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient
Assisted Living, (Salamanca, Spain), pp. 796–799, Springer-Verlag, 2009.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining
software: An update,” SIGKDD Explorations, vol. 11, 2009.