Forensic Analysis of WhatsApp Messenger on Android Smartphones
arXiv:1507.07739v1 [cs.CR] 28 Jul 2015
Cosimo Anglano
DiSIT - Computer Science Institute,
Università del Piemonte Orientale, Alessandria (Italy)
email:cosimo.anglano@uniupo.it
Abstract
We present the forensic analysis of the artifacts left on Android
devices by WhatsApp Messenger, the client of the WhatsApp instant
messaging system. We provide a complete description of all the arti-
facts generated by WhatsApp Messenger, we discuss the decoding and
the interpretation of each one of them, and we show how they can be
correlated together to infer various types of information that cannot
be obtained by considering each one of them in isolation.
By using the results discussed in this paper, an analyst will be
able to reconstruct the list of contacts and the chronology of the mes-
sages that have been exchanged by users. Furthermore, thanks to the
correlation of multiple artifacts, (s)he will be able to infer informa-
tion like when a specific contact has been added, to recover deleted
contacts and their time of deletion, to determine which messages have
been deleted, when these messages have been exchanged, and the users
that exchanged them.
1 Introduction
The introduction of sophisticated communication services over the Internet,
allowing users to exchange textual messages, as well as audio, video, and
image files, has changed the way people interact with one another. The usage
of these services, broadly named instant messaging (IM), has undoubtedly
exploded in the past few years, mainly thanks to the pervasiveness of
smartphones, which provide quite sophisticated IM applications. Smartphones
indeed enable users to exploit their data connection to access IM services
anywhere and anytime, thus eliminating the costs usually charged by mobile
operators for similar services (e.g., for SMS communication).
Given their popularity, IM services are being increasingly used not only
for legitimate activities, but also for illicit ones [20]: criminals may indeed use
them either to communicate with potential victims, or with other criminals
to escape interception [3]. Therefore, IM applications have the potential of
being a very rich source of evidentiary information in most investigations.
Among IM applications for smartphones, WhatsApp [24] is regarded as
the most widespread one (reportedly [25], it has over 400 million active
users who exchange, on average, more than 31 billion messages per day, 325
million of which are photos [12]). Given its recent acquisition by
Facebook, it is reasonable to expect a further growth in its diffusion. Therefore,
the analysis of WhatsApp Messenger, the client of WhatsApp that runs on
smartphones, has recently raised the interest of the digital forensics
community [19, 10, 21].
In this paper we deal with the forensic analysis of WhatsApp Messenger
on Android smartphones. Android users, indeed, arguably represent the
largest part of the user base of WhatsApp: as of Jan. 2014, the Google Play
Store reports between 100 and 500 million downloads (the lower limit
having already been reached in Nov. 2012), out of a population of 400
million users. Thus, by focusing on the Android platform, we maximize
the potential investigative impact of our work.
Several recent works in the literature [19, 10] deal with the
same problem. However, as discussed later, these works are limited in scope,
as they focus only on the reconstruction of the chronology of exchanged
2 Related works
The forensic analysis of IM applications on smartphones has been the subject
of various works published in the literature.
Compared with existing works, however, our contribution (a) has a wider
scope, as it considers all the artifacts generated by WhatsApp Messenger
(namely, the database of contacts, the log files, the avatar pictures, and the
preference files), (b) presents a more thorough and complete analysis of these
artifacts, and (c) explains how these artifacts can be correlated to deduce
various types of information having an evidentiary value, such as whether a
message has been actually delivered to its destination after having been sent,
whether a user joined or left a group chat before or after a given time, and
when a given user has been added to the list of contacts.
[8] focus on the forensic analysis of three IM applications (namely AIM,
Yahoo! Messenger, and Google Talk) on the iOS platform. Their work
differs from ours in both the IM applications and the smartphone platform
it considers.
users are still blocked, and which ones have been instead unblocked. It is
worth pointing out that the above inferences can be made only if the log files
reporting blocking and unblocking events are still available (i.e., they have
not been deleted by WhatsApp Messenger to create room for newer ones).
As a final consideration, we note that no information whatsoever is stored
on the side of the contact that gets blocked, so it is not possible to tell whether
the user of the device under analysis has been blocked or not by any of
his/her contacts.
By examining these records, we note that (a) all the messages have been
exchanged with the same contact 39348xxxxxxx (they all store the same
WhatsApp ID in the key_remote_id field), (b) the conversation has been started
by that contact (key_from_me = ’0’ in record no. 1) with a textual message
whose content was “Message 1” (field data) on Feb. 13th, 2012 06:59:09
(field received_timestamp), and (c) the device owner replied at 07:00:23 of
the same day (field timestamp) with the message corresponding to record no.
2 (key_from_me = ’1’) with content “Reply 1” (field data). The conversation
then continued with another message-reply exchange.
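The reading of these records can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the field names are written with underscores, timestamps are assumed to be Unix times in milliseconds interpreted as UTC, and the reduced table created for the demo (as well as the timestamp value of the received row) is hypothetical rather than taken from a real chat database.

```python
import sqlite3
from datetime import datetime, timezone


def to_utc(ms):
    """Millisecond Unix timestamp -> UTC datetime (the encoding is an assumption)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def chronology(conn):
    """Return (time, direction, contact, text) tuples, oldest first."""
    rows = conn.execute(
        "SELECT key_remote_id, key_from_me, data, timestamp, received_timestamp "
        "FROM messages"
    )
    events = []
    for contact, from_me, text, sent_ms, received_ms in rows:
        if int(from_me) == 1:
            # Sent by the device owner: dated by the timestamp field.
            events.append((to_utc(sent_ms), "sent", contact, text))
        else:
            # Received from the contact: dated by received_timestamp.
            events.append((to_utc(received_ms), "received", contact, text))
    return sorted(events, key=lambda event: event[0])


def ms(year, month, day, hour, minute, second):
    """Demo helper: build a millisecond timestamp from a UTC date and time."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


# Demo on an in-memory stand-in for the chat database (reduced, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (key_remote_id TEXT, key_from_me INTEGER, data TEXT, "
    "timestamp INTEGER, received_timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        ("39348xxxxxxx", 1, "Reply 1",
         ms(2012, 2, 13, 7, 0, 23), ms(2012, 2, 13, 7, 0, 23)),
        # The timestamp value of this received row is made up for the demo.
        ("39348xxxxxxx", 0, "Message 1",
         ms(2012, 2, 13, 6, 59, 0), ms(2012, 2, 13, 6, 59, 9)),
    ],
)
timeline = chronology(conn)
for when, direction, contact, text in timeline:
    print(when.strftime("%Y-%m-%d %H:%M:%S"), direction, contact, text)
```

Even though the reply is inserted first, the sketch lists the received message before it, because each record is dated by the field that the discussion above associates with its direction.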
From these records, we also note that each message carries its own unique
identifier in the key_id field: this value, set by the sender, is obtained by
concatenating the timestamp corresponding to the last start time of WhatsApp
Multimedia files When the user sends a multimedia file, several activities
take place automatically (i.e., without informing the involved users). First,
WhatsApp Messenger copies the file into the folder listed in Table 1, row 8.
Then, it uploads the file to the WhatsApp server, which sends back the URL
of the corresponding location. Finally, the sender sends to the recipient a
message containing this URL and, upon receiving this message, the recipient
sends an acknowledgment back to the sender.
When these steps have been completed, the sender stores in his/her
messages table a record like the one shown in Fig. 4 (where we show only
the fields related to message contents that are listed in Table 4). As can
be seen from the figure, the type of the file is indicated (besides the
wa_media_type field, not shown in the figure) by the media_mime_type field
(’image/jpeg’ in the example). Its name is instead stored in the media_name
field (IMG-20131021-WA0000.jpg in the example), its size in bytes in the
media_size field (40267 in the example), and its thumbnail in the raw_data
field (as a blob, i.e. a binary large object). Furthermore, the media_url field
stores the URL of the location on the central server where the file has been
temporarily stored, whose last part (highlighted in Fig. 4 by framing it)
corresponds to the name given by the server to that file. Finally, the
base64-encoded SHA-256 hash of the transmitted file is stored in the
media_hash field.
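Since the hash is stored as a base64-encoded SHA-256 digest, a file found on the device can be matched against a record by recomputing the digest in the same encoding. The following Python sketch illustrates the idea; the function names are hypothetical, the folder is a throwaway stand-in for the media folder of an acquired image, and the second file name is invented for the demo.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def media_hash_of(path):
    """Base64-encoded SHA-256 digest of a file's contents."""
    digest = hashlib.sha256(Path(path).read_bytes()).digest()
    return base64.b64encode(digest).decode("ascii")


def find_by_media_hash(folder, media_hash):
    """Return the files under folder whose digest equals media_hash."""
    wanted = media_hash.strip()
    return [
        path
        for path in sorted(Path(folder).rglob("*"))
        if path.is_file() and media_hash_of(path) == wanted
    ]


# Demo on a temporary folder standing in for the media folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "IMG-20131021-WA0000.jpg").write_bytes(b"bytes of the transmitted picture")
    Path(folder, "IMG-20131021-WA0001.jpg").write_bytes(b"bytes of another picture")
    # In a real analysis this value would be read from the media_hash field.
    recorded = media_hash_of(Path(folder, "IMG-20131021-WA0000.jpg"))
    matches = [path.name for path in find_by_media_hash(folder, recorded)]
print(matches)
```

Reading each file fully into memory keeps the sketch short; for large media files a chunked read feeding `hashlib.sha256().update()` would be preferable.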
to those stored by the sender (in particular, wa_media_type, media_mime_type,
media_size, raw_data, and media_hash). Conversely, the content of media_url
is different, except for the name given to the file by the server (highlighted
in Fig. 5 by framing it).
Unlike on the sender side, however, the media_name field is empty, so the
local name given by WhatsApp Messenger to that file is unknown. The file can
however be identified by comparing the SHA-256 hash stored in the
corresponding record with that of all the files that have been received (that are
Contact cards Messages carrying contact cards (extracted from the phonebook
of the sender) correspond – both on the sender and on the recipient
side – to messages records that store the transmitted information (in vCard
format) in the data field, and the name given by the sender to that contact
in the media_name field. An example of such a record is shown in Fig. 6.
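To inspect the content of the data field of such a record, the vCard text can be split into its properties. The sketch below is a deliberately minimal Python parser written for illustration: it handles folded lines and drops property parameters, but it is not a complete vCard implementation, and the sample card (name and phone number) is entirely invented.

```python
def parse_vcard(text):
    """Return {PROPERTY: [values]} for one vCard; parameters after ';' are dropped."""
    unfolded = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and unfolded:
            # A line starting with whitespace continues the previous one.
            unfolded[-1] += line[1:]
        elif line.strip():
            unfolded.append(line)
    card = {}
    for line in unfolded:
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not a "NAME:value" line, skip it
        prop = name.split(";", 1)[0].upper()
        card.setdefault(prop, []).append(value)
    return card


# Invented sample standing in for the content of the data field.
SAMPLE = "\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    "N:Doe;John;;;",
    "FN:John Doe",
    "TEL;TYPE=CELL:+00 000 0000000",
    "END:VCARD",
])
card = parse_vcard(SAMPLE)
print(card["FN"], card["TEL"])
```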
actually delivered to its recipients. As a matter of fact, after the user has
pushed the “send” button of WhatsApp Messenger, the message can be in one
of the following three states: (a) waiting on the local device to be transmitted
to the central server, or (b) stored on the central server but waiting to be
transmitted to its recipient(s), or (c) delivered to its recipient(s).
The ability to distinguish the various states of a message may be crucial
in an investigation where it must be ascertained whether a message has been
actually delivered or not to its destinations, and when such a delivery has
taken place.
The current state of a message, as well as the times of its state changes,
can be understood by correlating the values contained in several fields of
the corresponding record of the sender database (see footnote 3), namely
status, timestamp, received_timestamp, receipt_server_timestamp, and
receipt_device_timestamp.
To explain, let us consider a scenario in which a user sends a message
when both he/she and the recipient are offline (Fig. 8(a)), then the sender
gets reconnected to the network while the recipient is still offline (Fig. 8(b)),
and then, finally, the recipient also gets connected (Fig. 8(c)).
When the message is sent, a record is stored in the messages table of
the sender, even if the central server is unreachable. In this case, as shown
in Fig. 8(a), the record has status=’0’, timestamp=’x’, and
received_timestamp=’y’, where ’x’ and ’y’ correspond to when the user has
sent the message and when the record has been added to the chat database,
respectively.
Thus, a record such that key_from_me=’1’ and status=’0’ corresponds to
a message that has not been delivered to the central server yet.
Footnote 3: For a recipient, a message can only be in the received state,
corresponding to status=’0’.
Later, when the sender returns online, the message is forwarded to the
central server, which replies with an ack. When this ack is received, the sender
updates the corresponding record as shown in Fig. 8(b) by setting status=’4’,
and the value of receipt_server_timestamp to the reception time of the ack.
Thus, a record such that key_from_me=’1’ and status=’4’ corresponds to
a message that has been delivered to the central server, but not yet to its
destination(s).
Finally, when the recipient returns online, it receives the message from
the central server, and sends an ack to the sender. Upon receiving this ack,
the sender updates again the record corresponding to that message (as shown
in Fig. 8(c)) by setting status=’5’, and the value of receipt_device_timestamp
to the reception time of the ack.
Thus, a record such that key_from_me=’1’ and status=’5’ corresponds to
a message that has been delivered to its destination.
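The three rules above, together with the remark that a received message always has status=’0’, can be condensed into a small lookup. The Python sketch below is illustrative only: the function name and the state labels are hypothetical, and any status value not discussed above is reported as unknown rather than guessed.

```python
def delivery_state(key_from_me, status):
    """Map the key_from_me and status fields of a record to a delivery state."""
    key_from_me, status = int(key_from_me), int(status)
    if key_from_me == 0:
        # On the recipient side only the received state is described.
        return "received" if status == 0 else "unknown"
    sender_states = {
        0: "waiting on the local device",
        4: "stored on the central server",
        5: "delivered to the recipient",
    }
    return sender_states.get(status, "unknown")


for fields in [(1, 0), (1, 4), (1, 5), (0, 0)]:
    print(fields, "->", delivery_state(*fields))
```

The function accepts the values either as integers or as the quoted digits used in the figures, since both are converted with `int()`.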
From the above discussion, it follows that the times of the state changes of
a message can be tracked by means of the values stored in the various
timestamp fields of the corresponding record. For instance, in the case in Fig. 8(c),
we can deduce that the message was generated on Oct. 16th, 2013
14:15:37.884 (timestamp field), waited to be transmitted to the central
server until 14:17:05.551 of the same day (receipt_server_timestamp field),
and was finally delivered to its recipient at 14:21:59.135
(receipt_device_timestamp field).
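As a worked example, the time spent by this message in each state can be computed directly from the three values quoted above. The Python sketch below uses the human-readable times as given in the text; how these values are encoded in the database is not assumed here, and the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

generated = datetime.strptime("2013-10-16 14:15:37.884", FMT)   # timestamp
server_ack = datetime.strptime("2013-10-16 14:17:05.551", FMT)  # receipt_server_timestamp
device_ack = datetime.strptime("2013-10-16 14:21:59.135", FMT)  # receipt_device_timestamp

wait_on_device = server_ack - generated   # until the server acknowledged the message
wait_on_server = device_ack - server_ack  # until the recipient's ack reached the sender

print(wait_on_device.total_seconds())  # 87.667
print(wait_on_server.total_seconds())  # 293.584
```

In this example the message therefore waited about 1 minute and 28 seconds on the device and about 4 minutes and 54 seconds on the server, measured at the times the sender received the two acks.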
5 Conclusions
In this paper we have discussed the forensic analysis of the artifacts left
by WhatsApp Messenger on Android smartphones, and we have shown how
these artifacts can provide much information of evidentiary value. In
particular, we have shown how to interpret the data stored in the contacts and
chat databases in order to reconstruct the list of contacts and the chronology
of the messages that have been exchanged by users.
More importantly, we have also shown the importance of correlating
the artifacts generated by WhatsApp Messenger with one another in order to
gather information that cannot be inferred by examining them in isolation.
As a matter of fact, while the analysis of the contacts database makes it
possible to reconstruct the list of contacts, the correlation with the events
stored in the log files maintained by WhatsApp Messenger allows the
investigator to infer also when a specific contact has been added, or to recover
deleted contacts and their time of deletion. Similarly, the correlation of the
References
[1] AccessData Corporation. FTK Imager, 2013. Available at
http://www.accessdata.com/support/product-downloads.
[3] Steven M. Bellovin, Matt Blaze, Sandy Clark, and Susan Landau. Lawful
Hacking: Using Existing Vulnerabilities for Wiretapping on the Internet.
In Proc. of Privacy Legal Scholars Conference, June 2013. Available
at http://dx.doi.org/10.2139/ssrn.2312107.
[21] Yu-Cheng Tso, Shiuh-Jeng Wang, Cheng-Ta Huang, and Wei-Jen Wang.
iPhone Social Networking for Evidence Investigations Using iTunes
Forensics. In Proceedings of the 6th International Conference on
Ubiquitous Information Management and Communication, ICUIMC ’12, New
York, NY, USA, 2012. ACM.
[22] Petr Vanek and Kamil Les. Sqlite Databases Made Easy, 2013. Available
at http://sqliteman.com.
[23] Timothy Vidas, Chengye Zhang, and Nicolas Christin. Towards a General
Collection Methodology for Android Devices. Digital Investigation,
8, Aug 2011.
[25] Rolfe Winkler. WhatsApp Hits 400 Million Users, Wants to Stay
Independent. The Wall Street Journal - Digits, Dec. 2013. Available at
http://blogs.wsj.com/digits/2013/12/19/whatsapp-hits-400-million-users-wants-to-stay-independent.