CAP: A Context-Aware Privacy Protection System For Location-Based Services

Download as pdf or txt
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 9

CAP: A Context-Aware Privacy Protection System for Location-Based Services

Aniket Pingley (George Washington Univ., apingley@gwu.edu)
Wei Yu (Cisco Systems, weyu@cisco.com)
Nan Zhang (George Washington Univ., nzhang10@gwu.edu)
Xinwen Fu (UMass Lowell, xinwenfu@cs.uml.edu)
Wei Zhao (Rensselaer Polytechnic Institute, zhaow3@rpi.edu)

Abstract

We address issues related to privacy protection in location-based services (LBS). Most existing research in this field either requires a trusted third-party (anonymizer) or uses oblivious protocols that are expensive in both computation and communication. Our design of privacy-preserving techniques follows the principle of not requiring a trusted third-party while being highly efficient in terms of time and space complexities. The problem has two interesting and challenging characteristics. First, the degree of privacy protection and LBS accuracy depends on the context, such as population and road density, around a user's location. Second, an adversary may violate a user's location privacy in two ways: (i) based on the user's location information contained in the LBS query payload, and (ii) by inferring a user's geographical location based on its device's IP address. To address these challenges, we introduce CAP, a Context-Aware Privacy-preserving LBS system with integrated protection for data privacy and communication anonymity. We have implemented CAP and integrated it with Google Maps, a popular LBS system. Theoretical analysis and experimental results validate CAP's effectiveness on privacy protection, LBS accuracy, and communication Quality-of-Service.

1. Introduction

Location-based service (LBS) provides a user with content customized to the user's current location, such as the nearest restaurants/hotels/clinics, which are retrieved from a spatial database stored remotely in the LBS server. LBS not only serves individual mobile users, but also plays an important role in public safety, transportation, emergency response, and disaster management. With an increasing number of mobile devices featuring built-in Global Positioning System (GPS) technology, LBS has experienced rapid growth in the past few years. According to an ABI Research report [19], the number of GPS-enabled LBS subscribers is projected to reach 315 million by 2013.

A request for LBS can be considered a query over the LBS server's spatial database. For example, a query for the ten nearest four-star hotels can be expressed as the following SQL-like top-k query:

    SELECT TOP 10 FROM Hotel
    WHERE STARRATING = 4
    ORDER BY DISTANCE(Hotel.Location, userLoc) ASC;

where userLoc is the user's location. Note that the user's location is specified as a constant in the ranking function and must therefore be sent along with the query to the LBS server.

Despite the benefits provided by LBS, a user may not be willing to provide its current location to the LBS server due to concerns about location privacy. Such concerns can be attributed to the seriousness of location disclosure and misuse: for example, an adversary may learn a user's political and religious affiliations based on the locations the user regularly visits. In recent years, there have been several reports of individuals and companies abusing LBS to intrude on others' privacy [16].

The objective of a privacy-preserving LBS is to protect the privacy of a user's location while maintaining a high level of LBS accuracy (e.g., the rank of a 4-star hotel in the above example). The problem has received growing attention from the research community. A k-anonymity based framework was proposed to protect location privacy by using a trusted third-party called the anonymizer [11]. With this framework, a user sends its location to the centralized anonymizer, which subsequently generates a k-anonymized [22] cloaking region that covers not only this user, but also k − 1 other users. Then, the anonymizer transmits the cloaking region to the LBS server as the constant in the LBS query, and forwards the query answer to the user. This framework prevents the LBS server from distinguishing a user from at least k − 1 others.

Unfortunately, in real systems, it may be difficult, if not impossible, to find a trusted third-party anonymizer, especially one with a user base large enough to keep the cloaking region small. To the best of our knowledge, the only existing work which removes the requirement of a trusted third-party is a private information retrieval (PIR)-based approach [7]. Nonetheless, this approach has two critical drawbacks. First, it can only be applied to LBS servers which support the PIR-based protocol. Second, as is common for PIR-based techniques, it may incur high computational and communication overhead that is unaffordable for mobile devices and the LBS server¹.

¹ It was shown that PIR may incur even higher communication overhead than an oblivious transfer of the entire server-side database [21]. Such a cost may be prohibitive for the LBS server if it needs to process a large number of LBS queries concurrently.
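To make the role of userLoc concrete, the example query can be evaluated by brute force in a few lines of Python. This is a minimal sketch, not how an LBS server's spatial database executes the query: the Hotel record, the planar Euclidean distance used in place of DISTANCE, and the top_k_hotels function are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hotel:
    name: str
    star_rating: int
    location: tuple[float, float]  # hypothetical planar (x, y) coordinates


def top_k_hotels(hotels: list[Hotel], user_loc: tuple[float, float],
                 k: int = 10, stars: int = 4) -> list[Hotel]:
    """Brute-force evaluation of the example query:
    SELECT TOP k FROM Hotel WHERE STARRATING = stars
    ORDER BY DISTANCE(Hotel.Location, userLoc) ASC
    """
    # WHERE STARRATING = stars
    candidates = [h for h in hotels if h.star_rating == stars]
    # ORDER BY DISTANCE(...) ASC; Euclidean distance stands in for DISTANCE
    candidates.sort(key=lambda h: math.dist(h.location, user_loc))
    # TOP k
    return candidates[:k]


if __name__ == "__main__":
    hotels = [Hotel("A", 4, (0.0, 1.0)), Hotel("B", 4, (5.0, 5.0)),
              Hotel("C", 3, (0.1, 0.1)), Hotel("D", 4, (2.0, 0.0))]
    # The same query gives different answers at different user locations.
    print([h.name for h in top_k_hotels(hotels, (0.0, 0.0), k=2)])  # ['A', 'D']
    print([h.name for h in top_k_hotels(hotels, (5.0, 5.0), k=2)])  # ['B', 'D']
```

Because userLoc appears only in the ranking function, the answer cannot be computed without it: moving userLoc from (0, 0) to (5, 5) changes the result, which is why the location has to accompany the query.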
In this paper, we initiate the investigation of a privacy-preserving technique that is efficient in terms of both time and space complexities, does not require a trusted third-party, and is transparent to the LBS server so that it can be readily deployed with existing LBS systems. Such a technique may have to make a tradeoff between privacy protection and LBS accuracy. Nonetheless, it should provide effective guarantees on both measures.

A straightforward method for efficient privacy protection is to randomly perturb a user's location based on pre-determined noise distributions on longitude and latitude. This method is, in principle, similar to the randomization approach for privacy-preserving data mining [24]. Nonetheless, it is unlikely to suffice for LBS because, with a pre-determined noise distribution, the levels of privacy protection and LBS accuracy largely depend on the “context”, such as road and population density, around a user's location. For example, intuition suggests that, to achieve the same level of privacy and LBS accuracy, a user should (or could) deviate more from its real location in a rural area than in a downtown area.

Thus, a critical challenge for privacy-preserving LBS is to achieve context-aware privacy protection. The existing k-anonymity framework does so by leveraging the anonymizer's global knowledge of user distribution (so that the cloaking region is automatically larger in a rural area, which has fewer users). Without a trusted third-party, we must acquire the context information from other sources. A simple solution is for each mobile device to store a complete topology map and retrieve it before perturbation to compute the context of the surrounding area. However, this may lead to computational and storage overhead that is unaffordable for mobile devices that are not dedicated GPS navigation systems.

In this paper, we introduce CAP, a Context-Aware Privacy-preserving LBS system. The main idea behind CAP is a dimension-reducing projection of every 2-d geographical location to a 1-d space, such that (i) every point in the 1-d space has homogeneous context (e.g., equal road/population density), and (ii) adjacent locations remain close after the projection. We refer to such a projection as a Various-grid-length Hilbert Curve (VHC)-mapping. With CAP, a user first projects its current location to the 1-d space based on VHC-mapping, and then randomly perturbs the 1-d value based on a pre-determined noise distribution. The perturbed value is mapped back to the 2-d space according to VHC-mapping and then transmitted as the user's location to the LBS server.

VHC-mapping is designed to provide guarantees on both privacy protection and LBS accuracy. It is also very efficient in terms of both time and space complexities: the VHC-map itself is computed offline based on a real-world topology map, yet it requires minimal storage space (e.g., our experiments use a VHC-map which is only 1/2000 the size of a topology map) and has a low retrieval cost. The use of perturbation ensures transparency to the LBS server, and enables CAP to be readily integrated into existing LBS systems.

In the design of CAP, we also initiate an investigation of the network anonymity perspective of location privacy. Existing work has shown that a user's location may be derived from its IP address based on public information about base stations' locations and IP addresses [6]. For example, when 802.11b base stations are used, the user may be positioned within a small radius of 50 meters. As such, without a trusted third-party anonymizer, location privacy may be breached not only through an LBS query, but also through the traffic that carries the query. To address this problem, we use Tor [4], a popular anonymous routing network, to hide a user's IP address. Unfortunately, we found that Tor suffers from serious Quality-of-Service (e.g., response time) degradation, which may be unbearable for mobile (e.g., driving) applications that require short response times. To solve the problem, we present a set of new routing algorithms for Tor which reduce latency and maximize throughput.

To the best of our knowledge, CAP is the first real privacy-preserving LBS system that provides an efficient and context-aware solution for both data privacy and communication anonymity without the presence of a trusted third-party. We have implemented CAP on both the SUSE Linux 11.0 and Mac OS X operating systems, and are porting the system to Linux and OS X-based mobile devices. More information about the system implementation can be found at http://seas.gwu.edu/~nzhang10/cap.

The remainder of the paper is organized as follows. In Section 2, we formally specify the problem and present the architecture of CAP. Section 3 is devoted to the development of VHC-mapping. In Section 4, we discuss other design issues of CAP, including the anonymous routing. Section 5 contains a detailed experimental evaluation of CAP. Section 6 discusses the related work. We conclude in Section 7.

2. System Overview of CAP

In this section, we present an overview of CAP, our context-aware privacy-preserving LBS system. The focus is on the system infrastructure of CAP and its performance measures.

2.1. Parties

There are two parties in the system: a user who uses the LBS and a server which provides it. In practice, a user may be a mobile device, such as a laptop, PDA, cell phone, etc., which obtains its location from a positioning device such as a GPS receiver. Examples of LBS servers include point-of-interest search engines such as Google Maps (http://maps.google.com).

The interactions between the two parties can be stated as follows: The user issues an LBS query to the server. The LBS query is a top-k query with the ranking function specified as the distance to the user's current location. After
receiving the LBS query, the server executes it against a spatial database and returns the answer to the user.

Due to privacy concerns, the user is unwilling to disclose its location to the server. Thus, the user's objective is to obtain a relatively accurate LBS query answer without disclosing its real location. The server is supposed to correctly answer the received LBS query. A malicious server, however, aims to compromise the user's location. In this paper, we refer to a malicious server as an adversary.

2.2. System Architecture

Figure 1 illustrates the baseline architecture of CAP. Recall that there are two possible ways for a user's location to be disclosed: through the location information included in the LBS query, or through the user's network (e.g., IP) address. CAP has two components, location perturbing and anonymous routing, designed to eliminate these two disclosure channels, respectively.

The location perturbing component perturbs the user's location included in the LBS query. It also rearranges the results returned by the LBS server based on the original user location, in order to provide better data utility. The anonymous routing component hides the user's network identity by routing the LBS query through relaying nodes in an anonymous communication network, Tor, before sending it to the LBS server.

Figure 1. Baseline Architecture of CAP. [Diagram: a Positioning Device supplies location coordinates to CAP, which consists of the Location Perturbing Component and the Anonymous Routing Component; the perturbed location is sent through the Anonymous Communication Network (Tor) to the LBS.]

2.3. Performance Measures

The performance of a privacy-preserving LBS system should be measured in terms of the accuracy of the LBS query answer, the privacy protection of the user's location, and the communication quality-of-service (e.g., query response time). We define these three measures as follows.

2.3.1. Accuracy Measure. Since an LBS query is essentially a top-k query over a spatial database, we consider accuracy measures for top-k queries. A number of measures have been proposed, including rank distance (i.e., the difference between the returned and the true rank of a returned tuple), true positive rate (i.e., the probability that a tuple in the result is indeed a true top-k tuple), score distance (e.g., the extra distance driven according to the returned tuples), etc. [1]. In the theoretical analysis part of this paper, we adopt rank distance as the accuracy measure. Nonetheless, in the experimental results, we also evaluate other possible measures such as the true positive rate.

Definition 2.1. The average rank distance of a privacy-preserving scheme that perturbs userPos from x to R(x) is

$$l_r(x) = \mathrm{AVG}_{t \in q(x)} \big( |\mathrm{rank}(t, q(x)) - \mathrm{rank}(t, q(R(x)))| \big),$$

where AVG(·) represents the average value, q(x) is the LBS query answer when userPos = x, and rank(t, q(x)) is the rank of tuple t in the returned answer q(x).

2.3.2. Privacy Measure. Our privacy measure is based on the same anonymity standard as k-anonymity. The difference, however, is that our system does not feature a trusted third-party which has a global view of active users. Thus, our measure is defined over the population among which the user is hidden. This is similar to the use of historic footprints of active users for the k-anonymity definition in [23].

Definition 2.2. A privacy-preserving scheme which perturbs userPos from x to R(x) satisfies N-camouflage iff there exists a region C of population at least N such that, for any subregion C' ⊆ C,

$$\Pr\{x \in C' \mid R(x)\} = \frac{|C'|}{|C|},$$

where |·| is the area of the region.

According to the definition, a privacy-preserving scheme satisfies N-camouflage iff no adversary can distinguish between any two locations in a region of population N.

2.3.3. QoS Measure. Since an LBS user may be constantly moving, the overhead of LBS query processing is important for the utility of LBS. Such an overhead is a combination of three parts: the location perturbing component, the random routing protocol of the anonymous routing network, and the query processing at the LBS server. Since CAP is transparent to the LBS server, we discuss the first two parts in this paper.

3. Location Perturbing Based on VHC-Mapping

We focus on the location perturbing component of CAP in this section. We begin by introducing our basic ideas, and then substantiate them by describing VHC-mapping, our main technique for this component.

3.1. Key Idea

Recall that the location perturbing component perturbs a user's position included in an LBS query before sending the query to the LBS server. The objective is to provide “context-aware” perturbation without incurring the cost of storing and retrieving a full-scale topology map in a mobile device. Our key idea is to pre-compute a projection from
the original space (of latitude and longitude) to a new space, such that

• the projection is locality-preserving, i.e., two nearby points in the original space are also close in the projected space, and vice versa,
• all points in the new space have homogeneous “context” (e.g., population density), and
• the projection can be stored in space orders of magnitude smaller than the topology map, and can be efficiently computed.

After projecting a user's location to the new space, we apply homogeneous perturbation to all mapped points in the new space, project the perturbed points back to the original 2-d space, and then output the result as the perturbed location.

Figure 2(a) provides a simple illustration of the projection on 1-d data, where the population density is defined based on 6 people, A to F. In the original space, the population density near B, C, or D is higher than near A, E, or F. The mapping is designed such that every point in the new space has equal density. Thus, the same noise applied to B, C, or D becomes smaller after being mapped back to the original space. This is consistent with our intuition that, in order to provide universal privacy and accuracy guarantees for all locations, less perturbation should be applied to a higher-density area.

3.2. VHC-Mapping

We now introduce Various-size-grid Hilbert Curve (VHC)-mapping, our main technique for the projection to a homogeneous-context space. We first describe the construction of VHC-mapping, and then discuss how it satisfies the three requirements above.

3.2.1. Construction of VHC-mapping. The construction of VHC-mapping must refer to context information such as road or population density. In the design of CAP, we choose road density as input because (i) economic studies show that road and population densities are strongly correlated, following (approximately) a linear relationship [8], and (ii) in practice, road density information is readily available² and usually more accurate than population information. Nonetheless, our design of VHC-mapping can be easily adapted to population density.

² To calculate the road density of an area, we use the information provided by the US Census Bureau Topological Integrated Geographic Encoding and Referencing (TIGER) system, which contains information about roads for every county in the US.

Without loss of generality, we consider the original 2-d latitude/longitude space as a square. VHC-mapping involves a recursive partitioning of the square into various-size cells according to context information. Each cell is either partitioned into 4 equal-size square cells, or not (further) partitioned (i.e., it becomes a base cell), based on the following rule:

Min-Density Rule: Partition a cell into 4 equal-size subcells iff the total road length (in the original space) covered by the cell is at least µ times the edge length of the cell, where µ > 1 is a pre-determined granularity ratio.

An example of the partitioning result is shown in Figure 2(b). One can see that the base cells have three possible sizes. According to the min-density rule, a larger base cell represents an area with lower road density.

After the partitioning process, we construct the mapped 1-d space as a variation of the Hilbert space-filling curve [17] that connects all various-size cells in the original 2-d space. Figure 2(b) depicts an example of such a Hilbert curve, while Figure 2(c) demonstrates a real implementation on the map of Baltimore, MD with granularity ratio µ = 20.

The VHC-mapping is then constructed as follows: A 2-d point in the original space is mapped to its (geographically) nearest point on the Hilbert curve. A 1-d point, after being perturbed by additive noise, is mapped back to the original space by randomly selecting a 2-d point which can be mapped to the 1-d perturbed point.

Figure 2. Examples of VHC-Mapping: (a) 1-d example, with points A to F in the original space and A' to F' in the projected space; (b) illustration; (c) Baltimore, MD.

3.2.2. Justification. We now explain how VHC-mapping satisfies the three requirements outlined in Section 3.1: (i) locality preservation, (ii) constant density, and (iii) efficiency of storage and retrieval.

First, a well-known property of the Hilbert curve is that it is locality-preserving, i.e., two adjacent points in the projected space are likely to be close in the original space. Thus, VHC-mapping satisfies the locality-preserving requirement.

Next, for the constant-density requirement, there are two key observations. First, due to the min-density rule, the total road length covered by each base cell is at most µ times the edge length of the cell. Second, due to our construction of the VHC, the length of the Hilbert curve covered by a base cell is approximately the same as the edge length of the cell. As such, intuitively, every point on the Hilbert curve (i.e., in the projected space) can be considered as corresponding to about µ points on the roads in the original space. Thus, the road density is approximately constant for all points in the projected space. This fulfills the constant-density requirement.

We now consider the third requirement, on the efficiency of storing and conducting VHC-mapping. VHC-mapping can be stored as a 4-tree based on the partitioning of the original space, where each node is either a leaf node (if
corresponding to a base cell) or has 4 children (if further partitioned). Figure 3(b) depicts an example of such a 4-tree for the VHC-mapping in Figure 3(a). One can see from the figure that base cells of different sizes correspond to leaf nodes at different layers of the tree.

Figure 3. 4-Tree for Storage of VHC-Mapping: (a) a VHC-mapping; (b) the corresponding 4-tree; (c) the bit encoding of the tree (0 / 0101 / 01111111 / 1111).

Since each node either is a leaf node or has 4 children, we only need to store one bit per node to indicate whether it is a leaf. Figure 3(c) shows an example of this encoding scheme for the tree in Figure 3(b). Since a 4-tree with n leaf nodes has at most 4n/3 nodes in total, the space required by the serialized map file is at most 4n/3 bits.

Based on the 4-tree, VHC-mapping can be retrieved and used as follows. First, we reconstruct the 4-tree from the serialized map file and traverse every leaf node to assign its corresponding range in the 1-d projected space. In particular, a leaf node at level i corresponds to a range of length $d/2^{i}$, where d is the edge length of the entire map. This step has time complexity O(n). Then, we can conduct VHC-mapping by searching for the leaf node (i.e., base cell) that contains the original 2-d location, in O(log n) time. The inverse mapping of a 1-d location in the projected space back to the original space can be done through a binary search on all leaf nodes, also in O(log n) time.

3.3. Algorithms for VHC-Mapping

We now present two detailed algorithms for our approach: one for the offline construction and storage of VHC-mapping, and the other for the online retrieval of VHC-mapping and the perturbation of a user's location.

Algorithm VHC-Build: Offline Construction
Require: Map, C as the (rectangle) boundary of the map
1: Store C‖BUILDTREE(C) as the VHC-mapping file hcFile.
2: function BUILDTREE(C)
3:   if total road length in C ≥ µ · edge length of C then
4:     Partition C equally into Cnw, Cne, Cse, Csw.
5:     s ← 0                                  ▷ bit 0 marks an internal node
6:     for i = nw, ne, se, sw do s ← s‖BUILDTREE(Ci)
7:     return s
8:   else
9:     return 1                               ▷ bit 1 marks a leaf (base cell)
10:  end if
11: end function

Algorithm VHC-Build depicts the offline construction and storage of VHC-mapping. In the algorithm, we use ‖ to represent the concatenation operation. We partition the original map based on the min-density rule (Line 3) and store the 4-tree into a bit stream BUILDTREE(C) (Line 6).

Algorithm VHC-Perturb: Online Location Perturbation
Require: Pre-computed VHC-mapping file hcFile
1: Load a 4-tree T of the partition from hcFile and assign the 1-d value range for each base cell.
2: Wait until receiving userPos for perturbation.
3: Find the mapped value F(userPos) based on the 1-d value range of the base cell which contains userPos.
4: Generate random noise r according to the uniform distribution on [−σ, σ].
5: Compute $R(userPos) = F^{-1}(F(userPos) + r)$ by searching for the base cell which contains the 1-d value F(userPos) + r. Output R(userPos).
6: Goto 2

Algorithm VHC-Perturb depicts the online retrieval of VHC-mapping for the perturbation of a user's location. Given a 2-d location userPos, we map it to the 1-d point F(userPos), add a homogeneous noise r, and use $F^{-1}(F(userPos) + r)$ as the perturbed location. Note that r is generated from a pre-determined distribution.

Algorithm VHC-Build is executed offline and has computational complexity O(n). The computational complexity of Algorithm VHC-Perturb is O(n) for the retrieval of the VHC-mapping file (i.e., Line 1) and O(log n) for the perturbation of each location.

4. Discussion

In a practical LBS, a mobile's request should be served in a timely fashion. Otherwise, the answer may no longer be useful because the mobile has already left the location where the request was made. Recall from Figure 1 that the anonymous communication network also contributes to the overhead of LBS query processing. We now discuss how to tune the communication QoS of the anonymous routing component.

In CAP, Tor [4] is used for anonymous communication between clients and servers. The challenge of tuning Tor for an LBS system is how to optimize its QoS while preserving anonymity. Tor has suffered serious performance degradation because of its random path selection algorithms [18]. Tor is an overlay network on the Internet providing anonymous communication. Within the Tor network, to browse a web server while hiding the connection, a client chooses a series of Tor routers from the Tor router directory. The ordered sequence of Tor routers is called a path, and the number of Tor routers is the path length. The client negotiates session keys with the chosen routers, one by one, using the Diffie-Hellman handshake protocol, and forms a circuit.

The client packs application data into cells that are transmitted over the circuit. Therefore, a chain of TCP connections is used to relay packets from the source to the
destination. Since Tor routers use bandwidth donated by users, who may limit it using the leaky bucket mechanism, the end-to-end throughput is limited by the bottleneck segment [13]. We found that, despite Tor's weighted bandwidth path selection algorithms, there is a high probability that a node with poor bandwidth is chosen, because of the large number of small-bandwidth Tor routers.

We propose differential QoS in the Tor network in order to improve QoS. The Tor network could be partitioned into classes of Tor routers with high or low donated bandwidth. Paths drawn from the class of high-bandwidth routers can provide better performance. Paths can be chosen for flow requests based on each flow request's priority. In this way, high-priority flows (e.g., LBS query requests and responses) obtain high bandwidth and low-priority flows obtain low bandwidth. As long as users' requirements can be met with differential QoS, this makes more effective use of bandwidth.

Therefore, the anonymous routing component in Figure 1 controls Tor's routing in order to achieve differential QoS for Tor clients. We have implemented two simple path selection algorithms that favor differential QoS in the Tor network. The first, shown in Algorithm 1, provides differentiated routing with two priorities. This algorithm can be easily extended to support more than two priorities. To obtain better QoS for LBS, a mobile client can choose the top priority, in which case the user requires a path throughput greater than or equal to MinBW.

Algorithm 1 Differentiated Routing (Diff)
Require: User-specified minimum path throughput MinBW
1: Build a pool of Tor nodes whose bandwidth is greater than or equal to MinBW.
2: Use a weighted random algorithm to build a circuit through the pool. Record the Tor nodes used in existing circuits; future circuits will not use those Tor nodes.

The actual path throughput under Algorithm 1 may be much lower than MinBW because of congestion on the Internet, as numerous flows share the Tor nodes worldwide. To overcome this problem, the second routing algorithm we propose, Diff/CA (Algorithm 2), adds congestion avoidance. Recall that Tor can create circuits proactively and wait for user connections. To avoid congestion, Diff/CA creates circuits proactively, measuring the path throughput until it meets the bandwidth requirement. This incurs a delay in circuit creation. Our experiments show that the delay is within a reasonable range.

Algorithm 2 Differentiated Routing with Congestion Avoidance (Diff/CA)
Require: User-specified minimum path throughput capacity MinBW and tolerable throughput TolBW.
1: Build a pool of Tor nodes whose bandwidth is greater than or equal to MinBW.
2: Use a weighted random algorithm to build a circuit through the pool. Measure the circuit throughput, building new circuits until one whose throughput is greater than or equal to TolBW is found.

5. Experimental Results

In this section, we present the implementation and experimental evaluation of CAP. We first introduce the implementation and the experimental setup, and then present the results for the location perturbing and anonymous routing components, respectively.

5.1. Experimental Setup

We have implemented a prototype CAP system for the Mac OS X and Linux operating systems with support for GPS and integration with Tor. The positioning device we used is a SiRF Star III GPS receiver connected to the laptop via a USB interface [3]. The location perturbing component of CAP was implemented using C++ and the Boost library. The Qt library and Google Maps APIs (http://code.google.com/apis/maps/) were used for GUI development to demonstrate the integration of CAP with existing LBS systems. For the anonymous routing component, we revised Tor version 0.1.1.26. The mobile client is connected to the Internet via the 802.11b protocol. The LBS server runs on a desktop machine with a 3.2 GHz Intel Core Duo CPU, 3 GB RAM, and the SUSE 10.3 operating system.

We performed our experiments on the map of Middlesex County, Massachusetts, USA. The map was retrieved from the 2006 second edition of the Topological Integrated Geographic Encoding and Referencing (TIGER) system published by the US Census Bureau. The map can be downloaded as a zipped TIGER/Line file from http://www2.census.gov/geo/tiger/tiger2006se/MA. We downloaded 800 POIs, including restaurants, hotels, clinics, and supermarkets in the county, from http://www.gps-data-team.com/poi/. We randomly selected 1000 different coordinate points (latitude and longitude), lying in areas with varying road densities (e.g., downtown, rural areas, suburbs, etc.), as possible user locations.

5.2. Evaluation of Location Perturbing Component

Recall that the “Online Location Perturbation” algorithm uses random noise generated from the uniform distribution on [−σ, σ]. We tested the performance of the location perturbing component by varying the noise parameter σ. We also tested the storage requirements by varying the granularity ratio µ (recall the min-density rule from Section 3).

To test against locations with diverse road densities, we define the road density index of a location as the level of the leaf node that contains this location (the root has level 1). The depth of the tree is 13 when µ = 8, which is used in
most experiments. Generally, the road density increases exponentially with the road density index.

[Figure 4. 2-d perturbation distance vs. σ: 2D perturbation distance (miles) against noise parameter σ (0.01 to 2), one curve per road density index (5, 7, 9, 11, 13).]
[Figure 5. Naive technique vs. VHC-mapping: 2D perturbation distance (miles) against road density index (5 to 13).]
[Figure 6. lr (Top-10) vs. σ: rank distance against noise parameter σ, one curve per road density index.]
[Figure 7. lr (Top-10) vs. Road Density Index: rank distance against road density index, one curve per noise range, from [−0.01, 0.01] to [−2.0, 2.0].]
[Figure 8. True Positive Rate (Top-10) vs. σ: true positive rate against noise parameter σ, one curve per road density index.]
[Figure 9. True Positive Rate (Top-10) vs. Road Density Index: true positive rate against road density index, one curve per noise range.]
[Figure 10. Extra miles vs. σ: extra traveling distance (miles) against noise parameter σ, one curve per road density index.]
[Figure 11. Extra miles (%) vs. σ: extra traveling distance (%) against noise parameter σ, one curve per road density index.]
[Figure 12. Binary map file size vs. µ: binary map file size (bits) against granularity ratio µ (5 to 10), for Middlesex County, MA and the District of Columbia.]

Figure 4 depicts the relationship between the average 2-d perturbation distance DISTANCE(userPos, R(userPos)) and the noise parameter σ for locations with various road densities. The 2-d perturbation distance is the Euclidean distance between the original and perturbed locations. We also tested with the Manhattan distance [12] and obtained similar results. As we can see, the 2-d perturbation distance for a rural location (road density index = 5) is much greater than that for a downtown location (road density index = 13). This is consistent with the design goal that a rural location merits a larger perturbation than a downtown location.

Figure 5 compares VHC-mapping with a naive technique that uses universal random noise to perturb a user's location, regardless of its context. We used the same noise parameter value, σ = 2, for both techniques. One can clearly observe that, in contrast with the naive technique, VHC-mapping applies context-aware perturbation, i.e., higher perturbation is applied to locations with lower road density.

We evaluated the accuracy of the location perturbing component for a scenario where we issue a top-10 query for the nearest POI. Figures 6 and 7 depict the relationship between the degree of LBS accuracy lr and the noise parameter σ for locations with various road densities. In both figures, we can make two observations: First, lr (measured as a rank distance, so larger values mean lower accuracy) increases with σ. Second, there is no significant difference in LBS accuracy among locations with different road density indices. Similar observations can be made from Figures 8 and 9, where the LBS accuracy measure is the true positive rate of the returned top-10 results.

To estimate the real-world experience of CAP users, we consider the additional distance a user travels to reach the returned nearest POI, compared with the real nearest POI. Figure 10 depicts the relationship between the extra miles to be traveled and the noise parameter σ. It can be observed that a user has to travel more when a higher level of privacy protection is desired. However, it would be more useful to have guarantees on the extra distance to be traveled for a given desired level of privacy. In other words, we are interested in the extra miles to be traveled as a fraction of the 2-d perturbation distance. This is depicted in Figure 11: for all values of the noise parameter σ, the extra miles to be traveled are approximately 65% of the 2-d perturbation distance.

Recall that the granularity ratio µ controls the size of the 4-tree (i.e., the VHC-mapping), and thus the size of the binary map file. Figure 12 depicts the relationship between the storage cost of the 4-tree / binary map file and the granularity ratio µ. As we can see, the storage cost decreases exponentially as µ increases. In particular, when µ = 10, we need only 5000 bits for Middlesex County and 500 bits for the District of Columbia to store the 4-tree / binary map file, much less than the size of the original TIGER/Line map. Retrieving the VHC-map takes less than 0.1 seconds in our system, and perturbing a user's location takes less than 1 millisecond.

5.3. Evaluation of Anonymous Routing Component

We evaluated the communication QoS achieved by the anonymous routing component of CAP. Figure 13 depicts the cumulative distribution function (CDF) and probability density function (PDF) of the time taken to download a 208,310-byte map image from TIGER under the anonymous routing algorithms we proposed in Section 4. Diff/CA (≤20KB/s) refers to differential routing with congestion avoidance whose tolerable throughput is 20KB/s. Table 1 gives the mean, median, and 95% confidence interval (CI) of the downloading time for the different Tor routing algorithms.

[Figure 13. Download Time: (a) CDF F(x) and (b) PDF f(x) of downloading time (s, 0 to 140) for Weighted Routing, Differential Routing, and Diff/CA Routing (≤20KB/s).]

We make a few observations from Figure 13 and Table 1: (i) The performance of Tor's default routing algorithm, weighted routing, can be intolerable for performance-sensitive services such as LBS; the largest downloading time of the map image is 134.49s. (ii) Differential routing and differential routing with congestion avoidance can significantly improve Tor's performance. With Diff/CA (≤20KB/s), the median downloading time is 5.23s, compared with 20.04s under weighted routing, roughly a 3.8-fold reduction.

Table 1. Downloading Time Comparison (unit: seconds)

                     Weighted Routing   Diff      Diff/CA (≤5KB/s)   Diff/CA (≤10KB/s)   Diff/CA (≤20KB/s)
Median               20.0422            9.2733    8.9566             7.3709              5.2343
Mean                 24.3192            15.0296   12.0749            8.6298              5.711
95% CI lower limit   19.9743            12.9606   9.8527             6.9204              5.1213
95% CI upper limit   30.0927            18.4387   14.6869            10.6894             6.4669

6. Related Work

Existing schemes for preserving location privacy in LBS can generally be classified into two categories: trusted third-party based schemes and user-based schemes.

Most research on trusted third-party based schemes adopts a k-anonymity based framework, in which a trusted third party, called an anonymizer, is used to protect location privacy [9], [23]. For example, Gruteser and Liu [9] studied k-area cloaking schemes, in which the space is divided into a set of zones where each zone contains at least k sensitive areas, so the adversary cannot identify which area the user visits. To relax the trusted third-party assumption, Mokbel and Chow [15] studied a scheme that leverages the peer-to-peer concept. However, the management of trust relationships among autonomous peers in LBS remains an open issue. A recent work removed the requirement of a trusted third party by using a private information retrieval (PIR) based scheme [7].

Most research on user-based schemes adopts various obfuscation techniques at the user side to protect location privacy [2], [5]. For example, Duckham and Kulik [5] studied a scheme that protects a user's real location by inserting fake locations.

There has also been research on protecting location privacy by hiding users' network identities, such as network addresses. For example, Hu and Wang [10] presented a framework that uses random identity addresses, such as IP and MAC addresses, and adopts random silent periods in which mobile nodes do not transmit or receive frames.

Not much work has been done on the QoS of anonymous communication networks. McCoy et al. [14] presented results of Tor performance measurements, including router geopolitical distributions, circuit latency, and throughput. Snader and Borisov [20] proposed bandwidth measurement algorithms and schemes that allow users to choose between higher performance and higher anonymity.

7. Conclusion

In this paper, we developed CAP to address two challenging issues in privacy-preserving LBS: protection of user location privacy from both the location data and the network communication perspectives. CAP seamlessly integrates its location perturbation and anonymous routing components. We measure CAP in terms of location privacy, LBS query accuracy, and communication QoS of the entire system. Its effectiveness is demonstrated by theoretical analysis, simulations, and experiments with an implemented prototype. Our work is the first end-to-end solution to protect location privacy and improve the accuracy of LBS while taking communication QoS into account. We believe that this paper lays the foundation for ongoing studies of privacy-preserving
LBS.

Acknowledgement

This work was supported in part by the National Science Foundation under grants 0324988, 0329181, 0721571, 0808419, 0845644, 0852673, 0852674, and 0907964. Any opinions, findings, conclusions, and/or recommendations expressed in this material, either expressed or implied, are those of the authors and do not necessarily reflect the views of the sponsor listed above.

References

[1] B. Arai, G. Das, D. Gunopulos, and N. Koudas. Anytime measures for top-k algorithms. In VLDB, 2007.
[2] C. A. Ardagna, M. Cremonini, E. Damiani, S. D. C. di Vimercati, and P. Samarati. Location privacy protection through obfuscation-based techniques. Data and Applications Security XXI (Lecture Notes in Computer Science), 2007.
[3] deluogps.com. SiRF Star III based mouse type USB GPS receiver for laptop, 2008.
[4] R. Dingledine and N. Mathewson. Tor: An anonymous internet communication system. http://archives.seul.org/or/talk/, 2006.
[5] M. Duckham and L. Kulik. A formal model of obfuscation and negotiation for location privacy. In Proc. of the 3rd International Conference on Pervasive Computing and Communications, 2005.
[6] geobytes.com. IP address locator tool, 2008.
[7] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.-L. Tan. Private queries in location based services: Anonymizers are not necessary. In Proc. of ACM SIGMOD, 2008.
[8] D. R. Glover and J. L. Simon. The effect of population density on infrastructure: The case of road building. Economic Development and Cultural Change, 23(3):453–468, 1975.
[9] M. Gruteser and X. Liu. Protecting privacy in continuous location-tracking applications. IEEE Security and Privacy, 2(2):28–34, 2004.
[10] Y.-C. Hu and H. J. Wang. Location privacy in wireless networks. In Proc. of the ACM SIGCOMM Asia Workshop, 2005.
[11] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias. Preventing location-based identity inference in anonymous spatial queries. IEEE Transactions on Knowledge and Data Engineering, 19(12):1719–1733, 2007.
[12] E. F. Krause. Taxicab Geometry. Dover, 1987.
[13] Y. Liu, Y. Gu, H. Zhang, W. Gong, and D. Towsley. Application level relay for high-bandwidth data transport. In Proc. of the 1st International Workshop on Networks for Grid (GridNets), 2004.
[14] D. McCoy, K. Bauer, D. Grunwald, P. Tabriz, and D. Sicker. Shining light in dark places: A study of anonymous network usage. Technical report, University of Colorado at Boulder, 2007.
[15] M. F. Mokbel and C. Y. Chow. Challenges in preserving location privacy in peer-to-peer environments. In Proc. of the International Workshop on Information Processing over Evolving Networks (WINPEN), 2006.
[16] Moonbuggy. Man accused of stalking with GPS, 2004.
[17] H.-O. Peitgen and D. Saupe. The Science of Fractal Images. Springer-Verlag, New York, 1988.
[18] R. Pries, W. Yu, S. Graham, and X. Fu. On performance bottleneck of anonymous communication networks. In Proc. of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2008.
[19] A. Research. GPS-enabled location-based services (LBS) subscribers will total 315 million in five years, 2006.
[20] R. Snader and N. Borisov. A tune-up for Tor: Improving security and performance in the Tor network. In Proc. of the 15th Annual Network and Distributed System Security Symposium (NDSS), 2008.
[21] R. Sion and B. Carbunar. On the computational practicality of private information retrieval. In Proc. of the 14th Annual Network and Distributed System Security Symposium (NDSS), 2007.
[22] L. Sweeney. k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557–570, 2002.
[23] T. Xu and Y. Cai. Exploring historical location data for anonymity preservation in location-based services. In Proc. of IEEE International Conference on Computer Communications (INFOCOM), 2008.
[24] N. Zhang and W. Zhao. Privacy-preserving data mining systems. IEEE Computer, 40(4):52–58, 2007.
