Analysis and Detection of SIMbox Fraud in Mobility Networks (Proceedings of IEEE INFOCOM, April 2014)
4 authors
All content following this page was uploaded by Roger Piqueras Jover on 16 October 2017.
Abstract—Voice traffic termination fraud, often referred to as Subscriber Identity Module box (SIMbox) fraud, is a common illegal practice on mobile networks. As a result, cellular operators around the globe lose billions annually. Moreover, SIMboxes compromise the cellular network infrastructure by overloading the local base stations serving these devices. This paper analyzes the fraudulent traffic from SIMboxes operating with a large number of SIM cards. It processes hundreds of millions of anonymized voice call detail records (CDRs) from one of the main cellular operators in the United States. In addition to overloading voice traffic, fraudulent SIMboxes are observed to have static physical locations and to generate a disproportionately large volume of outgoing calls. Based on these observations, novel classifiers for fraudulent SIMbox detection in mobility networks are proposed. Their outputs are optimally fused to increase the detection rate. The operator's fraud department confirmed that the algorithm succeeds in detecting new fraudulent SIMboxes.

I. INTRODUCTION

Cellular network operators lose about 3% of their annual revenue due to fraudulent and illegal services. Juniper Research estimated the total losses from the underground mobile network industry to be $58 billion in 2011 [1], [2]. The impact of voice traffic termination fraud, commonly known as Subscriber Identity Module (SIM)-box fraud or bypass fraud, on mobile networks is particularly severe in some parts of the globe [2]. Recent highly publicized raids on fraudsters include those in Mauritius, Haiti, and El Salvador [3].

Fraudulent SIMboxes hijack international voice calls and transfer them over the Internet to a cellular device, which injects them back into the cellular network. As a result, the calls become local at the destination network [4], and the cellular operators of the intermediate and destination networks do not receive payments for call routing and termination. Fraudulent SIMboxes also hijack domestic traffic in certain areas where call termination costs are high, e.g., in Alaska within the United States. In some cases, the traffic is injected into a cellular network and forwarded to the terminating country, which increases the call routing cost for the operator of the injected traffic.

Besides causing economic loss, SIMboxes degrade the local service where they operate. Cells are often overloaded, and voice calls routed over a SIMbox have poor quality, which results in customer dissatisfaction.

Although some vendors provide cellular anti-fraud services, the large amount of daily cellular traffic and the number of connected mobile devices make detecting call bypassing fraud extremely challenging. Moreover, the traffic patterns and characteristics of fraudulent SIMboxes are very similar to those of certain legitimate devices, such as cellular network probes. Detecting fraudulent SIMboxes therefore resembles searching for a few needles in a huge haystack full of small objects that look like needles. While the operators of the intermediate and destination networks have high financial incentives to understand the problem, they do not have the data to analyze the diverted international calls, which never reach their networks. Also, the absence of publicly available SIMbox-related data is a major obstacle to the emergence of comprehensive studies on voice bypassing fraud analysis and detection [5]. By contrast, most of the SIMbox traffic analyzed in this paper is on the originating end of the communication, which gives us insight into SIMbox fraud from a different perspective than that of most networks with a bypass problem.

In this work, we analyze fraudulent SIMbox traffic based on anonymized communication data from one of the major tier-1 network operators in the United States. SIMboxes operate with a large number of SIM cards from foreign and national operators. If an operator detects and shuts down a fraudulent account, the fraudsters deploy a set of new SIM cards, as in Short Message Service (SMS) spam fraud [6]. Also, fraudulent SIMboxes have almost static physical locations and generate a disproportionately large number of outgoing calls (100 times as many as incoming calls). Based on these observations, we introduce three classifiers for fraudulent SIMboxes and combine their outputs into a classification rule, which has a high detection rate and correctly filters out mobile network probes whose traffic patterns are similar to those of SIMboxes.

This paper is organized into five sections. Section II overviews voice termination fraud in mobility networks and illustrates it with some basic examples. Section III analyzes SIMbox-related traffic, compares it to legitimate traffic, and, based on the extracted features, presents a novel algorithm for SIMbox detection in mobility networks. Section IV overviews the related work, and Section V concludes the paper.

II. VOICE FRAUD IN MOBILE NETWORKS

SIMbox voice fraud occurs when the cost of terminating domestic or international calls exceeds the cost of a local mobile-to-mobile call in a particular region or country.
Fig. 1. Two examples of one-hop SIMbox bypass fraud: (a) hijacking of an international call; (b) hijacking and re-injecting of an international call.
Fraudsters make a profit by offering low-cost international and sometimes domestic voice calls to other operators. To bypass call routing fees, they buy or hijack large numbers of SIM cards and install them in off-the-shelf hardware¹ that connects to the mobility network and essentially becomes a SIMbox. The fraudsters then transfer a call via the Internet to a SIMbox in the area of the call recipient, which delivers the call as a local one. As a result, the operators serving the called party do not receive the corresponding call termination fees. In other cases, SIMboxes re-inject telecom voice traffic into the mobility network masked as mobile customer calls, and the operator pays for carrying the re-injected calls.

¹ The hardware can be used for legitimate purposes, for example in machine-to-machine (M2M) applications. It has also recently been reported to transmit SMS spam [7].

Figure 1 shows two examples of how SIMbox bypass fraud occurs for international phone calls. For simplicity, the examples assume that there is only one intermediate hop connection
between two pairs of countries: country A to country B and country A to country C. The solid line marks a legitimate path for a phone call, whereas the dotted line indicates a fraudulent one when a SIMbox is in place. Actual SIMbox fraud is often more complex, involving multiple intermediate steps.

In Figure 1(a), Alice, who lives in country A, calls Bob, who lives in country B. In the legitimate case, once she dials Bob's number, the call is routed through the cellular infrastructure of operator A to a least cost route (LCR) carrier. Based on an agreement between operator B and the LCR carrier, the call is routed to operator B's cellular core network. The LCR carrier pays operator B a fee to have the call terminated. The call is then routed through operator B's cellular infrastructure and delivered to Bob.

The fraud occurs when a fraudulent LCR carrier hijacks Alice's call and forwards it to country B over the Internet, e.g., via VoIP. Then, in country B, a SIMbox (an associate of the fraudulent LCR carrier) transforms the incoming VoIP flow into a local mobile call to Bob, and operator B loses the termination fee for the hijacked call.

Figure 1(b) shows a more elaborate SIMbox fraud scheme, which consists of three stages. In the first stage, the fraudsters hijack legitimate SIM cards from customers of operator A and put them into a SIMbox. For example, they call Alice and trick her into providing her account information. They then impersonate Alice and link her wireless account to their own SIM card. As a result, Alice's phone will be unable to connect, whereas the SIMbox traffic will be charged to her account.

In the second stage, suppose that Bob lives in country C and has a SIM card from operator C. International roaming agreements and standard industry practices specific to country C prescribe passing call traffic from either other network operators or the Public Switched Telephone Network (PSTN) (low priority traffic) through an LCR carrier, and passing the traffic of operator A's retail customers (high priority traffic) via foreign operators (high cost routes). Suppose a low priority call originating in country A is intended to reach Bob. In the legitimate case, this call is routed over an LCR and is terminated in operator C's network. In this stage, the fraud occurs when an illegitimate LCR routes the traffic over IP to the SIMbox in country A.

Finally, in the third stage, the SIMbox injects the traffic destined to Bob into the cellular network of operator A. At this point, the communication becomes a wireless call from the SIM card of retail customer Alice, so it is routed to country C over a high cost foreign provider. In this case, operator C always receives the call termination fee, either from the legitimate LCR or from the foreign providers, whereas when Alice reports her stolen identity, operator A becomes liable for the cost of the call.

The following example illustrates the difference between legitimate international calling cards (PennyTalk [8], ZapTel [9], Vonage [10], etc.) and SIMbox fraud that reroutes voice calls via VoIP. Suppose that Alice, who is in country A, purchased an international calling card to call Bob in country B (see Figure 2). She dials either a local or toll-free number that connects her to the calling card platform through operator A's cellular network. The platform requests and verifies Alice's card access code, and as soon as she enters Bob's phone number, it forwards her call over the PSTN to country B, and Bob receives a low-cost local call connecting him to Alice. A portion of the call path might be routed over IP. When Alice uses the international calling card, she is aware that her call will be routed via VoIP, and so she accepts lower call quality. In this case, despite the absence of termination fees, Alice's and Bob's operators make a profit from the legitimate
Fig. 3. Voice call traffic characteristics of SIMboxes (dots) and legitimate (triangles) accounts.
February 2012 – June 2013 versus the average duration of MT calls. MT call durations of fraudulent SIMboxes are much shorter than those of legitimate customers. Also, in contrast to some network probing devices, fewer of a SIMbox's IMSIs remain active over long periods, because the operator frequently cancels the corresponding accounts.

³ This number is not necessarily one because many cellular devices connect to the network by means of third generation (3G) technology based on wideband code division multiple access (WCDMA). In this technology, a device can be physically connected to up to 6 sectors at the same time, combining the signal at the receiver [13]. Depending on the channel conditions and fading, the serving base station might fluctuate throughout the set of 6 base station IDs. As a result, CDRs from the same static device may come from up to 6 different sectors.
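The per-device features discussed in this section (IMSIs per IMEI, MO/MT imbalance, share of international calls, MT call duration, and number of serving sectors) can be aggregated from CDRs in a single pass. The Python sketch below is illustrative only: it assumes a hypothetical, simplified record type `CallRecord` carrying just the fields it needs (real CDR schemas are operator specific), and the paper does not publish its exact feature definitions. Following footnote 3, it treats a device seen in at most 6 sectors as plausibly static; that cutoff and the `max(MT, 1)` guard in the MO/MT ratio are assumptions of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical, simplified CDR; real operator schemas differ."""
    imei: str            # device identifier
    imsi: str            # SIM (subscriber) identifier
    direction: str       # "MO" (outgoing) or "MT" (incoming)
    duration_s: float    # call duration in seconds
    sector_id: str       # serving cell sector
    international: bool  # True for international calls


# Assumption based on footnote 3: with WCDMA, a static device may be
# served by up to 6 sectors, so <= 6 distinct sectors counts as static.
MAX_SECTORS_STATIC = 6


def device_features(records):
    """Aggregate illustrative per-IMEI features from a list of CallRecords."""
    by_imei = defaultdict(list)
    for record in records:
        by_imei[record.imei].append(record)

    features = {}
    for imei, calls in by_imei.items():
        mo = [c for c in calls if c.direction == "MO"]
        mt = [c for c in calls if c.direction == "MT"]
        n_intl = sum(1 for c in calls if c.international)
        n_sectors = len({c.sector_id for c in calls})
        features[imei] = {
            "n_imsi": len({c.imsi for c in calls}),
            "n_calls": len(calls),
            "n_international": n_intl,
            "international_ratio": n_intl / len(calls),
            # Guard against devices with no incoming (MT) calls.
            "mo_mt_ratio": len(mo) / max(len(mt), 1),
            "mean_mt_duration_s": (
                sum(c.duration_s for c in mt) / len(mt) if mt else 0.0
            ),
            "n_sectors": n_sectors,
            "static": n_sectors <= MAX_SECTORS_STATIC,
        }
    return features


if __name__ == "__main__":
    # Toy data: one SIMbox-like device and one ordinary phone.
    records = [
        CallRecord("box1", f"imsi{k}", "MO", 60.0, "s1", True) for k in range(4)
    ]
    records.append(CallRecord("box1", "imsi0", "MT", 5.0, "s2", False))
    records.append(CallRecord("phone1", "imsiA", "MO", 90.0, "s7", False))
    records.append(CallRecord("phone1", "imsiA", "MT", 240.0, "s9", False))
    f = device_features(records)
    print(f["box1"]["n_imsi"], f["box1"]["mo_mt_ratio"], f["box1"]["static"])
    # prints: 4 4.0 True
```

In the toy data, the SIMbox-like device shows several IMSIs, mostly outgoing international calls, and a fixed small set of sectors, which is the pattern the text describes; on real traffic, such raw counts would feed the classifiers rather than decide on their own.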
The distribution of the number of IMSIs per IMEI from February 2012 to June 2013 is obtained by parsing the CDRs of all the operator's active subscribers (see Figure 4, where the actual values on each axis are omitted to hide sensitive information). As expected, the majority of devices connected to the mobile network operate with one or just a few SIM cards (IMSIs). An IMEI operating two SIM cards can represent, for example, a used device bought on eBay. Figure 4 also shows two big spikes of IMEIs operating with many SIM cards. Further analysis identifies those IMEIs as legitimate cellular network probes.

from the three classifiers:

$$\hat{y}_i = w_1 x_{1i} + w_2 x_{2i} + w_3 x_{3i}, \tag{1}$$

where $\hat{y}_i$ is the prediction of data label $y_i$ and $w_1 + w_2 + w_3 = 1$. The absence of the intercept in (1) and the constraint $w_1 + w_2 + w_3 = 1$ guarantee that the cases when all three classifiers predict either 0 or 1 will be classified as a legitimate customer and a SIMbox, respectively. The regression coefficients $w_1$, $w_2$, and $w_3$ are found by the least squares method:

$$\min_{w_1, w_2, w_3} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2 \quad \text{s.t.} \quad w_1 + w_2 + w_3 = 1. \tag{2}$$
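Problem (2) is an equality-constrained least-squares fit, so it can be solved in closed form from its Karush–Kuhn–Tucker (KKT) linear system rather than with a general-purpose optimizer. The NumPy sketch below is illustrative; the paper does not say which solver it used. It assumes `X` is an n × 3 array whose columns hold the classifier outputs $x_{1i}$, $x_{2i}$, $x_{3i}$ and `y` holds the 0/1 labels; the function name `fit_fusion_weights` is hypothetical.

```python
import numpy as np


def fit_fusion_weights(X, y):
    """Solve min_w ||X w - y||^2 subject to sum(w) = 1, as in (2).

    X: (n, k) array of classifier outputs; y: (n,) array of labels.
    No intercept is fitted, matching (1). Returns the k weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    ones = np.ones(k)
    # Stationarity of the Lagrangian ||Xw - y||^2 + lam * (1^T w - 1):
    #   [[2 X^T X, 1], [1^T, 0]] @ [w; lam] = [2 X^T y; 1]
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * X.T @ X
    kkt[:k, k] = ones
    kkt[k, :k] = ones
    rhs = np.concatenate([2.0 * X.T @ y, [1.0]])
    # lstsq also returns a (minimum-norm) KKT solution if the matrix is
    # singular, e.g. when two classifiers always agree on the training set.
    solution, *_ = np.linalg.lstsq(kkt, rhs, rcond=None)
    return solution[:k]


if __name__ == "__main__":
    # Synthetic check: labels generated exactly by weights that sum to 1.
    X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],
                  [0, 1, 1], [1, 1, 1], [0, 0, 0]], dtype=float)
    y = X @ np.array([0.2, 0.3, 0.5])
    print(np.round(fit_fusion_weights(X, y), 6))
```

With real binary labels, the residual in (2) is not zero, but the constraint still holds exactly and the all-zero and all-one triplets contribute nothing to the objective whenever their labels agree with the classifiers, which is the property the text relies on.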
TABLE II
CLASSIFICATION RESULTS

that the overwhelming majority of triplets $(x_{1i}, x_{2i}, x_{3i})$, being either (0, 0, 0) or (1, 1, 1), correspond to correct training data labels (0 or 1, respectively), so they do not affect the objective function in (2), whereas they skew the logistic regression. For the training data, a hard-margin SVM is infeasible because the data are not separable, whereas a soft-margin SVM requires specifying the extent of misclassification (through an additional parameter) and involves n + 4 variables, which makes it less attractive from a computational perspective.

The three classifiers were trained on the training dataset, for which the optimal weight coefficients are $w_1 = 0.31$, $w_2 = 0.26$, and $w_3 = 0.43$, with the corresponding threshold $\alpha = 0.39$. Table II shows the predictions of the three classifiers and of their optimal linear combination on the testing dataset. The false positive is the proportion of legitimate customers classified as SIMboxes, whereas the false negative is the proportion of SIMboxes classified as legitimate accounts. The "accuracy" column in Table II shows the proportion of total correct classifications. Among the three classifiers, the random forest has the lowest false positive and the highest false negative, whereas the functional tree has the lowest false negative and the highest false positive. The optimal linear combination of the three classifiers improves both the false positive and the false negative and has the highest accuracy, 99.95%.

Algorithm 2 SIMbox detection
1: loop
2:   Run Algorithm 1 for the period $\tau$
3:   $N$ = set of indices of all IMEIs in the network
4:   $\Omega = \emptyset$
5:   $\Theta$ = set of indices of IMEIs corresponding to known network probing devices and corporate accounts
6:   $h$ = filtering threshold
7:   for all $i \in N \setminus \Theta$ do
8:     $m_i$ = number of IMSIs operated by IMEI$_i$
9:     if $m_i > h$ then
10:      $\Omega = \Omega \cup \{i\}$
11:    end if
12:  end for
13:  for all $i \in \Omega$ do
14:    Generate features, then apply the alternating tree, functional tree, and random forest obtained at step 2 to the features and compute the prediction $y_i^*$
15:  end for
16:  Report the detected SIMboxes to the fraud department
17:  Update $S$ and $L$ based on the feedback from the fraud department
18: end loop
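With the reported weights, the fused score (1) of a binary triplet can take only eight values, so the decision rule can be tabulated directly. The Python sketch below assumes that each classifier outputs 0 or 1, that $w_1$, $w_2$, $w_3$ belong to the alternating tree, functional tree, and random forest (the order used in step 14), and that an IMEI is flagged when $\hat{y}_i$ exceeds $\alpha$; the comparison direction and strictness are assumptions, since this excerpt does not restate the rule. It also sketches the pre-filtering loop in steps 3–12 of Algorithm 2. The function names and toy inputs are hypothetical.

```python
from itertools import product

# Reported weights for the alternating tree (w1), functional tree (w2),
# and random forest (w3), and the reported threshold alpha.
WEIGHTS = (0.31, 0.26, 0.43)
ALPHA = 0.39


def fused_score(outputs, weights=WEIGHTS):
    """Linear combination (1) of the three classifier outputs."""
    return sum(w * x for w, x in zip(weights, outputs))


def is_simbox(outputs, weights=WEIGHTS, alpha=ALPHA):
    """Assumed decision rule: flag as SIMbox when the score exceeds alpha."""
    return fused_score(outputs, weights) > alpha


def prefilter(imsis_per_imei, known_legitimate, h):
    """Steps 3-12 of Algorithm 2: keep IMEIs outside Theta with m_i > h.

    imsis_per_imei: dict mapping IMEI -> set of IMSIs seen in the period.
    known_legitimate: set Theta of probe / corporate IMEIs to skip.
    """
    omega = set()
    for imei, imsis in imsis_per_imei.items():
        if imei in known_legitimate:
            continue
        if len(imsis) > h:  # m_i = number of IMSIs operated by this IMEI
            omega.add(imei)
    return omega


if __name__ == "__main__":
    for outputs in product((0, 1), repeat=3):
        print(outputs, round(fused_score(outputs), 2), is_simbox(outputs))
```

Under these assumptions, the rule flags a device whenever the random forest votes SIMbox (0.43 > 0.39), or when the alternating and functional trees both do (0.31 + 0.26 = 0.57), while a single vote from either tree alone (0.31 or 0.26) is not enough.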
For a real mobility network, generating the set of features for each account requires processing hundreds of millions of CDRs, so accounts with fewer than 10 IMSIs per IMEI, which are unlikely to be very active SIMboxes, are filtered out.⁴ Additionally, all known legitimate accounts, such as network probing and corporate accounts, are filtered out. Thus, after pre-processing, only 0.02% of all active accounts remain for feature extraction and classification.

Algorithm 2 incorporates Algorithm 1 and presents a scheme for detecting new fraudulent SIMboxes in the real mobility network. The operator's fraud department confirmed that Algorithm 2 successfully identifies new SIMboxes.

⁴ Section III shows that fraudulent SIMboxes use a large number of SIM cards. Thus, we first calculate the total number of SIMs used by each active IMEI in the network within a week. The threshold of 10 IMSIs is arbitrary.

IV. RELATED WORK

The rise of mobile communication technology over the past decade is mirrored by the range and sophistication of illegal activities on mobile networks. SMS spam is by far the most prevalent illegal activity and has attracted considerable attention from both industry and academia [20]. In this fraud, mobile users are tricked into visiting phishing URLs and providing sensitive information. Like SIMbox fraudsters, spammers operate large numbers of SIM cards from each IMEI [6]. The volume of SMS spam is expected to grow at an annual rate of 500% [21]. For a detailed analysis of SMS spam on mobility networks, see [7].

The GGTracker smartphone malware [22] is yet another example of recent fraudulent activity. Embedded in a legitimate-looking app, it silently subscribes users to premium number services, such as a horoscope for $10 per month, and hides all communications with those premium numbers. Users learn that they are victims of the fraud only at the end of a billing cycle, when they see excessive charges on their bills. For an overview of other malware on Android platforms, see [23].

Subscription fraud occurs when fraudsters steal a customer's identification and use it to subscribe to a mobility network [24], [25]. With a low-cost technique, a fraudster can sniff traffic from a GSM (Global System for Mobile Communications) mobility network and break its encryption [26]. As a
result, he can obtain the IMSI and the secret key of any victim in his vicinity and then use wireless service at the victim's expense. As with the GGTracker malware, victims learn about the fraud only when their bills arrive.

Several security firms offer services for the detection and prevention of SIMbox fraud [27], [28]; however, the details of their detection techniques are not disclosed. To the best of our knowledge, there is only one publicly available work on this subject [5]. It uses artificial neural networks (the multilayer perceptron method) to detect fraudulent SIMboxes based on 9 voice call communication features for 6415 subscribers from one Cell ID (234,324 calls in total). The method detects SIMboxes with 98.71% accuracy.

In contrast to [5], our classification rule is trained and tested on a larger data sample of accounts distributed nationwide, and our features are computed per IMEI (device identifier) rather than per subscriber identifier. Since SIMboxes operate with multiple SIMs, 500 IMEIs of fraudulent SIMboxes correspond to thousands of fraudulent SIMbox subscriber identifiers. Only a few of our features coincide with those in [5]. For example, the number of locations, which we consider, is quite important but is not relevant for [5], since all subscribers in [5] were in one Cell ID. Other important new features include the number of SIM cards, the total number of international calls, and its ratio to the total number of calls. We also show that network probing devices have communication patterns similar to those of SIMboxes.

V. CONCLUSIONS

We have analyzed voice call communication features of fraudulent SIMboxes in the mobility network of a major tier-1 network operator in the United States and have identified call traffic patterns distinguishing fraudulent SIMboxes from legitimate devices. Those patterns include a high number of IMSIs per IMEI, a large number of international phone calls, an imbalance between MO and MT traffic (international and domestic), and a static physical location. Based on these features, we have proposed three classifiers of fraudulent SIMboxes in mobility networks: an alternating decision tree, a functional tree, and a random forest. The random forest and the functional decision tree provide the lowest false positive and the lowest false negative, respectively. The false positive of the alternating decision tree is lower than that of the functional tree, and its false negative is lower than that of the random forest. The predictions of the three classifiers have been linearly combined into a classification rule whose weight coefficients have been found by minimizing the sum of squared prediction errors on the training dataset. The random forest has the largest weight coefficient, followed by the alternating decision tree. The accuracy of the classification rule is 99.95%. For large data sets, the scalability of the algorithm can be improved by filtering out accounts with fewer than 10 IMSIs (99.98% of all active subscribers). The operator's fraud department has confirmed that the proposed algorithm detects new fraudulent SIMboxes with a low false positive.

ACKNOWLEDGEMENTS

We are grateful to Angus MacLellan and Richard Becker for their help, comments, and valuable suggestions.

REFERENCES

[1] H. Windsor, "Mobile Revenue Assurance & Fraud Management," Juniper Research, http://goo.gl/GX7G4.
[2] M. Yelland, "Fraud in mobile networks," Computer Fraud & Security, vol. 2013, no. 3, pp. 5–9, 2013.
[3] "Raids on SIM Box/GSM Gateway Fraudsters Save Mobile Operators Millions," Reuters, http://goo.gl/pHCpK.
[4] "Fraud in the Mobile World," Revector, http://goo.gl/Uobx6.
[5] A. H. Elmi, S. Ibrahim, and R. Sallehuddin, "Detecting SIM box fraud using neural network," in IT Convergence and Security 2012. Springer, 2013, pp. 575–582.
[6] I. Murynets and R. Piqueras Jover, "Analysis of SMS Spam in Mobility Networks," International Journal of Advanced Computer Science (IJACSci), vol. 3, no. 7, July 2013.
[7] I. Murynets and R. Piqueras Jover, "Crime scene investigation: SMS spam data analysis," in Proceedings of the 2012 ACM Conference on Internet Measurement. ACM, 2012, pp. 441–452.
[8] "PennyTalk," http://www.pennytalk.com/.
[9] "ZapTel Calling Cards," http://www.zaptel.com/.
[10] "Vonage Calling Cards," http://lp.vonage.com/callingcard.
[11] "RCATS - Remote Cellular Active Test System," JDSU, http://goo.gl/VEbMA.
[12] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, 1999.
[13] Universal Mobile Telecommunications System (UMTS), "Physical layer procedures (FDD)," 3GPP TS 25.214, v3.17.0, 1999.
[14] Y. Freund and L. Mason, "The Alternating Decision Tree Learning Algorithm," 1999.
[15] G. Holmes, B. Pfahringer, R. Kirkby, E. Frank, and M. Hall, "Multiclass Alternating Decision Trees," 2002.
[16] J. Gama, "Functional trees," Machine Learning, no. 55, 2004.
[17] L. Breiman, "Random Forests," vol. 45.
[18] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11.
[19] M. Zabarankin and S. Uryasev, Statistical Decision Problems: Selected Concepts and Portfolio Safeguard Case Studies. Springer, 2013, to appear.
[20] N. Perlroth, "Spam Invades a Last Refuge, the Cellphone," The New York Times, April 2012, http://preview.tinyurl.com/7nwvm3g.
[21] A. Bobotek, "Threat of Mobile Malware and Abuse," Messaging Anti-Abuse Working Group (MAAWG), October 2010, http://goo.gl/Ay57e.
[22] T. Strazzere, "GGTracker Technical Tear Down," Lookout Mobile Security, http://goo.gl/IuVfm.
[23] Y. Zhou and X. Jiang, "Dissecting Android malware: Characterization and evolution," in 2012 IEEE Symposium on Security and Privacy (SP). IEEE, 2012, pp. 95–109.
[24] Y. Moreau, H. Verrelst, and J. Vandewalle, "Detection of mobile phone fraud using supervised neural networks: A first prototype," in Artificial Neural Networks: ICANN'97. Springer, 1997, pp. 1065–1070.
[25] P. Barson, S. Field, N. Davey, G. McAskie, and R. Frank, "The detection of fraud in mobile phone networks," Neural Network World, vol. 6, no. 4, pp. 477–484, 1996.
[26] K. Nohl and S. Munaut, "Wideband GSM sniffing," in 27th Chaos Communication Congress, 2010, http://goo.gl/wT5tz.
[27] "SIM Box Detection Service," Telekom Austria, http://goo.gl/Ac12d.
[28] "SIMbox detector," Xintec, http://goo.gl/AUZbe.