DOI: 10.1109/MSR.2019.00067
research-article

RmvDroid: towards a reliable Android malware dataset with app metadata

Published: 26 May 2019

Abstract

Many recent research studies have focused on detecting Android malware. As a result, a reliable, large-scale malware dataset is essential for building effective malware classifiers and evaluating the performance of different detection techniques. Although several Android malware benchmarks are widely used in the research community, they face several major limitations. First, most of the existing datasets are outdated and cannot reflect current malware evolution trends. Second, most of them rely only on VirusTotal to label the ground truth of malware, although some anti-virus engines on VirusTotal do not always report reliable results. Third, all of them contain only the apps themselves (APKs), while other important app information (e.g., app description, user rating, and app installs) is missing, which greatly limits the usage scenarios of these datasets. In this paper, we create a reliable Android malware dataset based on Google Play's app maintenance results over several years. We first created four snapshots of Google Play, in 2014, 2015, 2017, and 2018. We then used VirusTotal to label apps with possibly sensitive behaviors and monitored these apps on Google Play to see whether Google had removed them. Based on this approach, we created a malware dataset containing 9,133 samples that belong to 56 malware families with high confidence. We believe this dataset will support a range of research studies, including Android malware detection and classification, mining apps for anomalies, and app store mining.
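
The two-signal labeling idea in the abstract (flagged by VirusTotal and later removed from Google Play) can be illustrated with a minimal sketch. This is not the authors' implementation: the `AppRecord` type, the helper names, the toy package names, and the `min_detections` threshold are all hypothetical, since the abstract does not state which threshold or data format was used. Absence from a later snapshot stands in here for "removed by Google".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    package: str        # app package name (hypothetical toy values below)
    vt_detections: int  # number of VirusTotal engines that flagged the APK


def label_candidates(snapshot, min_detections):
    """Return packages flagged by at least `min_detections` engines.

    `min_detections` is an assumed parameter; the abstract gives no threshold.
    """
    return {app.package for app in snapshot if app.vt_detections >= min_detections}


def removed_later(candidates, later_snapshot_packages):
    """Keep only flagged packages that no longer appear in a later snapshot."""
    return candidates - set(later_snapshot_packages)


# Toy data standing in for two store snapshots taken at different times.
earlier_snapshot = [
    AppRecord("com.example.flashlight", 12),
    AppRecord("com.example.notes", 0),
    AppRecord("com.example.wallpaper", 15),
]
later_snapshot_packages = ["com.example.notes", "com.example.wallpaper"]

flagged = label_candidates(earlier_snapshot, min_detections=10)
malware = removed_later(flagged, later_snapshot_packages)
print(sorted(malware))
```

In this toy run, the wallpaper app is flagged but still present in the later snapshot, so only the flashlight app satisfies both signals. Requiring both conditions is what trades dataset size for labeling confidence.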

Published In

MSR '19: Proceedings of the 16th International Conference on Mining Software Repositories
May 2019
640 pages

Publisher

IEEE Press

Conference

ICSE '19
Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 2
Reflects downloads up to 30 Aug 2024.

Cited By

  • (2023) One size does not fit all. Proceedings of the 32nd USENIX Conference on Security Symposium, pp. 5683-5700. DOI: 10.5555/3620237.3620555. Online publication date: 9-Aug-2023
  • (2022) MalWhiteout: Reducing Label Errors in Android Malware Detection. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, pp. 1-13. DOI: 10.1145/3551349.3560418. Online publication date: 10-Oct-2022
  • (2022) Demystifying “removed reviews” in iOS app store. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1489-1499. DOI: 10.1145/3540250.3558966. Online publication date: 7-Nov-2022
  • (2022) MalRadar. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6:2, pp. 1-27. DOI: 10.1145/3530906. Online publication date: 6-Jun-2022
  • (2022) AndroOBFS. Proceedings of the 19th International Conference on Mining Software Repositories, pp. 454-458. DOI: 10.1145/3524842.3528493. Online publication date: 23-May-2022
  • (2022) Rotten apples spoil the bunch. Proceedings of the 44th International Conference on Software Engineering, pp. 1919-1931. DOI: 10.1145/3510003.3510161. Online publication date: 21-May-2022
  • (2021) On the Impact of Sample Duplication in Machine-Learning-Based Android Malware Detection. ACM Transactions on Software Engineering and Methodology, 30:3, pp. 1-38. DOI: 10.1145/3446905. Online publication date: 8-May-2021
  • (2021) A Longitudinal Study of Removed Apps in iOS App Store. Proceedings of the Web Conference 2021, pp. 1435-1446. DOI: 10.1145/3442381.3449990. Online publication date: 19-Apr-2021
  • (2021) Demystifying Illegal Mobile Gambling Apps. Proceedings of the Web Conference 2021, pp. 1447-1458. DOI: 10.1145/3442381.3449932. Online publication date: 19-Apr-2021
  • (2021) CHAMP. Proceedings of the 43rd International Conference on Software Engineering, pp. 933-945. DOI: 10.1109/ICSE43902.2021.00089. Online publication date: 22-May-2021