survey

Open access

Android Source Code Vulnerability Detection: A Systematic Literature Review

Authors:

Luca PirasAuthors Info & Claims

ACM Computing Surveys, Volume 55, Issue 9

Article No.: 187, Pages 1 - 37

https://doi.org/10.1145/3556974

Published: 16 January 2023 Publication History

All formats PDF

Abstract

The use of mobile devices is rising daily in this technological era. A continuous and increasing number of mobile applications are constantly offered on mobile marketplaces to fulfil the needs of smartphone users. Many Android applications do not address the security aspects appropriately. This is often due to a lack of automated mechanisms to identify, test, and fix source code vulnerabilities at the early stages of design and development. Therefore, the need to fix such issues at the initial stages rather than providing updates and patches to the published applications is widely recognized. Researchers have proposed several methods to improve the security of applications by detecting source code vulnerabilities and malicious codes. This Systematic Literature Review (SLR) focuses on Android application analysis and source code vulnerability detection methods and tools by critically evaluating 118 carefully selected technical studies published between 2016 and 2022. It highlights the advantages, disadvantages, applicability of the proposed techniques, and potential improvements of those studies. Both Machine Learning (ML)-based methods and conventional methods related to vulnerability detection are discussed while focusing more on ML-based methods, since many recent studies conducted experiments with ML. Therefore, this article aims to enable researchers to acquire in-depth knowledge in secure mobile application development while minimizing the vulnerabilities by applying ML methods. Furthermore, researchers can use the discussions and findings of this SLR to identify potential future research and development directions.

1 Introduction

With technological enhancements and an increase in the usage of mobile devices, a growing number of people depend on mobile phones for their daily activities, both for the personal and professional aspects of their lives. Statista forecast that, by 2023, there will be 4.3 billion smartphone users [136]. 71.45% of them use Android as the Operating System (OS) (May 2022) [135]. Because of this worldwide popularity, many application developers are developing various Android mobile applications with rapid development life cycles. However, most of these applications are developed without integrating proper security mechanisms, increasing their vulnerabilities at the development stage. Since Google Play does not extensively validate apps to detect code vulnerabilities when publishing [36], users may experience various issues caused by the lack of security checks [142]. Therefore, it is important to integrate appropriate Android source code vulnerability detection methods and tools when developing the apps.

Studies such as References [20, 53, 59, 133, 134] have proposed several methods to detect vulnerabilities in source code, including some automated mechanisms to support the developers when designing and developing secured applications. An increasing number of these supporting methods, such as References [45, 48, 89, 93], are based on machine learning and deep learning for automatic early detection of security issues and vulnerabilities, which can support the software engineers to improve software security. Studies in References [10, 18, 116, 164] employ alternative techniques, to identify vulnerabilities. A thorough understanding of these existing methods is essential for developing Android applications by applying security best practices. Moreover, further research also can be conducted to enhance those methods. Therefore, researchers and the application development community can use this state-of-the-art SLR to fully understand the strengths and weaknesses of existing source code vulnerability detection methods and thus identify future research directions.

As discussed in Section 2, several limitations, including not covering recent proposals, relatively narrow scopes, and lack of critical appraisals of suggested detection methods, have been identified in these existing literature reviews on Android vulnerability detection and prevention methods. The lack of a thorough analysis of ML or Deep Learning (DL)-based methods when detecting vulnerabilities was also a limitation of existing works. This SLR addresses these limitations by critically evaluating 118 carefully selected technical studies while answering the formulated research questions. Therefore, this work is unique, which addresses the research gap in this area.

1.1 Research Questions

This systematic review aims to answer the following research questions:

RQ1: What are the existing methods for source code and application analysis? Many research studies considered various source code analysis methods, including application reverse-engineering. Moreover, byte-code-based analysers are also used, since Android apps can be easily reverse-engineered to source code. The static analysis techniques were mainly used on a broad scale, while also applying dynamic and hybrid analysis techniques to analyse source code. These methods are discussed in Section 4.

RQ2: What are existing Android source code vulnerability detection methods, and how to use them to prevent vulnerabilities? When detecting Android source code vulnerabilities, ML methods and some conventional methods were applied in various studies. Among them, applying ML methods has been popular in recent years among the research community. Therefore, many studies applied various ML methods. In contrast, few studies applied conventional non-ML-based methods. Detecting vulnerabilities alone is not sufficient when boosting the security of Android source code. Ways to prevent security issues by integrating the detection techniques into software development environments should also be studied. These detection and prevention methods are discussed in Section 5.

RQ3: Which tools and repositories can be used to detect vulnerabilities in Android apps? Exploring tools, repositories, and datasets that can be used to analyse source code and detect vulnerabilities is also essential. Identifying their characteristics and usage is beneficial for conducting new research studies. Therefore, those facts are discussed in Section 6.

1.2 Organization of the Review

The SLR is structured as presented in Figure 1 to answer the formulated research questions in three main sections.

Fig. 1.

The rest of this article is organised as follows: Background and related literature are discussed in Section 2, followed by a detailed description of the review methodology in Section 3. The experimental studies reviewed in this SLR are categorised into three main sections: application analysis, code vulnerability detection, supportive tools and repositories. As the initial step of vulnerability detection, applications should be analysed, and three types of analysis techniques (static analysis, dynamic analysis, and hybrid analysis) are used. Studies related to this application analysis are reviewed in Section 4. Code vulnerability detection-related studies are reviewed in Section 5, which discusses the ML-based methods and conventional methods. It further discusses how the existing prevention techniques can be integrated with the identified detection methods. Section 6 reviews supportive tools and repositories that can be used in application analysis, vulnerability detection, and prevention. Section 7 discusses the threats to the validity of the review. Section 8 concludes the article.

2 Background AND Related Literature Reviews

This section discusses the background related to the security of Android applications and the vulnerabilities associated with them. It provides a high-level overview of the Android layered architecture, its security implications, Android application vulnerabilities, and potential mistakes made by users and developers that can lead to security and privacy issues. It also explains the ML process, since the article focuses more on ML-based vulnerability detection mechanisms. Then, it would be helpful for all the readers to understand the content easily. Furthermore, the existing literature reviews are also discussed in this section.

2.1 Background

2.1.1 Layered Architecture and Security Implication of Android.

Android has a layered architecture that provides a systematic way to communicate with device components, software applications, and its users. Android OS is built on top of the Linux kernel [125]. It provides drivers and mechanisms for networking and manages virtual memory, device power, and security. On top of the Linux kernel layer, hardware abstraction layer, native C/C++ libraries layer, and Android runtime layer, Java Application Programming Interface (API) framework layer are stacked [12].

Each of these layers performs unique tasks while interacting with other layers. The study in Reference [125] provided a layered approach for Android application development, which uses the layered architecture. In this approach, a server interacts with the Hypertext Transfer Protocol (HTTP) layer, while the API layer interacts with the HTTP layer. The API layer interacts with the generic data layer, and that interacts with the platform-dependent data layer. The User Interface (UI) layer interacts with the platform-dependent layer. Many of the source code vulnerabilities can be identified in the top layer, which contains user and system apps, since the regular app developers mainly focus on that layer. However, the understanding of the layered architecture is beneficial to mitigating some of the vulnerabilities.

Android platform security has been defined with several rules in the Android security model [94]. They are multi-party consent, open eco-system access, security and compatibility requirements, factory restores the device to a safe state, and applications’ security principles. The study in Reference [165] identified three main security mechanisms, including (1) process sandbox, which is the sandbox environment of Android; (2) signature mechanism, which can digitally sign applications with the private key before being released; (3) permission mechanism, which defines the ability of an app to access protected APIs and resources. The sandbox environment of Android does not allow the use of one application resource by the other. Sandboxes are developed using Linux, and only they can access the core functionalities of the OS. Monitoring system calls and acknowledging them is a responsibility of the Sandbox [125], and they are used to prevent malicious applications that request access for system functionalities through vulnerable source code.

Mobile devices can easily be lost or stolen, connect with several networks on the go, and contain more privacy-related data, since they are close to the users [131]. Therefore, limiting to traditional security mechanisms will not be sufficient for mobile devices. Malicious actions including gaining physical access to the device, connecting with untrusted networks, installing and running untrusted applications, executing untested code blocks and contents are identified as some of the threats [94] that can occur in an Android mobile device. Thus, security measures should be tightened though mitigating vulnerabilities to safeguard the data in Android devices.

Therefore, implementing proper security measurements is a must to improve the security of Android devices and their applications. Hence, detecting vulnerabilities of Android applications and their source code should be performed when implementing the security best practices.

2.1.2 Vulnerabilities of Android Applications.

Many of the Android applications are freely available to download from app markets. Therefore, the usage of these free applications is very high. Hackers may try to penetrate these applications and grab user data on a massive scale or perform illegal activities if there are no proper security mechanisms in place in the application [130]. Therefore, the app developer should carefully check for proper security measures. Issues in Secure Sockets Layer (SSL), Transport Layer Security (TLS) commands, permissions, web views, key stores, fragments, encryptions, intents, intent filters, and leaks were identified as common causes for vulnerabilities of Android mobile apps [47]. Attacks from the Internet and Wireless Personal Area Networks (WPAN), and malware transmitted through personal computers can exploit these vulnerabilities. The study related to Android vulnerability [165], identified SSL/TLS protocol vulnerabilities, forged signature vulnerabilities, and some of the common vulnerabilities in Input/Output (I/O) operations, intents, permissions, web views.

There are several types of vulnerabilities categorised in CyBOK [101]. They are memory management vulnerabilities, structured output generation vulnerabilities, race condition vulnerabilities, API vulnerabilities, and side-channel vulnerabilities. Safe languages, spatial vulnerabilities, temporal vulnerabilities, code corruption attacks, control-flow hijack attacks, information leak attacks, data-only attacks were identified as memory management vulnerabilities. It has classified Structured Query Language (SQL) injections, command injection vulnerabilities, script injection vulnerabilities, stored injection vulnerabilities, high-order injection vulnerabilities as structured output generation vulnerabilities. It has listed concurrency bugs and time-of-check to time-of-use issues as race condition vulnerabilities. Correct use and implementation were also recognised as cases where API vulnerabilities can evolve while mentioning side-channel vulnerabilities as the other category of vulnerabilities. Under this software-based side channel, covert channels, micro-architectural effects, and fault injection attacks were considered. Some of these vulnerabilities are related to Android, and they can be mapped with Common Vulnerabilities and Exposures (CVE) [33] and Common Weakness Enumeration (CWE) [34], which is possible to use as the starting point to categorise vulnerable source code.

The study in Reference [49] identified 563 Android-related vulnerabilities, including gain privileges and information, memory corruption, Denial of Services (DoS), malicious code execution, overflow, and bypass security measures. Further, it has analysed trends of those vulnerabilities from 2009 to 2019, and the peak period of vulnerabilities was started in 2016. The empirical study conducted in Reference [82] discussed the types of Android-related vulnerabilities, the layers and subsystems of Android that could be affected by vulnerabilities, and the survivability of vulnerabilities. This study consists of 660 vulnerabilities from the CVE Details [33] and the official Android Security Bulletins [11]. It was identified that most of the vulnerabilities could happen from data processing issues, access controls, memory buffers, and improper input validation, primarily due to vulnerable code lines. Most of them can be reduced by following secure coding practices.

2.1.3 User and Developer Mistakes.

Mistakes can happen when using an application by the users or when developing applications by the developers. Sometimes users allow permissions when installing or running an application without knowing the exact need, leading to some vulnerabilities. Hackers and malicious app developers also try to steal data from users by forcing them to allow specific permissions. Authors in Reference [92] identified that most security issues in mobile applications occurred due to the actions of the users. These issues can be minimised if the applications are developed with less vulnerable source code.

Developers also make mistakes by not following an extensive testing and validation process from the initial stage of the app development life cycle. When developing mobile applications, many mobile app developers are still not focusing on writing secure codes even though there are mechanisms in place [100]. Due to these developer mistakes, some of the vulnerabilities have occurred. Instances such as mentioning unwanted permissions in the AndroidManifest.xml file can also lead to arise vulnerabilities of the mobile applications. If these permissions are dangerous level permissions, then users will have to grant those also, which can cause rejections of the app by some users [25]. These vulnerabilities may not be detected when publishing apps to Google Play, since Google Play itself could not thoroughly analyse the code of the mobile applications when publishing it, such as in Apple App-Store [51]. Therefore, proper source code vulnerability detection mechanisms should be integrated with coding.

2.1.4 Machine Learning Processes.

The use of ML methods for vulnerability detection has been increased in recent years [53]. Therefore, comprehending the ML processes is beneficial to understanding ML-based source code vulnerability detection studies.

The ML lifecycle includes data extraction, preprocessing, feature selection, model training, evaluation, and deployment steps [8] and ML consists of supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning. A labelled dataset is used to train the model in supervised learning to solve classification and regression problems. Algorithms such as Naive Bayes (NB), Logistic Regression (LR), Linear Regression, Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (kNN) can be applied for supervised learning. Unsupervised learning identifies hidden patterns in data using clustering, association and dimensionality reduction. A labelled dataset is not required to train the model. K-means clustering, Principal Component Analysis (PCA), and autoencoders are some methods that can be applied for unsupervised learning. A mix of supervised and unsupervised learning techniques is applied in semi-supervised learning and used in the case of limited labelled data in the used dataset. The model parameters are updated with the feedback from the environment in reinforcement learning where no training data is involved. This ML method proceeds as prediction and evaluation cycles. DL is defined as learning and improving by analysing algorithms independently, which consists of a higher or deeper number of processing layers. Convolutional Neural Network (CNN), Long Short Term Memory Network (LSTM), Recurrent Neural Network (RNN), Generative Adversarial Network (GAN), and Multilayer Perceptron (MLP) are some of the popular DL algorithms [21].

2.2 Related Literature Reviews

Previous reviews [1, 2, 38, 50, 72, 80, 86, 127, 132] discussed various security-related studies, including vulnerability detection methods for Android applications and the ways to prevent them. With the rapid focus on software security, most of these studies and experiments were conducted after 2015.

The security in the Android platform was studied in Reference [1]. It reviewed the studies on several threats to Android, such as information leakage, privilege escalation, repackaging apps, denial of service attacks, and colluding. It has also reviewed Crowdroid [24], Kirin [40], AndroSimilar [42], RiskRanker [55], RiskMon [67], FireDroid [123], Aurasium [161], DroidScope [162], RecDroid [166], and DroidRanger [168] methods. Those studies were reviewed with objectives such as the assessment, the source code analysis method (i.e., static or dynamic analysis), and detection mechanisms. However, there are limitations in this review, such as following an informal review approach and not covering comprehensive details on vulnerability detection and prevention mechanisms.

It is crucial to identify the duplicate codes in several places in a program. When identifying bugs in a program, code repetition places also should be identified as all the places should be revised. The study in Reference [2] reviewed various studies related to code clone detection. This study has systematically reviewed 54 studies under six categories: textual approaches, lexical approaches, tree-based approaches, metric-based approaches, semantic approaches, and hybrid approaches. Twenty-six clone detection tools were also identified in this review, and found that many of the tools and models can be applied in Java/C++ codes. However, since this review contained papers from 2013 to 2018, it is better to review the latest code clone detection methods proposed after 2018.

Authors in Reference [38] analysed 55 studies from 2015 to 2021 related to software vulnerability detection. The selected articles were grouped into several categories across various vulnerability detection evaluation criteria such as neural network, machine learning, static and dynamic analysis, code clone, classification models and frameworks. Based on the analysis, it has been identified that many researchers used machine learning strategies to detect vulnerability in software, since a large volume of data can be analysed easily with machine learning. Though some of the reviewed studies have overlaps with vulnerability detection of source code written in C and Java, a further review should be conducted specifically for Android source code vulnerability detection.

The study in Reference [50] reviewed Android security assessments, including trends and patterns of different analysis approaches, analysis techniques, code representation tools, and applicable frameworks by analysing about 200 studies from 2013 to 2020. It has also focused on privacy leaks, cryptographic issues, app cloning, permission misuse, code verification, malware detection, test case generation, and energy consumption. It discussed sensitivity analysis, data structures, and code representations in the reviewed literature under the static analysis techniques. Kernel level, application level, and emulator level inspections were also considered under taint analysis and anomaly-based approaches in dynamic analysis techniques. The review highlighted that many research studies were conducted related to Android vulnerabilities and leaks. Moreover, this study systematically reviewed several android assessments techniques and identified call graphs, control flow graphs, and inter-procedural control flow graphs as the used data structures. However, the studies related to preventing vulnerabilities were not discussed. Moreover, reviewing studies on non-ML-based methods to detect and prevent vulnerabilities is possible, since this review considered only ML methods.

Related studies on automated testing mechanisms of Android applications were systematically reviewed in Reference [72] from 2010 to 2016. This article discussed three functional testing types: black-box, white-box, and grey-box by analysing Android-related studies, including test-related objectives, targets, levels, and techniques, along with their validation depths. The considered test objectives were bugs, defects, compatibility, energy, performance, security, and concurrency. Under test targets, it considered inter-component communication, inter-application component, graphical user interface, and events. System, integration, and regression were considered for the test levels while testing types, testing environment, and testing methods were listed as used test techniques. Further, executing tests using emulators and real devices was also discussed. It examined testing methods including mutation, concolic, A/B, fuzzing, random, search-based, and model-based. This review considered frequently used essential tools such as AndroidRipper [9], Monkey [14], Silkuli [61], Robotium [119], EMMA [121], and Roboelectric [158]. Though it has reviewed Android app testing comprehensively till 2016, the recent studies are not considered.

The review in Reference [80] analysed 124 research studies from 2011 to 2015 intending to identify static analysis mechanisms for Android applications. It identified that static analysis was used in many research studies related to privacy and security, and taint analysis was the widely applied technique in those studies. According to that, Soot (a framework for analyzing, instrumenting, optimizing, transforming and visualizing Java and Android applications) [76], and Jimple (an intermediate representation that can simplify analysis and transformation of Java bytecode) [151] were the widely used tools and formats, and a few studies considered path-sensitivity. After analysing, this review identified that leaks and vulnerabilities are the primary concerns addressed by the other research studies. Moreover, this review found permission misuse, energy consumption, clone detection, test case generation, code verification, and cryptographic implementation issues. Some of the novel techniques, including ML-related studies, were not reviewed, since this review focused on research from 2011 to 2015.

The work in Reference [86] systematically reviewed DL-based Android malware defence by answering three primary research questions: (1) aspects of Android malware defences applied when using DL, (2) approaches developed for malware defences, and (3) emerging and potential research trends for DL-based Android malware defences. The review considered the technical studies from 2014 to November 2021. The review identified that while many of the reviewed studies mainly consider DL-based on Android malware detection, some defence approaches were based on non-DL-based methods. It has also been identified that static program analysis is widely used to collect features, and semantic features are frequently occurring. Moreover, it concluded that most of the approaches were performed as a supervised classification task. This review identified that many studies were conducted to detect malware, and other types of more detailed analyses on malicious apps are receiving increasing attention. Nevertheless, it was not comprehensively reviewed how the other types, such as malicious code detection and code vulnerability detection, can be performed.

The systematic review conducted in Reference [127] discussed ML and DL-based Android malware detection methods, along with a comparison of methods and their accuracies. This review analysed many studies from 2017 to 2021 and identified that static, dynamic, and hybrid analysis could be used with ML/DL models to detect malware. Furthermore, it identified that static analysis is the widely used technique in the reviewed studies. It has been found out that RF, SVM, NB, kNN, LSTM, and AdaBoost (AB) were the widely used ML/DL models in this context. Further to the malware detection method, this review briefly discussed Android software vulnerability identification. It reviewed the methods and techniques to identify source code vulnerabilities. The critiqued studies have identified that hybrid analysis techniques were widely used to identify Android source code vulnerabilities. The main focus of this review was Android malware detection using ML/DL. Therefore, it is still essential to review code vulnerability detection methods to a great extent.

Studies related to the Android security framework, its security mechanisms assessments, and mitigation strategies were reviewed in Reference [132]. Under the security mechanisms, user interfaces, file access, memory management, type safety, mobile carrier, application permissions, component encapsulation, and application signing were reviewed. It has reviewed the security analysis studies related to Android framework cornerstone layers, application-level permissions, installing applications, mobile web browsers, SQL injections, connectivity and communication, hardware, software updates, malware in Linux environment, and malware related to Java. Under mechanisms, studies related to anti-malware tools, firewalls, intrusion detection and prevention methods, access controls, permission management applications, encryption methods, and spam filters were reviewed. Though this review discussed the studies conducted under Android security by following an informal and non-systematic approach, it does not consider security issues such as API vulnerabilities, concurrency bugs, and the latest OS-related bugs due to the considered period in the review.

Though the existing reviews provide in-depth details of the related studies, reviews such as References [72, 80] did not cover the recent works conducted in this area. Reviews such as References [2, 132] did not thoroughly review the studies on Android-specific vulnerability detection using various experiments performed in source code analysis. Therefore, it is required to conduct a comprehensive review of recent studies related to Android source code vulnerability detection and prevention mechanisms. Table 1 summarises and compares the related reviews with this work’s contribution.

Table 1.

Paper	Focus of the Review	Period	Review Approach	Number of Reviewed Studies	Android Specific	Static Analysis	Dynamic Analysis	Hybrid Analysis	ML-based code vulnerability detection	Non-ML-based code vulnerability detection	Code vulnerability prevention	Supportive tools and repositories
Ahmed et al. [1]	Security in Android platform, associated threats, and malicious application growth	2010–2017	Informal	14	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	x	x	x
Ain et al. [2]	Code clone detection for vulnerabilities	2013–2018	Systematic search (Budgen [23] and Kitchenham [71] guidelines)	54	x	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	\(\checkmark\)	x	x
Eberendu et al. [38]	Various methods for detecting software vulnerabilities	2015–2021	Systematic search - (PRISMA model [109])	55	x	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	x	x
Grag et al. [50]	Android security assessments and application analysis methods	2013–2020	Systematic search (Kitchenham guidelines [71])	200	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	x	\(\checkmark\)
Kong et al. [72]	Automated testing mechanisms for Android	2010–2016	Systematic search (Kitchenham guidelines [71])	103	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	x	x	x	\(\checkmark\)
Li et al. [80]	Static analysis for Android apps	2011–2015	Systematic search (Kitchenham guidelines [71])	124	\(\checkmark\)	\(\checkmark\)	x	x	x	\(\checkmark\)	x	x
Liu et al. [86]	Malware and malicious code detection with DL	2014–2021	Systematic search (Kitchenham guidelines [71])	132	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	x	x
Senanayake et al. [127]	Malware detection and malicious code detection with ML	2017–2021	Systematic search (PRISMA model [109])	106	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	x	x	x
Shabtai et al. [132]	Android security framework and security assessment	2007–2009	Informal	42	\(\checkmark\)	x	x	x	x	\(\checkmark\)	x	x
This work	Android code vulnerability detection and prevention	2016–2022	Systematic search - (PRISMA model) [109])	118	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)

Table 1. Summary of Related Reviews

3 Methodology

The Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) model [109] was used in this review to report and analyse the research studies conducted in this domain. Based on the objective of this study, initially, the search strategy was defined to identify the conducted studies that can be used to answer the research questions. The database usage and the inclusion and exclusion criteria were also defined. The study selection, data extraction and synthesis were conducted as the next stage to identify studies aiming to answer the formulated research questions. Threats to the validity of this SLR, and the mechanism to reduce the bias and other factors that could have influenced the outcomes of this study, were also identified.

3.1 Search Strategy

As the initial step of the review process, existing literature reviews on Android malicious code detection and vulnerability detection were analysed to identify the research gap. Once the research gap was identified, the search string was used to extract and identify the technical studies related to the review’s focus.

The search strategy involves outlining the most relevant bibliographic sources and terms of search. This review used several top research repositories, including ACM Digital Library, IEEEXplore Digital Library, Science Direct, Web of Science, and Springer Link as primary sources to identify studies. The search string to browse through research repositories is (((“vulnerability detection”) OR (“source code vulnerability”) OR (“vulnerable code”) OR (“code vulnerability detection”) OR (“vulnerability analysis”) OR (“static analysis”) OR (“dynamic analysis”) OR (“hybrid analysis”) OR (“vulnerability dataset”)) AND (“android”) AND ((“machine learning”) OR (“deep learning”) OR (“formal methods”) OR (“heuristic methods”))).

A few years after the initial release of Android in 2008, security concerns were discussed with the increasing popularity of Android applications [135]. Methods to detect and prevent vulnerabilities by improving software security using ML and non-ML-based methods were also proposed in several studies. There was a boost in applying various techniques to improve application security in the past five years [85], and vulnerability detection in mobile applications using ML techniques-related trends increased from 2016 [53]. Hence, many researchers are involved in identifying novel ML-based methods to enhance software security. Considering these reasons, technical studies from 2016 to June 2022 were reviewed. The search results distribution across primary sources for each search term is listed in Table 2.

Table 2.

Search Term	ACM	IEEEXplore	Science Direct	Web of Science	Springer	Total Count
vulnerability detection	428	377	264	557	496	2,122
source code vulnerability	10	13	17	11	36	87
code vulnerability detection	9	10	8	12	20	59
vulnerability analysis	381	331	2,456	1,632	740	5,540
vulnerable code	282	99	88	103	201	773
static analysis	5,596	2,210	2,609	3,390	4,729	18,534
dynamic analysis	2,652	1,947	3,048	2,197	2,695	12,539
hybrid analysis	175	88	150	132	257	802
vulnerability dataset	22	10	24	10	41	107
android	12,482	8,942	5,331	9,942	12,618	49,315
machine learning	54,996	88,735	146,255	215,080	85,194	590,260
deep learning	24,316	72,742	66,147	127,016	48,450	338,671
formal methods	3,488	1,335	1,686	3,859	5,844	16,212
heuristic methods	760	1,271	3,261	1,635	3,445	10,372
Complete Search String	688	146	582	255	855	2,526

Table 2. Search Results Distribution across Primary Sources for Each Search Term

Google Scholar was also used as another source to identify research studies published in quality venues, since it can be used to identify research studies not published in primary repositories. “Android source code vulnerability detection” was used as the search query. The range of publication years was set to 2016–2022. Though the search results included about 17,500 records, only the top 200 (sorted by relevance) results for each year were considered, resulting in 1,400 studies.

3.2 Study Selection, Data Extraction, and Synthesis

Initially, through the research database search in the top research repositories, 2,526 research papers and from Google Scholar, 1,400 were identified. 3,112 were excluded from these 3,926 papers because of duplicate entries, and another 127 were excluded because they were not publicly available. 687 studies remained after the initial screening. Research repository search engines often list irrelevant results presence articles [23]. Therefore, the collected list of relevant studies was consolidate by manually going through all the papers and examining the title and abstract to ensure that they align with the focus of this review. After these steps, 119 studies were eligible, but 3 more articles were excluded due to data analysis and experiment issues in the given context, which remains 116 studies. The snowballing process [159] was also performed, considering all the references presented in the retrieved papers and evaluating all the papers referencing the retrieved ones, which resulted in 2 additional relevant papers. The same process was applied as for the retrieved papers. Once all these steps were performed, the remaining 118 articles were reviewed in this study. The results were cross-validated by performing peer-verification process by all the authors. Figure 2 shows a summary of the paper selection method for this systematic review.

Fig. 2.

4 Application Analysis

Applications or source code should be analysed as the first step in detecting vulnerabilities [63]. There are two analysis approaches: analysing the reverse-engineered source code of Android Application Packages (APKs) and analysing the source code simultaneously when writing the code. Most of the research studies have been conducted related to the reverse-engineering approach. However, the second approach has more advantages for the developers to detect code vulnerabilities at the early stages of development life cycle. Reviewing studies on both approaches is vital, since their methodologies have many overlaps.

It is required to extract features as the initial step of analysing the source code of a given application. These features can be extracted through three analysis techniques named as static, dynamic, and hybrid analysis [6, 32, 87]. Reverse-engineered APKs, application’s source code, or byte code can be analysed using static analysis. However, it is impossible to identify all the bugs and failures using static analysis only, since it does not cover the vulnerabilities that can occur during the execution time of the app. Features can be generated by executing applications in runtime when performing the dynamic analysis. The runtime behaviour is monitored while using specific input parameters. However, it is possible to crash the run-time environment due to severe vulnerabilities, and there are possibilities that some vulnerabilities remain undetected [115]. The hybrid analysis technique contains characteristics of both static and dynamic analysis techniques. Therefore, this approach can analyse the source code as well as run-time behaviour of the application [10].

4.1 Static Analysis

Native Android applications can be developed using Java or Kotlin, and Java is the widely used programming language for this purpose. Frameworks such as React Native and Xamarin can also be used to develop Android mobile applications [15]. However, these mobile applications also contain Extensible Markup Language (XML) files such as the Android Manifest, User Interface (UI) layouts, and resources among the application files. Therefore, it is required to identify the issues in both source code files and XML files. Static analysis can analyse both of these files without executing them. There are two static analysis methods: manifest analysis and code analysis. These two methods differ based on feature extraction. Some studies use either manifest analysis or code analysis, while a few use both [73].

The manifest analysis is a widely used static analysis method. It can extract package names, permissions, activities, services, intents, and providers from the AndroidManifest.xml file. The AndroidManifest file contains all the permissions used in a particular application, categorised as dangerous, signature, and normal. Twenty-two permissions have been identified as significant permissions in SigPID in Reference [78], and they were identified by developing a three-level data purring method. Those three levels were support-based permission ranking, permission mining with association rules, and permission ranking with a negative rate. The second method of the static analysis is code analysis, which considers about the source code files. Features such as API calls, information flow, taint tracking, native code, clear-text analysis, and opcodes can be extracted with code analysis. The MaMaDroid [107] method provides an example for API calls analysis. It abstracted apps’ API call executions to create regular classes or packages using static code analysis techniques and then determined the call graph using the Markov chain.

Authors in Reference [50] proposed five aspects for static analysis: analysis techniques, sensitivity analysis, code representation, data structures, and inspection level. The analysis techniques are symbolic execution, taint analysis, program slicing, abstract interpretation, type checking, and code instrumentation. Objects, contexts, fields, paths, and flows are considered for sensitivity analysis. Smali [57], Dex-Assembler [111], Jimple [151], Wala-IR [153], and Java Byte code/Java class are used for code representation while Call Graph, Control Flow Graph, Inter-Procedural Control Flow Graph are data structures. Kernels, applications, and emulators are considered for the inspection levels.

4.2 Dynamic Analysis

The second technique is dynamic analysis, which is used to analyse the application by executing it in a sandbox runtime environment. A completed product (i.e., APK) is required to apply this method. Therefore, it is widely used to detect vulnerabilities and malware in developed applications.

Five feature extraction techniques were identified for the dynamic analysis in Reference [50], and they were network traffic analysis, code instrumentation, system call analysis, system resources analysis, and user interaction analysis. Network-related features, process-related features, usage-related details, and component interactions were extracted using those five methods. Under the network-related features, uniform resource locators, internet protocols, network protocols, network certificates, and network traffic were considered while considering non-encrypted data, Java classes, intents, and system call for process-related features. Usage-related features such as processor, memory, battery, network, and process reports were also considered. Additionally, buttons, icons, actions, and events were considered for user interaction analysis features.

Authors in Reference [52] used dynamic analysis techniques to identify Android vulnerabilities. It contained three modules: network traces collection, network feature extraction, and network feature detection. Network activities of running apps were periodically recorded and monitored in the traces collection module. Feature extraction module extracted network features used in applications such as origin-destination-based features, domain-name system-based features, transmission control protocol-based features, and hypertext transfer protocol-based features and performed the vulnerability detection process.

4.3 Hybrid Analysis

The hybrid analysis uses both static and dynamic features to analyse a given application. The study in Reference [99] used static features, including permissions and intents and dynamic features, including IPs, emails, and URLs, to extract various information related to applications. The APKTool [150] was used to decompile the APK as the initial step. After extracting the data, it used disassembled dex files to create the feature vector for further analysis. The APK files were executed in an emulator to extract the behaviours of the dynamic features.

The model proposed in Reference [143] used a hybrid analysis to identify Android security vulnerabilities. It can analyse metadata and data flow using static analysis and API hooks and executable scripts using dynamic analysis. The static analysis technique of this work was able to identify eight vulnerable categories: unrestricted component, insecure JavaScript in WebView, sensitive data processed in plaintext, privacy leak by log, dynamically loading a file, insecure password, intent exposure, and Structured Query Language (SQL) inject. The dynamic analysis technique was able to identify the unverified inputs vulnerability category. However, it may fail if the app uses specific security measures such as signature verification. Hence, false-positive results can be expected sometimes. Nevertheless, the overall analysis can perform within 93 seconds on average with approximately 95% accuracy. Issues in SSL/TLS are also essential to identify, and they can be analysed using hybrid analysis. The DCDroid framework in Reference [154] used hybrid analysis techniques to identify them, and the study found that 360 out of 2,213 applications contained security issues related to SSL certificates/TLS.

5 Code Vulnerability Detection

Mobile applications can be misused to breach the security mechanisms [53] due to the source code vulnerabilities. However, developing applications with zero defects or vulnerabilities is impossible but can be achieved to a certain extent, and vulnerabilities of the source code must be detected to accomplish that. Several methods including machine learning, deep learning, heuristic-based methods, and formal methods can be applied to detect source code vulnerabilities with the use of static analysis, dynamic analysis, and hybrid analysis.

5.1 Machine Learning Methods

ML and DL methods such as NB, LR, DT, RF, GB, LSTM, RNN, and MLP were applied in the studies related to vulnerability detection. To train such ML or DL methods, features should be identified in the Android application by following a suitable analysis technique: static, dynamic, or hybrid.

5.1.1 Machine Learning with Static Analysis.

ML methods can be applied with static analysis techniques when detecting code vulnerabilities if the source code is formulated into a generalised form. Abstract Syntax Tree (AST) is a popular way of generalising the code [110]. The rate of false alarms on vulnerabilities depends on the accuracy of formulating the AST and its generalisation mechanism, in addition to the quality of features, selected dataset, and trained algorithms. Studies such as Reference [53] proved the possibility of employing ML and DL-based methods on a generalised architecture of source code such as AST to detect Android code vulnerabilities. Therefore, improvements to feature generation methods like AST building are identified as a research gap in this area to employ ML techniques to the problem.

Some studies applied static analysis techniques in various ways with ML methods to detect malicious code and vulnerabilities. The WaffleDetector [138] is a static analysis approach to detect malicious code and vulnerabilities in Android applications by using sensitive permissions, program features, and API calls. Extreme Learning Machine (ELM) was used in this for further analysis. In Reference [45], vulnerability detection and patching framework was proposed named as Vulvet. This framework used static analysis approaches to detect vulnerabilities in Android applications along with a multi-tier multi-pronged analysis technique. Further, this framework proposed an automated patch generation process for vulnerabilities. Augmented control-flow analysis and Android-specific component validation approaches were also proposed to avoid false positives. The Vulvet framework used some features in the Soot framework, such as data-flow analysis, call-graph analysis, intermediate code scanning, taint analysis, parameter analysis, API analysis, and return value analysis. It used vulnerability resolution, control-flow instrumentation, methods/parameters reconstruction, secure method call augmentation, manifest modification, and code elimination. This model can detect vulnerabilities with 95.23% precision and 0.975 F-Measure on 3,700 apps from the benchmark and other Android market places. It has also been identified that 10.46% of evaluated apps were vulnerable to various exploits. Though this is a comprehensive model, there are some limitations, such as not analysing and patching vulnerabilities in native code, not supporting Java reflecting and dynamic code loading, and marking all the files read from external storage as malicious, that need to be overcome with further studies.

Analysing data flow is also important to detect malicious code and applications. The study in Reference [163] proposed a mining method for topic-specific data flow signatures to characterise malicious Android apps. It identified that the topic-specific data flow signatures are much better than the overall data flow signatures to characterise malicious and vulnerable apps. Descriptions and sensitive data flow patterns were obtained from 3,691 benign and 1,612 malicious apps for analysis. Once the features were extracted, a topic model was built using adaptive Latent Dirichlet Allocation (LDA) with Genetic Algorithms (GA). An optimal number of topics was determined with GA. After that, a topic-specific data flow signature was generated by computing the information gain ratio of each piece of data flow information. Then the information gain ratio of the piece of data flow information was generated, and it was used to characterise the apps. Though this study considered a high number of apps, their representative is not considered, which might decrease the process accuracy. That limitation could be overcome by analysing more representative apps and ensuring the sample sizes of each topic.

Source code can be retrieved from the APK file or Portable Executable (PE) file to perform the static analysis. In Reference [70], an automated method to classify malicious codes and secure codes using PE structure. It used static analysis with RF, GB, DT, and CNN models and achieved 98.77% detection accuracy. The model built in Reference [20] was able to predict software code vulnerabilities before the release of the application. The code was represented using an AST to analyse it, and ML models were applied. It used Python, C, and C++ source codes taken from well-known datasets such as NIST [103], SAMATE [104], SATE IV Juliet Test Suites [105], and Draper VDISC [122] for the training purpose. One main drawback of this approach was the inability to locate the specific place of the vulnerable code segment.

The developed mechanism in Reference [31] classifies the C language functions into vulnerable or non-vulnerable using ML methods. The first step was to prepare the AST. After that, data pre-processing, feature extraction, feature selection, and classification tasks were performed by applying ML algorithms. This study used National Vulnerability Dataset (NVD) [103] to collect code blocks written in C language and their known vulnerabilities. Another automated vulnerability detection system was proposed in Reference [122], which uses C and C++ source codes. It used ML with deep feature representation learning and compared the findings with Bag of Words, RF, RNN, and CNN. It used existing datasets and the Drapper dataset [122], compiled using GitHub [124] and Drebin [16] repositories, which contain open source functions and carefully selected labels. Though these studies [20, 31, 70, 122] considered Python C or C++ codes, with methodically proven the possibilities of using these proposed approaches, to detect code vulnerabilities in Android source code written in Java, should be further studied by proving methodically.

5.1.2 Machine Learning with Dynamic Analysis.

Dynamic analysis techniques also can be applied to generated features to train ML models to detect vulnerabilities at the execution time. The study [91] discussed a dynamic analysis approach that used NB, K-Star, RF, DT, and Simple Logistic ML models to detect vulnerabilities and malicious applications. Features were extracted while executing the APKs in an emulator. Simple Logistic performed well with 0.997 precision and 0.996 recall in this model. However, some applications crashed when running in an emulator due to their dynamic behaviour. The used dataset requires fine-tuning to increase accuracy, since some shared permissions exist between malicious and benign applications, which might get incorrectly classified.

A dynamic analysis technique was used in Reference [160], and it discussed a code vulnerability detection mechanism by applying DL. It compared CNN, LSTM, CNN-LSTM and identified that CNN-LSTM has a detection accuracy of 83.6%. It has been identified that Deep Neural Networks (DNN) also can predict vulnerable source code. To classify the vulnerable classes with high precision, recall, and accuracy, the model proposed in Reference [112] can be used. This model was evaluated using Android apps written in Java. N-gram analysis and statistical feature selection to construct feature vector were performed in this model. Another study in Reference [59] discussed an ML-based vulnerability detection rules extraction method with dynamic analysis. The J48 ML algorithm performed with 96% accuracy compared with another 32 supervised ML algorithms considered in this study. A context-aware intrusion detection system was proposed in 6th Sense [134], and it used NB, Markov chain, and Logistic Model Tree (LMT) to detect vulnerabilities. This study observed changes in sensor-related data in the mobile device by integrating dynamic analysis methods. The model still requires some fine-tuning to the followed dynamic analysis approach to widen vulnerability detection:

The dynamic analysis-based method proposed in Reference [93] detect anomalies of system calls with ML by considering type, sequence, and frequency. It can detect Android security vulnerabilities by identifying benign and malicious apps. Further, this work has created system call time-series datasets used in vulnerable and regular applications. The Zygote process of Android, which handles the forking of new processes, was used together with Android Debug Bridge (ADB) to trace every new activity and its processes. Dynamic Register Broadcast Receiver, Electronic Code Book (ECB) block cypher, fragment injection, weak permissions, and privilege escalation were the common vulnerabilities in the selected application dataset. Finally, a consolidated dataset was created after transforming unstructured time-series data. It was used to perform the machine learning operations while computing precision, recall, and F-Score using kNN, LSTM, and the Genetic Algorithm LSTM. All three ML algorithms performed well with over 85% F-Score, while Genetic Algorithm LSTM performed slightly better than the others. Currently, this model can detect only nine vulnerabilities. Therefore, more vulnerabilities should be considered and verified by keeping the same level of accuracy by enhancing the model in subsequent studies.

5.1.3 Machine Learning with Hybrid Analysis.

The use of hybrid analysis with ML methods is also widespread, since the detection approach can be enhanced with both static and dynamic features. The study conducted in Reference [169] proposed an ML-based vulnerability detection mechanism, using hybrid analysis techniques and studied Android Intent mechanisms, along with the composition of Intents. Further, Android-Intent-related security detection was discussed by applying several ML algorithms such as DT, ID3, C4.5, NB, and AB. One hundred fifty applications were collected to test the model with Android Intent mechanism security vulnerabilities and another 150 applications without them and performed the training and testing. The average accuracy was 77% of the proposed model. Limitations such as fewer samples and low performance were identified as points to be further improved.

The study in Reference [48] proposed a parallel-classifier scheme for Android vulnerability detection. This study explained the possibilities of using distinctive parallel classifiers to detect zero-day malware and highly elusive vulnerabilities in Android with an accuracy of 98.27%. It has also identified some issues in static and dynamic analysis approaches such as inefficiency, code obfuscation, and similarity score issue of signature-based detection. This model extracts static features such as permissions, API calls, version, services, used libraries, broadcast receivers while extracting dynamic features such as system calls, network calls of the mobile applications. It proposed the best combination of most efficient ML algorithms, such as SVM, Pruning Rule-based Classification Tree (PART), MLP, and Ripple Down Rule Learner (RIDOR). While employing parallel classifiers, this method also considered upgrading the precision and recall when detecting malware or vulnerabilities. Based on the results of the initial part of the research, it was identified that the MLP performed better than the other classifiers with a 96.11% detection rate. The next part of the study was conducted using a composite model where the results from the initial part are executed in parallel to estimate the efficiency of the cumulative approach. Average probabilities, Product of probabilities, Maximum probabilities, and Majority vote, were considered ensemble techniques. As per the final results, MaxProb was the best parallel classifier. It is better to consider and create more parallel classifiers to increase the model’s accuracy while employing deep learning techniques.

Models such as Reference [66] studied the possibilities of employing ML algorithms with both static analysis and dynamic analysis to analyse the source code in a hybrid manner. Identifying malware and benign applications was the main focus of this by considering their vulnerabilities. After extracting from APK files using Androguard [44] tool, it converted Manifest data to a JSON file in static analysis approach. Then datasets from Kaggle [68] and MalGenome [167] were used to train the ML models such as LR, SVM, and kNN. Finally, another JSON file was prepared and identified the code vulnerabilities. After that, the APKs were dynamically analysed by executing them to find the vulnerabilities.

The model described in Reference [77] used a hybrid analysis mechanism to identify malware and vulnerabilities using ML models. This model had an accuracy of 80% in the static analysis approach and 60% accuracy in the dynamic analysis approach. According to the findings, combining both of these methods will increase detection accuracy. However, it is better to provide a methodical approach to prove the increase in accuracy when using the hybrid analysis. Another model in Reference [139] used a hybrid analysis mechanism to detect vulnerabilities and malware. It has proposed a Tree-Augmented Naive Bayesian Network (TAN)-based mechanism for this using features such as permissions, system and API calls. The output relationships were modelled as a TAN, and it used datasets such as AZ [7], Drebin [16], Android Malware Dataset (AMD) [74], and GitHub [124]. The model performed well with an accuracy of 97%. The main limitation of the study is considering only two features. It could be expanded with more features and train the dataset to get a more reliable outcome. The possibility of combining into one model rather than training separately can also be checked.

Though there are several proposed ML/DL-based methods, many of them lack the code vulnerability detection capability at the app development time. A summary of some useful ML/DL-based models used in Android code vulnerability detection is compared in Table 3. It summarises the methodology, analysis technique, used ML/DL methods or framework, used tools and dataset, and the overall model’s accuracy.

Table 3.

Study	Summary of the Methodology	Analysis Technique	Used ML/DL Methods/ Frameworks	Used Datasets/ Tools/Methods	Accuracy of the Model
Pang, et al., ICDLT Conf., 2017 [112]	Deep neural network algorithm is employed on features based on mining source code, which is generated with N-gram. The algorithm is based on rectified linear units trained with the stochastic gradient descent method and batch normalization.	Static Analysis	DNN	Dataset is generated by downloading APKs from F-Droid [58].	92.87%
Wu, et al., ICCC Conf., 2017 [160]	Function call collection of 9,872 sequences is collected as features to represent the patterns of binary programs during their execution. Then DL models are applied to predict the vulnerabilities of these binary programs based on the collected data.	Dynamic Analysis	CNN, LSTM, and CNN-LSTM	Dataset is generated from 9,872 sequence of function calls. VDiscover tool [56] is also used.	83.6%
Zhuo et al., ICISCE Conf., 2017 [169]	APKs are decompiled, and static analysis is performed on the manifest file to obtain the components/permissions. Then system status is obtained, and fuzzy testing is performed with dynamic analysis. Then ML algorithms are executed to detect intent-based security issues.	Hybrid Analysis	AB and DT	Dataset is generated by downloading 300 APKs from major app stores.	77%
Grag et al., Computers & Electrical Engineering, 2019 [48]	APKs are decompiled, and features are selected for static analysis. Then the APKs are executed in an emulator, and log files are generated from system calls for the dynamic analysis. Then the vector space is generated, and ML algorithms are executed as parallel classifiers.	Hybrid Analysis	MLP, SVM, PART, and RIDOR	Dataset is generated by downloading APKs from Google Play [64], Wandoujia [147], AMD [74], and Androzoo [4].	98.37%
Bulgin et al., IEEE Access, 2020 [20]	ML-based analysis is performed to differentiate vulnerable and non-vulnerable source code by extracting and then converting the AST of a given source code fragment into a numerical array representation while preserving structural and semantic information contained in the source code.	Hybrid Analysis	MLP and a customised model	The public Draper VDISC Dataset [122] is used. Proposed model is compared with code2vec [5] method.	70.1%
Gupta et al., System Assurance Eng. and Manag., 2021 [59]	The most efficient human-readable vulnerability detection rules are generated after selecting the best ML algorithm to detect Lawofdemeter, BeanMemberShouldSerialize, and LocalVariablecouldBeFinal vulnerabilities. The 10-fold cross-validation was performed, and analyzed the result with performance metrics.	Static Analysis	J48 and JRip	Dataset is generated from Android Universal Image Loader project [144] and JHotDraw project [46]. PMD tool is used to analyse the source code.	96%
Kim et al., MDPI Symmetry, 2021 [70]	PE data extraction module and the image generation module are used to generate input data for each module. Then each model individually judges whether it is malicious with ML algorithms by receiving images generated from the image generation module as input data.	Static Analysis	CNN	Dateset is generated from Windows portable program files and using Microsoft Malware Classification Challenge dataset [96].	98.77%

Table 3. ML/DL-based Android Vulnerability Detection Mechanisms

5.2 Conventional Methods

Conventional methods, including heuristic-based methods, formal methods, and other non-ML-based methods, also can be applied to detect code vulnerabilities with analysis techniques. These studies are discussed in this section.

5.2.1 Conventional Methods with Static Analysis.

Several studies have used static analysis with conventional methods to detect code vulnerabilities. A formal model to detect security issues in Android permission protocol in Alloy (a language based on first-order relational logic) was introduced in Reference [18] that automatically analysed and identified potential flows in the protocol with static analysis techniques. It identified three types of vulnerabilities in the protocol: URI permission vulnerability, improper delegation vulnerability, and custom permission vulnerability. This model can also cater to the dynamic permission process and identifies that widely used permission is signature-based. It has been identified that out of four content types in Android, the receiver has the highest frequency. An experimental study was also conducted to confirm the relationship between potential flaws and security vulnerability. It also assessed the scalability of the formal analysis approach. It is possible to apply this model to other mobile operating systems by conducting fewer configurations. By fine-tuning, the model can overcome the limitation of detecting a few vulnerabilities.

Another static analysis approach to identify vulnerabilities in an Android application was proposed as a vulnerability parser model proposed in Reference [28]. This architecture consists of APK decompressor, Manifest.xml parser, vulnerability vector, and DexParser sub components. APK decompression was carried out as the first step using a Python script. The Manifest file was parsed to decompress and decompile the APK file using the Manifest parser. This Manifest parser could parse the Manifest to an understandable format with security aspects. DEX parser is used to parse the decompressed source files. Under the vulnerability vector, file access and exported component vulnerabilities were identified. The detection results were categorised as critical, warning, notice, and advice. It is better if this model can be increased further by considering more vulnerable categories, since, at the moment, it is limited.

Vulnerabilities in third-party libraries also cause problems to the application. Therefore, it is essential to identify them, too. ATVHunter was proposed in Reference [164] for reliable version detection of third-party libraries. This model provided detailed information on vulnerabilities and libraries by pinpointing the vulnerability in library versions and extracting the control flow graphs and opcodes. A dataset contained 189,545 unique third-party libraries with 3,006,676 versions, and it contained 1180 common-vulnerable enumerations. Further, it created additional 224 security bugs to analyse this model. The steps included in the detection process were pre-processing, module decoupling, feature generation, library identification, and vulnerable libraries version identification. The ATVHunter model detects the vulnerabilities with 98.58% precision, 88.79% recall at the library level, 90.55% precision, and 87.16% recall at version level. Focusing only on Java libraries, using only static analysis, detecting only the known vulnerabilities, and using only free apps for the study are identified as limitations of this study that can be further improved.

Android web view objects can also lead to vulnerabilities. A way to detect them by following a static analysis approach named WebVSec framework was proposed in Reference [39]. This study mainly considered four types of vulnerabilities: Interface to Interface vulnerabilities, Interface to WebViewClient vulnerabilities, WebViewClient to WebViewClient vulnerabilities, and Reverse vulnerabilities. This framework was implemented on top of the Androguard tool. AndroZoo was used as the dataset to perform analysis. The WebVSec framework contains five main steps: decompilation, interface and WebWiewClient class identification, methods identification, method abstraction, and path analysis to identify the above four vulnerabilities. The experiments analysed 2,000 Android apps and detected 48 applications with the above four types of vulnerabilities. On average, this can analyse an application in 49 seconds. The framework still requires improvements, such as analysing the WebView vulnerabilities generated through Java codes, since it considers only JavaScript.

The DroidRA model proposed in Reference [79] designed and implemented an approach that aims at boosting existing static analysis for Android by taming reflection in apps. It can resolve the targets of reflective calls through a constraint solving mechanism by instrumenting Android apps to augment reflective calls with their explicit standard Java calls. The analysis was supported by three modules: Jimple pre-processing module, reflection analysis module, and booster module. The model considered a random set of 100 real-world apps that contain reflective calls and at least one sensitive data leak to validate the static analysis results. The main advantage of this model is the possibility of uncovering dangerous code such as sensitive data leaks and sensitive API calls, which is not visible in other static analysis-based analysis mechanisms. However, the single entry-point method may not cover all the reflective calls as identified as a limitation that should be further explored. Applying these boosting mechanisms to other static analysis techniques used in Android vulnerability detection approaches would be interesting for further studies.

5.2.2 Conventional Methods with Dynamic Analysis.

Few studies considered dynamic analysis also with the conventional methods. The study in Reference [116] discussed Android app vulnerability detection, and it was inspired by a case study of web functions’ vulnerabilities. Android app categories including browsers, shopping, and finance were investigated for security by downloading and examining 6,177 apps. It analysed four vulnerabilities: Alibaba Cloud OSS credential disclosure vulnerability, improper certificate validation, Web-View remote code execution vulnerability, Web-View bypass certificate validation vulnerability (from China National Vulnerability Database [27], CVE list [33], and CWE list [34]). A heuristic vulnerability search algorithm was used in a proposed method named VulArcher to verify the accuracy of the analysis. All sensitive APIs and methods that may cause vulnerabilities in the app, a collection of rules for vulnerability fixes, and a set of rules that the vulnerability triggers were the inputs for this algorithm, and it provides detailed code snippets of the vulnerability and the path where vulnerabilities are located as the output. The possibility of detecting vulnerabilities on both packed and unpacked apps was an essential feature of the proposed model, which contains decompilation, packer identification, unpacking (if packed), building taint path, and detection steps. This model can perform with high average accuracy with a detection rate of 91% with high efficiency, low computing cost, and high scalability. Some limitations identified in this were the usage of an old dataset and integrating third-party tools, which can be revised to get higher accuracy when detecting newer vulnerabilities.

Another dynamic analysis-based Android vulnerability detection tool was proposed in Reference [165] named VScanner, which can detect all known system-level vulnerabilities. The framework of this tool was based on a scalable Lua script engine, a lightweight scripting language. Exploiting was used for dynamic detection and feature matching for static detection in the VScanner. It can detect vulnerabilities with a high efficiency and a low false alarm rate (nearly 100% detection accuracy) using 18 implemented plugins. Due to the high scalability of the proposed system, it was easy to add new vulnerability triggers. Once a vulnerability is triggered via an API call, code execution, or database exploit, a feature matching database will be used with scan components (information collection and feedback) in the Lua engine and provide reports and logs. This research has proposed a vulnerability taxonomy by Proof of Concept and Attack Surface (POCAS), since existing taxonomies are still immature, specific to Android. In POCAS, vulnerabilities were divided into native layer vulnerabilities (i.e., memory corruption, permission management, kernel escalation, input validation) and Java layer vulnerabilities (i.e., component exposure, file management, information disclosure, logic error). The model was applied for two case studies, which were FakeSMS and CVE-2014-1484 in National Vulnerability Database [103]. VScanner was tested in 15 Google simulators, 5 Android smartphones, 8 Genymotion emulators, and 7 third-party customised Android systems and provided high accuracy and efficient results. Increasing the number of plugins used for vulnerability detection and optimising the structures to enhance efficiency can improve the quality of the proposed framework.

5.2.3 Conventional Methods with Hybrid Analysis.

Several conventional methods used hybrid analysis techniques to detect vulnerabilities. The empirical study conducted in Reference [133] detected eight common vulnerabilities in Android with hybrid analysis from randomly selected 29 apps in the EATL app store [37] and 6 apps from the Google Play store. Those eight common vulnerabilities were related to storage access, web views, SQLite database encryption, intents, advertisement module, outdated or sensitive APIs, short messages, phone calls, and Android debug mode. The study chose three quality tools: AndroBugs [81], SandDroid [106], and Qark [84] to test and uncover those vulnerabilities. This study further discussed the countermeasures for those vulnerabilities, such as using web views more securely, keeping essential files and backups in the internal storage instead of external storage, and turning the debug mode off when releasing the apps. This study could consider further analysis on apps and more vulnerabilities by increasing the sample size.

The application vulnerability mining method proposed in Reference [29] uses a hybrid approach, first performing static analysis and then following it up with dynamic analysis. This model improved the mining accuracy by using fuzzy dynamic testing technology with static analysis while performing reverse analysis on the application. In the static analysis, APK files were de-compiled to get the source files using Dex2Jar and JD-GUI tools and libraries [111]. Then the feature vector of API functions, privileges, components, and library files was created by the feature extraction process. The scan engine consisted of data flow analysis, regular expressions matching, and file detection using a vulnerability rule base to get the analysis results. Fuzzy testing was used to perform dynamic analysis in a natural machine environment with taint analysis. It was conducted after the static analysis by executing the application with test cases, semi-effective data, execution data, taint tracking, and monitoring the exceptions. This model was able to detect vulnerabilities with an over 95% detection rate that can be optimised further by expanding the number of detectable vulnerabilities by enhancing the analysis techniques.

Another hybrid analysis-based approach was proposed in AndroSheild [10], which focused on building a hybrid analysis approach to detect vulnerabilities in Android applications. This model was evaluated against various applications for various security flaws. It can also detect information leaks, insecure network requests, and commonly detectable flaws that can harm users, such as intent crashes and exported Android components. The proposed model contains three-layer architecture (application, presentation, data) with a methodology of APK reverse-engineering, manifest file decoding, meta-data extracting, static analysis performing, dynamic analysis performing, and report generating. It also can generate a detailed report with the overall application risk level and the identified vulnerabilities in it. Some limitations identified in this publicly available framework are not detecting deprecated and vulnerable libraries, not analysing native libraries, and not applying the model to apps written in other programming languages such as Kotlin.

A summary of studies related to the conventional models used in vulnerability detection methods is compared in Table 4. It summarises the considered vulnerabilities, findings/capabilities, limitations, used datasets, used tools, and used methods of these works.

Table 4.

Study	Considered Vulnerabilities	Findings/Capabilities	Limitations	Used Datasets/ Tools/Methods
DroidRA, ISSTA Conf., 2016 [79]	Vulnerabilities from sensitive data leaks and API calls	A large portion of Android apps rely on reflective calls, and they are usually used with some common patterns.	May not be possible to uncover all reflective calls due to the used single entry-point method.	Google Play [64], AndroZoo [4]
Shezan, et al., NSysS Conf., 2017 [133]	Vulnerabilities in storage access, web views, SQLite database encryption, intents, advertisement module analysis, outdated or sensitive APIs, short messages and phone calls, Android debug mode.	It discusses the countermeasures for the detected vulnerabilities.	Sample size is limited.	AndroBugs [81], SandDroid [106], Qark [84]
VScanner, NSS Conf., 2017 [165]	Vulnerabilities in native layer and Java layer	It can detect all known system-level vulnerabilities. It also proposes a vulnerability taxonomy by Proof of Concept and Attack Surface.	Using a limited number of plugins for vulnerability detection.	Lua Scripts Engine
Bagheri, et al., Formal Aspects of Computing, 2018 [18]	URI permissions, Improper delegations, Custom permissions	It identifies signature-based permission as the widely used permission, and the receiver has the highest frequency out of four content types of Android.	Only a few vulnerabilities can be detected.	Alloy [65]
AndroShield, MDPI Information, 2019 [10]	Information leaks, Insecure network requests, Intent crashes	It generates a detailed report with the overall application risk level and the identified vulnerabilities.	Unable to detect vulnerable libraries, and unable to analyse native libraries.	ApkAnalyzer [13], FlowDroid [17]
Qin, et al., IEEE Access, 2020 [116]	Alibaba Cloud OSS credential disclosure, Improper certificate validation, Web-View remote code execution, Web-View bypass certificate validation	It can detect vulnerabilities in packed and unpacked apps with a low computing cost. The average accuracy, detection rate, efficiency, and scalability are also high.	Using an old APK set and integrating third-party tools.	APK dataset downloaded from Wandoujia [147], Qihoo 360app [126], and Huawei [62] App Stores
DCDroid, Journal of Systems and Software, 2020 [154]	SSL/TLS Certificate vulnerabilities	It identifies the potential security risks of apps in implementing SSL/TLS with static analysis and identifies the vulnerability status of apps to man-in-the-middle attacks and phishing attacks.	Unable to verify the apps with complex method implementations, leading to false-negative results.	APK dataset downloaded from Qihoo 360app [126] app market and Google Play [64]
WebVSec, Computers & Security, 2021 [39]	Interface-Interface, Interface-WebViewClient, WebViewClient-WebViewClient, Reverse vulnerability	It can analyse an application within 49 seconds.	Unable to analyse the WebView vulnerabilities generated other than through JavaScript.	BabelView [120]

Table 4. Conventional Methods of Android Vulnerability Detection

5.3 Prevention Techniques

Preventing code vulnerabilities at the early stages of app development is more advantageous than detecting them once the app has been developed. Therefore, prevention techniques can be integrated as frameworks, tools, and plugins to the development environments as additional support to the app developers with automated vulnerability detection methods. The analysis of experimental results in Reference [141] identified the need for automated code vulnerability detection support when developing secure applications to perform well. Android developers had to play the role of participants, and they had to propose an appropriate fix to given vulnerable code samples, such as SQL injections, encryption issues, and hard-coded credentials. Moreover, the stitch-in-time mechanism proposed in Reference [100] described vulnerability detection methods in Android apps at the development time. Developers can enter source code and continue the development process while the model checks for known security-oriented issues. If there are such issues, then developers are informed accordingly. Therefore, developers get the benefit of developing less-vulnerable source code. However, this method uses only known vulnerabilities. Therefore, the ML/DL-based method could be applied to adapt to the changing nature of source code-related issues. The model could be modified further to learn from user mistakes and bugs.

It is always better to inform the app developer about the severity level of identified vulnerabilities in addition to raising an alert. Android Lint is a valuable method to discover vulnerabilities using static analysis in given Android source code [54]. It can detect 339 security, performance, correctness, usability, internationalisation, and accessibility issues. Android Lint uses either an AST or an Universal AST generated through source code. There are other Linters also available such as Infer, PMD, FindBugs, CheckStyle, Detekt, and Ktlint, as discussed in Reference [60]. The OASSIS study introduced a method to prioritise warnings generated from Android Lint with static analysis. This method used app user reviews and sentiment analysis to identify app issues. Because of the prioritised warnings, developers can take action accordingly to fix the vulnerability issues. A way to indicate this to the developer is using a similar approach like in Reference [156].

The model in Reference [88] proposed a mechanism to integrate static analysis with development environments named as MagpieBridge. Though it is possible to integrate this plugin with code editors such as Eclipse, IntelliJ, PyCharm, Jupyter, and Submile text, integrating with Android studio was not discussed. Meanwhile, the DevKnox plugin [75] for Android Studio can detect and resolve security issues while writing codes to develop Android applications. FixDroid [100] can be used to get security-oriented suggestions and fixes to overcome vulnerabilities when developing Android applications. It also can be integrated with Android studio, and it can be improved further by integrating ML to provide suggestions.

Another framework proposed in Reference [117] guided the app developers to detect, prioritise and mitigate vulnerabilities using secure development guidelines, and named as SOURCERER, and it used static analysis techniques. When the framework is applied, developers can get to know a concise list of vulnerabilities. It has a three-phase process: asset identification, vulnerability to asset mapping, and mitigation. The authors tested this framework with 36 Android financial apps, and there were three developers involved in this experiment. Based on the findings, when using this framework, developers spent an average of 15, 30, and 20 minutes for asset identification, vulnerability detection and prioritisation, and finding mitigations, respectively. This framework did not complicate the security testing process of Android apps. Issues such as the limited number of sample apps, limited developer involvement in the experiments, and developers’ prior knowledge could affect the performance of this framework. Some of these limitations could be overcome by proposing an automated process.

The VuRLE tool [89] can be used to detect and repair vulnerabilities in source code automatically. It assists developers in dealing with various vulnerabilities. Initially, the model was trained and clustered similar edit blocks into groups using a training set of repair examples. Repair templates were generated for each group and used to identify vulnerable groups by applying transformative edits. Traversal of a generated AST was used in this together with 10-fold cross-validation. This model repaired 101 out of 183 detected vulnerabilities from 48 real-world apps (Android, web, word-processing, and multimedia apps) written in Java. However, some of the vulnerabilities were unable to repair due to unsuccessful placeholder resolution, lack of repair examples, and partial repairs. The low repair rate of this, which was 65.69%, can be increased by having a well-trained model with more vulnerable code samples.

5.4 Discussion on Vulnerability Detection Methods

Based on the reviewed studies, it is identified that 51% of studies used static analysis as the application analysis method, and 35% of studies used hybrid analysis. The rest of 14% used the dynamic analysis method. This is illustrated in Figure 3. The increased usage of static analysis may be due to its advantages for code-level analysis approaches, since they focus more on code features. Apart from that, the cost involved in static analysis is lower when compared with the other two methods. Dynamic analysis requires additional resources such as emulators or real devices to run the source code, and it will not be possible to uncover many vulnerabilities as in static analysis. Moreover, the need for APKs or compilable packages could be another reason for having fewer studies conducted for vulnerability detection with dynamic analysis. In terms of percentage, hybrid analysis is in the middle, since it has the characteristics of the other two analysis types.

Fig. 3.

Based on the reviewed studies, applying machine learning-based methods were higher than conventional methods. This is illustrated in Figure 4. Before 2016, conventional methods were popular among the research community compared to ML methods. However, with the boost in ML techniques, researchers tried to apply ML methods to solve problems [53]. Therefore, due to this popularity, the ability to provide high accuracy results, ease of handling complex problems, and scalability are suspected as the reasons for the high usage of these ML/DL methods in studies on Android vulnerabilities detection in the considered period of the review.

Fig. 4.

Many code vulnerability detection studies used the code analysis method as the feature extraction method. The Manifest analysis and the system call analysis methods are the other widely used methods. Figure 5 illustrates those feature extraction methods used in the reviewed studies. It is possible to detect many vulnerabilities by analysing source codes rather than analysing permissions or other features. That may be the reason for the highest usage of code analysis. Using manifest analysis can also identify vulnerabilities to a certain extent, such as the type of permissions used in applications. The vulnerabilities can be detected based on the required permissions of the application, such as the requirement of the dangerous level permissions. That may be the reason for having a somewhat high number of studies conducted. A considerable number of studies use system call analysis, since it is possible to detect vulnerabilities to a certain level by analysing the system calls. Code instrumentation, system resources analysis, and network analysis were used in a limited number of studies, since it is not easy to detect vulnerabilities by analysing them.

Fig. 5.

API calls were the most widely extracted feature for vulnerability analysis and detection in Android source code. Figure 6 illustrates that and the other extracted features in the reviewed studies. Many static and hybrid analysis methods extracted API calls as the feature to perform the analysis. A thorough understanding of vulnerabilities can be received by analysing API calls, which could explain the high usage of this as the extracted feature. Permission is the second prominent feature that was extracted in the reviewed studies. When conducting manifest analysis, permission is the main feature to be extracted. Since the manifest analysis was also widely used, it could be the reason for this high extraction rate. System calls are also a highly extracted feature in those studies, since they can detect many vulnerabilities by analysing them. Native code or opcodes, intents, network traffic, activities, and services were also extracted as features, but their usage is not highly visible.

Fig. 6.

It is identified that very few studies considered prevention mechanisms which are supported by the detection techniques as tools and plugins for Android source code vulnerability mitigation. Many of the studies considered only the detection as illustrated in Figure 7. It is valuable for the Android application developers if proper mechanisms are available that use various advanced techniques to prevent vulnerabilities. Therefore, as a finding of this review, the need to build such a prevention mechanism is recognised.

Fig. 7.

6 Supportive Tools AND Repositories

Application and source code analysis tools and frameworks are beneficial to perform several analysis processes. Once the analysis is completed, vulnerability detection tools can be applied to identify the vulnerable source code. Furthermore, it is essential to identify existing datasets and repositories to build ML models.

6.1 Tools

The surveys and interviews conducted related to intervention for long-term software security in Reference [157] have identified the importance of having an automated code analysis tool to recognise vulnerabilities. Reference [118] considered six main characteristics to compare these security analysis tools. They are (1) tools vs. framework, (2) free vs. commercial, (3) maintained vs. unmaintained, (4) vulnerability detection vs. malicious behaviour detection, (5) static analysis vs. dynamic analysis, and (6) local vs. remote. This study compared 64 solutions and considered supported Android versions, multiple operational modes, supported API levels, only applicable categories of vulnerabilities, the existence of vulnerabilities, and provided inputs as required by the tools. Another study in Reference [118] reviewed 64 tools and empirically evaluated 14 vulnerability detection tools for 42 known vulnerabilities identified in Reference [97] and found out that only 30 vulnerabilities from those 42 can be detected. These 42 known vulnerabilities were categorised into seven, which are, (1) Crypt - 4 vulnerabilities, (2) Inter-Component Communication (ICC) - 16 vulnerabilities, (3) networking - 2 vulnerabilities, (4) permission - 1 vulnerability, (5) storage - 6 vulnerabilities, (6) system - 4 vulnerabilities, and (7) web - 9 vulnerabilities. This study used AndroZoo [4] as the source of real-world Android apps, which contains around 5.8 million APKs.

The empirical analysis conducted in Reference [3] identified the static software metrics’ correlation and the most informative metrics that can be used to find code vulnerability related to Android source codes. The AndRev tool proposed in Reference [113] extracted the permissions using static analysis by reverse-engineering APKs with a batch-scripted tool. Extracted features are stored in a feature vector, and it was analysed to identify patterns of the permission by considering the app category. This tool tried to remove unwanted permissions for the app by reverse-engineering and rebuilding. Security analysis was also performed to identify the vulnerabilities by integrating a tool named Quixxi. According to this study, there are more medium-risk vulnerabilities than low and high-risk vulnerabilities. The tool’s accuracy can be further validated using a large dataset, as this study has used a limited dataset of 50 apps to perform the initial analysis.

Some of the tools and frameworks that can be executed in local machines to analyse applications to analyse source code and to detect vulnerabilities in Android are compared in Table 5. It compares the tool’s capabilities, limitations, analysis method, and usage.

Table 5.

Framework Name/Tool	Year	Capabilities	Limitations	Analysis Technique	Usage
FlowDroid [17]	2014	Ability to statically compute data flows.	Assumes that the entire contents remain tainted, even if an untainted value overwrites the single array element.	Static Analysis	Academic
COVERT [19]	2015	Ability to perform compositional analysis of inter-app vulnerabilities.	Unable to identify native code-related vulnerabilities and Permission leakages.	Static Analysis	Academic
DIALDroid [22]	2017	Ability to identify privilege escalations and inter-app collusion [69].	Unable to resolve reflective calls if their arguments do not contain string constants and may fail to compute some ICC links due to ignoring over-approximated regular expressions.	Static Analysis	Academic
HornDroid [26]	2016	Ability to perform static analysis of information flows and ability to soundly abstract the semantic of Android apps to compose security properties.	Over-approximates the life-cycle of fragments by executing all the fragments along with the containing activity in a flow-insensitive way, which might lead to precision problems in real apps.	Static Analysis	Academic
MalloDroid [41]	2012	Ability to identify broken SSL certification validation using Androguard framework [44].	The analysis might fail if the app is obfuscated and cannot test the entire workflow.	Static Analysis	Academic
JAADAS [43]	2016	Ability to analyse API misuse, inter-procedure style taint flows, local-denial-of-services, and intent crashes.	May crash when analysing obfuscated apps and JSON output file is not visualising the scale of potential problems.	Static Analysis	Industrial
DevKnox [75]	2016	Ability to detect and resolve security issues while writing codes.	Unable to detect novel vulnerabilities and does not support for latest Android environments.	Static Analysis	Industrial
AndroBugs [81]	2015	Ability to find potential Android security vulnerabilities and check the code for security best practices and dangerous shell commands.	Unable to provide a complete and detailed description to help solve any potential security issues.	Static Analysis	Industrial
MARVIN [83]	2015	Ability to assess the maliciousness of previously unknown apps with ML techniques and creates an accurate snapshot of malware behaviour that it can leverage to assess the risk associated with apps.	Unable to intercept apps when downloading from marketplaces; the apps need to be submitted to the MARVIN manually.	Static Analysis	Industrial
QARK [84]	2015	Ability to find security-related vulnerabilities in Android applications either in APKs or source code.	Unable to analyse heavily obfuscated apps and requires high CPU consumption when decompiling.	Static Analysis	Industrial
FixDroid [100]	2017	Ability to provide security-oriented suggestions and fixes to overcome vulnerabilities.	Relying on a relatively small sample set and not focusing on improving data flow analysis except leveraging the existing features of IntelliJ IDEA.	Static Analysis	Academic
MobSF [108]	2015	Ability to perform static analysis, hybrid analysis, penetration testing, and provides a REST API for integration with development environments.	Unable to perform API testing and some issues occur in the emulator when executing apps in hybrid analysis.	Hybrid Analysis	Industrial
APKTool [150]	2010	Ability to decompile the APK with the support of a static analyser [95] and ability to reverse-engineer Android apps by decoding to nearly original form and rebuild the app after performing modifications.	Fails to decompile and analyse heavily obfuscated APKs.	Static Analysis	Industrial
Amandroid [155]	2014	Ability to analyse the inter-component data flow for security vetting.	Unable to detect security issues where exceptions can occur and unable to handle reflections and concurrency.	Static Analysis	Academic

Table 5. Tools and Frameworks for Android Application Analysis and Vulnerability Detection

6.2 Repositories and Datasets

Datasets and Repositories are useful to perform various ML or conventional vulnerability detection methods. There are several datasets, such as Drebin [16], Google Play [64], AndroZoo [4], AppChina [146], Tencent [148], YingYongBao [140], Contagio [149], Genome/MalGenome [167], VirusShare [152], IntelSecurity/MacAfee [137], MassVet [30], AMD [74], APKPure [145], Android Permission Dataset [35], Andrototal [90], Wandoujia [147], Kaggle [68], CICMaldroid [102], AZ [7], and Github [124], that can be used to perform these experiments.

In Reference [97], an open-source repository of benchmarks called as Ghera was introduced. It captured 25 known vulnerabilities in Android apps. Further, this has also presented some common characteristics of vulnerability benchmarks and repositories. The main reason for this research is to find Android-specific vulnerability benchmarks to evaluate available tools that can help the app developers. It identified that there were neither test suites nor benchmarks to evaluate the vulnerability detection techniques reasonably. Many of them used the regular data and apps available in Google Play. During the retrospection stage, 11 characteristics were identified as vulnerability benchmark characteristics. They were (1) tool and technique agnostic, (2) authentic, (3) feature specific, (4) contextual, (5) ready to use, (6) easy to use, (7) version-specific, (8) well documented, (9) contain both vulnerability and a corresponding exploit, (10) open to the community, and (11) comprehensive. This repository contains information on the Android Framework’s inter-component communication, storage, system, and web vulnerabilities. However, in Ghera, vulnerabilities related to networking, sensors in the Android Framework were not covered. Nevertheless, none of them provided a benchmark dataset specifically for Android source code vulnerability detection. Therefore, it is better to cover more areas while expanding the repositories with more real-world apps.

The work in Reference [49] identified the CVE details [33] as one of the sources for data analysis that provides details on vulnerability statistics on products, versions, and vendors. CVE details were prepared using the National Vulnerability Database (NVD) [103]. Mean impact scores of vulnerabilities and the number of instances were considered when assessing vulnerabilities. A dataset to fix open-source software code vulnerabilities by identifying security-related commits in a given source code was introduced in Reference [114]. This dataset was prepared by manually curating. The study in Reference [98] created a repository named AndroVul, which contains Android security vulnerabilities. It includes high-risk shell command vulnerabilities, security code smells, and dangerous permissions. An Android code vulnerability dataset was proposed in Reference [128], named LVDAndro [129], which contains vulnerable and non-vulnerable source code with their CWE [34] details. The proof-of-concept of this work identified the applicability of the dataset to train ML models to detect Android code vulnerabilities.

For the source code analysis purpose, 15 tools and 5 tools that can detect vulnerabilities in Android were identified. The review also identified that 19 datasets and repositories are available for source code analysis and vulnerability detection. However, the majority of these datasets and repositories are not purely based on Android source code vulnerabilities. Therefore, the need to have a proper dataset is one key finding in this SLR. Furthermore, using such datasets, it is also important to consider developing new source code vulnerability detection tools for Android, since very few comprehensive tools are available to detect vulnerabilities.

7 Threats to Validity of the Review

Even though this systematic review was conducted by following a well-established methodology [109], results are not guaranteed to cover all relevant studies due to some limitations during the review process. Thus, this section discusses possible threats to the validity and measures taken to minimise them under construct, internal, external, and conclusion validity.

7.1 Construct Validity

Threats to construct validity can arise because of search term-based queries performed on repositories. There can still be good papers that were not reviewed in this study, since they were not available in the research repositories, including ACM Digital Library, IEEEXplore Digital Library, Science Direct, Web of Science, and Springer Link. Google Scholar was used as another source to capture such missing studies to minimise this. However, some relevant publications may still be missing in the collected studies. The additional aspect of the construct validity is the possibility of having a few errors in the process, filtering out the studies by inclusion or exclusion criteria. The list of publications was analysed by cross-checking the primary studies to avoid these errors.

7.2 Internal Validity

Internal validity is related to data extraction and analysis, primarily related to the soundness of the proposed review process. There was a heavy workload in data extraction and data analysis. Hence, data was also collected by cross-checking and obtained after all the authors agreed on the comparison results. However, it is still possible to make a few mistakes in extracting and analysing data. If possible, original authors could be involved in verifying to reduce mistakes.

7.3 External Validity

External validity is about the summary of the results obtained from the primary studies. The analysis based on this review was performed on the research publications collected from 2016 to June 2022 to cover Android code vulnerability detection methods to date. ML techniques for vulnerability detection have increased significantly during this period due to recent advances in software security and artificial intelligence. The trends may also vary for different time period. Therefore, this work may not capture some comprehensive studies conducted before the period.

7.4 Conclusion Validity

The bias and the other factors affecting the review study were tried to minimise when searching for the papers. Research papers written only in English were considered. Because of this limitation, this work may have overlooked some crucial works written in other languages such as Chinese, German, and Spanish. Moreover, the threat may also occur due to consideration of studies with individual reviewers’ bias. It may lead to flaws and biases in this study. Furthermore, positive results are more likely to be reported than negative results [71]. However, many papers that reported negative effects were also captured in this study, since a peer-verified systematic review process was followed. Cross-checking mechanism was also applied to maintain the focus of the SLR, where a thorough examination was done when reviewing the papers. All the authors were constantly involved in the study selection and reviewing process to ensure this.

8 Conclusion AND Future Works

Mobile app developers continuously develop Android applications to fulfil the need of the rapid demand. When developing these applications, the security concepts also should be adequately addressed. Several vulnerability detection methods could be applied to do that. Based on the available literature, this systematic review of the state-of-the-art Android source code vulnerability detection techniques covered the latest research from 2016 to June 2022. It discussed three steps to increase the security of an Android application by considering analysing, detecting, and preventing vulnerabilities. Applications and code analysis techniques, static analysis, dynamic analysis, and hybrid analysis and the tools used were reviewed in this, along with ML/DL and conventional methods applied to detect vulnerabilities. Possible prevention mechanisms were also discussed. The work identified the potential gaps in previous research and possible future research directions to enhance the security of Android OS.

After conducting the review, static analysis was identified as the widely used technique to detect Android vulnerabilities, and the code analysis technique is more useful when performing it. Another critical finding is that API calls, permissions, and system calls were the widely extracted features in feature extraction. It was also identified ML/DL-based techniques are widely used to detect vulnerabilities. Though a couple of malware-related datasets were identified, a properly labelled dataset on Android source code vulnerability is also required. Once such a dataset is introduced, it can be used for further experiments to train ML models to detect and predict code vulnerabilities with high accuracy. A comprehensive code analysis mechanism that can identify security issues at the development time can be introduced. Furthermore, just having detection mechanisms will not be sufficient for the app development community. Further research should also be conducted to identify the possible ways of integrating the detection methods into Android app development environments as tools or plugins. By utilizing them, developers can validate the security throughout the application without waiting for the complete application. Moreover, the lack of an automated mechanism for identifying the reasons for the vulnerabilities was also identified. Further research can be conducted to integrate the explainable AI techniques with Android source code vulnerability detection mechanisms to overcome this.

Both Android vulnerabilities and their detection techniques are evolving. Therefore, similar future reviews are also necessary to cover the studies on these emerging threats and their detection methods. As per the understanding through the review, novel ML methods, DL methods, and reinforcement learning methods are also emerging to detect and prevent vulnerabilities. Therefore, further reviews also can be carried out.

Acknowledgments

We thank Robert Gordon University - United Kingdom, University of Kelaniya - Sri Lanka, and Accelerating Higher Education Expansion and Development (AHEAD) grant of Sri Lanka for their support.

References

[1]

Omar M. Ahmed and Amira B. Sallow. 2017. Android security: A review. Acad. J. Nawr. Univ. 6, 3 (2017), 135–140. Retrieved from https://journals.nawroz.edu.krd/index.php/ajnu/article/view/99.

Abstract

1 Introduction

1.1 Research Questions

1.2 Organization of the Review

2 Background AND Related Literature Reviews

2.1 Background

2.1.1 Layered Architecture and Security Implication of Android.

2.1.2 Vulnerabilities of Android Applications.

2.1.3 User and Developer Mistakes.

2.1.4 Machine Learning Processes.

2.2 Related Literature Reviews

3 Methodology

3.1 Search Strategy

3.2 Study Selection, Data Extraction, and Synthesis

4 Application Analysis

4.1 Static Analysis

4.2 Dynamic Analysis

4.3 Hybrid Analysis

5 Code Vulnerability Detection

5.1 Machine Learning Methods

5.1.1 Machine Learning with Static Analysis.

5.1.2 Machine Learning with Dynamic Analysis.

5.1.3 Machine Learning with Hybrid Analysis.

5.2 Conventional Methods

5.2.1 Conventional Methods with Static Analysis.

5.2.2 Conventional Methods with Dynamic Analysis.

5.2.3 Conventional Methods with Hybrid Analysis.

5.3 Prevention Techniques

5.4 Discussion on Vulnerability Detection Methods

6 Supportive Tools AND Repositories

6.1 Tools

6.2 Repositories and Datasets

7 Threats to Validity of the Review

7.1 Construct Validity

7.2 Internal Validity

7.3 External Validity

7.4 Conclusion Validity

8 Conclusion AND Future Works

Acknowledgments

References

Cited By

Index Terms

Recommendations

AI-Powered Vulnerability Detection for Secure Source Code Development

Detecting Blind Cross-Site Scripting Attacks Using Machine Learning

AUSERA: Automated Security Vulnerability Detection for Android Apps

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

HTML Format

Get Access

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations