Abstract
User reviews obtained from mobile application (app) stores contain technical feedback that can be useful for app developers. Recent research has focused on mining and categorizing such feedback into actionable software maintenance requests, such as bug reports and functional feature requests. However, little attention has been paid to extracting and synthesizing the Non-Functional Requirements (NFRs) expressed in these reviews. NFRs describe a set of high-level quality constraints that a software system should exhibit (e.g., security, performance, usability, and dependability). Meeting these requirements is a key factor in achieving user satisfaction and, ultimately, surviving in the app market. To bridge this gap, we present a two-phase study aimed at mining NFRs from user reviews available on mobile app stores. In the first phase, we conduct a qualitative analysis of a dataset of 6,000 user reviews, sampled from a broad range of iOS app categories. Our results show that 40% of the reviews in our dataset express at least one type of NFR. The results also show that users in different app categories tend to raise different types of NFRs. In the second phase, we devise an optimized dictionary-based multi-label classification approach to automatically capture NFRs in user reviews. An evaluation of the proposed approach over a dataset of 1,100 reviews, sampled from a set of iOS and Android apps, shows that it achieves an average precision of 70% (range [66%–80%]) and an average recall of 86% (range [69%–98%]).
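To make the idea of dictionary-based multi-label classification concrete, the following is a minimal sketch and not the authors' implementation. The indicator terms in `NFR_DICTIONARY` are hypothetical placeholders chosen for illustration, the helper names `classify` and `precision_recall` are assumptions, and the optimization step mentioned above is not reproduced. A review receives every NFR label for which it contains at least one indicator term, and precision and recall are then computed per label.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical indicator terms per NFR category; NOT the dictionary from the paper.
NFR_DICTIONARY: Dict[str, Set[str]] = {
    "dependability": {"crash", "crashes", "freeze", "freezes", "bug"},
    "performance": {"slow", "lag", "battery", "loading"},
    "security": {"password", "privacy", "hacked"},
    "usability": {"confusing", "interface", "navigate"},
}


def classify(review: str, dictionary: Dict[str, Set[str]] = NFR_DICTIONARY) -> Set[str]:
    """Return every label whose term set shares a word with the review."""
    tokens = set(re.findall(r"[a-z]+", review.lower()))
    return {label for label, terms in dictionary.items() if tokens & terms}


def precision_recall(
    predicted: List[Set[str]], actual: List[Set[str]], label: str
) -> Tuple[float, float]:
    """Per-label precision and recall over parallel lists of label sets."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if label in p and label in a)
    fp = sum(1 for p, a in pairs if label in p and label not in a)
    fn = sum(1 for p, a in pairs if label not in p and label in a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # A single review can carry several NFR labels at once (multi-label).
    print(classify("The app crashes and is so slow"))
```

Because each label is decided independently, one review can be assigned several NFR types, which is what distinguishes a multi-label setting from single-label classification.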
Notes
The data is available at: http://seel.cse.lsu.edu/data/emse19.zip
Acknowledgements
We would like to extend our gratitude to Dr. Daniel M. Berry from the University of Waterloo for his contribution to this work. This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number: LEQSF(2015-18)-RD-A-07 and by the LSU Economic Development Assistantships (EDA) program.
Additional information
Communicated by: David Lo, Meiyappan Nagappan, Fabio Palomba, and Sebastiano Panichella
About this article
Cite this article
Jha, N., Mahmoud, A. Mining non-functional requirements from App store reviews. Empir Software Eng 24, 3659–3695 (2019). https://doi.org/10.1007/s10664-019-09716-7
DOI: https://doi.org/10.1007/s10664-019-09716-7