Mining non-functional requirements from App store reviews

Jha, Nishant; Mahmoud, Anas

doi:10.1007/s10664-019-09716-7

Published: 07 June 2019

Empirical Software Engineering, Volume 24, pages 3659–3695 (2019)

Abstract

User reviews obtained from mobile application (app) stores contain technical feedback that can be useful for app developers. Recent research has focused on mining and categorizing such feedback into actionable software maintenance requests, such as bug reports and functional feature requests. However, little attention has been paid to extracting and synthesizing the Non-Functional Requirements (NFRs) expressed in these reviews. NFRs describe a set of high-level quality constraints that a software system should exhibit (e.g., security, performance, usability, and dependability). Meeting these requirements is a key factor in achieving user satisfaction and, ultimately, in surviving in the app market. To bridge this gap, we present a two-phase study aimed at mining NFRs from user reviews available on mobile app stores. In the first phase, we conduct a qualitative analysis of a dataset of 6,000 user reviews sampled from a broad range of iOS app categories. Our results show that 40% of the reviews in our dataset express at least one type of NFR. The results also show that users in different app categories tend to raise different types of NFRs. In the second phase, we devise an optimized dictionary-based multi-label classification approach to automatically capture NFRs in user reviews. Evaluating the proposed approach over a dataset of 1,100 reviews, sampled from a set of iOS and Android apps, shows that it achieves an average precision of 70% (range 66%–80%) and an average recall of 86% (range 69%–98%).
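The abstract describes the dictionary-based multi-label classifier only at a high level. The Python sketch below illustrates the general idea under stated assumptions: each NFR label has its own indicator-term dictionary, a review receives every label whose dictionary matches at least one of its tokens, and precision and recall are computed per label. The term lists, the names `NFR_DICTIONARIES`, `classify`, and `precision_recall`, and the example reviews are all hypothetical; this is not the authors' optimized approach, whose dictionaries and tuning the abstract does not specify.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical indicator-term dictionaries, one per NFR label. These terms are
# illustrative assumptions, not the dictionaries used in the study.
NFR_DICTIONARIES: Dict[str, Set[str]] = {
    "security": {"secure", "security", "password", "hacked", "privacy", "login"},
    "performance": {"slow", "lag", "laggy", "fast", "battery", "loading"},
    "usability": {"confusing", "intuitive", "easy", "navigate", "interface"},
    "dependability": {"crash", "crashes", "freezes", "bug", "reliable", "broken"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(review: str, dictionaries: Dict[str, Set[str]] = NFR_DICTIONARIES) -> Set[str]:
    """Return every label whose dictionary shares at least one term with the review.

    Each label is tested independently, so a review can receive zero, one,
    or several labels (multi-label output).
    """
    tokens = set(tokenize(review))
    return {label for label, terms in dictionaries.items() if tokens & terms}


def precision_recall(predicted: List[Set[str]], actual: List[Set[str]],
                     label: str) -> Tuple[float, float]:
    """Compute precision and recall for a single label over a list of reviews."""
    tp = sum(1 for p, a in zip(predicted, actual) if label in p and label in a)
    fp = sum(1 for p, a in zip(predicted, actual) if label in p and label not in a)
    fn = sum(1 for p, a in zip(predicted, actual) if label not in p and label in a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    reviews = [
        "App crashes every time I login, and it is so slow",
        "Love the intuitive interface",
        "Great content, five stars",
        "Someone hacked my account, worried about privacy",
    ]
    # Hypothetical manual (gold) labels for the toy reviews above.
    gold = [{"dependability", "performance"}, {"usability"}, set(), {"security"}]
    predictions = [classify(r) for r in reviews]
    for label in NFR_DICTIONARIES:
        p, r = precision_recall(predictions, gold, label)
        print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

With these toy inputs, `security` scores a precision of 0.50 and a recall of 1.00: the term "login" fires on the first review, which the hypothetical annotator tags only as a dependability and performance complaint. This kind of false positive shows why keyword dictionaries generally need tuning against manually labeled data.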

Notes

https://www.statista.com/topics/1729/app-stores/
https://www.apple.com/itunes/charts
https://rss.itunes.apple.com/us/?urlDesc=
https://github.com/seelprojects/ManualReviewClassifier
The data is available at: http://seel.cse.lsu.edu/data/emse19.zip
https://www.nltk.org/
Dataset is available at: http://seel.cse.lsu.edu/data/emse19.zip
https://github.com/seelprojects/MARC-3.0

Acknowledgements

We would like to extend our gratitude to Dr. Daniel M. Berry from the University of Waterloo for his contribution to this work. This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number: LEQSF(2015-18)-RD-A-07 and by the LSU Economic Development Assistantships (EDA) program.

Author information

Authors and Affiliations

Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, USA
Nishant Jha & Anas Mahmoud

Corresponding author

Correspondence to Anas Mahmoud.

Additional information

Communicated by: David Lo, Meiyappan Nagappan, Fabio Palomba, and Sebastiano Panichella

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jha, N., Mahmoud, A. Mining non-functional requirements from App store reviews. Empir Software Eng 24, 3659–3695 (2019). https://doi.org/10.1007/s10664-019-09716-7

Published: 07 June 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10664-019-09716-7

Keywords
