MNoR-BERT: multi-label classification of non-functional requirements using BERT

  • Original Article
Neural Computing and Applications

Abstract

In the era of Internet access, software is easily available on digital distribution platforms such as app stores. Distributing software on these platforms makes user feedback more accessible, and this feedback can be used in contexts ranging from requirements engineering to software maintenance. Such user reviews may contain technical information about the app that is valuable for developers and software companies. Owing to the pervasive use of mobile apps, users create a large amount of data on a daily basis. Manually identifying and classifying these reviews is time-consuming and laborious. Hence, automating this process is essential to help developers manage reviews efficiently. Prior studies have focused on classifying reviews into bug reports, user experience, and feature requests. To date, however, very few research papers have extracted the Non-Functional Requirements (NFRs) present in these reviews. NFRs are the set of quality attributes of the software, such as reliability, performance, security, and usability. Previous studies have used machine learning techniques to classify reviews into their respective classes. However, existing studies treat review classification as a single-label problem and underestimate the contextual relationships between the words of a review. To alleviate these limitations, the proposed work uses a transfer learning model to classify multi-label app reviews into four NFRs: Dependability, Performance, Supportability, and Usability. The proposed approach evaluates the performance of a pre-trained language model for multi-label review classification. A set of experiments is conducted to compare the performance of the proposed model against baseline machine learning with binary relevance and a keyword-based approach. We evaluated our approach on a dataset of 6000 user reviews of 24 iOS apps. Experimental results show that the proposed model outperforms state-of-the-art baseline techniques with respect to precision, recall, and F1-measure.
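To make the multi-label setting concrete, the following minimal sketch encodes a review's NFR tags as a multi-hot vector over the four categories named above and computes precision, recall, and F1 over all (review, label) pairs. It is an illustration only, not the authors' implementation: the example reviews are hypothetical, and micro-averaging is an assumption, since the abstract does not state which averaging scheme was used.

```python
from typing import List, Set, Tuple

# The four NFR categories named in the abstract.
LABELS = ["Dependability", "Performance", "Supportability", "Usability"]


def to_multi_hot(tags: Set[str]) -> List[int]:
    """Encode a set of NFR tags as a 0/1 vector, one position per label."""
    return [1 if label in tags else 0 for label in LABELS]


def micro_prf(y_true: List[List[int]],
              y_pred: List[List[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all (review, label) pairs.

    Micro-averaging is an assumption made for this sketch.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for three reviews (illustrative only).
gold = [{"Performance", "Usability"}, {"Dependability"}, set()]
pred = [{"Performance"}, {"Dependability", "Usability"}, set()]

y_true = [to_multi_hot(tags) for tags in gold]
y_pred = [to_multi_hot(tags) for tags in pred]
precision, recall, f1 = micro_prf(y_true, y_pred)
```

A review may carry several labels or none, which is what distinguishes this setting from single-label classification. A binary relevance baseline would train one independent yes/no classifier per position of this vector.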

Data Availability

The data used in this study are publicly available at http://seel.cse.lsu.edu/data/emse19.zip.

Notes

  1. https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/.

  2. https://github.com/trent-b/iterative-stratification.

Author information

Corresponding author

Correspondence to Kamaljit Kaur.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaur, K., Kaur, P. MNoR-BERT: multi-label classification of non-functional requirements using BERT. Neural Comput & Applic 35, 22487–22509 (2023). https://doi.org/10.1007/s00521-023-08833-1

  • DOI: https://doi.org/10.1007/s00521-023-08833-1

Keywords