MNoR-BERT: multi-label classification of non-functional requirements using BERT

Kaur, Kamaljit; Kaur, Parminder

doi:10.1007/s00521-023-08833-1

Original Article
Published: 12 August 2023

Volume 35, pages 22487–22509 (2023)

Neural Computing and Applications

Kamaljit Kaur¹ &
Parminder Kaur¹

Abstract

In the era of Internet access, software is easily available on digital distribution platforms such as app stores. Distributing software on these platforms makes user feedback more accessible, and that feedback is useful in contexts ranging from requirements engineering to software maintenance. Such user reviews may contain technical information about the app that is valuable for developers and software companies. Owing to the pervasive use of mobile apps, users create a large amount of data on a daily basis. Manually identifying and classifying these reviews is time-consuming and laborious, so automating the process is essential for helping developers manage reviews efficiently. Prior studies have focused on classifying reviews into bug reports, user experience, and feature requests. To date, however, very few studies have extracted the Non-Functional Requirements (NFRs) present in these reviews. NFRs are the set of quality attributes of the software, such as reliability, performance, security, and usability. Previous studies have used machine learning techniques to classify reviews into their respective classes. However, existing studies treat review classification as a single-label classification problem and underestimate the contextual relationships between the words of a review statement. To address these limitations, this work uses a transfer learning model to classify multi-label app reviews into four NFRs: Dependability, Performance, Supportability, and Usability. The proposed approach evaluates the performance of a pre-trained language model for multi-label review classification. A set of experiments is conducted to compare the proposed model against baseline machine learning with binary relevance and a keyword-based approach. The approach was evaluated on a dataset of 6000 user reviews of 24 iOS apps. Experimental results show that the proposed model outperforms state-of-the-art baseline techniques with respect to precision, recall, and F1-measure.
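The multi-label setting and the reported measures can be illustrated with a minimal sketch. It assumes each review is annotated with a subset of the four NFR categories named above and that scores are micro-averaged over all (review, label) pairs; the abstract does not state which averaging scheme the authors used, and the helper names `to_multi_hot` and `micro_prf` are hypothetical, not taken from the article.

```python
from typing import List, Set, Tuple

# The four NFR categories named in the abstract.
LABELS: List[str] = ["Dependability", "Performance", "Supportability", "Usability"]


def to_multi_hot(labels: Set[str]) -> List[int]:
    """Encode a set of category names as a 0/1 vector ordered like LABELS."""
    return [1 if name in labels else 0 for name in LABELS]


def micro_prf(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 (assumed averaging scheme)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not annotated
            elif t == 1 and p == 0:
                fn += 1  # annotated label that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative (made-up) annotations: a review may carry several labels or none.
y_true = [
    to_multi_hot({"Performance", "Usability"}),
    to_multi_hot({"Dependability"}),
    to_multi_hot(set()),
]
y_pred = [
    to_multi_hot({"Performance"}),
    to_multi_hot({"Dependability", "Usability"}),
    to_multi_hot(set()),
]
precision, recall, f1 = micro_prf(y_true, y_pred)
```

Under a single-label formulation the first review above would have to be forced into one category; the multi-hot encoding keeps both, which is the distinction the abstract draws.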

Data Availability

The data used in this study are publicly available at http://seel.cse.lsu.edu/data/emse19.zip.

Author information

Authors and Affiliations

Department of Computer Science, Guru Nanak Dev University, Amritsar, India
Kamaljit Kaur & Parminder Kaur

Corresponding author

Correspondence to Kamaljit Kaur.

Ethics declarations

Conflict of interest

The authors declared that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaur, K., Kaur, P. MNoR-BERT: multi-label classification of non-functional requirements using BERT. Neural Comput & Applic 35, 22487–22509 (2023). https://doi.org/10.1007/s00521-023-08833-1

Received: 06 September 2022
Accepted: 28 June 2023
Published: 12 August 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s00521-023-08833-1
