Abstract
Source code is the most up-to-date of all available software artifacts. Most existing software redocumentation approaches rely on source code to extract the information needed for program comprehension in support of software maintenance tasks. However, performing Extract, Transform and Load (ETL) on source code with a parser is becoming a challenging task. The traditional approach can no longer handle ETL efficiently because its analysis efficiency suffers, especially for large source code. This paper proposes using a distributed data processing technique to extract legacy source code components and generate detailed design or technical software documentation at the source code level to support program understanding. The objective of this paper is to apply the distributed data processing technique to the parser by using the Hadoop Distributed File System and Apache Spark. Legacy Java source code is used as a case study, applying the proposed approach to extract the source code components and generate the technical software documentation.
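The extract, transform and load flow described in the abstract can be illustrated with a small sketch. This is not the authors' parser: it uses simplified regular expressions in place of a real Java parser, and Python's standard-library thread pool as a local stand-in for Spark. All names here (`extract_components`, `extract_all`, `render_documentation`, `SAMPLE_SOURCES`) are hypothetical. The point is only that extraction is a per-file function, which is what makes it suitable for distribution.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Deliberately simple patterns (an assumption for illustration);
# a real parser would build a syntax tree instead.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)
CLASS_RE = re.compile(r"\b(?:class|interface|enum)\s+(\w+)")
METHOD_RE = re.compile(
    r"^\s*(?:public|protected|private)\s+(?:static\s+)?(?:final\s+)?"
    r"([\w<>\[\]]+)\s+(\w+)\s*\(([^)]*)\)\s*\{",
    re.MULTILINE,
)


def extract_components(item):
    """Extract step: turn one (path, source) pair into a component record."""
    path, source = item
    package = PACKAGE_RE.search(source)
    return {
        "path": path,
        "package": package.group(1) if package else "",
        "classes": CLASS_RE.findall(source),
        "methods": [
            f"{ret} {name}({params.strip()})"
            for ret, name, params in METHOD_RE.findall(source)
        ],
    }


def extract_all(sources, workers=4):
    """Apply the per-file extractor to every file in parallel.

    The thread pool is a local stand-in for a cluster: each file is
    processed independently, so the same function could be mapped over
    a distributed collection of (path, content) pairs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_components, sources.items()))


def render_documentation(records):
    """Transform and load step: render records as plain-text documentation."""
    lines = []
    for rec in sorted(records, key=lambda r: r["path"]):
        lines.append(f"File: {rec['path']}")
        lines.append(f"  Package: {rec['package'] or '(default)'}")
        for cls in rec["classes"]:
            lines.append(f"  Type: {cls}")
        for method in rec["methods"]:
            lines.append(f"  Method: {method}")
    return "\n".join(lines)


# Hypothetical in-memory input standing in for files stored on HDFS.
SAMPLE_SOURCES = {
    "Account.java": """package bank.core;

public class Account {
    private long balance;

    public long getBalance() {
        return balance;
    }

    public void deposit(long amount) {
        balance += amount;
    }
}
""",
    "Util.java": "class Util { }",
}

if __name__ == "__main__":
    print(render_documentation(extract_all(SAMPLE_SOURCES)))
```

In a Spark setting, the same `extract_components` function could plausibly be passed to a `map` over the (path, content) pairs that `SparkContext.wholeTextFiles` returns for a directory on HDFS. The abstract does not say whether the proposed approach is structured this way.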
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Nallusamy, S., Hao, H.M., Zulkifle, F.A. (2021). Software Redocumentation Using Distributed Data Processing Technique to Support Program Understanding for Legacy System: A Proposed Approach. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2021. Lecture Notes in Computer Science, vol 13051. Springer, Cham. https://doi.org/10.1007/978-3-030-90235-3_21
DOI: https://doi.org/10.1007/978-3-030-90235-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90234-6
Online ISBN: 978-3-030-90235-3
eBook Packages: Computer Science, Computer Science (R0)