Abstract
This paper discusses the critical decision process of extracting or selecting features in a supervised learning context. Finding a suitable method to reduce dimensionality is often confusing: feature selection and feature extraction each have pros and cons, depending on the nature of the data and the user's preferences. Indeed, the user may want to bias the results toward integrity or interpretability, and toward a specific data resolution. This paper proposes a new method for choosing the best dimensionality reduction approach in a supervised learning context. It also helps to drop or reconstruct features until a target resolution is reached; this target resolution can be user defined, or it can be determined automatically by the method. The method applies a regression or a classification, evaluates the results, and gives a diagnosis about the best dimensionality reduction process for this specific supervised learning context. The main algorithms used are random forests, principal component analysis (PCA), and the multilayer perceptron (MLP) neural network. Six use cases are presented, each based on a well-known technique for generating synthetic data. This research also discusses each choice that can be made in the process, aiming to clarify the entire decision process of selecting or extracting features.
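The core comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation; the dataset, target resolution, and model parameters are illustrative assumptions. It contrasts feature selection (keeping the features a random forest ranks highest) with feature extraction (PCA components), scoring each reduced representation with an MLP classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic data, in the spirit of the paper's use cases (parameters are illustrative).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
target_resolution = 5  # user-defined number of dimensions to keep

# Feature selection: keep the features the random forest ranks highest.
rf = RandomForestClassifier(random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:target_resolution]
X_sel = X[:, top]

# Feature extraction: build the same number of PCA components.
X_ext = PCA(n_components=target_resolution).fit_transform(X)

# Evaluate both reduced representations with an MLP and diagnose the better process.
mlp = MLPClassifier(max_iter=1000, random_state=0)
score_sel = cross_val_score(mlp, X_sel, y, cv=3).mean()
score_ext = cross_val_score(mlp, X_ext, y, cv=3).mean()
diagnosis = "selection" if score_sel >= score_ext else "extraction"
print(diagnosis)
```

Selection preserves the original features (favoring interpretability), while extraction reconstructs them into components (favoring information integrity), which is the trade-off the method arbitrates.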
Availability of Data and Materials
We used only datasets that are publicly available.
Code Availability
The code has not been published yet. It can be provided on request.
Acknowledgements
This work has been supported by the “Cellule d’expertise en robotique et intelligence artificielle” of the Cégep de Trois-Rivières and the Natural Sciences and Engineering Research Council.
Funding
This work has been supported by the Natural Sciences and Engineering Research Council.
Contributions
JSD: Conceptualization, Methodology, Software, Writing—Original Draft. DM: Conceptualization, Methodology, Validation, Resources, Writing—Review and Editing, Supervision, Project administration, Funding acquisition.
Ethics declarations
Conflict of interest
The authors confirm there are no conflicts of interest.
Ethical approval
The work uses publicly available and non-identifiable information. No ethical approval was needed.
Consent to participate
Not applicable, since no human participants were involved in the evaluation of our study.
Consent for publication
Not applicable, since all datasets used in this study are released by third parties.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dessureault, JS., Massicotte, D. DPDR: A Novel Machine Learning Method for the Decision Process for Dimensionality Reduction. SN COMPUT. SCI. 5, 124 (2024). https://doi.org/10.1007/s42979-023-02394-9