DOI: 10.1145/3627673.3679994
CIKM Conference Proceedings · Short paper · Open access

Scalable Unsupervised Feature Selection with Reconstruction Error Guarantees via QMR Decomposition

Published: 21 October 2024

Abstract

Unsupervised feature selection (UFS) methods have garnered significant attention for their capability to eliminate redundant features without relying on class label information. However, their scalability to large datasets remains a challenge, rendering common UFS methods impractical for such applications. To address this issue, we introduce QMR-FS, a greedy forward filtering approach that selects linearly independent features up to a specified relative tolerance, ensuring that any excluded feature can be reconstructed from the retained set within this tolerance. This is achieved through the QMR matrix decomposition, which builds upon the well-known QR decomposition. QMR-FS has linear complexity in the number of instances and achieves high throughput by exploiting parallelized computation on both CPUs and GPUs. Despite its greedy nature, QMR-FS attains classification and clustering accuracies comparable to other UFS methods across multiple datasets, while running approximately 10 times faster than recently proposed scalable UFS methods on datasets ranging from 100 million to 1 billion elements.
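The QMR decomposition itself is detailed in the paper, but the selection criterion the abstract describes can be sketched independently: greedily keep each feature whose residual, after projecting out the span of the already-selected features, exceeds a relative tolerance. The sketch below is an illustrative stand-in using plain modified Gram-Schmidt in NumPy; the function name and `tol` parameter are assumptions, not the authors' implementation.

```python
import numpy as np

def greedy_linear_feature_selection(X, tol=0.1):
    """Greedily retain columns of X that are linearly independent of the
    already-selected columns up to a relative tolerance `tol`.

    A column is kept if, after projecting out the orthonormal basis of
    the selected columns, its residual retains more than `tol` of its
    original norm, i.e. it cannot be reconstructed from the selected
    features within the tolerance.
    """
    n, d = X.shape
    selected = []  # indices of retained features
    basis = []     # orthonormal basis spanning the retained columns
    for j in range(d):
        v = X[:, j].astype(float)
        norm = np.linalg.norm(v)
        if norm == 0.0:
            continue  # all-zero feature: trivially reconstructible
        r = v.copy()
        for q in basis:  # modified Gram-Schmidt projection
            r -= (q @ r) * q
        if np.linalg.norm(r) / norm > tol:
            selected.append(j)
            basis.append(r / np.linalg.norm(r))
    return selected
```

For example, given a matrix whose second column is an exact multiple of the first, the second column is dropped because its residual is zero, while genuinely independent columns are retained.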


Published In

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, October 2024, 5705 pages
ISBN: 9798400704369
DOI: 10.1145/3627673
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. feature selection
      2. linear independence
      3. scalability
      4. unsupervised learning


Conference

CIKM '24
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
