A New Horizo-Vertical Distributed Feature Selection Approach

Published: 01 November 2018

Abstract

Feature selection has been a very active research topic addressing the problem of dimensionality reduction. Meanwhile, datasets keep growing over time in both the number of samples and the number of features. As a result, handling irrelevant and redundant features has become a real challenge. In this paper we propose a new, straightforward framework that combines horizontal and vertical distributed feature selection, called the Horizo-Vertical Distributed Feature Selection (HVDFS) approach, aimed at achieving good performance as well as reducing the number of features. The effectiveness of our approach is demonstrated on three well-known datasets, compared with the centralized and the previous distributed approaches, using four well-known classifiers.
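
The abstract does not spell out the HVDFS procedure itself, so the snippet below is only a minimal sketch of how a combined horizontal (sample-wise) and vertical (feature-wise) distributed filter selection could be organized. The function name hvdfs_sketch, the block counts, the mutual-information filter and the majority-vote merge are illustrative assumptions, not the authors' exact algorithm.

# Minimal sketch (Python, scikit-learn) of a horizo-vertical distributed
# feature selection loop. The partition sizes, the mutual-information filter
# and the vote-based merge are assumptions for illustration only.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def hvdfs_sketch(X, y, n_sample_blocks=4, n_feature_blocks=4, k_per_block=10, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    votes = np.zeros(n_features)

    # Horizontal partition: disjoint blocks of samples.
    sample_blocks = np.array_split(rng.permutation(n_samples), n_sample_blocks)
    # Vertical partition: disjoint blocks of features.
    feature_blocks = np.array_split(rng.permutation(n_features), n_feature_blocks)

    # Each (sample block, feature block) pair plays the role of one node.
    for rows in sample_blocks:
        for cols in feature_blocks:
            scores = mutual_info_classif(X[np.ix_(rows, cols)], y[rows])
            top = cols[np.argsort(scores)[::-1][:k_per_block]]
            votes[top] += 1  # merge local selections by voting

    # Keep features chosen by at least half of the sample blocks that saw them.
    return np.flatnonzero(votes >= n_sample_blocks / 2)

With X an (n_samples x n_features) array and y the class labels, the returned column indices would form the reduced feature set handed to a downstream classifier.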

References

1. Chan, P. K., S. J. Stolfo. Toward Parallel and Distributed Learning by Meta-Learning. – In: Proc. of AAAI Workshop in Knowledge Discovery in Databases, 1993, pp. 227-240.
2. Ananthanarayana, V. S., D. K. Subramanian, M. N. Murty. Scalable, Distributed and Dynamic Mining of Association Rules. – In: Proc. of International Conference on High-Performance Computing, Springer, Berlin, Heidelberg, 2000, pp. 559-566.
3. Tsoumakas, G., I. Vlahavas. Distributed Data Mining of Large Classifier Ensembles. – In: Proc. of Companion Volume of the Second Hellenic Conference on Artificial Intelligence, 2002, pp. 249-256.
4. Das, K., K. Bhaduri, H. Kargupta. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks. – Knowledge and Information Systems, Vol. 24, 2010, No 3, pp. 341-367.
5. Sheela, M. A., K. Vijayalakshmi. Partition Based Perturbation for Privacy Preserving Distributed Data Mining. – Cybernetics and Information Technologies, Vol. 17, 2017, No 2, pp. 44-55.
6. Skillicorn, D. B., S. M. McConnell. Distributed Prediction from Vertically Partitioned Data. – Journal of Parallel and Distributed Computing, Vol. 68, 2008, No 1, pp. 16-36.
7. Rokach, L. Taxonomy for Characterizing Ensemble Methods in Classification Tasks: A Review and Annotated Bibliography. – Computational Statistics & Data Analysis, Vol. 53, 2009, No 12, pp. 4046-4072.
8. Hasnat, A., A. U. Molla. Feature Selection in Cancer Microarray Data Using Multi-Objective Genetic Algorithm Combined with Correlation Coefficient. – In: Proc. of International Conference on Emerging Technological Trends, IEEE, 2016, pp. 1-6.
9. Saeys, Y., I. Inza, P. Larrañaga. A Review of Feature Selection Techniques in Bioinformatics. – Bioinformatics, Vol. 23, 2007, No 19, pp. 2507-2517.
10. Ding, C., H. Peng. Minimum Redundancy Feature Selection from Microarray Gene Expression Data. – Journal of Bioinformatics and Computational Biology, Vol. 3, 2005, No 2, pp. 185-205.
11. Satorra, A., P. M. Bentler. A Scaled Difference Chi-Square Test Statistic for Moment Structure Analysis. – Psychometrika, Vol. 66, 2001, No 4, pp. 507-514.
12. Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. – In: Proc. of European Conference on Machine Learning, 1994, pp. 171-182.
13. Dai, J., Q. Xu. Attribute Selection Based on Information Gain Ratio in Fuzzy Rough Set Theory with Application to Tumor Classification. – Applied Soft Computing, Vol. 13, 2013, No 1, pp. 211-221.
14. Sikonja, M. R., I. Kononenko. An Adaptation of Relief for Attribute Estimation on Regression. – In: Proc. of 14th International Conference on Machine Learning, Nashville, 1997, pp. 296-304.
15. Hall, M. A. Correlation-Based Feature Subset Selection for Machine Learning. – PhD Thesis, University of Waikato, 1998.
16. Dash, M., H. Liu. Consistency-Based Search in Feature Selection. – Artificial Intelligence, Vol. 151, 2003, No 1-2, pp. 155-176.
17. Peng, H., F. Long, C. Ding. Feature Selection Based on Mutual Information Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. – IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, 2005, No 8, pp. 1226-1238.
18. Deisy, C., S. Baskar, N. Ramraj, J. S. Koori, P. Jeevanandam. A Novel Information Theoretic-Interact Algorithm (IT-IN) for Feature Selection Using Three Machine Learning Algorithms. – Expert Systems with Applications, Vol. 37, 2010, No 12, pp. 7589-7597.
19. Quinlan, J. R. C4.5: Programs for Machine Learning. – Morgan Kaufmann Publishers, Elsevier, 2014.
20. Nielsen, T. D., F. V. Jensen. Bayesian Networks and Decision Graphs. – Springer Science and Business Media, 2009.
21. Cherkassky, V., Y. Ma. Practical Selection of SVM Parameters and Noise Estimation for SVM Regression. – Neural Networks, Vol. 17, 2004, No 1, pp. 113-126.
22. Guo, G., H. Wang, D. Bell, Y. Bi, K. Greer. KNN Model-Based Approach in Classification. – In: Proc. of OTM Confederated International Conferences on the Move to Meaningful Internet Systems, Springer, Berlin, Heidelberg, 2003, pp. 986-996.
23. Bolón-Canedo, V., N. Sánchez-Maroño, J. Cerviño-Rabuñal. Scaling up Feature Selection: A Distributed Filter Approach. – In: Proc. of Conference of the Spanish Association for Artificial Intelligence, Springer, Berlin, Heidelberg, 2013, pp. 121-130.
24. Bolón-Canedo, V., N. Sánchez-Maroño, A. Alonso-Betanzos. A Distributed Feature Selection Approach Based on a Complexity Measure. – In: Proc. of International Work-Conference on Artificial Neural Networks, Springer, Cham, 2015, pp. 15-28.
25. Das, K., K. Bhaduri, H. Kargupta. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks. – Knowledge and Information Systems, Vol. 24, 2010, No 3, pp. 341-367.
26. Tsoumakas, G., I. Vlahavas. Distributed Data Mining of Large Classifier Ensembles. – In: Proc. of Companion Volume of the Second Hellenic Conference on Artificial Intelligence, 2002.
27. Peralta, D., S. Del Río, S. Ramírez-Gallego, I. Triguero, J. M. Benitez, F. Herrera. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach. – Mathematical Problems in Engineering, Vol. 2015, 2015.
28. Cohen, S., L. Rokach, O. Maimon. Decision-Tree Instance-Space Decomposition with Grouped Gain-Ratio. – Information Sciences, Vol. 177, 2007, No 17, pp. 3592-3612.
29. Skillicorn, D. B., S. M. McConnell. Distributed Prediction from Vertically Partitioned Data. – Journal of Parallel and Distributed Computing, Vol. 68, 2008, No 1, pp. 16-36.
30. McConnell, S., D. B. Skillicorn. Building Predictors from Vertically Distributed Data. – In: Proc. of the Conference of the Centre for Advanced Studies on Collaborative Research, IBM Press, 2004, pp. 150-162.
31. Bolón-Canedo, V., N. Sánchez-Maroño, J. Cerviño-Rabuñal. Toward Parallel Feature Selection from Vertically Partitioned Data. – In: Proc. of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 2014, pp. 23-25.
32. Morán-Fernández, L., V. Bolón-Canedo, A. Alonso-Betanzos. Centralized vs. Distributed Feature Selection Methods Based on Data Complexity Measures. – Knowledge-Based Systems, Vol. 117, 2017, pp. 27-45.
33. Banerjee, M., S. Chakravarty. Privacy Preserving Feature Selection for Distributed Data Using Virtual Dimension. – In: Proc. of 20th ACM International Conference on Information and Knowledge Management, ACM, 2011.
34. Bache, K., M. Lichman. UCI Machine Learning Repository. – University of California, Irvine, School of Information and Computer Sciences. Online; Accessed January 2016. http://archive.ics.uci.edu/ml/
35. Vanderbilt University. Gene Expression Model Selector. Online; Accessed January 2016. http://www.gems-system.org/
36. Oreski, D., S. Oreski, B. Klicek. Effects of Dataset Characteristics on the Performance of Feature Selection Techniques. – Applied Soft Computing, Vol. 52, 2017, pp. 109-119.
37. Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten. The WEKA Data Mining Software: An Update. – ACM SIGKDD Explorations Newsletter, Vol. 11, 2009, No 1, pp. 10-18.
38. Song, Q., J. Ni, G. Wang. A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data. – IEEE Transactions on Knowledge and Data Engineering, Vol. 25, 2013, No 1, pp. 1-14.
39. Patil, D. R., J. B. Patil. Malicious URLs Detection Using Decision Tree Classifiers and Majority Voting Technique. – Cybernetics and Information Technologies, Vol. 18, 2018, No 1, pp. 11-29.
40. Xing, E., M. Jordan, R. Karp. Feature Selection for High-Dimensional Genomic Microarray Data. – In: Proc. of Eighteenth International Conference on Machine Learning, 2001, pp. 601-608.

        Published In

        Cybernetics and Information Technologies, Volume 18, Issue 4
        Nov 2018
        128 pages
        ISSN: 1314-4081
        EISSN: 1314-4081
        This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

        Publisher

        Walter de Gruyter GmbH

        Berlin, Germany

        Publication History

        Published: 01 November 2018

        Author Tags

        1. Feature selection
        2. distributed approach
        3. dimensionality reduction

        Qualifiers

        • Article
