Embedding Feature Selection for Large-scale Hierarchical Classification

Naik, Azad; Rangwala, Huzefa

Computer Science > Machine Learning

arXiv:1706.01581 (cs)

[Submitted on 6 Jun 2017]

Title: Embedding Feature Selection for Large-scale Hierarchical Classification

Authors: Azad Naik, Huzefa Rangwala

Abstract: Large-scale Hierarchical Classification (HC) involves datasets with thousands of classes, millions of training instances, and high-dimensional features, which together pose several big data challenges. Feature selection, which aims to select a subset of discriminative features, is an effective strategy for dealing with the large-scale HC problem. It speeds up training, reduces prediction time, and lowers memory requirements by shrinking the total size of the learned model's weight vectors. Most studies have also shown that feature selection can improve classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distributions of features, classes, and instances shows a speed-up of up to roughly 3x on massive datasets and up to 45% lower memory requirements for storing the learned model's weight vectors, without any significant loss in classification accuracy (and with an improvement on some datasets). Source Code: this https URL.
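A filter-based method scores each feature independently of the classifier and keeps only the top-scoring features before any model is trained. The sketch below illustrates that idea. It rests on two assumptions. First, the filter criterion is the chi-square statistic, which is one common choice and not necessarily the one the authors used. Second, the model size assumes dense storage of one linear weight vector per class or hierarchy node. The function names (`chi2_scores`, `select_top_k`, `project`, `weight_memory_bytes`) are hypothetical and are not taken from the paper's source code.

```python
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square score per feature, assuming nonnegative features (e.g. term counts).

    observed[c][j] = total value of feature j over instances of class c
    expected[c][j] = P(class c) * total value of feature j over all instances
    """
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    observed = {c: [0.0] * d for c in classes}
    class_counts = {c: 0 for c in classes}
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value

    totals = [sum(observed[c][j] for c in classes) for j in range(d)]
    scores = []
    for j in range(d):
        score = 0.0
        for c in classes:
            expected = class_counts[c] / n * totals[j]
            if expected > 0:  # an all-zero feature contributes nothing
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, returned in original column order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Drop every column not listed in `keep`."""
    return [[row[j] for j in keep] for row in X]


def weight_memory_bytes(num_models: int, num_features: int, bytes_per_weight: int = 8) -> int:
    """Assumed dense storage: one weight per feature for each class/node model."""
    return num_models * num_features * bytes_per_weight


# Toy data: 4 instances, 3 count features, 2 classes.
X = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
y = [0, 0, 1, 1]

scores = chi2_scores(X, y)          # -> [5.0, 7.0, 0.0]
keep = select_top_k(scores, k=2)    # -> [0, 1]
X_reduced = project(X, keep)        # -> [[3, 0], [2, 0], [0, 4], [0, 3]]

full = weight_memory_bytes(num_models=2, num_features=3)             # -> 48
reduced = weight_memory_bytes(num_models=2, num_features=len(keep))  # -> 32
saved_fraction = 1 - reduced / full  # one third of the weight storage saved
```

In this sketch, every per-class model shares the same reduced feature set. Stored weights therefore shrink in proportion to k/d, and the uninformative third feature, which is equally distributed across both classes, receives a score of zero and is discarded.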

Comments:	IEEE International Conference on Big Data (IEEE BigData 2016)
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1706.01581 [cs.LG]
	(or arXiv:1706.01581v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1706.01581

Submission history

From: Azad Naik
[v1] Tue, 6 Jun 2017 01:56:51 UTC (1,248 KB)
