DOI: 10.1145/1553374.1553516

Feature hashing for large scale multitask learning

Published: 14 June 2009

Abstract

    Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case --- multitask learning with hundreds of thousands of tasks.
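The "hashing trick" the abstract refers to maps sparse, high-dimensional features into a fixed m-dimensional space: one hash picks a bucket index, a second hash picks a ±1 sign so that hashed inner products stay unbiased. The sketch below is illustrative only; the MD5-based hashes, the dimension m, and the user-prefix scheme for multitask personalization are assumptions made for this example, not the paper's exact construction:

```python
import hashlib

def hashed_index_sign(token: str, m: int):
    """Derive a bucket index in [0, m) and a +/-1 sign from one digest.

    The sign hash is what keeps the hashed inner product unbiased,
    the key property behind exponential tail bounds for hashing.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "little") % m
    sign = 1.0 if digest[4] % 2 == 0 else -1.0
    return index, sign

def hash_features(tokens, m):
    """Hash a bag of string tokens into a dense m-dimensional vector."""
    v = [0.0] * m
    for t in tokens:
        i, s = hashed_index_sign(t, m)
        v[i] += s
    return v

def personalized_tokens(user_id, tokens):
    """Multitask sketch: emit a global copy of each feature plus a
    user-prefixed copy, so per-task and shared weights live in one
    hashed space of fixed size."""
    return list(tokens) + [f"{user_id}::{t}" for t in tokens]
```

With m fixed, memory no longer grows with the vocabulary or the number of tasks; the collisions between the per-user copies are the "interaction between random subspaces" that the abstract argues is negligible with high probability.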


    Published In

    ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
    June 2009
    1331 pages
    ISBN:9781605585161
    DOI:10.1145/1553374

    Sponsors

    • NSF
• Microsoft Research
    • MITACS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%

