DOI: 10.1145/1553374.1553516

Feature hashing for large scale multitask learning

Published: 14 June 2009

Abstract

    Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case --- multitask learning with hundreds of thousands of tasks.
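The "hashing trick" the abstract refers to maps sparse, high-dimensional features into a fixed m-dimensional space: one hash picks a bucket index, a second hash picks a ±1 sign so that hashed inner products stay unbiased. The sketch below is illustrative only; the MD5-based hashes, the dimension m, and the user-prefix scheme for multitask personalization are assumptions made for this example, not the paper's exact construction:

```python
import hashlib

def hashed_index_sign(token: str, m: int):
    """Derive a bucket index in [0, m) and a +/-1 sign from one digest.

    The sign hash is what keeps the hashed inner product unbiased,
    the key property behind exponential tail bounds for hashing.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "little") % m
    sign = 1.0 if digest[4] % 2 == 0 else -1.0
    return index, sign

def hash_features(tokens, m):
    """Hash a bag of string tokens into a dense m-dimensional vector."""
    v = [0.0] * m
    for t in tokens:
        i, s = hashed_index_sign(t, m)
        v[i] += s
    return v

def personalized_tokens(user_id, tokens):
    """Multitask sketch: emit a global copy of each feature plus a
    user-prefixed copy, so per-task and shared weights live in one
    hashed space of fixed size."""
    return list(tokens) + [f"{user_id}::{t}" for t in tokens]
```

With m fixed, memory no longer grows with the vocabulary or the number of tasks; the collisions between the per-user copies are the "interaction between random subspaces" that the abstract argues is negligible with high probability.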


    Published In

    ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
    June 2009
    1331 pages
    ISBN:9781605585161
    DOI:10.1145/1553374

    Sponsors

    • NSF
• Microsoft Research
    • MITACS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%

