
Variance reduction in feature hashing using MLE and control variate method

Published: 01 July 2022

Abstract

The feature hashing algorithm introduced by Weinberger et al. (2009) is a popular dimensionality reduction algorithm that compresses high-dimensional data points into low-dimensional data points whose pairwise inner products closely approximate those of the original points. It has been used in many fundamental machine learning applications such as model compression (Chen et al. 2015), spam classification (Weinberger et al. 2009), compressing text classifiers (Joulin et al. 2016), and large-scale image classification (Mensink et al. 2012). However, a limitation of this approach is that the variance of its inner-product estimator tends to be large when the reduced dimension is small, making the estimate less reliable. We address this challenge and suggest two simple and practical solutions in this work. Our approach relies on the control variate (CV) method and the maximum likelihood estimator (MLE), which are popular variance reduction techniques in statistics. We show that these methods lead to a significant variance reduction in inner-product similarity estimation. We give theoretical bounds on this reduction and complement them with extensive experiments on synthetic and real-world datasets. Given the simplicity and effectiveness of our approach, we hope that it can be adopted in practice.
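
To make the baseline concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of feature hashing and the plain inner-product estimate whose variance this work sets out to reduce. The function name feature_hash, the chosen dimensions, and the use of explicit random arrays in place of hash functions are assumptions made purely for the example.

import numpy as np

def feature_hash(x, k, seed=0):
    # Hashing trick of Weinberger et al. (2009): coordinate i of x is mapped to
    # bucket h(i) with a random sign s(i), and the signed values are summed per bucket.
    # For illustration, h and s are drawn as explicit random arrays (shared across
    # vectors via `seed`) rather than evaluated as actual hash functions.
    rng = np.random.default_rng(seed)
    d = len(x)
    h = rng.integers(0, k, size=d)        # bucket map h: {0,...,d-1} -> {0,...,k-1}
    s = rng.choice([-1.0, 1.0], size=d)   # sign map  s: {0,...,d-1} -> {-1, +1}
    sketch = np.zeros(k)
    np.add.at(sketch, h, s * x)           # sketch[j] = sum_{i: h(i)=j} s(i) * x[i]
    return sketch

rng = np.random.default_rng(1)
u, v = rng.normal(size=10_000), rng.normal(size=10_000)
fu, fv = feature_hash(u, k=200), feature_hash(v, k=200)
print("true inner product:", u @ v)
print("hashed estimate   :", fu @ fv)     # unbiased, but noisy when k is small

A control variate correction typically combines such a raw estimate with auxiliary quantities whose exact values are known (for example, the norms of the original vectors); the specific CV and MLE estimators proposed in this work are given in the full text.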

References

[1]
Achlioptas, D. (2003). Database-friendly random projections: Johnson–Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687.
[2]
Agarwal, A., Chapelle, O., Dudík, M., & Langford, J. (2014). A reliable effective terascale linear learning system. Journal of Machine Learning Research, 15, 1111–1133.
[3]
Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.
[4]
Broder, A.Z., Charikar, M., Frieze, A.M., & Mitzenmacher, M. (1998). Min-wise independent permutations (extended abstract). In Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pp. 327–336.
[5]
Cardano, G. (1993). Ars Magna or the Rules of Algebra. Dover Publications.
[6]
Charikar, M. (2002). Similarity estimation techniques from rounding algorithms. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pp. 380–388.
[7]
Chen, W., Wilson, J., Tyree, S., Weinberger, K., & Chen, Y. (2015). Compressing neural networks with the hashing trick. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pp. 2285–2294, Lille, France, 07–09 Jul 2015. PMLR.
[8]
Dasgupta, A., Kumar, R., & Sarlós, T. (2010). A sparse Johnson–Lindenstrauss transform. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pp. 341–350.
[9]
Dasgupta, S., & Gupta, A. (2003). An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1), 60–65.
[10]
Ertl, O. (2018). BagMinHash: Minwise hashing algorithm for weighted sets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1368–1377.
[11]
Feller, W. (1968). An Introduction to Probability Theory and Its Applications. Wiley.
[12]
Guyon, I., Gunn, S., Ben-Hur, A., & Dror, G. (2005). Result analysis of the NIPS 2003 feature selection challenge. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in Neural Information Processing Systems 17, pp. 545–552. MIT Press.
[13]
Indyk, P. (2006). Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM, 53(3), 307–323.
[14]
Ioffe, S. (2010). Improved consistent sampling, weighted minhash and l1 sketching. In 2010 IEEE International Conference on Data Mining, pp. 246–255. IEEE.
[15]
Ji, J., Li, J., Yan, S., Zhang, B., & Tian, Q. (2012). Super-bit locality-sensitive hashing. In Advances in neural information processing systems, pp. 108–116.
[16]
Johnson, W. B., & Lindenstrauss, J. (1983). Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability (New Haven, Conn., 1982), Amer. Math. Soc., Providence, R.I., pp. 189–206.
[17]
Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., & Mikolov, T. (2016). FastText.zip: Compressing text classification models. CoRR, arXiv:1612.03651.
[18]
Kang, K. (2017). Using the multivariate normal to improve random projections. In Intelligent Data Engineering and Automated Learning - IDEAL 2017 - 18th International Conference, Guilin, China, October 30 - November 1, 2017, Proceedings, pp. 397–405.
[19]
Kang, K., & Hooker, G. (2017). Random projections with control variates. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2017, Porto, Portugal, February 24-26, 2017, pp. 138–147.
[20]
Kang, K., & Pin, W.W. (2018). Improving sign random projections with additional information. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 2484–2492.
[21]
Lavenberg, S., & Welch, P. (1981). A perspective on the use of control variables to increase the efficiency of Monte Carlo simulations. Management Science, 27, 322–335.
[22]
Li, P. (2007). Very sparse stable random projections for dimension reduction in lα (0 < α ≤ 2) norm. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 440–449.
[23]
Li, P. (2008). Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 10–19.
[24]
Li, P. (2019). Sign-full random projections. In Proceedings of the AAAI Conference on Artificial Intelligence, 33, 4205–4212.
[25]
Li, P., & Hastie, T. (2007). A unified near-optimal estimator for dimension reduction in lα (0 < α ≤ 2) using stable random projections. In NIPS, pp. 905–912. Citeseer.
[26]
Li, P., Hastie, T., & Church, K.W. (2006). Practical Procedures for Dimension Reduction in l1. Citeseer.
[27]
Li, P., Hastie, T., & Church, K.W. (2006). Improving random projections using marginal information. In Learning Theory, 19th Annual Conference on Learning Theory, COLT 2006, Pittsburgh, PA, USA, June 22-25, 2006, Proceedings, pp. 635–649.
[28]
Li, P., Hastie, T., & Church, K.W. (2006). Very sparse random projections. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pp. 287–296.
[29]
Li, P., & König, A. C. (2011). Theory and applications of b-bit minwise hashing. Communications of the ACM, 54(8), 101–109.
[30]
Li, P, Owen, A., & Zhang, C.H. (2012). One permutation hashing. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pp. 3122–3130.
[31]
Li, X, & Li, P. (2019). Random projections with asymmetric quantization. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 10858–10867.
[32]
Li, Y., Kuang, Z., Li, J. Y., & Kang, K. (2020). Improving random projections with extra vectors to approximate inner products. IEEE Access, 8, 78590–78607.
[33]
Lichman, M. (2013). UCI machine learning repository.
[34]
Mensink, T., Verbeek, J., Perronnin, F., & Csurka, G., (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part II, pp. 488–501.
[35]
Murphy, K. P. (2013). Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA.
[36]
Pratap, R., Bera, D., Revanuru, K., (2019). Efficient sketching algorithm for sparse binary data. In Jianyong Wang, Kyuseok Shim, and Xindong Wu, editors, 2019 IEEE International Conference on Data Mining, ICDM 2019, Beijing, China, November 8-11, 2019, pp. 508–517. IEEE.
[37]
Pratap, R., Kulkarni, R., & Sohony, I. (2018). Efficient dimensionality reduction for sparse binary data. In Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen K. Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, and Jeffrey S. Saltz, editors, IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018, pages 152–157. IEEE.
[38]
Provost, S. B., & Mathai, A. M. (1992). Quadratic Forms in Random Variables: Theory and Applications. Statistics: Textbooks and Monographs. Marcel Dekker.
[39]
Shrivastava, A. (2016). Simple and efficient weighted minwise hashing. In Advances in Neural Information Processing Systems, pages 1498–1506.
[40]
Shrivastava, A. (2017). Optimal densification for fast and accurate minwise hashing. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 3154–3163.
[41]
Shrivastava, A., & Li, P. (2014). In defense of minhash over simhash. In Artificial Intelligence and Statistics, pages 886–894. PMLR.
[42]
Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009). Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14–18, 2009, pages 1113–1120.
[43]
Wu, W., Li, B., Chen, L., Zhang, C., & Yu, P. S. (2018). Improved consistent weighted sampling revisited. IEEE Transactions on Knowledge and Data Engineering, 31(12), 2332–2345.
[44]
Wu, X., Zhu, X., Wu, G., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97–107.
[45]
Yu, F., Kumar, S., Gong, Y., & Chang, S.F., (2014). Circulant binary embedding. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14, pages II–946–II–954. JMLR.org.
[46]
Zhai, Y., Ong, Y., & Tsang, I. W. (2014). The emerging "big dimensionality". IEEE Computational Intelligence Magazine, 9(3), 14–26.

Published In

Machine Learning, Volume 111, Issue 7
July 2022, 381 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 2022
Accepted: 07 March 2022
Revision received: 04 March 2022
Received: 24 January 2021

Author Tags

  1. Dimensionality reduction
  2. Variance reduction
  3. Control variate
  4. Maximum likelihood estimator
  5. Sketching

Qualifiers

  • Research-article
