DOI: 10.1145/3437963.3441770
Research Article | Open Access

Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

Published: 08 March 2021
Abstract

    Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the exact same training data can produce very different prediction results. Practitioners have relied on the state-of-the-art ensemble method to estimate prediction uncertainty; however, ensembles are expensive to train and serve for web-scale traffic systems.
    In this paper, we seek to advance the understanding of the prediction variation estimated by the ensemble method. Through empirical experiments on MovieLens and Criteo, two widely used benchmark datasets in recommender systems, we observe that prediction variation comes from various randomness sources, including training data shuffling and random parameter initialization. Adding more randomness sources to ensemble members produces higher prediction variation among them and a more accurate mean prediction. Moreover, we propose to infer prediction variation from neuron activation strength and demonstrate its strong predictive power. Our approach provides a simple way to estimate prediction variation and opens up new opportunities for future work in many interesting areas (e.g., model-based reinforcement learning) without relying on serving expensive ensemble models.
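    To make the abstract's two ideas concrete, below is a minimal, hypothetical Python sketch (not the authors' code): it trains a small ensemble whose members share one model specification and one dataset but differ only in random seed (hence in weight initialization and data shuffling), measures per-example prediction variation as the standard deviation across members, and then fits a simple probe that predicts that variation from mean hidden-layer activation strength. The toy data, model sizes, and seeds are all illustrative assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.linear_model import LinearRegression

        # Toy regression data standing in for a recommender dataset (assumption).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(2000, 10))
        y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)

        # (1) Ensemble: same spec, same data; only the seed differs, which changes
        # both the weight initialization and the training-data shuffling order.
        members = [
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                         random_state=seed).fit(X, y)
            for seed in range(5)
        ]
        preds = np.stack([m.predict(X) for m in members])  # (n_members, n_examples)
        variation = preds.std(axis=0)                      # per-example prediction std

        # (2) Activation strength for one member: mean ReLU activation of its
        # hidden layer, used as a cheap feature to predict the ensemble variation.
        m0 = members[0]
        hidden = np.maximum(X @ m0.coefs_[0] + m0.intercepts_[0], 0.0)
        strength = hidden.mean(axis=1, keepdims=True)      # (n_examples, 1)

        probe = LinearRegression().fit(strength, variation)
        print("R^2 of activation-strength probe:", probe.score(strength, variation))

    If such a probe fits well, activation-strength features can stand in for the ensemble's variation estimate at serving time, which is the efficiency argument the abstract makes against serving full ensembles.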

    Cited By

    • (2023) Pessimistic Decision-Making for Recommender Systems. ACM Transactions on Recommender Systems 1:1, 1-27. https://doi.org/10.1145/3568029 (online 7-Feb-2023)
    • (2022) A Tensor-based Regression Approach for Human Motion Prediction. Quality and Reliability Engineering International 39:2, 481-499. https://doi.org/10.1002/qre.3153 (online 21-Jun-2022)

    Published In

    WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
    March 2021, 1192 pages
    ISBN: 9781450382977
    DOI: 10.1145/3437963
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. ensemble
    2. neural networks
    3. neuron activation
    4. prediction uncertainty
    5. recommender systems

    Conference

    WSDM '21

    Acceptance Rates

    Overall Acceptance Rate: 498 of 2,863 submissions, 17%
