DOI: 10.1145/3079628.3079698
short-paper

Interactive Prior Elicitation of Feature Similarities for Small Sample Size Prediction

Published: 09 July 2017

Abstract

Regression under the "small n, large p" condition, where the learning data set has a small sample size n and a large number of features p, is a recurring setting in which learning from data is difficult. With prior knowledge about relationships among the features, p can effectively be reduced, but explicating such prior knowledge is difficult for experts. In this paper we introduce a new method for eliciting expert prior knowledge about the similarity of the roles of features in the prediction task. The key idea is to use an interactive multidimensional-scaling (MDS) type scatterplot display of the features to elicit the similarity relationships, and then to use the elicited relationships in the prior distribution of the prediction parameters. Specifically, for learning to predict a target variable with Bayesian linear regression, the feature relationships are used to construct a Gaussian prior with a full covariance matrix for the regression coefficients. Evaluation of our method in experiments with simulated and real users on text data confirms that prior elicitation of feature similarities improves prediction accuracy. Furthermore, elicitation with an interactive scatterplot display outperforms straightforward elicitation in which the users choose feature pairs from a feature list.
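
To make the idea of a full-covariance Gaussian prior concrete, the following is a minimal sketch, not the construction used in the paper (the abstract does not give it). It assumes that elicited pairs of similar features are merged transitively into groups, that coefficients within a group share a prior correlation `rho`, and that the noise variance is known. The names `prior_covariance`, `posterior_mean`, `rho`, `tau2`, and `noise_var` are hypothetical.

```python
import numpy as np


def prior_covariance(n_features, similar_pairs, rho=0.9, tau2=1.0):
    """Full prior covariance for regression weights (illustrative assumption).

    Features linked by elicited pairs are merged into groups; weights in the
    same group get prior correlation rho, weights in different groups get 0.
    """
    # Union-find: merge features connected by any chain of elicited pairs.
    parent = list(range(n_features))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in similar_pairs:
        parent[find(i)] = find(j)

    groups = np.array([find(i) for i in range(n_features)])
    same_group = groups[:, None] == groups[None, :]
    # Equicorrelated blocks are positive definite for 0 <= rho < 1.
    corr = np.where(same_group, rho, 0.0)
    np.fill_diagonal(corr, 1.0)
    return tau2 * corr


def posterior_mean(X, y, prior_cov, noise_var=1.0):
    """Posterior mean of w for y = X w + noise, with w ~ N(0, prior_cov).

    Uses the n x n form of the solution, which is cheap when n << p.
    """
    n = X.shape[0]
    gram = X @ prior_cov @ X.T + noise_var * np.eye(n)
    return prior_cov @ X.T @ np.linalg.solve(gram, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 5, 20                        # small n, large p
    X = rng.normal(size=(n, p))
    w_true = np.zeros(p)
    w_true[:4] = 1.0                    # features 0..3 play the same role
    y = X @ w_true + 0.1 * rng.normal(size=n)

    pairs = [(0, 1), (1, 2), (2, 3)]    # simulated expert feedback
    cov = prior_covariance(p, pairs)
    w_hat = posterior_mean(X, y, cov, noise_var=0.01)
    print(np.round(w_hat[:6], 2))
```

Grouping the pairs transitively is a design choice of this sketch: writing `rho` directly into the entries for the elicited pairs alone can yield a matrix that is not positive definite, whereas block-equicorrelated matrices are always valid covariances for `0 <= rho < 1`.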


      Published In

      UMAP '17: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization
      July 2017
      420 pages
      ISBN:9781450346351
      DOI:10.1145/3079628
      General Chairs: Maria Bielikova, Eelco Herder
      Program Chairs: Federica Cena, Michel Desmarais
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. interaction
      2. prior elicitation
      3. regression
      4. small n large p
      5. visualization

      Qualifiers

      • Short-paper

      Funding Sources

      • Academy of Finland (Finnish Center of Excellence in Computational Inference Research COIN)

      Conference

      UMAP '17
      Acceptance Rates

      UMAP '17 Paper Acceptance Rate 29 of 80 submissions, 36%;
      Overall Acceptance Rate 162 of 633 submissions, 26%

      Cited By

      • (2021) Decision Rule Elicitation for Domain Adaptation. Proceedings of the 26th International Conference on Intelligent User Interfaces, 244-248. DOI: 10.1145/3397481.3450682. Online publication date: 14-Apr-2021
      • (2019) Human-in-the-loop active covariance learning for improving prediction in small data sets. Proceedings of the 28th International Joint Conference on Artificial Intelligence, 1959-1966. DOI: 10.5555/3367243.3367311. Online publication date: 10-Aug-2019
      • (2018) User Modelling for Avoiding Overfitting in Interactive Knowledge Elicitation for Prediction. Proceedings of the 23rd International Conference on Intelligent User Interfaces, 305-310. DOI: 10.1145/3172944.3172989. Online publication date: 5-Mar-2018
      • (2018) A Conceptual Framework for Personalization of Indoor Comfort Parameters Based on Office Workers’ Preferences. Product Lifecycle Management to Support Industry 4.0, 35-45. DOI: 10.1007/978-3-030-01614-2_4. Online publication date: 8-Dec-2018
      • (2017) Knowledge elicitation via sequential probabilistic inference for high-dimensional prediction. Machine Learning 106:9-10, 1599-1620. DOI: 10.1007/s10994-017-5651-7. Online publication date: 1-Oct-2017
