- research-article, January 2022
Getting better from worse: augmented bagging and a cautionary tale of variable importance
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 224, Pages 10157–10188
As the size, complexity, and availability of data continues to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications. Tools like random forests ...
- research-article, January 2022
Scalable and efficient hypothesis testing with random forests
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 170, Pages 7679–7713
Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has established important ...
- research-article, February 2022
An empirical analysis of LADA diabetes case, control and variable importance
UCC '21: Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion, Article No.: 34, Pages 1–8, https://doi.org/10.1145/3492323.3495632
Latent Autoimmune Diabetes in Adults (LADA) is a condition, which is rarely recognised as a complex disease within its own right and remains under researched. Completely over-shadowed by Type 1 and Type 2 diabetes, LADA is the second most prevalent genre ...
- research-article, June 2021
A New Noisy Random Forest Based Method for Feature Selection
Cybernetics and Information Technologies (CYBAIT), Volume 21, Issue 2, Pages 10–28, https://doi.org/10.2478/cait-2021-0016
Feature selection is an essential pre-processing step in data mining. It aims at identifying the highly predictive feature subset out of a large set of candidate features. Several approaches for feature selection have been proposed in the ...
- research-article, September 2020
An analytical toast to wine: Using stacked generalization to predict wine preference
Statistical Analysis and Data Mining (STADM), Volume 13, Issue 5, Pages 451–464, https://doi.org/10.1002/sam.11474
Due to the intricacies surrounding taste profiles, one's view of good wine is subjective. Therefore, it is advantageous to provide a more objective, data‐driven way to assess wine preferences. Motivated by a previous study that modeled wine ...
- research-article, January 2019
An experimental approach of applying boruta and elastic net for variable selection in classifying breast cancer datasets
International Journal of Knowledge Engineering and Data Mining (IJKEDM), Volume 6, Issue 4, Pages 356–375, https://doi.org/10.1504/ijkedm.2019.105265
Feature selection identifies the key aspects involved in predicting the outcome. In this study, we propose boruta and elastic net (Enet) feature selection for classifying breast cancer datasets. A comparative study of boruta, Enet along with genetic ...
- review-article, June 2018
In defense of Pratt's variable importance axioms: A response to Gromping
In a recent paper Gromping provided a wide‐ranging review of metrics for assessing variable importance in regression analysis. There are, however, several flaws in Gromping's criticism of the well‐known metric attributed to Pratt. Among the metrics she ...
What is the most important predictor? Easy to assess if predictors are unrelated (left). Not so easy to assess if predictors are related (right).
- research-article, July 2017
Sensitivity-like analysis for feature selection in genetic programming
GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference, Pages 401–408, https://doi.org/10.1145/3071178.3071338
Feature selection is an important process within machine learning problems. Through pressures imposed on models during evolution, genetic programming performs basic feature selection, and so analysis of the evolved models can provide some insights into ...
- research-article, August 2012
Causally motivated attribution for online advertising
ADKDD '12: Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy, Article No.: 7, Pages 1–9, https://doi.org/10.1145/2351356.2351363
In many online advertising campaigns, multiple vendors, publishers or search engines (herein called channels) are contracted to serve advertisements to internet users on behalf of a client seeking specific types of conversion. In such campaigns, ...
- Article, December 2011
System for assessing, exploring and monitoring offset print quality
Variations in offset print quality relate to numerous parameters of printing press and paper. To maintain constant quality of products, press operators need to assess, explore and monitor print quality. This paper presents a novel system for assessing ...
- tutorial, July 2011
Separating the wheat from the chaff: on feature selection and feature importance in regression random forests and symbolic regression
GECCO '11: Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, Pages 623–630, https://doi.org/10.1145/2001858.2002059
Feature selection in high-dimensional data sets is an open problem with no universal satisfactory method available. In this paper we discuss the requirements for such a method with respect to the various aspects of feature importance and explore them ...
- Article, May 2010
Spatial variable importance assessment for yield prediction in precision agriculture
IDA'10: Proceedings of the 9th international conference on Advances in Intelligent Data Analysis, Pages 184–195, https://doi.org/10.1007/978-3-642-13062-5_18
Precision Agriculture applies state-of-the-art GPS technology in connection with site-specific, sensor-based crop management. It can also be described as a data-driven approach to agriculture, which is strongly connected with a number of data mining ...