Abstract
Data science has attracted great attention in recent years. Besides data science experts, many researchers from other domains now conduct data analysis as well, because big data is becoming more easily accessible. For these non-expert researchers, however, finding suitable models for their analysis tasks can be quite difficult, both because they lack expertise and because the number of available models is overwhelming. Meanwhile, existing model selection approaches rely too heavily on the content of data sets and take a long time to make a selection, which makes them inadequate for recommending models to non-experts online. In this paper, we present an efficient approach to automated model selection based on analysis history and knowledge graph embeddings. Moreover, we introduce exterior features of data sets to enhance our approach and to address the cold start issue. We conduct several experiments on competition data from Kaggle, a well-known online community of data researchers. Experimental results show that our approach dramatically improves model selection efficiency while retaining high accuracy.
Z. Sun and Z. Chen are joint first authors who contributed equally to this research.
Acknowledgement
This work was supported by the National Key R&D Program of China (No. 2018YFB1004404 and No. 2018YFB1402600) and the Shanghai Sailing Program (No. 18YF1401300).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, Z., Chen, Z., He, Z., Jing, Y., Sean Wang, X. (2020). A Fast Automated Model Selection Approach Based on Collaborative Knowledge. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_43
DOI: https://doi.org/10.1007/978-3-030-59410-7_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)