Abstract
An application problem may involve multiple databases, each of which may lack some of the variables or attributes of interest: in a given database, some variables are observed while others are missing. Furthermore, the data in a database may be collected conditionally on some designed variables. In this paper, we discuss problems that arise in data mining from such multiple databases. We propose an approach for determining whether a joint distribution is identifiable from multiple databases. For an identifiable joint distribution, we further present an expectation-maximization (EM) algorithm for calculating the maximum likelihood estimates (MLEs) of the joint distribution.
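To make the estimation step concrete, the following is a minimal sketch of EM for a joint distribution of two discrete variables X and Y pooled from three databases: one observing both variables, one observing only X, and one observing only Y. It is an illustration under stated assumptions, not the algorithm of the paper: the function name `em_joint`, the two-variable setup, and the count layout are hypothetical, values are assumed to be missing at random, and sampling conditional on designed variables is not modeled. The joint distribution is identifiable here because one database observes X and Y together.

```python
import numpy as np


def em_joint(full, x_only, y_only, n_iter=500, tol=1e-10):
    """EM estimate of cell probabilities p[x, y] from pooled counts.

    full   : (I, J) counts from a database observing both X and Y
    x_only : (I,)   counts from a database observing only X
    y_only : (J,)   counts from a database observing only Y
    Assumes every cell of `full` is positive so no margin of p is zero.
    """
    full = np.asarray(full, dtype=float)
    x_only = np.asarray(x_only, dtype=float)
    y_only = np.asarray(y_only, dtype=float)
    total = full.sum() + x_only.sum() + y_only.sum()

    # Start from the uniform distribution over all cells.
    p = np.full(full.shape, 1.0 / full.size)
    for _ in range(n_iter):
        # E-step: distribute partially observed counts over the missing
        # variable using the current conditional distributions.
        y_given_x = p / p.sum(axis=1, keepdims=True)
        x_given_y = p / p.sum(axis=0, keepdims=True)
        expected = (full
                    + x_only[:, None] * y_given_x
                    + y_only[None, :] * x_given_y)
        # M-step: the multinomial MLE is the normalized expected count.
        p_new = expected / total
        converged = np.abs(p_new - p).max() < tol
        p = p_new
        if converged:
            break
    return p


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    estimate = em_joint(full=[[100, 50], [75, 75]],
                        x_only=[30, 60],
                        y_only=[28, 60])
    print(estimate)
```

When the two partially observed databases are empty, the estimate reduces to the relative frequencies of the fully observed table, which is a quick sanity check on the update.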
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jia, J., Geng, Z., Wang, M. (2006). Identifiability and Estimation of Probabilities from Multiple Databases with Incomplete Data and Sampling Selection. In: Yeung, DY., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2006. Lecture Notes in Computer Science, vol 4109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11815921_87
DOI: https://doi.org/10.1007/11815921_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37236-3
Online ISBN: 978-3-540-37241-7
eBook Packages: Computer Science (R0)