Abstract
We present a method for applying machine learning algorithms to the automatic classification of stars in astronomical surveys using time series of star brightness. Currently such classification requires a large amount of domain expert time. We show that a combination of phase-invariant similarity and explicit features extracted from the time series provides domain-expert-level classification. To facilitate this application, we investigate the cross-correlation as a general phase-invariant similarity function for time series. We establish several theoretical properties of cross-correlation, showing that it is intuitively appealing and algorithmically tractable, but not positive semidefinite, and therefore not generally applicable with kernel methods. As a solution we introduce a positive semidefinite similarity function with the same intuitive appeal as cross-correlation. An experimental evaluation in the astronomy domain as well as on several other data sets demonstrates the performance of the kernel and related similarity functions.
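To make the idea of a phase-invariant similarity concrete, the following is a minimal sketch, not the authors' implementation. It assumes each series is a single period sampled on the same evenly spaced grid, so that a phase change is a circular shift. The function names, the `gamma` parameter, and the specific shift-summed exponential form are illustrative assumptions: the abstract does not give the formula for the proposed kernel. The last function shows one known way to obtain a positive semidefinite, shift-invariant similarity, namely summing a positive semidefinite base kernel over all circular shifts.

```python
import numpy as np

def circular_cross_correlation(x, y):
    """Return c with c[s] = <x, np.roll(y, s)> for every circular shift s.

    Uses the correlation theorem, so the cost is O(n log n) instead of O(n^2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y))).real

def max_cross_correlation(x, y):
    """Phase-invariant similarity: the best inner product over all shifts."""
    return circular_cross_correlation(x, y).max()

def shift_sum_kernel(x, y, gamma=1.0):
    """Illustrative PSD alternative (an assumption, not taken from the abstract).

    Sums exp(gamma * <x, shifted y>) over all circular shifts. The exponential
    of an inner product is a PSD kernel, and summing it over the full group of
    shifts keeps it PSD while making it invariant to the phase of either input.
    """
    return np.exp(gamma * circular_cross_correlation(x, y)).sum()

# Small demonstration on synthetic data (hypothetical, unit-norm series).
rng = np.random.default_rng(0)
series = rng.standard_normal((6, 16))
series /= np.linalg.norm(series, axis=1, keepdims=True)

# Shifting one argument leaves both similarities unchanged.
a, b = series[0], series[1]
same_max = np.isclose(max_cross_correlation(a, b),
                      max_cross_correlation(a, np.roll(b, 5)))
same_kernel = np.isclose(shift_sum_kernel(a, b),
                         shift_sum_kernel(a, np.roll(b, 5)))

# Gram matrix of the shift-summed kernel; its eigenvalues should be >= 0
# up to floating-point error.
gram = np.array([[shift_sum_kernel(u, v) for v in series] for u in series])
min_eigenvalue = np.linalg.eigvalsh(gram).min()
```

The maximum over shifts is the quantity the abstract describes as intuitive and tractable but not positive semidefinite. Replacing the maximum with a sum of exponentials keeps the emphasis on the best-aligned shifts (more strongly as `gamma` grows) while yielding a valid kernel for methods such as support vector machines.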
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Wachman, G., Khardon, R., Protopapas, P., Alcock, C.R. (2009). Kernels for Periodic Time Series Arising in Astronomy. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_32
DOI: https://doi.org/10.1007/978-3-642-04174-7_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7
eBook Packages: Computer Science (R0)