Abstract
Reducing the dimensionality of data usually improves the results of data mining tasks. This improvement, however, is harder to achieve when the data lie on a non-linear manifold and are distributed across network nodes. Although numerous algorithms for distributed dimensionality reduction have been proposed, all assume that the data reside in a linear space. To address the non-linear case, we introduce D-Isomap, a novel distributed non-linear dimensionality reduction algorithm that is particularly applicable in large-scale, structured peer-to-peer networks. Apart from unfolding a non-linear manifold, our algorithm is capable of approximately reconstructing the global dataset at peer level, a very attractive feature for distributed data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The obtained results show the suitability and viability of our approach for knowledge discovery in distributed environments.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Magdalinos, P., Vazirgiannis, M., Valsamou, D. (2010). Distributed Knowledge Discovery with Non Linear Dimensionality Reduction. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6119. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13672-6_2
DOI: https://doi.org/10.1007/978-3-642-13672-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13671-9
Online ISBN: 978-3-642-13672-6
eBook Packages: Computer Science (R0)