Distributed Knowledge Discovery with Non Linear Dimensionality Reduction

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6119)

Abstract

The results of data mining tasks usually improve when the dimensionality of the data is reduced. This improvement, however, is harder to achieve when the data lie on a non-linear manifold and are distributed across network nodes. Although numerous algorithms for distributed dimensionality reduction have been proposed, all assume that the data reside in a linear space. To address the non-linear case, we introduce D-Isomap, a novel distributed non-linear dimensionality reduction algorithm that is particularly applicable in large-scale, structured peer-to-peer networks. Apart from unfolding a non-linear manifold, our algorithm is capable of approximately reconstructing the global dataset at peer level, a very attractive feature for distributed data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The results show the suitability and viability of our approach for knowledge discovery in distributed environments.
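To make the notion of data lying on a non-linear manifold concrete, the following is a minimal, centralized sketch of the geodesic-distance idea behind classic Isomap. It is not the D-Isomap algorithm, whose details are not given in the abstract. The function names, the semicircle toy dataset, and the choice k = 2 are illustrative assumptions.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dense matrix (inf = no edge)."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Rank all other points by Euclidean distance and keep the k closest.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = graph[j][i] = d
    return graph

def geodesic_distances(graph):
    """All-pairs shortest paths over the graph (Floyd-Warshall)."""
    n = len(graph)
    dist = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist

# Toy manifold (assumption): 50 points sampled evenly on a unit semicircle.
n = 50
points = [
    (math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
    for t in range(n)
]

geo = geodesic_distances(knn_graph(points, k=2))
straight = math.dist(points[0], points[-1])  # distance through the ambient space
along = geo[0][-1]                           # distance along the sampled curve

print(f"Euclidean: {straight:.3f}, graph geodesic: {along:.3f}")
```

A linear method sees only the straight-line distance between the two endpoints of the arc, whereas the graph-based estimate follows the curve and approaches its true length. Classic Isomap then applies multidimensional scaling to these geodesic distances to obtain the unfolded coordinates. This sketch omits that step.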


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Magdalinos, P., Vazirgiannis, M., Valsamou, D. (2010). Distributed Knowledge Discovery with Non Linear Dimensionality Reduction. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6119. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13672-6_2

  • DOI: https://doi.org/10.1007/978-3-642-13672-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13671-9

  • Online ISBN: 978-3-642-13672-6

  • eBook Packages: Computer Science, Computer Science (R0)