
Thread Mapping and Parallel Optimization for MIC Heterogeneous Parallel Systems

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8631)

Abstract

The traditional multithreaded programming model provides no dedicated thread mapping method for Many Integrated Core (MIC) heterogeneous systems, and an unsuitable thread mapping prevents the computing power of the MIC coprocessor from being fully exploited. To exploit this potential, this paper discusses effective multithread mapping strategies by comparing the computing performance of various mapping methods and analyzing the differences between them. To further exploit the computing power of MIC heterogeneous systems, it also explores specific program porting and performance optimization strategies using a k-means application. Experimental results show that the proposed mapping and parallel optimization strategies are effective and can guide programmers in porting and optimizing applications for MIC heterogeneous parallel systems.
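The abstract does not say which mapping methods were compared, so the following is only an illustrative sketch of what a thread mapping decides. It assumes two placement policies, here called "compact" and "scatter" after the affinity types of the same names in Intel's OpenMP runtime, and a hypothetical machine described by `num_cores` and `threads_per_core`. The function names and parameters are assumptions for illustration, not the paper's method.

```python
def compact_mapping(num_threads, num_cores, threads_per_core):
    """Fill every hardware-thread slot of a core before using the next core.

    Returns a list where entry t is the (core, slot) pair for software thread t.
    """
    if num_threads > num_cores * threads_per_core:
        raise ValueError("more threads than hardware-thread slots")
    return [(t // threads_per_core, t % threads_per_core)
            for t in range(num_threads)]


def scatter_mapping(num_threads, num_cores, threads_per_core):
    """Place threads round-robin across cores before reusing any core."""
    if num_threads > num_cores * threads_per_core:
        raise ValueError("more threads than hardware-thread slots")
    return [(t % num_cores, t // num_cores) for t in range(num_threads)]


def cores_used(mapping):
    """Count the distinct cores that received at least one thread."""
    return len({core for core, _slot in mapping})


if __name__ == "__main__":
    # Hypothetical machine: 4 cores with 4 hardware threads each, 8 threads.
    for name, policy in (("compact", compact_mapping),
                         ("scatter", scatter_mapping)):
        mapping = policy(8, 4, 4)
        print(name, "uses", cores_used(mapping), "cores:", mapping)
```

With 8 threads on this hypothetical 4-core machine, the compact policy packs the threads onto 2 cores while the scatter policy spreads them over all 4. That difference in how threads share a core's resources is the kind of effect a comparison of mapping methods would measure.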

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ju, T., Zhu, Z., Wang, Y., Li, L., Dong, X. (2014). Thread Mapping and Parallel Optimization for MIC Heterogeneous Parallel Systems. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8631. Springer, Cham. https://doi.org/10.1007/978-3-319-11194-0_23

  • DOI: https://doi.org/10.1007/978-3-319-11194-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11193-3

  • Online ISBN: 978-3-319-11194-0

  • eBook Packages: Computer Science, Computer Science (R0)
