Abstract
The traditional multithreaded programming model offers no dedicated thread mapping method for Many Integrated Core (MIC) heterogeneous systems. A poor thread mapping leaves the computing power of the MIC coprocessor underused. To fully exploit that potential, this paper discusses effective multithread mapping strategies by comparing the computing performance of various mapping methods and analyzing the differences between them. To further exploit the computing power of MIC heterogeneous systems, it also explores specific program porting and performance optimization strategies using a k-means application. Experimental results show that the proposed mapping and parallel optimization strategies are effective and can guide programmers in porting and optimizing applications for MIC heterogeneous parallel systems.
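To make the notion of thread mapping concrete, the sketch below simulates how two common placement policies distribute logical threads over the hardware thread contexts of a many-core device. The abstract does not name the mapping methods it compares, so the "compact" and "scatter" policies here are assumptions borrowed from common OpenMP affinity terminology. The function names `map_threads` and `cores_used` and the core counts are hypothetical and chosen only for illustration.

```python
def map_threads(num_threads, num_cores, threads_per_core, policy):
    """Return a list whose entry i is the (core, slot) pair for logical thread i.

    Illustrative simulation only; it does not pin real threads.
    """
    capacity = num_cores * threads_per_core
    if num_threads > capacity:
        raise ValueError("more threads than hardware thread contexts")

    placement = []
    for t in range(num_threads):
        if policy == "compact":
            # Fill every hardware thread slot of a core before using the next core.
            core, slot = divmod(t, threads_per_core)
        elif policy == "scatter":
            # Spread threads round-robin over cores, then wrap to the next slot.
            slot, core = divmod(t, num_cores)
        else:
            raise ValueError("unknown policy: " + policy)
        placement.append((core, slot))
    return placement


def cores_used(placement):
    """Count how many distinct cores received at least one thread."""
    return len({core for core, _ in placement})


if __name__ == "__main__":
    # Assumed toy configuration: 4 cores with 4 hardware threads each.
    for policy in ("compact", "scatter"):
        placement = map_threads(8, 4, 4, policy)
        print(policy, placement, "cores used:", cores_used(placement))
```

With the same thread count, the compact policy packs threads onto few cores, so neighbouring threads share a core's cache, while the scatter policy occupies more cores with fewer threads each. Which of these performs better depends on the workload, which is the kind of comparison the abstract describes.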
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ju, T., Zhu, Z., Wang, Y., Li, L., Dong, X. (2014). Thread Mapping and Parallel Optimization for MIC Heterogeneous Parallel Systems. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8631. Springer, Cham. https://doi.org/10.1007/978-3-319-11194-0_23
DOI: https://doi.org/10.1007/978-3-319-11194-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11193-3
Online ISBN: 978-3-319-11194-0
eBook Packages: Computer Science (R0)