research-article

Early experiences with the Intel many integrated cores accelerated computing technology

Authors:

D. Stanzione

TG '11: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery

Article No.: 21, Pages 1 - 8

https://doi.org/10.1145/2016741.2016764

Published: 18 July 2011


Abstract

We report on early programming experiences with the Intel® Many Integrated Core (Intel® MIC) coprocessor. This new x86-based technology is Intel's answer to the GPU-based accelerators from NVIDIA, AMD, and others. Accelerators have sparked broad interest in the HPC community because they have the potential to significantly increase the compute power of the next generation of supercomputers. The merits of accelerators for general HPC purposes, however, are still very much under debate. Accelerators undoubtedly add complexity to an already very complex cluster, and however large their performance promise, their programmability will be the key to enticing the diverse HPC user community to this new technology.

The study presented here is part of a much broader activity at the Texas Advanced Computing Center (TACC) that focuses on a wide range of accelerators (GPUs, FPGAs, Intel MIC coprocessors, etc.). The Intel MIC architecture is x86-based and supports the languages and parallel programming paradigms commonly found on x86 CPUs, including OpenMP, which has been widely accepted in the HPC community for thread-parallel programming. The scope of this initial study is limited to the Intel MIC programming environment, and in particular to the offload-OpenMP model.
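As a minimal illustration of the thread-parallel OpenMP style mentioned above, the sketch below is ordinary x86 C with a worksharing loop and a sum reduction. It is not taken from the paper, and the function name `dot` is hypothetical. Because the MIC architecture is x86-based, source of this form is the kind of code the study expects to carry over; whether a particular code runs unchanged on the coprocessor depends on the compiler and toolchain, which the sketch does not model.

```c
#include <assert.h>

/* Hypothetical example: dot product of two length-n vectors.
   The OpenMP worksharing pragma splits the iterations across threads,
   and reduction(+:sum) gives each thread a private partial sum that is
   combined when the loop ends. Without OpenMP support the pragma is
   ignored and the loop runs serially, giving the same result up to
   floating-point rounding order. */
double dot(const double *x, const double *y, long n)
{
    assert(n >= 0);              /* a negative length is a caller error */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The reduction clause is what keeps the threaded loop free of a data race on `sum`; no explicit locking is needed.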

Our initial experience with the Intel MIC platform has been very positive. The code modifications required to handle data transfer and to offload parallel sections onto the Intel MIC coprocessor are small and are conveniently expressed as directives/pragmas applied to OpenMP constructs. (We use "accelerators" as a generic reference to Intel MIC coprocessors, GPUs, FPGAs, etc.)
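The offload-OpenMP model described above can be sketched as follows, assuming the `#pragma offload` spelling of Intel's Language Extensions for Offload (LEO) as it appeared in later Intel compilers for Xeon Phi. The clauses available in the 2011 software stack used in this study may differ, and the function `axpy_offload` is a hypothetical example, not code from the paper. The `offload` line marks the following block for execution on the coprocessor and names the data transfers (`in` for host to coprocessor, `inout` for both directions), while the loop itself keeps its ordinary OpenMP pragma.

```c
#include <assert.h>

/* Hypothetical example: y = a*x + y over n elements.
   Assumed Intel LEO syntax: the offload pragma requests that the
   following block run on a MIC coprocessor, copying x to the device
   (in) and copying y to the device and back (inout). A compiler that
   does not recognize the pragma ignores it, as the C standard requires
   for unknown pragmas, and the block then runs on the host. */
void axpy_offload(double a, const double *x, double *y, int n)
{
    assert(n >= 0);
    #pragma offload target(mic) in(x : length(n)) inout(y : length(n))
    {
        /* Ordinary OpenMP loop, unchanged from a host-only version. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

Deleting the single `offload` line leaves a plain host OpenMP loop, which is why the change needed to target the coprocessor is small and localized.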



Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data


Published In

TG '11: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery

July 2011

256 pages

ISBN: 9781450308885

DOI: 10.1145/2016741

General Chair: John Towns (NCSA-Illinois)

Program Chairs: Shawn Brown (PSC), Daniel S. Katz (UC/ANL)


In-Cooperation

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

TG'11: TeraGrid 2011

July 18-21, 2011

Salt Lake City, Utah

Sponsor: University of Illinois

Article Metrics

16 Total Citations

452 Total Downloads

Downloads (Last 12 months): 7

Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Oct 2024

Cited By

Li M, Hamidouche K, Lu X, Lin J, Panda D (2015). High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters. Euro-Par 2015: Parallel Processing, pp. 625-637. Online publication date: 25-Jul-2015.
https://doi.org/10.1007/978-3-662-48096-0_48
Zhang C, Liu L, Li R, Yang G (2015). Performance Characterization and Optimization for Intel Xeon Phi Coprocessor. Algorithms and Architectures for Parallel Processing, pp. 16-33. Online publication date: 16-Dec-2015.
https://doi.org/10.1007/978-3-319-27119-4_2
Sainz F, Mateo S, Beltran V, Bosque J, Martorell X, Ayguadé E (2014). Leveraging OmpSs to Exploit Hardware Accelerators. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp. 112-119. Online publication date: 22-Oct-2014.
https://dl.acm.org/doi/10.1109/SBAC-PAD.2014.26
Coviello G, Cadambi S, Chakradhar S (2014). A Coprocessor Sharing-Aware Scheduler for Xeon Phi-Based Compute Clusters. Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 337-346. Online publication date: 19-May-2014.
https://dl.acm.org/doi/10.1109/IPDPS.2014.44
Dong X, Chai J, Yang J, Wen M, Wu N, Cai X, Zhang C, Chen Z (2014). Utilizing Multiple Xeon Phi Coprocessors on One Compute Node. Algorithms and Architectures for Parallel Processing, pp. 68-81. Online publication date: 2014.
https://doi.org/10.1007/978-3-319-11194-0_6
Baker M, Pophale S, Vasnier J, Jin H, Hernandez O (2014). Hybrid Programming Using OpenSHMEM and OpenACC. Proceedings of the First Workshop on OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, Volume 8356, pp. 74-89. Online publication date: 4-Mar-2014.
https://dl.acm.org/doi/10.1007/978-3-319-05215-1_6
Qi J, Yang C, Chen C, Wu Q, Tang T (2013). Accelerating IDCT Algorithm on Xeon Phi Coprocessor. Advanced Materials Research, vol. 756-759, pp. 3114-3120. Online publication date: Sep-2013.
https://doi.org/10.4028/www.scientific.net/AMR.756-759.3114
Potluri S, Bureddy D, Hamidouche K, Venkatesh A, Kandalla K, Subramoni H, Panda D, Gropp W, Matsuoka S (2013). MVAPICH-PRISM. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-11. Online publication date: 17-Nov-2013.
https://dl.acm.org/doi/10.1145/2503210.2503288
Cadambi S, Coviello G, Li C, Phull R, Rao K, Sankaradass M, Chakradhar S, Parashar M, Weissman J, Epema D, Figueiredo R (2013). COSMIC. Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 215-226. Online publication date: 17-Jun-2013.
https://dl.acm.org/doi/10.1145/2493123.2462921
Cadambi S, Coviello G, Li C, Phull R, Rao K, Sankaradass M, Chakradhar S, Parashar M, Weissman J, Epema D, Figueiredo R (2013). COSMIC. Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 215-226. Online publication date: 17-Jun-2013.
https://dl.acm.org/doi/10.1145/2462902.2462921