
Vc: A C++ library for explicit vectorization

Published: 01 November 2012
    Abstract

    It is an established trend that CPU development takes advantage of Moore's Law to improve parallelism much more than scalar execution speed. This results in higher hardware thread counts (MIMD) and improved vector units (SIMD); of these, the MIMD developments have received most of the attention in library research and development in recent years. To make use of the latest hardware improvements, SIMD must receive a stronger focus in API research and development, because its computational power can no longer be neglected and auto-vectorizing compilers often cannot generate the necessary SIMD code, as will be shown in this paper. Nowadays, SIMD capabilities are significant enough to warrant vectorizing algorithms that require more conditional execution than the Streaming SIMD Extensions (SSE) were originally expected to handle. The Vc library (http://compeng.uni-frankfurt.de/?vc) was designed to support developers in the creation of portable vectorized code. Its capabilities and performance have been thoroughly tested. Vc provides portability of the source code and allows full utilization of the hardware's SIMD capabilities without introducing any overhead. Copyright © 2011 John Wiley & Sons, Ltd.
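
    The conditional execution mentioned in the abstract is typically expressed in explicitly vectorized code as "compute every lane, then blend by a mask" rather than as a per-element branch. The following is a minimal sketch of that idea in portable C++. It does not use Vc and does not reproduce Vc's API: `FloatV`, `MaskV`, `select`, and the two `scale_negatives_*` functions are hypothetical names introduced only for illustration, and the fixed width of 4 lanes is an assumption. A real SIMD library would map these types to hardware registers instead of arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-lane vector type, for illustration only (NOT Vc's API).
struct FloatV {
    static constexpr std::size_t Size = 4;
    std::array<float, Size> lane{};

    static FloatV load(const float* p) {
        FloatV v;
        for (std::size_t i = 0; i < Size; ++i) v.lane[i] = p[i];
        return v;
    }
    static FloatV broadcast(float x) {
        FloatV v;
        v.lane.fill(x);
        return v;
    }
    void store(float* p) const {
        for (std::size_t i = 0; i < Size; ++i) p[i] = lane[i];
    }
};

// One boolean per lane: the result of a lane-wise comparison.
struct MaskV {
    std::array<bool, FloatV::Size> lane{};
};

inline FloatV operator*(const FloatV& a, const FloatV& b) {
    FloatV r;
    for (std::size_t i = 0; i < FloatV::Size; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

inline MaskV operator<(const FloatV& a, const FloatV& b) {
    MaskV m;
    for (std::size_t i = 0; i < FloatV::Size; ++i) m.lane[i] = a.lane[i] < b.lane[i];
    return m;
}

// Lane-wise blend: takes `a` where the mask is set and `b` elsewhere.
inline FloatV select(const MaskV& m, const FloatV& a, const FloatV& b) {
    FloatV r;
    for (std::size_t i = 0; i < FloatV::Size; ++i) r.lane[i] = m.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

// Scalar reference: a branch per element.
inline void scale_negatives_scalar(std::vector<float>& data, float factor) {
    for (float& x : data) {
        if (x < 0.0f) x *= factor;
    }
}

// Vector formulation: no per-element branch; both outcomes are
// computed for all lanes and the mask decides which one is kept.
inline void scale_negatives_vector(std::vector<float>& data, float factor) {
    assert(data.size() % FloatV::Size == 0);  // sketch handles full vectors only
    const FloatV zero = FloatV::broadcast(0.0f);
    const FloatV f = FloatV::broadcast(factor);
    for (std::size_t i = 0; i < data.size(); i += FloatV::Size) {
        const FloatV x = FloatV::load(&data[i]);
        const MaskV negative = x < zero;
        select(negative, x * f, x).store(&data[i]);
    }
}
```

    Both functions are meant to produce the same result on inputs whose length is a multiple of `FloatV::Size`; the sketch leaves remainder handling and alignment out for brevity.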




      Published In

      Software—Practice & Experience  Volume 42, Issue 11
      November 2012
      115 pages

      Publisher

      John Wiley & Sons, Inc.

      United States

      Publication History

      Published: 01 November 2012

      Author Tags

      1. AVX
      2. C++
      3. LRBni
      4. SIMD
      5. SSE
      6. Vc
      7. data-parallel
      8. optimization
      9. vectorization

      Qualifiers

      • Article


      Cited By

      • (2024) Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku. The Journal of Supercomputing 80:12 (16947-16978). DOI: 10.1007/s11227-024-06113-w. Online publication date: 1-Aug-2024
      • (2023) Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL. Proceedings of the 2023 International Workshop on OpenCL (1-12). DOI: 10.1145/3585341.3585354. Online publication date: 18-Apr-2023
      • (2022) Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado. ACM Transactions on Mathematical Software 48:4 (1-29). DOI: 10.1145/3560262. Online publication date: 19-Dec-2022
      • (2021) Distributed Sparse Block Grids on GPUs. High Performance Computing (272-290). DOI: 10.1007/978-3-030-78713-4_15. Online publication date: 24-Jun-2021
      • (2020) Automatic Code Generation for High-performance Discontinuous Galerkin Methods on Modern Architectures. ACM Transactions on Mathematical Software 47:1 (1-31). DOI: 10.1145/3424144. Online publication date: 8-Dec-2020
      • (2020) SIMD programming using Intel vector extensions. Journal of Parallel and Distributed Computing 135:C (83-100). DOI: 10.1016/j.jpdc.2019.09.012. Online publication date: 1-Jan-2020
      • (2019) Revec: program rejuvenation through revectorization. Proceedings of the 28th International Conference on Compiler Construction (29-41). DOI: 10.1145/3302516.3307357. Online publication date: 16-Feb-2019
      • (2019) An HPC perspective on generative programming. Proceedings of the 14th International Workshop on Software Engineering for Science (9-16). DOI: 10.1109/SE4Science.2019.00008. Online publication date: 28-May-2019
      • (2019) Communication-free massively distributed graph generation. Journal of Parallel and Distributed Computing 131:C (200-217). DOI: 10.1016/j.jpdc.2019.03.011. Online publication date: 1-Sep-2019
      • (2019) A parallel space–time boundary element method for the heat equation. Computers & Mathematics with Applications 78:9 (2852-2866). DOI: 10.1016/j.camwa.2018.12.031. Online publication date: 1-Nov-2019
