DOI: 10.1109/IPDPSW.2013.207
Article

Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms

Published: 20 May 2013

Abstract

Augmenting a processor with special hardware that is able to apply a Single Instruction to Multiple Data (SIMD) at the same time is a cost-effective way of improving processor performance. It also offers a means of improving the ratio of processor performance to power usage, owing to reduced and more effective data movement and intrinsically lower instruction counts. This paper considers and compares the NEON SIMD instruction set used on the ARM Cortex-A series of RISC processors with the SSE2 SIMD instruction set found on Intel platforms, within the context of the Open Computer Vision (OpenCV) library. The performance obtained using compiler auto-vectorization is compared with that achieved using hand-tuning across a range of five different benchmarks and ten different hardware platforms. On the ARM platforms the hand-tuned NEON benchmarks were between 1.05x and 13.88x faster than the auto-vectorized code, while for the Intel platforms the hand-tuned SSE benchmarks were between 1.34x and 5.54x faster.
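
To make the contrast between auto-vectorized and hand-tuned code concrete, the following is a minimal sketch in C. It is not taken from the paper and is not one of its five benchmarks; the saturating 8-bit addition is chosen here only as an assumed, typical image-processing kernel, and the function names `add_sat_scalar` and `add_sat_simd` are hypothetical. The scalar loop is the kind of code a compiler may or may not auto-vectorize, while the second function uses SSE2 or NEON intrinsics directly, depending on the target.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (Intel platforms) */
#elif defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics (ARM Cortex-A platforms) */
#endif

/* Scalar reference (hypothetical name). Whether this loop is
 * auto-vectorized depends on the compiler and its flags. */
static void add_sat_scalar(const uint8_t *a, const uint8_t *b,
                           uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);  /* clamp to 255 */
    }
}

/* Hand-written variant (hypothetical name): processes 16 bytes per
 * instruction where SSE2 or NEON is available, then finishes the
 * remaining elements with scalar code. */
static void add_sat_simd(const uint8_t *a, const uint8_t *b,
                         uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One instruction adds 16 unsigned bytes with saturation. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#elif defined(__ARM_NEON)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        /* NEON counterpart: saturating add on 16 unsigned bytes. */
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif
    /* Scalar tail; also the full fallback when neither SIMD set exists. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Both functions should produce identical output for any input length. Any speed difference between them depends on the compiler, its optimization flags, and the hardware, so this sketch should not be read as reproducing the speedups reported in the abstract.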

Published In

IPDPSW '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
May 2013
2304 pages
ISBN: 9780769549798

Publisher

IEEE Computer Society

United States

Author Tags

  1. ARM
  2. AVX
  3. Low-Power
  4. NEON
  5. SIMD
  6. SSE
  7. Vectorization

Qualifiers

  • Article

Cited By

  • (2024) Tandem Processor: Grappling with Emerging Operators in Neural Networks. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 1165-1182. DOI: 10.1145/3620665.3640365. Online publication date: 27-Apr-2024
  • (2022) Performance optimization of the MGB hydrological model for multi-core and GPU architectures. Environmental Modelling & Software, 148:C. DOI: 10.1016/j.envsoft.2021.105271. Online publication date: 1-Feb-2022
  • (2021) Performance analysis and optimization of decision tree classifiers on embedded devices. Proceedings of the 2021 International Conference on Embedded Software, pp. 37-38. DOI: 10.1145/3477244.3477618. Online publication date: 30-Sep-2021
  • (2020) Leveraging SIMD parallelism for accelerating network applications. Proceedings of the 4th Asia-Pacific Workshop on Networking, pp. 23-29. DOI: 10.1145/3411029.3411033. Online publication date: 3-Aug-2020
  • (2020) Low-level Optimizations for Faster Mobile Deep Learning Inference Frameworks. Proceedings of the 28th ACM International Conference on Multimedia, pp. 4738-4742. DOI: 10.1145/3394171.3416516. Online publication date: 12-Oct-2020
  • (2019) Cappuccino. IEEE Embedded Systems Letters, 11:1, pp. 9-12. DOI: 10.1109/LES.2018.2815954. Online publication date: 1-Mar-2019
  • (2018) Control Flow Vectorization for ARM NEON. Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems, pp. 66-75. DOI: 10.1145/3207719.3207721. Online publication date: 28-May-2018
  • (2018) Optimizing image spatial filtering on single CPU core. Multimedia Tools and Applications, 77:1, pp. 251-281. DOI: 10.1007/s11042-016-4266-5. Online publication date: 1-Jan-2018
  • (2017) Programming the Adapteva Epiphany 64-core network-on-chip coprocessor. International Journal of High Performance Computing Applications, 31:4, pp. 285-302. DOI: 10.1177/1094342015599238. Online publication date: 1-Jul-2017
  • (2017) Data Flow Algorithms for Processors with Vector Extensions. Journal of Signal Processing Systems, 87:1, pp. 21-31. DOI: 10.1007/s11265-015-1045-x. Online publication date: 1-Apr-2017
