Article

Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms

Authors:

Jun Zhou

IPDPSW '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum

Pages 1107 - 1116

https://doi.org/10.1109/IPDPSW.2013.207

Published: 20 May 2013

Abstract

Augmenting a processor with special hardware that can apply a Single Instruction to Multiple Data (SIMD) at the same time is a cost-effective way of improving processor performance. It also offers a means of improving the ratio of processor performance to power usage, owing to reduced and more effective data movement and intrinsically lower instruction counts. This paper considers and compares the NEON SIMD instruction set used on the ARM Cortex-A series of RISC processors with the SSE2 SIMD instruction set found on Intel platforms, within the context of the Open Computer Vision (OpenCV) library. The performance obtained using compiler auto-vectorization is compared with that achieved using hand-tuning across a range of five different benchmarks and ten different hardware platforms. On the ARM platforms the hand-tuned NEON benchmarks were between 1.05x and 13.88x faster than the auto-vectorized code, while on the Intel platforms the hand-tuned SSE benchmarks were between 1.34x and 5.54x faster.
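To make the contrast between auto-vectorized and hand-tuned code concrete, the following is a minimal sketch of a saturating 8-bit addition, a typical image-processing kernel. It is not taken from the paper and is not one of its five benchmarks; the function names `add_sat_scalar` and `add_sat_simd` are hypothetical and chosen only for illustration. The scalar loop is the kind of code a compiler may or may not auto-vectorize, while the second function spells out the 16-lane operation with SSE2 or NEON intrinsics, depending on what the target supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics (Intel) */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* NEON intrinsics (ARM) */
#endif

/* Scalar reference (hypothetical name): saturating add of two 8-bit buffers.
 * Whether this loop is auto-vectorized depends on the compiler and flags. */
void add_sat_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Hand-written variant (hypothetical name): processes 16 bytes per
 * iteration when SSE2 or NEON is available, then finishes with the
 * scalar loop. Without either extension it is purely scalar. */
void add_sat_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst && a && b));
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads, so the buffers need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#elif defined(__ARM_NEON)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif
    /* Remaining elements that do not fill a whole 16-byte vector. */
    add_sat_scalar(dst + i, a + i, b + i, n - i);
}
```

Both functions should produce identical output for any input, so the scalar version doubles as a correctness check for the intrinsic version. Any speed difference between them depends on the compiler, its flags, and the hardware, and this sketch makes no claim about reproducing the speedups reported above.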



Information

Published In: IPDPSW '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, May 2013, 2304 pages

ISBN: 9780769549798

Publisher: IEEE Computer Society, United States


