
On the exploitation of loop-level parallelism in embedded applications

Authors:

Arun Kejariwal,

Alexander V. Veidenbaum,

Alexandru Nicolau,

Milind Girkar,

Xinmin Tian,

Hideki Saito

ACM Transactions on Embedded Computing Systems (TECS), Volume 8, Issue 2

Article No.: 10, Pages 1 - 34

https://doi.org/10.1145/1457255.1457257

Published: 09 February 2009


Abstract

Advances in silicon technology have enabled increasing support for hardware parallelism in embedded processors. Vector units, multiple processors/cores, multithreading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of these have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. The extent to which the available hardware parallelism can be exploited depends directly on the amount of parallelism inherent in the given application and on the congruence between the granularity of hardware and application parallelism. This paper discusses how loop-level parallelism in embedded applications can be exploited in hardware and software. Specifically, it evaluates the efficacy of automatic loop parallelization and the performance potential of different types of parallelism, viz., true thread-level parallelism (TLP), speculative thread-level parallelism, and vector parallelism, when executing loops. Additionally, it discusses the interaction between parallelization and vectorization. Applications from the industry-standard EEMBC®¹ 1.1 and EEMBC 2.0 suites and from the academic MiBench embedded benchmark suite are analyzed using the Intel®² C compiler. The results show the performance that can be achieved today on real hardware using a production compiler, provide upper bounds on the performance potential of the different types of thread-level parallelism, and point out a number of issues that need to be addressed to improve performance. These issues include the parallelization of libraries such as libc and the design of parallel algorithms that allow maximal exploitation of parallelism. The results also point to the need for new benchmark suites better suited to parallel compilation and execution.

¹ Other names and brands may be claimed as the property of others.

² Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
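
As a rough illustration of the kinds of loops the abstract refers to, the following C sketch contrasts a loop whose iterations are independent (a candidate for both thread-level and vector parallelism) with a loop that carries a dependence from one iteration to the next. The function names `scale_add` and `prefix_sum` are hypothetical and do not come from the paper or its benchmarks, and the OpenMP pragma is an assumption used here to mark the loop explicitly; the paper itself evaluates automatic parallelization by the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not taken from the paper: every iteration reads and
 * writes distinct elements, so iterations may run in any order. Such a loop
 * can be split across threads and, within each thread, across vector lanes.
 * The pragma is an illustrative assumption; without OpenMP support enabled
 * it is ignored and the loop simply runs sequentially. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * a[i] + b[i];
}

/* Hypothetical counterexample: iteration i reads the value written by
 * iteration i - 1 (a loop-carried dependence), so the loop cannot be
 * parallelized or vectorized in this form without restructuring. */
void prefix_sum(float *x, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```

The `restrict` qualifiers tell the compiler that the arrays do not overlap, which is one of the facts a compiler typically has to establish before it can parallelize or vectorize a loop on its own.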

