research-article

vtkSMP: task-based parallel operators for vtk filters

Authors: B. Raffin

EGPGV '13: Proceedings of the 13th Eurographics Symposium on Parallel Graphics and Visualization

Pages 41–48

Published: 04 May 2013

Abstract

NUMA nodes are potentially powerful, but exploiting their capabilities is challenging because of their architecture (multiple computing cores, an advanced memory hierarchy). They are nonetheless one of the key components for processing the ever-growing amount of data produced by scientific simulations.

In this paper we study the parallelization of patterns commonly used in vtk algorithms and propose a new multithreaded plugin for vtk that eases the development of parallel multi-core vtk filters. We focus specifically on task-based approaches and show that, with limited code refactoring, we can take advantage of NUMA node capabilities. We evaluate our patterns on a transform filter, a basic isosurface extraction filter, and an isosurface extraction accelerated by a min/max tree. We support three programming environments, OpenMP, Intel TBB, and X-Kaapi, and propose different algorithmic refinements according to the capabilities of the target environment. Results show speedups of up to 30 on a 48-core machine.
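The task-based pattern behind a parallel transform filter can be illustrated with a small C++ sketch. It is not the vtkSMP API: `parallel_for` and `transform_points` are hypothetical names, and the recursive split-and-spawn scheme below uses standard `std::async` rather than OpenMP, TBB, or X-Kaapi. The matrix is assumed to be affine (bottom row taken as 0 0 0 1), so no perspective divide is applied.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical task-based parallel-for (illustration only, not the vtkSMP API).
// The range [begin, end) is split recursively in two: the left half is spawned
// as an asynchronous task, the right half is processed by the current thread,
// until sub-ranges contain at most `grain` iterations.
template <typename Body>
void parallel_for(std::size_t begin, std::size_t end, std::size_t grain,
                  const Body& body) {
  assert(grain > 0 && begin <= end);
  if (end - begin <= grain) {
    for (std::size_t i = begin; i < end; ++i) body(i);  // sequential leaf
    return;
  }
  const std::size_t mid = begin + (end - begin) / 2;
  // std::launch::async runs the left half on a new thread.
  auto left = std::async(std::launch::async,
                         [&] { parallel_for(begin, mid, grain, body); });
  parallel_for(mid, end, grain, body);  // right half on the current thread
  left.get();  // join the spawned task and propagate any exception
}

using Point = std::array<double, 3>;
using Matrix4 = std::array<std::array<double, 4>, 4>;

// Applies an affine 4x4 matrix (assumed bottom row 0 0 0 1) to every point.
// Each iteration reads one input point and writes one distinct output slot,
// so the loop needs no locks.
std::vector<Point> transform_points(const std::vector<Point>& in,
                                    const Matrix4& m,
                                    std::size_t grain = 1024) {
  std::vector<Point> out(in.size());  // preallocated: tasks write disjoint slots
  parallel_for(0, in.size(), grain, [&](std::size_t i) {
    const Point& p = in[i];
    for (int r = 0; r < 3; ++r)
      out[i][r] = m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3];
  });
  return out;
}
```

For example, `transform_points(points, translation, 256)` applies a translation to every point, with leaves of at most 256 points. Because `std::launch::async` starts a new thread for each spawned task, the grain size must keep the task count modest here; work-stealing runtimes such as TBB and X-Kaapi instead map many lightweight tasks onto a fixed pool of worker threads, which is what makes fine-grained task decomposition practical in those environments.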



Published In

EGPGV '13: Proceedings of the 13th Eurographics Symposium on Parallel Graphics and Visualization

May 2013

81 pages

ISBN:9783905674453

Conference Chair: Margarita Amor López, Universidade da Coruña, Spain

Program Chairs: Fabio Marton, CRS4, Italy; Kenneth Moreland, Sandia National Laboratories

Sponsors

EUROGRAPHICS: The European Association for Computer Graphics

In-Cooperation

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Eurographics Association

Goslar, Germany

