Abstract
In this study, elastic and elastoplastic torsion problems were formulated with the finite element method, and the finite element code was parallelized on both shared-memory and distributed-memory architectures. An assembly method with high parallelism and minimal memory consumption was proposed to obtain the compressed global stiffness matrix directly. Parallel programming principles were presented for the shared-memory and distributed-memory approaches, and well-known parallel mathematical libraries were briefly described. The main focus of the paper was a clear, detailed explanation of the parallelization mechanisms on the two memory architectures, including some Linux operating system settings for large-scale problems. To verify the proposed method and its parallel performance, several benchmark examples with different mesh sizes were presented and compared with their respective analytical solutions. The results show that the proposed sparse assembly algorithm reduced the required memory significantly (by a factor of about 10^3.5 to 10^5.5), and the obtained speedup was about 3.4 for the elastoplastic torsion problem on a simple multicore computer.
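To make the idea of assembling a compressed stiffness matrix directly more concrete, the following is a minimal sketch in Python. It is not the authors' algorithm, whose details are not given in the abstract. It assumes one unknown per node, compressed sparse row (CSR) storage, and a hypothetical callback `element_matrix` that returns the dense local stiffness matrix of an element. The dense global matrix is never formed: a first pass over the element connectivity builds the sparsity pattern, and a second pass scatters the local matrices into the compressed arrays.

```python
from bisect import bisect_left


def assemble_csr(n_nodes, elements, element_matrix):
    """Assemble a global matrix in CSR form without forming the dense matrix.

    elements: list of tuples of global node indices (one unknown per node).
    element_matrix: hypothetical callback returning the dense local matrix
    (list of lists) of element number e.
    """
    # Pass 1: sparsity pattern, i.e. which columns are non-zero in each row.
    pattern = [set() for _ in range(n_nodes)]
    for conn in elements:
        for i in conn:
            pattern[i].update(conn)

    # Convert the pattern into CSR index arrays with sorted columns per row.
    row_ptr = [0]
    col_idx = []
    for cols in pattern:
        col_idx.extend(sorted(cols))
        row_ptr.append(len(col_idx))
    values = [0.0] * len(col_idx)

    # Pass 2: scatter each local matrix into the compressed storage.
    for e, conn in enumerate(elements):
        ke = element_matrix(e)
        for a, i in enumerate(conn):
            start, end = row_ptr[i], row_ptr[i + 1]
            for b, j in enumerate(conn):
                # Binary search for column j inside row i.
                k = bisect_left(col_idx, j, start, end)
                values[k] += ke[a][b]
    return row_ptr, col_idx, values


# Illustrative data: three two-node elements in a chain of four nodes,
# each with the same local matrix [[1, -1], [-1, 1]].
elements = [(0, 1), (1, 2), (2, 3)]
row_ptr, col_idx, values = assemble_csr(
    4, elements, lambda e: [[1.0, -1.0], [-1.0, 1.0]]
)
```

In this sketch the storage grows with the number of non-zero entries rather than with the square of the number of unknowns, which is the general reason compressed assembly saves memory on large meshes. A parallel version of the second pass would additionally need atomic updates or an element coloring so that two threads do not write to the same entry at the same time; that part is left out here.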
Notes
Bi-conjugate gradient stabilized.
Conjugate gradient.
Generalized minimal residual method.
In computing terminology, an in-core technique stores data in the computer's main memory instead of on the hard disk.
This subroutine is f_dcreate_matrix_dist; for more information, see the SuperLU users' guide.
References
Parhami B (2002) Introduction to parallel processing: algorithms and architectures. Kluwer Academic
Kosec G et al (2014) Super linear speedup in a local parallel meshless solution of thermo-fluid problems. Comput Struct 133:30–38
Kalro V et al (1997) Parallel finite element simulation of large ram-air parachutes. Int J Numer Methods Fluids 24(12):1353–1369
Majumder S, Rixner S (2004) Comparing Ethernet and Myrinet for MPI communication. In: Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems (LCR 2004), pp 83–89, October 2004
Turner EL, Hu H (2001) A parallel CFD rotor code using OpenMP. Adv Eng Softw 32(8):665–671
MPI Forum (1994) MPI: a message-passing interface standard. Int J Supercomput Appl High Perform Comput 8:159–416
Nakajima K (2005) Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator. Parallel Comput 31(10–12):1048–1065
Bauza CG et al (2009) Parallel implementation of a FEM code by using MPI/PETSc and OpenMP hybrid programming techniques
Vargas-Félix M, Botello-Rionda S (2012) Solution of finite element problems using hybrid parallelization with MPI and OpenMP. Acta Univ 22(7):14–24
Grimes R, Lucas R, Wagenbreth G (2011) Progress on GPU implementation for LS-DYNA implicit mechanics. In: 8th European LS-DYNA users conference, Strasbourg
Garrison LH et al (2018) The Abacus Cosmos: a suite of cosmological N-body simulations. Astrophys J Suppl Ser 236(2):43
Cheng J, Grossman M, McKercher T (2014) Professional Cuda C programming. Wiley
Cercos-Pita JL (2015) AQUAgpusph, a new free 3D SPH solver accelerated with OpenCL. Comput Phys Commun 192:295–312
Domínguez JM et al (2013) New multi-GPU implementation for smoothed particle hydrodynamics on heterogeneous clusters. Comput Phys Commun 184(8):1848–1860
Timoshenko S, Goodier J (1951) Theory of elasticity. McGraw-Hill Book Company Inc, New York
Smith JO, Sidebottom OM (1965) Inelastic behavior of load-carrying members. Wiley, New York, NY
Hodge PG, Herakovich CT, Stout RB (1968) On numerical comparisons in elastic-plastic torsion. J Appl Mech 35(3):454–459
Yamada Y, Nakagiri S, Takatsuka K (1972) Elastic-plastic analysis of Saint-Venant torsion problem by a hybrid stress model. Int J Numer Methods Eng 5(2):193–207
May I, Al-Shaarbaf I (1989) Elasto-plastic analysis of torsion using a three-dimensional finite element model. Comput Struct 33(3):667–678
Baniassadi M et al (2010) A novel semi-inverse solution method for elastoplastic torsion of heat treated rods. Meccanica 45(3):375–392
Liu C-S (2007) A meshless regularized integral equation method (MRIEM) for Laplace equation in arbitrary interior or exterior plane domains. Proc ICCES 7:69–80
Krupka J, Šimecek I (2010) Parallel solvers of Poisson’s equation. Department of Computer Systems, Faculty of Information Technology, Czech Technical University, Prague, MEMICS 2010
Koric S, Lu Q, Guleryuz E (2014) Evaluation of massively parallel linear sparse solvers on unstructured finite element meshes. Comput Struct 141:19–25
Woźniak M et al (2015) Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines. Comput Methods Appl Mech Eng 284:971–987
Koric S, Gupta A (2016) Sparse matrix factorization in the implicit finite element method on petascale architecture. Comput Methods Appl Mech Eng 302:281–292
Naumov M (2011) Incomplete-LU and Cholesky preconditioned iterative methods using CUSPARSE and CUBLAS. Tech. rep., Technical Report and White Paper
Li A, Mazhar H, Serban R, Negrut D (2015) Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL. Technical Report TR-2015-02, SBEL, University of Wisconsin-Madison
Trost N, Jiménez J, Lukarski D, Sanchez V (2015) Accelerating COBAYA3 on Multi-Core CPU and GPU Systems using PARALUTION. Ann Nuclear Energy 82:252–259
Sadd MH (2009) Elasticity: theory, applications, and numerics, 2nd edn. Boston Elsevier/AP, Amsterdam
Bland J (1993) Implementation of an algorithm for elastoplastic torsion. Adv Eng Softw 17(1):61–68
Kołodziej JA, Gorzelańczyk P (2012) Application of method of fundamental solutions for elasto-plastic torsion of prismatic rods. Eng Anal Bound Elements 36(2):81–86
Mukhtar FM, Al-Gahtani HJ (2016) Application of radial basis functions to the problem of elasto-plastic torsion of prismatic bars. Appl Math Model 40(1):436–450
Koric S, Hibbeler LC, Thomas BG (2009) Explicit coupled thermo-mechanical finite element model of steel solidification. Int J Numer Methods Eng 78(1):1–31
Li J et al (2012) Elastic–plastic transition in three-dimensional random materials: massively parallel simulations, fractal morphogenesis and scaling functions. Philos Mag 92(22):2733–2758
Samii A, Michoski C, Dawson C (2016) A parallel and adaptive hybridized discontinuous Galerkin method for anisotropic nonhomogeneous diffusion. Comput Methods Appl Mech Eng 304:118–139
Blackford LS, Choi J, Cleary A, D’Azevedo E, Demmel J, Dhillon I, Dongarra J, Hammarling S, Henry G, Petitet A, Stanley K, Walker D, Whaley R (1997) ScaLAPACK users’ guide, Society for Industrial and Applied Mathematics
Anderson E, Bai Z, Bischof C, Blackford LS, Demmel J, Dongarra J, Croz JD, Greenbaum A, Hammarling S, McKenney A et al (1999) LAPACK users' guide, SIAM, Philadelphia
Cebrián JM et al (2017) Code modernization strategies to 3-D Stencil-based applications on Intel Xeon Phi: KNC and KNL. Comput Math Appl 74(10):2557–2571
Akhter S, Roberts J (2006) Multi-core programming: increasing performance through software multi-threading. Books by engineers for engineers. Intel Press
Silberschatz A, Galvin PB, Gagne G (2014) Operating system concepts essentials. Wiley
Duff IS, Grimes RG, Lewis JG (1992) Users’ guide for the Harwell-Boeing sparse matrix collection (release 1). Technical Report RAL-92-086, Rutherford Appleton Laboratory
Demmel JW, Gilbert J, Li XS (1997) SuperLU users’ guide. Computer Science Division, University of California, Berkeley, Tech. Rep. CSD-97-944
Li XS, Demmel JW (2003) SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans Math Softw 29(2):110–140. https://doi.org/10.1145/779359.779361
Snir M, Otto S, Huss-Lederman S, Walker D, Dongarra J (1998) MPI: the complete reference, vol 1. The MIT Press. ISBN 0262692155
Hermanns M (2002) Parallel programming in Fortran 95 using OpenMP, vol 75. Universidad Politecnica de Madrid, Madrid
Geuzaine C, Remacle JF (2009) Gmsh: A 3-D finite element mesh generator with built-in pre-and post-processing facilities. Int J Numer Methods Eng 79(11):1309–1331
Wagner W, Gruttmann F (2001) Finite element analysis of Saint-Venant torsion problem with exact integration of the elastic–plastic constitutive equations. Comput Methods Appl Mech Eng 190(29–30):3831–3848
Manchanda N, Anand K (2010) Non-uniform memory access (NUMA). New York University 4
Cite this article
Sefidgar, S.M.H., Firoozjaee, A.R. & Dehestani, M. Parallelization of torsion finite element code using compressed stiffness matrix algorithm. Engineering with Computers 37, 2439–2455 (2021). https://doi.org/10.1007/s00366-020-00952-w
DOI: https://doi.org/10.1007/s00366-020-00952-w