
Parallelization of torsion finite element code using compressed stiffness matrix algorithm

  • Original Article
  • Published in Engineering with Computers

Abstract

In this study, the elastic and elastoplastic torsion problems were formulated by the finite element method, and the resulting code was parallelized on both shared-memory and distributed-memory architectures. An assembly method with high parallelism and minimal memory consumption was proposed to obtain the compressed global stiffness matrix directly, without forming the dense matrix. Parallel programming principles were presented for both the shared-memory and distributed-memory approaches, and well-known parallel mathematical libraries were briefly reviewed. The main focus of the paper is a clear, detailed explanation of the parallelization mechanisms on the two memory architectures, including some Linux operating system settings for large-scale problems. To verify the proposed method and its parallel performance, several benchmark examples with different mesh sizes were presented and compared with their analytical solutions. The proposed sparse assembly algorithm reduced the required memory significantly (by a factor of about 10^3.5 to 10^5.5), and a speedup of about 3.4 was obtained for the elastoplastic torsion problem on an ordinary multicore computer.
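The memory saving reported above comes from assembling element contributions straight into compressed storage instead of a dense global matrix. The following is a minimal sketch of that idea only — not the authors' actual algorithm — with illustrative connectivity and element values; the element stiffness, dof numbering, and helper names are assumptions:

```python
from collections import defaultdict

def assemble_compressed(elements, n_dof):
    """Assemble element stiffness matrices directly into a sparse
    (row, col) -> value map, never allocating the dense n_dof x n_dof
    global matrix. `elements` is a list of (dof_indices, ke) pairs,
    where ke is a small dense element matrix (list of lists)."""
    K = defaultdict(float)
    for dofs, ke in elements:
        for a, i in enumerate(dofs):
            for b, j in enumerate(dofs):
                K[(i, j)] += ke[a][b]  # shared dofs accumulate
    return K

def to_csr(K, n_dof):
    """Convert the sparse map to compressed sparse row (CSR) arrays.
    (The row scan here is O(n * nnz) -- fine for a sketch, not for
    production assembly.)"""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n_dof):
        for j in sorted(c for (r, c) in K if r == i):
            col_idx.append(j)
            values.append(K[(i, j)])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values

# Two 2-dof "elements" sharing dof 1 on a 3-dof mesh (illustrative values)
ke = [[2.0, -1.0], [-1.0, 2.0]]
K = assemble_compressed([((0, 1), ke), ((1, 2), ke)], n_dof=3)
row_ptr, col_idx, values = to_csr(K, 3)
print(row_ptr)  # [0, 2, 5, 7]
print(values)   # [2.0, -1.0, -1.0, 4.0, -1.0, -1.0, 2.0]
```

Only the nonzero entries are stored; for a 2-D torsion mesh with a handful of nonzeros per row, this is where a dense matrix's O(n²) storage collapses to O(nnz), consistent with the 10^3.5–10^5.5 factors quoted for large meshes.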


Notes

  1. Bi-conjugate gradient stabilized.

  2. Conjugate gradient.

  3. Generalized minimal residual method.

  4. In the computational literature, the in-core technique means storing data in the main memory of the computer instead of on the hard disk.

  5. This subroutine is f_dcreate_matrix_dist; for more information, see the SuperLU users' guide.
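The solvers named in notes 1–3 are standard Krylov methods whose cost is dominated by the sparse matrix–vector product, which compressed (CSR) storage makes cheap. As a hedged illustration — not the paper's implementation, and with an illustrative 3×3 system — a plain conjugate gradient iteration operating directly on CSR arrays:

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

def cg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=200):
    """Conjugate gradient for a symmetric positive definite CSR matrix."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                     # residual b - A x0, with x0 = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = csr_matvec(row_ptr, col_idx, values, p)
        alpha = rs / sum(pi * Api for pi, Api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:   # converged on residual norm
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Solve [[2,-1,0],[-1,4,-1],[0,-1,2]] x = [1,2,1]  (SPD, CSR-stored)
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 4.0, -1.0, -1.0, 2.0]
x = cg(row_ptr, col_idx, values, [1.0, 2.0, 1.0])
print([round(v, 6) for v in x])  # → [1.0, 1.0, 1.0]
```

The inner `csr_matvec` loop is also the natural place for shared-memory parallelism (each row is independent), which is how OpenMP-style parallel solvers split the work across cores.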

References

  1. Parhami B (2002) Introduction to parallel processing: algorithms and architectures. Kluwer Academic

  2. Kosec G et al (2014) Super linear speedup in a local parallel meshless solution of thermo-fluid problems. Comput Struct 133:30–38

  3. Kalro V et al (1997) Parallel finite element simulation of large ram-air parachutes. Int J Numer Methods Fluids 24(12):1353–1369

  4. Majumder S, Rixner S (2004) Comparing Ethernet and Myrinet for MPI communication. In: Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems (LCR 2004), pp 83–89

  5. Turner EL, Hu H (2001) A parallel CFD rotor code using OpenMP. Adv Eng Softw 32(8):665–671

  6. MPI Forum (1994) MPI: a message-passing interface standard. Int J Supercomput Appl High Perform Comput 8:159–416

  7. Nakajima K (2005) Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator. Parallel Comput 31(10–12):1048–1065

  8. Bauza CG et al (2009) Parallel implementation of a FEM code by using MPI/PETSc and OpenMP hybrid programming techniques

  9. Vargas-Félix M, Botello-Rionda S (2012) Solution of finite element problems using hybrid parallelization with MPI and OpenMP. Acta Univ 22(7):14–24

  10. Grimes R, Lucas R, Wagenbreth G (2011) Progress on GPU implementation for LS-DYNA implicit mechanics. In: 8th European LS-DYNA users conference, Strasbourg

  11. Garrison LH et al (2018) The Abacus Cosmos: a suite of cosmological N-body simulations. Astrophys J Suppl Ser 236(2):43

  12. Cheng J, Grossman M, McKercher T (2014) Professional CUDA C programming. Wiley

  13. Cercos-Pita JL (2015) AQUAgpusph, a new free 3D SPH solver accelerated with OpenCL. Comput Phys Commun 192:295–312

  14. Domínguez JM et al (2013) New multi-GPU implementation for smoothed particle hydrodynamics on heterogeneous clusters. Comput Phys Commun 184(8):1848–1860

  15. Timoshenko S, Goodier J (1951) Theory of elasticity. McGraw-Hill, New York

  16. Smith JO, Sidebottom OM (1965) Inelastic behavior of load-carrying members. Wiley, New York

  17. Hodge PG, Herakovich CT, Stout RB (1968) On numerical comparisons in elastic-plastic torsion. J Appl Mech 35(3):454–459

  18. Yamada Y, Nakagiri S, Takatsuka K (1972) Elastic-plastic analysis of Saint-Venant torsion problem by a hybrid stress model. Int J Numer Methods Eng 5(2):193–207

  19. May I, Al-Shaarbaf I (1989) Elasto-plastic analysis of torsion using a three-dimensional finite element model. Comput Struct 33(3):667–678

  20. Baniassadi M et al (2010) A novel semi-inverse solution method for elastoplastic torsion of heat treated rods. Meccanica 45(3):375–392

  21. Liu C-S (2007) A meshless regularized integral equation method (MRIEM) for Laplace equation in arbitrary interior or exterior plane domains. Proc ICCES 7:69–80

  22. Krupka J, Šimecek I (2010) Parallel solvers of Poisson's equation. Department of Computer Systems, Faculty of Information Technology, Czech Technical University, Prague, MEMICS 2010

  23. Koric S, Lu Q, Guleryuz E (2014) Evaluation of massively parallel linear sparse solvers on unstructured finite element meshes. Comput Struct 141:19–25

  24. Woźniak M et al (2015) Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines. Comput Methods Appl Mech Eng 284:971–987

  25. Koric S, Gupta A (2016) Sparse matrix factorization in the implicit finite element method on petascale architecture. Comput Methods Appl Mech Eng 302:281–292

  26. Naumov M (2011) Incomplete-LU and Cholesky preconditioned iterative methods using cuSPARSE and cuBLAS. NVIDIA technical report and white paper

  27. Li A, Mazhar H, Serban R, Negrut D (2015) Comparison of SpMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL. Technical Report TR-2015-02, SBEL, University of Wisconsin-Madison

  28. Trost N, Jiménez J, Lukarski D, Sanchez V (2015) Accelerating COBAYA3 on multi-core CPU and GPU systems using PARALUTION. Ann Nucl Energy 82:252–259

  29. Sadd MH (2009) Elasticity: theory, applications, and numerics, 2nd edn. Elsevier/AP, Amsterdam

  30. Bland J (1993) Implementation of an algorithm for elastoplastic torsion. Adv Eng Softw 17(1):61–68

  31. Kołodziej JA, Gorzelańczyk P (2012) Application of method of fundamental solutions for elasto-plastic torsion of prismatic rods. Eng Anal Bound Elem 36(2):81–86

  32. Mukhtar FM, Al-Gahtani HJ (2016) Application of radial basis functions to the problem of elasto-plastic torsion of prismatic bars. Appl Math Model 40(1):436–450

  33. Koric S, Hibbeler LC, Thomas BG (2009) Explicit coupled thermo-mechanical finite element model of steel solidification. Int J Numer Methods Eng 78(1):1–31

  34. Li J et al (2012) Elastic–plastic transition in three-dimensional random materials: massively parallel simulations, fractal morphogenesis and scaling functions. Philos Mag 92(22):2733–2758

  35. Samii A, Michoski C, Dawson C (2016) A parallel and adaptive hybridized discontinuous Galerkin method for anisotropic nonhomogeneous diffusion. Comput Methods Appl Mech Eng 304:118–139

  36. Blackford LS, Choi J, Cleary A, D'Azevedo E, Demmel J, Dhillon I, Dongarra J, Hammarling S, Henry G, Petitet A, Stanley K, Walker D, Whaley R (1997) ScaLAPACK users' guide. Society for Industrial and Applied Mathematics

  37. Anderson E, Bai Z, Bischof C, Blackford LS, Demmel J, Dongarra J, Croz JD, Greenbaum A, Hammarling S, McKenney A et al (1999) LAPACK users' guide. SIAM, Philadelphia

  38. Cebrián JM et al (2017) Code modernization strategies to 3-D stencil-based applications on Intel Xeon Phi: KNC and KNL. Comput Math Appl 74(10):2557–2571

  39. Akhter S, Roberts J (2006) Multi-core programming: increasing performance through software multi-threading. Intel Press

  40. Silberschatz A, Galvin PB, Gagne G (2014) Operating system concepts essentials. Wiley

  41. Duff IS, Grimes RG, Lewis JG (1992) Users' guide for the Harwell-Boeing sparse matrix collection (release 1). Technical Report RAL-92-086, Rutherford Appleton Laboratory

  42. Demmel JW, Gilbert J, Li XS (1997) SuperLU users' guide. Computer Science Division, University of California, Berkeley, Tech. Rep. CSD-97-944

  43. Li XS, Demmel JW (2003) SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans Math Softw 29(2):110–140. https://doi.org/10.1145/779359.779361

  44. Snir M, Otto S, Huss-Lederman S, Walker D, Dongarra J (1998) MPI: the complete reference, vol 1. The MIT Press

  45. Hermanns M (2002) Parallel programming in Fortran 95 using OpenMP, vol 75. Universidad Politécnica de Madrid, Madrid

  46. Geuzaine C, Remacle JF (2009) Gmsh: a 3-D finite element mesh generator with built-in pre- and post-processing facilities. Int J Numer Methods Eng 79(11):1309–1331

  47. Wagner W, Gruttmann F (2001) Finite element analysis of Saint-Venant torsion problem with exact integration of the elastic–plastic constitutive equations. Comput Methods Appl Mech Eng 190(29–30):3831–3848

  48. Manchanda N, Anand K (2010) Non-uniform memory access (NUMA). New York University


Author information

Correspondence to Ali Rahmani Firoozjaee.


About this article

Cite this article

Sefidgar, S.M.H., Firoozjaee, A.R. & Dehestani, M. Parallelization of torsion finite element code using compressed stiffness matrix algorithm. Engineering with Computers 37, 2439–2455 (2021). https://doi.org/10.1007/s00366-020-00952-w

DOI: https://doi.org/10.1007/s00366-020-00952-w