Reconstructing permutation table to improve the Tabu Search for the PFSP on GPU

Wei, Kai-Cheng; Sun, Xue; Chu, Hsun; Wu, Chao-Chin

doi:10.1007/s11227-017-2041-7

Published: 03 May 2017

Volume 73, pages 4711–4738, (2017)

The Journal of Supercomputing

Abstract

General-purpose computing on graphics processing units (GPGPU) has been adopted to accelerate applications that require long execution times in various problem domains. Tabu Search, a metaheuristic optimization method, has been used to find suboptimal solutions to NP-hard problems within a reasonable time. In this paper, we investigate how to improve the performance of the Tabu Search algorithm on GPGPU, taking the permutation flow shop scheduling problem (PFSP) as the example for our study. In a recently proposed approach for solving the PFSP by Tabu Search on GPU, all the job permutations are stored in global memory to eliminate branch divergence. Nevertheless, that algorithm requires a large amount of global memory space and incurs many global memory accesses, which degrades system performance. We propose a new approach to address the problem. The main contribution of this paper is an efficient multiple-loop structure that generates most of each permutation on the fly, which decreases the size of the permutation table and significantly reduces the number of global memory accesses. Computational experiments on problems from a benchmark suite for the PFSP show that the best performance improvement of our approach is about 100% compared with the previous work.

Acknowledgements

We would like to express our gratitude for the reviewers’ valuable comments and to thank the National Science Council, Taiwan, for financially supporting this research under Contract No. MOST104-2221-E-018-007.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Changhua University of Education, Changhua, 500, Taiwan
Kai-Cheng Wei, Hsun Chu & Chao-Chin Wu
Department of Electrical Engineering, National Changhua University of Education, Changhua, 500, Taiwan
Xue Sun
College of Automation, Beijing Union University, Beijing, 100101, China
Xue Sun

Corresponding author

Correspondence to Chao-Chin Wu.

Appendix

This appendix argues that the maximum number of block areas is five for each permutation segment table.

The permutation table is constructed as follows. Each permutation is generated by swapping two positions of the parent permutation, resulting in a total of $C_2^n$ child permutations. These child permutations are placed into the permutation table in the order defined in Table 4.

Table 4 The indices of “From” and “To” for each ordered child permutation

The two indices, “From” and “To”, indicate the two positions of the parent permutation that are swapped to produce one child permutation, where $1\le From\le n-1$ and $2\le To\le n$. Note that “From” is smaller than “To” for every child permutation.
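The construction above can be sketched in a few lines of Python. This is only an illustration, not the authors' GPU code: the function names `swap_pairs` and `child_permutations` are hypothetical, and because Table 4 is not reproduced here, the ordering within one “From” group (ascending “To”) is an assumption.

```python
from itertools import combinations


def swap_pairs(n):
    """Return all 1-based (From, To) pairs with From < To.

    Pairs are grouped by From; ascending To inside a group is an assumption.
    """
    return list(combinations(range(1, n + 1), 2))


def child_permutations(parent):
    """Build every child permutation by swapping two positions of the parent."""
    children = []
    for frm, to in swap_pairs(len(parent)):
        child = list(parent)
        # Convert the 1-based indices to Python's 0-based positions.
        child[frm - 1], child[to - 1] = child[to - 1], child[frm - 1]
        children.append(child)
    return children


if __name__ == "__main__":
    for pair, child in zip(swap_pairs(4), child_permutations([1, 2, 3, 4])):
        print(pair, child)
```

For a parent of $n$ jobs the list has $C_2^n$ entries, one per column of the permutation table.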

The permutation table can be divided into $(n-1)$ groups from left to right, where each group has the same “From” value. For instance, in the 7$^{\mathrm{th}}$ group, the “From” index of each child permutation is 7. We illustrate the conceptual overview of the s$^{\mathrm{th}}$ group in Fig. 19. There are exactly two shaded cells in each column, representing the two swapped positions.
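As a quick consistency check on this grouping: in the s$^{\mathrm{th}}$ group the “To” index ranges over $s+1,\dots,n$, so the group contains $n-s$ columns, and the group sizes add up to the total number of child permutations:

$$\sum_{s=1}^{n-1}(n-s)=\frac{n(n-1)}{2}=C_2^n .$$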

The permutation table will be divided into segment tables, from the left to the right columns. Each permutation segment table consists of 32 consecutive columns because the size of one warp is 32.
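The split into segment tables can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `from_index_per_column` and `segments` are hypothetical, columns are numbered from 0, and the last segment is simply allowed to be shorter than 32 columns.

```python
WARP_SIZE = 32  # one permutation segment table spans one warp of columns


def from_index_per_column(n):
    """Return the "From" index of every column, in left-to-right table order.

    Group s (From = s) contributes n - s columns.
    """
    return [frm for frm in range(1, n) for _ in range(n - frm)]


def segments(n, width=WARP_SIZE):
    """Split the columns into segments of `width` consecutive columns.

    Each result is (first_column, first_group, last_group) for one segment.
    """
    froms = from_index_per_column(n)
    result = []
    for start in range(0, len(froms), width):
        chunk = froms[start:start + width]
        result.append((start, chunk[0], chunk[-1]))
    return result


if __name__ == "__main__":
    for start, first_group, last_group in segments(20):
        kind = "single group" if first_group == last_group else "multiple groups"
        print(start, first_group, last_group, kind)
```

A segment whose first and last groups are equal lies inside one group of child permutations; otherwise it spans several groups. These are the two situations the analysis treats separately.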

First, assume that one permutation segment table (PST) falls within only one group of child permutations. There are three cases, as shown in Fig. 20. Case 1 arises when the PST begins at the first column of the group; there are three block areas (BAs). Case 2 arises when the PST ends at the final column of the group; there are five BAs. Case 3 covers the remaining cases, with four BAs. In general, there are five BAs in case 2. However, if the PST in case 2 coincides with the entire s$^{\mathrm{th}}$ group, there are only two BAs, because BA3 and BA5 do not exist and BA4 is merged into BA2.

Next, consider the cases in which one PST includes multiple groups of child permutations, as shown in Fig. 21. Case 5 contains the last columns of the s$^{\mathrm{th}}$ group and the first column of the (s+1)$^{\mathrm{th}}$ group, where several rows between Row $s$ and Row $n$ each hold the same value across the row. There are four BAs in case 5. Case 4 is the general case in which one PST comprises multiple groups. If more than two groups form a PST, the rows with distinct data are merged into BA2, resulting in two BAs in total.

According to the above analysis, the maximum number of BAs is five.

About this article

Cite this article

Wei, KC., Sun, X., Chu, H. et al. Reconstructing permutation table to improve the Tabu Search for the PFSP on GPU. J Supercomput 73, 4711–4738 (2017). https://doi.org/10.1007/s11227-017-2041-7

Issue Date: November 2017