Abstract
In this paper, we introduce several new schemes for calculating discrete wavelet transforms of images. These schemes reduce the number of steps and, as a consequence, reduce the number of synchronizations on parallel architectures. As an additional useful property, the proposed schemes can also reduce the number of arithmetic operations. The schemes are primarily demonstrated on the CDF 5/3 and CDF 9/7 wavelets employed in the JPEG 2000 image compression standard. However, the presented method is general and can be applied to any wavelet transform. As a result, our scheme requires only two memory barriers for the 2-D CDF 5/3 transform, compared to four barriers in the original separable form or three barriers in the recently published non-separable scheme. Our reasoning is supported by exhaustive experiments on high-end graphics cards.
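For orientation, the following is a minimal sketch of the conventional 1-D CDF 5/3 lifting transform, not of the new schemes proposed in the paper. It illustrates why the step count matters: the update step reads the results of the predict step, so a parallel implementation would presumably need a synchronization point between them, and a separable 2-D transform repeats both steps in each dimension. The function names `cdf53_forward` and `cdf53_inverse`, the even-length input, the symmetric boundary extension, and the omission of any scaling are assumptions made for this illustration.

```python
def cdf53_forward(x):
    """One level of the 1-D CDF 5/3 lifting transform (unscaled sketch).

    Assumes an even-length input and whole-sample symmetric extension
    at the boundaries. Returns (s, d): low-pass and high-pass halves.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even-length signals only"
    half = n // 2
    even = [float(v) for v in x[0::2]]
    odd = [float(v) for v in x[1::2]]

    # Predict step: each odd sample minus the mean of its two even neighbours.
    # At the right edge the missing neighbour is mirrored (even[half-1]).
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, half - 1)])
         for i in range(half)]

    # A parallel implementation would need a barrier here, because the
    # update step below reads d values computed by neighbouring threads.

    # Update step: each even sample plus a quarter of its two neighbouring
    # details. At the left edge the missing detail is mirrored (d[0]).
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i])
         for i in range(half)]
    return s, d


def cdf53_inverse(s, d):
    """Invert cdf53_forward by undoing the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(half)]
    odd = [d[i] + 0.5 * (even[i] + even[min(i + 1, half - 1)])
           for i in range(half)]
    x = [0.0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


if __name__ == "__main__":
    signal = [0, 1, 2, 3, 4, 5, 6, 7]
    low, high = cdf53_forward(signal)
    print(low, high)
    print(cdf53_inverse(low, high))
```

With two dependent steps per dimension, the separable 2-D form has four such steps in total, which is consistent with the four barriers quoted above for the original separable CDF 5/3 transform. Merging steps is what reduces that count.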

Acknowledgments
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II) project IT4Innovations excellence in science—LQ1602.
Cite this article
Barina, D., Kula, M. & Zemcik, P. Parallel wavelet schemes for images. J Real-Time Image Proc 16, 1365–1381 (2019). https://doi.org/10.1007/s11554-016-0646-3
DOI: https://doi.org/10.1007/s11554-016-0646-3