High Parameter Frequency Resolution Encoding Scheme for Spatial Audio Objects Using Stacked Sparse Autoencoder

Wu, Yulin; Hu, Ruimin; Wang, Xiaochen; Hu, Chenhao; Ke, Shanfa

doi:10.1007/s11063-021-10659-8

Published: 18 October 2021

Neural Processing Letters, Volume 54, pages 817–833 (2022)

Yulin Wu^1,2,
Ruimin Hu^1,2 (ORCID: orcid.org/0000-0002-5872-3872),
Xiaochen Wang^1,3,
Chenhao Hu^1 &
Shanfa Ke^1

Abstract

Object-based audio systems have become common in recent years because they offer the flexibility needed in many auditory scenarios, such as virtual reality games, interactive theater, and spatial audio communication. To save bitrate, multiple audio objects are compressed into a mono downmix signal plus side information parameters. However, the frequency resolution of the side information parameters is so low that it causes aliasing distortion. To overcome this issue, this paper proposes a new encoding scheme based on high parameter frequency resolution (224 sub-bands per frame). The high-resolution side information parameters are compressed and reconstructed by a stacked sparse autoencoder (SSAE) neural network and then used to recover the audio objects. The proposed method is compared against existing spatial audio object coding (SAOC) methods at the same overall bitrate, using both objective and subjective evaluations. The evaluation indicates that the approach can deliver high-quality spatial audio objects.
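To make the compression step in the abstract concrete, the following is a minimal sketch of a stacked sparse autoencoder forward pass over 224-dimensional parameter vectors. It is not the authors' network: only the 224 sub-bands per frame come from the abstract, while the layer sizes (64 and 16), the sigmoid activations, the KL-divergence sparsity penalty with target `rho` and weight `beta`, the random input data, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kl_sparsity(rho, rho_hat, eps=1e-8):
    """KL divergence between a target activation rho and mean activations rho_hat."""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def init_layers(sizes, rng):
    """Small random weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply each (weights, bias) layer with a sigmoid; return output and all activations."""
    acts = []
    h = x
    for W, b in layers:
        h = sigmoid(h @ W + b)
        acts.append(h)
    return h, acts

def ssae_loss(x, enc, dec, rho=0.05, beta=0.1):
    """Reconstruction error plus a sparsity penalty on the encoder activations."""
    code, enc_acts = forward(x, enc)
    recon, _ = forward(code, dec)
    mse = np.mean((x - recon) ** 2)
    sparsity = sum(kl_sparsity(rho, a.mean(axis=0)) for a in enc_acts)
    return mse + beta * sparsity, code, recon

rng = np.random.default_rng(0)
# Hypothetical data: 32 frames, each with 224 sub-band parameters scaled to [0, 1].
params = rng.uniform(0.0, 1.0, size=(32, 224))
enc = init_layers([224, 64, 16], rng)   # assumed encoder sizes
dec = init_layers([16, 64, 224], rng)   # assumed mirrored decoder
loss, code, recon = ssae_loss(params, enc, dec)
print(code.shape, recon.shape)  # (32, 16) (32, 224)
```

In this sketch the 16-dimensional `code` stands in for the compressed side information that would be transmitted, and `recon` for the parameters restored at the decoder. Training (minimising `loss` with a gradient-based optimiser) and quantisation of the code are omitted.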

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. 61801334, No. U1803262), and the Basic Research Project of the Science and Technology Plan of Shenzhen (JCYJ20170818143246278).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China
Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu & Shanfa Ke
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China
Yulin Wu & Ruimin Hu
Research Institute of Wuhan University in Shenzhen, Shenzhen, China
Xiaochen Wang

Corresponding author

Correspondence to Ruimin Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Y., Hu, R., Wang, X. et al. High Parameter Frequency Resolution Encoding Scheme for Spatial Audio Objects Using Stacked Sparse Autoencoder. Neural Process Lett 54, 817–833 (2022). https://doi.org/10.1007/s11063-021-10659-8

Accepted: 06 October 2021
Published: 18 October 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11063-021-10659-8
