DOI: 10.1145/3503161.3548084

Structure-Enhanced Pop Music Generation via Harmony-Aware Learning

Published: 10 October 2022

Abstract

Pop music generation has long been an attractive topic for both musicians and scientists. However, automatically composing pop music with a satisfactory structure remains a challenging problem. In this paper, we propose to leverage harmony-aware learning for structure-enhanced pop music generation. On the one hand, one component of harmony, the chord, represents a harmonic set of multiple notes and is closely tied to the spatial structure of music, its texture. On the other hand, the other component, the chord progression, usually accompanies the development of the music and promotes its temporal structure, the form. Moreover, when chords evolve into a chord progression, harmony naturally bridges texture and form, which enables joint learning of the two structures. Building on this, we propose the Harmony-Aware Hierarchical Music Transformer (HAT), which adaptively exploits structure from the music and lets musical tokens interact hierarchically to enhance the structure of multi-level musical elements. Experimental results show that, compared with existing methods, HAT achieves a much better understanding of structure and also improves the quality of the generated music, especially in form and texture.
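Neither HAT's code nor its token vocabulary appears on this page; purely as an illustrative sketch of the hierarchy the abstract describes (form-level sections, bar-level chords, note-level events), the layout below nests those levels and then serializes them into a flat stream a transformer could consume. The token names `SECTION`, `BAR`, `CHORD`, `NOTE` and the helper `flatten` are invented for this example and are not the paper's actual representation.

```python
# Hypothetical hierarchical song representation:
# song -> form sections -> bars -> (chord, note events)
song = {
    "sections": [
        {"label": "verse",
         "bars": [
             {"chord": "C",  "notes": [("C4", 1.0), ("E4", 0.5), ("G4", 0.5)]},
             {"chord": "Am", "notes": [("A3", 1.0), ("C4", 1.0)]},
         ]},
        {"label": "chorus",
         "bars": [
             {"chord": "F",  "notes": [("F4", 2.0)]},
         ]},
    ]
}

def flatten(song):
    """Serialize the hierarchy into a flat token stream.

    Section tokens carry the form, chord tokens carry the harmony,
    and note tokens carry the texture, in document order.
    """
    tokens = []
    for sec in song["sections"]:
        tokens.append(("SECTION", sec["label"]))
        for bar in sec["bars"]:
            tokens.append(("BAR", None))
            tokens.append(("CHORD", bar["chord"]))
            for pitch, dur in bar["notes"]:
                tokens.append(("NOTE", pitch, dur))
    return tokens

toks = flatten(song)
# First tokens: the verse section marker, a bar marker, its chord, then notes.
print(toks[:4])
```

The point mirrored here is the abstract's bridging claim: chord tokens sit between the form level (sections) and the texture level (notes), so a model attending over the flat stream encounters harmony as the link between the two structural levels.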

Supplementary Material

M4V File (MM22-fp1383.m4v)



    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. algorithmic composition
    2. hierarchy
    3. music generation
    4. structure
    5. transformer

    Qualifiers

    • Research-article

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


    Article Metrics

    • Downloads (last 12 months): 122
    • Downloads (last 6 weeks): 9

    Reflects downloads up to 28 Jan 2025.

    Cited By
    • (2025) Small Tunes Transformer: Exploring Macro and Micro-level Hierarchies for Skeleton-Conditioned Melody Generation. MultiMedia Modeling, 30-43. DOI: 10.1007/978-981-96-2071-5_3. Online publication date: 2 Jan 2025.
    • (2024) Style-conditioned music generation with Transformer-GANs. Frontiers of Information Technology & Electronic Engineering, 25(1), 106-120. DOI: 10.1631/FITEE.2300359. Online publication date: 8 Feb 2024.
    • (2024) TAS: Personalized Text-guided Audio Spatialization. Proceedings of the 32nd ACM International Conference on Multimedia, 9029-9037. DOI: 10.1145/3664647.3681626. Online publication date: 28 Oct 2024.
    • (2024) Continuous Emotion-Based Image-to-Music Generation. IEEE Transactions on Multimedia, 26, 5670-5679. DOI: 10.1109/TMM.2023.3338089. Online publication date: 2024.
    • (2024) HRPE: Hierarchical Relative Positional Encoding for Transformer-Based Structured Symbolic Music Generation. Music Intelligence, 122-134. DOI: 10.1007/978-981-97-0576-4_9. Online publication date: 4 Feb 2024.
    • (2023) A Survey on Deep Learning for Symbolic Music Generation: Representations, Algorithms, Evaluations, and Challenges. ACM Computing Surveys, 56(1), 1-39. DOI: 10.1145/3597493. Online publication date: 25 Aug 2023.
    • (2023) An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation. Proceedings of the 31st ACM International Conference on Multimedia, 6938-6947. DOI: 10.1145/3581783.3612140. Online publication date: 26 Oct 2023.
    • (2023) The Beauty of Repetition: An Algorithmic Composition Model With Motif-Level Repetition Generator and Outline-to-Music Generator in Symbolic Music Generation. IEEE Transactions on Multimedia, 26, 4320-4333. DOI: 10.1109/TMM.2023.3321495. Online publication date: 2 Oct 2023.
    • (2023) Video Background Music Generation: Dataset, Method and Evaluation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 15591-15601. DOI: 10.1109/ICCV51070.2023.01433. Online publication date: 1 Oct 2023.
    • (2023) Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. DOI: 10.1109/ICASSP49357.2023.10095098. Online publication date: 4 Jun 2023.
