extended-abstract

LyricLure: Mining Catchy Hooks in Song Lyrics to Enhance Music Discovery and Recommendation

Authors:

Siddharth Sharma,

Ajinkya Walimbe,

Joaquin Delgado

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems

Pages 800 - 802

https://doi.org/10.1145/3640457.3688049

Published: 08 October 2024

Abstract

Music search faces a significant challenge as users increasingly rely on catchy lines from lyrics to search for both new releases and other popular songs. Integrating lyrics into an existing lexical search index, or using a lyrics vector index, is difficult because of the length of lyrics text. Lexical scoring mechanisms like BM25 are inadequate for long text and necessitate complex query planning and index schemas, while techniques based on text embedding similarity often retrieve noisy lyrics with near-similar meaning, resulting in low precision. This paper introduces a proactive approach to extracting catchy phrases from song lyrics, overcoming the limitations of conventional graph-based phrase extractors and deep learning models, which are primarily designed for extractive summarization or task-specific key phrase extraction from domain-specific corpora. Additionally, we employ a multi-step mechanism to mine search query logs for potentially unresolved user queries containing catchy phrases from lyrics. This involves creating a word and character k-gram index for lyric chunks, careful domain-centric normalization (and expansion) of queries and lyrics, and a re-ranking layer incorporating lexical as well as semantic similarity. Together, these strategies helped us create a retrieval source dedicated to serving lyrics-intent queries with high recall.
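To make the k-gram indexing step more concrete, the following is a minimal sketch of a character k-gram index over lyric chunks with a simple lexical ranking. It is not the authors' implementation: the function names, the normalization rules, the choice of k = 3, the containment score, and the made-up lyric lines are all assumptions chosen for illustration, and the semantic re-ranking layer mentioned in the abstract is omitted.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (illustrative rules only)."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def char_kgrams(text: str, k: int = 3) -> set[str]:
    """Character k-grams of the normalized text, padded with '$' at both ends."""
    padded = f"${normalize(text)}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_index(chunks: dict[str, str], k: int = 3) -> dict[str, set[str]]:
    """Inverted index mapping each k-gram to the ids of chunks that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for gram in char_kgrams(text, k):
            index[gram].add(chunk_id)
    return index


def search(query: str, chunks: dict[str, str], index: dict[str, set[str]],
           k: int = 3, top_n: int = 3) -> list[tuple[float, str]]:
    """Collect chunks sharing any k-gram with the query, then rank them by the
    fraction of query k-grams they contain (an assumed lexical score)."""
    q_grams = char_kgrams(query, k)
    if not q_grams:
        return []
    candidates: set[str] = set()
    for gram in q_grams:
        candidates |= index.get(gram, set())
    scored = [
        (len(q_grams & char_kgrams(chunks[cid], k)) / len(q_grams), cid)
        for cid in candidates
    ]
    # Highest score first; ties broken by chunk id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Hypothetical lyric chunks (invented lines, not taken from any real song).
chunks = {
    "song_a#0": "Dancing under neon rain tonight",
    "song_b#0": "Paper boats on a silver river",
}
index = build_index(chunks)
results = search("dancin under neon rain", chunks, index)
print(results)
```

Because matching happens on character k-grams rather than whole words, the misspelled query "dancin" still overlaps heavily with the chunk containing "Dancing". In a fuller pipeline, the top candidates from a step like this would presumably be passed to a re-ranker that also uses semantic similarity.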

Index Terms

LyricLure: Mining Catchy Hooks in Song Lyrics to Enhance Music Discovery and Recommendation
1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Information systems
  1. Information retrieval

Index terms have been assigned to the content through auto-classification.

Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems

October 2024

1438 pages

ISBN: 9798400705052

DOI: 10.1145/3640457

Copyright © 2024 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Extended-abstract
Research
Refereed limited

Conference

RecSys '24

RecSys '24: 18th ACM Conference on Recommender Systems

October 14 - 18, 2024

Bari, Italy

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%
