2024
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
John Pavlopoulos | Thea Sommerschield | Yannis Assael | Shai Gordin | Kyunghyun Cho | Marco Passarotti | Rachele Sprugnoli | Yudong Liu | Bin Li | Adam Anderson
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Zihao Deng | Yinghao Ma | Yudong Liu | Rongchen Guo | Ge Zhang | Wenhu Chen | Wenhao Huang | Emmanouil Benetos
Findings of the Association for Computational Linguistics: NAACL 2024
Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT (CITATION) with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations demonstrate MusiLingo's competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
2023
Proceedings of the Ancient Language Processing Workshop
Adam Anderson | Shai Gordin | Bin Li | Yudong Liu | Marco C. Passarotti
Introducing an Open Source Library for Sumerian Text Analysis
Hansel Guzman-Soto | Yudong Liu
Proceedings of the Ancient Language Processing Workshop
The study of Sumerian texts often requires domain experts to examine a vast number of tables. However, the absence of user-friendly tools for this process poses challenges and consumes significant time. To address this issue, we introduce an open-source library that empowers domain experts with minimal technical expertise to automate manual and repetitive tasks using a no-code dashboard. Our library includes an information extraction module that automatically extracts names and relations based on user-defined lists of name tags and relation types. We demonstrate its practical application in the analysis of Sumerian texts by using the tool to facilitate the creation of knowledge graphs, a data representation method that offers insight into the relationships among entities in the data.
2022
Few-shot Learning for Sumerian Named Entity Recognition
Guanghai Wang | Yudong Liu | James Hearne
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
This paper presents our study of named entity recognition (NER) in a low-resource setting, focusing on few-shot learning for the Sumerian NER task. Sumerian is considered an extremely low-resource language because (1) it is a long-dead language and (2) highly skilled language experts are extremely scarce. NER on Sumerian text is important because it helps identify the actors and entities active in a given period of time from collections of tens of thousands of texts, in support of building socio-economic networks of the archives of interest. As a text classification task, NER becomes challenging when the amount of annotated data is limited or the model is required to handle new classes. Sumerian NER is no exception. In this work, we propose applying two few-shot learning systems, ProtoBERT and NNShot, to the Sumerian NER task. Our experiments show that the ProtoBERT NER generally outperforms both the NNShot NER and the fully supervised BERT NER in low-resource settings on the predictions of rare classes. In particular, the F1-score of ProtoBERT on unseen entity types in our test set reaches 89.6%, significantly better than the 84.3% F1-score of the BERT NER.
Generating Descriptive and Rules-Adhering Spells for Dungeons & Dragons Fifth Edition
Pax Newman | Yudong Liu
Proceedings of the 9th Workshop on Games and Natural Language Processing within the 13th Language Resources and Evaluation Conference
We examine the task of generating unique content for the spell system of the tabletop roleplaying game Dungeons and Dragons Fifth Edition using several generative language models. Because the game is highly descriptive, it presents a number of interesting avenues for the generation and analysis of text. In particular, the “spell” system of the game has interesting and unique characteristics: it is primarily made up of high-level, descriptive text, yet many of the game’s main rules are embedded within that text. We therefore examine the capabilities of several models on the task of generating new content for this game, evaluating performance with both score-based methods and a survey on the best-performing model to determine how well the generated content conforms to the rules of the game and how well it might be used in play.
2015
Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition
Yudong Liu | Clinton Burkhart | James Hearne | Liang Luo
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2009
Exploration of the LTAG-Spinal Formalism and Treebank for Semantic Role Labeling
Yudong Liu | Anoop Sarkar
Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks (GEAF 2009)
2007
Exploiting Rich Syntactic Information for Relationship Extraction from Biomedical Articles
Yudong Liu | Zhongmin Shi | Anoop Sarkar
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Experimental Evaluation of LTAG-Based Features for Semantic Role Labeling
Yudong Liu | Anoop Sarkar
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
2006
Using LTAG-Based Features for Semantic Role Labeling
Yudong Liu | Anoop Sarkar
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms