2024
pdf
bib
abs
Team QUST at SemEval-2024 Task 8: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting AI-generated Text
Xiaoman Xu
|
Xiangrun Li
|
Taihang Wang
|
Jianxiang Tian
|
Ye Jiang
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper presents the participation of team QUST in Task 8 SemEval 2024. we first performed data augmentation and cleaning on the dataset to enhance model training efficiency and accuracy. In the monolingual task, we evaluated traditional deep-learning methods, multiscale positive-unlabeled framework (MPU), fine-tuning, adapters and ensemble methods. Then, we selected the top-performing models based on their accuracy from the monolingual models and evaluated them in subtasks A and B. The final model construction employed a stacking ensemble that combined fine-tuning with MPU. Our system achieved 6th (scored 6th in terms of accuracy, officially ranked 13th in order) place in the official test set in multilingual settings of subtask A. We release our system code at:https://github.com/warmth27/SemEval2024_QUST
2023
pdf
bib
abs
Team QUST at SemEval-2023 Task 3: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting Online News Genre, Framing and Persuasion Techniques
Ye Jiang
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes the participation of team QUST in the SemEval2023 task3. The monolingual models are first evaluated with the under-sampling of the majority classes in the early stage of the task. Then, the pre-trained multilingual model is fine-tuned with a combination of the class weights and the sample weights. Two different fine-tuning strategies, the task-agnostic and the task-dependent, are further investigated. All experiments are conducted under the 10-fold cross-validation, the multilingual approaches are superior to the monolingual ones. The submitted system achieves the second best in Italian and Spanish (zero-shot) in subtask-1.
pdf
bib
abs
Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic
Ye Jiang
|
Xingyi Song
|
Carolina Scarton
|
Iknoor Singh
|
Ahmet Aker
|
Kalina Bontcheva
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
The spread of COVID-19 misinformation on social media became a major challenge for citizens, with negative real-life consequences. Prior research focused on detection and/or analysis of COVID-19 misinformation. However, fine-grained classification of misinformation claims has been largely overlooked. The novel contribution of this paper is in introducing a new dataset which makes fine-grained distinctions between statements that assert, comment or question on false COVID-19 claims. This new dataset not only enables social behaviour analysis but also enables us to address both evidence-based and non-evidence-based misinformation classification tasks. Lastly, through leave claim out cross-validation, we demonstrate that classifier performance on unseen COVID-19 misinformation claims is significantly different, as compared to performance on topics present in the training data.
2019
pdf
bib
abs
Team Bertha von Suttner at SemEval-2019 Task 4: Hyperpartisan News Detection using ELMo Sentence Representation Convolutional Network
Ye Jiang
|
Johann Petrak
|
Xingyi Song
|
Kalina Bontcheva
|
Diana Maynard
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes the participation of team “bertha-von-suttner” in the SemEval2019 task 4 Hyperpartisan News Detection task. Our system uses sentence representations from averaged word embeddings generated from the pre-trained ELMo model with Convolutional Neural Networks and Batch Normalization for predicting hyperpartisan news. The final predictions were generated from the averaged predictions of an ensemble of models. With this architecture, our system ranked in first place, based on accuracy, the official scoring metric.
2017
pdf
bib
abs
Comparing Attitudes to Climate Change in the Media using sentiment analysis based on Latent Dirichlet Allocation
Ye Jiang
|
Xingyi Song
|
Jackie Harrison
|
Shaun Quegan
|
Diana Maynard
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
News media typically present biased accounts of news stories, and different publications present different angles on the same event. In this research, we investigate how different publications differ in their approach to stories about climate change, by examining the sentiment and topics presented. To understand these attitudes, we find sentiment targets by combining Latent Dirichlet Allocation (LDA) with SentiWordNet, a general sentiment lexicon. Using LDA, we generate topics containing keywords which represent the sentiment targets, and then annotate the data using SentiWordNet before regrouping the articles based on topic similarity. Preliminary analysis identifies clearly different attitudes on the same issue presented in different news sources. Ongoing work is investigating how systematic these attitudes are between different publications, and how these may change over time.