
Showing 1–50 of 61 results for author: Lioma, C

  1. arXiv:2407.17023  [pdf, other]

    cs.CL cs.AI

    From Internal Conflict to Contextual Adaptation of Language Models

    Authors: Sara Vera Marjanović, Haeun Yu, Pepa Atanasova, Maria Maistro, Christina Lioma, Isabelle Augenstein

    Abstract: Knowledge-intensive language understanding tasks require Language Models (LMs) to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. Nevertheless, studies indicate that LMs often ignore the provided context as it can conflict with the pre-existing LM's memory learned during pre-training. Moreover, conflicting knowledge can already be present…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 22 pages, 15 figures

    MSC Class: 68T50 ACM Class: I.2.7

  2. Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance

    Authors: Theresia Veronika Rampisela, Tuukka Ruotsalo, Maria Maistro, Christina Lioma

    Abstract: Relevance and fairness are two major objectives of recommender systems (RSs). Recent work proposes measures of RS fairness that are either independent from relevance (fairness-only) or conditioned on relevance (joint measures). While fairness-only measures have been studied extensively, we look into whether joint measures can be trusted. We collect all joint evaluation measures of RS relevance and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGIR 2024 as full paper

  3. Recommending Target Actions Outside Sessions in the Data-poor Insurance Domain

    Authors: Simone Borg Bruun, Christina Lioma, Maria Maistro

    Abstract: Providing personalized recommendations for insurance products is particularly challenging due to the intrinsic and distinctive features of the insurance domain. First, unlike more traditional domains like retail, movies, etc., a large amount of user feedback is not available and the item catalog is smaller. Second, due to the higher complexity of products, the majority of users still prefer to compl…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.15360

    Journal ref: ACM Transactions on Recommender Systems 2023

  4. arXiv:2402.15708  [pdf, other]

    cs.CL cs.AI cs.IR

    Query Augmentation by Decoding Semantics from Brain Signals

    Authors: Ziyi Ye, Jingtao Zhan, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Christina Lioma, Tuukka Ruotsalo

    Abstract: Query augmentation is a crucial technique for refining semantically imprecise queries. Traditionally, query augmentation relies on extracting information from initially retrieved, potentially relevant documents. If the quality of the initially retrieved documents is low, then the effectiveness of query augmentation would be limited as well. We propose Brain-Aug, which enhances a query by incorpora…

    Submitted 3 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  5. arXiv:2402.13006  [pdf, other]

    cs.LG cs.CL

    Investigating the Impact of Model Instability on Explanations and Uncertainty

    Authors: Sara Vera Marjanović, Isabelle Augenstein, Christina Lioma

    Abstract: Explainable AI methods facilitate the understanding of model behaviour, yet small, imperceptible perturbations to inputs can vastly distort explanations. As these explanations are typically evaluated holistically, before model deployment, it is difficult to assess when a particular explanation is trustworthy. Some studies have tried to create confidence estimators for explanations, but none have…

    Submitted 4 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  6. arXiv:2401.15061  [pdf, other]

    cs.NE cs.ET physics.optics

    Digital-analog hybrid matrix multiplication processor for optical neural networks

    Authors: Xiansong Meng, Deming Kong, Kwangwoong Kim, Qiuchi Li, Po Dong, Ingemar J. Cox, Christina Lioma, Hao Hu

    Abstract: The computational demands of modern AI have spurred interest in optical neural networks (ONNs) which offer the potential benefits of increased speed and lower power consumption. However, current ONNs face various challenges, most significantly a limited calculation precision (typically around 4 bits) and the requirement for high-resolution signal format converters (digital-to-analogue conversions (…

    Submitted 26 January, 2024; originally announced January 2024.

  7. arXiv:2311.09889  [pdf, other]

    cs.CL

    Language Generation from Brain Recordings

    Authors: Ziyi Ye, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Min Zhang, Christina Lioma, Tuukka Ruotsalo

    Abstract: Generating human language through non-invasive brain-computer interfaces (BCIs) has the potential to unlock many applications, such as serving disabled patients and improving communication. Currently, however, generating language via BCIs has been previously successful only within a classification setup for selecting pre-generated sentence continuation candidates with the most likely cortical sema…

    Submitted 11 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Preprint. Under Submission

  8. Evaluation Measures of Individual Item Fairness for Recommender Systems: A Critical Study

    Authors: Theresia Veronika Rampisela, Maria Maistro, Tuukka Ruotsalo, Christina Lioma

    Abstract: Fairness is an emerging and challenging topic in recommender systems. In recent years, various ways of evaluating and therefore improving fairness have emerged. In this study, we examine existing evaluation measures of fairness in recommender systems. Specifically, we focus solely on exposure-based fairness measures of individual items that aim to quantify the disparity in how individual items are…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to ACM Transactions on Recommender Systems (TORS)

  9. arXiv:2305.18029  [pdf, other]

    cs.CL cs.AI

    Faithfulness Tests for Natural Language Explanations

    Authors: Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, Jakob Grue Simonsen, Isabelle Augenstein

    Abstract: Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural…

    Submitted 30 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Short paper, ACL 2023

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)

  10. arXiv:2302.13812  [pdf, other]

    quant-ph cs.CL

    Adapting Pre-trained Language Models for Quantum Natural Language Processing

    Authors: Qiuchi Li, Benyou Wang, Yudong Zhu, Christina Lioma, Qun Liu

    Abstract: The emerging classical-quantum transfer learning paradigm has brought a decent performance to quantum computational models in many tasks, such as computer vision, by enabling a combination of quantum models and classical pre-trained neural networks. However, using quantum computing with pre-trained models has yet to be explored in natural language processing (NLP). Due to the high linearity constr…

    Submitted 24 February, 2023; originally announced February 2023.

  11. Graph-based Recommendation for Sparse and Heterogeneous User Interactions

    Authors: Simone Borg Bruun, Kacper Kenji Lesniak, Mirko Biasini, Vittorio Carmignani, Panagiotis Filianos, Christina Lioma, Maria Maistro

    Abstract: Recommender system research has oftentimes focused on approaches that operate on large-scale datasets containing millions of user interactions. However, many small businesses struggle to apply state-of-the-art models due to their very limited availability of data. We propose a graph-based recommender model which utilizes heterogeneous interactions between users and content of different types and i…

    Submitted 26 January, 2023; originally announced January 2023.

  12. arXiv:2212.02885  [pdf, other]

    cs.CL

    Template-based Recruitment Email Generation For Job Recommendation

    Authors: Qiuchi Li, Christina Lioma

    Abstract: Text generation has long been a popular research topic in NLP. However, the task of generating recruitment emails from recruiters to candidates in the job recommendation scenario has received little attention from the research community. This work aims at defining the topic of automatic email generation for job recommendation, identifying the challenges, and providing a baseline template-based solut…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by GEM2022 workshop

  13. Principled Multi-Aspect Evaluation Measures of Rankings

    Authors: Maria Maistro, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

    Abstract: Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation).…

    Submitted 1 December, 2022; originally announced December 2022.

  14. Learning Recommendations from User Actions in the Item-poor Insurance Domain

    Authors: Simone Borg Bruun, Maria Maistro, Christina Lioma

    Abstract: While personalised recommendations are successful in domains like retail, where large volumes of user feedback on items are available, the generation of automatic recommendations in data-sparse domains, like insurance purchasing, is an open problem. The insurance domain is notoriously data-sparse because the number of products is typically low (compared to retail) and they are usually purchased to…

    Submitted 28 November, 2022; originally announced November 2022.

  15. arXiv:2204.02007  [pdf, other]

    cs.CL cs.LG

    Fact Checking with Insufficient Evidence

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it wit…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 14 pages

    MSC Class: cs.CL

  16. arXiv:2109.03756  [pdf, other]

    cs.LG

    Diagnostics-Guided Explanation Generation

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised way given human explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to…

    Submitted 8 September, 2021; originally announced September 2021.

    ACM Class: I.2.7

  17. Unsupervised Multi-Index Semantic Hashing

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster al…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Proceedings of the 2021 World Wide Web Conference, published under Creative Commons CC-BY 4.0 License

  18. Projected Hamming Dissimilarity for Bit-Level Importance Coding in Collaborative Filtering

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Christina Lioma

    Abstract: When reasoning about tasks that involve large amounts of data, a common approach is to represent data items as objects in the Hamming space where operations can be done efficiently and effectively. Object similarity can then be computed by learning binary representations (hash codes) of the objects and computing their Hamming distance. While this is highly efficient, each bit dimension is equally…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Proceedings of the 2021 World Wide Web Conference, published under Creative Commons CC-BY 4.0 License
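
    The abstract above compares objects by the Hamming distance between their binary hash codes, a setting in which every bit dimension counts equally. The following is a minimal sketch that assumes codes are stored as Python integers. It shows the plain Hamming distance and, for contrast, a generic per-bit weighted variant. The weighted function and its `weights` parameter are hypothetical illustrations of unequal bit importance. They are not the paper's projected Hamming dissimilarity.

```python
from typing import Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary hash codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those 1s.
    return bin(a ^ b).count("1")


def weighted_hamming_distance(a: int, b: int, weights: Sequence[float]) -> float:
    """Hypothetical per-bit weighted variant: sums weights[i] over the
    differing bit positions i (bit 0 = least significant). Bits beyond
    len(weights) are ignored; all-ones weights recover hamming_distance."""
    diff = a ^ b
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)
```

    For example, `hamming_distance(0b1011, 0b0010)` returns 2 because the codes differ in bits 0 and 3. With weights `[0.5, 1.0, 2.0, 4.0]`, the weighted version returns 0.5 + 4.0 = 4.5. Both functions cost one XOR plus a pass over the bits, which is where the efficiency of Hamming-space comparison comes from.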

  19. arXiv:2103.10572  [pdf, other]

    cs.MM

    Quantum-inspired Multimodal Fusion for Video Sentiment Analysis

    Authors: Qiuchi Li, Dimitris Gkoumas, Christina Lioma, Massimo Melucci

    Abstract: We tackle the crucial challenge of fusing different modalities of features for multimodal sentiment analysis. Mainly based on neural networks, existing approaches largely model multimodal interactions in an implicit and hard-to-understand manner. We address this limitation with inspirations from quantum theory, which contains principled methods for modeling complicated interactions and correlation…

    Submitted 22 March, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Post-print accepted by Information Fusion

  20. arXiv:2012.12366  [pdf, other]

    cs.CL

    Multi-Head Self-Attention with Role-Guided Masks

    Authors: Dongsheng Wang, Casper Hansen, Lucas Chaves Lima, Christian Hansen, Maria Maistro, Jakob Grue Simonsen, Christina Lioma

    Abstract: The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input dispensing with recurrence and convolutions. While some of the learned attention heads have been found to play linguistically interpretable roles, they can be redundant or prone to errors.…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: Accepted at ECIR 2021

  21. arXiv:2011.12684  [pdf, other]

    cs.IR cs.LG

    Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19

    Authors: Lucas Chaves Lima, Casper Hansen, Christian Hansen, Dongsheng Wang, Maria Maistro, Birger Larsen, Jakob Grue Simonsen, Christina Lioma

    Abstract: This report describes the participation of two Danish universities, University of Copenhagen and Aalborg University, in the international search engine competition on COVID-19 (the 2020 TREC-COVID Challenge) organised by the U.S. National Institute of Standards and Technology (NIST) and its Text Retrieval Conference (TREC) division. The aim of the competition was to find the best search engine str…

    Submitted 26 November, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

  22. arXiv:2009.13295  [pdf, other]

    cs.CL cs.LG

    A Diagnostic Study of Explainability Techniques for Text Classification

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity. Efforts to make the rationales behind the models' predictions transparent have inspired an abundance of new explainability techniques. Provided with an already trained model, they compute saliency scores for the words of an input instance. However, there…

    Submitted 25 September, 2020; originally announced September 2020.

    MSC Class: cs.CL; cs.AI ACM Class: I.2.7

  23. Unsupervised Semantic Hashing with Pairwise Reconstruction

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by t…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted at SIGIR'20

  24. Factuality Checking in News Headlines with Eye Tracking

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Birger Larsen, Stephen Alstrup, Christina Lioma

    Abstract: We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that p…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  25. Content-aware Neural Hashing for Cold-start Recommendation

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user…

    Submitted 31 May, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  26. arXiv:2004.05773  [pdf, other]

    cs.CL cs.AI cs.LG

    Generating Fact Checking Explanations

    Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

    Abstract: Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims. A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process -- generating justifications for verdicts on claims.…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: In Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020)

  27. arXiv:1912.12333  [pdf, other]

    cs.CL cs.IR

    Encoding word order in complex embeddings

    Authors: Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen

    Abstract: Sequential word order is important when processing text. Currently, neural networks (NNs) address this by modeling word position using position embeddings. The problem is that position embeddings capture the position of individual words, but not the ordered relationship (e.g., adjacency or precedence) between individual word positions. We present a novel and principled solution for modeling both t…

    Submitted 28 June, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

    Comments: 15 pages, 3 figures, ICLR 2020 spotlight paper. A typo in the ablation table was corrected thanks to Jingquan Zeng from SCUT

  28. arXiv:1909.06856  [pdf, other]

    cs.CY

    Modelling End-of-Session Actions in Educational Systems

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Christina Lioma

    Abstract: In this paper we consider the problem of modelling when students end their session in an online mathematics educational system. Being able to model this accurately will help us optimize the way content is presented and consumed. This is done by modelling the probability of an action being the last in a session, which we denote as the End-of-Session probability. We use log data from a system where…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: In proceedings of EDM 2019

  29. arXiv:1909.03242  [pdf, other]

    cs.CL cs.IR cs.LG stat.ML

    MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

    Authors: Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen

    Abstract: We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Furthe…

    Submitted 21 October, 2019; v1 submitted 7 September, 2019; originally announced September 2019.

    Comments: Proceedings of EMNLP 2019, to appear

  30. arXiv:1906.00674  [pdf, other]

    cs.IR cs.CL

    Contextually Propagated Term Weights for Document Representation

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (or term weight). We present a novel model that, given a target word, redistributes part of that word's weight (that has been computed with word embeddings) across words occurring in similar contexts as the target wo…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: SIGIR 2019

  31. arXiv:1906.00671  [pdf, other]

    cs.IR cs.CL cs.LG

    Unsupervised Neural Generative Semantic Hashing

    Authors: Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma

    Abstract: Fast similarity search is a key component in large-scale information retrieval, where semantic hashing has become a popular strategy for representing documents as binary hash codes. Recent advances in this area have been obtained through neural network based models: generative models trained by learning to reconstruct the original documents. We present a novel unsupervised generative semantic hash…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: SIGIR 2019

  32. arXiv:1904.00761  [pdf, other]

    cs.CL cs.LG stat.ML

    Neural Speed Reading with Structural-Jump-LSTM

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Recurrent neural networks (RNNs) can model natural language by sequentially 'reading' input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time is linearly dependent on the input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as 'neural speed reading', either ignore or skim…

    Submitted 2 April, 2019; v1 submitted 20 March, 2019; originally announced April 2019.

    Comments: 10 pages

    Journal ref: 7th International Conference on Learning Representations (ICLR) 2019

  33. arXiv:1903.08408  [pdf, other]

    cs.IR cs.LG

    Modelling Sequential Music Track Skips using a Multi-RNN Approach

    Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Modelling sequential music skips provides streaming companies the ability to better understand the needs of the user base, resulting in a better user experience by reducing the need to manually skip certain music tracks. This paper describes the solution of the University of Copenhagen DIKU-IR team in the 'Spotify Sequential Skip Prediction Challenge', where the task was to predict the skip behavi…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: 4 pages

    Journal ref: 12th ACM International Conference on Web Search and Data Mining (WSDM) 2019, WSDM Cup

  34. arXiv:1903.08404  [pdf, other]

    cs.IR cs.CL cs.LG

    Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

    Abstract: Automatic fact-checking systems detect misinformation, such as fake news, by (i) selecting check-worthy sentences for fact-checking, (ii) gathering information related to the sentences, and (iii) inferring the factuality of the sentences. Most prior research on (i) uses hand-crafted features to select check-worthy sentences, and does not explicitly account for the recent finding that the top weigh…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: 6 pages

    Journal ref: In Companion Proceedings of the 2019 World Wide Web Conference

  35. arXiv:1903.08389  [pdf, other]

    cs.CL cs.AI

    Contextual Compositionality Detection with External Knowledge Bases and Word Embeddings

    Authors: Dongsheng Wang, Qiuchi Li, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

    Abstract: When the meaning of a phrase cannot be inferred from the individual meanings of its words (e.g., hot dog), that phrase is said to be non-compositional. Automatic compositionality detection in multi-word phrases is critical in any application of semantic processing, such as search engines; failing to detect non-compositional phrases can hurt system effectiveness notably. Existing research treats ph…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: WWW '19 Companion, May 13-17, 2019, San Francisco, CA, USA

  36. Predicting antimicrobial drug consumption using web search data

    Authors: Niels Dalum Hansen, Kåre Mølbak, Ingemar Cox, Christina Lioma

    Abstract: Consumption of antimicrobial drugs, such as antibiotics, is linked with antimicrobial resistance. Surveillance of antimicrobial drug consumption is therefore an important element in dealing with antimicrobial resistance. Many countries lack sufficient surveillance systems. Usage of web mined data therefore has the potential to improve current surveillance methods. To this end, we study how well an…

    Submitted 9 March, 2018; originally announced March 2018.

  37. arXiv:1802.06833  [pdf, other]

    cs.IR

    Seasonal Web Search Query Selection for Influenza-Like Illness (ILI) Estimation

    Authors: Niels Dalum Hansen, Kåre Mølbak, Ingemar J. Cox, Christina Lioma

    Abstract: Influenza-like illness (ILI) estimation from web search data is an important web analytics task. The basic idea is to use the frequencies of queries in web search logs that are correlated with past ILI activity as features when estimating current ILI activity. It has been noted that since influenza is seasonal, this approach can lead to spurious correlations with features/queries that also exhibit…

    Submitted 19 February, 2018; originally announced February 2018.
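
    The abstract summarises the basic idea as follows: keep the search queries whose frequencies correlate with past ILI activity, and use them as features for estimating current activity. The following is a minimal sketch of that baseline step only, using Pearson correlation. The abstract notes that this baseline can pick up spurious seasonal correlations. The paper's seasonal selection method is not reproduced here. The query names, the equal-length frequency series, and the `top_k` cutoff are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_queries(query_freqs: Dict[str, List[float]], past_ili: List[float],
                   top_k: int) -> List[str]:
    """Rank candidate queries by the correlation of their frequency series
    with past ILI activity and keep the top_k as estimation features."""
    ranked = sorted(query_freqs,
                    key=lambda q: pearson(query_freqs[q], past_ili),
                    reverse=True)
    return ranked[:top_k]
```

    Because the ranking uses correlation alone, any query that happens to share influenza's seasonal shape would score highly. This is the weakness the abstract points out.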

  38. arXiv:1802.02603  [pdf]

    cs.IR

    To Phrase or Not to Phrase - Impact of User versus System Term Dependence Upon Retrieval

    Authors: Christina Lioma, Birger Larsen, Peter Ingwersen

    Abstract: When submitting queries to information retrieval (IR) systems, users often have the option of specifying which, if any, of the query terms are heavily dependent on each other and should be treated as a fixed phrase, for instance by placing them between quotes. In addition to such cases where users specify term dependence, automatic ways also exist for IR systems to detect dependent terms in querie…

    Submitted 5 March, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

  39. arXiv:1709.03742  [pdf, other]

    cs.IR cs.CL

    Dependencies: Formalising Semantic Catenae for Information Retrieval

    Authors: Christina Lioma

    Abstract: Building machines that can understand text like humans is an AI-complete problem. A great deal of research has already gone into this, with astounding results, allowing everyday people to discuss with their telephones, or have their reading materials analysed and classified by computers. A prerequisite for processing text semantics, common to the above examples, is having some computational repres…

    Submitted 12 September, 2017; originally announced September 2017.

    Comments: This document is a doktordisputats - a dissertation within the Danish academic system required to obtain the degree of Doctor Scientiarum, in form and function equivalent to the French and German Habilitation and the Higher Doctorate of the Commonwealth

  40. arXiv:1708.07157  [pdf, ps, other]

    cs.IR

    Evaluation Measures for Relevance and Credibility in Ranked Lists

    Authors: Christina Lioma, Jakob Grue Simonsen, Birger Larsen

    Abstract: Recent discussions on alternative facts, fake news, and post truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure c…

    Submitted 23 August, 2017; originally announced August 2017.

  41. arXiv:1708.06403  [pdf, other]

    cs.CY

    Smart City Analytics: Ensemble-Learned Prediction of Citizen Home Care

    Authors: Casper Hansen, Christian Hansen, Stephen Alstrup, Christina Lioma

    Abstract: We present an ensemble learning method that predicts large increases in the hours of home care received by citizens. The method is supervised, and uses different ensembles of either linear (logistic regression) or non-linear (random forests) classifiers. Experiments with data available from 2013 to 2017 for every citizen in Copenhagen receiving home care (27,775 citizens) show that prediction can…

    Submitted 21 August, 2017; originally announced August 2017.

  42. arXiv:1708.04164  [pdf, other]

    cs.CY cs.HC

    Sequence Modelling For Analysing Student Interaction with Educational Systems

    Authors: Christian Hansen, Casper Hansen, Niklas Hjuler, Stephen Alstrup, Christina Lioma

    Abstract: The analysis of log data generated by online educational systems is an important task for improving the systems, and furthering our knowledge of how students learn. This paper uses previously unseen log data from Edulab, the largest provider of digital learning for mathematics in Denmark, to analyse the sessions of its users, where 1.08 million student sessions are extracted from a subset of their…

    Submitted 14 August, 2017; originally announced August 2017.

    Comments: The 10th International Conference on Educational Data Mining 2017

  43. arXiv:1704.01851  [pdf, ps, other]

    cs.IR

    Fixed versus Dynamic Co-Occurrence Windows in TextRank Term Weights for Information Retrieval

    Authors: Wei Lu, Qikai Cheng, Christina Lioma

    Abstract: TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window of k terms. The output of TextRank when applied iteratively is a score for each vertex, i.e. a term weight, that can be used for information retrieval…

    Submitted 6 April, 2017; originally announced April 2017.

  44. arXiv:1704.01845  [pdf, ps, other]

    cs.IR

    Report on TBAS 2012: Workshop on Task-Based and Aggregated Search

    Authors: Birger Larsen, Christina Lioma, Arjen de Vries

    Abstract: The ECIR half-day workshop on Task-Based and Aggregated Search (TBAS) was held in Barcelona, Spain on 1 April 2012. The program included a keynote talk by Professor Jarvelin, six full paper presentations, two poster presentations, and an interactive discussion among the approximately 25 participants. This report overviews the aims and contents of the workshop and outlines the major outcomes.

    Submitted 6 April, 2017; originally announced April 2017.

  45. arXiv:1704.01617  [pdf, ps, other]

    cs.IR

    Part of Speech Based Term Weighting for Information Retrieval

    Authors: Christina Lioma, Roi Blanco

    Abstract: Automatic language processing tools typically assign to terms so-called weights corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a…

    Submitted 5 April, 2017; originally announced April 2017.

  46. arXiv:1704.01610  [pdf, ps, other]

    cs.IR

    A Subjective Logic Formalisation of the Principle of Polyrepresentation for Information Needs

    Authors: Christina Lioma, Birger Larsen, Hinrich Schütze, Peter Ingwersen

    Abstract: Interactive Information Retrieval refers to the branch of Information Retrieval that considers the retrieval process with respect to a wide range of contexts, which may affect the user's information seeking experience. The identification and representation of such contexts has been the object of the principle of Polyrepresentation, a theoretical framework for reasoning about different representati…

    Submitted 5 April, 2017; originally announced April 2017.

  47. arXiv:1704.01603  [pdf, ps, other]

    cs.IR

    Preliminary Experiments using Subjective Logic for the Polyrepresentation of Information Needs

    Authors: Christina Lioma, Birger Larsen, Peter Ingwersen

    Abstract: According to the principle of polyrepresentation, retrieval accuracy may improve through the combination of multiple and diverse information object representations about e.g. the context of the user, the information sought, or the retrieval system. Recently, the principle of polyrepresentation was mathematically expressed using subjective logic, where the potential suitability of each representati…

    Submitted 5 April, 2017; originally announced April 2017.

  48. arXiv:1704.01599  [pdf, ps, other]

    cs.IR cs.CL

    Rhetorical relations for information retrieval

    Authors: Christina Lioma, Birger Larsen, Wei Lu

    Abstract: Typically, every part in most coherent text has some plausible reason for its presence, some function that it performs to the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this so-called discourse structure has been applied successfully to several natural language processing tasks. T…

    Submitted 5 April, 2017; originally announced April 2017.

  49. arXiv:1703.03640  [pdf, ps, other]

    cs.CL

    A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection

    Authors: Christina Lioma, Niels Dalum Hansen

    Abstract: Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by their s…

    Submitted 10 March, 2017; originally announced March 2017.

  50. arXiv:1702.07326  [pdf, other]

    cs.IR q-bio.QM stat.AP

    Time-Series Adaptive Estimation of Vaccination Uptake Using Web Search Queries

    Authors: Niels Dalum Hansen, Kåre Mølbak, Ingemar J. Cox, Christina Lioma

    Abstract: Estimating vaccination uptake is an integral part of ensuring public health. It was recently shown that vaccination uptake can be estimated automatically from web data, instead of slowly collected clinical records or population surveys. All prior work in this area assumes that features of vaccination uptake collected from the web are temporally regular. We present the first ever method to remove t…

    Submitted 23 February, 2017; originally announced February 2017.