Document structure-driven investigative information retrieval
Data-driven investigations are increasingly dealing with non-moderated, non-standard and even manipulated information. Whether the field in question is journalism, law enforcement, or insurance fraud, it is becoming more and more difficult for ...
Highlights
- Investigative information retrieval is a sub-task of exploratory search with more emphasis on transparency and reasoning. It can be used to complement black-box methods with regard to verification and transparency.
- Relevance ...
An empirical evaluation of unsupervised event log abstraction techniques in process mining
These days, businesses keep track of more and more data in their information systems. Moreover, this data becomes more fine-grained than ever, tracking clicks and mutations in databases at the lowest level possible. Faced with such data, process ...
Worker similarity-based noise correction for crowdsourcing
Crowdsourcing offers a cost-effective way to obtain multiple noisy labels for each instance by employing multiple crowd workers. Label integration is then used to infer each instance's integrated label. Despite the effectiveness of label integration ...
A novel self-supervised graph model based on counterfactual learning for diversified recommendation
- A new method is proposed for diversified recommendation.
- The method considers the influence of imbalanced data distribution on diversity.
- An enhanced negative sampling strategy is designed.
- Self-supervised auxiliary task is ...
Consumers’ needs present a trend of diversification, which causes the emergence of diversified recommendation systems. However, existing diversified recommendation research mostly focuses on objective function construction rather than on the root ...
Validation set sampling strategies for predictive process monitoring
Previous studies investigating the efficacy of long short-term memory (LSTM) recurrent neural networks in predictive process monitoring and their ability to capture the underlying process structure have raised concerns about their limited ability ...
Attention-based multi attribute matrix factorization for enhanced recommendation performance
In E-commerce platforms, auxiliary information containing several attributes (e.g., price, quality, and brand) can improve recommendation performance. However, previous studies used a simple combined embedding approach that did not consider the ...
Highlights
- Various types of auxiliary information are utilized for user-item attentive interaction learning.
- Self-attention mechanism is used to consider the importance of attributes in auxiliary information.
- Experimental results show that ...
Heterogeneous graph neural networks for fraud detection and explanation in supply chain finance
It is a critical mission for financial service providers to discover fraudulent borrowers in a supply chain. The borrowers’ transactions in an ongoing business are inspected to support the providers’ decision on whether to lend the money. ...
Highlights
- Financial fraud is identified based on multiple views in supply chain finance.
- A multitask learning framework with heterogeneous GNN is proposed to identify frauds.
- Comprehensive explanations are provided on multiple heterogeneous ...
An efficient visual exploration approach of geospatial vector big data on the web map
The visual exploration of geospatial vector data has become an increasingly important part of the management and analysis of geospatial vector big data (GVBD). With the rapid growth of data scale, it is difficult to realize efficient visual ...
Highlights
- A Pixel-Quad-R-tree structure is designed to support the efficient visual exploration of geospatial vector big data (GVBD); the index can adapt to the computational requirements in the subsequent tile drawing process and provide efficient ...
Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers
With the advent of modern pre-trained Transformers, text preprocessing has started to be neglected and is not specifically addressed in recent NLP literature. However, both from a linguistic and from a computer science point of view, we ...
Highlights
- The text preprocessing techniques available in the literature are discussed.
- The impact of the three most common techniques on SOTA models is evaluated.
- Text preprocessing can significantly affect the performance of Transformers.
LSPC: Exploring contrastive clustering based on local semantic information and prototype
In recent years, several prominent contrastive learning algorithms, a kind of self-supervised learning method, have been extensively studied that can efficiently extract useful feature representations from input images by means of data ...
Foundations and practice of binary process discovery
Most contemporary process discovery methods take as inputs only positive examples of process executions, and so they are one-class classification algorithms. However, we have found negative examples to also be available in industry, hence we ...
Highlights
- We formalise process discovery as a binary classification problem.
- We automatically verified this formalisation using the Isabelle proof assistant.
- We propose the notation-agnostic binary Rejection Miner discovery algorithm.
- We ...
HubHSP graph: Capturing local geometrical and statistical data properties via spanning graphs
The computation of a continuous generative model to describe a finite sample of an infinite metric space can prove challenging and lead to erroneous hypotheses, particularly in high-dimensional spaces. In this paper, we follow a different route ...
On tuning parameters guiding similarity computations in a data deduplication pipeline for customers records
Data stored in information systems are often erroneous. Duplicate data are one of the typical error types. To discover and handle duplicates, so-called deduplication methods are applied. They are complex and time-consuming algorithms. In data ...
Explaining cube measures through Intentional Analytics
The Intentional Analytics Model (IAM) has been devised to couple OLAP and analytics by (i) letting users express their analysis intentions on multidimensional data cubes and (ii) returning enhanced cubes, i.e., multidimensional data annotated ...
A screenshot-based task mining framework for disclosing the drivers behind variable human actions
Robotic Process Automation (RPA) enables subject matter experts to use the graphical user interface as a means to automate and integrate systems. This is a fast method to automate repetitive, mundane tasks. To avoid constructing a software robot ...
Highlights
- A task mining framework that enhances UI Logs with screenshot-derived features.
- Employs decision trees to extract decision drivers from UI.
- Key UI elements detected by the decision tree are highlighted as visual feedback.
- ...