A Distributed Solution for Efficient K Shortest Paths Computation Over Dynamic Road Networks
The problem of identifying the k-shortest paths (KSPs for short) in a dynamic road network is essential to many location-based services. Road networks are dynamic in the sense that the weights of the edges in the corresponding graph ...
A Generic Schema Evolution Approach for NoSQL and Relational Databases
In the same way as with relational systems, schema evolution is a crucial aspect of NoSQL systems. But providing approaches and tools to support NoSQL schema evolution is more challenging than for relational databases. Not only are most NoSQL systems ...
A Neural Database for Answering Aggregate Queries on Incomplete Relational Data
Real-world datasets are often incomplete due to data collection cost, privacy considerations or as a side effect of data integration/preparation. We focus on answering aggregate queries on such datasets, where data incompleteness causes the answers to be ...
A Suite of Efficient Randomized Algorithms for Streaming Record Linkage
Organizations leverage massive volumes of information and new types of data to generate unprecedented insights and improve their outcomes. Correctly identifying duplicate records that represent the same entity, such as user, customer, patient and so on, a ...
A Survey on Generative Diffusion Models
Deep generative models have unlocked another profound realm of human creativity. By capturing and generalizing patterns within data, we have entered the epoch of all-encompassing Artificial Intelligence for General Creativity (AIGC). Notably, diffusion ...
A Unified and Scalable Algorithm Framework of User-Defined Temporal $(k,\mathcal{X})$-Core Query
Querying cohesive subgraphs on temporal graphs (e.g., social networks and finance networks) with various conditions has attracted intensive research interest recently. In this paper, we study a novel Temporal $(...
An Effective Optimization Method for Fuzzy $k$-Means With Entropy Regularization
Fuzzy $k$-Means with Entropy Regularization method (ERFKM) is ...
An Index for Set Intersection With Post-Filtering
This paper studies how to design an index structure on a collection of sets $S_{1}, S_{2}, \ldots, S_{n}$ ...
AStore: Uniformed Adaptive Learned Index and Cache for RDMA-Enabled Key-Value Store
Distributed key-value storage and computation are essential components of cloud services. As the demand for high-performance systems has increased significantly, a new architecture has been motivated to separate computing and storage nodes and connect ...
Bayes-Enhanced Multi-View Attention Networks for Robust POI Recommendation
POI recommendation can facilitate various Location-Based Social Network services. Existing methods generally assume the available POI check-ins are the ground-truth depiction of user behaviors. However, in real scenarios, check-in data can be rather ...
Co-Engaged Location Group Search in Location-Based Social Networks
Searching for well-connected user communities in a Location-based Social Network (LBSN) has been extensively investigated. However, very few studies focus on finding a group of locations in an LBSN which are significantly engaged with socially cohesive ...
Comfort-Aware Lane Change Planning With Exit Strategy for Autonomous Vehicle
Automation in road vehicles is an emerging technology that has developed rapidly over the last decade. Autonomous vehicles pose many interdisciplinary challenges to existing transportation infrastructure. In this paper, we conduct an ...
Denoising Item Graph With Disentangled Learning for Recommendation
Recent years have witnessed the growth of Graph-based Collaborative Filtering (GCF) for high-performance recommendations, but the widely adopted user-item bipartite graphs are subject to deeper layers’ over-smoothing effect and sparse user-item ...
DepMSTAT: Multimodal Spatio-Temporal Attentional Transformer for Depression Detection
Depression is one of the most common mental illnesses, but few of the currently proposed in-depth models based on social media data take into account both temporal and spatial information in the data for the detection of depression. In this paper, we ...
Dynamic Graph Embedding via Meta-Learning
Graphs in real-world applications, such as social networks and transportation networks, usually evolve constantly and exhibit dynamic behaviors. Hence, dynamic graph embedding has gained much attention recently. In dynamic graphs, both the topology and node ...
Dynamic Quantification With Constrained Error Under Unknown General Dataset Shift
Quantification research has sought to accurately estimate class distributions under dataset shift. While existing methods perform well under assumed conditions of shift, it is not always clear whether such assumptions will hold in a given application. ...
Efficient Algorithms for Group Hitting Probability Queries on Large Graphs
Given a source node $s$ and a target node ...
Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data
Frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are two important branches of itemset mining, which is a key technology of knowledge discovery in many applications. Many algorithms have been proposed for FIM and HUIM, but ...
Enhancing Drug Recommendations Via Heterogeneous Graph Representation Learning in EHR Networks
Electronic health records (EHRs) contain vast medical information like diagnosis, medication, and procedures, enabling personalized drug recommendations and treatment adjustments. However, current drug recommendation methods only model patients’ ...
Explainable Recommender With Geometric Information Bottleneck
Explainable recommender systems can explain their recommendation decisions, enhancing user trust in the systems. Most explainable recommender systems either rely on human-annotated rationales to train models for explanation generation or leverage the ...
Focused Contrastive Loss for Classification With Pre-Trained Language Models
- Jiayuan He,
- Yuan Li,
- Zenan Zhai,
- Biaoyan Fang,
- Camilo Thorne,
- Christian Druckenbrodt,
- Saber Akhondi,
- Karin Verspoor
Contrastive learning, which learns data representations by contrasting similar and dissimilar instances, has achieved great success in various domains including natural language processing (NLP). Recently, it has been demonstrated that incorporating class ...
Fraud's Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process
Recent research has revealed that natural language processing (NLP) models are vulnerable to adversarial examples. However, the current techniques for generating such examples rely on deterministic heuristic rules, which fail to produce optimal ...
Geometric-Contextual Mutual Infomax Path Aggregation for Relation Reasoning on Knowledge Graph
Relation reasoning in Knowledge Graph Completion (KGC) aims at predicting missing relations between entities. Recently, effective KGC methods have usually focused on exploring the path pattern between entities, such ...
Give us the Facts: Enhancing Large Language Models With Knowledge Graphs for Fact-Aware Language Modeling
Recently, ChatGPT, a representative large language model (LLM), has gained considerable attention. Due to their powerful emergent abilities, recent LLMs are considered as a possible alternative to structured knowledge bases like knowledge graphs (KGs). ...
Half-Xor: A Fully-Dynamic Sketch for Estimating the Number of Distinct Values in Big Tables
Calculating the number of distinct values (i.e., NDV) in a column of a big table is costly yet fundamental to a variety of database applications such as data compression and profiling. To reduce the high time and space cost, a number of sketch methods (...
Heterogeneous Graph Condensation
Graph neural networks greatly facilitate data processing in homogeneous and heterogeneous graphs. However, training GNNs on large-scale graphs poses a significant challenge to computing resources. It is especially prominent on heterogeneous graphs, which ...
Hierarchical Context Representation and Self-Adaptive Thresholding for Multivariate Anomaly Detection
Anomaly detection in multivariate time series is a critical research area, but it is also a challenging one due to its occurrence in various real-world scenarios, such as structural health monitoring and risk management. Traditional approaches for anomaly ...
Hybrid Regret Minimization: A Submodular Approach
Regret minimization queries are important methods to extract representative tuples from databases. They have been extensively investigated in the last decade due to wide applications in multi-criteria decision making. For a given database ...
Label-Free Multivariate Time Series Anomaly Detection
Anomaly detection in multivariate time series has been widely studied in one-class classification (OCC) setting. The training samples in this setting are assumed to be normal. In more practical situations, it is difficult to guarantee that all samples are ...
Lauca: A Workload Duplicator for Benchmarking Transactional Database Performance
Generating synthetic workloads is essential and critical to the performance evaluation of database systems. When benchmarking database performance for a specific application, the similarity between synthetic workloads and real application workloads ...