Author: Papapetrou, Odysseas : Search

research-article

OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates

Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 3, Pages 319–331, https://doi.org/10.14778/3632093.3632098

A key need in different disciplines is to perform analytics over fast-paced data streams, similar in nature to the traditional OLAP analytics in relational databases - i.e., with filters and aggregates. Storing unbounded streams, however, is not a ...

research-article

Efficient detection of multivariate correlations with different correlation measures

The VLDB Journal — The International Journal on Very Large Data Bases (VLDB), Volume 33, Issue 2, Pages 481–505, https://doi.org/10.1007/s00778-023-00815-y

Correlation analysis is an invaluable tool in many domains, for better understanding the data and extracting salient insights. Most works to date focus on detecting high pairwise correlations. A generalization of this problem with known ...

research-article

Open Access

Adaptive Distributed Streaming Similarity Joins

DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems, Pages 25–36, https://doi.org/10.1145/3583678.3596891

How can we perform similarity joins of multi-dimensional streams in a distributed fashion, achieving low latency? Can we adaptively repartition those streams in order to retain high performance under concept drifts? Current approaches to similarity ...

research-article

TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching

Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 4, Pages 790–802, https://doi.org/10.14778/3574245.3574263

Set similarity join is an important problem with many applications in data discovery, cleaning and integration. To increase robustness, fuzzy set similarity join calculates the similarity of two sets based on maximum weighted bipartite matching instead ...

research-article

Multivariate correlations discovery in static and streaming data

Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 6, Pages 1266–1278, https://doi.org/10.14778/3514061.3514072

Correlation analysis is an invaluable tool in many domains, for better understanding data and extracting salient insights. Most works to date focus on detecting high pairwise correlations. A generalization of this problem with known applications but no ...

research-article

Scalable temporal clique enumeration

SSTD '19: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, Pages 120–129, https://doi.org/10.1145/3340964.3340987

We study the problem of enumeration of all k-sized subsets of temporal events that mutually overlap at some point in a query time window. This problem arises in many application domains, e.g., in social networks, life sciences, smart cities, ...

article

Monitoring distributed fragmented skylines

Distributed and Parallel Databases (DAPD), Volume 36, Issue 4, Pages 675–715, https://doi.org/10.1007/s10619-018-7223-7

Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal ...

research-article

Public Access

Practical Private Range Search in Depth

ACM Transactions on Database Systems (TODS), Volume 43, Issue 1, Article No.: 2, Pages 1–52, https://doi.org/10.1145/3167971

We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on “...

research-article

Practical Private Range Search Revisited

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 185–198, https://doi.org/10.1145/2882903.2882911

We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on "...

article

Free

Sketching distributed sliding-window data streams

The VLDB Journal — The International Journal on Very Large Data Bases (VLDB), Volume 24, Issue 3, Pages 345–368, https://doi.org/10.1007/s00778-015-0380-7

While traditional data management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...

column

Data management research at the Technical University of Crete

ACM SIGMOD Record (SIGMOD), Volume 42, Issue 4, Pages 61–66, https://doi.org/10.1145/2590989.2590999

research-article

Decentralized Probabilistic Text Clustering

IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 24, Issue 10, Pages 1848–1861, https://doi.org/10.1109/TKDE.2011.120

Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly distributed environments, such as peer-...

research-article

Sketch-based querying of distributed sliding-window data streams

Proceedings of the VLDB Endowment (PVLDB), Volume 5, Issue 10, Pages 992–1003, https://doi.org/10.14778/2336664.2336672

While traditional data-management systems focus on evaluating single, ad-hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...

Article

XStreamCluster: an efficient algorithm for streaming XML data clustering

DASFAA'11: Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I, Pages 496–510

XML clustering finds many applications, ranging from storage to query processing. However, existing clustering algorithms focus on static XML collections, whereas modern information systems frequently deal with streaming XML data that needs to be ...

poster

Collaborative classification over P2P networks

WWW '11: Proceedings of the 20th international conference companion on World wide web, Pages 97–98, https://doi.org/10.1145/1963192.1963242

We propose a novel collaborative approach for distributed document classification, combining the knowledge of multiple users for improved organization of data such as individual document repositories or emails. The approach builds on top of a P2P ...

research-article

Efficient discovery of frequent subgraph patterns in uncertain graph databases

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology, Pages 355–366, https://doi.org/10.1145/1951365.1951408

Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there is a growing interest in generalizing the problem to uncertain graphs, which can model the inherent ...

article

Cardinality estimation and dynamic length adaptation for Bloom filters

Distributed and Parallel Databases (DAPD), Volume 28, Issue 2-3, Pages 119–156, https://doi.org/10.1007/s10619-010-7067-2

Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features ...

article

PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing

Computer Networks: The International Journal of Computer and Telecommunications Networking (CNTW), Volume 54, Issue 12, Pages 2019–2040, https://doi.org/10.1016/j.comnet.2010.03.025

Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term ...

Article

Efficient term cloud generation for streaming web content

ICWE'10: Proceedings of the 10th international conference on Web engineering, Pages 385–399

Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although the users sometimes rely on a small set of trusted sources ...

Article

Efficient semantic-aware detection of near duplicate resources

ESWC'10: Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II, Pages 136–150, https://doi.org/10.1007/978-3-642-13489-0_10

Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate resources can be grouped together, merged, or removed, in order to avoid repetition and ...
