OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 3, Pages 319–331, https://doi.org/10.14778/3632093.3632098
A key need in different disciplines is to perform analytics over fast-paced data streams, similar in nature to the traditional OLAP analytics in relational databases - i.e., with filters and aggregates. Storing unbounded streams, however, is not a ...
- research-article, October 2023
Efficient detection of multivariate correlations with different correlation measures
The VLDB Journal — The International Journal on Very Large Data Bases (VLDB), Volume 33, Issue 2, Pages 481–505, https://doi.org/10.1007/s00778-023-00815-y
Correlation analysis is an invaluable tool in many domains, for better understanding the data and extracting salient insights. Most works to date focus on detecting high pairwise correlations. A generalization of this problem with known ...
Adaptive Distributed Streaming Similarity Joins
- George Siachamis,
- Kyriakos Psarakis,
- Marios Fragkoulis,
- Odysseas Papapetrou,
- Arie van Deursen,
- Asterios Katsifodimos
DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems, Pages 25–36, https://doi.org/10.1145/3583678.3596891
How can we perform similarity joins of multi-dimensional streams in a distributed fashion, achieving low latency? Can we adaptively repartition those streams in order to retain high performance under concept drifts? Current approaches to similarity ...
TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 4, Pages 790–802, https://doi.org/10.14778/3574245.3574263
Set similarity join is an important problem with many applications in data discovery, cleaning and integration. To increase robustness, fuzzy set similarity join calculates the similarity of two sets based on maximum weighted bipartite matching instead ...
Multivariate correlations discovery in static and streaming data
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 6, Pages 1266–1278, https://doi.org/10.14778/3514061.3514072
Correlation analysis is an invaluable tool in many domains, for better understanding data and extracting salient insights. Most works to date focus on detecting high pairwise correlations. A generalization of this problem with known applications but no ...
- research-article, August 2019
Scalable temporal clique enumeration
SSTD '19: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, Pages 120–129, https://doi.org/10.1145/3340964.3340987
We study the problem of enumeration of all k-sized subsets of temporal events that mutually overlap at some point in a query time window. This problem arises in many application domains, e.g., in social networks, life sciences, smart cities, ...
- article, December 2018
Monitoring distributed fragmented skylines
Distributed and Parallel Databases (DAPD), Volume 36, Issue 4, Pages 675–715, https://doi.org/10.1007/s10619-018-7223-7
Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal ...
- research-article, March 2018
Practical Private Range Search in Depth
- Ioannis Demertzis,
- Stavros Papadopoulos,
- Odysseas Papapetrou,
- Antonios Deligiannakis,
- Minos Garofalakis,
- Charalampos Papamanthou
ACM Transactions on Database Systems (TODS), Volume 43, Issue 1, Article No. 2, Pages 1–52, https://doi.org/10.1145/3167971
We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on "...
- research-article, June 2016
Practical Private Range Search Revisited
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 185–198, https://doi.org/10.1145/2882903.2882911
We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on "...
- article, June 2015
Sketching distributed sliding-window data streams
The VLDB Journal — The International Journal on Very Large Data Bases (VLDB), Volume 24, Issue 3, Pages 345–368, https://doi.org/10.1007/s00778-015-0380-7
While traditional data management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...
- column, February 2014
- research-article, October 2012
Decentralized Probabilistic Text Clustering
IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 24, Issue 10, Pages 1848–1861, https://doi.org/10.1109/TKDE.2011.120
Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly distributed environments, such as peer-...
- research-article, June 2012
Sketch-based querying of distributed sliding-window data streams
Proceedings of the VLDB Endowment (PVLDB), Volume 5, Issue 10, Pages 992–1003, https://doi.org/10.14778/2336664.2336672
While traditional data-management systems focus on evaluating single, ad-hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...
- Article, April 2011
XStreamCluster: an efficient algorithm for streaming XML data clustering
XML clustering finds many applications, ranging from storage to query processing. However, existing clustering algorithms focus on static XML collections, whereas modern information systems frequently deal with streaming XML data that needs to be ...
- poster, March 2011
Collaborative classification over P2P networks
WWW '11: Proceedings of the 20th international conference companion on World wide web, Pages 97–98, https://doi.org/10.1145/1963192.1963242
We propose a novel collaborative approach for distributed document classification, combining the knowledge of multiple users for improved organization of data such as individual document repositories or emails. The approach builds on top of a P2P ...
- research-article, March 2011
Efficient discovery of frequent subgraph patterns in uncertain graph databases
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology, Pages 355–366, https://doi.org/10.1145/1951365.1951408
Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there is a growing interest in generalizing the problem to uncertain graphs, which can model the inherent ...
- article, December 2010
Cardinality estimation and dynamic length adaptation for Bloom filters
Distributed and Parallel Databases (DAPD), Volume 28, Issue 2-3, Pages 119–156, https://doi.org/10.1007/s10619-010-7067-2
Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features ...
- article, August 2010
PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing
Computer Networks: The International Journal of Computer and Telecommunications Networking (CNTW), Volume 54, Issue 12, Pages 2019–2040, https://doi.org/10.1016/j.comnet.2010.03.025
Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term ...
- Article, July 2010
Efficient term cloud generation for streaming web content
Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although the users sometimes rely on a small set of trusted sources ...
- Article, May 2010
Efficient semantic-aware detection of near duplicate resources
ESWC'10: Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II, Pages 136–150, https://doi.org/10.1007/978-3-642-13489-0_10
Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate resources can be grouped together, merged, or removed, in order to avoid repetition and ...