Showing 1–39 of 39 results for author: Palpanas, T

Searching in archive cs.
  1. DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search

    Authors: Jiuqi Wei, Botao Peng, Xiaodong Lee, Themis Palpanas

    Abstract: Locality-sensitive hashing (LSH) is a well-known solution for approximate nearest neighbor (ANN) search in high-dimensional spaces due to its robust theoretical guarantee on query accuracy. Traditional LSH-based methods mainly focus on improving the efficiency and accuracy of the query phase by designing different query strategies, but pay little attention to improving the efficiency of the indexi…

    Submitted 16 June, 2024; originally announced June 2024.

    Journal ref: PVLDB, 17(9): 2241 - 2254, 2024

  2. arXiv:2406.10327  [pdf, other]

    stat.ML cs.LG

    Analysing Multi-Task Regression via Random Matrix Theory with Application to Time Series Forecasting

    Authors: Romain Ilbert, Malik Tiomoko, Cosme Louart, Ambroise Odonnat, Vasilii Feofanov, Themis Palpanas, Ievgen Redko

    Abstract: In this paper, we introduce a novel theoretical framework for multi-task regression, applying random matrix theory to provide precise performance estimations, under high-dimensional, non-Gaussian data distributions. We formulate a multi-task optimization problem as a regularization technique to enable single-task models to leverage multi-task learning information. We derive a closed-form solution…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2402.10198  [pdf, other]

    cs.LG stat.ML

    SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

    Authors: Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko

    Abstract: Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite the…

    Submitted 3 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted as an Oral at ICML 2024, Vienna. The first two authors contributed equally

  4. ADF & TransApp: A Transformer-Based Framework for Appliance Detection Using Smart Meter Consumption Series

    Authors: Adrien Petralia, Philippe Charpentier, Themis Palpanas

    Abstract: Over the past decade, millions of smart meters have been installed by electricity suppliers worldwide, allowing them to collect a large amount of electricity consumption data, albeit sampled at a low frequency (one point every 30min). One of the important challenges these suppliers face is how to utilize these data to detect the presence/absence of different appliances in the customers' households…

    Submitted 17 December, 2023; originally announced January 2024.

    Comments: 10 pages, 7 figures. This paper appeared in VLDB 2024

    Journal ref: Proceedings of the VLDB Endowment, Volume 17, Issue 3, Pages 553-562, 2023

  5. arXiv:2311.09790  [pdf, other]

    cs.LG cs.AI cs.CR

    Breaking Boundaries: Balancing Performance and Robustness in Deep Wireless Traffic Forecasting

    Authors: Romain Ilbert, Thai V. Hoang, Zonghua Zhang, Themis Palpanas

    Abstract: Balancing the trade-off between accuracy and robustness is a long-standing challenge in time series forecasting. While most of existing robust algorithms have achieved certain suboptimal performance on clean data, sustaining the same performance level in the presence of data perturbations remains extremely hard. In this paper, we study a wide array of perturbation scenarios and propose novel defen…

    Submitted 28 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted for presentation at the ARTMAN workshop, part of the ACM Conference on Computer and Communications Security (CCS), 2023

    MSC Class: 68T05; 62M10; 68T01 ACM Class: I.2.6; I.2.4; K.6.5

    Journal ref: Proceedings of the 2023 Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks; pp.17-28

  6. arXiv:2310.11602  [pdf, other]

    cs.DB cs.DC

    FreSh: A Lock-Free Data Series Index

    Authors: Panagiota Fatourou, Eleftherios Kosmas, Themis Palpanas, George Paterakis

    Abstract: We present FreSh, a lock-free data series index that exhibits good performance (while being robust). FreSh is based on Refresh, which is a generic approach we have developed for supporting lock-freedom in an efficient way on top of any locality-aware data series index. We believe Refresh is of independent interest and can be used to get well-performed lock-free versions of other locality-aware bloc…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 12 pages, 18 figures, Conference: Symposium on Reliable Distributed Systems (SRDS 2023)

    ACM Class: E.1; F.2

  7. arXiv:2307.05800  [pdf, other]

    eess.IV cs.CV

    A Hierarchical Transformer Encoder to Improve Entire Neoplasm Segmentation on Whole Slide Image of Hepatocellular Carcinoma

    Authors: Zhuxian Guo, Qitong Wang, Henning Müller, Themis Palpanas, Nicolas Loménie, Camille Kurtz

    Abstract: In digital histopathology, entire neoplasm segmentation on Whole Slide Image (WSI) of Hepatocellular Carcinoma (HCC) plays an important role, especially as a preprocessing filter to automatically exclude healthy tissue, in histological molecular correlations mining and other downstream histopathological tasks. The segmentation task remains challenging due to HCC's inherent high-heterogeneity and t…

    Submitted 11 July, 2023; originally announced July 2023.

  8. arXiv:2307.01231  [pdf, other]

    cs.DB cs.AI cs.LG

    A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms

    Authors: George Papadakis, Nishadi Kirielle, Peter Christen, Themis Palpanas

    Abstract: Entity resolution (ER) is the process of identifying records that refer to the same entities within one or across multiple databases. Numerous techniques have been developed to tackle ER challenges over the years, with recent emphasis placed on machine and deep learning methods for the matching phase. However, the quality of the benchmark datasets typically used in the experimental evaluations of…

    Submitted 12 November, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  9. arXiv:2306.12144  [pdf, other]

    cs.CR

    PrivSketch: A Private Sketch-based Frequency Estimation Protocol for Data Streams

    Authors: Ying Li, Xiaodong Lee, Botao Peng, Themis Palpanas, Jingan Xue

    Abstract: Local differential privacy (LDP) has recently become a popular privacy-preserving data collection technique protecting users' privacy. The main problem of data stream collection under LDP is the poor utility due to multi-item collection from a very large domain. This paper proposes PrivSketch, a high-utility frequency estimation protocol taking advantage of sketches, suitable for private data stre…

    Submitted 21 June, 2023; originally announced June 2023.

  10. Appliance Detection Using Very Low-Frequency Smart Meter Time Series

    Authors: Adrien Petralia, Philippe Charpentier, Paul Boniol, Themis Palpanas

    Abstract: In recent years, smart meters have been widely adopted by electricity suppliers to improve the management of the smart grid system. These meters usually collect energy consumption data at a very low frequency (every 30min), enabling utilities to bill customers more accurately. To provide more personalized recommendations, the next step is to detect the appliances owned by customers, which is a cha…

    Submitted 21 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 11 pages, 7 figures. This paper appeared in ACM e-Energy 2023

  11. Dumpy: A Compact and Adaptive Index for Large Data Series Collections

    Authors: Zeyu Wang, Qitong Wang, Peng Wang, Themis Palpanas, Wei Wang

    Abstract: Data series indexes are necessary for managing and analyzing the increasing amounts of data series collections that are nowadays available. These indexes support both exact and approximate similarity search, with approximate search providing high-quality results within milliseconds, which makes it very attractive for certain modern applications. Reducing the pre-processing (i.e., index building) t…

    Submitted 17 April, 2023; originally announced April 2023.

    Journal ref: Proc. ACM Manag. Data 1, 1, Article 111 (May 2023), 27 pages

  12. arXiv:2301.11049  [pdf, other]

    cs.DC cs.DB

    Odyssey: A Journey in the Land of Distributed Data Series Similarity Search

    Authors: Manos Chatzakis, Panagiota Fatourou, Eleftherios Kosmas, Themis Palpanas, Botao Peng

    Abstract: This paper presents Odyssey, a novel distributed data-series processing framework that efficiently addresses the critical challenges of exhibiting good speedup and ensuring high scalability in data series processing by taking advantage of the full computational capacity of modern clusters comprised of multi-core servers. Odyssey addresses a number of challenges in designing efficient and highly sc…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: PVLDB 2023

  13. arXiv:2212.13310  [pdf, other]

    cs.DB

    ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

    Authors: Karima Echihabi, Theophanis Tsandilas, Anna Gogolou, Anastasia Bezerianos, Themis Palpanas

    Abstract: Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy…

    Submitted 26 December, 2022; originally announced December 2022.

    Journal ref: The VLDB Journal, 1-27 (2022)

  14. Hercules Against Data Series Similarity Search

    Authors: Karima Echihabi, Panagiota Fatourou, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

    Abstract: We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perfor…

    Submitted 26 December, 2022; originally announced December 2022.

    Journal ref: Proc. VLDB Endow. 15(10): 2005-2018 (2022)

  15. Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series

    Authors: Paul Boniol, Themis Palpanas

    Abstract: Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this…

    Submitted 25 July, 2022; originally announced July 2022.

    Journal ref: Proceedings of the VLDB Endowment, Volume 13, Issue 12, Pages 1821-1834, 2020

  16. dCAM: Dimension-wise Class Activation Map for Explaining Multivariate Data Series Classification

    Authors: Paul Boniol, Mohammed Meftah, Emmanuel Remy, Themis Palpanas

    Abstract: Data series classification is an important and challenging problem in data science. Explaining the classification decisions by finding the discriminant parts of the input that led the algorithm to some decisions is a real need in many applications. Convolutional neural networks perform well for the data series classification task; though, the explanations provided by this type of algorithm are poo…

    Submitted 25 July, 2022; originally announced July 2022.

    Journal ref: Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), June 12--17, 2022, Philadelphia, PA, USA

  17. arXiv:2204.08801  [pdf, other]

    cs.DB

    Generalized Supervised Meta-blocking (technical report)

    Authors: Luca Gagliardelli, George Papadakis, Giovanni Simonini, Sonia Bergamaschi, Themis Palpanas

    Abstract: Entity Resolution constitutes a core data integration task that relies on Blocking in order to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall, requires no domain knowledge and applies to data of any structuredness and schema heterogeneity. This comes at the cost of many irrelevant candidate pairs (i.e., comparisons), which can be significantly reduced throug…

    Submitted 19 April, 2022; originally announced April 2022.

  18. arXiv:2110.07519  [pdf, other]

    cs.DB

    Fast Data Series Indexing for In-Memory Data

    Authors: Botao Peng, Panagiota Fatourou, Themis Palpanas

    Abstract: Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.00786

  19. arXiv:2104.09509  [pdf, other]

    cs.DB

    Local Similarity Search on Geolocated Time Series Using Hybrid Indexing

    Authors: Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos

    Abstract: Geolocated time series, i.e., time series associated with certain locations, abound in many modern applications. In this paper, we consider hybrid queries for retrieving geolocated time series based on filters that combine spatial distance and time series similarity. For the latter, unlike existing work, we allow filtering based on local similarity, which is computed based on subsequences rather t…

    Submitted 19 April, 2021; originally announced April 2021.

    MSC Class: 68P05 ACM Class: E.1

  20. arXiv:2104.09417  [pdf, other]

    cs.DS

    Local Pair and Bundle Discovery over Co-Evolving Time Series

    Authors: Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos

    Abstract: Time series exploration and mining has many applications across several industrial and scientific domains. In this paper, we consider the problem of detecting locally similar pairs and groups, called bundles, over co-evolving time series. These are pairs or groups of subsequences whose values do not differ by more than ε for at least delta consecutive timestamps, thus indicating common local patte…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: 16 pages, 16 figures

    MSC Class: 68P05 ACM Class: E.1

  21. arXiv:2104.06874  [pdf]

    cs.DS

    Twin Subsequence Search in Time Series

    Authors: Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos

    Abstract: We address the problem of subsequence search in time series using Chebyshev distance, to which we refer as twin subsequence search. We first show how existing time series indices can be extended to perform twin subsequence search. Then, we introduce TS-Index, a novel index tailored to this problem. Our experimental evaluation compares these approaches against real time series datasets, and demonst…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: 6 pages, 8 figures

    MSC Class: 68P05 ACM Class: E.1

  22. arXiv:2009.10373  [pdf, other]

    cs.DB

    Scalable Data Series Subsequence Matching with ULISSE

    Authors: Michele Linardi, Themis Palpanas

    Abstract: Data series similarity search is an important operation and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first dat…

    Submitted 22 September, 2020; originally announced September 2020.

  23. arXiv:2009.00786  [pdf, other]

    cs.DB

    MESSI: In-Memory Data Series Indexing

    Authors: Botao Peng, Panagiota Fatourou, Themis Palpanas

    Abstract: Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware…

    Submitted 1 September, 2020; originally announced September 2020.

  24. arXiv:2009.00166  [pdf, other]

    cs.DB

    ParIS+: Data Series Indexing on Multi-Core Architectures

    Authors: Botao Peng, Panagiota Fatourou, Themis Palpanas

    Abstract: Data series similarity search is a core operation for several data series analysis applications across many different domains. Nevertheless, even state-of-the-art techniques cannot provide the time performance required for large data series collections. We propose ParIS and ParIS+, the first disk-based data series indices carefully designed to inherently take advantage of multi-core architectures,…

    Submitted 31 August, 2020; originally announced September 2020.

  25. arXiv:2008.13447  [pdf, other]

    cs.DB

    Matrix Profile Goes MAD: Variable-Length Motif And Discord Discovery in Data Series

    Authors: Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh

    Abstract: In the last fifteen years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, the state-of-the-art motif and discord discovery tools still require the user to provide the relative length. Yet, in several cases, the…

    Submitted 31 August, 2020; originally announced August 2020.

  26. arXiv:2008.13432  [pdf, other]

    cs.DB

    VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series

    Authors: Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh

    Abstract: Data series motif discovery represents one of the most useful primitives for data series mining, with applications to many domains, such as robotics, entomology, seismology, medicine, and climatology, and others. The state-of-the-art motif discovery tools still require the user to provide the motif length. Yet, in several cases, the choice of motif length is critical for their detection. Unfortuna…

    Submitted 31 August, 2020; originally announced August 2020.

  27. arXiv:2008.08919  [pdf, other]

    cs.AI cs.LG cs.LO

    SentiQ: A Probabilistic Logic Approach to Enhance Sentiment Analysis Tool Quality

    Authors: Wissam Maamar Kouadri, Salima Benbernou, Mourad Ouziri, Themis Palpanas, Iheb Ben Amor

    Abstract: The opinion expressed in various Web sites and social-media is an essential contributor to the decision making process of several organizations. Existing sentiment analysis tools aim to extract the polarity (i.e., positive, negative, neutral) from these opinionated contents. Despite the advance of the research in the field, sentiment analysis tools give inconsistent polarities, which is h…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: In Proceedings of the 9th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM 20). San Diego, CA, USA, 8 pages

  28. arXiv:2006.13713  [pdf, other]

    cs.DB

    Coconut: a scalable bottom-up approach for building data series indexes

    Authors: Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

    Abstract: Many modern applications produce massive amounts of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance, or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2006.11474

  29. arXiv:2006.13079  [pdf, other]

    cs.DB

    Coconut Palm: Static and Streaming Data Series Exploration Now in your Palm

    Authors: Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

    Abstract: Many modern applications produce massive streams of data series and maintain them in indexes to be able to explore them through nearest neighbor search. Existing data series indexes, however, are expensive to operate as they issue many random I/Os to storage. To address this problem, we recently proposed Coconut, a new infrastructure that organizes data series based on a new sortable format. In th…

    Submitted 19 June, 2020; originally announced June 2020.

  30. arXiv:2006.11474  [pdf, other]

    cs.DB

    Coconut: sortable summarizations for scalable indexes over static and streaming data series

    Authors: Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

    Abstract: Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance, or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing…

    Submitted 16 April, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

  31. arXiv:2006.11459  [pdf, other]

    cs.DB

    Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search

    Authors: Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

    Abstract: Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has studied approximate similarity search techniques. We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domai…

    Submitted 19 June, 2020; originally announced June 2020.

  32. arXiv:2006.11454  [pdf, other]

    cs.DB

    The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art

    Authors: Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

    Abstract: Increasingly large data series collections are becoming commonplace across many different domains and applications. A key operation in the analysis of data series collections is similarity search, which has attracted lots of attention and effort over the past two decades. Even though several relevant approaches have been proposed in the literature, none of the existing studies provides a detailed…

    Submitted 19 June, 2020; originally announced June 2020.

  33. arXiv:1905.06397  [pdf, other]

    cs.DB

    End-to-End Entity Resolution for Big Data: A Survey

    Authors: Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, Kostas Stefanidis

    Abstract: One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-…

    Submitted 19 August, 2020; v1 submitted 15 May, 2019; originally announced May 2019.

  34. Schema-agnostic Progressive Entity Resolution (extended version)

    Authors: Giovanni Simonini, George Papadakis, Themis Palpanas, Sonia Bergamaschi

    Abstract: Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In practice, its goal is to provide the best possible partial solution by approximating the optimal comparison order of the entity profiles. So far, Progressive ER has o…

    Submitted 15 May, 2019; originally announced May 2019.

    Journal ref: IEEE Trans. Knowl. Data Eng. 31(6): 1208-1221 (2019)

  35. arXiv:1905.06167  [pdf, other]

    cs.DB

    A Survey of Blocking and Filtering Techniques for Entity Resolution

    Authors: George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas

    Abstract: Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We also provided an in-depth coverage of each category, further classifying the corresponding works into novel sub-categories. Lately, the efficiency techniques have r…

    Submitted 21 August, 2020; v1 submitted 15 May, 2019; originally announced May 2019.

  36. arXiv:1812.08032  [pdf, other]

    cs.HC cs.DB cs.LG

    Progressive Data Science: Potential and Challenges

    Authors: Cagatay Turkay, Nicola Pezzotti, Carsten Binnig, Hendrik Strobelt, Barbara Hammer, Daniel A. Keim, Jean-Daniel Fekete, Themis Palpanas, Yunhai Wang, Florin Rusu

    Abstract: Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining, highly depend on iterative trial-and-error processes that could be sped-up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner,…

    Submitted 12 September, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

    ACM Class: H.5.2; H.3.m; I.2.m; I.3.m

  37. arXiv:1609.03095  [pdf, other]

    cs.DB

    Efficient Error-tolerant Search on Knowledge Graphs

    Authors: Zhaoyang Shao, Davood Rafiei, Themis Palpanas

    Abstract: Edge-labeled graphs are widely used to describe relationships between entities in a database. Given a query subgraph that represents an example of what the user is searching for, we study the problem of efficiently searching for similar subgraphs in a large data graph, where the similarity is defined in terms of the well-known graph edit distance. We call these queries "error-tolerant exemplar que…

    Submitted 11 May, 2020; v1 submitted 10 September, 2016; originally announced September 2016.

  38. arXiv:1405.5829  [pdf, other]

    cs.DB cs.LG

    Node Classification in Uncertain Graphs

    Authors: Michele Dallachiesa, Charu Aggarwal, Themis Palpanas

    Abstract: In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. If the information about link reliability is not used explicitly, the classificati…

    Submitted 22 May, 2014; originally announced May 2014.

  39. arXiv:1208.1931  [pdf, other]

    cs.DB

    Uncertain Time-Series Similarity: Return to the Basics

    Authors: Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas

    Abstract: In the last years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering facilities to ensure efficiency, product quality and safety, hydrologic and geologic observing systems, pollution management, and others. Due to…

    Submitted 9 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1662-1673 (2012)