Showing 1–46 of 46 results for author: Bifet, A

Searching in archive cs.
  1. arXiv:2408.16187

    cs.LG cs.AI

    Real-Time Energy Pricing in New Zealand: An Evolving Stream Analysis

    Authors: Yibin Sun, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet

    Abstract: This paper introduces a group of novel datasets representing real-time time-series and streaming data of energy prices in New Zealand, sourced from the Electricity Market Information (EMI) website maintained by the New Zealand government. The datasets are intended to address the scarcity of proper datasets for streaming regression learning tasks. We conduct extensive analyses and experiments on th…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 12 Pages, 8 figures, short version accepted by PRICAI

  2. A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams

    Authors: Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet

    Abstract: The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify wh…

    Submitted 17 August, 2024; originally announced August 2024.

  3. arXiv:2406.02175

    cs.LG

    Branches: A Fast Dynamic Programming and Branch & Bound Algorithm for Optimal Decision Trees

    Authors: Ayman Chaouki, Jesse Read, Albert Bifet

    Abstract: Decision Tree Learning is a fundamental problem for Interpretable Machine Learning, yet it poses a formidable optimization challenge. Despite numerous efforts dating back to the early 1990s, practical algorithms have only recently emerged, primarily leveraging Dynamic Programming (DP) and Branch & Bound (B&B) techniques. These breakthroughs led to the development of two distinct approaches. Algor…

    Submitted 21 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This preprint is currently under review

  4. arXiv:2405.17222

    cs.LG

    A Retrospective of the Tutorial on Opportunities and Challenges of Online Deep Learning

    Authors: Cedric Kulbach, Lucas Cazzonelli, Hoang-Anh Ngo, Minh-Huong Le-Nguyen, Albert Bifet

    Abstract: Machine learning algorithms have become indispensable in today's world. They support and accelerate the way we make decisions based on the data at hand. This acceleration means that data structures that were valid at one moment could no longer be valid in the future. With these changing data structures, it is necessary to adapt machine learning (ML) systems incrementally to the new data. This is d…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in the ECML-PKDD 2023 joint Post-Workshop Proceedings

  5. arXiv:2404.06403

    cs.LG

    Online Learning of Decision Trees with Thompson Sampling

    Authors: Ayman Chaouki, Jesse Read, Albert Bifet

    Abstract: Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are heuristic in nature: they rely on greedy splits offering no guarantees of global optimality and often leading to unnecessarily co…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: To be published in the Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238

  6. arXiv:2401.08348

    cs.LG

    Estimating Model Performance Under Covariate Shift Without Labels

    Authors: Jakub Białek, Wojtek Kuberski, Nikolaos Perrakis, Albert Bifet

    Abstract: Machine learning models often experience performance degradation post-deployment due to shifts in data distribution. It is challenging to assess a model's performance accurately when labels are missing or delayed. Existing proxy methods, such as drift detection, fail to measure the effects of these shifts adequately. To address this, we introduce a new method, Probabilistic Adaptive Performance Esti…

    Submitted 28 May, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 9 content pages, 3 figures

    MSC Class: 62G05

  7. Look At Me, No Replay! SurpriseNet: Anomaly Detection Inspired Class Incremental Learning

    Authors: Anton Lee, Yaqian Zhang, Heitor Murilo Gomes, Albert Bifet, Bernhard Pfahringer

    Abstract: Continual learning aims to create artificial neural networks capable of accumulating knowledge and skills through incremental training on a sequence of tasks. The main challenge of continual learning is catastrophic interference, wherein new knowledge overrides or interferes with past knowledge, leading to forgetting. An associated issue is the problem of learning "cross-task knowledge," where mod…

    Submitted 30 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023, Birmingham, United Kingdom, October 21-25, 2023

  8. arXiv:2305.11311

    cs.LG cs.AI

    BELLA: Black box model Explanations by Local Linear Approximations

    Authors: Nedeljko Radulovic, Albert Bifet, Fabian Suchanek

    Abstract: In recent years, understanding the decision-making process of black-box models has become not only a legal requirement but also an additional way to assess their performance. However, state-of-the-art post-hoc interpretation approaches rely on synthetic data generation. This introduces uncertainty and can hurt the reliability of the interpretations. Furthermore, they tend to produce explanatio…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 21 pages, 3 figures, submitted to Journal of Artificial Intelligence

  9. arXiv:2302.08017

    cs.LG cs.AI

    Preventing Discriminatory Decision-making in Evolving Data Streams

    Authors: Zichong Wang, Nripsuta Saxena, Tongjia Yu, Sneha Karki, Tyler Zetty, Israat Haque, Shan Zhou, Dukka Kc, Ian Stockwell, Albert Bifet, Wenbin Zhang

    Abstract: Bias in machine learning has rightly received significant attention over the last decade. However, most fair machine learning (fair-ML) work to address bias in decision-making systems has focused solely on the offline setting. Despite the wide prevalence of online systems in the real world, work on identifying and correcting bias in the online setting is severely lacking. The unique challenges of…

    Submitted 15 February, 2023; originally announced February 2023.

  10. arXiv:2209.13917

    cs.LG cs.AI

    A simple but strong baseline for online continual learning: Repeated Augmented Rehearsal

    Authors: Yaqian Zhang, Bernhard Pfahringer, Eibe Frank, Albert Bifet, Nick Jin Sean Lim, Yunzhe Jia

    Abstract: Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite their strong empirical performance, rehearsal methods still suffer from a poor approximation of t…

    Submitted 13 November, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022

  11. arXiv:2209.08192

    cs.LG

    Linear TreeShap

    Authors: Peng Yu, Chao Xu, Albert Bifet, Jesse Read

    Abstract: Decision trees are well-known due to their ease of interpretability. To improve accuracy, we need to grow deep trees or ensembles of trees. These are hard to interpret, offsetting their original benefits. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting to features independent of the tree structure. T…

    Submitted 25 January, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

    Comments: An efficient algorithm to compute Shapley value on decision trees

  12. arXiv:2205.03184

    cs.LG cs.AR

    Green Accelerated Hoeffding Tree

    Authors: Eva Garcia-Martin, Albert Bifet, Niklas Lavesson, Rikard König, Henrik Linusson

    Abstract: State-of-the-art machine learning solutions mainly focus on creating highly accurate models without constraints on hardware resources. Stream mining algorithms are designed to run on resource-constrained devices, so a focus on low power consumption and on energy and memory efficiency is essential. The Hoeffding tree algorithm is able to create energy-efficient models, but at the cost of less accurate trees in c…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Presented as a poster in the TinyML 2021 Research Symposium

  13. Open challenges for Machine Learning based Early Decision-Making research

    Authors: Alexis Bondu, Youssef Achenchabe, Albert Bifet, Fabrice Clérot, Antoine Cornuéjols, Joao Gama, Georges Hébrail, Vincent Lemaire, Pierre-François Marteau

    Abstract: More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem at hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classi…

    Submitted 20 May, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

  14. arXiv:2201.11650

    cs.DB cs.AI

    Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences

    Authors: Thomas Guyet, Wenbin Zhang, Albert Bifet

    Abstract: The need to analyze information from streams arises in a variety of applications. One of its fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the presence of the pattern in transactions but pay no attention to the series of itemsets and their multiple occurrences. The pattern over a window of itemsets stream and their m…

    Submitted 9 April, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  15. arXiv:2201.06205

    cs.LG

    Balancing Performance and Energy Consumption of Bagging Ensembles for the Classification of Data Streams in Edge Computing

    Authors: Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

    Abstract: In recent years, the Edge Computing (EC) paradigm has emerged as an enabling factor for developing technologies like the Internet of Things (IoT) and 5G networks, bridging the gap between Cloud Computing services and end-users, supporting low latency, mobility, and location awareness to delay-sensitive applications. Most solutions in EC employ machine learning (ML) methods to perform data classifi…

    Submitted 16 January, 2022; originally announced January 2022.

    Comments: 18 pages. arXiv admin note: text overlap with arXiv:2112.09834

  16. arXiv:2201.05156   

    cs.IR cs.AI cs.LG

    Proceedings of the 4th Workshop on Online Recommender Systems and User Modeling -- ORSUM 2021

    Authors: João Vinagre, Alípio Mário Jorge, Marie Al-Ghossein, Albert Bifet

    Abstract: Modern online services continuously generate data at very fast rates. This continuous flow of data encompasses content - e.g., posts, news, products, comments -, but also user feedback - e.g., ratings, views, reads, clicks -, together with context data - user device, spatial or temporal data, user task or activity, weather. This can be overwhelming for systems and algorithms designed to train in b…

    Submitted 17 January, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

  17. Improving the performance of bagging ensembles for data streams through mini-batching

    Authors: Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

    Abstract: Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data mining, stream processing algorithms have additional requirements regarding computational resources and adaptability to data evolution. They must process instances in…

    Submitted 17 December, 2021; originally announced December 2021.

    Journal ref: Information Sciences, Volume 580, 2021, Pages 260-282

  18. arXiv:2108.11923

    cs.LG cs.AI

    Sketches for Time-Dependent Machine Learning

    Authors: Jesus Antonanzas, Marta Arias, Albert Bifet

    Abstract: Time series data can be subject to changes in the underlying process that generates them and, because of these changes, models built on old samples can become obsolete or perform poorly. In this work, we present a way to incorporate information about the current data distribution and its evolution across time into machine learning algorithms. Our solution is based on efficiently maintaining statis…

    Submitted 26 August, 2021; originally announced August 2021.

  19. arXiv:2108.07403

    cs.LG cs.AI

    FARF: A Fair and Adaptive Random Forests Classifier

    Authors: Wenbin Zhang, Albert Bifet, Xiangliang Zhang, Jeremy C. Weiss, Wolfgang Nejdl

    Abstract: As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases from the learned models has followed. Most works in developing fair learning algorithms focus on the offline setting. However, in many real-world applications data comes in an online fashion and needs to be processed on the fly. Moreover, in practical application, there is a trade-off between acc…

    Submitted 21 August, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

  20. arXiv:2106.09170

    cs.LG

    A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

    Authors: Heitor Murilo Gomes, Maciej Grzenda, Rodrigo Mello, Jesse Read, Minh Huong Le Nguyen, Albert Bifet

    Abstract: Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning…

    Submitted 16 June, 2021; originally announced June 2021.

  21. arXiv:2104.01830

    stat.ML cs.LG

    Model Compression for Dynamic Forecast Combination

    Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares, Albert Bifet

    Abstract: The predictive advantage of combining several different predictive models is widely accepted. Particularly in time series forecasting problems, this combination is often dynamic to cope with potential non-stationary sources of variation present in the data. Despite their superior predictive performance, ensemble methods entail two main limitations: high computational costs and lack of transparency…

    Submitted 5 April, 2021; originally announced April 2021.

  22. arXiv:2103.09883

    cs.LG

    A Survey on Spatio-temporal Data Analytics Systems

    Authors: Md Mahbub Alam, Luis Torgo, Albert Bifet

    Abstract: Due to the surge of spatio-temporal data volume, the popularity of location-based services and applications, and the importance of extracted knowledge from spatio-temporal data to solve a wide range of real-world problems, a plethora of research and development work has been done in the area of spatial and spatio-temporal data analytics in the past decade. The main goal of existing works was to de…

    Submitted 17 March, 2021; originally announced March 2021.

  23. arXiv:2103.00903

    cs.LG stat.ML

    STUDD: A Student-Teacher Method for Unsupervised Concept Drift Detection

    Authors: Vitor Cerqueira, Heitor Murilo Gomes, Albert Bifet, Luis Torgo

    Abstract: Concept drift detection is a crucial task in data stream evolving environments. Most state-of-the-art approaches designed to tackle this problem monitor the loss of predictive models. However, this approach falls short in many real-world scenarios, where the true labels are not readily available to compute the loss. In this context, there is increasing attention to approaches that perform conce…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: 23 pages, single column

  24. arXiv:2012.04740

    cs.LG cs.AI cs.MS

    River: machine learning for streaming data in Python

    Authors: Jacob Montiel, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, Heitor Murilo Gomes, Jesse Read, Talel Abdessalem, Albert Bifet

    Abstract: River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning problems. It is the result of the merger of the two most popular packages for stream learning in Python: Creme and scikit-multiflow. River introduces a revamped a…

    Submitted 8 December, 2020; originally announced December 2020.

    Comments: Submitted to JMLR MLOSS

    MSC Class: 68-04

    ACM Class: I.2; I.2.5

  25. arXiv:2010.16045

    cs.CR cs.LG

    Machine Learning (In) Security: A Stream of Problems

    Authors: Fabrício Ceschin, Marcus Botacin, Albert Bifet, Bernhard Pfahringer, Luiz S. Oliveira, Heitor Murilo Gomes, André Grégio

    Abstract: Machine Learning (ML) has been widely applied to cybersecurity and is considered state-of-the-art for solving many of the open issues in that field. However, it is very difficult to evaluate how good the produced solutions are, since the challenges faced in security may not appear in other areas. One of these challenges is the concept drift, which increases the existing arms race between attackers…

    Submitted 4 September, 2023; v1 submitted 29 October, 2020; originally announced October 2020.

    Journal ref: Digital Threats 2023

  26. arXiv:2010.10935

    cs.LG stat.ML

    An Eager Splitting Strategy for Online Decision Trees

    Authors: Chaitanya Manapragada, Heitor M Gomes, Mahsa Salehi, Albert Bifet, Geoffrey I Webb

    Abstract: Decision tree ensembles are widely used in practice. In this work, we study in ensemble settings the effectiveness of replacing the split strategy for the state-of-the-art online tree learner, Hoeffding Tree, with a rigorous but more eager splitting strategy that we had previously published as Hoeffding AnyTime Tree. Hoeffding AnyTime Tree (HATT) uses the Hoeffding Test to determine whether the c…

    Submitted 31 July, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:2010.08199

  27. arXiv:2010.08199

    cs.LG cs.AI

    Emergent and Unspecified Behaviors in Streaming Decision Trees

    Authors: Chaitanya Manapragada, Geoffrey I Webb, Mahsa Salehi, Albert Bifet

    Abstract: Hoeffding trees are the state-of-the-art methods in decision tree learning for evolving data streams. These very fast decision trees are used in many real applications where data is created in real-time due to their efficiency. In this work, we extricate explanations for why these streaming decision tree algorithms for stationary and nonstationary streams (HoeffdingTree and HoeffdingAdaptiveTree)…

    Submitted 16 October, 2020; originally announced October 2020.

  28. arXiv:2009.09677

    cs.LG stat.ML

    CURIE: A Cellular Automaton for Concept Drift Detection

    Authors: Jesus L. Lobo, Javier Del Ser, Eneko Osaba, Albert Bifet, Francisco Herrera

    Abstract: Data stream mining extracts information from large quantities of data flowing fast and continuously (data streams). They are usually affected by changes in the data distribution, giving rise to a phenomenon referred to as concept drift. Thus, learning models must detect and adapt to such changes, so as to exhibit a good predictive performance after a drift has occurred. In this regard, the develop…

    Submitted 21 September, 2020; originally announced September 2020.

  29. arXiv:2007.01260

    cs.DC

    S2CE: A Hybrid Cloud and Edge Orchestrator for Mining Exascale Distributed Streams

    Authors: Nicolas Kourtellis, Herodotos Herodotou, Maciej Grzenda, Piotr Wawrzyniak, Albert Bifet

    Abstract: The explosive increase in volume, velocity, variety, and veracity of data generated by distributed and heterogeneous nodes such as IoT and other devices continuously challenges the state of the art in big data processing platforms and mining techniques. Consequently, it reveals an urgent need to address the ever-growing gap between this expected exascale data generation and the extraction of insights…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: 11 pages, 4 figures, 2 tables

    ACM Class: H.2.4

  30. arXiv:2005.07353

    cs.LG stat.ML

    Adaptive XGBoost for Evolving Data Streams

    Authors: Jacob Montiel, Rory Mitchell, Eibe Frank, Bernhard Pfahringer, Talel Abdessalem, Albert Bifet

    Abstract: Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for classification of evolving data streams. In this setting, new data arrives over time and the relationship between the class and the features may change…

    Submitted 15 May, 2020; originally announced May 2020.

    Comments: To be published in Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2020, 8 pages

  31. arXiv:1911.07361

    cs.LG stat.ML

    Rebalancing Learning on Evolving Data Streams

    Authors: Alessio Bernardo, Emanuele Della Valle, Albert Bifet

    Abstract: Nowadays, every device connected to the Internet generates an ever-growing stream of data (formally, unbounded). Machine Learning on unbounded data streams is a grand challenge due to its resource constraints. In fact, standard machine learning techniques are not able to deal with data whose statistics are subject to gradual or sudden changes without any warning. Massive Online Analysis (MOA) is th…

    Submitted 17 November, 2019; originally announced November 2019.

  32. arXiv:1908.08019

    cs.NE cs.AI cs.LG

    Spiking Neural Networks and Online Learning: An Overview and Perspectives

    Authors: Jesus L. Lobo, Javier Del Ser, Albert Bifet, Nikola Kasabov

    Abstract: Applications that generate huge amounts of data in the form of fast streams are becoming increasingly prevalent, making it necessary to learn in an online manner. These conditions usually impose memory and processing time restrictions, and they often turn into evolving environments where a change may affect the input data distribution. Such a change causes predictive models trained over…

    Submitted 23 July, 2019; originally announced August 2019.

  33. arXiv:1908.08018

    cs.NE cs.AI cs.LG

    Exploiting a Stimuli Encoding Scheme of Spiking Neural Networks for Stream Learning

    Authors: Jesus L. Lobo, Izaskun Oregi, Albert Bifet, Javier Del Ser

    Abstract: Stream data processing has gained progressive momentum with the arrival of new stream applications and big data scenarios. One of the most promising techniques in stream learning is the Spiking Neural Network, and some of them use an interesting population encoding scheme to transform the incoming stimuli into spikes. This study sheds light on the key issue of this encoding scheme, the Gaussian…

    Submitted 23 July, 2019; originally announced August 2019.

  34. arXiv:1905.08848

    cs.LG stat.ML

    Recurring Concept Meta-learning for Evolving Data Streams

    Authors: Robert Anderson, Yun Sing Koh, Gillian Dobbie, Albert Bifet

    Abstract: When concept drift is detected during classification in a data stream, a common remedy is to retrain a framework's classifier. However, this loses useful information if the classifier has learnt the current concept well, and this concept will recur in the future. Some frameworks retain and reuse classifiers, but it can be time-consuming to select an appropriate classifier to reuse. These fra…

    Submitted 21 May, 2019; originally announced May 2019.

  35. arXiv:1905.05881

    cs.LG stat.ML

    Resource-aware Elastic Swap Random Forest for Evolving Data Streams

    Authors: Diego Marrón, Eduard Ayguadé, José Ramon Herrero, Albert Bifet

    Abstract: Continual learning based on data stream mining deals with ubiquitous sources of Big Data arriving at high velocity and in real time. Adaptive Random Forest (ARF) is a popular ensemble method used for continual learning due to its simplicity in combining adaptive leveraging bagging with fast random Hoeffding trees. While the default ARF size provides competitive accuracy, it is usually over-p…

    Submitted 14 May, 2019; originally announced May 2019.

  36. arXiv:1810.10094

    cs.DS

    Novel Adaptive Algorithms for Estimating Betweenness, Coverage and k-path Centralities

    Authors: Mostafa Haghir Chehreghani, Albert Bifet, Talel Abdessalem

    Abstract: An important index widely used to analyze social and information networks is betweenness centrality. In this paper, first given a directed network $G$ and a vertex $r\in V(G)$, we present a novel adaptive algorithm for estimating the betweenness score of $r$. Our algorithm first computes two subsets of the vertex set of $G$, called $\mathcal{RF}(r)$ and $\mathcal{RT}(r)$, that define the sample spaces…

    Submitted 23 October, 2018; originally announced October 2018.

  37. Scikit-Multiflow: A Multi-output Streaming Framework

    Authors: Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

    Abstract: Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of the art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks including scikit-learn, MOA and…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: 5 pages, Open Source Software

    Journal ref: Journal of Machine Learning Research, 2019, vol. 1, p. 2915-2914

  38. arXiv:1805.11477

    cs.DC

    Large-Scale Learning from Data Streams with Apache SAMOA

    Authors: Nicolas Kourtellis, Gianmarco De Francisci Morales, Albert Bifet

    Abstract: Apache SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, due to the time and memory complexity. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine le…

    Submitted 26 May, 2018; originally announced May 2018.

    Comments: 31 pages, 7 Tables, 16 Figures, 26 References. arXiv admin note: substantial text overlap with arXiv:1607.08325

  39. Bitcoin Volatility Forecasting with a Glimpse into Buy and Sell Orders

    Authors: Tian Guo, Albert Bifet, Nino Antulov-Fantulin

    Abstract: In this paper, we study the ability to make the short-term prediction of the exchange price fluctuations towards the United States dollar for the Bitcoin market. We use the data of realized volatility collected from one of the largest Bitcoin digital trading offices in 2016 and 2017 as well as order information. Experiments are performed to evaluate a variety of statistical and machine learning ap…

    Submitted 6 February, 2019; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Full version of the paper published at IEEE International Conference on Data Mining (ICDM), 2018

    Journal ref: 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018: 989-994

  40. Exact and Approximate Algorithms for Computing Betweenness Centrality in Directed Graphs

    Authors: Mostafa Haghir Chehreghani, Albert Bifet, Talel Abdessalem

    Abstract: Graphs (networks) are an important tool to model data in different domains. Real-world graphs are usually directed, where the edges have a direction and they are not symmetric. Betweenness centrality is an important index widely used to analyze networks. In this paper, first given a directed network $G$ and a vertex $r \in V(G)$, we propose an exact algorithm to compute the betweenness score of $r$. O…

    Submitted 28 October, 2021; v1 submitted 28 August, 2017; originally announced August 2017.

    Journal ref: Fundamenta Informaticae, Volume 182, Issue 3 (November 18, 2021) fi:8451

  41. arXiv:1704.07351

    cs.DS

    Metropolis-Hastings Algorithms for Estimating Betweenness Centrality in Large Networks

    Authors: Mostafa Haghir Chehreghani, Talel Abdessalem, Albert Bifet

    Abstract: Betweenness centrality is an important index widely used in different domains such as social networks, traffic networks and the world wide web. However, even for mid-size networks that have only a few hundred thousand vertices, it is computationally expensive to compute exact betweenness scores. Therefore in recent years, several approximate algorithms have been developed. In this paper, first g…

    Submitted 3 May, 2017; v1 submitted 24 April, 2017; originally announced April 2017.

    Comments: 14 pages

  42. arXiv:1703.06227

    cs.DS cs.SI

    Discriminative Distance-Based Network Indices with Application to Link Prediction

    Authors: Mostafa Haghir Chehreghani, Albert Bifet, Talel Abdessalem

    Abstract: In large networks, using the length of shortest paths as the distance measure has shortcomings. A well-studied shortcoming is that extending it to disconnected graphs and directed graphs is controversial. The second shortcoming is that a huge number of vertices may have exactly the same score. The third shortcoming is that in many applications, the distance between two vertices not only depends on…

    Submitted 31 March, 2018; v1 submitted 17 March, 2017; originally announced March 2017.

  43. arXiv:1607.08325

    cs.DC cs.AI cs.DB

    VHT: Vertical Hoeffding Tree

    Authors: Nicolas Kourtellis, Gianmarco De Francisci Morales, Albert Bifet, Arinto Murdopo

    Abstract: IoT Big Data requires new machine learning methods able to scale to large volumes of data arriving at high speed. Decision trees are popular machine learning models since they are very effective, yet easy to interpret and visualize. In the literature, we can find distributed algorithms for learning decision trees, and also streaming algorithms, but not algorithms that combine both features. In this p…

    Submitted 28 July, 2016; originally announced July 2016.

  44. arXiv:1511.00971

    cs.LG cs.NE

    Data Stream Classification using Random Feature Functions and Novel Method Combinations

    Authors: Diego Marrón, Jesse Read, Albert Bifet, Nacho Navarro

    Abstract: Big Data streams are being generated faster, in greater volumes, and more commonly. In this scenario, Hoeffding Trees are an established method for classification. Several extensions exist, including high-performing ensemble setups such as online and leveraging bagging. Also, $k$-nearest neighbors is a popular choice, with most extensions dealing with the inherent performance limitations over a potent…

    Submitted 3 November, 2015; originally announced November 2015.

    Comments: 20 pages, journal

  45. arXiv:1504.06366

    cs.AI cs.LG

    Use of Ensembles of Fourier Spectra in Capturing Recurrent Concepts in Data Streams

    Authors: Sripirakas Sakthithasan, Russel Pears, Albert Bifet, Bernhard Pfahringer

    Abstract: In this research, we apply ensembles of Fourier encoded spectra to capture and mine recurring concepts in a data stream environment. Previous research showed that compact versions of Decision Trees can be obtained by applying the Discrete Fourier Transform to accurately capture recurrent concepts in a data stream. However, in highly volatile environments where new concepts emerge often, the approa…

    Submitted 23 April, 2015; originally announced April 2015.

    Comments: This paper has been accepted for IJCNN 2015 conference, Ireland

  46. arXiv:1405.0546

    cs.AI cs.CL cs.IR

    Kaggle LSHTC4 Winning Solution

    Authors: Antti Puurula, Jesse Read, Albert Bifet

    Abstract: Our winning submission to the 2014 Kaggle competition for Large Scale Hierarchical Text Classification (LSHTC) consists mostly of an ensemble of sparse generative models extending Multinomial Naive Bayes. The base-classifiers consist of hierarchically smoothed models combining document, label, and hierarchy level Multinomials, with feature pre-processing using variants of TF-IDF and BM25. Addition…

    Submitted 9 May, 2014; v1 submitted 2 May, 2014; originally announced May 2014.

    Comments: Kaggle LSHTC winning solution description
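Several entries above (items 12, 26, 27 and 43) build on the Hoeffding tree, whose split decision rests on the Hoeffding bound. The sketch below shows the classic form of that split test: split when the observed gain gap between the two best candidate attributes exceeds epsilon, or when epsilon falls below a tie threshold. It is a minimal illustration under stated assumptions, not the method of any listed paper. The helper names `hoeffding_bound` and `should_split` are hypothetical, and the default `delta` and `tie_threshold` values are illustrative choices rather than settings taken from these works.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Hypothetical helper for the classic Hoeffding tree split check.

    best_gain / second_gain: information gain of the two best attributes
    n: number of examples seen at the leaf
    delta, tie_threshold: illustrative defaults (assumptions)
    """
    # Information gain over C classes lies in [0, log2(C)].
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is confidently better, or if the two
    # candidates are so close that waiting longer is not worthwhile.
    return (best_gain - second_gain > eps) or (eps < tie_threshold)
```

Because epsilon shrinks as `n` grows, a leaf that cannot yet separate two close candidates will eventually either see enough data to separate them or fall under the tie threshold and split anyway.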