This document summarizes different methods for time series analysis and prediction in the deep learning era. It discusses classical autoregressive and Bayesian models, general machine learning approaches, and various deep learning techniques including DeepAR, Deep Ensembles, Deep State Space models, and combinations of deep neural networks with Gaussian processes. The document compares the pros and cons of each approach in terms of scalability, ability to share information across time series, handling cold starts with limited data, estimating predictive uncertainty, and dealing with unevenly spaced time series data.
Explainable AI (XAI) is becoming a must-have non-functional requirement (NFR) for most AI-enabled product and solution deployments. Keen to hear viewpoints and explore collaboration opportunities.
The document discusses steps for identifying and building ARIMA models for time series data. It describes ARIMA model building as a three-stage process: identification, estimation, and diagnostic checking. For identification, it explains how to determine the p, d, and q values by examining the autocorrelation and partial autocorrelation functions of the stationary, differenced time series. It then discusses using the method of moments to estimate ARIMA model parameters by equating sample statistics to population parameters.
Time Series Forecasting Project Presentation, by Anupama Kate
Hello folks, Anupama here, presenting on behalf of my team for our internship project: forecasting gold prices. For this we used Python and machine learning algorithms and models,
covering exploratory data analysis, modelling, model building, model evaluation, deployment, and publishing the application.
#machinelearning #datascience #forecasting #prediction #timeseries #python #project
This document discusses forecasting gasoline prices in the United States using an ARIMA model. It provides background on gasoline, including its consumption and retail prices. The objective is to understand price volatility due to supply and demand constraints. Data on US gasoline prices from 1993-2014 is obtained from the EIA. After checking for stationarity and transforming the data, an ARIMA(1,1,3) model is identified as best. This model reveals gasoline prices are significantly related to past prices and unobserved factors. The validated model is used to forecast future gasoline prices.
Time series forecasting involves analyzing sequential data measured over time. A time series can be univariate (containing a single variable) or multivariate (containing multiple variables). It can also be continuous or discrete. Key components of time series include trends, cyclical variations, seasonal variations, and irregular variations. Time series analysis involves fitting a model to the data. Stationarity, where the statistical properties do not depend on time, is required for forecasting. Common forecasting models include ARMA, ARIMA, and SARIMA stochastic models as well as artificial neural networks and support vector machines. Each approach has strengths for modeling nonlinear relationships and generalizing to make predictions.
Automated machine learning (AutoML) systems can find the optimal machine learning algorithm and hyperparameters for a given dataset without human intervention. AutoML addresses the skills gap in data science by allowing data scientists to build more models in less time. On average, tuning hyperparameters results in a 5-10% improvement in accuracy over default parameters. However, the best parameters vary across problems. AutoML tools like Auto-sklearn use techniques like Bayesian optimization and meta-learning to efficiently search the hyperparameter space. Auto-sklearn has won several AutoML challenges due to its ability to effectively optimize over 100 hyperparameters.
SciPy 2011: Time Series Analysis in Python, by Wes McKinney
1) The document discusses statsmodels, a Python library for statistical modeling that implements standard statistical models. It includes tools for linear regression, descriptive statistics, statistical tests, time series analysis, and more.
2) The talk provides an overview of using statsmodels for time series analysis, including descriptive statistics, autoregressive moving average (ARMA) models, vector autoregression (VAR) models, and filtering tools.
3) The discussion highlights the development of statsmodels and the need for integrated statistical data structures and user interfaces to make Python more competitive with R for data analysis and statistics.
Time Series Classification with Deep Learning | Marco Del Pra, Data Science Milan
Today a lot of data is stored in the form of time series, and with the current wide diffusion of real-time applications many areas are showing strongly increasing interest in applications based on this kind of data: for example finance, advertising, marketing, health care, automated disease detection, biometrics, retail, and identification of anomalies of any kind. It is therefore very interesting to understand the role and potential of machine learning in this sector.
Many methods can be used to classify time series, but all of them, apart from deep learning, require some kind of feature engineering as a separate stage before classification is performed, which can mean losing important information and increasing development and test time. By contrast, deep learning models such as recurrent and convolutional neural networks incorporate this kind of feature engineering internally, optimizing it and eliminating the need to do it manually. They are therefore able to extract information from time series in a faster, more direct, and more complete way.
Bio:
Marco Del Pra
I am 41 years old, I was born in Venice, and I have two master's degrees (Computer Science and Mathematics). I have been working for about 10 years in Artificial Intelligence, first as a Data Scientist, then as a Team Leader, and finally as Head of Data. Among others, I have worked for Microsoft, for the European Commission (JRC of Ispra), and for Cuebiq. I am currently working as a freelancer and, with two other cofounders, creating an innovative AI startup. I have two important publications in applied mathematics.
Topics: recurrent and convolutional neural networks, deep learning, time-series.
Usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision-making. While these predictive systems can be quite accurate, they have historically been treated as inscrutable black boxes that produce only numeric predictions with no accompanying explanations. Unfortunately, recent studies and recent events have drawn attention to mathematical and sociological flaws in prominent weak AI and ML systems, yet practitioners usually don't have the right tools to pry open machine learning black boxes and debug them.
This presentation introduces several new approaches that increase transparency, accountability, and trustworthiness in machine learning models. If you are a data scientist or analyst and want to explain a machine learning model to your customers or managers (or if you have concerns about documentation, validation, or regulatory requirements), then this presentation is for you!
Artificial Intelligence, Machine Learning and Deep Learning, by Sujit Pal
Slides for talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group at San Francisco. My part of the talk is covered from slides 19-34.
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ..., by Simplilearn
This Time Series Analysis (Part 2) in R presentation will help you understand what the ARIMA model is and what correlation and auto-correlation are, and you will also see a use case implementation in which we forecast sales of air tickets using ARIMA; at the end, we will see how to validate a model using the Ljung-Box test. A time series is a sequence of data recorded at specific time intervals, whose past values are analyzed to forecast a time-dependent future. Compared to other forecasting algorithms, with time series we deal with a single variable that depends on time. So, let's dive into this presentation and understand what time series is and how to implement it using R.
Below topics are explained in this "Time Series in R" presentation:
1. Introduction to ARIMA model
2. Auto-correlation & partial auto-correlation
3. Use case - Forecast the sales of air-tickets using ARIMA
4. Model validating using Ljung-Box test
Become an expert in data analytics using the R programming language in this data science certification training course. You’ll master data exploration, data visualization, predictive analytics and descriptive analytics techniques with the R language. With this data science course, you’ll get hands-on practice on R CloudLab by implementing various real-life, industry-based projects in the domains of healthcare, retail, insurance, finance, airlines, music industry, and unemployment.
Why learn Data Science with R?
1. This course forms an ideal package for data analysts aspiring to build a successful career in analytics/data science. By the end of this training, participants will acquire a 360-degree overview of business analytics and R by mastering concepts like data exploration, data visualization, predictive analytics, etc.
2. According to marketsandmarkets.com, the advanced analytics market will be worth $29.53 Billion by 2019
3. Wired.com points to a report by Glassdoor that the average salary of a data scientist is $118,709
4. Randstad reports that pay hikes in the analytics industry are 50% higher than in IT
The Data Science with R course is recommended for:
1. IT professionals looking for a career switch into data science and analytics
2. Software developers looking for a career switch into data science and analytics
3. Professionals working in data and business analytics
4. Graduates looking to build a career in analytics and data science
5. Anyone with a genuine interest in the data science field
6. Experienced professionals who would like to harness data science in their fields
Learn more at: https://www.simplilearn.com/
Predictive Analytics - Big Data & Artificial Intelligence, by Manish Jain
Quick overview of the latest in big data and artificial intelligence. A lot of buzzwords are being thrown around; hopefully this presentation will demystify many of the terms.
This document discusses anomaly detection techniques for intrusion detection systems. It begins by defining anomalies and explaining the principles of anomaly detection models. It then describes some key challenges in anomaly detection and different types of outputs it can provide. The document proceeds to classify anomaly detection techniques into statistical, machine learning and data mining based methods. As examples, it examines several case studies of early statistical anomaly detection systems like Haystack and IDES.
This presentation on Recurrent Neural Networks will help you understand what a neural network is, what the popular neural networks are, why we need recurrent neural networks, what a recurrent neural network is, how an RNN works, what the vanishing and exploding gradient problems are, and what LSTM is; you will also see a use case implementation of LSTM (long short-term memory). Neural networks used in deep learning consist of different layers connected to each other and modeled on the structure and functions of the human brain. They learn from huge volumes of data and use complex algorithms to train a neural net. The recurrent neural network works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. Now let's dive into this presentation and understand what an RNN is and how it actually works.
Below topics are explained in this recurrent neural networks tutorial:
1. What is a neural network?
2. Popular neural networks?
3. Why recurrent neural network?
4. What is a recurrent neural network?
5. How does an RNN work?
6. Vanishing and exploding gradient problem
7. Long short term memory (LSTM)
8. Use case implementation of LSTM
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning & deep neural network research. With our deep learning course, you'll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks and traverse layers of data abstraction to understand the power of data and prepare you for your new role as deep learning scientist.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this Tensorflow course, you’ll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
Learn more at: https://www.simplilearn.com/
This document provides an overview of key mathematical concepts relevant to machine learning, including linear algebra (vectors, matrices, tensors), linear models and hyperplanes, dot and outer products, probability and statistics (distributions, samples vs populations), and resampling methods. It also discusses solving systems of linear equations and the statistical analysis of training data distributions.
The document discusses analyzing multivariate time series of five energy futures (crude oil, ethanol, gasoline, heating oil, natural gas) using vector autoregressive (VAR) and vector error correction (VEC) models. It finds the futures are cointegrated using Johansen and Engle-Granger tests, indicating they share a common stochastic trend. A VAR(1) model is estimated and found stable. The VEC model captures the error correction behavior as futures return to their long-run equilibrium. Forecasts are generated and limitations of the Engle-Granger approach discussed.
Introduction to Recurrent Neural Network, by Knoldus Inc.
The document provides an introduction to recurrent neural networks (RNNs). It discusses how RNNs differ from feedforward neural networks in that they have internal memory and can use their output from the previous time step as input. This allows RNNs to process sequential data like time series. The document outlines some common RNN types and explains the vanishing gradient problem that can occur in RNNs due to multiplication of small gradient values over many time steps. It discusses solutions to this problem like LSTMs and techniques like weight initialization and gradient clipping.
ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models were based on a description of trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
The explosion of sensors in all types of devices from “smart” consumer wearables and appliances to complex machines on manufacturing floors has given rise to a requirement to quickly analyze vast quantities of sensor metrics to provide meaningful insights. From exploratory to predictive analytics, analyzing time-series data is essential to address inefficiencies, identify risks and improve operations.
In this presentation, we will see how you can conduct exploratory analytics of time-series data rapidly to gain insights into the performance of the machines being monitored. We will talk about how to look at data from multiple metrics together in a holistic way to hone in on anomalies and identify potential problems. Finally, we will cover algorithms and techniques to predict future trends for time-series metrics. Along the way, we will discuss useful tools and technologies to perform time-series data analysis in minutes.
Novel Ensemble Tree for Fast Prediction on Data Streams, by IJERA Editor
Data streams are sequential sets of data records. When data arrives continuously and at high speed, predicting the class in a timely manner is essential. Ensemble modeling techniques are currently growing rapidly in data stream classification. Ensemble learning is widely accepted because of its ability to handle huge amounts of streaming data and to cope with concept drift. Prior work focused mostly on the accuracy of the ensemble model; prediction efficiency has received little attention, since existing ensemble models predict in linear time, which is sufficient for small applications and for models that integrate only a few classifiers. Real-time applications, however, involve huge data streams, so base classifiers are needed to recognize dissimilar models and build a high-grade ensemble. To address these challenges we developed the Ensemble Tree, a height-balanced tree-indexing structure over base classifiers for fast prediction on data streams with ensemble modeling techniques. The Ensemble Tree manages ensembles like geodatabases and uses an R-tree-like structure to achieve sublinear time complexity.
Time Series Analysis… using an Event Streaming Platform, by Confluent
Time Series Analysis… using an Event Streaming Platform, Mirko Kämpf, Solutions Architect, Confluent
Meetup Link: https://www.meetup.com/Apache-Kafka-Germany-Munich/events/272827528/
Time Series Analysis Using an Event Streaming Platform, by Dr. Mirko Kämpf
Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and for safety protection.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.
Scikit-Learn is a powerful machine learning library implemented in Python with the numeric and scientific computing powerhouses Numpy, Scipy, and matplotlib for extremely fast analysis of small to medium sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a data scientist's toolkit for machine learning on incoming data sets.
The purpose of this one day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms; rather than as simply a research or investigation methodology.
Prediction as a service with ensemble model in SparkML and Python ScikitLearn, by Josef A. Habdank
Watch the recording of the talk given at Spark Summit Brussels 2016 here:
https://www.youtube.com/watch?v=wyfTjd9z1sY
Data Science with SparkML on DataBricks is a perfect platform for applying ensemble learning at massive scale. This presentation describes a Prediction-as-a-Service platform that can predict trends on 1 billion observed prices daily. In order to train an ensemble model on a multivariate time series in a thousands-to-millions-dimensional space, the whole space has to be fragmented into subspaces which exhibit significant similarity. To achieve this, the vastly sparse space undergoes dimensionality reduction into a parameter space, which is then used to cluster the observations. The data in the resulting clusters is modeled in parallel using machine learning tools capable of coefficient estimation at massive scale (SparkML and Scikit-Learn). The estimated model coefficients are stored in a database and used when executing predictions on demand via a web service. This approach enables training models fast enough to complete the task within a couple of hours, allowing daily or even real-time updates of the coefficients. The above machine learning framework is used to predict airfares as a support tool for airline Revenue Management systems.
Automated machine learning lectures given at the Advanced Course on Data Science & Machine Learning. AutoML, hyperparameter optimization, Bayesian optimization, Neural Architecture Search, Meta-learning, MAML
Time Series Anomaly Detection with .NET and Azure, by Marco Parenzan
If you have any device or source that generates values over time (even a log from a service), you want to determine whether, in a given time frame, the time series is correct or whether you can detect anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
Marius Eriksen discusses Reflow, a new cloud-native workflow framework for bioinformatics. Reflow programs workflows directly using a functional programming language for simplicity and composability. It leverages lazy evaluation and caching to efficiently parallelize and distribute work across private clusters. Reflow aims to untie the hands of implementors compared to traditional workflow systems through its unified approach to programming, execution, and infrastructure.
The document describes developing a model to predict house prices using deep learning techniques. It proposes using a dataset with house features without labels and applying regression algorithms like K-nearest neighbors, support vector machine, and artificial neural networks. The models are trained and tested on split data, with the artificial neural network achieving the lowest mean absolute percentage error of 18.3%, indicating it is the most accurate model for predicting house prices based on the data.
ML on Big Data: Real-Time Analysis on Time Series, by Sigmoid
This document discusses building a machine learning model for real-time time series analysis on big data. It describes using Spark and Kafka to ingest streaming sensor data and train a model to identify patterns and predict failures. The training phase identifies concepts in historical data to build a knowledge base. In real-time, incoming data is processed in microbatches to identify patterns and sequences matching the concepts, triggering alerts. Challenges addressed include handling large volumes of small files and sharing data between batches for signals spanning multiple batches.
The document is a report on using artificial neural networks (ANNs) to predict stock market returns. It discusses how ANNs have been applied to problems like stock exchange index prediction. It also discusses support vector machines (SVMs), a supervised learning method that can perform linear and non-linear classification. SVMs have been used for stock market prediction by analyzing training data to build a model that assigns categories or predicts values for new data points. The report includes code screenshots showing the import of libraries for SVM regression and plotting the predicted versus actual prices.
This tutorial provides an overview of recent advances in deep generative models. It will cover three types of generative models: Markov models, latent variable models, and implicit models. The tutorial aims to give attendees a full understanding of the latest developments in generative modeling and how these models can be applied to high-dimensional data. Several challenges and open questions in the field will also be discussed. The tutorial is intended for the 2017 conference of the International Society for Bayesian Analysis.
Josh Patterson, Principal at Patterson Consulting: Introduction to Parallel Iterative Machine Learning Algorithms on Hadoop's Next-Generation YARN Framework
The anatomy of a neural network consists of layers, input data and targets, a loss function, and an optimizer. Layers are the building blocks and include dense, RNN, CNN, and more. Keras is a user-friendly deep learning framework that allows easy construction of neural networks by stacking layers. It supports TensorFlow as a backend and offers pre-trained models, GPU acceleration, and integration with data libraries. To set up a deep learning workstation, software like TensorFlow, Keras, and CUDA must be installed along with a GPU. The hypothesis space refers to all possible models considered by an algorithm. Loss functions measure prediction error while optimizers adjust parameters to minimize loss and improve accuracy. Common examples are described.
Automating materials science workflows with pymatgen, FireWorks, and atomate, by Anubhav Jain
FireWorks is a workflow management system that allows researchers to define and execute complex computational materials science workflows on local or remote computing resources in an automated manner. It provides features such as error detection and recovery, job scheduling, provenance tracking, and remote file access. The atomate library builds on FireWorks to provide a high-level interface for common materials simulation procedures like structure optimization, band structure calculation, and property prediction using popular codes like VASP. Together, these tools aim to make high-throughput computational materials discovery and design more accessible to researchers.
Time Series Forecasting Using Novel Feature Extraction Algorithm and Multilay..., by Editor IJCATR
Time series forecasting is important because it often provides the foundation for decision making in a large variety of fields. A tree-ensemble method, referred to as time series forest (TSF), is proposed for time series classification. The approach is based on the concept of data series envelopes and essential attributes generated by a multilayer neural network... These claims are further investigated by applying statistical tests. With the results presented in this article, together with results from related investigations, we want to support practitioners and scholars in answering the following question: which measure should be looked at first if accuracy is the most important criterion, if an application is time-critical, or if a compromise is needed? The paper demonstrates that feature extraction by the novel method can improve the time series forecasting process.
Why You Need Real-Time Data to Compete in E-Commerce, by PromptCloud
In the fast-paced world of e-commerce, real-time data is crucial for staying competitive. By accessing up-to-date information on market trends, competitor pricing, and customer preferences, businesses can make informed decisions quickly. Real-time data enables dynamic pricing strategies, effective inventory management, and personalized marketing efforts, all of which are essential for meeting customer demands and outperforming competitors. Embrace real-time data to stay agile, optimize your operations, and drive growth in the ever-evolving e-commerce landscape. Get in touch for custom web scraping services: https://bit.ly/3WkqYVm
Introduction to Data Science
1.1 What is Data Science; importance of data science
1.2 Big Data and Data Science; the current scenario
1.3 Industry perspective; types of data: structured vs. unstructured data
1.4 Quantitative vs. categorical data
1.5 Big Data vs. Little Data; the data science process
1.6 Role of the Data Scientist
Hadoop Vs Snowflake Blog PDF Submission.pptx, by dewsharon760
Explore the key differences between Hadoop and Snowflake. Understand their unique features, use cases, and how to choose the right data platform for your needs.
3. Time series applications + context
- Time series prediction: e.g. demand/sales forecasting...
- Use prediction for anomaly detection: e.g. manufacturing settings...
- Counterfactual prediction: e.g. marketing campaigns (show ads, then compare observed outcomes against the counterfactual)...
5. Time series prediction methods (non-comprehensive list)
- Classical autoregressive models
- Bayesian AR models
- General machine learning approaches
- Deep learning
6. Time series prediction and sales forecasting: issues (e.g. retail businesses)
(1) The number of time series (~ thousands) [the SCALE problem]
(2) Time series are often highly erratic, intermittent or bursty (...and on highly different scales: e.g. ~10 items of Product A vs. 2 items of Product B)
E.g. retail businesses
7. Time series belong to a hierarchy
of products/categories
E.g. online retailer selling clothes
Time series prediction and sales forecasting: issues
Now
Nike t-shirts
Clothes (total sales)
T-shirts total sales
~ 100
~ 1000(3)
For new products historical data is
missing (the cold-start problem)
(4)
Adidas t-shirts
8. Classical autoregressive models
The model combines an autoregressive component and a moving average component.
Workflow:
- De-trending by differencing
- Variance stabilization by log or Box-Cox transformation
- Estimate the model order (AIC, BIC)
- Fit the model parameters (maximum likelihood)
- Test the residuals for randomness
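A minimal sketch of this workflow in Python with statsmodels and scipy (the file name, candidate orders, and lag choice are illustrative assumptions):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

y = np.loadtxt("sales.csv")        # hypothetical univariate, positive-valued series
y_bc, lam = stats.boxcox(y)        # variance stabilization (requires y > 0)

# Grid-search small (p, q) orders by AIC; d=1 de-trends by differencing.
candidates = [(p, q, ARIMA(y_bc, order=(p, 1, q)).fit().aic)
              for p in range(3) for q in range(3)]
p, q, _ = min(candidates, key=lambda t: t[2])
fit = ARIMA(y_bc, order=(p, 1, q)).fit()

# Diagnostic check: residuals should be indistinguishable from white noise.
print(acorr_ljungbox(fit.resid, lags=[10]))
```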
9. Classical autoregressive models
THE PROS:
- Good explainability
- Solid theoretical background
- Very explicit model
- A lot of control, as it is a manual process
THE CONS:
- Data is seldom stationary: trend, seasonality and cycles need to be modeled as well
- Computationally intensive (one model for each time series)
- No information sharing across time series (apart from Hyndman's hts approach*)
- Historical data are essential for forecasting (no cold-start)
* https://robjhyndman.com/publications/hierarchical/
Tech stack and packages:
- Rob Hyndman's online text: https://otexts.com/fpp2/
- The infamous auto.arima, plus ets, tbats, garch, stl...
- Python's Pyramid (since renamed pmdarima)
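For the Python route, a hedged usage sketch of pmdarima (the renamed Pyramid package named above), mirroring R's auto.arima; the seasonal period is an illustrative assumption:

```python
import pmdarima as pm

# Automatic order selection in the spirit of R's auto.arima;
# y is the univariate series from the sketch above, m the seasonal period.
model = pm.auto_arima(y, seasonal=True, m=12, suppress_warnings=True)
forecast = model.predict(n_periods=12)
```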
10. General machine learning approach for ts prediction
- Turn the time series prediction problem into a supervised learning problem (the autoregressive component: past values of Yt become features)
- Can use any number of methods (linear, trees, neural networks...)
- Easily extendable to support multiple input variables
- Covariates can be easily handled and transformed through feature engineering
E.g. feature engineering (see the sketch below):
- Aggregate histograms over time scales
- Transform into Fourier space
- Add low/high pass filters as variables
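A minimal sketch of this reframing with pandas and scikit-learn (the series name, lag count, and model choice are illustrative assumptions):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def make_supervised(series: pd.Series, n_lags: int = 7) -> pd.DataFrame:
    """Turn a univariate series into a table of lag features + target."""
    df = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        df[f"lag_{lag}"] = series.shift(lag)   # autoregressive features
    return df.dropna()

data = make_supervised(sales)                  # 'sales' is a hypothetical pd.Series
X, y = data.drop(columns="y"), data["y"]
model = GradientBoostingRegressor().fit(X, y)  # any regressor works here
```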
11. General machine learning approach for ts prediction
THE PROS:
- Can model non-linear relationships
- Can model the "hierarchical structure" of the time series through categorical variables
- Support for covariates (predictors) + feature engineering
- One model is shared among multiple time series
- Cold-start predictions are possible by iteratively feeding the predictions back into the feature space (see the recursive sketch below)
THE CONS:
- Feature engineering takes time
- Long-term relationships between data points need to be explicitly modeled (autoregressive features)
Tech stack and packages:
- Sklearn, PySpark for feature engineering, data reduction
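To illustrate the feed-back idea for multi-step prediction, a hedged sketch reusing the lag-feature model above (function and variable names are illustrative):

```python
import numpy as np

def recursive_forecast(model, last_lags, horizon: int) -> np.ndarray:
    """Predict one step at a time, feeding each prediction back as lag_1."""
    lags, preds = list(last_lags), []          # lags ordered lag_1 ... lag_n
    for _ in range(horizon):
        y_hat = model.predict(np.asarray(lags).reshape(1, -1))[0]
        preds.append(y_hat)
        lags = [y_hat] + lags[:-1]             # newest prediction becomes lag_1
    return np.asarray(preds)
```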
12. Bayesian AR models (Facebook Prophet)
Prophet is a Bayesian GAM (Generalized Additive Model)*: sales(t) = (linear trend with changepoints) + (seasonal component) + (holiday-specific component).**
The trend is piecewise linear, with a base growth rate plus a growth-rate adjustment at each changepoint:
1) Detect changepoints in the time series
2) Fit the linear trend parameters (k and delta)
* Implemented using STAN
** An additional 'offset' term has been omitted from the formula
13. Bayesian AR models (Facebook Prophet)
The seasonal component is a Fourier series, s(t) = sum_{n=1}^{N} (a_n cos(2*pi*n*t/P) + b_n sin(2*pi*n*t/P)), with period P (e.g. P = 365 for yearly data). This means estimating 2N parameters (a_n and b_n) using MCMC!
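A minimal usage sketch of the Python client (assuming a dataframe with the ds/y columns the library expects; the horizon is illustrative):

```python
from prophet import Prophet

# df has columns 'ds' (dates) and 'y' (observed sales).
m = Prophet(yearly_seasonality=True)
m.fit(df)
future = m.make_future_dataframe(periods=90)
forecast = m.predict(future)   # yhat plus yhat_lower/yhat_upper uncertainty bands
```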
14. Bayesian AR models
THE PROS:
- Uncertainty estimation
- Bayesian changepoint detection
- User-in-the-loop paradigm (Prophet)
- Black-box variational inference is revolutionizing Bayesian inference
THE CONS:
- Bayesian inference takes time (the "scale" issue)
- One model for each time series
- No information sharing among series (unless you specify a hierarchical Bayesian model with shared parameters, but still...)
- Historical data are needed for prediction!
- Performance is often on par* with autoregressive models
* This may open endless discussions. Bottom line: depends on your data :)
Tech stack and packages:
- Python/R clients for Prophet (Taylor et al., Forecasting at scale)
- R package for structural Bayesian time series models: bsts
15. Interlude: uncertainty estimation with deep learning
- Uncertainty estimation is a prerogative of Bayesian methods.
- Black-box variational inference (ADVI) has sparked renewed interest in Bayesian neural networks, but we are not there yet in terms of performance.
- A DeepMind paper from NIPS 2017 introduces a simple yet effective way to estimate predictive uncertainty using Deep Ensembles.
References: 1) For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html 2) "Engineering Uncertainty Estimation in Neural Networks for Time Series Prediction at Uber": https://eng.uber.com/neural-networks-uncertainty-estimation/
16. Interlude: Deep Ensembles
(1) Train a deep learning model using a custom final layer which parametrizes a Gaussian distribution (the network output is the pair of Gaussian parameters).
(2) Calculate the loss to backpropagate the error (using the Gaussian likelihood).
(3) Sample x from the Gaussian distribution using the fitted parameters.
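A minimal TensorFlow sketch of steps (1) and (2), in the spirit of the implementation linked above (layer sizes, input width, and names are illustrative assumptions):

```python
import tensorflow as tf

def gaussian_nll(y_true, y_pred):
    """Negative log-likelihood of y_true under N(mu, sigma^2)."""
    mu, sigma = y_pred[..., :1], y_pred[..., 1:]
    return tf.reduce_mean(
        0.5 * tf.math.log(sigma ** 2) + 0.5 * tf.square(y_true - mu) / sigma ** 2
    )

n_features = 10                                # hypothetical input width
inputs = tf.keras.Input(shape=(n_features,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)
mu = tf.keras.layers.Dense(1)(h)
sigma = tf.keras.layers.Dense(1, activation="softplus")(h)  # keeps sigma > 0;
                                               # add a small epsilon in practice
model = tf.keras.Model(inputs, tf.keras.layers.Concatenate()([mu, sigma]))
model.compile(optimizer="adam", loss=gaussian_nll)
```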
17. Interlude: Deep Ensembles
What the network is learning: different regions of the x space have different variances.
- Generate a synthetic training dataset with different variances.
- Use the network from the previous slide to predict on the training set, to see if it actually detects the variance reduction.
[Figures: synthetic training dataset; prediction on the training dataset]
18. Interlude: Deep Ensembles
- The authors suggest training different NNs on the same data (the whole training set) with random initialization.
- Ensembling the networks improves generalization power; predictions come from a uniformly weighted mixture model.
- Predictions for regions outside of the training dataset show increasing variance (due to ensembling).
- In addition to 'distribution' modeling and ensembling, the authors suggest using the fast gradient sign method* to produce adversarial training examples (not shown here).
* Goodfellow et al., 2014
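The uniformly weighted mixture can be collapsed into a single predictive mean and variance per point; a numpy sketch of the paper's aggregation rule (array shapes are illustrative):

```python
import numpy as np

def ensemble_gaussian(mus: np.ndarray, sigmas: np.ndarray):
    """Combine M Gaussian heads, arrays of shape [M, N], into one N(mu*, sigma*^2)."""
    mu_star = mus.mean(axis=0)
    # Variance of a uniform mixture: E[sigma^2 + mu^2] - (E[mu])^2
    var_star = (sigmas ** 2 + mus ** 2).mean(axis=0) - mu_star ** 2
    return mu_star, np.sqrt(var_star)
```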
19. Interlude: Deep Ensembles
Custom GaussianLayer: let's just do some extra work and define a custom layer.
For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html
21. DeepAR (Amazon)
Instead of fitting separate models for each time series, we create a global model from related time series (~1000s of series, on widely different scales) and handle the widely-varying scales through rescaling and velocity-based sampling.
[Figure: past/future windows with covariates; probabilistic forecasts]
Flunkert et al., 2017
22. DeepAR (Amazon)
- Use an LSTM to capture interactions in the time series (unrolled hidden states h_{t-1}, h_t, h_{t+1}).
- As seen with the Deep Ensemble architecture, we can predict the parameters of a distribution at each time point (the theta vector).
- Time series need to be scaled for the network to learn time-varying dynamics.
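The paper scales each series by an item-dependent factor so inputs land on comparable magnitudes; a hedged numpy sketch (the exact scale definition is paraphrased, and the variable names are illustrative):

```python
import numpy as np

def scale_series(z: np.ndarray):
    """Scale by nu = 1 + mean(z), roughly as in the DeepAR paper."""
    nu = 1.0 + np.mean(z)
    return z / nu, nu

z_scaled, nu = scale_series(z_raw)   # z_raw: hypothetical non-negative sales series
# Train on z_scaled; at prediction time, multiply sampled outputs back by nu.
```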
24. DeepAR (Amazon)
The mandatory 'AirPassengers' prediction example (results shown on the training set). Granted, this is not the use case Amazon had in mind...
For a commentary + code review: https://arrigonialberto86.github.io/funtime/deepar.html
25. DeepAR (Amazon)
- Long-term relationships are handled by design using LSTMs
- One model is fitted for all the time series
- The hierarchical ts structure and inter-dependencies are captured by using covariates (even holidays, recurrent events etc...)
- The model can be used for cold-start predictions (using categorical covariates with 'descriptive' product information)
- Hassle-free uncertainty estimation
DeepAR and the AWS ecosystem: AWS SageMaker
26. Deep State Space (NIPS 2018)*
A state space model (SSM) is just like a Hidden Markov Model, except the hidden states are continuous. It consists of a latent state (l_t) update equation and an observation (z_t) equation. In normal settings we would need to fit these parameters separately for each time series.
* Rangapuram et al., 2018, Deep State Space Models for Time Series Forecasting
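For concreteness, a generic linear-Gaussian SSM in standard notation (the paper's exact parametrization differs in details):

l_t = F_t l_{t-1} + g_t \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, 1) (latent state update)
z_t = a_t^{\top} l_t + b_t + \sigma_t \nu_t, \quad \nu_t \sim \mathcal{N}(0, 1) (observation update)

In the deep variant, these time-varying parameters are emitted by an RNN shared across series rather than fitted per series.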
27. Deep State Space (NIPS 2018)
Training: compute the negative likelihood and derive the time-varying SS parameters using backpropagation.
Prediction: use Kalman filtering to estimate l_t, then recursively apply the transition equation and the observation model to generate prediction samples.
28. Deep State Space (NIPS 2018)
- Long-term relationships are handled by design using LSTMs
- One model is fitted for all the time series
- The hierarchical ts structure and inter-dependencies are captured by ad-hoc design and components of the SS model (even holidays, recurrent events etc...)
- The model can be used for cold-start predictions (using categorical covariates with 'descriptive' product information)
29. Going forward: Deep Factors with GPs*
The combination of probabilistic graphical models with deep neural networks has been an active research area recently.
Global DNN backbone and local Gaussian Process (GP): the main idea is to represent each time series as a combination of a global time series and a corresponding local model. Backpropagation finds the RNN parameters that produce the global factors (g_t), plus the GP hyperparameters; each observation z_{i,t} depends on the global factors and the covariates.
* Maddix et al., "Deep Factors with Gaussian Processes for Forecasting", NIPS 2018
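Schematically (hedged; the paper's notation differs in details), each series combines shared global factors with a local GP residual:

z_{i,t} = \sum_k w_{i,k}\, g_k(t) + r_i(t), \qquad r_i \sim \mathcal{GP}(0, K_i)

where the g_k are produced by the global RNN backbone and r_i is the local model.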
30. M4 forecasting competition winner algo (Uber, 2018)
The winning idea is often the simplest!
The hybrid Exponential Smoothing-Recurrent Neural Network (ES-RNN) method mixes hand-coded parts, like the ES formulas, with a black-box recurrent neural network (RNN) forecasting engine. The RNN consumes a deseasonalized and normalized vector of covariates plus the previous state, and its results become part of a parametric model.
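A hedged sketch of the hand-coded ES side, assuming multiplicative Holt-Winters-style level/seasonality updates (smoothing constants and initialization are illustrative); the RNN would then consume these deseasonalized, normalized values:

```python
import numpy as np

def es_deseasonalize(y: np.ndarray, m: int, alpha: float = 0.1, beta: float = 0.1):
    """Multiplicative level/seasonality updates; returns y / (level * season)."""
    level = y[:m].mean()
    season = list(y[:m] / level)               # one seasonal index per period slot
    normalized = []
    for t, obs in enumerate(y):
        s = season[t]                          # seasonal index from m steps back
        level_new = alpha * obs / s + (1 - alpha) * level
        season.append(beta * obs / level_new + (1 - beta) * s)
        normalized.append(obs / (level_new * s))
        level = level_new
    return np.asarray(normalized)
```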
34. DeepAR (Amazon)
Training procedure (steps 1-3):
- Predict the distribution parameters (e.g. mu, sigma)
- Compute the likelihood of the prediction (can be Gaussian, as we have seen with Deep Ensembles)*
- Sample the next point
* The likelihood/loss is customizable: Gaussian, or negative binomial for count data with overdispersion
Prediction works analogously, ~ Monte Carlo: sampled points are fed back in to unroll the forecast.
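A hedged sketch of the Monte Carlo prediction loop (ancestral sampling), assuming a step function that maps the last value and hidden state to (mu, sigma) and a new state; all names are illustrative:

```python
import numpy as np

def monte_carlo_forecast(step_fn, z_last, h_last, horizon, n_samples=200):
    """step_fn(z, h) -> (mu, sigma, h_new). Returns sample paths [n_samples, horizon]."""
    paths = np.empty((n_samples, horizon))
    for s in range(n_samples):
        z, h = z_last, h_last
        for t in range(horizon):
            mu, sigma, h = step_fn(z, h)
            z = np.random.normal(mu, sigma)    # sample the next point, feed it back
            paths[s, t] = z
    return paths                               # empirical quantiles give intervals
```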