Time Series Analysis With Python

This document provides an introduction and overview of time series analysis using Python. It discusses installing necessary packages like Pandas, NumPy and Statsmodels. It outlines key concepts like dealing with dates in Pandas, common time series tools like linear regression and spectral analysis, and pre-processing techniques like de-trending and removing seasonality from time series data to make it stationary before forecasting. Autocorrelation and partial autocorrelation are also introduced as tools to help identify structures in time series data. Moving average and autoregressive processes are defined as common models for time series forecasting.


TIME SERIES ANALYSIS WITH

PYTHON

Aileen Nielsen
July 13, 2016
aileen.a.nielsen@gmail.com
INSTALLATION INSTRUCTIONS

• Please install Conda per ‘quick install’ instructions: http://conda.pydata.org/docs/install/quick.html
• Make sure you have the following packages installed:
• pandas
• numpy
• statsmodels
• scikit-learn
• scipy

• These would be good to have but are not essential:
• pytz
• hmmlearn
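
A quick way to confirm the packages above are available is sketched below. It only looks each module up without importing it. The import names are an assumption of this sketch (scikit-learn is imported as `sklearn`), and `is_installed` is a hypothetical helper, not part of any library.

```python
import importlib.util

# Import names for the packages listed above (scikit-learn imports as "sklearn").
required = ["pandas", "numpy", "statsmodels", "sklearn", "scipy"]
optional = ["pytz", "hmmlearn"]


def is_installed(module_name):
    """Return True if the module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


status = {name: is_installed(name) for name in required + optional}
for name, ok in status.items():
    print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```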
OUTLINE

• Why time series?
• Quick Pandas intro
• Dealing with dates in Pandas
• Reading + manipulating time-stamped data
• Common time series analytical tools
• Prediction
• Classification
CAVEATS

• Time series analysis is a particularly tricky & controversial field
• I’ll give some background as we move ahead, but you need to read more when you want to do a real analysis
• Tests for goodness of fit, etc., are particularly error prone in time series analysis
• Whenever I don’t specify a distribution, but should, assume the ‘error’ terms are iid normally distributed
WHAT’S SPECIAL ABOUT
TIME SERIES?
WHERE DO TIME SERIES POP UP?

• Many of the most controversial questions arise from time series analysis
• Whenever we want to know the future, we’re pretty much stuck with time series analysis
• Ditto for thinking about causality in ‘natural experiments’

http://www.amstat.org/publications/jse/v21n1/witt.pdf
http://www.forbes.com/sites/neilhowe/2015/05/28/whats-behind-the-decline-in-crime/#3e1e6be07733
SPEECH RECOGNITION

PHYSICS EXPERIMENTS
ECONOMICS, GOVERNMENT, POLICY

http://freakonomics.com/2011/05/25/mining-for-correlations-it-works/
IN THE NEWS

http://www.forbes.com/sites/jasonevangelho/2016/07/10/pokemon-go-about-to-surpass-twitter-in-daily-active-users/#5c27d4825174
BE CAREFUL! IT’S ESPECIALLY TRUE
FOR TIME SERIES THAT YOU NEED
TO KNOW SOMETHING ABOUT
YOUR DATA

http://twentytwowords.com/funny-graphs-show-correlation-between-completely-unrelated-stats-9-pictures/
10 MINUTES TO PANDAS
PANDAS USES ‘DATA FRAMES’ TO
PUT DATA IN AN EASY-TO-USE
FORMAT

• All the convenience of a SQL-like API, but better
• Fast (in-place) if you know what you’re doing
• Fast read/write to standard storage formats like csv and databases
• Let’s look at a quick notebook of examples
PANDAS DATA FRAMES ARE BUILT
FOR WHAT YOU WANT TO DO
ANYWAY

http://pandas.pydata.org/pandas-docs/stable/10min.html
DEALING WITH TIME
PANDAS FUNCTIONALITY

• Generate sequences of fixed-frequency dates and time spans
• Conform or convert time series to a particular frequency
• Compute ‘relative’ dates based on various non-standard time increments (e.g. 5 business days before the last day of the year) or ‘roll’ dates backward and forward
http://pandas.pydata.org/pandas-docs/stable/timeseries.html
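
The three capabilities listed above can be sketched roughly as follows. The dates and values are invented for illustration. The business-day offset skips weekends only and knows nothing about public holidays.

```python
import pandas as pd

# 1. Fixed-frequency sequence: five consecutive calendar days.
days = pd.date_range("2016-07-11", periods=5, freq="D")

# 2. Convert a series observed every second day to daily frequency,
#    carrying the last known value forward into the new slots.
every_other_day = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2016-07-11", periods=3, freq="2D"),
)
daily = every_other_day.asfreq("D", method="ffill")

# 3. Relative date: 5 business days before the last day of 2015.
year_end = pd.Timestamp("2015-12-31")
five_bdays_before = year_end - pd.offsets.BDay(5)

print(days)
print(daily)
print(five_bdays_before)
```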
PANDAS TIME-RELATED DATA TYPES

http://pandas.pydata.org/pandas-docs/stable/timeseries.html
DATEOFFSET COMPONENTS

http://pandas.pydata.org/pandas-docs/stable/timeseries.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html
LAG FUNCTIONS
http://stackoverflow.com/questions/18300270/lag-in-time-series-regression-using-libsvm
http://www.spss-tutorials.com/spss-lag-function/
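
A lag simply pairs each observation with an earlier one. A minimal sketch with pandas `shift`, using a made-up `sales` series:

```python
import pandas as pd

sales = pd.Series(
    [10.0, 12.0, 15.0, 11.0],
    index=pd.date_range("2016-07-11", periods=4, freq="D"),
    name="sales",
)

lagged = pd.DataFrame({
    "sales": sales,
    "sales_lag1": sales.shift(1),  # value from the previous day
    "sales_lag2": sales.shift(2),  # value from two days earlier
})
print(lagged)
```

The first rows of the lagged columns are NaN because no earlier observation exists for them.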
WINDOW FUNCTIONS
Rolling window Expanding window
Why use a rolling window function?

What’s a little funky here?
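
As a small illustration of the two window types (toy numbers, not the data shown on the slide): a rolling window always covers the same number of recent points, while an expanding window covers everything seen so far.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: mean of the latest 3 points; undefined until 3 points exist.
rolling_mean = s.rolling(window=3).mean()

# Expanding: mean of all points up to and including the current one.
expanding_mean = s.expanding().mean()

print(pd.DataFrame({"value": s, "rolling": rolling_mean, "expanding": expanding_mean}))
```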


LINEAR REGRESSION
LINEAR REGRESSION INTUITION

http://datascience.stackexchange.com/questions/6787/is-decision-tree-algorithm-a-linear-or-nonlinear-algorithm
SPECTRAL ANALYSIS
INTUITION

• Decompose a time series into a sum of many sine and cosine functions
• The coefficients of these functions should have uncorrelated values
• Regression on sinusoids

https://en.wikipedia.org/wiki/Fourier_transform
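
A minimal sketch of the idea, assuming an evenly sampled signal: build a series from two known sinusoids and recover their frequencies with NumPy's FFT. The sampling rate and the 5 Hz and 12 Hz components are invented for the example.

```python
import numpy as np

fs = 100.0                  # sampling rate in Hz (assumed for this example)
t = np.arange(200) / fs     # 2 seconds of evenly spaced samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Real-input FFT: one complex coefficient per non-negative frequency.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Scale so that a pure sinusoid of amplitude A shows up as roughly A.
amplitude = 2 * np.abs(spectrum) / len(signal)

peak_freq = freqs[np.argmax(amplitude)]
print("dominant frequency (Hz):", peak_freq)
```

Both components sit exactly on a frequency bin here, so there is no leakage. Real data is rarely that tidy.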
EXAMPLES

1. What are the advantages?
2. When would this provide useful information?
3. When would this *not* provide useful information?

https://faculty.washington.edu/dbp/PDFFILES/GHS-AP-Stat-talk.pdf
SPECTRA-BASED FIT CAN BE
SURPRISINGLY GOOD
SPECTRAL ANALYSIS TURNS UP
EVERYWHERE

Astronomical data Paleo-climate proxy data Biology experiments


PRE-PREDICTION MUNGING
&
STATIONARITY
STATIONARY TIME SERIES
…NOT ALWAYS OBVIOUS

(a) Dow Jones index on 292 consecutive days; (b) Daily change in Dow Jones index on 292 consecutive days; (c) Annual number of strikes in the US; (d) Monthly sales of new one-family houses sold in the US; (e) Price of a dozen eggs in the US (constant dollars); (f) Monthly total of pigs slaughtered in Victoria, Australia; (g) Annual total of lynx trapped in the McKenzie River district of north-west Canada; (h) Monthly Australian beer production; (i) Monthly Australian electricity production

Stationary: (b) & (g)

https://www.otexts.org/fpp/8/1
DIFFERENCING TO CREATE
STATIONARY TIME SERIES

https://www.otexts.org/fpp/8/1
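
Differencing can be sketched with a simulated random walk (an assumed toy series, not the Dow Jones data referenced above): the walk itself wanders, but its first difference is just the underlying noise.

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.normal(size=500)   # iid normal shocks
walk = np.cumsum(steps)        # random walk: non-stationary

# First difference: X_t - X_{t-1}. This recovers the shocks.
diffed = np.diff(walk)

print("variance of walk, first vs second half:",
      walk[:250].var(), walk[250:].var())
print("variance of differences, first vs second half:",
      diffed[:250].var(), diffed[250:].var())
```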
YOU NEED TO REMOVE THE TREND
AND SEASONAL ELEMENTS BEFORE
FORECASTING

• Most (interesting) data in the real world will show
• Trends
• Seasonality
• Most models require data that shows neither of these properties to say something interesting
• In particular, we need a stationary time series
DE-TREND YOUR DATA

Use local smoothing or a linear regression
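
A minimal sketch of the linear-regression option, assuming a synthetic series with a known straight-line trend: fit a line with `np.polyfit` and subtract it.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
series = 0.5 * t + 10 + rng.normal(scale=2.0, size=t.size)  # trend + noise

# Fit a straight line (degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(t, series, deg=1)
trend = slope * t + intercept

# What is left after removing the fitted trend.
detrended = series - trend

print("estimated slope:", slope)
print("mean of detrended series:", detrended.mean())
```

Local smoothing would replace the single fitted line with a moving estimate. It is not shown here.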


SEASONALITY

https://onlinecourses.science.psu.edu/stat510/node/57
REMOVE SEASONALITY

• Simplest: average de-trended values for specific season
• More common: use ‘loess’ method (‘locally weighted scatterplot smoothing’)
• Window of specified width is placed over the data
• A weighted regression line or curve is fitted to the data, with points closest to center of the window having greatest weight
• Weighting is reduced on points farthest from regression line/curve and calculation is rerun several times.
• This yields one point on loess curve
• Helps reduce impact of outlier points
• Computationally taxing
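
The ‘simplest’ option above (not loess) can be sketched as follows, assuming monthly data that has already been de-trended. The sinusoidal seasonal pattern and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_years, period = 10, 12

# Month index 0..11 repeated for each year.
month = np.tile(np.arange(period), n_years)

# Synthetic de-trended data: a fixed monthly pattern plus noise.
true_pattern = 3.0 * np.sin(2 * np.pi * np.arange(period) / period)
detrended = true_pattern[month] + rng.normal(scale=0.5, size=month.size)

# Average all Januaries, all Februaries, and so on.
seasonal_means = detrended.reshape(n_years, period).mean(axis=0)

# Subtract each month's average from the observations in that month.
deseasonalized = detrended - seasonal_means[month]

print(np.round(seasonal_means, 2))
```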
DICKEY-FULLER TEST

• Tests the null hypothesis that a unit root is present in an autoregressive model
• In plain English, tests whether ρ = 1 in

X_t = ρ X_{t-1} + ε_t
• The test gives back several values to help you assess significance with standard p-value reasoning.
• Basic intuition: for a stationary series, ρ should not have unit value
SELF-CORRELATION, SELF-
EXPLANATION, AND SELF-
PREDICTION
AUTOCORRELATION FUNCTION

• Used to help identify possible structures of time series data
• Gives a sense of how different points in time relate to each other in a way explained by temporal distance

https://en.wikipedia.org/wiki/Autocorrelation
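
To make the definition concrete, here is a small hand-rolled sample autocorrelation (the hypothetical helper `autocorr`; statsmodels provides its own ACF routine, which is not used here). The alternating toy series makes the pattern easy to see: neighbouring points are almost perfectly anti-correlated, and points two steps apart are almost perfectly correlated.

```python
import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([
        np.sum(x[lag:] * x[:len(x) - lag]) / denom
        for lag in range(max_lag + 1)
    ])


alternating = np.tile([1.0, -1.0], 50)   # 1, -1, 1, -1, ... (100 points)
acf_values = autocorr(alternating, max_lag=3)
print(acf_values)
```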
PARTIAL AUTOCORRELATION FUNCTION

• “gives the partial correlation of a time series with its own lagged values, controlling for the values of time series at all shorter lags”
• Why would this be useful?

https://en.wikipedia.org/wiki/Partial_autocorrelation_function &
https://www.mathworks.com/matlabcentral/answers/160904-interpretation-of-autocorrelation-function-to-determine-number-of-lags-in-ar-model?requestedDomain=www.mathworks.com
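
One way to see why the PACF is useful: for an AR(p) process it should be near zero beyond lag p. The sketch below estimates the PACF at lag k as the last coefficient of a least-squares regression of X_t on its first k lags. This is one common estimator, written out by hand as the hypothetical helper `pacf_ols`, and applied to a simulated AR(1) series with an assumed coefficient of 0.7.

```python
import numpy as np


def pacf_ols(x, max_lag):
    """PACF at lag k = last coefficient of an OLS fit of x_t on lags 1..k."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    values = [1.0]
    for k in range(1, max_lag + 1):
        # Columns are the series lagged by 1, 2, ..., k, aligned with x[k:].
        lags = np.column_stack([x[k - j:len(x) - j] for j in range(1, k + 1)])
        coeffs, *_ = np.linalg.lstsq(lags, x[k:], rcond=None)
        values.append(coeffs[-1])
    return np.array(values)


# Simulated AR(1): X_t = 0.7 X_{t-1} + eps_t
rng = np.random.default_rng(4)
n, phi = 5000, 0.7
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

pacf_values = pacf_ols(x, max_lag=3)
print(np.round(pacf_values, 3))
```

The lag-1 value should land near 0.7 and the higher lags near zero, which is the signature used to pick the order of an AR model.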
FORECASTING
MOVING AVERAGE PROCESS
(MA)

• Defined as having the form:

X_t = µ + ε_t + θ_1 ε_{t-1} + … + θ_q ε_{t-q}
• µ is the mean of the series, θ_1, …, θ_q are parameters, θ_q not 0
• This is a stationary process regardless of values of θ
• Consider an MA(1) process (centered at 0):

X_t = ε_t + θ_1 ε_{t-1}
Panels: θ_1 = +.5 and θ_1 = -.5

Time Series Analysis and Its Applications, Robert H. Shumway and David S. Stoffer
https://en.wikipedia.org/wiki/Moving-average_model
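
The two MA(1) cases above can be simulated directly from the definition. This is a sketch with iid standard normal ε (an assumption of the example); `simulate_ma1` and `lag1_autocorr` are hypothetical helpers. For MA(1) the theoretical lag-1 autocorrelation is θ_1 / (1 + θ_1²), so about +0.4 and -0.4 here.

```python
import numpy as np


def simulate_ma1(theta, n, rng):
    """X_t = eps_t + theta * eps_{t-1}, centered at 0."""
    eps = rng.normal(size=n + 1)
    return eps[1:] + theta * eps[:-1]


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)


rng = np.random.default_rng(5)
r_pos = lag1_autocorr(simulate_ma1(0.5, 20000, rng))
r_neg = lag1_autocorr(simulate_ma1(-0.5, 20000, rng))

print("theta = +0.5: lag-1 autocorrelation ~", round(r_pos, 3))
print("theta = -0.5: lag-1 autocorrelation ~", round(r_neg, 3))
```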
AUTOREGRESSIVE PROCESS
(AR)

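• Defined as having the form:

X_t = φ_1 X_{t-1} + … + φ_p X_{t-p} + ε_t
• An AR(1) process is stationary if abs(φ_1) < 1; for general p, the roots of 1 - φ_1 z - … - φ_p z^p must lie outside the unit circle
• Consider an AR(1) process:

X_t = φ_1 X_{t-1} + ε_t

Panels: φ_1 = +.9 and φ_1 = -.9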

Time Series Analysis and Its Applications, Robert H. Shumway and David S. Stoffer
https://en.wikipedia.org/wiki/Moving-average_model
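
The two AR(1) cases above, simulated from the definition with iid standard normal ε and a start value of 0 (both assumptions of this sketch; `simulate_ar1` and `lag1_autocorr` are hypothetical helpers). With φ_1 = +.9 neighbouring values move together. With φ_1 = -.9 the series flips sign almost every step.

```python
import numpy as np


def simulate_ar1(phi, n, rng):
    """X_t = phi * X_{t-1} + eps_t, started at X_0 = 0."""
    eps = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)


rng = np.random.default_rng(6)
x_pos = simulate_ar1(0.9, 5000, rng)
x_neg = simulate_ar1(-0.9, 5000, rng)

r_pos = lag1_autocorr(x_pos)
r_neg = lag1_autocorr(x_neg)
print("phi = +0.9: lag-1 autocorrelation ~", round(r_pos, 3))
print("phi = -0.9: lag-1 autocorrelation ~", round(r_neg, 3))
```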
ARIMA MODEL (A.K.A. BOX-JENKINS)

• AR = autoregressive terms
• I = differencing
• MA = moving average
• Hence specified as (autoregressive terms, differencing terms, moving average terms), commonly written (p, d, q)
ARIMA MODEL: ‘THE MOST GENERAL
CLASS OF MODELS FOR FORECASTING
A TIME SERIES WHICH CAN BE
MADE TO BE STATIONARY’

• Statistical properties (mean, variance) constant over time
• ‘its short-term random time patterns always look the same in a statistical sense’
• Autocorrelation function & power spectrum remain constant over time
• Ok to do non-linear transformations to get there
• ARIMA model can be viewed as a combination of signal and noise
• Extrapolate the signal to obtain forecasts

http://people.duke.edu/~rnau/411arim.htm
APPLYING THE APPROPRIATE
ARIMA MODEL

• Need to determine what ARIMA model to use
• Use plot of the data, the ACF, and the PACF
• With the plot of the data: look for trend (linear or otherwise) & determine whether to transform data
• Most software will use a maximum likelihood estimation to determine appropriate ARIMA parameters
ARIMA ANALYSIS TURNS UP
EVERYWHERE
LEARN MORE…

Vector auto-regression works similarly for cases of multivariate time series

http://macromarketmusings.blogspot.com/2013/12/taking-model-to-data.html
CLUSTERING &
CLASSIFICATION
(YET ANOTHER ROUTE TO
PREDICTION)
NEED TO THINK CAREFULLY ABOUT
DISTANCE METRIC

http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWalgorithm.htm &
http://alexminnaar.com/time-series-classification-and-clustering-with-python.html
DYNAMIC TIME WARPING

http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWalgorithm.htm &
http://alexminnaar.com/time-series-classification-and-clustering-with-python.html
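
A textbook dynamic-programming sketch of the DTW distance, not necessarily the variant used in the sources linked above. It uses absolute difference as the point-wise cost and no window constraint, both of which are choices made for this example. `dtw_distance` is a hypothetical helper name.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance with |a_i - b_j| as local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # cost[i, j] = cheapest alignment of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # a[i-1] matched to an already-used b point
                cost[i, j - 1],      # b[j-1] matched to an already-used a point
                cost[i - 1, j - 1],  # both series advance together
            )
    return cost[n, m]


# Same shape, one series "stretched" in time: DTW sees them as identical.
same_shape = dtw_distance([0, 1, 2], [0, 1, 1, 2])
# Constant offset of 1 over three points: total cost 3.
shifted = dtw_distance([0, 0, 0], [1, 1, 1])
print(same_shape, shifted)
```

This runs in O(n·m) time, which is why windowed variants are often used on long series.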
APPLICATIONS

DTW-based clustering

DTW-based nearest neighbor classification


HIDDEN MARKOV MODELS
MULTI-STATE TIME-BASED
SYSTEMS

Hidden Markov Models are another way of thinking about time series classification.

https://en.wikipedia.org/wiki/Hidden_Markov_model
HOW IT WORKS

A transition matrix specifies the likelihood of transitioning between states.

http://web.mit.edu/9.29/www/neville_jen/hmm/
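
A small sketch of the transition-matrix idea only. The hidden/emission part of an HMM is left out, and `hmmlearn` is not used. The two states and their probabilities are made up for illustration.

```python
import numpy as np

states = ["calm", "volatile"]   # hypothetical state names

# P[i, j] = probability of moving from state i to state j.
# Each row must sum to 1.
P = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
])

# Simulate a short path through the states, starting in "calm".
rng = np.random.default_rng(7)
path = [0]
for _ in range(9):
    path.append(int(rng.choice(2, p=P[path[-1]])))
print([states[i] for i in path])

# Long-run share of time in each state: any row of a high power of P.
long_run = np.linalg.matrix_power(P, 100)[0]
print(dict(zip(states, np.round(long_run, 3))))
```

For this matrix the long-run split works out to 5/6 calm and 1/6 volatile.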
GET
MORE
PRACTICE
