Python vs R for time-series forecasting

Valeriy Manokhin, PhD, MBA, CQF
7 min read · Nov 28, 2021

TL;DR: When it comes to forecasting and time series in 2022, Python is the no-brainer lingua franca for forecasting R&D, and undoubtedly even more so for putting forecasting models into production.

A post from academia (one of the leading academic applied forecasting centres in the UK) recently asked the forecasting community: "Those working with time series in Python: what frustrations do you have?"

Woot? It is 2022, not 2012. One has to respond specifically on time series, without getting distracted by yet another general Python vs R debate.

There seems to be a lot of unhealthy fixation on R in academia, especially in applied forecasting academia. This article is not about the general Python vs R debate; it is specific to time series and forecasting. After all, for some fields such as biostatistics R is still the go-to lingua franca, and R still has its rightful place under the sun: at least under the statistical sun, if not under the machine learning sun.

Yet there is a lot of dogma and confusion about the current state of Python 🐍 for time series and forecasting, often promoted by ‘one trick R ponies.’ This fixation is not surprising given R's academic roots, its strong focus on statistics and the inability of some applied forecasting academics to learn new and better tools. And yet this fixation completely misses the point: when it comes to time series and forecasting, whether in academia, in industry R&D or elsewhere, Python already reigns supreme, and this is only going to accelerate further.

Yes, R has a lot of time series libraries. A lot of good work was done a while back by Rob Hyndman, including creating automatic ARIMA to make ARIMA available to anyone at the click of a button. For many years (about a decade or so) auto ARIMA in R was the go-to package, and that in itself created the false perception that R is the dominant language for time series and forecasting. A few good books about time series and forecasting were published at the time with R code, for example books by Prof. Ruey Tsay and Prof. Eric Zivot. Eric Zivot wrote a nice book about S for time series, but never finished the book about R for time series (an omen of things to come for R when it comes to forecasting?).

When this article was originally written in late 2021, the most commonly used forecasting tools were already available in Python 🐍, including automatic ARIMA and many, many more. The decline of R's dominance for time series started with the release of auto ARIMA in Python in 2017.

Roll forward to 2022: Nixtla, a newcomer open-source forecasting startup, arrived on the forecasting scene in force, rapidly releasing innovation after innovation. The Nixtla team cleverly identified an opportunity in the tools that many companies forecasting at scale needed. Nixtla rebuilt and accelerated many classical forecasting libraries, fixing some bugs along the way, such as the inability of statsmodels to process outliers correctly. It has also built industrial-grade forecasting models that are very fast.

Here are just some of Nixtla's releases in 2022:

Recently, Nixtla launched the fastest version of AutoARIMA for Python, built with Numba. Now you can scale your computation horizontally and build different benchmarks for millions of time series, leveraging the power of Ray clusters.
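
Horizontal scaling works here because every series is forecast independently of the others. The sketch below illustrates only that pattern, using the standard library's thread pool and a deliberately trivial seasonal-naive forecaster as a stand-in. It is not Nixtla's implementation: the function names `seasonal_naive` and `forecast_all` and the toy data are hypothetical, and a real deployment would swap in an actual model and a process pool or cluster backend.

```python
from concurrent.futures import ThreadPoolExecutor


def seasonal_naive(history, horizon, season_length):
    """Stand-in model (assumption): repeat the last observed season."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def forecast_all(series_by_id, horizon, season_length, max_workers=4):
    """Forecast every series independently; no series depends on another."""
    ids = list(series_by_id)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda sid: seasonal_naive(series_by_id[sid], horizon, season_length),
            ids,
        )
        # Consume the lazy iterator while the pool is still open.
        return dict(zip(ids, results))


# Two toy quarterly series (hypothetical data).
panel = {
    "store_a": [10, 20, 30, 40, 11, 21, 31, 41],
    "store_b": [5, 6, 7, 8, 5, 6, 7, 9],
}
forecasts = forecast_all(panel, horizon=6, season_length=4)
print(forecasts)
```

Because the per-series work shares no state, the same mapping can in principle be handed to any executor, from local processes to a cluster.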

Blazing fast super AutoArima. Python 🐍 only

Now, I don’t know about you, but I have been doing forecasting for quite a while, starting with econometrics before machine learning came on the scene. A blazing fast AutoArima is a literal must for any company doing large-scale forecasting. And when an open-source package can forecast 1 million time series in 30 minutes, data scientists should take notice. How do you counter that, good old R? Both R's AutoARIMA and pmdarima are, in contrast, very slow.
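
To put the quoted figure in perspective, here is the back-of-the-envelope arithmetic. It uses only the two numbers stated above (1 million series, 30 minutes) and says nothing about the hardware involved, so the result is aggregate throughput, not a per-core figure.

```python
n_series = 1_000_000
elapsed_seconds = 30 * 60  # 30 minutes

series_per_second = n_series / elapsed_seconds
ms_per_series = 1000 * elapsed_seconds / n_series

print(f"{series_per_second:.0f} series/s, {ms_per_series:.1f} ms per series")
```

That works out to roughly 556 series per second, or about 1.8 ms of wall-clock time per series.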

Not only is Nixtla's AutoARIMA super fast, it also leaves one of the most popular (but largely non-performant) tools, Facebook Prophet, literally in the dust. In a large-scale study by the Nixtla team (mind you, these are not your usual developers: these are the guys who also do cutting-edge R&D, publish in top machine learning journals and created the powerful N-HiTs model), Nixtla's AutoARIMA outperformed Facebook Prophet by around 17% in accuracy with 37x less computational time, cutting the computational cost from $296 for Prophet to $10 for Nixtla's AutoARIMA.

Bye-bye prophet: ARIMA is 17% more accurate and 37x faster than Facebook prophet.
Drop-in replacement for Prophet: turbocharge your forecasts and get higher profits
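
For readers who like to see the numbers side by side, the small calculation below derives the cost ratio and the relative saving from the figures quoted in the study ($296 vs $10, 37x less compute time). The variable names are mine; nothing here comes from the benchmark code itself.

```python
prophet_cost_usd = 296
arima_cost_usd = 10
time_speedup = 37  # as quoted, not derived here

cost_ratio = prophet_cost_usd / arima_cost_usd
cost_saving = 1 - arima_cost_usd / prophet_cost_usd

print(f"cost ratio: {cost_ratio:.1f}x, saving: {cost_saving:.1%}")
```

In other words, the quoted bill shrinks by a factor of about 29.6, a saving of roughly 96.6%.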

So we have a time-proven, Nobel-prize-grade model (ARIMA) that both performs excellently and is blazing fast. In Python 🐍 only, that is. Still want to stick to R with its slow AutoArima? Keep reading.

Another innovation from the Nixtla team is the recent launch of a low-latency transfer learning API for time series forecasting. The API allows anyone to compute forecasts in milliseconds ⏲️ using pre-trained deep learning models, saving the costs associated with training time. This is the first time any researcher, company or developer can get accurate forecasts through their commonly used stack (JavaScript, Ruby, PHP, etc.). This is literally a ‘killer app’ feature from any company's perspective: most companies do not have a lot of data, so pre-trained models are a literal goldmine. If nothing else, they can serve as an extremely fast and good benchmark in the forecasting R&D journey.

Blazing fast transfer learning with deep learning N-HiTs trained on ‘M’ forecasting competitions data

As they say, all good things come in threes (the Latin principle known as omne trium perfectum or, translated into English, the rule of three). Confucius mentioned the rule of three in 500 B.C. in the “Analects” when he wrote: “Ko Wan Tze thought thrice before acting.”

So the third innovation from the Nixtla team was state-of-the-art probabilistic forecasting!
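
A probabilistic forecast predicts quantiles (for example a 90% upper bound) rather than a single number. One common way to score such forecasts is the quantile, or pinball, loss. The sketch below is a plain-Python illustration of that standard formula; the function name and the toy data are hypothetical, and I am not claiming this is exactly how Nixtla trains or evaluates its models.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Average quantile (pinball) loss of predictions for one quantile level."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(quantile * error, (quantile - 1) * error)
    return total / len(y_true)


actuals = [100, 120, 90]
p90_forecast = [110, 110, 110]  # hypothetical 90% quantile forecast
loss = pinball_loss(actuals, p90_forecast, quantile=0.9)
print(loss)
```

With a 0.9 quantile, missing low (the second point) costs nine times more per unit than missing high, which is what pushes a model trained on this loss towards the upper part of the distribution.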

But this article is not about one library, even though it is amazing what the Nixtla team has achieved in a very short space of time (gracias amigos).

You can check the full breadth and depth of the Nixtla offering for yourself here. In addition to the things mentioned above, it has a lot of classical algorithms (Statistical ⚡️ Forecast), machine learning (MLForecast, based on LightGBM), deep learning models (NeuralForecast) and even a Python port of Hyndman’s popular tsfeatures library.

Nixtla library for Open Source Time Series Forecasting

This is only ONE library in Python. To be fair to the many other great libraries that are developing quickly, there are also Darts, GluonTS, FEDOT, Greykite, ETNA and many more.

Back to R

It isn't easy to find packages in R. When did one last see an article on 'Total Data Science' about R for time series, or a conference talk about machine learning forecasting in R? That is a sure way to lose a machine learning or a business audience.

And now the best and greatest packages from R, such as AutoARIMA, are available in Python at blazing fast speed (R is slow...) and can easily be put into production. What could be the only conclusion?

Python is by far the dominant lingua franca for business AND machine learning. Key machine learning and deep learning libraries are written almost exclusively in Python 🐍 and Cython (a Python dialect used to speed up computations).

New cutting-edge time series packages (especially machine learning ones) are almost exclusively released in Python. Most modern time series forecasting books are being published with Python code. Python has moved forward in leaps and bounds during the last 5+ years in terms of developments for time series and forecasting.

R contains a lot of time series functionality but is primarily focused on classical forecasting models. On the other hand, most of the innovation for time series has been happening in the machine learning space.

Machine learning algorithms based on boosted trees, such as LightGBM, have dominated most Kaggle forecasting competitions in recent years.
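
Tree-based models do not consume a time series directly; the usual trick is to reframe forecasting as supervised regression on lagged values. Below is a minimal plain-Python sketch of that reframing. The helper name `make_lag_table` and the sales numbers are hypothetical, and real pipelines typically add calendar features, rolling statistics and so on.

```python
def make_lag_table(series, n_lags):
    """Return (features, targets): each row holds the n_lags values before its target."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])
        targets.append(series[t])
    return features, targets


sales = [12, 15, 14, 18, 21, 19]  # hypothetical data
X, y = make_lag_table(sales, n_lags=3)

for row, target in zip(X, y):
    print(row, "->", target)
```

The resulting `X` and `y` have the shape any tabular regressor expects, which is why boosted trees such as LightGBM slot into forecasting so naturally.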

Conclusion:

TL;DR: When it comes to forecasting and time series in 2022, Python is the no-brainer choice for forecasting R&D, and undoubtedly even more so for putting forecasting models into production.

Python covers it all: time series classification; forecasting using classical, machine learning and deep learning methods; anomaly detection; clustering and much more.

References:

  1. https://github.com/Nixtla/statsforecast
  2. https://github.com/Nixtla/transfer-learning-time-series
  3. https://github.com/Nixtla/neuralforecast
  4. https://github.com/Nixtla/statsforecast/tree/main/experiments/arima_prophet_adapter
  5. https://github.com/Nixtla/neuralforecast/blob/main/examples/mqnhits.ipynb
  6. https://fedot.readthedocs.io/en/latest/
  7. https://github.com/awslabs/gluon-ts
  8. https://github.com/kamilest/conformal-rnn

Valeriy Manokhin, PhD, MBA, CQF

Principal Data Scientist, PhD in Machine Learning, creator of Awesome Conformal Prediction