
As someone who's worked in time series forecasting for a while, I haven't yet found a use case for these "time series" focused deep learning models.

On extremely high-dimensional data (I worked at a credit card processing company doing fraud modeling), deep learning dominates, but there's simply no advantage in using a dedicated "time series" model that treats time differently from any other feature. We've tried most of the time series deep learning models that claim to be SoTA (N-BEATS, N-HiTS, every RNN variant that was popular pre-transformers), and none of them beat an MLP that just uses lagged values as features. I've talked to several others in the forecasting space and they've found the same result.
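For concreteness, a minimal sketch of that lagged-feature MLP baseline (the toy data, hyperparameters, and make_lagged helper here are my own illustration, not our production setup):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_lagged(series, n_lags):
        # Each row holds the n_lags values preceding the target step.
        X = np.column_stack([series[i : len(series) - n_lags + i] for i in range(n_lags)])
        y = series[n_lags:]
        return X, y

    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=1000))  # toy random-walk series

    X, y = make_lagged(series, n_lags=24)
    mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    mlp.fit(X[:-100], y[:-100])            # hold out the last 100 points
    print(mlp.score(X[-100:], y[-100:]))   # R^2 on the held-out tail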

On mid-dimensional data, LightGBM/XGBoost is by far the best and generally performs as well as or better than any deep learning model, while requiring much less fine-tuning and a tiny fraction of the computation time.
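Same framing with LightGBM swapped in (again just a sketch on toy data; the point is how little tuning it needs):

    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=1000))
    # Lagged values as features, next value as target (as in the MLP sketch).
    X = np.column_stack([series[i : len(series) - 24 + i] for i in range(24)])
    y = series[24:]

    gbm = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
    gbm.fit(X[:-100], y[:-100])
    print(gbm.score(X[-100:], y[-100:]))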

And on low-dimensional data, (V)ARIMA/ETS/factor models are still king, since without adequate data the model needs to be structured with human intuition.
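For reference, a sketch of those classical baselines with statsmodels (the ARIMA order here is an arbitrary illustration; picking it is exactly the part that takes human intuition):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=200))  # short toy series

    arima = ARIMA(series, order=(2, 1, 1)).fit()           # hand-picked order
    ets = ExponentialSmoothing(series, trend="add").fit()  # additive-trend ETS
    print(arima.forecast(steps=12))
    print(ets.forecast(12))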

As a result, I'm extremely skeptical of any of these claims about a generally high-performing "time series" model. Unlike a language model, a model trained on time series gains very limited understanding of the fundamental structure of how the world works, so the generalization ability it can pick up is very limited.




Great write-up, thank you. Do you have rough measures for what constitutes high-, mid-, and low-dimensional data? And how do you use XGBoost et al. for multi-step forecasting, i.e., in scenarios where you want to predict multiple time steps into the future?


Because they're so cheap to train, you can just use n models if you want to predict n steps ahead.

In sklearn, if you have a single-output regressor, use this for ergonomics: https://scikit-learn.org/stable/modules/generated/sklearn.mu...

The added benefit is that you optimize each regressor toward its own target timestep t+1 ... t+n. A single loss on the aggregate of all timesteps is often problematic.
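A sketch of that direct multi-step setup, assuming the truncated link above is sklearn's MultiOutputRegressor (toy data and the helper are mine):

    import numpy as np
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.ensemble import GradientBoostingRegressor

    def make_lagged_multistep(series, n_lags, n_steps):
        # Features: n_lags past values; targets: the next n_steps values.
        rows = len(series) - n_lags - n_steps + 1
        X = np.array([series[i : i + n_lags] for i in range(rows)])
        Y = np.array([series[i + n_lags : i + n_lags + n_steps] for i in range(rows)])
        return X, Y

    series = np.cumsum(np.random.default_rng(0).normal(size=1000))
    X, Y = make_lagged_multistep(series, n_lags=24, n_steps=6)

    # Fits 6 independent regressors, one per horizon t+1 ... t+6.
    model = MultiOutputRegressor(GradientBoostingRegressor()).fit(X, Y)
    print(model.predict(X[-1:]))  # 6-step forecast from the last window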


There have been recent advances in jointly fitting multi-output regression forests ("vector leaf"): https://xgboost.readthedocs.io/en/stable/tutorials/multioutp...

In theory, this might suit the multi-step forecast use case.
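A hedged sketch of what the linked tutorial describes (requires XGBoost >= 2.0; the data layout is the same lags-in, horizons-out table as in the sketch above):

    import numpy as np
    import xgboost as xgb

    series = np.cumsum(np.random.default_rng(0).normal(size=1000))
    n_lags, n_steps = 24, 6
    rows = len(series) - n_lags - n_steps + 1
    X = np.array([series[i : i + n_lags] for i in range(rows)])
    Y = np.array([series[i + n_lags : i + n_lags + n_steps] for i in range(rows)])

    # "multi_output_tree" grows one forest whose leaves emit the whole
    # 6-step forecast vector, instead of one tree ensemble per output.
    model = xgb.XGBRegressor(tree_method="hist", multi_strategy="multi_output_tree")
    model.fit(X, Y)
    print(model.predict(X[-1:]))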


I've found that it works well to add the prediction horizon as a numerical feature (e.g. # of days), and then replicate each row for many such horizons, while ensuring that all such rows go into the same training fold.
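Something like this, with the names and details being my own illustration (grouping by forecast origin keeps all replicas of a row in one fold):

    import numpy as np
    from sklearn.model_selection import GroupKFold

    def replicate_with_horizon(series, n_lags, horizons):
        X_rows, y_rows, groups = [], [], []
        for i in range(len(series) - n_lags - max(horizons)):
            window = series[i : i + n_lags]
            for h in horizons:
                X_rows.append(np.append(window, h))      # horizon as a feature
                y_rows.append(series[i + n_lags + h - 1])
                groups.append(i)                         # same origin, same fold
        return np.array(X_rows), np.array(y_rows), np.array(groups)

    series = np.cumsum(np.random.default_rng(0).normal(size=500))
    X, y, groups = replicate_with_horizon(series, n_lags=24, horizons=[1, 7, 28])
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
        pass  # fit any regressor on X[train_idx], evaluate on X[test_idx]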


Thanks for this write up. Your comment clears up a lot of the confusion I've had around these time series transformers.

How do lagged features for an MLP compare to longer sequence lengths for attention in transformers? Are you able to lag 128 time steps in a feed-forward network and still get good results?


I agree that conventional (numeric) forecasting can hardly benefit from the newest approaches like transformers and LLMs. I came to this conclusion while working on the intelligent trading bot [0] and experimenting with many ML algorithms. Yet there are some cases where transformers might provide significant advantages. They could be useful where (numeric) forecasting is augmented with discrete event analysis and where sequences of events are important. Another use case is where certain patterns are important, like those detected in technical analysis. For these cases, however, much more data is needed.

[0] https://github.com/asavinov/intelligent-trading-bot Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering


> I haven't yet found a use case for these "time series" focused deep learning models.

I guarantee you there will be chartists hawking GPT-powered market forecasts.


That is terrifying but inevitable. Lol, the back end will just be the ChatGPT API being asked "which stock should I buy next?"


What's terrifying about it?


Foundation models can work where "needs human intuition" has so far been the state of things. I can picture a time series model with a large enough training corpus dealing quite well with the typical quirks of seasonality, shocks, outliers, etc.

I fully agree regarding how things have been so far, but I’m excited to see practitioners try out models such as the one presented here — it might just work.


Reminds me a bit of how in psychology you have ANOVA, MANOVA, ANCOVA, MANCOVA, etc., but really, in the end, we are just running regressions: variables are just variables.
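The point is easy to demonstrate: a one-way ANOVA is literally an OLS regression on group dummies (statsmodels, with toy data of my own):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "group": np.repeat(["a", "b", "c"], 30),
        "y": rng.normal(size=90) + np.repeat([0.0, 0.5, 1.0], 30),
    })

    fit = smf.ols("y ~ C(group)", data=df).fit()  # regression with dummy coding
    print(anova_lm(fit))                          # ...read off as an ANOVA table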


Re: Causal inference [with observational data]: "The world needs computational social science" (2023) https://news.ycombinator.com/item?id=37746921


So fraud is consistent with respect to time?


My read on this was that you can just dump the lagged values in as inputs and let the network figure things out just as well as the dedicated time series models do, not that time doesn't matter.


I assume the time series modelling is used to predict normal non-fraud behaviour. And then simpler algorithms are able to highlight deviations from the norm?


I agree! I worked on forecasting sales data for years, and we had the same results.



