Data for building global time series forecast model with deep learning

Yeonjoo_Jung · January 19, 2022, 4:30pm

I enjoyed yesterday’s webinar on time series forecasting with Ray.

One question I didn’t have time to ask, re: building a global deep learning model.

We have 10K targets to forecast, but the data granularity is monthly. We have 10 years of historical data, so each time series has 120 rows. Is that still enough data to train a global deep learning model on a panel-format data set? The success criterion is whether the global model generally gives us more accurate results than the statistical, univariate models.
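One way to make that success criterion concrete is to compare hold-out errors per series and count how often the global model wins. The sketch below is only an illustration: the column names (series_id, actual, global_fc, univariate_fc), the toy numbers, and the choice of mean absolute error are all assumptions, not something from the webinar.

```python
import pandas as pd

# Hypothetical hold-out results: one row per (series, month) with the actual
# value and the forecast from each approach. All names and numbers are made up.
results = pd.DataFrame({
    "series_id": ["A", "A", "B", "B", "C", "C"],
    "actual": [100.0, 110.0, 20.0, 25.0, 5.0, 7.0],
    "global_fc": [98.0, 113.0, 21.0, 24.0, 9.0, 3.0],
    "univariate_fc": [90.0, 120.0, 23.0, 21.0, 6.0, 7.0],
})

# Mean absolute error per series for each model.
results["global_err"] = (results["global_fc"] - results["actual"]).abs()
results["univariate_err"] = (results["univariate_fc"] - results["actual"]).abs()
per_series = results.groupby("series_id")[["global_err", "univariate_err"]].mean()

# Share of series where the global model has the lower error.
global_win_rate = (per_series["global_err"] < per_series["univariate_err"]).mean()
```

With 10K targets, looking at the win rate alongside the per-series error table can show whether the global model helps broadly or only on a subset of series.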

Or should we consider increasing the data frequency to at least daily? I’d like to try the monthly data set first, but I’d like to know whether anyone has tried a global model for monthly forecasting and seen it still perform well.

reference article:

christy · January 19, 2022, 8:07pm

Hi Yeonjoo,

Thank you, glad you enjoyed yesterday’s webinar! Hmm, the problem I’ve seen in the past with monthly sales data is that, even with 10 years of history, the majority of items will not have been sold continuously for all 10 years. I don’t know if that is your case?

One way to run a small experiment: subset your data to just those items with steady, long historical sales data. Another tip, especially with sales data: make sure you encode any missing data as “missing”, not 0! The easiest way is to leave missing timestamps out of the data entirely. In my github.com/christy code that I demo’d, that is covered by the allow_missing_timesteps=True part.

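import pytorch_forecasting as ptf

# wrap the pandas DataFrame in a TimeSeriesDataSet (excerpt)
training_data = ptf.data.TimeSeriesDataSet(
    the_df[lambda x: x.time_idx <= training_cutoff],
    allow_missing_timesteps=True,  # gaps in time_idx are allowed
    # ... remaining arguments omitted in this excerpt
)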
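The two tips above (keep only steady items, and drop missing rows rather than zero-filling them) could be prepared in pandas roughly like this. It is a minimal sketch, assuming a long-format DataFrame with hypothetical columns item_id, month, and sales, where NaN marks a month with no record; the 80% coverage threshold is also just an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy long-format data with hypothetical column names; NaN marks a month with no record.
months = pd.date_range("2021-01-01", periods=6, freq="MS")
df = pd.DataFrame({
    "item_id": ["A"] * 6 + ["B"] * 6,
    "month": list(months) * 2,
    "sales": [5, 7, np.nan, 6, 8, 9, np.nan, np.nan, np.nan, 2, np.nan, 1],
})

# Drop missing rows instead of filling them with 0, so the model never sees fake zero sales.
observed = df.dropna(subset=["sales"]).copy()

# Integer time index that still reflects calendar gaps (months since the first month).
first = df["month"].min()
observed["time_idx"] = (
    (observed["month"].dt.year - first.year) * 12
    + (observed["month"].dt.month - first.month)
)

# Keep only items observed in at least 80% of the months (assumed threshold).
n_months = df["month"].nunique()
coverage = observed.groupby("item_id")["time_idx"].nunique() / n_months
steady_items = coverage[coverage >= 0.8].index
steady = observed[observed["item_id"].isin(steady_items)]
```

Because the dropped months leave gaps in time_idx, a frame like this is the kind of input where allow_missing_timesteps=True matters.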

I’d suggest trying this with your monthly data first. If you don’t see good results with a DL global model, try weekly-aggregated data next. The finer your data granularity gets, the greater the chance the data will be too sparse to be useful. From your question, I’m guessing you’re unsure whether you have enough data for a daily model?

If you have any questions along the way, please feel free to contact me. Not sure if you see my contact info here? If not, Charles can give you my direct info.

Thanks and good luck!
Christy

Yeonjoo_Jung · January 19, 2022, 11:37pm

Thank you, Christy. Yes, we’ll test monthly data first and try the tips you mentioned. I agree daily data will perform better in most cases. Most of our existing data, including the feature data sets, was prepared at monthly frequency because our users request monthly forecasts. We can try getting the data in weekly format, aggregating the forecast results to monthly, and seeing whether that works better than the monthly forecast.

Yes, we have many cases where the data is sparse; dealing with it well is a key challenge.
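Rolling weekly forecasts up to monthly totals might look roughly like the sketch below. It assumes forecasts indexed by week-ending date (the values and dates are made up), and it assigns each week wholly to the month of its ending date, which is only an approximation for weeks that straddle a month boundary.

```python
import pandas as pd

# Hypothetical weekly forecasts for one target, indexed by week-ending (Sunday) date.
weekly = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 15.0, 12.0],
    index=pd.date_range("2022-01-02", periods=8, freq="W-SUN"),
    name="forecast",
)

# Sum weekly forecasts into calendar months, keyed by the month of each week-ending date.
# Weeks that straddle a month boundary go entirely to the ending month (an approximation).
monthly = weekly.groupby(weekly.index.to_period("M")).sum()
```

If the boundary weeks matter for accuracy, splitting them proportionally by days in each month would be a more careful alternative.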

I’ll reach out to you once we get to test building global models and have questions.
