
Diffusion Models for Synthetic Time Series Data

Synthefy
6 min read · May 28, 2024


This post introduces TimeWeaver, Synthefy’s novel diffusion model for conditional time series synthesis, which beats Generative Adversarial Networks (GANs) by up to 27% on real-world energy, medical, air quality, and traffic datasets.

“Generate the electricity demand pattern for a home in East Austin with an electric vehicle during a winter freeze.”

“Generate a realistic medical electrocardiogram (ECG) waveform for a patient of a specific age, gender, and weight who is a smoker with a pacemaker.”

We synthesize realistic time series based on a text prompt.

Use Cases of Synthetic Time Series Data

Imagine generating a realistic medical ECG pattern based on a patient’s health record. This generated data could be used to train medical residents, sell realistic (but anonymous) data to third parties, or even stress-test a pacemaker’s ability to detect diseases on rare patient subtypes. More broadly, synthetic time series are useful for:

  1. Privacy Preservation: Companies can anonymize private customer data for sharing or internal testing.
  2. Asking “What-if” Questions: Companies can create rare or anomalous variants of real data to stress-test systems, such as energy grid capacity planning for a winter freeze.
  3. Robust Machine Learning: Companies can augment imbalanced datasets, such as creating rare ECG variants or network traffic anomalies.

Despite years of research on time series synthesis, today’s methods, such as Generative Adversarial Networks (GANs), ignore rich contextual data. This metadata could be as diverse as weather and location for energy, patient health records for ECG analysis, and news articles in finance. As such, they often synthesize generic, low-quality samples that cannot be flexibly tuned to a specific real-world condition.

Why is Generating Time Series Data Harder Than Images or Audio?

In our opinion, generating time series is fundamentally different from, and more challenging than, synthesizing realistic images, audio, or video, for the following reasons:

  1. Rich Metadata: Metadata can be categorical (e.g., whether a patient has a pacemaker), quantitative (e.g., age), or even a time series, such as anticipated precipitation. In contrast, image, video, and audio generation often deal with static text prompts.
  2. Visual Inspection of Synthetic Data Quality: Visual inspection is central to evaluating image generation, and metrics like the Inception Score (IS) are widely adopted because they align with human judgment. In contrast, it is non-trivial to glance at a time series and tell whether it retains key features, such as statistical moments or frequency spectra.
  3. Lack of Powerful Pre-trained Feature Extractors: In the image and audio domains, we have powerful feature extractors trained on internet-scale data, such as CLIP and CLAP. These are vital building blocks for encoding conditions in image generation. No such models exist in the time series domain, since standard datasets vary significantly in horizon, number of channels, and the heterogeneity of their metadata.

Introducing TimeWeaver: Synthefy’s Novel Diffusion Model

Our key innovation is a novel diffusion model architecture that handles multi-modal context.

Synthefy has pioneered a novel joint architecture for time series synthesis, forecasting, and imputation using diffusion models. On the right of the figure, we start with rich categorical and quantitative metadata. We then process this heterogeneous contextual metadata using Synthefy’s novel pre-processing layers.
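Synthefy’s pre-processing layers are proprietary, but to make the idea concrete, here is a minimal PyTorch sketch of how heterogeneous metadata might be fused into a single per-time-step conditioning representation. The `MetadataEncoder` module, its layer choices, and its dimensions are illustrative assumptions, not the actual TimeWeaver architecture.

```python
import torch
import torch.nn as nn

class MetadataEncoder(nn.Module):
    """Hypothetical encoder for heterogeneous metadata: categorical fields
    are embedded, quantitative fields are linearly projected, and
    time-varying metadata (e.g., forecast precipitation) is encoded with
    a small 1-D convolution, then everything is fused per time step."""

    def __init__(self, n_categories, n_quant, n_ts_channels, d_model=64):
        super().__init__()
        self.cat_emb = nn.ModuleList(
            [nn.Embedding(n, d_model) for n in n_categories]
        )
        self.quant_proj = nn.Linear(n_quant, d_model)
        self.ts_conv = nn.Conv1d(n_ts_channels, d_model, kernel_size=3, padding=1)
        self.fuse = nn.Linear(3 * d_model, d_model)

    def forward(self, cat, quant, ts_meta):
        # cat: (B, n_cat_fields) integer codes; quant: (B, n_quant) floats;
        # ts_meta: (B, n_ts_channels, T) time-varying conditions.
        T = ts_meta.shape[-1]
        c = sum(emb(cat[:, i]) for i, emb in enumerate(self.cat_emb))  # (B, d)
        q = self.quant_proj(quant)                                     # (B, d)
        s = self.ts_conv(ts_meta).transpose(1, 2)                      # (B, T, d)
        static = torch.cat([c, q], dim=-1).unsqueeze(1).expand(-1, T, -1)
        return self.fuse(torch.cat([static, s], dim=-1))               # (B, T, d)
```

The output is one conditioning vector per time step, which lets a time-varying condition (like precipitation) influence the synthesis locally while static attributes (like a pacemaker flag) influence it globally.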

Next, we start with pristine training data on the left, exemplified by a pure sinusoid. During training, the forward diffusion process successively adds noise to the original data until only white noise remains. Our key technical insight is that, if we have learned the true data distribution well, we should be able to systematically revert this noise and recover the original training time series. To do so, we learn a custom Synthefy denoiser network; as we train on a large dataset, it gradually captures the intricacies of the data distribution. During real-time deployment, we start with white noise and a text prompt describing the desired condition, then run inference on the model to create realistic synthetic time series data.
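TimeWeaver’s denoiser network and noise schedule are Synthefy’s own; the sketch below shows the generic DDPM-style recipe (Ho et al., 2020) that this paragraph describes, where `denoiser` is a stand-in for our network and `cond` is a stand-in for the encoded metadata and prompt.

```python
import torch
import torch.nn.functional as F

T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)         # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative signal level

def training_step(denoiser, x0, cond):
    """Forward process: corrupt clean series x0 at a random step t,
    then train the denoiser to predict the injected noise."""
    t = torch.randint(0, T_STEPS, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise    # noised sample
    return F.mse_loss(denoiser(x_t, t, cond), noise)

@torch.no_grad()
def sample(denoiser, shape, cond):
    """Reverse process: start from white noise and iteratively denoise,
    steering each step with the encoded condition."""
    x = torch.randn(shape)
    for t in reversed(range(T_STEPS)):
        eps = denoiser(x, torch.full((shape[0],), t), cond)
        alpha, a_bar = 1.0 - betas[t], alphas_bar[t]
        x = (x - betas[t] / (1 - a_bar).sqrt() * eps) / alpha.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```

Training only ever asks the network to undo one noise step at a time, which is what makes the reverse process tractable even for long, multivariate series.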

Case Studies: Medicine, Energy, Networking, and Transportation

The next figure shows the quality of our synthetic data on four challenging public datasets. These datasets feature a diverse mix of seasonalities, discrete and categorical conditions, a wide range of horizons, and multivariate correlated channels. An example task, corresponding to column 3, is: “Generate an electricity load pattern for a specific user in Austin during a winter freeze on a Saturday”. For more details, see our extended technical report.

The first row is Synthefy’s “TimeWeaver” model, and the second row is the previous state of the art, Generative Adversarial Networks (GANs). Each time series corresponds to a test prompt/condition never seen during training. The real time series is in blue, while Synthefy’s synthetic time series is in red. The key takeaway is that Synthefy’s model closely matches challenging, ground-truth test time series and beats GANs by 5x.

How Realistic is Synthefy’s Synthetic Data?

Synthefy provides out-of-the-box evaluation metrics to test the utility of synthetic data.

Synthefy’s models outperform GANs significantly on 4 real-world public datasets.

Time and Frequency Domain Evaluation Metrics

The bottom two rows plot the distribution of time series values across a dataset for real, unseen test conditions (blue) and synthetic data (red). The third row is Synthefy’s model, while the fourth row is our closest competitor, GANs. Synthefy’s platform also automatically verifies that synthesized time series have Fourier spectra similar to those of the real time series.
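For a flavor of what such automated checks involve, here is a minimal NumPy sketch that compares basic statistical moments and average Fourier magnitude spectra of a real and a synthetic batch. The function is an illustrative assumption, not Synthefy’s actual evaluation code.

```python
import numpy as np

def fidelity_report(real, synth):
    """Compare basic statistics and average Fourier magnitude spectra of
    two batches of series, each shaped (n_series, length). A rough sketch
    of the kinds of checks an evaluation suite runs automatically."""
    report = {
        "mean": (real.mean(), synth.mean()),
        "std": (real.std(), synth.std()),
    }
    # Average magnitude spectrum across all series in each batch.
    spec_real = np.abs(np.fft.rfft(real, axis=-1)).mean(axis=0)
    spec_synth = np.abs(np.fft.rfft(synth, axis=-1)).mean(axis=0)
    report["spectrum_l2_gap"] = float(np.linalg.norm(spec_real - spec_synth))
    return report

# Example: two batches of 128 noisy sinusoids of length 96.
t = np.linspace(0, 8 * np.pi, 96)
real = np.sin(t) + 0.1 * np.random.randn(128, 96)
synth = np.sin(t) + 0.1 * np.random.randn(128, 96)
print(fidelity_report(real, synth))
```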

A Variant of the Frechet Inception Distance (FID) for Time Series

More specifically, the key metric we use to evaluate the quality of synthetic data is a variant of the popular Fréchet Inception Distance (FID) score, adapted for the time series domain. We call this the Joint Fréchet Time Series Distance (J-FTSD): a metric computed on a joint deep-learned representation (embedding) of the time series and its contextual metadata. J-FTSD captures the distributional difference between real and synthetic data, so lower scores are better. Our extended technical report illustrates that J-FTSD captures key statistical moments in the time and frequency domains. Moreover, good (lower) J-FTSD scores correlate with high accuracy on downstream ML tasks that use synthetic data. Our toolbox can train a model on purely synthetic data and then evaluate it on real, held-out test data, which we call the Train on Synthetic, Test on Real (TSTR) metric. The following figure shows that we outperform GANs on both J-FTSD and TSTR across all real-world datasets.
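For intuition, the Fréchet distance at the heart of both FID and J-FTSD has a closed form between Gaussian fits of two embedding sets. The sketch below assumes the joint embeddings (`emb_real`, `emb_synth`) have already been computed by an embedding model; it is a generic illustration, not Synthefy’s implementation.

```python
import numpy as np
from scipy import linalg

def frechet_distance(emb_real, emb_synth):
    """Fréchet distance between Gaussian fits of two embedding sets,
    as used by FID. J-FTSD applies the same formula to joint
    time-series + metadata embeddings."""
    mu_r, mu_s = emb_real.mean(axis=0), emb_synth.mean(axis=0)
    cov_r = np.cov(emb_real, rowvar=False)
    cov_s = np.cov(emb_synth, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_s)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny numerical imaginary parts
    diff = mu_r - mu_s
    return diff @ diff + np.trace(cov_r + cov_s - 2.0 * covmean)

# Stand-in joint embeddings; similar sets yield a small distance.
emb_real = np.random.randn(500, 32)
emb_synth = np.random.randn(500, 32)
print(frechet_distance(emb_real, emb_synth))
```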

Synthefy’s model improves the J-FTSD time series fidelity score (left) and the downstream classification accuracy of models trained on its synthetic data, compared to GANs.

Today’s GANs Suffer From Mode Collapse

Clearly, Synthefy’s models represent the time-domain distribution much more accurately than GANs. In particular, GANs are notorious for collapsing onto uni-modal data distributions, the well-known mode collapse problem. This is clearly seen in the Traffic dataset (bottom right), where the real traffic distribution is bi-modal, while the GAN only learns an inaccurate “average” mode.
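To make this failure concrete, here is a toy NumPy illustration (not the actual Traffic dataset) of how a collapsed, uni-modal generator concentrates mass exactly where a bi-modal distribution has almost none.

```python
import numpy as np

# Real traffic-like data drawn from a bi-modal mixture
# (e.g., off-peak vs. rush hour). A generator that only learns
# the "average" mode places its mass where real data is rare.
rng = np.random.default_rng(0)
real = np.concatenate([rng.normal(0.2, 0.05, 5000),    # off-peak mode
                       rng.normal(0.8, 0.05, 5000)])   # rush-hour mode
collapsed = rng.normal(real.mean(), real.std() / 4, 10000)

# Fraction of samples near the (empty) midpoint between the two modes:
mid = 0.5
print((np.abs(real - mid) < 0.1).mean())       # ~0.0: real data avoids it
print((np.abs(collapsed - mid) < 0.1).mean())  # large: collapsed generator
```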

Synthefy’s Synthetic Data Significantly Boosts Accuracy on Downstream ML Tasks

The ultimate test of synthetic data is its utility for downstream tasks, such as data augmentation or training robust ML models. As such, we train an ML inference model (classification or regression) on purely synthetic data, but test it on real data. This is the TSTR metric described earlier. In the ECG dataset, a disease classifier trained on real patient data achieves 95% accuracy on real, unseen test patients. Remarkably, a classifier trained on Synthefy’s purely synthetic data achieves a staggering 93% accuracy when tested on the same real, unseen test patients.
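Conceptually, TSTR is straightforward to implement. The sketch below uses a plain scikit-learn classifier on flattened series as a stand-in; the downstream model and features here are illustrative assumptions, not Synthefy’s pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def tstr_accuracy(X_synth, y_synth, X_real_test, y_real_test):
    """Train on Synthetic, Test on Real (TSTR): fit a classifier on
    purely synthetic series, then score it on held-out real series."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_synth.reshape(len(X_synth), -1), y_synth)
    preds = clf.predict(X_real_test.reshape(len(X_real_test), -1))
    return accuracy_score(y_real_test, preds)
```

A small TSTR gap, like the 95% vs. 93% result above, is strong evidence that the synthetic data preserves the features the downstream task actually depends on.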

Curious to Test our Models?

Our models are already solving real business use cases for Fortune 500 global companies. For more information, stay tuned for our upcoming public GitHub. Or, come see us at ICML 2024.
