Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction

Willard, Jared D.; Harrington, Peter; Subramanian, Shashank; Mahesh, Ankur; O'Brien, Travis A.; Collins, William D.

arXiv:2404.19630 (cs)

[Submitted on 30 Apr 2024]


Abstract: The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models that forecast atmospheric variables with skill comparable or superior to that of traditional physics-based NWP. However, these leading DL models vary widely in both their training settings and their architectures. Further, the lack of thorough ablation studies makes it hard to discern which components are most critical to success. In this work, we show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data and find that it attains superior forecast skill when compared against IFS. We present ablations on key aspects of the training pipeline, exploring the effect of different loss functions, model sizes and depths, and multi-step fine-tuning. We also examine model performance with metrics beyond the typical ACC and RMSE, and investigate how performance scales with model size.
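The abstract names ACC (anomaly correlation coefficient) and RMSE as the typical metrics but does not define them. The sketch below assumes the latitude-weighted forms commonly used for gridded global forecasts, with cosine-latitude weights normalized to mean 1. The function names, the nested-list grid layout (rows are latitudes, columns are longitudes), and the weighting convention are illustrative assumptions and may differ from what the paper uses.

```python
import math


def latitude_weights(lats_deg):
    """Cosine-latitude weights normalized to mean 1 (assumed convention)."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    return [c / mean_cos for c in cosines]


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a [lat][lon] grid."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


def weighted_acc(forecast, truth, climatology, lats_deg):
    """Latitude-weighted correlation of forecast and truth anomalies.

    Anomalies are taken relative to a climatology field on the same grid.
    """
    weights = latitude_weights(lats_deg)
    num = f_var = t_var = 0.0
    for w, f_row, t_row, c_row in zip(weights, forecast, truth, climatology):
        for f, t, c in zip(f_row, t_row, c_row):
            f_anom, t_anom = f - c, t - c
            num += w * f_anom * t_anom
            f_var += w * f_anom * f_anom
            t_var += w * t_anom * t_anom
    return num / math.sqrt(f_var * t_var)


# Toy 3x2 grid with made-up values, for illustration only.
lats = [-45.0, 0.0, 45.0]
truth = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
climatology = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
biased = [[v + 2.0 for v in row] for row in truth]

rmse_biased = weighted_rmse(biased, truth, lats)
acc_perfect = weighted_acc(truth, truth, climatology, lats)
```

In this toy case a uniform bias of 2 gives an RMSE of 2, and a forecast identical to the truth gives an ACC of 1. Lower RMSE and higher ACC indicate better skill.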

Comments:	9 pages, 6 figures
Subjects:	Machine Learning (cs.LG)
MSC classes:	68T07, 86A10
ACM classes:	J.2; I.2.6
Cite as:	arXiv:2404.19630 [cs.LG]
	(or arXiv:2404.19630v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.19630
Journal reference:	23rd Conference on Artificial Intelligence for Environmental Science. Jan 2024. Abstract #437874

Submission history

From: Jared Willard
[v1] Tue, 30 Apr 2024 15:30:14 UTC (3,709 KB)
