July 30, 2024

Introducing torchchat: Accelerating Local LLM Inference on Laptop, Desktop and Mobile

Today, we’re releasing torchchat, a library showcasing how to seamlessly and performantly run Llama 3, 3.1, and other large language models across laptop, desktop, and mobile.

Read More

July 30, 2024

Quantization-Aware Training for Large Language Models with PyTorch

In this blog, we present an end-to-end Quantization-Aware Training (QAT) flow for large language models in PyTorch. We demonstrate how QAT in PyTorch can recover up to 96% of the accuracy degradation on hellaswag and 68% of the perplexity degradation on wikitext for Llama3 compared to post-training quantization (PTQ). We present the QAT APIs in torchao and showcase how users can leverage them for fine-tuning in torchtune.
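
For context, the torchao QAT flow described in the post follows a prepare/fine-tune/convert pattern. Below is a minimal sketch, assuming the prototype import path and the Int8DynActInt4WeightQATQuantizer (int8 dynamic activations, int4 grouped weights) available at the time of writing; the model here is a placeholder rather than a real Llama checkpoint.

```python
import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

# Placeholder model standing in for a Llama-style transformer.
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096))

# prepare() inserts "fake quantize" ops so fine-tuning sees quantization error.
qat_quantizer = Int8DynActInt4WeightQATQuantizer()
model = qat_quantizer.prepare(model)

# ... fine-tune as usual, e.g. with a torchtune QAT recipe ...

# convert() swaps the fake-quantized ops for actual quantized ops for inference.
model = qat_quantizer.convert(model)
```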

Read More

July 24, 2024

PyTorch 2.4 Release Blog

We are excited to announce the release of PyTorch® 2.4 (release note)! PyTorch 2.4 adds support for the latest version of Python (3.12) for torch.compile. AOTInductor freezing gives developers running AOTInductor more performance-based optimizations by allowing the serialization of MKLDNN weights. In addition, a new default TCPStore server backend utilizing libuv has been introduced, which should significantly reduce initialization times for users running large-scale jobs. Finally, a new Python Cu...
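
To illustrate the torch.compile piece, here is a minimal sketch (the function and tensor shapes are arbitrary placeholders); as of 2.4 the same call also runs under Python 3.12.

```python
import torch

# Toy function; any Python function over tensors is compiled the same way.
def f(x):
    return torch.sin(x) + torch.cos(x)

compiled_f = torch.compile(f)

x = torch.randn(8)
print(torch.allclose(f(x), compiled_f(x)))  # compiled output matches eager
```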

Read More

July 11, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llam...
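
For reference, the attention computation FlashAttention accelerates is exposed in PyTorch as scaled_dot_product_attention, which can dispatch to a FlashAttention backend on supported GPUs. A minimal sketch with placeholder shapes (batch 2, 8 heads, sequence length 1024, head dimension 64):

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

q = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
v = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)

# Fused causal attention; on supported GPUs this may use a FlashAttention kernel.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```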

Read More

July 10, 2024

Learn how to develop Android applications with ExecuTorch and Llama models

This blog is courtesy of the PyTorch team at Arm. More details can be found here.

Read More