HC2023 Qualcomm Hexagon NPU
Eric Mahurin
Senior Director, Technology
Qualcomm Technologies, Inc.
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
Hexagon NPU
High Performance, Power Efficient ML Inference Processor for Qualcomm® SoCs
[Figure: block diagram; labels: Hexagon, Hexagon NPU, + Vector eXtensions]
Hardware
Hexagon NPU
Vector
• Memory access
• Load/store with L2/DDR or TCM
• Fully parallel scatter & gather with TCM to address arbitrary data-parallel workloads
• Target applications
• Originally for image processing
• Adapted to additional workloads including DNNs
[Figure: vector unit with 32 Vreg, 4 Vpred; labels: Load, Store, Shift, Multiply (x2), Scatter, Gather]
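The scatter and gather bullets above can be illustrated with a small sketch. This is plain Python, not Hexagon code: the function names `gather` and `scatter`, the 16-element toy `tcm` list, and the index values are all hypothetical. The loops run sequentially here, whereas the slide describes the hardware operation as fully parallel, and the behavior for duplicate scatter indices (last writer wins in this sketch) is an assumption the slide does not address.

```python
def gather(tcm, indices):
    """Read tcm[i] for each lane index i (one element per vector lane)."""
    return [tcm[i] for i in indices]


def scatter(tcm, indices, values):
    """Write values[k] to tcm[indices[k]] for each lane k.

    Assumption: on duplicate indices the last lane wins.
    """
    for i, v in zip(indices, values):
        tcm[i] = v


tcm = list(range(100, 116))           # toy 16-element stand-in for TCM
lanes = gather(tcm, [3, 0, 15, 7])    # arbitrary, non-contiguous reads
scatter(tcm, [1, 2, 4, 5], [v * 2 for v in lanes])
```

The point of the sketch is that each lane carries its own address, so the access pattern does not need to be contiguous or strided.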
Tensor
• Tensor SIMD
• Tensor instead of vector as data-parallel quantum
• 2D matrices, 3D (X, Y, depth), and 4D (multiple 3D)
• Bit widths (activations * weights):
• Integer: (8/16) * (4/8/16)
• Float: FP16 * FP16
• ISA accelerates:
• Matrix multiply
• Convolutional layer
• Depth-wise and other small group sized convolutions
• Fused activation functions
• Per output-channel scaling
[Figure: tensor unit; labels: Weights, Memory, Matrix, Activations, Accumulation]
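As a rough illustration of how matrix multiply, per output-channel scaling, and a fused activation fit together, here is a minimal pure-Python sketch. It is not the Hexagon ISA: the function name `quantized_matmul`, the choice of ReLU as the fused activation, and the scale values are all assumptions made for the example.

```python
def quantized_matmul(acts, weights, scales, relu=True):
    """Integer matmul with per-output-channel scaling and optional ReLU.

    acts:    M x K integer activations
    weights: K x N integer weights
    scales:  N floats, one per output channel (hypothetical values)
    """
    k, n = len(weights), len(weights[0])
    out = []
    for row_in in acts:
        row_out = []
        for j in range(n):
            acc = 0  # wide integer accumulator
            for t in range(k):
                acc += row_in[t] * weights[t][j]
            y = acc * scales[j]      # per output-channel scaling
            if relu:
                y = max(y, 0.0)      # activation fused into the same pass
            row_out.append(y)
        out.append(row_out)
    return out


acts = [[1, -2, 3], [4, 0, -1]]
weights = [[2, -1], [0, 3], [1, 1]]
scales = [0.5, 0.25]
result = quantized_matmul(acts, weights, scales)
```

Scaling and activation are applied once per output element, after accumulation finishes, which is why fusing them avoids a second pass over the output.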
Programming Model
Architecture – Threads
• Threads have dedicated registers
• Instructions operate on thread-local registers and [potentially shared] memory
[Figure: thread architecture; labels: VLIW, I-Cache, RegFile (x3)]
Architecture – Memory model
• TCM acts as a software-managed cache
• More scalable than a hardware cache
• Much higher bandwidth than a typical cache
• Enables very high-bandwidth scatter/gather
• Predictable performance – no misses
• Virtually addressed DMA for hiding DDR latency
[Figure: memory model; labels: VLIW, I-Cache, RegFile (x3)]
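One common way software uses a DMA engine with a software-managed memory to hide DDR latency is double buffering: fetch the next tile while computing on the current one. The sketch below models only the ordering of those steps. The name `process_tiles`, the tile size, and the two-buffer scheme are assumptions for illustration, and the "DMA" here is an ordinary synchronous list copy rather than an asynchronous transfer.

```python
def process_tiles(ddr, tile_size, compute):
    """Ping-pong between two local buffers: load tile i+1, compute on tile i."""
    tiles = [ddr[i:i + tile_size] for i in range(0, len(ddr), tile_size)]
    results = []
    if not tiles:
        return results
    buffers = [None, None]           # two stand-ins for TCM buffers
    buffers[0] = list(tiles[0])      # prologue: fetch the first tile
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            # On real hardware this transfer would overlap with the compute below.
            buffers[(i + 1) % 2] = list(tiles[i + 1])
        results.append(compute(buffers[i % 2]))
    return results


sums = process_tiles(list(range(10)), 4, sum)
```

Because the program decides what is resident and when, there are no misses to wait on, which matches the predictable-performance bullet above.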
Efficiency
Tensor Data Locality – Temporal and Spatial
• Output stationary:
• Accumulators are wider bit-width than input activations & weights
• Accumulate across all input channels and filter taps
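The output-stationary idea can be sketched as a loop nest in which each output element owns one accumulator that stays in place while all input channels and filter taps are streamed through it. This is a minimal 1D convolution in plain Python. The function name, the 1D shape, and the valid-padding, stride-1 choice are assumptions for the example, not details from the slides.

```python
def conv1d_output_stationary(x, w):
    """x: [C_in][W] activations, w: [C_out][C_in][K] weights (valid, stride 1)."""
    c_in, width = len(x), len(x[0])
    c_out, k = len(w), len(w[0][0])
    out_w = width - k + 1
    y = [[0] * out_w for _ in range(c_out)]
    for o in range(c_out):
        for p in range(out_w):
            acc = 0                       # the accumulator stays put
            for c in range(c_in):         # across all input channels
                for t in range(k):        # and all filter taps
                    acc += x[c][p + t] * w[o][c][t]
            y[o][p] = acc                 # written out once, at the end
    return y


x = [[1, 2, 3, 4], [0, 1, 0, 1]]
w = [[[1, 0], [2, 1]]]
y = conv1d_output_stationary(x, w)
```

The wide value (`acc`) never moves during accumulation. Only the narrower activations and weights are streamed, which is the locality argument the bullets make.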
Pruning vs. Quantization
[Figure: pruning (P) vs. quantization (Q); axis label: More accuracy]
• Reduces compute energy with more zeros
• Allows for skipping compute for each zero
• But, costly (area/energy) for dense case in tensor architecture
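The zero-skipping idea in the bullets above can be shown with a toy dot product that counts how many multiplies it actually performs. The name `sparse_dot` and the sample data are hypothetical. The per-element zero check is also a reminder of the overhead that the last bullet describes as costly for the dense case.

```python
def sparse_dot(acts, weights):
    """Return (dot product, number of multiplies actually performed)."""
    total, macs = 0, 0
    for a, w in zip(acts, weights):
        if w == 0:
            continue      # pruned weight: skip the multiply entirely
        total += a * w
        macs += 1
    return total, macs


value, macs = sparse_dot([3, 1, 4, 1, 5], [2, 0, 0, 7, 0])
```

With three of the five weights pruned to zero, only two multiplies run and the result is unchanged.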
Application
Target Industries
Performance
[Figure: performance comparison; series: Scalar + Vector, Scalar + Vector + Tensor]
Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, et al. “FP8 versus INT8 for efficient deep learning inference”, https://arxiv.org/abs/2303.17951
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort. “Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing”, https://arxiv.org/abs/2306.12929
Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort. (2023). “Pruning vs Quantization: Which is Better?”, https://arxiv.org/abs/2307.02973
Thank you
For more information, visit us at: qualcomm.com & qualcomm.com/blog

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2018-2023 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Qualcomm and Snapdragon are trademarks or registered trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business. Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.