NVM Express Revision 1.3
May 1, 2017
NVM Express 1.3
NVM Express revision 1.3 specification available for download at http://nvmexpress.org. NVM Express
revision 1.3 ratified on April 26, 2017.
SPECIFICATION DISCLAIMER
LEGAL NOTICE:
NOTICE TO USERS WHO ARE NVM EXPRESS, INC. MEMBERS: Members of NVM Express, Inc. have
the right to use and implement this NVM Express revision 1.3 specification subject, however, to the
Member’s continued compliance with the Company’s Intellectual Property Policy and Bylaws and the
Member’s Participation Agreement.
NOTICE TO NON-MEMBERS OF NVM EXPRESS, INC.: If you are not a Member of NVM Express, Inc.
and you have obtained a copy of this document, you only have a right to review this document or make
reference to or cite this document. Any such references or citations to this document must acknowledge
NVM Express, Inc. copyright ownership of this document. The proper copyright citation or reference is as
follows: “© 2007 - 2017 NVM Express, Inc. ALL RIGHTS RESERVED.” When making any such
citations or references to this document you are not permitted to revise, alter, modify, make any
derivatives of, or otherwise amend the referenced portion of this document in any way without the prior
express written permission of NVM Express, Inc. Nothing contained in this document shall be deemed as
granting you any kind of license to implement or use this document or the specification described therein,
or any of its contents, either expressly or impliedly, or to any intellectual property owned or controlled by
NVM Express, Inc., including, without limitation, any trademarks of NVM Express, Inc.
LEGAL DISCLAIMER:
THIS DOCUMENT AND THE INFORMATION CONTAINED HEREIN IS PROVIDED ON AN “AS IS”
BASIS. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, NVM EXPRESS, INC.
(ALONG WITH THE CONTRIBUTORS TO THIS DOCUMENT) HEREBY DISCLAIM ALL
REPRESENTATIONS, WARRANTIES AND/OR COVENANTS, EITHER EXPRESS OR IMPLIED,
STATUTORY OR AT COMMON LAW, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, VALIDITY,
AND/OR NONINFRINGEMENT.
All product names, trademarks, registered trademarks, and/or servicemarks may be claimed as the
property of their respective owners.
Table of Contents
1 INTRODUCTION
1.1 Overview
1.2 Scope
1.3 Outside of Scope
1.4 Theory of Operation
1.5 Conventions
1.6 Definitions
1.7 Keywords
1.8 Byte, word and Dword Relationships
1.9 References
1.10 References Under Development
2 SYSTEM BUS (PCI EXPRESS) REGISTERS
2.1 PCI Header
2.2 PCI Power Management Capabilities
2.3 Message Signaled Interrupt Capability (Optional)
2.4 MSI-X Capability (Optional)
2.5 PCI Express Capability
2.6 Advanced Error Reporting Capability (Optional)
2.7 Other Capability Pointers
3 CONTROLLER REGISTERS
3.1 Register Definition
3.2 Index/Data Pair registers (Optional)
4 DATA STRUCTURES
4.1 Submission Queue & Completion Queue Definition
4.2 Submission Queue Entry – Command Format
4.3 Physical Region Page Entry and List
4.4 Scatter Gather List (SGL)
4.5 Metadata Region (MR)
4.6 Completion Queue Entry
4.7 Controller Memory Buffer
4.8 Namespace List
4.9 Controller List
4.10 Fused Operations
4.11 Command Arbitration
5 ADMIN COMMAND SET
5.1 Abort command
5.2 Asynchronous Event Request command
5.3 Create I/O Completion Queue command
5.4 Create I/O Submission Queue command
5.5 Delete I/O Completion Queue command
5.6 Delete I/O Submission Queue command
5.7 Doorbell Buffer Config command
5.8 Device Self-test command
5.9 Directive Receive command
5.10 Directive Send command
5.11 Firmware Commit command
5.12 Firmware Image Download command
5.13 Get Features command
1 Introduction
1.1 Overview
NVM Express (NVMe) is an interface that allows host software to communicate with a non-volatile memory
subsystem. This interface is optimized for Enterprise and Client solid state drives, typically attached as a
register level interface to the PCI Express interface.
Note: During development, this specification was referred to as Enterprise NVMHCI. However, the name
was modified to NVM Express prior to specification completion. This interface is targeted for use in both
Client and Enterprise systems.
For an overview of changes from revision 1.2.1 to revision 1.3, refer to nvmexpress.org/changes for a
document that describes the new features, including mandatory requirements for a controller to comply with
revision 1.3.
many Enterprise capabilities like end-to-end data protection (compatible with SCSI Protection Information,
commonly known as T10 DIF, and SNIA DIX standards), enhanced error reporting, and virtualization.
The interface has the following key attributes:
• Does not require uncacheable / MMIO register reads in the command submission or completion path.
• A maximum of one MMIO register write is necessary in the command submission path.
• Support for up to 65,535 I/O queues, with each I/O queue supporting up to 64K outstanding commands.
• Priority associated with each I/O queue with well-defined arbitration mechanism.
• All information to complete a 4KB read request is included in the 64B command itself, ensuring efficient small I/O operation.
• Efficient and streamlined command set.
• Support for MSI/MSI-X and interrupt aggregation.
• Support for multiple namespaces.
• Efficient support for I/O virtualization architectures like SR-IOV.
• Robust error reporting and management capabilities.
• Support for multi-path I/O and namespace sharing.
An NVM Express controller is associated with a single PCI Function. The capabilities and settings that
apply to the entire controller are indicated in the Controller Capabilities (CAP) register and the Identify
Controller data structure.
A namespace is a quantity of non-volatile memory that may be formatted into logical blocks. An NVM
Express controller may support multiple namespaces that are referenced using a namespace ID.
Namespaces may be created and deleted using the Namespace Management and Namespace Attachment
commands. The Identify Namespace data structure indicates capabilities and settings that are specific to a
particular namespace. The capabilities and settings that are common to all namespaces are reported by
the Identify Namespace data structure for namespace ID FFFFFFFFh.
NVM Express is based on a paired Submission and Completion Queue mechanism. Commands are placed
by host software into a Submission Queue. Completions are placed into the associated Completion Queue
by the controller. Multiple Submission Queues may utilize the same Completion Queue. Submission and
Completion Queues are allocated in memory.
An Admin Submission and associated Completion Queue exist for the purpose of controller management
and control (e.g., creation and deletion of I/O Submission and Completion Queues, aborting commands,
etc.). Only commands that are part of the Admin Command Set may be submitted to the Admin Submission
Queue.
An I/O Command Set is used with an I/O queue pair. This specification defines one I/O Command Set,
named the NVM Command Set. The host selects one I/O Command Set that is used for all I/O queue
pairs.
Host software creates queues, up to the maximum supported by the controller. Typically the number of
command queues created is based on the system configuration and anticipated workload. For example,
on a four core processor based system, there may be a queue pair per core to avoid locking and ensure
data structures are created in the appropriate processor core’s cache. Figure 1 provides a graphical
representation of the queue pair mechanism, showing a 1:1 mapping between Submission Queues and
Completion Queues. Figure 2 shows an example where multiple I/O Submission Queues utilize the same
I/O Completion Queue on Core B. Figure 1 and Figure 2 show that there is always a 1:1 mapping between
the Admin Submission Queue and Admin Completion Queue.
[Figure 1: Host and Controller. Controller Management uses the Admin Submission Queue and Admin Completion Queue. Core 0 through Core N-1 each use one I/O Submission Queue paired with one I/O Completion Queue (I/O queues 1 through N).]
[Figure 2: Host and Controller. Controller Management uses the Admin Submission Queue and Admin Completion Queue. Core A uses I/O Submission Queue M with I/O Completion Queue N. Core B uses I/O Submission Queues X, Y, and Z, which share I/O Completion Queue W.]
A Submission Queue (SQ) is a circular buffer with a fixed slot size that the host software uses to submit
commands for execution by the controller. The host software updates the appropriate SQ Tail doorbell
register when there are one to n new commands to execute. The previous SQ Tail value is overwritten in
the controller when there is a new doorbell register write. The controller fetches SQ entries in order from
the Submission Queue; however, it may then execute those commands in any order.
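The tail-doorbell behavior described above can be sketched as a small host-side model. This is an illustrative sketch only: the class name `SubmissionQueue`, the `doorbell_writes` list that stands in for the SQ Tail doorbell register, the locally tracked `head`, and the rule that one slot stays empty to distinguish full from empty are all assumptions made for the example.

```python
class SubmissionQueue:
    """Host-side model of a circular Submission Queue (illustrative only)."""

    def __init__(self, depth):
        self.slots = [None] * depth   # fixed-size slots, one command each
        self.depth = depth
        self.tail = 0                 # next free slot, advanced by the host
        self.head = 0                 # assumed: last head value learned from the controller
        self.doorbell_writes = []     # stand-in for the SQ Tail doorbell register

    def free_slots(self):
        # Assumption: one slot is kept empty so "full" differs from "empty".
        return (self.head - self.tail - 1) % self.depth

    def submit(self, commands):
        """Place one to n commands in the ring, then write the doorbell once."""
        if len(commands) > self.free_slots():
            raise BufferError("not enough free submission queue slots")
        for cmd in commands:
            self.slots[self.tail] = cmd
            self.tail = (self.tail + 1) % self.depth
        # A single doorbell write covers all new commands; the new tail
        # value overwrites the previous one held by the controller.
        self.doorbell_writes.append(self.tail)


sq = SubmissionQueue(depth=4)
sq.submit(["read A", "read B"])   # two commands, one doorbell write
sq.submit(["write C"])            # one command, one doorbell write
```

In this model two commands cost a single doorbell write, which mirrors the "one to n new commands" wording above.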
Each Submission Queue entry is a command. Commands are 64 bytes in size. The physical memory
locations to use for data transfers are specified using Physical Region Page (PRP) entries or
Scatter Gather Lists. Each command may include two PRP entries or one Scatter Gather List (SGL)
segment. If more than two PRP entries are necessary to describe the data buffer, then a pointer to a PRP
List that describes a list of PRP entries is provided. If more than one SGL segment is necessary to describe
the data buffer, then the SGL segment provides a pointer to the next SGL segment.
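As a rough illustration of when a PRP List becomes necessary, the sketch below counts how many memory pages a buffer touches. It assumes that each PRP entry describes at most one memory page and that the memory page size is 4096 bytes; neither assumption is stated in this section, and the function names are hypothetical.

```python
def prp_entries_needed(buffer_addr, length, page_size=4096):
    """Count the memory pages touched by [buffer_addr, buffer_addr + length).

    Assumption: one PRP entry is needed per memory page touched.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_page = buffer_addr // page_size
    last_page = (buffer_addr + length - 1) // page_size
    return last_page - first_page + 1


def needs_prp_list(buffer_addr, length, page_size=4096):
    """True when the buffer cannot be described by the two in-command entries."""
    return prp_entries_needed(buffer_addr, length, page_size) > 2
```

Under these assumptions, a page-aligned 4 KiB transfer fits in one entry, an unaligned 4 KiB transfer needs two, and a 12 KiB transfer needs a PRP List.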
A Completion Queue (CQ) is a circular buffer with a fixed slot size used to post status for completed
commands. A completed command is uniquely identified by a combination of the associated SQ identifier
and command identifier that is assigned by host software. Multiple Submission Queues may be associated
with a single Completion Queue. This feature may be used where a single worker thread processes all
command completions via one Completion Queue even when those commands originated from multiple
Submission Queues. The CQ Head pointer is updated by host software after it has processed completion
queue entries indicating the last free CQ slot. A Phase Tag (P) bit is defined in the completion queue entry
to indicate whether an entry has been newly posted without consulting a register. This enables host
software to determine whether the new entry was posted as part of the previous or current round of
completion notifications. Specifically, each round through the Completion Queue entries, the controller
inverts the Phase Tag bit.
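The Phase Tag mechanism can be sketched with a host-side consumer that detects new entries purely by comparing the tag against the value it expects for the current pass. This is a minimal model under stated assumptions: entries are dictionaries, the queue memory starts with every Phase Tag at 0, the controller posts with tag 1 on its first pass, and `controller_post` is a hypothetical stand-in for the controller writing an entry.

```python
class CompletionQueue:
    """Host-side model of a Completion Queue consumer (illustrative only)."""

    def __init__(self, depth):
        # Assumption: the host zeroes every entry, so Phase Tag starts at 0.
        self.entries = [{"phase": 0, "sqid": None, "cid": None} for _ in range(depth)]
        self.depth = depth
        self.head = 0
        # Assumption: the controller posts with Phase Tag 1 on its first pass.
        self.expected_phase = 1

    def poll(self):
        """Consume every newly posted entry without reading a register."""
        completed = []
        while self.entries[self.head]["phase"] == self.expected_phase:
            entry = self.entries[self.head]
            # SQ identifier plus command identifier names the command.
            completed.append((entry["sqid"], entry["cid"]))
            self.head = (self.head + 1) % self.depth
            if self.head == 0:
                # Wrapped: the controller inverts the tag on its next pass.
                self.expected_phase ^= 1
        return completed


def controller_post(cq, slot, phase, sqid, cid):
    """Hypothetical stand-in for the controller writing one entry."""
    cq.entries[slot] = {"phase": phase, "sqid": sqid, "cid": cid}


cq = CompletionQueue(depth=2)
controller_post(cq, slot=0, phase=1, sqid=1, cid=10)
first = cq.poll()
controller_post(cq, slot=1, phase=1, sqid=1, cid=11)
controller_post(cq, slot=0, phase=0, sqid=2, cid=5)   # second pass, tag inverted
second = cq.poll()
```

After each `poll`, real host software would also write the new head value to the CQ Head doorbell so the controller can reuse the consumed slots; that step is omitted here.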
1.4.1 Multi-Path I/O and Namespace Sharing
This section provides an overview of multi-path I/O and namespace sharing. Multi-path I/O refers to two or
more completely independent PCI Express paths between a single host and a namespace while
namespace sharing refers to the ability for two or more hosts to access a common shared namespace
using different NVM Express controllers. Both multi-path I/O and namespace sharing require that the NVM
subsystem contain two or more controllers. Concurrent access to a shared namespace by two or more
hosts requires some form of coordination between hosts. The procedure used to coordinate these hosts is
outside the scope of this specification.
Figure 3 shows an NVM subsystem that contains a single NVM Express controller and a single PCI Express
port. Since this is a single Function PCI Express device, the NVM Express controller shall be associated
with PCI Function 0. A controller may support multiple namespaces. The controller in Figure 3 supports two
namespaces labeled NS A and NS B. Associated with each controller namespace is a namespace ID,
labeled as NSID 1 and NSID 2, that is used by the controller to reference a specific namespace. The
namespace ID is distinct from the namespace itself and is the handle a host and controller use to specify a
particular namespace in a command. The selection of a controller’s namespace IDs is outside the scope of
this specification. In this example namespace ID 1 is associated with namespace A and namespace ID 2
is associated with namespace B. Both namespaces are private to the controller and this configuration
supports neither multi-path I/O nor namespace sharing.
[Figure 3: PCI Function 0 with one NVM Express Controller. NSID 1 refers to NS A and NSID 2 refers to NS B.]
Figure 4 shows a multi-Function NVM Subsystem with a single PCI Express port containing two controllers:
one controller is associated with PCI Function 0 and the other controller is associated with PCI Function 1.
Each controller supports a single private namespace and access to shared namespace B. The namespace
ID shall be the same in all controllers that have access to a particular shared namespace. In this example
both controllers use namespace ID 2 to access shared namespace B.
[Figure 4: two controllers on a single PCI Express port. NS A and NS C are private namespaces; NS B is shared by both controllers.]
There is a unique Identify Controller data structure for each controller and a unique Identify Namespace
data structure for each namespace. Controllers with access to a shared namespace return the Identify
Namespace data structure associated with that shared namespace (i.e., the same data structure contents
are returned by all controllers with access to the same shared namespace). There is a globally unique
identifier associated with the namespace itself that may be used to determine when there are multiple paths
to the same shared namespace. Refer to section 7.10.
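One way host software might use the globally unique identifier to detect multiple paths is to group every (controller, namespace ID) pair by the identifier reported for that namespace. The sketch below is illustrative: the function name, the tuple layout, the controller labels, and the identifier strings are hypothetical, and how the identifier is actually obtained is covered by section 7.10 rather than shown here.

```python
from collections import defaultdict


def find_multipath(identify_results):
    """Group namespace paths by globally unique identifier.

    identify_results: iterable of (controller, nsid, unique_id) tuples,
    a hypothetical summary of Identify Namespace results per controller.
    Returns only the identifiers reachable through more than one path.
    """
    paths = defaultdict(list)
    for controller, nsid, unique_id in identify_results:
        paths[unique_id].append((controller, nsid))
    return {uid: p for uid, p in paths.items() if len(p) > 1}


# Example modelled on a two-controller subsystem with one shared namespace.
results = [
    ("ctrl0", 1, "uid-A"),
    ("ctrl0", 2, "uid-B"),
    ("ctrl1", 3, "uid-C"),
    ("ctrl1", 2, "uid-B"),
]
shared = find_multipath(results)
```

Here only `uid-B` is reported, reachable as namespace ID 2 through both controllers.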
Controllers associated with a shared namespace may operate on the namespace concurrently. Operations
performed by individual controllers are atomic to the shared namespace at the write atomicity level of the
controller to which the command was submitted (refer to section 6.4). The write atomicity level is not
required to be the same across controllers that share a namespace. If there are any ordering requirements
between commands issued to different controllers that access a shared namespace, then host software or
an associated application is required to enforce these ordering requirements.
Figure 5 illustrates an NVM Subsystem with two PCI Express ports, each with an associated controller.
Both controllers map to PCI Function 0 of the corresponding port. Each PCI Express port in this example
is completely independent and has its own PCI Express Fundamental Reset and reference clock input. A
reset of a port only affects the controller associated with that port and has no impact on the other controller,
shared namespace, or operations performed by the other controller on the shared namespace. The
functional behavior of this example is otherwise the same as that illustrated in Figure 4.
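[Figure 5: NVM Subsystem with two PCI Express ports, each with an associated controller. NS A and NS C are private namespaces; NS B is shared by both controllers.]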
The two ports shown in Figure 5 may be associated with the same Root Complex or with different Root
Complexes and may be used to implement both multi-path I/O and I/O sharing architectures. System-level
architectural aspects and use of multiple ports in a PCI Express fabric are beyond the scope of this
specification.
Figure 6 illustrates an NVM subsystem that supports Single Root I/O Virtualization (SR-IOV) and has one
Physical Function and four Virtual Functions. An NVM Express controller is associated with each Function
with each controller having a private namespace and access to a namespace shared by all controllers,
labeled NS F. The behavior of the controllers in this example parallels that of the other examples in this
section. Refer to section 8.5.4 for more information on SR-IOV.
Figure 6: PCI Express Device Supporting Single Root I/O Virtualization (SR-IOV)
[Figure 6 content: one PCIe Port with Physical Function 0 and Virtual Functions (0,1), (0,2), (0,3), and (0,4), each associated with an NVMe Controller. The five controllers reference private namespaces NS A, NS B, NS C, NS D, and NS E as NSID 1, NSID 3, NSID 4, NSID 5, and NSID 6 respectively, and each references shared namespace NS F as NSID 2.]
Examples provided in this section are meant to illustrate concepts and are not intended to enumerate all
possible configurations. For example, an NVM subsystem may contain multiple PCI Express ports with
each port supporting SR-IOV.
1.5 Conventions
Hardware shall return ‘0’ for all bits and registers that are marked as reserved, and host software shall write
all reserved bits and registers with the value of ‘0’.
Inside the register section, the following abbreviations are used:
RO         Read Only
RW         Read Write
R/W        Read Write. The value read may not be the last value written.
RWC        Read/Write ‘1’ to clear
RWS        Read/Write ‘1’ to set
Impl Spec  Implementation Specific – the controller has the freedom to choose its implementation.
HwInit     The default state is dependent on NVM Express controller and system configuration. The value is initialized at reset, for example by an expansion ROM, or in the case of integrated devices, by a platform BIOS.
For some register fields, it is implementation specific as to whether the field is RW, RWC, or RO; this is
typically shown as RW/RO or RWC/RO to indicate that the field is read only if the functionality is not
supported.
When a register field is referred to in the document, the convention used is “Register Symbol.Field Symbol”.
For example, the PCI command register parity error response enable field is referred to by the name
CMD.PEE. If the register field is an array of bits, the field is referred to as “Register Symbol.Field Symbol
(array offset to element)”.
A 0-based value is a numbering scheme for which the number 0h actually
corresponds to a value of 1h and thus produces the pattern of 0h = 1h, 1h = 2h, 2h = 3h, etc. In this
numbering scheme, there is not a method for specifying the value of 0h. Values in this specification are 1-
based (i.e., the number 1h corresponds to a value of 1h, 2h=2h, etc.) unless otherwise specified.
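The 0-based numbering scheme can be illustrated with a pair of conversion helpers. The function names are hypothetical and the sketch assumes non-negative integer fields.

```python
def encode_zero_based(actual):
    """Convert an actual count (1 or more) to its 0-based field encoding."""
    if actual < 1:
        # A 0-based field has no way to express an actual value of 0.
        raise ValueError("a 0-based field cannot represent 0")
    return actual - 1


def decode_zero_based(field):
    """Convert a 0-based field encoding back to the actual count."""
    return field + 1
```

For example, a 16-bit 0-based field holding FFFFh decodes to 65,536, and an actual value of 64 is written as 3Fh.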
When a size is stated in the document as KB, the convention used is 1KB = 1024 bytes.
The ^ operator is used to denote the power to which that number, symbol, or expression is to be raised.
Some parameters are defined as an ASCII string. ASCII strings shall contain only code values 20h through
7Eh. For the string “Copyright”, the character “C” is the first byte, the character “o” is the second byte, etc.
The string is left justified and shall be padded with spaces (ASCII character 20h) to the right if necessary.
A hexadecimal ASCII string is an ASCII string that uses a subset of the code values: “0” to “9”, “A” to “F”
uppercase, and “a” to “f” lowercase.
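The ASCII string rules above (code values 20h through 7Eh, left justified, space padded) can be sketched as follows. The helper names and the choice to raise `ValueError` on invalid input are assumptions made for the example.

```python
def encode_ascii_field(text, width):
    """Build a fixed-width ASCII field: left justified, padded with 20h."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    if any(not 0x20 <= b <= 0x7E for b in data):
        raise ValueError("only code values 20h through 7Eh are allowed")
    if len(data) > width:
        raise ValueError("string does not fit in the field")
    return data.ljust(width, b" ")


def is_hex_ascii(data):
    """True when every byte is one of 0-9, A-F, or a-f."""
    return all(chr(b) in "0123456789ABCDEFabcdef" for b in data)


field = encode_ascii_field("Copyright", 12)
```

In `field`, the character “C” (43h) is the first byte and the last three bytes are spaces.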
1.6 Definitions
1.6.1 Admin Queue
The Admin Queue is the Submission Queue and Completion Queue with identifier 0. The Admin
Submission Queue and corresponding Admin Completion Queue are used to submit administrative
commands and receive completions for those administrative commands, respectively.
The Admin Submission Queue is uniquely associated with the Admin Completion Queue.
1.6.2 arbitration burst
The maximum number of commands that may be launched at one time from a Submission Queue that is
using round robin or weighted round robin with urgent priority class arbitration.
1.7 Keywords
Several keywords are used to differentiate between different levels of requirements.
1.7.1 mandatory
A keyword indicating items to be implemented as defined by this specification.
1.7.2 may
A keyword that indicates flexibility of choice with no implied preference.
1.7.3 optional
A keyword that describes features that are not required by this specification. However, if any optional
feature defined by the specification is implemented, the feature shall be implemented in the way defined by
the specification.
1.7.4 R
“R” is used as an abbreviation for “reserved” when the figure or table does not provide sufficient space for
the full word “reserved”.
1.7.5 reserved
A keyword referring to bits, bytes, words, fields, and opcode values that are set-aside for future
standardization. Their use and interpretation may be specified by future extensions to this or other
specifications. A reserved bit, byte, word, field, or register shall be cleared to zero, or in accordance with a
future extension to this specification. The recipient is not required to check reserved bits, bytes, words, or
fields. Receipt of reserved coded values in defined fields in commands shall be reported as an error. Writing
a reserved coded value into a controller register field produces undefined results.
1.7.6 shall
A keyword indicating a mandatory requirement. Designers are required to implement all such mandatory
requirements to ensure interoperability with other products that conform to the specification.
1.7.7 should
A keyword indicating flexibility of choice with a strongly preferred alternative. Equivalent to the phrase “it is
recommended”.
[Figure: byte, word, and Dword relationships. A byte comprises bits 7:0. A word comprises bits 15:0, with byte 1 in bits 15:8 and byte 0 in bits 7:0. A Dword comprises bits 31:0, with word 1 in bits 31:16 and word 0 in bits 15:0.]
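The byte, word, and Dword relationships above can be expressed with shifts and masks on integer values. This sketch uses hypothetical helper names and deals only with bit positions inside a value, not with how the bytes are ordered in memory.

```python
def split_dword(dword):
    """Return (word 1, word 0): bits 31:16 and bits 15:0 of a Dword."""
    return (dword >> 16) & 0xFFFF, dword & 0xFFFF


def split_word(word):
    """Return (byte 1, byte 0): bits 15:8 and bits 7:0 of a word."""
    return (word >> 8) & 0xFF, word & 0xFF


word1, word0 = split_dword(0x12345678)
byte1, byte0 = split_word(word0)
```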
1.9 References
INCITS 501-2016, Information technology – Security Features for SCSI Commands (SFSC). Available
from http://webstore.ansi.org.
INCITS 514-2014, Information technology – SCSI Block Commands - 3 (SBC-3). Available from
http://webstore.ansi.org.
INCITS 522-2014, Information technology – ATA/ATAPI Command Set - 3 (ACS-3). Available from
http://webstore.ansi.org.
JEDEC JESD218B-01: Solid State Drive (SSD) Requirements and Endurance Test Method standard.
Available from http://www.jedec.org.
NVM Express over Fabrics Specification, Revision 1.0. Available from http://www.nvmexpress.org.
NVM Express Management Interface Specification, Revision 1.0. Available from
http://www.nvmexpress.org.
PCI specification, revision 3.0. Available from http://www.pcisig.com.
PCI Express specification, revision 2.1. Available from http://www.pcisig.com.
PCI Power Management specification. Available from http://www.pcisig.com.
PCI Single Root I/O Virtualization, revision 1.1. Available from
http://www.pcisig.com/specifications/iov/single_root/.
PCI Firmware 3.0 specification. Available from http://www.pcisig.com.
RFC 4301, Kent, S. and K. Seo, “Security Architecture for the Internet Protocol”, December 2005. Available
from https://www.ietf.org/rfc.html.
RFC 6234, Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA and SHA-based HMAC
and HKDF)", May 2011. Available from https://www.ietf.org/rfc.html.
UEFI 2.3.1 specification. Available from http://www.uefi.org.
Trusted Computing Group Storage Architecture Core specification, Version 2.01 Revision 1.00. Available
from http://www.trustedcomputinggroup.org.
Trusted Computing Group Storage Interface Interactions Specification (SIIS). Available from
http://www.trustedcomputinggroup.org.
MSI-X is the recommended interrupt mechanism to use. However, some systems do not support MSI-X,
thus devices should support both the MSI Capability and the MSI-X Capability.
It is recommended that implementations support the Advanced Error Reporting Capability to enable more
robust error handling.
2.1.11 Offset 14h: MUBAR (BAR1) – Memory Register Base Address, upper 32-bits
This register specifies the upper 32-bit address of the memory registers defined in section 3.
Bit Type Reset Description
31:00 RW 0 Base Address (BA): Upper 32-bits (bits 63:32) of the memory register base address.
NOTE: NVM Express implementations that reside behind PCI compliant bridges, such as PCI Express
Endpoints, are restricted to having 32-bit assigned base address registers due to limitations on the
maximum address that may be specified in the bridge for non-prefetchable memory. See the PCI Bridge
1.2 specification for more information on this restriction.
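As a small illustration of how the upper and lower halves form one address, the sketch below combines a MUBAR value with the lower 32 address bits. It assumes the caller has already extracted the lower 32 address bits from the BAR0 register (whose layout is not shown in this section); the function and parameter names are hypothetical.

```python
def memory_register_base(lower_base, mubar):
    """Combine the two halves of the 64-bit memory register base address.

    lower_base: bits 31:0 of the base address (assumed already extracted
                from BAR0 with any non-address bits cleared).
    mubar:      the MUBAR (BAR1) value, bits 63:32 of the base address.
    """
    return ((mubar & 0xFFFFFFFF) << 32) | (lower_base & 0xFFFFFFFF)


base = memory_register_base(0xFEB00000, 0x00000001)
```

A device restricted to a 32-bit assigned base address, as in the note above, would simply have a MUBAR value of 0.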
2.1.12 Offset 18h: BAR2 – Index/Data Pair Register Base Address or Vendor Specific (Optional)
If this register is configured as I/O space, then this register specifies the Index/Data Pair base address and
is configured as shown in the table below. These registers are used to access the memory registers defined
in section 3 using I/O based accesses.
Bit Type Reset Description
31:03 RW 0 Base Address (BA): Base address of Index/Data Pair registers that is 8 bytes in size.
02:01 RO 0 Reserved
00 RO 1 Resource Type Indicator (RTE): Indicates a request for register I/O space.
If this register is configured as memory space (Resource Type Indicator is cleared to ‘0’), then the BAR2
register is vendor specific. Vendor specific space may also be allocated at the end of the memory registers
defined in section 3.
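The BAR2 layout above can be decoded with a mask and a single bit test. This is a sketch with a hypothetical function name; it returns no base address when the Resource Type Indicator is cleared, because the register is then vendor specific.

```python
def decode_bar2(bar2):
    """Decode BAR2 according to the I/O space layout described above."""
    is_io_space = bool(bar2 & 0x1)          # bit 00: Resource Type Indicator
    if not is_io_space:
        # Memory space: the register contents are vendor specific.
        return {"io_space": False, "base_address": None}
    # Bits 31:03 hold the base of the 8-byte Index/Data Pair registers.
    return {"io_space": True, "base_address": bar2 & 0xFFFFFFF8}


io_bar = decode_bar2(0x0000C001)
mem_bar = decode_bar2(0xFE000000)
```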
2.1.13 Offset 1Ch – 1Fh: BAR3 – Vendor Specific
The BAR3 register is vendor specific. Vendor specific space may also be allocated at the end of the
memory registers defined in section 3.
2.1.14 Offset 20h – 23h: BAR4 – Vendor Specific
The BAR4 register is vendor specific. Vendor specific space may also be allocated at the end of the
memory registers defined in section 3.
2.1.15 Offset 24h – 27h: BAR5 – Vendor Specific
The BAR5 register is vendor specific. Vendor specific space may also be allocated at the end of the
memory registers defined in section 3.
2.1.16 Offset 28h: CCPTR – CardBus CIS Pointer
Bit Type Reset Description
31:00 RO 0 Not supported by NVM Express.
2.1.17 Offset 2Ch: SS - Sub System Identifiers
Bits Type Reset Description
31:16 RO HwInit Subsystem ID (SSID): Indicates the sub-system identifier.
15:00 RO HwInit Subsystem Vendor ID (SSVID): Indicates the sub-system vendor identifier.
2.1.18 Offset 30h: EROM – Expansion ROM (Optional)
If the register is not implemented, it shall be read-only 00h.
Bit Type Reset Description
31:00 RW Impl Spec ROM Base Address (RBA): Indicates the base address of the controller’s expansion ROM. Not supported for integrated implementations.
2.1.19 Offset 34h: CAP – Capabilities Pointer
Bit Type Reset Description
7:0 RO Impl Spec Capability Pointer (CP): Indicates the first capability pointer offset.
2.2.3 Offset PMCAP + 4h: PMCS – PCI Power Management Control and Status
Bit Type Reset Description
15 RWC 0 PME Status (PMES): Refer to the PCI SIG specifications.
14:13 RO 0 Data Scale (DSC): Refer to the PCI SIG specifications.
12:09 RO / RW 0 Data Select (DSE): If PME is not supported, then this field is read only ‘0’. Refer to the PCI SIG specifications.
08 RO / RW 0 PME Enable (PMEE): If PME is not supported, then this field is read only ‘0’. Refer to the PCI SIG specifications.
07:04 RO 0 Reserved
03 RO 1 No Soft Reset (NSFRST): A value of ‘1’ indicates that the controller transitioning from D3hot to D0 because of a power state command does not perform an internal reset.
02 RO 0 Reserved
01:00 R/W 00 Power State (PS): This field is used both to determine the current power state of the controller and to set a new power state. The values are:
    00 – D0 state
    01 – D1 state
    10 – D2 state
    11 – D3HOT state
    When in the D3HOT state, the controller’s configuration space is available, but the register I/O and memory spaces are not. Additionally, interrupts are blocked.
For a 64-bit Base Address register, the Table BIR indicates the lower Dword. With PCI-
to-PCI bridges, BIR values 2 through 5 are also reserved.
2.4.4 Offset MSIXCAP + 8h: MPBA – MSI-X PBA Offset / PBA BIR
Bits Type Reset Description
31:03 RO Impl Spec PBA Offset (PBAO): Used as an offset from the address contained by one of the
function’s Base Address registers to point to the base of the MSI-X PBA. The lower three
PBA BIR bits are masked off (cleared to 000b) by software to form a 32-bit Qword-aligned
offset.
PBA BIR (PBIR): This field indicates which one of a function’s Base Address registers,
located beginning at 10h in Configuration Space, is used to map the function’s MSI-X
PBA into system memory.
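The masking described for PBA Offset can be sketched in C as follows. This is an informative illustration only; the type name `msix_pba_location` and the function `decode_mpba` are hypothetical names introduced here, and the sketch assumes the PBA BIR occupies the low three bits that the PBA Offset description says are masked off.

```c
#include <assert.h>
#include <stdint.h>

/* Location of the MSI-X PBA as described by the MPBA register. */
typedef struct {
    uint32_t pba_offset; /* byte offset from the selected BAR, Qword aligned */
    uint8_t  pba_bir;    /* which Base Address register maps the PBA */
} msix_pba_location;

/* Split a raw 32-bit MPBA value into PBA Offset (bits 31:03) and PBA BIR (low three bits). */
static msix_pba_location decode_mpba(uint32_t mpba)
{
    msix_pba_location loc;
    loc.pba_bir    = (uint8_t)(mpba & 0x7u); /* low three bits select the BAR */
    loc.pba_offset = mpba & ~UINT32_C(0x7);  /* clear the BIR bits to form the offset */
    assert((loc.pba_offset & 0x7u) == 0);    /* Qword aligned by construction */
    return loc;
}
```

For example, a raw value of 00003004h would decode to an offset of 3000h within the Base Address register selected by BIR value 4.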
11:9 RO Impl Spec Endpoint L1 Acceptable Latency (L1L): This field indicates the acceptable latency
that the Endpoint is able to withstand due to a transition from the L1 state to the L0
state.
08:06 RO Impl Spec Endpoint L0s Acceptable Latency (L0SL): This field indicates the acceptable total
latency that the Endpoint is able to withstand due to the transition from L0s state to the
L0 state.
05 RO Impl Spec Extended Tag Field Supported (ETFS): This field indicates the maximum supported
size of the Tag field as a Requester.
04:03 RO Impl Spec Phantom Functions Supported (PFS): This field indicates the support for use of
unclaimed Function Numbers to extend the number of outstanding transactions
allowed by logically combining unclaimed Function Numbers with the Tag identifier.
02:00 RO Impl Spec Max_Payload_Size Supported (MPS): This field indicates the maximum payload size
that the function may support for TLPs.
2.5.4 Offset PXCAP + 8h: PXDC – PCI Express Device Control
Bits Type Reset Description
15 R/W 0b Initiate Function Level Reset – A write of ‘1’ initiates Function Level Reset to the
Function. The value read by software from this bit shall always be ‘0’.
14:12 RW/RO Impl Spec Max_Read_Request_Size (MRRS): This field sets the maximum Read Request size
for the Function as a Requester. The Function shall not generate Read Requests with
size exceeding the set value.
11 RW/RO 0 Enable No Snoop (ENS): If this field is set to ‘1’, the Function is permitted to set the
No Snoop bit in the Requester Attributes of transactions it initiates that do not require
hardware enforced cache coherency. This field may be hardwired to ‘0’ if a Function
would never set the No Snoop attribute in transactions it initiates.
10 RW/RO 0 AUX Power PM Enable (APPME): If this field is set to ‘1’, enables a Function to draw
AUX power independent of PME AUX power. Functions that do not implement this
capability hardwire this bit to 0b.
09 RW/RO 0 Phantom Functions Enable (PFE): If this field is set to ‘1’, enables a Function to use
unclaimed Functions as Phantom Functions to extend the number of outstanding
transaction identifiers. If this field is cleared to ‘0’, the Function is not allowed to use
Phantom Functions.
08 RW/RO 0 Extended Tag Enable (ETE): If this field is set to ‘1’, enables a Function to use an 8-
bit Tag field as a Requester. If this field is cleared to ‘0’, the Function is restricted to a
5-bit Tag field.
07:05 RW/RO 000b Max_Payload_Size (MPS): This field sets the maximum TLP payload size for the
Function. As a receiver, the Function shall handle TLPs as large as the set value. As
a transmitter, the Function shall not generate TLPs exceeding the set value. Functions
that support only the 128 byte max payload size are permitted to hardwire this field to
0h.
04 RW/RO Impl Spec Enable Relaxed Ordering (ERO): If this field is set to ‘1’, the Function is permitted to
set the Relaxed Ordering bit in the Attributes field of transactions it initiates that do not
require strong write ordering.
03 RW 0 Unsupported Request Reporting Enable (URRE): This bit, in conjunction with other
bits, controls the signaling of Unsupported Requests by sending error messages.
02 RW 0 Fatal Error Reporting Enable (FERE): This bit, in conjunction with other bits, controls
the signaling of Unsupported Requests by sending ERR_FATAL messages.
01 RW 0 Non-Fatal Error Reporting Enable (NFERE): This bit, in conjunction with other bits,
controls the signaling of Unsupported Requests by sending ERR_NONFATAL
messages.
00 RW 0 Correctable Error Reporting Enable (CERE): This bit, in conjunction with other bits,
controls the signaling of Unsupported Requests by sending ERR_COR messages.
13:12 RO Impl Spec
Value Definition
00b TPH and Extended TPH Completer not supported
01b TPH Completer supported; Extended TPH Completer not supported
10b Reserved
11b Both TPH and Extended TPH Completer supported
11 RO Impl Spec Latency Tolerance Reporting Supported (LTRS): If set to ‘1’, then the latency
tolerance reporting mechanism is supported.
10 RO 0 No RO-enabled PR-PR Passing (NPRPR): Not applicable to NVM Express.
09 RO Impl Spec 128-bit CAS Completer Supported (128CCS): This bit shall be set to ‘1’ if the
Function supports this optional capability.
08 RO Impl Spec 64-bit AtomicOp Completer Supported (64AOCS): Includes FetchAdd, Swap, and
CAS AtomicOps. This bit shall be set to ‘1’ if the Function supports this optional
capability.
07 RO Impl Spec 32-bit AtomicOp Completer Supported (32AOCS): Includes FetchAdd, Swap, and
CAS AtomicOps. This bit shall be set to ‘1’ if the Function supports this optional
capability.
06 RO 0 AtomicOp Routing Supported (AORS): Not applicable to NVM Express.
05 RO 0 ARI Forwarding Supported (ARIFS): Not applicable to NVM Express.
04 RO 1 Completion Timeout Disable Supported (CTDS): A value of ‘1’ indicates support
for the Completion Timeout Disable mechanism. The Completion Timeout Disable
mechanism is required for Endpoints that issue requests on their own behalf.
03:00 RO Impl Spec Completion Timeout Ranges Supported (CTRS): This field indicates device
function support for the optional Completion Timeout programmability mechanism.
2.6.4 Offset AERCAP + Ch: AERUCESEV – AER Uncorrectable Error Severity Register
This register controls whether an individual error is reported as a non-fatal or a fatal error. An error is
reported as fatal when the corresponding error bit in the severity register is set (‘1’). If the bit is cleared (‘0’),
the corresponding error is considered non-fatal. These bits are sticky – they are neither initialized nor
modified during a hot reset or FLR.
2.6.5 Offset AERCAP + 10h: AERCS – AER Correctable Error Status Register
This register reports error status of individual correctable error sources from the controller. These bits are
sticky – they are neither initialized nor modified during a hot reset or FLR.
2.6.6 Offset AERCAP + 14h: AERCEM – AER Correctable Error Mask Register
This register controls the reporting of the individual correctable errors by the controller. A masked error is
not reported to the host. These bits are sticky – they are neither initialized nor modified during a hot reset
or FLR.
2.6.7 Offset AERCAP + 18h: AERCC – AER Capabilities and Control Register
Bits Type Reset Description
31:12 RO 0 Reserved
11 RO 0 TLP Prefix Log Present (TPLP): If set to ‘1’ and FEP is valid, this indicates that the
TLP Prefix Log register contains valid information. This field is sticky – it is neither
initialized nor modified during a hot reset or FLR.
10 RW/RO 0 Multiple Header Recording Enable (MHRE): If this field is set to ‘1’, this enables
the controller to generate more than one error header. This field is sticky – it is neither
initialized nor modified during a hot reset or FLR. If the controller does not implement
the associated mechanism, then this field is cleared to ‘0’.
09 RW/RO Impl Spec Multiple Header Recording Capable (MHRC): If this field is set to ‘1’, indicates that
the controller is capable of generating more than one error header.
08 RW/RO 0 ECRC Check Enable (ECE): If this field is set to ‘1’, indicates that the ECRC
checking is enabled. This field is sticky – it is neither initialized nor modified during a
hot reset or FLR. If the controller does not implement the associated mechanism,
then this field is cleared to ‘0’.
07 RO Impl Spec ECRC Check Capable (ECC): If this field is set to ‘1’, indicates that the controller is
capable of checking ECRC.
06 RW/RO 0 ECRC Generation Enable (EGE): If this field is set to ‘1’, indicates that the ECRC
generation is enabled. This field is sticky – it is neither initialized nor modified during
a hot reset or FLR. If the controller does not implement the associated mechanism,
then this field is cleared to ‘0’.
05 RO Impl Spec ECRC Generation Capable (EGC): If this field is set to ‘1’, indicates that the
controller is capable of generating ECRC.
04:00 RO 0 First Error Pointer (FEP): This field identifies the bit position of the first error reported
in the AERUCES register. This field is sticky – it is neither initialized nor modified
during a hot reset or FLR.
2.6.9 Offset AERCAP + 38h: AERTLP – AER TLP Prefix Log Register (Optional)
This register contains the End-End TLP prefix(es) for the TLP corresponding to a detected error. This
register is sticky – it is neither initialized nor modified during a hot reset or FLR.
3 Controller Registers
Controller registers are located in the MLBAR/MUBAR registers (PCI BAR0 and BAR1) that shall be
mapped to a memory space that supports in-order access and variable access widths. For many computer
architectures, specifying the memory space as uncacheable produces this behavior. The host shall not
issue locked accesses. The host shall access registers in their native width or aligned 32-bit accesses.
Violation of either of these host requirements results in undefined behavior.
Accesses that target any portion of two or more registers are not supported.
All reserved registers and all reserved bits within registers are read-only and return 0h when read. Software
shall not rely on 0h being returned.
44:37 RO Impl Spec
Bit Definition
37 NVM command set
38 Reserved
39 Reserved
40 Reserved
41 Reserved
42 Reserved
43 Reserved
44 Reserved
36 RO Impl Spec NVM Subsystem Reset Supported (NSSRS): This field indicates whether the
controller supports the NVM Subsystem Reset feature defined in section 7.3.1.
This field is set to ‘1’ if the controller supports the NVM Subsystem Reset feature.
This field is cleared to ‘0’ if the controller does not support the NVM Subsystem
Reset feature.
35:32 RO Impl Spec Doorbell Stride (DSTRD): Each Submission Queue and Completion Queue
Doorbell register is 32-bits in size. This register indicates the stride between
doorbell registers. The stride is specified as (2 ^ (2 + DSTRD)) in bytes. A value
of 0h indicates a stride of 4 bytes, where the doorbell registers are packed without
reserved space between each register. Refer to section 8.6.
31:24 RO Impl Spec Timeout (TO): This is the worst case time that host software shall wait for
CSTS.RDY to transition from:
a) ‘0’ to ‘1’ after CC.EN transitions from ‘0’ to ‘1’; or
b) ‘1’ to ‘0’ after CC.EN transitions from ‘1’ to ‘0’.
This worst case time may be experienced after events such as an abrupt
shutdown or activation of a new firmware image; typical times are expected to be
much shorter. This field is in 500 millisecond units.
23:19 RO 0h Reserved
The round robin arbitration mechanism is not listed since all controllers shall
support this arbitration mechanism.
16 RO Impl Spec Contiguous Queues Required (CQR): This field is set to ‘1’ if the controller
requires that I/O Submission Queues and I/O Completion Queues be
physically contiguous. This field is cleared to ‘0’ if the controller supports
I/O Submission Queues and I/O Completion Queues that are not physically
contiguous. If this field is set to ‘1’, then the Physically Contiguous bit
(CDW11.PC) in the Create I/O Submission Queue and Create I/O Completion
Queue commands shall be set to ‘1’.
15:00 RO Impl Spec Maximum Queue Entries Supported (MQES): This field indicates the
maximum individual queue size that the controller supports. For NVMe over PCIe
implementations, this value applies to the I/O Submission Queues and I/O
Completion Queues that the host creates. For NVMe over Fabrics
implementations, this value applies to only the I/O Submission Queues that the
host creates. This is a 0’s based value. The minimum value is 1h, indicating two
entries.
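The arithmetic implied by the CAP fields above (0’s based MQES, the 500 millisecond units of TO, and the 2 ^ (2 + DSTRD) stride) can be sketched in C as follows. This is an informative illustration under the bit positions listed in this section; the names `cap_fields` and `decode_cap` are hypothetical and are not defined by this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of selected CAP fields (hypothetical helper type). */
typedef struct {
    uint32_t max_queue_entries;   /* MQES + 1, because MQES is a 0's based value */
    int      contiguous_required; /* CQR, bit 16 */
    uint32_t timeout_ms;          /* TO (bits 31:24) in 500 millisecond units */
    uint32_t doorbell_stride;     /* 2 ^ (2 + DSTRD) bytes, DSTRD in bits 35:32 */
    int      nssr_supported;      /* NSSRS, bit 36 */
} cap_fields;

/* Decode a raw 64-bit CAP register value into the fields above. */
static void decode_cap(uint64_t cap, cap_fields *out)
{
    assert(out != NULL);
    out->max_queue_entries   = (uint32_t)(cap & 0xFFFFu) + 1u;
    out->contiguous_required = (int)((cap >> 16) & 0x1u);
    out->timeout_ms          = (uint32_t)((cap >> 24) & 0xFFu) * 500u;
    out->doorbell_stride     = UINT32_C(1) << (2u + (unsigned)((cap >> 32) & 0xFu));
    out->nssr_supported      = (int)((cap >> 36) & 0x1u);
}
```

With MQES set to 03FFh, TO set to 14h, and DSTRD set to 2h, this sketch would report 1024 entries, a 10 000 millisecond worst case, and a 16 byte doorbell stride.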
15:08 RO 02h Minor Version Number (MNR): Indicates the minor version is “2”.
07:00 RO 01h Tertiary Version Number (TER): Indicates the tertiary version is “1”.
15:14 RW 0h
Value Definition
00b No notification; no effect
01b Normal shutdown notification
10b Abrupt shutdown notification
11b Reserved
This field should be written by host software prior to any power down condition
and prior to any change of the PCI power management state. It is
recommended that this field also be written prior to a warm reboot. To
determine when shutdown processing is complete, refer to CSTS.SHST. Refer
to section 7.6.2 for additional shutdown processing details.
Other fields in the CC register (including the EN bit) may be modified as part of
updating this field to 01b or 10b.
13:11 RW 0h Arbitration Mechanism Selected (AMS): This field selects the arbitration
mechanism to be used. This value shall only be changed when EN is cleared
to ‘0’. Host software shall only set this field to supported arbitration mechanisms
indicated in CAP.AMS. If this field is set to an unsupported value, the behavior
is undefined.
Value Definition
000b Round Robin
001b Weighted Round Robin with Urgent Priority Class
010b – 110b Reserved
111b Vendor Specific
10:07 RW 0h Memory Page Size (MPS): This field indicates the host memory page size.
The memory page size is (2 ^ (12 + MPS)). Thus, the minimum host memory
page size is 4KB and the maximum host memory page size is 128MB. The
value set by host software shall be a supported value as indicated by the
CAP.MPSMAX and CAP.MPSMIN fields. This field describes the value used
for PRP entry size. This field shall only be modified when EN is cleared to ‘0’.
06:04 RW 0h I/O Command Set Selected (CSS): This field specifies the I/O Command Set
that is selected for use for the I/O Submission Queues. Host software shall only
select a supported I/O Command Set, as indicated in CAP.CSS. This field shall
only be changed when the controller is disabled (CC.EN is cleared to ‘0’). The
I/O Command Set selected shall be used for all I/O Submission Queues.
Value Definition
000b NVM Command Set
001b – 111b Reserved
When this field is cleared to ‘0’, the CSTS.RDY bit is cleared to ‘0’ by the
controller once the controller is ready to be re-enabled. When this field is set to
‘1’, the controller sets CSTS.RDY to ‘1’ when it is ready to process commands.
CSTS.RDY may be set to ‘1’ before namespace(s) are ready to be accessed.
Setting this field from a ‘0’ to a ‘1’ when CSTS.RDY is a ‘1,’ or setting this field
from a '1' to a '0' when CSTS.RDY is a '0,' has undefined results. The Admin
Queue registers (AQA, ASQ, and ACQ) shall only be modified when EN is
cleared to ‘0’.
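As an informative illustration of the CC field encodings, the following C sketch converts a host page size to the MPS encoding and updates the AMS, MPS, and CSS fields of a CC value. The function names are hypothetical. The sketch assumes that CC.EN is bit 0, which is not shown in the rows above, and it checks that EN is cleared before changing fields that shall only be modified while the controller is disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Return the CC.MPS encoding for a page size, where page size = 2 ^ (12 + MPS).
 * Returns -1 if the size is not a power of two between 4KB and 128MB. */
static int cc_mps_from_page_size(uint64_t page_bytes)
{
    for (int mps = 0; mps <= 15; mps++) {
        if (page_bytes == (UINT64_C(1) << (12 + mps)))
            return mps;
    }
    return -1;
}

/* Return a copy of cc with AMS (13:11), MPS (10:07), and CSS (06:04) replaced. */
static uint32_t cc_set_fields(uint32_t cc, unsigned ams, unsigned mps, unsigned css)
{
    const uint32_t mask = (UINT32_C(0x7) << 11) | (UINT32_C(0xF) << 7) | (UINT32_C(0x7) << 4);

    assert(ams <= 0x7u && mps <= 0xFu && css <= 0x7u);
    assert((cc & 0x1u) == 0); /* assumption: EN is bit 0 and must be '0' here */

    cc &= ~mask;
    cc |= ((uint32_t)ams << 11) | ((uint32_t)mps << 7) | ((uint32_t)css << 4);
    return cc;
}
```

A host using 64KB pages with Weighted Round Robin with Urgent Priority Class would, under these assumptions, call `cc_set_fields(cc, 1, 4, 0)` after confirming that both values are reported as supported in CAP.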
03:02 RO 0
Value Definition
00b Normal operation (no shutdown has been requested)
01b Shutdown processing occurring
10b Shutdown processing complete
11b Reserved
The reset value of this field is '1' when a fatal controller error is detected during
controller initialization.
00 RO 0 Ready (RDY): This field is set to ‘1’ when the controller is ready to accept Submission
Queue Tail doorbell writes after CC.EN is set to ‘1’. This field shall be cleared to ‘0’
when CC.EN is cleared to ‘0’. Commands shall not be submitted to the controller until
this field is set to ‘1’ after the CC.EN bit is set to ‘1’. Failure to follow this requirement
produces undefined results. Host software shall wait a minimum of CAP.TO seconds
for this field to be set to ‘1’ after setting CC.EN to ‘1’ from a previous value of ‘0’.
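The wait on CSTS.RDY can be sketched as a bounded polling loop. The following C example is informative only: the callback types, the 10 millisecond polling step, and the simulated controller `fake_ctrl` are all hypothetical, and the time budget is derived from CAP.TO in 500 millisecond units as stated in the CAP.TO field definition.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*read_csts_fn)(void *ctx);            /* hypothetical CSTS register read */
typedef void (*sleep_ms_fn)(void *ctx, uint32_t ms);    /* hypothetical delay routine */

/* Poll CSTS.RDY (bit 00) until it equals `want` or the CAP.TO budget expires.
 * Returns 0 on success and -1 on timeout. */
static int wait_csts_rdy(uint8_t cap_to, int want,
                         read_csts_fn read_csts, sleep_ms_fn sleep_ms, void *ctx)
{
    const uint32_t step_ms = 10;
    const uint32_t budget_ms = (uint32_t)cap_to * 500u;

    assert(want == 0 || want == 1);
    for (uint32_t waited = 0; ; waited += step_ms) {
        if ((int)(read_csts(ctx) & 0x1u) == want)
            return 0;
        if (waited >= budget_ms)
            return -1;
        sleep_ms(ctx, step_ms);
    }
}

/* Simulated controller for demonstration: RDY reads as '1' once ready_after_ms has elapsed. */
struct fake_ctrl {
    uint32_t now_ms;
    uint32_t ready_after_ms;
};

static uint32_t fake_read_csts(void *ctx)
{
    struct fake_ctrl *c = ctx;
    return (c->now_ms >= c->ready_after_ms) ? 0x1u : 0x0u;
}

static void fake_sleep_ms(void *ctx, uint32_t ms)
{
    struct fake_ctrl *c = ctx;
    c->now_ms += ms;
}
```

In a real host driver the two callbacks would be replaced by a memory mapped read of CSTS and a platform delay routine; the simulated controller exists only so that the loop can be exercised without hardware.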
03 RO Impl Spec Read Data Support (RDS): If this bit is set to ‘1’, then the controller supports data and
metadata in the Controller Memory Buffer for commands that transfer data from the
controller to the host (e.g., Read). If this bit is cleared to ‘0’, then all data and metadata
for commands that transfer data from the controller to the host shall be transferred to
host memory.
02 RO Impl Spec PRP SGL List Support (LISTS): If this bit is set to ‘1’, then the controller supports PRP
Lists in the Controller Memory Buffer. If this bit is set to ‘1’ and SGLs are supported by
the controller, then the controller supports Scatter Gather Lists in the Controller Memory
Buffer. If this bit is set to ‘1’, then the Submission Queue Support bit shall be set to ‘1’. If
this bit is cleared to ‘0’, then all PRP Lists and SGLs shall be placed in host memory.
01 RO Impl Spec Completion Queue Support (CQS): If this bit is set to ‘1’, then the controller supports
Admin and I/O Completion Queues in the Controller Memory Buffer. If this bit is cleared
to ‘0’, then all Completion Queues shall be placed in host memory.
00 RO Impl Spec Submission Queue Support (SQS): If this bit is set to ‘1’, then the controller supports
Admin and I/O Submission Queues in the Controller Memory Buffer. If this bit is cleared
to ‘0’, then all Submission Queues shall be placed in host memory.
3.1.13 Offset 40h: BPINFO – Boot Partition Information
This optional register defines the characteristics of Boot Partitions (refer to section 8.13). If the controller
does not support the Boot Partitions feature then this register shall be cleared to 0h.
Bit Type Reset Description
31 RO Impl Spec Active Boot Partition ID (ABPID): This field indicates the identifier of the active
Boot Partition.
30:26 RO 0h Reserved
25:24 RO 0h Boot Read Status (BRS): This field indicates the status of Boot Partition read
operations initiated by the host writing to the BPRSEL.BPID field. Refer to section
8.13.
Value Definition
00b No Boot Partition read operation requested
01b Boot Partition read in progress
10b Boot Partition read completed successfully
11b Error completing Boot Partition read
If host software writes the BPRSEL.BPID field, this field transitions to 01b. After
successfully completing a Boot Partition read operation (i.e., transferring the
contents to the boot memory buffer), the controller sets this field to 10b. If there is
an error completing a Boot Partition read operation, this field is set to 11b, and the
contents of the boot memory buffer are undefined.
23:15 RO 0h Reserved
14:00 RO Impl Spec Boot Partition Size (BPSZ): This field defines the size of each Boot Partition in
multiples of 128KB. Both Boot Partitions are the same size.
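The BPINFO field layout above can be decoded as in the following informative C sketch. The names `bpinfo_fields` and `decode_bpinfo` are hypothetical; the bit positions and the 128KB unit are taken from the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the BPINFO register (hypothetical helper type). */
typedef struct {
    uint8_t  active_id;       /* ABPID, bit 31 */
    uint8_t  read_status;     /* BRS, bits 25:24 (00b none, 01b in progress, 10b done, 11b error) */
    uint64_t partition_bytes; /* BPSZ (bits 14:00) expressed in bytes */
} bpinfo_fields;

/* Decode a raw 32-bit BPINFO value. */
static void decode_bpinfo(uint32_t bpinfo, bpinfo_fields *out)
{
    assert(out != NULL);
    out->active_id       = (uint8_t)((bpinfo >> 31) & 0x1u);
    out->read_status     = (uint8_t)((bpinfo >> 24) & 0x3u);
    out->partition_bytes = (uint64_t)(bpinfo & 0x7FFFu) * 128u * 1024u;
}
```

A BPSZ value of 8h, for example, corresponds to Boot Partitions of 1MB each.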
3.1.15 Offset 48h: BPMBL – Boot Partition Memory Buffer Location (Optional)
This optional register specifies the memory buffer that is used as the destination for data when a Boot
Partition is read (refer to section 8.13). If the controller does not support the Boot Partitions feature then
this register shall be cleared to 0h.
Bit Type Reset Description
63:12 RW 0h Boot Partition Memory Buffer Base Address (BMBBA): Specifies the 64-bit physical
address for the Boot Partition Memory Buffer. This address shall be 4KB aligned. Note
that this field contains the 52 most significant bits of the 64 bit address.
11:00 RO 0h Reserved
3.1.16 Offset (1000h + ((2y) * (4 << CAP.DSTRD))): SQyTDBL – Submission Queue y Tail Doorbell
This register defines the doorbell register that updates the Tail entry pointer for Submission Queue y. The
value of y is equivalent to the Queue Identifier. This indicates to the controller that new commands have
been submitted for processing.
The host should not read the doorbell registers. If a doorbell register is read, the value returned is vendor
specific. Writing to a non-existent Submission Queue Tail Doorbell has undefined results.
Bit Type Reset Description
31:16 RO 0 Reserved
15:00 RW 0h Submission Queue Tail (SQT): Indicates the new value of the Submission Queue Tail
entry pointer. This value shall overwrite any previous Submission Queue Tail entry
pointer value provided. The difference between the last SQT write and the current SQT
write indicates the number of commands added to the Submission Queue.
3.1.17 Offset (1000h + ((2y + 1) * (4 << CAP.DSTRD))): CQyHDBL – Completion Queue y Head
Doorbell
This register defines the doorbell register that updates the Head entry pointer for Completion Queue y. The
value of y is equivalent to the Queue Identifier. This indicates Completion Queue entries that have been
processed by host software.
The host should not read the doorbell registers. If a doorbell register is read, the value returned is vendor
specific. Writing to a non-existent Completion Queue Head Doorbell has undefined results.
Host software should ensure it continues to process completion queue entries within Completion Queues
regardless of whether there are entries available in a particular or any Submission Queue.
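The doorbell offset expressions in the headings of sections 3.1.16 and 3.1.17 can be written out in C as follows. This is an informative sketch; the function names are hypothetical, and a 64-bit result is used because the product can exceed 32 bits for large queue identifiers and strides.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the Submission Queue y Tail Doorbell: 1000h + ((2y) * (4 << CAP.DSTRD)). */
static uint64_t sq_tail_doorbell_offset(uint16_t y, uint8_t dstrd)
{
    assert(dstrd <= 0xFu); /* DSTRD is a 4-bit field */
    return UINT64_C(0x1000) + (2u * (uint64_t)y) * (UINT64_C(4) << dstrd);
}

/* Offset of the Completion Queue y Head Doorbell: 1000h + ((2y + 1) * (4 << CAP.DSTRD)). */
static uint64_t cq_head_doorbell_offset(uint16_t y, uint8_t dstrd)
{
    assert(dstrd <= 0xFu);
    return UINT64_C(0x1000) + (2u * (uint64_t)y + 1u) * (UINT64_C(4) << dstrd);
}
```

With CAP.DSTRD cleared to 0h the doorbells are packed: queue 0 uses offsets 1000h and 1004h, and queue 1 uses 1008h and 100Ch.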
3.2.1 Restrictions
Host software shall not alternate between Index/Data Pair based access and direct memory mapped access
methods. After using direct memory mapped access to the controller registers, the Index/Data Pair
mechanism shall not be used.
4 Data Structures
This section describes data structures used by NVM Express.
A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion
Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value
in memory. The controller may generate an interrupt to the host to indicate that one or more Completion
Queue entries have been posted.
A Completion Queue entry has been consumed by the host when the host writes the associated Completion
Queue Head Doorbell with a new value that indicates that the Completion Queue Head Pointer has moved
past the slot in which that Completion Queue entry was placed. A Completion Queue Head Doorbell write
may indicate that one or more Completion Queue entries have been consumed.
Once a Submission Queue or Completion Queue entry has been consumed, the slot in which it was placed
is free and available for reuse. Altering a Submission Queue entry after that entry has been submitted but
before that entry has been consumed results in undefined behavior. Altering a Completion Queue entry
after that entry has been posted but before that entry has been consumed results in undefined behavior.
If there are no free slots in a Completion Queue, then the controller shall not post status to that Completion
Queue until slots become available. In this case, the controller may stop processing additional Submission
Queue entries associated with the affected Completion Queue until slots become available. The controller
shall continue processing for other queues.
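The host side of the posting and consumption rules above can be sketched as follows. This C example is informative only. It assumes a 16 byte Completion Queue entry whose Phase Tag (P) is bit 16 of Dword 3, a layout defined elsewhere in this specification and not reproduced in this section, and it assumes the host initialized the queue memory to zero so that the first pass of entries is posted with P set to ‘1’. The names `cq_entry`, `cq_state`, and `cq_reap` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One Completion Queue entry viewed as four Dwords (assumed 16 byte layout). */
typedef struct {
    uint32_t dw0, dw1, dw2, dw3;
} cq_entry;

/* Host-side tracking state for one Completion Queue. */
typedef struct {
    cq_entry *entries; /* queue memory; a real driver would also need volatile access and barriers */
    uint16_t  size;    /* number of slots in the queue */
    uint16_t  head;    /* next slot the host will examine */
    uint8_t   phase;   /* Phase Tag value that marks a newly posted entry */
} cq_state;

/* Consume every newly posted entry and return how many were consumed.
 * After this returns a non-zero count, the caller would write cq->head
 * to the Completion Queue Head Doorbell to free the consumed slots. */
static unsigned cq_reap(cq_state *cq)
{
    unsigned consumed = 0;

    assert(cq->size >= 2 && cq->head < cq->size);
    while (consumed < cq->size &&
           ((cq->entries[cq->head].dw3 >> 16) & 0x1u) == cq->phase) {
        /* A real host would process entries[head] here. */
        cq->head++;
        if (cq->head == cq->size) {
            cq->head = 0;      /* wrap to the first slot */
            cq->phase ^= 0x1u; /* the controller inverts P on each pass */
        }
        consumed++;
    }
    return consumed;
}
```

Tracking the expected Phase Tag lets the host detect new entries by reading queue memory alone, without consulting a controller register.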
15:14 PRP or SGL for Data Transfer (PSDT): This field specifies whether PRPs or SGLs are used for any data
transfer associated with the command. PRPs shall be used for all Admin commands for NVMe over PCIe.
SGLs shall be used for all Admin and I/O commands for NVMe over Fabrics. This field shall be set to 01b for
NVMe over Fabrics 1.0 implementations. The definition is described in the table below.
Value Definition
00b PRPs are used for this transfer.
01b SGLs are used for this transfer. If used, Metadata Pointer (MPTR) contains an address of a
single contiguous physical buffer that is byte aligned.
10b SGLs are used for this transfer. If used, Metadata Pointer (MPTR) contains an address of an
SGL segment containing exactly one SGL Descriptor that is Qword aligned.
11b Reserved
If there is metadata that is not interleaved with the logical block data, as specified in the Format NVM
command, then the Metadata Pointer (MPTR) field is used to point to the metadata. The definition of the
Metadata Pointer field is dependent on the setting in this field. Refer to Figure 11.
13:10 Reserved
Fused Operation (FUSE): In a fused operation, a complex command is created by “fusing” together two
simpler commands. Refer to section 4.10. This field specifies whether this command is part of a fused
operation and if so, which command it is in the sequence.
The 64 byte command format for the Admin Command Set and NVM Command Set is defined in Figure
11. Any additional I/O Command Set defined in the future may use an alternate command size or format.
SGLs shall not be used for Admin commands in NVMe over PCIe.
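As an informative illustration of how Command Dword 0 is assembled, the following C sketch packs the PSDT field at bits 15:14 as described above. The positions of the other fields (Opcode in bits 07:00, FUSE in bits 09:08, and Command Identifier in bits 31:16) are assumptions taken from the Command Dword 0 definition, which is not fully reproduced in this section. The function name `make_cdw0` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values of the PSDT field (bits 15:14 of Command Dword 0). */
enum {
    PSDT_PRP             = 0x0, /* PRPs are used for this transfer */
    PSDT_SGL_MPTR_BUFFER = 0x1, /* SGLs; MPTR is a byte aligned contiguous buffer */
    PSDT_SGL_MPTR_SGL    = 0x2  /* SGLs; MPTR is a Qword aligned SGL segment */
};

/* Build Command Dword 0. Field positions other than PSDT are assumptions (see lead-in). */
static uint32_t make_cdw0(uint8_t opcode, unsigned fuse, unsigned psdt, uint16_t cid)
{
    assert(fuse <= 0x3u); /* FUSE is a two-bit field */
    assert(psdt <= 0x2u); /* 11b is Reserved */
    return (uint32_t)opcode |
           ((uint32_t)fuse << 8) |
           ((uint32_t)psdt << 14) |
           ((uint32_t)cid << 16);
}
```

For an Admin command in NVMe over PCIe, `psdt` would be `PSDT_PRP`, since SGLs shall not be used for those commands.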
23:16 If CDW0.PSDT is set to 00b, then this field shall contain the address of a contiguous physical buffer of
metadata and shall be Dword aligned.
If CDW0.PSDT is set to 01b, then this field shall contain the address of a contiguous physical buffer of
metadata and shall be byte aligned.
If CDW0.PSDT is set to 10b, then this field shall contain the address of an SGL segment containing exactly
one SGL Descriptor and shall be Qword aligned. If the SGL segment is a Data Block descriptor, then it
describes the entire data transfer. Refer to section 4.4.
Data Pointer (DPTR): This field specifies the data used in the command.
If CDW0.PSDT is set to 01b or 10b, then the definition of this field is:
39:24 SGL Entry 1 (SGL1): This field contains the first SGL segment for the command.
If the SGL segment is an SGL Data Block or Keyed SGL Data Block descriptor,
then it describes the entire data transfer. If more than one SGL segment is needed
to describe the data transfer, then the first SGL segment is a Segment, or Last
Segment descriptor. Refer to section 4.4 for the definition of SGL segments and
descriptor types.
The NVMe Transport may support a subset of SGL Descriptor types and features
as defined in the NVMe Transport binding specification.
43:40 Command Dword 10 (CDW10): This field is command specific Dword 10.
47:44 Command Dword 11 (CDW11): This field is command specific Dword 11.
51:48 Command Dword 12 (CDW12): This field is command specific Dword 12.
55:52 Command Dword 13 (CDW13): This field is command specific Dword 13.
59:56 Command Dword 14 (CDW14): This field is command specific Dword 14.
63:60 Command Dword 15 (CDW15): This field is command specific Dword 15.
In addition to the fields commonly defined for all Admin and NVM commands, Admin and NVM Vendor
Specific commands may support the Number of Dwords in Data Transfer and Number of Dwords in
Metadata Transfer fields. If supported, the command format for Admin and NVM Vendor Specific
commands is defined in Figure 12. For more details, refer to section 8.7.
Figure 12: Command Format – Admin and NVM Vendor Specific Commands (Optional)
Bytes Description
03:00 Command Dword 0 (CDW0): This field is common to all commands and is defined in Figure 10.
07:04 Namespace Identifier (NSID): This field indicates the namespace ID that this command applies
to. If the namespace ID is not used for the command, then this field shall be cleared to 0h. Setting
this value to FFFFFFFFh causes the command to be applied to all namespaces attached to this
controller, unless otherwise specified.
The behavior of a controller in response to an inactive namespace ID for a vendor specific
command is vendor specific. Specifying an invalid namespace ID in a command that uses the
namespace ID shall cause the controller to abort the command with status Invalid Namespace or
Format, unless otherwise specified.
15:08 Reserved
39:16 Refer to Figure 11 for the definition of these fields.
43:40 Number of Dwords in Data Transfer (NDT): This field indicates the number of Dwords in the
data transfer.
47:44 Number of Dwords in Metadata Transfer (NDM): This field indicates the number of Dwords in
the metadata transfer.
51:48 Command Dword 12 (CDW12): This field is command specific Dword 12.
55:52 Command Dword 13 (CDW13): This field is command specific Dword 13.
59:56 Command Dword 14 (CDW14): This field is command specific Dword 14.
63:60 Command Dword 15 (CDW15): This field is command specific Dword 15.
63 n+1 n 0
Page Base Address Offset 0 0
A physical region page list (PRP List) is a set of PRP entries in a single page of contiguous memory. A
PRP List describes additional PRP entries that could not be described within the command itself. Any PRP
entries described within the command are not duplicated in a PRP List. If the amount of data to transfer
requires multiple PRP List memory pages, then the last PRP entry before the end of the memory page shall
be a pointer to the next PRP List, indicating the next segment of the PRP List. Figure 15 shows the layout
of a PRP List.
Figure 15: PRP List Layout
63 n+1 n 0
Page Base Address k 0h
Page Base Address k+1 0h
…
Page Base Address k+m 0h
Page Base Address k+m+1 0h
Dependent on the command definition, the first PRP entry contained within the command may have a non-
zero offset within the memory page. The first PRP List entry (i.e., the first pointer to a memory page
containing additional PRP entries), which if present is typically contained in the PRP Entry 2 location
within the command, shall be Qword aligned and may also have a non-zero offset within the memory page.
PRP entries contained within a PRP List shall have a memory page offset of 0h. If a second PRP entry is
present within a command, it shall have a memory page offset of 0h. In both cases, the entries are memory
page aligned based on the value in CC.MPS. If the controller receives a non-zero offset for these PRP
entries the controller should return an error of PRP Offset Invalid.
PRP Lists shall be minimally sized with packed entries starting with entry 0. If more PRP List pages are
required, then the last entry of the PRP List contains the Page Base Address of the next PRP List page.
The next PRP List page shall be memory page aligned. The total number of PRP entries required by a
command is implied by the command parameters and memory page size.
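The way the number of PRP entries follows from the command parameters and the memory page size can be sketched in C as follows. This is an informative illustration; the function names are hypothetical. The second function additionally assumes that each PRP List starts at offset 0h of its memory page, which is a simplification because the first PRP List pointer may carry a non-zero offset.

```c
#include <assert.h>
#include <stdint.h>

/* Number of PRP entries needed to describe `len` bytes starting at `addr`,
 * for a memory page size of 2 ^ (12 + mps) bytes (mps is the CC.MPS value).
 * Only the first entry may carry a non-zero offset within its page. */
static uint64_t prp_entries_needed(uint64_t addr, uint64_t len, unsigned mps)
{
    assert(mps <= 15 && len > 0);
    const uint64_t page = UINT64_C(1) << (12 + mps);
    const uint64_t offset = addr & (page - 1);
    return (offset + len + page - 1) / page;
}

/* Number of PRP List memory pages needed to hold `list_entries` 64-bit entries,
 * assuming each list starts at offset 0h of its page. When more than one page
 * is needed, the last slot of every page except the final one holds the
 * pointer to the next PRP List page. */
static uint64_t prp_list_pages(uint64_t list_entries, unsigned mps)
{
    assert(mps <= 15);
    const uint64_t per_page = (UINT64_C(1) << (12 + mps)) / 8;
    if (list_entries == 0)
        return 0;
    if (list_entries <= per_page)
        return 1;
    return 1 + (list_entries - per_page + (per_page - 2)) / (per_page - 1);
}
```

For a 4KB memory page size, a 4096 byte transfer starting at offset 200h within a page needs two PRP entries, and 513 list entries need two PRP List pages because one slot of the first page is used as the pointer to the second.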
array of one or more SGL descriptors. Only the last descriptor in an SGL segment may be an SGL Segment
descriptor or an SGL Last Segment descriptor.
A last SGL segment is an SGL segment that does not contain an SGL Segment descriptor, or an SGL Last
Segment descriptor.
A controller may support byte or Dword alignment and granularity of Data Blocks. If a controller supports
only Dword alignment and granularity as indicated in the SGL Support field of the Identify Controller data
structure, then the values in the Address and Length fields of all Data Block descriptors shall have their
lower two bits cleared to 00b. This requirement applies to Data Block descriptors that indicate data and/or
metadata memory regions.
A Keyed SGL Data Block descriptor is a Data Block descriptor that includes a key that is used as part of
the host memory access. The maximum length that may be specified in a Keyed SGL Data Block descriptor
is (16 MB – 1).
The SGL Identifier Descriptor Sub Type field may indicate additional information about a descriptor. As an
example, the Sub Type may indicate that the Address field is an offset rather than an absolute address.
The Sub Type may also indicate NVMe Transport specific information.
The controller shall abort a command if:
an SGL segment contains an SGL Segment descriptor or an SGL Last Segment descriptor in
other than the last descriptor in the segment;
a last SGL segment contains an SGL Segment descriptor, or an SGL Last Segment descriptor;
an SGL descriptor has an unsupported format; or
an SGL Data Block descriptor contains Address or Length fields with either of the two lower bits
set to 1b and the controller supports only Dword alignment and granularity as indicated in the
SGL Support field of the Identify Controller data structure.
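The abort conditions above can be sketched as a host-side sanity check run before a command is submitted. This is a minimal illustration, not a normative algorithm: the function name, the tuple representation of a descriptor, and the set of supported types are assumptions. The descriptor type values (0h to 4h) are the ones given for each descriptor format in this section, and the Dword check is applied to SGL Data Block descriptors only, following the wording of the list.

```python
DATA_BLOCK, BIT_BUCKET, SEGMENT, LAST_SEGMENT, KEYED_DATA_BLOCK = 0x0, 0x1, 0x2, 0x3, 0x4
SUPPORTED_TYPES = {DATA_BLOCK, BIT_BUCKET, SEGMENT, LAST_SEGMENT, KEYED_DATA_BLOCK}


def segment_is_valid(descriptors, is_last_segment, dword_only):
    """Check one SGL segment against the abort conditions.

    descriptors:     list of (descriptor_type, address, length) tuples
    is_last_segment: True if this segment was reached as a last SGL segment
    dword_only:      True if the controller supports only Dword alignment
    """
    if not descriptors:
        return False  # a segment holds one or more descriptors
    for index, (dtype, address, length) in enumerate(descriptors):
        is_final = index == len(descriptors) - 1
        if dtype not in SUPPORTED_TYPES:
            return False  # unsupported descriptor format
        if dtype in (SEGMENT, LAST_SEGMENT) and (is_last_segment or not is_final):
            return False  # chaining descriptor in a disallowed position
        if dtype == DATA_BLOCK and dword_only and ((address | length) & 0x3):
            return False  # one of the two lower bits is set to 1b
    return True
```

A controller performs the equivalent checks itself; a host-side check of this kind only helps catch malformed lists before they reach the controller.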
An SGL segment contains one or more SGL descriptors. Figure 17 defines the generic SGL descriptor
format.
Bytes  Bits   Description
15     03:00  SGL Descriptor Sub Type (refer to Figure 19)
       07:04  SGL Descriptor Type (refer to Figure 18)
The SGL Descriptor Type field defined in Figure 18 specifies the SGL descriptor type. If the SGL Descriptor
Type field is set to a reserved or unsupported value, then the SGL descriptor shall be processed as having
an error. If the SGL Descriptor Sub Type field is set to an unsupported value, then the descriptor shall be
processed as having an SGL Descriptor Type error.
An SGL descriptor set to all zeros is an SGL Data Block descriptor with the Address field set to
00000000_00000000h and the Length field set to 00000000h; it may be used as a NULL descriptor.
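To make the all-zeros case concrete, the sketch below unpacks a 16-byte descriptor and shows that sixteen zero bytes decode as descriptor type 0h (SGL Data Block) with a zero address and zero length. The function name is hypothetical. The byte layout used (Address in bytes 07:00, Length in bytes 11:08, bytes 14:12 reserved, SGL Identifier in byte 15, little-endian) is an assumption taken from the SGL Data Block descriptor format and should be checked against Figure 20.

```python
import struct


def unpack_data_block_descriptor(raw: bytes) -> dict:
    """Split a 16-byte SGL descriptor into its fields (assumed layout)."""
    if len(raw) != 16:
        raise ValueError("an SGL descriptor is 16 bytes long")
    # Q = 8-byte Address, I = 4-byte Length, 3x = reserved bytes 14:12,
    # B = SGL Identifier in byte 15.
    address, length, identifier = struct.unpack("<QI3xB", raw)
    return {
        "address": address,
        "length": length,
        "type": identifier >> 4,       # bits 07:04 of byte 15
        "sub_type": identifier & 0xF,  # bits 03:00 of byte 15
    }


null_descriptor = unpack_data_block_descriptor(bytes(16))
```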
Figure 19 defines the SGL Descriptor Sub Type Values. For each Sub Type Value defined, the Descriptor
Types that it applies to are indicated.
The SGL Data Block descriptor, defined in Figure 20, describes a data block.
Bytes  Bits   Description
15     03:00  SGL Descriptor Sub Type field. Valid values are specified in Figure 19.
       07:04  SGL Descriptor Type: 0h as specified in Figure 18.
The SGL Bit Bucket descriptor, defined in Figure 21, is used to ignore parts of source data.
Bytes  Description
11:08  If the SGL Bit Bucket Descriptor describes a destination data buffer (e.g., a read from the
       controller to memory), then the Length field specifies the number of bytes of the source data
       which the controller shall discard (i.e., not transfer to the destination data buffer).
       If the SGL Bit Bucket Descriptor describes a source data buffer (e.g., a write from memory to
       the controller), then the Bit Bucket Descriptor shall be treated as if the Length field were
       set to 00000000h (i.e., the Bit Bucket Descriptor has no effect).
       If SGL Bit Bucket descriptors are supported, their length in a destination data buffer shall
       be included in the Number of Logical Blocks (NLB) parameter specified in NVM Command Set data
       transfer commands. Their length in a source data buffer is not included in the NLB parameter.
14:12  Reserved
15     SGL Identifier: The definition of this field is described in the table below.
         Bits   Description
         03:00  SGL Descriptor Sub Type field. Valid values are specified in Figure 19.
         07:04  SGL Descriptor Type: 1h as specified in Figure 18.
The SGL Segment descriptor, defined in Figure 22, describes the next SGL segment, which is not the last
SGL segment.
Bytes  Bits   Description
15     03:00  SGL Descriptor Sub Type field. Valid values are specified in Figure 19.
       07:04  SGL Descriptor Type: 2h as specified in Figure 18.
The SGL Last Segment descriptor, defined in Figure 23, describes the next and last SGL segment. A last
SGL segment that contains an SGL Segment descriptor or an SGL Last Segment descriptor is processed
as an error.
Bytes  Bits   Description
15     03:00  SGL Descriptor Sub Type field. Valid values are specified in Figure 19.
       07:04  SGL Descriptor Type: 3h as specified in Figure 18.
The Keyed SGL Data Block descriptor, defined in Figure 24, describes a keyed data block.
Bytes  Bits   Description
15     03:00  SGL Descriptor Sub Type field. Valid values are specified in Figure 19.
       07:04  SGL Descriptor Type: 4h as specified in Figure 18.
In the case where the namespace is formatted to transfer the metadata as a separate buffer of data, then
the Metadata Region is used. In this case, the location of the Metadata Region is indicated by the Metadata
Pointer within the command. The Metadata Pointer within the command shall be Dword aligned.
The controller may support several physical formats of logical block size and associated metadata size.
There may be performance differences between different physical formats. This is indicated as part of the
Identify Namespace data structure.
If the namespace is formatted to use end-to-end data protection, then the first eight bytes or last eight bytes
of the metadata is used for protection information (specified as part of the NVM Format operation).
Figure 26: Completion Queue Entry Layout – Admin and NVM Command Set
31 23 15 7 0
DW0 Command Specific
DW1 Reserved
DW2 SQ Identifier SQ Head Pointer
DW3 Status Field P Command Identifier
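The layout in Figure 26 can be read by host software roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the bit boundaries (SQ Identifier in DW2 bits 31:16, SQ Head Pointer in DW2 bits 15:00, Status Field in DW3 bits 31:17, Phase Tag in DW3 bit 16, Command Identifier in DW3 bits 15:00) are inferred from the bit ruler in the figure and should be verified against the field definitions.

```python
import struct


def parse_completion_entry(raw: bytes) -> dict:
    """Decode a 16-byte completion queue entry (assumed bit boundaries)."""
    if len(raw) != 16:
        raise ValueError("a completion queue entry is 16 bytes long")
    dw0, _dw1, dw2, dw3 = struct.unpack("<4I", raw)  # four little-endian Dwords
    return {
        "command_specific": dw0,
        "sq_head": dw2 & 0xFFFF,
        "sq_id": dw2 >> 16,
        "command_id": dw3 & 0xFFFF,
        "phase": (dw3 >> 16) & 0x1,
        "status": dw3 >> 17,  # 15-bit Status Field
    }
```

Host software typically compares the decoded Phase Tag with the value it expects for the current pass through the queue to decide whether the entry is new.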
Value Description
08h Command Aborted due to SQ Deletion: The command was aborted due to a Delete I/O
Submission Queue request received for the Submission Queue to which the command was
submitted.
09h Command Aborted due to Failed Fused Command: The command was aborted due to the other
command in a fused operation failing.
0Ah Command Aborted due to Missing Fused Command: The command was aborted due to the
companion fused command not being found as the subsequent Submission Queue entry.
0Bh Invalid Namespace or Format: The namespace or the format of that namespace is invalid.
0Ch Command Sequence Error: The command was aborted due to a protocol violation in a multi-
command sequence (e.g. a violation of the Security Send and Security Receive sequencing rules
in the TCG Storage Synchronous Interface Communications protocol).
0Dh Invalid SGL Segment Descriptor: The command includes an invalid SGL Last Segment or SGL
Segment descriptor. This may occur when the SGL segment pointed to by an SGL Last Segment
descriptor contains an SGL Segment descriptor or an SGL Last Segment descriptor. This may also
occur when an SGL Last Segment descriptor contains an invalid length (i.e., a length of zero or
one that is not a multiple of 16).
0Eh Invalid Number of SGL Descriptors: There is an SGL Last Segment descriptor or an SGL
Segment descriptor in a location other than the last descriptor of a segment based on the length
indicated.
0Fh Data SGL Length Invalid: This may occur if the length of a Data SGL is too short. This may occur
if the length of a Data SGL is too long and the controller does not support SGL transfers longer than
the amount of data to be transferred as indicated in the SGL Support field of the Identify Controller
data structure.
10h Metadata SGL Length Invalid: This may occur if the length of a Metadata SGL is too short. This
may occur if the length of a Metadata SGL is too long and the controller does not support SGL
transfers longer than the amount of data to be transferred as indicated in the SGL Support field of
the Identify Controller data structure.
11h SGL Descriptor Type Invalid: The type of an SGL Descriptor is a type that is not supported by
the controller.
12h Invalid Use of Controller Memory Buffer: The attempted use of the Controller Memory Buffer is
not supported by the controller. Refer to section 4.7.
13h PRP Offset Invalid: The Offset field for a PRP entry is invalid. This may occur when there is a
PRP entry with a non-zero offset after the first entry.
14h Atomic Write Unit Exceeded: The length specified exceeds the atomic write unit size.
15h Operation Denied: The command was denied due to lack of access rights. Refer to the appropriate
security specification (e.g., TCG SIIS). For media access commands, the Access Denied status
code should be used instead.
16h SGL Offset Invalid: The offset specified in a descriptor is invalid. This may occur when using
capsules for data transfers in NVMe over Fabrics and an invalid offset in the capsule is specified.
17h Reserved
18h Host Identifier Inconsistent Format: The NVM subsystem detected the simultaneous use of 64-
bit and 128-bit Host Identifier values on different controllers.
19h Keep Alive Timeout Expired: The Keep Alive Timeout expired.
1Ah Keep Alive Timeout Invalid: The Keep Alive Timeout value specified is invalid. This may be due
to an attempt to specify a value of 0h on a transport that requires Keep Alive to be enabled. This
may be due to the value specified being too large for the associated NVMe Transport as defined in
the NVMe Transport binding specification.
1Bh Command Aborted due to Preempt and Abort: The command was aborted due to a Reservation
Acquire command with the Reservation Acquire Action (RACQA) set to 010b (Preempt and Abort).
1Ch Sanitize Failed: The most recent sanitize operation failed and no recovery action has been
successfully completed.
1Dh Sanitize In Progress: The requested function (e.g., command) is prohibited while a sanitize
operation is in progress. Refer to section 8.15.1.
1Eh SGL Data Block Granularity Invalid: The Address alignment or Length granularity for an SGL
Data Block descriptor is invalid. This may occur when a controller supports Dword granularity only
and the lower two bits of the Address or Length are not cleared to 00b.
NOTE: An implementation compliant to revision 1.2.1 or earlier may use the status code value of
15h to indicate SGL Data Block Granularity Invalid.
1Fh Command Not Supported for Queue in CMB: The implementation does not support submission
of the command to a Submission Queue in the Controller Memory Buffer or command completion
to a Completion Queue in the Controller Memory Buffer.
NOTE: Revision 1.3 uses this status code only for Sanitize commands.
20h – 7Fh Reserved
80h – BFh I/O Command Set Specific
C0h – FFh Vendor Specific
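The value ranges of the generic command status table can be summarized in a small classifier, sketched below. The function name and returned labels are illustrative only. The sketch assumes that values 00h to 07h are defined in the earlier part of the table that precedes value 08h, and it treats 17h as reserved, as listed above.

```python
def classify_generic_status(code: int) -> str:
    """Map a generic command status code to the range it falls in."""
    if not 0x00 <= code <= 0xFF:
        raise ValueError("status code is an 8-bit value")
    if code >= 0xC0:
        return "vendor specific"
    if code >= 0x80:
        return "I/O command set specific"
    if code >= 0x20 or code == 0x17:
        return "reserved"
    return "generic"
```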
Figure 32: Status Code – Generic Command Status Values, NVM Command Set
Value Description
80h LBA Out of Range: The command references an LBA that exceeds the size of the namespace.
81h Capacity Exceeded: Execution of the command has caused the capacity of the namespace to be
exceeded. This error occurs when the Namespace Utilization exceeds the Namespace Capacity,
as reported in Figure 114.
82h Namespace Not Ready: The namespace is not ready to be accessed. The Do Not Retry bit
indicates whether re-issuing the command at a later time may succeed.
83h Reservation Conflict: The command was aborted due to a conflict with a reservation held on the
accessed namespace. Refer to section 8.8.
84h Format In Progress: A Format NVM command is in progress on the namespace. The Do Not
Retry bit shall be cleared to ‘0’ to indicate that the command may succeed if it is resubmitted.
85h – BFh Reserved
Figure 34: Status Code – Command Specific Status Values, NVM Command Set
Value Description Commands Affected
80h Conflicting Attributes Dataset Management, Read, Write
81h Invalid Protection Information Compare, Read, Write, Write Zeroes
82h Attempted Write to Read Only Range Dataset Management, Write, Write
Uncorrectable, Write Zeroes
83h - BFh Reserved
Figure 35: Status Code – Media and Data Integrity Error Values
Value Description
00h – 7Fh Reserved
80h – BFh I/O Command Set Specific
C0h – FFh Vendor Specific
Figure 36: Status Code – Media and Data Integrity Error Values, NVM Command Set
Value Description
80h Write Fault: The write data could not be committed to the media.
81h Unrecovered Read Error: The read data could not be recovered from the media.
82h End-to-end Guard Check Error: The command was aborted due to an end-to-end guard check
failure.
83h End-to-end Application Tag Check Error: The command was aborted due to an end-to-end
application tag check failure.
84h End-to-end Reference Tag Check Error: The command was aborted due to an end-to-end
reference tag check failure.
85h Compare Failure: The command failed due to a miscompare during a Compare command.
86h Access Denied: Access to the namespace and/or LBA range is denied due to lack of access rights.
Refer to the appropriate security specification (e.g., TCG SIIS).
87h Deallocated or Unwritten Logical Block: The command failed due to an attempt to read from an
LBA range containing a deallocated or unwritten logical block.
88h – BFh Reserved
write the data and/or metadata to the Controller Memory Buffer rather than have the controller fetch it from
host memory.
The contents of the Controller Memory Buffer are initially undefined. Host software should initialize any
memory before it is referenced (e.g., a Completion Queue shall be initialized by host software in order for
the Phase Tag to be used correctly).
A controller memory based queue is used in the same manner as a host memory based queue – the
difference is the memory address used is located within the controller’s own memory rather than in the host
memory. The Admin or I/O Queues may be placed in the Controller Memory Buffer. For a particular queue,
all memory associated with it shall reside in either the Controller Memory Buffer or host memory. For all
queues in the Controller Memory Buffer, the queue shall be physically contiguous.
The controller may support PRPs and SGLs in the Controller Memory Buffer. For a particular PRP List or
SGL associated with a single command, all memory associated with the PRP List or SGLs shall reside in
either the Controller Memory Buffer or host memory. The PRPs and SGLs for a command may only be
placed in the Controller Memory Buffer if the associated command is present in a Submission Queue in the
Controller Memory Buffer.
The controller may support data and metadata in the Controller Memory Buffer. All data or metadata
associated with a particular command shall be located in either the Controller Memory Buffer or host
memory.
If the requirements for the Controller Memory Buffer use are violated by the host, the controller shall fail the
associated command with Invalid Use of Controller Memory Buffer status.
The address region allocated for the CMB shall be 4 KB aligned. It is recommended that a controller allocate
the CMB on an 8 KB boundary. The controller shall support burst transactions up to the maximum payload
size, support byte enables, and arbitrary byte alignment. The host shall ensure that all writes to the CMB
that are needed for a command have been sent before updating the SQ Tail doorbell register. The Memory
Write Request to the SQ Tail doorbell register shall not have the Relaxed Ordering bit set, to ensure that it
arrives at the controller after all writes to the CMB.
Whether a command is part of a fused operation is indicated in the Fused Operation field of Command
Dword 0 in Figure 10. The Fused Operation field also indicates whether this is the first or second command
in the operation.
A candidate command is a submitted command which has been transferred into the controller that the
controller deems ready for processing. The controller selects command(s) for processing from the pool of
submitted commands for each Submission Queue. The commands that comprise a fused operation shall
be processed together and in order by the controller. The controller may select candidate commands for
processing in any order. The order in which commands are selected for processing does not imply the
order in which commands are completed.
Arbitration is the method used to determine the Submission Queue from which the controller starts
processing the next candidate command(s). Once a Submission Queue is selected using arbitration, the
Arbitration Burst setting determines the maximum number of commands that the controller may start
processing from that Submission Queue before arbitration shall again take place. A fused operation may
be considered as one or two commands by the controller.
All controllers shall support the round robin command arbitration mechanism. A controller may optionally
implement weighted round robin with urgent priority class and/or a vendor specific arbitration mechanism.
The Arbitration Mechanism Supported field in the Controller Capabilities register (CAP.AMS) indicates
optional arbitration mechanisms supported by the controller.
In order to make efficient use of the non-volatile memory, it is often advantageous to execute multiple
commands from a Submission Queue in parallel. For Submission Queues that are using weighted round
robin with urgent priority class or round robin arbitration, host software may configure an Arbitration Burst
setting. The Arbitration Burst setting indicates the maximum number of commands that the controller may
launch at one time from a particular Submission Queue. It is recommended that host software configure
the Arbitration Burst setting as close to the recommended value by the controller as possible (specified in
the Recommended Arbitration Burst field of the Identify Controller data structure in Figure 109), taking into
consideration any latency requirements. Refer to section 5.21.1.1.
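The interaction between round robin arbitration and the Arbitration Burst setting can be illustrated with a small simulation. This is a simplified sketch, not a description of any controller: the function name is hypothetical, queues are modelled as plain lists of command labels, and the controller is assumed to always start the maximum number of commands the burst allows, although a real controller may start fewer.

```python
from collections import deque


def round_robin_order(queues, arbitration_burst):
    """Return the order in which commands start under round robin.

    queues:            list of lists, one per Submission Queue
    arbitration_burst: maximum commands started per queue per visit
    """
    pending = [deque(q) for q in queues]
    order = []
    while any(pending):
        for queue in pending:
            # Start at most `arbitration_burst` commands, then arbitrate again.
            for _ in range(min(arbitration_burst, len(queue))):
                order.append(queue.popleft())
    return order
```

With two queues holding `["a1", "a2", "a3"]` and `["b1"]` and a burst of 2, the first queue contributes two commands before the second queue is visited.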
The highest strict priority class is the Admin class that includes any command submitted to the Admin
Submission Queue. This class has the highest strict priority above commands submitted to any other
Submission Queue.
The next highest strict priority class is the Urgent class. Any I/O Submission Queue assigned to the Urgent
priority class is serviced next after commands submitted to the Admin Submission Queue, and before any
commands submitted to a weighted round robin priority level. Host software should use care in assigning
any Submission Queue to the Urgent priority class since there is the potential to starve I/O Submission
Queues in the weighted round robin priority levels as there is no fairness protocol between Urgent and non
Urgent I/O Submission Queues.
The lowest strict priority class is the Weighted Round Robin class. This class consists of the three weighted
round robin priority levels (High, Medium, and Low) that share the remaining bandwidth using weighted
round robin arbitration. Host software controls the weights for the High, Medium, and Low service classes
via Set Features. Round robin is used to arbitrate within multiple Submission Queues assigned to the same
weighted round robin level. The number of candidate commands that may start processing from each
Submission Queue per round is either the Arbitration Burst setting or the remaining weighted round robin
credits, whichever is smaller.
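The strict ordering between the three classes, and the per-round limit within the Weighted Round Robin class, can be sketched as follows. The function names and the dictionary keys are hypothetical. How weights are converted into credits is implementation specific and is not modelled here.

```python
def select_priority_class(candidates):
    """Pick the class to service next.

    candidates maps a class name ("admin", "urgent", "high", "medium",
    "low") to the number of candidate commands waiting in that class.
    """
    # Strict priority: Admin first, then Urgent.
    for name in ("admin", "urgent"):
        if candidates.get(name, 0) > 0:
            return name
    # The three weighted levels share whatever bandwidth remains.
    if any(candidates.get(level, 0) > 0 for level in ("high", "medium", "low")):
        return "weighted round robin"
    return None


def commands_per_round(arbitration_burst, remaining_credits):
    """Upper bound on commands started from one Submission Queue per round."""
    return min(arbitration_burst, remaining_credits)
```

Because Urgent always wins over the weighted levels, a continuously busy Urgent queue makes `select_priority_class` never return the weighted class, which is the starvation risk described above.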
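Figure 40: Weighted Round Robin with Urgent Priority Class Arbitration
[Figure: the Admin Submission Queue (ASQ) feeds Strict Priority 1. The Urgent Submission Queues are
arbitrated by round robin (RR) and feed Strict Priority 2. The High, Medium, and Low priority
Submission Queues are each arbitrated by RR and then combined by weighted round robin (WRR) using
Weight(High), Weight(Medium), and Weight(Low) to feed Strict Priority 3. The three strict priority
inputs feed the Priority decision point.]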
In Figure 40, the Priority decision point selects the highest priority candidate command selected next to
start processing.
Opcode by Field                                        Combined   O/M       Namespace Identifier  Command
(07)             (06:02)   (01:00)                     Opcode     (Note 1)  Used (Note 3)
Generic Command  Function  Data Transfer (Note 4)      (Note 2)
Vendor Specific
1b               na        NOTE 4                      C0h – FFh  O                               Vendor specific
NOTES:
1. O/M definition: O = Optional, M = Mandatory.
2. Opcodes not listed are reserved.
3. A subset of commands uses the Namespace Identifier field (CDW1.NSID). When not used, the field shall be
cleared to 0h.
4. Indicates the data transfer direction of the command. All options to the command shall transfer data as specified or
transfer no data. All commands, including vendor specific commands, shall follow this convention: 00b = no data
transfer; 01b = host to controller; 10b = controller to host; 11b = bidirectional.
5. For NVMe over PCIe implementations, the Keep Alive command is optional. For NVMe over Fabrics
implementations, the associated NVMe Transport binding defines whether the Keep Alive command is optional or
mandatory.
Figure 42 defines Admin commands that are specific to the NVM Command Set.
Figure 42: Opcodes for Admin Commands – NVM Command Set Specific
Opcode (07)      Opcode (06:02)  Opcode (01:00)          Opcode    O/M       Namespace Identifier  Command
Generic Command  Function        Data Transfer (Note 4)  (Note 2)  (Note 1)  Used (Note 3)
1b               000 00b         00b                     80h       O         Yes                   Format NVM
1b               000 00b         01b                     81h       O         NOTE 5                Security Send
1b               000 00b         10b                     82h       O         NOTE 5                Security Receive
1b               000 01b         00b                     84h       O         No                    Sanitize
NOTES:
1. O/M definition: O = Optional, M = Mandatory.
2. Opcodes not listed are reserved.
3. A subset of commands uses the Namespace Identifier field (CDW1.NSID). When not used, the field shall be
cleared to 0h.
4. Indicates the data transfer direction of the command. All options to the command shall transfer data as specified or
transfer no data. All commands, including vendor specific commands, shall follow this convention: 00b = no data
transfer; 01b = host to controller; 10b = controller to host; 11b = bidirectional.
5. The use of the Namespace Identifier is Security Protocol specific.
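The opcode sub-fields used in Figure 42 can be extracted from a combined opcode as sketched below. The function and dictionary names are illustrative. The bit positions (07, 06:02, 01:00) and the data transfer encoding come from the table header and note 4.

```python
DATA_TRANSFER = {
    0b00: "no data transfer",
    0b01: "host to controller",
    0b10: "controller to host",
    0b11: "bidirectional",
}


def decode_opcode(opcode: int) -> dict:
    """Split an 8-bit opcode into its three sub-fields."""
    if not 0x00 <= opcode <= 0xFF:
        raise ValueError("opcode is an 8-bit value")
    return {
        "generic_command": (opcode >> 7) & 0x1,  # bit 07
        "function": (opcode >> 2) & 0x1F,        # bits 06:02
        "data_transfer": DATA_TRANSFER[opcode & 0x3],  # bits 01:00
    }
```

For instance, 84h (Sanitize) decodes to generic command 1b, function 00001b, and no data transfer, which matches its row in the table.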
Vendor Specific event: Indicates a vendor specific event. To clear this event, host software reads
the indicated vendor specific log page using Get Log Page command with the Retain Asynchronous
Event field cleared to ‘0’.
Asynchronous events are reported due to a new entry being added to a log page (e.g., Error Information
log) or a status update (e.g., status in the SMART / Health log). A status change may be permanent (e.g.,
the media has become read only) or transient (e.g., the temperature exceeded a threshold for a period of
time). Host software should modify the event threshold or mask the event for transient and permanent
status changes before issuing another Asynchronous Event Request command to avoid repeated reporting
of asynchronous events.
If the controller needs to report an event and there are no outstanding Asynchronous Event Request
commands, the controller should send a single notification of that Asynchronous Event Type when an
Asynchronous Event Request command is received. If a Get Log Page command clears the event prior to
receiving the Asynchronous Event Request command or if a power off condition occurs, then a notification
is not sent.
Dword 0 of the completion queue entry contains information about the asynchronous event. The definition
of Dword 0 of the completion queue entry is in Figure 46.
Bits   Value    Definition
02:00  0h       Error status
       1h       SMART / Health status
       2h       Notice
       3h – 5h  Reserved
       6h       I/O Command Set specific status
       7h       Vendor specific
The information in either Figure 47, Figure 48, or Figure 50 is returned in the Asynchronous Event
Information field, depending on the Asynchronous Event Type.
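A host might decode the event type from Dword 0 of the completion queue entry along these lines. The names are hypothetical, and only bits 02:00 are interpreted; the positions of the other Dword 0 fields are defined in Figure 46 and are not assumed here.

```python
EVENT_TYPES = {
    0x0: "Error status",
    0x1: "SMART / Health status",
    0x2: "Notice",
    0x6: "I/O Command Set specific status",
    0x7: "Vendor specific",
}


def async_event_type(dword0: int) -> str:
    """Return the Asynchronous Event Type held in bits 02:00 of Dword 0."""
    return EVENT_TYPES.get(dword0 & 0x7, "Reserved")
```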
A controller shall not send this event when Namespace Utilization has changed, as this is a
frequent event that does not require action by the host. A controller shall only send this event for
changes to the Format Progress Indicator field when bits 6:0 of that field transition from a non-
zero value to zero, or from a zero value to a non-zero value.
1h        Firmware Activation Starting: The controller is starting a firmware activation process during
          which command processing is paused. Host software may use CSTS.PP to determine when
          command processing has resumed. To clear this event, host software reads the Firmware Slot
          Information log page.
2h        Telemetry Log Changed: The controller has saved the controller internal state in the Telemetry
          Controller-Initiated log page and set the Telemetry Controller-Initiated Data Available field
          to 1h in that log page. To clear this event, the host issues a Get Log Page with Retain
          Asynchronous Event cleared to ‘0’ for the Telemetry Controller-Initiated Log.
3h – FFh  Reserved
Figure 50: Asynchronous Event Information – NVM Command Set Specific Status
Value Description
0h Reservation Log Page Available: Indicates that one or more Reservation Notification log pages
(refer to section 5.14.1.9.1) have been added to the Reservation Notification log.
1h Sanitize Operation Completed: Indicates that a sanitize operation has completed and status is
available in the Sanitize Status log page (refer to section 5.14.1.9.2).
2h - FFh Reserved
Figure 54: Create I/O Completion Queue – Command Specific Status Values
Value Description
1h Invalid Queue Identifier: The creation of the I/O Completion Queue failed due to an invalid queue
identifier specified as part of the command. An invalid queue identifier is one that is currently in use
or one that is outside the range supported by the controller.
2h Invalid Queue Size: The host attempted to create an I/O Completion Queue with an invalid number
of entries (e.g., a value of zero or a value which exceeds the maximum supported by the controller,
specified in CAP.MQES).
8h Invalid Interrupt Vector: The creation of the I/O Completion Queue failed due to an invalid interrupt
vector specified as part of the command.
If the queue is located in the Controller Memory Buffer and PC is cleared to ‘0’, the controller shall
fail the command with Invalid Use of Controller Memory Buffer status.
Figure 58: Create I/O Submission Queue – Command Specific Status Values
Value Description
0h Completion Queue Invalid: The Completion Queue identifier specified in the command does not
exist.
1h Invalid Queue Identifier: The creation of the I/O Submission Queue failed due to an invalid queue
identifier specified as part of the command. An invalid queue identifier is one that is currently in use
or one that is outside the range supported by the controller.
2h Invalid Queue Size: The host attempted to create an I/O Submission Queue with an invalid number
of entries (e.g., a value of zero or a value which exceeds the maximum supported by the controller,
specified in CAP.MQES).
Figure 60: Delete I/O Completion Queue – Command Specific Status Values
Value Description
1h Invalid Queue Identifier: The Queue Identifier specified in the command is invalid. This error is
also indicated if the Admin Completion Queue identifier is specified.
0Ch Invalid Queue Deletion: This error indicates that it is invalid to delete the I/O Completion Queue
specified. The typical reason for this error condition is that there is an associated I/O Submission
Queue that has not been deleted.
Figure 62: Delete I/O Submission Queue – Command Specific Status Values
Value Description
1h Invalid Queue Identifier: The Queue Identifier specified in the command is invalid. This error is
also indicated if the Admin Submission Queue identifier is specified.
The Device Self-test command uses the Command Dword 10 field. All other command specific fields are
reserved.
Bits   Value  Definition
03:00  0h     Reserved
       1h     Start a short device self-test operation
       2h     Start an extended device self-test operation
       3h-Dh  Reserved
       Eh     Vendor specific
       Fh     Abort device self-test operation
The processing of a Device Self-test command and interactions with a device self-test operation already in
progress is defined in Figure 68.
127:00 Data Pointer (DPTR): This field specifies the start of the data buffer. Refer to Figure 11 for the
definition of this field.
Bits   Description
05:03  Value     Definition
       000b      Downloaded image replaces the image specified by the Firmware Slot field. This image
                 is not activated.
       001b      Downloaded image replaces the image specified by the Firmware Slot field. This image
                 is activated at the next reset.
       010b      The image specified by the Firmware Slot field is activated at the next reset.
       011b      The image specified by the Firmware Slot field is requested to be activated
                 immediately without reset.
       100-101b  Reserved
       110b      Downloaded image replaces the Boot Partition specified by the Boot Partition ID field.
       111b      Mark the Boot Partition specified in the BPID field as active and update
                 BPINFO.ABPID.
02:00  Firmware Slot (FS): Specifies the firmware slot that shall be used for the Commit Action, if
       applicable. If the value specified is 0h, then the controller shall choose the firmware slot
       (slot 1 – 7) to use for the operation.
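As a hedged illustration, the two fields described above could be packed into the low bits of the command Dword as follows. The function name is hypothetical, and only bits 05:00 are produced; any other field of the Dword, such as the Boot Partition ID, is left at zero and would have to be added by the caller.

```python
RESERVED_ACTIONS = (0b100, 0b101)


def pack_commit_fields(commit_action: int, firmware_slot: int = 0) -> int:
    """Pack the action (bits 05:03) and Firmware Slot (bits 02:00)."""
    if not 0 <= commit_action <= 0b111 or commit_action in RESERVED_ACTIONS:
        raise ValueError("reserved or out-of-range action value")
    if not 0 <= firmware_slot <= 7:
        raise ValueError("firmware slot is a 3-bit value")
    # A slot of 0h asks the controller to choose the slot itself.
    return (commit_action << 3) | firmware_slot
```

For example, replacing the image in slot 2 and activating it at the next reset uses action 001b with slot 2, giving 0Ah in the low bits.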
The Firmware Image Download command uses the Data Pointer, Command Dword 10, and Command
Dword 11 fields. All other command specific fields are reserved.
10:08  Select       Description
       000b         Current
       001b         Default
       010b         Saved
       011b         Supported capabilities
       100b – 111b  Reserved
       Refer to section 5.13.1 for details on the value returned in each case.
       The controller indicates in bit 4 of the Optional NVM Command Support field of the Identify
       Controller data structure in Figure 109 whether this field is supported.
       If a Get Features command is received with the Select field set to 010b (i.e., saved) and the
       controller does not support the Feature Identifier being saved or does not currently have any
       saved values, then the controller shall treat the Select field as though it was set to 001b
       (i.e., default).
07:00 Feature Identifier (FID): This field specifies the identifier of the Feature for which to provide data.
Figure 84 describes the Feature Identifiers whose attributes may be retrieved using Get Features. The
definition of the attributes returned and associated format is specified in the section indicated.
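The Select and Feature Identifier fields can be combined as in the sketch below, which also models the fallback from saved to default values. The names are hypothetical, and only bits 10:00 of the Dword are produced.

```python
SELECT = {"current": 0b000, "default": 0b001, "saved": 0b010, "supported": 0b011}


def pack_get_features(feature_id: int, select: str = "current") -> int:
    """Pack Select (bits 10:08) and Feature Identifier (bits 07:00)."""
    if not 0x00 <= feature_id <= 0xFF:
        raise ValueError("feature identifier is an 8-bit value")
    return (SELECT[select] << 8) | feature_id


def effective_select(select: str, has_saved_value: bool) -> str:
    """Model the controller treating 'saved' as 'default' when nothing is saved."""
    if select == "saved" and not has_saved_value:
        return "default"
    return select
```

Host software that needs to know whether a returned value was really a saved one cannot tell from the Select value it sent, since the controller may apply this fallback silently.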
Host software should clear this field to ‘0’ for log pages that are not used with Asynchronous
Events. Refer to section 5.2.
14:12 Reserved
11:08 Log Specific Field (LSP): If not defined for the log specified by the Log Page Identifier field, this
field is reserved.
07:00 Log Page Identifier (LID): This field specifies the identifier of the log page to retrieve.
Figure 91: Get Log Page – Log Page Identifiers, NVM Command Set Specific
Log Identifier O/M Description Reference Section
80h O Reservation Notification 5.14.1.9.1
81h O Sanitize Status 5.14.1.9.2
82h – BFh Reserved
Figure 92: Get Log Page – Error Information Log Entry (Log Identifier 01h)
Bytes Description
Error Count: This is a 64-bit incrementing error count, indicating a unique identifier for this error.
The error count starts at 1h, is incremented for each unique error log entry, and is retained across
07:00 power off conditions. A value of 0h indicates an invalid entry; this value is used when there are
lost entries or when there are fewer errors than the maximum number of entries the controller
supports.
Submission Queue ID: This field indicates the Submission Queue Identifier of the command that
09:08 the error information is associated with. If the error is not specific to a particular command then
this field shall be set to FFFFh.
Command ID: This field indicates the Command Identifier of the command that the error is
11:10 assocated with. If the error is not specific to a particular command then this field shall be set to
FFFFh.
Status Field: This field indicates the Status Field for the command that completed. The Status
Field is located in bits 15:01, bit 00 corresponds to the Phase Tag posted for the command. If the
13:12
error is not specific to a particular command then this field reports the most applicable status
value.
Parameter Error Location: This field indicates the byte and bit of the command parameter that
the error is associated with, if applicable. If the parameter spans multiple bytes or bits, then the
location indicates the first byte and bit of the parameter.
Bits Description
15:11 Reserved
15:14
Bit in command that contained the error. Valid values
10:8
are 0 to 7.
Byte in command that contained the error. Valid
7:0
values are 0 to 63.
If the error is not specific to a particular command then this field shall be set to FFFFh.
23:16 LBA: This field indicates the first LBA that experienced the error condition, if applicable.
27:24 Namespace: This field indicates the namespace that the error is associated with, if applicable.
28 Vendor Specific Information Available: If there is additional vendor specific error information
available, this field provides the log page identifier associated with that page. A value of 00h
indicates that no additional information is available. Valid values are in the range of 80h to FFh.
31:29 Reserved
39:32 Command Specific Information: This field contains command specific information. If used, the
command definition specifies the information returned.
63:40 Reserved
0
Bit Definition
0 If set to ‘1’, then the available spare space has fallen below the threshold.
1 If set to ‘1’, then a temperature is above an over temperature threshold or below an under
temperature threshold (refer to section 5.21.1.4).
2 If set to ‘1’, then the NVM subsystem reliability has been degraded due to significant media
related errors or any internal error that degrades NVM subsystem reliability.
3 If set to ‘1’, then the media has been placed in read only mode.
4 If set to ‘1’, then the volatile memory backup device has failed. This field is only valid if the
controller has a volatile memory backup solution.
7:5 Reserved
2:1 Composite Temperature: Contains a value corresponding to a temperature in degrees
Kelvin that represents the current composite temperature of the controller and namespace(s)
associated with that controller. The manner in which this value is computed is implementation
specific and may not represent the actual temperature of any physical point in the NVM
subsystem. The value of this field may be used to trigger an asynchronous event (refer to
section 5.21.1.4).
Warning and critical overheating composite temperature threshold values are reported by the
WCTEMP and CCTEMP fields in the Identify Controller data structure in Figure 109.
3 Available Spare: Contains a normalized percentage (0 to 100%) of the remaining spare
capacity available.
4 Available Spare Threshold: When the Available Spare falls below the threshold indicated in
this field, an asynchronous event completion may occur. The value is indicated as a
normalized percentage (0 to 100%).
5 Percentage Used: Contains a vendor specific estimate of the percentage of NVM subsystem
life used based on the actual usage and the manufacturer’s prediction of NVM life. A value of
100 indicates that the estimated endurance of the NVM in the NVM subsystem has been
consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed
100. Percentages greater than 254 shall be represented as 255. This value shall be updated
once per power-on hour (when the controller is not in a sleep state).
Refer to the JEDEC JESD218A standard for SSD device life and endurance measurement
techniques.
31:6 Reserved
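As an illustration of how host software might interpret bytes 0 through 5 of this log page, the following sketch decodes the warning bits, the Composite Temperature, and the spare and endurance percentages. The function name and the dictionary keys are hypothetical; the sketch assumes the bit table above describes byte 0 and that the two-byte temperature is little-endian.

```python
# Bit positions of byte 0, taken from the bit table above.
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature threshold crossed",
    2: "NVM subsystem reliability degraded",
    3: "media in read only mode",
    4: "volatile memory backup device failed",
}


def decode_smart_header(log: bytes) -> dict:
    """Decode bytes 5:0 of the log page (hypothetical helper)."""
    if len(log) < 6:
        raise ValueError("need at least the first 6 bytes of the log page")
    warnings = [
        name for bit, name in CRITICAL_WARNING_BITS.items() if (log[0] >> bit) & 1
    ]
    kelvin = int.from_bytes(log[1:3], "little")  # little-endian is an assumption
    return {
        "warnings": warnings,
        "composite_temp_kelvin": kelvin,
        # Whole-degree approximation; the field itself is reported in Kelvin.
        "composite_temp_celsius": kelvin - 273,
        "available_spare_pct": log[3],
        "available_spare_threshold_pct": log[4],
        # May exceed 100; 255 stands for any percentage greater than 254.
        "percentage_used": log[5],
    }
```

A caller would typically compare `available_spare_pct` against `available_spare_threshold_pct` rather than against a fixed constant, since the threshold is reported by the controller.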
47:32 Data Units Read: Contains the number of 512 byte data units the host has read from the
controller; this value does not include metadata. This value is reported in thousands (i.e., a
value of 1 corresponds to 1000 units of 512 bytes read) and is rounded up. When the LBA
size is a value other than 512 bytes, the controller shall convert the amount of data read to
512 byte units.
For the NVM command set, logical blocks read as part of Compare and Read operations shall
be included in this value.
63:48 Data Units Written: Contains the number of 512 byte data units the host has written to the
controller; this value does not include metadata. This value is reported in thousands (i.e., a
value of 1 corresponds to 1000 units of 512 bytes written) and is rounded up. When the LBA
size is a value other than 512 bytes, the controller shall convert the amount of data written to
512 byte units.
For the NVM command set, logical blocks written as part of Write operations shall be included
in this value. Write Uncorrectable commands shall not impact this value.
79:64 Host Read Commands: Contains the number of read commands completed by the controller.
For the NVM command set, this is the number of Compare and Read commands.
95:80 Host Write Commands: Contains the number of write commands completed by the
controller.
For the NVM command set, this is the number of Write commands.
111:96 Controller Busy Time: Contains the amount of time the controller is busy with I/O commands.
The controller is busy when there is a command outstanding to an I/O Queue (specifically, a
command was issued via an I/O Submission Queue Tail doorbell write and the corresponding
completion queue entry has not been posted yet to the associated I/O Completion Queue).
This value is reported in minutes.
127:112 Power Cycles: Contains the number of power cycles.
143:128 Power On Hours: Contains the number of power-on hours. This may not include time that
the controller was powered and in a non-operational power state.
159:144 Unsafe Shutdowns: Contains the number of unsafe shutdowns. This count is incremented
when a shutdown notification (CC.SHN) is not received prior to loss of power.
175:160 Media and Data Integrity Errors: Contains the number of occurrences where the controller
detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC
checksum failure, or LBA tag mismatch are included in this field.
191:176 Number of Error Information Log Entries: Contains the number of Error Information log
entries over the life of the controller.
195:192 Warning Composite Temperature Time: Contains the amount of time in minutes that the
controller is operational and the Composite Temperature is greater than or equal to the
Warning Composite Temperature Threshold (WCTEMP) field and less than the Critical
Composite Temperature Threshold (CCTEMP) field in the Identify Controller data structure in
Figure 109.
If the value of the WCTEMP or CCTEMP field is 0h, then this field is always cleared to 0h
regardless of the Composite Temperature value.
199:196 Critical Composite Temperature Time: Contains the amount of time in minutes that the
controller is operational and the Composite Temperature is greater than the Critical Composite
Temperature Threshold (CCTEMP) field in the Identify Controller data structure in Figure 109.
If the value of the CCTEMP field is 0h, then this field is always cleared to 0h regardless of the
Composite Temperature value.
201:200 Temperature Sensor 1: Contains the current temperature reported by temperature sensor 1.
This field is defined by Figure 94.
203:202 Temperature Sensor 2: Contains the current temperature reported by temperature sensor 2.
This field is defined by Figure 94.
205:204 Temperature Sensor 3: Contains the current temperature reported by temperature sensor 3.
This field is defined by Figure 94.
207:206 Temperature Sensor 4: Contains the current temperature reported by temperature sensor 4.
This field is defined by Figure 94.
15:00 The physical point in the NVM subsystem whose temperature is reported by the temperature
sensor and the temperature accuracy is implementation specific. An implementation that does
not implement the temperature sensor reports a temperature of zero degrees Kelvin. The
temperature reported by a temperature sensor may be used to trigger an asynchronous event
(refer to section 5.21.1.4).
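The 16-byte counters and the temperature sensor fields described above can be read as shown in the following sketch. It is a hedged illustration rather than a reference implementation: the helper names are hypothetical, little-endian byte order is assumed, and only bytes 207:32 of the log page are decoded.

```python
def data_units_to_bytes(units: int) -> int:
    """Convert a Data Units value (thousands of 512 byte units) to bytes.

    The controller rounds the reported value up, so the result is an
    upper estimate rather than an exact byte count.
    """
    return units * 1000 * 512


def decode_smart_counters(log: bytes) -> dict:
    """Decode bytes 207:32 of the log page (hypothetical helper)."""
    if len(log) < 208:
        raise ValueError("need at least the first 208 bytes of the log page")

    def field(last: int, first: int) -> int:
        # Same "last:first" order as the Bytes column; little-endian is assumed.
        return int.from_bytes(log[first:last + 1], "little")

    sensors = [field(201 + 2 * i, 200 + 2 * i) for i in range(4)]
    return {
        "data_units_read": field(47, 32),
        "data_units_written": field(63, 48),
        "host_read_commands": field(79, 64),
        "host_write_commands": field(95, 80),
        "controller_busy_minutes": field(111, 96),
        "power_cycles": field(127, 112),
        "power_on_hours": field(143, 128),
        "unsafe_shutdowns": field(159, 144),
        "media_errors": field(175, 160),
        "error_log_entries": field(191, 176),
        "warning_temp_minutes": field(195, 192),
        "critical_temp_minutes": field(199, 196),
        # Zero degrees Kelvin means the sensor is not implemented.
        "sensors_kelvin": [k if k != 0 else None for k in sensors],
    }
```

Python integers are arbitrary precision, so the 128-bit counters need no special handling; in a language with fixed-width integers they would have to be split or saturated.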
00 Bit 7 is reserved.
Bits 6:4 indicate the firmware slot that is going to be activated at the next controller reset. If this
field is 0h, then the controller does not indicate the firmware slot that is going to be activated at
the next controller reset.
Bit 3 is reserved.
Bits 2:0 indicate the firmware slot from which the actively running firmware revision was loaded.
07:01 Reserved
15:08 Firmware Revision for Slot 1 (FRS1): Contains the revision of the firmware downloaded to
firmware slot 1. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
23:16 Firmware Revision for Slot 2 (FRS2): Contains the revision of the firmware downloaded to
firmware slot 2. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
31:24 Firmware Revision for Slot 3 (FRS3): Contains the revision of the firmware downloaded to
firmware slot 3. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
39:32 Firmware Revision for Slot 4 (FRS4): Contains the revision of the firmware downloaded to
firmware slot 4. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
47:40 Firmware Revision for Slot 5 (FRS5): Contains the revision of the firmware downloaded to
firmware slot 5. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
55:48 Firmware Revision for Slot 6 (FRS6): Contains the revision of the firmware downloaded to
firmware slot 6. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
63:56 Firmware Revision for Slot 7 (FRS7): Contains the revision of the firmware downloaded to
firmware slot 7. If no valid firmware revision is present or if this slot is unsupported, this field shall
be cleared to 0h.
511:64 Reserved
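The slot layout above lends itself to a compact decoder. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each 8-byte revision field holds an ASCII string padded with spaces or zero bytes.

```python
def decode_firmware_slot_log(log: bytes) -> dict:
    """Decode bytes 63:00 of the log page (hypothetical helper)."""
    if len(log) < 64:
        raise ValueError("need at least the first 64 bytes of the log page")
    afi = log[0]
    next_slot = (afi >> 4) & 0x7  # 0h: the controller does not indicate a slot
    revisions = {}
    for slot in range(1, 8):
        raw = log[8 * slot: 8 * slot + 8]  # FRS1 is bytes 15:08, FRS7 is bytes 63:56
        if any(raw):  # a field cleared to 0h means no valid revision or unsupported slot
            # ASCII content is an assumption of this sketch.
            revisions[slot] = raw.decode("ascii", errors="replace").rstrip(" \x00")
    return {
        "active_slot": afi & 0x7,
        "next_slot": next_slot or None,
        "revisions": revisions,
    }
```

Slots whose revision field is cleared to 0h are simply omitted from `revisions`, so the dictionary length is not a reliable count of supported slots.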
Figure 96: Get Log Page – Commands Supported and Effects Log
Bytes Description
03:00 Admin Command Supported 0 (ACS0): Contains the Commands Supported and Effects data
structure (refer to Figure 97) for the Admin command with an opcode value of 0h.
07:04 Admin Command Supported 1 (ACS1): Contains the Commands Supported and Effects data
structure (refer to Figure 97) for the Admin command with an opcode value of 1h.
… …
1019:1016 Admin Command Supported 254 (ACS254): Contains the Commands Supported and Effects
data structure (refer to Figure 97) for the Admin command with an opcode value of 254.
1023:1020 Admin Command Supported 255 (ACS255): Contains the Commands Supported and Effects
data structure (refer to Figure 97) for the Admin command with an opcode value of 255.
1027:1024 I/O Command Supported 0 (IOCS0): Contains the Commands Supported and Effects data
structure (refer to Figure 97) for the I/O command with an opcode value of 0h.
1031:1028 I/O Command Supported 1 (IOCS1): Contains the Commands Supported and Effects data
structure (refer to Figure 97) for the I/O command with an opcode value of 1h.
… …
2043:2040 I/O Command Supported 254 (IOCS254): Contains the Commands Supported and Effects data
structure (refer to Figure 97) for the I/O command with an opcode value of 254.
2047:2044 I/O Command Supported 255 (IOCS255): Contains the Commands Supported and Effects data
structure (refer to Figure 97) for the I/O command with an opcode value of 255.
4095:2048 Reserved
The Commands Supported and Effects data structure describes the overall possible effect of a command,
including any optional features of the command.
Host software may take command effects into account when determining how to submit commands and
actions to take after the command is complete. It is recommended that if a command may change a
particular capability that host software re-enumerate and/or re-initialize the associated capability after the
command is complete. For example, if a namespace capability change may occur, then host software is
recommended to pause the use of the associated namespace, submit the command that may cause a
namespace capability change and wait for its completion, and then re-issue the Identify command.
Figure 97: Get Log Page – Commands Supported and Effects Data Structure
Bits Description
31:19 Reserved
18:16 Command Submission and Execution (CSE): This field defines the command submission and
execution recommendations for the associated command.
Value Definition
000b No command submission or execution restriction
001b Command may be submitted when there is no other outstanding command to the same
namespace and another command should not be submitted to the same namespace until this
command is complete
010b Command may be submitted when there is no other outstanding command to any namespace
and another command should not be submitted to any namespace until this command is complete
011b – 111b Reserved
15:05 Reserved
04 Controller Capability Change (CCC): If this bit is set to ‘1’, then this command may change
controller capabilities. If this bit is cleared to ‘0’, then this command does not modify controller
capabilities. Controller capability changes include a firmware update that changes the capabilities
reported in the CAP register.
03 Namespace Inventory Change (NIC): If this bit is set to ‘1’, then this command may change the
number of namespaces or capabilities for multiple namespaces. If this bit is cleared to ‘0’, then
this command does not modify the number of namespaces or capabilities for multiple
namespaces. Namespace inventory changes include adding or removing namespaces.
02 Namespace Capability Change (NCC): If this bit is set to ‘1’, then this command may change
the capabilities of a single namespace. If this bit is cleared to ‘0’, then this command does not
modify any namespace capabilities for the specified namespace. Namespace capability changes
include a logical format change.
01 Logical Block Content Change (LBCC): If this bit is set to ‘1’, then this command may modify
logical block content in one or more namespaces. If this bit is cleared to ‘0’, then this command
does not modify logical block content in any namespace. Logical block content changes include
a write to a logical block.
00 Command Supported (CSUPP): If this bit is set to ‘1’, then this command is supported by the
controller. If this bit is cleared to ‘0’, then this command is not supported by the controller and all
other fields in this structure shall be cleared to 0h.
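The indexing implied by Figure 96 (one 4-byte entry per opcode, Admin entries first, I/O entries starting at byte 1024) and the bit fields of Figure 97 can be sketched as follows. The function names are hypothetical and little-endian byte order is assumed.

```python
def decode_effects(dword: int) -> dict:
    """Split one Commands Supported and Effects data structure into its fields."""
    return {
        "csupp": bool(dword & 0x1),       # bit 00
        "lbcc": bool((dword >> 1) & 1),   # bit 01
        "ncc": bool((dword >> 2) & 1),    # bit 02
        "nic": bool((dword >> 3) & 1),    # bit 03
        "ccc": bool((dword >> 4) & 1),    # bit 04
        "cse": (dword >> 16) & 0x7,       # bits 18:16
    }


def effects_for(log: bytes, opcode: int, admin: bool) -> dict:
    """Look up the entry for an opcode in the Commands Supported and Effects log."""
    if len(log) < 2048:
        raise ValueError("need at least bytes 2047:0 of the log page")
    if not 0 <= opcode <= 255:
        raise ValueError("opcode must be in the range 0 to 255")
    # Admin entries occupy bytes 1023:0, I/O entries occupy bytes 2047:1024.
    offset = (0 if admin else 1024) + 4 * opcode
    return decode_effects(int.from_bytes(log[offset:offset + 4], "little"))
```

Host software following the recommendation above might, for example, quiesce a namespace before submitting a command whose entry reports `ncc` and a `cse` value of 001b, and then re-issue the Identify command once it completes.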
0 Bits 3:0 indicate the status of the current device self-test operation as defined in the following
table. If a device self-test operation is in process (i.e., this field is set to 1h or 2h), then the
controller shall not set this field to 0h until a new Self-test Result Data Structure is created (i.e.,
if a device self-test operation completes or is aborted, then the controller shall create a Self-test
Result Data Structure prior to setting this field to 0h).
Value Definition
0h No device self-test operation in progress
1h Short device self-test operation in progress
2h Extended device self-test operation in progress
3h – Dh Reserved
Eh Vendor specific
Fh Reserved
1 Current Device Self-Test Completion: This field defines the completion status of the current
device self-test.
Bit 7 is reserved.
Bits 6:0 indicate the percentage of the device self-test operation that is complete (e.g., a value
of 25 indicates that 25% of the device self-test operation is complete and 75% remains to be
tested). If bits 3:0 in the Current Device Self-Test Operation field are set to 0h (indicating there
is no device self-test operation in progress), then this field is ignored.
3:2 Reserved
31:4 Newest Self-test Result Data Structure (refer to Figure 99)
59:32 2nd newest Self-test Result Data Structure (refer to Figure 99)
… …
535:508 19th newest Self-test Result Data Structure (refer to Figure 99)
563:536 20th newest Self-test Result Data Structure (refer to Figure 99)
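The following sketch shows one way to split this log page into its header fields and its 20 Self-test Result Data Structures of 28 bytes each. The function name is hypothetical; the offsets are taken from the Bytes column above.

```python
RESULT_SIZE = 28   # bytes 31:4 hold the newest result, so each result is 28 bytes
RESULT_COUNT = 20


def split_self_test_log(log: bytes):
    """Return (current operation, percent complete or None, list of raw results)."""
    if len(log) < 4 + RESULT_SIZE * RESULT_COUNT:
        raise ValueError("the log page is 564 bytes")
    operation = log[0] & 0xF
    # The completion field is ignored when no operation is in progress (0h).
    percent = (log[1] & 0x7F) if operation != 0 else None
    results = [
        log[4 + RESULT_SIZE * i: 4 + RESULT_SIZE * (i + 1)]
        for i in range(RESULT_COUNT)
    ]
    return operation, percent, results
```

The returned list is ordered newest first, matching the order of the rows above.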
0 Bits 7:4 indicate the Self-test Code value that was specified in the Device Self-test command
that started the device self-test operation that this Self-test Result Data Structure describes.
Value Definition
0h Reserved
1h Short device self-test operation
2h Extended device self-test operation
3h – Dh Reserved
Eh Vendor specific
Fh Reserved
Bits 3:0 indicate the result of the device self-test operation that this Self-test Result Data
Structure describes.
Value Definition
0h Operation completed without error
1h Operation was aborted by a Device Self-test command
2h Operation was aborted by a Controller Level Reset
3h Operation was aborted due to a removal of a namespace from the namespace inventory
4h Operation was aborted due to the processing of a Format NVM command
5h A fatal error or unknown test error occurred while the controller was executing the device
self-test operation and the operation did not complete
6h Operation completed with a segment that failed and the segment that failed is not known
7h Operation completed with one or more failed segments and the first segment that failed is
indicated in the Segment Number field
8h Operation was aborted for unknown reason
9h – Eh Reserved
Fh Entry not used (does not contain a test result)
1 Segment Number: This field indicates the segment in which the first self-test failure occurred.
This field is ignored if the Device Self-test Status field is not set to 7h.
2 Valid Diagnostic Information: This field indicates the diagnostic failure information that is
reported.
Bit 3 defines the SC_Valid bit. If set to ‘1’, then the contents of the Status Code field are valid. If
cleared to ‘0’, then the contents of the Status Code field are invalid.
Bit 2 defines the SCT_Valid bit. If set to ‘1’, then the contents of the Status Code Type field are
valid. If cleared to ‘0’, then the contents of the Status Code Type field are invalid.
Bit 1 defines the FLBA_Valid bit. If set to ‘1’, then the contents of the Failing LBA field are valid. If
cleared to ‘0’, then the contents of the Failing LBA field are invalid.
Bit 0 defines the NSID_Valid bit. If set to ‘1’, then the contents of the Namespace Identifier field
are valid. If cleared to ‘0’, then the contents of the Namespace Identifier field are invalid.
3 Reserved
11:4 Power On Hours (POH): This field indicates the number of power-on hours at the time the
device self-test operation was completed or aborted. This does not include time that the
controller was powered and in a low power state condition.
15:12 Namespace Identifier (NSID): This field indicates the namespace that the Failing LBA
occurred on. The contents of this field are valid only when the NSID_Valid bit is set to ‘1’.
23:16 Failing LBA: This field indicates the LBA of the logical block that caused the test to fail. If the
device encountered more than one failed logical block during the test, then this field only
indicates one of those failed logical blocks. The contents of this field are valid only when the
FLBA_Valid bit is set to ‘1’.
Status Code Type: This field may contain additional information related to errors or conditions.
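As an informal illustration of the validity rules above, the following sketch decodes bytes 23:0 of one 28-byte Self-test Result Data Structure and reports the Namespace Identifier and Failing LBA only when their valid bits are set. The function name is hypothetical, little-endian byte order is assumed, and bytes 27:24 are deliberately left undecoded.

```python
from typing import Optional


def decode_self_test_result(entry: bytes) -> Optional[dict]:
    """Decode bytes 23:0 of one result; returns None for an unused entry (Fh)."""
    if len(entry) != 28:
        raise ValueError("a Self-test Result Data Structure is 28 bytes")
    result = entry[0] & 0xF
    if result == 0xF:
        return None  # entry not used (does not contain a test result)
    valid = entry[2]
    return {
        "self_test_code": entry[0] >> 4,
        "result": result,
        # The Segment Number field is ignored unless the result is 7h.
        "segment": entry[1] if result == 0x7 else None,
        "power_on_hours": int.from_bytes(entry[4:12], "little"),
        # NSID_Valid is bit 0 and FLBA_Valid is bit 1 of byte 2.
        "nsid": int.from_bytes(entry[12:16], "little") if valid & 0x1 else None,
        "failing_lba": int.from_bytes(entry[16:24], "little") if valid & 0x2 else None,
    }
```

Mapping invalid fields to `None` keeps stale byte values from being mistaken for a real namespace or LBA.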
The Telemetry Host-Initiated Data consists of three areas: Telemetry Host-Initiated Data Area 1, Telemetry
Host-Initiated Data Area 2, and Telemetry Host-Initiated Data Area 3. All three areas start at Telemetry
Host-Initiated Data Area Block 1. The last block of each area is indicated in Telemetry Host-Initiated Data
Area y Last Block, respectively. The telemetry data captured and its size is implementation dependent. The
size of the log page is variable and may be calculated using the Telemetry Host-Initiated Data Area 3 Last
Block field.
The controller shall return data for all blocks requested. The data beyond the last block in Telemetry Host-
Initiated Data Area 3 Last Block is undefined. If the host requests a data transfer that is not a multiple of
512 bytes then the controller shall return an error of Invalid Field in Command.
Figure 101: Get Log Page – Telemetry Host-Initiated Log (Log Identifier 07h)
Bytes Description
00 Log Identifier: This field shall be set to 07h.
04:01 Reserved
07:05 IEEE OUI Identifier (IEEE): Contains the Organization Unique Identifier (OUI) for the controller
vendor that is able to interpret the data. If set to 0h, no IEEE OUI Identifier is present. The OUI
shall be a valid IEEE/RAC assigned identifier that is registered at
http://standards.ieee.org/develop/regauth/oui/public.html.
09:08 Telemetry Host-Initiated Data Area 1 Last Block: Contains the value of the last block of
Telemetry Host-Initiated Data Area 1. If the Telemetry Host-Initiated Data Area 1 does not
contain data, then this field shall be cleared to 0h.
If this field is not 0h then Telemetry Host-Initiated Data Area 1 begins at block 1h and ends at
the block indicated in this field.
11:10 Telemetry Host-Initiated Data Area 2 Last Block: Contains the value of the last block of
Telemetry Host-Initiated Data Area 2. This value shall be greater than or equal to the value in
the Telemetry Host-Initiated Data Area 1 Last Block field.
If this field is not 0h then Telemetry Host-Initiated Data Area 2 begins at block 1h and ends at
the block indicated in this field.
13:12 Telemetry Host-Initiated Data Area 3 Last Block: Contains the value of the last block of
Telemetry Host-Initiated Data Area 3. This value shall be greater than or equal to the value in
the Telemetry Host-Initiated Data Area 2 Last Block field.
If this field is not 0h then Telemetry Host-Initiated Data Area 3 begins at block 1h and ends at
the block indicated in this field.
381:14 Reserved
382 Telemetry Controller-Initiated Data Available: Contains the value of Telemetry Controller-
Initiated Data Available field in the Telemetry Controller-Initiated Log (refer to Figure 102).
383 Telemetry Controller-Initiated Data Generation Number: Contains the value of the Telemetry
Controller-Initiated Data Generation Number field in the Telemetry Controller-Initiated Log (refer
to Figure 102).
511:384 Reason Identifier: Contains a vendor specific identifier that describes the operating conditions
of the controller at the time of capture. The Reason Identifier field should provide an
identification of unique operating conditions of the controller.
1023:512 Telemetry Host-Initiated Data Block 1: Contains Telemetry Data Block 1 for the Telemetry
Host-Initiated Log.
1535:1024 Telemetry Host-Initiated Data Block 2: Contains Telemetry Data Block 2 for the Telemetry
Host-Initiated Log.
… …
(n*512)+511:(n*512) Telemetry Host-Initiated Data Block n: Contains Telemetry Data Block n for the
Telemetry Host-Initiated Log.
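Because Data Block n occupies bytes (n*512)+511:(n*512), the total size of the log page follows from the Data Area 3 Last Block field. The sketch below illustrates that calculation and the byte range of each data area; the function names are hypothetical and little-endian byte order is assumed.

```python
from typing import Optional, Tuple

BLOCK = 512  # size of the header and of each Telemetry Data Block


def decode_telemetry_header(header: bytes) -> dict:
    """Decode the first 512 bytes of the Telemetry Host-Initiated Log."""
    if len(header) < BLOCK:
        raise ValueError("need the first 512 bytes of the log page")
    last = [int.from_bytes(header[o:o + 2], "little") for o in (8, 10, 12)]
    return {
        "log_identifier": header[0],
        "ieee_oui_raw": bytes(header[5:8]),  # left undecoded in this sketch
        "area_last_block": last,             # Data Areas 1, 2, 3
        # Header block plus data blocks 1 through the Data Area 3 Last Block.
        "total_bytes": BLOCK * (last[2] + 1),
        "controller_data_available": header[382],
        "controller_generation": header[383],
        "reason_identifier": bytes(header[384:512]),
    }


def area_byte_range(last_block: int) -> Optional[Tuple[int, int]]:
    """Half-open byte range [start, end) of a data area, or None if it holds no data."""
    if last_block == 0:
        return None
    # Every area begins at block 1h, which starts at byte 512.
    return BLOCK, BLOCK * (last_block + 1)
```

A host would typically read the 512-byte header first, compute `total_bytes`, and then request the remainder in transfers that are multiples of 512 bytes, since other transfer sizes are rejected with Invalid Field in Command.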
size is implementation dependent. The size of the log page is variable and may be calculated using the
Telemetry Controller-Initiated Data Area 3 Last Block field.
The controller shall return data for all blocks requested. The data beyond the last block in Telemetry
Controller-Initiated Data Area 3 Last Block is undefined. If the host requests a data transfer that is not a
multiple of 512 bytes then the controller shall return an error of Invalid Field in Command.
Figure 102: Get Log Page – Telemetry Controller-Initiated Log (Log Identifier 08h)
Bytes Description
00 Log Identifier: This field shall be set to 08h.
04:01 Reserved
07:05 IEEE OUI Identifier (IEEE): Contains the Organization Unique Identifier (OUI) for the controller
vendor that is able to interpret the data. If set to 0h, no IEEE OUI Identifier is present. The OUI
shall be a valid IEEE/RAC assigned identifier that is registered at
http://standards.ieee.org/develop/regauth/oui/public.html.
09:08 Telemetry Controller-Initiated Data Area 1 Last Block: Contains the value of the last block of
Telemetry Controller-Initiated Data Area 1. If the Telemetry Controller-Initiated Data Area 1
does not contain data, then this field shall be cleared to 0h.
If this field is not 0h then Telemetry Controller-Initiated Data Area 1 begins at block 1 and ends
at the block indicated in this field.
11:10 Telemetry Controller-Initiated Data Area 2 Last Block: Contains the value of the last block of
Telemetry Controller-Initiated Data Area 2. This value shall be greater than or equal to the value
in the Telemetry Controller-Initiated Data Area 1 Last Block field.
If this field is not 0h then Telemetry Controller-Initiated Data Area 2 begins at block 1h and ends
at the block indicated in this field.
13:12 Telemetry Controller-Initiated Data Area 3 Last Block: Contains the value of the last block of
Telemetry Controller-Initiated Data Area 3. This value shall be greater than or equal to the value
in the Telemetry Controller-Initiated Data Area 2 Last Block field.
If this field is not 0h then Telemetry Controller-Initiated Data Area 3 begins at block 1h and ends
at the block indicated in this field.
381:14 Reserved
382 Telemetry Controller-Initiated Data Available: If this field is cleared to 0h, the log does not
contain saved internal controller state. If this field is set to 1h, the log contains saved internal
controller state. If this field is set to 1h, it shall not be cleared to 0h until a Get Log Page with
Retain Asynchronous Event cleared to ‘0’ for the Telemetry Controller-Initiated Log completes
successfully. This value is persistent across power states and reset.
Other values are reserved.
383 Telemetry Controller-Initiated Data Generation Number: Contains a value that is
incremented each time the controller initiates a capture of its internal controller state into the
Telemetry Controller-Initiated Data Blocks. This field is persistent across power on.
511:384 Reason Identifier: Contains a vendor specific identifier that describes the operating conditions
of the controller at the time of capture. The Controller-Initiated Reason Identifier field should
provide an identification of unique operating conditions of the controller.
1023:512 Telemetry Controller-Initiated Data Block 1: Contains Telemetry Data Block 1 for the
Telemetry Controller-Initiated Log captured at a vendor specific time.
1535:1024 Telemetry Controller-Initiated Data Block 2: Contains Telemetry Data Block 2 for the
Telemetry Controller-Initiated Log captured at a vendor specific time.
… …
(n*512)+511:(n*512) Telemetry Controller-Initiated Data Block n: Contains Telemetry Data Block n for the
Telemetry Controller-Initiated Log captured at a vendor specific time.
08
Value Definition
0 Empty Log Page: Get Log Page command was processed when no unread Reservation
Notification log pages were available. All the fields of an empty log page shall have a value of
zero.
1 Registration Preempted
2 Reservation Released
3 Reservation Preempted
255:4 Reserved
09 Number of Available Log Pages: This field indicates the number of additional available
Reservation Notification log pages (i.e., the number of unread log pages not counting this one). If
there are more than 255 additional available log pages, then a value of 255 is returned. A value of
zero indicates that there are no additional available log pages.
11:10 Reserved
15:12 Namespace ID: This field indicates the namespace ID of the namespace associated with the
Reservation Notification described by this log page.
63:16 Reserved
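A minimal sketch of decoding the fields listed above (bytes 08, 09, and 15:12) follows. The function name is hypothetical, bytes 07:00 are not decoded, and little-endian byte order is assumed for the Namespace ID.

```python
NOTIFICATION_TYPES = {
    0: "Empty Log Page",
    1: "Registration Preempted",
    2: "Reservation Released",
    3: "Reservation Preempted",
}


def decode_reservation_notification(log: bytes) -> dict:
    """Decode bytes 08, 09 and 15:12 of a 64-byte Reservation Notification log page."""
    if len(log) < 64:
        raise ValueError("the log page is 64 bytes")
    kind = log[8]
    return {
        "type": NOTIFICATION_TYPES.get(kind, "Reserved"),
        "additional_pages": log[9],
        # 255 means "255 or more", so the true count is unknown in that case.
        "additional_pages_saturated": log[9] == 255,
        "nsid": int.from_bytes(log[12:16], "little"),
    }
```

A host draining notifications might keep reading until the type is Empty Log Page rather than relying on the saturating count.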
03:02 Bit 8 (Global Data Erased): If set to ‘1’, then non-volatile storage in the NVM subsystem has
not been written to:
a) since being manufactured and the NVM subsystem has never been sanitized; or
b) since the most recent successful sanitize operation.
If cleared to ‘0’, then non-volatile storage in the NVM subsystem has been written to:
a) since being manufactured and the NVM subsystem has never been sanitized; or
b) since the most recent successful sanitize operation of the NVM subsystem.
Bits 7:3 contain the number of completed passes if the most recent sanitize operation was
an Overwrite. This field shall be cleared to 00000b if the most recent sanitize operation was
not an Overwrite.
Bits 2:0 contain the status of the most recent sanitize operation as shown below.
Value Definition
000b The NVM subsystem has never been sanitized.
001b The most recent sanitize operation completed successfully.
010b A sanitize operation is currently in progress.
011b The most recent sanitize operation failed.
100b-111b Reserved
07:04 Sanitize Command Dword 10 Information (SCDW10): This field contains the value of the
Command Dword 10 field of the Sanitize command that started the sanitize operation
whose status is reported in the SSTAT field. Refer to Figure 178.
11:08 Estimated Time For Overwrite: This field indicates the number of seconds required to
complete an Overwrite sanitize operation with 16 passes in the background (refer to section
5.24). A value of 0h indicates that the sanitize operation is expected to be completed in the
background when the Sanitize command that started that operation is completed. A value of
FFFFFFFFh indicates that no time period is reported.
15:12 Estimated Time For Block Erase: This field indicates the number of seconds required to
complete a Block Erase sanitize operation in the background (refer to section 5.24). A value
of 0h indicates that the sanitize operation is expected to be completed in the background
when the Sanitize command that started that operation is completed. A value of FFFFFFFFh
indicates that no time period is reported.
19:16 Estimated Time For Crypto Erase: This field indicates the number of seconds required to
complete a Crypto Erase sanitize operation in the background (refer to section 5.24). A value
of 0h indicates that the sanitize operation is expected to be completed in the background
when the Sanitize command that started that operation is completed. A value of FFFFFFFFh
indicates that no time period is reported.
511:20 Reserved
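The status bits and time estimates above can be decoded as in the following sketch. It is illustrative only: the function name is hypothetical, bytes 01:00 are not decoded, and little-endian byte order is assumed.

```python
from typing import Optional

SANITIZE_STATUS = {
    0b000: "never sanitized",
    0b001: "completed successfully",
    0b010: "in progress",
    0b011: "failed",
}


def decode_sanitize_status(log: bytes) -> dict:
    """Decode bytes 19:02 of the Sanitize Status log page."""
    if len(log) < 20:
        raise ValueError("need at least the first 20 bytes of the log page")
    sstat = int.from_bytes(log[2:4], "little")

    def estimate(offset: int) -> Optional[int]:
        seconds = int.from_bytes(log[offset:offset + 4], "little")
        # FFFFFFFFh indicates that no time period is reported.
        return None if seconds == 0xFFFFFFFF else seconds

    return {
        "status": SANITIZE_STATUS.get(sstat & 0x7, "Reserved"),
        "overwrite_passes_completed": (sstat >> 3) & 0x1F,
        "global_data_erased": bool((sstat >> 8) & 1),
        "scdw10": int.from_bytes(log[4:8], "little"),
        "overwrite_seconds": estimate(8),
        "block_erase_seconds": estimate(12),
        "crypto_erase_seconds": estimate(16),
    }
```

An estimate of 0 is kept distinct from `None`: 0h means the operation is expected to be complete when the Sanitize command completes, whereas `None` means the controller reports no estimate.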
00h M If the controller supports Namespace Management and CDW1.NSID is set to FFFFFFFFh,
the controller returns an Identify Namespace data structure that specifies capabilities that are
common across namespaces for this controller. If the controller does not support Namespace
Management and CDW1.NSID is set to FFFFFFFFh, the controller shall fail the command
with a status code of Invalid Namespace or Format.
01h M The Identify Controller data structure is returned to the host for this controller.
02h M A list of 1024 namespace IDs is returned containing active NSIDs in increasing order that are
greater than the value specified in the Namespace Identifier (CDW1.NSID) field of the
command. The controller should abort the command with status code Invalid Namespace or
Format if CDW1.NSID is set to FFFFFFFEh or FFFFFFFFh. Note that CDW1.NSID may be
cleared to 0h to retrieve a Namespace List including the namespace starting with NSID of 1h.
The data structure returned is a Namespace List (refer to section 4.8).
03h M A list of Namespace Identification Descriptor structures (refer to Figure 116) is returned to
the host for the namespace specified in the Namespace Identifier (CDW1.NSID) field if it is
an active NSID.
The controller may return any number of variable length Namespace Identification Descriptor
structures that fit into the 4096 byte Identify payload. All remaining bytes after the namespace
identification descriptor structures should be cleared to 0h, and the host shall interpret a
Namespace Identifier Descriptor Length (NIDL) value of 0h as the end of the list. If the host
sees an unknown descriptor type it should continue parsing the structure.
A controller shall not return multiple descriptors with the same Namespace Identification
Descriptor Type (NIDT). A controller shall return at least one descriptor identifying the
namespace.
04h – 0Fh Reserved
Controller and Namespace Management
10h NOTE 1 A list of up to 1024 namespace IDs is returned to the host containing allocated NSIDs with a
namespace identifier greater than the value specified in the Namespace Identifier
(CDW1.NSID) field.
The controller should abort the command with status code Invalid Namespace or Format if
CDW1.NSID is set to FFFFFFFEh or FFFFFFFFh. Note that CDW1.NSID may be cleared to
0h to retrieve a Namespace List including the namespace starting with NSID of 1h.
11h NOTE 1 The Identify Namespace data structure is returned to the host for the namespace specified
in the Namespace Identifier (CDW1.NSID) field if it is an allocated NSID. If the specified
namespace is an unallocated NSID then the controller returns a zero filled data structure. If
the specified namespace is an invalid NSID then the controller shall fail the command with a
status code of Invalid Namespace or Format. If CDW1.NSID is set to FFFFFFFFh then the
controller should fail the command with a status code of Invalid Namespace or Format.
12h NOTE 1 A Controller List of up to 2047 controller identifiers is returned containing a controller identifier
greater than or equal to the value specified in the Controller Identifier (CDW10.CNTID) field.
The list contains controller identifiers that are attached to the namespace specified in the
Namespace Identifier (CDW1.NSID) field.
13h NOTE 1 A Controller List of up to 2047 controller identifiers is returned containing a controller identifier
greater than or equal to the value specified in the Controller Identifier (CDW10.CNTID) field.
The list contains controller identifiers in the NVM subsystem that may or may not be attached
to namespace(s).
14h NOTE 2 The Primary Controller Capabilities Structure (refer to Figure 110) is returned to the host for
the primary controller specified.
15h NOTE 2 A Secondary Controller List (refer to Figure 111) is returned to the host for up to 127
secondary controllers associated with the primary controller issuing this command. The list
contains entries for controller identifiers greater than or equal to the value specified in the
Controller Identifier (CDW10.CNTID) field.
16h – 1Fh Reserved
Future Definition
20h – FFh Reserved
NOTES:
1. Mandatory for controllers that support Namespace Management.
2. Mandatory for controllers that support Virtualization Enhancements.
The Identify command uses the Data Pointer and Command Dword 10 fields. All other command specific
fields are reserved.
Figure 107: Identify – Data Pointer
Bit Description
127:00 Data Pointer (DPTR): This field specifies the start of the data buffer. Refer to Figure 11 for the
definition of this field. If using PRPs, this field shall not be a pointer to a PRP List as the data buffer
may not cross more than one page boundary.
If SGL Bit Bucket descriptors are supported, their lengths shall be included in
determining if a command exceeds the Maximum Data Transfer Size for destination
data buffers. Their length in a source data buffer is not included for a Maximum
Data Transfer Size calculation.
79:78 M Controller ID (CNTLID): Contains the NVM subsystem unique controller identifier
associated with the controller. Refer to section 7.10 for unique identifier
requirements.
83:80 M Version (VER): This field contains the value reported in the Version register defined
in section 3.1.2. Implementations compliant to revision 1.2 or later of this
specification shall report a non-zero value in this field.
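As an illustration, the VER field could be decoded as shown below. The sketch assumes the Version register layout from section 3.1.2 (major in bits 31:16, minor in bits 15:8, tertiary in bits 7:0) and that the Identify Controller data is little endian. Neither detail is restated in this section, and the function name is hypothetical.

```python
import struct


def decode_ver(identify_controller):
    """Return (major, minor, tertiary) from bytes 83:80, or None if VER is 0h."""
    (ver,) = struct.unpack_from("<I", identify_controller, 80)
    if ver == 0:
        return None  # a zero value is only expected from pre-1.2 implementations
    # Assumed register layout: MJR = bits 31:16, MNR = bits 15:8, TER = bits 7:0
    return (ver >> 16, (ver >> 8) & 0xFF, ver & 0xFF)
```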
87:84 M RTD3 Resume Latency (RTD3R): This field indicates the typical latency in
microseconds resuming from Runtime D3 (RTD3). Refer to section 8.4.4 for test
conditions. A value of 0h indicates RTD3 Resume Latency is not reported.
91:88 M RTD3 Entry Latency (RTD3E): This field indicates the typical latency in
microseconds to enter Runtime D3 (RTD3). Refer to section 8.4.4 for test conditions.
A value of 0h indicates RTD3 Entry Latency is not reported.
Optional Asynchronous Events Supported (OAES): This field indicates the
optional asynchronous events supported by the controller. A controller shall not send
optional asynchronous events before they are enabled by host software.
Bit 8 is set to ‘1’ if the controller supports sending Namespace Attribute Notices and
the associated Changed Namespace List log page. If cleared to ‘0’ then the
controller does not support the Namespace Attribute Notices event nor the
associated Changed Namespace List log page.
99:96 M Bit 1 (Non-Operational Power State Permissive Mode): If set to ‘1’ then the controller
supports host control of whether the controller may temporarily exceed the power of
a non-operational power state for the purpose of executing controller initiated
background operations in a non-operational power state (i.e., Non-Operational
Power State Permissive Mode supported). If cleared to ‘0’ then the controller does
not support host control of whether the controller may exceed the power of a
non-operational state for the purpose of executing controller initiated background
operations in a non-operational state (i.e., Non-Operational Power State Permissive
Mode not supported). Refer to section 5.21.1.17.
Bit 0 if set to ‘1’ then the controller supports a 128-bit Host Identifier. Bit 0 if cleared
to ‘0’ then the controller does not support a 128-bit Host Identifier.
111:100 Reserved
127:112 O FRU Globally Unique Identifier (FGUID): This field contains a 128-bit value that is
globally unique for a given Field Replaceable Unit (FRU). Refer to the NVM Express
Management Interface (NVMe-MI) specification for the definition of a FRU. This field
remains fixed throughout the life of the FRU. This field shall contain the same value
for each controller associated with a given FRU.
This field uses the EUI-64 based 16-byte designator format. Bytes 122:120 contain
the 24-bit Organizationally Unique Identifier (OUI) value assigned by the IEEE
Registration Authority. Bytes 127:123 contain an extension identifier assigned by the
corresponding organization. Bytes 119:112 contain the vendor specific extension
identifier assigned by the corresponding organization. See the IEEE EUI-64
guidelines for more information. This field is big endian (refer to section 7.10).
257:256 M Bit 8 if set to ‘1’ then the controller supports the Doorbell Buffer Config command. If
cleared to ‘0’ then the controller does not support the Doorbell Buffer Config
command.
Bit 7 if set to ‘1’ then the controller supports the Virtualization Management
command. If cleared to ‘0’ then the controller does not support the Virtualization
Management command.
Bit 6 if set to ‘1’ then the controller supports the NVMe-MI Send and NVMe-MI
Receive commands. If cleared to ‘0’ then the controller does not support the
NVMe-MI Send and NVMe-MI Receive commands.
Bit 5 if set to ‘1’ then the controller supports Directives. If cleared to ‘0’ then the
controller does not support Directives. A controller that supports Directives shall
support the Directive Send and Directive Receive commands. Refer to section 9.
Bit 4 if set to ‘1’ then the controller supports the Device Self-test command. If cleared
to ‘0’ then the controller does not support the Device Self-test command.
Bit 3 if set to ‘1’ then the controller supports the Namespace Management and
Namespace Attachment commands. If cleared to ‘0’ then the controller does not
support the Namespace Management and Namespace Attachment commands.
Bit 2 if set to ‘1’ then the controller supports the Firmware Commit and Firmware
Image Download commands. If cleared to ‘0’ then the controller does not support
the Firmware Commit and Firmware Image Download commands.
Bit 1 if set to ‘1’ then the controller supports the Format NVM command. If cleared
to ‘0’ then the controller does not support the Format NVM command.
Bit 0 if set to ‘1’ then the controller supports the Security Send and Security Receive
commands. If cleared to ‘0’ then the controller does not support the Security Send
and Security Receive commands.
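The bit assignments above lend themselves to a simple lookup. The following sketch reads the 16-bit field at bytes 257:256 and lists the optional admin command groups a controller reports. It assumes little-endian Identify data. The table and function names are hypothetical.

```python
# Bit positions as listed above for the field at bytes 257:256.
OACS_BITS = {
    0: "Security Send / Security Receive",
    1: "Format NVM",
    2: "Firmware Commit / Firmware Image Download",
    3: "Namespace Management / Namespace Attachment",
    4: "Device Self-test",
    5: "Directives",
    6: "NVMe-MI Send / NVMe-MI Receive",
    7: "Virtualization Management",
    8: "Doorbell Buffer Config",
}


def supported_admin_commands(identify_controller):
    """Return the names of the optional admin command groups whose bit is set."""
    field = int.from_bytes(identify_controller[256:258], "little")
    return [name for bit, name in OACS_BITS.items() if field & (1 << bit)]
```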
258 M Abort Command Limit (ACL): This field is used to convey the maximum number
of concurrently executing Abort commands supported by the controller (refer to
section 5.1). This is a 0’s based value. It is recommended that implementations
support concurrent execution of a minimum of four Abort commands.
259 M Asynchronous Event Request Limit (AERL): This field is used to convey the
maximum number of concurrently outstanding Asynchronous Event Request
commands supported by the controller (see section 5.2). This is a 0’s based value.
It is recommended that implementations support a minimum of four Asynchronous
Event Request commands outstanding simultaneously.
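Because ACL and AERL are 0's based, the usable limit is the field value plus one. A minimal sketch, with a hypothetical function name:

```python
def abort_and_aer_limits(identify_controller):
    """Return (max concurrent Abort commands, max outstanding AER commands)."""
    acl = identify_controller[258]   # 0's based: 0h means one command
    aerl = identify_controller[259]  # 0's based: 0h means one command
    return acl + 1, aerl + 1
```

A controller that follows the recommendation of at least four commands therefore reports 3h or higher in each field.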
260 M Bit 4 if set to ‘1’ indicates that the controller supports firmware activation without a
reset. If cleared to ‘0’ then the controller requires a reset for firmware to be activated.
Bits 3:1 indicate the number of firmware slots that the controller supports. This field
shall specify a value between one and seven, indicating that at least one firmware
slot is supported and up to seven maximum. This corresponds to firmware slots 1
through 7.
Bit 0 if set to ‘1’ indicates that the first firmware slot (slot 1) is read only. If cleared
to ‘0’ then the first firmware slot (slot 1) is read/write. Implementations may choose
to have a baseline read only firmware image.
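The three sub-fields of byte 260 can be pulled apart with masks and shifts. The sketch below is illustrative only, and the function and key names are hypothetical.

```python
def decode_firmware_byte(identify_controller):
    """Split byte 260 into its three sub-fields as described above."""
    value = identify_controller[260]
    return {
        "activation_without_reset": bool(value & 0x10),  # bit 4
        "slot_count": (value >> 1) & 0x7,                # bits 3:1, expected 1..7
        "slot1_read_only": bool(value & 0x01),           # bit 0
    }
```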
261 M Log Page Attributes (LPA): This field indicates optional attributes for log pages
that are accessed via the Get Log Page command.
Bit 3 if set to ‘1’ then the controller supports the Telemetry Host-Initiated and
Telemetry Controller-Initiated log pages and sending Telemetry Log Notices. If
cleared to ‘0’ then the controller does not support the Telemetry Host-Initiated and
Telemetry Controller-Initiated log pages and Telemetry Log Notice events.
Bit 2 if set to ‘1’ then the controller supports extended data for Get Log Page
(including extended Number of Dwords and Log Page Offset fields). Bit 2 if cleared
to ‘0’ then the controller does not support extended data for Get Log Page.
Bit 1 if set to ‘1’ then the controller supports the Commands Supported and Effects
log page. Bit 1 if cleared to ‘0’ then the controller does not support the Commands
Supported and Effects log page.
Bit 0 if set to ‘1’ then the controller supports the SMART / Health information log
page on a per namespace basis. If cleared to ‘0’ then the controller does not support
the SMART / Health information log page on a per namespace basis.
262 M Error Log Page Entries (ELPE): This field indicates the maximum number of Error
Information log entries that are stored by the controller. This field is a 0’s based
value.
263 M Number of Power States Support (NPSS): This field indicates the number of NVM
Express power states supported by the controller. This is a 0’s based value. Refer
to section 8.4.
Power states are numbered sequentially starting at power state 0. A controller shall
support at least one power state (i.e., power state 0) and may support up to 31
additional power states (i.e., up to 32 total).
Admin Vendor Specific Command Configuration (AVSCC): This field indicates
the configuration settings for Admin Vendor Specific command handling. Refer to
section 8.7.
Bit 0 if set to ‘1’ indicates that all Admin Vendor Specific Commands use the format
defined in Figure 12. If cleared to ‘0’ indicates that the format of all Admin Vendor
Specific Commands is vendor specific.
315:312 O
Bits Description
31:24 Access Size: This field indicates the size that may be read or written per
RPMB access by Security Send or Security Receive commands for this
controller in 512B units. This is a 0’s based value. A value of 0h indicates
a size of 512B.
23:16 Total Size: This field indicates the size of each RPMB supported in the
controller in 128KB units. This is a 0’s based value. A value of 0h
indicates a size of 128KB.
15:06 Reserved
05:03 Authentication Method: This field indicates the authentication method
used to access all RPMBs in the controller. The values for this field are:
Value Definition
000b HMAC SHA-256
001b-111b Reserved
02:00 Number of RPMB Units: This field indicates the number of RPMB
targets the controller supports. All RPMB targets supported shall have
the same capabilities as defined in the RPMBS field. A value of 0h
indicates the controller does not support Replay Protected Memory
Blocks. If this value is non-zero, then the controller shall support the
Security Send and Security Receive commands.
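The RPMBS sub-fields combine 0's based sizes with fixed units, which is easy to get wrong by one. The following sketch converts them to bytes. It assumes the 32-bit field at bytes 315:312 is little endian, and the function and key names are hypothetical.

```python
def decode_rpmbs(identify_controller):
    """Return RPMB capabilities, or None when no RPMB units are reported."""
    rpmbs = int.from_bytes(identify_controller[312:316], "little")
    units = rpmbs & 0x7                      # bits 02:00
    if units == 0:
        return None                          # RPMB not supported
    total = (rpmbs >> 16) & 0xFF             # bits 23:16, 0's based, 128KB units
    access = (rpmbs >> 24) & 0xFF            # bits 31:24, 0's based, 512B units
    return {
        "units": units,
        "auth_method": (rpmbs >> 3) & 0x7,   # bits 05:03, 000b = HMAC SHA-256
        "total_size_bytes": (total + 1) * 128 * 1024,
        "access_size_bytes": (access + 1) * 512,
    }
```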
317:316 O Extended Device Self-test Time (EDSTT): If the Device Self-test command is
supported, then this field indicates the nominal amount of time in one minute units
that the controller takes to complete an extended device self-test operation when in
power state 0. If the Device Self-test command is not supported, then this field is
reserved.
318 O Device Self-test Options (DSTO): This field indicates the optional Device Self-test
command or operation behaviors supported by the controller or NVM subsystem.
Bit 0 if set to ‘1’ then the NVM subsystem supports only one device self-test
operation in progress at a time. If cleared to ‘0’ then the NVM subsystem supports
one device self-test operation per controller at a time.
319 M Firmware Update Granularity (FWUG): This field indicates the minimum
granularity and alignment of the data provided in the Firmware Image Download
command. If the data in the Firmware Image Download command does not conform
to these granularity and alignment requirements, the firmware update may fail. For
the broadest interoperability with software, it is recommended that the controller set
this value to the lowest value possible.
The value is reported in 4KB units (1h corresponds to 4KB, 2h corresponds to 8KB,
etc.). A value of 0h indicates that no information on granularity is provided. A value
of FFh indicates there is no restriction (i.e., any granularity and alignment in Dwords
is allowed).
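The two special values of FWUG need separate handling from the ordinary 4KB-unit case. One possible mapping to a byte granularity is sketched below. Returning 4 bytes (one Dword) for FFh is this sketch's interpretation of "any granularity and alignment in Dwords", and the function name is hypothetical.

```python
def fwug_granularity_bytes(fwug):
    """Map the FWUG byte to a granularity in bytes, or None if not reported."""
    if fwug == 0x00:
        return None          # no information on granularity is provided
    if fwug == 0xFF:
        return 4             # no restriction beyond Dword granularity
    return fwug * 4096       # value is in 4KB units
```

A host that receives None might fall back to a conservative chunk size of its own choosing. The specification text above does not prescribe one.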
321:320 M Keep Alive Support (KAS): This field indicates the granularity of the Keep Alive
Timer in 100 ms units (refer to section 7.12). If this field is cleared to 0h then Keep
Alive is not supported. Keep Alive shall be supported for NVMe over Fabrics
implementations.
331:328 O Bit 2 if set to ‘1’ then the controller supports the Overwrite sanitize operation. If
cleared to ‘0’ then the controller does not support the Overwrite sanitize operation.
Bit 1 if set to ‘1’ then the controller supports the Block Erase sanitize operation. If
cleared to ‘0’ then the controller does not support the Block Erase sanitize
operation.
Bit 0 if set to ‘1’ then the controller supports the Crypto Erase sanitize operation. If
cleared to ‘0’ then the controller does not support the Crypto Erase sanitize
operation.
511:332 Reserved
NVM Command Set Attributes
512 M Submission Queue Entry Size (SQES): This field defines the required and
maximum Submission Queue entry size when using the NVM Command Set.
Bits 7:4 define the maximum Submission Queue entry size when using the NVM
Command Set. This value is larger than or equal to the required SQ entry size. The
value is in bytes and is reported as a power of two (2^n). The recommended value
is 6, corresponding to a standard NVM Command Set SQ entry size of 64 bytes.
Controllers that implement proprietary extensions may support a larger value.
Bits 3:0 define the required Submission Queue Entry size when using the NVM
Command Set. This is the minimum entry size that may be used. The value is in
bytes and is reported as a power of two (2^n). The required value shall be 6,
corresponding to 64.
513 M Bits 7:4 define the maximum Completion Queue entry size when using the NVM
Command Set. This value is larger than or equal to the required CQ entry size. The
value is in bytes and is reported as a power of two (2^n). The recommended value
is 4, corresponding to a standard NVM Command Set CQ entry size of 16 bytes.
Controllers that implement proprietary extensions may support a larger value.
Bits 3:0 define the required Completion Queue entry size when using the NVM
Command Set. This is the minimum entry size that may be used. The value is in
bytes and is reported as a power of two (2^n). The required value shall be 4,
corresponding to 16.
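Both entry-size bytes encode sizes as exponents of two, with the maximum in the upper nibble and the required size in the lower nibble. A short sketch of the conversion to bytes, with a hypothetical function name:

```python
def queue_entry_sizes(identify_controller):
    """Convert the 2^n encodings in bytes 512 and 513 to sizes in bytes."""
    sq = identify_controller[512]
    cq = identify_controller[513]
    return {
        "sq_required": 1 << (sq & 0xF),  # bits 3:0
        "sq_maximum": 1 << (sq >> 4),    # bits 7:4
        "cq_required": 1 << (cq & 0xF),
        "cq_maximum": 1 << (cq >> 4),
    }
```

With the values named in the text (6 for submission queues, 4 for completion queues) this yields 64-byte and 16-byte entries respectively.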
515:514 M Maximum Outstanding Commands (MAXCMD): Indicates the maximum number
of commands that the controller processes at one time for a particular queue (which
may be larger than the size of the corresponding Submission Queue). The host may
use this value to size Completion Queues and optimize the number of commands
submitted at one time to a particular I/O Queue. This field is mandatory for NVMe
over Fabrics and optional for NVMe over PCIe implementations. If the field is not
used, it shall be cleared to 0h.
519:516 M Number of Namespaces (NN): This field defines the maximum number of
namespaces supported by the controller. This field also represents the maximum
value of a valid NSID for the controller.
521:520 M Optional NVM Command Support (ONCS): This field indicates the optional NVM
commands and features supported by the controller. Refer to section 6.
Bit 6 if set to ‘1’ then the controller supports the Timestamp feature. If cleared to ‘0’,
then the controller does not support the Timestamp feature. Refer to section
5.21.1.14.
Bit 5 if set to ‘1’ then the controller supports reservations. If cleared to ‘0’ then the
controller does not support reservations. If the controller supports reservations, then
it shall support the following commands associated with reservations: Reservation
Report, Reservation Register, Reservation Acquire, and Reservation Release. Refer
to section 8.8 for additional requirements.
Bit 4 if set to ‘1’ then the controller supports the Save field set to a non-zero value in
the Set Features command and the Select field set to a non-zero value in the Get
Features command. If cleared to ‘0’ then the controller does not support the Save
field set to a non-zero value in the Set Features command and the Select field set
to a non-zero value in the Get Features command.
Bit 3 if set to ‘1’ then the controller supports the Write Zeroes command. If cleared
to ‘0’ then the controller does not support the Write Zeroes command.
Bit 2 if set to ‘1’ then the controller supports the Dataset Management command. If
cleared to ‘0’ then the controller does not support the Dataset Management
command.
Bit 1 if set to ‘1’ then the controller supports the Write Uncorrectable command. If
cleared to ‘0’ then the controller does not support the Write Uncorrectable command.
Bit 0 if set to ‘1’ then the controller supports the Compare command. If cleared to
‘0’ then the controller does not support the Compare command.
Bit 2 indicates whether cryptographic erase is supported as part of the secure erase
functionality. If set to ‘1’, then cryptographic erase is supported. If cleared to ‘0’,
then cryptographic erase is not supported.
Bit 0 indicates whether the format operation (excluding secure erase) applies to all
namespaces in an NVM subsystem or is specific to a particular namespace. If set to
‘1’, then all namespaces in an NVM subsystem shall be configured with the same
attributes and a format (excluding secure erase) of any namespace results in a
format of all namespaces in an NVM subsystem. If cleared to ‘0’, then the controller
supports format on a per namespace basis.
525 M Volatile Write Cache (VWC): This field indicates attributes related to the presence
of a volatile write cache in the implementation.
Bit 0 if set to ‘1’ indicates that a volatile write cache is present. If cleared to ‘0’, a
volatile write cache is not present. If a volatile write cache is present, then the host
may issue Flush commands and control whether the volatile write cache is enabled
with Set Features specifying the Volatile Write Cache feature identifier. If a volatile
write cache is not present, Flush commands complete successfully and have no
effect, Set Features with the Volatile Write Cache identifier field set shall fail with
Invalid Field status, and Get Features with the Volatile Write Cache identifier field
set should fail with Invalid Field status.
527:526 M If a specific namespace guarantees a larger size than is reported in this field, then
this namespace specific size is reported in the NAWUN field in the Identify
Namespace data structure. Refer to section 6.4.
If a write command is submitted with size less than or equal to the AWUN value, the
host is guaranteed that the write command is atomic to the NVM with respect to
other read or write commands. If a write command is submitted with size greater
than the AWUN value, then there is no guarantee of command atomicity. AWUN
does not have any applicability to write errors caused by power failure (refer to
Atomic Write Unit Power Fail).
A value of FFFFh indicates all commands are atomic as this is the largest command
size. It is recommended that implementations support a minimum of 128KB
(appropriately scaled based on LBA size).
529:528 M Atomic Write Unit Power Fail (AWUPF): This field indicates the size of the write
operation guaranteed to be written atomically to the NVM across all namespaces
with any supported namespace format during a power fail or error condition.
If a specific namespace guarantees a larger size than is reported in this field, then
this namespace specific size is reported in the NAWUPF field in the Identify
Namespace data structure. Refer to section 6.4.
This field is specified in logical blocks and is a 0’s based value. The AWUPF value
shall be less than or equal to the AWUN value.
If a write command is submitted with size less than or equal to the AWUPF value,
the host is guaranteed that the write is atomic to the NVM with respect to other read
or write commands. If a write command is submitted that is greater than this size,
there is no guarantee of command atomicity. If the write size is less than or equal
to the AWUPF value and the write command fails, then subsequent read commands
for the associated logical blocks shall return data from the previous successful write
command. If a write command is submitted with size greater than the AWUPF value,
then there is no guarantee of data returned on subsequent reads of the associated
logical blocks.
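A host deciding whether a write of a given length is covered by the controller-wide atomicity guarantees might apply the two limits as sketched below. The sketch assumes AWUN (bytes 527:526) is, like AWUPF, a 0's based count of logical blocks stored little endian. It ignores the per-namespace NAWUN and NAWUPF overrides, and the function name is hypothetical.

```python
def write_atomicity(identify_controller, num_blocks):
    """Report which controller-wide atomicity guarantees cover a write."""
    # Both fields are treated as 0's based counts of logical blocks.
    awun = int.from_bytes(identify_controller[526:528], "little") + 1
    awupf = int.from_bytes(identify_controller[528:530], "little") + 1
    return {
        "atomic_in_normal_operation": num_blocks <= awun,
        "atomic_across_power_fail": num_blocks <= awupf,
    }
```

Since AWUPF shall not exceed AWUN, any write covered by the power fail guarantee is also covered in normal operation.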
NVM Vendor Specific Command Configuration (NVSCC): This field indicates the
configuration settings for NVM Vendor Specific command handling. Refer to section
8.7.
Bit 0 if set to ‘1’ indicates that all NVM Vendor Specific Commands use the format
defined in Figure 12. If cleared to ‘0’ indicates that the format of all NVM Vendor
Specific Commands is vendor specific.
531 M Reserved
533:532 O If a specific namespace guarantees a larger size than is reported in this field, then
this namespace specific size is reported in the NACWU field in the Identify
Namespace data structure. Refer to section 6.4.
This field shall be supported if the Compare and Write fused command is supported.
This field is specified in logical blocks and is a 0’s based value. If a Compare and
Write is submitted that requests a transfer size larger than this value, then the
controller may fail the command with a status code of Invalid Field in Command. If
Compare and Write is not a supported fused command, then this field shall be 0h.
535:534 M Reserved
539:536 O SGL Support (SGLS): This field indicates if SGLs are supported for the NVM
Command Set and the particular SGL types supported. Refer to section 4.4.
Bits Description
31:21 Reserved
20 If set to ‘1’, then the controller supports the Address field in SGL
Data Block, SGL Segment, and SGL Last Segment descriptor
types specifying an offset. If cleared to ‘0’ then the Address field
specifying an offset is not supported.
19 If set to ‘1’, then use of a Metadata Pointer (MPTR) that contains
an address of an SGL segment containing exactly one SGL
Descriptor that is Qword aligned is supported. If cleared to
‘0’, then use of a MPTR containing an SGL Descriptor is not
supported.
18 If set to ‘1’, then the controller supports commands that contain
a data or metadata SGL of a length larger than the amount of
data to be transferred. If cleared to ‘0’, then the SGL length shall
be equal to the amount of data to be transferred.
17 If set to ‘1’, then use of a byte aligned contiguous physical buffer
of metadata (the Metadata Pointer field in Figure 11) is
supported. If cleared to ‘0’, then use of a byte aligned contiguous
physical buffer of metadata is not supported.
16 If set to ‘1’, then the SGL Bit Bucket descriptor is supported. If
cleared to ‘0’, then the SGL Bit Bucket descriptor is not
supported.
15:03 Reserved
02 If set to ‘1’, then the controller supports the Keyed SGL Data
Block descriptor. If cleared to ‘0’, then the controller does not
support the Keyed SGL Data Block descriptor.
01:00 This field is used to determine the SGL support for the NVM
Command Set. Valid values are shown in the table below.
Value Definition
00b SGLs are not supported.
01b SGLs are supported. There is no alignment
nor granularity requirement for Data Blocks.
10b SGLs are supported. There is a Dword
alignment and granularity requirement for
Data Blocks (refer to section 4.4).
11b Reserved
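The SGLS field mixes a two-bit enumeration with individual capability flags. One way to decode it is sketched here, assuming the 32-bit field at bytes 539:536 is little endian. The key names and the function name are hypothetical.

```python
SGL_SUPPORT = {
    0b00: "not supported",
    0b01: "supported, no alignment or granularity requirement",
    0b10: "supported, Dword alignment and granularity requirement",
    0b11: "reserved",
}


def decode_sgls(identify_controller):
    """Decode the SGL support enumeration and capability bits."""
    sgls = int.from_bytes(identify_controller[536:540], "little")
    return {
        "support": SGL_SUPPORT[sgls & 0x3],                # bits 01:00
        "keyed_data_block": bool(sgls & (1 << 2)),
        "bit_bucket": bool(sgls & (1 << 16)),
        "byte_aligned_metadata": bool(sgls & (1 << 17)),
        "sgl_longer_than_data": bool(sgls & (1 << 18)),
        "mptr_sgl_segment": bool(sgls & (1 << 19)),
        "address_as_offset": bool(sgls & (1 << 20)),
    }
```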
767:540 Reserved
1023:768 M NVM Subsystem NVMe Qualified Name (SUBNQN): This field specifies the NVM
Subsystem NVMe Qualified Name as a UTF-8 null-terminated string. Refer to
section 7.9 for the definition of NVMe Qualified Name.
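Reading SUBNQN amounts to taking the 256-byte field and cutting it at the first null byte before decoding. A minimal sketch, with a hypothetical function name:

```python
def decode_subnqn(identify_controller):
    """Return the subsystem NQN from bytes 1023:768 as a Python string."""
    raw = identify_controller[768:1024]
    # Keep only the bytes before the first null terminator.
    return raw.split(b"\x00", 1)[0].decode("utf-8")
```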
1791:1024 Reserved
Bit 0 if set to ‘1’ then VQ Resources are supported. Bit 0 if cleared to ‘0’ then VQ Resources are
not supported. Refer to section 8.5.1.
31:5 Reserved
35:32 VQ Resources Flexible Total (VQFRT): This field indicates the total number of VQ Flexible
Resources for the primary and its secondary controllers.
39:36 VQ Resources Flexible Assigned (VQRFA): This field indicates the total number of VQ
Flexible Resources Assigned to the associated secondary controllers.
41:40 VQ Resources Flexible Allocated to Primary (VQRFAP): This field indicates the total number
of VQ Flexible Resources currently allocated to the primary controller. This value may change
after a Controller Level Reset if a new value was set using the Virtualization Management
command. The default value of this field is implementation specific.
43:42 VQ Resources Private Total (VQPRT): This field indicates the total number of VQ Private
Resources for the primary controller.
45:44 VQ Resources Flexible Secondary Maximum (VQFRSM): This field indicates the maximum
number of VQ Flexible Resources that may be assigned to a secondary controller.
47:46 VQ Flexible Resource Preferred Granularity (VQGRAN): This field indicates the preferred
granularity of assigning and removing VQ Flexible Resources. Assigning and removing VQ
Resources in this granularity minimizes any wasted internal implementation resources.
63:48 Reserved
67:64 VI Resources Flexible Total (VIFRT): This field indicates the total number of VI Flexible
Resources for the primary and its secondary controllers.
71:68 VI Resources Flexible Assigned (VIRFA): This field indicates the total number of VI Flexible
Resources Assigned to the associated secondary controllers.
73:72 VI Resources Flexible Allocated to Primary (VIRFAP): This field indicates the total number
of VI Flexible Resources currently allocated to the primary controller. This value may change
after a Controller Level Reset if a new value was set using the Virtualization Management
command. The default value of this field is implementation specific.
75:74 VI Resources Private Total (VIPRT): This field indicates the total number of VI Private
Resources for the primary controller.
77:76 VI Resources Flexible Secondary Maximum (VIFRSM): This field indicates the maximum
number of VI Flexible Resources that may be assigned to a secondary controller.
79:78 VI Flexible Resource Preferred Granularity (VIGRAN): This field indicates the preferred
granularity of assigning and removing VI Flexible Resources. Assigning and removing VI
Resources in this granularity minimizes any wasted internal implementation resources.
4095:80 Reserved
Figure 111 defines a Secondary Controller List. All secondary controllers are represented, including those
that are in an Offline state due to SR-IOV configuration settings (e.g., VF Enable is cleared to 0h or NumVFs
specifies a value that does not enable the associated secondary controller).
Figure 113 defines the power state descriptor that describes the attributes of each power state. For more
information on how the power state descriptor fields are used, refer to section 8.4 on power management.
Value Definition
00b Not reported for this power state
01b 0.0001 W
10b 0.01 W
11b Reserved
181:179 Reserved
178:176 Active Power Workload (APW): This field indicates the workload used to calculate maximum
power for this power state. Refer to section 8.4.3 for more details on each of the defined
workloads. This field shall not be “No Workload” unless ACTP is 0000h.
175:160 Active Power (ACTP): This field indicates the largest average power consumed by the NVM
subsystem over a 10 second period in this power state with the workload indicated in the Active
Power Workload field. The power in Watts is equal to the value in this field multiplied by the scale
indicated in the Active Power Scale field. A value of 0000h indicates Active Power is not reported.
159:152 Reserved
151:150 Idle Power Scale (IPS): This field indicates the scale for the Idle Power field.
Value Definition
00b Not reported for this power state
01b 0.0001 W
10b 0.01 W
11b Reserved
149:144 Reserved
143:128 Idle Power (IDLP): This field indicates the typical power consumed by the NVM subsystem over
30 seconds in this power state when idle (i.e., there are no pending commands, register
accesses, background processes, nor device self-test operations). The measurement starts after
the NVM subsystem has been idle for 10 seconds. The power in Watts is equal to the value in
this field multiplied by the scale indicated in the Idle Power Scale field. A value of 0000h indicates
Idle Power is not reported.
127:125 Reserved
124:120 Relative Write Latency (RWL): This field indicates the relative write latency associated with this
power state. The value in this field shall be less than the number of supported power states (e.g.,
if the controller supports 16 power states, then valid values are 0 through 15). A lower value
means lower write latency.
119:117 Reserved
116:112 Relative Write Throughput (RWT): This field indicates the relative write throughput associated
with this power state. The value in this field shall be less than the number of supported power
states (e.g., if the controller supports 16 power states, then valid values are 0 through 15). A
lower value means higher write throughput.
111:109 Reserved
108:104 Relative Read Latency (RRL): This field indicates the relative read latency associated with this
power state. The value in this field shall be less than the number of supported power states (e.g.,
if the controller supports 16 power states, then valid values are 0 through 15). A lower value
means lower read latency.
103:101 Reserved
100:96 Relative Read Throughput (RRT): This field indicates the relative read throughput associated
with this power state. The value in this field shall be less than the number of supported power
states (e.g., if the controller supports 16 power states, then valid values are 0 through 15). A
lower value means higher read throughput.
95:64 Exit Latency (EXLAT): This field indicates the maximum exit latency in microseconds
associated with exiting this power state. A value of 0h indicates Exit Latency is not reported.
63:32 Entry Latency (ENLAT): This field indicates the maximum entry latency in microseconds
associated with entering this power state. A value of 0h indicates Entry Latency is not reported.
31:26 Reserved
25 Non-Operational State (NOPS): This field indicates whether the controller processes I/O
commands in this power state. If this field is cleared to ‘0’, then the controller processes I/O
commands in this power state. If this field is set to ‘1’, then the controller does not process I/O
commands in this power state. Refer to section 8.4.1.
24 Max Power Scale (MXPS): This field indicates the scale for the Maximum Power field. If this
field is cleared to ‘0’, then the scale of the Maximum Power field is in 0.01 Watts. If this field is
set to ‘1’, then the scale of the Maximum Power field is in 0.0001 Watts.
23:16 Reserved
15:00 Maximum Power (MP): This field indicates the maximum power consumed by the NVM
subsystem in this power state. The power in Watts is equal to the value in this field multiplied by
the scale specified in the Max Power Scale field. A value of 0h indicates Maximum Power is not
reported.
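The low 96 bits of a power state descriptor can be decoded as in the sketch below. It assumes each descriptor is 32 bytes and stored little endian, so that bit 0 is the least significant bit of the first byte. Power is returned in integer microwatts to avoid floating point rounding: a scale of 0.01 W is 10,000 microwatts per unit and 0.0001 W is 100 microwatts per unit. The function and key names are hypothetical.

```python
def decode_power_state(descriptor):
    """Decode MP, MXPS, NOPS, ENLAT and EXLAT from one power state descriptor."""
    value = int.from_bytes(descriptor[:32], "little")
    mp = value & 0xFFFF                      # bits 15:00
    mxps = (value >> 24) & 0x1               # bit 24
    unit_uw = 100 if mxps else 10_000        # 0.0001 W or 0.01 W per unit
    enlat = (value >> 32) & 0xFFFFFFFF       # bits 63:32
    exlat = (value >> 64) & 0xFFFFFFFF       # bits 95:64
    return {
        "max_power_uw": mp * unit_uw if mp else None,    # 0h = not reported
        "non_operational": bool((value >> 25) & 0x1),    # bit 25
        "entry_latency_us": enlat if enlat else None,
        "exit_latency_us": exlat if exlat else None,
    }
```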
Figure 114 shows the Identify Namespace data structure for the NVM Command Set.
Figure 114: Identify – Identify Namespace Data Structure, NVM Command Set Specific
Bytes O/M Description
7:0 M Namespace Size (NSZE): This field indicates the total size of the namespace in logical
blocks. A namespace of size n consists of LBA 0 through (n - 1). The number of logical
blocks is based on the formatted LBA size. This field is undefined prior to the namespace
being formatted.
15:8 M Namespace Capacity (NCAP): This field indicates the maximum number of logical
blocks that may be allocated in the namespace at any point in time. The number of
logical blocks is based on the formatted LBA size. This field is undefined prior to the
namespace being formatted. This field is used in the case of thin provisioning and reports
a value that is smaller than or equal to the Namespace Size. Spare LBAs are not
reported as part of this field.
23:16 M Namespace Utilization (NUSE): This field indicates the current number of logical
blocks allocated in the namespace.
When using the NVM command set: A logical block is allocated when it is written with a
Write or Write Uncorrectable command. A logical block may be deallocated using the
Dataset Management, Sanitize, or Write Zeroes command.
A controller may report NUSE equal to NCAP at all times if the product is not targeted
for thin provisioning environments.
24 M Namespace Features (NSFEAT): This field defines features of the namespace.
Bit 3 if set to ‘1’ indicates that the non-zero NGUID and non-zero EUI64 fields for this
namespace are never reused by the controller. If cleared to ‘0’, then the NGUID and
EUI64 values may be reused by the controller for a new namespace created after this
namespace is deleted. This bit shall be cleared to ‘0’ if both NGUID and EUI64 fields are
cleared to 0h. Refer to section 7.11.
Bit 2 if set to ‘1’ indicates that the controller supports the Deallocated or Unwritten Logical
Block error for this namespace. If cleared to ‘0’, then the controller does not support the
Deallocated or Unwritten Logical Block error for this namespace. Refer to section 6.7.1.1.
Bit 1 if set to ‘1’ indicates that the fields NAWUN, NAWUPF, and NACWU are defined for
this namespace and should be used by the host for this namespace instead of the
AWUN, AWUPF, and ACWU fields in the Identify Controller data structure. If cleared to
‘0’, then the controller does not support the fields NAWUN, NAWUPF, and NACWU for
this namespace. In this case, the host should use the AWUN, AWUPF, and ACWU fields
defined in the Identify Controller data structure in Figure 109. Refer to section 6.4.
Bit 0 if set to ‘1’ indicates that the namespace supports thin provisioning. Specifically,
the Namespace Capacity reported may be less than the Namespace Size. When this
feature is supported and the Dataset Management command is supported then
deallocating LBAs shall be reflected in the Namespace Utilization field. Bit 0 if cleared
to ‘0’ indicates that thin provisioning is not supported and the Namespace Size and
Namespace Capacity fields report the same value.
It is recommended that software and controllers transition to an LBA size that is 4KB or
larger for ECC efficiency at the controller. If providing metadata, it is recommended that
at least 8 bytes are provided per logical block to enable use with end-to-end data
protection, refer to section 8.2.
26 M Formatted LBA Size (FLBAS): This field indicates the LBA data size & metadata size
combination that the namespace has been formatted with (refer to section 5.23).
Bit 4 if set to ‘1’ indicates that the metadata is transferred at the end of the data LBA,
creating an extended data LBA. Bit 4 if cleared to ‘0’ indicates that all of the metadata
for a command is transferred as a separate contiguous buffer of data. Bit 4 is not
applicable when there is no metadata.
Bits 3:0 indicate one of the 16 supported LBA Formats indicated in this data structure.
27 M Metadata Capabilities (MC): This field indicates the capabilities for metadata.
Bit 1 if set to ‘1’ indicates the namespace supports the metadata being transferred as
part of a separate buffer that is specified in the Metadata Pointer. Bit 1 if cleared to ‘0’
indicates that the namespace does not support the metadata being transferred as part
of a separate buffer.
Bit 0 if set to ‘1’ indicates that the namespace supports the metadata being transferred
as part of an extended data LBA. Bit 0 if cleared to ‘0’ indicates that the namespace
does not support the metadata being transferred as part of an extended data LBA.
28 M Bit 4 if set to ‘1’ indicates that the namespace supports protection information transferred
as the last eight bytes of metadata. Bit 4 if cleared to ‘0’ indicates that the namespace
does not support protection information transferred as the last eight bytes of metadata.
Bit 3 if set to ‘1’ indicates that the namespace supports protection information transferred
as the first eight bytes of metadata. Bit 3 if cleared to ‘0’ indicates that the namespace
does not support protection information transferred as the first eight bytes of metadata.
Bit 2 if set to ‘1’ indicates that the namespace supports Protection Information Type 3.
Bit 2 if cleared to ‘0’ indicates that the namespace does not support Protection
Information Type 3.
Bit 1 if set to ‘1’ indicates that the namespace supports Protection Information Type 2.
Bit 1 if cleared to ‘0’ indicates that the namespace does not support Protection
Information Type 2.
Bit 0 if set to ‘1’ indicates that the namespace supports Protection Information Type 1.
Bit 0 if cleared to ‘0’ indicates that the namespace does not support Protection
Information Type 1.
29 M End-to-end Data Protection Type Settings (DPS): This field indicates the Type
settings for the end-to-end data protection feature. Refer to section 8.3.
Bit 3 if set to ‘1’ indicates that the protection information, if enabled, is transferred as the
first eight bytes of metadata. Bit 3 if cleared to ‘0’ indicates that the protection
information, if enabled, is transferred as the last eight bytes of metadata.
Bits 2:0 indicate whether Protection Information is enabled and the type of Protection
Information enabled. The values for this field have the following meanings:
Value Definition
000b Protection information is not enabled
001b Protection information is enabled, Type 1
010b Protection information is enabled, Type 2
011b Protection information is enabled, Type 3
100b – 111b Reserved
30 O Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC): This field
specifies multi-path I/O and namespace sharing capabilities of the namespace.
Bits 7:1 are reserved.
Bit 0: If set to ‘1’ then the namespace may be attached to two or more controllers in the
NVM subsystem concurrently (i.e., may be a shared namespace). If cleared to ‘0’ then
the namespace is a private namespace and may only be attached to one controller at a
time.
31 O Reservation Capabilities (RESCAP):
Bit 7 if set to ‘1’ indicates that Ignore Existing Key is used as defined in revision 1.3 or
later of this specification. Bit 7 if cleared to ‘0’ indicates that Ignore Existing Key is used
as defined in revision 1.2.1 or earlier of this specification. This bit shall be set to ‘1’ if the
controller supports revision 1.3 or later as indicated in the Version register.
Bit 6 if set to ‘1’ indicates that the namespace supports the Exclusive Access – All
Registrants reservation type. If this bit is cleared to ‘0’, then the namespace does not
support the Exclusive Access – All Registrants reservation type.
Bit 5 if set to ‘1’ indicates that the namespace supports the Write Exclusive – All
Registrants reservation type. If this bit is cleared to ‘0’, then the namespace does not
support the Write Exclusive – All Registrants reservation type.
Bit 4 if set to ‘1’ indicates that the namespace supports the Exclusive Access –
Registrants Only reservation type. If this bit is cleared to ‘0’, then the namespace does
not support the Exclusive Access – Registrants Only reservation type.
Bit 3 if set to ‘1’ indicates that the namespace supports the Write Exclusive – Registrants
Only reservation type. If this bit is cleared to ‘0’, then the namespace does not support
the Write Exclusive – Registrants Only reservation type.
Bit 2 if set to ‘1’ indicates that the namespace supports the Exclusive Access reservation
type. If this bit is cleared to ‘0’, then the namespace does not support the Exclusive
Access reservation type.
Bit 1 if set to ‘1’ indicates that the namespace supports the Write Exclusive reservation
type. If this bit is cleared to ‘0’, then the namespace does not support the Write Exclusive
reservation type.
Bit 0 if set to ‘1’ indicates that the namespace supports the Persist Through Power Loss
capability. If this bit is cleared to ‘0’, then the namespace does not support the Persist
Through Power Loss Capability.
32 O Format Progress Indicator (FPI): If a format operation is in progress, this field indicates
the percentage of the namespace that remains to be formatted.
Bit 7 if set to ‘1’ indicates that the namespace supports the Format Progress Indicator
defined by bits 6:0 in this field. If this bit is cleared to ‘0’, then the namespace does not
support the Format Progress Indicator and bits 6:0 in this field shall be cleared to 0h.
Bits 6:0 indicate the percentage of the Format NVM command that remains to be
completed (e.g., a value of 25 indicates that 75% of the Format NVM command has been
completed and 25% remains to be completed). A value of 0 indicates that the
namespace is formatted with the format specified by the FLBAS and DPS fields in this
data structure and there is no Format NVM command in progress.
33 O Bit 4 if set to ‘1’ indicates that the Guard field for deallocated logical blocks that contain
protection information is set to the CRC for the value read from the deallocated logical
block and its metadata (excluding protection information). Bit 4 if cleared to ‘0’ indicates that
the Guard field for the deallocated logical blocks that contain protection information is set
to FFFFh.
Bit 3 if set to ‘1’ indicates that the controller supports the Deallocate bit in the Write Zeroes
command for this namespace. Bit 3 if cleared to ‘0’ indicates that the controller does not
support the Deallocate bit in the Write Zeroes command for this namespace. This bit shall
be set to the same value for all namespaces in the NVM subsystem.
Bits 2:0 indicate the values read from a deallocated logical block and its metadata
(excluding protection information). The values for this field have the following meanings:
Value Definition
000b Not reported
001b All bytes set to 00h
010b All bytes set to FFh
011b – 111b Reserved
35:34 O Namespace Atomic Write Unit Normal (NAWUN): This field indicates the namespace
specific size of the write operation guaranteed to be written atomically to the NVM during
normal operation.
A value of 0h indicates that the size for this namespace is the same size as that reported
in the AWUN field of the Identify Controller data structure. All other values specify a size
in terms of logical blocks using the same encoding as the AWUN field. Refer to section
6.4.
37:36 O Namespace Atomic Write Unit Power Fail (NAWUPF): This field indicates the
namespace specific size of the write operation guaranteed to be written atomically to the
NVM during a power fail or error condition.
A value of 0h indicates that the size for this namespace is the same size as that reported
in the AWUPF field of the Identify Controller data structure. All other values specify a size
in terms of logical blocks using the same encoding as the AWUPF field. Refer to section
6.4.
39:38 O Namespace Atomic Compare & Write Unit (NACWU): This field indicates the
namespace specific size of the write operation guaranteed to be written atomically to the
NVM for a Compare and Write fused command.
A value of 0h indicates that the size for this namespace is the same size as that reported
in the ACWU field of the Identify Controller data structure. All other values specify a size
in terms of logical blocks using the same encoding as the ACWU field. Refer to section
6.4.
41:40 O Namespace Atomic Boundary Size Normal (NABSN): This field indicates the atomic
boundary size for this namespace for the NAWUN value. This field is specified in logical
blocks. Writes to this namespace that cross atomic boundaries are not guaranteed to be
atomic to the NVM with respect to other read or write commands.
A value of 0h indicates that there are no atomic boundaries for normal write
operations. All other values specify a size in terms of logical blocks using the same
encoding as the AWUN field. Refer to section 6.4.
A value of 0h indicates that there are no atomic boundaries for power fail or error
conditions. All other values specify a size in terms of logical blocks using the same
encoding as the AWUPF field. Refer to section 6.4.
47:46 O Namespace Optimal IO Boundary (NOIOB): This field indicates the optimal IO
boundary for this namespace. This field is specified in logical blocks. The host should
construct read and write commands that do not cross the IO boundary to achieve optimal
performance. A value of 0h indicates that no optimal IO boundary is reported.
63:48 O NVM Capacity (NVMCAP): This field indicates the total size of the NVM allocated to this
namespace. The value is in bytes. This field shall be supported if Namespace
Management and Namespace Attachment commands are supported.
Note: This field may not correspond to the logical block size multiplied by the Namespace
Size field. Due to thin provisioning or other settings (e.g., endurance), this field may be
larger or smaller than the Namespace Size reported.
103:64 Reserved
119:104 O Namespace Globally Unique Identifier (NGUID): This field contains a 128-bit value
that is globally unique and assigned to the namespace when the namespace is created.
This field remains fixed throughout the life of the namespace and is preserved across
namespace and controller operations (e.g., controller reset, namespace format, etc.).
This field uses the EUI-64 based 16-byte designator format. Bytes 114:112 contain the
24-bit Organizationally Unique Identifier (OUI) value assigned by the IEEE Registration
Authority. Bytes 119:115 contain an extension identifier assigned by the corresponding
organization. Bytes 111:104 contain the vendor specific extension identifier assigned by
the corresponding organization. See the IEEE EUI-64 guidelines for more information.
This field is big endian (refer to section 7.10).
The controller shall specify a globally unique namespace identifier in this field or the
EUI64 field when the namespace is created. If the controller is not able to allocate a
globally unique identifier then this field shall be cleared to 0h. Refer to section 7.11.
IEEE Extended Unique Identifier (EUI64): This field contains a 64-bit IEEE Extended
Unique Identifier (EUI-64) that is globally unique and assigned to the namespace when
the namespace is created. This field remains fixed throughout the life of the namespace
and is preserved across namespace and controller operations (e.g., controller reset,
namespace format, etc.).
The controller shall specify a globally unique namespace identifier in this field or the
NGUID field when the namespace is created. If the controller is not able to allocate a
globally unique 64-bit identifier then this field shall be cleared to 0h. Refer to section 7.11.
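A few of the fields in Figure 114 can be pulled out of the returned buffer as shown below. This is a sketch under stated assumptions: the function name `decode_identify_namespace` is hypothetical, multi-byte fields are assumed to be little-endian, and only the byte offsets and bit positions given in the table above are used.

```python
import struct


def decode_identify_namespace(ident: bytes) -> dict:
    """Decode selected Identify Namespace fields (offsets from Figure 114)."""
    # Bytes 7:0 NSZE, 15:8 NCAP, 23:16 NUSE, byte 24 NSFEAT.
    # Assumption: multi-byte fields are little-endian.
    nsze, ncap, nuse, nsfeat = struct.unpack_from("<QQQB", ident, 0)
    flbas = ident[26]  # Formatted LBA Size
    fpi = ident[32]    # Format Progress Indicator
    info = {
        "nsze": nsze,
        "ncap": ncap,
        "nuse": nuse,
        "thin_provisioning": bool(nsfeat & 0x01),  # NSFEAT bit 0
        "namespace_atomics": bool(nsfeat & 0x02),  # NSFEAT bit 1
        "lba_format_index": flbas & 0x0F,          # FLBAS bits 3:0
        "extended_lba": bool(flbas & 0x10),        # FLBAS bit 4
        "format_percent_remaining": None,          # stays None if FPI bit 7 is 0
    }
    if fpi & 0x80:
        # Bits 6:0 give the percentage that remains; 0 means no format in progress.
        info["format_percent_remaining"] = fpi & 0x7F
    return info
```

With FPI bits 6:0 equal to 25, the percentage completed is 100 - 25 = 75, matching the example in the FPI description.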
Figure 115: Identify – LBA Format Data Structure, NVM Command Set Specific
Bits Description
31:26 Reserved
25:24 Relative Performance (RP): This field indicates the relative performance of the LBA format
indicated relative to other LBA formats supported by the controller. Depending on the size of the
LBA and associated metadata, there may be performance implications. The performance
analysis is based on better performance on a queue depth 32 with 4KB read workload. The
meanings of the values indicated are included in the following table.
Value Definition
00b Best performance
01b Better performance
10b Good performance
11b Degraded performance
23:16 LBA Data Size (LBADS): This field indicates the LBA data size supported. The value is reported
in terms of a power of two (2^n). A value smaller than 9 (i.e., 512 bytes) is not supported. If the
value reported is 0h then the LBA format is not supported / used or is not currently available.
15:00 Metadata Size (MS): This field indicates the number of metadata bytes provided per LBA based
on the LBA Data Size indicated. If there is no metadata supported, then this field shall be cleared
to 00h.
If metadata is supported, then the namespace may support the metadata being transferred as
part of an extended data LBA or as part of a separate contiguous buffer. If end-to-end data
protection is enabled, then the first eight bytes or last eight bytes of the metadata is the protection
information.
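The LBA Format entry in Figure 115 is a single 32-bit value, so decoding it is a matter of masking three fields. The sketch below is illustrative only; the names `RP_NAMES` and `decode_lba_format` are hypothetical, and the input is assumed to be the Dword already converted to a host integer.

```python
# Relative Performance (RP) encodings from the table in Figure 115.
RP_NAMES = ("Best performance", "Better performance",
            "Good performance", "Degraded performance")


def decode_lba_format(dword: int) -> dict:
    """Decode one LBA Format entry given as a 32-bit integer."""
    ms = dword & 0xFFFF            # bits 15:00, metadata bytes per LBA
    lbads = (dword >> 16) & 0xFF   # bits 23:16, data size as a power of two
    rp = (dword >> 24) & 0x3       # bits 25:24
    return {
        "metadata_bytes": ms,
        # LBADS of 0h means the format is not supported, used, or available.
        "data_bytes": None if lbads == 0 else 1 << lbads,
        "relative_performance": RP_NAMES[rp],
    }
```

An entry with LBADS = 12 and MS = 8 describes 4096-byte logical blocks with 8 bytes of metadata each, which is the combination recommended for use with end-to-end data protection.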
Value Definition
0h Reserved
1h IEEE Extended Unique Identifier: The NID field contains a copy of the
EUI64 field in the Identify Namespace structure (refer to Figure 114). If
the EUI64 field of the Identify Namespace structure is not supported, the
controller shall not report a value of type 1h.
01h Namespace Identifier Length (NIDL): This field contains the length in bytes of the Namespace
Identifier field below. The total length of the Namespace Identifier Descriptor in bytes is the value in
this field plus four. If this field is set to 0h it indicates the end of the Namespace Identifier Descriptor
list.
02h – 03h Reserved
04h – (NIDL + 03h) Namespace Identifier (NID): This field contains a value that is globally unique and assigned to the
namespace when the namespace is created. This field remains fixed throughout the life of the
namespace and is preserved across namespace and controller operations (e.g., controller reset,
namespace format, etc.). The exact type of the value is specified by the Namespace Identifier Type
(NIDT) field, and the size is specified by the Namespace Identifier Length (NIDL) field.
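Because each descriptor is NIDL plus four bytes long and a NIDL of 0h terminates the list, the list can be walked with a simple loop. The following sketch assumes that the NIDT field occupies byte 00h of each descriptor (the only byte not otherwise accounted for above); the function name `parse_ns_id_descriptors` is hypothetical.

```python
def parse_ns_id_descriptors(buf: bytes) -> list:
    """Walk a Namespace Identifier Descriptor list.

    Returns a list of (nidt, nid_bytes) tuples.
    Assumption: NIDT is at offset 00h, NIDL at 01h, NID starts at 04h.
    """
    out = []
    off = 0
    while off + 4 <= len(buf):
        nidt, nidl = buf[off], buf[off + 1]
        if nidl == 0:
            break  # NIDL of 0h marks the end of the list
        nid = buf[off + 4:off + 4 + nidl]
        if len(nid) < nidl:
            raise ValueError("truncated Namespace Identifier Descriptor")
        out.append((nidt, nid))
        off += nidl + 4  # total descriptor length is NIDL plus four
    return out
```

A descriptor of type 1h with NIDL = 8 therefore occupies 12 bytes, and the next descriptor (or the terminating entry) starts immediately after it.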
Value Description
03:00
0h Controller Attach
1h Controller Detach
2h - Fh Reserved
The Namespace Management command uses the Data Pointer and Dword 10 fields. All other command
specific fields are reserved.
The Namespace Identifier (CDW1.NSID) field is used as follows for create and delete operations:
Create: The CDW1.NSID field is reserved for this operation; host software shall set this field to a
value of 0h. The controller shall select an available Namespace Identifier to use for the operation.
Delete: This field specifies the previously created namespace to delete in this operation. Specifying
a value of FFFFFFFFh may be used to delete all namespaces in the NVM subsystem.
Value Description
03:00
0h Create
1h Delete
2h - Fh Reserved
Dword 0 of the completion queue entry contains the Namespace Identifier created. The definition of Dword
0 of the completion queue entry is in Figure 131.
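The NSID and Select rules for the Namespace Management command can be summarized in a short helper. This is a hedged sketch: the function name `ns_mgmt_fields` and the returned dictionary layout are hypothetical, and the Select value is assumed to occupy bits 03:00 of Command Dword 10 with all other bits zero.

```python
SEL_CREATE = 0x0
SEL_DELETE = 0x1
ALL_NAMESPACES = 0xFFFFFFFF


def ns_mgmt_fields(op: str, nsid: int = 0) -> dict:
    """Return the CDW1.NSID and CDW10 values for a create or delete operation."""
    if op == "create":
        # NSID is reserved for create; host software sets it to 0h and the
        # controller selects an available Namespace Identifier.
        return {"nsid": 0, "cdw10": SEL_CREATE}
    if op == "delete":
        # NSID names the namespace to delete; FFFFFFFFh may delete all of them.
        if not 0 < nsid <= ALL_NAMESPACES:
            raise ValueError("delete requires a non-zero 32-bit NSID")
        return {"nsid": nsid, "cdw10": SEL_DELETE}
    raise ValueError("unknown operation: " + op)
```

After a successful create, the host would read the newly assigned Namespace Identifier from Dword 0 of the completion queue entry.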
31 The controller indicates in bit 4 of the Optional NVM Command Support field of the Identify
Controller data structure in Figure 109 whether this field is supported.
If the Feature Identifier specified in the Set Features command is not saveable by the controller
and the controller receives a Set Features command with the Save bit set to one, then the
command shall be aborted with a status of Feature Identifier Not Saveable.
30:08 Reserved
07:00 Feature Identifier (FID): This field indicates the identifier of the Feature that attributes are being
specified for.
Figure 135: Set Features, NVM Command Set Specific – Feature Identifiers
Feature Identifier   O/M (note 4)   Persistent Across Power States and Reset (note 1)   Uses Memory Buffer for Attributes   Description
80h                  O              Yes                                                  No                                  Software Progress Marker
81h                  O (note 2)     No                                                   Yes                                 Host Identifier
82h                  O (note 3)     No                                                   No                                  Reservation Notification Mask
83h                  O (note 3)     Yes                                                  No                                  Reservation Persistence
84h – BFh            Reserved
NOTES:
1. This column is only valid if bit 4 in the Optional NVM Command Support field of the Identify Controller
data structure in Figure 109 is cleared to ‘0’.
2. Mandatory if reservations are supported as indicated in the Identify Controller data structure.
3. Mandatory if reservations are supported by the namespace as indicated by a non-zero value in the
Reservation Capabilities (RESCAP) field in the Identify Namespace data structure.
4. O/M: O = Optional, M = Mandatory.
Each entry in the LBA Range Type data structure is defined in Figure 139. The LBA Range feature is a set
of 64-byte entries; the number of entries is indicated as a command parameter, and the maximum number of
entries is 64. The LBA ranges shall not overlap. If the LBA ranges overlap, the controller should return an
error of Overlapping Range. All unused entries in the LBA Range Type data structure shall be cleared to
all zeroes for both Get Features and Set Features.
The default value for this feature should clear the Number of LBA Ranges field to 00h and initialize the LBA
Range Type data structure to contain a single entry with:
Type field cleared to 00h,
Attributes field set to 01h,
Starting LBA field cleared to 0h,
Number of Logical Blocks field set to indicate the number of LBAs in the namespace, and
GUID field set to a globally unique identifier.
00
Value Description
00h Reserved
01h Filesystem
02h RAID
03h Cache
04h Page / swap file
05h – 7Fh Reserved
80h - FFh Vendor Specific
01 Attributes: Specifies attributes of the LBA range. Each bit defines an attribute.
Bit Description
0 If set to ‘1’, the LBA range may be overwritten. If cleared to ‘0’, the area should not be overwritten.
1 If set to ‘1’, the LBA range should be hidden from the OS / EFI / BIOS. If cleared to ‘0’, the area should be visible to
the OS / EFI / BIOS.
2–7 Reserved
15:02 Reserved
23:16 Starting LBA (SLBA): This field specifies the 64-bit address of the first logical block that is part
of this LBA range.
31:24 Number of Logical Blocks (NLB): This field specifies the number of logical blocks that are part
of this LBA range. This is a 0’s based value.
47:32 Unique Identifier (GUID): This field is a global unique identifier that uniquely specifies the type
of this LBA range. Well known Types may be defined and are published on the NVM Express
website.
63:48 Reserved
The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI /
BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage
driver should not expose subsequent LBA ranges that follow a hidden LBA range.
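The entry layout and the hidden-range rule above can be sketched as follows. The names `parse_lba_range` and `visible_ranges` are hypothetical, and the 64-bit fields are assumed to be little-endian. Since every range that follows a hidden range is also hidden, the sketch stops at the first hidden entry.

```python
import struct


def parse_lba_range(entry: bytes) -> dict:
    """Decode one 64-byte LBA Range Type entry (little-endian assumed)."""
    if len(entry) != 64:
        raise ValueError("an LBA Range Type entry is 64 bytes")
    rtype, attrs = entry[0], entry[1]
    slba, nlb = struct.unpack_from("<QQ", entry, 16)  # bytes 23:16 and 31:24
    return {
        "type": rtype,
        "overwritable": bool(attrs & 0x01),  # Attributes bit 0
        "hidden": bool(attrs & 0x02),        # Attributes bit 1
        "slba": slba,
        "blocks": nlb + 1,                   # NLB is a 0's based value
        "guid": entry[32:48],
    }


def visible_ranges(entries: list) -> list:
    """Return the ranges a host driver would expose, in order."""
    visible = []
    for e in entries:
        if e["hidden"]:
            break  # this range and all that follow it stay hidden
        visible.append(e)
    return visible
```

For three ranges where only the second is marked hidden, `visible_ranges` returns just the first one.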
The over temperature threshold feature shall be implemented for Composite Temperature. The under
temperature threshold Feature shall be implemented for Composite Temperature if a non-zero Warning
Composite Temperature Threshold (WCTEMP) field value is reported in the Identify Controller data
structure in Figure 109. The over temperature threshold and under temperature threshold features shall be
implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-
zero value).
The default value of the over temperature threshold feature for Composite Temperature is the value in the
Warning Composite Temperature Threshold (WCTEMP) field in the Identify Controller data if WCTEMP is
non-zero; otherwise, it is implementation specific. The default value of the under temperature threshold
feature for Composite Temperature is implementation specific. The default value of the over temperature
threshold for all implemented temperature sensors is FFFFh. The default value of the under temperature
threshold for all implemented temperature sensors is 0h.
If a Get Features command is submitted for this feature, the temperature threshold selected by Command
Dword 11 is returned in Dword 0 of the completion queue entry for that command.
19:16
Value Description
0000b Composite Temperature
0001b Temperature Sensor 1
0010b Temperature Sensor 2
0011b Temperature Sensor 3
0100b Temperature Sensor 4
0101b Temperature Sensor 5
0110b Temperature Sensor 6
0111b Temperature Sensor 7
1000b Temperature Sensor 8
1001b – 1110b Reserved
1111b All implemented temperature sensors in a Set Features command. Reserved in a Get Features
command.
15:00 Temperature Threshold (TMPTH): Indicates the threshold value for the temperature sensor and
threshold type specified.
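The two fields shown above can be packed into Command Dword 11 as in the following sketch. The function names are hypothetical, and bits above 19:16 are left at zero because their layout is not covered by the rows shown here; a real driver would also need to fill those in.

```python
ALL_SENSORS = 0xF  # 1111b, valid in Set Features only


def pack_temp_threshold(sensor: int, threshold: int) -> int:
    """Pack the sensor select (bits 19:16) and TMPTH (bits 15:00) fields."""
    if not (0 <= sensor <= 8 or sensor == ALL_SENSORS):
        raise ValueError("sensor select value is reserved")
    if not 0 <= threshold <= 0xFFFF:
        raise ValueError("TMPTH is a 16-bit field")
    return (sensor << 16) | threshold


def unpack_temp_threshold(dword: int) -> tuple:
    """Return (sensor select, TMPTH) from a Dword 11 style value."""
    return (dword >> 16) & 0xF, dword & 0xFFFF
```

Selecting Temperature Sensor 2 with a threshold value of 0157h yields 00020157h.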
Note: This mechanism is primarily intended for use by host software that may have alternate
means of recovering the data.
Note: The value allocated may be smaller or larger than the number of queues requested, often in
virtualized implementations. The controller may not have as many queues to allocate as are requested.
Alternatively, the controller may have an allocation unit of queues (e.g. power of two) and may supply more
queues to host software to satisfy its allocation unit.
Each entry in the Autonomous Power State Transition data structure is defined in Figure 150. Each entry
is 64 bits in size. There is an entry for each of the allowable 32 power states. For power states that are
not supported, the unused Autonomous Power State Transition data structure entries shall be cleared to
all zeroes. The entries begin with power state 0 and then increase sequentially (i.e., power state 0 is
described in bytes 7:0, power state 1 is described in bytes 15:8, etc.). The data structure is 256 bytes in
size and shall be physically contiguous.
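The sizing in the paragraph above works out as 32 entries of 8 bytes each, or 256 bytes, with the entry for power state n starting at byte 8n. A minimal sketch of the lookup follows; the name `apst_entry` is hypothetical and each 64-bit entry is assumed to be little-endian.

```python
APST_ENTRY_SIZE = 8     # each entry is 64 bits
APST_NUM_STATES = 32    # one entry per allowable power state
APST_TABLE_SIZE = APST_ENTRY_SIZE * APST_NUM_STATES  # 256 bytes


def apst_entry(table: bytes, power_state: int) -> int:
    """Return the raw 64-bit entry for the given power state."""
    if len(table) != APST_TABLE_SIZE:
        raise ValueError("the data structure is 256 bytes in size")
    if not 0 <= power_state < APST_NUM_STATES:
        raise ValueError("power state out of range")
    off = power_state * APST_ENTRY_SIZE  # state 0 -> bytes 7:0, state 1 -> bytes 15:8
    return int.from_bytes(table[off:off + APST_ENTRY_SIZE], "little")
```

Entries for unsupported power states read back as zero, since the unused entries are cleared to all zeroes.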
The Host Memory Descriptor List Address (HMDLLA/HMDLUA) points to a physically contiguous data
structure in host memory that describes the address and length pairs of the Host Memory Buffer. The
number of address and length pairs is specified in the Host Memory Descriptor List Entry Count in Figure
155. The Host Memory Descriptor List is described in Figure 156.
Each Host Memory Buffer Descriptor Entry shall describe a host memory address in memory page size
units and the number of contiguous memory page size units associated with the host address.
Figure 157: Host Memory Buffer – Host Memory Buffer Descriptor Entry
Bit Description
127:96 Reserved
95:64 Buffer Size (BSIZE): Indicates the number of contiguous memory page size (CC.MPS) units for
this descriptor.
63:00 Buffer Address (BADD): Indicates the host memory address for this descriptor aligned to the
memory page size (CC.MPS). The lower bits (n:0) of this field, which give the offset within the
memory page, shall be cleared to 0h. If the memory page size is 4KB, then bits 11:00 shall be zero;
if the memory page size is 8KB, then bits 12:00 shall be zero, etc.
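A descriptor entry from Figure 157 could be built as in the sketch below, which checks the alignment rule before packing. The function name `pack_hmb_descriptor` and the default page size of 4096 bytes are assumptions for illustration, and the 16-byte entry is assumed to be little-endian.

```python
import struct


def pack_hmb_descriptor(addr: int, pages: int, page_size: int = 4096) -> bytes:
    """Build one 16-byte Host Memory Buffer Descriptor Entry."""
    if addr % page_size != 0:
        # BADD must be aligned to the memory page size, so the low bits are zero.
        raise ValueError("buffer address is not page aligned")
    if not 0 <= pages <= 0xFFFFFFFF:
        raise ValueError("BSIZE is a 32-bit field")
    # Bits 63:00 BADD, bits 95:64 BSIZE, bits 127:96 reserved (zero).
    return struct.pack("<QII", addr, pages, 0)
```

With a 4KB page size, an address such as 10000h is accepted, while 10800h is rejected because bits 11:00 are not zero.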
If a Get Features command is issued for this Feature, the attributes specified in Figure 151 are returned in
Dword 0 of the completion queue entry and the Host Memory Buffer Attributes data structure, whose
structure is defined in Figure 158, is returned in the data buffer for that command.
If a Get Features command is issued for this Feature, the data structure specified in Figure 160 is returned
in the data buffer for that command.
05:00 If the Timestamp Origin field is set to 001b, then this field is set to the last Timestamp value set by the host, plus
the time in milliseconds since the Timestamp was set. If the sum of the Timestamp value set by the host and
the elapsed time exceeds 2^48, the value returned should be reduced modulo 2^48.
If the Synch bit is set to 1b, then the Timestamp value may be reduced by vendor specific time intervals not
counted by the controller.
06 Bits 03:01 (Timestamp Origin):
Value Definition
000b The Timestamp field was initialized to ‘0’ by a Controller Level Reset.
001b The Timestamp field was initialized with a Timestamp value using a Set Features command.
010b – 111b Reserved
Bit 00 (Synch):
Value Definition
0b The controller counted time in milliseconds continuously since the Timestamp value was initialized.
1b The controller may have stopped counting during vendor specific intervals after the Timestamp value was
initialized (e.g., non-operational power states).
07 Reserved
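The layout above (a 48-bit millisecond count in bytes 05:00 and the Origin and Synch bits in byte 06) can be decoded as in the following sketch. The function names are hypothetical and the count is assumed to be little-endian; `expected_timestamp` only illustrates the modulo 2^48 reduction.

```python
TIMESTAMP_MODULUS = 1 << 48  # the Timestamp field is 48 bits wide


def decode_timestamp(data: bytes) -> dict:
    """Decode the 8-byte Timestamp data structure returned by Get Features."""
    attr = data[6]
    return {
        "milliseconds": int.from_bytes(data[0:6], "little"),  # bytes 05:00
        "origin": (attr >> 1) & 0x7,  # byte 06 bits 03:01, Timestamp Origin
        "synch": attr & 0x1,          # byte 06 bit 00, Synch
    }


def expected_timestamp(set_value_ms: int, elapsed_ms: int) -> int:
    """Value returned when Origin is 001b: host value plus elapsed time, mod 2^48."""
    return (set_value_ms + elapsed_ms) % TIMESTAMP_MODULUS
```

If the host set the Timestamp to 2^48 - 10 and 25 milliseconds have elapsed, the sum wraps around and the reduced value is 15.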
31:16 A value cleared to zero specifies that this part of the feature shall be disabled.
The range of values that are supported by the controller is indicated in the Minimum Thermal
Management Temperature field and Maximum Thermal Management Temperature field in the
Identify Controller data structure in Figure 109.
If the host attempts to set this field to a value less than the value contained in the Minimum
Thermal Management Temperature field or greater than the value contained in the Maximum
Thermal Management Temperature field in the Identify Controller data structure in Figure 109,
then the command shall fail with a status code of Invalid Field in Command.
If the host attempts to set this field to a value greater than or equal to the value contained in the
Thermal Management Temperature 2 field, if non-zero, then the command shall fail with a status
code of Invalid Field in Command.
15:00 Thermal Management Temperature 2 (TMT2): This field specifies the temperature, in degrees
Kelvin, when the controller begins to transition to lower power active power states or perform
vendor specific thermal management actions regardless of the impact on performance (e.g.,
heavy throttling) in order to attempt to reduce the Composite Temperature.
A value cleared to zero specifies that this part of the feature shall be disabled.
The range of values that are supported by the controller is indicated in the Minimum Thermal
Management Temperature field and Maximum Thermal Management Temperature field in the
Identify Controller data structure in Figure 109.
If the host attempts to set this field to a value less than the value contained in the Minimum
Thermal Management Temperature field or greater than the value contained in the Maximum
Thermal Management Temperature field in the Identify Controller data structure in Figure 109,
then the command shall fail with a status code of Invalid Field in Command.
If the host attempts to set this field to a non-zero value less than or equal to the value contained
in the Thermal Management Temperature 1 field, then the command shall fail with a status code
of Invalid Field in Command.
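A host can pre-check the two thresholds against these rules before issuing the command. The sketch below is one reading of the text, not a normative algorithm: the function names are hypothetical, `mntmt` and `mxtmt` stand for the Minimum and Maximum Thermal Management Temperature values from Identify Controller, and a value of zero is assumed to be exempt from the range check because it disables that part of the feature.

```python
def thermal_thresholds_valid(tmt1: int, tmt2: int, mntmt: int, mxtmt: int) -> bool:
    """Return True if the controller would be expected to accept TMT1/TMT2."""
    for value in (tmt1, tmt2):
        # Assumption: zero disables the threshold and skips the range check.
        if value != 0 and not (mntmt <= value <= mxtmt):
            return False
    # TMT1 must be below a non-zero TMT2 (equivalently, a non-zero TMT2
    # must be above TMT1).
    if tmt2 != 0 and tmt1 >= tmt2:
        return False
    return True


def pack_tmt_dword11(tmt1: int, tmt2: int) -> int:
    """Place TMT1 in bits 31:16 and TMT2 in bits 15:00."""
    return ((tmt1 & 0xFFFF) << 16) | (tmt2 & 0xFFFF)
```

For a controller reporting a supported range of 320 to 400, the pair (350, 360) passes, while (360, 350) and (350, 350) are both rejected.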
If the host attempts to set this field to ‘1’ and the controller does not support Non-Operational
Power State Permissive Mode as indicated in the Controller Attributes field of Identify Controller,
then the command fails with a status of Invalid Field in Command.
5.21.1.18 Software Progress Marker (Feature Identifier 80h), (Optional) – NVM Command Set Specific
This Feature is a software progress marker. The software progress marker is persistent across power
states. For additional details, refer to section 7.6.1.1. This information may be used to indicate to an OS
software driver whether there have been issues with the OS successfully loading. The attributes are
indicated in Command Dword 11.
If a Get Features command is submitted for this Feature, the attributes specified in Figure 164 are returned
in Dword 0 of the completion queue entry for that command.
1 Mandatory if reservations are supported as indicated in the Identify Controller data structure.
The Host Identifier is contained in the data structure indicated in Figure 166. The attributes are specified in
Command Dword 11. If a Get Features command is issued for this Feature, the data structure specified in
Figure 166 is returned in the data buffer for that command.
Figure 165: Host Identifier – Command Dword 11
Bit Description
31:01 Reserved
00 Enable Extended Host Identifier (EXHID): If set to ‘1’, then the host is using an extended 128-
bit Host Identifier. If cleared to ‘0’, then the host is using a 64-bit Host Identifier. NVMe over Fabrics
implementations shall use an extended 128-bit Host Identifier.
If the controller does not support a 128-bit Host Identifier as indicated in the Controller Attributes
field in the Identify Controller data structure and the host sets this bit to ‘1’, then a status value of
Invalid Field in Command shall be returned.
If the NVM subsystem detects that another controller in the NVM subsystem is using a Host
Identifier of a different size than specified in this command, a status of Host Identifier Inconsistent
Format shall be returned.
specifying a 64-bit Host Identifier (EXHID cleared to ‘0’) shall be aborted with a status of Invalid Field in
Command.
5.21.1.20 Reservation Notification Mask (Feature Identifier 82h), (Optional, see footnote 2)
This Feature controls the masking of reservation notifications on a per namespace basis. A Reservation
Notification log page is created whenever a reservation notification occurs on a namespace and the
corresponding reservation notification type is not masked on that namespace by this Feature. If reservations
are supported by the controller, then this Feature shall be supported. The attributes are indicated in
Command Dword 11.
A Set Features command that uses a namespace ID other than FFFFFFFFh modifies the reservation
notification mask for the corresponding namespace only. A Set Features command that uses a namespace
ID of FFFFFFFFh modifies the reservation notification mask of all namespaces that are attached to the
controller and that support reservations. A Get Features command that uses a namespace ID other than
FFFFFFFFh returns the reservation notification mask for the corresponding namespace. A Get Features
command that uses a namespace ID of FFFFFFFFh is aborted with status Invalid Field in Command. If a
Set Features or Get Features attempts to access the Reservation Notification Mask on a namespace that
does not support reservations or is invalid, then the command is aborted with status Invalid Field in
Command.
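Informative sketch (not part of the specification): the namespace ID rules above can be modeled with a small in-memory object. The class name, the method names, and the treatment of the mask as an opaque integer are assumptions made for illustration. Only the status outcomes follow the text.

```python
BROADCAST_NSID = 0xFFFFFFFF
SUCCESS = "Successful Completion"
INVALID_FIELD = "Invalid Field in Command"


class ReservationMaskModel:
    """Toy model of per-namespace reservation notification masks (hypothetical)."""

    def __init__(self, attached_nsids, reservation_capable_nsids):
        # Only namespaces that are attached AND support reservations hold a mask.
        self.capable = set(attached_nsids) & set(reservation_capable_nsids)
        self.masks = {nsid: 0 for nsid in self.capable}

    def set_features(self, nsid, mask):
        if nsid == BROADCAST_NSID:
            # Broadcast: update every attached namespace that supports reservations.
            for n in self.capable:
                self.masks[n] = mask
            return SUCCESS
        if nsid not in self.capable:
            return INVALID_FIELD
        self.masks[nsid] = mask
        return SUCCESS

    def get_features(self, nsid):
        # Get Features does not accept the broadcast value for this Feature.
        if nsid == BROADCAST_NSID or nsid not in self.capable:
            return INVALID_FIELD, None
        return SUCCESS, self.masks[nsid]
```

For example, with namespaces 1, 2, and 3 attached and only 1 and 3 supporting reservations, a broadcast Set Features updates namespaces 1 and 3, and a Get Features on namespace 2 is rejected.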
If a Get Features command successfully completes for this Feature, the attributes specified in Figure 167
are returned in Dword 0 of the completion queue entry for that command.
5.21.1.21 Reservation Persistence (Feature Identifier 83h), (Optional, see footnote 3)
Each namespace that supports reservations has a Persist Through Power Loss (PTPL) state that may be
modified using either a Set Features command or a Reservation Register command (refer to section 6.11).
The Reservation Persistence feature attributes are indicated in Command Dword 11.
The PTPL state is contained in the Reservation Persistence Feature that is namespace specific. A Set
Features command that uses the namespace ID FFFFFFFFh modifies the PTPL state associated with all
namespaces that are attached to the controller and that support PTPL (i.e., support reservations). A Set
Features command that uses a valid namespace ID other than FFFFFFFFh and corresponds to a
namespace that supports reservations, modifies the PTPL state for that namespace. A Get Features
command that uses a namespace ID of FFFFFFFFh is aborted with status Invalid Field in Command. A
Get Features command that uses a valid namespace ID other than FFFFFFFFh and corresponds to a
namespace that supports PTPL, returns the PTPL state for that namespace. If a Set Features or Get
Features command using a namespace ID other than FFFFFFFFh attempts to access the PTPL state for
a namespace that does not support this Feature Identifier, then the command is aborted with status Invalid
Field in Command.
2 Mandatory if reservations are supported by the namespace as indicated by a non-zero value in the
Reservation Capabilities (RESCAP) field in the Identify Namespace data structure.
3 Mandatory if reservations are supported by the namespace as indicated by a non-zero value in the
If a Get Features command successfully completes for this Feature Identifier, the attributes specified in
Figure 168 are returned in Dword 0 of the completion queue entry for that command.
Figure 168: Reservation Persistence Configuration – Command Dword 11
Bit Description
31:01 Reserved
00 Persist Through Power Loss (PTPL): If set to ‘1’, then reservations and registrants persist across
a power loss. If cleared to ‘0’, then reservations are released and registrants are cleared on a
power loss.
15:11 Reserved
10:08 Resource Type (RT): This field indicates the type of controller resource to be modified.
Value Description
000b VQ Resources
001b VI Resources
010b – 111b Reserved
07:04 Reserved
03:00 Action (ACT): This field indicates the operation for the command to perform as described below.
Value Description
0h Reserved
1h Primary Controller Flexible Allocation: Set the number of Flexible Resources
allocated to this primary controller following the next Controller Level Reset. If the
Controller Identifier field does not correspond to this primary controller then an
error of Invalid Controller Identifier is returned. This value is persistent across
power cycles and resets.
2h – 6h Reserved
7h Secondary Controller Offline: Place the secondary controller in the Offline state
and remove all Flexible Resources. If the Controller Identifier field does not
correspond to a secondary controller associated with this primary controller then
an error of Invalid Controller Identifier is returned.
8h Secondary Controller Assign: Assign the number of controller resources
specified in Number of Controller Resources to the secondary controller. If the
Controller Identifier field does not correspond to a secondary controller associated
with this primary controller then an error of Invalid Controller Identifier is returned.
If the secondary controller is not in the Offline state then an error of Invalid
Secondary Controller State is returned.
9h Secondary Controller Online: Place the secondary controller in the Online state.
If the Controller Identifier field does not correspond to a secondary controller
associated with this primary controller then an error of Invalid Controller Identifier
is returned. If the secondary controller is not configured appropriately (refer to
section 8.5) or the primary controller is not enabled, then an error of Invalid
Secondary Controller State is returned.
Ah – Fh Reserved
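Informative sketch (not part of the specification): the two fields listed above can be extracted from the low-order bits of the Dword with shifts and masks. The function and dictionary names are hypothetical, and bits outside 10:08 and 03:00 are ignored here.

```python
RESOURCE_TYPES = {0b000: "VQ Resources", 0b001: "VI Resources"}
ACTIONS = {
    0x1: "Primary Controller Flexible Allocation",
    0x7: "Secondary Controller Offline",
    0x8: "Secondary Controller Assign",
    0x9: "Secondary Controller Online",
}


def decode_rt_act(dword):
    """Return (Resource Type, Action) names; unlisted values map to 'Reserved'."""
    rt = (dword >> 8) & 0x7   # bits 10:08
    act = dword & 0xF         # bits 03:00
    return RESOURCE_TYPES.get(rt, "Reserved"), ACTIONS.get(act, "Reserved")
```

For example, `decode_rt_act(0x0108)` yields the VI Resources type with the Secondary Controller Assign action.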
Dword 0 of the completion queue entry contains information about the controller resources that were
modified as part of the Primary Controller Flexible Allocation and Secondary Controller Assign actions.
Dword 0 of the completion queue entry is defined in Figure 173.
The scope of the format operation and secure erase depends on the attributes that the controller supports
for the Format NVM command and the Namespace Identifier specified in the command. The scope for the
format operation is defined in Figure 174. The scope for secure erase, if applicable based on the setting of
the Secure Erase Settings field in Command Dword 10, is defined in Figure 175.
The Format NVM command shall fail if the controller is in an invalid security state (refer to the appropriate
security specification, e.g., TCG SIIS). The Format NVM command may fail if there are outstanding I/O
commands to the namespace specified to be formatted. I/O commands for a namespace that has a Format
NVM command in progress may fail.
The settings specified in the Format NVM command are reported as part of the Identify Namespace data
structure.
The Format NVM command uses the Command Dword 10 field. All other command specific fields are
reserved.
11:09
Value Definition
000b No secure erase operation requested
001b User Data Erase: All user data shall be erased; the contents of the user data after the erase
are indeterminate (e.g., the user data may be zero filled, one filled, etc.). The controller may perform
a cryptographic erase when a User Data Erase is requested if all user data is encrypted.
010b Cryptographic Erase: All user data shall be erased cryptographically. This is accomplished by
deleting the encryption key.
011b – 111b Reserved
Bit Description
08 Protection Information Location (PIL): If set to ‘1’ and protection information is enabled, then
protection information is transferred as the first eight bytes of metadata. If cleared to ‘0’ and
protection information is enabled, then protection information is transferred as the last eight bytes
of metadata. This setting is reported in the Formatted LBA Size field of the Identify Namespace
data structure.
07:05 Protection Information (PI): This field specifies whether end-to-end data protection is enabled
and the type of protection information. The values for this field have the following meanings:
Value Definition
000b Protection information is not enabled
001b Protection information is enabled, Type 1
010b Protection information is enabled, Type 2
011b Protection information is enabled, Type 3
100b – 111b Reserved
When end-to-end data protection is enabled, the host shall specify the appropriate protection
information in the Read, Write, or Compare commands.
04 Metadata Settings (MSET): This field is set to ‘1’ if the metadata is transferred as part of an
extended data LBA. This field is cleared to ‘0’ if the metadata is transferred as part of a separate
buffer. The metadata may include protection information, based on the Protection Information (PI)
field. If the Metadata Size for the LBA Format selected is 0h, then this field is not applicable.
03:00 LBA Format (LBAF): This field specifies the LBA format to apply to the NVM media. This
corresponds to the LBA formats indicated in the Identify command, refer to Figure 114 and Figure
115. Only supported LBA formats shall be selected.
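Informative sketch (not part of the specification): the Format NVM Command Dword 10 fields listed above (LBAF in bits 03:00, MSET in bit 04, PI in bits 07:05, PIL in bit 08, and the secure erase value in bits 11:09) can be packed as follows. The function name and its parameter names are hypothetical, and whether a given LBA format is supported by the controller is not checked here.

```python
def format_nvm_cdw10(lbaf, mset=0, pi=0, pil=0, ses=0):
    """Pack the Format NVM Command Dword 10 fields into one 32-bit value."""
    if not 0 <= lbaf <= 0xF:
        raise ValueError("LBAF is a 4-bit field")
    if mset not in (0, 1) or pil not in (0, 1):
        raise ValueError("MSET and PIL are single bits")
    if not 0 <= pi <= 0b011:
        raise ValueError("PI values 100b to 111b are reserved")
    if not 0 <= ses <= 0b010:
        raise ValueError("secure erase values 011b to 111b are reserved")
    return (ses << 9) | (pil << 8) | (pi << 5) | (mset << 4) | lbaf
```

For example, LBA format 1 with extended-LBA metadata, Type 1 protection information in the first eight bytes of metadata, and a cryptographic erase packs to `0x531`.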
When a sanitize operation starts on any controller, all controllers in the NVM subsystem:
Shall clear any outstanding Sanitize Operation Completed asynchronous event;
Shall update the Sanitize Status log (refer to section 5.14.1.9.2);
Shall abort any command (submitted or in progress) not allowed during a sanitize operation with a
status of Sanitize In Progress (refer to section 8.15.1);
Should suspend power management activities; and
Shall release stream identifiers for any open streams.
While a sanitize operation is in progress, all controllers in the NVM subsystem shall abort any command
not allowed during a sanitize operation with a status of Sanitize In Progress (refer to section 8.15.1).
After a sanitize operation fails, all controllers in the NVM subsystem shall abort any command not allowed
during a sanitize operation with a status of Sanitize Failed (refer to section 8.15.1) until a subsequent
sanitize operation is started or successful recovery from the failed sanitize operation occurs.
If the most recent failed sanitize operation was started in unrestricted completion mode (i.e. the AUSE bit
was set to ‘1’ in the Sanitize command), failure recovery requires the host to issue a subsequent Sanitize
command in restricted or unrestricted completion mode or to issue a subsequent Sanitize command with
the Exit Failure Mode action.
If the most recent failed sanitize operation was started in restricted completion mode (i.e. the AUSE bit was
cleared to ‘0’ in the Sanitize command), failure recovery requires the host to issue a subsequent Sanitize
command in restricted completion mode. In the case of a sanitize operation failure in restricted completion
mode, before starting another sanitize operation:
any subsequent Sanitize command issued with the Exit Failure Mode action shall be aborted with
a status of Sanitize Failed; and
any Sanitize command issued in unrestricted completion mode shall be aborted with a status of
Sanitize Failed.
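Informative sketch (not part of the specification): the failure recovery rules above reduce to a small decision function. The function name and its arguments are hypothetical; the action codes are the Sanitize Action values from Command Dword 10. Only the recovery rules are modeled, not the other reasons a Sanitize command may be aborted.

```python
EXIT_FAILURE_MODE = 0b001
START_ACTIONS = {0b010: "Block Erase", 0b011: "Overwrite", 0b100: "Crypto Erase"}


def recovery_abort_status(failed_in_unrestricted_mode, action, ause=False):
    """Return None if the command is an acceptable recovery step after a failed
    sanitize operation, otherwise the status it is aborted with."""
    if action != EXIT_FAILURE_MODE and action not in START_ACTIONS:
        raise ValueError("reserved Sanitize Action value")
    if failed_in_unrestricted_mode:
        # Any new sanitize operation, or Exit Failure Mode, is acceptable.
        return None
    # The failed operation was in restricted completion mode: only a new
    # sanitize operation in restricted completion mode (AUSE = 0) is acceptable.
    if action == EXIT_FAILURE_MODE or ause:
        return "Sanitize Failed"
    return None
```

For example, after a failure in restricted completion mode, `recovery_abort_status(False, EXIT_FAILURE_MODE)` returns `"Sanitize Failed"`.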
The Sanitize Capabilities field in the Identify Controller data structure indicates the sanitize operation types
supported. If an unsupported sanitize operation type is selected by a Sanitize command then the controller
shall abort the command with a status of Invalid Field in Command.
If a firmware activation is pending, then the controller shall abort any Sanitize command with a status of
Firmware Activation Requires NVM Subsystem Reset. Activation of new firmware is prohibited during a
sanitize operation (refer to section 8.15.1).
Support for Sanitize commands in a Controller Memory Buffer (i.e., submitted to an Admin Submission
Queue in a Controller Memory Buffer or specifying an Admin Completion Queue in a Controller Memory
Buffer) is implementation specific. If an implementation does not support Sanitize commands in a Controller
Memory Buffer and a controller’s Admin Submission Queue or Admin Completion Queue is in the Controller
Memory Buffer, then the controller shall abort all Sanitize commands with a status of Command Not
Supported for Queue in CMB.
All sanitize operations (Block Erase, Crypto Erase, Overwrite) are performed in the background (i.e.,
Sanitize command completion does not indicate sanitize operation completion). If a sanitize operation is
started, then the controller shall complete the Sanitize command with a status of Successful Completion. If
the controller completes a Sanitize command with any status other than Successful Completion, then the
controller:
shall not start the sanitize operation for that command;
shall not modify the Sanitize Status log page; and
shall not alter any user data.
The Sanitize command uses Command Dword 10 and Command Dword 11. All other command specific
fields are reserved.
02:00
Value Description
000b Reserved
001b Exit Failure Mode
010b Start a Block Erase sanitize operation
011b Start an Overwrite sanitize operation
100b Start a Crypto Erase sanitize operation
101b – 111b Reserved
The association between a Security Receive command and previous Security Send commands is
dependent on the Security Protocol. The format of the data to be transferred is dependent on the
Security Protocol. Refer to SFSC for Security Protocol details.
Each Security Receive command returns the appropriate data corresponding to a Security Send
command as defined by the rules of the Security Protocol. The Security Receive command data may not
be retained if there is a loss of communication between the controller and host, or if a controller reset
occurs.
The fields used are Data Pointer, Command Dword 10, and Command Dword 11 fields. All other
command specific fields are reserved.
Figure 184: Security Protocol EAh – Security Protocol Specific Field Values
SP Specific (SPSP) Value    Description    NVMe Security Specific Field (NSSF) Definition
0001h    Replay Protected Memory Block    RPMB Target
0002h - FFFFh    Reserved    Reserved
6.1 Namespaces
A namespace is a collection of logical blocks that range from 0 to the capacity of the namespace – 1. A
namespace ID (NSID) is an identifier used by a controller to provide access to a namespace.
Valid NSIDs are the range of possible NSIDs that correspond to a namespace that may exist in the NVM
subsystem. Any NSID is valid, except if it is zero or greater than the Number of Namespaces field reported
in the Identify Controller data structure. NSID FFFFFFFFh is a broadcast value that is used to specify all
namespaces. An invalid NSID is any value that is not a valid NSID or the broadcast value.
Active NSIDs are valid NSIDs that are attached to the specific controller. Valid NSIDs that are not attached
to the specific controller are called inactive. An active NSID becomes inactive when the associated
namespace is detached from the specific controller or is deleted.
Allocated NSIDs are valid NSIDs that refer to namespaces that currently exist within an NVM subsystem.
An allocated NSID may not be attached to any controller. An allocated NSID shall be attached to a controller
before host software may submit I/O commands for that namespace on that controller. An allocated NSID
becomes unallocated when the associated namespace is deleted.
Unless otherwise noted, specifying an inactive namespace ID in a command that uses the namespace ID
shall cause the controller to abort the command with status Invalid Field in Command. Specifying an invalid
NSID in a command that uses the NSID field shall cause the controller to abort the command with status
Invalid Namespace or Format.
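Informative sketch (not part of the specification): the NSID categories defined above can be expressed as a classification function. The function names and the string labels are hypothetical. `nn` stands for the Number of Namespaces field, and the `allocated` and `active` sets are assumed to be known to the caller. The abort status shown is only the default ("unless otherwise noted") behavior.

```python
BROADCAST_NSID = 0xFFFFFFFF


def classify_nsid(nsid, nn, allocated, active):
    """Classify an NSID for one controller.

    nn: Number of Namespaces from Identify Controller.
    allocated: NSIDs whose namespaces exist in the NVM subsystem.
    active: NSIDs attached to this controller.
    """
    if nsid == BROADCAST_NSID:
        return "broadcast"
    if nsid == 0 or nsid > nn:
        return "invalid"
    if nsid in active:
        return "active"
    if nsid in allocated:
        return "allocated, inactive"
    return "unallocated, inactive"


def default_abort_status(kind):
    """Default status for a command that uses the NSID field, or None."""
    if kind == "invalid":
        return "Invalid Namespace or Format"
    if kind.endswith("inactive"):
        return "Invalid Field in Command"
    return None
```

How the broadcast value is handled depends on the individual command, so the sketch returns no default status for it.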
The following table summarizes the valid NSID types. Figure 189 visually shows the NSID types and how
they relate.
[Figure: NSID types (graphic not reproduced). NSID axis: 0, 1, NN, NN+1, FFFFFFFFh (Broadcast Value).
NVM Subsystem: Allocated, Unallocated.]
If Namespace Management is supported then Namespace IDs shall be unique within the NVM subsystem
(e.g., namespace ID of 3 shall refer to the same physical namespace regardless of the accessing
controller). If Namespace Management is not supported then Namespace IDs for private namespaces are
not required to be unique.
The Identify command may be used to determine the active NSIDs for a controller and the allocated NSIDs
in the NVM subsystem.
To determine the active NSIDs for a particular controller, the host may follow either of the following methods:
1. Issue Identify with the CNS field set to 00h for each valid NSID (based on the Number of
Namespaces value in Identify Controller). If a non-zero data structure is returned for a particular
NSID, then that is an active NSID.
2. Issue Identify with a CNS field set to 02h to retrieve a list of up to 1024 active NSIDs. If there are
more than 1024 active NSIDs, continue to issue Identify with a CNS field set to 02h until all active
NSIDs are retrieved.
To determine the allocated NSIDs in the NVM subsystem, the host may issue Identify with the CNS field
set to 10h to retrieve a list of up to 1024 allocated NSIDs. If there are more than 1024 allocated NSIDs,
continue to issue Identify with a CNS field set to 10h until all allocated NSIDs are retrieved.
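Informative sketch (not part of the specification): retrieving a namespace list in pages of up to 1024 entries can be written as a loop. The sketch assumes that each Identify call returns, in ascending order, the NSIDs greater than the NSID given in the command, with unused entries set to zero; that behavior comes from the Identify command description rather than this section. `identify_list` is a hypothetical callable standing in for the real command, and `make_fake_identify` is a test double.

```python
def list_all_nsids(identify_list, page_size=1024):
    """Collect every NSID by repeating the list query until a short page."""
    nsids = []
    start = 0
    while True:
        # Drop the zero entries that pad an incomplete page.
        page = [n for n in identify_list(start) if n != 0]
        nsids.extend(page)
        if len(page) < page_size:
            return nsids
        start = page[-1]  # next query resumes after the last NSID seen


def make_fake_identify(all_nsids, page_size=1024):
    """Build a stand-in for Identify (CNS 02h or 10h) over a known NSID set."""
    ordered = sorted(all_nsids)

    def identify_list(start_nsid):
        page = [n for n in ordered if n > start_nsid][:page_size]
        return page + [0] * (page_size - len(page))

    return identify_list
```

With 2500 namespaces the loop issues three queries (1024, 1024, and 452 entries).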
Namespace IDs may change across power off conditions or due to namespace management. However, it
is recommended that namespace identifiers remain static in order to avoid issues with EFI or OSes.
The Namespace Size field in the Identify Namespace data structure defines the total size of the namespace
in logical blocks (LBA 0 through n-1). The Namespace Utilization field in the Identify Namespace data
structure defines the number of logical blocks currently allocated in the namespace. The Namespace
Capacity field in the Identify data structure defines the maximum number of logical blocks that may be
allocated at one time as part of the namespace in a thin provisioning usage model. The following
relationship holds: Namespace Size >= Namespace Capacity >= Namespace Utilization.
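Informative sketch (not part of the specification): a host can sanity-check the three Identify Namespace values against the relationship above. The function and parameter names are hypothetical, and all values are in logical blocks.

```python
def namespace_sizes_consistent(size, capacity, utilization):
    """True if Namespace Size >= Namespace Capacity >= Namespace Utilization."""
    return size >= capacity >= utilization
```

For example, a thin provisioned namespace reporting a size of 1000, a capacity of 800, and a utilization of 250 logical blocks satisfies the relationship.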
A namespace may or may not have a relationship to a Submission Queue; this relationship is determined
by the host software implementation. The controller shall support access to any valid namespace from any
I/O Submission Queue.
Parameter Name 1
Value
The NVM subsystem reports in the Identify Controller data structure the size in logical blocks of the write
operation guaranteed to be written atomically under various conditions, including normal operation, power
fail, and in a Compare & Write fused operation. The values reported in the Identify Controller data structure
are valid across all namespaces with any supported namespace format, forming a baseline value that is
guaranteed not to change.
An NVM subsystem may report per namespace values for these fields that are specific to the namespace
format in Identify Namespace. If an NVM subsystem reports a per namespace value, it shall be greater
than or equal to the corresponding baseline value indicated in Identify Controller.
The values are reported in the fields (Namespace) Atomic Write Unit Normal, (Namespace) Atomic Write
Unit Power Fail, and (Namespace) Atomic Compare & Write Unit in Identify Controller or Identify
Namespace depending on whether the values are the baseline or namespace specific.
A controller may support Atomic Boundaries that shall not be crossed by an atomic operation. The
Namespace Atomic Boundary Parameters (NABSN, NABO, and NABSPF) define these boundaries for a
namespace. A namespace supports Atomic Boundaries if NABSN or NABSPF is set to a non-zero value.
A namespace that does not support Atomic Boundaries shall clear the NABSN and NABSPF fields to 0h.
Namespace Atomicity Parameter and Namespace Atomic Boundary Parameter values may be format
specific and may change if the namespace format is modified.
In the case of a shared namespace, operations performed by an individual controller are atomic to the
shared namespace at the write atomicity level reported in the corresponding Identify Controller or Identify
Namespace data structures of the controller to which the command was submitted.
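Informative sketch (not part of the specification): selecting between the baseline and the per namespace atomicity value can be written as below. The function name is hypothetical. Values are assumed to be already converted to a count of logical blocks, and `None` is an assumed way of representing "no per namespace value reported".

```python
def effective_atomic_unit(controller_value, namespace_value=None):
    """Return the atomic write unit that applies to a namespace.

    controller_value: baseline from Identify Controller.
    namespace_value: value from Identify Namespace, or None if not reported.
    """
    if namespace_value is None:
        return controller_value
    if namespace_value < controller_value:
        # A per namespace value shall not be smaller than the baseline.
        raise ValueError("per namespace value is below the controller baseline")
    return namespace_value
```

The same selection applies to each of the three parameters (normal, power fail, and Compare & Write).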
6.4.1 AWUN/NAWUN
AWUN/NAWUN control the atomicity of command execution in relation to other commands. They impose
inter-command serialization of writing of blocks of data to the NVM and prevent blocks of data ending up
on the NVM containing partial data from one new command and partial data from one or more other new
commands.
If a write command is submitted with size less than or equal to the AWUN/NAWUN value and the write
command does not cross an atomic boundary (refer to section 6.4.3), then the host is guaranteed that the
write command is atomic to the NVM with respect to other read or write commands. If a write command is
submitted with size greater than the AWUN/NAWUN value or crosses an atomic boundary, then there is no
guarantee of command atomicity. AWUN/NAWUN does not have any applicability to write errors caused
by power failure or other error conditions (refer to Atomic Write Unit Power Fail).
The host may indicate that AWUN and NAWUN are not necessary by configuring the Write Atomicity Normal
feature (refer to section 5.21.1.10), which may result in higher performance in some implementations.
LBA 0 1 2 3 4 5 6 7
Valid Result A A A A B
Valid Result A B B B B
Invalid Result A A B B B
Invalid Result A B A A B
If the size of write commands A and B is larger than the AWUN/NAWUN value, then there is no guarantee
of ordering. After execution of command A and command B, there may be an arbitrary mix of data from
command A and command B in the LBA range specified.
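Informative sketch (not part of the specification): when both commands fit within AWUN/NAWUN, the valid results are exactly those obtained by applying the writes one after the other in some order. The sketch assumes, as inferred from the result rows above, that command A writes LBAs 0 to 3 and command B writes LBAs 1 to 4. The function name and the use of "-" for unaffected blocks are hypothetical.

```python
from itertools import permutations


def serial_outcomes(initial, writes):
    """Return every final media state reachable by applying the writes
    atomically, one whole command at a time, in any order."""
    outcomes = set()
    for order in permutations(writes):
        media = list(initial)
        for tag, first_lba, count in order:
            for lba in range(first_lba, first_lba + count):
                media[lba] = tag
        outcomes.add(tuple(media))
    return outcomes


# Assumed example: A covers LBA 0-3, B covers LBA 1-4; "-" marks untouched blocks.
valid = serial_outcomes(["-"] * 8, [("A", 0, 4), ("B", 1, 4)])
```

Any state outside `valid`, such as A A B B B, mixes data from both commands within the overlapping range and is therefore an invalid result.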
6.4.2 AWUPF/NAWUPF
AWUPF and NAWUPF indicate the behavior of the controller if a power fail or other error condition interrupts
a write operation causing a torn write. A torn write is a write operation where only some of the logical blocks
that are supposed to be written contiguously are actually stored on the NVM, leaving the target logical
blocks in an indeterminate state in which some logical blocks contain original data and some logical blocks
contain new data from the write operation.
If a write command is submitted with size less than or equal to the AWUPF/NAWUPF value and the write
command does not cross an atomic boundary (refer to section 6.4.3), the controller guarantees that if the
command fails due to a power failure or other error condition, then subsequent read commands for the
logical blocks associated with the write command shall return one of the following:
All old data (i.e. original data on the NVM in the LBA range addressed by the interrupted write), or
All new data (i.e. all data to be written to the NVM by the interrupted write)
If a write command is submitted with size greater than the AWUPF/NAWUPF value or crosses an atomic
boundary, then there is no guarantee of the data returned on subsequent reads of the associated logical
blocks.
LBA 0 1 2 3 4 5 6 7
C B B B B
Command A begins executing but is interrupted by a power failure during the writing of the logical block at
LBA 1. Figure 194 describes valid and invalid results.
LBA 0 1 2 3 4 5 6 7
Valid Result A A B B B
Valid Result C B B B B
Invalid Result A B B B B
Invalid Result C A B B B
Invalid Result D D B B B
If the size of write command A is larger than the AWUPF/NAWUPF value, then there is no guarantee of the
state of the data contained in the specified LBA range after the power fail or error condition.
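Informative sketch (not part of the specification): for a write within AWUPF/NAWUPF that is interrupted, only two states of the addressed range are acceptable, all old data or all new data. The sketch assumes, as inferred from the result rows above, that command A addresses LBAs 0 and 1 and that the initial state is C B B B B. The function name is hypothetical.

```python
def power_fail_outcomes(before, tag, first_lba, count):
    """Return the two acceptable media states after an interrupted write."""
    after = list(before)
    for lba in range(first_lba, first_lba + count):
        after[lba] = tag
    return {tuple(before), tuple(after)}


# Assumed example: initial state C B B B B, command A writes LBA 0-1.
allowed = power_fail_outcomes(list("CBBBB"), "A", 0, 2)
```

A torn result such as A B B B B, or data that belongs to neither the old nor the new contents such as D D B B B, is not in `allowed`.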
After a write command has completed, reads for that location which are subsequently submitted shall return
the data from that write command and not an older version of the data from previous write commands with
the following exception;
then subsequent reads for locations written to the volatile write cache that were not written to non-volatile
storage may return older data.
03
PRACT Value    Metadata Size    Description
1b    8 Bytes    The protection information is stripped (read) or inserted (write).
1b    > 8 Bytes    The protection information is passed (read) or replaces the first or last 8 bytes
of the metadata (write).
0b    any    The protection information is passed (read and write).
02:00 Protection Information Check (PRCHK): The protection information check field indicates the
fields that need to be checked as part of end-to-end data protection processing. This field is only
used if the namespace is formatted to use end-to-end protection information. Refer to section 8.3.
Bit Definition
02 If set to ‘1’ enables protection information checking of the Guard field. If cleared to ‘0’, the
Guard field is not checked.
01 If set to ‘1’ enables protection information checking of the Application Tag field. If cleared to
‘0’, the Application Tag field is not checked.
00 If set to ‘1’ enables protection information checking of the Logical Block Reference Tag field.
If cleared to ‘0’, the Logical Block Reference Tag field is not checked.
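Informative sketch (not part of the specification): the 4-bit protection information field combines the action bit (bit 03) with the three check bits (bits 02:00), and the handling of the protection information depends on the action bit and the metadata size. The constant and function names are hypothetical, and the returned strings merely paraphrase the table rows.

```python
PRCHK_GUARD = 0b100    # bit 02: check the Guard field
PRCHK_APP_TAG = 0b010  # bit 01: check the Application Tag field
PRCHK_REF_TAG = 0b001  # bit 00: check the Logical Block Reference Tag field


def encode_prinfo(pract, prchk):
    """Pack PRACT (bit 03) and PRCHK (bits 02:00) into the 4-bit field."""
    if pract not in (0, 1):
        raise ValueError("PRACT is a single bit")
    if not 0 <= prchk <= 0b111:
        raise ValueError("PRCHK is a 3-bit field")
    return (pract << 3) | prchk


def pi_handling(pract, metadata_bytes, is_write):
    """Describe what happens to the protection information."""
    if metadata_bytes < 8:
        raise ValueError("protection information needs 8 bytes of metadata")
    if pract == 0:
        return "passed"
    if metadata_bytes == 8:
        return "inserted" if is_write else "stripped"
    return "replaces first or last 8 bytes" if is_write else "passed"
```

For example, `encode_prinfo(1, PRCHK_GUARD | PRCHK_REF_TAG)` gives `0b1101`.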
If the Dataset Management command is supported, all combinations of attributes specified in Figure 206
may be set.
The data that the Dataset Management command provides is a list of ranges with context attributes. Each
range consists of a starting LBA, a length in logical blocks, and the context attributes to be applied to
that range. The definition of the Dataset Management command Range field is
specified in Figure 207. The maximum case of 256 ranges is shown.
AL: Access Latency 05:04
Value Definition
00b None. No latency information provided.
01b Idle. Longer latency acceptable.
10b Normal. Typical latency.
11b Low. Smallest possible latency.
AF: Access Frequency 03:00
Value Definition
0000b No frequency information provided.
0001b Typical number of reads and writes expected for this LBA range.
0010b Infrequent writes and infrequent reads to the LBA range indicated.
0011b Infrequent writes and frequent reads to the LBA range indicated.
0100b Frequent writes and infrequent reads to the LBA range indicated.
0101b Frequent writes and frequent reads to the LBA range indicated.
0110b – 1111b Reserved
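Informative sketch (not part of the specification): the Access Latency (bits 05:04) and Access Frequency (bits 03:00) attributes listed above can be packed and unpacked with shifts and masks. The names are hypothetical, and the remaining bits of the context attributes are left at zero here.

```python
ACCESS_LATENCY = {"none": 0b00, "idle": 0b01, "normal": 0b10, "low": 0b11}


def encode_al_af(latency, frequency):
    """Pack AL (bits 05:04) and AF (bits 03:00) of the context attributes."""
    if not 0 <= frequency <= 0b0101:
        raise ValueError("AF values 0110b to 1111b are reserved")
    return (ACCESS_LATENCY[latency] << 4) | frequency


def decode_al_af(attributes):
    """Return (AL, AF) as integers."""
    return (attributes >> 4) & 0x3, attributes & 0xF
```

For example, low latency with frequent writes and frequent reads encodes to `0x35`.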
6.7.1.1 Deallocate
A logical block that has been deallocated using the Dataset Management command is no longer deallocated
when the logical block is written. Read operations do not affect the deallocation status of a logical block.
The value read from a deallocated logical block shall be deterministic; specifically, the value returned by
subsequent reads of that logical block shall be the same until a write occurs to that logical block.
The values read from a deallocated logical block and its metadata (excluding protection information) shall
be all bytes set to 00h, all bytes set to FFh, or the last data written to the associated logical block and its
metadata, except that access is prohibited to all data and metadata values written before the most recent
successful sanitize operation, if any. The Deallocate Logical Block Features field in the Identify Namespace
data structure may report the values read from a deallocated logical block and its metadata.
The values read from a deallocated or unwritten logical block’s protection information field shall:
have the Guard field value set to FFFFh or set to the CRC for the value read from the deallocated
logical block and its metadata (excluding protection information) (e.g., set to 0000h if the value read
is all bytes set to 00h); and
have the Application Tag field value set to FFFFh and the Reference Tag field value set to
FFFFFFFFh (indicating the protection information shall not be checked).
Host software may enable an error to be returned if a deallocated or unwritten logical block is read in the
Error Recovery feature. If this error is supported for the namespace and enabled, then a read or compare
containing a deallocated or unwritten logical block shall fail with the Unwritten or Deallocated Logical Block
status code. Note: Legacy software may not handle an error for this case.
Note: The operation of the Deallocate function is similar to the ATA DATA SET MANAGEMENT with Trim
feature described in ACS-2 and SCSI UNMAP command described in SBC-3.
07:00
Access Latency 05:04
Value Definition
00b None. No latency information provided.
01b Idle. Longer latency acceptable.
10b Normal. Typical latency.
11b Low. Smallest possible latency.
Access Frequency 03:00
Value Definition
0000b No frequency information provided.
0001b Typical number of reads and writes expected for this LBA range.
0010b Infrequent writes and infrequent reads to the LBA range indicated.
0011b Infrequent writes and frequent reads to the LBA range indicated.
0100b Frequent writes and infrequent reads to the LBA range indicated.
0101b Frequent writes and frequent reads to the LBA range indicated.
0110b One time read. E.g. command is due to virus scan, backup, file copy, or archive.
0111b Speculative read. The command is part of a prefetch operation.
1000b The LBA range is going to be overwritten in the near future.
1001b – 1111b Reserved
The controller ignores the value of this field when the Ignore Existing Key (IEKEY) bit is set
to ‘1’.
15:8 M New Reservation Key (NRKEY): If the Reservation Register Action is 000b (i.e., Register
Reservation Key) or 010b (i.e., Replace Reservation Key), then this field contains the new
reservation key associated with the host. For all other Reservation Register Action values,
this field is reserved.
31:00 If this field corresponds to a length that is less than the size of the Reservation Status data
structure, then only that specified portion of the data structure is transferred. If this field
corresponds to a length that is greater than the size of the Reservation Status data structure, then
the entire contents of the data structure are transferred and no additional data is transferred.
Bit 0 is set to ‘1’ if the controller is associated with a host that holds a reservation on the
namespace.
7:3 Reserved
15:8 Host Identifier (HOSTID): This field contains the 64-bit Host Identifier of the controller
described by this data structure.
23:16 Reservation Key (RKEY): This field contains the reservation key of the host associated with
the controller described by this data structure.
Access Latency 05:04
Value Definition
00b None. No latency information provided.
01b Idle. Longer latency acceptable.
10b Normal. Typical latency.
11b Low. Smallest possible latency.
29:26 Protection Information Field (PRINFO): Specifies the protection information action and check
field, as defined in Figure 196. The Protection Information Check field (PRCHK) shall be 000b.
25 Deallocate (DEAC): If set to ‘1’, then the controller should deallocate logical blocks and may write
all bytes set to 00h. If cleared to ‘0’, then the controller may write all bytes set to 00h or may
deallocate logical blocks.
24:16 Reserved
15:00 Number of Logical Blocks (NLB): This field indicates the number of logical blocks to be written.
This is a 0’s based value.
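Informative sketch (not part of the specification): the fields listed above can be packed into a Dword as follows. The function name and parameters are hypothetical. The protection information action is assumed to occupy the most significant bit of the 4-bit PRINFO field, with the check bits held at 000b as required, and bits not listed above are left at zero.

```python
def write_zeroes_dword(num_blocks, deac=False, pract=0):
    """Pack PRINFO (29:26), DEAC (25), and NLB (15:00) for Write Zeroes."""
    if not 1 <= num_blocks <= 0x10000:
        raise ValueError("NLB can express 1 to 65536 logical blocks")
    if pract not in (0, 1):
        raise ValueError("PRACT is a single bit")
    nlb = num_blocks - 1      # 0's based: 0 means one logical block
    prinfo = pract << 3       # PRCHK (low three bits) shall be 000b
    return (prinfo << 26) | (int(deac) << 25) | nlb
```

For example, zeroing 8 logical blocks with the Deallocate bit set gives `0x02000007`.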
7 Controller Architecture
7.1 Introduction
Host software submits commands to the controller through pre-allocated Submission Queues. The
controller is alerted to newly submitted commands through SQ Tail Doorbell register writes. The difference
between the previous doorbell register value and the current register write indicates the number of
commands that were submitted.
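The doorbell arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the tail is an index into a circular queue that wraps at the queue size; the function name `commands_submitted` and its parameters are hypothetical and are not defined by this specification.

```python
def commands_submitted(prev_tail: int, new_tail: int, queue_size: int) -> int:
    """Number of new entries implied by an SQ Tail Doorbell write.

    Assumes the tail is an index into a circular queue of `queue_size`
    slots, so the difference is taken modulo the queue size.
    """
    if not (0 <= prev_tail < queue_size and 0 <= new_tail < queue_size):
        raise ValueError("tail index out of range")
    return (new_tail - prev_tail) % queue_size
```

For example, in a 16-entry queue a write that moves the tail from 14 to 2 implies four newly submitted commands, because the index wraps past the end of the queue.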
The controller fetches the commands from the Submission Queue(s) and transmits them to the NVM
subsystem for processing. Except for fused operations, there are no ordering restrictions for processing of
the commands within or across Submission Queues. Host software should not submit commands to a
Submission Queue that may not be re-ordered arbitrarily. Data may or may not be committed to the NVM media in the order
that commands are received.
Host software submits commands of higher priorities to the appropriate Submission Queues. Priority is
associated with the Submission Queue itself, thus the priority of the command is based on the Submission
Queue it is issued through. The controller arbitrates across the Submission Queues based on fairness and
priority according to the arbitration scheme specified in section 4.11.
Upon completion of the commands by the NVM subsystem, the controller presents completion queue
entries to the host through the appropriate Completion Queues. If MSI-X or multiple message MSI is in
use, then the interrupt vector indicates the Completion Queue(s) with possible new command completions
for the host to process. If pin-based interrupts or single message MSI interrupts are used, host software
interrogates the Completion Queue(s) for new completion queue entries. The host updates the CQ Head
doorbell register to release Completion Queue entries to the controller and clear the associated interrupt.
There are no ordering restrictions for completions to the host. Each completion queue entry identifies the
Submission Queue Identifier and Command Identifier of the associated command. Host software uses this
information to correlate the completions with the commands submitted to the Submission Queue(s).
Host software is responsible for creating all required Submission and Completion Queues prior to submitting
commands to the controller. I/O Submission and Completion Queues are created using Admin commands
defined in section 5.
queue entry has a Phase Tag inverted from the previous entry to indicate to the host that this
completion queue entry is a new entry.
6. The controller optionally generates an interrupt to the host to indicate that there is a new completion
queue entry to consume and process. In the figure, this is shown as an MSI-X interrupt, however,
it could also be a pin-based or MSI interrupt. Note that based on interrupt coalescing settings, an
interrupt may or may not be generated for each new completion queue entry.
7. The host consumes and then processes the new completion queue entries in the Completion
Queue. This includes taking any actions based on error conditions indicated. The host continues
consuming and processing completion queue entries until it encounters a previously consumed
entry with a Phase Tag inverted from the value of the current completion queue entries.
8. The host writes the Completion Queue Head Doorbell register to indicate that the completion queue
entry has been consumed. The host may consume many entries before updating the associated
Completion Queue Head Doorbell register.
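Steps 7 and 8 above can be sketched as a host-side loop. This is an illustrative model only: the `CompletionEntry` class, the `consume_new_entries` function, and the representation of the queue as a Python list are assumptions made for the example, not structures defined by this specification.

```python
from dataclasses import dataclass


@dataclass
class CompletionEntry:
    sqid: int   # Submission Queue Identifier
    cid: int    # Command Identifier
    phase: int  # Phase Tag (0 or 1)


def consume_new_entries(cq, head, expected_phase):
    """Consume entries whose Phase Tag matches the current pass.

    Returns (consumed entries, new head, new expected phase). Stops at
    the first entry whose Phase Tag is inverted from `expected_phase`,
    i.e. an entry left over from the previous pass through the queue.
    """
    consumed = []
    while cq[head].phase == expected_phase:
        consumed.append(cq[head])
        head += 1
        if head == len(cq):      # wrap: the next pass uses the inverted tag
            head = 0
            expected_phase ^= 1
    return consumed, head, expected_phase
```

After the loop returns, the host would write the new head value to the Completion Queue Head Doorbell register once, which covers every entry consumed in that pass.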
e. MPTR shall be filled in with the offset to the beginning of the Metadata Region, if there
is a data transfer and the namespace format contains metadata as a separate buffer.
f. PRP1 and/or PRP2 (or SGL Entry 1 if SGLs are used) are set to the source/destination
of data transfer, if there is a data transfer.
g. CDW10 – CDW15 are set to any command specific information.
2. Host software writes the corresponding Submission Queue doorbell register (SQxTDBL)
to submit one or more commands for processing.
The write to the Submission Queue doorbell register triggers the controller to consume one or more
new commands contained in the Submission Queue entry. The controller indicates the most recent
SQ entry that has been consumed as part of reporting completions. Host software may use this
information to determine when SQ slots may be re-used for new commands.
After the command is built, host software submits the command for execution by writing the Admin
Submission Queue doorbell (SQ0TDBL) to indicate to the controller that this command is available for
processing.
Host software shall maintain the PRP List unmodified in host memory until the Submission Queue is
deleted.
[Figure: non-contiguous Submission Queue. PRP1 references a PRP List whose entries (PRP Entry#0 to PRP Entry#2, each with an offset of 0h) reference 4KB memory pages holding 64-byte commands: Cmd#0 to Cmd#63, Cmd#64 to Cmd#127, and Cmd#128 to Cmd#191.]
[Figure: PRP example. The command carries PRP 1 (bytes 31:24, offset of xxh) and PRP 2 (bytes 39:32, offset of xxh). The transfer covers Page #0 through Page #3, holding LBA #0 through LBA #3 (4KB each), with a PRP List of three entries (PRP Entry#0 to PRP Entry#2, each with an offset of 0h).]
7.3 Resets
7.3.1 NVM Subsystem Reset
An NVM Subsystem Reset is initiated when:
• Power is applied to the NVM subsystem,
• A value of 4E564D65h (“NVMe”) is written to the NSSR.NSSRC field, or
• A vendor specific event occurs.
When an NVM Subsystem Reset occurs, the entire NVM subsystem is reset. This includes the initiation of
a Controller Level Reset on all controllers that make up the NVM subsystem and a transition to the Detect
LTSSM state by all PCI Express ports of the NVM subsystem.
The occurrence of an NVM Subsystem Reset while power is applied to the NVM subsystem is reported by
the initial value of the CSTS.NSSRO field following the NVM Subsystem Reset. This field may be used by
host software to determine if the sudden loss of communication with a controller was due to an NVM
Subsystem Reset or some other condition.
The ability for host software to initiate an NVM Subsystem Reset by writing to the NSSR.NSSRC field is an
optional capability of a controller indicated by the state of the CAP.NSSRS field. An implementation may
protect the NVM subsystem from an inadvertent NVM Subsystem Reset by not providing this capability to
one or more controllers that make up the NVM subsystem.
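The value written to NSSR.NSSRC is simply the ASCII string “NVMe” read as a 32-bit number, most significant byte first. The short check below illustrates this; the constant name `NSSRC_RESET_VALUE` is chosen for the example only.

```python
NSSRC_RESET_VALUE = 0x4E564D65  # value quoted in the text above

# The four bytes, most significant first, are the ASCII codes of "NVMe".
ascii_text = NSSRC_RESET_VALUE.to_bytes(4, "big").decode("ascii")
```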
7.5 Interrupts
The interrupt architecture allows for efficient reporting of interrupts such that the host may service interrupts
with the least amount of overhead.
The specification allows the controller to be configured to report interrupts in one of four modes. The four
modes are: pin-based interrupt, single message MSI, multiple message MSI, and MSI-X. It is
recommended that MSI-X be used whenever possible to enable higher performance, lower latency, and
lower CPU utilization for processing interrupts.
Interrupt aggregation, also referred to as interrupt coalescing, mitigates host interrupt overhead by reducing
the rate at which interrupt requests are generated by a controller. This reduced host overhead typically
comes at the expense of increased latency. Rather than prescribe a specific interrupt aggregation algorithm,
this specification defines the mechanisms a host may use to communicate desired interrupt aggregation
parameters to a controller and leaves the specific interrupt aggregation algorithm used by a controller as
vendor specific. Interrupts associated with the Admin Completion Queue should not be delayed.
The Aggregation Threshold field in the Interrupt Coalescing feature (refer to section 5.21.1.8) specifies the
host desired minimum interrupt aggregation threshold on a per vector basis. This value defines the number
of Completion Queue entries that when aggregated on a per interrupt vector basis reduces host interrupt
processing overhead below a host determined threshold. This value is provided to the controller as a
recommendation by the host and a controller is free to generate an interrupt before or after this aggregation
threshold is achieved. The specific manner in which this value is used by the interrupt aggregation algorithm
implemented by a controller is implementation specific.
The Aggregation Time field in the Interrupt Coalescing feature (refer to section 5.21.1.8) specifies the host
desired maximum delay that a controller may apply to a Completion Queue entry before an interrupt is
signaled to the host. This value is provided to the controller as a recommendation by the host and a
controller is free to generate an interrupt before or after this aggregation time is achieved. A controller may
apply this value on a per vector basis or across all vectors. The specific manner in which this value is used
by the interrupt aggregation algorithm implemented by a controller is implementation specific.
Although support of the Get Features and Set Features commands associated with interrupt coalescing is
required, the manner in which the Aggregation Threshold and Aggregation Time fields are used is
implementation specific. For example, an implementation may ignore these fields and not implement
interrupt coalescing.
Within the controller there is an interrupt status register (IS) that is not visible to the host. In this mode, the
IS register determines whether the PCI interrupt line shall be driven active or an MSI message shall be
sent. Each bit in the IS register corresponds to an interrupt vector. The IS bit is set to ‘1’ when the AND of
the following conditions is true:
• There are one or more unacknowledged completion queue entries in a Completion Queue that
utilizes this interrupt vector;
• The Completion Queue(s) with unacknowledged completion queue entries has interrupts enabled
in the “Create I/O Completion Queue” command;
• The corresponding INTM bit exposed to the host is cleared to ‘0’, indicating that the interrupt is not
masked.
For single and multiple MSI, the INTM register masks interrupt delivery prior to MSI logic. As such, an
interrupt on a vector masked by INTM does not cause the corresponding Pending bit to assert within the
MSI Capability Structure.
If MSIs are not enabled, IS[0] being a one causes the PCI interrupt line to be active (electrical ‘0’). If MSIs
are enabled, any change to the IS register that causes an unmasked status bit to transition from zero to
one or clearing of a mask bit whose corresponding status bit is set shall cause an MSI to be sent. Therefore,
while in wire mode, a single wire remains active, while in MSI mode, several messages may be sent, as
each edge triggered event on a port shall cause a new message.
In order to clear an interrupt for a particular interrupt vector, host software shall acknowledge all completion
queue entries for Completion Queues associated with the interrupt vector.
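The conditions for the internal IS bit, and the edge-triggered behavior in MSI mode, can be modelled in a few lines. This is a sketch under simplifying assumptions: a single vector, and hypothetical function names (`is_bit`, `msi_sent`) that do not appear in this specification.

```python
def is_bit(unacked_entries: int, cq_interrupts_enabled: bool, intm_masked: bool) -> bool:
    """IS bit for one vector: the AND of the three conditions in the text."""
    return unacked_entries > 0 and cq_interrupts_enabled and not intm_masked


def msi_sent(previous_is: bool, new_is: bool) -> bool:
    """In MSI mode a message is sent only on a zero-to-one transition."""
    return (not previous_is) and new_is
```

With this model, clearing the mask while entries are still unacknowledged changes the IS bit from zero to one, so a new message is sent, whereas in wire mode the line would simply remain active for as long as the bit is one.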
should process all completion queue entries and acknowledge the completion queue entries have been
processed by writing the associated CQyHDBL doorbell registers. When all completion queue entries have
been processed, host software should unmask interrupts by clearing the appropriate mask register bits to
‘0’ via the INTMC register.
It is recommended that the MSI interrupt vector associated with the CQ(s) being processed be masked
during processing of completion queue entries within the CQ(s) to avoid spurious and/or lost interrupts. For
single message or multiple message MSI, the INTMS and INTMC registers should be used to appropriately
mask interrupts during completion queue entry processing.
MSI-X, similar to multiple-message MSI, allows completions to be aggregated on a per vector basis.
However, the maximum number of vectors is 2K. MSI-X also allows each interrupt to send a unique
message data corresponding to the vector.
MSI-X allows completions to be aggregated on a per vector basis. Each Completion Queue may send
its own interrupt message, as opposed to a single message for all completions.
When generating an MSI-X message, the following checks occur before generating the message:
• The function mask bit in the MSI-X Message Control register is not set to ‘1’
• The corresponding vector mask in the MSI-X table structure is not set to ‘1’
If either of the masks is set, the corresponding pending bit in the MSI-X PBA structure is set to ‘1’ to
indicate that an interrupt is pending for that vector. The MSI-X message for that vector is later generated when
both mask bits are cleared to ‘0’.
It is recommended that the interrupt vector associated with the CQ(s) being processed be masked during
processing of completion queue entries within the CQ(s) to avoid spurious and/or lost interrupts. The
interrupt mask table defined as part of MSI-X should be used to mask interrupts.
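The mask and pending-bit behavior described above can be sketched with a small state model. The `MsixVector` class and its method names are hypothetical; the sketch tracks a single vector and ignores everything else in the MSI-X capability.

```python
class MsixVector:
    """Toy model of one MSI-X vector with its two masks and pending bit."""

    def __init__(self):
        self.function_mask = False  # function mask bit in MSI-X Message Control
        self.vector_mask = False    # per-vector mask in the MSI-X table
        self.pending = False        # corresponding bit in the MSI-X PBA
        self.messages_sent = 0

    def _masked(self) -> bool:
        return self.function_mask or self.vector_mask

    def request(self) -> None:
        """The controller wants to signal this vector."""
        if self._masked():
            self.pending = True      # remember the interrupt for later
        else:
            self.messages_sent += 1

    def set_masks(self, function_mask: bool, vector_mask: bool) -> None:
        """Update both masks; a pending interrupt fires once both are clear."""
        self.function_mask = function_mask
        self.vector_mask = vector_mask
        if self.pending and not self._masked():
            self.pending = False
            self.messages_sent += 1
```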
7.6.1 Initialization
The host should perform the following actions in sequence to initialize the controller to begin executing
commands:
1. Set the PCI and PCI Express registers described in section 2 appropriately based on the system
configuration. This includes configuration of power management features. A single interrupt (e.g.
pin-based, single-MSI, or single MSI-X) should be used until the number of I/O Queues is
determined.
2. The host should wait for the controller to indicate that any previous reset is complete by waiting for
CSTS.RDY to become ‘0’.
3. The Admin Queue should be configured. The Admin Queue is configured by setting the Admin
Queue Attributes (AQA), Admin Submission Queue Base Address (ASQ), and Admin Completion
Queue Base Address (ACQ) to appropriate values.
4. The controller settings should be configured. Specifically:
a. The arbitration mechanism should be selected in CC.AMS.
b. The memory page size should be initialized in CC.MPS.
c. The I/O Command Set that is to be used should be selected in CC.CSS.
5. The controller should be enabled by setting CC.EN to ‘1’.
6. The host should wait for the controller to indicate it is ready to process commands. The controller
is ready to process commands when CSTS.RDY is set to ‘1’.
7. The host should determine the configuration of the controller by issuing the Identify command,
specifying the Controller data structure. The host should then determine the configuration of each
namespace by issuing the Identify command for each namespace, specifying the Namespace data
structure.
8. The host should determine the number of I/O Submission Queues and I/O Completion Queues
supported using the Set Features command with the Number of Queues feature identifier. After
determining the number of I/O Queues, the MSI and/or MSI-X registers should be configured.
9. The host should allocate the appropriate number of I/O Completion Queues based on the number
required for the system configuration and the number supported by the controller. The I/O
Completion Queues are allocated using the Create I/O Completion Queue command.
10. The host should allocate the appropriate number of I/O Submission Queues based on the number
required for the system configuration and the number supported by the controller. The I/O
Submission Queues are allocated using the Create I/O Submission Queue command.
11. If the host desires asynchronous notification of optional events, the host should issue a Set
Features command specifying the events to enable. If the host desires asynchronous notification
of events, the host should submit an appropriate number of Asynchronous Event Request
commands. This step may be done at any point after the controller signals it is ready (i.e.,
CSTS.RDY is set to ‘1’).
After performing these steps, the controller may be used for I/O commands.
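Steps 2 through 6 of the sequence above can be sketched against a toy register model. Everything in this example is an assumption made for illustration: `MockController` is not a real device interface, CSTS.RDY is made to mirror CC.EN instantly, and the CC fields are written one at a time by name, whereas a real host writes them as fields of a single register. Step 1 (PCI and PCI Express configuration) and steps 7 to 11 (Admin commands) are left out.

```python
class MockController:
    """Toy register file: CSTS.RDY simply mirrors CC.EN (hypothetical model)."""

    def __init__(self):
        self.regs = {"CC.EN": 0, "CSTS.RDY": 0}
        self.write_order = []

    def read(self, name):
        return self.regs[name]

    def write(self, name, value):
        self.regs[name] = value
        self.write_order.append(name)
        if name == "CC.EN":
            self.regs["CSTS.RDY"] = value  # a real controller takes time here


def wait_until(ctrl, name, value, max_polls=1000):
    """Poll a register field until it reads `value` or give up."""
    for _ in range(max_polls):
        if ctrl.read(name) == value:
            return
    raise TimeoutError(f"{name} did not become {value}")


def bring_up(ctrl, aqa, asq, acq, ams, mps, css):
    wait_until(ctrl, "CSTS.RDY", 0)   # step 2: previous reset complete
    ctrl.write("AQA", aqa)            # step 3: Admin Queue attributes
    ctrl.write("ASQ", asq)            #         and base addresses
    ctrl.write("ACQ", acq)
    ctrl.write("CC.AMS", ams)         # step 4a: arbitration mechanism
    ctrl.write("CC.MPS", mps)         # step 4b: memory page size
    ctrl.write("CC.CSS", css)         # step 4c: I/O Command Set
    ctrl.write("CC.EN", 1)            # step 5: enable the controller
    wait_until(ctrl, "CSTS.RDY", 1)   # step 6: ready for commands
```

The point of the sketch is the ordering: the Admin Queue and controller settings are in place before CC.EN is set, and no command is submitted until CSTS.RDY reads ‘1’.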
For exit of the D3 power state, the initialization steps outlined should be followed. In this case, the number
of I/O Submission Queues and I/O Completion Queues shall not change, thus step 8 of the initialization
sequence (determining the number of I/O Queues) is optional.
7.6.2 Shutdown
It is recommended that the host perform an orderly shutdown of the controller by following the procedure
in this section when a power-off or shutdown condition is imminent.
The host should perform the following actions in sequence for a normal shutdown:
1. Stop submitting any new I/O commands to the controller and allow any outstanding commands to
complete.
2. The host should delete all I/O Submission Queues, using the Delete I/O Submission Queue
command. A result of the successful completion of the Delete I/O Submission Queue command is
that any remaining commands outstanding are aborted.
3. The host should delete all I/O Completion Queues, using the Delete I/O Completion Queue
command.
4. The host should set the Shutdown Notification (CC.SHN) field to 01b to indicate a normal shutdown
operation. The controller indicates when shutdown processing is completed by updating the
Shutdown Status (CSTS.SHST) field to 10b.
For entry to the D3 power state, the shutdown steps outlined for a normal shutdown should be followed.
The host should perform the following actions in sequence for an abrupt shutdown:
1. Stop submitting any new I/O commands to the controller.
2. The host should set the Shutdown Notification (CC.SHN) field to 10b to indicate an abrupt shutdown
operation. The controller indicates when shutdown processing is completed by updating the
Shutdown Status (CSTS.SHST) field to 10b.
It is recommended that the host wait a minimum of the RTD3 Entry Latency reported in the Identify
Controller data structure for the shutdown operations to complete; if the value reported in RTD3 Entry
Latency is 0h, then the host should wait for a minimum of one second. It is not recommended to disable
the controller via the CC.EN field. This causes a Controller Reset which may impact the time required to
complete shutdown processing.
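The recommended minimum wait can be expressed as a small helper. This sketch assumes that RTD3 Entry Latency is reported in microseconds, which is how the Identify Controller data structure defines the field but is not restated in this section; the function name `shutdown_wait_seconds` is hypothetical.

```python
def shutdown_wait_seconds(rtd3e_microseconds: int) -> float:
    """Minimum time the host should wait for shutdown processing.

    Assumes RTD3 Entry Latency is given in microseconds. A reported
    value of 0h falls back to the one second minimum from the text.
    """
    if rtd3e_microseconds < 0:
        raise ValueError("latency cannot be negative")
    if rtd3e_microseconds == 0:
        return 1.0
    return rtd3e_microseconds / 1_000_000
```

A host would typically poll CSTS.SHST for the value 10b during this interval rather than sleeping for the whole period.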
It is safe to power off the controller when CSTS.SHST indicates shutdown processing is complete
(regardless of the value of CC.EN). It remains safe to power off the controller until CC.EN transitions from
‘0’ to ‘1’.
To start executing commands on the controller after a shutdown operation, a Controller Reset (CC.EN
cleared from ‘1’ to ‘0’) is required. The initialization sequence should then be executed.
It is an implementation choice whether the host aborts all outstanding commands to the Admin Queue prior
to the shutdown. The only commands that should be outstanding to the Admin Queue at shutdown are
Asynchronous Event Request commands.
1. If the event(s) in the reported Log Page may be disabled with the Asynchronous Event
Configuration feature (refer to section 5.21.1.11), then host software issues a Set Features
command for the Asynchronous Event Configuration feature specifying to disable reporting of all
events that utilize the Log Page reported. Host software should wait for the Set Features
command to complete.
2. Host software issues a Get Log Page command requesting the Log Page reported as part of the
Asynchronous Event Request command completion. Host software should wait for the Get Log Page
command to complete.
3. Host software parses the returned Log Page. If the condition is not persistent, then host software
should re-enable all asynchronous events that utilize the Log Page. If the condition is persistent,
then host software should re-enable all asynchronous events that utilize the Log Page except for
the one(s) reported in the Log Page. The host re-enables events by issuing a Set Features
command for the Asynchronous Event Configuration feature.
4. Host software should issue an Asynchronous Event Request command to the controller (restoring
to n the number of these commands outstanding).
5. If the reporting of event(s) was disabled, host software should enable reporting of the event(s)
using the Asynchronous Event Configuration feature. If the condition reported may persist, host
software should continue to monitor the event (e.g., spare below threshold) to determine if
reporting of the event should be re-enabled.
The default value for each Feature is vendor specific and set by the manufacturer unless otherwise
specified; it is not changeable. The saveable value is the value that the Feature has after a power on or
reset event. The controller may not support a saveable value for a Feature; this is discovered by using the
‘supported capabilities’ value in the Select field in Get Features. If the controller does not support a
saveable value for a Feature, then the default value is used after a power on or reset event. The current
value is the value actively in use by the controller for a Feature after a Set Features command completes.
Set Features may be used to modify the saveable and current value for a Feature. Get Features may be
used to read the default, saveable, and current value for a Feature. If the controller does not support a
saveable value for a Feature, then the default value is returned for the saveable value in Get Features.
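The relationship between the default, saveable, and current values can be sketched with a toy model. The `Feature` class below is hypothetical. It also makes two assumptions that go beyond the text: a request to save on a controller without saveable support raises a Python exception (the actual status code is not modelled), and a saveable value that has never been written reads back as the default.

```python
class Feature:
    """Toy model of the three values a Feature may have."""

    def __init__(self, default, saveable_supported):
        self.default = default                # fixed by the manufacturer
        self.saveable_supported = saveable_supported
        self._saved = None                    # nothing saved yet
        self.current = default

    def set(self, value, save=False):
        """Model of Set Features: always updates current, optionally saves."""
        if save:
            if not self.saveable_supported:
                raise ValueError("saveable value not supported")
            self._saved = value
        self.current = value

    def get_saved(self):
        """Model of Get Features for the saveable value."""
        if self.saveable_supported and self._saved is not None:
            return self._saved
        return self.default                   # falls back to the default

    def power_on_or_reset(self):
        """After power on or reset the saveable value becomes current."""
        self.current = self.get_saved()
```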
Feature settings may apply to the entire controller (and all associated namespaces) or may apply to each
namespace individually. To change or retrieve a value that applies to the controller and all associated
namespaces, host software sets CDW1.NSID to 0h or FFFFFFFFh in the Set Features or Get Features
command. Features that are not namespace specific shall have the CDW1.NSID field set to 0h.
To change or retrieve a value that applies to a specific namespace, host software sets CDW1.NSID to the
identifier of that namespace in the Set Features or Get Features command. If host software specifies a
valid CDW1.NSID value that is not 0h or FFFFFFFFh and the Feature is not namespace specific, then a
Set Features command returns the Feature Not Namespace Specific status code, whereas a Get Features
command returns the Feature value that applies to the entire controller.
If the controller supports the Save field in the Set Features command and the Select field in the Get Features
command, then any Feature Identifier that is namespace specific may be saved on a per namespace basis.
There are mandatory and optional Feature Identifiers defined in Figure 134 and Figure 135. If a Get
Features command or Set Features command is processed that specifies a Feature Identifier that is not
supported, then the controller shall abort the command with a status of Invalid Field in Command.
The following are examples of NVMe Qualified Names that may be generated by “Example NVMe, Inc.”
nqn.2014-08.com.example:nvme:nvm-subsystem-sn-d78432
nqn.2014-08.com.example:nvme.host.sys.xyz
The second format may be used to create a unique identifier when there is not a naming authority or there
is not a desire for a human readable string. This format consists of:
• The string “nqn.”
• The string “2014-08.org.nvmexpress:uuid:”.
• A 128-bit UUID based on the definition in RFC 4122 represented as a string formatted as
“11111111-2222-3333-4444-555555555555”.
The following is an example of an NVMe Qualified Name using the UUID-based format:
nqn.2014-08.org.nvmexpress:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
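A UUID-based name of this form can be produced with the Python standard library, whose `uuid.UUID` type renders in the hyphenated 8-4-4-4-12 layout shown above. The helper name `uuid_nqn` is chosen for this example only.

```python
import uuid


def uuid_nqn(identifier: uuid.UUID) -> str:
    """Build an NVMe Qualified Name in the UUID-based format."""
    return f"nqn.2014-08.org.nvmexpress:uuid:{identifier}"


example = uuid_nqn(uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
```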
Byte   00   01   02   03
Value  CDh  ABh  34h  12h

Byte   04         05         06         23 - 07    24         25         63 - 26
Value  53h (‘S’)  4Eh (‘N’)  31h (‘1’)  20h (‘ ’)  4Dh (‘M’)  32h (‘2’)  20h (‘ ’)
Byte 73 74 75
Value EFh CDh ABh
EUI64 is defined in big endian format. The OUI field differs from the OUI Identifier which is in little endian
format as described in section 7.10.3.
Example:
OUI Identifier = ABCDEFh
Extension Identifier = 0123456789h
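The byte layout implied by this example can be worked out as follows. The sketch assumes the MA-L arrangement of a 24-bit OUI followed by a 40-bit Extension Identifier, which matches the widths of the two example values; the function name `eui64_bytes` is hypothetical.

```python
def eui64_bytes(oui: int, extension: int) -> bytes:
    """EUI64 in big endian order: 3-byte OUI, then 5-byte extension."""
    return oui.to_bytes(3, "big") + extension.to_bytes(5, "big")


eui64 = eui64_bytes(0xABCDEF, 0x0123456789)   # AB CD EF 01 23 45 67 89
oui_field = (0xABCDEF).to_bytes(3, "little")  # EF CD AB, the little endian OUI field
```

The second value shows the contrast drawn in the text: the same OUI appears as ABh CDh EFh at the start of the EUI64, but as EFh CDh ABh in the little endian OUI field.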
The MA-L format is similar to the World Wide Name (WWN) format defined as IEEE Registered designator
(NAA = 5) as shown below.
Byte            0 1 2 3 4 5 6 7
EUI64           OUI, Extension Identifier
WWN (NAA = 5)   5h, OUI, Vendor Specific Identifier
The NGUID format is similar to the World Wide Name (WWN) format defined as IEEE Registered Extended
designator (NAA = 6) as shown below.
Byte            0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
NGUID           Vendor Specific Extension Identifier, OUI, Extension Identifier
WWN (NAA = 6)   6h, OUI, Vendor Specific Identifier, Vendor Specific Identifier Extension
An NVM subsystem may contain multiple controllers. All of the controllers that make up an NVM subsystem
share the same NVM subsystem unique identifier. The Controller ID (CNTLID) value returned in the Identify
Controller data structure may be used to uniquely identify a controller within an NVM subsystem. The
Controller ID value when combined with the NVM subsystem identifier forms a globally unique value that
identifies the controller. The mechanism used by the vendor to assign Controller ID values is outside the
scope of this specification.
The Identify Namespace data structure contains the IEEE Extended Unique Identifier (EUI64) and the
Namespace Globally Unique Identifier (NGUID) fields. EUI64 is an 8-byte EUI-64 identifier and NGUID is a
16-byte identifier based on EUI-64. When creating a namespace, the controller specifies a globally unique
value in the EUI64 or NGUID field (the controller may optionally specify a globally unique value in both
fields). In cases where the 64-bit EUI64 field is unable to ensure a globally unique namespace identifier,
the EUI64 field shall be cleared to 0h. When not implemented, these fields contain a value of 0h. If bit 3
in NSFEAT is cleared to ‘0’, then a controller may reuse a non-zero NGUID/EUI64 value for a new
namespace after the original namespace using the value has been deleted.
If X is less than or equal to Y, the controller doorbell register should be updated with the new doorbell value.
8 Features
8.1 Firmware Update Process
The process for a firmware update to be activated by a reset is:
1. The host issues a Firmware Image Download command to download the firmware image to the
controller. There may be multiple portions of the firmware image to download, thus the offset for
each portion of the firmware image being downloaded is specified in the Firmware Image Download
command. The data provided in the Firmware Image Download command should conform to the
Firmware Update Granularity indicated in the Identify Controller data structure or the firmware
update may fail.
2. After the firmware is downloaded to the controller, the next step is for the host to submit a Firmware
Commit command. The Firmware Commit command verifies that the last firmware image
downloaded is valid and commits that image to the firmware slot indicated for future use. A
firmware image that does not start at offset zero, contains gaps, or contains overlapping regions is
considered invalid. A controller may employ additional vendor specific means (e.g., checksum,
CRC, cryptographic hash or a digital signature) to determine the validity of a firmware image.
a. The Firmware Commit command may also be used to activate a firmware image
associated with a previously committed firmware slot.
3. The last step is to perform a reset that then causes the firmware image specified in the Firmware
Slot field in the Firmware Commit command to be activated. The reset may be an NVM Subsystem
Reset, Conventional Reset, Function Level Reset, or Controller Reset (CC.EN transitions from ‘1’
to ‘0’).
a. In some cases a Conventional Reset or NVM Subsystem Reset is required to activate a
Firmware image. This requirement is indicated by Firmware Commit command specific
status (refer to section 5.11.1).
4. After the reset has completed, host software re-initializes the controller. This includes re-allocating
I/O Submission and Completion Queues. Refer to section 7.6.1.
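The structural validity rule in step 2 (the image starts at offset zero and has no gaps or overlapping regions) can be sketched as a check over the downloaded portions. This is an illustration only: the function name `image_is_contiguous` and the representation of portions as `(offset, length)` pairs are assumptions, and any vendor specific checks such as a CRC or signature are not modelled.

```python
def image_is_contiguous(portions):
    """Check downloaded portions given as (offset, length) pairs.

    Returns True only if the portions start at offset zero and tile the
    image with no gaps and no overlapping regions.
    """
    expected = 0
    for offset, length in sorted(portions):
        if offset != expected:   # gap if larger, overlap if smaller
            return False
        expected = offset + length
    return expected > 0          # an empty download is not a valid image
```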
The process for a firmware update to be activated without a reset is:
1. The host issues a Firmware Image Download command to download the firmware image to the
controller. There may be multiple portions of the firmware image to download, thus the offset for
each portion of the firmware image being downloaded is specified in the Firmware Image Download
command. The data provided in the Firmware Image Download command should conform to the
Firmware Update Granularity indicated in the Identify Controller data structure or the firmware
update may fail.
2. The host submits a Firmware Commit command with a Commit Action of 011b which specifies that
the image should be activated immediately without reset. The downloaded image should replace
the image in the firmware slot. If no image was downloaded since the last reset or Firmware Commit
command (i.e., the first step was skipped), then the controller shall verify and activate the image
in the specified slot. If the controller starts to activate the firmware, any controllers affected by the
new firmware send a Firmware Activation Starting asynchronous event to the host if Firmware
Activation Notices are enabled (refer to Figure 148).
a. The Firmware Commit command may also be used to activate a firmware image
associated with a previously committed firmware slot.
3. The controller completes the Firmware Commit command. The following actions are taken in
certain error scenarios:
a. If the firmware image is invalid, then the controller reports the appropriate error (e.g., Invalid
Firmware Image).
b. If the firmware activation was not successful because a reset is required to activate this
firmware, then the controller reports an error of Firmware Activation Requires Reset and
the image is applied at the next reset.
c. If the firmware activation was not successful because the firmware activation time would
exceed the MTFA value reported in the Identify Controller data structure, then the controller
reports an error of Firmware Activation Requires Maximum Time Violation. In this case, to
activate the firmware, the Firmware Commit command needs to be re-issued and the
image activated using a reset.
If a D3 cold condition occurs during the firmware activation process, the controller may resume operation
with either the old or new firmware.
If the firmware is not able to be successfully loaded, then the controller shall revert to the previously active
firmware image or the baseline read-only firmware image, if available, and indicate the failure as an
asynchronous event with a Firmware Image Load Error.
Host software shall not update multiple firmware images simultaneously. A firmware image shall be
committed to the firmware slot using Firmware Commit before downloading additional firmware images. If
the controller does not receive a Firmware Commit command, then it shall delete the portion(s) of the new
image in the case of a reset.
Figure 255: Metadata – Contiguous with LBA Data, Forming Extended LBA
[Figure: host buffer laid out as LBA n Data, LBA n Metadata, LBA n+1 Data, LBA n+1 Metadata, …]
The second mechanism for transferring the metadata is as a separate buffer of data. This mechanism is
illustrated in Figure 256. In this case, the metadata is pointed to with the Metadata Pointer, while the logical
block data is pointed to by the Data Pointer. When a command uses PRPs for the metadata in the
command, the metadata is required to be physically contiguous. When a command uses SGLs for the
metadata in the command, the metadata is not required to be physically contiguous.
[Figure 256: the host Data Buffer (PRP1 & PRP2) holds LBA n Data, LBA n+1 Data, LBA n+2 Data, …; the separate host Metadata Buffer (MD) holds LBA n Metadata, LBA n+1 Metadata, LBA n+2 Metadata, …]
One of the transfer mechanisms shall be selected for each namespace when it is formatted; transferring a
portion of metadata with one mechanism and a portion with the other mechanism is not supported.
If end-to-end data protection is used, then the Protection Information field for each logical block is contained
in the metadata.
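The difference between the two transfer mechanisms comes down to where each logical block's data and metadata sit in host memory. The sketch below computes byte offsets for both layouts; the function names are hypothetical, and `index` counts logical blocks from the first block of the transfer.

```python
def extended_lba_offsets(index, data_size, metadata_size):
    """Extended LBA: metadata follows its data in one contiguous buffer.

    Returns (data offset, metadata offset) within that single buffer.
    """
    stride = data_size + metadata_size
    data_offset = index * stride
    return data_offset, data_offset + data_size


def separate_buffer_offsets(index, data_size, metadata_size):
    """Separate buffers: returns (offset in data buffer, offset in metadata buffer)."""
    return index * data_size, index * metadata_size
```

For 512-byte logical blocks with 8 bytes of metadata, the second block of a transfer starts at byte 520 of an extended LBA buffer, but at byte 512 of the data buffer when the metadata is carried separately.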
Byte   Bits 7:0
0      Guard (MSB)
1      Guard (LSB)
2      Application Tag (MSB)
3      Application Tag (LSB)
4      Reference Tag (MSB)
5      Reference Tag
6      Reference Tag
7      Reference Tag (LSB)
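The 8-byte Protection Information layout (Guard in bytes 0 to 1, Application Tag in bytes 2 to 3, Reference Tag in bytes 4 to 7, each with the most significant byte first) can be sketched with Python's `struct` module. The function names are hypothetical and serve only as an illustration.

```python
import struct

# Guard (2 bytes), Application Tag (2 bytes), Reference Tag (4 bytes),
# all stored most significant byte first.
PI_FORMAT = ">HHI"

def pack_pi(guard: int, app_tag: int, ref_tag: int) -> bytes:
    """Build the 8-byte Protection Information field (hypothetical helper)."""
    return struct.pack(PI_FORMAT, guard, app_tag, ref_tag)

def unpack_pi(pi: bytes):
    """Return (guard, app_tag, ref_tag) from an 8-byte field."""
    if len(pi) != struct.calcsize(PI_FORMAT):
        raise ValueError("Protection Information is 8 bytes")
    return struct.unpack(PI_FORMAT, pi)
```

For example, a Guard of 1234h, an Application Tag of ABCDh, and a Reference Tag of 01020304h pack to the bytes 12 34 AB CD 01 02 03 04.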
NOTE: In cases (b) and (d) the Protection Information could be before or after the 8 bytes of metadata.
If the namespace is formatted with protection information and the PRACT bit is cleared to ‘0’, then the
logical block data and metadata, which in this case contains the protection information and possibly
additional host metadata, is transferred by the controller from the NVM to the host buffer (i.e., the metadata
field remains the same size in the NVM and the host buffer). As the logical block data and metadata pass
through the controller, the protection information within the metadata is checked. If a protection information
check error is detected, the command completes with the status code of the error detected (i.e., End-to-
end Guard Check, End-to-end Application Tag Check or End-to-end Reference Tag Check).
If the namespace is formatted with protection information and the PRACT bit is set to ‘1’, then:
a) if the namespace is formatted with Metadata Size equal to 8 (refer to Figure 115), the logical
block data and metadata (which in this case is, by definition, the protection information), is read
from the NVM by the controller. As the logical block and metadata pass through the controller,
the protection information is checked. If a protection information check error is detected, the
command completes with the status code of the error detected (i.e., End-to-end Guard Check,
End-to-end Application Tag Check or End-to-end Reference Tag Check). After processing the
protection information, the controller strips it and returns the logical block data to the host (i.e., the
metadata is not resident within the host buffer);
b) if the namespace is formatted with Metadata Size greater than 8, the logical block data and the
metadata, which in this case contains the protection information and additional host formatted
metadata, is read from the NVM by the controller. As the logical block and metadata pass
through the controller, the protection information embedded within the metadata is checked. If a
protection information check error is detected, the command completes with the status code of
the error detected (i.e., End-to-end Guard Check, End-to-end Application Tag Check or End-to-
end Reference Tag Check). After processing the protection information, the controller passes the
logical block data and metadata, with the embedded protection information unchanged, to the
host (i.e., the metadata field remains the same size in the NVM as within the host buffer).
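The effect of the PRACT bit on the amount of metadata returned to the host by a read can be summarized in a short Python sketch. The function name is hypothetical, and the sketch assumes a namespace formatted with protection information (so the Metadata Size is at least 8 bytes).

```python
def host_metadata_bytes_on_read(metadata_size: int, pract: bool) -> int:
    """Metadata bytes per logical block placed in the host buffer by a read.

    Hypothetical helper; assumes the namespace is formatted with
    protection information.
    """
    if metadata_size < 8:
        raise ValueError("protection information occupies 8 bytes of metadata")
    if pract and metadata_size == 8:
        # Case a): the controller checks and strips the protection information.
        return 0
    # PRACT cleared to '0', or case b): metadata is passed through unchanged.
    return metadata_size
```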
NOTE: In cases (b) and (d) the PI could be before or after the 8 bytes of metadata.
For Type 3 protection, if bit 0 of the PRCHK field is set to ‘1’, then the command may be aborted with status
Invalid Field in Command. The controller may ignore the ILBRT and EILBRT fields when Type 3 protection
is used because the computed reference tag remains unchanged.
Protection checking may be disabled as a side effect of the value of the protection information Application
Tag and Reference Tag fields regardless of the state of the PRCHK field in the command. If the namespace
is formatted for Type 1 or Type 2 protection, then all protection information checks are disabled regardless
of the state of the PRCHK field when the protection information Application Tag has a value of FFFFh. If
the namespace is formatted for Type 3 protection, then all protection information checks are disabled
regardless of the state of the PRCHK field when the protection information Application Tag has a value of
FFFFh and the protection information Reference Tag has a value of FFFFFFFFh.
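The rule for when protection checking is disabled by the tag values can be written as a small predicate. This Python sketch is illustrative only; the function name and the integer encoding of the protection type are assumptions.

```python
def pi_checks_disabled(protection_type: int, app_tag: int, ref_tag: int) -> bool:
    """Return True if all protection information checks are disabled.

    Hypothetical helper; protection_type is 1, 2, or 3, and the result is
    independent of the PRCHK field in the command.
    """
    if protection_type in (1, 2):
        return app_tag == 0xFFFF
    if protection_type == 3:
        return app_tag == 0xFFFF and ref_tag == 0xFFFFFFFF
    raise ValueError("protection_type must be 1, 2, or 3")
```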
Inserted protection information consists of the computed CRC-16 in the Guard field, the LBAT field value
in the Application Tag, and the computed reference tag in the Reference Tag field.
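A minimal Python sketch of building inserted protection information is shown here. It assumes that the Guard CRC-16 uses the T10 DIF polynomial 8BB7h with an initial value of 0 and no bit reflection; that parameterization is not stated in this section and should be confirmed against the CRC definition referenced elsewhere in this specification. The function names are hypothetical.

```python
import struct

def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0 (assumed parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def build_inserted_pi(block_data: bytes, lbat: int, reference_tag: int) -> bytes:
    """Guard = CRC-16 of the logical block data, Application Tag = LBAT,
    Reference Tag = computed reference tag supplied by the caller."""
    return struct.pack(">HHI", crc16_t10dif(block_data), lbat, reference_tag)
```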
The number of power states implemented by a controller is returned in the Number of Power States
Supported (NPSS) field in the Identify Controller data structure. A controller shall support at least one
power state and may optionally support up to a total of 32 power states. Power states are contiguously
numbered starting with zero such that each subsequent power state consumes less than or equal to the
maximum power consumed in the previous state. Thus, power state zero indicates the maximum power
that the NVM subsystem is capable of consuming.
Associated with each power state is a Power State Descriptor in the Identify Controller data structure (refer
to Figure 113). The descriptors for all implemented power states may be viewed as forming a table as
shown in Figure 262 for a controller with seven implemented power states. Note that Figure 262 is
illustrative and does not include all fields in the power state descriptor. The Maximum Power (MP) field
indicates the maximum power that may be consumed in that state. Refer to the appropriate form factor
specification for power measurement methodologies for that form factor. The controller may employ
autonomous power management techniques to reduce power consumption below this level, but under no
circumstances is power allowed to exceed this level.
Figure 262: Example Power State Descriptor Table
Power   Maximum   Entry      Exit       Relative Read   Relative Read   Relative Write   Relative Write
State   Power     Latency    Latency    Throughput      Latency         Throughput       Latency
        (MP)      (ENTLAT)   (EXLAT)    (RRT)           (RRL)           (RWT)            (RWL)
0       25 W      5 µs       5 µs       0               0               0                0
1       18 W      5 µs       7 µs       0               0               1                0
2       18 W      5 µs       8 µs       1               0               0                0
3       15 W      20 µs      15 µs      2               0               2                0
4       10 W      20 µs      30 µs      1               1               3                0
5       8 W       50 µs      50 µs      2               2               4                0
6       5 W       20 µs      5000 µs    4               3               5                1
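As an illustration, the example descriptors of Figure 262 can be held in a simple Python list and checked against the ordering rule for power states (each subsequent state consumes less than or equal to the maximum power of the previous state). The tuple layout and the function name are hypothetical and do not reflect the binary descriptor format.

```python
# One tuple per power state, taken from Figure 262:
# (MP in watts, ENTLAT in us, EXLAT in us, RRT, RRL, RWT, RWL)
FIGURE_262 = [
    (25, 5, 5, 0, 0, 0, 0),
    (18, 5, 7, 0, 0, 1, 0),
    (18, 5, 8, 1, 0, 0, 0),
    (15, 20, 15, 2, 0, 2, 0),
    (10, 20, 30, 1, 1, 3, 0),
    (8, 50, 50, 2, 2, 4, 0),
    (5, 20, 5000, 4, 3, 5, 1),
]

def is_valid_power_state_table(table) -> bool:
    """Check the state count (1 to 32) and that MP never increases."""
    if not 1 <= len(table) <= 32:
        return False
    return all(prev[0] >= nxt[0] for prev, nxt in zip(table, table[1:]))
```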
The Idle Power (IDLP) field indicates the typical power consumed by the NVM subsystem over 30 seconds
in the power state when idle (i.e., there are no pending commands, register accesses, background
processes, or device self-test operations). The measurement starts after the NVM subsystem has been
idle for 10 seconds.
The Active Power (ACTP) field indicates the largest average power of the NVM subsystem over a 10 second
window on a particular workload (refer to section 8.4.3). Active Power measurement starts when the first
command is submitted and ends when the last command is completed. The largest average power over a
10 second window, consumed by the NVM subsystem in that state is reported in the Active Power field. If
the workload completes faster than 10 seconds, the average active power should be measured over the
period of the workload. Non-operational states shall set Active Power Scale, Active Power Workload, and
Active Power fields to 0h.
The host may dynamically modify the power state using the Set Features command and determine the
current power state using the Get Features command. The host may directly transition between any two
supported power states. The Entry Latency (ENTLAT) field in the power management descriptor indicates
the maximum amount of time in microseconds that it takes to enter that power state and the Exit Latency
(EXLAT) field indicates the maximum amount of time in microseconds that it takes to exit that state.
The maximum amount of time to transition between any two power states is equal to the sum of the old
state’s exit latency and the new state’s entry latency. The host is not required to wait for a previously
submitted power state transition to complete before initiating a new transition. The maximum amount of
time for a sequence of power state transitions to complete is equal to the sum of transition times for each
individual power state transition in the sequence.
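The transition time rule can be worked through with the example latencies of Figure 262. In this Python sketch the list and function names are hypothetical; the values are the ENTLAT and EXLAT columns of that figure in microseconds.

```python
# Entry and exit latencies in microseconds for power states 0 to 6 (Figure 262).
ENTLAT_US = [5, 5, 5, 20, 20, 50, 20]
EXLAT_US = [5, 7, 8, 15, 30, 50, 5000]

def max_transition_us(old_state: int, new_state: int) -> int:
    """Old state's exit latency plus new state's entry latency."""
    return EXLAT_US[old_state] + ENTLAT_US[new_state]

def max_sequence_us(states) -> int:
    """Sum of the transition times for each consecutive pair of states."""
    return sum(max_transition_us(a, b) for a, b in zip(states, states[1:]))
```

For example, leaving power state 6 for power state 0 may take up to 5000 + 5 = 5005 µs, while the sequence 0, 3, 6 may take up to (5 + 20) + (15 + 20) = 60 µs.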
Associated with each power state descriptor are Relative Read Throughput (RRT), Relative Write
Throughput (RWT), Relative Read Latency (RRL) and Relative Write Latency (RWL) fields that provide the
host with an indication of relative performance in that power state. Relative performance values provide an
ordering of performance characteristics between power states. Relative performance values may repeat,
may be skipped, and may be assigned in any order (i.e., increasing power states need not have increasing
relative performance values).
A lower relative performance value indicates better performance (e.g., higher throughput or lower latency).
For example, in Figure 262 power state 1 has higher read throughput than power state 2, and power states
0 through 3 all have the same read latency. Relative performance ordering is only with respect to a single
performance characteristic. Thus, although the relative read throughput value of one power state may
equal the relative write throughput value of another power state, this does not imply that the actual read
and write performance of these two power states are equal.
The default NVM Express power state is implementation specific and shall correspond to a state that does
not consume more power than the lowest value specified in the form factor specification used by the PCI
Express SSD. The host shall never select a power state that consumes more power than the PCI Express
slot power limit control value expressed by the Captured Slot Power Limit Value (CSPLV) and Captured
Slot Power Limit Scale (CSPLS) fields of the PCI Express Device Capabilities (PXDCAP) register. Hosts
that do not dynamically manage power should set the power state to the lowest numbered state that
satisfies the PCI Express slot power limit control value.
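A host that does not dynamically manage power could select its power state along the lines of the following Python sketch. The names are hypothetical, the Maximum Power values are those of Figure 262, and the slot power limit is assumed to have already been decoded into watts from the CSPLV and CSPLS fields.

```python
# Maximum Power (MP) in watts for power states 0 to 6 (Figure 262).
MP_WATTS = [25, 18, 18, 15, 10, 8, 5]

def lowest_numbered_state_within(limit_watts: float, mp_watts=MP_WATTS):
    """Return the lowest numbered power state whose MP does not exceed the
    slot power limit, or None if no state satisfies the limit."""
    for state, max_power in enumerate(mp_watts):
        if max_power <= limit_watts:
            return state
    return None
```

With a 20 W slot power limit this sketch selects power state 1; with a 12 W limit it selects power state 4.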
If a controller implements the PCI Express Dynamic Power Allocation (DPA) capability and it is enabled
(i.e., the Substate Control Enable bit is set), then the maximum power that may be consumed by the NVM
subsystem is equal to the value specified by the DPA substate or the value specified by the NVM Express
power state, whichever is lower.
The power state to transition to shall be a non-operational power state (a non-operational power state may
autonomously transition to another non-operational power state). If an operational power state is specified
then the controller should abort the command with a status of Invalid Field in Command. Refer to section
8.4.1 for more details.
between active power states or perform vendor specific thermal management actions in order to attempt to
meet thermal management requirements specified by the host. If active power state transitions are used
to attempt to meet these thermal management requirements, then those active power state transitions
are vendor specific.
The host specifies and enables the thermal management requirements by setting the Thermal Management
Temperature 1 field and/or Thermal Management Temperature 2 field (refer to section 5.21.1.16) in a Set
Features command to a non-zero value. The supported range of values for the Thermal Management
Temperature 1 field and Thermal Management Temperature 2 field are indicated in the Identify Controller
data structure in Figure 109.
The Thermal Management Temperature 1 specifies that if the Composite Temperature (refer to Figure 94)
is:
a) at or above this value; and
b) less than the Thermal Management Temperature 2, if non-zero,
then the controller should start transitioning to lower power active power states or perform vendor specific
thermal management actions while minimizing the impact on performance in order to attempt to reduce the
Composite Temperature (e.g., transition to an active power state that performs light throttling).
The Thermal Management Temperature 2 field specifies that if the Composite Temperature is at or above
this value, then the controller shall start transitioning to lower power active power states or perform vendor
specific thermal management actions regardless of the impact on performance in order to attempt to reduce
the Composite Temperature (e.g., transition to an active power state that performs heavy throttling).
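The two thresholds can be summarized as a simple decision function. This Python sketch is illustrative only: the function name and the returned labels are hypothetical, all temperatures are assumed to be in the same unit, and a threshold of zero is treated as not enabled. The actual thermal management actions and the hysteresis are vendor specific.

```python
def hctm_action(composite: int, tmt1: int, tmt2: int) -> str:
    """Classify the thermal management level for a Composite Temperature.

    Returns "heavy" at or above TMT2, "light" at or above TMT1 (and below
    TMT2 if TMT2 is non-zero), and "none" otherwise.
    """
    if tmt2 != 0 and composite >= tmt2:
        return "heavy"   # act regardless of the impact on performance
    if tmt1 != 0 and composite >= tmt1:
        return "light"   # act while minimizing the impact on performance
    return "none"
```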
If the controller is currently in a lower power active power state or performing vendor specific thermal
management actions because of this feature (e.g., throttling performance) because the Composite
Temperature:
a) is at or above the current value of the Thermal Management Temperature 1 field; and
b) is below the current value of the Thermal Management Temperature 2 field;
and the Composite Temperature decreases to a value below the current value of the Thermal Management
Temperature 1 field, then the controller should return to the active power state that the controller was in
prior to going to a lower power active power state, or stop performing the vendor specific thermal
management actions that it was performing because of this feature, the Composite Temperature, and the
current value of the Thermal Management Temperature 1 field.
If the controller is currently in a lower power active power state or performing vendor specific thermal
management actions because the Composite Temperature is at or above the current value of the Thermal
Management Temperature 2 field and the Composite Temperature decreases to below the current value
of the Thermal Management Temperature 1 field, then the controller should return to the active power state
that the controller was in prior to going to a lower power active power state, or stop performing the vendor
specific thermal management actions that it was performing because of this feature and the Composite
Temperature.
The temperature at which the controller stops being in a lower power active power state or performing
vendor specific thermal management actions because of this feature is vendor specific (i.e., hysteresis is
vendor specific).
Figure 264 shows examples of how the Composite Temperature may be affected by this feature.
[Figure 264: plot of the Composite Temperature against the TMT1 and TMT2 thresholds; labels: No Thermal Management, Vendor Specific]
Note: Since the host controlled thermal management (HCTM) feature uses the Composite
Temperature, the actual interactions between a platform (e.g., tablet or laptop) and two different device
implementations may vary even with the same Thermal Management Temperature 1 and Thermal
Management Temperature 2 temperature settings. The use of this feature requires validation between
those device implementations and the platform in order to be used effectively.
allocation of Flexible Resources may be modified using the Virtualization Management command and the
change takes effect after a Controller Level Reset. A secondary controller only supports having Flexible
Resources assigned or removed when it is in the Offline state.
Private Resources are controller resources that are permanently assigned to a primary or secondary
controller. These resources are not supported by the Virtualization Management command.
The primary controller is allowed to have a mix of Private and Flexible Resources for a particular controller
resource type. If there is a mix, then the Private Resources occupy the lower contiguous range of resource
identifiers starting with 0. Secondary controllers shall have all Private or all Flexible Resources for a
particular resource type. Controller resources assigned to a secondary controller always occupy a
contiguous range of identifiers with no gaps, starting with 0. If a particular controller resource type is
supported as indicated in the Controller Resource Types field of the Primary Controller Capabilities
Structure, then all secondary controllers shall have that controller resource type assigned as a Flexible
Resource. Figure 265 shows the controller resource allocation model for a controller resource type that is
assignable as a Flexible Resource.
For each controller resource type supported, the Primary Controller Capabilities Structure defines:
The total number of Flexible Resources;
The total number of Private Resources for the primary controller;
The maximum number of Flexible Resources that may be allocated to the primary controller using
the Virtualization Management command;
The maximum number of Flexible Resources that may be assigned to a secondary controller using
the Virtualization Management command; and
A primary controller may be allocated VQ Resources using the Primary Controller Flexible Allocation action
of the Virtualization Management command. The VQ resources allocated take effect after a Controller Level
Reset and are persistent across power cycles and resets. The number of VQ Resources currently allocated
is discoverable in the Primary Controller Capabilities Structure. The number of VQ Resources currently
allocated may also be discovered using the Get Features command with the Number of Queues Feature
identifier (refer to section 5.21.1.7).
Offline: The secondary controller may not be used by a host. CSTS.CFS shall be set to ‘1’.
Controller registers other than CSTS are undefined in this state.
The host may request a transition to the Online or Offline state using the Virtualization Management
command. When a secondary controller transitions from the Online state to the Offline state all Flexible
Resources are removed from the secondary controller.
To ensure that the host accurately detects capabilities of the secondary controller, the host should complete
the following procedure to bring a secondary controller Online:
1. Use the Virtualization Management command to set the secondary controller to the Offline state.
2. Use the Virtualization Management command to assign VQ resources and VI resources.
3. Perform a Controller Level Reset. If the secondary controller is a VF, then this should be a VF
Function Level Reset.
4. Use the Virtualization Management command to set the secondary controller to the Online state.
If VI Resources are supported, then following this process ensures the MSI-X Table size indicated by
MSIXCAP.MXC.TS is updated to reflect the appropriate number of VI Resources before the transition to
the Online state.
A primary controller or secondary controller is enabled when CC.EN and CSTS.RDY are both set to ‘1’ for
that controller. A secondary controller may only be enabled when it is in the Online state. If the primary
controller associated with a secondary controller is disabled or undergoes a Controller Level Reset, then
the secondary controller shall transition to the Offline state implicitly.
Resources shall only be assigned to a secondary controller when it is in the Offline state. If the minimum
number of resources are not assigned to a secondary controller, then a request to transition to the Online
state shall fail for that secondary controller. For implementations that support SR-IOV, if VF Enable is
cleared to 0h or NumVFs specifies a value that does not enable the associated secondary controller then
the secondary controller shall transition to the Offline state implicitly.
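The Online and Offline rules for a secondary controller can be modeled with a small state machine. This Python sketch is a simplified illustration: the class, its method names, and the `min_resources` parameter are hypothetical, a single resource count stands in for VQ and VI Resources, and the Controller Level Reset step of the procedure is not modeled.

```python
class SecondaryController:
    """Simplified model of secondary controller state (hypothetical)."""

    def __init__(self, min_resources: int = 1):
        self.online = False
        self.resources = 0
        self.min_resources = min_resources  # assumed minimum, for illustration

    def assign_resources(self, count: int) -> None:
        # Resources shall only be assigned in the Offline state.
        if self.online:
            raise RuntimeError("secondary controller is not Offline")
        self.resources = count

    def set_online(self) -> bool:
        # The request fails if the minimum number of resources is not assigned.
        if self.resources < self.min_resources:
            return False
        self.online = True
        return True

    def set_offline(self) -> None:
        # An Online to Offline transition removes all Flexible Resources.
        if self.online:
            self.resources = 0
        self.online = False
```

The same `set_offline` transition would also apply implicitly when the associated primary controller is disabled or undergoes a Controller Level Reset.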
While the controller registers of a controller that is a VF are accessible only if SR-IOV Control.VF MSE is
set to ‘1’, clearing VF MSE from ‘1’ to ‘0’ does not cause a reset of that controller. In this case, controller
registers are hidden, but their values are not reset.
[Figure 266: hosts A, B, and C and their associated controllers within an NVM Subsystem, with a Namespace]
A reservation requires an association between a host and a namespace. As shown in Figure 266, each
controller in a multi-path I/O and namespace sharing environment is associated with exactly one host. While
it is possible to construct systems where two or more hosts share a single controller, such usage is outside
the scope of this specification.
A host may be associated with multiple controllers. In Figure 266 host A is associated with two controllers
while hosts B and C are each associated with a single controller. A host registers a Host Identifier
with each controller with which it is associated using a Set Features command prior to performing
any operations associated with reservations. The Host Identifier allows the NVM subsystem to identify
controllers associated with the same host and preserve reservation properties across these controllers (i.e.,
a host issued command has the same reservation rights no matter which controller associated with the host
processes the command).
Support for reservations by a namespace or controller is optional. A namespace indicates support for
reservations by reporting a non-zero value in the Reservation Capabilities (RESCAP) field in the Identify
Namespace data structure. A controller indicates support for reservations through the Optional NVM
Command Support (ONCS) field in the Identify Controller data structure. If a host submits a command
associated with reservations (i.e., Reservation Report, Reservation Register, Reservation Acquire, and
Reservation Release) to a controller or a namespace that do not both support reservations, then the
command is aborted by the controller with status Invalid Command Opcode.
Controllers that make up an NVM subsystem shall all have the same support for reservations. Although
strongly encouraged, namespaces that make up an NVM subsystem are not all required to have the same
support for reservations. For example, some namespaces within a single controller may support
reservations while others do not, or the supported reservation types may differ among namespaces. If a
controller supports reservations, then the controller shall:
Indicate support for reservations by returning a '1' in bit 5 of the Optional NVM Command Support
(ONCS) field in the Identify Controller data structure;
Support the Reservation Report command, Reservation Register command, Reservation Acquire
command, and Reservation Release command;
Support the Reservation Notification log page;
Support the Reservation Log Page Available asynchronous events;
Support the Reservation Notification Mask Feature;
Support the Host Identifier Feature; and
8.8.2 Registering
Prior to establishing a reservation on a namespace, a host shall become a registrant of that namespace by
registering a reservation key. This reservation key may be used as a means of identifying the registrant
(host), authenticating the registrant, and preempting a failed or uncooperative registrant. The value of the
reservation key used by a host and the method used to select its value is outside the scope of this
specification.
Registering a reservation key with a namespace creates an association between a host and a namespace.
A host that is a registrant of a namespace may use any controller with which it is associated (i.e., that has
the same Host Identifier, refer to section 5.21.1.19) to access that namespace as a registrant. Thus, a host
need only register on a single controller in order to become a registrant of the namespace on all controllers
in the NVM subsystem that have access to the namespace and are associated with the host.
A host registers a reservation key by executing a Reservation Register command on the namespace with
the Reservation Register Action (RREGA) field set to 000b (i.e., Register Reservation Key) and supplying
a reservation key in the New Reservation Key (NRKEY) field.
A host that is a registrant of a namespace may register the same reservation key value multiple times with
the namespace on the same or different controllers. It is an error for a host that is already a registrant of a
namespace to register with the same namespace using a different reservation key value (i.e., the command
is aborted with status Reservation Conflict). There are no restrictions on the reservation key value used by
hosts with different Host Identifiers. For example, multiple hosts may all register with the same reservation
key value.
A host that is a registrant of a namespace may replace its existing reservation key by executing a
Reservation Register command on the namespace with the RREGA field set to 010b (i.e., Replace
Reservation Key), supplying the current reservation key in the Current Reservation Key (CRKEY) field, and
the new reservation key in the NRKEY field. If the contents of the CRKEY field do not match the key
currently associated with the host, then the command is aborted with a status of Reservation Conflict. A
host may replace its reservation key without regard to its registration status or current reservation key value
by setting the Ignore Existing Key (IEKEY) bit to '1' in the Reservation Register command. Replacing a
reservation key has no effect on any reservation that may be held on the namespace.
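The register and replace rules can be sketched as a per-namespace table keyed by Host Identifier. This Python model is illustrative only: the class, method, and status names are hypothetical, and treating a host with no registered key as a CRKEY mismatch is an assumption of the sketch.

```python
SUCCESS = "Success"
RESERVATION_CONFLICT = "Reservation Conflict"

class NamespaceRegistrations:
    """Reservation keys registered on one namespace (hypothetical model)."""

    def __init__(self):
        self.keys = {}  # Host Identifier -> reservation key

    def register(self, host_id, nrkey) -> str:
        """RREGA = 000b (Register Reservation Key)."""
        current = self.keys.get(host_id)
        if current is not None and current != nrkey:
            return RESERVATION_CONFLICT  # already registered with another key
        self.keys[host_id] = nrkey       # same key may be registered again
        return SUCCESS

    def replace(self, host_id, crkey, nrkey, iekey: bool = False) -> str:
        """RREGA = 010b (Replace Reservation Key)."""
        if not iekey and self.keys.get(host_id) != crkey:
            return RESERVATION_CONFLICT  # CRKEY mismatch (assumed for non-registrants too)
        self.keys[host_id] = nrkey
        return SUCCESS
```

Because the table is keyed by Host Identifier rather than by controller, a registration made through one controller applies to every controller associated with the same host.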
                                       Reservation Holder   Registrant      Non-Registrant
Reservation Type                       Read   Write         Read   Write    Read   Write      Reservation Holder Definition
Write Exclusive                        Y      Y             Y      N        Y      N          One Reservation Holder
Exclusive Access                       Y      Y             N      N        N      N          One Reservation Holder
Write Exclusive - Registrants Only     Y      Y             Y      Y        Y      N          One Reservation Holder
Exclusive Access - Registrants Only    Y      Y             Y      Y        N      N          One Reservation Holder
Write Exclusive - All Registrants      Y      Y             Y      Y        Y      N          All Registrants are Reservation Holders
Exclusive Access - All Registrants     Y      Y             Y      Y        N      N          All Registrants are Reservation Holders
The differences between these reservation types are: the type of access that is excluded (i.e., writes or all
accesses), whether registrants have the same access rights as the reservation holder, and whether
registrants are also considered to be reservation holders. These differences are summarized in Figure 267
and the specific behavior for each NVM Express command is shown in Figure 268.
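The read and write access rights summarized in Figure 267 can be expressed as a lookup table. In this Python sketch the dictionary layout, the role names, and the function name are hypothetical; each entry lists the operations ("R" for read, "W" for write) allowed to that role.

```python
# Allowed operations per reservation type and role, from Figure 267.
ACCESS = {
    "Write Exclusive":
        {"holder": "RW", "registrant": "R", "non-registrant": "R"},
    "Exclusive Access":
        {"holder": "RW", "registrant": "", "non-registrant": ""},
    "Write Exclusive - Registrants Only":
        {"holder": "RW", "registrant": "RW", "non-registrant": "R"},
    "Exclusive Access - Registrants Only":
        {"holder": "RW", "registrant": "RW", "non-registrant": ""},
    "Write Exclusive - All Registrants":
        {"holder": "RW", "registrant": "RW", "non-registrant": "R"},
    "Exclusive Access - All Registrants":
        {"holder": "RW", "registrant": "RW", "non-registrant": ""},
}

def is_allowed(reservation_type: str, role: str, operation: str) -> bool:
    """operation is "R" (read) or "W" (write)."""
    return operation in ACCESS[reservation_type][role]
```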
Reservations and registrations persist across all Controller Level Resets and all NVM Subsystem Resets
except reset due to power loss. A reservation may be optionally configured to be retained across a reset
due to power loss using the Persist Through Power Loss State (PTPLS). A Persist Through Power Loss
State (PTPLS) is associated with each namespace that supports reservations and may be modified as a
side effect of a Reservation Register command or a Set Features command.
Each pair of columns gives the behavior for a Non-Registrant (NR) and a Registrant (R) under one reservation:
  WE   = Write Exclusive
  EA   = Exclusive Access
  WE-R = Write Exclusive - Registrants Only or Write Exclusive - All Registrants
  EA-R = Exclusive Access - Registrants Only or Exclusive Access - All Registrants

                                             WE        EA        WE-R      EA-R
NVMe Command                                 NR   R    NR   R    NR   R    NR   R
NVM Read Command Group:                      A    A    C    C    A    A    C    A
  Read
  Compare
  Security Receive (Admin)
NVM Write Command Group:                     C    C    C    C    C    A    C    A
  Write
  Write Uncorrectable
  Write Zeroes
  Dataset Management
  Flush
  Format NVM (Admin)
  Namespace Attachment (Admin)
  Namespace Management (Admin)
  Sanitize (Admin)
  Security Send (Admin)
Reservation Acquire - Acquire                C    C    C    C    C    C    C    C
Reservation Release
Reservation Acquire - Preempt                C    A    C    A    C    A    C    A
Reservation Acquire - Preempt and Abort
All other commands (note 1)                  A    A    A    A    A    A    A    A
Key:
A = Allowed, command processed normally by the controller
C = Conflict, command aborted by the controller with status Reservation Conflict
Notes:
1. The behavior of a vendor specific command is vendor specific.
8.8.4 Unregistering
A host that is a registrant of a namespace may unregister with the namespace by executing a Reservation
Register command on the namespace with the RREGA field set to 001b (i.e., Unregister Reservation Key)
and supplying its current reservation key in the CRKEY field. If the contents of the CRKEY field do not
match the key currently associated with the host, then the command is aborted with a status of Reservation
Conflict. If the host is not a registrant, then the command is aborted with a status of Reservation Conflict.
Successful completion of an unregister operation causes the host to no longer be a registrant of that
namespace. A host may unregister without regard to its current reservation key value by setting the IEKEY
bit to '1' in the Reservation Register command.
Unregistering by a host may cause a reservation held by the host to be released. If a host is the last
remaining reservation holder (i.e., the reservation type is Write Exclusive - All Registrants or Exclusive
Access - All Registrants) or is the only reservation holder, then the reservation is released when the host
unregisters.
If a reservation is released and the type of the released reservation was Write Exclusive - Registrants Only
or Exclusive Access - Registrants Only, then a reservation released notification occurs on all controllers
associated with a registered host other than the host that issued the Reservation Register command.
Acquire Action (RACQA) field to 000b (Acquire), and supplying the current reservation key associated with
the host in the Current Reservation Key (CRKEY) field. The CRKEY value shall match that used by the
registrant to register with the namespace. If the key value does not match, then the command is aborted
with status Reservation Conflict. If the host is not a registrant, then the command is aborted with a status
of Reservation Conflict.
Only one reservation is allowed at a time on a namespace. If a registrant attempts to obtain a reservation
on a namespace that already has a reservation holder, then the command is aborted with status
Reservation Conflict. If a reservation holder attempts to obtain a reservation of a different type on a
namespace for which it already is the reservation holder, then the command is aborted with status
Reservation Conflict. It is not an error if a reservation holder attempts to obtain a reservation of the same
type on a namespace for which it already is the reservation holder. A reservation holder may preempt a
reservation to change the reservation type.
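The acquire rules can be sketched as follows. This Python model is illustrative only: the class, method, and status names are hypothetical, reservation types are represented by their names, and preemption is not modeled.

```python
SUCCESS = "Success"
RESERVATION_CONFLICT = "Reservation Conflict"

class ReservedNamespace:
    """Acquire behavior for one namespace (hypothetical model)."""

    def __init__(self, keys):
        self.keys = dict(keys)  # Host Identifier -> reservation key
        self.holder = None      # host that acquired the reservation
        self.rtype = None       # type of the current reservation, if any

    def _is_holder(self, host_id) -> bool:
        if self.rtype is None:
            return False
        if self.rtype.endswith("All Registrants"):
            return host_id in self.keys  # every registrant is a holder
        return host_id == self.holder

    def acquire(self, host_id, crkey, rtype) -> str:
        """RACQA = 000b (Acquire)."""
        if host_id not in self.keys or self.keys[host_id] != crkey:
            return RESERVATION_CONFLICT  # not a registrant, or key mismatch
        if self.rtype is None:
            self.holder, self.rtype = host_id, rtype
            return SUCCESS
        if self._is_holder(host_id) and rtype == self.rtype:
            return SUCCESS               # same type by a holder is not an error
        return RESERVATION_CONFLICT      # only one reservation at a time
```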
host that issued the command are unregistered, the reservation is released, and a new reservation is
created for the host of the type specified by the Reservation Type (RTYPE) field in the command. If the
PRKEY value is non-zero, then registrants whose reservation key matches the value of the PRKEY field
are unregistered. If the PRKEY value is non-zero and there are no registrants whose reservation key
matches the value of the PRKEY field, the controller should return an error of Reservation Conflict.
If there is no reservation held on the namespace, then execution of the command causes registrants whose
reservation key matches the value of the PRKEY field to be unregistered.
A reservation holder may preempt itself using the above mechanism. When a host preempts itself the
following occurs as an atomic operation: registration of the host is maintained, the reservation is released,
and a new reservation is created for the host of the type specified by the RTYPE field.
A host may abort commands as a side effect of preempting a reservation by executing a Reservation
Acquire command and setting the RACQA field to 010b (Preempt and Abort). The behavior of such a
command is exactly the same as that described above with the RACQA field set to 001b (Preempt), with
two exceptions:
After the atomic operation changes namespace reservation and registration state, all controllers
associated with any host whose reservation or registration is preempted by that atomic operation
are requested to abort all commands being processed that target the namespace specified in the
Namespace Identifier field (CDW1.NSID of the Reservation Acquire command) (refer to section
4.11 for the definition of “being processed”); and
Completion of the Reservation Acquire command shall not occur until all commands that are
requested to be aborted are completed, regardless of whether or not each command is actually
aborted.
As with the Abort Admin command, abort as a side effect of preempting a reservation is best effort: a
command that is requested to be aborted may be at a point in execution where it can no longer be
aborted, or may have already completed, when the Reservation Acquire or Abort Admin command is
submitted. Although prompt execution of abort requests reduces delay in completing the Reservation
Acquire command, a command which is requested to be aborted shall either be aborted or otherwise
completed before the completion of the Reservation Acquire command.
When a registrant is unregistered as a result of actions described in this section, then a registration
preempted notification occurs on all controllers associated with a host that was unregistered other than the
host that issued the Reservation Acquire command.
When the type of reservation held on a namespace changes as a result of actions described in this section,
then a reservation released notification occurs on all controllers associated with hosts that remain
registrants of the namespace except the host that issued the Reservation Acquire command.
Value | Response Message Type | Description | Requires Data | RPMB Frame Length (bytes)
0100h | Authentication key programming response | Returned as a result of the host requesting a Result read request Message Type after programming the Authentication Key | No | 256
0200h | Reading of the Write Counter value response | Returned as a result of the host requesting a Result read request Message Type after requesting the Write Counter value | No | 256
0300h | Authenticated data write response | Returned as a result of the host requesting a Result read request Message Type after attempting to write data to an RPMB target | No | 256
0400h | Authenticated data read response | Returned as a result of the host requesting a Result read request Message Type after attempting to read data from an RPMB target | Yes | M + 256
0600h | Authenticated Device Configuration data write response | Returned as a result of the host requesting a Result read request Message Type after attempting to write a Device Configuration Block to an RPMB target | No | 256
0700h | Authenticated Device Configuration data read response | Returned as a result of the host requesting a Result read request Message Type after attempting to read a Device Configuration Block (DCB) from an RPMB target | Yes | 512 + 256
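The frame lengths listed above follow a simple rule: 256 bytes, plus the Data field where one is required. The Python sketch below illustrates that rule only. The function and constant names are hypothetical, and M is assumed to be the number of data bytes returned by an Authenticated Data Read.

```python
RPMB_FRAME_BASE_BYTES = 256   # frame size without the Data field
DCB_DATA_BYTES = 512          # Data field size for a Device Configuration Block read


def rpmb_response_frame_length(message_type: int, data_bytes: int = 0) -> int:
    """Return the expected RPMB response frame length in bytes.

    data_bytes is M, used only for the Authenticated data read response (0400h).
    """
    if message_type in (0x0100, 0x0200, 0x0300, 0x0600):
        return RPMB_FRAME_BASE_BYTES                      # no Data field
    if message_type == 0x0400:
        return data_bytes + RPMB_FRAME_BASE_BYTES         # M + 256
    if message_type == 0x0700:
        return DCB_DATA_BYTES + RPMB_FRAME_BASE_BYTES     # 512 + 256
    raise ValueError(f"unknown Response Message Type {message_type:#06x}")
```

A host could use such a helper to size the buffer it passes to a Security Receive command.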
The operation result defined in Figure 271 indicates whether an RPMB request was successful or not.
Value (bits 06:00) | Description
00h | Operation successful
01h | General failure
02h | Authentication failure (MAC comparison not matching, MAC calculation failure)
03h | Counter failure (counters not matching in comparison, counter incrementing failure)
04h | Address failure (address out of range, wrong address alignment)
05h | Write failure (data/counter/result write failure)
06h | Read failure (data/counter/result read failure)
07h | Authentication Key not yet programmed. This value is the only valid Result value until the Authentication Key has been programmed. Once the key is programmed, this Result value shall no longer be used.
08h | Invalid RPMB Device Configuration Block; this may be used when the target is not 0.
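As an illustration, host software might map the operation result values above to readable text when logging. The following Python sketch is not part of this specification; the names `RPMB_OPERATION_RESULT`, `describe_rpmb_result`, and `rpmb_succeeded` are hypothetical, and the descriptions are abbreviated from the table.

```python
# Operation result values copied from the table above (descriptions shortened).
RPMB_OPERATION_RESULT = {
    0x00: "Operation successful",
    0x01: "General failure",
    0x02: "Authentication failure",
    0x03: "Counter failure",
    0x04: "Address failure",
    0x05: "Write failure",
    0x06: "Read failure",
    0x07: "Authentication Key not yet programmed",
    0x08: "Invalid RPMB Device Configuration Block",
}


def describe_rpmb_result(value: int) -> str:
    """Return a readable description for an operation result value."""
    return RPMB_OPERATION_RESULT.get(value, f"Unknown result {value:#04x}")


def rpmb_succeeded(value: int) -> bool:
    """True only for 00h (Operation successful)."""
    return value == 0x00
```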
Figure 272 defines the non-volatile contents stored within the controller for each RPMB target.
Each RPMB Data Frame is 256 bytes in size plus the size of the Data field, and is organized as shown in
Figure 273. RPMB uses a sector size of 512 bytes. The RPMB sector size is independent and not related
to the logical block size used for the namespace(s).
Security Send and Security Receive commands are used to encapsulate and deliver data packets of any
security protocol between the host and controller without interpreting, dis-assembling or re-assembling the
data packets for delivery. Security Send and Security Receive commands used for RPMB access are
populated with the RPMB Data Frame(s) defined in Figure 273. The controller shall not return successful
completion of a Security Send or Security Receive command for RPMB access until the requested RPMB
Request/Response Message Type indicated is completed. The Security Protocol used for RPMB is defined
in section 5.25.3.
The host sends a Response Message Type to the controller to read the result of a previous operation
request, to read the Write Counter, or to read data from the RPMB memory block. To deliver a Response
Message Type, the host uses the Security Receive command. If the amount of data to be read from the
controller exceeds the amount reported in the Identify Controller data structure, the host sends multiple
Security Receive commands to transfer all of the data.
If the write fails then the returned result is 0005h (write failure). If another error occurs during the write
procedure then the returned result is 0001h (general failure).
The controller returns a successful completion for the Security Send command when the Authenticated
Data Write operation is completed regardless of whether the Authenticated Data Write was successful or
not.
The host should check whether the data was programmed successfully by reading the result register of the
RPMB.
1) The host initiates the Authenticated Data Write verification process by issuing a Security Send
command with delivery of a RPMB data frame containing the Request Message Type = 0005h.
2) The controller returns a successful completion of the Security Send command when the
verification result is ready for retrieval.
3) The host should then retrieve the verification result by issuing a Security Receive command.
4) The controller returns a successful completion of the Security Receive command and returns the
RPMB data frame containing the Response Message Type = 0300h, the incremented counter
value, the data address, the MAC and result of the data programming operation.
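The four steps above can be sketched from the host's point of view. This is a minimal Python illustration under stated assumptions: `security_send` and `security_receive` are hypothetical callables standing in for the Security Send and Security Receive commands, `RpmbResponse` is a hypothetical container for the returned frame fields, and MAC verification is not modeled.

```python
from dataclasses import dataclass
from typing import Callable

RESULT_READ_REQUEST = 0x0005    # Request Message Type used in step 1
DATA_WRITE_RESPONSE = 0x0300    # Response Message Type returned in step 4
OPERATION_SUCCESSFUL = 0x00


@dataclass
class RpmbResponse:
    message_type: int
    write_counter: int
    address: int
    mac: bytes
    result: int


def verify_data_write(security_send: Callable[[int], None],
                      security_receive: Callable[[], RpmbResponse]) -> RpmbResponse:
    """Request and check the result of a previous Authenticated Data Write."""
    security_send(RESULT_READ_REQUEST)     # steps 1 and 2
    response = security_receive()          # steps 3 and 4
    if response.message_type != DATA_WRITE_RESPONSE:
        raise RuntimeError(f"unexpected response type {response.message_type:#06x}")
    if response.result != OPERATION_SUCCESSFUL:
        raise RuntimeError(f"data write failed with result {response.result:#06x}")
    return response
```

The caller would still need to verify the returned MAC and compare the incremented counter value against its own expectation.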
in the request, the Address, the Data, the controller calculated MAC, and the Result. Note: It is the
responsibility of the host to verify the MAC returned on an Authenticated Data Read Request.
If the data transfer from the addressed location in the controller fails, the returned Result is 0006h (read
failure). If the Address provided in the Security Send command is not valid, then the returned Result is
0004h (address failure). If another error occurs during the read procedure then the returned Result is 0001h
(general failure).
If the Data from the RPMB Device Configuration Block attempts to disable Boot Partition Protection, then
the controller sets the result to 08h (Invalid RPMB Device Configuration Block) and no data is written to the
RPMB Device Configuration Block.
If the MAC in the request and the calculated MAC are equal then the write request is authenticated. The
Data from the request is written to the RPMB Device Configuration Block.
If any other error occurs during the write procedure then the returned result is 0001h (general failure).
The controller returns a successful completion for the Security Send command when the Authenticated
Data Write operation is completed regardless of whether the Authenticated Device Configuration Block
Write was successful or not.
When the host receives a successful completion of the Security Send command from the controller, it
should send a Security Receive command to the controller to retrieve the data. The controller returns an
RPMB Data Frame with Response Message Type (0600h), the MAC, and the Result. All other fields are
cleared to 0h.
The Write Counter for the Device Configuration Block is independent of the Write Counter for RPMB target
0. Authenticated Device Configuration Block Writes do not affect the Write Counter for RPMB target 0 since
the data is not part of the RPMB data area. The current value of the Write Counter for the Device
Configuration Block may be read using an Authenticated Device Configuration Block Read (refer to section
8.10.4).
A device self-test operation is performed in the background allowing concurrent processing of some
commands and requiring suspension of the device self-test operation to process other commands. Which
commands may be processed concurrently versus require suspension of the device self-test operation is
vendor specific.
If the controller receives any command that requires suspension of the device self-test operation to
process and complete, then the controller shall:
1) suspend the device self-test operation,
2) process and complete that command, and
3) resume the device self-test operation.
During a device self-test operation, the performance of the NVM subsystem may be degraded (e.g.,
controllers not performing the device self-test operation may also experience degraded performance).
An extended device self-test operation shall persist across any Controller Level Reset, and shall resume
after completion of the reset or after restoration of power, if any. The segment where the extended device
self-test operation resumes is vendor specific, but implementations should only have to repeat tests
within the last segment that was being tested prior to the reset.
To read the contents of a Boot Partition, the host allocates a Boot Partition Memory Buffer in host memory
for the controller to copy contents from a Boot Partition. The host initializes the Boot Partition Memory
Buffer Base Address. The host sets the Boot Partition ID, Boot Partition Read Size, and Boot Partition
Read Offset to initiate the Boot Partition read operation. The host may continue reading from the Boot
Partition until the entire Boot Partition has been read.
A portion of the Boot Partition may be read by the host any time the NVM subsystem is powered (i.e.,
whether or not CC.EN is set to ‘1’). The host shall not modify the PCI Express registers (described in
section 2), reset, or shutdown the controller while a Boot Partition read is in progress.
To read data from a Boot Partition, the host follows these steps:
1. Initialize the transport (e.g., PCIe link), if necessary.
2. Determine if Boot Partitions are supported by the controller (CAP.BPS).
3. Determine which Boot Partition is active (BPINFO.ABPID) and the size of the Boot Partition
(BPINFO.BPSZ).
4. Allocate a physically contiguous memory buffer in the host to store the contents of a Boot Partition.
5. Initialize the address (BPMBL.BMBBA) into the memory buffer where the contents should be
copied.
6. Initiate the transfer of data from a Boot Partition by writing to the Boot Partition Read Select
(BPRSEL) register. This includes setting the Boot Partition identifier (BPRSEL.BPID), the Boot
Partition Read Size (BPRSEL.BPRSZ), and the Boot Partition Read Offset (BPRSEL.BPROF). The
controller sets the Boot Read Status field (BPINFO.BRS) while transferring the Boot Partition
contents to indicate a Boot Partition read operation is in progress.
7. Wait for the controller to completely transfer the requested portion of the Boot Partition, indicated
in the status field (BPINFO.BRS). If BPINFO.BRS is set to 2h, the requested Boot Partition data
has been transferred to the Boot Partition Memory Buffer. If BPINFO.BRS is set to 3h, there was
an error transferring the requested Boot Partition data and the host may request the Boot Partition
data again.
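Steps 6 and 7 above amount to a start-then-poll loop with a retry on error. The Python sketch below illustrates that loop only. The `regs` object and its `start_read` and `read_status` methods are hypothetical stand-ins for register access, and the retry and poll limits are arbitrary assumptions rather than values from this specification.

```python
BRS_COMPLETED = 0x2   # requested data is in the Boot Partition Memory Buffer
BRS_ERROR = 0x3       # transfer failed; the host may request the data again


def read_boot_partition_portion(regs, bpid, size, offset,
                                max_attempts=3, max_polls=1000):
    """Run steps 6 and 7 for one portion; return the number of attempts used.

    regs is a hypothetical object with two methods:
      start_read(bpid, size, offset)  writes BPRSEL (step 6)
      read_status()                   returns the current BPINFO.BRS value
    """
    for attempt in range(1, max_attempts + 1):
        regs.start_read(bpid, size, offset)          # step 6
        for _ in range(max_polls):                   # step 7
            status = regs.read_status()
            if status == BRS_COMPLETED:
                return attempt
            if status == BRS_ERROR:
                break                                # request the same portion again
            # Any other value is treated as "not finished yet".
        else:
            raise TimeoutError("Boot Partition read did not finish")
    raise OSError("Boot Partition read failed on every attempt")
```

A real host would normally delay between polls instead of reading the status register in a tight loop.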
In constrained memory environments, the host may read the contents of a Boot Partition with a small Boot
Partition Memory Buffer by reading a small portion of a Boot Partition, moving the data out of the Boot
Partition Memory Buffer to another memory location, and then reading another portion of the Boot
Partition until the entire Boot Partition has been read.
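Reading a Boot Partition in portions reduces to planning a sequence of (offset, size) pairs that cover the partition. The Python sketch below shows one way to compute that plan. The function name is hypothetical, and both arguments are assumed to be expressed in the same unit as the Boot Partition Read Size and Boot Partition Read Offset fields.

```python
def plan_boot_partition_reads(partition_units: int, buffer_units: int):
    """Return a list of (offset, size) pairs that cover the whole Boot Partition.

    Each pair fits in a Boot Partition Memory Buffer of buffer_units.
    """
    if buffer_units <= 0:
        raise ValueError("buffer_units must be positive")
    reads = []
    offset = 0
    while offset < partition_units:
        size = min(buffer_units, partition_units - offset)
        reads.append((offset, size))
        offset += size
    return reads
```

After each planned read completes, the host would copy the buffer contents elsewhere before starting the next one.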
If an internal error, reset, or power loss condition occurs while committing the downloaded image to a Boot
Partition, the contents of the Boot Partition may contain the old contents, new contents, or a mixture of both.
Host software should verify the contents of a Boot Partition before marking it active to ensure the active
Boot Partition is stable.
Host software should not read the contents of a Boot Partition while writing to the Boot Partition. The
controller may return a combination of new and old data if the host attempts to perform a Boot Partition
read operation while overwriting the contents.
The default state for all Boot Partitions is the “Unlocked” state. In this state, host software may read and
write a Boot Partition.
All Boot Partitions remain unlocked until Boot Partition Protection is enabled by host software. Host software
enables Boot Partition Protection by setting the Boot Partition Protection Enable bit in the RPMB Device
Configuration Block data structure (refer to section 8.10). Once Boot Partition Protection is enabled, the
controller shall reject Authenticated Device Configuration Block Writes that disable Boot Partition Protection
(i.e., enabling Boot Partition Protection is permanent). Once Boot Partition Protection is enabled, Boot
Partitions may only be modified after unlocking the Boot Partition using RPMB.
After activating Boot Partition Protection, the default state for all Boot Partitions is the “Locked” state. In
this state, host software may read a Boot Partition. In this state, the controller rejects attempts to write to
a Boot Partition using the Firmware Commit command.
Each Boot Partition may be locked or unlocked independently using the corresponding bit in the Device
Configuration Block data structure.
set to ‘1’. The controller should complete the command quickly (e.g., in less than one second) to avoid a
user rebooting the system prior to completion of the data collection.
The controller notifies the host to collect controller-initiated data through the completion of an Asynchronous
Event Request command with an Asynchronous Event Type of Notice that indicates a Telemetry Log
Changed event. The host may also determine controller-initiated data is available via the Telemetry
Controller-Initiated Data Available field in the Telemetry Host-Initiated or the Telemetry Controller-Initiated
log pages. The host proceeds with a controller-initiated data collection by submitting the Get Log Page
command for the Telemetry Controller-Initiated log page. Once the host has started reading the Telemetry
Controller-Initiated log page, the controller should avoid modifying the controller-initiated data until the host
has finished reading all controller-initiated data.
Since there is only one set of controller-initiated data, the controller is responsible for prioritizing the version
of the controller-initiated data that is available for the host to collect. When the controller replaces the
controller-initiated data with new controller-initiated data it shall increment the Telemetry Controller-Initiated
Data Generation Number field. The host needs to ensure that the Telemetry Controller-Initiated Data
Generation Number field has not changed between the start and completion of the controller-initiated data
collection to ensure the data captured is consistent.
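The consistency check described above can be sketched as a read of the generation number before and after the data collection. The Python example below is illustrative only: `read_generation` and `read_data` are hypothetical callables standing in for Get Log Page accesses, and the retry limit is an arbitrary assumption.

```python
def collect_controller_initiated_data(read_generation, read_data, max_attempts=3):
    """Return controller-initiated data captured under a single generation number.

    read_generation() returns the Telemetry Controller-Initiated Data
    Generation Number; read_data() returns the collected data.
    """
    for _ in range(max_attempts):
        before = read_generation()
        data = read_data()
        after = read_generation()
        if before == after:
            return data          # the data was not replaced during collection
    raise RuntimeError("controller-initiated data kept changing during collection")
```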
When only the second data area is populated, then the Telemetry Host-Initiated log page has no data in
Telemetry Data Area 1 shown by having its corresponding last block value cleared to 0h, and no additional
data in Telemetry Data Area 3 shown by having its corresponding last block value set to the same value as
the last block value for Telemetry Data Area 2. As an example, the following values correspond to the layout
shown in Figure 284:
Telemetry Host-Initiated Data Area 1 Last Block = 0,
Telemetry Host-Initiated Data Area 2 Last Block = 1000,
Telemetry Host-Initiated Data Area 3 Last Block = 1000.
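The three last block values can be turned into per-area block ranges. The Python sketch below does this under one assumption that is not stated in this section: data blocks are numbered starting at 1, with block 0 holding the log page header. The function name is hypothetical.

```python
def telemetry_data_area_ranges(last1: int, last2: int, last3: int):
    """Return three ranges of block numbers for Telemetry Data Areas 1, 2, and 3.

    An area whose last block value does not advance past the previous
    area's last block contains no data and yields an empty range.
    """
    ranges = []
    previous_last = 0   # assumption: block 0 holds the log page header
    for last in (last1, last2, last3):
        if last > previous_last:
            ranges.append(range(previous_last + 1, last + 1))
            previous_last = last
        else:
            ranges.append(range(0))
    return ranges
```

With the example values above (0, 1000, 1000), only the second range is non-empty and it covers blocks 1 through 1000.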
The scope of a sanitize operation is all locations in the NVM subsystem that are able to contain user data,
including caches and unallocated or deallocated areas of the media. Sanitize operations do not affect the
Replay Protected Memory Block, boot partitions, or other media and caches that do not contain user data.
A sanitize operation also may alter log pages as necessary (e.g., to prevent derivation of user data from
log page information). Once a sanitize operation is started, it cannot be aborted and continues after a
Controller Level Reset including across power cycles.
The Sanitize command (refer to section 5.24) is used to start a sanitize operation or to recover from a
previously failed sanitize operation. All sanitize operations are performed in the background (i.e.,
completion of the Sanitize command does not indicate completion of the sanitize operation). The
completion of a sanitize operation is indicated in the Sanitize Status log page, and with the Sanitize
Operation Completed asynchronous event (if an Asynchronous Event Request Command is outstanding).
The Sanitize Capabilities field of the Identify Controller data structure indicates the sanitize operation
types supported.
The Overwrite sanitize operation is media specific and may not be appropriate for all media types. For
example, if the media is NAND, multiple pass overwrite operations may have an adverse effect on media
endurance.
To start a sanitize operation, the host submits a Sanitize command specifying one of the sanitize
operation types (i.e., Block Erase, Overwrite, or Crypto Erase). The host sets command parameters,
including the Allow Unrestricted Sanitize Exit bit and the No Deallocate After Sanitize bit, to the desired
values. After validating the Sanitize command parameters, the controller starts the sanitize operation in
the background, updates the Sanitize Status log page and then completes the Sanitize command with
Successful Completion status. If a Sanitize command is completed with any status other than Successful
Completion, then the controller shall not start the sanitize operation and shall not update the Sanitize
Status log page. The controller ignores Critical Warning(s) in the SMART / Health Information log page
(e.g., read only mode) and attempts to complete the sanitize operation requested. While a sanitize
operation is in progress, all controllers shall abort any commands not listed in Figure 287 with a status of
Sanitize In Progress (refer to section 8.15.1).
The user data values that result from a successful sanitize operation are specified in Figure 286. If the
controller deallocates user data after successful completion of a sanitize operation, then values read from
deallocated logical blocks are described in section 6.7.1.1. The host may specify that sanitized logical
blocks not be deallocated by setting the No Deallocate After Sanitize bit to ‘1’ in the Sanitize command.
The Sanitize Status log page (refer to section 5.14.1.9.2) contains estimated times for sanitize operations
and a consistent snapshot of information about the most recently started sanitize operation, including
whether a sanitize operation is in progress, the sanitize operation parameters and the status of the most
recent sanitize operation. If a sanitize operation is not in progress, then the Global Data Erased bit in the
log page indicates whether the NVM subsystem may contain any user data (i.e., has not been written to
since the most recent successful sanitize operation).
The Sanitize Status log page should be updated periodically during a sanitize operation to make progress
information available to hosts.
During a sanitize operation, the host may periodically examine the Sanitize Status log page to check for
progress; however, the host should limit this polling (e.g., to at most once every several minutes) to avoid
interfering with the progress of the sanitize operation itself.
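A rate-limited polling loop of the kind described above might look like the following Python sketch. The callable `read_in_progress` is a hypothetical stand-in for reading the Sanitize Status log page, and the five-minute default interval and the poll limit are arbitrary assumptions.

```python
import time


def wait_for_sanitize(read_in_progress, poll_interval_s=300.0, max_polls=100,
                      sleep=time.sleep):
    """Poll until the sanitize operation is no longer in progress.

    Returns the number of log page reads performed. The sleep function is
    injectable so the loop can be exercised without real delays.
    """
    for polls in range(1, max_polls + 1):
        if not read_in_progress():
            return polls
        sleep(poll_interval_s)   # limit polling to avoid slowing the sanitize operation
    raise TimeoutError("sanitize operation still in progress")
```

A host that has an Asynchronous Event Request command outstanding could rely on the Sanitize Operation Completed event and use polling only as a fallback.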
The host should read the Sanitize Status log page upon completion of a sanitize operation (which clears
the asynchronous event, if one was generated).
If a sanitize operation fails, all controllers in the NVM subsystem shall abort any command not allowed
during a sanitize operation with a status of Sanitize Failed (refer to section 8.15.1) until a subsequent
sanitize operation is started or successful recovery from the failed sanitize operation occurs. A subsequent
successful sanitize operation or the Exit Failure Mode action may be used to recover from a failed sanitize
operation. Refer to section 5.24 for recovery details.
If the Sanitize command is supported, then the NVM subsystem and all controllers shall:
• Support the Sanitize Status log page;
• Support the Sanitize Operation Completed asynchronous event and enable the event by default;
• Support the Exit Failure Mode action for a Sanitize command;
• Support at least one of the following sanitize operation types: Block Erase, Overwrite, or Crypto
  Erase; and
• Indicate support for all supported sanitize operation types in the Sanitize Capabilities field in the
  Identify Controller data structure.
After a sanitize operation has failed, and until a subsequent sanitize operation is started or successful
recovery from the failed sanitize operation occurs:
• All controllers in the NVM subsystem shall only process the Sanitize command (refer to section
  5.24) and the Admin commands listed in Figure 287 subject to the additional restrictions noted in
  that figure;
• All I/O commands shall be aborted with a status of Sanitize Failed;
• The Sanitize command is permitted with action restrictions (refer to section 5.24); and
• Aside from the Sanitize command, any other command or command option that is not explicitly
  permitted in Figure 287 shall be aborted with a status of Sanitize Failed if fetched by any
  controller in the NVM subsystem.
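The command gating rules for the in-progress and failed states can be summarized in a small decision function. The Python sketch below is a simplified illustration: the state names are hypothetical, the allowed set is only a placeholder for the commands listed in Figure 287, and the additional restrictions noted in that figure and in section 5.24 are not modeled.

```python
# Placeholder for the commands permitted by Figure 287. The figure is not
# reproduced here, so this set is illustrative and incomplete.
FIGURE_287_ALLOWED = frozenset({"Get Log Page"})


def sanitize_gate(state: str, command: str, allowed=FIGURE_287_ALLOWED) -> str:
    """Return "process" or the abort status for a command.

    state is one of "idle", "in_progress", or "failed" (hypothetical names).
    """
    if state not in ("idle", "in_progress", "failed"):
        raise ValueError(f"unknown state {state!r}")
    if state == "idle" or command in allowed:
        return "process"
    if state == "failed":
        # The Sanitize command itself is permitted, with action restrictions.
        return "process" if command == "Sanitize" else "Sanitize Failed"
    return "Sanitize In Progress"
```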
9 Directives
Directives are a mechanism for exchanging information between the host and the NVM subsystem or controller. The
Directive Receive command is used to transfer data related to a specific Directive Type from the controller
to the host. The Directive Send command is used to transfer data related to a specific Directive Type from
the host to the controller. Other commands may include a Directive Specific value specific for a given
Directive Type (e.g., the Write command in the NVM command set).
Support for Directives is optional and is indicated in the Optional Admin Command Support (OACS) field in
the Identify Controller data structure.
If a controller supports Directives, then the controller shall:
• Indicate support for Directives in the Optional Admin Command Support (OACS) field in the
  Identify Controller data structure;
• Support the Directive Receive command;
• Support the Directive Send command; and
• Support the Identify Directive (i.e., Type 00h).
The Directive Types that may be supported by a controller (refer to Figure 288) are the Identify Directive
(refer to section 9.2), and the Streams Directive (refer to section 9.3). The Directive Specific field and
Directive Operation field are dependent on the Directive Type specified in the command (e.g., Directive
Send, Directive Receive, or I/O command).
If a Directive is not supported or is supported and disabled, then all Directive Send commands and Directive
Receive commands with that Directive Type shall be aborted with a status of Invalid Field in Command.
Support for a specific directive type is indicated using the Return Parameters operation of the Identify
Directive. A specific directive may be enabled or disabled using the Enable operation of the Identify
Directive. Before using a specific directive, the host should determine if that directive is supported and
should enable that directive using the Identify Directive.
In an I/O command:
• if no I/O Command Directive is enabled or the DTYPE field is cleared to 00h, then the DTYPE
  field and the DSPEC field are ignored; and
• if one or more I/O Command Directives is enabled and the DTYPE field is set to a value that is
  not supported or not enabled, then the controller shall abort the command with a status of Invalid
  Field in Command.
For the Streams Directive (i.e., DTYPE field set to 01h), if the DSPEC field is cleared to 0000h in a Write
command, then that Write command shall be processed as a normal write operation (i.e., as if DTYPE field
is cleared to 00h).
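The DTYPE and DSPEC handling rules above for a Write command can be expressed as a decision function. The Python sketch below is illustrative only. The function name and the returned labels are hypothetical, and an enabled Directive Type is assumed to also be supported.

```python
STREAMS_DTYPE = 0x01


def classify_write_directive(enabled_dtypes, dtype: int, dspec: int) -> str:
    """Return how the DTYPE and DSPEC fields of a Write command are handled.

    enabled_dtypes is the set of I/O Command Directive Types currently
    enabled (an enabled type is assumed to be supported).
    """
    if not enabled_dtypes or dtype == 0x00:
        return "ignore_fields"            # DTYPE and DSPEC are ignored
    if dtype not in enabled_dtypes:
        return "abort_invalid_field"      # Invalid Field in Command
    if dtype == STREAMS_DTYPE and dspec == 0x0000:
        return "normal_write"             # processed as if DTYPE were 00h
    return "apply_directive"
```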
If the host issues a Dataset Management command to deallocate logical blocks that are associated with a
stream, it should specify a starting LBA that is aligned to, and a length that is a multiple of, the Stream
Granularity Size. This provides optimal performance and endurance of the media.
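The alignment recommendation above can be checked, or a range trimmed to satisfy it, with simple integer arithmetic. In the Python sketch below, the function names are hypothetical and the Stream Granularity Size is assumed to have already been converted to a count of logical blocks.

```python
def is_stream_aligned(start_lba: int, num_blocks: int, sgs_blocks: int) -> bool:
    """True if the range starts on, and spans a multiple of, the granularity."""
    return num_blocks > 0 and start_lba % sgs_blocks == 0 and num_blocks % sgs_blocks == 0


def align_range_inward(start_lba: int, num_blocks: int, sgs_blocks: int):
    """Shrink a range to the largest aligned sub-range, or return None.

    Returns (aligned_start_lba, aligned_num_blocks).
    """
    first = -(-start_lba // sgs_blocks) * sgs_blocks               # round start up
    end = ((start_lba + num_blocks) // sgs_blocks) * sgs_blocks    # round end down
    if end <= first:
        return None
    return first, end - first
```

Trimming inward is one possible design choice; a host could instead defer deallocation until a full granularity unit is free.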
Stream resources are the resources in the NVM subsystem that are necessary to track operations
associated with a specified stream identifier. The maximum number of stream resources available in an
NVM subsystem is indicated by the Max Stream Limit (MSL) field in the Return Parameters
data structure. Stream resources may be allocated for the exclusive use of a specified namespace
associated with a particular Host Identifier using the Allocate Resources operation. Stream resources that
are not allocated for the exclusive use of any namespace are available NVM subsystem stream resources
as reported in NVM Subsystem Streams Available (NSSA) and may be used by any namespace that has
the Streams Directive enabled and has not been allocated exclusive stream resources in response to an
Allocate Resources operation. As stream resources are allocated for the exclusive use of a specified
namespace, the available NVM subsystem stream resources reported in the NSSA field are reduced.
The Directive operations that shall be supported if the Streams Directive is supported are listed in Figure
294. The Directive Specific field in a command is referred to as the stream identifier when the Directive
Type field is set to the Streams Directive.
Stream identifiers are assigned by the host and may be in the range 0001h to FFFFh. The host may specify
a sparse set of stream identifiers (i.e., there is no requirement for the host to use Stream Identifiers in any
particular order).
The host may be accessing a namespace through multiple controllers in the NVM subsystem. The
controllers in an NVM subsystem use the Host Identifier to determine whether a stream identifier has the
same meaning for a particular namespace. If more than one Host Identifier has the same non-zero value,
then that value represents a single host that is accessing the namespace through multiple controllers, and
the stream identifier is used across controllers to access the same stream on the namespace. If a Host
Identifier is zero or has a unique value, then that value represents a unique host that is accessing the
namespace, and the stream identifier does not have the same meaning across controllers for that namespace.
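The Host Identifier rule above reduces to a single comparison. The Python sketch below is a minimal illustration with a hypothetical function name; Host Identifier values are represented as integers.

```python
def shares_stream_identifiers(host_id_a: int, host_id_b: int) -> bool:
    """True if two controllers see the same host for stream purposes.

    A stream identifier has the same meaning through both controllers only
    when both Host Identifiers carry the same non-zero value.
    """
    return host_id_a != 0 and host_id_a == host_id_b
```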
The controller(s) recognized by the NVM subsystem as being associated with a specific host and attached
to a specific namespace either:
• utilizes a number of stream resources allocated for exclusive use of that namespace as returned
  in response to an Allocate Resources operation; or
• utilizes resources from the NVM subsystem stream resources.
The value of Namespace Streams Allocated (NSA) indicates how many resources for individual stream
identifiers have been allocated for exclusive use of the specified namespace by the associated controllers.
This indicates the maximum number of stream identifiers that may be open at any given time in the specified
namespace by the associated controllers. To request a different number of resources than are currently
allocated for exclusive use by the associated controllers of a specific namespace, all currently allocated
resources are first required to be released using the Release Resources operation. There is no mechanism
to incrementally increase or decrease the number of allocated resources for a given namespace.
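The accounting between exclusive allocations and the available NVM subsystem stream resources can be modeled as a simple pool. The Python sketch below is a toy model, not controller behavior defined by this specification. The class and method names are hypothetical, the pool is assumed to start with all resources available, and granting fewer resources than requested when the pool is nearly empty is an assumption.

```python
class StreamResourcePool:
    """Toy model of NSSA and per-namespace NSA bookkeeping."""

    def __init__(self, max_stream_limit: int) -> None:
        self.nssa = max_stream_limit   # NVM Subsystem Streams Available
        self.nsa = {}                  # namespace ID -> Namespace Streams Allocated

    def allocate(self, nsid: int, requested: int) -> int:
        """Model an Allocate Resources operation; return the number granted."""
        if self.nsa.get(nsid, 0) > 0:
            # No incremental change: existing resources are released first.
            raise RuntimeError("release currently allocated resources first")
        if self.nssa == 0:
            raise RuntimeError("Stream Resource Allocation Failed (7Fh)")
        granted = min(requested, self.nssa)   # assumption: partial grants allowed
        self.nssa -= granted                  # exclusive use reduces NSSA
        self.nsa[nsid] = granted
        return granted

    def release(self, nsid: int) -> None:
        """Model a Release Resources operation."""
        self.nssa += self.nsa.pop(nsid, 0)
```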
Streams are opened by the controller when the host issues a write command that specifies a stream
identifier that is not currently open. While a stream is open the controller maintains context for that stream
(e.g., buffers for associated data). The host may determine the streams that are open using the Get Status
operation.
For a namespace that has a non-zero value of Namespace Streams Allocated (NSA), if the host submits a
write command specifying a stream identifier not currently in use and stream resources are exhausted, then
an arbitrary stream identifier for that namespace is released by the controller to free the stream resources
associated with that stream identifier for the new stream. The host may ensure the number of open streams
does not exceed the allocated stream resources for the namespace by explicitly releasing stream identifiers
as necessary using the Release Identifier operation.
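For a namespace with exclusively allocated stream resources, opening a new stream when resources are exhausted causes an arbitrary open stream to be released. The Python sketch below models that behavior. The function name is hypothetical, NSA is assumed to be non-zero, and a random choice stands in for the controller's arbitrary selection.

```python
import random
from typing import Optional, Set


def open_stream_exclusive(open_streams: Set[int], nsa: int, stream_id: int,
                          rng=random) -> Optional[int]:
    """Open stream_id on a namespace with nsa (non-zero) exclusive resources.

    Mutates open_streams. Returns the stream identifier that was released
    to make room, or None if no release was needed.
    """
    if stream_id in open_streams:
        return None                                   # already open
    released = None
    if len(open_streams) >= nsa:
        released = rng.choice(sorted(open_streams))   # arbitrary open stream
        open_streams.remove(released)
    open_streams.add(stream_id)
    return released
```

A host that wants to control which stream is closed would release an identifier explicitly before opening a new one.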
For a namespace that has zero namespace streams allocated, if the host submits a write command
specifying a stream identifier not currently in use and:
• NVM subsystem streams available are exhausted, then an arbitrary stream identifier for an
  arbitrary namespace that is using NVM subsystem stream resources is released by the NVM
  subsystem to free the stream resources associated with that stream identifier for the new stream;
  or
• all NVM subsystem stream resources have been allocated for exclusive use of specific
  namespaces, then the write command is treated as a normal write command that does not
  specify a stream identifier.
The host determines parameters associated with stream resources using the Return Parameters operation.
The host may get a list of open stream identifiers using the Get Status operation.
If the Streams Directive becomes disabled, then all stream resources and stream identifiers are released
for the affected namespace. If the host issues a Format NVM command, or deletes a namespace, then all
stream identifiers for all open streams for affected namespaces are released.
Streams Directive defines the command specific status values specified in Figure 295.
Value | Description
7Fh | Stream Resource Allocation Failed: The controller was not able to allocate stream resources for exclusive use of the specified namespace and no NVM subsystem stream resources are available.
future operation then it is referring to a different stream. If the specified identifier does not correspond to an
open stream for the specified namespace, then the command completes successfully. If there are stream
resources allocated for the specified namespace, then the stream resources remain allocated for this
namespace, and may be re-used in a subsequent write command. If there are no stream resources
allocated for the specified namespace, then the stream resources are returned to the NVM subsystem
stream resources for future use by a namespace without allocated stream resources. If an NSID value of
FFFFFFFFh is specified, then the controller shall abort the command with a status of Invalid Field in
Command.
No data transfer occurs.
software shall ignore any data transfer associated with the command. The host may choose to re-submit
the command or indicate an error to the higher level software.