Blackfin® Processor Instruction Set
Disclaimer
Analog Devices, Inc. reserves the right to change this product without
prior notice. Information furnished by Analog Devices is believed to be
accurate and reliable. However, no responsibility is assumed by Analog
Devices for its use; nor for any infringement of patents or other rights of
third parties which may result from its use. No license is granted by
implication or otherwise under the patent rights of Analog Devices, Inc.
PREFACE
Purpose of This Manual
Intended Audience
Manual Contents
What’s New in This Manual
Technical or Customer Support
Supported Processors
Product Information
MyAnalog.com
Processor Product Information
Related Documents
Online Technical Documentation
Accessing Documentation From VisualDSP++
Accessing Documentation From Windows
Accessing Documentation From the Web
Printed Manuals
VisualDSP++ Documentation Set
Hardware Tools Manuals
Processor Manuals
Data Sheets
Conventions
INTRODUCTION
Manual Organization
LOAD / STORE
Instruction Overview
Load Immediate
Load Pointer Register
Load Data Register
Load Half-Word – Zero-Extended
Load Half-Word – Sign-Extended
Load High Data Register Half
Load Low Data Register Half
Load Byte – Zero-Extended
Load Byte – Sign-Extended
Store Pointer Register
Store Data Register
Store High Data Register Half
Store Low Data Register Half
Store Byte
MOVE
Instruction Overview
Move Register
Move Conditional
Move Half to Full Word – Zero-Extended
Move Half to Full Word – Sign-Extended
Move Register Half
STACK CONTROL
Instruction Overview
--SP (Push)
--SP (Push Multiple)
SP++ (Pop)
SP++ (Pop Multiple)
LINK, UNLINK
LOGICAL OPERATIONS
Instruction Overview
& (AND)
~ (NOT One’s Complement)
| (OR)
^ (Exclusive-OR)
BXORSHIFT, BXOR
BIT OPERATIONS
Instruction Overview
BITCLR
BITSET
BITTGL
BITTST
DEPOSIT
EXTRACT
BITMUX
ONES (One’s Population Count)
SHIFT/ROTATE OPERATIONS
Instruction Overview
Add with Shift
Shift with Add
Arithmetic Shift
Logical Shift
ROT (Rotate)
ARITHMETIC OPERATIONS
Instruction Overview
ABS
Add
Add/Subtract – Prescale Down
Add/Subtract – Prescale Up
CACHE CONTROL
Instruction Overview
PREFETCH
FLUSH
FLUSHINV
IFLUSH
VECTOR OPERATIONS
Instruction Overview
Add on Sign
VIT_MAX (Compare-Select)
Vector ABS
Vector Add / Subtract
Vector Arithmetic Shift
Vector Logical Shift
Vector MAX
Vector MIN
Vector Multiply
Vector Multiply and Multiply-Accumulate
Vector Negate (Two’s Complement)
Vector PACK
Vector SEARCH
ADSP-BF535 FLAGS
INDEX
PREFACE
Thank you for purchasing and developing systems using Blackfin®
processors from Analog Devices.
Intended Audience
The primary audience for this manual is a programmer who is familiar
with Analog Devices processors. This manual assumes that the audience
has a working knowledge of the appropriate processor architecture and
instruction set. Programmers who are unfamiliar with Analog Devices
processors can use this manual, but should supplement it with other texts
(such as the appropriate hardware reference manuals and data sheets) that
describe the target architecture.
Manual Contents
The manual consists of:
• Chapter 1, Introduction
Provides a general description of the instruction syntax and
notation conventions.
• Chapters 2–14, Program Flow Control, Load / Store, Move, Stack
Control, Control Code Bit Management, Logical Operations, Bit
Operations, Shift/Rotate Operations, Arithmetic Operations,
External Event Management, Cache Control, Video Pixel
Operations, and Vector Operations
Provide descriptions of assembly language instructions and describe
their execution.
• Chapter 15, Issuing Parallel Instructions
Provides a description of parallel instruction operations and shows
how to use parallel instruction syntax.
• Appendix A, ADSP-BF535 Flags
Provides a description of the status flag bits for the ADSP-BF535
processor only.
Supported Processors
The following is the list of Analog Devices, Inc. processors supported in
VisualDSP++®.
TigerSHARC (ADSP-TSxxx) Processors
The name “TigerSHARC” refers to a family of floating-point and
fixed-point [8-bit, 16-bit, and 32-bit] processors. VisualDSP++ currently
supports the following TigerSHARC processors:
ADSP-TS101, ADSP-TS201, ADSP-TS202, and ADSP-TS203
SHARC (ADSP-21xxx) Processors
The name “SHARC” refers to a family of high-performance, 32-bit,
floating-point processors that can be used in speech, sound, graphics, and
imaging applications. VisualDSP++ currently supports the following
SHARC processors:
ADSP-21020, ADSP-21060, ADSP-21061, ADSP-21062,
ADSP-21065L, ADSP-21160, ADSP-21161, ADSP-21261,
ADSP-21262, ADSP-21266, ADSP-21267, ADSP-21363, ADSP-21364,
and ADSP-21365
Blackfin (ADSP-BFxxx) Processors
The name “Blackfin” refers to a family of 16-bit, embedded processors.
VisualDSP++ currently supports the following Blackfin processors:
ADSP-BF531, ADSP-BF532 (formerly ADSP-21532), ADSP-BF533,
ADSP-BF535 (formerly ADSP-21535), ADSP-BF561, AD6532, and
AD90747
Product Information
You can obtain product information from the Analog Devices Web site,
from the product CD-ROM, or from the printed publications (manuals).
Analog Devices is online at www.analog.com. Our Web site provides
information about a broad range of products: analog integrated circuits,
amplifiers, converters, and digital signal processors.
MyAnalog.com
MyAnalog.com is a free feature of the Analog Devices Web site that allows
customization of a Web page to display only the latest information on
products you are interested in. You can also choose to receive weekly
e-mail notifications containing updates to the Web pages that meet your
interests. MyAnalog.com provides access to books, application notes, data
sheets, code examples, and more.
Registration
Visit www.myanalog.com to sign up. Click Register to use MyAnalog.com.
Registration takes about five minutes and serves as a means to select the
information you want to receive.
If you are already a registered user, just log on. Your user name is your
e-mail address.
You may also obtain additional information about Analog Devices and its
products in any of the following ways.
• E-mail questions or requests for information to
dsp.support@analog.com
Related Documents
For information on product related development software and Analog
Devices processors, see these publications:
• VisualDSP++ User's Guide for Blackfin Processors
• VisualDSP++ C/C++ Compiler and Library Manual for Blackfin
Processors
• VisualDSP++ Assembler and Preprocessor Manual for Blackfin
Processors
• VisualDSP++ Linker and Utilities Manual for Blackfin Processors
• VisualDSP++ Kernel (VDK) User's Guide
Visit the Technical Library Web site to access all processor and tools
manuals and data sheets:
http://www.analog.com/processors/resources/technicalLibrary
File            Description
.HTM or .HTML   Dinkum Abridged C++ library and FlexLM network license manager software
                documentation. Viewing and printing the .HTML files requires a browser,
                such as Internet Explorer 4.0 (or higher).
.PDF            VisualDSP++ and processor manuals in Portable Documentation Format (PDF).
                Viewing and printing the .PDF files requires a PDF reader, such as Adobe
                Acrobat Reader (4.0 or higher).
Select a processor family and book title. Download archive (.ZIP) files, one
for each manual. Use any archive management software, such as WinZip,
to decompress downloaded files.
Printed Manuals
For general questions regarding literature ordering, call the Literature
Center at 1-800-ANALOGD (1-800-262-5643) and follow the prompts.
Processor Manuals
Hardware reference and instruction set reference manuals may be ordered
through the Literature Center at 1-800-ANALOGD (1-800-262-5643),
or downloaded from the Analog Devices Web site. Manuals may be
ordered by title or by product number located on the back cover of each
manual.
Data Sheets
All data sheets (preliminary and production) may be downloaded from the
Analog Devices Web site. Only production (final) data sheets (Rev. 0, A,
B, C, and so on) can be obtained from the Literature Center at
1-800-ANALOGD (1-800-262-5643); they also can be downloaded from
the Web site.
To have a data sheet faxed to you, call the Analog Devices Faxback System
at 1-800-446-6212. Follow the prompts and a list of data sheet code
numbers will be faxed to you. If the data sheet you want is not listed,
check for it on the Web site.
Conventions
Text conventions used in this manual are identified and described as
follows.
Example         Description
Close command   Titles in reference sections indicate the location of an item within the
(File menu)     VisualDSP++ environment’s menu system. For example, the Close command
                appears on the File menu.
.SECTION        Commands, directives, keywords, and feature names are in text with
                letter gothic font.
Manual Organization
The instructions are grouped according to their functions. Within
groupings, the instructions are generally arranged alphabetically unless a
functional relationship makes another order clearer for the programmer.
One such example of nonalphabetic ordering is the Load/Store chapter,
where the Load Pointer Register appears before a group of seven Load Data
Register derivations. The instructions are listed at the beginning of each
chapter in the order they appear.
The instruction groups, or chapters, are arranged according to complexity,
beginning with the basic Program Flow Control and Load/Store chapters
and progressing to Video Pixel Operations and Vector Operations.
Syntax Conventions
The Blackfin processor instruction set supports several syntactic
conventions that appear throughout this document. Those conventions are
given below.
Case Sensitivity
The instruction syntax is case insensitive. Upper and lower case letters can
be used and intermixed arbitrarily.
The assembler treats register names and instruction keywords in a
case-insensitive manner. User identifiers are case sensitive. Thus, R3.l,
R3.L, r3.l, r3.L are all valid, equivalent input to the assembler.
Free Format
Assembler input is free format, and may appear anywhere on the line. One
instruction may extend across multiple lines, or more than one instruction
may appear on the same line. White space (space, tab, comments, or
newline) may appear anywhere between tokens. A token must not have
embedded spaces. Tokens include numbers, register names, keywords,
user identifiers, and also some multicharacter special symbols like “+=”,
“/*”, or “||”.
Instruction Delimiting
A semicolon must terminate every instruction. Several instructions can be
placed together on a single line at the programmer’s discretion, provided
each instruction ends with a semicolon.
Comments
The assembler supports various kinds of comments, including the
following.
• End of line: A double forward slash token (“//”) indicates the
beginning of a comment that concludes at the next newline
character.
• General comment: A general comment begins with the token “/*”
and ends with “*/”. It may contain any characters and extend over
multiple lines.
Comments are not recursive; if the assembler sees a “/*” within a general
comment, it issues an assembler warning. A comment functions as white
space.
Notation Conventions
This manual and the assembler use the following conventions.
• Register names are alphabetical, followed by a number in cases
where there is more than one register in a logical group. Thus,
examples include ASTAT, FP, R3, and M2.
• Register names are reserved and may not be used as program
identifiers.
• Some operations (such as “Move Register”) require a register pair.
Register pairs are always Data Registers and are denoted using a
colon, e.g., R3:2. The larger number must be written first. Note
that the hardware supports only odd-even pairs, e.g., R7:6, R5:4,
R3:2, and R1:0.
Behavior Conventions
All operations that produce a result in an Accumulator saturate to a 40-bit
quantity unless noted otherwise. See “Saturation” on page 1-11 for a
description of saturation behavior.
Glossary
The following terms appear throughout this document. They are defined
here without attempting a full explanation of the Blackfin processor.
See the Blackfin Processor Hardware Reference for your specific
product for more details on the architecture.
Register Names
The architecture includes the registers shown in Table 1-1.
Table 1-1
Accumulators        The set of 40-bit registers A1 and A0 that normally contain data that is
                    being manipulated. Each Accumulator can be accessed in five ways: as one
                    40-bit register, as one 32-bit register (designated as A1.W or A0.W), as
                    two 16-bit registers similar to Data Registers (designated as A1.H, A1.L,
                    A0.H, or A0.L), and as one 8-bit register (designated A1.X or A0.X) for
                    the bits that extend beyond bit 31.
Data Registers      The set of 32-bit registers (R0, R1, R2, R3, R4, R5, R6, and R7) that
                    normally contain data for manipulation. Abbreviated D-register or Dreg.
                    Data Registers can be accessed as 32-bit registers, or optionally as two
                    independent 16-bit registers. The least significant 16 bits of each
                    register are called the “low” half and are designated with “.L” following
                    the register name. The most significant 16 bits are called the “high”
                    half and are designated with “.H” following the name. Example: R7.L,
                    r2.h, r4.L, R0.h.
Pointer Registers   The set of 32-bit registers (P0, P1, P2, P3, P4, P5, including SP and FP)
                    that normally contain byte addresses of data structures. Accessed only as
                    a 32-bit register. Abbreviated P-register or Preg. Example: p2, p5, fp,
                    sp.
Stack Pointer       SP; contains the 32-bit address of the last occupied byte location in the
                    stack. The stack grows by decrementing the Stack Pointer. A subset of the
                    Pointer Registers.
Frame Pointer       FP; contains the 32-bit address of the previous Frame Pointer in the
                    stack, located at the top of a frame. A subset of the Pointer Registers.
Loop Top            LT0 and LT1; contain the 32-bit address of the top of a zero overhead
                    loop.
Loop Count          LC0 and LC1; contain the 32-bit counter of the zero overhead loop
                    executions.
Loop Bottom         LB0 and LB1; contain the 32-bit address of the bottom of a zero overhead
                    loop.
Index Registers     The set of 32-bit registers I0, I1, I2, I3 that normally contain byte
                    addresses of data structures. Abbreviated I-register or Ireg.
Modify Registers    The set of 32-bit registers M0, M1, M2, M3 that normally contain offset
                    values that are added to or subtracted from one of the Index Registers.
                    Abbreviated as Mreg.
Length Registers    The set of 32-bit registers L0, L1, L2, L3 that normally contain the
                    length (in bytes) of the circular buffer. Abbreviated as Lreg. Clear Lreg
                    to disable circular addressing for the corresponding Ireg. Example: Clear
                    L3 to disable circular addressing for I3.
Base Registers      The set of 32-bit registers B0, B1, B2, B3 that normally contain the base
                    address (in bytes) of the circular buffer. Abbreviated as Breg.
Functional Units
The architecture includes the three processor sections shown in Table 1-2.
Table 1-2
Data Address Generator (DAG)        Calculates the effective address for indirect and indexed memory
                                    accesses. Consists of two sections: DAG0 and DAG1.
Multiply and Accumulate Unit (MAC)  Performs the arithmetic functions on data. Consists of two sections
                                    (MAC0 and MAC1), each associated with an Accumulator (A0 and A1,
                                    respectively).
Arithmetic Logical Unit (ALU)       Performs arithmetic computations and binary shifts on data. Operates
                                    on the Data Registers and Accumulators. Consists of two units (ALU0
                                    and ALU1), each associated with an Accumulator (A0 and A1,
                                    respectively). Each ALU operates in conjunction with a Multiply and
                                    Accumulate Unit.
See the Blackfin Processor Hardware Reference for your specific product for
more details on the architecture.
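The sub-register views described in Table 1-1 can be pictured as bit fields of a single value. The following Python sketch is illustrative only and is not part of the instruction set. The function names are hypothetical, and the mapping of the accumulator halves to bits 31 to 16 (.H) and bits 15 to 0 (.L) is assumed by analogy with the Data Register halves.

```python
# Illustrative model only: splits register values into the sub-register
# views named in the glossary. Function names are hypothetical.
MASK40 = (1 << 40) - 1


def accumulator_views(acc: int) -> dict:
    """Return the X, W, H, and L fields of a 40-bit accumulator value."""
    acc &= MASK40                      # keep only the 40 architectural bits
    return {
        "X": (acc >> 32) & 0xFF,       # 8 bits that extend beyond bit 31
        "W": acc & 0xFFFFFFFF,         # low 32 bits
        "H": (acc >> 16) & 0xFFFF,     # assumed: bits 31..16
        "L": acc & 0xFFFF,             # assumed: bits 15..0
    }


def data_register_halves(reg: int) -> tuple:
    """Return the (high, low) 16-bit halves of a 32-bit Data Register value."""
    reg &= 0xFFFFFFFF
    return (reg >> 16) & 0xFFFF, reg & 0xFFFF
```

For example, the value 0xDEADBEEF in R7 would give R7.H = 0xDEAD and R7.L = 0xBEEF under this model.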
AN Negative
AQ Quotient
AZ Zero
Fractional Convention
Fractional numbers include subinteger components less than ±1. Whereas
decimal fractions appear to the right of a decimal point, binary fractions
appear to the right of a binal point.
In DSP instructions that assume placement of a binal point, for example
in computing sign bits for normalization or for alignment purposes, the
binal point convention depends on the size of the register being used as
shown in Table 1-4 and Figure 1-1 on page 1-11.
[Table 1-4. Column headings: Registers Size, Format Notation, Sign Bit, Extension Bits, Fractional Bits]
[Figure 1-1. 40-bit accumulator: sign bit (S), 8-bit extension, 31-bit fraction. 32-bit register: sign bit (S), 31-bit fraction]
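As a rough illustration of the binal point placement shown in Figure 1-1, the following Python sketch converts raw register contents to the fractional values they represent. It assumes the signed layouts from the figure (one sign bit and 31 fraction bits, plus 8 extension bits in an accumulator). The function names are hypothetical.

```python
def to_signed(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    if raw & (1 << (bits - 1)):        # sign bit set: value is negative
        raw -= 1 << bits
    return raw


def register_fraction(raw: int) -> float:
    """Value of a 32-bit register read as sign bit + 31-bit fraction."""
    return to_signed(raw, 32) / (1 << 31)


def accumulator_fraction(raw: int) -> float:
    """Value of a 40-bit accumulator read as sign + 8 extension + 31 fraction bits."""
    return to_signed(raw, 40) / (1 << 31)
```

Under these assumptions a 32-bit register spans -1.0 up to just below +1.0, while the extension bits let an accumulator hold magnitudes beyond that range.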
Saturation
When the result of an arithmetic operation exceeds the range of the
destination register, important information can be lost.
Saturation is a technique used to contain the quantity within the values
that the destination register can represent. When a value is computed that
exceeds the capacity of the destination register, then the value written to
the register is the largest value that the register can hold with the same sign
as the original.
• If an operation would otherwise cause a positive value to overflow
and become negative, instead, saturation limits the result to the
maximum positive value for the register size being used.
• Conversely, if an operation would otherwise cause a negative value
to overflow and become positive, saturation limits the result to the
maximum negative value for the register size.
The overflow arithmetic flag is never set by an operation that enforces
saturation.
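The difference between saturation and ordinary two's-complement overflow can be sketched in a few lines of Python. This is an illustrative model only, not a description of the hardware data path; the function names are hypothetical.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp value to the signed range of a `bits`-wide register."""
    largest = (1 << (bits - 1)) - 1    # maximum positive value
    smallest = -(1 << (bits - 1))      # maximum negative value
    return max(smallest, min(largest, value))


def wrap(value: int, bits: int) -> int:
    """Two's-complement wraparound, shown for contrast with saturate()."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value
```

Adding 1 to 0x7FFFFFFF in a 32-bit register illustrates the point: wrap() yields the most negative value, whereas saturate() holds the result at 0x7FFFFFFF. The same function models 40-bit accumulator saturation when called with bits=40.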
Some instructions for this processor support biased and unbiased
rounding. The RND_MOD bit in the Arithmetic Status (ASTAT) Register
determines which mode is used. See the Blackfin Processor Hardware Reference
for your specific product for more details on the ASTAT Register.
Another common way to reduce the significant bits representing a number
(from N bits to M bits) is to simply mask off the N-M lower bits. This
process is known as truncation and results in a relatively large bias.
Figure 1-2 shows other examples of rounding and truncation methods.
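The bias introduced by truncation can be checked numerically. The Python sketch below compares the average error of truncation with two rounding rules. It takes biased rounding to mean round-half-up and unbiased rounding to mean round-half-to-even; that reading of the two modes is an assumption made for illustration, and all function names are hypothetical. Overflow and saturation are ignored.

```python
def truncate(value: int, drop: int) -> int:
    """Mask off the `drop` low bits; the result stays at the original scale."""
    return value & ~((1 << drop) - 1)


def round_biased(value: int, drop: int) -> int:
    """Assumed biased mode: add half an output LSB, then truncate."""
    half = 1 << (drop - 1)
    return (value + half) & ~((1 << drop) - 1)


def round_unbiased(value: int, drop: int) -> int:
    """Assumed unbiased mode: exact halfway cases go to the even neighbor."""
    mask = (1 << drop) - 1
    half = 1 << (drop - 1)
    frac = value & mask
    base = value & ~mask
    if frac > half or (frac == half and (base >> drop) & 1):
        base += 1 << drop
    return base


def mean_error(method, drop: int, samples) -> float:
    """Average of (reduced - original) over the given sample values."""
    return sum(method(v, drop) - v for v in samples) / len(samples)
```

Dropping 4 bits from every value in range(256) gives a mean error of -7.5 for truncation, 0.5 for the round-half-up rule, and 0.0 for the round-half-to-even rule, which is why truncation is described as having a relatively large bias.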
The circular buffer registers define the length (Lreg) of the data block in
bytes and the base (Breg) address to reinitialize the Ireg.
Some instructions modify an Index Register without using it for
addressing; for example, the Add Immediate and Modify – Decrement
instructions. Such instructions are still affected by circular addressing, if
enabled.
Disable circular addressing for an Ireg by clearing the Lreg that
corresponds to the Ireg used in the instruction. For example, clear L2 to disable
circular addressing for register I2. Any nonzero value in an Lreg enables
circular addressing for its corresponding buffer registers.
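The interaction of the Ireg, Mreg, Lreg, and Breg values can be sketched as follows. This Python model is a simplification for illustration: the wrap rule (add or subtract the buffer length once when the updated index leaves the buffer) is an assumption, and the function name is hypothetical. The Hardware Reference remains the authority on the exact behavior.

```python
def update_index(index: int, modify: int, length: int, base: int) -> int:
    """Return the new Ireg value after adding a signed Mreg-style offset.

    length == 0 models a cleared Lreg, which disables circular addressing.
    """
    new_index = index + modify
    if length != 0:
        if modify >= 0 and new_index >= base + length:
            new_index -= length        # ran past the end: wrap to the start
        elif modify < 0 and new_index < base:
            new_index += length        # ran before the base: wrap to the end
    return new_index & 0xFFFFFFFF      # registers are 32 bits wide
```

With a 16-byte buffer at 0x1000, stepping forward by 4 from 0x100C returns to 0x1000. With the length cleared to zero, the same step yields 0x1010.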
See the Blackfin Processor Hardware Reference for your specific product for
more details on circular addressing capabilities and operation.
Instruction Summary
• “Jump”
• “IF CC JUMP”
• “Call”
• “RTS, RTI, RTX, RTN, RTE (Return)”
• “LSETUP, LOOP”
Instruction Overview
This chapter discusses the instructions that control program flow. Users
can take advantage of these instructions to force new values into the
Program Counter and change program flow, branch conditionally, set up
loops, and call and return from subroutines.
Jump
General Form
JUMP (destination_indirect)
JUMP (PC + offset)
JUMP offset
JUMP.S offset
JUMP.L offset
Syntax
JUMP ( Preg ) ; /* indirect to an absolute (not PC-relative)
address (a) */
JUMP ( PC + Preg ) ; /* PC-relative, indexed (a) */
JUMP pcrelm2 ; /* PC-relative, immediate (a) or (b); see
“Functional Description” on page 2-3 */ ¹
JUMP.S pcrel13m2 ; /* PC-relative, immediate, short (a) */
JUMP.L pcrel25m2 ; /* PC-relative, immediate, long (b) */
JUMP user_label ; /* user-defined absolute address label,
resolved by the assembler/linker to the appropriate PC-relative
instruction (a) or (b) */
Syntax Terminology
Preg: P5–0, SP, FP
1 This instruction can be used in assembly-level programs when the final distance to the target is
unknown at coding time. The assembler substitutes the opcode for JUMP.S or JUMP.L depending on
the final target. Disassembled code shows the mnemonic JUMP.S or JUMP.L.
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Jump instruction forces a new value into the Program Counter (PC) to
change program flow.
In the Indirect and Indexed versions of the instruction, the value in Preg
must be an even number (bit0=0) to maintain 16-bit address alignment.
Otherwise, an odd offset in Preg causes the processor to invoke an
alignment exception.
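The indirect and indexed forms, together with the alignment requirement, can be modeled in a few lines of Python. This is a sketch for illustration only. AlignmentException and both function names are hypothetical, and the pc argument simply stands for the PC value that the instruction adds the offset to.

```python
class AlignmentException(Exception):
    """Hypothetical stand-in for the processor's alignment exception."""


def jump_indirect(preg: int) -> int:
    """Model of JUMP ( Preg ): Preg holds the absolute target address."""
    if preg & 1:                       # bit 0 must be 0 for 16-bit alignment
        raise AlignmentException(hex(preg))
    return preg & 0xFFFFFFFF


def jump_indexed(pc: int, preg: int) -> int:
    """Model of JUMP ( PC + Preg ): Preg holds an offset from the PC."""
    if preg & 1:
        raise AlignmentException(hex(preg))
    return (pc + preg) & 0xFFFFFFFF
```

In both cases the returned value is the new Program Counter; an odd Preg value raises the exception instead of producing a target.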
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Jump instruction cannot be issued in parallel with other instructions.
Example
jump get_new_sample ; /* assembler resolved target, abstract
offsets */
jump (p5) ; /* P5 contains the absolute address of the target
*/
jump (pc + p2) ; /* P2 contains the offset of the target
relative to the PC */
jump 0x224 ; /* offset is positive in 13 bits, so target
address is PC + 0x224, a forward jump */
jump.s 0x224 ; /* same as above with jump “short” syntax */
jump.l 0xFFFACE86 ; /* offset is negative in 25 bits (encoded as
0x1FA CE86), so target address is PC - 0x5317A, a backwards jump */
Also See
Call, IF CC JUMP
Special Applications
None
IF CC JUMP
General Form
IF CC JUMP destination
IF !CC JUMP destination
Syntax
IF CC JUMP pcrel11m2 ; /* branch if CC=1, branch predicted as
not taken (a) */ ¹
IF CC JUMP pcrel11m2 (bp) ; /* branch if CC=1, branch predicted
as taken (a) */
IF !CC JUMP pcrel11m2 ; /* branch if CC=0, branch predicted as
not taken (a) */ ²
IF !CC JUMP pcrel11m2 (bp) ; /* branch if CC=0, branch
predicted as taken (a) */
IF CC JUMP user_label ; /* user-defined absolute address label,
resolved by the assembler/linker to the appropriate PC-relative
instruction (a) */
IF CC JUMP user_label (bp) ; /* user-defined absolute address
label, resolved by the assembler/linker to the appropriate
PC-relative instruction (a) */
IF !CC JUMP user_label ; /* user-defined absolute address
label, resolved by the assembler/linker to the appropriate
PC-relative instruction (a) */
IF !CC JUMP user_label (bp) ; /* user-defined absolute address
label, resolved by the assembler/linker to the appropriate
PC-relative instruction (a) */
1 CC bit = 1 causes a branch to an address, computed by adding the signed, even offset to the current
PC value.
2 CC bit = 0 causes a branch to an address, computed by adding the signed, even relative offset to the
current PC value.
Syntax Terminology
pcrel11m2: 11-bit signed even relative offset, with a range of –1024
through 1022 bytes (0xFC00 to 0x03FE). This value can optionally be
replaced with an address label that is evaluated and replaced during
linking.
user_label: valid assembler address label, resolved by the assembler/linker
to a valid PC-relative offset
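Whether a given literal is a legal PC-relative offset can be checked mechanically. The Python sketch below is illustrative and its function names are hypothetical. It treats a hexadecimal literal such as 0xFFFFFE08 as a 32-bit two's-complement value and tests it against the signed, even range of an n-bit offset field.

```python
def to_signed32(literal: int) -> int:
    """Interpret a 32-bit literal such as 0xFFFFFE08 as a signed value."""
    literal &= 0xFFFFFFFF
    return literal - (1 << 32) if literal & 0x80000000 else literal


def is_valid_pcrel(offset: int, bits: int = 11) -> bool:
    """True if offset is even and fits a signed `bits`-bit field."""
    lowest = -(1 << (bits - 1))        # -1024 when bits == 11
    highest = (1 << (bits - 1)) - 2    # 1022 when bits == 11
    return offset % 2 == 0 and lowest <= offset <= highest
```

For instance, to_signed32(0xFFFFFE08) is -504, which is even and inside the –1024 through 1022 range, so it is a usable pcrel11m2 value.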
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Conditional JUMP instruction forces a new value into the Program
Counter (PC) to change the program flow, based on the value of the CC bit.
The range of valid offset values is –1024 through 1022.
Option
The Branch Prediction suffix (bp) helps the processor improve branch
instruction performance. The default is branch predicted-not-taken. By
appending (bp) to the instruction, the branch becomes predicted-taken.
Typically, code analysis shows that a good default condition is to predict
branch-taken for branches to a prior address (backwards branches), and to
predict branch-not-taken for branches to subsequent addresses (forward
branches).
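The guideline above amounts to a one-line rule that a code generator could apply. The following Python sketch is a hypothetical helper, not an assembler feature: it emits the instruction text and appends (bp) only for backward (negative) offsets. Writing the offset as a 32-bit hexadecimal literal follows the style of the examples in this section.

```python
def conditional_jump(offset: int, branch_if_cc_set: bool = True) -> str:
    """Build IF CC JUMP text, predicting taken for backward branches."""
    condition = "CC" if branch_if_cc_set else "!CC"
    literal = f"0x{offset & 0xFFFFFFFF:08X}"       # two's-complement literal
    suffix = " (bp)" if offset < 0 else ""         # backward: predict taken
    return f"IF {condition} JUMP {literal}{suffix} ;"
```

A loop-closing branch with offset -504 would therefore be emitted with (bp), while a forward skip of 0xB4 bytes would not. Profiling data, where available, is a better guide than this default.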
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
if cc jump 0xFFFFFE08 (bp) ; /* offset is negative in 11 bits,
so target address is a backwards branch, branch predicted taken */
if cc jump 0x0B4 ; /* offset is positive, so target address is a
forwards branch, branch predicted not taken */
if !cc jump 0xFFFFFC22 (bp) ; /* negative offset in 11 bits, so
target address is a backwards branch, branch predicted taken */
if !cc jump 0x120 ; /* positive offset, so target address is a
forwards branch, branch predicted not taken */
if cc jump dest_label ; /* assembler resolved target, abstract
offsets */
Also See
Jump, Call
Special Applications
None
Call
General Form
CALL (destination_indirect)
CALL (PC + offset)
CALL offset
Syntax
CALL ( Preg ) ; /* indirect to an absolute (not PC-relative)
address (a) */
CALL ( PC + Preg ) ; /* PC-relative, indexed (a) */
CALL pcrel25m2 ; /* PC-relative, immediate (b) */
CALL user_label ; /* user-defined absolute address label,
resolved by the assembler/linker to the appropriate PC-relative
instruction (a) or (b) */
Syntax Terminology
Preg: P5–0 (SP and FP are not allowed as the source register for this
instruction.)
pcrel25m2: 25-bit signed, even, PC-relative offset; can be specified as a
symbolic address label, with a range of –16,777,216 through 16,777,214
(0xFF00 0000 to 0x00FF FFFE) bytes.
user_label: valid assembler address label, resolved by the assembler/linker
to a valid PC-relative offset
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The CALL instruction calls a subroutine at the address contained in a
P-register or at a PC-relative offset. After the CALL instruction
executes, the RETS register contains the address of the next instruction.
The value in the Preg must be an even value to maintain 16-bit alignment.
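As a rough behavioral sketch of the description above, the Python model below returns the new PC and the value left in RETS. It assumes that the PC-relative offset is measured from the address of the CALL instruction itself and that RETS receives that address plus the instruction length (2 bytes for the 16-bit form, 4 bytes for the 32-bit form); both points are assumptions of this sketch, and the function names are hypothetical.

```python
def call_indirect(pc: int, preg: int):
    """Model CALL ( Preg ): 16-bit instruction, absolute target in Preg."""
    if preg % 2 != 0:
        raise ValueError("Preg must hold an even (16-bit aligned) address")
    rets = pc + 2                  # address of the next instruction
    return preg, rets              # (new PC, RETS)


def call_pcrel(pc: int, offset: int):
    """Model CALL pcrel25m2: 32-bit instruction, PC-relative target."""
    if offset % 2 != 0:
        raise ValueError("offset must be even")
    rets = pc + 4                  # address of the next instruction
    # Assumption: the offset is relative to the CALL instruction's address.
    return (pc + offset) & 0xFFFFFFFF, rets
```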
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
call ( p5 ) ;
call ( pc + p2 ) ;
call 0x123456 ;
call get_next_sample ;
Also See
RTS, RTI, RTX, RTN, RTE (Return), Jump, IF CC JUMP
Special Applications
None
RTS, RTI, RTX, RTN, RTE (Return)
General Form
RTS, RTI, RTX, RTN, RTE
Syntax
RTS ; // Return from Subroutine (a)
RTI ; // Return from Interrupt (a)
RTX ; // Return from Exception (a)
RTN ; // Return from NMI (a)
RTE ; // Return from Emulation (a)
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Return instruction forces a return from a subroutine, maskable or
NMI interrupt routine, exception routine, or emulation routine (see
Table 2-1).
Flags Affected
None
Required Mode
Table 2-2 identifies the modes required by the Return instruction.
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Table 2-1
RTS: Forces a return from a subroutine by loading the value of the RETS
Register into the Program Counter (PC), causing the processor to fetch
the next instruction from the address contained in RETS. For nested
subroutines, you must save the value of the RETS Register. Otherwise,
the next subroutine CALL instruction overwrites it.
RTI: Forces a return from an interrupt routine by loading the value of the
RETI Register into the PC. When an interrupt is generated, the processor
enters a non-interruptible state. Saving RETI to the stack re-enables
interrupt detection so that subsequent, higher priority interrupts can be
serviced (or “nested”) during the current interrupt service routine. If
RETI is not saved to the stack, higher priority interrupts are recognized
but not serviced until the current interrupt service routine concludes.
Restoring RETI back off the stack at the conclusion of the interrupt
service routine masks subsequent interrupts until the RTI instruction
executes. In any case, RETI is protected against inadvertent corruption
by higher priority interrupts.
RTX: Forces a return from an exception routine by loading the value of the
RETX Register into the PC.
Table 2-2
RTI, RTX, and RTN: Supervisor only. Any attempt to execute in User mode
produces a protection violation exception.
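The RTS remark about nested subroutines can be illustrated with a small Python model. It is only a sketch: the Core class, its method names, and the return addresses 0x100 and 0x200 are hypothetical, and the list stands in for the stack that Push and Pop would use.

```python
class Core:
    """Minimal model holding only RETS and a stack for saved values."""

    def __init__(self):
        self.rets = 0
        self.stack = []

    def call(self, return_address: int) -> None:
        self.rets = return_address      # CALL overwrites RETS

    def push_rets(self) -> None:
        self.stack.append(self.rets)    # save RETS before a nested CALL

    def pop_rets(self) -> None:
        self.rets = self.stack.pop()    # restore RETS before RTS

    def rts(self) -> int:
        return self.rets                # RTS loads RETS into the PC


def outer_return_address(save_rets: bool) -> int:
    """Return where the outer subroutine's RTS lands."""
    core = Core()
    core.call(0x100)                    # caller invokes the outer subroutine
    if save_rets:
        core.push_rets()
    core.call(0x200)                    # outer subroutine calls an inner one
    core.rts()                          # inner subroutine returns to 0x200
    if save_rets:
        core.pop_rets()
    return core.rts()                   # outer subroutine returns
```

Without the save and restore, the outer subroutine returns to 0x200 (the inner call's return address) instead of 0x100.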
Example
rts ;
rti ;
rtx ;
rtn ;
rte ;
Also See
Call, --SP (Push), SP++ (Pop)
Special Applications
None
LSETUP, LOOP
General Form
There are two forms of this instruction. The first is:
LOOP loop_name loop_counter
LOOP_BEGIN loop_name
LOOP_END loop_name
The second is:
LSETUP (Begin_Loop, End_Loop) Loop_Counter
Syntax
For Loop0
LOOP loop_name LC0 ; /* (b) */
LOOP loop_name LC0 = Preg ; /* autoinitialize LC0 (b) */
LOOP loop_name LC0 = Preg >> 1 ; /* autoinitialize LC0 (b) */
LOOP_BEGIN loop_name ; /* define the first instruction of the
loop (b) */
LOOP_END loop_name ; /* define the last instruction of the loop
(b) */
/* Use any one of the LOOP syntax versions with a LOOP_BEGIN and
a LOOP_END instruction. The name of the loop (“loop_name” in the
syntax) relates the three instructions together. */
For Loop1
LOOP loop_name LC1 ; /* (b) */
LOOP loop_name LC1 = Preg ; /* autoinitialize LC1 (b) */
LOOP loop_name LC1 = Preg >> 1 ; /* autoinitialize LC1 (b) */
LOOP_BEGIN loop_name ; /* define the first instruction of the
loop (b) */
LOOP_END loop_name ; /* define the last instruction of the loop
(b) */
/* Use any one of the LOOP syntax versions with a LOOP_BEGIN and
a LOOP_END instruction. The name of the loop (“loop_name” in the
syntax) relates the three instructions together. */
Syntax Terminology
Preg: P5–0 (SP and FP are not allowed as the source register for this
instruction.)
pcrel5m2: 5-bit unsigned, even, PC-relative offset; can be replaced by a
symbolic label. The range is 4 to 30, or 2^5 – 2.
lppcrel11m2: 11-bit unsigned, even, PC-relative offset for a loop; can be
replaced by a symbolic label. The range is 4 to 2046 (0x0004 to 0x07FE),
or 2^11 – 2.
loop_name: a symbolic identifier
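The two offset ranges above can be expressed as simple checks. This Python sketch only restates the ranges given in the terminology; the function names are hypothetical.

```python
def is_valid_pcrel5m2(offset: int) -> bool:
    """Begin_Loop offset: even, 4 through 30 (2**5 - 2)."""
    return offset % 2 == 0 and 4 <= offset <= 2**5 - 2


def is_valid_lppcrel11m2(offset: int) -> bool:
    """End_Loop offset: even, 4 through 2046 (2**11 - 2)."""
    return offset % 2 == 0 and 4 <= offset <= 2**11 - 2


def lsetup_offsets_ok(begin: int, end: int) -> bool:
    """Check both numeric offsets of an LSETUP ( begin, end ) pair."""
    return is_valid_pcrel5m2(begin) and is_valid_lppcrel11m2(end)
```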
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Zero-Overhead Loop Setup instruction provides a flexible,
counter-based, hardware loop mechanism that provides efficient,
zero-overhead software loops. In this context, zero-overhead means that
the software in the loops does not incur a performance or code size penalty
by decrementing a counter, evaluating a loop condition, then calculating
and branching to a new target address.
When the Begin_Loop address is the next sequential address after the
LSETUP instruction, the loop has zero overhead. If the
Begin_Loop address is not the next sequential address after the
LSETUP instruction, there is some overhead that is incurred on loop
entry only.
The architecture includes two sets of three registers each to support two
independent, nestable loops. The registers are Loop_Top (LTn),
Loop_Bottom (LBn) and Loop_Count (LCn). Consequently, LT0, LB0, and
LC0 describe Loop0, and LT1, LB1, and LC1 describe Loop1.
The LOOP and LSETUP instructions are a convenient way to initialize all
three registers in a single instruction. The size of the LOOP and LSETUP
instructions only supports a finite number of bits, so the loop range is lim-
ited. However, LT0 and LT1, LB0 and LB1 and LC0 and LC1 can be
initialized manually using Move instructions if loop length and repetition
count need to be beyond the limits supported by the LOOP and LSETUP syn-
tax. Thus, a single loop can span the entire 4 GB of memory space.
The instruction syntax supports an optional initialization value from a
P-register or P-register divided by 2.
The LOOP, LOOP_BEGIN, LOOP_END syntax is generally more readable and
user friendly. The LSETUP syntax contains the same information, but in a
more compact form.
If LCn is nonzero when the fetch address equals LBn, the processor decre-
ments LCn and places the address in LTn into the PC. The loop always
executes once through because Loop_Count is evaluated at the end of the
loop.
There are two special cases for small loop count values. A value of 0 in
Loop_Count causes the hardware loop mechanism to neither decrement nor
loop back, so the instructions enclosed by the loop pointers execute as
straight-line code. A value of 1 in Loop_Count causes the hardware
loop mechanism to decrement only (not loop back), so the enclosed
instructions again execute as straight-line code.
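One way to read the loop-count rules above is the following Python sketch, which counts passes through the loop body for a given initial Loop_Count. It is a simplified interpretation of the text, not a description of the actual hardware sequencing, and the function name is hypothetical.

```python
def run_hardware_loop(loop_count: int):
    """Return (passes through the body, final Loop_Count)."""
    passes = 0
    lc = loop_count
    while True:
        passes += 1          # execute the loop body once
        if lc == 0:          # zero: neither decrement nor loop back
            break
        lc -= 1              # nonzero: decrement at the loop bottom
        if lc == 0:          # counter exhausted: fall through
            break
        # otherwise the PC is reloaded from Loop_Top and the body repeats
    return passes, lc
```

Under this reading, Loop_Count values of 0 and 1 both give a single pass, and a value of N greater than 1 gives N passes.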
In the instruction syntax, the designation of the loop counter–LC0 or LC1–
determines which loop level is initialized. Consequently, to initialize
Loop0, code LC0; to initialize Loop1, code LC1.
In the case of nested loops that end on the same instruction, the processor
requires Loop0 to describe the outer loop and Loop1 to describe the inner
loop. The user is responsible for meeting this requirement.
For example, if LB0=LB1, then the processor assumes loop 1 is the inner
loop and loop 0 the outer loop.
Just like entries in any other register, loop register entries can be saved and
restored. If nesting beyond two loop levels is required, the user can explic-
itly save the outermost loop register values, re-use the registers for an inner
loop, and then restore the outermost loop values before terminating the
inner loop. In such a case, remember that loop 0 must always be outside of
loop 1. Alternately, the user can implement the outermost loop in soft-
ware with the Conditional Jump structure.
Begin_Loop, the value loaded into LTn, is a 5-bit, PC-relative, even offset
from the current instruction to the first instruction in the loop. The user
is required to preserve half-word alignment by maintaining even values in
this register. The offset is interpreted as a one’s complement, unsigned
number, eliminating backwards loops.
End_Loop, the value loaded into LBn, is an 11-bit, unsigned, even, PC-rela-
tive offset from the current instruction to the last instruction of the loop.
When using the LSETUP instruction, Begin_Loop and End_Loop are typi-
cally address labels. The linker replaces the labels with offset values.
A loop counter register (LC0 or LC1) counts the trips through the loop.
The register contains a 32-bit unsigned value, supporting as many as
4,294,967,294 trips through the loop. The loop is disabled (subsequent
executions of the loop code pass through without reiterating) when the
loop counter equals 0.
There are some restrictions on the last instruction in a loop for
ADSP-BF535 processors. The last instruction of the loop executing on an
ADSP-BF535 Blackfin processor must not be any of the following:
• Jump
• Conditional Branch
• Call
• CSYNC
• SSYNC
Also, the last instruction in the loop must not modify the registers that
define the currently active loop (LCn, LTn, or LBn). User modifications to
those registers while the hardware accesses them produce undefined
execution. Software can legally modify the loop counter at any other
location in the loop.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
lsetup ( 4, 4 ) lc0 ;
lsetup ( poll_bit, end_poll_bit ) lc0 ;
lsetup ( 4, 6 ) lc1 ;
lsetup ( FIR_filter, bottom_of_FIR_filter ) lc1 ;
lsetup ( 4, 8 ) lc0 = p1 ;
lsetup ( 4, 8 ) lc0 = p1>>1 ;
Also See
IF CC JUMP, Jump
Special Applications
None
Instruction Summary
• Load Immediate
• Load Pointer Register
• Load Data Register
• Load Half-Word – Zero-Extended
• Load Half-Word – Sign-Extended
• Load High Data Register Half
• Load Low Data Register Half
• Load Byte – Zero-Extended
• Load Byte – Sign-Extended
• Store Pointer Register
• Store Data Register
• Store High Data Register Half
• Store Low Data Register Half
• Store Byte
Instruction Overview
This chapter discusses the load/store instructions. Users can take advan-
tage of these instructions to load and store immediate values, pointer
registers, data registers or data register halves, and half words (zero or sign
extended).
Load Immediate
General Form
register = constant
A1 = A0 = 0
Syntax
Half-Word Load
reg_lo = uimm16 ; /* 16-bit value into low-half data or
address register (b) */
reg_hi = uimm16 ; /* 16-bit value into high-half data or
address register (b) */
Zero Extended
reg = uimm16 (Z) ; /* 16-bit value, zero-extended, into data or
address register (b) */
A0 = 0 ; /* Clear A0 register (b) */
A1 = 0 ; /* Clear A1 register (b) */
A1 = A0 = 0 ; /* Clear both A1 and A0 registers (b) */
Sign Extended
Dreg = imm7 (X) ; /* 7-bit value, sign extended, into Dreg (a)
*/
Preg = imm7 (X) ; /* 7-bit value, sign extended, into Preg
(a) */
reg = imm16 (X) ; /* 16-bit value, sign extended, into data or
address register (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Immediate instruction loads immediate values, or explicit con-
stants, into registers.
The instruction loads a 7-bit or 16-bit quantity, depending on the size of
the immediate data. The range of constants that can be loaded is 0x8000
through 0x7FFF, equivalent to –32768 through +32767.
The only values that can be immediately loaded into 40-bit Accumulator
registers are zeros.
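The effect of the (X) and (Z) options on the 32-bit register contents can be sketched in Python as follows. The function name and parameters are hypothetical; the sketch models only the extension arithmetic, not instruction encoding.

```python
def load_imm(value: int, bits: int, sign_extend: bool) -> int:
    """Return the 32-bit register contents after loading an immediate.

    bits is the immediate width (7 or 16); sign_extend selects (X) over (Z).
    """
    field = value & ((1 << bits) - 1)          # keep only the immediate bits
    if sign_extend and field & (1 << (bits - 1)):
        field -= 1 << bits                     # replicate the sign bit
    return field & 0xFFFFFFFF
```

For example, a 16-bit load of -344 with (X) yields 0xFFFFFEA8, while a 16-bit load of 0x89AB with (Z) yields 0x000089AB.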
Sixteen-bit half-words can be loaded into either the high half or low half
of a register. The load operation leaves the unspecified half of the register
intact.
Loading a 32-bit value into a register using Load Immediate requires two
separate instructions—one for the high and one for the low half. For
example, to load the address “foo” into register P3, write:
p3.h = foo ;
p3.l = foo ;
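The two-instruction sequence can be modeled in Python as two half-register writes, each leaving the other half intact. This is a sketch with hypothetical function names; it assumes the high-half load receives the upper 16 bits of the address and the low-half load the lower 16 bits.

```python
def load_hi(reg: int, imm16: int) -> int:
    """Write the high half of a 32-bit register; keep the low half."""
    return (reg & 0x0000FFFF) | ((imm16 & 0xFFFF) << 16)


def load_lo(reg: int, imm16: int) -> int:
    """Write the low half of a 32-bit register; keep the high half."""
    return (reg & 0xFFFF0000) | (imm16 & 0xFFFF)


def load_address(reg: int, address: int) -> int:
    """Build a full 32-bit value from two half-word loads."""
    reg = load_hi(reg, address >> 16)   # corresponds to the .h load
    reg = load_lo(reg, address)         # corresponds to the .l load
    return reg
```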
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
r7 = 63 (z) ;
p3 = 12 (z) ;
r0 = -344 (x) ;
r7 = 436 (z) ;
m2 = 0x89ab (z) ;
p1 = 0x1234 (z) ;
m3 = 0x3456 (x) ;
l3.h = 0xbcde ;
a0 = 0 ;
a1 = 0 ;
a1 = a0 = 0 ;
Also See
Load Pointer Register
Special Applications
Use the Load Immediate instruction to initialize registers.
Load Pointer Register
General Form
P-register = [ indirect_address ]
Syntax
Preg = [ Preg ] ; /* indirect (a) */
Preg = [ Preg ++ ] ; /* indirect, post-increment (a) */
Preg = [ Preg -- ] ; /* indirect, post-decrement (a) */
Preg = [ Preg + uimm6m4 ] ; /* indexed with small offset (a) */
Preg = [ Preg + uimm17m4 ] ; /* indexed with large offset
(b) */
Preg = [ Preg - uimm17m4 ] ; /* indexed with large offset
(b) */
Preg = [ FP - uimm7m4 ] ; /* indexed FP-relative (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Pointer Register instruction loads a 32-bit P-register with a
32-bit word from an address specified by a P-register.
The indirect address and offset must yield an even multiple of 4 to main-
tain 4-byte word address alignment. Failure to maintain proper alignment
causes a misaligned memory access exception.
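The alignment rule can be sketched with a small Python model in which memory is a dictionary keyed by byte address. The exception class and function names are hypothetical stand-ins; the sketch is not a description of the processor's exception mechanism.

```python
class MisalignedAccess(Exception):
    """Stands in for the misaligned memory access exception."""


def load_word(memory: dict, address: int) -> int:
    """Model a 32-bit load from a word-aligned address."""
    if address % 4 != 0:
        raise MisalignedAccess(hex(address))
    return memory[address]


def load_word_post_increment(memory: dict, pointer: int):
    """Model Preg = [ Preg ++ ]: load, then advance the pointer by 4 bytes."""
    return load_word(memory, pointer), pointer + 4
```

A base of 0x1000 with an offset of 8 is accepted, while an offset of 6 gives an address that is not a multiple of 4 and raises the modeled exception.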
Options
The Load Pointer Register instruction supports the following options.
• Post-increment the source pointer by 4 bytes.
• Post-decrement the source pointer by 4 bytes.
• Offset the source pointer with a small (6-bit), word-aligned (multi-
ple of 4), unsigned constant.
• Offset the source pointer with a large (18-bit), word-aligned (mul-
tiple of 4), signed constant.
• Frame Pointer (FP) relative and offset with a 7-bit, word-aligned
(multiple of 4), negative constant.
The indexed FP-relative form is typically used to access local variables in a
subroutine or function. Positive offsets relative to FP (useful to access
arguments from a called function) can be accomplished using one of the
other versions of this instruction. Preg includes the Frame Pointer and
Stack Pointer.
Auto-increment or auto-decrement pointer registers cannot also be the
destination of a Load instruction. For example, sp=[sp++] is not a valid
instruction because it prescribes two competing values for the Stack
Pointer–the data returned from memory, and post-incremented SP++.
Similarly, P0=[P0++] and P1=[P1++], etc. are invalid. Such an instruction
causes an undefined instruction exception.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
p3 = [ p2 ] ;
p5 = [ p0 ++ ] ;
p2 = [ sp -- ] ;
p3 = [ p2 + 8 ] ;
p0 = [ p2 + 0x4008 ] ;
p1 = [ fp - 16 ] ;
Also See
Load Immediate, SP++ (Pop), SP++ (Pop Multiple)
Special Applications
None
Load Data Register
General Form
D-register = [ indirect_address ]
Syntax
Dreg = [ Preg ] ; /* indirect (a) */
Dreg = [ Preg ++ ] ; /* indirect, post-increment (a) */
Dreg = [ Preg -- ] ; /* indirect, post-decrement (a) */
Dreg = [ Preg + uimm6m4 ] ; /* indexed with small offset (a) */
Dreg = [ Preg + uimm17m4 ] ; /* indexed with large offset
(b) */
Dreg = [ Preg - uimm17m4 ] ; /* indexed with large offset
(b) */
Dreg = [ Preg ++ Preg ] ; /* indirect, post-increment index
(a) (see note 1) */
Dreg = [ FP - uimm7m4 ] ; /* indexed FP-relative (a) */
Dreg = [ Ireg ] ; /* indirect (a) */
Dreg = [ Ireg ++ ] ; /* indirect, post-increment (a) */
Dreg = [ Ireg -- ] ; /* indirect, post-decrement (a) */
Dreg = [ Ireg ++ Mreg ] ; /* indirect, post-increment index
(a) (see note 1) */
Note 1: See “Indirect and Post-Increment Index Addressing” on page 3-12.
Syntax Terminology
Dreg: R7–0
Ireg: I3–0
Mreg: M3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Data Register instruction loads a 32-bit word into a 32-bit
D-register from a memory location. The Source Pointer register can be a
P-register, I-register, or the Frame Pointer.
The indirect address and offset must yield an even multiple of 4 to main-
tain 4-byte word address alignment. Failure to maintain proper alignment
causes a misaligned memory access exception.
Options
The Load Data Register instruction supports the following options.
• Post-increment the source pointer by 4 bytes to maintain word
alignment.
• Post-decrement the source pointer by 4 bytes to maintain word
alignment.
• Offset the source pointer with a small (6-bit), word-aligned (multi-
ple of 4), unsigned constant.
• Offset the source pointer with a large (18-bit), word-aligned (mul-
tiple of 4), signed constant.
• Frame Pointer (FP) relative and offset with a 7-bit, word-aligned
(multiple of 4), negative constant.
The indexed FP-relative form is typically used to access local variables in a
subroutine or function. Positive offsets relative to FP (useful to access
arguments from a called function) can be accomplished using one of the
other versions of this instruction. Preg includes the Frame Pointer and
Stack Pointer.
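Indirect and Post-Increment Index Addressing
In the form Dest = [ Src_1 ++ Src_2 ], the instruction loads Dest from the
address in Src_1 and then adds Src_2 to Src_1,
where: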
• Dest is the destination register. (Dreg in the syntax example).
• Src_1 is the first source register on the right-hand side of the
equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indi-
rect address cadence. The indirect, post-increment index version must
have separate P-registers for the input operands. If a common Preg is used
for the inputs, the auto-increment feature does not work.
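The roles of Dest, Src_1, and Src_2 can be sketched in Python as a load followed by a pointer update. The function name is hypothetical, and the values mirror the ones used in this instruction's Example (memory location 0x4000 containing 15, index register equal to 4).

```python
def load_post_increment_index(memory: dict, src_1: int, src_2: int):
    """Model Dest = [ Src_1 ++ Src_2 ].

    Returns (Dest, updated Src_1); Src_2 is left unchanged.
    """
    dest = memory[src_1]                    # load from the address in Src_1
    src_1 = (src_1 + src_2) & 0xFFFFFFFF    # then add the index to Src_1
    return dest, src_1


memory = {0x4000: 15}
r7, i3 = load_post_increment_index(memory, 0x4000, 4)
```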
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r3 = [ p0 ] ;
r7 = [ p1 ++ ] ;
r2 = [ sp -- ] ;
r6 = [ p2 + 12 ] ;
r0 = [ p4 + 0x800C ] ;
r1 = [ p0 ++ p1 ] ;
r5 = [ fp -12 ] ;
r2 = [ i2 ] ;
r0 = [ i0 ++ ] ;
r0 = [ i0 -- ] ;
/* Before indirect post-increment indexed addressing*/
r7 = 0 ;
i3 = 0x4000 ; /* Memory location contains 15, for example.*/
m0 = 4 ;
r7 = [i3 ++ m0] ;
/* Afterwards . . .*/
/* r7 = 15 from memory location 0x4000*/
/* i3 = i3 + m0 = 0x4004*/
/* m0 still equals 4*/
Also See
Load Immediate
Special Applications
None
Load Half-Word – Zero-Extended
General Form
D-register = W [ indirect_address ] (Z)
Syntax
Dreg = W [ Preg ] (Z) ; /* indirect (a)*/
Dreg = W [ Preg ++ ] (Z) ; /* indirect, post-increment (a)*/
Dreg = W [ Preg -- ] (Z) ; /* indirect, post-decrement (a)*/
Dreg = W [ Preg + uimm5m2 ] (Z) ; /* indexed with small offset
(a) */
Dreg = W [ Preg + uimm16m2 ] (Z) ; /* indexed with large offset
(b) */
Dreg = W [ Preg - uimm16m2 ] (Z) ; /* indexed with large offset
(b) */
Dreg = W [ Preg ++ Preg ] (Z) ; /* indirect, post-increment
index (a) (see note 1) */
Note 1: See “Indirect and Post-Increment Index Addressing” on page 3-17.
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Half-Word – Zero-Extended instruction loads 16 bits from a
memory location into the lower half of a 32-bit data register. The instruc-
tion zero-extends the upper half of the register. The Pointer register is a
P-register.
The indirect address and offset must yield an even numbered address to
maintain 2-byte half-word address alignment. Failure to maintain proper
alignment causes a misaligned memory access exception.
Options
The Load Half-Word – Zero-Extended instruction supports the following
options.
• Post-increment the source pointer by 2 bytes.
• Post-decrement the source pointer by 2 bytes.
• Offset the source pointer with a small (5-bit), half-word-aligned
(even), unsigned constant.
• Offset the source pointer with a large (17-bit), half-word-aligned
(even), signed constant.
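Indirect and Post-Increment Index Addressing
In the form Dest = W [ Src_1 ++ Src_2 ] (Z), the instruction loads Dest from
the address in Src_1 and then adds Src_2 to Src_1,
where: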
• Dest is the destination register. (Dreg in the syntax example).
• Src_1 is the first source register on the right-hand side of the
equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indi-
rect address cadence. The indirect, post-increment index version must
have separate P-registers for the input operands. If a common Preg is used
for the inputs, the instruction functions as a simple, non-incrementing
load. For example, r0 = W[p2++p2](z) functions as r0 = W[p2](z).
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r3 = w [ p0 ] (z) ;
r7 = w [ p1 ++ ] (z) ;
r2 = w [ sp -- ] (z) ;
r6 = w [ p2 + 12 ] (z) ;
r0 = w [ p4 + 0x8004 ] (z) ;
r1 = w [ p0 ++ p1 ] (z) ;
Also See
Load Half-Word – Sign-Extended, Load Low Data Register Half, Load
High Data Register Half, Load Data Register
Special Applications
To read consecutive, aligned 16-bit values for high-performance DSP
operations, use the Load Data Register instructions instead of these
Half-Word instructions. The Half-Word Load instructions use only half
the available 32-bit data bus bandwidth, possibly imposing a bottleneck
constriction in the data flow rate.
Load Half-Word – Sign-Extended
General Form
D-register = W [ indirect_address ] (X)
Syntax
Dreg = W [ Preg ] (X) ; // indirect (a)
Dreg = W [ Preg ++ ] (X) ; // indirect, post-increment (a)
Dreg = W [ Preg -- ] (X) ; // indirect, post-decrement (a)
Dreg = W [ Preg + uimm5m2 ] (X) ; /* indexed with small offset
(a) */
Dreg = W [ Preg + uimm16m2 ] (X) ; /* indexed with large offset
(b) */
Dreg = W [ Preg - uimm16m2 ] (X) ; /* indexed with large offset
(b) */
Dreg = W [ Preg ++ Preg ] (X) ; /* indirect, post-increment
index (a) (see note 1) */
Note 1: See “Indirect and Post-Increment Index Addressing” on page 3-21.
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Half-Word – Sign-Extended instruction loads 16 bits
sign-extended from a memory location into a 32-bit data register. The
Pointer register is a P-register. The MSB of the number loaded is repli-
cated in the whole upper-half word of the destination D-register.
The indirect address and offset must yield an even numbered address to
maintain 2-byte half-word address alignment. Failure to maintain proper
alignment causes a misaligned memory access exception.
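The sign extension and the half-word alignment rule can be sketched in Python as below, with the zero-extended variant included for contrast. Memory is modeled as a dictionary of 16-bit values keyed by byte address; all names are hypothetical.

```python
def sign_extend16(half: int) -> int:
    """Replicate bit 15 of a half-word into bits 31-16."""
    half &= 0xFFFF
    return (half | 0xFFFF0000) if (half & 0x8000) else half


def load_half(memory: dict, address: int, sign_extend: bool) -> int:
    """Model Dreg = W [ address ] with the (X) or (Z) option."""
    if address % 2 != 0:
        raise ValueError("misaligned half-word address: " + hex(address))
    half = memory[address] & 0xFFFF
    return sign_extend16(half) if sign_extend else half
```

A stored half-word of 0xFFFE therefore loads as 0xFFFFFFFE with (X) and as 0x0000FFFE with (Z).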
Options
The Load Half-Word – Sign-Extended instruction supports the following
options.
• Post-increment the source pointer by 2 bytes.
• Post-decrement the source pointer by 2 bytes.
• Offset the source pointer with a small (5-bit), half-word-aligned
(even), unsigned constant.
• Offset the source pointer with a large (17-bit), half-word-aligned
(even), signed constant.
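Indirect and Post-Increment Index Addressing
In the form Dest = W [ Src_1 ++ Src_2 ] (X), the instruction loads Dest from
the address in Src_1 and then adds Src_2 to Src_1,
where: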
• Dest is the destination register. (Dreg in the syntax example).
• Src_1 is the first source register on the right-hand side of the
equation.
• Src_2 is the second source register.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r3 = w [ p0 ] (x) ;
r7 = w [ p1 ++ ] (x) ;
r2 = w [ sp -- ] (x) ;
r6 = w [ p2 + 12 ] (x) ;
r0 = w [ p4 + 0x800E ] (x) ;
r1 = w [ p0 ++ p1 ] (x) ;
Also See
Load Half-Word – Zero-Extended, Load Low Data Register Half, Load
High Data Register Half
Special Applications
To read consecutive, aligned 16-bit values for high-performance DSP
operations, use the Load Data Register instructions instead of these
Half-Word instructions. The Half-Word Load instructions use only half
the available 32-bit data bus bandwidth, possibly imposing a bottleneck
constriction in the data flow rate.
Load High Data Register Half
General Form
Dreg_hi = W [ indirect_address ]
Syntax
Dreg_hi = W [ Ireg ] ; /* indirect (DAG) (a)*/
Dreg_hi = W [ Ireg ++ ] ; /* indirect, post-increment (DAG)
(a) */
Dreg_hi = W [ Ireg -- ] ; /* indirect, post-decrement (DAG)
(a) */
Dreg_hi = W [ Preg ] ; /* indirect (a)*/
Dreg_hi = W [ Preg ++ Preg ] ; /* indirect, post-increment
index (a) (see note 1) */
Note 1: See “Indirect and Post-Increment Index Addressing” on page 3-25.
Syntax Terminology
Dreg_hi: R7–0.H
Ireg: I3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Load High Data Register Half instruction loads 16 bits from a memory
location indicated by an I-register or a P-register into the most
significant half of a 32-bit data register. The operation does not affect the
least significant half.
Options
The Load High Data Register Half instruction supports the following
options.
• Post-increment the source pointer I-register by 2 bytes to maintain
half-word alignment.
• Post-decrement the source pointer I-register by 2 bytes to maintain
half-word alignment.
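Indirect and Post-Increment Index Addressing
In the form Dst_hi = W [ Src_1 ++ Src_2 ], the instruction loads Dst_hi from
the address in Src_1 and then adds Src_2 to Src_1,
where: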
• Dst_hi is the most significant half of the destination register.
(Dreg_hi in the syntax example).
• Src_1 is the memory source pointer register on the right-hand side
of the syntax.
• Src_2 is the increment pointer register.
Indirect and post-increment index addressing supports customized indi-
rect address cadence. The indirect, post-increment index version must
have separate P-registers for the input operands. If a common Preg is used
for the inputs, the instruction functions as a simple, non-incrementing
load. For example, r0.h = W[p2++p2] functions as r0.h = W[p2].
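The half-register behavior described in this section can be sketched in Python as a masked merge. The function names are hypothetical, memory is modeled as a dictionary of 16-bit values keyed by byte address, and the sketch ignores any circular-buffer behavior of the I-registers.

```python
def load_high_half(dreg: int, memory: dict, address: int) -> int:
    """Model Dreg_hi = W [ address ]: replace bits 31-16, keep bits 15-0."""
    return (dreg & 0x0000FFFF) | ((memory[address] & 0xFFFF) << 16)


def load_high_half_post_increment(dreg: int, memory: dict, ireg: int):
    """Model Dreg_hi = W [ Ireg ++ ]: load, then advance Ireg by 2 bytes."""
    new_dreg = load_high_half(dreg, memory, ireg)
    return new_dreg, (ireg + 2) & 0xFFFFFFFF
```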
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3.h = w [ i1 ] ;
r7.h = w [ i3 ++ ] ;
r1.h = w [ i0 -- ] ;
r2.h = w [ p4 ] ;
r5.h = w [ p2 ++ p0 ] ;
Also See
Load Low Data Register Half, Load Half-Word – Zero-Extended, Load
Half-Word – Sign-Extended
Special Applications
To read consecutive, aligned 16-bit values for high-performance DSP
operations, use the Load Data Register instructions instead of these
Half-Word instructions. The Half-Word Load instructions use only half
the available 32-bit data bus bandwidth, possibly imposing a bottleneck
constriction in the data flow rate.
Load Low Data Register Half
General Form
Dreg_lo = W [ indirect_address ]
Syntax
Dreg_lo = W [ Ireg ] ; /* indirect (DAG) (a)*/
Dreg_lo = W [ Ireg ++ ] ; /* indirect, post-increment (DAG) (a)
*/
Dreg_lo = W [ Ireg -- ] ; /* indirect, post-decrement (DAG) (a)
*/
Dreg_lo = W [ Preg ] ; /* indirect (a)*/
Dreg_lo = W [ Preg ++ Preg ] ; /* indirect, post-increment
index (a) (see note 1) */
Note 1: See “Indirect and Post-Increment Index Addressing” on page 3-29.
Syntax Terminology
Dreg_lo: R7–0.L
Ireg: I3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Load Low Data Register Half instruction loads 16 bits from a memory
location indicated by an I-register or a P-register into the least
significant half of a 32-bit data register. The operation does not affect the
most significant half of the data register.
Options
The Load Low Data Register Half instruction supports the following
options.
• Post-increment the source pointer I-register by 2 bytes.
• Post-decrement the source pointer I-register by 2 bytes.
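Indirect and Post-Increment Index Addressing
In the form Dst_lo = W [ Src_1 ++ Src_2 ], the instruction loads Dst_lo from
the address in Src_1 and then adds Src_2 to Src_1,
where: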
• Dst_lo is the least significant half of the destination register.
(Dreg_lo in the syntax example).
• Src_1 is the memory source pointer register on the right side of the
syntax.
• Src_2 is the increment index register.
Indirect and post-increment index addressing supports customized indi-
rect address cadence. The indirect, post-increment index version must
have separate P-registers for the input operands. If a common Preg is used
for the inputs, the instruction functions as a simple, non-incrementing
load. For example, r0.l = W[p2++p2] functions as r0.l = W[p2].
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3.l = w[ i1 ] ;
r7.l = w[ i3 ++ ] ;
r1.l = w[ i0 -- ] ;
r2.l = w[ p4 ] ;
r5.l = w[ p2 ++ p0 ] ;
Also See
Load High Data Register Half, Load Half-Word – Zero-Extended, Load
Half-Word – Sign-Extended
Special Applications
To read consecutive, aligned 16-bit values for high-performance DSP
operations, use the Load Data Register instructions instead of these
Half-Word instructions. The Half-Word Load instructions use only half
of the available 32-bit data bus bandwidth, possibly imposing a bottleneck
constriction in the data flow rate.
Load Byte – Zero-Extended
General Form
D-register = B [ indirect_address ] (Z)
Syntax
Dreg = B [ Preg ] (Z) ; /* indirect (a)*/
Dreg = B [ Preg ++ ] (Z) ; /* indirect, post-increment (a)*/
Dreg = B [ Preg -- ] (Z) ; /* indirect, post-decrement (a)*/
Dreg = B [ Preg + uimm15 ] (Z) ; /* indexed with offset (b)*/
Dreg = B [ Preg - uimm15 ] (Z) ; /* indexed with offset (b)*/
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Byte – Zero-Extended instruction loads an 8-bit byte,
zero-extended to 32 bits, from a memory location indicated by a
P-register into a 32-bit data register. The instruction fills D-register
bits 31–8 with zeros.
The indirect address and offset have no restrictions for memory address
alignment.
Options
The Load Byte – Zero-Extended instruction supports the following
options.
• Post-increment the source pointer by 1 byte.
• Post-decrement the source pointer by 1 byte.
• Offset the source pointer with a 16-bit signed constant.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r3 = b [ p0 ] (z) ;
r7 = b [ p1 ++ ] (z) ;
r2 = b [ sp -- ] (z) ;
r0 = b [ p4 + 0xFFFF800F ] (z) ;
Also See
Load Byte – Sign-Extended
Special Applications
None
Load Byte – Sign-Extended
General Form
D-register = B [ indirect_address ] (X)
Syntax
Dreg = B [ Preg ] (X) ; /* indirect (a)*/
Dreg = B [ Preg ++ ] (X) ; /* indirect, post-increment (a)*/
Dreg = B [ Preg -- ] (X) ; /* indirect, post-decrement (a)*/
Dreg = B [ Preg + uimm15 ] (X) ; /* indexed with offset (b)*/
Dreg = B [ Preg - uimm15 ] (X) ; /* indexed with offset (b)*/
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Load Byte – Sign-Extended instruction loads an 8-bit byte,
sign-extended to 32 bits, from a memory location indicated by a P-register
into a 32-bit data register. The instruction fills D-register bits 31–8
with the most significant bit of the loaded byte.
The indirect address and offset have no restrictions for memory address
alignment.
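The difference between the sign-extended and zero-extended byte loads, and the absence of an alignment requirement, can be sketched in Python as follows. The function name is hypothetical, and memory is modeled as a bytes object indexed by byte address.

```python
def load_byte(memory: bytes, address: int, sign_extend: bool) -> int:
    """Model Dreg = B [ address ] with the (X) or (Z) option.

    Any byte address is acceptable; there is no alignment check.
    """
    byte = memory[address] & 0xFF
    if sign_extend and byte & 0x80:
        return byte | 0xFFFFFF00      # fill bits 31-8 with the sign bit
    return byte                       # bits 31-8 are zero


memory = bytes([0x7F, 0x80, 0xFF])
```

With this memory image, the byte at odd address 1 loads as 0xFFFFFF80 with (X) and as 0x00000080 with (Z).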
Options
The Load Byte – Sign-Extended instruction supports the following
options.
• Post-increment the source pointer by 1 byte.
• Post-decrement the source pointer by 1 byte.
• Offset the source pointer with a 16-bit signed constant.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with
specific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r3 = b [ p0 ] (x) ;
r7 = b [ p1 ++ ] (x) ;
r2 = b [ sp -- ] (x) ;
r0 = b [ p4 + 0xFFFF800F ] (x) ;
Also See
Load Byte – Zero-Extended
Special Applications
None
Store Pointer Register
General Form
[ indirect_address ] = P-register
Syntax
[ Preg ] = Preg ; /* indirect (a)*/
[ Preg ++ ] = Preg ; /* indirect, post-increment (a)*/
[ Preg -- ] = Preg ; /* indirect, post-decrement (a)*/
[ Preg + uimm6m4 ] = Preg ; /* indexed with small offset (a)*/
[ Preg + uimm17m4 ] = Preg ; /* indexed with large offset (b)*/
[ Preg - uimm17m4 ] = Preg ; /* indexed with large offset (b)*/
[ FP - uimm7m4 ] = Preg ; /* indexed FP-relative (a)*/
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Store Pointer Register instruction stores the contents of a 32-bit
P-register to a 32-bit memory location. The Pointer register is a P-register.
The indirect address and offset must yield a multiple of 4 to maintain
4-byte word address alignment. Failure to maintain proper alignment
causes a misaligned memory access exception.
Options
The Store Pointer Register instruction supports the following options.
• Post-increment the destination pointer by 4 bytes.
• Post-decrement the destination pointer by 4 bytes.
• Offset the destination pointer with a small (6-bit), word-aligned
(multiple of 4), unsigned constant.
• Offset the destination pointer with a large (18-bit), word-aligned
(multiple of 4), signed constant.
• Frame Pointer (FP) relative and offset with a 7-bit, word-aligned
(multiple of 4), negative constant.
The indexed FP-relative form is typically used to access local variables in a
subroutine or function. Positive offsets relative to FP (useful to access
arguments from a called function) can be accomplished using one of the
other versions of this instruction. Preg includes the Frame Pointer and
Stack Pointer.
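The alignment rule and the FP-relative form can be sketched in C as follows. This is a minimal model under stated assumptions: the helper names are hypothetical, addresses are plain 32-bit integers, and the misaligned memory access exception is represented only by a predicate returning 0.

```c
#include <stdint.h>

/* Nonzero when the effective address is a multiple of 4, as a
   32-bit store requires. */
static int is_word_aligned(uint32_t address)
{
    return (address & 3u) == 0u;
}

/* Effective address of [ FP - uimm7m4 ]: the constant is written as a
   positive multiple of 4 and subtracted from the Frame Pointer. */
static uint32_t fp_relative_address(uint32_t fp, uint32_t uimm7m4)
{
    return fp - uimm7m4;
}
```

With FP = 0x2000, the form [ fp -12 ] addresses 0x1FF4, which is word aligned.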
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with
specific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
[ p2 ] = p3 ;
[ sp ++ ] = p5 ;
[ p0 -- ] = p2 ;
[ p2 + 8 ] = p3 ;
[ p2 + 0x4444 ] = p0 ;
[ fp -12 ] = p1 ;
Also See
--SP (Push), --SP (Push Multiple)
Special Applications
None
Store Data Register
General Form
[ indirect_address ] = D-register
Syntax
Using Pointer Registers
[ Preg ] = Dreg ; /* indirect (a)*/
[ Preg ++ ] = Dreg ; /* indirect, post-increment (a)*/
[ Preg -- ] = Dreg ; /* indirect, post-decrement (a)*/
[ Preg + uimm6m4 ] = Dreg ; /* indexed with small offset (a)*/
[ Preg + uimm17m4 ] = Dreg ; /* indexed with large offset (b)*/
[ Preg - uimm17m4 ] = Dreg ; /* indexed with large offset (b)*/
[ Preg ++ Preg ] = Dreg ; /* indirect, post-increment index (a) */ ¹
[ FP - uimm7m4 ] = Dreg ; /* indexed FP-relative (a)*/
Syntax Terminology
Dreg: R7–0
Ireg: I3–0
Mreg: M3–0
¹ See “Indirect and Post-Increment Index Addressing” on page 3-43.
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Store Data Register instruction stores the contents of a 32-bit
D-register to a 32-bit memory location. The destination Pointer register
can be a P-register, I-register, or the Frame Pointer.
The indirect address and offset must yield a multiple of 4 to maintain
4-byte word address alignment. Failure to maintain proper alignment
causes a misaligned memory access exception.
Options
The Store Data Register instruction supports the following options.
• Post-increment the destination pointer by 4 bytes.
• Post-decrement the destination pointer by 4 bytes.
• Offset the destination pointer with a small (6-bit), word-aligned
(multiple of 4), unsigned constant.
• Offset the destination pointer with a large (18-bit), word-aligned
(multiple of 4), signed constant.
• Frame Pointer (FP) relative and offset with a 7-bit, word-aligned
(multiple of 4), negative constant.
The indexed FP-relative form is typically used to access local variables in a
subroutine or function. Positive offsets relative to FP (such as is useful to
access arguments from a called function) can be accomplished using one
of the other versions of this instruction. Preg includes the Frame Pointer
and Stack Pointer.
In the indirect, post-increment index form [ Dst_1 ++ Dst_2 ] = Src:
• Src is the source register. (Dreg in the syntax example).
• Dst_1 is the memory destination register on the left side of the
equation.
• Dst_2 is the increment index register.
Indirect and post-increment index addressing supports customized indi-
rect address cadence. The indirect, post-increment index version must
have separate P-registers for the input operands. If a common Preg is used
for the inputs, the auto-increment feature does not work.
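As a rough C sketch of indirect, post-increment index addressing: the store uses the current value of Dst_1 as its address, and Dst_1 is then advanced by the contents of Dst_2. The struct and function names are hypothetical, and the model assumes ordinary 32-bit wraparound arithmetic on the pointer.

```c
#include <stdint.h>

/* Result of one [ Dst_1 ++ Dst_2 ] = Src access in this model. */
typedef struct {
    uint32_t store_address; /* address the data is written to        */
    uint32_t next_dst1;     /* value of Dst_1 after the instruction  */
} indexed_store;

static indexed_store post_increment_index(uint32_t dst1, uint32_t dst2)
{
    indexed_store result;
    result.store_address = dst1;     /* store happens at the old Dst_1 */
    result.next_dst1 = dst1 + dst2;  /* then Dst_1 advances by Dst_2   */
    return result;
}
```

An index register holding 8 steps through every second 32-bit word; an index holding 0xFFFFFFFC (that is, -4) walks backward one word at a time.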
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with
specific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
[ p0 ] = r3 ;
[ p1 ++ ] = r7 ;
[ sp -- ] = r2 ;
[ p2 + 12 ] = r6 ;
[ p4 - 0x1004 ] = r0 ;
[ p0 ++ p1 ] = r1 ;
[ fp - 28 ] = r5 ;
[ i2 ] = r2 ;
[ i0 ++ ] = r0 ;
[ i0 -- ] = r0 ;
[ i3 ++ m0 ] = r7 ;
Also See
Load Immediate
Special Applications
None
Store High Data Register Half
General Form
W [ indirect_address ] = Dreg_hi
Syntax
W [ Ireg ] = Dreg_hi ; /* indirect (DAG) (a)*/
W [ Ireg ++ ] = Dreg_hi ; /* indirect, post-increment (DAG) (a) */
W [ Ireg -- ] = Dreg_hi ; /* indirect, post-decrement (DAG) (a) */
W [ Preg ] = Dreg_hi ; /* indirect (a)*/
W [ Preg ++ Preg ] = Dreg_hi ; /* indirect, post-increment index (a) */ ¹
Syntax Terminology
Dreg_hi: R7–0.H
Ireg: I3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Store High Data Register Half instruction stores the most significant
16 bits of a 32-bit data register to a 16-bit memory location. The Pointer
register is either an I-register or a P-register.
¹ See “Indirect and Post-Increment Index Addressing” on page 3-47.
The indirect address and offset must yield an even number to maintain
2-byte half-word address alignment. Failure to maintain proper alignment
causes a misaligned memory access exception.
Options
The Store High Data Register Half instruction supports the following
options.
• Post-increment the destination pointer I-register by 2 bytes.
• Post-decrement the destination pointer I-register by 2 bytes.
In the indirect, post-increment index form W [ Dst_1 ++ Dst_2 ] = Src_hi:
• Src_hi is the most significant half of the source register. (Dreg_hi
in the syntax example).
• Dst_1 is the memory destination pointer register on the left side of
the syntax.
• Dst_2 is the increment index register.
Indirect and post-increment index addressing supports customized indi-
rect address cadence. The indirect, post-increment index version must
have separate P-registers for the input operands. If a common Preg is used
for the inputs, the auto-increment feature does not work.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with
specific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
Example
w[ i1 ] = r3.h ;
w[ i3 ++ ] = r7.h ;
w[ i0 -- ] = r1.h ;
w[ p4 ] = r2.h ;
w[ p2 ++ p0 ] = r5.h ;
Also See
Store Low Data Register Half
Special Applications
To write consecutive, aligned 16-bit values for high-performance DSP
operations, use the Store Data Register instructions instead of these
Half-Word instructions. The Half-Word Store instructions use only half
the available 32-bit data bus bandwidth, which can create a bottleneck
in the data flow.
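The bandwidth advice can be illustrated with a C model of the register halves. It is a sketch only: the helper names are hypothetical, and it assumes the little-endian memory layout of the Blackfin, so the low half of a stored 32-bit word lands at the lower address.

```c
#include <stdint.h>

/* Value written by W [ ... ] = Dreg_hi: bits 31-16 of the register. */
static uint16_t high_half(uint32_t dreg)
{
    return (uint16_t)(dreg >> 16);
}

/* Value written by W [ ... ] = Dreg_lo: bits 15-0 of the register. */
static uint16_t low_half(uint32_t dreg)
{
    return (uint16_t)(dreg & 0xFFFFu);
}

/* Combine two consecutive 16-bit values so that one 32-bit store can
   replace two half-word stores. 'first' goes to the lower address. */
static uint32_t pack_halves(uint16_t first, uint16_t second)
{
    return ((uint32_t)second << 16) | first;
}
```

Packing the pair once and issuing a single Store Data Register access moves the same 32 bits that would otherwise need two half-word stores.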
Store Low Data Register Half
General Form
W [ indirect_address ] = Dreg_lo
W [ indirect_address ] = D-register
Syntax
W [ Ireg ] = Dreg_lo ; /* indirect (DAG) (a)*/
W [ Ireg ++ ] = Dreg_lo ; /* indirect, post-increment (DAG) (a) */
W [ Ireg -- ] = Dreg_lo ; /* indirect, post-decrement (DAG) (a) */
W [ Preg ] = Dreg_lo ; /* indirect (a)*/
W [ Preg ] = Dreg ; /* indirect (a)*/
W [ Preg ++ ] = Dreg ; /* indirect, post-increment (a)*/
W [ Preg -- ] = Dreg ; /* indirect, post-decrement (a)*/
W [ Preg + uimm5m2 ] = Dreg ; /* indexed with small offset (a) */
W [ Preg + uimm16m2 ] = Dreg ; /* indexed with large offset (b) */
W [ Preg - uimm16m2 ] = Dreg ; /* indexed with large offset (b) */
W [ Preg ++ Preg ] = Dreg_lo ; /* indirect, post-increment index (a) */ ¹
Syntax Terminology
Dreg_lo: R7–0.L
Ireg: I3–0
Dreg: R7–0
¹ See “Indirect and Post-Increment Index Addressing” on page 3-51.
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Store Low Data Register Half instruction stores the least significant
16 bits of a 32-bit data register to a 16-bit memory location. The Pointer
register is either an I-register or a P-register.
The indirect address and offset must yield an even number to maintain
2-byte half-word address alignment. Failure to maintain proper alignment
causes a misaligned memory access exception.
Options
The Store Low Data Register Half instruction supports the following
options.
• Post-increment the destination pointer by 2 bytes.
• Post-decrement the destination pointer by 2 bytes.
• Offset the destination pointer with a small (5-bit), half-word-aligned
(even), unsigned constant.
• Offset the destination pointer with a large (17-bit), half-word-aligned
(even), signed constant.
In the indirect, post-increment index form W [ Dst_1 ++ Dst_2 ] = Src:
• Src is the least significant half of the source register. (Dreg or
Dreg_lo in the syntax example).
• Dst_1 is the memory destination pointer register on the left side of
the syntax.
• Dst_2 is the increment index register.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with
specific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
w [ i1 ] = r3.l ;
w [ p0 ] = r3 ;
w [ i3 ++ ] = r7.l ;
w [ i0 -- ] = r1.l ;
w [ p4 ] = r2.l ;
w [ p1 ++ ] = r7 ;
w [ sp -- ] = r2 ;
w [ p2 + 12 ] = r6 ;
w [ p4 - 0x200C ] = r0 ;
w [ p2 ++ p0 ] = r5.l ;
Also See
Store High Data Register Half, Store Data Register
Special Applications
To write consecutive, aligned 16-bit values for high-performance DSP
operations, use the Store Data Register instructions instead of these
Half-Word instructions. The Half-Word Store instructions use only half
the available 32-bit data bus bandwidth, which can create a bottleneck
in the data flow.
Store Byte
General Form
B [ indirect_address ] = D-register
Syntax
B [ Preg ] = Dreg ; /* indirect (a)*/
B [ Preg ++ ] = Dreg ; /* indirect, post-increment (a)*/
B [ Preg -- ] = Dreg ; /* indirect, post-decrement (a)*/
B [ Preg + uimm15 ] = Dreg ; /* indexed with offset (b)*/
B [ Preg - uimm15 ] = Dreg ; /* indexed with offset (b)*/
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Store Byte instruction stores the least significant 8-bit byte of a data
register to an 8-bit memory location. The Pointer register is a P-register.
The indirect address and offset have no restrictions for memory address
alignment.
Options
The Store Byte instruction supports the following options.
• Post-increment the destination pointer by 1 byte to maintain byte
alignment.
• Post-decrement the destination pointer by 1 byte to maintain byte
alignment.
• Offset the destination pointer with a 16-bit signed constant.
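A minimal C sketch of the Store Byte behavior follows. The function names are hypothetical; the signed offset is modeled as an int32_t added to the pointer with 32-bit wraparound, and no alignment check appears because byte accesses have none.

```c
#include <stdint.h>

/* Value written to memory: the least significant 8 bits of the Dreg. */
static uint8_t stored_byte(uint32_t dreg)
{
    return (uint8_t)(dreg & 0xFFu);
}

/* Effective address of B [ Preg + offset ] or B [ Preg - offset ],
   with the sign already folded into 'offset'. */
static uint32_t byte_store_address(uint32_t preg, int32_t offset)
{
    return preg + (uint32_t)offset;
}
```

With P4 = 0x2000, the form b [ p4 + 0x100F ] addresses 0x300F and b [ p4 - 0x53F ] addresses 0x1AC1; both odd addresses are legal for a byte store.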
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with
specific other instructions. For more information, see “Issuing Parallel
Instructions” on page 15-1.
The 32-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
b [ p0 ] = r3 ;
b [ p1 ++ ] = r7 ;
b [ sp -- ] = r2 ;
b [ p4 + 0x100F ] = r0 ;
b [ p4 - 0x53F ] = r0 ;
Also See
None
Special Applications
To write consecutive 8-bit values for high-performance DSP operations,
use the Store Data Register instructions instead of these byte instructions.
The byte store instructions use only one fourth of the available 32-bit
data bus bandwidth, which can create a bottleneck in the data flow.
Instruction Summary
• “Move Register” on page 4-2
• “Move Conditional” on page 4-8
• “Move Half to Full Word – Zero-Extended” on page 4-10
• “Move Half to Full Word – Sign-Extended” on page 4-13
• “Move Register Half” on page 4-15
• “Move Byte – Zero-Extended” on page 4-23
• “Move Byte – Sign-Extended” on page 4-25
Instruction Overview
This chapter discusses the move instructions. Users can take advantage of
these instructions to move registers (or register halves), move half words
(zero or sign extended), move bytes, and perform conditional moves.
Move Register
General Form
dest_reg = src_reg
Syntax
genreg = genreg ; /* (a) */
genreg = dagreg ; /* (a) */
dagreg = genreg ; /* (a) */
dagreg = dagreg ; /* (a) */
genreg = USP ; /* (a)*/
USP = genreg ; /* (a)*/
Dreg = sysreg ; /* sysreg to 32-bit D-register (a) */
sysreg = Dreg ; /* 32-bit D-register to sysreg (a) */
sysreg = Preg ; /* 32-bit P-register to sysreg (a) */
sysreg = USP ; /* (a) */
A0 = A1 ; /* move 40-bit Accumulator value (b) */
A1 = A0 ; /* move 40-bit Accumulator value (b) */
A0 = Dreg ; /* 32-bit D-register to 40-bit A0, sign extended (b) */
A1 = Dreg ; /* 32-bit D-register to 40-bit A1, sign extended (b) */
Syntax Terminology
genreg: R7–0, P5–0, SP, FP, A0.X, A0.W, A1.X, A1.W
sysreg: ASTAT, SEQSTAT, SYSCFG, RETI, RETX, RETN, RETE, RETS, LC0 and
LC1, LT0 and LT1, LB0 and LB1, CYCLES, CYCLES2, and EMUDAT
USP: The User Stack Pointer Register
Dreg: R7–0
When combining two moves in the same instruction, the Dreg_even and
Dreg_odd operands must be members of the same register pair, e.g. from
the set R1:0, R3:2, R5:4, R7:6.
opt_mode: Optionally (FU) or (ISS2)
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Move Register instruction copies the contents of the source register
into the destination register. The operation does not affect the source reg-
ister contents.
All moves from smaller to larger registers are sign extended.
Options
The Accumulator to Data Register Move instruction supports the options
listed in the table below.
Default   Signed fraction. Copy Accumulator 9.31 format to register 1.31
          format. Saturate results between minimum -1 and maximum
          1 - 2^(-31).
          Signed integer. Copy Accumulator 40.0 format to register 32.0
          format. Saturate results between minimum -2^31 and maximum
          2^31 - 1.
          In either case, the resulting hexadecimal range is minimum
          0x8000 0000 through maximum 0x7FFF FFFF.
          The Accumulator is unaffected by extraction.
(FU)      Unsigned fraction. Copy Accumulator 8.32 format to register 0.32
          format. Saturate results between minimum 0 and maximum
          1 - 2^(-32).
          Unsigned integer. Copy Accumulator 40.0 format to register 32.0
          format. Saturate results between minimum 0 and maximum
          2^32 - 1.
          In either case, the resulting hexadecimal range is minimum
          0x0000 0000 through maximum 0xFFFF FFFF.
          The Accumulator is unaffected by extraction.
(ISS2)    Signed fraction with scaling. Shift the Accumulator contents one
          place to the left (multiply x 2). Saturate result to 1.31 format.
          Copy to destination register. Results range between minimum -1
          and maximum 1 - 2^(-31).
          Signed integer with scaling. Shift the Accumulator contents one
          place to the left (multiply x 2). Saturate result to 32.0 format.
          Copy to destination register. Results range between minimum
          -2^31 and maximum 2^31 - 1.
          In either case, the resulting hexadecimal range is minimum
          0x8000 0000 through maximum 0x7FFF FFFF.
          The Accumulator is unaffected by extraction.
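The saturation behavior of the three options can be sketched in C. This is an illustrative model rather than a bit-exact description of the hardware: the function names are hypothetical, the 40-bit Accumulator is assumed to be held sign-extended in an int64_t (or in the low 40 bits of a uint64_t for the unsigned case), and the V flag is mentioned only in comments.

```c
#include <stdint.h>

/* Default option: saturate the signed 40-bit value to 32 bits.
   The fraction and integer views share the same bit pattern. */
static int32_t acc_to_dreg_default(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;  /* 0x7FFF FFFF, V would be set */
    if (acc < INT32_MIN) return INT32_MIN;  /* 0x8000 0000, V would be set */
    return (int32_t)acc;
}

/* (FU) option: treat the 40 bits as unsigned and saturate to 32 bits. */
static uint32_t acc_to_dreg_fu(uint64_t acc)
{
    acc &= 0xFFFFFFFFFFULL;                 /* keep the 40 Accumulator bits */
    return (acc > UINT32_MAX) ? UINT32_MAX : (uint32_t)acc;
}

/* (ISS2) option: multiply by 2, then saturate as in the default case. */
static int32_t acc_to_dreg_iss2(int64_t acc)
{
    return acc_to_dreg_default(acc * 2);    /* no overflow for 40-bit input */
}
```

An Accumulator holding 0x40000000 copies unchanged under the default option but saturates to 0x7FFFFFFF under (ISS2), because the doubled value no longer fits in 32 signed bits.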
Flags Affected
The ASTAT register that contains the flags can be explicitly modified by
this instruction.
The Accumulator to D-register Move versions of this instruction affect the
following flags.
• V is set if the result written to the D-register file saturates 32 bits;
cleared if no saturation. In the case of two simultaneous operations,
V represents the logical “OR” of the two.
Required Mode
User & Supervisor for most cases.
Explicit accesses to USP, SEQSTAT, SYSCFG, RETI, RETX, RETN and RETE
require Supervisor mode. If any of these registers are explicitly accessed
from User mode, an Illegal Use of Protected Resource exception occurs.
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with
specific other 16-bit instructions. For more information, see “Issuing
Parallel Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r3 = r0 ;
r7 = p2 ;
r2 = a0 ;
a0 = a1 ;
a1 = a0 ;
a0 = r7 ; /* move R7 to 32-bit A0.W */
a1 = r3 ; /* move R3 to 32-bit A1.W */
retn = p0 ; /* must be in Supervisor mode */
r2 = a0 ; /* 32-bit move with saturation */
r7 = a1 ; /* 32-bit move with saturation */
r0 = a0 (iss2) ; /* 32-bit move with scaling, truncation and saturation */
Also See
Load Immediate to initialize registers.
Move Register Half to move values explicitly into the A0.X and A1.X
registers.
LSETUP, LOOP to implicitly access registers LC0, LT0, LB0, LC1, LT1 and
LB1.
Call, RAISE (Force Interrupt / Reset) and RTS, RTI, RTX, RTN, RTE
(Return) to implicitly access registers RETI, RETN, and RETS.
Special Applications
None
Move Conditional
General Form
IF CC dest_reg = src_reg
IF ! CC dest_reg = src_reg
Syntax
IF CC DPreg = DPreg ; /* move if CC = 1 (a) */
IF ! CC DPreg = DPreg ; /* move if CC = 0 (a) */
Syntax Terminology
DPreg: R7–0, P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Move Conditional instruction moves source register contents into a
destination register, depending on the value of CC.
In the form IF CC DPreg = DPreg, the move occurs only if CC = 1.
In the form IF ! CC DPreg = DPreg, the move occurs only if CC = 0.
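In C terms, the two forms behave like a conditional select on the destination register. The sketch below uses hypothetical function names and models the registers as plain 32-bit values.

```c
#include <stdint.h>

/* IF CC dest = src: the destination changes only when CC is 1. */
static uint32_t move_if_cc(int cc, uint32_t dest, uint32_t src)
{
    return cc ? src : dest;
}

/* IF ! CC dest = src: the destination changes only when CC is 0. */
static uint32_t move_if_not_cc(int cc, uint32_t dest, uint32_t src)
{
    return cc ? dest : src;
}
```

Because the destination keeps its old value when the condition fails, the instruction can replace a short branch around an ordinary move.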
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Move Conditional instruction cannot be issued in parallel with other
instructions.
Example
if cc r3 = r0 ; /* move if CC=1 */
if cc r2 = p4 ;
if cc p0 = r7 ;
if cc p2 = p5 ;
if ! cc r3 = r0 ; /* move if CC=0 */
if ! cc r2 = p4 ;
if ! cc p0 = r7 ;
if ! cc p2 = p5 ;
Also See
Compare Accumulator, Move CC, Negate CC, IF CC JUMP
Special Applications
None
Move Half to Full Word – Zero-Extended
General Form
dest_reg = src_reg (Z)
Syntax
Dreg = Dreg_lo (Z) ; /* (a) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Move Half to Full Word – Zero-Extended instruction converts an
unsigned half word (16 bits) to an unsigned word (32 bits).
The instruction copies the least significant 16 bits from a source register
into the lower half of a 32-bit register and zero-extends the upper half of
the destination register. The operation supports only D-registers. Zero
extension is appropriate for unsigned values. If used with signed values, a
small negative 16-bit value will become a large positive value.
Flags Affected
The following flags are affected by the Move Half to Full Word –
Zero-Extended instruction.
• AZ is set if result is zero; cleared if nonzero.
• AN is cleared.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
/* If r0.l = 0xFFFF */
r4 = r0.l (z) ; /* Equivalent to r4.l = r0.l and r4.h = 0 */
/* . . . then r4 = 0x0000FFFF */
Also See
Move Half to Full Word – Sign-Extended, Move Register Half
Special Applications
None
Move Half to Full Word – Sign-Extended
General Form
dest_reg = src_reg (X)
Syntax
Dreg = Dreg_lo (X) ; /* (a)*/
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Move Half to Full Word – Sign-Extended instruction converts a
signed half word (16 bits) to a signed word (32 bits). The instruction cop-
ies the least significant 16 bits from a source register into the lower half of
a 32-bit register and sign-extends the upper half of the destination regis-
ter. The operation supports only D-registers.
Flags Affected
The following flags are affected by the Move Half to Full Word –
Sign-Extended instruction.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
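The result and the AZ and AN flags of the half-to-full-word moves can be modeled in C as below, with the zero-extended variant shown alongside for comparison. The struct and function names are hypothetical; AC0 and V, which are always cleared, are left out of the model.

```c
#include <stdint.h>

/* Destination value plus the two data-dependent flags. */
typedef struct {
    uint32_t value;
    int az;   /* 1 if the result is zero     */
    int an;   /* 1 if the result is negative */
} move_result;

/* Dreg = Dreg_lo (Z): upper half cleared, AN always cleared. */
static move_result move_half_zero_extended(uint16_t src_lo)
{
    move_result r;
    r.value = (uint32_t)src_lo;
    r.az = (r.value == 0u);
    r.an = 0;
    return r;
}

/* Dreg = Dreg_lo (X): upper half filled with bit 15 of the source. */
static move_result move_half_sign_extended(uint16_t src_lo)
{
    move_result r;
    r.value = (src_lo & 0x8000u) ? (0xFFFF0000u | src_lo) : (uint32_t)src_lo;
    r.az = (r.value == 0u);
    r.an = (r.value & 0x80000000u) != 0u;
    return r;
}
```

For a source half of 0xFFFF, the zero-extended move gives 0x0000FFFF with AN cleared, while the sign-extended move gives 0xFFFFFFFF with AN set.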
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with any other instructions.
Example
r4 = r0.l(x) ;
r4 = r0.l ;
Also See
Move Half to Full Word – Zero-Extended, Move Register Half
Special Applications
None
Move Register Half
General Form
dest_reg_half = src_reg_half
dest_reg_half = accumulator (opt_mode)
Syntax
A0.X = Dreg_lo ; /* least significant 8 bits of Dreg into A0.X (b) */ ¹
A1.X = Dreg_lo ; /* least significant 8 bits of Dreg into A1.X (b) */
Dreg_lo = A0.X ; /* 8-bit A0.X, sign-extended, into least significant 16 bits of Dreg (b) */
Dreg_lo = A1.X ; /* 8-bit A1.X, sign-extended, into least significant 16 bits of Dreg (b) */
A0.L = Dreg_lo ; /* least significant 16 bits of Dreg into least significant 16 bits of A0.W (b) */
A1.L = Dreg_lo ; /* least significant 16 bits of Dreg into least significant 16 bits of A1.W (b) */
A0.H = Dreg_hi ; /* most significant 16 bits of Dreg into most significant 16 bits of A0.W (b) */
A1.H = Dreg_hi ; /* most significant 16 bits of Dreg into most significant 16 bits of A1.W (b) */
¹ The Accumulator Extension registers A0.X and A1.X are defined only for
the 8 low-order bits 7 through 0 of A0.X and A1.X. This instruction
truncates the upper byte of Dreg_lo before moving the value into the
Accumulator Extension register (A0.X or A1.X).
Syntax Terminology
Dreg_lo: R7–0.L
Dreg_hi: R7–0.H
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Move Register Half instruction copies 16 bits from a source register
into half of a 32-bit register. The instruction does not affect the unspeci-
fied half of the destination register. It supports only D-registers and the
Accumulator.
One version of the instruction simply copies the 16 bits (saturated at 16
bits) of the Accumulator into a data half-register. This syntax supports
truncation and rounding beyond a simple Move Register Half instruction.
The fraction version of this instruction (the default option) transfers the
Accumulator result to the destination register according to the diagrams in
Figure 4-1. Accumulator A0.H contents transfer to the lower half of the
destination D-register. A1.H contents transfer to the upper half of the des-
tination D-register.
The integer version of this instruction (the (IS) option) transfers the
Accumulator result to the destination register according to the diagrams,
shown in Figure 4-2. Accumulator A0.L contents transfer to the lower half
of the destination D-register. A1.L contents transfer to the upper half of
the destination D-register.
Some versions of this instruction are affected by the RND_MOD bit in the
ASTAT register when they copy the results into the destination register.
RND_MOD determines whether biased or unbiased rounding is used. RND_MOD
controls rounding for all versions of this instruction except the (IS) and
(ISS2) options.
Options
The Accumulator to Half D-Register Move instructions support the copy
options in Table 4-2.
Default   Signed fraction format. Round Accumulator 9.31 format value at
          bit 16. (RND_MOD bit in the ASTAT register controls the
          rounding.) Saturate the result to 1.15 precision and copy it to
          the destination register half. Result is between minimum -1 and
          maximum 1 - 2^(-15) (or, expressed in hex, between minimum
          0x8000 and maximum 0x7FFF).
          The Accumulator is unaffected by extraction.
(FU)      Unsigned fraction format. Round Accumulator 8.32 format value at
          bit 16. (RND_MOD bit in the ASTAT register controls the
          rounding.) Saturate the result to 0.16 precision and copy it to
          the destination register half. Result is between minimum 0 and
          maximum 1 - 2^(-16) (or, expressed in hex, between minimum
          0x0000 and maximum 0xFFFF).
          The Accumulator is unaffected by extraction.
(IS)      Signed integer format. Extract the lower 16 bits of the
          Accumulator. Saturate for 16.0 precision and copy to the
          destination register half. Result is between minimum -2^15 and
          maximum 2^15 - 1 (or, expressed in hex, between minimum 0x8000
          and maximum 0x7FFF).
          The Accumulator is unaffected by extraction.
(IU)      Unsigned integer format. Extract the lower 16 bits of the
          Accumulator. Saturate for 16.0 precision and copy to the
          destination register half. Result is between minimum 0 and
          maximum 2^16 - 1 (or, expressed in hex, between minimum 0x0000
          and maximum 0xFFFF).
          The Accumulator is unaffected by extraction.
(T)       Signed fraction with truncation. Truncate Accumulator 9.31
          format value at bit 16. (Perform no rounding.) Saturate the
          result to 1.15 precision and copy it to the destination register
          half. Result is between minimum -1 and maximum 1 - 2^(-15) (or,
          expressed in hex, between minimum 0x8000 and maximum 0x7FFF).
          The Accumulator is unaffected by extraction.
(S2RND)   Signed fraction with scaling and rounding. Shift the Accumulator
          contents one place to the left (multiply x 2). Round Accumulator
          9.31 format value at bit 16. (RND_MOD bit in the ASTAT register
          controls the rounding.) Saturate the result to 1.15 precision
          and copy it to the destination register half. Result is between
          minimum -1 and maximum 1 - 2^(-15) (or, expressed in hex,
          between minimum 0x8000 and maximum 0x7FFF).
          The Accumulator is unaffected by extraction.
(ISS2)    Signed integer with scaling. Extract the lower 16 bits of the
          Accumulator. Shift them one place to the left (multiply x 2).
          Saturate the result for 16.0 format and copy to the destination
          register half. Result is between minimum -2^15 and maximum
          2^15 - 1 (or, expressed in hex, between minimum 0x8000 and
          maximum 0x7FFF).
          The Accumulator is unaffected by extraction.
(IH)      Signed integer, high word extract. Round Accumulator 40.0 format
          value at bit 16. (RND_MOD bit in the ASTAT register controls the
          rounding.) Saturate to 32.0 result. Copy the upper 16 bits of
          that value to the destination register half. Result is between
          minimum -2^15 and maximum 2^15 - 1 (or, expressed in hex,
          between minimum 0x8000 and maximum 0x7FFF).
          The Accumulator is unaffected by extraction.
To truncate the result, the operation eliminates the least significant bits
that do not fit into the destination register.
When necessary, saturation is performed after the rounding.
See “Saturation” on page 1-11 for a description of saturation behavior.
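The options that do not round, (IS), (IU), and (T), can be sketched in C as follows. This is a model under stated assumptions, not a bit-exact reference: the function names are hypothetical, the 40-bit Accumulator is assumed to be held sign-extended in an int64_t (or in the low 40 bits of a uint64_t for (IU)), and the rounded options are left out because their result depends on RND_MOD.

```c
#include <stdint.h>

/* Clamp a signed value to the 16-bit range 0x8000..0x7FFF. */
static int16_t saturate_s16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* (IS): signed integer, saturated to 16.0 precision. */
static int16_t acc_to_half_is(int64_t acc)
{
    return saturate_s16(acc);
}

/* (IU): unsigned integer, saturated to 0x0000..0xFFFF. */
static uint16_t acc_to_half_iu(uint64_t acc)
{
    acc &= 0xFFFFFFFFFFULL;              /* keep the 40 Accumulator bits */
    return (acc > UINT16_MAX) ? UINT16_MAX : (uint16_t)acc;
}

/* (T): drop bits 15-0 without rounding, then saturate to 1.15.
   The negative branch avoids right-shifting a negative value. */
static int16_t acc_to_half_t(int64_t acc)
{
    int64_t shifted = (acc >= 0) ? (acc >> 16) : ~((~acc) >> 16);
    return saturate_s16(shifted);
}
```

In this model an Accumulator value of 70000 saturates to 0x7FFF under (IS), and 0x12348000 truncates to 0x1234 under (T), where a rounding option could instead round the discarded half upward.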
Flags Affected
The Accumulator to Half D-register Move versions of this instruction
affect the following flags.
• V is set if the result written to the half D-register file saturates 16
bits; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with
specific other 16-bit instructions. For more information, see “Issuing
Parallel Instructions” on page 15-1.
Example
a0.x = r1.l ;
a1.x = r4.l ;
r7.l = a0.x ;
r0.l = a1.x ;
a0.l = r2.l ;
a1.l = r1.l ;
a0.l = r5.l ;
a1.l = r3.l ;
a0.h = r7.h ;
a1.h = r0.h ;
r7.l = a0 ; /* copy A0.H into R7.L with saturation. */
r2.h = a1 ; /* copy A1.H into R2.H with saturation. */
Also See
Move Half to Full Word – Zero-Extended, Move Half to Full Word –
Sign-Extended
Special Applications
None
Move Byte – Zero-Extended
General Form
dest_reg = src_reg_byte (Z)
Syntax
Dreg = Dreg_byte (Z) ; /* (a)*/
Syntax Terminology
Dreg_byte: R7–0.B, the low-order 8 bits of each Data Register
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Move Byte – Zero-Extended instruction converts an unsigned byte to
an unsigned word (32 bits). The instruction copies the least significant 8
bits from a source register into the least significant 8 bits of a 32-bit regis-
ter. The instruction zero-extends the upper bits of the destination register.
This instruction supports only D-registers.
Flags Affected
The following flags are affected by the Move Byte – Zero-Extended
instruction.
• AZ is set if result is zero; cleared if nonzero.
• AN is cleared.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with any other instructions.
Example
r7 = r2.b (z) ;
Also See
Move Register Half to explicitly access the Accumulator Extension regis-
ters A0.X and A1.X.
Move Byte – Sign-Extended
Special Applications
None
Move Byte – Sign-Extended
General Form
dest_reg = src_reg_byte (X)
Syntax
Dreg = Dreg_byte (X) ; /* (a) */
Syntax Terminology
Dreg_byte: R7–0.B, the low-order 8 bits of each Data Register
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Move Byte – Sign-Extended instruction converts a signed byte to a
signed word (32 bits). It copies the least significant 8 bits from a source
register into the least significant 8 bits of a 32-bit register. The instruction
sign-extends the upper bits of the destination register. This instruction
supports only D-registers.
Flags Affected
The following flags are affected by the Move Byte – Sign-Extended
instruction.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with any other instructions.
Example
r7 = r2.b ;
r7 = r2.b(x) ;
Also See
Move Byte – Zero-Extended
Special Applications
None
Instruction Summary
• “--SP (Push)” on page 5-2
• “--SP (Push Multiple)” on page 5-5
• “SP++ (Pop)” on page 5-8
• “SP++ (Pop Multiple)” on page 5-12
• “LINK, UNLINK” on page 5-17
Instruction Overview
This chapter discusses the instructions that control the stack. Users can
take advantage of these instructions to save the contents of single or multi-
ple registers to the stack or to control the stack frame space on the stack
and the Frame Pointer (FP) for that space.
--SP (Push)
General Form
[ -- SP ] = src_reg
Syntax
[ -- SP ] = allreg ; /* predecrement SP (a) */
Syntax Terminology
allreg: R7–0, P5–0, FP, I3–0, M3–0, B3–0, L3–0, A0.X, A0.W, A1.X, A1.W,
ASTAT, RETS, RETI, RETX, RETN, RETE, LC0, LC1, LT0, LT1, LB0, LB1, CYCLES,
CYCLES2, EMUDAT, USP, SEQSTAT, and SYSCFG
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Push instruction stores the contents of a specified register in the
stack. The instruction pre-decrements the Stack Pointer to the next avail-
able location in the stack first. Push and Push Multiple are the only
instructions that perform pre-modify functions.
The stack grows down from high memory to low memory. Consequently,
the decrement operation is used for pushing, and the increment operation
is used for popping values. The Stack Pointer always points to the last
used location. Therefore, the effective address of the push is SP–4.
The following illustration shows what the stack would look like when a
series of pushes occurs.
higher memory
    P5                    [--sp]=p5 ;
    P1                    [--sp]=p1 ;
    R3  <-------- SP      [--sp]=r3 ;
    ...
lower memory
The Stack Pointer must already be 32-bit aligned to use this instruction. If
an unaligned memory access occurs, an exception is generated and the
instruction aborts.
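The pre-decrement behavior described above can be sketched as a small Python model. This is an illustrative simulation only, not vendor code: the `Stack` class, its dictionary-backed memory, and the `AlignmentError` name are assumptions made for the example.

```python
class AlignmentError(Exception):
    """Stands in for the exception raised on an unaligned access (hypothetical name)."""


class Stack:
    """Toy model of a descending stack of 32-bit words."""

    def __init__(self, sp):
        self.sp = sp          # Stack Pointer: address of the last used location
        self.mem = {}         # sparse memory: address -> 32-bit word

    def push(self, value):
        if self.sp % 4 != 0:  # SP must already be 32-bit aligned
            raise AlignmentError(hex(self.sp))
        self.sp -= 4          # pre-decrement: effective address is SP - 4
        self.mem[self.sp] = value & 0xFFFFFFFF


stack = Stack(sp=0x1000)
for value in (0x55, 0x11, 0x33):
    stack.push(value)         # mirrors [--sp]=p5 ; [--sp]=p1 ; [--sp]=r3 ;
```

After the three pushes, the model's SP sits 12 bytes below its starting value and addresses the word pushed last, which matches the illustration.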
Push/pop on RETS has no effect on the interrupt system.
Push/pop on RETI does affect the interrupt system.
Pushing RETI enables the interrupt system, whereas popping RETI disables
the interrupt system.
Pushing the Stack Pointer is meaningless since it cannot be retrieved from
the stack. Using the Stack Pointer as the destination of a pop instruction
(as in the fictional instruction SP=[SP++]) causes an undefined instruction
exception. (Refer to “Register Names” on page 1-6 for more information.)
Flags Affected
None
Required Mode
User & Supervisor for most cases.
Explicit accesses to USP, SEQSTAT, SYSCFG, RETI, RETX, RETN, and RETE
require Supervisor mode. A protection violation exception results if any
of these registers are explicitly accessed from User mode.
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
[ -- sp ] = r0 ;
[ -- sp ] = r1 ;
[ -- sp ] = p0 ;
[ -- sp ] = i0 ;
Also See
--SP (Push Multiple), SP++ (Pop)
Special Applications
None
--SP (Push Multiple)
General Form
[ -- SP ] = (src_reg_range)
Syntax
[ -- SP ] = ( R7 : Dreglim , P5 : Preglim ) ; /* Dregs and
indexed Pregs (a) */
[ -- SP ] = ( R7 : Dreglim ) ; /* Dregs, only (a) */
[ -- SP ] = ( P5 : Preglim ) ; /* indexed Pregs, only (a) */
Syntax Terminology
Dreglim: any number in the range 7 through 0
Preglim: any number in the range 5 through 0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Push Multiple instruction saves the contents of multiple data and/or
Pointer registers to the stack. The range of registers to be saved always
includes the highest index register (R7 and/or P5) plus any contiguous
lower index registers specified by the user down to and including R0
and/or P0. Push and Push Multiple are the only instructions that perform
pre-modify functions.
The instructions start by saving the register having the lowest index then
advance to the register with the highest index. The index of the first regis-
ter saved in the stack is specified by the user in the instruction syntax.
Data registers are pushed before Pointer registers if both are specified in
one instruction.
Because the lowest-indexed registers are saved first, it is advisable that a
runtime system be defined to have its compiler scratch registers as the
lowest-indexed registers. For instance, R0 and P0 would be the return
value registers for a simple calling convention.
Although this instruction takes a variable amount of time to complete
depending on the number of registers to be saved, it reduces compiled
code size.
This instruction is not interruptible. Interrupts asserted after the first
issued stack write operation are held pending until all the writes complete.
However, exceptions that occur while this instruction is executing cause it
to abort gracefully. For example, a load/store operation might cause a pro-
tection violation while Push Multiple is executing. The SP is reset to its
value before the execution of this instruction. This measure ensures that
the instruction can be restarted after the exception. Note that when a Push
Multiple operation is aborted due to an exception, the memory state is
changed by the stores that have already completed before the exception.
The Stack Pointer must already be 32-bit aligned to use this instruction. If
an unaligned memory access occurs, an exception is generated and the
instruction aborts, as described above.
Only pointer registers P5–0 can be operands for this instruction; SP and FP
cannot. All data registers R7–0 can be operands for this instruction.
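The save order described above (lowest index first, data registers before pointer registers) can be sketched as follows. This is a hedged Python model; the function name `push_multiple`, its parameters, and the register contents are assumptions made for illustration, and exception handling is not modeled.

```python
def push_multiple(sp, mem, dregs, pregs, dreglim=None, preglim=None):
    """Model [--SP] = (R7:dreglim, P5:preglim); returns the new SP.

    dregs / pregs are lists holding R0..R7 and P0..P5. A limit of None
    means that register group is not part of the instruction.
    """
    order = []
    if dreglim is not None:                       # data registers go first
        order += [dregs[i] for i in range(dreglim, 8)]
    if preglim is not None:                       # then pointer registers
        order += [pregs[i] for i in range(preglim, 6)]
    for value in order:                           # lowest index is saved first
        sp -= 4                                   # pre-decrement, then store
        mem[sp] = value & 0xFFFFFFFF
    return sp


mem = {}
dregs = [0xD0 + i for i in range(8)]              # R0..R7 hold 0xD0..0xD7
pregs = [0xB0 + i for i in range(6)]              # P0..P5 hold 0xB0..0xB5
sp = push_multiple(0x2000, mem, dregs, pregs, dreglim=5, preglim=4)
# Models [--sp] = (r7:5, p5:4): R5, R6, R7, P4, P5 are stored, in that order.
```

In this model R5 ends up at the highest address and P5 at the final SP, so the highest-indexed pointer register is the first value a later pop would see.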
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
[ -- sp ] = (r7:5, p5:0) ; /* D-registers R4:0 excluded */
[ -- sp ] = (r7:2) ; /* R1:0 excluded */
[ -- sp ] = (p5:4) ; /* P3:0 excluded */
Also See
--SP (Push), SP++ (Pop), SP++ (Pop Multiple)
Special Applications
None
SP++ (Pop)
General Form
dest_reg = [ SP ++ ]
Syntax
mostreg = [ SP ++ ] ; /* post-increment SP; does not apply to
Data Registers and Pointer Registers (a) */
Dreg = [ SP ++ ] ; /* Load Data Register instruction (repeated
here for user convenience) (a) */
Preg = [ SP ++ ] ; /* Load Pointer Register instruction
(repeated here for user convenience) (a) */
Syntax Terminology
mostreg: I3–0, M3–0, B3–0, L3–0, A0.X, A0.W, A1.X, A1.W, ASTAT, RETS,
RETI, RETX, RETN, RETE, LC0, LC1, LT0, LT1, LB0, LB1, USP, SEQSTAT, and
SYSCFG
Dreg: R7–0
Preg: P5–0, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Pop instruction loads the contents of the stack indexed by the current
Stack Pointer into a specified register. The instruction post-increments
the Stack Pointer to the next occupied location in the stack before
concluding.
The stack grows down from high memory to low memory; therefore, the
decrement operation is used for pushing, and the increment operation is
used for popping values. The Stack Pointer always points to the last used
location. When a pop operation is issued, the value pointed to by the
Stack Pointer is transferred and the SP is replaced by SP+4.
The illustration below shows what the stack would look like when a pop
such as R3 = [ SP ++ ] occurs.
higher memory
Word0
Word1 BEGINNING STATE
Word2 <------- SP
...
lower memory
higher memory
Word0
Word1 LOAD REGISTER R3 FROM STACK
Word2 <------ SP ========> R3 = Word2
...
lower memory
higher memory
Word0 POST-INCREMENT STACK POINTER
Word1 <------ SP
Word2
...
lower memory
The value just popped remains on the stack until another push instruction
overwrites it.
Of course, the usual intent for Pop and these specific Load Register
instructions is to recover register values that were previously pushed onto
the stack. The user must exercise programming discipline to restore the
stack values back to their intended registers from the first-in, last-out
structure of the stack. Pop or load exactly the same registers that were
pushed onto the stack, but pop them in the opposite order.
The Stack Pointer must already be 32-bit aligned to use this instruction. If
an unaligned memory access occurs, an exception is generated and the
instruction aborts.
A value cannot be popped off the stack directly into the Stack Pointer.
SP = [SP ++] is an invalid instruction. Refer to “Register Names” on
page 1-6 for more information.
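The push/pop discipline described above can be sketched in Python. The helper names `push` and `pop`, the starting address, and the register values are hypothetical and serve only to illustrate the reverse-order rule.

```python
def push(sp, mem, value):
    sp -= 4                      # pre-decrement
    mem[sp] = value & 0xFFFFFFFF
    return sp


def pop(sp, mem):
    value = mem[sp]              # read the word SP points to
    return sp + 4, value         # post-increment; the word stays in memory


mem = {}
sp = 0x1000
regs = {"r0": 10, "p4": 20, "i1": 30}

for name in ("r0", "p4", "i1"):          # save in one order ...
    sp = push(sp, mem, regs[name])

restored = {}
for name in ("i1", "p4", "r0"):          # ... restore in the opposite order
    sp, restored[name] = pop(sp, mem)
```

Popping in the reverse of the push order returns every value to its own register and brings SP back to where it started. The popped words are still present in the model's memory, as the text notes.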
Flags Affected
The ASTAT = [SP++] version of this instruction explicitly affects arith-
metic flags.
Flags are not affected by other versions of this instruction.
Required Mode
User & Supervisor for most cases
Explicit access to USP, SEQSTAT, SYSCFG, RETI, RETX, RETN, and RETE
requires Supervisor mode. A protection violation exception results if any
of these registers are explicitly accessed from User mode.
Parallel Issue
The 16-bit versions of the Load Data Register and Load Pointer Register
instructions can be issued in parallel with specific other instructions. For
details, see “Issuing Parallel Instructions” on page 15-1.
The Pop instruction cannot be issued in parallel with other instructions.
Example
r0 = [sp++] ; /* Load Data Register instruction */
p4 = [sp++] ; /* Load Pointer Register instruction */
i1 = [sp++] ; /* Pop instruction */
reti = [sp++] ; /* Pop instruction; supervisor mode required */
Also See
Load Pointer Register, Load Data Register, --SP (Push), --SP (Push Multi-
ple), SP++ (Pop Multiple)
Special Applications
None
SP++ (Pop Multiple)
General Form
(dest_reg_range) = [ SP ++ ]
Syntax
( R7 : Dreglim, P5 : Preglim ) = [ SP ++ ] ; /* Dregs and
indexed Pregs (a) */
( R7 : Dreglim ) = [ SP ++ ] ; /* Dregs, only (a) */
( P5 : Preglim ) = [ SP ++ ] ; /* indexed Pregs, only (a) */
Syntax Terminology
Dreglim: any number in the range 7 through 0
Preglim: any number in the range 5 through 0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Pop Multiple instruction restores the contents of multiple data
and/or Pointer registers from the stack. The range of registers to be
restored always includes the highest index register (R7 and/or P5) plus any
contiguous lower index registers specified by the user down to and includ-
ing R0 and/or P0.
The instructions start by restoring the register having the highest index
then descend to the register with the lowest index. The index of the last
register restored from the stack is specified by the user in the instruction
syntax. Pointer registers are popped before Data registers, if both are spec-
ified in the same instruction.
[Figure: stack contents during a Pop Multiple. Panels, in order:
LOAD REGISTER R7 FROM STACK (R7 = Word3); LOAD REGISTER R6 FROM STACK
(R6 = Word2); LOAD REGISTER R5 FROM STACK (R5 = Word1); POST-INCREMENT
STACK POINTER (SP points to Word0).]
The values just popped remain on the stack until another push instruction
overwrites them.
Of course, the usual intent for Pop Multiple is to recover register values
that were previously pushed onto the stack. The user must exercise pro-
gramming discipline to restore the stack values back to their intended
registers from the first-in, last-out structure of the stack. Pop exactly the
same registers that were pushed onto the stack, but pop them in the oppo-
site order.
Although this instruction takes a variable amount of time to complete
depending on the number of registers to be restored, it reduces compiled
code size.
This instruction is not interruptible. Interrupts asserted after the first
issued stack read operation are held pending until all the reads complete.
However, exceptions that occur while this instruction is executing cause it
to abort gracefully. For example, a load/store operation might cause a pro-
tection violation while Pop Multiple is executing. In that case, SP is reset
to its original value prior to the execution of this instruction. This mea-
sure ensures that the instruction can be restarted after the exception.
Note that when a Pop Multiple operation aborts due to an exception,
some of the destination registers are changed as a result of loads that have
already completed before the exception.
The Stack Pointer must already be 32-bit aligned to use this instruction. If
an unaligned memory access occurs, an exception is generated and the
instruction aborts, as described above.
Only Pointer registers P5–0 can be operands for this instruction; SP and FP
cannot. All data registers R7–0 can be operands for this instruction.
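The restore order described above (highest index first, pointer registers before data registers) can be sketched as follows. This is a hedged Python model; `pop_multiple`, its parameters, and the memory image are assumptions for illustration, and the exception-abort behavior is not modeled.

```python
def pop_multiple(sp, mem, dregs, pregs, dreglim=None, preglim=None):
    """Model (R7:dreglim, P5:preglim) = [SP++]; returns the new SP."""
    if preglim is not None:                   # pointer registers come off first
        for i in range(5, preglim - 1, -1):   # P5 down to Preglim
            pregs[i] = mem[sp]
            sp += 4
    if dreglim is not None:                   # then data registers
        for i in range(7, dreglim - 1, -1):   # R7 down to Dreglim
            dregs[i] = mem[sp]
            sp += 4
    return sp


# Stack image as a push of (r7:5, p5:4) would leave it, with SP = 0x1FEC.
mem = {0x1FFC: 0xD5, 0x1FF8: 0xD6, 0x1FF4: 0xD7, 0x1FF0: 0xB4, 0x1FEC: 0xB5}
dregs = [0] * 8
pregs = [0] * 6
sp = pop_multiple(0x1FEC, mem, dregs, pregs, dreglim=5, preglim=4)
```

Because this order is the exact reverse of the Push Multiple order, using the same register range in both instructions returns each value to the register it came from.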
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
(p5:4) = [ sp ++ ] ; /* P3 through P0 excluded */
(r7:2) = [ sp ++ ] ; /* R1 through R0 excluded */
(r7:5, p5:0) = [ sp ++ ] ; /* D-registers R4 through R0
optionally excluded */
Also See
--SP (Push), --SP (Push Multiple), SP++ (Pop)
Special Applications
None
LINK, UNLINK
General Form
LINK, UNLINK
Syntax
LINK uimm18m4 ; /* allocate a stack frame of specified size
(b) */
UNLINK ; /* de-allocate the stack frame (b)*/
Syntax Terminology
uimm18m4: 18-bit unsigned field that must be a multiple of 4, with a range
of 8 through 262,152 bytes (0x00008 through 0x3FFFC)
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Linkage instruction controls the stack frame space on the stack and
the Frame Pointer (FP) for that space. LINK allocates the space and UNLINK
de-allocates the space.
LINK saves the current RETS and FP registers to the stack, loads the FP regis-
ter with the new frame address, then decrements the SP by the
user-supplied frame size value.
Typical applications follow the LINK instruction with a Push Multiple
instruction to save pointer and data registers to the stack.
The user-supplied argument for LINK determines the size of the allocated
stack frame. LINK always saves RETS and FP on the stack, so the minimum
frame size is 2 words when the argument is zero. The maximum stack
frame size is 2^18 + 8 = 262152 bytes in 4-byte increments.
UNLINK performs the reciprocal of LINK, de-allocating the frame space by
moving the current value of FP into SP and restoring previous values into
FP and RETS from the stack.
higher memory
    ...
    ...                 AFTER LINK EXECUTES
    Saved RETS
    Prior FP            <-FP
    Allocated
    words for local
    subroutine
    variables           <-SP = FP - frame_size
    ...
lower memory

higher memory
    ...
    ...                 AFTER A PUSH MULTIPLE EXECUTES
    Saved RETS
    Prior FP            <-FP
    Allocated
    words for local
    subroutine
    variables
    R0
    R1
    :
    R7
    P0
    :
    P5                  <-SP
lower memory
The Stack Pointer must already be 32-bit aligned to use this instruction. If
an unaligned memory access occurs, an exception is generated and the
instruction aborts, as described above.
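The LINK and UNLINK sequence described above can be sketched as a Python model. This is an illustrative reading of the text and figure, not a definitive implementation: the `state` dictionary, the helper names, and the sample addresses are assumptions, and the frame size is taken to be a byte count that is a multiple of 4.

```python
def link(state, mem, frame_size):
    """Model LINK frame_size, where frame_size is in bytes and a multiple of 4."""
    assert frame_size % 4 == 0
    state["sp"] -= 4
    mem[state["sp"]] = state["rets"]     # save return address
    state["sp"] -= 4
    mem[state["sp"]] = state["fp"]       # save caller's Frame Pointer
    state["fp"] = state["sp"]            # FP now marks the new frame
    state["sp"] -= frame_size            # reserve space for locals


def unlink(state, mem):
    """Model UNLINK: the reciprocal of link()."""
    state["sp"] = state["fp"]            # drop locals
    state["fp"] = mem[state["sp"]]       # restore caller's FP
    state["sp"] += 4
    state["rets"] = mem[state["sp"]]     # restore return address
    state["sp"] += 4


state = {"sp": 0x3000, "fp": 0x3100, "rets": 0xFFA00010}
before = dict(state)
mem = {}
link(state, mem, 8)
after_link = dict(state)
unlink(state, mem)
```

In this model a frame of 8 bytes moves SP down by 16 bytes in total: 8 for the saved RETS and FP, plus 8 for local variables. UNLINK then returns SP, FP, and RETS to their values before LINK.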
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
link 8 ; /* establish frame with 8 bytes (2 words) allocated for
local variables */
[ -- sp ] = (r7:0, p5:0) ; /* save D- and P-registers */
(r7:0, p5:0) = [ sp ++ ] ; /* restore D- and P-registers */
unlink ; /* close the frame */
Also See
--SP (Push Multiple), SP++ (Pop Multiple)
Special Applications
The Linkage instruction is used to set up and tear down stack frames for a
high-level language like C.
Instruction Summary
• “Compare Data Register” on page 6-2
• “Compare Pointer” on page 6-6
• “Compare Accumulator” on page 6-9
• “Move CC” on page 6-12
• “Negate CC” on page 6-15
Instruction Overview
This chapter discusses the instructions that affect the Control Code (CC)
bit in the ASTAT register. Users can take advantage of these instructions to
set the CC bit based on a comparison of values from two registers, pointers,
or accumulators. In addition, these instructions can move the status of the
CC bit to and from a data register or arithmetic status bit, or they can
negate the status of the CC bit.
Compare Data Register
General Form
CC = operand_1 == operand_2
CC = operand_1 < operand_2
CC = operand_1 <= operand_2
CC = operand_1 < operand_2 (IU)
CC = operand_1 <= operand_2 (IU)
Syntax
CC = Dreg == Dreg ; /* equal, register, signed (a) */
CC = Dreg == imm3 ; /* equal, immediate, signed (a) */
CC = Dreg < Dreg ; /* less than, register, signed (a) */
CC = Dreg < imm3 ; /* less than, immediate, signed (a) */
CC = Dreg <= Dreg ; /* less than or equal, register, signed
(a) */
CC = Dreg <= imm3 ; /* less than or equal, immediate, signed
(a) */
CC = Dreg < Dreg (IU) ; /* less than, register, unsigned
(a) */
CC = Dreg < uimm3 (IU) ; /* less than, immediate, unsigned (a)
*/
CC = Dreg <= Dreg (IU) ; /* less than or equal, register,
unsigned (a) */
CC = Dreg <= uimm3 (IU) ; /* less than or equal, immediate
unsigned (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Compare Data Register instruction sets the Control Code (CC) bit
based on a comparison of two values. The input operands are D-registers.
The compare operations are nondestructive on the input operands and
affect only the CC bit and the flags. The value of the CC bit determines all
subsequent conditional branching.
The various forms of the Compare Data Register instruction perform
32-bit signed compare operations on the input operands or an unsigned
compare operation, if the (IU) optional mode is appended. The compare
operations perform a subtraction and discard the result of the subtraction
without affecting user registers. The compare operation that you specify
determines the value of the CC bit.
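The difference between the signed and (IU) unsigned forms can be sketched in Python. The helper names below are hypothetical; the model only computes CC and does not reproduce the AZ, AN, or AC0 side effects.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def cc_less_than(a, b, unsigned=False):
    """Model CC = a < b, or CC = a < b (IU) when unsigned is True."""
    if unsigned:
        return int((a & MASK32) < (b & MASK32))
    return int(to_signed32(a) < to_signed32(b))


r0, r3 = 0x8FFFFFFF, 0x00000001
signed_cc = cc_less_than(r0, r3)                   # r0 is treated as negative
unsigned_cc = cc_less_than(r0, r3, unsigned=True)  # r0 is a large positive value
```

The same bit pattern in r0 compares as less than 1 in the signed form and greater than 1 in the unsigned form, which is why the choice of the (IU) option matters for values with bit 31 set.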
Flags Affected
The Compare Data Register instruction uses the values shown in
Table 6-1 in signed and unsigned compare operations.
The following flags are affected by the Compare Data Register instruction.
• CC is set if the test condition is true; cleared if false.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is set if result generated a carry; cleared if no carry.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
cc = r3 == r2 ;
cc = r7 == 1 ;
/* If r0 = 0x8FFF FFFF and r3 = 0x0000 0001, then the signed
operation . . . */
cc = r0 < r3 ;
/* . . . produces cc = 1, because r0 is treated as a negative
value */
cc = r2 < -4 ;
cc = r6 <= r1 ;
cc = r4 <= 3 ;
Also See
Compare Pointer, Compare Accumulator, IF CC JUMP, BITTST
Special Applications
None
Compare Pointer
General Form
CC = operand_1 == operand_2
CC = operand_1 < operand_2
CC = operand_1 <= operand_2
CC = operand_1 < operand_2 (IU)
CC = operand_1 <= operand_2 (IU)
Syntax
CC = Preg == Preg ; /* equal, register, signed (a) */
CC = Preg == imm3 ; /* equal, immediate, signed (a) */
CC = Preg < Preg ; /* less than, register, signed (a) */
CC = Preg < imm3 ; /* less than, immediate, signed (a) */
CC = Preg <= Preg ; /* less than or equal, register, signed
(a) */
CC = Preg <= imm3 ; /* less than or equal, immediate, signed
(a) */
CC = Preg < Preg (IU) ; /* less than, register, unsigned (a) */
CC = Preg < uimm3 (IU) ; /* less than, immediate, unsigned (a) */
CC = Preg <= Preg (IU) ; /* less than or equal, register,
unsigned (a) */
CC = Preg <= uimm3 (IU) ; /* less than or equal, immediate
unsigned (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Compare Pointer instruction sets the Control Code (CC) bit based on
a comparison of two values. The input operands are P-registers.
The compare operations are nondestructive on the input operands and
affect only the CC bit and the flags. The value of the CC bit determines all
subsequent conditional branching.
The various forms of the Compare Pointer instruction perform 32-bit
signed compare operations on the input operands or an unsigned compare
operation, if the (IU) optional mode is appended. The compare opera-
tions perform a subtraction and discard the result of the subtraction
without affecting user registers. The compare operation that you specify
determines the value of the CC bit.
Flags Affected
• CC is set if the test condition is true; cleared if false.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
cc = p3 == p2 ;
cc = p0 == 1 ;
cc = p0 < p3 ;
cc = p2 < -4 ;
cc = p1 <= p0 ;
cc = p4 <= 3 ;
cc = p5 < p3 (iu) ;
cc = p1 < 0x7 (iu) ;
cc = p2 <= p0 (iu) ;
cc = p3 <= 2 (iu) ;
Also See
Compare Data Register, Compare Accumulator, IF CC JUMP
Special Applications
None
Compare Accumulator
General Form
CC = A0 == A1
CC = A0 < A1
CC = A0 <= A1
Syntax
CC = A0 == A1 ; /* equal, signed (a) */
CC = A0 < A1 ; /* less than, Accumulator, signed (a) */
CC = A0 <= A1 ; /* less than or equal, Accumulator, signed (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Compare Accumulator instruction sets the Control Code (CC) bit
based on a comparison of two values. The input operands are
Accumulators.
These instructions perform 40-bit signed compare operations on the
Accumulators. The compare operations perform a subtraction and discard
the result of the subtraction without affecting user registers. The compare
operation that you specify determines the value of the CC bit.
No unsigned compare operations or immediate compare operations are
performed for the Accumulators.
The compare operations are nondestructive on the input operands, and
affect only the CC bit and the flags. All subsequent conditional branching
is based on the value of the CC bit.
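The 40-bit signed comparison can be sketched in Python as follows. The function names and sample values are assumptions for illustration; only the CC outcome is modeled.

```python
def to_signed40(x):
    """Interpret a 40-bit accumulator pattern as a signed integer."""
    x &= (1 << 40) - 1
    return x - (1 << 40) if x & (1 << 39) else x


def compare_accumulators(a0, a1):
    """Return the CC value for each of the three supported comparisons."""
    s0, s1 = to_signed40(a0), to_signed40(a1)
    return {"==": int(s0 == s1), "<": int(s0 < s1), "<=": int(s0 <= s1)}


a0 = 0xFF_FFFF_FFFF        # all 40 bits set: -1 as a signed 40-bit value
a1 = 0x00_0000_0001        # +1
result = compare_accumulators(a0, a1)
```

Bit 39 acts as the sign bit here, so an accumulator with all 40 bits set compares as less than one holding 1.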
Flags Affected
The Compare Accumulator instruction uses the values shown in Table 6-2
in compare operations.
Equal AZ=1
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
cc = a0 == a1 ;
cc = a0 < a1 ;
cc = a0 <= a1 ;
Also See
Compare Pointer, Compare Data Register, IF CC JUMP
Special Applications
None
Move CC
General Form
dest = CC
dest |= CC
dest &= CC
dest ^= CC
CC = source
CC |= source
CC &= source
CC ^= source
Syntax
Dreg = CC ; /* CC into 32-bit data register, zero-extended (a)
*/
statbit = CC ; /* status bit equals CC (a) */
statbit |= CC ; /* status bit equals status bit OR CC (a) */
statbit &= CC ; /* status bit equals status bit AND CC (a) */
statbit ^= CC ; /* status bit equals status bit XOR CC (a) */
CC = Dreg ; /* CC set if the register is non-zero (a) */
CC = statbit ; /* CC equals status bit (a) */
CC |= statbit ; /* CC equals CC OR status bit (a) */
CC &= statbit ; /* CC equals CC AND status bit (a) */
CC ^= statbit ; /* CC equals CC XOR status bit (a) */
Syntax Terminology
Dreg: R7–0
statbit: AZ, AN, AC0, AC1, V, VS, AV0, AV0S, AV1, AV1S, AQ
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Move CC instruction moves the status of the Control Code (CC) bit to
and from a data register or arithmetic status bit.
When copying the CC bit into a 32-bit register, the operation moves the CC
bit into the least significant bit of the register, zero-extended to 32 bits.
The two cases are as follows.
• If CC = 0, Dreg becomes 0x00000000.
• If CC = 1, Dreg becomes 0x00000001.
When copying a data register to the CC bit, the operation sets the CC bit to
1 if any bit in the source data register is set; that is, if the register is non-
zero. Otherwise, the operation clears the CC bit.
Some versions of this instruction logically set or clear an arithmetic status
bit based on the status of the Control Code.
The use of the CC bit as source and destination in the same instruction is
disallowed. See the Negate CC instruction to change CC based solely on its
own value.
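The forms of Move CC described above can be sketched in Python. The function names and the `astat` dictionary are hypothetical stand-ins for the status bits; this is a behavioral illustration only.

```python
def dreg_from_cc(cc):
    """Dreg = CC: the bit lands in bit 0, zero-extended to 32 bits."""
    return cc & 1


def cc_from_dreg(dreg):
    """CC = Dreg: CC is 1 if any bit of the register is set."""
    return int((dreg & 0xFFFFFFFF) != 0)


def combine(dest_bit, src_bit, op):
    """Model the =, |=, &= and ^= forms on single-bit operands."""
    ops = {
        "=": src_bit,
        "|=": dest_bit | src_bit,
        "&=": dest_bit & src_bit,
        "^=": dest_bit ^ src_bit,
    }
    return ops[op] & 1


astat = {"az": 0, "an": 1, "ac0": 1}
cc = 1
astat["az"] = combine(astat["az"], cc, "=")     # az = cc ;
astat["an"] = combine(astat["an"], cc, "^=")    # an ^= cc ;
cc = combine(cc, astat["ac0"], "&=")            # cc &= ac0 ;
```

Note that `cc_from_dreg` tests the whole register rather than bit 0 alone, so a register holding 0x80000000 still sets CC.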
Flags Affected
• The Move CC instruction affects flags CC, AZ, AN, AC0, AC1, V, VS,
AV0, AV0S, AV1, AV1S, AQ, according to the status bit and syntax
used, as described in “Syntax” on page 6-12.
• All other flags not explicitly specified by the syntax are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
r0 = cc ;
az = cc ;
an |= cc ;
ac0 &= cc ;
av0 ^= cc ;
cc = r4 ;
cc = av1 ;
cc |= aq ;
cc &= an ;
cc ^= ac1 ;
Also See
Negate CC
Special Applications
None
Negate CC
General Form
CC = ! CC
Syntax
CC = ! CC ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Negate CC instruction inverts the logical state of CC.
Flags Affected
• CC is toggled from its previous value by the Negate CC instruction.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
cc =! cc ;
Also See
Move CC
Special Applications
None
Instruction Summary
• “& (AND)” on page 7-2
• “~ (NOT One’s Complement)” on page 7-4
• “| (OR)” on page 7-6
• “^ (Exclusive-OR)” on page 7-8
• “BXORSHIFT, BXOR” on page 7-10
Instruction Overview
This chapter discusses the instructions that specify logical operations.
Users can take advantage of these instructions to perform logical AND,
NOT, OR, exclusive-OR, and bit-wise exclusive-OR (BXORSHIFT)
operations.
& (AND)
General Form
dest_reg = src_reg_0 & src_reg_1
Syntax
Dreg = Dreg & Dreg ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The AND instruction performs a 32-bit, bit-wise logical AND operation
on the two source registers and stores the results into the dest_reg.
The instruction does not implicitly modify the source registers. The
dest_reg and one src_reg can be the same D-register. Doing so explicitly
modifies the src_reg.
Flags Affected
The AND instruction affects flags as follows.
• AZ is set if the final result is zero, cleared if nonzero.
• AN is set if the result is negative, cleared if non-negative.
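The AZ and AN rules listed above can be sketched with a short Python model. The function name `and32` and its return convention are assumptions for this example.

```python
def and32(src0, src1):
    """Model Dreg = Dreg & Dreg; returns (result, AZ, AN)."""
    result = (src0 & src1) & 0xFFFFFFFF
    az = int(result == 0)             # AZ: result is zero
    an = int(result >> 31)            # AN: bit 31 set means negative
    return result, az, an


r4, r3 = 0xF0F0F0F0, 0x0F0F0F0F
r4, az, an = and32(r4, r3)            # r4 = r4 & r3 ;
```

With these complementary masks no bit is set in both operands, so the result is zero and AZ is set while AN is cleared.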
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
r4 = r4 & r3 ;
Also See
| (OR)
Special Applications
None
~ (NOT One’s Complement)
General Form
dest_reg = ~ src_reg
Syntax
Dreg = ~ Dreg ; /* (a)*/
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The NOT One’s Complement instruction toggles every bit in the 32-bit
register.
The instruction does not implicitly modify the src_reg. The dest_reg
and src_reg can be the same D-register. Using the same D-register as the
dest_reg and src_reg would explicitly modify the src_reg.
Flags Affected
The NOT One’s Complement instruction affects flags as follows.
• AZ is set if the final result is zero, cleared if nonzero.
• AN is set if the result is negative, cleared if non-negative.
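A hedged Python sketch of the one's complement follows. The name `not32` is hypothetical. The explicit mask is needed only because Python integers are not limited to 32 bits; it is not part of the instruction.

```python
def not32(src):
    """Model Dreg = ~Dreg; returns (result, AZ, AN)."""
    result = ~src & 0xFFFFFFFF        # mask: Python integers are unbounded
    return result, int(result == 0), result >> 31


r3, az, an = not32(0x0000FFFF)        # r3 = ~ r4 ; with r4 = 0x0000FFFF
```

Inverting a value whose upper half is clear sets bit 31 of the result, so AN is set in this case.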
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
r3 = ~ r4 ;
Also See
Negate (Two’s Complement)
Special Applications
None
| (OR)
General Form
dest_reg = src_reg_0 | src_reg_1
Syntax
Dreg = Dreg | Dreg ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The OR instruction performs a 32-bit, bit-wise logical OR operation on
the two source registers and stores the results into the dest_reg.
The instruction does not implicitly modify the source registers. The
dest_reg and one src_reg can be the same D-register. Doing so explicitly
modifies the src_reg.
Flags Affected
The OR instruction affects flags as follows.
• AZ is set if the final result is zero, cleared if nonzero.
• AN is set if the result is negative, cleared if non-negative.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
r4 = r4 | r3 ;
Also See
^ (Exclusive-OR), BXORSHIFT, BXOR
Special Applications
None
^ (Exclusive-OR)
General Form
dest_reg = src_reg_0 ^ src_reg_1
Syntax
Dreg = Dreg ^ Dreg ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Exclusive-OR (XOR) instruction performs a 32-bit, bit-wise logical
exclusive OR operation on the two source registers and loads the results
into the dest_reg.
The XOR instruction does not implicitly modify source registers. The
dest_reg and one src_reg can be the same D-register. Doing so explicitly
modifies the src_reg.
Flags Affected
The XOR instruction affects flags as follows.
• AZ is set if the final result is zero, cleared if nonzero.
• AN is set if the result is negative, cleared if non-negative.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
r4 = r4 ^ r3 ;
Also See
| (OR), BXORSHIFT, BXOR
Special Applications
None
BXORSHIFT, BXOR
General Form
dest_reg = CC = BXORSHIFT ( A0, src_reg )
dest_reg = CC = BXOR ( A0, src_reg )
dest_reg = CC = BXOR ( A0, A1, CC )
A0 = BXORSHIFT ( A0, A1, CC )
Syntax
LFSR Type I (Without Feedback)
Dreg_lo = CC = BXORSHIFT ( A0, Dreg ) ; /* (b) */
Dreg_lo = CC = BXOR ( A0, Dreg ) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
Four Bit-Wise Exclusive-OR (BXOR) instructions support two different
types of linear feedback shift register (LFSR) implementations.
[Figure: XOR reduction s(D) of inputs D[0] and D[1], gated by A0[0] and
A0[1].]
In the figure above, the bits A0 bit 0 and A0 bit 1 are logically AND’ed
with bits D[0] and D[1]. The result from this operation is XOR reduced
according to the following formula.
s(D) = (A0[0] & D[0]) ⊕ (A0[1] & D[1])
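The reduction formula can be generalized from two bits to a whole register, as in the Python sketch below. It models only the masked XOR reduction step, not the shift or accumulator update of the BXORSHIFT forms. The function name `xor_reduce` and its `width` parameter are assumptions for illustration.

```python
def xor_reduce(a0, d, width=32):
    """XOR together the bits of d selected by the mask a0.

    Generalizes s(D) = (A0[0] & D[0]) xor (A0[1] & D[1]) to `width` bits.
    """
    masked = a0 & d & ((1 << width) - 1)
    parity = 0
    while masked:
        parity ^= masked & 1          # fold in the lowest bit
        masked >>= 1
    return parity


two_bit = xor_reduce(0b11, 0b01, width=2)   # (1&1) xor (1&0)
```

The result is the parity of the masked value: 1 when an odd number of selected bits are set, 0 otherwise.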
[Figures: XOR reduction producing CC and dreg_lo; contents of
dreg_lo[15:0] after the operation.]
Flags Affected
The following flags are affected by the Four Bit-Wise Exclusive-OR
instructions.
• CC is set or cleared according to the Functional Description for the
BXOR and the nonfeedback version of the BXORSHIFT instruction.
The feedback version of the BXORSHIFT instruction affects no flags.
• All other flags are unaffected.
[Figure: A1[39:0] and A0[39:0] with CC; A0 is left-shifted by 1 following
the XOR reduction; contents of A0[39:0] after the operation.]
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r0.l = cc = bxorshift (a0, r1) ;
r0.l = cc = bxor (a0, r1) ;
r0.l = cc = bxor (a0, a1, cc) ;
a0 = bxorshift (a0, a1, cc) ;
[Figure: XOR reduction with CC as an input, producing CC and dreg_lo[0];
contents of dreg_lo[15:0] after the operation.]
Also See
None
Special Applications
Linear feedback shift registers (LFSRs) can multiply and divide polynomi-
als and are often used to implement cyclical encoders and decoders.
LFSRs use the set of Bit-Wise XOR instructions to compute bit XOR
reduction from a state masked by a polynomial.
Instruction Summary
• “BITCLR” on page 8-2
• “BITSET” on page 8-4
• “BITTGL” on page 8-6
• “BITTST” on page 8-8
• “DEPOSIT” on page 8-10
• “EXTRACT” on page 8-16
• “BITMUX” on page 8-21
• “ONES (One’s Population Count)” on page 8-26
Instruction Overview
This chapter discusses the instructions that specify bit operations. Users
can take advantage of these instructions to set, clear, toggle, and test bits.
They can also merge bit fields and save the result, extract specific bits from
a register, merge bit streams, and count the number of ones in a register.
BITCLR
General Form
BITCLR ( register, bit_position )
Syntax
BITCLR ( Dreg , uimm5 ) ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Bit Clear instruction clears the bit designated by bit_position in the
specified D-register. It does not affect other bits in that register.
The bit_position range of values is 0 through 31, where 0 indicates the
LSB, and 31 indicates the MSB of the 32-bit D-register.
Flags Affected
The Bit Clear instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
bitclr (r2, 3) ; /* clear bit 3 (the fourth bit from LSB) in
R2 */
Also See
BITSET, BITTST, BITTGL
Special Applications
None
BITSET
General Form
BITSET ( register, bit_position )
Syntax
BITSET ( Dreg , uimm5 ) ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Bit Set instruction sets the bit designated by bit_position in the
specified D-register. It does not affect other bits in the D-register.
The bit_position range of values is 0 through 31, where 0 indicates the
LSB, and 31 indicates the MSB of the 32-bit D-register.
Flags Affected
The Bit Set instruction affects flags as follows.
• AZ is cleared.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
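A minimal Python sketch of this behavior follows (a reference model, not Blackfin code). The name `bitset` and the flag dictionary are illustrative assumptions. AZ is always cleared because a result with a freshly set bit can never be zero.

```python
MASK32 = 0xFFFFFFFF

def bitset(reg, bit_position):
    """Model of BITSET: set one bit of a 32-bit value and report the flags."""
    if not 0 <= bit_position <= 31:
        raise ValueError("bit_position must be 0 through 31 (uimm5)")
    result = (reg | (1 << bit_position)) & MASK32
    flags = {
        "AZ": 0,                   # result always has at least one bit set
        "AN": (result >> 31) & 1,  # assumed: negative means bit 31 is set
        "AC0": 0,                  # always cleared
        "V": 0,                    # always cleared
    }
    return result, flags
```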
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
bitset (r2, 7) ; /* set bit 7 (the eighth bit from LSB) in
R2 */
Also See
BITCLR, BITTST, BITTGL
Special Applications
None
BITTGL
General Form
BITTGL ( register, bit_position )
Syntax
BITTGL ( Dreg , uimm5 ) ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Bit Toggle instruction inverts the bit designated by bit_position in
the specified D-register. The instruction does not affect other bits in the
D-register.
The bit_position range of values is 0 through 31, where 0 indicates the
LSB, and 31 indicates the MSB of the 32-bit D-register.
Flags Affected
The Bit Toggle instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
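The toggle can be modeled as an exclusive OR with a one-bit mask. The following Python sketch is a reference model only; the name `bittgl` and the flag dictionary are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def bittgl(reg, bit_position):
    """Model of BITTGL: invert one bit of a 32-bit value and report the flags."""
    if not 0 <= bit_position <= 31:
        raise ValueError("bit_position must be 0 through 31 (uimm5)")
    result = (reg ^ (1 << bit_position)) & MASK32
    flags = {
        "AZ": int(result == 0),    # set if result is zero
        "AN": (result >> 31) & 1,  # assumed: negative means bit 31 is set
        "AC0": 0,                  # always cleared
        "V": 0,                    # always cleared
    }
    return result, flags
```

Applying the model twice with the same bit_position returns the original value.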
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
bittgl (r2, 24) ; /* toggle bit 24 (the 25th bit from LSB) in
R2 */
Also See
BITSET, BITTST, BITCLR
Special Applications
None
BITTST
General Form
CC = BITTST ( register, bit_position )
CC = ! BITTST ( register, bit_position )
Syntax
CC = BITTST ( Dreg , uimm5 ) ; /* set CC if bit = 1 (a)*/
CC = ! BITTST ( Dreg , uimm5 ) ; /* set CC if bit = 0 (a)*/
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Bit Test instruction sets or clears the CC bit, based on the bit desig-
nated by bit_position in the specified D-register. One version tests
whether the specified bit is set; the other tests whether the bit is clear. The
instruction does not affect other bits in the D-register.
The bit_position range of values is 0 through 31, where 0 indicates the
LSB, and 31 indicates the MSB of the 32-bit D-register.
Flags Affected
The Bit Test instruction affects flags as follows.
• CC is set if the test is true (the tested bit is 1 for BITTST, or 0 for ! BITTST); cleared otherwise.
• All other flags are unaffected.
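Both versions of the test can be sketched in Python as follows. This is a reference model only; the name `bittst` and the `negate` parameter (standing in for the "!" form) are illustrative assumptions.

```python
def bittst(reg, bit_position, negate=False):
    """Model of CC = BITTST (...) and, with negate=True, CC = ! BITTST (...)."""
    if not 0 <= bit_position <= 31:
        raise ValueError("bit_position must be 0 through 31 (uimm5)")
    bit = (reg >> bit_position) & 1
    # The plain form reports the bit itself; the "!" form reports its inverse.
    return bit ^ 1 if negate else bit
```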
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
cc = bittst (r7, 15) ; /* test bit 15 TRUE in R7 */
Also See
BITCLR, BITSET, BITTGL
Special Applications
None
DEPOSIT
General Form
dest_reg = DEPOSIT ( backgnd_reg, foregnd_reg )
dest_reg = DEPOSIT ( backgnd_reg, foregnd_reg ) (X)
Syntax
Dreg = DEPOSIT ( Dreg, Dreg ) ; /* no extension (b) */
Dreg = DEPOSIT ( Dreg, Dreg ) (X) ; /* sign-extended (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Bit Field Deposit instruction merges the background bit field in
backgnd_reg with the foreground bit field in the upper half of
foregnd_reg and saves the result into dest_reg. The user determines the
length of the foreground bit field and its position in the background field.
The input register bit field definitions appear in Table 8-1.
The operation writes the foreground bit field of length L over the back-
ground bit field with the foreground LSB located at bit p of the
background. See “Example,” below, for more.
Boundary Cases
Consider the following boundary cases.
• Unsigned syntax, L = 0: The architecture copies backgnd_reg con-
tents without modification into dest_reg. By definition, a
foreground of zero length is transparent.
• Sign-extended, L = 0 and p = 0: This case loads 0x0000 0000 into
dest_reg. The sign of a zero length, zero position foreground is
zero; therefore, sign-extended is all zeros.
Options
The (X) syntax sign-extends the deposited bit field. If you specify the
sign-extended syntax, the operation does not affect the dest_reg bits that
are less significant than the deposited bit field.
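The deposit operation can be sketched as a Python reference model (not Blackfin code). The field layout (foreground in bits 31–16, position in bits 15–8, length in bits 7–0) is taken from the example text in this section. The name `deposit` and the `sign_extend` parameter are illustrative assumptions, and the sketch refuses inputs whose behavior this section does not describe.

```python
MASK32 = 0xFFFFFFFF

def deposit(backgnd, foregnd, sign_extend=False):
    """Model of dest_reg = DEPOSIT (backgnd_reg, foregnd_reg) [(X)]."""
    field = (foregnd >> 16) & 0xFFFF   # foreground bit field, bits 31-16
    position = (foregnd >> 8) & 0xFF   # p, bits 15-8
    length = foregnd & 0xFF            # L, bits 7-0
    if length > 16 or position + length > 32:
        raise ValueError("case not covered by this sketch")
    if length == 0:
        if not sign_extend:
            return backgnd & MASK32    # zero-length foreground is transparent
        if position == 0:
            return 0                   # documented boundary case
        raise ValueError("case not covered by this sketch")
    field_mask = (1 << length) - 1
    result = backgnd & ~(field_mask << position) & MASK32
    result |= (field & field_mask) << position
    if sign_extend:
        # Copy the MSB of the deposited field into every bit above it.
        upper_mask = MASK32 & ~((1 << (position + length)) - 1)
        sign = (field >> (length - 1)) & 1
        result = (result | upper_mask) if sign else (result & ~upper_mask)
    return result & MASK32
```

With an all-ones background and a foregnd_reg value of 0x00000703, the model returns 0xFFFFFC7F: a zero foreground of length 3 written at bit 7.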
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
Bit Field Deposit Unsigned
r7 = deposit (r4, r3) ;
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111
where this is the background bit field
• R3=0b0000 0000 0000 0000 0000 0111 0000 0011
where bits 31–16 are the foreground bit field, bits 15–8 are
the position, and bits 7–0 are the length
then the Bit Field Deposit (unsigned) instruction produces:
• R7=0b1111 1111 1111 1111 1111 1100 0111 1111
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111
where this is the background bit field
• R3=0b0000 0000 1111 1010 0000 1101 0000 1001
where bits 31–16 are the foreground bit field, bits 15–8 are
the position, and bits 7–0 are the length
then the Bit Field Deposit (unsigned) instruction produces:
• R7=0b1111 1111 1101 1111 0101 1111 1111 1111
Bit Field Deposit Sign-Extended
r7 = deposit (r4, r3) (x) ; /* sign-extended*/
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111
where this is the background bit field
• R3=0b0101 1010 0101 1010 0000 0111 0000 0011
where bits 31–16 are the foreground bit field, bits 15–8 are
the position, and bits 7–0 are the length
then the Bit Field Deposit (sign-extended) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0001 0111 1111
• If
• R4=0b1111 1111 1111 1111 1111 1111 1111 1111
where this is the background bit field
• R3=0b0000 1001 1010 1100 0000 1101 0000 1001
where bits 31–16 are the foreground bit field, bits 15–8 are
the position, and bits 7–0 are the length
then the Bit Field Deposit (sign-extended) instruction produces:
• R7=0b1111 1111 1111 0101 1001 1111 1111 1111
Also See
EXTRACT
Special Applications
Video image overlay algorithms
EXTRACT
General Form
dest_reg = EXTRACT ( scene_reg, pattern_reg ) (Z)
dest_reg = EXTRACT ( scene_reg, pattern_reg ) (X)
Syntax
Dreg = EXTRACT ( Dreg, Dreg_lo ) (Z) ; /* zero-extended (b)*/
Dreg = EXTRACT ( Dreg, Dreg_lo ) (X) ; /* sign-extended (b)*/
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Bit Field Extraction instruction moves only specific bits from the
scene_reg into the low-order bits of the dest_reg. The user determines
the length of the pattern bit field and its position in the scene field.
The input register bit field definitions appear in Table 8-2.
The operation reads the pattern bit field of length L from the scene bit
field, with the pattern LSB located at bit p of the scene. See “Example”,
below, for more.
Boundary Case
If (p + L) > 32: In the zero-extended and sign-extended versions of the
instruction, the architecture assumes that all bits to the left of the
scene_reg are zero. In such a case, the user is trying to access more bits
than the register actually contains. Consequently, the architecture fills any
undefined bits beyond the MSB of the scene_reg with zeros.
The Bit Field Extraction instruction does not modify the contents of the
two source registers. One of the source registers can also serve as dest_reg.
Options
The user has the choice of using the (X) syntax to perform sign-extend
extraction or the (Z) syntax to perform zero-extend extraction.
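The extraction can be sketched as a Python reference model (not Blackfin code). The position and length fields are read from bits 15–8 and 7–0 of the pattern half-register, as in the example text. The name `extract` and the `sign_extend` parameter are illustrative assumptions; a zero length is rejected because this section does not describe that case.

```python
MASK32 = 0xFFFFFFFF

def extract(scene, pattern_lo, sign_extend=False):
    """Model of dest_reg = EXTRACT (scene_reg, pattern_reg) (Z) or (X)."""
    position = (pattern_lo >> 8) & 0xFF  # p, bits 15-8
    length = pattern_lo & 0xFF           # L, bits 7-0
    if length == 0 or length > 32:
        raise ValueError("case not covered by this sketch")
    # Bits beyond the MSB of the scene read as zero, as in the boundary case.
    field = ((scene & MASK32) >> position) & ((1 << length) - 1)
    if sign_extend and (field >> (length - 1)) & 1:
        field |= MASK32 & ~((1 << length) - 1)  # replicate the field's MSB upward
    return field & MASK32
```

For a scene value of 0xA5A5C3AA with position 13 and length 9, the model returns 0x12E zero-extended and 0xFFFFFF2E sign-extended.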
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is cleared.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
Bit Field Extraction Unsigned
r7 = extract (r4, r3.l) (z) ; /* zero-extended*/
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010
where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100
where bits 15–8 are the position, and bits 7–0 are the length
then the Bit Field Extraction (unsigned) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0000 0000 0111
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010
where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 1101 0000 1001
where bits 15–8 are the position, and bits 7–0 are the
length
then the Bit Field Extraction (unsigned) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0001 0010 1110
Bit Field Extraction Sign-Extended
r7 = extract (r4, r3.l) (x) ; /* sign-extended*/
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010
where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100
where bits 15–8 are the position, and bits 7–0 are the length
then the Bit Field Extraction (sign-extended) instruction produces:
• R7=0b0000 0000 0000 0000 0000 0000 0000 0111
• If
• R4=0b1010 0101 1010 0101 1100 0011 1010 1010
where this is the scene bit field
• R3=0bxxxx xxxx xxxx xxxx 0000 1101 0000 1001
where bits 15–8 are the position, and bits 7–0 are the
length
then the Bit Field Extraction (sign-extended) instruction produces:
• R7=0b1111 1111 1111 1111 1111 1111 0010 1110
Also See
DEPOSIT
Special Applications
Video image pattern recognition and separation algorithms
BITMUX
General Form
BITMUX ( source_1, source_0, A0 ) (ASR)
BITMUX ( source_1, source_0, A0 ) (ASL)
Syntax
BITMUX ( Dreg , Dreg , A0 ) (ASR) ; /* shift right, LSB is
shifted out (b) */
BITMUX ( Dreg , Dreg , A0 ) (ASL) ; /* shift left, MSB is
shifted out (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Bit Multiplex instruction merges bit streams.
The instruction has two versions, Shift Right and Shift Left. This instruc-
tion overwrites the contents of source_1 and source_0. See Table 8-3,
Table 8-4, and Table 8-5.
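In the Shift Right version, the processor performs the following sequence.
1. Right shift Accumulator A0 by one bit. Right shift the LSB of
source_1 into the MSB of the Accumulator.
2. Right shift Accumulator A0 by one bit. Right shift the LSB of
source_0 into the MSB of the Accumulator.
In the Shift Left version, the processor performs the following sequence.
1. Left shift Accumulator A0 by one bit. Left shift the MSB of
source_0 into the LSB of the Accumulator.
2. Left shift Accumulator A0 by one bit. Left shift the MSB of
source_1 into the LSB of the Accumulator.
In the Accumulator contents below, z is an original A0 bit, x is the bit
taken from source_1, and y is the bit taken from source_0.
Accumulator A0 before the shift: zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz
Accumulator A0 after Shift Right: yxzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz
Accumulator A0 after Shift Left: zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzyx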
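The multiplexing can be sketched as a Python reference model (not Blackfin code). The sketch assumes that each version performs two shift steps, one per source register, which is what the Accumulator values in the example imply, and that each source register is itself shifted by one bit with zero fill. The name `bitmux`, the `direction` strings, and the test operand values are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF

def bitmux(source_1, source_0, a0, direction):
    """Return (source_1, source_0, a0) after one modeled BITMUX instruction."""
    source_1 &= MASK32
    source_0 &= MASK32
    a0 &= MASK40
    if direction == "ASR":
        a0 = (a0 >> 1) | ((source_1 & 1) << 39)  # LSB of source_1 into A0 MSB
        a0 = (a0 >> 1) | ((source_0 & 1) << 39)  # then LSB of source_0 (assumed)
        source_1 >>= 1                           # assumed zero fill
        source_0 >>= 1
    elif direction == "ASL":
        a0 = ((a0 << 1) & MASK40) | (source_0 >> 31)  # MSB of source_0 into A0 LSB
        a0 = ((a0 << 1) & MASK40) | (source_1 >> 31)  # then MSB of source_1 (assumed)
        source_1 = (source_1 << 1) & MASK32           # assumed zero fill
        source_0 = (source_0 << 1) & MASK32
    else:
        raise ValueError("direction must be 'ASR' or 'ASL'")
    return source_1, source_0, a0
```

Calling the model repeatedly interleaves the two bit streams into A0 two bits at a time.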
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
bitmux (r2, r3, a0) (asr) ; /* right shift*/
• If
• R2=0b1010 0101 1010 0101 1100 0011 1010 1010
• A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0000 0111
and the LSB of R3 is 1, then the Shift Right instruction produces:
• A0=0b1000 0000 0000 0000 0000 0000 0000 0000 0000 0001
• If
• R3=0b1010 0101 1010 0101 1100 0011 1010 1010
• A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0000 0111
and the MSB of the other source register is 1, then the Shift Left
instruction produces:
• A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0001 1111
Also See
None
Special Applications
Convolutional encoder algorithms
ONES (One’s Population Count)
General Form
dest_reg = ONES src_reg
Syntax
Dreg_lo = ONES Dreg ; /* (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The One’s Population Count instruction loads the number of 1’s con-
tained in the src_reg into the lower half of the dest_reg.
The range of possible values loaded into dest_reg is 0 through 32.
The dest_reg and src_reg can be the same D-register. Otherwise, the
One’s Population Count instruction does not modify the contents of
src_reg.
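The count can be sketched in Python as follows (a reference model, not Blackfin code). The names `ones` and `parity` are illustrative; the parity helper shows one way the count could serve the software parity test mentioned under Special Applications.

```python
def ones(src_reg):
    """Model of dest_reg = ONES src_reg: number of 1 bits in a 32-bit value."""
    return bin(src_reg & 0xFFFFFFFF).count("1")

def parity(src_reg):
    """Return 1 if src_reg holds an odd number of 1 bits, otherwise 0."""
    return ones(src_reg) & 1
```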
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3.l = ones r7 ;
Also See
None
Special Applications
Software parity testing
Instruction Summary
• “Add with Shift” on page 9-2
• “Shift with Add” on page 9-5
• “Arithmetic Shift” on page 9-7
• “Logical Shift” on page 9-14
• “ROT (Rotate)” on page 9-21
Instruction Overview
This chapter discusses the instructions that specify shift and rotate
operations. Users can take advantage of these instructions to perform
logical and arithmetic shifts, combine addition operations with shifts, and
rotate a register value through the Control Code (CC) bit.
Add with Shift
General Form
dest_pntr = (dest_pntr + src_reg) << 1
dest_pntr = (dest_pntr + src_reg) << 2
dest_reg = (dest_reg + src_reg) << 1
dest_reg = (dest_reg + src_reg) << 2
Syntax
Pointer Operations
Preg = ( Preg + Preg ) << 1 ; /* dest_reg = (dest_reg +
src_reg) x 2 (a) */
Preg = ( Preg + Preg ) << 2 ; /* dest_reg = (dest_reg +
src_reg) x 4 (a) */
Data Operations
Dreg = (Dreg + Dreg) << 1 ; /* dest_reg = (dest_reg + src_reg)
x 2 (a) */
Dreg = (Dreg + Dreg) << 2 ; /* dest_reg = (dest_reg + src_reg)
x 4 (a) */
Syntax Terminology
Preg: P5–0
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Add with Shift instruction combines an addition operation with a
one- or two-place logical shift left. Each one-place left shift multiplies a
sign-extended number by 2, so the two versions multiply the sum by 2 or
by 4. Saturation is not supported.
The Add with Shift instruction does not modify src_reg. However,
dest_reg serves as an input as well as the result, so dest_reg is
overwritten.
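The operation can be sketched in Python as follows (a reference model, not Blackfin code). Because saturation is not supported, the sketch assumes the result simply wraps to 32 bits; the name `add_with_shift` is illustrative.

```python
MASK32 = 0xFFFFFFFF

def add_with_shift(dest, src, places):
    """Model of dest = (dest + src) << places, for places equal to 1 or 2."""
    if places not in (1, 2):
        raise ValueError("only shifts by 1 or 2 places are available")
    # Assumed: bits shifted beyond bit 31 are discarded (no saturation).
    return ((dest + src) << places) & MASK32
```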
Flags Affected
The D-register versions of this instruction affect flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is set if result overflows; cleared if no overflow.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
p3 = (p3+p2)<<1 ; /* p3 = (p3 + p2) * 2 */
p3 = (p3+p2)<<2 ; /* p3 = (p3 + p2) * 4 */
r3 = (r3+r2)<<1 ; /* r3 = (r3 + r2) * 2 */
r3 = (r3+r2)<<2 ; /* r3 = (r3 + r2) * 4 */
Also See
Shift with Add, Logical Shift, Arithmetic Shift, Add, Multiply 32-Bit
Operands
Special Applications
None
Shift with Add
General Form
dest_pntr = adder_pntr + ( src_pntr << 1 )
dest_pntr = adder_pntr + ( src_pntr << 2 )
Syntax
Preg = Preg + ( Preg << 1 ) ; /* adder_pntr + (src_pntr x 2)
(a) */
Preg = Preg + ( Preg << 2 ) ; /* adder_pntr + (src_pntr x 4)
(a) */
Syntax Terminology
Preg: P5–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Shift with Add instruction combines a one- or two-place logical shift
left with an addition operation.
The instruction provides a shift-then-add method that supports a rudi-
mentary multiplier sequence useful for array pointer manipulation.
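The array pointer use can be sketched in Python as follows (a reference model, not Blackfin code). Shifting an element index left by 1 or 2 scales it to 2-byte or 4-byte elements before the base address is added. The name `shift_with_add`, the base address, and the 32-bit wraparound are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def shift_with_add(adder, src, places):
    """Model of dest = adder + (src << places), for places equal to 1 or 2."""
    if places not in (1, 2):
        raise ValueError("only shifts by 1 or 2 places are available")
    return (adder + (src << places)) & MASK32

# Address of element 5 in an array of 4-byte words at a hypothetical base address.
element_address = shift_with_add(0xFF800000, 5, 2)
```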
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
p3 = p0+(p3<<1) ; /* p3 = (p3 * 2) + p0 */
p3 = p0+(p3<<2) ; /* p3 = (p3 * 4) + p0 */
Also See
Add with Shift, Logical Shift, Arithmetic Shift, Add, Multiply 32-Bit
Operands
Special Applications
None
Arithmetic Shift
General Form
dest_reg >>>= shift_magnitude
dest_reg = src_reg >>> shift_magnitude (opt_sat)
dest_reg = src_reg << shift_magnitude (S)
accumulator = accumulator >>> shift_magnitude
dest_reg = ASHIFT src_reg BY shift_magnitude (opt_sat)
accumulator = ASHIFT accumulator BY shift_magnitude
Syntax
Constant Shift Magnitude
Dreg >>>= uimm5 ; /* arithmetic right shift (a) */
Dreg <<= uimm5 ; /* logical left shift (a) */
Dreg_lo_hi = Dreg_lo_hi >>> uimm4 ; /* arithmetic right shift
(b) */
Dreg_lo_hi = Dreg_lo_hi << uimm4 (S) ; /* arithmetic left
shift (b) */
Dreg = Dreg >>> uimm5 ; /* arithmetic right shift (b) */
Dreg = Dreg << uimm5 (S) ; /* arithmetic left shift (b) */
A0 = A0 >>> uimm5 ; /* arithmetic right shift (b) */
A0 = A0 << uimm5 ; /* logical left shift (b) */
A1 = A1 >>> uimm5 ; /* arithmetic right shift (b) */
A1 = A1 << uimm5 ; /* logical left shift (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo_hi: R7–0.L, R7–0.H
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Arithmetic Shift instruction shifts a registered number a specified dis-
tance and direction while preserving the sign of the original number. The
sign bit value back-fills the left-most bit positions vacated by the arith-
metic right shift.
Specific versions of arithmetic left shift are supported, too. Arithmetic left
shift saturates the result if the value is shifted too far. A left shift that
would otherwise lose nonsign bits off the left-hand side saturates to the
maximum positive or negative value instead.
“>>>”, “<<”, and “ASHIFT”: The value in src_reg is shifted by the number
of places specified in shift_magnitude, and the result is stored into
dest_reg. The “ASHIFT” versions can shift 32-bit Dreg and 40-bit
Accumulator registers by –32 through +31 places.
Options
Option (S) invokes saturation of the result.
In the default case, without the saturation option, numbers can be
left-shifted so far that all the sign bits overflow and are lost. However,
when the saturation option is enabled, a left shift that would otherwise
shift nonsign bits off the left-hand side saturates to the maximum positive
or negative value instead. Consequently, with saturation enabled, the
result always keeps the same sign as the original number.
See “Saturation” on page 1-11 for a description of saturation behavior.
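The sign-preserving right shift and the saturating left shift can be sketched in Python as follows (a reference model for 32-bit values, not Blackfin code). The function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def ashift_right(value, places):
    """Arithmetic right shift: the sign bit back-fills the vacated bits."""
    return (to_signed32(value) >> places) & MASK32

def ashift_left_sat(value, places):
    """Left shift with the (S) option: clamp instead of losing nonsign bits."""
    shifted = to_signed32(value) << places
    if shifted > 0x7FFFFFFF:
        return 0x7FFFFFFF           # maximum positive value
    if shifted < -0x80000000:
        return 0x80000000           # maximum negative value
    return shifted & MASK32
```

In the model, `ashift_left_sat(0x40000000, 1)` returns 0x7FFFFFFF rather than the sign-flipped 0x80000000.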
Flags Affected
The versions of this instruction that send results to a Dreg set flags as
follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is set if result overflows; cleared if no overflow.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
The versions of this instruction that send results to Accumulator A0 set
flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AV0 is set if result overflows; cleared if no overflow.
• AV0S is set if AV0 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r0 >>>= 19 ; /* 16-bit instruction length arithmetic right
shift */
r3.l = r0.h >>> 7 ; /* arithmetic right shift, half-word */
r3.h = r0.h >>> 5 ; /* same as above; any combination of upper
and lower half-words is supported */
Also See
Vector Arithmetic Shift, Vector Logical Shift, Logical Shift, Shift with
Add, ROT (Rotate)
Special Applications
Multiply, divide, and normalize signed numbers
Logical Shift
General Form
dest_pntr = src_pntr >> 1
dest_pntr = src_pntr >> 2
dest_pntr = src_pntr << 1
dest_pntr = src_pntr << 2
dest_reg >>= shift_magnitude
dest_reg <<= shift_magnitude
dest_reg = src_reg >> shift_magnitude
dest_reg = src_reg << shift_magnitude
dest_reg = LSHIFT src_reg BY shift_magnitude
Syntax
Pointer Shift, Fixed Magnitude
Preg = Preg >> 1 ; /* right shift by 1 bit (a) */
Preg = Preg >> 2 ; /* right shift by 2 bits (a) */
Preg = Preg << 1 ; /* left shift by 1 bit (a) */
Preg = Preg << 2 ; /* left shift by 2 bits (a) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Preg: P5–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Logical Shift instruction logically shifts a register by a specified dis-
tance and direction.
Logical shifts discard any bits shifted out of the register and backfill
vacated bits with zeros.
For the LSHIFT version, the sign of the shift magnitude determines the
direction of the shift.
• Positive shift magnitudes produce Left shifts.
• Negative shift magnitudes produce Right shifts.
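The direction rule for the LSHIFT version can be sketched in Python as follows (a reference model, not Blackfin code). The name `lshift` and the `width` parameter, which selects a 16-bit half-word or a 32-bit register, are illustrative assumptions.

```python
def lshift(value, magnitude, width=32):
    """Logical shift: positive magnitudes shift left, negative shift right."""
    mask = (1 << width) - 1
    value &= mask
    if magnitude >= 0:
        return (value << magnitude) & mask  # bits shifted out are discarded
    return value >> -magnitude              # vacated bits are filled with zeros
```

For the half-word value 0xFFC0 (-64), `lshift(0xFFC0, -4, width=16)` returns 0x0FFC (4092), which shows how a logical right shift loses the sign.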
Flags Affected
The P-register versions of this instruction do not affect any flags.
The versions of this instruction that send results to a Dreg set flags as
follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
p3 = p2 >> 1 ; /* pointer right shift by 1 */
p3 = p3 >> 2 ; /* pointer right shift by 2 */
p4 = p5 << 1 ; /* pointer left shift by 1 */
p0 = p1 << 2 ; /* pointer left shift by 2 */
r3 >>= 17 ; /* data right shift */
r3 <<= 17 ; /* data left shift */
r3.l = r0.l >> 4 ; /* data right shift, half-word register */
r3.l = r0.h >> 4 ; /* same as above; half-word register combi-
nations are arbitrary */
r3.h = r0.l << 12 ; /* data left shift, half-word register */
r3.h = r0.h << 14 ; /* same as above; half-word register com-
binations are arbitrary */
r3 = r6 >> 4 ; /* right shift, 32-bit word */
r3 = r6 << 4 ; /* left shift, 32-bit word */
a0 = a0 >> 7 ; /* Accumulator right shift */
a1 = a1 >> 25 ; /* Accumulator right shift */
a0 = a0 << 7 ; /* Accumulator left shift */
a1 = a1 << 14 ; /* Accumulator left shift */
r3 >>= r0 ; /* data right shift */
r3 <<= r1 ; /* data left shift */
r3.l = lshift r0.l by r2.l ; /* shift direction controlled by
sign of R2.L */
r3.h = lshift r0.l by r2.l ;
a0 = lshift a0 by r7.l ;
a1 = lshift a1 by r7.l ;
/* If r0.h = -64 (or 0xFFC0), then performing . . . */
r3.h = r0.h >> 4 ; /* . . . produces r3.h = 0x0FFC (or 4092),
losing the sign */
Also See
Arithmetic Shift, ROT (Rotate), Shift with Add, Vector Arithmetic Shift,
Vector Logical Shift
Special Applications
None
ROT (Rotate)
General Form
dest_reg = ROT src_reg BY rotate_magnitude
accumulator_new = ROT accumulator_old BY rotate_magnitude
Syntax
Constant Rotate Magnitude
Dreg = ROT Dreg BY imm6 ; /* (b) */
A0 = ROT A0 BY imm6 ; /* (b) */
A1 = ROT A1 BY imm6 ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Rotate instruction rotates a register through the CC bit a specified dis-
tance and direction. The CC bit is in the rotate chain. Consequently, the
first value rotated into the register is the initial value of the CC bit.
Rotation shifts all the bits either right or left. Each bit that rotates out of
the register (the LSB for rotate right or the MSB for rotate left) is stored in
the CC bit, and the CC bit is stored into the bit vacated by the rotate on the
opposite end of the register.
The sign of the rotate magnitude determines the direction of the rotation.
• Positive rotate magnitudes produce Left rotations.
• Negative rotate magnitudes produce Right rotations.
Valid rotate magnitudes are –32 through +31, zero included. The Rotate
instruction masks and ignores bits that are more significant than those
allowed. The distance is determined by the lower 6 bits (sign extended) of
the rotate_magnitude.
Unlike shift operations, the Rotate instruction loses no bits of the source
register data. Instead, it rearranges them in a circular fashion. However,
the last bit rotated out of the register remains in the CC bit, and is not
returned to the register. Because rotates are performed all at once and not
one bit at a time, neither rotate direction has a speed advantage, whatever
the rotate magnitude. For instance, because the CC bit makes the D-register
rotate chain 33 bits long, a rotate right by two bits is no more efficient
than a rotate left by 31 bits. Both methods produce identical results in
identical execution time.
The D-register versions of this instruction rotate all 32 bits. The Accumu-
lator versions rotate all 40 bits of those registers.
The D-register versions of this instruction do not implicitly modify the
src_reg values. Optionally, dest_reg can be the same D-register as
src_reg. Doing this explicitly modifies the source register.
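The rotate chain can be sketched as a Python reference model (not Blackfin code). The sketch steps one bit at a time for clarity, although the text above notes that the processor performs the rotate all at once. The name `rot` and the `width` parameter (32 for a D-register, 40 for an Accumulator) are illustrative assumptions.

```python
def rot(value, cc, magnitude, width=32):
    """Rotate value through the CC bit; return (new_value, new_cc)."""
    if not -32 <= magnitude <= 31:
        raise ValueError("rotate magnitude must be -32 through +31")
    mask = (1 << width) - 1
    value &= mask
    cc &= 1
    for _ in range(abs(magnitude)):
        if magnitude > 0:                    # rotate left
            out = (value >> (width - 1)) & 1  # MSB leaves the register
            value = ((value << 1) & mask) | cc
        else:                                # rotate right
            out = value & 1                   # LSB leaves the register
            value = (value >> 1) | (cc << (width - 1))
        cc = out                             # the bit rotated out lands in CC
    return value, cc
```

The first bit rotated into the register is the initial CC value, and the last bit rotated out remains in CC.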
Flags Affected
The following flags are affected by the Rotate instruction.
• CC contains the latest value shifted into it.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r4 = rot r1 by 8 ; /* rotate left */
r4 = rot r1 by -5 ; /* rotate right */
a0 = rot a0 by 22 ; /* rotate Accumulator left */
a1 = rot a1 by -31 ; /* rotate Accumulator right */
r4 = rot r1 by r2.l ;
a0 = rot a0 by r3.l ;
a1 = rot a1 by r7.l ;
Also See
Arithmetic Shift, Logical Shift
Special Applications
None
Instruction Summary
• “ABS” on page 10-3
• “Add” on page 10-6
• “Add/Subtract – Prescale Down” on page 10-10
• “Add/Subtract – Prescale Up” on page 10-13
• “Add Immediate” on page 10-16
• “DIVS, DIVQ (Divide Primitive)” on page 10-19
• “EXPADJ” on page 10-27
• “MAX” on page 10-31
• “MIN” on page 10-34
• “Modify – Decrement” on page 10-37
• “Modify – Increment” on page 10-40
• “Multiply 16-Bit Operands” on page 10-46
• “Multiply 32-Bit Operands” on page 10-54
• “Multiply and Multiply-Accumulate to Accumulator” on
page 10-56
• “Multiply and Multiply-Accumulate to Half-Register” on
page 10-61
Instruction Overview
This chapter discusses the instructions that specify arithmetic operations.
Users can take advantage of these instructions to add, subtract, divide, and
multiply, as well as to calculate and store absolute values, detect expo-
nents, round, saturate, and return the number of sign bits.
ABS
General Form
dest_reg = ABS src_reg
Syntax
A0 = ABS A0 ; /* (b) */
A0 = ABS A1 ; /* (b) */
A1 = ABS A0 ; /* (b) */
A1 = ABS A1 ; /* (b) */
A1 = ABS A1, A0 = ABS A0 ; /* (b) */
Dreg = ABS Dreg ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Absolute Value instruction calculates the absolute value of a 32-bit
register and stores it into a 32-bit dest_reg according to the following
rules.
• If the input value is positive or zero, copy it unmodified to the
destination.
• If the input value is negative, subtract it from zero and store the
result in the destination.
The ABS operation can also be performed on both Accumulators by a sin-
gle instruction.
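The rules for the D-register version can be sketched in Python as follows (a reference model, not Blackfin code). The saturation of the maximum negative value follows the description of the V flag in this section; the name `abs32` and the returned pair are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def abs32(value):
    """Model of Dreg = ABS Dreg; returns (result, V flag)."""
    value &= MASK32
    if value == 0x80000000:
        return 0x7FFFFFFF, 1          # maximum negative saturates to maximum positive
    if value & 0x80000000:
        return (-value) & MASK32, 0   # negative input: subtract it from zero
    return value, 0                   # positive or zero: copied unmodified
```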
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero. In the case of two
simultaneous operations, AZ represents the logical “OR” of the two.
• AN is cleared.
• V is set if the maximum negative value is saturated to the maximum
positive value and the dest_reg is a Dreg; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result overflows and the dest_reg is A0; cleared if no
overflow.
• AV0S is set if AV0 is set; unaffected otherwise.
• AV1 is set if result overflows and the dest_reg is A1; cleared if no
overflow.
• AV1S is set if AV1 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
a0 = abs a0 ;
a0 = abs a1 ;
a1 = abs a0 ;
a1 = abs a1 ;
a1 = abs a1, a0=abs a0 ;
r3 = abs r1 ;
Also See
Vector ABS
Special Applications
None
Add
General Form
dest_reg = src_reg_1 + src_reg_2
Syntax
Pointer Registers — 32-Bit Operands, 32-Bit Result
Preg = Preg + Preg ; /* (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Add instruction adds two source values and places the result in a des-
tination register.
There are two ways to specify addition on 32-bit data in D-registers:
• One does not support saturation (16-bit instruction length)
• The other supports optional saturation (32-bit instruction length)
The shorter 16-bit instruction takes up less memory space. The larger
32-bit instruction can sometimes save execution time because it can be
issued in parallel with certain other instructions. See “Parallel Issue”.
The D-register version that accepts 16-bit half-word operands stores the
result in a half-word data register. This version accepts any combination
of upper and lower half-register operands, and places the results in the
upper or lower half of the destination register at the user’s discretion.
All versions that manipulate 16-bit data are 32 bits long.
Options
In the syntax, where sat_flag appears, substitute one of the following
values.
(S) – saturate the result
(NS) – no saturation
See “Saturation” on page 1-11 for a description of saturation behavior.
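The effect of the two options on a 16-bit half-word add can be sketched in Python as follows (a reference model, not Blackfin code). The name `add16` and the `saturate` parameter are illustrative assumptions.

```python
def add16(a, b, saturate=False):
    """Model of a half-word add with the (NS) option or, if saturate, (S)."""
    def signed16(x):
        x &= 0xFFFF
        return x - 0x10000 if x & 0x8000 else x

    total = signed16(a) + signed16(b)
    if saturate:
        total = max(-0x8000, min(0x7FFF, total))  # clamp to the 16-bit signed range
    return total & 0xFFFF                         # without (S), the result wraps
```

Adding 0x7000 and 0x2000 yields 0x9000 without saturation and 0x7FFF with it.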
Flags Affected
D-register versions of this instruction set flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is set if the operation generates a carry; cleared if no carry.
• V is set if result overflows; cleared if no overflow.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r5 = r2 + r1 ; /* 16-bit instruction length add, no
saturation */
r5 = r2 + r1(ns) ; /* same result as above, but 32-bit
instruction length */
r5 = r2 + r1(s) ; /* saturate the result */
p5 = p3 + p0 ;
/* If r0.l = 0x7000 and r7.l = 0x2000, then . . . */
r4.l = r0.l + r7.l (ns) ; /* . . . produces r4.l = 0x9000,
because no saturation is enforced */
/* If r0.l = 0x7000 and r7.h = 0x2000, then . . . */
r4.l = r0.l + r7.h (s) ; /* . . . produces r4.l = 0x7FFF, satu-
rated to the maximum positive value */
r0.l = r2.h + r4.l(ns) ;
r1.l = r3.h + r7.h(ns) ;
r4.h = r0.l + r7.l (ns) ;
r4.h = r0.l + r7.h (ns) ;
r0.h = r2.h + r4.l(s) ; /* saturate the result */
r1.h = r3.h + r7.h(ns) ;
Also See
Modify – Increment, Add with Shift, Shift with Add, Vector Add /
Subtract
Special Applications
None
Add/Subtract – Prescale Down
General Form
dest_reg = src_reg_0 + src_reg_1 (RND20)
dest_reg = src_reg_0 - src_reg_1 (RND20)
Syntax
Dreg_lo_hi = Dreg + Dreg (RND20) ; // (b)
Dreg_lo_hi = Dreg - Dreg (RND20) ; // (b)
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Add/Subtract – Prescale Down instruction combines two 32-bit values
to produce a 16-bit result as follows:
• Prescale down both input operand values by arithmetically shifting
them four places to the right
• Add or subtract the operands, depending on the instruction version
used
• Round the upper 16 bits of the result
• Extract the upper 16 bits to the dest_reg
The instruction supports only biased rounding. The RND_MOD bit in the
ASTAT register has no bearing on the rounding behavior of this instruction.
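The steps listed above can be sketched in Python as follows. This is a reference model that follows the listed steps literally and has not been checked against hardware. Biased rounding is assumed to mean adding half of the discarded range (0x8000) before truncating, and the name `addsub_rnd20` is illustrative.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def addsub_rnd20(a, b, subtract=False):
    """Model of Dreg_lo_hi = Dreg +/- Dreg (RND20); returns a 16-bit pattern."""
    x = to_signed32(a) >> 4           # prescale down, arithmetic shift
    y = to_signed32(b) >> 4
    total = x - y if subtract else x + y
    rounded = (total + 0x8000) >> 16  # assumed biased rounding, then upper 16 bits
    return rounded & 0xFFFF
```

Because both operands are prescaled down first, the rounded result always fits in 16 bits in this model, which is consistent with V being cleared.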
Flags Affected
The following flags are affected by this instruction:
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is cleared.
All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r1.l = r6+r7(rnd20) ;
r1.l = r6-r7(rnd20) ;
r1.h = r6+r7(rnd20) ;
r1.h = r6-r7(rnd20) ;
Also See
Add/Subtract – Prescale Up, RND (Round to Half-Word), Add
Special Applications
Typically, use the Add/Subtract – Prescale Down instruction to provide
an IEEE 1180–compliant 2D 8x8 inverse discrete cosine transform.
Add/Subtract – Prescale Up
General Form
dest_reg = src_reg_0 + src_reg_1 (RND12)
dest_reg = src_reg_0 - src_reg_1 (RND12)
Syntax
Dreg_lo_hi = Dreg + Dreg (RND12) ; // (b)
Dreg_lo_hi = Dreg - Dreg (RND12) ; // (b)
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Add/Subtract – Prescale Up instruction combines two 32-bit values
to produce a 16-bit result as follows:
• Prescale up both input operand values by shifting them four places
to the left
• Add or subtract the operands, depending on the instruction version
used
• Round and saturate the upper 16 bits of the result
• Extract the upper 16 bits to the dest_reg
The instruction supports only biased rounding. The RND_MOD bit in the
ASTAT register has no bearing on the rounding behavior of this instruction.
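The steps above can be sketched in Python as follows. The names `to_signed32` and `add_sub_rnd12` are hypothetical. The sketch assumes that the intermediate sum is held at full precision and saturated once, after rounding; the hardware may order these steps differently.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def add_sub_rnd12(src0, src1, subtract=False):
    """Model of Dreg_lo_hi = Dreg +/- Dreg (RND12); returns a 16-bit pattern."""
    a = to_signed32(src0) << 4              # prescale up
    b = to_signed32(src1) << 4
    total = a - b if subtract else a + b
    rounded = (total + 0x8000) >> 16        # assumed biased rounding at bit 16
    saturated = max(-0x8000, min(0x7FFF, rounded))
    return saturated & 0xFFFF
```

In this model, any input pair whose scaled sum falls outside the 16-bit range returns 0x7FFF or 0x8000, which corresponds to the cases in which V and VS are set.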
Flags Affected
The following flags are affected by this instruction:
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is set if result saturates; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r1.l = r6+r7(rnd12) ;
r1.l = r6-r7(rnd12) ;
r1.h = r6+r7(rnd12) ;
r1.h = r6-r7(rnd12) ;
Also See
RND (Round to Half-Word), Add/Subtract – Prescale Down, Add
Special Applications
Typically, use the Add/Subtract – Prescale Up instruction to provide an
IEEE 1180–compliant 2D 8x8 inverse discrete cosine transform.
Add Immediate
General Form
register += constant
Syntax
Dreg += imm7 ; /* Dreg = Dreg + constant (a) */
Preg += imm7 ; /* Preg = Preg + constant (a) */
Ireg += 2 ; /* increment Ireg by 2, half-word address pointer
increment (a) */
Ireg += 4 ; /* word address pointer increment (a) */
Syntax Terminology
Dreg: R7–0
Ireg: I3–0
imm7: 7-bit signed field, with the range of –64 through +63
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Add Immediate instruction adds a constant value to a register without
saturation.
To subtract immediate values from I-registers, use the Subtract
Immediate instruction.
The circular address buffer registers (Index, Length, and Base) are
not initialized automatically by Reset. Traditionally, user software
clears all the circular address buffer registers during boot-up to dis-
able circular buffering, then initializes them later, if needed.
Flags Affected
D-register versions of this instruction set flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is set if the operation generates a carry; cleared if no carry.
• V is set if result overflows; cleared if no overflow.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The Index Register versions of this instruction can be issued in parallel
with specific other instructions. For details, see “Issuing Parallel Instruc-
tions” on page 15-1.
The Data Register and Pointer Register versions of this instruction cannot
be issued in parallel with other instructions.
Example
r0 += 40 ;
p5 += -4 ; /* decrement by adding a negative value */
i0 += 2 ;
i1 += 4 ;
Also See
Subtract Immediate
Special Applications
None
DIVS, DIVQ (Divide Primitive)
General Form
DIVS ( dividend_register, divisor_register )
DIVQ ( dividend_register, divisor_register )
Syntax
DIVS ( Dreg, Dreg ) ; /* Initialize for DIVQ. Set the AQ flag
based on the signs of the 32-bit dividend and the 16-bit divisor.
Left shift the dividend one bit. Copy AQ into the dividend LSB.
(a) */
DIVQ ( Dreg, Dreg ) ; /* Based on AQ flag, either add or sub-
tract the divisor from the dividend. Then set the AQ flag based
on the MSBs of the 32-bit dividend and the 16-bit divisor. Left
shift the dividend one bit. Copy the logical inverse of AQ into
the dividend LSB. (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Divide Primitive instruction versions are the foundation elements of a
nonrestoring conditional add-subtract division algorithm. See “Example”
on page 10-25 for such a routine.
The dividend (numerator) is a 32-bit value. The divisor (denominator) is
a 16-bit value in the lower half of divisor_register. The high-order
half-word of divisor_register is ignored entirely.
The division can either be signed or unsigned, but the dividend and divi-
sor must both be of the same type. The divisor cannot be negative. A
signed division operation, where the dividend may be negative, begins the
sequence with the DIVS (“divide-sign”) instruction, followed by repeated
execution of the DIVQ (“divide-quotient”) instruction. An unsigned divi-
sion omits the DIVS instruction. In that case, the user must manually clear
the AQ flag of the ASTAT register before issuing the DIVQ instructions.
Up to 16 bits of signed quotient resolution can be calculated by issuing
DIVS once, then repeating the DIVQ instruction 15 times. A 16-bit
unsigned quotient is calculated by omitting DIVS, clearing the AQ flag, then
issuing 16 DIVQ instructions.
Less quotient resolution is produced by executing fewer DIVQ iterations.
The result of each successive addition or subtraction appears in
dividend_register, aligned and ready for the next addition or subtraction
step. The contents of divisor_register are not modified by this
instruction.
The final quotient appears in the low-order half-word of
dividend_register at the end of the successive add/subtract sequence.
DIVS computes the sign bit of the quotient based on the signs of the divi-
dend and divisor. DIVS initializes the AQ flag based on that sign, and
initializes the dividend for the first addition or subtraction. DIVS performs
no addition or subtraction.
DIVQ either adds (dividend + divisor) or subtracts (dividend – divisor)
based on the AQ flag, then reinitializes the AQ flag and dividend for the next
iteration. If AQ is 1, addition is performed; if AQ is 0, subtraction is
performed.
See “Flags Affected” for the conditions that set and clear the AQ flag.
Both instruction versions align the dividend for the next iteration by
shifting the dividend one bit to the left (without carry). This left shift
accomplishes the same function as aligning the divisor one bit to the right,
such as one would do in manual binary division.
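The behavior described above can be explored with a small Python model. This is a sketch under stated assumptions, not a definitive description of the hardware: the function names are hypothetical, AQ is passed explicitly instead of living in ASTAT, and the 16-bit divisor is assumed to be aligned with the upper half-word of the dividend during the add or subtract step.

```python
MASK32 = 0xFFFFFFFF


def divs(dividend, divisor):
    """DIVS: set AQ from the operand signs, shift left, copy AQ into the LSB."""
    aq = ((dividend >> 31) ^ ((divisor & 0xFFFF) >> 15)) & 1
    return ((dividend << 1) | aq) & MASK32, aq


def divq(dividend, divisor, aq):
    """DIVQ: add or subtract the divisor, update AQ, shift, insert NOT AQ."""
    high = (dividend >> 16) & 0xFFFF
    low = dividend & 0xFFFF
    divisor &= 0xFFFF
    # Assumption: the divisor is applied to the upper half-word only.
    high = (high + divisor if aq else high - divisor) & 0xFFFF
    aq = ((high ^ divisor) >> 15) & 1
    return ((((high << 16) | low) << 1) | (aq ^ 1)) & MASK32, aq


def divide_signed(dividend, divisor):
    """Integer divide: shift to 31.1 format, DIVS once, DIVQ 15 times."""
    r0 = (dividend << 1) & MASK32
    r0, aq = divs(r0, divisor)
    for _ in range(15):
        r0, aq = divq(r0, divisor, aq)
    return r0 & 0xFFFF    # quotient is in the low-order half-word
```

With a dividend of 70 and a divisor of 5, the model leaves 14 in the low-order half-word, matching the expected integer quotient.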
The format of the quotient for any numeric representation can be deter-
mined by the format of the dividend and divisor. Let:
• NL represent the number of bits to the left of the binal point of the
dividend, and
• NR represent the number of bits to the right of the binal point of
the dividend (numerator);
• DL represent the number of bits to the left of the binal point of the
divisor, and
• DR represent the number of bits to the right of the binal point of
the divisor (denominator).
Then the quotient has NL – DL + 1 bits to the left of the binal point and
NR – DR – 1 bits to the right of the binal point. See the following
example.
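Dividend (numerator)    BBBB B . BBB BBBB BBBB BBBB BBBB BBBB BBBB
                        NL bits = 5, NR bits = 27
Divisor (denominator)   BB . BB BBBB BBBB BBBB
                        DL bits = 2, DR bits = 14
Quotient                BBBB . BBBB BBBB BBBB
                        NL - DL + 1 = 5 - 2 + 1 = 4
                        NR - DR - 1 = 27 - 14 - 1 = 12
                        4.12 format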
When both operands are signed and fully fractional (dividend in 1.31 format
and divisor in 1.15 format), the quotient is also fully fractional (in 1.15
format). The upper 16 bits of the dividend must therefore have a smaller
magnitude than the divisor to avoid a quotient overflow beyond 16 bits. If
an overflow occurs, AV0 is set. User software is able to detect the
overflow, rescale the operand, and repeat the division.
Dividing two integers (32.0 dividend by a 16.0 divisor) results in an
invalid quotient format because the result will not fit in a 16-bit register.
To divide two integers (dividend in 32.0 format and divisor in 16.0 for-
mat) and produce an integer quotient (in 16.0 format), one must shift the
dividend one bit to the left (into 31.1 format) before dividing. This
requirement to shift left limits the usable dividend range to 31 bits. Viola-
tions of this range produce an invalid result of the division operation.
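The format rule and the cases discussed above can be checked with a short helper. The function name `quotient_format` is hypothetical; it only applies the NL – DL + 1 and NR – DR – 1 expressions and reports a format as invalid when either bit count is negative.

```python
def quotient_format(nl, nr, dl, dr):
    """Return (bits left, bits right) of the quotient binal point, or None."""
    left = nl - dl + 1
    right = nr - dr - 1
    if left < 0 or right < 0:
        return None          # the quotient does not fit a 16-bit format
    return left, right
```

For example, a 32.0 dividend with a 16.0 divisor yields no valid format, while the same dividend shifted into 31.1 format yields a 16.0 quotient.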
The algorithm overflows if the result cannot be represented in the format
of the quotient as calculated above, or when the divisor is zero or less than
the upper 16 bits of the dividend in magnitude (which is tantamount to
multiplication).
Error Conditions
Two special cases can produce invalid or inaccurate results. Software can
trap and correct both cases.
1. The Divide Primitive instructions do not support signed division
by a negative divisor. Attempts to divide by a negative divisor result
in a quotient that is, in most cases, one LSB less than the correct
value. If division by a negative divisor is required, follow the steps
below.
• Before performing the division, save the sign of the divisor
in a scratch register.
• Calculate the absolute value of the divisor and use that value
as the divisor operand in the Divide Primitive instructions.
• After the divide sequence concludes, multiply the resulting
quotient by the original divisor sign.
• The quotient then has the correct magnitude and sign.
2. The Divide Primitive instructions do not support unsigned divi-
sion by a divisor greater than 0x7FFF. If such divisions are
necessary, prescale both operands by shifting the dividend and divi-
sor one bit to the right prior to division. The resulting quotient
will be correctly aligned.
Flags Affected
This instruction affects flags as follows.
• AQ equals dividend_MSB Exclusive-OR divisor_MSB where dividend
is a 32-bit value and divisor is a 16-bit value.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
/* Evaluate given a signed integer dividend and divisor */
p0 = 15 ; /* Evaluate the quotient to 16 bits. */
r0 = 70 ; /* Dividend, or numerator */
r1 = 5 ; /* Divisor, or denominator */
r0 <<= 1 ; /* Left shift dividend by 1 needed for integer
division */
divs (r0, r1) ; /* Evaluate quotient MSB. Initialize AQ flag
and dividend for the DIVQ loop. */
loop .div_prim lc0=p0 ; /* Evaluate DIVQ p0=15 times. */
loop_begin .div_prim ;
divq (r0, r1) ;
loop_end .div_prim ;
Also See
LSETUP, LOOP, Multiply 32-Bit Operands
Special Applications
None
EXPADJ
General Form
dest_reg = EXPADJ ( sample_register, exponent_register )
Syntax
Dreg_lo = EXPADJ ( Dreg, Dreg_lo ) ; /* 32-bit sample (b) */
Dreg_lo = EXPADJ ( Dreg_lo_hi, Dreg_lo ) ; /* one 16-bit sam-
ple (b) */
Dreg_lo = EXPADJ ( Dreg, Dreg_lo ) (V) ; /* two 16-bit samples
(b) */
Syntax Terminology
Dreg_lo_hi: R7–0.L, R7–0.H
Dreg_lo: R7–0.L
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Exponent Detection instruction identifies the largest magnitude of
two or three fractional numbers based on their exponents. It compares the
magnitude of one or two sample values to a reference exponent and
returns the smallest of the exponents.
The exponent is the number of sign bits minus one. In other words, the
exponent is the number of redundant sign bits in a signed number.
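The definition above can be expressed as a short Python model. The function names are hypothetical, and the reference exponent is assumed to be a small non-negative integer held in the low half of the exponent register.

```python
def exponent(value, width):
    """Count the redundant sign bits (sign bits minus one) of a value."""
    value &= (1 << width) - 1
    sign = value >> (width - 1)
    count = 0
    for bit in range(width - 2, -1, -1):
        if ((value >> bit) & 1) != sign:
            break
        count += 1
    return count


def expadj32(sample, reference):
    """One 32-bit sample compared against the reference exponent."""
    return min(exponent(sample, 32), reference)


def expadj16(sample, reference):
    """One 16-bit sample compared against the reference exponent."""
    return min(exponent(sample, 16), reference)


def expadj_v(samples, reference):
    """Two 16-bit samples packed in one 32-bit register (V option)."""
    low, high = samples & 0xFFFF, (samples >> 16) & 0xFFFF
    return min(exponent(low, 16), exponent(high, 16), reference)
```

For instance, `expadj32(0xF0000052, 27)` returns 3, because the sample has four sign bits.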
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r5.l = expadj (r4, r2.l) ;
• Assume R4 = 0x0000 0052 and R2.L = 12. Then R5.L becomes 12.
• Assume R4 = 0xFFFF 0052 and R2.L = 12. Then R5.L becomes 12.
• Assume R4 = 0x0000 0052 and R2.L = 27. Then R5.L becomes 24.
• Assume R4 = 0xF000 0052 and R2.L = 27. Then R5.L becomes 3.
r5.l = expadj (r4, r2.l) (v) ;
• Assume R4.L = 0x0765, R4.H = 0xFF74 and R2.L = 12. Then R5.L
becomes 4.
• Assume R4.L = 0x0765, R4.H = 0xE722 and R2.L = 12. Then R5.L
becomes 2.
Also See
SIGNBITS
Special Applications
EXPADJ detects the exponent of the largest magnitude number in an array.
The detected value may then be used to normalize the array on a subse-
quent pass with a shift operation. Typically, use this feature to implement
block floating-point capabilities.
MAX
General Form
dest_reg = MAX ( src_reg_0, src_reg_1 )
Syntax
Dreg = MAX ( Dreg , Dreg ) ; /* 32-bit operands (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Maximum instruction returns the maximum, or most positive, value
of the source registers. The operation subtracts src_reg_1 from src_reg_0
and selects the output based on the signs of the input values and the arith-
metic flags.
The Maximum instruction does not implicitly modify input values. The
dest_reg can be the same D-register as one of the source registers. Doing
this explicitly modifies the source register.
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r5 = max (r2, r3) ;
Also See
MIN, Vector MAX, Vector MIN, VIT_MAX (Compare-Select)
Special Applications
None
MIN
General Form
dest_reg = MIN ( src_reg_0, src_reg_1 )
Syntax
Dreg = MIN ( Dreg , Dreg ) ; /* 32-bit operands (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Minimum instruction returns the minimum value of the source regis-
ters to the dest_reg. (The minimum value of the source registers is the
value closest to – ∞.) The operation subtracts src_reg_1 from src_reg_0
and selects the output based on the signs of the input values and the arith-
metic flags.
The Minimum instruction does not implicitly modify input values. The
dest_reg can be the same D-register as one of the source registers. Doing
this explicitly modifies the source register.
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r5 = min (r2, r3) ;
Also See
MAX, Vector MAX, Vector MIN
Special Applications
None
Modify – Decrement
General Form
dest_reg -= src_reg
Syntax
40-Bit Accumulators
A0 -= A1 ; /* dest_reg_new = dest_reg_old - src_reg, saturate
the result at 40 bits (b) */
A0 -= A1 (W32) ; /* dest_reg_new = dest_reg_old - src_reg, dec-
rement and saturate the result at 32 bits, sign extended (b) */
32-Bit Registers
Preg -= Preg ; /* dest_reg_new = dest_reg_old - src_reg (a) */
Ireg -= Mreg ; /* dest_reg_new = dest_reg_old - src_reg (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Ireg: I3–0
Mreg: M3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Modify – Decrement instruction decrements a register by a
user-defined quantity.
Flags Affected
The Accumulator versions of this instruction affect the flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is set if the operation generates a carry; cleared if no carry.
• AV0 is set if result saturates; cleared if no saturation.
• AV0S is set if AV0 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
a0 -= a1 ;
a0 -= a1 (w32) ;
p3 -= p0 ;
i1 -= m2 ;
Also See
Modify – Increment, Subtract, Shift with Add
Special Applications
Typically, use the Index Register and Pointer Register versions of the
Modify – Decrement instruction to decrement indirect address pointers
for load or store operations.
Modify – Increment
General Form
dest_reg += src_reg
dest_reg = ( src_reg_0 += src_reg_1 )
Syntax
40-Bit Accumulators
A0 += A1 ; /* dest_reg_new = dest_reg_old + src_reg, saturate
the result at 40 bits (b) */
A0 += A1 (W32) ; /* dest_reg_new = dest_reg_old + src_reg,
signed saturate the result at 32 bits, sign extended (b) */
32-Bit Registers
Preg += Preg (BREV) ; /* dest_reg_new = dest_reg_old +
src_reg, bit reversed carry, only (a) */
Ireg += Mreg (opt_brev) ; /* dest_reg_new = dest_reg_old +
src_reg, optional bit reverse (a) */
Dreg = ( A0 += A1 ) ; /* increment 40-bit A0 by A1 with satura-
tion at 40 bits, then extract the result into a 32-bit register
with saturation at 32 bits (b) */
Syntax Terminology
Dreg: R7–0
Ireg: I3–0
Mreg: M3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Modify – Increment instruction increments a register by a
user-defined quantity. In some versions, the instruction copies the result
into a third register.
The 16-bit Half-Word Data Register version increments the 40-bit A0 by
A1 with saturation at 40 bits, then extracts the result into a half register.
The extraction step involves first rounding the 40-bit result at bit 16
(according to the RND_MOD bit in the ASTAT register), then saturating at 32
bits and moving bits 31–16 into the half register.
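The extraction sequence described above can be sketched in Python. The names `sat` and `acc_add_extract_half` are hypothetical, accumulator values are represented as signed Python integers, and biased rounding is assumed (the RND_MOD setting that selects unbiased rounding is not modeled).

```python
def sat(value, bits):
    """Saturate a signed integer to the given two's-complement width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def acc_add_extract_half(a0, a1):
    """Model of Dreg_lo_hi = (A0 += A1); returns (new A0, 16-bit result)."""
    acc = sat(a0 + a1, 40)               # increment with 40-bit saturation
    rounded = sat(acc + 0x8000, 32)      # round at bit 16, saturate at 32 bits
    return acc, (rounded >> 16) & 0xFFFF # bits 31-16 go to the half register
```

The updated Accumulator value and the extracted half-word are returned separately because the instruction writes both destinations.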
See “Saturation” on page 1-11 for a description of saturation behavior.
Options
(BREV)–bit reverse carry adder. When specified, the carry bit is propagated
from left to right, as shown in Table 10-1, instead of right to left.
When bit reversal is used on the Index Register version of this instruction,
circular buffering is disabled to support operand addressing for FFT,
DCT and DFT algorithms. The Pointer Register version does not support
circular buffering in any case.
Table 10-1. Bit Addition Flow for the Bit Reverse (BREV) Case
an a2 a1 a0
| cn | c2 | c1 |
+ + + + c0
| | | |
bn b2 b1 b0
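One way to picture the reverse-carry adder is to reverse the bits of both operands, perform an ordinary addition, and reverse the result. The Python sketch below takes that approach; the function names are hypothetical, and a full 32-bit register width is assumed.

```python
def reverse32(value):
    """Reverse the bit order of a 32-bit value."""
    return int(format(value & 0xFFFFFFFF, "032b")[::-1], 2)


def brev_add(a, b):
    """Add with the carry propagating from left to right."""
    return reverse32((reverse32(a) + reverse32(b)) & 0xFFFFFFFF)


def brev_sequence(step, count):
    """Successive index values produced by repeated index += step (brev)."""
    index, values = 0, []
    for _ in range(count):
        values.append(index)
        index = brev_add(index, step)
    return values
```

Calling `brev_sequence(4, 8)` yields 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed ordering used by an 8-point FFT.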
Flags Affected
The versions of the Modify – Increment instruction that store the results
in an Accumulator affect flags as follows.
• AZ is set if Accumulator result is zero; cleared if nonzero.
• AN is set if Accumulator result is negative; cleared if non-negative.
• AC0 is set if the operation generates a carry; cleared if no carry.
• V is set if result saturates and the dest_reg is a Dreg; cleared if no
saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result saturates and the dest_reg is A0; cleared if no
saturation.
• AV0S is set if AV0 is set; unaffected otherwise.
• All other flags are unaffected.
The versions of the Modify – Increment instruction that store the results
in a Data Register affect flags as follows.
• AZ is set if Data Register result is zero; cleared if nonzero.
• AN is set if Data Register result is negative; cleared if non-negative.
• AC0 is set if the operation generates a carry; cleared if no carry.
• V is set if result saturates and the dest_reg is a Dreg; cleared if no
saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result saturates and the dest_reg is A0; cleared if no
saturation.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
a0 += a1 ;
a0 += a1 (w32) ;
p3 += p0 (brev) ;
i1 += m1 ;
i0 += m0 (brev) ; /* optional carry bit reverse mode */
r5 = (a0 += a1) ;
r2.l = (a0 += a1) ;
r5.h = (a0 += a1) ;
Also See
Modify – Decrement, Add, Shift with Add
Special Applications
Typically, use the Index Register and Pointer Register versions of the
Modify – Increment instruction to increment indirect address pointers for
load or store operations.
Multiply 16-Bit Operands
General Form
dest_reg = src_reg_0 * src_reg_1 (opt_mode)
Syntax
Multiply-And-Accumulate Unit 0 (MAC0)
Dreg_lo = Dreg_lo_hi * Dreg_lo_hi (opt_mode_1) ; /* 16-bit
result into the destination lower half-word register (b) */
Dreg_even = Dreg_lo_hi * Dreg_lo_hi (opt_mode_2) ; /* 32-bit
result (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Dreg_hi: R7–0.H
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Multiply 16-Bit Operands instruction multiplies the two 16-bit oper-
ands and stores the result directly into the destination register with
saturation.
The instruction is like the Multiply-Accumulate instructions, except that
Multiply 16-Bit Operands does not affect the Accumulators.
Operations performed by the Multiply-and-Accumulate Unit 0 (MAC0)
portion of the architecture load their 16-bit results into the lower half of
the destination data register; 32-bit results go into an even numbered
Dreg. Operations performed by MAC1 load their results into the upper
half of the destination data register or an odd numbered Dreg.
In 32-bit result syntax, the MAC performing the operation will be deter-
mined by the destination Dreg. Even-numbered Dregs (R6, R4, R2, R0)
invoke MAC0. Odd-numbered Dregs (R7, R5, R3, R1) invoke MAC1.
Therefore, 32-bit result operations using the (M) option can only be per-
formed on odd-numbered Dreg destinations.
In 16-bit result syntax, the MAC performing the operation will be deter-
mined by the destination Dreg half. Low-half Dregs (R7–0.L) invoke
MAC0. High-half Dregs (R7–0.H) invoke MAC1. Therefore, 16-bit result
operations using the (M) option can only be performed on high-half Dreg
destinations.
The versions of this instruction that produce 16-bit results are affected by
the RND_MOD bit in the ASTAT register when they copy the results into the
16-bit destination register. RND_MOD determines whether biased or unbi-
ased rounding is used. RND_MOD controls rounding for all versions of this
instruction that produce 16-bit results except the (IS), (IU) and (ISS2)
options.
See “Saturation” on page 1-11 for a description of saturation behavior.
See “Rounding and Truncating” on page 1-13 for a description of round-
ing behavior.
The versions of this instruction that produce 32-bit results do not perform
rounding and are not affected by the RND_MOD bit in the ASTAT register.
Options
The Multiply 16-Bit Operands instruction supports the following
options. Saturation is supported for every option.
To truncate the result, the operation eliminates the least significant bits
that do not fit into the destination register.
In fractional mode, the product of the most negative representable fraction
times itself (for example, 0x8000 times 0x8000) is saturated to the
maximum representable positive fraction (0x7FFF).
Each option is described for a register-half destination and for a 32-bit
register destination.
Default
    Register-half destination: Signed fraction. Multiply 1.15 * 1.15 to
    produce 1.31 results after left-shift correction. Round 1.31 format
    value at bit 16. (RND_MOD bit in the ASTAT register controls the
    rounding.) Saturate the result to 1.15 precision in destination
    register half. Result is between minimum -1 and maximum 1-2^-15 (or,
    expressed in hex, between minimum 0x8000 and maximum 0x7FFF).
    32-bit register destination: Signed fraction. Multiply 1.15 * 1.15 to
    produce 1.31 results after left-shift correction. Saturate results
    between minimum -1 and maximum 1-2^-31. The resulting hexadecimal
    range is minimum 0x8000 0000 through maximum 0x7FFF FFFF.
(FU)
    Register-half destination: Unsigned fraction. Multiply 0.16 * 0.16 to
    produce 0.32 results. No shift correction. Round 0.32 format value at
    bit 16. (RND_MOD bit in the ASTAT register controls the rounding.)
    Saturate the result to 0.16 precision in destination register half.
    Result is between minimum 0 and maximum 1-2^-16 (or, expressed in hex,
    between minimum 0x0000 and maximum 0xFFFF).
    32-bit register destination: Unsigned fraction. Multiply 0.16 * 0.16
    to produce 0.32 results. No shift correction. Saturate results between
    minimum 0 and maximum 1-2^-32.
    Unsigned integer. Multiply 16.0 * 16.0 to produce 32.0 results. No
    shift correction. Saturate results between minimum 0 and maximum
    2^32-1.
    In either case, the resulting hexadecimal range is minimum 0x0000 0000
    through maximum 0xFFFF FFFF.
(IS)
    Register-half destination: Signed integer. Multiply 16.0 * 16.0 to
    produce 32.0 results. No shift correction. Extract the lower 16 bits.
    Saturate for 16.0 precision in destination register half. Result is
    between minimum -2^15 and maximum 2^15-1 (or, expressed in hex,
    between minimum 0x8000 and maximum 0x7FFF).
    32-bit register destination: Signed integer. Multiply 16.0 * 16.0 to
    produce 32.0 results. No shift correction. Saturate integer results
    between minimum -2^31 and maximum 2^31-1.
(IU)
    Register-half destination: Unsigned integer. Multiply 16.0 * 16.0 to
    produce 32.0 results. No shift correction. Extract the lower 16 bits.
    Saturate for 16.0 precision in destination register half. Result is
    between minimum 0 and maximum 2^16-1 (or, expressed in hex, between
    minimum 0x0000 and maximum 0xFFFF).
    32-bit register destination: Not applicable. Use (IS).
(T)
    Register-half destination: Signed fraction with truncation. Truncate
    Accumulator 9.31 format value at bit 16. (Perform no rounding.)
    Saturate the result to 1.15 precision in destination register half.
    Result is between minimum -1 and maximum 1-2^-15 (or, expressed in
    hex, between minimum 0x8000 and maximum 0x7FFF).
    32-bit register destination: Not applicable. Truncation is meaningless
    for 32-bit register destinations.
(ISS2)
    Register-half destination: Signed integer with scaling. Multiply
    16.0 * 16.0 to produce 32.0 results. No shift correction. Extract the
    lower 16 bits. Shift them one place to the left (multiply x 2).
    Saturate the result for 16.0 format in destination register half.
    Result is between minimum -2^15 and maximum 2^15-1 (or, expressed in
    hex, between minimum 0x8000 and maximum 0x7FFF).
    32-bit register destination: Signed integer with scaling. Multiply
    16.0 * 16.0 to produce 32.0 results. No shift correction. Shift the
    results one place to the left (multiply x 2). Saturate result to 32.0
    format. Copy to destination register. Results range between minimum
    -2^31 and maximum 2^31-1. The resulting hexadecimal range is minimum
    0x8000 0000 through maximum 0x7FFF FFFF.
(M)
    Mixed mode multiply (valid only for MAC1). When issued in a fraction
    mode instruction (with Default, FU, T, TFU, or S2RND mode), multiply
    1.15 * 0.16 to produce 1.31 results.
    When issued in an integer mode instruction (with IS, ISS2, or IH
    mode), multiply 16.0 * 16.0 (signed * unsigned) to produce 32.0
    results.
    No shift correction in either case. Src_reg_0 is the signed operand
    and Src_reg_1 is the unsigned operand.
    All other operations proceed according to the other mode flag or
    Default.
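The Default option for a register-half destination can be sketched in Python as shown below. The names `to_signed16` and `mult_frac16` are hypothetical, and biased rounding is assumed; the unbiased rounding selected by the other RND_MOD setting is not modeled.

```python
def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    value &= 0xFFFF
    return value - (1 << 16) if value & 0x8000 else value


def mult_frac16(src0, src1):
    """Model of Dreg_lo_hi = Dreg_lo_hi * Dreg_lo_hi with the Default option."""
    a, b = to_signed16(src0), to_signed16(src1)
    product = (a * b) << 1                  # left-shift correction to 1.31
    product = min(product, 0x7FFFFFFF)      # 0x8000 * 0x8000 saturates
    rounded = (product + 0x8000) >> 16      # assumed biased rounding at bit 16
    return max(-0x8000, min(0x7FFF, rounded)) & 0xFFFF
```

In this model, 0x4000 times 0x4000 (0.5 times 0.5) gives 0x2000 (0.25), and 0x8000 times 0x8000 gives the saturated value 0x7FFF.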
Flags Affected
This instruction affects flags as follows.
• V is set if result saturates; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3.l=r3.h*r2.h ; /* MAC0. Both operands are signed
fractions. */
r3.h=r6.h*r4.l (fu) ; /* MAC1. Both operands are unsigned frac-
tions. */
r6=r3.h*r4.h ; /* MAC0. Signed fraction operands, results saved
as 32 bits. */
Also See
Multiply 32-Bit Operands, Multiply and Multiply-Accumulate to Accu-
mulator, Multiply and Multiply-Accumulate to Half-Register, Multiply
and Multiply-Accumulate to Data Register, Vector Multiply, Vector Mul-
tiply and Multiply-Accumulate
Special Applications
None
Multiply 32-Bit Operands
General Form
dest_reg *= multiplier_register
Syntax
Dreg *= Dreg ; /* 32 x 32 integer multiply (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Multiply 32-Bit Operands instruction multiplies two 32-bit data reg-
isters (dest_reg and multiplier_register) and saves the product in
dest_reg. The instruction mimics multiplication in the C language and
effectively performs Dreg1 = (Dreg1 * Dreg2) modulo 2^32. Since the
integer multiply is modulo 2^32, the result always fits in a 32-bit dest_reg,
and overflows are possible but not detected. The overflow flag in the
ASTAT register is never set.
Users are required to limit input numbers to ensure that the resulting
product does not exceed the 32-bit dest_reg capacity. If overflow notifi-
cation is required, users should write their own multiplication macro with
that capability.
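The modulo behavior, and one possible form of the overflow check that user software would have to supply, can be sketched in Python. The function names are hypothetical, and the check assumes that the operands are interpreted as signed 32-bit integers.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def mul32(dest, multiplier):
    """Model of Dreg *= Dreg: the product modulo 2^32."""
    return (dest * multiplier) & 0xFFFFFFFF


def mul32_checked(dest, multiplier):
    """Return (32-bit product, overflow) for signed operands."""
    full = to_signed32(dest) * to_signed32(multiplier)
    overflow = not (-(1 << 31) <= full <= (1 << 31) - 1)
    return full & 0xFFFFFFFF, overflow
```

For example, 0x10000 times 0x10000 produces 0 in the destination, and only the checked variant reports that the true product did not fit.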
Accumulators A0 and A1 are unchanged by this instruction.
The Multiply 32-Bit Operands instruction does not implicitly modify the
number in multiplier_register.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with any other instructions.
Example
r3 *= r0 ;
Also See
DIVS, DIVQ (Divide Primitive), Arithmetic Shift, Shift with Add, Add
with Shift, Vector Multiply and Multiply-Accumulate, Vector Multiply
Special Applications
None
Multiply and Multiply-Accumulate to Accumulator
General Form
accumulator = src_reg_0 * src_reg_1 (opt_mode)
accumulator += src_reg_0 * src_reg_1 (opt_mode)
accumulator -= src_reg_0 * src_reg_1 (opt_mode)
Syntax
Multiply-And-Accumulate Unit 0 (MAC0) Operations
A0 = Dreg_lo_hi * Dreg_lo_hi (opt_mode) ; /* multiply and
store (b) */
A0 += Dreg_lo_hi * Dreg_lo_hi (opt_mode) ; /* multiply and
add (b) */
A0 -= Dreg_lo_hi * Dreg_lo_hi (opt_mode) ; /* multiply and
subtract (b) */
Syntax Terminology
Dreg_lo_hi: R7–0.L, R7–0.H
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Multiply and Multiply-Accumulate to Accumulator instruction mul-
tiplies two 16-bit half-word operands. It stores, adds or subtracts the
product into a designated Accumulator with saturation.
The Multiply-and-Accumulate Unit 0 (MAC0) portion of the architecture
performs operations that involve Accumulator A0. MAC1 performs A1
operations.
By default, the instruction treats both operands of both MACs as signed
fractions with left-shift correction as required.
Options
The Multiply and Multiply-Accumulate to Accumulator instruction sup-
ports the following options. Saturation is supported for every option.
When the (M) and (W32) options are used together, both MACs saturate
their Accumulator products at 32 bits. MAC1 multiplies signed fractions
by unsigned fractions and MAC0 multiplies signed fractions.
When used together, the order of the options in the syntax makes no
difference.
In fractional mode, the product of the most negative representable frac-
tion times itself (for example, 0x8000 times 0x8000) is saturated to the
maximum representable positive fraction (0x7FFF) before accumulation.
See “Saturation” on page 1-11 for a description of saturation behavior.
Default Signed fraction. Multiply 1.15 x 1.15 to produce 1.31 format data after shift correc-
tion. Sign extend the result to 9.31 format before passing it to the Accumulator. Sat-
urate the Accumulator after copying or accumulating to maintain 9.31 precision.
Result is between minimum -1 and maximum 1-2-31 (or, expressed in hex, between
minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF).
(FU) Unsigned fraction. Multiply 0.16 x 0.16 to produce 0.32 format data. Perform no
shift correction. Zero extend the result to 8.32 format before passing it to the Accu-
mulator. Saturate the Accumulator after copying or accumulating to maintain 8.32
precision.
Unsigned integer. Multiply 16.0 x 16.0 to produce 32.0 format data. Perform no
shift correction. Zero extend the result to 40.0 format before passing it to the Accu-
mulator. Saturate the Accumulator after copying or accumulating to maintain 40.0
precision.
In either case, the resulting hexadecimal range is minimum 0x00 0000 0000 through
maximum 0xFF FFFF FFFF.
(IS) Signed integer. Multiply 16.0 x 16.0 to produce 32.0 format data. Perform no shift
correction. Sign extend the result to 40.0 format before passing it to the Accumulator.
Saturate the Accumulator after copying or accumulating to maintain 40.0 precision.
Result is between minimum -2^39 and maximum 2^39-1 (or, expressed in hex,
between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF).
(W32) Signed fraction with 32-bit saturation. Multiply 1.15 x 1.15 to produce 1.31 format
data after shift correction. Sign extend the result to 9.31 format before passing it to
the Accumulator. Saturate the Accumulator after copying or accumulating at bit 31
to maintain 1.31 precision. Result is between minimum -1 and maximum 1-2^-31
(or, expressed in hex, between minimum 0xFF 8000 0000 and maximum 0x00 7FFF
FFFF).
(M) Mixed mode multiply (valid only for MAC1). When issued in a fraction mode
instruction (with Default, FU, T, TFU, or S2RND mode), multiply 1.15 * 0.16 to
produce 1.31 results.
When issued in an integer mode instruction (with IS, ISS2, or IH mode), multiply
16.0 * 16.0 (signed * unsigned) to produce 32.0 results.
No shift correction in either case. Src_reg_0 is the signed operand and Src_reg_1 is
the unsigned operand.
Accumulation and extraction proceed according to the other mode flag or Default.
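The Default (signed fraction) behavior described above can be sketched as a small Python reference model. This is an illustrative sketch only, not vendor code: the names `to_signed16`, `frac_product`, and `mac` are hypothetical, the Accumulator is represented as a plain signed integer, and the returned boolean reports only whether the 40-bit clamp was applied.

```python
ACC_MIN = -(1 << 39)      # 0x80 0000 0000 viewed as a signed 40-bit value
ACC_MAX = (1 << 39) - 1   # 0x7F FFFF FFFF


def to_signed16(x):
    """Interpret the low 16 bits of x as a signed 1.15 value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def frac_product(a, b):
    """1.15 x 1.15 -> 1.31 product with left-shift correction."""
    p = (to_signed16(a) * to_signed16(b)) << 1
    # 0x8000 * 0x8000 would be +1.0, which 1.31 cannot hold,
    # so the product is saturated to the largest positive fraction.
    return min(p, 0x7FFFFFFF)


def mac(acc, a, b, op="="):
    """Model An = / += / -= a * b in Default mode.

    Returns (new_acc, clamped), where clamped is True when the
    result had to be saturated to the 40-bit (9.31) range.
    """
    p = frac_product(a, b)
    if op == "=":
        result = p
    elif op == "+=":
        result = acc + p
    elif op == "-=":
        result = acc - p
    else:
        raise ValueError(op)
    clamped = max(ACC_MIN, min(ACC_MAX, result))
    return clamped, clamped != result
```

For example, `mac(0, 0x4000, 0x4000)` models 0.5 x 0.5 and yields 0x2000 0000 (0.25 in 1.31 format) in the Accumulator.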
Flags Affected
This instruction affects flags as follows.
• AV0 is set if result in Accumulator A0 (MAC0 operation) saturates;
cleared if A0 result does not saturate.
• AV0S is set if AV0 is set; unaffected otherwise.
• AV1 is set if result in Accumulator A1 (MAC1 operation) saturates;
cleared if A1 result does not saturate.
• AV1S is set if AV1 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
a0=r3.h*r2.h ; /* MAC0, only. Both operands are signed fractions.
Load the product into A0. */
a1+=r6.h*r4.l (fu) ; /* MAC1, only. Both operands are unsigned
fractions. Accumulate into A1 */
Also See
Multiply 16-Bit Operands, Multiply 32-Bit Operands, Multiply and Mul-
tiply-Accumulate to Half-Register, Multiply and Multiply-Accumulate to
Data Register, Vector Multiply, Vector Multiply and
Multiply-Accumulate
Special Applications
DSP filter applications often use the Multiply and Multiply-Accumulate
to Accumulator instruction to calculate the dot product between two sig-
nal vectors.
Multiply and Multiply-Accumulate to Half-Register
General Form
dest_reg_half = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
Syntax
Multiply-And-Accumulate Unit 0 (MAC0)
Dreg_lo = (A0 = Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /* multiply
and store (b) */
Dreg_lo = (A0 += Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /* multiply
and add (b) */
Dreg_lo = (A0 -= Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /* multiply
and subtract (b) */
Syntax Terminology
Dreg_lo_hi: R7–0.L, R7–0.H
Dreg_lo: R7–0.L
Dreg_hi: R7–0.H
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Multiply and Multiply-Accumulate to Half-Register instruction mul-
tiplies two 16-bit half-word operands. The instruction stores, adds or
subtracts the product into a designated Accumulator. It then copies 16
bits (saturated at 16 bits) of the Accumulator into a data half-register.
The fraction versions of this instruction (the default and “(FU)” options)
transfer the Accumulator result to the destination register according to the
diagrams in Figure 10-1.
The integer versions of this instruction (the “(IS)” and “(IU)” options)
transfer the Accumulator result to the destination register according to the
diagrams in Figure 10-2.
The Multiply-and-Accumulate Unit 0 (MAC0) portion of the architecture
performs operations that involve Accumulator A0 and loads the results
into the lower half of the destination data register. MAC1 performs A1
operations and loads the results into the upper half of the destination data
register.
All versions of this instruction that support rounding are affected by the
RND_MOD bit in the ASTAT register when they copy the results into the desti-
nation register. RND_MOD determines whether biased or unbiased rounding
is used.
Options
The Multiply and Multiply-Accumulate to Half-Register instruction sup-
ports operand and Accumulator copy options.
The options are listed in Table 10-4.
Default Signed fraction format. Multiply 1.15 * 1.15 formats to produce 1.31 results after
shift correction. The special case of 0x8000 * 0x8000 is saturated to 0x7FFF FFFF
to fit the 1.31 result.
Sign extend 1.31 result to 9.31 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 9.31 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract to half-register, round Accumulator 9.31 format value at bit 16.
(RND_MOD bit in the ASTAT register controls the rounding.) Saturate the result
to 1.15 precision and copy it to the destination register half. Result is between
minimum -1 and maximum 1-2^-15 (or, expressed in hex, between minimum
0x8000 and maximum 0x7FFF).
(FU) Unsigned fraction format. Multiply 0.16* 0.16 formats to produce 0.32 results.
No shift correction. The special case of 0x8000 * 0x8000 yields 0x4000 0000. No
saturation is necessary since no shift correction occurs.
Zero extend 0.32 result to 8.32 format before copying or accumulating to Accu-
mulator. Then, saturate Accumulator to maintain 8.32 precision; Accumulator
result is between minimum 0x00 0000 0000 and maximum 0xFF FFFF FFFF.
To extract to half-register, round Accumulator 8.32 format value at bit 16.
(RND_MOD bit in the ASTAT register controls the rounding.) Saturate the result
to 0.16 precision and copy it to the destination register half. Result is between
minimum 0 and maximum 1-2^-16 (or, expressed in hex, between minimum
0x0000 and maximum 0xFFFF).
(IS) Signed integer format. Multiply 16.0 * 16.0 formats to produce 32.0 results. No
shift correction.
Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 40.0 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
Extract the lower 16 bits of the Accumulator. Saturate for 16.0 precision and copy
to the destination register half. Result is between minimum -2^15 and maximum
2^15-1 (or, expressed in hex, between minimum 0x8000 and maximum 0x7FFF).
(IU) Unsigned integer format. Multiply 16.0 * 16.0 formats to produce 32.0 results.
No shift correction.
Zero extend 32.0 result to 40.0 format before copying or accumulating to Accu-
mulator. Then, saturate Accumulator to maintain 40.0 precision; Accumulator
result is between minimum 0x00 0000 0000 and maximum 0xFF FFFF FFFF.
Extract the lower 16 bits of the Accumulator. Saturate for 16.0 precision and copy
to the destination register half. Result is between minimum 0 and maximum
2^16-1 (or, expressed in hex, between minimum 0x0000 and maximum 0xFFFF).
(T) Signed fraction with truncation. Multiply 1.15 * 1.15 formats to produce 1.31
results after shift correction. The special case of 0x8000 * 0x8000 is saturated to
0x7FFF FFFF to fit the 1.31 result. (Same as the Default mode.)
Sign extend 1.31 result to 9.31 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 9.31 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract to half-register, truncate Accumulator 9.31 format value at bit 16.
(Perform no rounding.) Saturate the result to 1.15 precision and copy it to the
destination register half. Result is between minimum -1 and maximum 1-2^-15 (or,
expressed in hex, between minimum 0x8000 and maximum 0x7FFF).
(TFU) Unsigned fraction with truncation. Multiply 0.16* 0.16 formats to produce 0.32
results. No shift correction. The special case of 0x8000 * 0x8000 yields 0x4000
0000. No saturation is necessary since no shift correction occurs. (Same as the FU
mode.)
Zero extend 0.32 result to 8.32 format before copying or accumulating to Accu-
mulator. Then, saturate Accumulator to maintain 8.32 precision; Accumulator
result is between minimum 0x00 0000 0000 and maximum 0xFF FFFF FFFF.
To extract to half-register, truncate Accumulator 8.32 format value at bit 16.
(Perform no rounding.) Saturate the result to 0.16 precision and copy it to the
destination register half. Result is between minimum 0 and maximum 1-2^-16 (or,
expressed in hex, between minimum 0x0000 and maximum 0xFFFF).
(S2RND) Signed fraction with scaling and rounding. Multiply 1.15 * 1.15 formats to pro-
duce 1.31 results after shift correction. The special case of 0x8000 * 0x8000 is sat-
urated to 0x7FFF FFFF to fit the 1.31 result. (Same as the Default mode.)
Sign extend 1.31 result to 9.31 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 9.31 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract to half-register, shift the Accumulator contents one place to the left
(multiply x 2). Round Accumulator 9.31 format value at bit 16. (RND_MOD bit
in the ASTAT register controls the rounding.) Saturate the result to 1.15 precision
and copy it to the destination register half. Result is between minimum -1 and
maximum 1-2^-15 (or, expressed in hex, between minimum 0x8000 and maximum
0x7FFF).
(ISS2) Signed integer with scaling. Multiply 16.0 * 16.0 formats to produce 32.0 results.
No shift correction. (Same as the IS mode.)
Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 40.0 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
Extract the lower 16 bits of the Accumulator. Shift them one place to the left
(multiply x 2). Saturate the result for 16.0 format and copy to the destination
register half. Result is between minimum -2^15 and maximum 2^15-1 (or, expressed in
hex, between minimum 0x8000 and maximum 0x7FFF).
(IH) Signed integer, high word extract. Multiply 16.0 * 16.0 formats to produce 32.0
results. No shift correction. (Same as the IS mode.)
Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 40.0 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract to half-register, round Accumulator 40.0 format value at bit 16.
(RND_MOD bit in the ASTAT register controls the rounding.) Saturate to 32.0
result. Copy the upper 16 bits of that value to the destination register half. Result
is between minimum -2^15 and maximum 2^15-1 (or, expressed in hex, between
minimum 0x8000 and maximum 0x7FFF).
(M) Mixed mode multiply (valid only for MAC1). When issued in a fraction mode
instruction (with Default, FU, T, TFU, or S2RND mode), multiply 1.15 * 0.16 to
produce 1.31 results.
When issued in an integer mode instruction (with IS, ISS2, or IH mode), multiply
16.0 * 16.0 (signed * unsigned) to produce 32.0 results.
No shift correction in either case. Src_reg_0 is the signed operand and Src_reg_1
is the unsigned operand.
Accumulation and extraction proceed according to the other mode flag or Default.
To truncate the result, the operation eliminates the least significant bits
that do not fit into the destination register.
When necessary, saturation is performed after the rounding.
The accumulator is unaffected by extraction.
If you want to keep the unaltered contents of the Accumulator, use a sim-
ple Move instruction to copy An.X or An.W to or from a register.
See “Saturation” on page 1-11 for a description of saturation behavior.
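The round, truncate, and saturate steps used to extract a signed-fraction half-register result can be sketched in Python as follows. This is an illustrative model under stated assumptions, not vendor code: `extract_half` and its parameters are hypothetical names, "unbiased" rounding is assumed to mean round-half-to-even, and the mapping of the `biased` argument onto the RND_MOD bit is also an assumption (see “Rounding and Truncating” for the authoritative description).

```python
def extract_half(acc, biased=True, truncate=False):
    """Extract a signed 1.15 value from a signed 9.31 accumulator value.

    Returns (half_register_bits, saturated).
    """
    if truncate:
        # (T) option: drop bits 15-0 without rounding.
        hi = acc >> 16
    else:
        # Add half an LSB of the result (bit 15), then drop bits 15-0.
        hi = (acc + 0x8000) >> 16
        if not biased and (acc & 0xFFFF) == 0x8000:
            # Assumed unbiased behavior: an exact tie goes to the even value.
            hi &= ~1
    # Saturation is applied after rounding.
    clamped = max(-0x8000, min(0x7FFF, hi))
    return clamped & 0xFFFF, clamped != hi
```

With the Accumulator holding 0x00 0002 8000 (an exact tie), this sketch returns 0x0003 for biased rounding and 0x0002 for both the assumed unbiased rounding and truncation.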
Flags Affected
This instruction affects flags as follows.
• V is set if the result extracted to the Dreg saturates; cleared if no
saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result in Accumulator A0 (MAC0 operation) saturates;
cleared if A0 result does not saturate.
• AV0S is set if AV0 is set; unaffected otherwise.
• AV1 is set if result in Accumulator A1 (MAC1 operation) saturates;
cleared if A1 result does not saturate.
• AV1S is set if AV1 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3.l=(a0=r3.h*r2.h) ; /* MAC0, only. Both operands are signed
fractions. Load the product into A0, then copy to r3.l. */
r3.h=(a1+=r6.h*r4.l) (fu) ; /* MAC1, only. Both operands are
unsigned fractions. Add the product into A1, then copy to r3.h */
Also See
Multiply 32-Bit Operands, Multiply and Multiply-Accumulate to Accu-
mulator, Multiply and Multiply-Accumulate to Data Register, Vector
Multiply, Vector Multiply and Multiply-Accumulate
Special Applications
DSP filter applications often use the Multiply and Multiply-Accumulate
to Half-Register instruction to calculate the dot product between two sig-
nal vectors.
Multiply and Multiply-Accumulate to Data Register
General Form
dest_reg = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
Syntax
Multiply-And-Accumulate Unit 0 (MAC0)
Dreg_even = (A0 = Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /* multiply
and store (b) */
Dreg_even = (A0 += Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /*
multiply and add (b) */
Dreg_even = (A0 -= Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /*
multiply and subtract (b) */
Syntax Terminology
Dreg_lo_hi: R7–0.L, R7–0.H
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
This instruction multiplies two 16-bit half-word operands. The instruc-
tion stores, adds or subtracts the product into a designated Accumulator.
It then copies 32 bits of the Accumulator, saturated at 32 bits, into a data
register.
The Multiply-and-Accumulate Unit 0 (MAC0) portion of the architecture
performs operations that involve Accumulator A0; it loads the results into
an even-numbered data register. MAC1 performs A1 operations and loads
the results into an odd-numbered data register.
A MAC0 operation and a MAC1 operation of this kind can be combined into a single
instruction. See “Vector Multiply and Multiply-Accumulate” on page 14-43.
Options
The Multiply and Multiply-Accumulate to Data Register instruction sup-
ports operand and Accumulator copy options.
These options are as shown in Table 10-5.
The syntax supports only biased rounding. The RND_MOD bit in the ASTAT
register has no bearing on the rounding behavior of this instruction.
See “Rounding and Truncating” on page 1-13 for a description of round-
ing behavior.
Default Signed fraction format. Multiply 1.15 * 1.15 formats to produce 1.31 results after
shift correction. The special case of 0x8000 * 0x8000 is saturated to 0x7FFF FFFF
to fit the 1.31 result.
Sign extend 1.31 result to 9.31 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 9.31 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract, saturate the result to 1.31 precision and copy it to the destination
register. Result is between minimum -1 and maximum 1-2^-31 (or, expressed in hex,
between minimum 0x8000 0000 and maximum 0x7FFF FFFF).
(FU) Unsigned fraction format. Multiply 0.16* 0.16 formats to produce 0.32 results.
No shift correction. The special case of 0x8000 * 0x8000 yields 0x4000 0000. No
saturation is necessary since no shift correction occurs.
Zero extend 0.32 result to 8.32 format before copying or accumulating to Accu-
mulator. Then, saturate Accumulator to maintain 8.32 precision; Accumulator
result is between minimum 0x00 0000 0000 and maximum 0xFF FFFF FFFF.
To extract, saturate the result to 0.32 precision and copy it to the destination
register. Result is between minimum 0 and maximum 1-2^-32 (or, expressed in hex,
between minimum 0x0000 0000 and maximum 0xFFFF FFFF).
(IS) Signed integer format. Multiply 16.0 * 16.0 formats to produce 32.0 results. No
shift correction.
Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 40.0 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract, saturate for 32.0 precision and copy to the destination register. Result
is between minimum -2^31 and maximum 2^31-1 (or, expressed in hex, between
minimum 0x8000 0000 and maximum 0x7FFF FFFF).
(S2RND) Signed fraction with scaling and rounding. Multiply 1.15 * 1.15 formats to pro-
duce 1.31 results after shift correction. The special case of 0x8000 * 0x8000 is sat-
urated to 0x7FFF FFFF to fit the 1.31 result. (Same as the Default mode.)
Sign extend 1.31 result to 9.31 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 9.31 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract, shift the Accumulator contents one place to the left (multiply x 2), sat-
urate the result to 1.31 precision, and copy it to the destination register. Result is
between minimum -1 and maximum 1-2^-31 (or, expressed in hex, between
minimum 0x8000 0000 and maximum 0x7FFF FFFF).
(ISS2) Signed integer with scaling. Multiply 16.0 * 16.0 formats to produce 32.0 results.
No shift correction. (Same as the IS mode.)
Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumu-
lator. Then, saturate Accumulator to maintain 40.0 precision; Accumulator result
is between minimum 0x80 0000 0000 and maximum 0x7F FFFF FFFF.
To extract, shift the Accumulator contents one place to the left (multiply x 2), sat-
urate the result for 32.0 format, and copy to the destination register. Result is
between minimum -2^31 and maximum 2^31-1 (or, expressed in hex, between
minimum 0x8000 0000 and maximum 0x7FFF FFFF).
(M) Mixed mode multiply (valid only for MAC1). When issued in a fraction mode
instruction (with Default, FU, T, TFU, or S2RND mode), multiply 1.15 * 0.16 to
produce 1.31 results.
When issued in an integer mode instruction (with IS, ISS2, or IH mode), multiply
16.0 * 16.0 (signed * unsigned) to produce 32.0 results.
No shift correction in either case. Src_reg_0 is the signed operand and Src_reg_1
is the unsigned operand.
Accumulation and extraction proceed according to the other mode flag or Default.
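The signed extraction step for a full data register, with and without the one-place left shift of the (S2RND) and (ISS2) options, can be sketched in Python as shown here. The function name `extract_word` and its `scale` parameter are hypothetical; the sketch models only the shift-and-saturate behavior described in the table and treats the Accumulator as a plain signed integer.

```python
def extract_word(acc, scale=False):
    """Extract a signed 32-bit result from a signed 40-bit accumulator value.

    scale=True models the one-place left shift (multiply x 2) that
    precedes saturation in the (S2RND) and (ISS2) options.
    Returns (register_bits, saturated).
    """
    value = acc << 1 if scale else acc
    lo, hi = -(1 << 31), (1 << 31) - 1
    clamped = max(lo, min(hi, value))
    return clamped & 0xFFFFFFFF, clamped != value
```

For instance, an Accumulator value of 0x00 4000 0000 extracts unchanged by default, but saturates to 0x7FFF FFFF when the scaling shift is applied.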
Flags Affected
This instruction affects flags as follows.
• V is set if the result extracted to the Dreg saturates; cleared if no
saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result in Accumulator A0 (MAC0 operation) saturates;
cleared if A0 result does not saturate.
• AV0S is set if AV0 is set; unaffected otherwise.
• AV1 is set if result in Accumulator A1 (MAC1 operation) saturates;
cleared if A1 result does not saturate.
• AV1S is set if AV1 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r4=(a0=r3.h*r2.h) ; /* MAC0, only. Both operands are signed
fractions. Load the product into A0, then into r4. */
r3=(a1+=r6.h*r4.l) (fu) ; /* MAC1, only. Both operands are
unsigned fractions. Add the product into A1, then into r3. */
Also See
Move Register, Move Register Half, Multiply 32-Bit Operands, Multiply
and Multiply-Accumulate to Accumulator, Multiply and Multiply-Accu-
mulate to Half-Register, Vector Multiply, Vector Multiply and
Multiply-Accumulate
Special Applications
DSP filter applications often use the Multiply and Multiply-Accumulate
to Data Register instruction or the vector version (“Vector Multiply and
Multiply-Accumulate” on page 14-43) to calculate the dot product
between two signal vectors.
Negate (Two’s Complement)
General Form
dest_reg = - src_reg
dest_accumulator = - src_accumulator
Syntax
Dreg = - Dreg ; /* (a) */
Dreg = - Dreg (sat_flag) ; /* (b) */
A0 = - A0 ; /* (b) */
A0 = - A1 ; /* (b) */
A1 = - A0 ; /* (b) */
A1 = - A1 ; /* (b) */
A1 = - A1, A0 = - A0 ; /* negate both Accumulators simultaneously
in one 32-bit length instruction (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Negate (Two’s Complement) instruction returns the same magnitude
with the opposite arithmetic sign. The Accumulator versions saturate the
result at 40 bits. The instruction calculates by subtracting from zero.
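The only input that cannot be negated exactly is the most negative value, because its magnitude has no positive counterpart in two's complement. The Python sketch below illustrates this for a 32-bit data register and a 40-bit Accumulator. The names `negate32` and `negate40` are hypothetical, and the wrap-around result shown for the non-saturating case is ordinary two's-complement arithmetic, assumed here rather than quoted from this manual.

```python
def negate32(x, saturate=False):
    """Model Dreg = -Dreg on a 32-bit pattern; returns (result_bits, overflowed)."""
    x &= 0xFFFFFFFF
    signed = x - (1 << 32) if x & 0x80000000 else x
    result = 0 - signed                   # negation is a subtraction from zero
    overflow = result > 0x7FFFFFFF        # true only for the input 0x8000 0000
    if overflow and saturate:
        result = 0x7FFFFFFF               # (S): clamp to the largest positive value
    return result & 0xFFFFFFFF, overflow


def negate40(acc):
    """Model An = -An on a signed 40-bit value; saturates at 40 bits."""
    result = 0 - acc
    clamped = min(result, (1 << 39) - 1)
    return clamped, clamped != result
```

Here `negate32(0x80000000)` leaves the bit pattern unchanged and reports overflow, while `negate32(0x80000000, saturate=True)` returns 0x7FFF FFFF.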
Flags Affected
This instruction affects the flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is set if result overflows or saturates and the dest_reg is a Dreg;
cleared if no overflow or saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result saturates and the dest_reg is A0; cleared if no
saturation.
• AV0S is set if AV0 is set; unaffected otherwise.
• AV1 is set if result saturates and the dest_reg is A1; cleared if no
saturation.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r5 =-r0 ;
a0 =-a0 ;
a0 =-a1 ;
a1 =-a0 ;
a1 =-a1 ;
a1 =-a1, a0=-a0 ;
r0 =-r1(s) ;
r5 =-r0 (ns) ;
Also See
Vector Negate (Two’s Complement)
Special Applications
None
Round to Half-Word
General Form
dest_reg = src_reg (RND)
Syntax
Dreg_lo_hi =Dreg (RND) ; /* round and saturate the source to
16 bits. (b) */
Syntax Terminology
Dreg: R7– 0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Round to Half-Word instruction rounds a 32-bit, normalized-frac-
tion number into a 16-bit, normalized-fraction number by extracting and
saturating bits 31–16, then discarding bits 15–0. The instruction supports
only biased rounding, which adds a half LSB (in this case, bit 15) before
truncating bits 15–0. The ALU performs the rounding. The RND_MOD bit
in the ASTAT register has no bearing on the rounding behavior of this
instruction.
Fractional data types such as the operands used in this instruction are
always signed.
See “Saturation” on page 1-11 for a description of saturation behavior.
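The biased rounding described above can be sketched in Python. The function name `round_to_half` is hypothetical, and the sketch models only the arithmetic (add half an LSB at bit 15, drop bits 15–0, saturate), not the flag updates. The values used in the assertions of a quick check are the same ones shown in this instruction's Example.

```python
def round_to_half(src):
    """Model Dreg_lo_hi = Dreg (RND): biased rounding of a signed 1.31
    value to 1.15. Returns (half_register_bits, saturated)."""
    src &= 0xFFFFFFFF
    signed = src - (1 << 32) if src & 0x80000000 else src
    rounded = (signed + 0x8000) >> 16     # add half an LSB, then drop bits 15-0
    clamped = min(rounded, 0x7FFF)        # rounding up can only overflow positively
    return clamped & 0xFFFF, clamped != rounded
```

With this model, 0xFFFC FFFF rounds to 0xFFFD and 0x0001 8000 rounds to 0x0002, while 0x7FFF 8000 saturates to 0x7FFF.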
Flags Affected
The following flags are affected by this instruction.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is set if result saturates; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
/* If r6 = 0xFFFC FFFF, then rounding to 16-bits with . . . */
r1.l = r6 (rnd) ; // . . . produces r1.l = 0xFFFD
// If r7 = 0x0001 8000, then rounding . . .
r1.h = r7 (rnd) ; // . . . produces r1.h = 0x0002
Also See
Add, Add/Subtract – Prescale Up, Add/Subtract – Prescale Down
Special Applications
None
Saturate
General Form
dest_reg = src_reg (S)
Syntax
A0 = A0 (S) ; /* (b) */
A1 = A1 (S) ; /* (b) */
A1 = A1 (S), A0 = A0 (S) ; /* signed saturate both Accumulators
at the 32-bit boundary (b) */
Syntax Terminology
None
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Saturate instruction saturates the 40-bit Accumulators at 32 bits. The
resulting saturated value is sign extended into the Accumulator extension
bits.
See “Saturation” on page 1-11 for a description of saturation behavior.
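As an illustration, the following Python sketch clamps a signed 40-bit Accumulator value to the signed 32-bit range and shows the resulting 40-bit pattern, in which the extension bits repeat the sign. The names `saturate_acc32` and `acc_bits` are hypothetical helpers introduced only for this example.

```python
def saturate_acc32(acc):
    """Model An = An (S): clamp a signed 40-bit value to the signed 32-bit range.

    Returns (new_acc, saturated).
    """
    lo, hi = -(1 << 31), (1 << 31) - 1
    clamped = max(lo, min(hi, acc))
    return clamped, clamped != acc


def acc_bits(acc):
    """Return the 40-bit two's-complement pattern of a signed accumulator value."""
    return acc & 0xFFFFFFFFFF
```

A large positive value such as 0x01 2345 6789 becomes 0x00 7FFF FFFF, and a large negative value becomes 0xFF 8000 0000.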
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero. In the case of two
simultaneous operations, AZ represents the logical “OR” of the two.
• AN is set if result is negative; cleared if non-negative. In the case of
two simultaneous operations, AN represents the logical “OR” of the
two.
• AV0 is set if result saturates and the dest_reg is A0; cleared if no
overflow.
• AV0S is set if AV0 is set; unaffected otherwise.
• AV1 is set if result saturates and the dest_reg is A1; cleared if no
overflow.
• AV1S is set if AV1 is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
a0 = a0 (s) ;
a1 = a1 (s) ;
a1 = a1 (s), a0 = a0 (s) ;
Also See
Subtract (saturate options), Add (saturate options)
Special Applications
None
SIGNBITS
General Form
dest_reg = SIGNBITS sample_register
Syntax
Dreg_lo = SIGNBITS Dreg ; /* 32-bit sample (b) */
Dreg_lo = SIGNBITS Dreg_lo_hi ; /* 16-bit sample (b) */
Dreg_lo = SIGNBITS A0 ; /* 40-bit sample (b) */
Dreg_lo = SIGNBITS A1 ; /* 40-bit sample (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Sign Bit instruction returns the number of sign bits in a number, and
can be used in conjunction with a shift to normalize numbers. This
instruction can operate on 16-bit, 32-bit, or 40-bit input numbers.
• For a 16-bit input, Sign Bit returns the number of leading sign bits
minus one, which is in the range 0 through 15. There are no spe-
cial cases. An input of all zeros returns +15 (all sign bits), and an
input of all ones also returns +15.
• For a 32-bit input, Sign Bit returns the number of leading sign bits
minus one, which is in the range 0 through 31. An input of all
zeros or all ones returns +31 (all sign bits).
• For a 40-bit Accumulator input, Sign Bit returns the number of
leading sign bits minus 9, which is in the range –8 through +31. A
negative number is returned when the result in the Accumulator
has expanded into the extension bits; the corresponding normaliza-
tion will shift the result down to a 32-bit quantity (losing
precision). An input of all zeros or all ones returns +31.
The result of the SIGNBITS instruction can be used directly as the argu-
ment to ASHIFT to normalize the number. Resultant numbers will be in
the following formats (S == signbit, M == magnitude bit).
16-bit: S.MMM MMMM MMMM MMMM
32-bit: S.MMM MMMM MMMM MMMM MMMM MMMM MMMM MMMM
40-bit: SSSS SSSS S.MMM MMMM MMMM MMMM MMMM MMMM MMMM MMMM
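The counting rule for each input width, and the use of the result as a left-shift amount, can be sketched in Python as follows. The names `signbits` and `normalize16` are hypothetical; the sketch counts leading bits equal to the sign bit and applies the offsets stated above (minus one for 16-bit and 32-bit inputs, minus 9 for 40-bit Accumulator inputs).

```python
def signbits(value, width):
    """Return the SIGNBITS result for a value of the given width (16, 32, or 40)."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 1, -1, -1):  # scan from the MSB downward
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count - (9 if width == 40 else 1)


def normalize16(value):
    """Shift a 16-bit value left by its SIGNBITS result; returns (bits, shift)."""
    shift = signbits(value, 16)
    return (value << shift) & 0xFFFF, shift
```

For example, `normalize16(0x0123)` shifts left by 6 to give 0x48C0, which has a single sign bit followed by a set magnitude bit.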
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r2.l = signbits r7 ;
r1.l = signbits r5.l ;
r0.l = signbits r4.h ;
r6.l = signbits a0 ;
r5.l = signbits a1 ;
Also See
EXPADJ
Special Applications
You can use the exponent as shift magnitude for array normalization. You
can accomplish normalization by using the ASHIFT instruction directly,
without using special normalizing instructions, as required on other
architectures.
Subtract
General Form
dest_reg = src_reg_1 - src_reg_2
Syntax
32-Bit Operands, 32-Bit Result
Dreg = Dreg - Dreg ; /* no saturation support but shorter
instruction length (a) */
Dreg = Dreg - Dreg (sat_flag) ; /* saturation optionally supported,
but at the cost of longer instruction length (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The Subtract instruction subtracts src_reg_2 from src_reg_1 and places
the result in a destination register.
There are two ways to specify subtraction on 32-bit data. One instruction
that is 16-bit instruction length does not support saturation. The other
instruction, which is 32-bit instruction length, optionally supports satura-
tion. The larger DSP instruction can sometimes save execution time
because it can be issued in parallel with certain other instructions. See
“Parallel Issue”.
The instructions for 16-bit data use half-word data register operands and
store the result in a half-word data register.
All the instructions for 16-bit data are 32-bit instruction length.
In the syntax, where sat_flag appears, substitute one of the following
values.
• (S) saturate the result
• (NS) no saturation
See “Saturation” on page 1-11 for a description of saturation behavior.
The Subtract instruction has no counterpart to the P-register syntax of
the Add instruction.
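The difference between the (S) and (NS) forms on 32-bit data can be sketched in Python. The function name `sub32` is hypothetical, and the sketch models only the result value and whether signed overflow occurred; it does not model the AZ, AN, or AC0 flags.

```python
def sub32(a, b, saturate=False):
    """Model Dreg = Dreg - Dreg on 32-bit patterns.

    Returns (result_bits, overflowed). With saturate=False the result
    wraps modulo 2**32; with saturate=True it is clamped to the signed
    32-bit range.
    """
    def signed(x):
        x &= 0xFFFFFFFF
        return x - (1 << 32) if x & 0x80000000 else x

    lo, hi = -(1 << 31), (1 << 31) - 1
    exact = signed(a) - signed(b)
    overflow = not lo <= exact <= hi
    if saturate:
        exact = max(lo, min(hi, exact))
    return exact & 0xFFFFFFFF, overflow
```

Subtracting 1 from 0x8000 0000 wraps to 0x7FFF FFFF without saturation and stays at 0x8000 0000 with saturation.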
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• AC0 is set if the operation generates a carry; cleared if no carry.
• V is set if result overflows; cleared if no overflow.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
The 16-bit versions of this instruction cannot be issued in parallel with
other instructions.
Example
r5 = r2 - r1 ; /* 16-bit instruction length subtract, no
saturation */
r5 = r2 - r1(ns) ; /* same result as above, but 32-bit
instruction length */
r5 = r2 - r1(s) ; /* saturate the result */
r4.l = r0.l - r7.l (ns) ;
r4.l = r0.l - r7.h (s) ; /* saturate the result */
r0.l = r2.h - r4.l(ns) ;
r1.l = r3.h - r7.h(ns) ;
r4.h = r0.l - r7.l (ns) ;
r4.h = r0.l - r7.h (ns) ;
r0.h = r2.h - r4.l(s) ; /* saturate the result */
r1.h = r3.h - r7.h(ns) ;
Also See
Modify – Decrement, Vector Add / Subtract
Special Applications
None
Subtract Immediate
General Form
register -= constant
Syntax
Ireg -= 2 ; /* decrement Ireg by 2, half-word address pointer
decrement (a) */
Ireg -= 4 ; /* word address pointer decrement (a) */
Syntax Terminology
Ireg: I3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Subtract Immediate instruction subtracts a constant value from an
Index register without saturation.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For details, see “Issuing Parallel Instructions” on
page 15-1.
Example
i0 -= 4 ;
i2 -= 2 ;
Also See
Add Immediate, Subtract
Special Applications
None
Instruction Summary
• “Idle” on page 11-3
• “Core Synchronize” on page 11-5
• “System Synchronize” on page 11-8
• “EMUEXCPT (Force Emulation)” on page 11-11
• “Disable Interrupts” on page 11-13
• “Enable Interrupts” on page 11-15
• “RAISE (Force Interrupt / Reset)” on page 11-17
• “EXCPT (Force Exception)” on page 11-20
• “Test and Set Byte (Atomic)” on page 11-22
• “No Op” on page 11-25
Instruction Overview
This chapter discusses the instructions that manage external events. Users
can take advantage of these instructions to enable interrupts, force a spe-
cific interrupt or reset to occur, or put the processor in idle state. The
Core Synchronize instruction resolves all pending operations and flushes
the core store buffer before proceeding to the next instruction. The System
Synchronize instruction forces all speculative, transient states in the
core and system to complete before processing continues.
Idle
General Form
IDLE
Syntax
IDLE ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
Typically, the Idle instruction is part of a sequence to place the Blackfin
processor in a quiescent state so that the external system can switch
between core clock frequencies.
The IDLE instruction requests an idle state by setting the idle_req bit in
the SEQSTAT register. Setting the idle_req bit precedes placing the Blackfin
processor in a quiescent state. If you intend to place the processor in Idle
mode, the IDLE instruction must immediately precede an SSYNC
instruction.
The first instruction following the SSYNC is the first instruction to execute
when the processor recovers from Idle mode.
The Idle instruction is the only way to set the idle_req bit in SEQSTAT.
The architecture does not support explicit writes to SEQSTAT.
Flags Affected
None
Required Mode
The Idle instruction executes only in Supervisor mode. If execution is
attempted in User mode, the instruction produces an Illegal Use of Pro-
tected Resource exception.
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
idle ;
Also See
System Synchronize
Special Applications
None
Core Synchronize
General Form
CSYNC
Syntax
CSYNC ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Core Synchronize (CSYNC) instruction ensures resolution of all pend-
ing core operations and the flushing of the core store buffer before
proceeding to the next instruction. Pending core operations include any
speculative states (for example, branch prediction) or exceptions. The core
store buffer lies between the processor and the L1 cache memory.
CSYNC is typically used after core MMR writes to prevent imprecise
behavior.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Core Synchronize instruction cannot be issued in parallel with other
instructions.
Example
Consider the following example code sequence.
if cc jump away_from_here ; /* produces speculative branch
prediction */
csync ;
r0 = [p0] ; /* load */
In this example, the CSYNC instruction ensures that the load instruction is
not executed speculatively. CSYNC ensures that the conditional branch is
resolved and any entries in the processor store buffer have been flushed. In
addition, all speculative states or exceptions complete processing before
CSYNC completes.
Also See
System Synchronize
Special Applications
Use CSYNC to enforce a strict execution sequence on loads and stores or to
conclude all transitional core states before reconfiguring the core modes.
For example, issue CSYNC before configuring memory-mapped registers
(MMRs). CSYNC should also be issued after stores to MMRs to make sure
the data reaches the MMR before the next instruction is fetched.
Typically, the Blackfin processor executes all load instructions strictly in
the order that they are issued and all store instructions in the order that
they are issued. However, for performance reasons, the architecture relaxes
ordering between load and store operations. It usually allows load opera-
tions to access memory out of order with respect to store operations.
Further, it usually allows loads to access memory speculatively. The core
may later cancel or restart speculative loads. By using the Core Synchro-
nize or System Synchronize instructions and managing interrupts
appropriately, you can restrict out-of-order and speculative behavior.
System Synchronize
General Form
SSYNC
Syntax
SSYNC ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The System Synchronize (SSYNC) instruction forces all speculative, tran-
sient states in the core and system to complete before processing
continues. Until SSYNC completes, no further instructions can be issued to
the pipeline.
The SSYNC instruction performs the same function as Core Synchronize
(CSYNC). In addition, SSYNC flushes any write buffers (between the L1
memory and the system interface) and generates a Synch request signal to
the external system. The operation requires an acknowledgement
Synch_Ack signal by the system before completing the instruction.
If the idle_req bit of the SEQSTAT register is set when SSYNC is executed,
the processor enters Idle state and asserts the external Idle signal after
receiving the external Synch_Ack signal. After the external Idle signal is
asserted, exiting the Idle state requires an external Wakeup signal.
SSYNC should be issued immediately before and after writing to a system
MMR. Otherwise, the MMR change can take effect at an indeterminate
time while other instructions are executing, resulting in imprecise
behavior.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The SSYNC instruction cannot be issued in parallel with other instructions.
Example
Consider the following example code sequence.
if cc jump away_from_here ; /* produces speculative branch
prediction */
ssync ;
r0 = [p0] ; /* load */
In this example, SSYNC ensures that the load instruction will not be exe-
cuted speculatively. The instruction ensures that the conditional branch is
resolved and any entries in the processor store buffer and write buffer have
been flushed. In addition, all exceptions complete processing before SSYNC
completes.
Also See
Core Synchronize, Idle
Special Applications
Typically, SSYNC prepares the architecture for clock cessation or frequency
change. In such cases, the following instruction sequence is typical.
:
instruction...
instruction...
cli r0 ; /* disable interrupts */
idle ; /* enable Idle state */
ssync ; /* conclude all speculative states, assert external
Synch request signal, await Synch_Ack, then assert external Idle
signal and stall in the Idle state until the Wakeup signal. Clock
input can be modified during the stall. */
sti r0 ; /* re-enable interrupts when Wakeup occurs */
instruction...
instruction...
EMUEXCPT (Force Emulation)
General Form
EMUEXCPT
Syntax
EMUEXCPT ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Force Emulation instruction forces an emulation exception, thus
allowing the processor to enter emulation mode.
When emulation is enabled, the processor immediately takes an exception
into emulation mode. When emulation is disabled, EMUEXCPT generates an
illegal instruction exception.
An emulation exception is the highest priority event in the processor.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Force Emulation instruction cannot be issued in parallel with other
instructions.
Example
emuexcpt ;
Also See
RAISE (Force Interrupt / Reset)
Special Applications
None
Disable Interrupts
General Form
CLI
Syntax
CLI Dreg ; /* previous state of IMASK moved to Dreg (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Disable Interrupts instruction globally disables general interrupts by
setting IMASK to all zeros. In addition, the instruction copies the previous
contents of IMASK into a user-specified register in order to save the state of
the interrupt system.
The Disable Interrupts instruction does not mask NMI, reset, exceptions
and emulation.
Flags Affected
None
Required Mode
The Disable Interrupts instruction executes only in Supervisor mode. If
execution is attempted in User mode, the instruction produces an Illegal
Use of Protected Resource exception.
Parallel Issue
The Disable Interrupts instruction cannot be issued in parallel with other
instructions.
Example
cli r3 ;
Also See
Enable Interrupts
Special Applications
This instruction is often issued immediately before an IDLE instruction.
Enable Interrupts
General Form
STI
Syntax
STI Dreg ; /* previous state of IMASK restored from Dreg
(a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Enable Interrupts instruction globally enables interrupts by restoring
the previous state of the interrupt system back into IMASK.
Flags Affected
None
Required Mode
The Enable Interrupts instruction executes only in Supervisor mode. If
execution is attempted in User mode, the instruction produces an Illegal
Use of Protected Resource exception.
Parallel Issue
The Enable Interrupts instruction cannot be issued in parallel with other
instructions.
Example
sti r3 ;
Also See
Disable Interrupts
Special Applications
This instruction is often located after an IDLE instruction so that it will
execute after a wake-up event from the idle state.
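The save-and-restore pairing of Disable Interrupts and Enable Interrupts can be sketched in C. This is a simplified model under stated assumptions: the variable imask is a hypothetical stand-in for the IMASK register, the function names cli and sti merely echo the mnemonics, and the Supervisor-mode check is not modeled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the IMASK register. */
static uint32_t imask;

/* Model of "CLI Dreg": hand back the previous IMASK contents
   and clear IMASK to all zeros. */
static uint32_t cli(void)
{
    uint32_t saved = imask;
    imask = 0;
    return saved;
}

/* Model of "STI Dreg": restore IMASK from the saved value. */
static void sti(uint32_t saved)
{
    imask = saved;
}
```

A critical section would call cli(), keep the returned value, and pass it to sti() afterward, mirroring the cli r3 / sti r3 examples.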
RAISE (Force Interrupt / Reset)
General Form
RAISE
Syntax
RAISE uimm4 ; /* (a) */
Syntax Terminology
uimm4: 4-bit unsigned field, with the range of 0 through 15
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Force Interrupt / Reset instruction forces a specified interrupt or reset
to occur. Typically, it is a software method of invoking a hardware event
for debug purposes.
When the RAISE instruction is issued, the processor sets a bit in the ILAT
register corresponding to the interrupt vector specified by the uimm4 con-
stant in the instruction. The interrupt executes when its priority is high
enough to be recognized by the processor. The RAISE instruction causes
these events to occur given the uimm4 arguments shown in Table 11-1.
Table 11-1. uimm4 Arguments and Events Forced by RAISE
uimm4   Event Forced
0       <reserved>
1       RST
2       NMI
3       <reserved>
4       <reserved>
5       IVHW
6       IVTMR
7       IVG7
8       IVG8
9       IVG9
10      IVG10
11      IVG11
12      IVG12
13      IVG13
14      IVG14
15      IVG15
Flags Affected
None
Required Mode
The Force Interrupt / Reset instruction executes only in Supervisor mode.
If execution is attempted in User mode, the Force Interrupt / Reset
instruction produces an Illegal Use of Protected Resource exception.
Parallel Issue
The Force Interrupt / Reset instruction cannot be issued in parallel with
other instructions.
Example
raise 1 ; /* Invoke RST */
raise 6 ; /* Invoke IVTMR timer interrupt */
Also See
EXCPT (Force Exception), EMUEXCPT (Force Emulation)
Special Applications
None
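The effect of RAISE on ILAT can be sketched in C. This is an illustrative model only: it assumes that ILAT bit n corresponds to the event selected by uimm4 = n, and it chooses to leave ILAT unchanged for the reserved arguments, a behavior this section does not specify. The names raise_is_valid and raise_event are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if uimm4 selects an event in Table 11-1 rather than <reserved>. */
static bool raise_is_valid(unsigned uimm4)
{
    return uimm4 == 1 || uimm4 == 2 || (uimm4 >= 5 && uimm4 <= 15);
}

/* Model of "RAISE uimm4": latch the event in a copy of ILAT.
   Assumption: bit position equals uimm4; reserved values are ignored. */
static uint32_t raise_event(uint32_t ilat, unsigned uimm4)
{
    if (!raise_is_valid(uimm4))
        return ilat;
    return ilat | (UINT32_C(1) << uimm4);
}
```

The latched event is serviced only when its priority is high enough, which this sketch does not model.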
EXCPT (Force Exception)
General Form
EXCPT
Syntax
EXCPT uimm4 ; /* (a) */
Syntax Terminology
uimm4: 4-bit unsigned field, with the range of 0 through 15
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Force Exception instruction forces an exception with code uimm4.
When the EXCPT instruction is issued, the sequencer vectors to the excep-
tion handler that the user provides.
Application-level code uses the Force Exception instruction for operating
system calls. The instruction does not set the EVSW bit (bit 3) of the ILAT
register.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Force Exception instruction cannot be issued in parallel with other
instructions.
Example
excpt 4 ;
Also See
None
Special Applications
None
Test and Set Byte (Atomic)
General Form
TESTSET
Syntax
TESTSET ( Preg ) ; /* (a) */
Syntax Terminology
Preg: P5–0 (SP and FP are not allowed as the register for this instruction)
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Test and Set Byte (Atomic) instruction loads an indirectly addressed
memory byte, tests whether it is zero, then sets the most significant bit of
the memory byte without affecting any other bits. If the byte is originally
zero, the instruction sets the CC bit. If the byte is originally nonzero the
instruction clears the CC bit. The sequence of this memory transaction is
atomic.
TESTSET accesses the entire logical memory space except the core Mem-
ory-Mapped Register (MMR) address region. The system design must
ensure atomicity for all memory regions that TESTSET may access. The
hardware does not perform atomic access to L1 memory space configured
as SRAM. Therefore, semaphores must not reside in on-core memory.
The memory architecture always treats atomic operations as cache-inhib-
ited accesses, even if the CPLB descriptor for the address indicates a
cache-enabled access. If a cache hit is detected, the operation flushes and
invalidates the line before allowing the TESTSET to proceed.
Flags Affected
This instruction affects flags as follows.
• CC is set if addressed value is zero; cleared if nonzero.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The TESTSET instruction cannot be issued in parallel with other
instructions.
Example
testset (p1) ;
Also See
Core Synchronize, System Synchronize
Special Applications
Typically, use TESTSET as a semaphore sampling method between copro-
cessors or coprocesses.
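The test-and-set behavior and its use as a semaphore can be sketched in C. This is a functional model only: plain C gives no atomicity, which on the processor is supplied by the hardware and the system design. The helper names sem_acquire and sem_release are hypothetical, and releasing by writing zero is an assumption that follows from CC being set only when the byte is zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "TESTSET (Preg)": returns the new CC value.
   CC is set if the byte was zero; the MSB is set in either case. */
static bool testset(volatile uint8_t *byte)
{
    uint8_t old = *byte;
    *byte = (uint8_t)(old | 0x80u);
    return old == 0;
}

/* Hypothetical semaphore helpers built on the model above. */
static void sem_acquire(volatile uint8_t *sem)
{
    while (!testset(sem)) {
        /* spin until the byte is observed as zero */
    }
}

static void sem_release(volatile uint8_t *sem)
{
    *sem = 0;
}
```

The owner of the semaphore is the caller that observed CC set; every other caller sees CC cleared and leaves the byte unchanged apart from the already-set MSB.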
No Op
General Form
NOP
MNOP
Syntax
NOP ; /* (a) */
MNOP ; /* (b) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment
(b) identifies 32-bit instruction length.
Functional Description
The No Op instruction increments the PC and does nothing else.
Typically, the No Op instruction allows previous instructions time to
complete before continuing with subsequent instructions. Other uses are
to produce specific delays in timing loops or to act as hardware event tim-
ers and rate generators when no timers and rate generators are available.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with spe-
cific other instructions. For details, see “Issuing Parallel Instructions” on
page 15-1.
Example
nop ;
mnop ;
mnop || /* a 16-bit instr. */ || /* a 16-bit instr. */ ;
Also See
None
Special Applications
MNOP can be used to issue load or store instructions in parallel without
invoking a 32-bit MAC or ALU operation. Refer to “Issuing Parallel
Instructions” on page 15-1 for more information.
Instruction Summary
• “PREFETCH” on page 12-2
• “FLUSH” on page 12-4
• “FLUSHINV” on page 12-6
• “IFLUSH” on page 12-8
Instruction Overview
This chapter discusses the instructions that control cache. Users can take
advantage of these instructions to prefetch or flush the data cache, invali-
date data cache lines, or flush the instruction cache.
PREFETCH
General Form
PREFETCH
Syntax
PREFETCH [ Preg ] ; /* indexed (a) */
PREFETCH [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Data Cache Prefetch instruction causes the data cache to prefetch the
cache line that is associated with the effective address in the P-register.
The operation causes the line to be fetched if it is not currently in the data
cache and if the address is cacheable (that is, if bit CPLB_L1_CHBL = 1). If
the line is already in the cache or if the cache is already fetching a line, the
prefetch instruction performs no action, like a NOP.
This instruction does not cause address exception violations. If a protec-
tion violation associated with the address occurs, the instruction acts as a
NOP and does not cause a protection violation exception.
Options
The instruction can post-increment the line pointer by the cache line size.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
prefetch [ p2 ] ;
prefetch [ p0 ++ ] ;
Also See
None
Special Applications
None
FLUSH
General Form
FLUSH
Syntax
FLUSH [ Preg ] ; /* indexed (a) */
FLUSH [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Data Cache Flush instruction causes the data cache to synchronize
the specified cache line with higher levels of memory. This instruction
selects the cache line corresponding to the effective address contained in
the P-register. If the cached data line is dirty, the instruction writes the
line out and marks the line clean in the data cache. If the specified data
cache line is already clean or the cache does not contain the address in the
P-register, this instruction performs no action, like a NOP.
This instruction does not cause address exception violations. If a protec-
tion violation associated with the address occurs, the instruction acts as a
NOP and does not cause a protection violation exception.
Options
The instruction can post-increment the line pointer by the cache line size.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The instruction cannot be issued in parallel with other instructions.
Example
flush [ p2 ] ;
flush [ p0 ++ ] ;
Also See
None
Special Applications
None
FLUSHINV
General Form
FLUSHINV
Syntax
FLUSHINV [ Preg ] ; /* indexed (a) */
FLUSHINV [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Data Cache Line Invalidate instruction causes the data cache to inval-
idate a specific line in the cache. The contents of the P-register specify the
line to invalidate. If the line is in the cache and dirty, the cache line is
written out to the next level of memory in the hierarchy. If the line is not
in the cache, the instruction performs no action, like a NOP.
This instruction does not cause address exception violations. If a protec-
tion violation associated with the address occurs, the instruction acts as a
NOP and does not cause a protection violation exception.
Options
The instruction can post-increment the line pointer by the cache line size.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Data Cache Line Invalidate instruction cannot be issued in parallel
with other instructions.
Example
flushinv [ p2 ] ;
flushinv [ p0 ++ ] ;
Also See
None
Special Applications
None
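The difference between FLUSH and FLUSHINV can be sketched as state changes on a single cache line. This is a simplified model, assuming a line is described only by a valid bit and a dirty bit. The structure cache_line and the function names are hypothetical, and the write to the next level of memory is represented only by the return value.

```c
#include <stdbool.h>

/* Hypothetical minimal description of one data cache line. */
struct cache_line {
    bool valid;
    bool dirty;
};

/* Model of FLUSH: write the line out if it is dirty and mark it clean.
   Returns true if a write-back took place. */
static bool model_flush(struct cache_line *line)
{
    if (line->valid && line->dirty) {
        line->dirty = false;
        return true;
    }
    return false; /* clean or absent: acts like a NOP */
}

/* Model of FLUSHINV: write the line out if it is dirty, then
   invalidate it. Returns true if a write-back took place. */
static bool model_flushinv(struct cache_line *line)
{
    bool wrote_back = line->valid && line->dirty;
    line->valid = false;
    line->dirty = false;
    return wrote_back;
}
```

After model_flush the line stays in the cache; after model_flushinv it does not.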
IFLUSH
General Form
IFLUSH
Syntax
IFLUSH [ Preg ] ; /* indexed (a) */
IFLUSH [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0, SP, FP
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Instruction Cache Flush instruction causes the instruction cache to
invalidate a specific line in the cache. The contents of the P-register spec-
ify the line to invalidate. The instruction cache contains no dirty bit.
Consequently, the contents of the instruction cache are never flushed to
higher levels.
This instruction does not cause address exception violations. If a protec-
tion violation associated with the address occurs, the instruction acts as a
NOP and does not cause a protection violation exception.
Options
The instruction can post-increment the line pointer by the cache line size.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
iflush [ p2 ] ;
iflush [ p0 ++ ] ;
Also See
None
Special Applications
None
Instruction Summary
• “ALIGN8, ALIGN16, ALIGN24” on page 13-3
• “DISALGNEXCPT” on page 13-6
• “BYTEOP3P (Dual 16-Bit Add / Clip)” on page 13-8
• “Dual 16-Bit Accumulator Extraction with Addition” on
page 13-13
• “BYTEOP16P (Quad 8-Bit Add)” on page 13-15
• “BYTEOP1P (Quad 8-Bit Average – Byte)” on page 13-19
• “BYTEOP2P (Quad 8-Bit Average – Half-Word)” on page 13-24
• “BYTEPACK (Quad 8-Bit Pack)” on page 13-30
• “BYTEOP16M (Quad 8-Bit Subtract)” on page 13-33
• “SAA (Quad 8-Bit Subtract-Absolute-Accumulate)” on page 13-37
• “BYTEUNPACK (Quad 8-Bit Unpack)” on page 13-42
Instruction Overview
This chapter discusses the instructions that manipulate video pixels. Users
can take advantage of these instructions to align bytes, disable exceptions
that result from misaligned 32-bit memory accesses, and perform dual and
quad 8- and 16-bit add, subtract, and averaging operations.
ALIGN8, ALIGN16, ALIGN24
General Form
dest_reg = ALIGN8 ( src_reg_1, src_reg_0 )
dest_reg = ALIGN16 (src_reg_1, src_reg_0 )
dest_reg = ALIGN24 (src_reg_1, src_reg_0 )
Syntax
Dreg = ALIGN8 ( Dreg, Dreg ) ; /* overlay 1 byte (b) */
Dreg = ALIGN16 ( Dreg, Dreg ) ; /* overlay 2 bytes (b) */
Dreg = ALIGN24 ( Dreg, Dreg ) ; /* overlay 3 bytes (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Byte Align instruction copies a contiguous four-byte unaligned word
from a combination of two data registers. The instruction version deter-
mines the bytes that are copied; in other words, the byte alignment of the
copied word. Alignment options are shown in Table 13-1.
The ALIGN16 version performs the same operation as the Vector Pack
instruction using the dest_reg = PACK ( Dreg_lo, Dreg_hi ) syntax.
Use the Byte Align instruction to align data bytes for subsequent sin-
gle-instruction, multiple-data (SIMD) instructions.
The input values are not implicitly modified by this instruction. The des-
tination register can be the same D-register as one of the source registers.
Doing this explicitly modifies that source register.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
// If r3 = 0xABCD 1234 and r4 = 0xBEEF DEAD, then . . .
r0 = align8 (r3, r4) ; /* produces r0 = 0x34BE EFDE */
Also See
Vector PACK
Special Applications
None
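The byte selection performed by the Byte Align instruction can be sketched in C as a shift of the 64-bit concatenation of the two source registers. This is a model inferred from the ALIGN8 example above and from the stated equivalence of ALIGN16 with Vector Pack; the ALIGN24 behavior is an extrapolation, and the function name align_bytes is hypothetical.

```c
#include <stdint.h>

/* Model of dest_reg = ALIGNn (src_reg_1, src_reg_0).
   byte_shift is 1 for ALIGN8, 2 for ALIGN16, 3 for ALIGN24.
   Assumption: src_reg_1 supplies the upper 32 bits of the pair. */
static uint32_t align_bytes(uint32_t src_reg_1, uint32_t src_reg_0,
                            unsigned byte_shift)
{
    uint64_t pair = ((uint64_t)src_reg_1 << 32) | src_reg_0;
    return (uint32_t)(pair >> (8u * byte_shift));
}
```

With r3 = 0xABCD1234 and r4 = 0xBEEFDEAD, align_bytes(r3, r4, 1) reproduces the 0x34BEEFDE result given in the example.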
DISALGNEXCPT
General Form
DISALGNEXCPT
Syntax
DISALGNEXCPT ; /* (b) */
Syntax Terminology
None
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Disable Alignment Exception for Load (DISALGNEXCPT) instruction
prevents exceptions that would otherwise be caused by misaligned 32-bit
memory loads issued in parallel. This instruction only affects misaligned
32-bit load instructions that use I-register indirect addressing.
In order to force address alignment to a 32-bit boundary, the two LSBs of
the address are cleared before being sent to the memory system. The
I-register is not modified by the DISALGNEXCPT instruction. Also, any
modifications performed to the I-register by a parallel instruction are not
affected by the DISALGNEXCPT instruction.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
disalgnexcpt || r1 = [i0++] || r3 = [i1++] ; /* three instruc-
tions in parallel */
disalgnexcpt || [p0 ++ p1] = r5 || r3 = [i1++] ; /* alignment
exception is prevented only for the load */
disalgnexcpt || r0 = [p2++] || r3 = [i1++] ; /* alignment
exception is prevented only for the I-reg load */
Also See
Any Quad 8-Bit instructions, ALIGN8, ALIGN16, ALIGN24
Special Applications
Use the DISALGNEXCPT instruction when priming data registers for Quad
8-Bit single-instruction, multiple-data (SIMD) instructions.
Quad 8-Bit SIMD instructions require as many as sixteen 8-bit operands,
four D-registers worth, to be preloaded with operand data. The operand
data is 8 bits and not necessarily word aligned in memory. Thus, use DIS-
ALGNEXCPT to prevent spurious exceptions for these potentially misaligned
accesses.
During execution, when Quad 8-Bit SIMD instructions perform 8-bit
boundary accesses, they automatically prevent exceptions for misaligned
accesses. No user intervention is required.
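The address handling described for DISALGNEXCPT can be sketched in C. This is a minimal model: the function names are hypothetical, and the second helper only illustrates that the two cleared LSBs remain available in the I-register, where the Quad 8-Bit instructions use them to select the byte alignment.

```c
#include <stdint.h>

/* Address actually sent to the memory system for an I-register
   load issued in parallel with DISALGNEXCPT: two LSBs cleared. */
static uint32_t aligned_load_address(uint32_t ireg)
{
    return ireg & ~UINT32_C(3);
}

/* The two LSBs stay in the unmodified I-register. */
static unsigned byte_offset(uint32_t ireg)
{
    return (unsigned)(ireg & 3u);
}
```

For an I-register value of 0x1003, the load is performed from 0x1000 and the remaining offset is 3.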
BYTEOP3P (Dual 16-Bit Add / Clip)
General Form
dest_reg = BYTEOP3P ( src_reg_0, src_reg_1 ) (LO)
dest_reg = BYTEOP3P ( src_reg_0, src_reg_1 ) (HI)
dest_reg = BYTEOP3P ( src_reg_0, src_reg_1 ) (LO, R)
dest_reg = BYTEOP3P ( src_reg_0, src_reg_1 ) (HI, R)
Syntax
/* forward byte order operands */
Dreg = BYTEOP3P (Dreg_pair, Dreg_pair) (LO) ; /* sum into low
bytes (b) */
Dreg = BYTEOP3P (Dreg_pair, Dreg_pair) (HI) ; /* sum into high
bytes (b) */
/* reverse byte order operands */
Dreg = BYTEOP3P (Dreg_pair, Dreg_pair) (LO, R) ; /* sum into
low bytes (b) */
Dreg = BYTEOP3P (Dreg_pair, Dreg_pair) (HI, R) ; /* sum into
high bytes (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Dual 16-Bit Add / Clip instruction adds two 8-bit unsigned values to
two 16-bit signed values, then limits (or “clips”) the result to the 8-bit
unsigned range 0 through 255, inclusive. The instruction loads the results
as bytes on half-word boundaries in one 32-bit destination register, in
either the lower or the higher byte of each half-word.
Assume the source registers contain:
aligned_src_reg_0:  y1  y0
aligned_src_reg_1:  z3  z2  z1  z0
Table 13-3 shows what the versions that load the result into the lower
byte, “(LO)”, produce. Table 13-4 shows what the versions that load the
result into the higher byte, “(HI)”, produce. Both tables divide the
destination register into the bit fields 31–24, 23–16, 15–8, and 7–0.
In either case, the unused bytes in the destination register are filled with
0x00.
The 8-bit and 16-bit addition is performed as a signed operation. The
16-bit operand is sign-extended to 32 bits before adding.
The only valid input source register pairs are R1:0 and R3:2.
The Dual 16-Bit Add / Clip instruction provides byte alignment directly
in the source register pairs src_reg_0 and src_reg_1 based on index regis-
ters I0 and I1.
• The two LSBs of the I0 register determine the byte alignment for
source register pair src_reg_0 (typically R1:0).
• The two LSBs of the I1 register determine the byte alignment for
source register pair src_reg_1 (typically R3:2).
The relationship between the I-register bits and the byte alignment is
illustrated in Table 13-5.
In the default source order case (that is, without the ( – , R) syntax),
assume that a source register pair contains the following.
Two LSBs of I0 or I1 | byte7 | byte6 | byte5 | byte4 | byte3 | byte2 | byte1 | byte0
Options
The ( – , R) syntax reverses the order of the source registers within each
register pair. Typical high performance applications cannot afford the
overhead of reloading both register pair operands to maintain byte order
for every calculation. Instead, they load only one register pair operand
each time and alternate between the forward and reverse byte order
versions of this instruction. By default, the low order bytes come from
the low register in the register pair. The ( – , R) option causes the low
order bytes to come from the high register.
In the optional reverse source order case (that is, using the ( – , R)
syntax), the only difference is that the source registers swap places
within the register pair in their byte ordering. Assume a source register
pair contains the data shown in Table 13-6.
Two LSBs of I0 or I1 | byte7 | byte6 | byte5 | byte4 | byte3 | byte2 | byte1 | byte0
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3 = byteop3p (r1:0, r3:2) (lo) ;
r3 = byteop3p (r1:0, r3:2) (hi) ;
r3 = byteop3p (r1:0, r3:2) (lo, r) ;
r3 = byteop3p (r1:0, r3:2) (hi, r) ;
Also See
BYTEOP16P (Quad 8-Bit Add)
Special Applications
This instruction is primarily intended for video motion compensation
algorithms. The instruction supports the addition of the residual to a
video pixel value, followed by unsigned byte saturation.
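The add-and-clip arithmetic applied to each operand pair can be sketched in C. This is a model of the arithmetic only: the selection of source bytes by I0 and I1 and the placement of results in the destination register are not modeled, and the function name add_clip is hypothetical.

```c
#include <stdint.h>

/* Add a signed 16-bit residual to an unsigned 8-bit pixel, then
   clip the sum to the unsigned 8-bit range 0 through 255. */
static uint8_t add_clip(int16_t residual, uint8_t pixel)
{
    int32_t sum = (int32_t)residual + (int32_t)pixel;
    if (sum < 0)
        return 0;
    if (sum > 255)
        return 255;
    return (uint8_t)sum;
}
```

Doing the addition in 32 bits means the intermediate sum cannot overflow before the clip is applied.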
Dual 16-Bit Accumulator Extraction with Addition
General Form
dest_reg_1 = A1.L + A1.H, dest_reg_0 = A0.L + A0.H
Syntax
Dreg = A1.L + A1.H, Dreg = A0.L + A0.H ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Dual 16-Bit Accumulator Extraction with Addition instruction adds
together the upper half-words (bits 31 through 16) and lower half-words
(bits 15 through 0) of each Accumulator and loads each result into a
32-bit destination register.
Each 16-bit half-word in each Accumulator is sign extended before being
added together.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r4=a1.l+a1.h, r7=a0.l+a0.h ;
Also See
SAA (Quad 8-Bit Subtract-Absolute-Accumulate)
Special Applications
Use the Dual 16-Bit Accumulator Extraction with Addition instruction
for motion estimation algorithms in conjunction with the Quad 8-Bit
Subtract-Absolute-Accumulate instruction.
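The per-Accumulator computation can be sketched in C. This is a minimal model, assuming only bits 31 through 0 of the Accumulator take part; the function names sext16 and acc_halves_sum are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of h to a 32-bit signed value. */
static int32_t sext16(uint32_t h)
{
    return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Model of Dreg = An.L + An.H for one Accumulator: both half-words
   are sign-extended, then added into a 32-bit result. */
static int32_t acc_halves_sum(uint32_t acc_bits_31_0)
{
    return sext16(acc_bits_31_0) + sext16(acc_bits_31_0 >> 16);
}
```

Because each half-word is sign-extended first, the 32-bit sum of two 16-bit values cannot overflow.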
BYTEOP16P (Quad 8-Bit Add)
General Form
(dest_reg_1, dest_reg_0) = BYTEOP16P (src_reg_0, src_reg_1)
(dest_reg_1, dest_reg_0) = BYTEOP16P (src_reg_0, src_reg_1) (R)
Syntax
/* forward byte order operands */
( Dreg, Dreg ) = BYTEOP16P ( Dreg_pair, Dreg_pair ) ; /* (b) */
/* reverse byte order operands */
( Dreg, Dreg ) = BYTEOP16P ( Dreg_pair, Dreg_pair ) (R)
; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Add instruction adds two unsigned quad byte number
sets byte-wise, adjusting for byte alignment. It then loads the byte-wise
results as 16-bit, zero-extended, half-words in two destination registers, as
shown in Table 13-7 and Table 13-8.
The only valid input source register pairs are R1:0 and R3:2.
Source registers (Table 13-7):
aligned_src_reg_0:  y3  y2  y1  y0
aligned_src_reg_1:  z3  z2  z1  z0
Destination registers (Table 13-8):
dest_reg_0:  y1 + z1   y0 + z0
dest_reg_1:  y3 + z3   y2 + z2
The Quad 8-Bit Add instruction provides byte alignment directly in the
source register pairs src_reg_0 and src_reg_1 based on index registers I0
and I1.
• The two LSBs of the I0 register determine the byte alignment for
source register pair src_reg_0 (typically R1:0).
• The two LSBs of the I1 register determine the byte alignment for
source register pair src_reg_1 (typically R3:2).
The relationship between the I-register bits and the byte alignment is
illustrated below.
In the default source order case (that is, without the (R) syntax), assume
that a source register pair contains the data shown in Table 13-9.
Two LSBs of I0 or I1 | byte7 | byte6 | byte5 | byte4 | byte3 | byte2 | byte1 | byte0
This instruction prevents exceptions that would otherwise be caused by
misaligned 32-bit memory loads issued in parallel.
Options
The (R) syntax reverses the order of the source registers within each
register pair. Typical high performance applications cannot afford the
overhead of reloading both register pair operands to maintain byte order
for every calculation. Instead, they load only one register pair operand
each time and alternate between the forward and reverse byte order
versions of this instruction. By default, the low order bytes come from
the low register in the register pair. The (R) option causes the low
order bytes to come from the high register.
In the optional reverse source order case (that is, using the (R) syntax),
the only difference is that the source registers swap places within the
register pair in their byte ordering. Assume a source register pair
contains the data shown in Table 13-10.
Two LSBs of I0 or I1 | byte7 | byte6 | byte5 | byte4 | byte3 | byte2 | byte1 | byte0
00b: byte3 byte2 byte1 byte0
The mnemonic derives its name from the fact that the operands are bytes,
the result is 16 bits, and the arithmetic operation is “plus” for addition.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
(r1,r2)= byteop16p (r3:2,r1:0) ;
(r1,r2)= byteop16p (r3:2,r1:0) (r) ;
Also See
BYTEOP16M (Quad 8-Bit Subtract)
Special Applications
This instruction provides packed data arithmetic typical of video and
image processing applications.
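The byte-wise addition and the packing of the four 16-bit sums can be sketched in C. This is a model under stated assumptions: the inputs y and z stand for the already aligned source words (the I0/I1 byte selection is not modeled), dest_reg_0 is assumed to receive the sums of bytes 1 and 0 and dest_reg_1 the sums of bytes 3 and 2, and the function name quad_add is hypothetical.

```c
#include <stdint.h>

/* Model of (dest_reg_1, dest_reg_0) = BYTEOP16P on aligned words.
   Each byte pair is added as unsigned values; every sum fits in
   9 bits and is stored zero-extended in a 16-bit half-word. */
static void quad_add(uint32_t y, uint32_t z,
                     uint32_t *dest_reg_1, uint32_t *dest_reg_0)
{
    uint32_t s0 = (y & 0xFFu) + (z & 0xFFu);
    uint32_t s1 = ((y >> 8) & 0xFFu) + ((z >> 8) & 0xFFu);
    uint32_t s2 = ((y >> 16) & 0xFFu) + ((z >> 16) & 0xFFu);
    uint32_t s3 = (y >> 24) + (z >> 24);

    *dest_reg_0 = (s1 << 16) | s0; /* y1 + z1 : y0 + z0 */
    *dest_reg_1 = (s3 << 16) | s2; /* y3 + z3 : y2 + z2 */
}
```

Widening each sum to 16 bits is what lets the instruction add unsigned bytes without saturation or loss of the carry.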
BYTEOP1P (Quad 8-Bit Average – Byte)
General Form
dest_reg = BYTEOP1P ( src_reg_0, src_reg_1 )
dest_reg = BYTEOP1P ( src_reg_0, src_reg_1 ) (T)
dest_reg = BYTEOP1P ( src_reg_0, src_reg_1 ) (R)
dest_reg = BYTEOP1P ( src_reg_0, src_reg_1 ) (T, R)
Syntax
/* forward byte order operands */
Dreg = BYTEOP1P (Dreg_pair, Dreg_pair) ; /* (b) */
Dreg = BYTEOP1P (Dreg_pair, Dreg_pair) (T) ; /* truncated (b)
*/
/* reverse byte order operands */
Dreg = BYTEOP1P (Dreg_pair, Dreg_pair) (R) ; /* (b) */
Dreg = BYTEOP1P (Dreg_pair, Dreg_pair) (T, R) ; /* truncated (b)
*/
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Average – Byte instruction computes the arithmetic aver-
age of two unsigned quad byte number sets byte wise, adjusting for byte
alignment. This instruction loads the byte-wise results as concatenated
bytes in one 32-bit destination register, as shown in Table 13-11 and
Table 13-12.
aligned_src_reg_0: y3 y2 y1 y0
aligned_src_reg_1: z3 z2 z1 z0
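The byte-wise average can be modelled in a few lines of Python. This is a minimal sketch, not the hardware definition: it works on operands that are already byte aligned, the function name `byteop1p` and its `truncate` parameter are illustrative, and the default rounding rule (add 1 before halving) is an assumption inferred from the existence of the (T) truncation option.

```python
def byteop1p(y, z, truncate=False):
    """Average four unsigned byte pairs of two aligned 32-bit operands.

    y, z     -- aligned_src_reg_0 and aligned_src_reg_1 as 32-bit integers
    truncate -- models the (T) option; the default adds 1 before halving
                (assumed rounding rule, not stated in the text above)
    """
    bias = 0 if truncate else 1
    dest = 0
    for n in range(4):
        yb = (y >> (8 * n)) & 0xFF
        zb = (z >> (8 * n)) & 0xFF
        # The 9-bit sum is halved; the result always fits in one byte.
        dest |= ((yb + zb + bias) >> 1) << (8 * n)
    return dest
```

With y = 0x01020304 and z = 0x02020202, the model gives 0x02020303 by default and 0x01020203 with truncation; only the bytes whose sum is odd differ.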
The relationship between the I-register bits and the byte alignment is
illustrated below.
In the default source order case (e.g., not the (R) syntax), assume a source
register pair contains the data shown in Table 13-13.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
Options
The Quad 8-Bit Average – Byte instruction supports the following
options.
(R) Reverses the order of the source registers within each register pair. Typical
high performance applications cannot afford the overhead of reloading
both register pair operands to maintain byte order for every calculation.
Instead, they load only one register pair operand each time and alternate
between the forward and reverse byte order versions of this instruction.
By default, the low order bytes come from the low register in the
register pair. The (R) option causes the low order bytes to come from
the high register.
In the optional reverse source order case (e.g., using the (R) syntax), the
only difference is the source registers swap places within the register pair
in their byte ordering. Assume a source register pair contains the data
shown in Table 13-15.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
The mnemonic derives its name from the fact that the operands are bytes,
the result is one word, and the basic arithmetic operation is “plus” for
addition. The single destination register indicates that averaging is
performed.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3 = byteop1p (r1:0, r3:2) ;
r3 = byteop1p (r1:0, r3:2) (r) ;
r3 = byteop1p (r1:0, r3:2) (t) ;
r3 = byteop1p (r1:0, r3:2) (t,r) ;
Also See
BYTEOP16P (Quad 8-Bit Add)
Special Applications
This instruction supports binary interpolation used in fractional motion
search and motion compensation algorithms.
General Form
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (RNDL)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (RNDH)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (TL)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (TH)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (RNDL, R)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (RNDH, R)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (TL, R)
dest_reg = BYTEOP2P ( src_reg_0, src_reg_1 ) (TH, R)
Syntax
/* forward byte order operands */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (RNDL) ;
/* round into low bytes (b) */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (RNDH) ;
/* round into high bytes (b) */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (TL) ;
/* truncate into low bytes (b) */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (TH) ;
/* truncate into high bytes (b) */
/* reverse byte order operands */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (RNDL, R) ;
/* round into low bytes (b) */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (RNDH, R) ;
/* round into high bytes (b) */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (TL, R) ;
/* truncate into low bytes (b) */
Dreg = BYTEOP2P (Dreg_pair, Dreg_pair) (TH, R) ;
/* truncate into high bytes (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Average – Half-Word instruction finds the arithmetic
average of two unsigned quad byte number sets byte wise, adjusting for
byte alignment. This instruction averages four bytes together. The instruc-
tion loads the results as bytes on half-word boundaries in one 32-bit
destination register. Some syntax options load the upper byte in the
half-word and others load the lower byte, as shown in Table 13-16,
Table 13-17, and Table 13-18.
aligned_src_reg_0: y3 y2 y1 y0
aligned_src_reg_1: z3 z2 z1 z0
Table 13-17. The versions that load the result into the lower byte
(RNDL and TL) produce:
Bits:      31..24    23..16    15..8    7..0
Contents:  0x00      average   0x00     average
Table 13-18. The versions that load the result into the higher byte
(RNDH and TH) produce:
Bits:      31..24    23..16    15..8    7..0
Contents:  average   0x00      average  0x00
In either case, the unused bytes in the destination register are filled
with 0x00.
Arithmetic average (or mean) is calculated by summing the four byte oper-
ands, then shifting right two places to divide by four.
When the intermediate sum is not evenly divisible by 4, precision may be
lost.
The user has two options to bias the result: truncation or biased rounding.
See “Rounding and Truncating” on page 1-13 for a description of
unbiased rounding and truncating behavior.
The RND_MOD bit in the ASTAT register has no bearing on the rounding
behavior of this instruction.
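The four-byte average and the placement of each result byte can be sketched as follows. This is an illustrative model under stated assumptions: `avg4` and `place` are hypothetical helper names, biased rounding is taken to mean adding 2 (half the divisor) before the shift, and the model does not say which operand bytes feed each average.

```python
def avg4(a, b, c, d, rounding):
    """Average four unsigned bytes: sum, then shift right two places."""
    total = a + b + c + d
    if rounding:
        total += 2          # assumed biased rounding: add half of 4 first
    return total >> 2       # 10-bit sum / 4 always fits in one byte


def place(avg_upper, avg_lower, high_byte):
    """Put one result byte in each half-word of a 32-bit destination.

    high_byte=False models RNDL/TL (results in bits 23..16 and 7..0);
    high_byte=True models RNDH/TH (results in bits 31..24 and 15..8).
    Unused bytes stay 0x00.
    """
    shift = 8 if high_byte else 0
    return (avg_upper << (16 + shift)) | (avg_lower << shift)
```

For the bytes 1, 2, 3 and 4 the sum is 10, which is not divisible by 4: truncation gives 2 and rounding gives 3.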
The only valid input source register pairs are R1:0 and R3:2.
The Quad 8-Bit Average – Half-Word instruction provides byte align-
ment directly in the source register pairs src_reg_0 (typically R1:0) and
src_reg_1 (typically R3:2) based only on the I0 register. The byte align-
ment in both source registers must be identical since only one register
specifies the byte alignment for them both.
The relationship between the I-register bits and the byte alignment is
illustrated in Table 13-19.
In the default source order case (for example, not the (R) syntax), assume a
source register pair contains the data shown in Table 13-19.
This instruction prevents exceptions that would otherwise be caused by
misaligned 32-bit memory loads issued in parallel.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
Options
The Quad 8-Bit Average – Half-Word instruction supports the following
options.
(—L) Loads the results into the lower byte of each destination half-word.
(—H) Loads the results into the higher byte of each destination half-word.
( ,R) Reverses the order of the source registers within each register pair. Typical
high performance applications cannot afford the overhead of reloading both
register pair operands to maintain byte order for every calculation. Instead,
they load only one register pair operand each time and alternate between the
forward and reverse byte order versions of this instruction. By default, the
low order bytes come from the low register in the register pair.
The (R) option causes the low order bytes to come from the high register.
When used together, the order of the options in the syntax makes no
difference.
In the optional reverse source order case (e.g., using the (R) syntax), the
only difference is the source registers swap places within the register pair
in their byte ordering. Assume a source register pair contains the data
shown in Table 13-21.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
The mnemonic derives its name from the fact that the operands are bytes,
the result is two half-words, and the basic arithmetic operation is “plus”
for addition. The single destination register indicates that averaging is
performed.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3 = byteop2p (r1:0, r3:2) (rndl) ;
r3 = byteop2p (r1:0, r3:2) (rndh) ;
r3 = byteop2p (r1:0, r3:2) (tl) ;
r3 = byteop2p (r1:0, r3:2) (th) ;
r3 = byteop2p (r1:0, r3:2) (rndl, r) ;
r3 = byteop2p (r1:0, r3:2) (rndh, r) ;
r3 = byteop2p (r1:0, r3:2) (tl, r) ;
r3 = byteop2p (r1:0, r3:2) (th, r) ;
Also See
BYTEOP1P (Quad 8-Bit Average – Byte)
Special Applications
This instruction supports binary interpolation used in fractional motion
search and motion compensation algorithms.
General Form
dest_reg = BYTEPACK ( src_reg_0, src_reg_1 )
Syntax
Dreg = BYTEPACK ( Dreg, Dreg ) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Pack instruction packs four 8-bit values, half-word
aligned, contained in two source registers into one register, byte aligned as
shown in Table 13-22 and Table 13-23.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r2 = bytepack (r4,r5) ;
• Assuming:
• R4 = 0xFEED FACE
• R5 = 0xBEEF BADD
then this instruction returns:
• R2 = 0xEFDD EDCE
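The example above can be reproduced with a short Python model. This is a sketch only; the function name `bytepack` is illustrative, and the byte selection is the one implied by the example values (the low byte of each half-word is kept).

```python
def bytepack(src0, src1):
    """Pack the low byte of each half-word of src0 and src1 into 32 bits."""
    b0 = src0 & 0xFF            # low byte of src_reg_0 low half-word
    b1 = (src0 >> 16) & 0xFF    # low byte of src_reg_0 high half-word
    b2 = src1 & 0xFF            # low byte of src_reg_1 low half-word
    b3 = (src1 >> 16) & 0xFF    # low byte of src_reg_1 high half-word
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0


print(hex(bytepack(0xFEEDFACE, 0xBEEFBADD)))  # 0xefddedce
```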
Also See
BYTEUNPACK (Quad 8-Bit Unpack)
Special Applications
None
General Form
(dest_reg_1, dest_reg_0) = BYTEOP16M (src_reg_0, src_reg_1)
(dest_reg_1, dest_reg_0) = BYTEOP16M (src_reg_0, src_reg_1) (R)
Syntax
/* forward byte order operands */
(Dreg, Dreg) = BYTEOP16M (Dreg_pair, Dreg_pair) ; /* (b) */
/* reverse byte order operands */
(Dreg, Dreg) = BYTEOP16M (Dreg_pair, Dreg_pair) (R) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Subtract instruction subtracts two unsigned quad byte
number sets byte wise, adjusting for byte alignment. The instruction loads
the byte-wise results as sign-extended half-words in two destination regis-
ters, as shown in Table 13-24 and Table 13-25.
aligned_src_reg_0: y3 y2 y1 y0
aligned_src_reg_1: z3 z2 z1 z0
dest_reg_0: y1 - z1 y0 - z0
dest_reg_1: y3 - z3 y2 - z2
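The mapping above can be expressed as a small Python model. It is a sketch under the assumption that both operands are already byte aligned; the name `byteop16m` is illustrative.

```python
def byteop16m(y, z):
    """Byte-wise y - z on aligned 32-bit operands.

    Returns (dest_reg_1, dest_reg_0); each difference is stored as a
    sign-extended (two's-complement) 16-bit half-word.
    """
    diff = []
    for n in range(4):
        yb = (y >> (8 * n)) & 0xFF
        zb = (z >> (8 * n)) & 0xFF
        diff.append((yb - zb) & 0xFFFF)   # -255..255 fits in 16 bits
    dest_reg_0 = (diff[1] << 16) | diff[0]
    dest_reg_1 = (diff[3] << 16) | diff[2]
    return dest_reg_1, dest_reg_0
```

For y = 0x01020304 and z = 0x04030201 the differences are 3, 1, -1 and -3 from byte 0 upward, so dest_reg_0 is 0x00010003 and dest_reg_1 is 0xFFFDFFFF.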
The only valid input source register pairs are R1:0 and R3:2.
The Quad 8-Bit Subtract instruction provides byte alignment directly in
the source register pairs src_reg_0 and src_reg_1 based on index registers
I0 and I1.
• The two LSBs of the I0 register determine the byte alignment for
source register pair src_reg_0 (typically R1:0).
• The two LSBs of the I1 register determine the byte alignment for
source register pair src_reg_1 (typically R3:2).
The relationship between the I-register bits and the byte alignment is
shown in Table 13-26.
In the default source order case (e.g., not the (R) syntax), assume a source
register pair contains the data shown in Table 13-26.
Options
The (R) syntax reverses the order of the source registers within each
register pair. Typical high performance applications cannot afford the
overhead of reloading both register pair operands to maintain byte order
for every calculation. Instead, they load only one register pair operand
each time and alternate between the forward and reverse byte order
versions of this instruction. By default, the low order bytes come
from the low register in the register pair. The (R) option causes the low
order bytes to come from the high register.
In the optional reverse source order case (e.g., using the (R) syntax), the
only difference is the source registers swap places within the register pair
in their byte ordering. Assume that a source register pair contains the data
shown in Table 13-27.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
The mnemonic derives its name from the fact that the operands are bytes,
the result is 16 bits, and the arithmetic operation is “minus” for
subtraction.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
(r1,r2)= byteop16m (r3:2,r1:0) ;
(r1,r2)= byteop16m (r3:2,r1:0) (r) ;
Also See
BYTEOP16P (Quad 8-Bit Add)
Special Applications
This instruction provides packed data arithmetic typical of video and
image processing applications.
General Form
SAA ( src_reg_0, src_reg_1 )
SAA ( src_reg_0, src_reg_1 ) (R)
Syntax
SAA (Dreg_pair, Dreg_pair) ; /* forward byte order operands (b) */
SAA (Dreg_pair, Dreg_pair) (R) ; /* reverse byte order operands (b) */
Syntax Terminology
Dreg_pair: R1:0, R3:2 (This instruction only supports register pairs R1:0
and R3:2.)
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Subtract-Absolute-Accumulate instruction subtracts four
pairs of values, takes the absolute value of each difference, and accumu-
lates each result into a 16-bit Accumulator half. The results are placed in
the upper- and lower-half Accumulators A0.H, A0.L, A1.H, and A1.L.
Saturation is performed if an operation overflows a 16-bit Accumulator
half.
Only register pairs R1:0 and R3:2 are valid sources for this instruction.
This instruction supports the following byte-wise Sum of Absolute Differ-
ence (SAD) calculations.
\[ \mathrm{SAD} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| a(i,j) - b(i,j) \right| \]
Typical values for N are 8 and 16, corresponding to the video block size of
8x8 and 16x16 pixels, respectively. The 16-bit Accumulator registers limit
the pixel region or block size to 32x32 pixels.
The SAA instruction behavior is shown below.
A1.H += | a(i, j+3) - b(i, j+3) |
A1.L += | a(i, j+2) - b(i, j+2) |
A0.H += | a(i, j+1) - b(i, j+1) |
A0.L += | a(i, j) - b(i, j) |
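The accumulate step and the 32x32 block-size limit can be checked with a short Python model. This is a sketch under assumptions: the name `saa` is illustrative, the operand bytes are taken as already aligned, and each 16-bit Accumulator half is assumed to saturate at the unsigned maximum 0xFFFF, which is consistent with the stated limit.

```python
def saa(acc, a, b):
    """One subtract-absolute-accumulate step on four byte pairs.

    acc  -- [A0.L, A0.H, A1.L, A1.H] as unsigned 16-bit values
    a, b -- four operand bytes each, ordered j, j+1, j+2, j+3
    """
    return [min(s + abs(x - y), 0xFFFF)   # assumed saturation at 0xFFFF
            for s, x, y in zip(acc, a, b)]


# A 32x32 block has 1024 pixels; each Accumulator half receives every
# fourth pixel, that is 256 differences of at most 255 each.
acc = [0, 0, 0, 0]
for _ in range(256):
    acc = saa(acc, [255] * 4, [0] * 4)
print(acc)  # [65280, 65280, 65280, 65280]
```

The worst-case total of 256 x 255 = 65280 still fits in 16 bits, whereas a larger block could exceed 65535 and saturate.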
In the default source order case (that is, without the (R) syntax), assume
a source register pair contains the data shown in Table 13-29.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
Options
The (R) syntax reverses the order of the source registers within each pair.
Typical high performance applications cannot afford the overhead of
reloading both register pair operands to maintain byte order for every
calculation. Instead, they load only one register pair operand each time
and alternate between the forward and reverse byte order versions of
this instruction. By default, the low order bytes come from the
low register in the register pair. The (R) option causes the low order bytes
to come from the high register.
When reversing source order by using the (R) syntax, the source registers
swap places within the register pair in their byte ordering. If a source
register pair contains the data shown in Table 13-30, then the
SAA instruction computes 12 pixel operations simultaneously: the
three-operation subtract-absolute-accumulate on four pairs of operand
bytes in parallel.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
saa (r1:0, r3:2) || r0 = [i0++] || r2 = [i1++] ; /* parallel fill
instructions */
saa (r1:0, r3:2) (R) || r1 = [i0++] || r3 = [i1++] ; /* reverse,
parallel fill instructions */
saa (r1:0, r3:2) ; /* last SAA in a loop, no more fill
required */
Also See
DISALGNEXCPT, Load Data Register
Special Applications
Use the Quad 8-Bit Subtract-Absolute-Accumulate instruction for
block-based video motion estimation algorithms using block Sum of
Absolute Difference (SAD) calculations to measure distortion.
General Form
( dest_reg_1, dest_reg_0 ) = BYTEUNPACK src_reg_pair
( dest_reg_1, dest_reg_0 ) = BYTEUNPACK src_reg_pair (R)
Syntax
( Dreg , Dreg ) = BYTEUNPACK Dreg_pair ; /* (b) */
( Dreg , Dreg ) = BYTEUNPACK Dreg_pair (R) ; /* reverse source
order (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Unpack instruction copies four contiguous bytes from a
pair of source registers, adjusting for byte alignment. The instruction
loads the selected bytes into two arbitrary data registers on half-word
alignment.
The two LSBs of the I0 register determine the source byte alignment, as
illustrated below.
In the default source order case (e.g., not the (R) syntax), assume the
source register pair contains the data shown in Table 13-31.
This instruction prevents exceptions that would otherwise be caused by
misaligned 32-bit memory loads issued in parallel.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
Options
The (R) syntax reverses the order of the source registers within the pair.
Typical high performance applications cannot afford the overhead of
reloading both register pair operands to maintain byte order for every
calculation. Instead, they load only one register pair operand each time
and alternate between the forward and reverse byte order versions of
this instruction. By default, the low order bytes come from the
low register in the register pair. The (R) option causes the low order bytes
to come from the high register.
In the optional reverse source order case (e.g., using the (R) syntax), the
only difference is the source registers swap places in their byte ordering.
Assume the source register pair contains the data shown in Table 13-32.
Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
The four bytes, now byte aligned, are copied into the destination registers
on half-word alignment, as shown in Table 13-33 and Table 13-34.
Only register pairs R1:0 and R3:2 are valid sources for this instruction.
Misaligned access exceptions are disabled during this instruction.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
(r6,r5) = byteunpack r1:0 ; /* non-reversing sources */
• Assuming:
• register I0’s two LSBs = 00b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00BE 00EF
• R5 = 0x00BA 00DD
• Assuming:
• register I0’s two LSBs = 01b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00CE 00BE
• R5 = 0x00EF 00BA
• Assuming:
• register I0’s two LSBs = 10b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00FA 00CE
• R5 = 0x00BE 00EF
• Assuming:
• register I0’s two LSBs = 11b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00ED 00FA
• R5 = 0x00CE 00BE
(r6,r5) = byteunpack r1:0 (R) ; /* reversing sources case */
• Assuming:
• register I0’s two LSBs = 00b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00FE 00ED
• R5 = 0x00FA 00CE
• Assuming:
• register I0’s two LSBs = 01b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00DD 00FE
• R5 = 0x00ED 00FA
• Assuming:
• register I0’s two LSBs = 10b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00BA 00DD
• R5 = 0x00FE 00ED
• Assuming:
• register I0’s two LSBs = 11b,
• R1 = 0xFEED FACE
• R0 = 0xBEEF BADD
then this instruction returns:
• R6 = 0x00EF 00BA
• R5 = 0x00DD 00FE
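The alignment and unpacking behavior shown in these examples can be reproduced with a short Python model. This is a sketch, not the hardware definition: the names `align` and `byteunpack` are illustrative, and the alignment rule (select four contiguous bytes starting at the byte offset given by the two LSBs of I0) is inferred from the example values above.

```python
def align(hi, lo, i0, reverse=False):
    """Select four contiguous bytes from a 64-bit register pair.

    hi, lo  -- high and low registers of the pair (R1 and R0 for R1:0)
    i0      -- value of I0; only its two LSBs are used
    reverse -- models (R): the two registers swap places in the pair
    """
    if reverse:
        hi, lo = lo, hi
    pair = (hi << 32) | lo
    return (pair >> (8 * (i0 & 3))) & 0xFFFFFFFF


def byteunpack(hi, lo, i0, reverse=False):
    """Return (dest_reg_1, dest_reg_0) with one byte per half-word."""
    v = align(hi, lo, i0, reverse)
    b = [(v >> (8 * n)) & 0xFF for n in range(4)]
    return (b[3] << 16) | b[2], (b[1] << 16) | b[0]


r6, r5 = byteunpack(0xFEEDFACE, 0xBEEFBADD, 0b01)
print(hex(r6), hex(r5))  # 0xce00be 0xef00ba
```

The same `align` helper applies to the other instructions in this chapter that adjust for byte alignment using the two LSBs of an I-register.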
Also See
BYTEPACK (Quad 8-Bit Pack)
Special Applications
None
Instruction Summary
• “Add on Sign” on page 14-3
• “VIT_MAX (Compare-Select)” on page 14-9
• “Vector ABS” on page 14-16
• “Vector Add / Subtract” on page 14-19
• “Vector Arithmetic Shift” on page 14-25
• “Vector Logical Shift” on page 14-30
• “Vector MAX” on page 14-34
• “Vector MIN” on page 14-37
• “Vector Multiply” on page 14-40
• “Vector Multiply and Multiply-Accumulate” on page 14-43
• “Vector Negate (Two’s Complement)” on page 14-48
• “Vector PACK” on page 14-50
• “Vector SEARCH” on page 14-52
Instruction Overview
This chapter discusses the instructions that control vector operations.
Users can take advantage of these instructions to perform simultaneous
operations on multiple 16-bit values, including add, subtract, multiply,
shift, negate, pack, and search. Compare-Select and Add-On-Sign are also
included in this chapter.
Add on Sign
General Form
dest_hi = dest_lo = SIGN (src0_hi) * src1_hi
+ SIGN (src0_lo) * src1_lo
Syntax
Dreg_hi = Dreg_lo = SIGN ( Dreg_hi ) * Dreg_hi
+ SIGN ( Dreg_lo ) * Dreg_lo ;
/* (b) */
Register Consistency
The destination registers dest_hi and dest_lo must be halves of the same
data register. Similarly, src0_hi and src0_lo must be halves of the same
register and src1_hi and src1_lo must be halves of the same register.
Syntax Terminology
Dreg_hi: R7–0.H
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Add on Sign instruction performs a two step function, as follows.
1. Multiply the arithmetic sign of a 16-bit half-word number in src0
by the corresponding half-word number in src1. The arithmetic
sign of src0 is either (+1) or (–1), depending on the sign bit of
src0. The instruction performs this operation on the upper and
lower half-words of the same data registers.
The results of this step obey the signed multiplication rules sum-
marized in Table 14-1. Y is the number in src0, and Z is the
number in src1. The numbers in src0 and src1 may be positive or
negative.
Table 14-1.
SRC0 SRC1 Sign-Adjusted SRC1
+Y +Z +Z
+Y –Z –Z
–Y +Z –Z
–Y –Z +Z
Note the result always bears the magnitude of Z with only the sign
affected.
2. Then, add the sign-adjusted src1 upper and lower half-word
results together and store the same 16-bit sum in the upper and
lower halves of the destination register, as shown in Table 14-2 and
Table 14-3.
The sum is not saturated if the addition exceeds 16 bits.
src0: a1 a0
src1: b1 b0
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r7.h=r7.l=sign(r2.h)*r3.h+sign(r2.l)*r3.l ;
• If
• R2.H = 2
• R3.H = 23
• R2.L = 2001
• R3.L = 1234
then
• R7.H = 1257 (or 1234 + 23)
• R7.L = 1257
• If
• R2.H = –2
• R3.H = 23
• R2.L = 2001
• R3.L = 1234
then
• R7.H = 1211 (or 1234 – 23)
• R7.L = 1211
• If
• R2.H = 2
• R3.H = 23
• R2.L = –2001
• R3.L = 1234
then
• R7.H = –1211 (or (–1234) + 23)
• R7.L = –1211
• If
• R2.H = –2
• R3.H = 23
• R2.L = –2001
• R3.L = 1234
then
• R7.H = –1257 (or (–1234) – 23)
• R7.L = –1257
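The four cases above can be reproduced with a short Python model. This is a sketch; the names `add_on_sign`, `pack` and `s16` are illustrative helpers, and a zero in src0 is treated as positive because its sign bit is clear.

```python
def s16(v):
    """Interpret the low 16 bits of v as a two's-complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def pack(hi, lo):
    """Build a 32-bit register value from two signed half-words."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def add_on_sign(src0, src1):
    """dest_hi = dest_lo = SIGN(src0_hi)*src1_hi + SIGN(src0_lo)*src1_lo."""
    total = 0
    for shift in (16, 0):
        sign = -1 if (src0 >> shift) & 0x8000 else 1   # sign bit of src0 half
        total += sign * s16(src1 >> shift)
    half = total & 0xFFFF        # the sum is not saturated; it wraps
    return (half << 16) | half


r7 = add_on_sign(pack(-2, 2001), pack(23, 1234))
print(s16(r7 >> 16), s16(r7))  # 1211 1211
```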
Also See
None
Special Applications
Use the Add on Sign instruction to compute the branch metric used by
each Viterbi Butterfly.
VIT_MAX (Compare-Select)
General Form
dest_reg = VIT_MAX ( src_reg_0, src_reg_1 ) (ASL)
dest_reg = VIT_MAX ( src_reg_0, src_reg_1 ) (ASR)
dest_reg_lo = VIT_MAX ( src_reg ) (ASL)
dest_reg_lo = VIT_MAX ( src_reg ) (ASR)
Syntax
Dual 16-Bit Operation
Dreg = VIT_MAX ( Dreg , Dreg ) (ASL) ; /* shift history bits
left (b) */
Dreg = VIT_MAX ( Dreg , Dreg ) (ASR) ; /* shift history bits
right (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Compare-Select (VIT_MAX) instruction selects the maximum of each
pair of 16-bit operands, returns the maxima to the destination register,
and serially records in A0.W the source of each maximum. The comparison
is signed: the operands are compared as two’s-complement numbers.
Versions are available for dual and single 16-bit operations. Whereas the
dual versions compare four operands to return two maxima, the single ver-
sions compare only two operands to return one maximum.
The Accumulator extension bits (bits 39–32) must be cleared before exe-
cuting this instruction.
This operation is illustrated in Table 14-4 and Table 14-5.
src_reg_0 y1 y0
src_reg_1 z1 z0
A0 00000000 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXBB
Conversely, the ASR version shifts A0 right two bit positions and appends
two MSBs to indicate the source of each maximum as shown in
Table 14-8 and Table 14-9.
A0 00000000 BBXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Notice that the history bit code depends on the A0 shift direction. The bit
for src_reg_1 is always shifted onto A0 first, followed by the bit for
src_reg_0.
src_reg y1 y0
dest_reg_lo Maximum, y1 or y0
The ASL version shifts A0 left one bit position and appends an LSB to
indicate the source of the maximum.
A0 00000000 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXB
Conversely, the ASR version shifts A0 right one bit position and appends
an MSB to indicate the source of the maximum.
A0 00000000 BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0 y0 is the maximum
1 y1 is the maximum
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r5 = vit_max(r3, r2)(asl) ; /* shift left, dual operation */
• Assume:
• R3 = 0xFFFF 0000
• R2 = 0x0000 FFFF
• A0 = 0x00 0000 0000
This example produces:
• R5 = 0x0000 0000
• A0 = 0x00 0000 0002
r7 = vit_max (r1, r0) (asr) ; /* shift right, dual operation */
• Assume:
• R1 = 0xFEED BEEF
• R0 = 0xDEAF 0000
• A0 = 0x00 0000 0000
This example produces:
• R7 = 0xFEED 0000
• A0 = 0x00 8000 0000
r3.l = vit_max (r1)(asl) ; /* shift left, single operation */
• Assume:
• R1 = 0xFFFF 0000
• A0 = 0x00 0000 0000
• Assume:
• R1 = 0x1234 FADE
• A0 = 0x00 FFFF FFFF
This example produces:
• R3.L = 0x1234
• A0 = 0x00 7FFF FFFF
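The two dual 16-bit examples above can be reproduced with a short Python model. This is a sketch under assumptions: the name `vit_max_dual` is illustrative, only the low 32 bits of A0 are modelled, and the history bit recorded when both halves are equal (0 here) is an assumption, since the text does not cover ties.

```python
def s16(v):
    """Interpret the low 16 bits of v as a two's-complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def vit_max_dual(src0, src1, a0, asl=True):
    """Dual Compare-Select: returns (dest_reg, new A0.W)."""
    y1, y0 = s16(src0 >> 16), s16(src0)
    z1, z0 = s16(src1 >> 16), s16(src1)
    ybit = 1 if y1 > y0 else 0   # 1 means the high half is the maximum
    zbit = 1 if z1 > z0 else 0   # tie gives 0 (assumed)
    dest = ((max(y1, y0) & 0xFFFF) << 16) | (max(z1, z0) & 0xFFFF)
    a0 &= 0xFFFFFFFF
    for bit in (zbit, ybit):     # src_reg_1 bit is shifted onto A0 first
        if asl:
            a0 = ((a0 << 1) | bit) & 0xFFFFFFFF
        else:
            a0 = (a0 >> 1) | (bit << 31)
    return dest, a0


dest, a0 = vit_max_dual(0xFEEDBEEF, 0xDEAF0000, 0, asl=False)
print(hex(dest), hex(a0))  # 0xfeed0000 0x80000000
```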
Also See
MAX
Special Applications
The Compare-Select (VIT_MAX) instruction is a key element of the
Add-Compare-Select (ACS) function for Viterbi decoders. Combine it
with a Vector Add instruction to calculate a trellis butterfly used in ACS
functions.
Vector ABS
General Form
dest_reg = ABS source_reg (V)
Syntax
Dreg = ABS Dreg (V) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Absolute Value instruction calculates the individual absolute
values of the upper and lower halves of a single 32-bit data register. The
results are placed into a 32-bit dest_reg, using the following rules.
• If the input value is positive or zero, copy it unmodified to the
destination.
• If the input value is negative, subtract it from zero and store the
result in the destination.
For example, if the source register contains the data shown in Table 14-15
the destination register receives the data shown in Table 14-16.
Flags Affected
This instruction affects flags as follows.
• AZ is set if either or both results are zero; cleared if both are nonzero.
• AN is cleared.
• V is set if either or both results saturate; cleared if neither
saturates.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
/* If r1 = 0xFFFF 7FFF, then . . . */
r3 = abs r1 (v) ;
/* . . . produces 0x0001 7FFF */
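The rules above can be modelled per half-word in Python. This is a sketch; the name `vector_abs` is illustrative, and the saturation of the input 0x8000 to 0x7FFF is an assumption consistent with the V flag description, since the positive value 32768 does not fit in a signed 16-bit half-word.

```python
def vector_abs(src):
    """Absolute value of each 16-bit half of a 32-bit register value."""
    def abs16(h):
        v = h - 0x10000 if h & 0x8000 else h   # two's-complement decode
        r = -v if v < 0 else v                 # subtract negatives from zero
        return min(r, 0x7FFF)                  # assumed saturation of 0x8000
    return (abs16((src >> 16) & 0xFFFF) << 16) | abs16(src & 0xFFFF)


print(hex(vector_abs(0xFFFF7FFF)))  # 0x17fff
```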
Also See
ABS
Special Applications
None
Vector Add / Subtract
General Form
dest = src_reg_0 +|+ src_reg_1
dest = src_reg_0 –|+ src_reg_1
dest = src_reg_0 +|– src_reg_1
dest = src_reg_0 –|– src_reg_1
dest_0 = src_reg_0 +|+ src_reg_1,
dest_1 = src_reg_0 –|– src_reg_1
dest_0 = src_reg_0 +|– src_reg_1,
dest_1 = src_reg_0 –|+ src_reg_1
dest_0 = src_reg_0 + src_reg_1,
dest_1 = src_reg_0 – src_reg_1
dest_0 = A1 + A0, dest_1 = A1 – A0
dest_0 = A0 + A1, dest_1 = A0 – A1
Syntax
Dual 16-Bit Operations
Dreg = Dreg +|+ Dreg (opt_mode_0) ; /* add | add (b) */
Dreg = Dreg –|+ Dreg (opt_mode_0) ; /* subtract | add (b) */
Dreg = Dreg +|– Dreg (opt_mode_0) ; /* add | subtract (b) */
Dreg = Dreg –|– Dreg (opt_mode_0) ; /* subtract | subtract (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Add / Subtract instruction simultaneously adds and/or sub-
tracts two pairs of registered numbers. It then stores the results of each
operation into a separate 32-bit data register or 16-bit half register,
according to the syntax used. The destination register for each of the quad
or dual versions must be unique.
Options
The Vector Add / Subtract instruction provides three option modes.
• opt_mode_0 supports the Dual and Quad 16-Bit Operations ver-
sions of this instruction.
• opt_mode_1 supports the Dual 32-bit and 40-bit operations.
• opt_mode_2 supports the Quad 16-Bit Operations versions of this
instruction.
Table 14-17 describes the options that the three opt_modes support.
CO    Cross option. Swap the order of the results in the destination
      register.
SCO   Saturate and cross option. Combination of (S) and (CO) options.
opt_mode_2:
ASR   Arithmetic shift right. Halve the result (divide by 2) before
      storing in the destination register. If specified with the S
      (saturation) flag in Quad 16-Bit Operand versions of this
      instruction, the scaling is performed before saturation.
ASL   Arithmetic shift left. Double the result (multiply by 2, truncated)
      before storing in the destination register. If specified with the S
      (saturation) flag in Quad 16-Bit Operand versions of this
      instruction, the scaling is performed before saturation.
Flags Affected
This instruction affects the following flags.
• AZ is set if any results are zero; cleared if all are nonzero.
• AN is set if any results are negative; cleared if all non-negative.
• AC0 is set if the right-hand side of a dual operation generates a
carry; cleared if no carry; unaffected if a quad operation.
• AC1 is set if the left-hand side of a dual operation generates a carry;
cleared if no carry; unaffected if a quad operation.
• V is set if any results overflow; cleared if none overflows.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r5=r3 +|+ r4 ; /* dual 16-bit operations, add|add */
r6=r0 -|+ r1(s) ; /* same as above, subtract|add with
saturation */
r0=r2 +|- r1(co) ; /* add|subtract with half-word results
crossed over in the destination register */
r7=r3 -|- r6(sco) ; /* subtract|subtract with saturation and
half-word results crossed over in the destination register */
r5=r3 +|+ r4, r7=r3-|-r4 ; /* quad 16-bit operations, add|add,
subtract|subtract */
r5=r3 +|- r4, r7=r3 -|+ r4 ; /* quad 16-bit operations,
add|subtract, subtract|add */
r5=r3 +|- r4, r7=r3 -|+ r4(asr) ; /* quad 16-bit operations,
add|subtract, subtract|add, with all results divided by 2 (right
shifted 1 place) before storing into destination register */
r5=r3 +|- r4, r7=r3 -|+ r4(asl) ; /* quad 16-bit operations,
add|subtract, subtract|add, with all results multiplied by 2
(left shifted 1 place) before storing into destination register */
r2=r0+r1, r3=r0-r1 ; /* dual 32-bit operations */
r2=r0+r1, r3=r0-r1(s) ; /* dual 32-bit operations with
saturation */
r4=a1+a0, r6=a1-a0 ; /* dual 40-bit Accumulator operations, A0
subtracted from A1 */
r4=a0+a1, r6=a0-a1(s) ; /* dual 40-bit Accumulator operations
with saturation, A1 subtracted from A0 */
Also See
Add, Subtract
Special Applications
FFT butterfly routines in which each of the registers is considered a single
complex number often use the Vector Add / Subtract instruction.
/* If r1 = 0x0003 0004 and r2 = 0x0001 0002, then . . . */
r0 = r2 +|- r1(co) ;
/* . . . produces r0 = 0xFFFE 0004 */
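The dual 16-bit forms, including the cross option used in the example above, can be modelled in Python. This is a sketch; the name `vector_add_sub` and its parameters are illustrative, and saturation is assumed to clamp each result to the signed 16-bit range.

```python
def s16(v):
    """Interpret the low 16 bits of v as a two's-complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def vector_add_sub(src0, src1, op_hi, op_lo, cross=False, saturate=False):
    """Model Dreg = src0 op_hi|op_lo src1, where each op is "+" or "-"."""
    def apply(op, a, b):
        r = a + b if op == "+" else a - b
        if saturate:                      # (S): assumed signed 16-bit clamp
            r = max(-0x8000, min(0x7FFF, r))
        return r & 0xFFFF                 # otherwise the result wraps
    hi = apply(op_hi, s16(src0 >> 16), s16(src1 >> 16))
    lo = apply(op_lo, s16(src0), s16(src1))
    if cross:                             # (CO): swap the result halves
        hi, lo = lo, hi
    return (hi << 16) | lo


# r0 = r2 +|- r1 (co) with r2 = 0x0001 0002 and r1 = 0x0003 0004
print(hex(vector_add_sub(0x00010002, 0x00030004, "+", "-", cross=True)))
# 0xfffe0004
```

Treating the high half as the real part and the low half as the imaginary part of a complex number, the same model covers the add and subtract steps of an FFT butterfly.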
Vector Arithmetic Shift
General Form
dest_reg = src_reg >>> shift_magnitude (V)
dest_reg = ASHIFT src_reg BY shift_magnitude (V)
Syntax
Constant Shift Magnitude
Dreg = Dreg >>> uimm4 (V) ; /* arithmetic shift right,
immediate (b) */
Dreg = Dreg << uimm4 (V,S) ; /* arithmetic shift left,
immediate with saturation (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Arithmetic Shift instruction arithmetically shifts a pair of
half-word registered numbers a specified distance and direction. Though
the two half-word registers are shifted at the same time, the two numbers
are kept separate.
Arithmetic right shifts preserve the sign of the preshifted value. The sign
bit value backfills the left-most bit position vacated by the arithmetic right
shift. For positive numbers, this behavior is equivalent to the logical right
shift for unsigned numbers.
Only arithmetic right shifts are supported. Left shifts are performed as
logical left shifts that may not preserve the sign of the original number. In
the default case—without the optional saturation option—numbers can
be left shifted so far that all the sign bits overflow and are lost. However,
when the saturation option is enabled, a left shift that would otherwise
shift nonsign bits off the left side saturates to the maximum positive or
negative value instead. So, with saturation enabled, the result always keeps
the same sign as the original number.
See “Saturation” on page 1-11 for a description of saturation behavior.
“ASHIFT” Syntax
Both half-word registers in src_reg are shifted by the number of places
prescribed in shift_magnitude, and the result stored into dest_reg.
The sign of the shift magnitude determines the direction of the shift for
the ASHIFT versions.
• Positive shift magnitudes without the saturation flag (V, S)
produce Logical Left shifts.
• Positive shift magnitudes with the saturation flag (V, S) produce
Arithmetic Left shifts.
• Negative shift magnitudes produce Arithmetic Right shifts.
In essence, the magnitude is the power of 2 multiplied by the src_reg
number. Positive magnitudes cause multiplication (N x 2^n), whereas
negative magnitudes produce division (N x 2^-n or N / 2^n).
The dest_reg and src_reg are both pairs of 16-bit half registers. Satura-
tion of the result is optional.
Valid shift magnitudes for 16-bit src_reg are –16 through +15, zero
included. If a number larger than these is supplied, the instruction masks
and ignores the more significant bits.
This instruction does not implicitly modify the src_reg values. Option-
ally, dest_reg can be the same D-register as src_reg. Using the same
D-register for the dest_reg and the src_reg explicitly modifies the source
register.
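The shift rules described above can be summarized in a short behavioural sketch. This Python model is an illustration under stated assumptions, not a definition of the hardware: the function names are hypothetical, the magnitude is assumed to lie in the valid range of -16 through +15, and the masking of out-of-range magnitudes is not modeled.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def ashift16(half, magnitude, saturate=False):
    """Shift one half-word: positive magnitude = left, negative = arithmetic right."""
    value = to_s16(half)
    if magnitude < 0:
        # Python's >> on a negative int backfills with the sign bit
        return (value >> -magnitude) & 0xFFFF
    shifted = value << magnitude  # exact N x 2^n, may exceed 16 bits
    if saturate:
        shifted = max(-0x8000, min(0x7FFF, shifted))
    return shifted & 0xFFFF  # without saturation, overflowed bits are lost


def ashift_v(src, magnitude, saturate=False):
    """Apply the same shift to both half-words, keeping them separate."""
    hi = ashift16(src >> 16, magnitude, saturate)
    lo = ashift16(src, magnitude, saturate)
    return (hi << 16) | lo


print(hex(ashift_v(0x8004000F, -3)))                # arithmetic right shift by 3
print(hex(ashift_v(0x4000C000, 2)))                 # left shift, sign bits lost
print(hex(ashift_v(0x4000C000, 2, saturate=True)))  # left shift, saturated
```

In the last two calls the unsaturated left shift discards every significant bit, whereas the saturated form keeps the sign of each original half-word.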
Options
The ASHIFT instruction supports the (V, S) option, which saturates the
result.
Flags Affected
This instruction affects flags as follows.
• AZ is set if either result is zero; cleared if both are nonzero.
• AN is set if either result is negative; cleared if both are non-negative.
• V is set if either result overflows; cleared if neither overflows.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r4=r5>>>3 (v) ; /* arithmetic right shift immediate R5.H and
R5.L by 3 bits (divide each half-word by 8). If r5 = 0x8004 000F
then the result is r4 = 0xF000 0001 */
r4=r5>>>3 (v, s) ; /* same as above, but saturate the result */
r2=ashift r7 by r5.l (v) ; /* arithmetic shift (right or left,
depending on sign of r5.l) R7.H and R7.L by magnitude of R5.L */
Also See
Vector Logical Shift, Arithmetic Shift, Logical Shift
Special Applications
None
Vector Logical Shift
General Form
dest_reg = src_reg >> shift_magnitude (V)
dest_reg = src_reg << shift_magnitude (V)
dest_reg = LSHIFT src_reg BY shift_magnitude (V)
Syntax
Constant Shift Magnitude
Dreg = Dreg >> uimm4 (V) ; /* logical shift right, immediate (b) */
Dreg = Dreg << uimm4 (V) ; /* logical shift left, immediate (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Logical Shift instruction logically shifts a pair of half-word
registered numbers a specified distance and direction. Though the two
half-word registers are shifted at the same time, the two numbers are kept
separate.
Logical shifts discard any bits shifted out of the register and backfill
vacated bits with zeros.
“LSHIFT” Syntax
Both half-word registers in src_reg are shifted by the number of places
prescribed in shift_magnitude, and the result is stored into dest_reg.
For the LSHIFT versions, the sign of the shift magnitude determines the
direction of the shift.
• Positive shift magnitudes produce left shifts.
• Negative shift magnitudes produce right shifts.
The dest_reg and src_reg are both pairs of 16-bit half-registers.
Valid shift magnitudes for 16-bit src_reg are –16 through +15, zero
included. If a number larger than these is supplied, the instruction masks
and ignores the more significant bits.
This instruction does not implicitly modify the src_reg values. Option-
ally, dest_reg can be the same D-register as src_reg. Using the same
D-register for the dest_reg and the src_reg explicitly modifies the source
register at your discretion.
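As an illustration of the behavior described above, the following Python sketch models the LSHIFT form on a packed pair of half-words. The function names are hypothetical, and the sketch assumes the magnitude is already within -16 through +15.

```python
def lshift16(half, magnitude):
    """Logically shift one half-word; vacated bits are backfilled with zeros."""
    half &= 0xFFFF
    if magnitude < 0:
        return half >> -magnitude          # right shift, zero fill
    return (half << magnitude) & 0xFFFF    # left shift, bits shifted out are discarded


def lshift_v(src, magnitude):
    """Apply the same logical shift to both half-words, keeping them separate."""
    return (lshift16(src >> 16, magnitude) << 16) | lshift16(src, magnitude)


print(hex(lshift_v(0x8004000F, -3)))  # each half shifted right by 3
print(hex(lshift_v(0x8004000F, 3)))   # each half shifted left by 3
```

Note that no bit crosses from one half-word into the other, and that the high half-word 0x8004 shifts right to 0x1000 here, not to the sign-extended 0xF000 produced by the arithmetic form.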
Flags Affected
This instruction affects flags as follows.
• AZ is set if either result is zero; cleared if both are nonzero.
• AN is set if either result is negative; cleared if both are non-negative.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r4=r5>>3 (v) ;
/* logical right shift immediate R5.H and R5.L by 3 bits */
r4=r5<<3 (v) ;
/* logical left shift immediate R5.H and R5.L by 3 bits */
r2=lshift r7 by r5.l (v) ;
/* logically shift (right or left, depending on sign of r5.l)
R7.H and R7.L by magnitude of R5.L */
Also See
Vector Arithmetic Shift, Arithmetic Shift, Logical Shift
Special Applications
None
Vector MAX
General Form
dest_reg = MAX ( src_reg_0, src_reg_1 ) (V)
Syntax
Dreg = MAX ( Dreg , Dreg ) (V) ; /* dual 16-bit operations (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Maximum instruction returns the maximum value (meaning
the largest positive value, nearest to 0x7FFF) of the 16-bit half-word
source registers to the dest_reg.
The instruction compares the upper half-words of src_reg_0 and
src_reg_1 and returns that maximum to the upper half-word of dest_reg.
It also compares the lower half-words of src_reg_0 and src_reg_1 and
returns that maximum to the lower half-word of dest_reg. The result is a
concatenation of the two 16-bit maximum values.
The Vector Maximum instruction does not implicitly modify input val-
ues. The dest_reg can be the same D-register as one of the source
registers. Doing this explicitly modifies that source register.
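The half-word-by-half-word comparison can be sketched as follows. This Python model is illustrative only; vec_max and to_s16 are hypothetical names. The point it shows is that the comparison is signed, so 0x8000 is the smallest value and 0x7FFF the largest.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def vec_max(src0, src1):
    """Return the signed maximum of each pair of half-words, concatenated."""
    hi = max(to_s16(src0 >> 16), to_s16(src1 >> 16)) & 0xFFFF
    lo = max(to_s16(src0), to_s16(src1)) & 0xFFFF
    return (hi << 16) | lo


# High halves: 0x8000 (most negative) vs 0x7FFF; low halves: 0x0001 vs 0xFFFF (-1)
print(hex(vec_max(0x80000001, 0x7FFFFFFF)))
```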
Flags Affected
This instruction affects flags as follows.
• AZ is set if either or both results are zero; cleared if both are
nonzero.
• AN is set if either or both results are negative; cleared if both are
non-negative.
• V is cleared.
• All other flags are unaffected.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r7 = max (r1, r0) (v) ;
Also See
Vector SEARCH, Vector MIN, MAX, MIN
Special Applications
None
Vector MIN
General Form
dest_reg = MIN ( src_reg_0, src_reg_1 ) (V)
Syntax
Dreg = MIN ( Dreg , Dreg ) (V) ; /* dual 16-bit operation (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Minimum instruction returns the minimum value (the most
negative value or the value closest to 0x8000) of the 16-bit half-word
source registers to the dest_reg.
This instruction compares the upper half-words of src_reg_0 and
src_reg_1 and returns that minimum to the upper half-word of dest_reg.
It also compares the lower half-words of src_reg_0 and src_reg_1 and
returns that minimum to the lower half-word of dest_reg. The result is a
concatenation of the two 16-bit minimum values.
The input values are not implicitly modified by this instruction. The
dest_reg can be the same D-register as one of the source registers. Doing
this explicitly modifies that source register.
Flags Affected
This instruction affects flags as follows.
• AZ is set if either or both results are zero; cleared if both are
nonzero.
• AN is set if either or both results are negative; cleared if both are
non-negative.
• V is cleared.
• All other flags are unaffected.
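The result and the flag rules listed above can be modeled together. The following Python sketch is a hedged illustration; the function name and the dictionary used to report flags are assumptions made for this example and do not correspond to any real tool or register layout.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def vec_min_with_flags(src0, src1):
    """Return (result, flags) for a dual 16-bit signed minimum."""
    halves = [min(to_s16(src0 >> shift), to_s16(src1 >> shift)) for shift in (16, 0)]
    result = ((halves[0] & 0xFFFF) << 16) | (halves[1] & 0xFFFF)
    flags = {
        "AZ": any(h == 0 for h in halves),  # set if either result is zero
        "AN": any(h < 0 for h in halves),   # set if either result is negative
        "V": False,                         # always cleared
    }
    return result, flags


result, flags = vec_min_with_flags(0x00057FFF, 0x00008000)
print(hex(result), flags)
```

In this call the high half-word result is zero and the low half-word result is 0x8000, so both AZ and AN are set by the same instruction.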
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r7 = min (r1, r0) (v) ;
Also See
Vector SEARCH, Vector MAX, MAX, MIN
Special Applications
None
Vector Multiply
Syntax
Separate the two compatible scalar instructions with a comma to produce
a vector instruction. Add a semicolon to the end of the combined instruc-
tion, as usual. The order of the MAC operations on the command line is
arbitrary.
Instruction Length
This instruction is 32 bits long.
Flags Affected
This instruction affects the following flags.
• V is set if any result saturates; cleared if none saturates.
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
Example
r2.h=r7.l*r6.h, r2.l=r7.h*r6.h ;
/* simultaneous MAC0 and MAC1 execution, 16-bit results. Both
results are signed fractions. */
r4.l=r1.l*r0.l, r4.h=r1.h*r0.h ;
/* same as above. MAC order is arbitrary. */
r0.h=r3.h*r2.l (m), r0.l=r3.l*r2.l ;
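The "signed fraction" results mentioned in the comments above can be illustrated with a small numeric model. This Python sketch makes several assumptions that are not stated in this section: operands are treated as signed 1.15 fractions, the product is aligned by a one-bit left shift, rounding is round-half-up, and the 16-bit result saturates. The option flags such as (m) and the processor's selectable rounding behavior are not modeled, and the function names are hypothetical.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_frac16(a, b):
    """Signed 1.15 x 1.15 multiply, rounded and saturated to a 1.15 result."""
    product = (to_s16(a) * to_s16(b)) << 1   # align the 2.30 product to 1.31
    rounded = (product + 0x8000) >> 16       # assumed round-half-up to 16 bits
    return max(-0x8000, min(0x7FFF, rounded)) & 0xFFFF


def vec_mult(hi_a, hi_b, lo_a, lo_b):
    """Two independent multiplies packed into one 32-bit result."""
    return (mult_frac16(hi_a, hi_b) << 16) | mult_frac16(lo_a, lo_b)


# 0.5 * 0.5 = 0.25 in the high half, -0.5 * 0.5 = -0.25 in the low half
print(hex(vec_mult(0x4000, 0x4000, 0xC000, 0x4000)))
```

Under these assumptions, the only input pair that saturates is 0x8000 times 0x8000 (that is, -1.0 times -1.0), which clamps to 0x7FFF.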
Vector Multiply and Multiply-Accumulate
Syntax
Separate the two compatible scalar instructions with a comma to produce
a vector instruction. Add a semicolon to the end of the combined instruc-
tion, as usual. The order of the MAC operations on the command line is
arbitrary.
Instruction Length
This instruction is 32 bits long.
Flags Affected
The flags reflect the results of the two scalar operations. This instruction
affects flags as follows.
• V is set if any result extracted to a Dreg saturates; cleared if no Dregs
saturate.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result in Accumulator A0 (MAC0 operation) saturates;
cleared if A0 result does not saturate.
• AV0S is set if AV0 is set; unaffected otherwise.
Example
Result is 40-bit Accumulator
a1=r2.l*r3.h, a0=r2.h*r3.h ;
/* both multiply signed fractions into separate Accumulators */
a0=r1.l*r0.l, a1+=r1.h*r0.h ;
/* same as above, but sum result into A1. MAC order is arbitrary.
*/
a1+=r3.h*r3.l, a0-=r3.h*r3.h ;
/* sum product into A1, subtract product from A0 */
a1=r3.h*r2.l (m), a0+=r3.l*r2.l ;
/* MAC1 multiplies a signed fraction in r3.h by an unsigned frac-
tion in r2.l. MAC0 multiplies two signed fractions. */
a1=r7.h*r4.h (m), a0+=r7.l*r4.l (fu) ;
/* MAC1 multiplies signed fraction by unsigned fraction. MAC0
multiplies and accumulates two unsigned fractions. */
a1+=r3.h*r2.h, a0=r3.l*r2.l (is) ;
/* both MACs perform signed integer multiplication */
a1=r6.h*r7.h, a0+=r6.l*r7.l (w32) ;
/* both MACs multiply signed fractions, sign extended, and satu-
rate both Accumulators at bit 31 */
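The assign, accumulate, and subtract forms used in the examples above can be sketched numerically. The Python model below is an assumption-laden illustration: it treats operands as signed fractions, aligns the product with a one-bit left shift, and saturates the Accumulator at 40 bits. It does not model the (m), (fu), (is), or (w32) options or any multiplier corner cases, and the name mac is hypothetical.

```python
ACC_MAX = (1 << 39) - 1   # largest signed 40-bit value
ACC_MIN = -(1 << 39)      # smallest signed 40-bit value


def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mac(acc, a, b, op="="):
    """Model acc = a*b, acc += a*b, or acc -= a*b for signed fractions."""
    product = (to_s16(a) * to_s16(b)) << 1
    if op == "=":
        result = product
    elif op == "+=":
        result = acc + product
    elif op == "-=":
        result = acc - product
    else:
        raise ValueError("op must be '=', '+=' or '-='")
    return max(ACC_MIN, min(ACC_MAX, result))  # assumed 40-bit saturation


a1 = mac(0, 0x4000, 0x4000)            # a1  = 0.5 * 0.5
a1 = mac(a1, 0x4000, 0x4000, "+=")     # a1 += 0.5 * 0.5
a0 = mac(a1, 0x4000, 0x4000, "-=")     # subtract one product again
print(hex(a1), hex(a0))
```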
Vector Negate (Two’s Complement)
General Form
dest_reg = – source_reg (V)
Syntax
Dreg = – Dreg (V) ; /* dual 16-bit operation (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Negate instruction returns the same magnitude with the
opposite arithmetic sign, saturated for each 16-bit half-word in the source.
The instruction calculates by subtracting the source from zero.
See “Saturation” on page 1-11 for a description of saturation behavior.
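The subtract-from-zero calculation with saturation can be sketched as follows. This Python model is illustrative; vec_negate and to_s16 are hypothetical names. The only input that saturates is 0x8000, because its negation (+32768) does not fit in a signed 16-bit half-word.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def negate16(half):
    """Return 0 - half, saturated to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, 0 - to_s16(half))) & 0xFFFF


def vec_negate(src):
    """Negate both half-words independently."""
    return (negate16(src >> 16) << 16) | negate16(src)


print(hex(vec_negate(0x00047FFF)))  # ordinary case
print(hex(vec_negate(0x80000000)))  # high half saturates to 0x7FFF
```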
Flags Affected
This instruction affects flags as follows.
• AZ is set if either or both results are zero; cleared if both are
nonzero.
• AN is set if either or both results are negative; cleared if both are
non-negative.
• V is set if either or both results saturate; cleared if neither saturates.
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r5 =–r3 (v) ; /* R5.H becomes the negative of R3.H and R5.L
becomes the negative of R3.L. If r3 = 0x0004 7FFF the result is r5
= 0xFFFC 8001 */
Also See
Negate (Two’s Complement)
Special Applications
None
Vector PACK
General Form
dest_reg = PACK ( src_half_0, src_half_1 )
Syntax
Dreg = PACK ( Dreg_lo_hi , Dreg_lo_hi ) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Vector Pack instruction packs two 16-bit half-word numbers into the
halves of a 32-bit data register as shown in Table 14-18 and Table 14-19.
src_half_0 half_word_0
src_half_1 half_word_1
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction can be issued in parallel with spe-
cific other 16-bit instructions. For details, see “Issuing Parallel
Instructions” on page 15-1.
Example
r3=pack(r4.l, r5.l) ; /* pack low / low half-words */
r1=pack(r6.l, r4.h) ; /* pack low / high half-words */
r0=pack(r2.h, r4.l) ; /* pack high / low half-words */
r5=pack(r7.h, r2.h) ; /* pack high / high half-words */
Also See
BYTEPACK (Quad 8-Bit Pack)
Special Applications
/* If r4.l = 0xDEAD and r5.l = 0xBEEF, then . . . */
r3 = pack (r4.l, r5.l) ;
/* . . . produces r3 = 0xDEAD BEEF */
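The placement of the two half-words can be checked with a minimal Python sketch. The names pack, lo, and hi are hypothetical helpers; the ordering (first operand into the upper half, second into the lower half) is taken from the 0xDEAD BEEF example above.

```python
def lo(reg):
    """Low half-word of a 32-bit register value."""
    return reg & 0xFFFF


def hi(reg):
    """High half-word of a 32-bit register value."""
    return (reg >> 16) & 0xFFFF


def pack(src_half_0, src_half_1):
    """Place src_half_0 in bits 31-16 and src_half_1 in bits 15-0."""
    return ((src_half_0 & 0xFFFF) << 16) | (src_half_1 & 0xFFFF)


r4, r5 = 0x1234DEAD, 0x5678BEEF
print(hex(pack(lo(r4), lo(r5))))  # low / low
print(hex(pack(hi(r4), lo(r5))))  # high / low
```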
Vector SEARCH
General Form
(dest_pointer_hi, dest_pointer_lo ) = SEARCH src_reg (searchmode)
Syntax
(Dreg, Dreg) = SEARCH Dreg (searchmode) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
This instruction is used in a loop to locate a maximum or minimum ele-
ment in an array of 16-bit packed data. Two values are tested at a time.
The Vector Search instruction compares two 16-bit, signed half-words to
values stored in the Accumulators. Then, it conditionally updates each
Accumulator and destination pointer based on the comparison.
Pointer register P0 is always the implied array pointer for the elements
being searched.
More specifically, the signed high half-word of src_reg is compared in
magnitude with the 16 low-order bits in A1. If src_reg_hi meets the com-
parison criterion, then A1 is updated with src_reg_hi, and the value in
pointer register P0 is stored in dest_pointer_hi. The same operation is
performed for src_reg_lo, A0, and dest_pointer_lo.
Based on the search mode specified in the syntax, the instruction tests for
maximum or minimum signed values.
Values are sign extended when copied into the Accumulator(s).
See “Example” for one way to implement the search loop. After the vector
search loop concludes, A1 and A0 hold the two surviving elements, and
dest_pointer_hi and dest_pointer_lo contain their respective addresses.
The next step is to select the final value from these two surviving elements.
Modes
The four supported compare modes are specified by the mandatory
searchmode flag.
(GT) Greater than. Find the location of the first maximum number in an array.
(GE) Greater than or equal. Find the location of the last maximum number in an array.
(LT) Less than. Find the location of the first minimum number in an array.
(LE) Less than or equal. Find the location of the last minimum number in an array.
Summary
Assumed Pointer P0
src_reg_hi Compared to least significant 16 bits of A1. If com-
pare condition is met, overwrites lower 16 bits of A1
and copies P0 into dest_pointer_hi.
src_reg_lo Compared to least significant 16 bits of A0. If com-
pare condition is met, overwrites lower 16 bits of A0
and copies P0 into dest_pointer_lo.
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with the combination of one
16-bit length load instruction to the P0 register and one 16-bit NOP. No
other instructions can be issued in parallel with the Vector Search
instruction.
Example
/* Initialize Accumulators with appropriate value for the type of
search. */
r0.l=0x7fff ;
r0.h=0 ;
a0=r0 ; /* max positive 16-bit value */
a1=r0 ; /* max positive 16-bit value */
/* Initialize R2. */
r2=[p0++] ;
LSETUP (loop, loop) LC0=P1>>2 ; /* set up the loop */
loop: (r1,r0) = SEARCH R2 (LE) || R2=[P0++];
/* search for the last minimum in all but the
last element of the array */
(r1,r0) = SEARCH R2 (LE);
/* finally, search the last element */
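The effect of the loop above, and the difference between the four search modes, can be sketched in Python. This is a simplified behavioural model, not a cycle-accurate one: it records the array index of the surviving element where the hardware records the value of P0, and the names vector_search and COMPARE are hypothetical.

```python
import operator

COMPARE = {"GT": operator.gt, "GE": operator.ge,
           "LT": operator.lt, "LE": operator.le}


def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def vector_search(words, mode):
    """Search packed 16-bit data; return ((a1, idx_hi), (a0, idx_lo))."""
    test = COMPARE[mode]
    # Start from the extreme value that any element can replace
    start = -0x8000 if mode in ("GT", "GE") else 0x7FFF
    a1 = a0 = start
    idx_hi = idx_lo = None
    for i, word in enumerate(words):
        hi, lo = to_s16(word >> 16), to_s16(word)
        if test(hi, a1):       # high half-word is compared with A1
            a1, idx_hi = hi, i
        if test(lo, a0):       # low half-word is compared with A0
            a0, idx_lo = lo, i
    return (a1, idx_hi), (a0, idx_lo)


data = [0x00050003, 0xFFFE0003, 0xFFFE0007]
print(vector_search(data, "LE"))  # last minimum in each half-word lane
print(vector_search(data, "LT"))  # first minimum in each half-word lane
```

After the search, the final step described in the Functional Description remains: compare the two survivors (here, for a minimum search, the smaller of a1 and a0) to select the overall result.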
Also See
Vector MAX, Vector MIN, MAX, MIN
Special Applications
This instruction is used in a loop to locate an element in a vector accord-
ing to the element’s value.
Arithmetic Operations
Add/Subtract – Prescale Up
MAX (Maximum)
MIN (Minimum)
Saturate
SIGNBITS
Logical Operations
Move
ROT (Rotate)
Vector Operations
VIT_MAX (Compare-Select)
Add on Sign
Vector PACK
Vector SEARCH
16-Bit Instructions
The two 16-bit instructions in a multi-issue instruction must each be from
Group1 and Group2 instructions shown in Table 15-3 and Table 15-4.
The following additional restrictions also apply to the 16-bit instructions
of the multi-issue instruction.
• Only one of the 16-bit instructions can be a store instruction.
• If the two 16-bit instructions are memory access instructions, then
both cannot use P-registers as address registers. In this case, at least
one memory access instruction must be an I-register version.
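The two restrictions above can be expressed as a simple check. The Python sketch below is hypothetical: the Slot record and its fields are invented for this illustration, it does not parse assembly text, and it does not verify Group1 or Group2 membership. It assumes SP and FP count as P-registers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Hypothetical description of one 16-bit instruction in a multi-issue line."""
    text: str
    is_store: bool = False
    address_reg: Optional[str] = None  # e.g. "p0" or "i1"; None if not a memory access


def uses_preg(reg):
    """True if the address register is a P-register (SP and FP assumed included)."""
    reg = reg.lower()
    return reg in ("sp", "fp") or reg.startswith("p")


def check_16bit_pair(first, second):
    """Return the list of violated restrictions (empty when the pair passes)."""
    problems = []
    if first.is_store and second.is_store:
        problems.append("only one of the 16-bit instructions can be a store")
    regs = [first.address_reg, second.address_reg]
    if all(r is not None for r in regs) and all(uses_preg(r) for r in regs):
        problems.append("two memory accesses cannot both use P-registers")
    return problems


ok = check_16bit_pair(Slot("r0=[i0++]", address_reg="i0"),
                      Slot("r2=[i1++]", address_reg="i1"))
bad = check_16bit_pair(Slot("r0=[p0++]", address_reg="p0"),
                       Slot("r2=[p1++]", address_reg="p1"))
print(ok, bad)
```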
Arithmetic Operations
Load / Store
Store Byte
Load / Store
Examples
Two Parallel Memory Access Instructions
/* Subtract-Absolute-Accumulate issued in parallel with the mem-
ory access instructions that fetch the data for the next SAA
instruction. This sequence is executed in a loop to flip-flop
back and forth between the data in R1 and R3, then the data in R0
and R2. */
saa (r1:0, r3:2) || r0=[i0++] || r2=[i1++] ;
saa (r1:0, r3:2)(r) || r1=[i0++] || r3=[i1++] ;
mnop || r1 = [i0++] || r3 = [i1++] ;
Table A-1 lists the Blackfin processor instruction set and the effect on flags
when these instructions execute on an ADSP-BF535 DSP. The symbol
definitions for the flag bits in the table are as follows:
• – indicates that the flag is NOT AFFECTED by execution of the
instruction
• * indicates that the flag is SET OR CLEARED depending on exe-
cution of the instruction
• ** indicates that the flag is CLEARED by execution of the
instruction
• U indicates that the flag state is UNDEFINED following execution
of the instruction; if the value of this bit is needed for program
execution, the program needs to check the bit prior to executing the
instruction with a U in a bit field.
Because the AC0, AC1, V, AV0, AV, and VS flags do not exist on
the ADSP-BF535, these flags do not appear in Table A-1.
Jump – – – – – –
IF CC JUMP – – – – – –
Call – – – – – –
LSETUP, LOOP – – – – – –
Load Immediate – – – – – –
Load Pointer Register – – – – – –
Store Byte – – – – – –
Move Conditional – – – – – –
--SP (Push) – – – – – –
SP++ (Pop) – – – – – –
LINK, UNLINK – – – – – –
Compare Pointer * – – – – –
Compare Accumulator * * * * U –
Move CC – * * * * *
Negate CC * – – – – –
& (AND) – * * ** ** –
^ (Exclusive-OR) – * * ** ** –
BXORSHIFT, BXOR * – – – – –
BITCLR – * * U U –
BITSET – U U U U –
BITTGL – * * U U –
BITTST * – – – – –
DEPOSIT – * * U U –
EXTRACT – * * U U –
BITMUX – U U – – –
ROT (Rotate) * – – – – –
Add/Subtract – Prescale Up – – – – – –
EXPADJ – U U – – –
MAX – * * U U –
MIN – * * U U –
Saturate – * * U U –
SIGNBITS – U U – – –
Subtract – * * * * –
Idle – – – – – –
Core Synchronize – – – – – –
System Synchronize – – – – – –
Disable Interrupts – – – – – –
Enable Interrupts – – – – – –
No Op – – – – – –
PREFETCH – – – – – –
FLUSH – – – – – –
FLUSHINV – – – – – –
IFLUSH – – – – – –
DISALGNEXCPT – – – – – –
Add on Sign – U U U U –
VIT_MAX (Compare-Select) – U U – – –
Vector MAX – * * U ** –
Vector MIN – * * U ** –
Vector Multiply – – – – U –
Vector PACK – U U – – –
Vector SEARCH – U U – – –