Carnegie Mellon

Bits, Bytes, and Integers

15-213: Introduction to Computer Systems
2nd and 3rd Lectures, Sep. 3 and Sep. 8, 2015

Randal E. Bryant and David R. O’Hallaron

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 1

Today: Bits, Bytes, and Integers

 Representing information as bits
 Bit-level manipulations
 Integers
▪ Representation: unsigned and signed
▪ Conversion, casting
▪ Expanding, truncating
▪ Addition, negation, multiplication, shifting
▪ Summary
 Representations in memory, pointers, strings


Everything is bits
 Each bit is 0 or 1
 By encoding/interpreting sets of bits in various ways
▪ Computers determine what to do (instructions)
▪ … and represent and manipulate numbers, sets, strings, etc…
 Why bits? Electronic Implementation
▪ Easy to store with bistable elements
▪ Reliably transmitted on noisy and inaccurate wires

For example, can count in binary

 Base 2 Number Representation
▪ Represent $15213_{10}$ as $11101101101101_2$
▪ Represent $1.20_{10}$ as $1.0011001100110011[0011]\ldots_2$
▪ Represent $1.5213 \times 10^4$ as $1.1101101101101_2 \times 2^{13}$


Encoding Byte Values

 Byte = 8 bits
▪ Binary: $00000000_2$ to $11111111_2$
▪ Decimal: $0_{10}$ to $255_{10}$
▪ Hexadecimal: $00_{16}$ to $FF_{16}$
  – Base 16 number representation
  – Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
  – Write $FA1D37B_{16}$ in C as 0xFA1D37B or 0xfa1d37b

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111


Example Data Representations

Sizes in bytes:

C Data Type   Typical 32-bit   Typical 64-bit   x86-64
char          1                1                1
short         2                2                2
int           4                4                4
long          4                8                8
float         4                4                4
double        8                8                8
long double   −                −                10/16
pointer       4                8                8


Boolean Algebra
 Developed by George Boole in 19th Century
▪ Algebraic representation of logic
▪ Encode “True” as 1 and “False” as 0
▪ And: A&B = 1 when both A=1 and B=1
▪ Or: A|B = 1 when either A=1 or B=1
▪ Not: ~A = 1 when A=0
▪ Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both


General Boolean Algebras

 Operate on Bit Vectors
▪ Operations applied bitwise
  01101001     01101001     01101001
& 01010101   | 01010101   ^ 01010101   ~ 01010101
  --------     --------     --------     --------
  01000001     01111101     00111100     10101010
 All of the Properties of Boolean Algebra Apply


Example: Representing & Manipulating Sets

 Representation
▪ Width w bit vector represents subsets of {0, …, w–1}
▪ $a_j = 1$ if $j \in A$

▪ 01101001 { 0, 3, 5, 6 }
  76543210 (bit positions)

▪ 01010101 { 0, 2, 4, 6 }
  76543210 (bit positions)

 Operations
▪ & Intersection 01000001 { 0, 6 }
▪ | Union 01111101 { 0, 2, 3, 4, 5, 6 }
▪ ^ Symmetric difference 00111100 { 2, 3, 4, 5 }
▪ ~ Complement 10101010 { 1, 3, 5, 7 }
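The set operations above can be sketched in C as follows. This is a minimal illustration for w = 8; the type name `bitset8` and the function names are assumptions made for this example and do not come from the slides.

```c
#include <stdint.h>

/* An 8-bit vector represents a subset of {0, ..., 7}:
   bit j is 1 exactly when j is in the set. */
typedef uint8_t bitset8;

bitset8 set_intersection(bitset8 a, bitset8 b) { return a & b; }
bitset8 set_union(bitset8 a, bitset8 b)        { return a | b; }
bitset8 set_symdiff(bitset8 a, bitset8 b)      { return a ^ b; }

/* ~a is computed as an int, so cast back to keep only the low 8 bits. */
bitset8 set_complement(bitset8 a)              { return (bitset8) ~a; }

/* Returns 1 if j is a member of the set, 0 otherwise (assumes j < 8). */
int set_contains(bitset8 a, unsigned j)        { return (a >> j) & 1; }
```

With a = 0x69 ({ 0, 3, 5, 6 }) and b = 0x55 ({ 0, 2, 4, 6 }), these functions reproduce the four results listed in the slide.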

Bit-Level Operations in C
 Operations &, |, ~, ^ Available in C
▪ Apply to any “integral” data type
▪ long, int, short, char, unsigned
▪ View arguments as bit vectors
▪ Arguments applied bit-wise
 Examples (Char data type)
▪ ~0x41 0xBE
~010000012 101111102

▪ ~0x00 0xFF
▪ ~000000002 111111112
▪ 0x69 & 0x55 0x41
▪ 011010012 & 010101012 010000012
▪ 0x69 | 0x55 0x7D
▪ 011010012 | 010101012 011111012


Contrast: Logic Operations in C

 Contrast to Logical Operators
▪ &&, ||, !
▪ View 0 as “False”
▪ Anything nonzero as “True”
▪ Always return 0 or 1
▪ Early termination

 Examples (char data type)

▪ !0x41 0x00
▪ !0x00 0x01
▪ !!0x41 0x01

▪ 0x69 && 0x55 0x01

▪ 0x69 || 0x55 0x01
▪ p && *p (avoids null pointer access)
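A small sketch of the difference between the bitwise and logical operators, including the early-termination idiom `p && *p`. The function names are illustrative only.

```c
#include <stddef.h>

/* Bitwise AND combines the two bit patterns position by position. */
int bitwise_and(int a, int b) { return a & b; }

/* Logical AND only asks whether both operands are nonzero; result is 0 or 1. */
int logical_and(int a, int b) { return a && b; }

/* Early termination: *p is evaluated only when p is non-null,
   so a null pointer is never dereferenced. */
int nonnull_and_nonzero(const int *p) { return p && *p; }
```

Note that 0x0F & 0xF0 is 0 while 0x0F && 0xF0 is 1, which is why swapping the two operators changes program behavior.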

Watch out for && vs. & (and || vs. |)… one of the more common oopsies in C programming

Shift Operations
 Left Shift: x << y
▪ Shift bit-vector x left y positions
  – Throw away extra bits on left
  – Fill with 0’s on right
 Right Shift: x >> y
▪ Shift bit-vector x right y positions
  – Throw away extra bits on right
▪ Logical shift
  – Fill with 0’s on left
▪ Arithmetic shift
  – Replicate most significant bit on left
 Examples

Argument x    01100010    10100010
<< 3          00010000    00010000
Log. >> 2     00011000    00101000
Arith. >> 2   00011000    11101000

 Undefined Behavior
▪ Shift amount < 0 or ≥ word size
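The two right-shift flavors can be sketched on 8-bit values as below. The function names are hypothetical, and the arithmetic shift is built from unsigned operations so the example does not depend on how a particular compiler shifts negative values.

```c
#include <stdint.h>

/* Left shift: bits pushed past position 7 are discarded by the cast. */
uint8_t shift_left(uint8_t x, unsigned k) { return (uint8_t) (x << k); }

/* Logical right shift: vacated positions on the left are filled with 0s. */
uint8_t shift_right_logical(uint8_t x, unsigned k) { return (uint8_t) (x >> k); }

/* Arithmetic right shift: vacated positions are filled with copies of the
   most significant bit. Assumes 0 <= k <= 7. */
uint8_t shift_right_arith(uint8_t x, unsigned k) {
    uint8_t shifted = (uint8_t) (x >> k);
    /* Mask with 1s in the top k positions, used only when the MSB is 1. */
    uint8_t fill = (x & 0x80) ? (uint8_t) ~(0xFFu >> k) : 0;
    return (uint8_t) (shifted | fill);
}
```

Applied to 0x62 (01100010) and 0xA2 (10100010), these give the rows of the example table.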

Encoding Integers
Unsigned: $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$
Two’s Complement: $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$
(the $-x_{w-1} \cdot 2^{w-1}$ term is the contribution of the sign bit)

short int x = 15213;
short int y = -15213;

 C short 2 bytes long

    Decimal   Hex     Binary
x   15213     3B 6D   00111011 01101101
y   -15213    C4 93   11000100 10010011

 Sign Bit
▪ For 2’s complement, most significant bit indicates sign
▪ 0 for nonnegative
▪ 1 for negative
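The two encoding formulas can be evaluated directly, bit by bit. The sketch below fixes w = 16 to match the `short` example; the names `b2u16` and `b2t16` are made up for this illustration.

```c
#include <stdint.h>

/* B2U for w = 16: sum of x_i * 2^i over all 16 bit positions. */
long b2u16(uint16_t bits) {
    long value = 0;
    for (int i = 0; i < 16; i++)
        if ((bits >> i) & 1)
            value += 1L << i;
    return value;
}

/* B2T for w = 16: same sum over bits 0..14, but bit 15 has weight -2^15. */
long b2t16(uint16_t bits) {
    long value = 0;
    for (int i = 0; i < 15; i++)
        if ((bits >> i) & 1)
            value += 1L << i;
    if ((bits >> 15) & 1)
        value -= 1L << 15;
    return value;
}
```

The same bit pattern 0xC493 reads as -15213 under B2T and as 50323 under B2U; the difference is exactly $2^{16}$.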


Two’s-complement Encoding Example (Cont.)

x = 15213: 00111011 01101101
y = -15213: 11000100 10010011

Weight   Bit of 15213   Contribution   Bit of -15213   Contribution
1        1              1              1               1
2        0              0              1               2
4        1              4              0               0
8        1              8              0               0
16       0              0              1               16
32       1              32             0               0
64       1              64             0               0
128      0              0              1               128
256      1              256            0               0
512      1              512            0               0
1024     0              0              1               1024
2048     1              2048           0               0
4096     1              4096           0               0
8192     1              8192           0               0
16384    0              0              1               16384
-32768   0              0              1               -32768
Sum                     15213                          -15213

Numeric Ranges
 Unsigned Values
▪ UMin = 0
▪ UMax = $2^w - 1$
 Two’s Complement Values
▪ TMin = $-2^{w-1}$
▪ TMax = $2^{w-1} - 1$
 Other Values
▪ Minus 1

Values for W = 16
Decimal Hex Binary
UMax 65535 FF FF 11111111 11111111
TMax 32767 7F FF 01111111 11111111
TMin -32768 80 00 10000000 00000000
-1 -1 FF FF 11111111 11111111
0 0 00 00 00000000 00000000


Values for Different Word Sizes

W      8      16        32               64
UMax   255    65,535    4,294,967,295    18,446,744,073,709,551,615
TMax   127    32,767    2,147,483,647    9,223,372,036,854,775,807
TMin   -128   -32,768   -2,147,483,648   -9,223,372,036,854,775,808

 Observations
▪ |TMin| = TMax + 1
  – Asymmetric range
▪ UMax = 2 * TMax + 1
 C Programming
▪ #include <limits.h>
▪ Declares constants, e.g., ULONG_MAX
▪ Values platform specific


Unsigned & Signed Numeric Values

X      B2U(X)   B2T(X)
0000   0        0
0001   1        1
0010   2        2
0011   3        3
0100   4        4
0101   5        5
0110   6        6
0111   7        7
1000   8        –8
1001   9        –7
1010   10       –6
1011   11       –5
1100   12       –4
1101   13       –3
1110   14       –2
1111   15       –1

 Equivalence
▪ Same encodings for nonnegative values
 Uniqueness
▪ Every bit pattern represents unique integer value
▪ Each representable integer has unique bit encoding
 Can Invert Mappings
▪ $U2B(x) = B2U^{-1}(x)$: bit pattern for unsigned integer
▪ $T2B(x) = B2T^{-1}(x)$: bit pattern for two’s comp integer

Mapping Between Signed & Unsigned

T2U: Two’s Complement x → T2B → bit pattern X → B2U → Unsigned ux
Maintain Same Bit Pattern

U2T: Unsigned ux → U2B → bit pattern X → B2T → Two’s Complement x
Maintain Same Bit Pattern

 Mappings between unsigned and two’s complement numbers:
Keep bit representations and reinterpret

Mapping Signed ↔ Unsigned

Bits   Signed   Unsigned   Relation
0000   0        0          =
0001   1        1          =
0010   2        2          =
0011   3        3          =
0100   4        4          =
0101   5        5          =
0110   6        6          =
0111   7        7          =
1000   -8       8          +/- 16
1001   -7       9          +/- 16
1010   -6       10         +/- 16
1011   -5       11         +/- 16
1100   -4       12         +/- 16
1101   -3       13         +/- 16
1110   -2       14         +/- 16
1111   -1       15         +/- 16

Relation between Signed & Unsigned

T2U: Two’s Complement x → T2B → bit pattern X → B2U → Unsigned ux
Maintain Same Bit Pattern

[Figure: bit positions w–1 down to 0. For ux every bit has positive weight. For x the bit at position w–1 has negative weight and all others are positive. The most significant bit changes from a large negative weight to a large positive weight.]

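Conversion Visualized
 2’s Comp. → Unsigned
▪ Ordering Inversion
▪ Negative → Big Positive

[Figure: the 2’s complement range drawn beside the unsigned range. Values 0 through TMax map to themselves; negative values map to TMax + 1 through UMax, with –1 landing on UMax.]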


Signed vs. Unsigned in C

 Constants
▪ By default are considered to be signed integers
▪ Unsigned if have “U” as suffix
0U, 4294967259U
 Casting
▪ Explicit casting between signed & unsigned same as U2T and T2U
int tx, ty;
unsigned ux, uy;
tx = (int) ux;
uy = (unsigned) ty;

▪ Implicit casting also occurs via assignments and procedure calls

tx = ux;
uy = ty;


Casting Surprises
 Expression Evaluation
▪ If there is a mix of unsigned and signed in single expression,
signed values implicitly cast to unsigned
▪ Including comparison operations <, >, ==, <=, >=
▪ Examples for W = 32: TMIN = -2,147,483,648 , TMAX = 2,147,483,647

Constant1       Constant2           Relation   Evaluation
0               0U                  ==         unsigned
-1              0                   <          signed
-1              0U                  >          unsigned
2147483647      -2147483647-1       >          signed
2147483647U     -2147483647-1       <          unsigned
-1              -2                  >          signed
(unsigned) -1   -2                  >          unsigned
2147483647      2147483648U         <          unsigned
2147483647      (int) 2147483648U   >          signed
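The sketch below contrasts a signed comparison with the comparison C performs once one operand is unsigned. The helper names are hypothetical; the cast in `less_mixed` spells out the conversion that the compiler would otherwise apply implicitly.

```c
#include <limits.h>

/* Returns 1 if a < b when both operands are compared as signed ints. */
int less_signed(int a, int b) { return a < b; }

/* Returns 1 if a < b under the rule for mixed expressions:
   the signed operand is converted to unsigned before comparing. */
int less_mixed(int a, unsigned b) { return (unsigned) a < b; }
```

For example, -1 < 0 holds as a signed comparison, but against 0U the value -1 becomes UMax, so the comparison flips.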

Casting Signed ↔ Unsigned: Basic Rules
 Bit pattern is maintained
 But reinterpreted
 Can have unexpected effects: adding or subtracting $2^w$

 Expression containing signed and unsigned int

▪ int is cast to unsigned!!


Sign Extension
 Task:
▪ Given w-bit signed integer x
▪ Convert it to w+k-bit integer with same value
 Rule:
▪ Make k copies of sign bit:
▪ $X' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0$, where the first k entries are copies of the MSB

[Figure: the w-bit pattern X is widened to the (k+w)-bit pattern X' by placing k copies of the MSB on the left.]

Sign Extension Example

short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;

Decimal Hex Binary

x 15213 3B 6D 00111011 01101101
ix 15213 00 00 3B 6D 00000000 00000000 00111011 01101101
y -15213 C4 93 11000100 10010011
iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011

 Converting from smaller to larger integer data type

 C automatically performs sign extension
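A minimal sketch of widening conversions, using fixed-width types so the sizes match the 16-bit to 32-bit example above. The function names are assumptions for illustration.

```c
#include <stdint.h>

/* Widening a signed value replicates the sign bit (sign extension). */
int32_t widen_signed(int16_t x) { return x; }

/* Widening an unsigned value fills the new high bits with zeros. */
uint32_t widen_unsigned(uint16_t x) { return x; }

/* Bit pattern of a 32-bit signed value, for inspection.
   Conversion to unsigned is defined as reduction modulo 2^32. */
uint32_t bits32(int32_t x) { return (uint32_t) x; }
```

Both conversions preserve the numeric value; only the signed one produces the leading FF FF bytes seen for iy.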


Expanding, Truncating: Basic Rules
 Expanding (e.g., short int to int)
▪ Unsigned: zeros added
▪ Signed: sign extension
▪ Both yield expected result

 Truncating (e.g., unsigned to unsigned short)

▪ Unsigned/signed: bits are truncated
▪ Result reinterpreted
▪ Unsigned: mod operation
▪ Signed: similar to mod
▪ For small numbers yields expected behavior
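Truncation from 32 to 16 bits can be sketched as follows. The function names are hypothetical. The signed version reinterprets the low 16 bits by hand, because converting an out-of-range value to a signed type is implementation-defined in C.

```c
#include <stdint.h>

/* Truncating unsigned: keep the low 16 bits, i.e. x mod 2^16. */
uint16_t truncate_unsigned(uint32_t x) { return (uint16_t) x; }

/* Truncating signed: keep the low 16 bits, then read them as two's complement. */
int16_t truncate_signed(int32_t x) {
    uint16_t low = (uint16_t) (uint32_t) x;
    /* If bit 15 is set, the pattern stands for low - 2^16. */
    return (low & 0x8000u) ? (int16_t) (low - 0x10000L) : (int16_t) low;
}
```

Small values such as 15213 and -15213 survive unchanged, while 53191 turns into -12345 because bit 15 of its low half is set.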


Unsigned Addition
[Figure: operands u and v are w bits each; the true sum u + v needs w+1 bits; discarding the carry leaves the w-bit result $UAdd_w(u, v)$.]

 Standard Addition Function
▪ Ignores carry output
 Implements Modular Arithmetic
$s = UAdd_w(u, v) = (u + v) \bmod 2^w$
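In C, unsigned addition already behaves as $UAdd_w$, so overflow can be detected after the fact. A short sketch, with illustrative function names:

```c
#include <limits.h>

/* UAdd_w for w = width of unsigned: C defines unsigned arithmetic modulo 2^w. */
unsigned uadd(unsigned u, unsigned v) { return u + v; }

/* Returns 1 if u + v fits in w bits.
   When the sum wraps around, the result is smaller than either operand. */
int uadd_ok(unsigned u, unsigned v) { return u + v >= u; }
```

For instance, `uadd(UINT_MAX, 1u)` yields 0 and `uadd_ok(UINT_MAX, 1u)` reports the overflow.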


Visualizing (Mathematical) Integer Addition

 Integer Addition
▪ 4-bit integers u, v
▪ Compute true sum $Add_4(u, v)$
▪ Values increase linearly with u and v
▪ Forms planar surface

[Figure: surface plot of $Add_4(u, v)$ over u and v.]

Visualizing Unsigned Addition

 Wraps Around
▪ If true sum ≥ $2^w$
▪ At most once

[Figure: true sum versus modular sum, with overflow starting at $2^w$ and the true sum bounded by $2^{w+1}$; surface plot of $UAdd_4(u, v)$ over u and v showing the overflow region.]

Two’s Complement Addition

[Figure: operands u and v are w bits each; the true sum u + v needs w+1 bits; discarding the carry leaves the w-bit result $TAdd_w(u, v)$.]

 TAdd and UAdd have Identical Bit-Level Behavior
▪ Signed vs. unsigned addition in C:
int s, t, u, v;
s = (int) ((unsigned) u + (unsigned) v);
t = u + v;
▪ Will give s == t
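A hedged sketch of two's complement addition and an overflow check. Signed overflow is undefined behavior in the C standard, so `tadd` routes the addition through unsigned arithmetic; converting the result back to `int` is implementation-defined for out-of-range values, and this example assumes the common behavior of keeping the bit pattern. The names `tadd` and `tadd_ok` are chosen for this example.

```c
#include <limits.h>

/* TAdd_w computed through unsigned addition, which has the same bit-level result. */
int tadd(int u, int v) { return (int) ((unsigned) u + (unsigned) v); }

/* Returns 1 if u + v does not overflow. The check never performs
   the overflowing signed addition itself. */
int tadd_ok(int u, int v) {
    if (v > 0 && u > INT_MAX - v) return 0;  /* positive overflow */
    if (v < 0 && u < INT_MIN - v) return 0;  /* negative overflow */
    return 1;
}
```

Checking with `tadd_ok` before adding is one way to keep a program clear of the undefined case.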


TAdd Overflow
 Functionality
▪ True sum requires w+1 bits
▪ Drop off MSB
▪ Treat remaining bits as 2’s comp. integer

[Figure: the true sum (w+1 bits, between $-2^w$ and $2^w - 1$) mapped onto the TAdd result (w bits, between $-2^{w-1}$ and $2^{w-1} - 1$). True sums above $2^{w-1} - 1$ are PosOver; true sums below $-2^{w-1}$ are NegOver.]

Visualizing 2’s Complement Addition

 Values
▪ 4-bit two’s comp.
▪ Range from -8 to +7
 Wraps Around
▪ If sum ≥ $2^{w-1}$
  – Becomes negative
  – At most once
▪ If sum < $-2^{w-1}$
  – Becomes positive
  – At most once

[Figure: surface plot of $TAdd_4(u, v)$ over u and v, with the NegOver and PosOver regions marked.]

Multiplication

 Goal: Computing Product of w-bit numbers x, y
▪ Either signed or unsigned
 But, exact results can be bigger than w bits
▪ Unsigned: up to 2w bits
  – Result range: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
▪ Two’s complement min (negative): up to 2w–1 bits
  – Result range: $x \cdot y \ge (-2^{w-1}) \cdot (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
▪ Two’s complement max (positive): up to 2w bits, but only for $(TMin_w)^2$
  – Result range: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
 So, maintaining exact results…
▪ would need to keep expanding word size with each product computed
▪ is done in software, if needed
  – e.g., by “arbitrary precision” arithmetic packages


Unsigned Multiplication in C
[Figure: operands u and v are w bits each; the true product u · v needs 2w bits; discarding the high-order w bits leaves the w-bit result $UMult_w(u, v)$.]

 Standard Multiplication Function
▪ Ignores high order w bits
 Implements Modular Arithmetic
$UMult_w(u, v) = (u \cdot v) \bmod 2^w$


Signed Multiplication in C
[Figure: operands u and v are w bits each; the true product u · v needs 2w bits; discarding the high-order w bits leaves the w-bit result $TMult_w(u, v)$.]

 Standard Multiplication Function

▪ Ignores high order w bits
▪ Some of which are different for signed
vs. unsigned multiplication
▪ Lower bits are the same


Power-of-2 Multiply with Shift

 Operation
▪ u << k gives $u \cdot 2^k$
▪ Both signed and unsigned

[Figure: operand u is w bits; the true product $u \cdot 2^k$ needs w+k bits; discarding the high-order k bits leaves the w-bit results $UMult_w(u, 2^k)$ and $TMult_w(u, 2^k)$, whose low-order k bits are 0.]
 Examples
▪ u << 3 == u * 8
▪ (u << 5) – (u << 3)== u * 24
▪ Most machines shift and add faster than multiply
▪ Compiler generates this code automatically
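The two examples can be written out as functions. This is only a sketch with made-up names; unsigned operands are used so that wraparound is well defined.

```c
/* u * 8 as a single shift. */
unsigned times8(unsigned u) { return u << 3; }

/* u * 24 as (u << 5) - (u << 3), that is 32u - 8u. */
unsigned times24(unsigned u) { return (u << 5) - (u << 3); }
```

Because unsigned arithmetic is modular, both functions agree with `u * 8` and `u * 24` even when the product wraps.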


Unsigned Power-of-2 Divide with Shift

 Quotient of Unsigned by Power of 2
▪ u >> k gives $\lfloor u / 2^k \rfloor$
▪ Uses logical shift

[Figure: dividing u by $2^k$ moves the binary point k positions to the left; the shift drops the fractional bits, giving $\lfloor u / 2^k \rfloor$.]

         Division     Computed   Hex     Binary
x        15213        15213      3B 6D   00111011 01101101
x >> 1   7606.5       7606       1D B6   00011101 10110110
x >> 4   950.8125     950        03 B6   00000011 10110110
x >> 8   59.4257813   59         00 3B   00000000 00111011
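The table rows can be reproduced with a one-line function. The name `udiv_pow2` is hypothetical.

```c
/* Floor of u / 2^k for unsigned u.
   Assumes k is smaller than the width of unsigned in bits. */
unsigned udiv_pow2(unsigned u, unsigned k) { return u >> k; }
```

For unsigned operands this matches the `/` operator exactly, since both round toward zero.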


Arithmetic: Basic Rules

 Addition:
▪ Unsigned/signed: Normal addition followed by truncate, same operation on bit level
▪ Unsigned: addition mod $2^w$
  – Mathematical addition + possible subtraction of $2^w$
▪ Signed: modified addition mod $2^w$ (result in proper range)
  – Mathematical addition + possible addition or subtraction of $2^w$

 Multiplication:
▪ Unsigned/signed: Normal multiplication followed by truncate, same operation on bit level
▪ Unsigned: multiplication mod $2^w$
▪ Signed: modified multiplication mod $2^w$ (result in proper range)


Why Should I Use Unsigned?

 Don’t use without understanding implications
▪ Easy to make mistakes
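unsigned i;
/* Bug: i is unsigned, so i >= 0 is always true and the loop never terminates */
for (i = cnt-2; i >= 0; i--)
    a[i] += a[i+1];

▪ Can be very subtle

#define DELTA sizeof(int)
int i;
/* Bug: sizeof yields an unsigned value, so i-DELTA is evaluated as unsigned and is always >= 0 */
for (i = CNT; i-DELTA >= 0; i-= DELTA)
    . . .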


Counting Down with Unsigned

 Proper way to use unsigned as loop index
unsigned i;
for (i = cnt-2; i < cnt; i--)
a[i] += a[i+1];
 See Robert Seacord, Secure Coding in C and C++
▪ C Standard guarantees that unsigned addition will behave like modular arithmetic
▪ 0 – 1 → UMax

 Even better
size_t i;
for (i = cnt-2; i < cnt; i--)
a[i] += a[i+1];
▪ Data type size_t defined as unsigned value with length = word size
▪ Code will work even if cnt = UMax
▪ What if cnt is signed and < 0?

Why Should I Use Unsigned? (cont.)

 Do Use When Performing Modular Arithmetic
▪ Multiprecision arithmetic
 Do Use When Using Bits to Represent Sets
▪ Logical right shift, no sign extension


Byte-Oriented Memory Organization


 Programs refer to data by address

▪ Conceptually, envision it as a very large array of bytes
In reality, it’s not, but can think of it that way

▪ An address is like an index into that array
▪ and, a pointer variable stores an address

 Note: system provides private address spaces to each “process”

▪ Think of a process as a program being executed
▪ So, a program can clobber its own data, but not that of others


Machine Words
 Any given computer has a “Word Size”
▪ Nominal size of integer-valued data
▪ and of addresses

▪ Until recently, most machines used 32 bits (4 bytes) as word size
  – Limits addresses to 4GB ($2^{32}$ bytes)
▪ Increasingly, machines have 64-bit word size
  – Potentially, could have 18 EB (exabytes) of addressable memory
  – That’s $18.4 \times 10^{18}$ bytes

▪ Machines still support multiple data formats

▪ Fractions or multiples of word size
▪ Always integral number of bytes

Word-Oriented Memory Organization

 Addresses Specify Byte Locations
▪ Address of first byte in word
▪ Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

[Figure: memory drawn as a column of bytes labeled with their addresses, grouped into 32-bit words and 64-bit words; each word is labeled with the address of its first byte.]

Byte Ordering
 So, how are the bytes within a multi-byte word ordered in memory?
 Conventions
▪ Big Endian: Sun, PPC Mac, Internet
  – Least significant byte has highest address

▪ Little Endian: x86, ARM processors running Android, iOS, and
▪ Least significant byte has lowest address


Byte Ordering Example

 Example
▪ Variable x has 4-byte value of 0x01234567
▪ Address given by &x is 0x100

Address         0x100   0x101   0x102   0x103
Big Endian      01      23      45      67
Little Endian   67      45      23      01
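A small sketch for observing byte order on the machine at hand, using the same value 0x01234567. The helper names are assumptions for this example.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 4 bytes of x into out, lowest address first. */
void bytes_of(uint32_t x, unsigned char out[4]) { memcpy(out, &x, 4); }

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void) {
    unsigned char b[4];
    bytes_of(0x01234567u, b);
    return b[0] == 0x67;   /* least significant byte stored at lowest address */
}
```

On x86-64 the bytes come out as 67 45 23 01, matching the Little Endian row above.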


Representing Integers

Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D

Bytes are listed from lowest to highest address.

int A = 15213;
  IA32, x86-64:  6D 3B 00 00
  Sun:           00 00 3B 6D

int B = -15213;
  IA32, x86-64:  93 C4 FF FF
  Sun:           FF FF C4 93
  Two’s complement representation

long int C = 15213;
  IA32:          6D 3B 00 00
  x86-64:        6D 3B 00 00 00 00 00 00
  Sun:           00 00 3B 6D


Examining Data Representations

 Code to Print Byte Representation of Data
▪ Casting pointer to unsigned char * allows treatment as a byte array
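#include <stdio.h>

typedef unsigned char *pointer;

/* Print the address and value of each of the len bytes starting at start */
void show_bytes(pointer start, size_t len) {
    size_t i;
    for (i = 0; i < len; i++)
        printf("%p\t0x%.2x\n", (void *) (start + i), start[i]);
}

Printf directives:
%p: Print pointer
%x: Print Hexadecimal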


show_bytes Execution Example

int a = 15213;
printf("int a = 15213;\n");
show_bytes((pointer) &a, sizeof(int));

Result (Linux x86-64):

int a = 15213;
0x7fffb7f71dbc 6d
0x7fffb7f71dbd 3b
0x7fffb7f71dbe 00
0x7fffb7f71dbf 00


Representing Pointers
int B = -15213;
int *P = &B;

[Figure: byte-level contents of P on Sun, IA32, and x86-64; the stored address differs on each.]

Different compilers & machines assign different locations to objects

You may even get different results each time you run the program

Representing Strings
char S[6] = "18213";
 Strings in C
▪ Represented by array of characters
▪ Each character encoded in ASCII format
  – Standard 7-bit encoding of character set
  – Character “0” has code 0x30
  – Digit i has code 0x30+i
▪ String should be null-terminated
  – Final character = 0
 Compatibility
▪ Byte ordering not an issue

Bytes of S (identical on IA32 and Sun): 31 38 32 31 33 00


Integer C Puzzles
Initialization
int x = foo();
int y = bar();
unsigned ux = x;
unsigned uy = y;

• x < 0 ⇒ ((x*2) < 0)
• ux >= 0
• x & 7 == 7 ⇒ (x<<30) < 0
• ux > -1
• x > y ⇒ -x < -y
• x * x >= 0
• x > 0 && y > 0 ⇒ x + y > 0
• x >= 0 ⇒ -x <= 0
• x <= 0 ⇒ -x >= 0
• (x|-x)>>31 == -1
• ux >> 3 == ux/8
• x >> 3 == x/8
• x & (x-1) != 0


Bonus extras


Application of Boolean Algebra

 Applied to Digital Systems by Claude Shannon
▪ 1937 MIT Master’s Thesis
▪ Reason about networks of relay switches
▪ Encode closed switch as 1, open switch as 0

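[Figure: relay network with two parallel paths, one with switches A and ~B in series, the other with ~A and B in series.]

Connection when A&~B | ~A&B = A^B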


Binary Number Property

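$1 + 1 + 2 + 4 + 8 + \ldots + 2^{w-1} = 2^w$, that is, $1 + \sum_{i=0}^{w-1} 2^i = 2^w$

 w = 0:
▪ $1 = 2^0$
 Assume true for w-1:
▪ $1 + 1 + 2 + 4 + 8 + \ldots + 2^{w-1} + 2^w = 2^w + 2^w = 2^{w+1}$, because $1 + 1 + 2 + 4 + 8 + \ldots + 2^{w-1} = 2^w$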


Code Security Example

#include <string.h>

/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
    /* Byte count len is minimum of buffer size and maxlen */
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}

 Similar to code found in FreeBSD’s implementation of

 There are legions of smart people trying to find
vulnerabilities in programs

Typical Usage
#define MSIZE 528

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, MSIZE);
    printf("%s\n", mybuf);
}


Malicious Usage

/* Declaration of library function memcpy */
void *memcpy(void *dest, void *src, size_t n);

#define MSIZE 528

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, -MSIZE);
    . . .
}
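With a negative maxlen, the test KSIZE < maxlen is false, so len stays negative; memcpy then receives it as a size_t, where it becomes a very large unsigned count. One possible hardening is sketched below. The function name `copy_from_kernel_checked` and the choice to return -1 on bad input are assumptions for this example, not the fix applied in any particular kernel.

```c
#include <string.h>

/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Hypothetical hardened variant: reject negative requests before the
   length reaches memcpy, whose third parameter is an unsigned size_t. */
int copy_from_kernel_checked(void *user_dest, int maxlen) {
    if (maxlen < 0)
        return -1;
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, (size_t) len);
    return len;
}
```

Declaring the length parameter as an unsigned type in the first place is another option, since the comparison against KSIZE would then clamp any huge value.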


Mathematical Properties
 Modular Addition Forms an Abelian Group
▪ Closed under addition
$0 \le UAdd_w(u, v) \le 2^w - 1$
▪ Commutative
$UAdd_w(u, v) = UAdd_w(v, u)$
▪ Associative
$UAdd_w(t, UAdd_w(u, v)) = UAdd_w(UAdd_w(t, u), v)$
▪ 0 is additive identity
$UAdd_w(u, 0) = u$
▪ Every element has additive inverse
  – Let $UComp_w(u) = 2^w - u$
$UAdd_w(u, UComp_w(u)) = 0$


Mathematical Properties of TAdd

 Isomorphic Group to unsigneds with UAdd
▪ TAddw(u , v) = U2T(UAddw(T2U(u ), T2U(v)))
▪ Since both have identical bit patterns

 Two’s Complement Under TAdd Forms a Group

▪ Closed, Commutative, Associative, 0 is additive identity
▪ Every element has additive inverse
$TComp_w(u) = \begin{cases} -u & u \ne TMin_w \\ TMin_w & u = TMin_w \end{cases}$


Characterizing TAdd
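 Functionality
▪ True sum requires w+1 bits
▪ Drop off MSB
▪ Treat remaining bits as 2’s comp. integer

$TAdd_w(u, v) = \begin{cases} u + v + 2^w & u + v < TMin_w \text{ (NegOver)} \\ u + v & TMin_w \le u + v \le TMax_w \\ u + v - 2^w & TMax_w < u + v \text{ (PosOver)} \end{cases}$

[Figure: the (u, v) plane divided by the signs of u and v. Positive Overflow lies in the region where both are > 0, Negative Overflow in the region where both are < 0.]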


Negation: Complement & Increment

 Claim: Following Holds for 2’s Complement
~x + 1 == -x
 Complement
▪ Observation: ~x + x == 1111…111 == -1

    x    10011101
+  ~x    01100010
   -1    11111111
 Complete Proof?
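One way to see why the claim holds: since ~x + x == -1, adding 1 to both sides gives ~x + x + 1 == 0, so ~x + 1 is the additive inverse of x. The sketch below checks this on 16-bit patterns; the name `negate_bits16` is an assumption, and the work is done on unsigned values so no signed overflow can occur.

```c
#include <stdint.h>

/* Negation as complement-and-increment on a 16-bit pattern.
   The cast keeps only the low 16 bits of the result. */
uint16_t negate_bits16(uint16_t x) { return (uint16_t) (~x + 1u); }
```

Note that 0x8000 (TMin for w = 16) maps to itself, which matches the special case in $TComp_w$.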


Complement & Increment Examples

x = 15213
Decimal Hex Binary
x 15213 3B 6D 00111011 01101101
~x -15214 C4 92 11000100 10010010
~x+1 -15213 C4 93 11000100 10010011
y -15213 C4 93 11000100 10010011

Decimal Hex Binary
0 0 00 00 00000000 00000000
~0 -1 FF FF 11111111 11111111
~0+1 0 00 00 00000000 00000000


Code Security Example #2

 SUN XDR library
▪ Widely used library for transferring data between machines

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);


malloc(ele_cnt * ele_size)


XDR Code
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
    /*
     * Allocate buffer for ele_cnt objects, each of ele_size bytes
     * and copy from locations designated by ele_src
     */
    void *result = malloc(ele_cnt * ele_size);
    if (result == NULL)
        /* malloc failed */
        return NULL;
    void *next = result;
    int i;
    for (i = 0; i < ele_cnt; i++) {
        /* Copy object i to destination */
        memcpy(next, ele_src[i], ele_size);
        /* Move pointer to next memory region */
        next += ele_size;
    }
    return result;
}


XDR Vulnerability
malloc(ele_cnt * ele_size)

 What if:
▪ ele_cnt = 2^20 + 1
▪ ele_size = 4096 = 2^12
▪ Allocation = ??

 How can I make this function secure?
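Assuming size_t is 32 bits wide, the product (2^20 + 1) * 2^12 = 2^32 + 2^12 is truncated to 4096, so only 4096 bytes are allocated while the loop copies more than 4 GB. One possible fix, sketched below, is to reject the request when the multiplication would overflow. The names truncated_alloc32, mul_overflows, and copy_elements_checked are hypothetical and are not part of the XDR library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* What a 32-bit size_t would hold after cnt * size (wraps mod 2^32). */
uint32_t truncated_alloc32(uint32_t cnt, uint32_t size) {
    return (uint32_t)(cnt * size);
}

/* Nonzero if a * b does not fit in size_t. */
int mul_overflows(size_t a, size_t b) {
    return b != 0 && a > SIZE_MAX / b;
}

/* Variant of copy_elements that refuses negative counts and overflowing sizes. */
void *copy_elements_checked(void *ele_src[], int ele_cnt, size_t ele_size) {
    if (ele_cnt < 0 || mul_overflows((size_t)ele_cnt, ele_size))
        return NULL;
    unsigned char *result = malloc((size_t)ele_cnt * ele_size);
    if (result == NULL)
        return NULL;
    unsigned char *next = result;
    for (int i = 0; i < ele_cnt; i++) {
        memcpy(next, ele_src[i], ele_size);  /* copy object i */
        next += ele_size;                    /* advance to next region */
    }
    return result;
}
```

Dividing SIZE_MAX by one operand avoids performing the overflowing multiplication in the first place; other checks (for example, widening to 64 bits) would work as well.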


Compiled Multiplication Code

C Function
long mul12(long x)
{
    return x*12;
}

Compiled Arithmetic Operations      Explanation
leaq (%rax,%rax,2), %rax            t <- x+x*2
salq $2, %rax                       return t << 2;

 C compiler automatically generates shift/add code when multiplying by constant
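The same shift/add decomposition can be written by hand. The sketch below uses unsigned long rather than long so that the shifts are well defined for every input; the function name mul12_shift_add is illustrative.

```c
/* 12 * x computed as (x + 2x) << 2, mirroring the leaq/salq pair. */
unsigned long mul12_shift_add(unsigned long x) {
    unsigned long t = x + (x << 1);   /* t = 3x, like leaq (%rax,%rax,2) */
    return t << 2;                    /* 4 * 3x = 12x, like salq $2 */
}
```

For example, mul12_shift_add(5) evaluates to 60.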


Compiled Unsigned Division Code

C Function
unsigned long udiv8(unsigned long x)
{
    return x/8;
}

Compiled Arithmetic Operations      Explanation
shrq $3, %rax                       # Logical shift
                                    return x >> 3;

 Uses logical shift for unsigned

 For Java Users
▪ Logical shift written as >>>
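For unsigned operands in C, >> is always a logical shift, so the shift and the division agree for every input. A minimal sketch, with udiv_pow2 as an illustrative name and k assumed to be smaller than the bit width of unsigned long:

```c
/* Unsigned division by 2^k via logical right shift. */
unsigned long udiv_pow2(unsigned long x, unsigned k) {
    return x >> k;   /* zero-fills from the left for unsigned types */
}
```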


Signed Power-of-2 Divide with Shift

 Quotient of Signed by Power of 2
▪ x >> k gives ⌊x / 2^k⌋
▪ Uses arithmetic shift
▪ Rounds wrong direction when x < 0

[Figure: operand x divided by 2^k; the low k bits fall to the right of the binary point and are discarded. Result: RoundDown(x / 2^k)]

            Division       Computed   Hex     Binary
y           -15213         -15213     C4 93   11000100 10010011
y >> 1      -7606.5        -7607      E2 49   11100010 01001001
y >> 4      -950.8125      -951       FC 49   11111100 01001001
y >> 8      -59.4257813    -60        FF C4   11111111 11000100
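The table can be checked with a short C sketch. Right-shifting a negative signed value is implementation-defined in C; the sketch assumes an arithmetic shift, which is what mainstream compilers generate. The names shift_div16 and trunc_div16 are illustrative.

```c
#include <stdint.h>

/* Arithmetic right shift: rounds toward negative infinity. */
int16_t shift_div16(int16_t x, int k) {
    return (int16_t)(x >> k);          /* assumes arithmetic shift for x < 0 */
}

/* C integer division: rounds toward zero. */
int16_t trunc_div16(int16_t x, int k) {
    return (int16_t)(x / (1 << k));
}
```

With x = -15213 and k = 4, the shift gives -951 while C's / operator gives -950, which is the wrong-direction rounding the slide refers to.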


Correct Power-of-2 Divide

 Quotient of Negative Number by Power of 2
▪ Want ⌈x / 2^k⌉ (Round Toward 0)
▪ Compute as ⌊(x + 2^k − 1) / 2^k⌋
▪ In C: (x + (1<<k)-1) >> k
▪ Biases dividend toward 0

Case 1: No rounding
[Figure: the low k bits of the dividend are all 0; adding 2^k − 1 changes only those bits, which lie to the right of the binary point and are discarded by the shift]
Biasing has no effect


Correct Power-of-2 Divide (Cont.)

Case 2: Rounding
[Figure: the low k bits of the dividend are not all 0; adding 2^k − 1 carries into bit k, so the bits to the left of the binary point are incremented by 1]
Biasing adds 1 to final result
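Both cases can be exercised with a small C sketch of the biased shift. It assumes an arithmetic right shift for negative values and 0 <= k < 62; the name div_pow2_toward_zero is illustrative.

```c
/* Divide by 2^k, rounding toward zero, using bias-then-shift. */
long div_pow2_toward_zero(long x, int k) {
    long bias = (x < 0) ? ((1L << k) - 1) : 0;   /* bias only negative x */
    return (x + bias) >> k;                      /* assumes arithmetic shift */
}
```

For x = -16 and k = 4 (Case 1) the bias has no effect and the result is -1; for x = -15213 and k = 4 (Case 2) the bias turns -951 into -950, matching C's x / 16.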

Compiled Signed Division Code

C Function
long idiv8(long x)
{
    return x/8;
}

Compiled Arithmetic Operations      Explanation
  testq %rax, %rax                  if x < 0
  js    L4                            x += 7;
L3:                                 # Arithmetic shift
  sarq  $3, %rax                    return x >> 3;
  addq  $7, %rax
  jmp   L3

 Uses arithmetic shift for int

 For Java Users
▪ Arith. shift written as >>


Arithmetic: Basic Rules

 Unsigned ints, 2’s complement ints are isomorphic rings:
isomorphism = casting

 Left shift
▪ Unsigned/signed: multiplication by 2^k
▪ Always logical shift

 Right shift
▪ Unsigned: logical shift, div (division + round to zero) by 2^k
▪ Signed: arithmetic shift
  ▪ Positive numbers: div (division + round to zero) by 2^k
  ▪ Negative numbers: div (division + round away from zero) by 2^k
    Use biasing to fix


Properties of Unsigned Arithmetic

 Unsigned Multiplication with Addition Forms Commutative Ring
▪ Addition is commutative group
▪ Closed under multiplication
  0 ≤ UMult_w(u, v) ≤ 2^w − 1
▪ Multiplication is Commutative
  UMult_w(u, v) = UMult_w(v, u)
▪ Multiplication is Associative
  UMult_w(t, UMult_w(u, v)) = UMult_w(UMult_w(t, u), v)
▪ 1 is multiplicative identity
  UMult_w(u, 1) = u
▪ Multiplication distributes over addition
  UMult_w(t, UAdd_w(u, v)) = UAdd_w(UMult_w(t, u), UMult_w(t, v))


Properties of Two’s Comp. Arithmetic

 Isomorphic Algebras
▪ Unsigned multiplication and addition
  ▪ Truncating to w bits
▪ Two’s complement multiplication and addition
  ▪ Truncating to w bits

 Both Form Rings
▪ Isomorphic to ring of integers mod 2^w

 Comparison to (Mathematical) Integer Arithmetic
▪ Both are rings
▪ Integers obey ordering properties, e.g.,
  u > 0 ⇒ u + v > v
  u > 0, v > 0 ⇒ u · v > 0
▪ These properties are not obeyed by two’s comp. arithmetic (16-bit words)
  TMax + 1 == TMin
  15213 * 30426 == -10030
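Both counterexamples can be reproduced with 16-bit wraparound arithmetic in C. The sketch computes in unsigned 32-bit arithmetic and then truncates, since signed overflow is undefined in C; the conversion back to int16_t is implementation-defined and assumed to wrap. The names wrap_add16 and wrap_mul16 are illustrative.

```c
#include <stdint.h>

/* 16-bit two's complement addition with wraparound. */
int16_t wrap_add16(int16_t u, int16_t v) {
    uint32_t s = (uint32_t)u + (uint32_t)v;   /* modular arithmetic, no UB */
    return (int16_t)(uint16_t)s;              /* keep the low 16 bits */
}

/* 16-bit two's complement multiplication with wraparound. */
int16_t wrap_mul16(int16_t u, int16_t v) {
    uint32_t p = (uint32_t)u * (uint32_t)v;
    return (int16_t)(uint16_t)p;
}
```

Here the true product 15213 * 30426 = 462870738 leaves 55506 after truncation to 16 bits, which two's complement reads as 55506 − 65536 = -10030.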

Reading Byte-Reversed Listings

 Disassembly
▪ Text representation of binary machine code
▪ Generated by program that reads the machine code
 Example Fragment
Address     Instruction Code          Assembly Rendition
8048365:    5b                        pop    %ebx
8048366:    81 c3 ab 12 00 00         add    $0x12ab,%ebx
804836c:    83 bb 28 00 00 00 00      cmpl   $0x0,0x28(%ebx)

 Deciphering Numbers
▪ Value: 0x12ab
▪ Pad to 32 bits: 0x000012ab
▪ Split into bytes: 00 00 12 ab
▪ Reverse: ab 12 00 00
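Going the other way, from the bytes in the listing back to the value, amounts to a little-endian decode. A minimal sketch, with le32_decode as an illustrative name:

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant first. */
uint32_t le32_decode(const unsigned char b[4]) {
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Applied to the immediate bytes ab 12 00 00 from the add instruction, this recovers 0x000012ab.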
