02 03 Bits Ints
Instructors:
Randal E. Bryant and David R. O’Hallaron
Everything is bits
Each bit is 0 or 1
By encoding/interpreting sets of bits in various ways
▪ Computers determine what to do (instructions)
▪ … and represent and manipulate numbers, sets, strings, etc…
Why bits? Electronic Implementation
▪ Easy to store with bistable elements
▪ Reliably transmitted on noisy and inaccurate wires
[Figure: a wire carrying the bits 0, 1, 0 as voltages; 0.0V to 0.2V reads as 0 and 0.9V to 1.1V reads as 1]
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 3
Carnegie Mellon
Boolean Algebra
Developed by George Boole in 19th Century
▪ Algebraic representation of logic
▪ Encode “True” as 1 and “False” as 0
▪ And: A&B = 1 when both A=1 and B=1
▪ Or: A|B = 1 when either A=1 or B=1
▪ Not: ~A = 1 when A=0
▪ Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both
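Representing & Manipulating Sets
▪ A width-w bit vector represents a subset of {0, …, w–1}: bit j is 1 exactly when j is in the set (bit positions are numbered 76543210 from left to right)
▪ 01101001 = { 0, 3, 5, 6 }
▪ 01010101 = { 0, 2, 4, 6 }
Operations
▪ & Intersection 01000001 { 0, 6 }
▪ | Union 01111101 { 0, 2, 3, 4, 5, 6 }
▪ ^ Symmetric difference 00111100 { 2, 3, 4, 5 }
▪ ~ Complement (of 01010101) 10101010 { 1, 3, 5, 7 }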
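The set view maps directly onto C's bitwise operators. A minimal sketch, assuming an 8-bit universe {0, …, 7} stored in a `uint8_t`; the helper names (`set_of`, `contains`, `set_union`, and so on) are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* An 8-bit value viewed as a set of bit positions {0, ..., 7}:
   bit j is 1 exactly when j is a member of the set. */
typedef uint8_t bitset8;

/* Build a set from a list of member positions (each assumed to be in 0..7). */
bitset8 set_of(const int *members, int n) {
    bitset8 s = 0;
    for (int i = 0; i < n; i++)
        s |= (bitset8)(1u << members[i]);
    return s;
}

/* 1 if j is in s, else 0. */
int contains(bitset8 s, int j) { return (s >> j) & 1; }

bitset8 set_intersection(bitset8 a, bitset8 b) { return a & b; }
bitset8 set_union(bitset8 a, bitset8 b)        { return a | b; }
bitset8 set_symdiff(bitset8 a, bitset8 b)      { return a ^ b; }
/* The cast keeps the complement inside the 8-bit universe {0, ..., 7}. */
bitset8 set_complement(bitset8 a)              { return (bitset8)~a; }
```

With a = {0, 3, 5, 6} and b = {0, 2, 4, 6}, the helpers reproduce the four results listed (0x41, 0x7D, 0x3C, and 0xAA for the complement of b).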
Bit-Level Operations in C
Operations &, |, ~, ^ Available in C
▪ Apply to any “integral” data type
▪ long, int, short, char, unsigned
▪ View arguments as bit vectors
▪ Arguments applied bit-wise
Examples (char data type)
▪ ~0x41 → 0xBE
  ~01000001₂ → 10111110₂
▪ ~0x00 → 0xFF
  ~00000000₂ → 11111111₂
▪ 0x69 & 0x55 → 0x41
  01101001₂ & 01010101₂ → 01000001₂
▪ 0x69 | 0x55 → 0x7D
  01101001₂ | 01010101₂ → 01111101₂
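These examples can be checked in C. A sketch assuming `unsigned char` operands: C promotes them to `int` before applying ~, & and |, so each helper (hypothetical names `not8`, `and8`, `or8`) casts the result back to 8 bits.

```c
#include <stdio.h>

/* ~, & and | on 8-bit values. The operands are promoted to int first, so
   each result is cast back to unsigned char to keep only the low 8 bits. */
unsigned char not8(unsigned char a)                  { return (unsigned char)~a; }
unsigned char and8(unsigned char a, unsigned char b) { return (unsigned char)(a & b); }
unsigned char or8(unsigned char a, unsigned char b)  { return (unsigned char)(a | b); }

/* Print the four examples in hex. */
void print_char_examples(void) {
    printf("~0x41       = 0x%02X\n", (unsigned)not8(0x41));
    printf("~0x00       = 0x%02X\n", (unsigned)not8(0x00));
    printf("0x69 & 0x55 = 0x%02X\n", (unsigned)and8(0x69, 0x55));
    printf("0x69 | 0x55 = 0x%02X\n", (unsigned)or8(0x69, 0x55));
}
```

`print_char_examples()` prints 0xBE, 0xFF, 0x41 and 0x7D, one per line.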
Shift Operations
Left Shift: x << y
▪ Shift bit-vector x left y positions
  ▪ Throw away extra bits on left
  ▪ Fill with 0’s on right
Right Shift: x >> y
▪ Shift bit-vector x right y positions
  ▪ Throw away extra bits on right
▪ Logical shift
  ▪ Fill with 0’s on left
▪ Arithmetic shift
  ▪ Replicate most significant bit on left
Undefined Behavior
▪ Shift amount < 0 or ≥ word size

Argument x     01100010    10100010
<< 3           00010000    00010000
Log. >> 2      00011000    00101000
Arith. >> 2    00011000    11101000
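Because `>>` on a negative signed value is implementation-defined in C (GCC and Clang use an arithmetic shift), this sketch models both right shifts explicitly on 8-bit unsigned patterns. The names `shl8`, `lsr8` and `asr8` are hypothetical, and the shift amount is assumed to satisfy 0 ≤ y < 8.

```c
#include <stdint.h>

/* Left shift: bits pushed past bit 7 are dropped by the cast; 0s fill on the right. */
uint8_t shl8(uint8_t x, unsigned y) {
    return (uint8_t)(x << y);
}

/* Logical right shift: 0s fill on the left. */
uint8_t lsr8(uint8_t x, unsigned y) {
    return (uint8_t)(x >> y);
}

/* Arithmetic right shift: copies of the most significant bit fill on the left. */
uint8_t asr8(uint8_t x, unsigned y) {
    uint8_t r = (uint8_t)(x >> y);
    if (x & 0x80)                              /* MSB is 1 */
        r |= (uint8_t)(0xFF << (8 - y));       /* set the top y bits */
    return r;
}
```

Applied to 01100010 and 10100010 with y = 3 for the left shift and y = 2 for the right shifts, the functions give 00010000, 00011000, 00011000 for the first argument and 00010000, 00101000, 11101000 for the second.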
Encoding Integers
Unsigned: $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$
Two’s Complement: $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$
Sign Bit
▪ For 2’s complement, most significant bit indicates sign
▪ 0 for nonnegative
▪ 1 for negative
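The two sums translate directly into code. A sketch, assuming the bits are passed least significant first (bits[i] holds $x_i$) and 1 ≤ w ≤ 63; the names `b2u` and `b2t` follow the notation of the formulas, but the code itself is illustrative.

```c
#include <stdint.h>

/* B2U: every bit i has weight +2^i. Assumes 1 <= w <= 64. */
uint64_t b2u(const int *bits, int w) {
    uint64_t sum = 0;
    for (int i = 0; i < w; i++)
        sum += (uint64_t)bits[i] << i;
    return sum;
}

/* B2T: bits 0..w-2 have weight +2^i; the sign bit w-1 has weight -2^(w-1).
   Assumes 1 <= w <= 63. */
int64_t b2t(const int *bits, int w) {
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        sum += (int64_t)bits[i] << i;
    sum -= (int64_t)bits[w - 1] << (w - 1);
    return sum;
}
```

For the 4-bit pattern 1011, `b2u` gives 8 + 2 + 1 = 11 and `b2t` gives −8 + 2 + 1 = −5.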
Numeric Ranges
Unsigned Values
▪ UMin = 0 (000…0)
▪ UMax = $2^w - 1$ (111…1)
Two’s Complement Values
▪ TMin = $-2^{w-1}$ (100…0)
▪ TMax = $2^{w-1} - 1$ (011…1)
Other Values
▪ Minus 1 (111…1)
Values for W = 16
         Decimal   Hex     Binary
UMax     65535     FF FF   11111111 11111111
TMax     32767     7F FF   01111111 11111111
TMin     -32768    80 00   10000000 00000000
-1       -1        FF FF   11111111 11111111
0        0         00 00   00000000 00000000
Observations
▪ |TMin| = TMax + 1
  ▪ Asymmetric range
▪ UMax = 2 * TMax + 1
C Programming
▪ #include <limits.h>
▪ Declares constants, e.g.,
  ▪ ULONG_MAX
  ▪ LONG_MAX
  ▪ LONG_MIN
▪ Values platform specific
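The observations can be checked against `<limits.h>`. This sketch assumes `short` is 16 bits wide (so the W = 16 table applies) and uses two's complement, both true on mainstream platforms; the function names are hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Print the 16-bit ranges, assuming short is 16 bits. */
void print_short_ranges(void) {
    printf("UMax = %u\n", (unsigned)USHRT_MAX);
    printf("TMax = %d\n", SHRT_MAX);
    printf("TMin = %d\n", SHRT_MIN);
}

/* Check |TMin| = TMax + 1 and UMax = 2 * TMax + 1, computing in long long
   so no intermediate result can overflow. */
int observations_hold(void) {
    long long tmin = SHRT_MIN, tmax = SHRT_MAX, umax = USHRT_MAX;
    return (-tmin == tmax + 1) && (umax == 2 * tmax + 1);
}
```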
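[Figure: bit weights from position w–1 down to 0 for ux and for x; every weight of ux is positive, while bit w–1 of x carries a negative weight, so ux = x + $2^w$ when x < 0]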
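Conversion Visualized
2’s Comp. → Unsigned
▪ Ordering Inversion
▪ Negative → Big Positive
[Figure: number lines for the 2’s complement range (TMin … –2, –1, 0 … TMax) and the unsigned range (0 … TMax, TMax + 1 … UMax – 1, UMax); nonnegative values map to themselves, while TMin … –1 map to TMax + 1 … UMax]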
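In C, converting a negative `int` to `unsigned` is defined as adding UINT_MAX + 1 = $2^w$, which leaves the bit pattern unchanged; this is the "negative → big positive" mapping in the figure. A sketch with hypothetical function names, assuming `long long` is wider than `unsigned` (true when `unsigned` is 32 bits).

```c
#include <limits.h>

/* T2U as C defines it: the result is x mod (UINT_MAX + 1). */
unsigned t2u(int x) {
    return (unsigned)x;
}

/* The same mapping written out arithmetically: add 2^w to negative values. */
unsigned t2u_formula(int x) {
    long long v = x;
    if (v < 0)
        v += (long long)UINT_MAX + 1;      /* x + 2^w */
    return (unsigned)v;
}
```

So −1 maps to UMax, −2 to UMax − 1, and TMin to TMax + 1.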
Casting Surprises
Expression Evaluation
▪ If there is a mix of unsigned and signed in a single expression, signed values are implicitly cast to unsigned (when both operands have the same width, e.g., int and unsigned)
▪ Including comparison operations <, >, ==, <=, >=
▪ Examples for W = 32: TMIN = -2,147,483,648, TMAX = 2,147,483,647
Constant1        Constant2          Relation   Evaluation
0                0U                 ==         unsigned
-1               0                  <          signed
-1               0U                 >          unsigned
2147483647       -2147483647-1      >          signed
2147483647U      -2147483647-1      <          unsigned
-1               -2                 >          signed
(unsigned)-1     -2                 >          unsigned
2147483647       2147483648U        <          unsigned
2147483647       (int)2147483648U   >          signed
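The table can be reproduced by letting the compiler evaluate each row. A sketch assuming 32-bit `int`; TMin is written as `-2147483647-1` because the literal `2147483648` does not fit in `int` and would get a wider type. Compilers may warn about the signed/unsigned comparisons, which is exactly the hazard the table illustrates. The function name is hypothetical.

```c
/* Returns 1 if every relation in the table holds as listed. */
int all_relations_hold(void) {
    return (0 == 0U)                            /* unsigned */
        && (-1 < 0)                             /* signed */
        && (-1 > 0U)                            /* unsigned: -1 becomes UMax */
        && (2147483647 > -2147483647 - 1)       /* signed */
        && (2147483647U < -2147483647 - 1)      /* unsigned: TMin becomes 2^31 */
        && (-1 > -2)                            /* signed */
        && ((unsigned)-1 > -2)                  /* unsigned */
        && (2147483647 < 2147483648U)           /* unsigned */
        /* Converting 2147483648U to int is implementation-defined;
           GCC and Clang reduce it modulo 2^32, giving -2147483648. */
        && (2147483647 > (int)2147483648U);     /* signed */
}
```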
Summary
Casting Signed ↔ Unsigned: Basic Rules
▪ Bit pattern is maintained
▪ But reinterpreted
▪ Can have unexpected effects: adding or subtracting $2^w$
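Sign Extension
Task:
▪ Given w-bit signed integer x
▪ Convert it to w+k-bit integer with same value
Rule:
▪ Make k copies of sign bit:
▪ $X' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0$, where the first k bits are copies of the MSB $x_{w-1}$
[Figure: the w-bit X, and the (w+k)-bit X′ formed by prepending k copies of the MSB of X]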
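A sketch of the rule on 32-bit patterns: keep the low w bits and fill the k = 32 − w bits above them with copies of bit w − 1. The name `sign_extend_bits` is hypothetical; it returns the bit pattern as `uint32_t` to avoid implementation-defined signed conversions.

```c
#include <stdint.h>

/* Sign-extend the low w bits of x (1 <= w <= 32) to 32 bits by filling
   the k = 32 - w higher positions with copies of bit w-1 (the MSB). */
uint32_t sign_extend_bits(uint32_t x, int w) {
    if (w == 32)
        return x;                              /* k = 0: nothing to copy */
    uint32_t low_mask = (1u << w) - 1;         /* ones in positions 0 .. w-1 */
    x &= low_mask;
    if ((x >> (w - 1)) & 1u)                   /* MSB is 1: the copies are 1s */
        x |= ~low_mask;
    return x;                                  /* MSB is 0: the copies are 0s */
}
```

For the 16-bit pattern of −15213 (0xC493) it produces 0xFFFFC493, the same bits C yields for `(int32_t)(int16_t)-15213`.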
Summary: Expanding, Truncating: Basic Rules
Expanding (e.g., short int to int)
▪ Unsigned: zeros added
▪ Signed: sign extension
▪ Both yield expected result
Unsigned Addition
Operands: w bits          u, v
True Sum: w+1 bits        u + v
Discard Carry: w bits     $UAdd_w(u, v) = (u + v) \bmod 2^w$
Overflow wraps around:
▪ If true sum ≥ $2^w$
▪ At most once
[Figure: true sums range from 0 up to $2^{w+1}$; sums at or above $2^w$ overflow and wrap back into the modular sum range 0 to $2^w - 1$. Surface plot of UAdd4(u, v) for u, v from 0 to 15.]
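A sketch of $UAdd_w$ for 1 ≤ w ≤ 32, assuming u, v < $2^w$; the names `uadd_w` and `uadd_overflows` are hypothetical. Since the true sum is below $2^{w+1}$, discarding the carry subtracts $2^w$ at most once, and that happens exactly when the w-bit result is smaller than u.

```c
#include <stdint.h>

/* UAdd_w: add, then keep only the low w bits (drop the carry). */
uint32_t uadd_w(uint32_t u, uint32_t v, int w) {
    uint64_t true_sum = (uint64_t)u + v;              /* up to w+1 bits */
    uint64_t mask = ((uint64_t)1 << w) - 1;           /* low w bits */
    return (uint32_t)(true_sum & mask);               /* (u + v) mod 2^w */
}

/* Wraparound happened iff the true sum was >= 2^w, i.e. iff s < u. */
int uadd_overflows(uint32_t u, uint32_t v, int w) {
    return uadd_w(u, v, w) < u;
}
```

With w = 4, 9 + 10 = 19 wraps to 3.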
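TAdd Overflow
Functionality
▪ True sum requires w+1 bits
▪ Drop off MSB
▪ Treat remaining bits as 2’s comp. integer
[Figure: true sums, shown from $-2^w$ (1 000…0) to $2^w - 1$ (0 111…1), mapped onto TAdd results: sums from $2^{w-1}$ (0 100…0) upward overflow positively (PosOver) and become negative; sums below $-2^{w-1}$ overflow negatively (NegOver) and become positive; every result lies between $-2^{w-1}$ (100…0) and $2^{w-1} - 1$ (011…1)]
Values
▪ 4-bit two’s comp.
▪ Range from -8 to +7
Wraps Around
▪ If sum ≥ $2^{w-1}$
  ▪ Becomes negative
  ▪ At most once
▪ If sum < $-2^{w-1}$
  ▪ Becomes positive
  ▪ At most once
[Figure: surface plot of TAdd4(u, v) for u, v from -8 to 7, with PosOver and NegOver regions]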
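In C, overflowing `int` addition is undefined behavior rather than guaranteed wraparound, so this sketch computes the exact sum in a 64-bit type, drops the bits above w − 1, and reinterprets the rest as two's complement. `tadd_w` is a hypothetical name; it assumes 1 ≤ w ≤ 32 and operands in the w-bit range.

```c
#include <stdint.h>

/* TAdd_w: exact sum, keep the low w bits, reinterpret as two's complement. */
int32_t tadd_w(int32_t u, int32_t v, int w) {
    int64_t true_sum = (int64_t)u + v;                 /* exact: fits in w+1 bits */
    uint64_t mask = ((uint64_t)1 << w) - 1;
    uint64_t low = (uint64_t)true_sum & mask;          /* drop bits above w-1 */
    if (low >> (w - 1))                                /* bit w-1 set: negative */
        return (int32_t)((int64_t)low - ((int64_t)1 << w));
    return (int32_t)low;
}
```

With w = 4: 5 + 5 = 10 wraps to −6 (PosOver) and −8 + −1 = −9 wraps to 7 (NegOver).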
Multiplication
Goal: Computing Product of w-bit numbers x, y
▪ Either signed or unsigned
But, exact results can be bigger than w bits
▪ Unsigned: up to 2w bits
  ▪ Result range: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
▪ Two’s complement min (negative): up to 2w-1 bits
  ▪ Result range: $x \cdot y \ge (-2^{w-1}) \cdot (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
▪ Two’s complement max (positive): up to 2w bits, but only for $(TMin_w)^2$
  ▪ Result range: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
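The bounds can be confirmed by brute force at a small word size. A sketch for w = 8 with hypothetical names that enumerates every operand pair; for w = 8 the formulas give 65025, −16256 and 16384.

```c
/* Extreme exact products over all pairs of 8-bit operands. */
typedef struct { long umax, tmin, tmax; } product_extremes;

product_extremes extremes8(void) {
    product_extremes e = {0, 0, 0};
    for (long x = 0; x <= 255; x++)            /* unsigned operands */
        for (long y = 0; y <= 255; y++)
            if (x * y > e.umax) e.umax = x * y;
    for (long x = -128; x <= 127; x++)         /* two's complement operands */
        for (long y = -128; y <= 127; y++) {
            long p = x * y;
            if (p < e.tmin) e.tmin = p;
            if (p > e.tmax) e.tmax = p;
        }
    return e;
}
```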
Unsigned Multiplication in C
Operands: w bits           u * v
True Product: 2*w bits     u · v
Discard w bits: w bits     $UMult_w(u, v)$
Signed Multiplication in C
Operands: w bits           u * v
True Product: 2*w bits     u · v
Discard w bits: w bits     $TMult_w(u, v)$
Multiplication:
▪ Unsigned/signed: normal multiplication followed by truncation; the same operation at the bit level
▪ Unsigned: multiplication mod $2^w$
▪ Signed: modified multiplication mod $2^w$ (result in proper range)
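A sketch for w = 8 showing that UMult and TMult keep the same low-order bits: both compute the exact product in a wider type and truncate to 8 bits. The names `umult8` and `tmult8` are hypothetical.

```c
#include <stdint.h>

/* UMult_8: exact product, keep the low 8 bits, i.e. (u * v) mod 2^8. */
uint8_t umult8(uint8_t u, uint8_t v) {
    return (uint8_t)((uint32_t)u * v);
}

/* TMult_8: exact product, keep the low 8 bits, reinterpret as two's complement. */
int8_t tmult8(int8_t x, int8_t y) {
    uint8_t low = (uint8_t)((int32_t)x * y);   /* conversion to unsigned is mod 2^8 */
    return (int8_t)(low >= 128 ? low - 256 : low);
}
```

−3 and 253 share the pattern 0xFD; multiplying by 5 gives 0xF1 in both cases, read as 241 unsigned and −15 signed.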
Even better
size_t i;
for (i = cnt-2; i < cnt; i--)
    a[i] += a[i+1];
▪ Data type size_t defined as unsigned value with length = word size
▪ Code will work even if cnt = UMax
▪ What if cnt is signed and < 0?
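A self-contained version of the loop, with a hypothetical function name `suffix_sums` and `long` elements (an assumption; the element type is not given). Each a[i] becomes the sum of a[i] through a[cnt-1]; when i is decremented past 0 it wraps to SIZE_MAX, which fails the test i < cnt and ends the loop.

```c
#include <stddef.h>

/* Count down with an unsigned index; the wrap past 0 terminates the loop. */
void suffix_sums(long a[], size_t cnt) {
    size_t i;
    for (i = cnt - 2; i < cnt; i--)
        a[i] += a[i + 1];
}
```

For {1, 2, 3, 4} the result is {10, 9, 7, 4}. For cnt of 0 or 1, cnt - 2 wraps to a huge value and the loop body never runs.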
Machine Words
Any given computer has a “Word Size”
▪ Nominal size of integer-valued data
▪ and of addresses
C Data Type    Typical 32-bit   Typical 64-bit   x86-64
char           1                1                1
short          2                2                2
int            4                4                4
long           4                8                8
float          4                4                4
double         8                8                8
long double    −                −                10/16
pointer        4                8                8
(sizes in bytes)
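To see the sizes a particular compiler uses, `sizeof` can be printed directly; its result has type `size_t`, printed with `%zu`. A minimal sketch with a hypothetical function name; the output depends on the platform.

```c
#include <stdio.h>

/* Print the size in bytes of each type in the table. */
void print_type_sizes(void) {
    printf("char        %zu\n", sizeof(char));
    printf("short       %zu\n", sizeof(short));
    printf("int         %zu\n", sizeof(int));
    printf("long        %zu\n", sizeof(long));
    printf("float       %zu\n", sizeof(float));
    printf("double      %zu\n", sizeof(double));
    printf("long double %zu\n", sizeof(long double));
    printf("pointer     %zu\n", sizeof(void *));
}
```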
Byte Ordering
So, how are the bytes within a multi-byte word ordered in memory?
Conventions
▪ Big Endian: Sun, PPC Mac, Internet
  ▪ Least significant byte has highest address
▪ Little Endian: x86, ARM processors running Android, iOS, and Windows
  ▪ Least significant byte has lowest address
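A common way to test byte order at run time is to look at the byte stored at the lowest address of a known multi-byte value. A sketch with a hypothetical function name, using `memcpy` to read that byte.

```c
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian machine: the lowest-addressed byte of 0x01234567
   is its least significant byte, 0x67. */
int is_little_endian(void) {
    uint32_t x = 0x01234567;
    unsigned char first;
    memcpy(&first, &x, 1);     /* copy the byte at the lowest address */
    return first == 0x67;
}
```

On x86 it returns 1.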
Representing Integers
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
Printf directives:
%p: Print pointer
%x: Print Hexadecimal
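A byte-dump helper in the spirit of these directives: it prints each byte's address with %p and its value with %.2x, lowest address first. This is a sketch, and `show_bytes` is used here as an illustrative name.

```c
#include <stdio.h>

/* Print the address and hex value of each of the len bytes at start. */
void show_bytes(const void *start, size_t len) {
    const unsigned char *p = start;
    for (size_t i = 0; i < len; i++)
        printf("%p\t0x%.2x\n", (void *)(p + i), p[i]);
    printf("\n");
}
```

For `int x = 15213;` on a little-endian machine with 4-byte `int`, the values printed are 0x6d, 0x3b, 0x00, 0x00 (the addresses vary).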
Representing Pointers
int B = -15213;
int *P = &B;
Representing Strings
char S[6] = "18213";
Strings in C
▪ Represented by array of characters
▪ Each character encoded in ASCII format
  ▪ Standard 7-bit encoding of character set
  ▪ Character “0” has code 0x30
    – Digit i has code 0x30+i
▪ String should be null-terminated
  ▪ Final character = 0
Compatibility
▪ Byte ordering not an issue
[Figure: the bytes of S on IA32 and on Sun are identical: 31 38 32 31 33 00]
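A quick check that the bytes of S are the ASCII codes followed by the terminating 0, assuming an ASCII execution character set; the function name is hypothetical. Because each element is a single byte, the result does not depend on byte order.

```c
#include <string.h>

/* 1 if "18213" is stored as 31 38 32 31 33 00. */
int string_bytes_match(void) {
    char S[6] = "18213";
    const unsigned char expected[6] = {0x31, 0x38, 0x32, 0x31, 0x33, 0x00};
    return memcmp(S, expected, sizeof expected) == 0;
}
```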
Integer C Puzzles
For each expression, either argue that it is true for all argument values or give an example where it is not (⇒ means "implies").
Initialization
int x = foo();
int y = bar();
unsigned ux = x;
unsigned uy = y;
• x < 0 ⇒ ((x*2) < 0)
• ux >= 0
• x & 7 == 7 ⇒ (x<<30) < 0
• ux > -1
• x > y ⇒ -x < -y
• x * x >= 0
• x > 0 && y > 0 ⇒ x + y > 0
• x >= 0 ⇒ -x <= 0
• x <= 0 ⇒ -x >= 0
• (x|-x)>>31 == -1
• ux >> 3 == ux/8
• x >> 3 == x/8
• x & (x-1) != 0
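Several puzzles fail only at the edges of the range, which is easy to see by exhaustive search at a small word size. This sketch models w = 8 with explicit wraparound (in real C, signed `int` overflow is undefined behavior, so a compiler need not wrap); all names are hypothetical.

```c
/* Reduce v to 8 bits and reinterpret as 8-bit two's complement. */
int wrap8(int v) {
    unsigned u = (unsigned)v & 0xFFu;          /* low 8 bits of the pattern */
    return u >= 128 ? (int)u - 256 : (int)u;
}

/* Search for a counterexample to "x > y implies -x < -y". */
int find_neg_order_counterexample(int *cx, int *cy) {
    for (int x = -128; x <= 127; x++)
        for (int y = -128; y <= 127; y++)
            if (x > y && !(wrap8(-x) < wrap8(-y))) {
                *cx = x;
                *cy = y;
                return 1;
            }
    return 0;
}

/* Smallest x >= 0 whose wrapped square is negative, or -1 if none. */
int first_negative_square(void) {
    for (int x = 0; x <= 127; x++)
        if (wrap8(x * x) < 0)
            return x;
    return -1;
}
```

The search names y = −128 (TMin) as the culprit for x > y ⇒ -x < -y, since −TMin wraps back to TMin, and x = 12 for x * x >= 0, since 144 wraps to −112.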
Bonus extras
Connection when A&~B | ~A&B = A^B
[Figure: switching network with two parallel paths, one through switches A and ~B, the other through ~A and B]
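The identity can be verified exhaustively over 8-bit values; a sketch with a hypothetical function name. In C, & binds tighter than |, so the parentheses only aid readability.

```c
/* 1 if A&~B | ~A&B equals A^B for every pair of 8-bit values. */
int xor_identity_holds(void) {
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (((a & ~b) | (~a & b)) != (a ^ b))
                return 0;
    return 1;
}
```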
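Claim: $1 + \sum_{i=0}^{w-1} 2^i = 2^w$ (so the w-bit pattern 111…1 equals $2^w - 1$)
▪ Base case, w = 0: $1 = 2^0$
▪ Inductive step: assume the claim for w; then
$1 + 1 + 2 + 4 + 8 + \cdots + 2^{w-1} + 2^w = 2^w + 2^w = 2^{w+1}$
where the first w + 1 terms, $1 + 1 + 2 + \cdots + 2^{w-1}$, sum to $2^w$ by the hypothesis.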
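A numeric spot check of the identity for every w from 0 to 63, using 64-bit unsigned arithmetic; it complements the proof rather than replacing it. The function name is hypothetical.

```c
#include <stdint.h>

/* Check 1 + (2^0 + ... + 2^(w-1)) == 2^w for w = 0..63. */
int sum_identity_holds(void) {
    uint64_t sum = 0;                      /* 2^0 + ... + 2^(w-1); empty for w = 0 */
    for (int w = 0; w < 64; w++) {
        if (1 + sum != (uint64_t)1 << w)
            return 0;
        sum += (uint64_t)1 << w;           /* extend the sum to include 2^w */
    }
    return 1;
}
```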
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

Typical Usage
void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, MSIZE);
    printf("%s\n", mybuf);
}

Malicious Usage
void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, -MSIZE);
    . . .
}
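The body of copy_from_kernel is not shown here, so the following is a hypothetical sketch assuming a signed `int` length parameter that is clamped to KSIZE and then passed to `memcpy`. With a negative maxlen, the clamp keeps the negative value, and `memcpy`'s `size_t` parameter would receive it as a huge count. Taking the length as `size_t` is one common fix. The names `buggy_len` and `copy_from_kernel_fixed` are illustrative.

```c
#include <string.h>

#define KSIZE 1024
char kbuf[KSIZE];

/* Assumed buggy clamp: a negative maxlen is "smaller" than KSIZE. */
int buggy_len(int maxlen) {
    return KSIZE < maxlen ? KSIZE : maxlen;
}

/* Variant with an unsigned length, which cannot be negative. */
size_t copy_from_kernel_fixed(void *user_dest, size_t maxlen) {
    size_t len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}
```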
Mathematical Properties
Modular Addition Forms an Abelian Group
▪ Closed under addition
  $0 \le UAdd_w(u, v) \le 2^w - 1$
▪ Commutative
  $UAdd_w(u, v) = UAdd_w(v, u)$
▪ Associative
  $UAdd_w(t, UAdd_w(u, v)) = UAdd_w(UAdd_w(t, u), v)$
▪ 0 is additive identity
  $UAdd_w(u, 0) = u$
▪ Every element has additive inverse
  ▪ Let $UComp_w(u) = 2^w - u$ for u > 0, and $UComp_w(0) = 0$
  $UAdd_w(u, UComp_w(u)) = 0$
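An exhaustive check of the group properties for w = 4 (16 values). A sketch; `uadd`, `ucomp` and the other names are hypothetical. Reducing $2^w - u$ mod $2^w$ keeps the inverse of 0 inside the range.

```c
#define W 4
#define MOD (1u << W)                          /* 2^w = 16 */

unsigned uadd(unsigned u, unsigned v) { return (u + v) % MOD; }
unsigned ucomp(unsigned u) { return (MOD - u) % MOD; }   /* ucomp(0) = 0 */

/* 1 if commutativity, associativity, identity and inverses all hold. */
int uadd_is_abelian_group(void) {
    for (unsigned t = 0; t < MOD; t++)
        for (unsigned u = 0; u < MOD; u++) {
            if (uadd(t, u) != uadd(u, t)) return 0;          /* commutative */
            if (uadd(u, 0) != u) return 0;                   /* identity */
            if (uadd(u, ucomp(u)) != 0) return 0;            /* inverse */
            for (unsigned v = 0; v < MOD; v++)               /* associative */
                if (uadd(t, uadd(u, v)) != uadd(uadd(t, u), v)) return 0;
        }
    return 1;
}
```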
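Characterizing TAdd
Functionality
▪ True sum requires w+1 bits
▪ Drop off MSB
▪ Treat remaining bits as 2’s comp. integer
[Figure: TAdd(u, v) over the (u, v) plane; positive overflow occurs only when u > 0 and v > 0, negative overflow only when u < 0 and v < 0]
$$TAdd_w(u, v) = \begin{cases} u + v + 2^w, & u + v < TMin_w \text{ (negative overflow)} \\ u + v, & TMin_w \le u + v \le TMax_w \\ u + v - 2^w, & TMax_w < u + v \text{ (positive overflow)} \end{cases}$$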
Complement & Increment (~x + 1 == -x), example x = 0
         Decimal   Hex     Binary
0        0         00 00   00000000 00000000
~0       -1        FF FF   11111111 11111111
~0+1     0         00 00   00000000 00000000
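The complement-and-increment rule ~x + 1 == −x holds for every w-bit pattern, including TMin, whose negation is itself. A sketch that checks all 16-bit patterns using unsigned arithmetic, where wraparound is well defined; the function name is hypothetical.

```c
#include <stdint.h>

/* 1 if ~x + 1 equals -x (mod 2^16) for every 16-bit x. */
int complement_increment_negates(void) {
    for (uint32_t i = 0; i <= 0xFFFF; i++) {
        uint16_t x = (uint16_t)i;
        uint16_t lhs = (uint16_t)(~x + 1);
        uint16_t rhs = (uint16_t)(0u - x);     /* -x mod 2^16 */
        if (lhs != rhs)
            return 0;
    }
    return 1;
}
```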
[Figure: ele_src, an array of ele_cnt pointers to source objects, copied into one buffer obtained from malloc(ele_cnt * ele_size)]
XDR Code
#include <stdlib.h>
#include <string.h>

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
    /*
     * Allocate buffer for ele_cnt objects, each of ele_size bytes
     * and copy from locations designated by ele_src
     */
    void *result = malloc(ele_cnt * ele_size);
    if (result == NULL)
        /* malloc failed */
        return NULL;
    char *next = result;   /* char * so that next += ele_size is standard C */
    int i;
    for (i = 0; i < ele_cnt; i++) {
        /* Copy object i to destination */
        memcpy(next, ele_src[i], ele_size);
        /* Move pointer to next memory region */
        next += ele_size;
    }
    return result;
}
XDR Vulnerability
malloc(ele_cnt * ele_size)
What if:
▪ ele_cnt = $2^{20} + 1$
▪ ele_size = 4096 = $2^{12}$
▪ Allocation = ??
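Working the numbers: $(2^{20} + 1) \cdot 2^{12} = 2^{32} + 2^{12}$. With a 32-bit `size_t`, the product is reduced mod $2^{32}$, so malloc is asked for only 4096 bytes while the loop copies $2^{32} + 2^{12}$ bytes. This sketch models the 32-bit computation (on x86-64, `size_t` is 64 bits and this particular product does not wrap) and shows one overflow-checked alternative; both function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* The allocation size as a 32-bit size_t would compute it: product mod 2^32. */
uint32_t alloc_size_32(uint32_t ele_cnt, uint32_t ele_size) {
    return (uint32_t)((uint64_t)ele_cnt * ele_size);
}

/* Refuse any request whose byte count would exceed SIZE_MAX. */
void *checked_array_alloc(size_t ele_cnt, size_t ele_size) {
    if (ele_size != 0 && ele_cnt > SIZE_MAX / ele_size)
        return NULL;                   /* product would wrap around */
    return malloc(ele_cnt * ele_size);
}
```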
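Case 1: No rounding (the low k bits of the dividend u are all 0)
▪ Adding the bias $2^k - 1$ sets only those low k bits to 1
▪ Dividing by $2^k$ (shifting right k) moves them past the binary point, where they are discarded
▪ Biasing has no effect on the quotient
Case 2: Rounding (some of the low k bits of the dividend are 1)
▪ Adding the bias carries into bit k: the upper part is incremented by 1
▪ Biasing adds 1 to final result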
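The biasing trick in C: for negative x, add $2^k - 1$ before shifting right by k, giving division rounded toward zero. A sketch assuming 0 ≤ k < 31 and that `>>` on a negative `int` is an arithmetic shift (implementation-defined in C; GCC and Clang behave this way); `div_pow2` is a hypothetical name.

```c
/* x / 2^k rounded toward zero, like C's x / (1 << k). */
int div_pow2(int x, int k) {
    int bias = (1 << k) - 1;                   /* 2^k - 1 */
    return (x < 0 ? x + bias : x) >> k;        /* bias only negative dividends */
}
```

For x = −15 and k = 2, −15 >> 2 is −4, while (−15 + 3) >> 2 is −3, matching −15 / 4 in C.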
Left shift
▪ Unsigned/signed: multiplication by $2^k$
▪ Always logical shift
Right shift
▪ Unsigned: logical shift, div (division + round to zero) by $2^k$
▪ Signed: arithmetic shift
  ▪ Positive numbers: div (division + round to zero) by $2^k$
  ▪ Negative numbers: div (division + round away from zero, i.e., down) by $2^k$
    ▪ Use biasing to fix
Deciphering Numbers (little-endian byte order)
▪ Value: 0x12ab
▪ Pad to 32 bits: 0x000012ab
▪ Split into bytes: 00 00 12 ab
▪ Reverse (bytes as stored, lowest address first): ab 12 00 00
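The same steps in code, building the little-endian byte sequence with shifts so the result does not depend on the host's byte order. A sketch; `to_little_endian` is a hypothetical name.

```c
#include <stdint.h>

/* Write value's bytes least significant first: out[i] holds bits 8i .. 8i+7. */
void to_little_endian(uint32_t value, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}
```

For 0x12ab it yields ab 12 00 00.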