03 Data
Lecture 3:
A Brief Introduction to Data and Data Processing
· Values, variables
· Vectors
· Matrices
· Loops
· Logical statements
· Control statements
· Functions
Three tutorials
# Vectors
some_numbers <- c(30, 50, 60)
some_numbers[c(2,3)]
some_numbers > 3
some_numbers * 5
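As a quick recap of what the lines above do, the following sketch stores each result in a variable and prints it. The variable names and the threshold of 40 are illustrative choices, not part of the tutorial.

```r
# Same vector as on the slide
some_numbers <- c(30, 50, 60)

# Indexing with a vector of positions returns the 2nd and 3rd elements
picked <- some_numbers[c(2, 3)]

# A comparison is applied element by element and yields a logical vector
is_large <- some_numbers > 3

# Arithmetic is vectorised as well: every element is multiplied by 5
scaled <- some_numbers * 5

# A logical vector can itself serve as an index (threshold 40 is illustrative)
above_40 <- some_numbers[some_numbers > 40]

print(picked)    # 50 60
print(is_large)  # TRUE TRUE TRUE
print(scaled)    # 150 250 300
print(above_40)  # 50 60
```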
Warm-up
What is total_sum?
# start loop
for (i in 1:n) {
  if (i %% 2 == 0) {
    total_sum <- total_sum + numbers[i]
  } else {
    total_sum <- total_sum + 2 * numbers[i]
  }
}
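The loop above relies on `numbers`, `n`, and a starting value for `total_sum`, none of which appear in this excerpt. The sketch below runs the same loop with hypothetical inputs (the values are assumptions chosen only for illustration) so that the logic can be traced by hand: elements at even positions are added once, elements at odd positions are added twice.

```r
# Hypothetical inputs: these definitions are NOT shown in the excerpt
numbers <- c(1, 2, 3, 4)
n <- length(numbers)
total_sum <- 0

for (i in seq_len(n)) {
  if (i %% 2 == 0) {
    # even position: add the element once
    total_sum <- total_sum + numbers[i]
  } else {
    # odd position: add the element twice
    total_sum <- total_sum + 2 * numbers[i]
  }
}
print(total_sum)  # 2 + 2 + 6 + 4 = 14

# The same result without an explicit loop
vectorised_sum <- sum(ifelse(seq_len(n) %% 2 == 0, numbers, 2 * numbers))
print(vectorised_sum)
```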
Don’t forget…
Data Processing
The binary system
· ‘Off’ = 0
· ‘On’ = 1
The binary counting frame
· Solution:
(1 × 2^7) + (1 × 2^3) + (1 × 2^1) + (1 × 2^0) = 139.
· More precisely:
(1 × 2^7) + (0 × 2^6) + (0 × 2^5) + (0 × 2^4) + (1 × 2^3)
+ (0 × 2^2) + (1 × 2^1) + (1 × 2^0) = 139.
· That is, the number 139 in the decimal system corresponds to 10001011 in
the binary system.
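The positional sum can be checked in R with a few lines. This is only a minimal sketch; the variable names are illustrative.

```r
# Digits of 10001011, most significant bit first
bits <- c(1, 0, 0, 0, 1, 0, 1, 1)

# Matching exponents: 7, 6, ..., 0
exponents <- (length(bits) - 1):0

# Place values: 128 64 32 16 8 4 2 1
place_values <- 2^exponents

# Each digit multiplied by its place value, then summed
decimal_value <- sum(bits * place_values)
print(decimal_value)  # 139
```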
Conversion between binary and decimal
Number   128   64   32   16    8    4    2    1
  0 =      0    0    0    0    0    0    0    0
  1 =      0    0    0    0    0    0    0    1
  2 =      0    0    0    0    0    0    1    0
  3 =      0    0    0    0    0    0    1    1
139 =      1    0    0    0    1    0    1    1
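The rows of this table can be reproduced in R. The sketch below uses the base functions `intToBits()` and `strtoi()`; the helper names `to_binary()` and `to_decimal()` are hypothetical and were introduced only for this example.

```r
# Convert a non-negative integer to an 8-digit binary string.
# intToBits() returns 32 bits, least significant bit first,
# so we keep the first `width` bits and reverse them.
to_binary <- function(n, width = 8) {
  bits <- as.integer(intToBits(n))[seq_len(width)]
  paste(rev(bits), collapse = "")
}

# Convert a binary string back to a decimal integer
to_decimal <- function(b) {
  strtoi(b, base = 2L)
}

for (n in c(0L, 1L, 2L, 3L, 139L)) {
  cat(sprintf("%3d = %s\n", n, to_binary(n)))
}

print(to_decimal("10001011"))  # 139
```

The fixed width of 8 digits mirrors the eight columns of the table and therefore only covers the numbers 0 to 255.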
The binary counting frame
print(x)
## [1] 0.1
print(y)
## [1] 0.1
print(result)
## [1] FALSE
Floating point numbers: a strange phenomenon
## [1] "0.099999999999999977796"
## [1] "0.10000000000000000555"
## [1] TRUE
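The definitions of `x`, `y`, and `result` are not shown in this excerpt. One possible way to obtain output of this kind is sketched below; the definition `x <- 1 - 0.9` is an assumption, and the use of `all.equal()` for the final comparison is likewise a guess at what the slide demonstrates.

```r
# Assumption: the excerpt does not show how x and y were defined.
# 1 - 0.9 is one common way to get a value that prints as 0.1.
x <- 1 - 0.9
y <- 0.1

print(x)
print(y)

# Exact comparison fails because the two doubles differ in their last bits
result <- x == y
print(result)

# Printing more significant digits reveals the difference
print(format(x, digits = 20))
print(format(y, digits = 20))

# Comparing with a numerical tolerance treats the two values as equal
print(isTRUE(all.equal(x, y)))
```

The practical lesson is to avoid `==` on floating point results and to compare with a tolerance instead.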
Decimal numbers in a computer
If computers only understand 0 and 1, how can they express decimal numbers
like 139?
· Standards define how symbols, colors, etc. are shown on the screen.
· They facilitate interaction with a computer (our keyboards do not consist only of
a 0/1 switch).
What time is it?
The hexadecimal system
· 16 symbols:
- 0-9 (used like in the decimal system)…
- and A-F (for the numbers 10 to 15).
· 16 symbols >>> base 16: each digit represents an increasing power of 16
(16^0, 16^1, etc.).
The hexadecimal system
· Solution:
(8 × 16^1) + (11 × 16^0) = 139.
· More precisely:
(8 × 16^1) + (B × 16^0) = 8B = 139.
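The same conversion can be checked in R with base functions. This is a small sketch; the variable names are illustrative.

```r
# Positional sum by hand: B stands for 11
manual <- 8 * 16^1 + 11 * 16^0
print(manual)  # 139

# Hexadecimal string to decimal integer
hex_value <- strtoi("8B", base = 16L)
print(hex_value)  # 139

# Decimal integer to (upper-case) hexadecimal string
hex_string <- sprintf("%X", 139L)
print(hex_string)  # "8B"
```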
WHY?
😆
Character Encoding
Computers and text
A modified version of South Korean Dubeolsik (two-set type) for old hangul letters. (Illustration by Yes0song 2010, Creative Commons Attribution-Share Alike 3.0
Unported)
Computers and text
Binary      Hexadecimal   Decimal   Character
0011 1111   3F            63        ?
0100 0001   41            65        A
0110 0010   62            98        b
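The three rows above can be rebuilt in R from the characters alone. The sketch below relies on the base functions `utf8ToInt()`, `as.hexmode()`, and `intToBits()`; the helper `to_bits()` and the column names are hypothetical.

```r
chars <- c("?", "A", "b")

# Code point of each character as a decimal integer
codes <- unname(sapply(chars, utf8ToInt))

# The same numbers written in hexadecimal
hex <- toupper(format(as.hexmode(codes)))

# The same numbers as 8-digit binary strings
to_bits <- function(n) {
  paste(rev(as.integer(intToBits(n))[1:8]), collapse = "")
}
binary <- sapply(codes, to_bits)

encoding_table <- data.frame(binary, hex, decimal = codes, character = chars)
print(encoding_table)
```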
Character encodings: why should we care?
· In practice, Data Science means handling digital data of all formats and
shapes.
- Diverse sources.
- Different standards.
- Different languages (Japanese vs. English).
- Reading and storing data.
· At the lowest level, this means understanding/handling encodings.
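To make the point about encodings concrete, the sketch below looks at one non-ASCII character in two encodings. The choice of "é" and of Latin-1 as the second encoding is an assumption made for illustration; the functions `charToRaw()` and `iconv()` are part of base R.

```r
# The character "é", written as a Unicode escape so that the
# example does not depend on the encoding of this script
x <- "\u00e9"

# In UTF-8 the character takes two bytes
utf8_bytes <- charToRaw(x)
print(utf8_bytes)  # c3 a9

# In Latin-1 the same character takes a single byte
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))
print(latin1_bytes)  # e9

# Declaring the wrong source encoding garbles the text:
# the two UTF-8 bytes are read as two separate Latin-1 characters
misread <- iconv(x, from = "latin1", to = "UTF-8")
print(misread)
```

The last step is what typically happens when a file is read with the wrong encoding: no error is raised, but the text no longer matches what was written.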
Computer Code and Text-Files
Putting the pieces together…
In both of these domains we mainly work with one simple type of document:
text files.
Text-files
1. Access a website (over the Internet), use the keyboard to enter data into a
website (a Google sheet in this case).
2. R program accesses the data of the Google sheet (again over the Internet),
downloads the data, and loads it into RAM.
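The second step can be sketched as follows. Because the address of the Google sheet is not given here, the example writes a small local text file as a stand-in; the helper `load_table()`, the file contents, and the column names are all hypothetical. `read.csv()` also accepts a URL, so the same call could in principle point at a sheet published as CSV.

```r
# Hypothetical helper: read a comma-separated text file into a data frame.
# `source` may be a local path or a URL.
load_table <- function(source) {
  read.csv(source, stringsAsFactors = FALSE)
}

# Stand-in for the downloaded sheet: a plain text file with made-up content
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,value", "a,1", "b,2"), tmp)

# The file itself is nothing more than lines of text
writeLines(readLines(tmp))

# Parsing it loads the data into memory as a data frame
survey <- load_table(tmp)
print(survey)
```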
🤓
Q&A