06 - The Basics of Python in DS
Shell
Evaluates what you enter and displays output
Interactive
Type at “>>>” prompt
Editor
Create and save .py files
Can run files and display output in the shell
A simple example: Hello world!
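A minimal sketch of that first program. You can type it at the >>> prompt or save it in a .py file and run it from the editor. The variable name message is only illustrative:

```python
# Store the greeting in a variable, then display it
message = "Hello world!"
print(message)
```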
Syntax highlighting:
IDLE colors code differently depending on its syntax (e.g. keywords, strings, comments)
Fundamental concepts in Python
Variables in Python:
Case sensitive (myVar and myvar are different variables)
No need to declare variables in advance
No need to specify the data type
Can later be assigned a value of another data type
A variable must be assigned a value before it is used
Basic data types:
String: str
Number: integer, float, fraction and complex
Collections: list, tuple, dictionary
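The sketch below shows that no declarations are needed. It assigns one value of each basic type and asks Python for its type with the built-in type(). The sample values are illustrative. Fraction comes from the standard fractions module:

```python
from fractions import Fraction

# One illustrative value for each basic type listed above
examples = {
    "str": "Data Science",
    "int": 42,
    "float": 3.14,
    "fraction": Fraction(1, 3),
    "complex": 2 + 3j,
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "dict": {"name": "MIT", "rank": 1},
}

# type() reports the type Python inferred; nothing was declared
for label, value in examples.items():
    print(f"{label:8} -> {type(value).__name__}")
```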
Fundamental concepts in Python
Variables
To re-use a value in multiple computations, store it in a variable.
>>> someVar = 2
>>> print(someVar)  # it's an int
2
>>> someVar = "Why hello there"
>>> print(someVar)  # now str
Why hello there
Python is "dynamically typed", so you can change the type of value stored (unlike Java, C#, C++, …).
Data types in Python
String
We’ve already seen one type in Python, used for words and phrases. In general, this type is called “string”; in Python, it is referred to as str.
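Here is a short illustrative sketch of a few everyday str operations. The variable names are arbitrary:

```python
# Strings can use single or double quotes
first = "Data"
second = 'Science'

phrase = first + " " + second  # concatenation with +
print(phrase)          # Data Science
print(len(phrase))     # 12 characters, including the space
print(phrase[0])       # D (indexing starts at 0)
print(phrase[-7:])     # Science (slice of the last 7 characters)
print(phrase.upper())  # DATA SCIENCE
```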
Data types in Python
Python also has types for numbers:
int – integers
float – floating-point (decimal) numbers
>>> print(6.)  # float
6.0
>>> print(2.3914)  # float
2.3914
Data types in Python
int
In Python 3.x, int has unlimited range, so it can be used for computations on very large numbers.
Common operators
Operation   Example
x + y       20 + 3 = 23
x - y       20 - 3 = 17
x * y       20 * 3 = 60
x / y       20 / 3 = 6.666...
x // y      20 // 3 = 6   (division with the result rounded down)
x % y       20 % 3 = 2    (remainder)
Data types in Python
int
Examples
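The worked examples for this slide did not survive extraction. As a stand-in, this sketch runs the operators from the table on x = 20, y = 3. It also shows that an int grows past 64 bits without overflowing:

```python
x, y = 20, 3

print(x + y)   # 23
print(x - y)   # 17
print(x * y)   # 60
print(x / y)   # 6.666666666666667 (true division always returns a float)
print(x // y)  # 6 (floor division)
print(x % y)   # 2 (remainder)
print(x ** y)  # 8000 (power)

# int has no fixed upper bound in Python 3, so large results stay exact
big = 2 ** 100
print(big)     # 1267650600228229401496703205376
```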
Data types in Python
Float
Values: distinguished from integers by a decimal point; the integer part and the fractional part are separated by ‘.’
Operators: +, -, *, /, ** and the unary operators + and -
Use the decimal module for higher precision: from decimal import Decimal
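This small sketch shows why decimal helps. Binary floats cannot store 0.1 exactly, but Decimal values built from strings can. The 50-digit precision is an arbitrary choice. It is set only inside a localcontext block, so the global default (28 digits) stays unchanged:

```python
from decimal import Decimal, localcontext

# Binary floats cannot represent 0.1 exactly, so a small error appears
print(0.1 + 0.2)                        # 0.30000000000000004

# Decimals built from strings keep the exact decimal digits
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

# Temporarily raise the precision to 50 significant digits
with localcontext() as ctx:
    ctx.prec = 50
    seventh = Decimal(1) / Decimal(7)
print(seventh)
```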
Data types in Python - bool
Python has the values True and False.
>>> a <= b
True
>>> a == b  # does a equal b?
False
Keywords: and, or, not
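Here is a short sketch of comparisons and the three keywords. The values of a and b are assumptions, because the slide does not define them:

```python
a, b = 3, 5  # assumed values for illustration

print(a <= b)            # True
print(a == b)            # False
print(a < b and b < 10)  # True: both comparisons hold
print(a > b or b == 5)   # True: the second comparison holds
print(not (a == b))      # True

# Comparisons can be chained: same as (a < b) and (b < 10)
print(a < b < 10)        # True
print(type(True))        # <class 'bool'>
```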
math module (import math):
exp(x)
log(x[, base])
log10(x)
pow(x, y)
sqrt(x)
acos(x)
asin(x)
atan(x) atan2(y, x)
cos(x) hypot(x, y)
sin(x) tan(x)
degrees(x) radians(x)
cosh(x) sinh(x) tanh(x)
Constants: pi, e
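This sketch calls a few of the functions listed above. Results that go through logarithms or pi only approximate the round number, so those lines are commented "about":

```python
import math

print(math.sqrt(16))             # 4.0
print(math.pow(2, 10))           # 1024.0 (math.pow always returns a float)
print(math.hypot(3, 4))          # 5.0
print(math.cos(0), math.exp(0))  # 1.0 1.0
print(math.log(8, 2))            # about 3.0 (log with an explicit base)
print(math.log10(1000))          # about 3.0
print(math.degrees(math.pi))     # about 180.0
print(math.pi, math.e)           # the two constants
```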
Built-in functions in Python
Conditionals: if, elif, else
Indentation is important in Python!
Syntax:
if condition1:
    tasks_1
    if condition2:
        tasks_2
    elif condition3:
        tasks_3
    else:
        tasks
elif condition4:
    tasks_4
else:
    tasks_5
Example:
var = 100
if var < 200:
    print("The value of variable is less than 200")
    if var == 150:
        print("The value is 150")
    elif var == 100:
        print("The value is 100")
    elif var == 50:
        print("The value is 50")
    elif var < 50:
        print("The value of variable is less than 50")
else:
    print("There is no true condition")
Nested if
Example:
var = int(input('Enter a value: '))
if var < 200:
    print("The value of variable is less than 200")
    if var == 150:
        print("The value is 150")
    elif var == 100:
        print("The value is 100")
    elif var == 50:
        print("The value is 50")
    elif var < 50:
        print("The value of variable is less than 50")
else:
    print("There is no true condition")
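Because input() makes the example interactive, it can help to wrap the same nested logic in a function and try several values at once. describe is a hypothetical helper name. It returns the lines the program would print instead of printing them:

```python
def describe(var):
    """Hypothetical helper: return the messages the nested if would print."""
    messages = []
    if var < 200:
        messages.append("The value of variable is less than 200")
        if var == 150:
            messages.append("The value is 150")
        elif var == 100:
            messages.append("The value is 100")
        elif var == 50:
            messages.append("The value is 50")
        elif var < 50:
            messages.append("The value of variable is less than 50")
    else:
        messages.append("There is no true condition")
    return messages

for value in (100, 30, 120, 250):
    print(value, describe(value))
```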
Nested loops - Exercise
Example: Find all prime numbers that are less than 100
i = 2
while i < 100:
    j = 2
    while j <= (i / j):
        if not (i % j):
            break
        j = j + 1
    if j > i / j:
        print(i, "is a prime number!")
    i = i + 1
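The same exercise can also be written with for loops. This is an alternative sketch, not the slide's solution. It uses trial division up to the integer square root (math.isqrt, Python 3.8+). It also relies on Python's for ... else, whose else branch runs only when the loop ends without break:

```python
from math import isqrt

def primes_below(n):
    """Return all primes less than n, using nested loops and trial division."""
    primes = []
    for i in range(2, n):
        # i is prime if no j with 2 <= j <= sqrt(i) divides it evenly
        for j in range(2, isqrt(i) + 1):
            if i % j == 0:
                break  # found a divisor, so i is not prime
        else:
            # for-else: runs only when the inner loop finished without break
            primes.append(i)
    return primes

print(primes_below(100))
```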
Lists
A sequence of items
Can grow dynamically (unlike a fixed-size array)
Use indexes to access elements (array notation)
Examples:
aList = []
another = [1, 2, 3]
You can print an entire list or an element:
print(another)
print(another[0])
Index -1 accesses the last element of a list
List operations
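Here is an illustrative sketch of common list operations. It starts from the another list shown above:

```python
another = [1, 2, 3]

another.append(4)            # grow the list: [1, 2, 3, 4]
another.insert(0, 0)         # insert at index 0: [0, 1, 2, 3, 4]
another.remove(2)            # remove the first 2: [0, 1, 3, 4]
last = another[-1]           # 4, index -1 is the last element
middle = another[1:3]        # [1, 3], slicing returns a new list
combined = another + [5, 6]  # concatenation: [0, 1, 3, 4, 5, 6]

print(len(another), last, middle, combined)
print(sorted([3, 1, 2]), 3 in another)  # [1, 2, 3] True
```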
Data processing in Python
Numpy
Matplotlib
Pandas
Scikit-learn
Examples:
Numpy
Access by index (slicing)
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row_r1 = a[1, :]    # 1-dimensional array of length 4
row_r2 = a[1:2, :]  # 2-dimensional array of shape 1x4
print(row_r1, row_r1.shape)  # Displays "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Displays "[[5 6 7 8]] (1, 4)"
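Column access follows the same rule: an integer index drops a dimension, and a slice keeps it. The sketch also shows that slices are views, so writing through one changes the array it came from. The names col_r1, col_r2, b and c are illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

col_r1 = a[:, 1]    # integer index: 1-dimensional array of length 3
col_r2 = a[:, 1:2]  # slice: keeps the dimension, shape (3, 1)
print(col_r1, col_r1.shape)  # [ 2  6 10] (3,)
print(col_r2, col_r2.shape)

# Slices are views: writing through b modifies c, but not the original a
c = a.copy()
b = c[:2, 1:3]
b[0, 0] = 77
print(c[0, 1])  # 77
print(a[0, 1])  # 2
```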
Numpy
import numpy as np
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)
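The slide stops after defining x and y. This sketch shows typical operations on them, assuming standard NumPy semantics. Arithmetic operators act elementwise, while @ (or x.dot(y)) is the matrix product:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

print(x + y)       # elementwise sum, same as np.add(x, y)
print(x * y)       # elementwise product, NOT matrix multiplication
print(x / y)       # elementwise division
print(np.sqrt(x))  # elementwise square root
print(x @ y)       # matrix product, same as x.dot(y): [[19, 22], [43, 50]]
print(x.sum(), x.sum(axis=0))  # 10.0 [4. 6.]
```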
matplotlib
Line plot
Format string: fmt = '[color][marker][line]'
[color]:
‘b’ – blue
‘g’ – green
‘r’ – red
‘c’ – cyan
‘m’ – magenta
‘y’ – yellow
‘k’ – black
‘w’ – white
‘#rrggbb’ – a color given by its RGB hex code (allowed only when the color is the entire format string)
[marker] – the symbol for data points:
‘o’ – circle
‘v’ – triangle down (‘^’, ‘<’, ‘>’: triangle up, left, right)
‘*’ – star
‘.’ – point
‘p’ – pentagon
…
[line] – line style:
‘-’ – solid line
‘--’ – dashed line
‘-.’ – dash-dot line
‘:’ – dotted line
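This sketch combines the three parts of a format string. It selects the non-interactive Agg backend and saves to line_plot.png (a hypothetical file name), so it runs even without a display. In an interactive session, plt.show() can replace savefig:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: works without a display
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

# 'ro--': red ('r'), circle markers ('o'), dashed line ('--')
lines = plt.plot(x, np.sin(x), "ro--", label="sin")
# 'g:': green dotted line, no markers
plt.plot(x, np.cos(x), "g:", label="cos")

plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("line_plot.png")  # hypothetical output file name
```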
matplotlib
Example – subplot
import numpy as np
import matplotlib.pyplot as plt
x1 = np.linspace(0.0, 5.0)
x2 = np.linspace(0.0, 2.0)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)
y2 = np.cos(2 * np.pi * x2)
plt.subplot(2, 1, 1)
plt.plot(x1, y1, 'o-')
plt.subplot(2, 1, 2)
plt.plot(x2, y2, '.-')
plt.show()
Pandas
General syntax:
pd.DataFrame(data, index, columns, dtype, copy)
Where:
‘data’ accepts many different types, such as list, dictionary, ndarray, Series, … and even other DataFrames
‘index’ is the row index labels of the DataFrame
‘columns’ is the column labels of the DataFrame
‘dtype’ is the data type for each column
‘copy’ takes the value True/False to indicate whether data is copied to a new memory area (default False in older pandas versions; recent versions default to None, which copies dict input but not DataFrame or 2-D ndarray input)
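Here is a minimal sketch of the constructor with a dict as data. The university names reuse the example later in this section. The row labels a, b, c are arbitrary:

```python
import pandas as pd

# 'data' as a dict: keys become column labels
data = {"Name": ["MIT", "Stanford", "DHTL"], "Rank": [1, 2, 200]}

# 'index' sets the row labels; 'columns' selects and orders the column labels
df = pd.DataFrame(data, index=["a", "b", "c"], columns=["Name", "Rank"])
print(df)
print(df.shape)              # (3, 2)
print(df.loc["b", "Name"])   # Stanford
print(df.dtypes)
```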
Pandas - Panel
Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
Where:
‘data’ can accept the following data types: ndarray, Series, map, lists, dict, constants and other DataFrames
‘items’ is axis 0
‘major_axis’ is axis 1
‘minor_axis’ is axis 2
‘dtype’ is the data type of each column
‘copy’ takes the value True/False to determine whether the data shares memory or not
Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25; 3-dimensional data is now usually represented with a MultiIndex DataFrame or with the xarray library.
Pandas – Series
import pandas as pd
import numpy as np
Attributes and methods of a Series S
S.axes: returns a list containing the index (row labels) of S
S.dtype: returns the data type of S's elements
S.empty: returns True if S is empty
S.ndim: returns the number of dimensions of S (always 1)
S.size: returns the number of elements of S
S.values: returns the elements of S as a NumPy array
S.head(n): returns the first n elements of S
S.tail(n): returns the last n elements of S
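This sketch shows these attributes and methods on a small Series. The values and labels are illustrative:

```python
import pandas as pd
import numpy as np

S = pd.Series(np.array([10, 20, 30, 40]), index=["a", "b", "c", "d"])

print(S.axes)     # a one-element list holding the index a, b, c, d
print(S.dtype)    # integer dtype (platform dependent)
print(S.empty)    # False
print(S.ndim)     # 1
print(S.size)     # 4
print(S.values)   # [10 20 30 40], a NumPy array
print(S.head(2))  # first 2 elements: a 10, b 20
print(S.tail(2))  # last 2 elements: c 30, d 40
```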
Pandas – Series
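Operations on Series
import pandas as pd
import numpy as np
S = pd.Series(gia_tri, index=chi_so)  # gia_tri (values) and chi_so (index labels) are not shown on the slide
P = pd.Series([100, 100], ['CNTT', 'PM'])
Y = S + P
print(Y)
Output (labels are matched; a label missing from either Series gives NaN):
CNTT       680.0
Co khi       NaN
KT           NaN
Ke toan      NaN
PM           …
dtype: float64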
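gia_tri and chi_so are not shown, so here is a self-contained sketch with hypothetical values that follows the same rule: Series are added by matching labels, not by position. Calling add() with fill_value is one way to avoid the NaNs:

```python
import pandas as pd

# Hypothetical values; only CNTT appears in both Series
S = pd.Series([580, 300, 450], index=["CNTT", "KT", "PM"])
P = pd.Series([100, 100], index=["CNTT", "Co khi"])

Y = S + P   # aligned by label, not by position
print(Y)    # CNTT is 680.0; Co khi, KT, PM are NaN, so the dtype is float64

# fill_value treats a label missing from one side as 0 instead of NaN
print(S.add(P, fill_value=0))
```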
Pandas - Frame
import pandas as pd
names_rank = [['MIT', 1], ["Stanford", 2], ["DHTL", 200]]
df = pd.DataFrame(names_rank)
print(df)
Output:
          0    1
0       MIT    1
1  Stanford    2
2      DHTL  200
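Default column names 0 and 1 are awkward to work with. This sketch gives the columns names (University and Rank are assumed labels) and then filters and sorts the rows:

```python
import pandas as pd

names_rank = [["MIT", 1], ["Stanford", 2], ["DHTL", 200]]

# Give the columns meaningful labels instead of the default 0, 1
df = pd.DataFrame(names_rank, columns=["University", "Rank"])

top = df[df["Rank"] <= 2]          # boolean filtering of rows
print(top)
print(df["University"].tolist())   # ['MIT', 'Stanford', 'DHTL']
print(df.sort_values("Rank", ascending=False).iloc[0]["University"])  # DHTL
```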
scikit-learn (sklearn)
Basic machine learning problem classes:
Linear regression
Data clustering
Data classification
Linear regression
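import pandas as pd
from sklearn import linear_model

df = pd.read_csv("nguoi2.csv", index_col=0)
print(df)
# Training model; columns presumably mean Cao = height, GT = gender, Nang = weight
X = df.loc[:, ['Cao', 'GT']].values
y = df.Nang.values
model = linear_model.LinearRegression()
model.fit(X, y)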
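nguoi2.csv is not included here, so this sketch fits the same LinearRegression model on a tiny hypothetical dataset. The data follow y = 2*x1 + 3*x2 + 1 exactly, so you can check that the fitted coefficients come out right:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical data generated from y = 2*x1 + 3*x2 + 1 (no noise)
X = np.array([[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]], dtype=float)
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

model = linear_model.LinearRegression()
model.fit(X, y)

print(model.coef_)              # approximately [2. 3.]
print(model.intercept_)         # approximately 1.0
print(model.predict([[6, 1]]))  # approximately [16.]
```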
Data clustering
from sklearn.cluster import KMeans
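Here is a minimal KMeans sketch on hypothetical 2-D points that form two obvious groups. Cluster ids are arbitrary, so only the grouping is meaningful, not the numbers 0 and 1:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical points: three near (1, 1.5) and three near (8.5, 8.5)
X = np.array([[1, 1], [1.5, 2], [1, 1.5],
              [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster id of each point (ids are arbitrary)
print(kmeans.cluster_centers_)  # one center per cluster
print(kmeans.predict([[0, 0], [10, 10]]))  # assign new points to clusters
```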
Data classification
from sklearn.naive_bayes import GaussianNB
from sklearn import tree
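This sketch trains both imported classifiers, GaussianNB and a decision tree (tree.DecisionTreeClassifier), on the same hypothetical two-class data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn import tree

# Hypothetical training data: two features per sample, classes 0 and 1
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [6.0, 7.0], [6.2, 6.8], [5.8, 7.2]])
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB()
nb.fit(X, y)

dt = tree.DecisionTreeClassifier(random_state=0)
dt.fit(X, y)

new_points = [[1.1, 2.1], [6.1, 6.9]]
print(nb.predict(new_points))  # [0 1]
print(dt.predict(new_points))  # [0 1]
```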
Classification
Clustering
Exercises