Module-10 - Python Basic-2
UNIV/POLTEK
Python Basics
Function, Data Structures, Data Manipulation
digitalent.kominfo.go.id
LOGO
• Python allows for the use of a special kind of function, a lambda function.
• Lambda functions are small, anonymous functions based on the lambda abstractions that
appear in many functional languages.
• Python can support many different programming paradigms including functional
programming.
Right now, we’ll take a look at some of the handy functional tools provided by Python.
Lambda functions
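As a minimal sketch of the idea (the names square and pairs are illustrative and not taken from the slides), a lambda defines a small one-expression function inline, which is handy when passing a function as an argument:

```python
# A lambda is an anonymous, single-expression function.
# This is equivalent to: def square(x): return x ** 2
square = lambda x: x ** 2
print(square(4))  # 16

# Lambdas are often passed directly to other functions, e.g. as a sort key.
pairs = [(1, "one"), (3, "three"), (2, "two")]
pairs_sorted = sorted(pairs, key=lambda p: p[0])
print(pairs_sorted)  # [(1, 'one'), (2, 'two'), (3, 'three')]
```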
Map
• map(function, sequence) applies function to each item in sequence and returns an iterator over the results; wrap it in list() to get a list.
• Multiple sequences can be provided if the function accepts that many arguments.
def square(x):
    return x**2
list(map(square, range(0,11)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
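To illustrate the point about multiple arguments, here is a small sketch (the function add and the sample lists are made up for this example) in which map() draws one argument from each sequence:

```python
def add(x, y):
    return x + y

# map() takes one item from each sequence per call and stops
# as soon as the shortest sequence is exhausted.
sums = list(map(add, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```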
Reduce
• reduce(function, sequence) returns a single value computed as the result of performing function on the first two items, then on the result with the next item, etc.
• There’s an optional third argument which is the starting value.
import functools
def fact(x, y):
    return x*y
print(functools.reduce(fact, range(1,5)))
24
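The optional starting value can be sketched as follows (the function name multiply and the chosen starting values are assumptions for illustration only):

```python
import functools

def multiply(x, y):
    return x * y

# With a starting value, the reduction begins from it: 10 * 1 * 2 * 3 * 4.
product = functools.reduce(multiply, range(1, 5), 10)
print(product)  # 240

# The starting value is also what you get back for an empty sequence;
# without it, reduce() on an empty sequence raises TypeError.
empty_product = functools.reduce(multiply, [], 1)
print(empty_product)  # 1
```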
Creating lists
• To create a list in Python, we can use bracket notation to either create an empty list or an
initialized list.
mylist1 = [] # Creates an empty list
mylist2 = [expression1, expression2, ...]
mylist3 = [expression for variable in sequence]
Example:
mylist2 = [0,1,4,9,16]
mylist3 = [i*i for i in range(5)]
• The first two are referred to as list displays, whereas the last example is a list comprehension.
Creating lists
• We can also use the built-in list constructor to create a new list.
mylist1 = list()
mylist2 = list(sequence)
mylist3 = list(expression for variable in sequence)
• The sequence argument in the second example can be any kind of sequence object or
iterable. If another list is passed in, this will create a copy of the argument list.
Creating lists
• Note that assigning an existing list to another name does not create a new, different list; a new list is only created when it is initialized (e.g. with [] or list()).
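One reading of this note is that plain assignment only gives an existing list a second name. A short sketch under that assumption (the variable names a, b, and c are illustrative):

```python
a = [1, 2, 3]
b = a          # b is another name for the same list, not a new list
c = list(a)    # c is a new list initialized with a's items

b.append(4)
print(a)               # [1, 2, 3, 4]  (changed through b)
print(c)               # [1, 2, 3]     (independent copy)
print(a is b, a is c)  # True False
```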
• If the index is not known, use the index() method to find the first index of an item. An
exception will be raised if the item cannot be found.
>>> mylist = [34,67,45,29]
>>> mylist.index(67)
1
• You may also provide a step argument with any of the slicing constructions above.
• Some examples:
mylist = [34, 56, 29, 73, 19, 62]
mylist[-2] # yields 19
mylist[-4::2] # yields [29, 19]
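A few more slices with a step argument, as a sketch using the same list (the result variable names are only for illustration):

```python
mylist = [34, 56, 29, 73, 19, 62]

every_other = mylist[::2]    # start to end, step 2
middle = mylist[1:5:2]       # positions 1 and 3
backwards = mylist[::-1]     # a negative step walks the list in reverse

print(every_other)  # [34, 29, 19]
print(middle)       # [56, 73]
print(backwards)    # [62, 19, 73, 29, 56, 34]
```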
Inserting/removing elements
• To add an element to an existing list, use the append() method.
• Use the extend() method to add all of the items from another list.
>>> mylist = [34, 56, 29, 73, 19, 62]
>>> mylist.extend([47,81])
>>> mylist
[34, 56, 29, 73, 19, 62, 47, 81]
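The difference between append() and extend() is easiest to see side by side. A small sketch with made-up values:

```python
appended = [34, 56]
appended.append(29)        # adds a single element
appended.append([47, 81])  # a list argument is added as ONE nested element
print(appended)            # [34, 56, 29, [47, 81]]

extended = [34, 56]
extended.extend([47, 81])  # adds each item of the argument individually
print(extended)            # [34, 56, 47, 81]
```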
Inserting/removing elements
• Use the insert(pos, item) method to insert an item at the given position. You may also use
negative indexing to indicate the position.
>>> mylist = [34, 56, 29, 73, 19, 62]
>>> mylist.insert(2,47)
>>> mylist
[34, 56, 47, 29, 73, 19, 62]
• Use the remove() method to remove the first occurrence of a given item. An exception will be
raised if there is no matching item in the list.
>>> mylist = [34, 56, 29, 73, 19, 62]
>>> mylist.remove(29)
>>> mylist
[34, 56, 73, 19, 62]
Lists as stacks
• You can use lists as a quick stack data structure.
• The append() and pop() methods implement a Last In, First Out (LIFO) structure.
• The pop(index) method will remove and return the item at the specified index. If no
index is specified, the last item is popped from the list.
>>> stack = [34, 56, 29, 73, 19, 62]
>>> stack.append(47)
>>> stack
[34, 56, 29, 73, 19, 62, 47]
>>> stack.pop()
47
>>> stack
[34, 56, 29, 73, 19, 62]
Lists as queues
• Lists can be used as queues natively since insert() and pop() both support indexing. However, while appending and popping from a list are fast, inserting and popping from the beginning of the list are slow (especially with large lists). Why is this?
• Use the special deque object from the collections module.
>>> from collections import deque
>>> queue = deque([35, 19, 67])
>>> queue.append(42)
>>> queue.append(23)
>>> queue.popleft()
35
>>> queue.popleft()
19
>>> queue
deque([67, 42, 23])
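One way to see the difference is to write the same queue both ways. This is only a sketch (the variable names are illustrative); both versions give the same result, but they do different amounts of work per dequeue:

```python
from collections import deque

# Plain list as a queue: pop(0) has to shift every remaining item one
# position to the left, so each dequeue costs time proportional to the length.
list_queue = [35, 19, 67]
list_queue.append(42)
first_from_list = list_queue.pop(0)

# deque is designed for fast appends and pops at both ends.
dq = deque([35, 19, 67])
dq.append(42)
first_from_deque = dq.popleft()

print(first_from_list, list_queue)  # 35 [19, 67, 42]
print(first_from_deque, list(dq))   # 35 [19, 67, 42]
```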
Other operations
• The count(x) method will give you the number of occurrences of item x within the list.
• The sort() and reverse() methods sort and reverse the list in place. The sorted(mylist) built-in function returns a new sorted list, and reversed(mylist) returns a reverse iterator (wrap it in list() to get a reversed copy).
>>> mylist = [5, 2, 3, 4, 1]
>>> mylist.sort()
>>> mylist
[1, 2, 3, 4, 5]
>>> mylist.reverse()
>>> mylist
[5, 4, 3, 2, 1]
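For comparison with the in-place methods, here is a short sketch of the built-in functions and count() (variable names are illustrative). Note that reversed() yields an iterator, so it is wrapped in list() here:

```python
mylist = [5, 2, 3, 4, 1]

ascending = sorted(mylist)          # new sorted list
backwards = list(reversed(mylist))  # reversed() returns an iterator

print(ascending)  # [1, 2, 3, 4, 5]
print(backwards)  # [1, 4, 3, 2, 5]
print(mylist)     # [5, 2, 3, 4, 1]  (the original is unchanged)

twos = [1, 2, 2, 3, 2].count(2)
print(twos)       # 3
```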
Custom sorting
We can sort the list with a key function as follows (default order is ascending):
>>> mylist = ['b', 'A', 'D', 'c']
>>> mylist.sort(key = str.lower)
>>> mylist
['A', 'b', 'c', 'D']
For descending order:
>>> mylist = ['b', 'A', 'D', 'c']
>>> mylist.sort(key = str.lower, reverse = True)
>>> mylist
['D', 'c', 'b', 'A']
str.lower() is a built-in string method.
Creating sets
• Create an empty set with the set constructor.
myset = set()
myset2 = set([]) # both are empty sets
• Create an initialized set with the set constructor or the { } notation. Do not use empty curly
braces to create an empty set – you’ll get an empty dictionary instead.
myset = set(sequence)
myset2 = {expression for variable in sequence}
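A quick sketch of these constructions (the sample values and names are made up), including the empty-braces pitfall:

```python
empty_set = set()
empty_dict = {}   # empty braces give a dictionary, not a set
print(type(empty_set).__name__)   # set
print(type(empty_dict).__name__)  # dict

letters = set("hello")            # duplicates are dropped: h, e, l, o
evens = {n for n in range(10) if n % 2 == 0}
print(len(letters))               # 4
print(sorted(evens))              # [0, 2, 4, 6, 8]
```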
Hashable items
• The way a set detects non-unique elements is by indexing the data in memory, creating a
hash for each element. This means that all elements in a set must be hashable.
• All of Python’s immutable built-in objects are hashable, while no mutable containers (such as
lists or dictionaries) are. Objects which are instances of user-defined classes are also hashable
by default.
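The hashability requirement can be sketched like this (all names are illustrative). The exact wording of the error message varies between Python versions, so it is printed without an expected output:

```python
# Ints, strings, and tuples of hashable items are hashable.
ok = {1, "two", (3, 4)}

# Lists are mutable and therefore unhashable, so this raises TypeError.
try:
    bad = {[1, 2], [3, 4]}
except TypeError as err:
    error_message = str(err)
print(error_message)

# A frozenset is immutable and hashable, so it can be a set element.
frozen = frozenset([1, 2])
nested = {frozen}
print(frozen in nested)  # True
```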
Mutable operations
• The following operations are not available for frozensets.
• The add(x) method will add element x to the set if it’s not already there. The remove(x) and discard(x) methods will remove x from the set; remove(x) raises a KeyError if x is absent, while discard(x) does nothing.
• The pop() method will remove and return an arbitrary element from the set. Raises an error if the set is empty.
• The clear() method removes all elements from the set.
(Sets are unordered, so the display order and the popped element may differ from what is shown.)
>>> myset = {x for x in 'abracadabra'}
>>> myset
{'a', 'b', 'r', 'c', 'd'}
>>> myset.add('y')
>>> myset
{'a', 'b', 'r', 'c', 'd', 'y'}
>>> myset.remove('a')
>>> myset
{'b', 'r', 'c', 'd', 'y'}
>>> myset.pop()
'b'
>>> myset
{'r', 'c', 'd', 'y'}
set ^= other
Update the set, keeping only elements found in either set, but not in both.
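As a brief sketch of the in-place symmetric difference (sample strings chosen for illustration), the operator form behaves like the symmetric_difference_update() method:

```python
s = set("abracadabra")   # a, b, r, c, d
other = set("alacazam")  # a, l, c, z, m

s ^= other               # updates s in place; no new set is created
print(sorted(s))         # ['b', 'd', 'l', 'm', 'r', 'z']

t = set("abracadabra")
t.symmetric_difference_update(other)  # method form of the same update
print(s == t)            # True
```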
Set operations
• The following operations are available for both set and frozenset types.
• Comparison operators >=, <= test whether a set is a superset or subset, respectively, of some
other set. The > and < operators check for proper supersets/subsets.
>>> s1 = set('abracadabra')
>>> s2 = set('bard')
>>> s1 >= s2
True
>>> s1 > s2
True
>>> s1 <= s2
False
Set operations
• Union: set | other | …
• Return a new set with elements from the set and all others.
• Intersection: set & other & …
• Return a new set with elements common to the set and all others.
• Difference: set - other - …
• Return a new set with elements in the set that are not in the others.
• Symmetric Difference: set ^ other
• Return a new set with elements in either the set or other but not both.
Set operations
>>> s1 = set('abracadabra')
>>> s1
{'a', 'b', 'r', 'c', 'd'}
>>> s2 = set('alacazam')
>>> s2
{'a', 'l', 'c', 'z', 'm'}
>>> s1 | s2
{'a', 'b', 'r', 'c', 'd', 'l', 'z', 'm'}
>>> s1 & s2
{'a', 'c'}
>>> s1 - s2
{'b', 'r', 'd'}
>>> s1 ^ s2
{'b', 'r', 'd', 'l', 'z', 'm'}
Other operations
• s.copy() returns a shallow copy of the set s.
• s.isdisjoint(other) returns True if set s has no elements in common with set other.
• s.issubset(other) returns True if set s is a subset of set other.
• len, in, and not in are also supported.
Constructing tuples
• An empty tuple can be created with an empty set of parentheses.
• Pass a sequence type object into the tuple() constructor.
• Tuples can be initialized by listing comma-separated values. These do not need to be in
parentheses but they can be.
• One quirk: to initialize a tuple with a single value, use a trailing comma.
>>> t1 = (1, 2, 3, 4)
>>> t2 = "a", "b", "c", "d"
>>> t3 = ()
>>> t4 = ("red", )
Tuple operations
• Tuples are very similar to lists and support a lot of the same operations.
• Accessing elements: use bracket notation (e.g. t1[2]) and slicing.
• Use len(t1) to obtain the length of a tuple.
• The universal immutable sequence type operations are all supported by tuples.
• +, *
• in, not in
• min(t), max(t), t.index(x), t.count(x)
Packing/unpacking
• Tuple packing is used to “pack” a collection of items into a tuple. We can unpack a tuple using
Python’s multiple assignment feature.
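Packing and unpacking can be sketched as follows (the names point, first, and rest are illustrative):

```python
point = 3, 4, 5       # packing: the commas build the tuple
x, y, z = point       # unpacking: one name per element
print(x, y, z)        # 3 4 5

a, b = 1, 2
a, b = b, a           # swap values without a temporary variable
print(a, b)           # 2 1

first, *rest = point  # a starred name collects the remaining items in a list
print(first, rest)    # 3 [4, 5]
```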
Constructing a dictionary
• Create an empty dictionary with empty curly braces or the dict() constructor.
• You can initialize a dictionary by specifying each key:value pair within the curly braces.
• Note that keys must be hashable objects.
>>> d1 = {}
>>> d2 = dict() # both empty
>>> d3 = {"Name": "Susan", "Age": 19, "Major": "CS"}
>>> d4 = dict(Name="Susan", Age=19, Major="CS")
>>> d5 = dict(zip(['Name', 'Age', 'Major'], ["Susan", 19, "CS"]))
>>> d6 = dict([('Age', 19), ('Name', "Susan"), ('Major', "CS")])
Note: zip takes two equal-length collections and merges their corresponding elements into tuples.
Updating a dictionary
• Simply assign a key:value pair to modify it or add a new pair. The del keyword can be used to
delete a single key:value pair or the whole dictionary. The clear() method will clear the
contents of the dictionary.
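These operations can be sketched with a small example dictionary (the keys and values are made up; after_update and is_empty exist only to record intermediate states):

```python
d = {"Name": "Susan", "Age": 19}

d["Age"] = 20       # existing key: the value is modified
d["Major"] = "CS"   # new key: the pair is added
del d["Name"]       # delete a single key:value pair
after_update = dict(d)
print(after_update)  # {'Age': 20, 'Major': 'CS'}

d.clear()           # empties the dictionary but keeps the object
is_empty = len(d) == 0
print(is_empty)     # True

del d               # removes the name d itself; using d now raises NameError
```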
Ordered dictionary
Before Python 3.7, regular dictionaries were not guaranteed to remember the order in which keys were inserted (since 3.7 they preserve insertion order). An ordered dictionary
implementation is available in the collections module. The methods of a regular dictionary are
all supported by the OrderedDict class.
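A minimal sketch of OrderedDict in use (the fruit keys are sample data). Besides the regular dictionary methods, it offers reordering and order-sensitive equality:

```python
from collections import OrderedDict

od = OrderedDict()
od["banana"] = 3
od["apple"] = 4
od["pear"] = 1
keys_before = list(od.keys())
print(keys_before)  # ['banana', 'apple', 'pear']

od.move_to_end("banana")  # reordering method specific to OrderedDict
keys_after = list(od.keys())
print(keys_after)   # ['apple', 'pear', 'banana']

# Equality between two OrderedDicts takes order into account;
# equality between regular dictionaries does not.
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)
print(ordered_equal, plain_equal)  # False True
```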
Ordered dictionary
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
Visualization libraries
• matplotlib
• Seaborn
NumPy:
provides vectorization of mathematical operations on arrays and matrices, which significantly improves
performance
Link: http://www.numpy.org/
SciPy:
collection of algorithms for linear algebra, differential equations, numerical integration, optimization,
statistics and more
built on NumPy
Link: https://www.scipy.org/scipylib/
Pandas:
provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
Link: http://pandas.pydata.org/
SciKit-Learn:
provides machine learning algorithms: classification, regression, clustering, model validation etc.
Link: http://scikit-learn.org/
matplotlib:
Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats
Link: https://matplotlib.org/
Seaborn:
based on matplotlib
Link: https://seaborn.pydata.org/
Note: The above command has many optional arguments to fine-tune the data import process.
Hands-on exercises
Can you guess how to view the last few records? Hint:
Out[4]: dtype('int64')
df.attribute    description
dtypes          list the types of the columns
Hands-on exercises
df.method()                 description
head( [n] ), tail( [n] )    first/last n rows
Hands-on exercises
What are the mean values of the first 50 records in the dataset? Hint: use
head() method to subset the first 50 records and then calculate the mean
Note: pandas data frames already have an attribute named rank (the rank() method), so to select a column named
"rank" we should use method 1 (bracket notation, df['rank']).
Hands-on exercises
Find how many values are in the salary column (use the count method).
In [ ]: # Calculate the mean value of each numeric column for each group
df_rank.mean(numeric_only=True)
Once a groupby object is created, we can calculate various statistics for each group.
Note: If single brackets are used to specify the column (e.g. salary), then the output is a pandas Series object.
When double brackets are used, the output is a DataFrame.
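The single-versus-double bracket behaviour can be sketched on a tiny hypothetical data frame (the rows below are invented for illustration and are not the course dataset):

```python
import pandas as pd

# Hypothetical sample data, only to make the example self-contained.
df = pd.DataFrame({
    "rank": ["Prof", "Prof", "AssocProf", "AsstProf"],
    "salary": [120000, 100000, 90000, 80000],
})

series_out = df.groupby("rank")["salary"].mean()   # single brackets: Series
frame_out = df.groupby("rank")[["salary"]].mean()  # double brackets: DataFrame

print(type(series_out).__name__)  # Series
print(type(frame_out).__name__)   # DataFrame
print(series_out["Prof"])         # 110000.0
```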
When we need to select more than one column and/or want the output to be a
DataFrame, we should use double brackets:
In [ ]: # Select columns rank and salary:
df[['rank','salary']]
Notice that the first row has position 0 and the last value in the range is omitted.
So for the range 0:10, the first 10 rows are returned, with positions starting at 0
and ending at 9.
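A small sketch of this row slicing, assuming a made-up data frame with 20 rows and a default integer index:

```python
import pandas as pd

# Hypothetical data frame with 20 rows (positions 0..19).
df = pd.DataFrame({"salary": list(range(100, 120))})

subset = df[0:10]             # rows at positions 0 through 9; 10 is excluded
print(len(subset))            # 10
print(subset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```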
In [ ]: # Create a new data frame from the original sorted by the column service
df_sorted = df.sort_values( by ='service')
df_sorted.head()
Missing Values
Missing values are marked as NaN
In [ ]: # Read a dataset with missing values
flights = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/flights.csv")
Missing Values
There are a number of methods to deal with missing values in the data frame:
df.method()    description
dropna()       Drop missing observations
Missing Values
• When summing the data, missing values will be treated as zero
• If all values are missing, the sum is 0 by default in pandas 0.22 and later (older versions returned NaN); pass min_count=1 to get NaN
• cumsum() and cumprod() methods ignore missing values but preserve them in
the resulting arrays
• Missing values in GroupBy method are excluded (just like in R)
• Many descriptive statistics methods have a skipna option to control whether missing
data should be excluded. This value is set to True by default (unlike R)
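These rules can be checked on a small made-up Series. This is a sketch that assumes a reasonably recent pandas (0.22 or later), where an all-missing sum defaults to 0 rather than NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

total = s.sum()                  # NaN is skipped: 1.0 + 3.0
strict_total = s.sum(skipna=False)  # NaN propagates when skipna is disabled
running = s.cumsum().tolist()    # NaN is ignored but kept in its position

print(total)         # 4.0
print(strict_total)  # nan
print(running)       # [1.0, nan, 4.0]

all_missing = pd.Series([np.nan, np.nan])
default_sum = all_missing.sum()               # 0.0 on pandas >= 0.22
guarded_sum = all_missing.sum(min_count=1)    # NaN: fewer than 1 valid value
print(default_sum, guarded_sum)
```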
min, max
count, sum, prod
mean, median, mode, mad
std, var
kurt kurtosis
In [ ]: %matplotlib inline
Graphics
description
distplot histogram
pairplot Pairplot
boxplot boxplot
statsmodels is mostly used for regular statistical analysis using R-style formulas, while scikit-learn is
more tailored for machine learning.
statsmodels:
• linear regressions
• ANOVA tests
• hypothesis tests
• many more ...
scikit-learn:
• k-means
• support vector machines
• random forests
• many more ...
digitalent.kominfo
DTS_kominfo
Digital Talent Scholarship 2019