Apply Python for Machine Learning
1|Page
Learning outcome 1: Prepare Python environment
✔ Definition
What is Python?
Python is a general-purpose, interpreted, interactive, object-oriented, high-level scripting
language.
Python is interpreted, meaning the code is executed line by line, which makes development and testing faster.
Python is a general-purpose language, meaning it can be used to create a variety of different
applications. Python has libraries for data loading, visualization, statistics, natural language
processing, image processing, and more. One of the main advantages of using Python is the ability
to interact directly with the code, using a terminal or other tools like the Jupyter Notebook. Python is
widely used in the data science community.
Python is a high-level, interpreted programming language known for its simplicity and
readability. It was created by Guido van Rossum and released in 1991. Python's design
philosophy emphasizes code readability, and it allows programmers to express concepts in
fewer lines of code compared to other languages.
What is Machine Learning?
Machine learning is a field of artificial intelligence (AI) that focuses on building algorithms that
can learn from data and improve their performance on specific tasks over time. This learning
process involves extracting knowledge (patterns) from data, and using that knowledge to make
predictions or decisions. Machine learning is about extracting knowledge from data.
In other words, ML algorithms have the ability to automatically analyze and interpret data,
recognize patterns, and make informed predictions or take actions based on the patterns they
discover.
ML algorithms are designed to identify and learn patterns or relationships within data, and they
use this knowledge to make accurate predictions or decisions on new, unseen data.
4. Wide Applicability
Another essential feature of this language is that it is widely applicable. Engineers, scientists, and
mathematicians broadly use it.
6. Large Community
The Python community provides rapid support to users. If you face any difficulty during Python
development, community members are usually ready to help you solve your problems.
Some experts call Python a "ready-to-run language" because simple code can be run directly,
without a separate build step. The language makes writing and testing code much easier.
7. Asynchronous Coding
Asynchronous coding uses a single event loop to complete work in small intervals, switching between
tasks while each one is waiting. Python simplifies writing asynchronous code and avoids much of the
complexity of resource contention, deadlocks, and similar problems.
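A minimal sketch of asynchronous code with the standard-library asyncio module. The coroutine name fetch and the delays are hypothetical; asyncio.sleep stands in for real I/O such as a network request. Both coroutines run on one event loop, and each hands control back to the loop while it waits.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Simulate a slow I/O operation without blocking the event loop
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # Schedule both coroutines concurrently on the same event loop
    return await asyncio.gather(fetch("task A", 0.2), fetch("task B", 0.1))


results = asyncio.run(main())
print(results)  # ['task A done', 'task B done']
```

Because the two waits overlap, the program finishes in about 0.2 seconds rather than 0.3. asyncio.gather returns results in the order the coroutines were passed, not the order in which they finished.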
9. Portable
Because Python is portable, developers do not need to modify Python code to make it run on
platforms other than the one it was written on. The language follows the “Write Once, Run Anywhere
(WORA)” principle. The only thing to remember is not to use any system-dependent
features.
10. Enterprise Application Integration
Python is a strong choice for Enterprise Application Integration (EAI). It simplifies web
application development, invoking CORBA or COM components, and calling directly to and from
Java, C++, and C code.
✔ Characteristics of Python
1. Easy to code
Python is a high-level programming language that is very easy to learn compared to other
languages such as C, C#, JavaScript, and Java. Its simple syntax makes it easy to write Python
code.
3. Object-Oriented Language
One of the key features of Python is object-oriented programming. Python supports
object-oriented concepts such as classes, objects, inheritance, and encapsulation.
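A short sketch of these object-oriented ideas. BankAccount and SavingsAccount are hypothetical classes made up for illustration; note that Python enforces encapsulation by convention (a leading underscore) rather than with strict access modifiers.

```python
class BankAccount:
    """Hypothetical class: bundles data (owner, balance) with behaviour."""

    def __init__(self, owner: str, balance: float = 0.0) -> None:
        self.owner = owner
        self._balance = balance  # leading underscore: internal by convention

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("Deposit must be positive")
        self._balance += amount

    @property
    def balance(self) -> float:
        # Read-only public access to the internal balance
        return self._balance


class SavingsAccount(BankAccount):
    """Inheritance: a SavingsAccount is a BankAccount that earns interest."""

    def __init__(self, owner: str, balance: float = 0.0, rate: float = 0.05) -> None:
        super().__init__(owner, balance)
        self.rate = rate

    def add_interest(self) -> None:
        self.deposit(self._balance * self.rate)


acct = SavingsAccount("Alice", 100.0)  # an object (instance) of the class
acct.add_interest()
print(acct.balance)  # 105.0
```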
6. Extensible feature
Python is an extensible language. Parts of a program can be written in C or C++, compiled,
and then called from Python code.
7. Portable language
Python is also a portable language. For example, if we have Python code written on Windows
and want to run it on other platforms such as Linux, Unix, or macOS, we do not
need to change it; the same code can run on any of these platforms.
8. Integrated language
Python is also an integrated language because it can easily be integrated with other languages
such as C and C++.
9. Interpreted Language
Python is an interpreted language because Python code is executed line by line. Unlike
languages such as C, C++, and Java, Python code does not need to be compiled explicitly before
it runs, which makes it easier to debug. Behind the scenes, the source code is converted into an
intermediate form called bytecode, which the interpreter then executes.
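One way to see the bytecode step is the standard-library dis module, which disassembles a compiled function. The exact instruction names differ between Python versions, so the disassembly output is not reproduced here.

```python
import dis


def add(a, b):
    return a + b


# The def statement compiled the body to bytecode; dis prints it in readable form
dis.dis(add)

# The compiled bytecode lives in the function's code object
print(type(add.__code__))  # <class 'code'>
```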
10. Large Standard Library
Python has a large standard library that provides a rich set of modules and functions, so you do not
have to write your own code for every single thing. The standard library includes modules for
regular expressions (re), unit testing (unittest), opening web browsers (webbrowser), and much more.
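A small sketch using three standard-library modules (re, statistics, json), so nothing extra has to be installed. The log string and its format are hypothetical.

```python
import json
import re
import statistics

log = "user=alice score=91; user=bob score=78; user=carol score=85"

# re: extract (name, score) pairs with a regular expression
pairs = re.findall(r"user=(\w+) score=(\d+)", log)
scores = {name: int(score) for name, score in pairs}

# statistics: summary values without writing the formulas yourself
print(statistics.mean(scores.values()))    # 84.66666666666667
print(statistics.median(scores.values()))  # 85

# json: serialize the result so another program can read it
print(json.dumps(scores))  # {"alice": 91, "bob": 78, "carol": 85}
```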
2. Software development
Python is an excellent option for software development. Companies such as Google,
Netflix, and Reddit use Python. The language offers features such as:
Platform independence
Inbuilt libraries and frameworks to provide ease of development.
Enhanced code reusability and readability
High compatibility
3. Automation
You can use Python to automate various repetitive tasks to save time at work and at home. For
example, you can create a script to search the internet for news headlines related to a project you're
working on. Instead of manually visiting each news website to search for the articles, the program
can send you a list with links. To save even more time, you can write a script to sort the links into
categories for easy reference.
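The sorting step described above could look like the sketch below. It skips the web-search part (which would need a network library) and works on a hardcoded list; the headlines, example.com links, and keyword rules are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (headline, link) pairs that a search script might have collected
articles = [
    ("New Python release announced", "https://example.com/python-release"),
    ("Stock markets rally", "https://example.com/markets"),
    ("Machine learning in healthcare", "https://example.com/ml-health"),
    ("Python tips for data analysis", "https://example.com/python-tips"),
]

# Hypothetical keyword -> category rules
rules = {"python": "Programming", "machine learning": "AI/ML", "stock": "Finance"}


def categorize(items, keyword_rules):
    """Group links under the first category whose keyword appears in the headline."""
    grouped = defaultdict(list)
    for headline, link in items:
        text = headline.lower()
        label = next((cat for kw, cat in keyword_rules.items() if kw in text), "Other")
        grouped[label].append(link)
    return dict(grouped)


for category, links in categorize(articles, rules).items():
    print(category, links)
```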
4. Data analytics
Data analysts can also use Python libraries to structure large datasets and make mathematical
operations more manageable. pandas, a Python library, offers a data structure called a DataFrame
to work with large tables of data effectively.
5. Web Development
It is one of the most popular applications of Python. Python offers a wide range of
frameworks, such as Django, Flask, and Bottle, that make developers' work easier.
Python also has built-in libraries and tools that simplify the web development
process. Using Python for web development also offers:
Amazing visualization
Convenience in development
Enhanced security
Fast development process
6. Machine Learning and Artificial Intelligence
Machine Learning and Artificial Intelligence are among the most active fields today. Python, along
with its libraries and tools, facilitates the development of AI and ML algorithms. It also offers
simple, concise, and readable code, which makes it easier for developers to write complex
algorithms and build flexible workflows. Some of the libraries and tools that support AI and
ML work are:
NumPy for numerical computing and data analysis
Keras for deep learning
SciPy for scientific and technical computing
Seaborn for data visualization
7. Game Development
With the rapidly growing gaming industry, Python has proved to be a good option for game
development. Games such as Pirates of the Caribbean, Bridge Commander, and Battlefield 2
use Python for a wide range of functionalities and add-ons. Popular 2D and 3D game libraries
such as pygame, Panda3D, and Cocos2D make game development easier.
Python Development Tools
Development tools help us build fast and reliable Python solutions. They include Integrated
Development Environments (IDEs), the Python package manager, and productivity extensions. These
tools make it easier to test, debug, and deploy software in production.
1. Jupyter Notebook
Jupyter Notebook is a web-based IDE for experimenting with code and displaying the results. It
is fairly popular among data scientists and machine learning practitioners. It allows them to run
and test small pieces of code (cells) and view the results instead of running the whole file.
2. Pip
pip is the standard package manager for installing and managing Python libraries. It downloads
packages from the Python Package Index (PyPI), through which most of the Python ecosystem is
distributed.
3. Visual Studio Code
Visual Studio Code is a free, lightweight, and powerful code editor. You can build, test, deploy,
and maintain all types of applications without leaving the editor window. It comes with syntax
highlighting, code auto-completion, language support, Git integration, and in-line debugging. You
can use extensions to add build systems and deploy applications to the cloud.
VS Code is one of the most popular code editors in the world, largely because of its free
extensions that improve the user experience. The extensions allow data scientists to run
experiments in Jupyter notebooks, edit Markdown files, connect to SQL Server, collaborate on
projects, auto-complete code, and get in-line code help. Instead of using multiple programs, you
can use extensions and do everything, including running a bash terminal, from within VS Code.
4. PyCharm
PyCharm is one of the most popular integrated development environments (IDEs) for Python
programming. It is developed by JetBrains and is designed to provide a comprehensive
environment for Python development, offering a wide range of features that help in writing, testing,
and debugging code.
IDE/Code Editors: Popular choices include PyCharm, VS Code, and Jupyter Notebook.
Python Data Analysis Tools
Data analysis tools allow users to ingest, clean, and manipulate data for statistical analysis. Every
data professional must understand the core functionality of these tools to perform data analysis,
machine learning, data engineering, and business intelligence tasks.
1. pandas
pandas is a gateway into the world of data science. The first thing you learn as a beginner is to
load a CSV file using read_csv(). pandas is an essential tool for all data professionals.
You can load a dataset, clean it, manipulate it, calculate statistics, create visualizations, and save
the data into various file formats. The pandas API is simple and intuitive. You can load and save
CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 file format.
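A minimal sketch of the load, clean, manipulate, summarize, and save cycle, assuming pandas is installed. An in-memory CSV string (hypothetical data) replaces a real file so the example runs anywhere; in practice you would pass a file path to read_csv().

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """name,age,city
Alice,25,Kigali
Bob,,Huye
Carol,31,Kigali
"""

df = pd.read_csv(io.StringIO(csv_text))  # load
df = df.dropna(subset=["age"])           # clean: drop rows with a missing age
df["age"] = df["age"].astype(int)        # manipulate: fix the column type
print(df.groupby("city")["age"].mean())  # statistics: mean age per city

out = df.to_csv(index=False)  # save: pass a path such as "clean.csv" to write a file
print(out)
```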
2. Numpy
NumPy is a fundamental Python package for scientific computations, and most modern tools are
built upon it. As a data scientist, you will use the Numpy array for mathematical calculations and
data wrangling. It provides multidimensional array objects to perform fast operations such as
logical operations, shape manipulation, sorting, selection, basic statistical operations, and
random simulation.
NumPy will help you understand the fundamentals of mathematics in data science and how to
convert complex equations into Python code. You can use it to create machine learning models,
custom statistical formulas, and scientific simulations, and to perform advanced data analytics
tasks.
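As an illustration of turning an equation into NumPy code, the sketch below computes the mean squared error, MSE = (1/n) * sum over i of (y_true_i - y_pred_i)^2, with array arithmetic instead of a Python loop, and then touches a few of the operations listed above. The numbers are made up for the example, and NumPy is assumed to be installed.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE as vectorized operations: element-wise difference, square, then mean
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375

m = np.arange(6).reshape(2, 3)       # shape manipulation: [[0 1 2] [3 4 5]]
print(m[m > 2])                      # logical selection: [3 4 5]
print(np.sort(np.array([3, 1, 2])))  # sorting: [1 2 3]

rng = np.random.default_rng(seed=0)  # seeded generator for reproducible simulation
print(rng.normal(size=3))
```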
3. SQLAlchemy
SQLAlchemy is a Python SQL toolkit for accessing and managing relational databases. It
combines an Object Relational Mapper (ORM) with the power and flexibility of SQL.
This tool is useful for data scientists and analysts who perform data processing and
analytics in Python. You can either use SQL scripts to perform data analysis or use an
object-based approach, where an intuitive Python API performs similar tasks effectively.
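A minimal sketch of the SQL-script style using SQLAlchemy Core (create_engine and text()), assuming SQLAlchemy 1.4 or 2.x is installed. The in-memory SQLite database and the sales table are assumptions made so the example needs no database server; the ORM style mentioned above is not shown.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database (assumption: stands in for a real server)
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a connection and commits when the block ends
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE sales (region TEXT, amount REAL)"))
    conn.execute(
        text("INSERT INTO sales (region, amount) VALUES (:region, :amount)"),
        [
            {"region": "North", "amount": 120.0},
            {"region": "South", "amount": 80.0},
            {"region": "North", "amount": 30.0},
        ],
    )
    rows = conn.execute(
        text("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
    ).fetchall()

print([tuple(row) for row in rows])  # [('North', 150.0), ('South', 80.0)]
```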
4. Dask
Dask is an essential tool for processing big data or large files. It uses parallel computing to
perform the same kinds of tasks as libraries like NumPy, pandas, and scikit-learn.
Running even a simple logical function on a 4GB dataset with single-threaded tools can take
10 minutes or more, and a better machine alone may not bring processing time down to a few
seconds. Dask uses dynamic task scheduling and parallel data collections to achieve fast results
on the same machine.
Python Data Visualization Tools
Data visualization gives life to data analysis. If you want to explain things to non-technical
executives, you need to tell a data story by displaying bar charts, line plots, scatter plots, heat
maps, and histograms. Visualization tools help data analysts create interactive, colorful, and
clean visualizations with a few lines of code.
1. Matplotlib
Matplotlib is a gateway to the world of data visualization. You will learn about it in many data
visualization introductions. With Matplotlib, you can create fully customizable static, animated,
and interactive visualizations. It’s intuitive, and you can use it to plot 3D, multilevel, and detailed
visualization.
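A minimal Matplotlib sketch, assuming Matplotlib is installed. The sales figures are hypothetical, and the non-interactive Agg backend is selected so the script also runs on machines without a display; the figure is written to an in-memory buffer instead of a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render images without a window
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]  # hypothetical data

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(months, sales, marker="o", label="Sales")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # use a filename such as "sales.png" to save to disk
plt.close(fig)
```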
2. Seaborn
Seaborn is a high-level interface based on Matplotlib for creating attractive statistical graphics.
With Seaborn, you can often produce a complete statistical plot with a single line of code.
It is highly adaptable and works well when you are new to data visualization. For customization,
you can always use Matplotlib to create multiple graphs and edit the axes, titles, or colors.
3. Plotly
Plotly is a popular tool for creating Jupyter-based data analytics reports. Instead of creating
multiple static plots, you can make one and add custom controls to explore and explain data
insights. It also lets you animate your visualizations and work on data transformation. Plotly
also offers Jupyter widgets, 3D charts, AI charts, financial charts, and scientific charts.
4. Pandas-profiling
Pandas-profiling (now published as ydata-profiling) is an AutoEDA tool for creating exploratory
data analysis reports with a single line of code. The report includes column types, missing
values, unique values, quantile statistics, descriptive statistics, histograms, correlations, text
analysis, and file and image analysis.
It is a helpful tool when you have little time to explore the data, for example, during technical
tests, when preparing for team meetings, or when participating in competitions.
Python Machine Learning Tools
Machine learning tools are used for data processing, data augmentation, and the building,
training, and validation of machine learning models. These tools provide a complete ecosystem to
perform any task from image classification to time series forecasting.
1. Scikit-learn
Scikit-learn is an open-source tool for performing predictive analysis. It is built on NumPy, SciPy,
and Matplotlib. Scikit-learn has made machine learning accessible to everyone. It is beginner
friendly, and the interface is designed to match the needs of professionals.
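A minimal scikit-learn sketch of the usual split, fit, predict, and evaluate workflow, assuming scikit-learn is installed. It uses the iris dataset that ships with the library; LogisticRegression is one arbitrary choice of model, and most scikit-learn estimators share the same fit/predict interface.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Bundled dataset: no download needed
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)  # max_iter raised so the solver converges
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```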
2. Keras
Keras is a deep learning framework for processing unstructured data and training neural
networks on it. It was built on top of TensorFlow 2 to provide GPU and TPU acceleration (since
Keras 3, it can also run on JAX or PyTorch). With Keras, you can deploy your models on servers,
in browsers, on Android, and on embedded systems.
Keras API offers you a model interface, neural network layers, callbacks API, optimizers, metrics,
data loaders, pre-trained models, model tuning, and API for computer vision and natural language
processing. The interface is simple, fast, and powerful. It is beginner friendly and a gateway to the
world of deep neural networks.
3. PyTorch
PyTorch is an open-source deep learning framework for researchers and machine learning
practitioners. It provides a more direct debugging experience than Keras, while allowing you to
create your own custom training loops, loss functions, and metrics. The key features of PyTorch
are model serving and production support, distributed training, a robust ecosystem, and cloud
support.
PyTorch provides dedicated support for NLP, computer vision, audio, and tabular data. With a few
lines of code, you can load pre-trained models and fine-tune them on a new but similar dataset. It
is widely used for deep learning applications, and much of modern machine learning research is
built on the PyTorch ecosystem.
4. OpenCV
OpenCV is a computer vision framework for developing real-time applications. You can use it to
process images, visualize them with labels and segmentation, augment images and videos for
improving machine learning performance, and view real-time results with labels. It is an essential
tool for performing image processing and training deep learning models for computer vision tasks.
Summary:
1. Hardware Requirements
Processing Power - The minimum and recommended CPU specifications, such as the
number of cores and clock speed. High-performance applications, such as video editing
tools, require powerful processors to handle complex computations efficiently.
Memory - The amount of RAM required for the software to run efficiently, with
specifications for both minimum and recommended memory. More RAM allows the
software to handle larger datasets and multitask more effectively.
Storage - The amount of disk space needed for installation and operation, often including
both minimum and recommended storage capacities. Storage requirements also consider
the type of storage, such as SSDs for faster read/write speeds compared to traditional
HDDs.
Display Adapter - Specifications for the graphics card, including GPU model, VRAM,
and supported features like DirectX or OpenGL versions. For gaming and graphic design
software, a robust GPU is essential to render graphics smoothly.
2. Software Requirements
Software requirements define the necessary software environment for the application to
function properly. These include:
Operating System - The compatible operating systems (e.g., Windows, macOS, Linux)
and specific versions required. Different operating systems have different capabilities
and limitations, affecting software performance and compatibility.
Drivers - Necessary drivers for hardware components, particularly for graphics cards
and other peripherals. Updated drivers ensure that hardware components function
correctly and efficiently with the software.
Web Browser - Specific web browser versions needed if the application relies on web-
based technologies or components. Web-based applications might require modern
browsers that support the latest web standards.
Instructions to install Python 3
1. Operating system: Linux (e.g., Ubuntu 16.04 to 17.10) or Windows 8.1 and later, with 2GB RAM
(4GB preferable). Python 3.12 does not run on Windows 7.
2. Install Python 3.12 and the related packages by following the installation
instructions below for your operating system.
4. Click the installer file for your system to download it.
5. After the download completes, double-click the installer file to start the installation
procedure.
6. Follow the instructions in the installer.
Important Note: After double-clicking the installer, tick the option “Add Python 3.12
to PATH”.
3. Check the Python interpreter
Launch the Python interpreter by typing python or python3 in the terminal. If the
interpreter starts, the installation is successful.
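Besides starting the interpreter, you can check which Python version and executable are active with the standard-library sys module. The 3.12 threshold below simply matches the version recommended above.

```python
import sys

# Run this in the interpreter (or save it as a .py file) to confirm what is in use
print(sys.version)                  # full version string
print(sys.version_info >= (3, 12))  # True if Python 3.12 or newer is active
print(sys.executable)               # path of the interpreter running this code
```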
1. Data Types
Data types in Python define the kind of data that can be stored and manipulated within a program.
Python supports several built-in data types, including:
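Numeric Types:
int: Represents whole numbers (e.g., 5, -3, 42).
float: Represents numbers with a decimal point (e.g., 3.14, -0.5).
complex: Represents complex numbers with a real and imaginary part (e.g., 3+4j).
Text Type:
str: Represents text as a sequence of characters (e.g., "Alice").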
Sequence Types:
list: Ordered, mutable collection of items (e.g., [1, 2, 3], ["apple", "banana",
"cherry"]).
tuple: Ordered, immutable collection of items (e.g., (1, 2, 3), ("apple", "banana",
"cherry")).
range: Represents a sequence of numbers, typically used in loops (e.g., range(5) which
produces 0, 1, 2, 3, 4).
Mapping Type:
dict: Mutable collection of key-value pairs; since Python 3.7, dictionaries keep insertion order
(e.g., {"name": "Alice", "age": 25}).
Set Types:
set: Unordered collection of unique items (e.g., {1, 2, 3}, {"apple", "banana",
"cherry"}).
2. Variables
Variables in Python are used to store data that can be used and manipulated later in the program.
Python is dynamically typed, meaning you don't need to declare a variable's type explicitly.
Example
x = 5 # int
y = 3.14 # float
name = "Alice" # str
Naming Rules: Variable names should start with a letter or an underscore, followed by
letters, digits, or underscores. They are case-sensitive (e.g., Name and name are different).
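A short sketch of these naming rules and of dynamic typing; all variable names are illustrative. The invalid names are kept as comments because they would stop the file from running.

```python
# Valid names: start with a letter or underscore
student_count = 30
_internal = "internal by convention"
score2 = 88

# Invalid names (SyntaxError), shown only as comments:
# 2score = 88
# student-count = 30

# Names are case-sensitive: these are two different variables
name = "Alice"
Name = "Bob"
print(name, Name)  # Alice Bob

# Dynamic typing: the same name can later refer to a value of another type
value = 10
print(type(value).__name__)  # int
value = "ten"
print(type(value).__name__)  # str
```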
3. Comments
Comments are lines in the code that are ignored by the Python interpreter. They are used to explain
the code, making it easier to understand.
Example
# This is a comment
x = 5 # Assigning 5 to variable x
Multi-line Comment: Use triple quotes (''' or """), although these are technically multi-
line strings not assigned to any variable.
Example
"""
This is a multi-line comment that spans several lines.
"""
4. Operators
Operators in Python are symbols that perform operations on variables and values. The main types
include:
Arithmetic Operators:
+ : Addition
- : Subtraction
* : Multiplication
/ : Division
% : Modulus (remainder)
** : Exponentiation (power)
// : Floor division
Example
a = 10
b = 3
print(a + b) # Output: 13
print(a % b) # Output: 1
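A further sketch of the division-related operators, which often surprise beginners, especially with negative numbers:

```python
a = 10
b = 3
print(a / b)   # 3.3333333333333335  (true division always returns a float)
print(a // b)  # 3  (floor division rounds down)
print(a ** b)  # 1000
print(-7 // 2, -7 % 2)  # -4 1  (floor division rounds toward negative infinity)

# // and % always fit together: a == (a // b) * b + a % b
print((a // b) * b + a % b == a)  # True
```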
Comparison Operators:
== : Equal to
!= : Not equal to
> : Greater than
< : Less than
>= : Greater than or equal to
<= : Less than or equal to
Example
x = 5
y = 10
print(x == y) # Output: False
print(x < y) # Output: True
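Logical Operators:
and : True if both operands are true
or : True if at least one operand is true
not : Reverses the truth value
Example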
x = True
y = False
print(x and y) # Output: False
print(x or y) # Output: True
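Two more points about comparison and logical operators, sketched below: comparisons can be chained, and and/or stop evaluating as soon as the result is known (short-circuit evaluation). The values are illustrative.

```python
x = 5
print(not (x > 3))  # False: not inverts a boolean
print(1 < x <= 10)  # True: same as (1 < x) and (x <= 10)

# Short-circuit: the left side is False, so items[0] is never evaluated
items = []
print(len(items) > 0 and items[0] == "a")  # False, and no IndexError
```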
Assignment Operators:
= : Basic assignment
+= : Add and assign
-= : Subtract and assign
*= : Multiply and assign
/= : Divide and assign
%= : Modulus and assign
**= : Exponentiation and assign
//= : Floor divide and assign
Example
x = 5
x += 3 # Equivalent to x = x + 3
print(x) # Output: 8
Bitwise Operators:
& : AND
| : OR
^ : XOR
~ : NOT
<< : Left shift
>> : Right shift
Example
a = 10 # Binary: 1010
b = 4 # Binary: 0100
print(a & b) # Output: 0 (Binary: 0000)
print(a | b) # Output: 14 (Binary: 1110)
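The remaining bitwise operators from the list above, applied to the same values:

```python
a = 10  # Binary: 1010
b = 4   # Binary: 0100
print(a ^ b)       # 14 (Binary: 1110): bits set in exactly one operand
print(~a)          # -11: for integers, ~x equals -x - 1
print(a << 1)      # 20 (Binary: 10100): shifting left by 1 multiplies by 2
print(a >> 1)      # 5 (Binary: 0101): shifting right by 1 floor-divides by 2
print(bin(a ^ b))  # 0b1110
```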
Examples on Python basic concepts: Data Types, Variables, Comments, and Operators.
Examples on how to use Data Types
Python has several built-in data types, such as integers, floats, strings, lists, tuples, and dictionaries.
Ex1
# Integer
num = 5
print(type(num)) # Output: <class 'int'>
# Float
pi = 3.14
print(type(pi)) # Output: <class 'float'>
# String
name = "Alice"
print(type(name)) #Output: <class 'str'>
# Boolean
is_sunny = True
print(type(is_sunny)) #Output: <class 'bool'>
Ex2
# Integer
age = 25
print(type(age)) #Output: <class 'int'>
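A further sketch covering the other built-in types listed earlier (complex, range, dict, and set). The values are illustrative.

```python
z = 3 + 4j                     # complex
print(z.real, z.imag, abs(z))  # 3.0 4.0 5.0

print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))   # [2, 5, 8]  (start, stop, step)

person = {"name": "Alice", "age": 25}
person["city"] = "Kigali"      # dicts are mutable: add a new key-value pair
print(person["name"], len(person))  # Alice 3

print({1, 2, 2, 3, 3, 3})      # {1, 2, 3}: duplicates are removed
```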
In Python, lists and tuples are both used to store collections of items, but they differ in several
important ways:
1. Mutability
List: A list is mutable, meaning that its elements can be modified after the list has been
created. You can add, remove, or change items in a list.
Tuple: A tuple is immutable, meaning that once a tuple is created, you cannot change,
add, or remove items from it.
Example:
# List
my_list = [1, 2, 3]
my_list[0] = 10 # You can change elements in a list
print(my_list) # Output: [10, 2, 3]
# Tuple
my_tuple = (1, 2, 3)
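try:
    my_tuple[0] = 10  # This raises a TypeError because tuples cannot be changed
except TypeError as error:
    print(error)  # Output: 'tuple' object does not support item assignment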
2. Syntax
Lists are written with square brackets [], while tuples are written with parentheses ().
Example:
# List
my_list = [1, 2, 3]
# Tuple
my_tuple = (1, 2, 3)
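One syntax detail worth knowing: it is the comma, not the parentheses, that creates a tuple. A small sketch:

```python
not_a_tuple = (5)     # just the integer 5 inside parentheses
one_item = (5,)       # the trailing comma makes a one-element tuple
also_tuple = 1, 2, 3  # parentheses are optional in many contexts
print(type(not_a_tuple).__name__, type(one_item).__name__)  # int tuple
print(also_tuple)     # (1, 2, 3)
```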
3. Performance
List: Because lists are mutable and can grow, they use slightly more memory than tuples
holding the same items.
Tuple: Tuples are generally faster to create and iterate over, and more memory-efficient, than
lists because they are immutable.
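The memory difference can be observed with sys.getsizeof. The exact byte counts depend on the Python version and platform, so only the comparison is given as an expected output; this reflects CPython's implementation.

```python
import sys

items_list = [1, 2, 3]
items_tuple = (1, 2, 3)

# A CPython list keeps a separate, resizable array of item pointers (often with
# spare capacity); a tuple stores its fixed number of item pointers inline.
print(sys.getsizeof(items_list), sys.getsizeof(items_tuple))
print(sys.getsizeof(items_list) > sys.getsizeof(items_tuple))  # True in CPython
```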
4. Use Case
List: Use lists when you need a dynamic collection that can change over time. Lists are
useful for scenarios where you need to append, delete, or update elements.
Tuple: Use tuples when you want to ensure that the data remains constant and should not
be changed. Tuples are often used for fixed data structures like coordinates, dates, etc.
Example:
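A small sketch of the two use cases with hypothetical values: a tuple for a fixed (latitude, longitude) pair, using approximate coordinates for Kigali, and a list for a shopping cart that changes.

```python
# Fixed data: a coordinate pair should not change, so use a tuple
kigali = (-1.9441, 30.0619)
latitude, longitude = kigali

# Tuples are hashable, so they can be dictionary keys; lists cannot
city_names = {kigali: "Kigali"}

# Dynamic data: a shopping cart grows and shrinks, so use a list
cart = ["bread", "milk"]
cart.append("eggs")
cart.remove("milk")
print(cart)  # ['bread', 'eggs']
```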
5. Methods
List: Lists have many built-in methods for manipulation, such as append(), remove(),
sort(), and reverse().
Tuple: Tuples have very few methods since they are immutable. The most common
methods are count() and index().
Example:
# List methods
my_list = [1, 2, 3]
my_list.append(4) # Adds 4 to the list
print(my_list) # Output: [1, 2, 3, 4]
# Tuple methods
my_tuple = (1, 2, 3, 1)
print(my_tuple.count(1)) # Output: 2, counts the occurrences of 1
print(my_tuple.index(2)) # Output: 1, returns the index of the element 2
6. Packing and Unpacking
Both lists and tuples support packing (creating collections) and unpacking (assigning values to
variables).
Example:
Summary:
Feature      | List                           | Tuple
Mutability   | Mutable (can be changed)       | Immutable (cannot be changed)
Syntax       | [] (square brackets)           | () (parentheses)
Performance  | Slower, more memory usage      | Faster, less memory usage
Methods      | Many methods for manipulation  | Fewer methods, mostly for access
Use Cases    | Dynamic collections            | Static or fixed data
Examples on how to use Variables
Variables are used to store data values. You can assign a value to a variable and change it later.
EX1
# Assigning variables
age = 25
city = "Kigali"
# Printing variables
print("Age:", age) # Output: Age: 25
print("City:", city) # Output: City: Kigali
# Updating a variable
age = 26
print("Updated Age:", age) # Output: Updated Age: 26
EX2
# Assigning values to variables
x = 10 # integer variable
y = 20.5 # float variable
name = "Python" # string variable
# Using variables
sum_xy = x + y
print(sum_xy) # Output: 30.5
Examples on how to use Comments
Comments are used to explain the code and make it easier to understand. Python ignores comments
during execution.
EX1
"""
This is a multi-line comment
explaining the following code.
"""
weight = 70 # Weight in kilograms
EX2
'''
This is a
multi-line comment
'''
Examples on how to use Operators
Operators are used to perform operations on variables and values. Python supports arithmetic,
comparison, logical, and assignment operators.
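a) Arithmetic Operators
x = 10
y = 3
print(x - y)  # Output: 7
print(x * y)  # Output: 30
b) Comparison Operators
a = 5
b = 8
print(a > b)   # Output: False
print(a != b)  # Output: True
c) Logical Operators
m = True
n = False
print(m or n)  # Output: True
print(not m)   # Output: False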
d) Assignment Operators
z = 10
z += 5 # Same as z = z + 5
print(z) # Output: 15
z *= 2 # Same as z = z * 2
print(z) # Output: 30
Python Control Structures
Control structures in Python allow developers to manage the flow of the program, deciding what
sections of code are executed and how often. The main control structures include conditional
statements, looping statements, and jump statements.
Conditional Statements
Conditional statements control the flow of the program based on conditions. The primary
conditional statements in Python include if, else, and elif.
Syntax:
if condition:
    # Code to execute if the condition is true
elif another_condition:
    # Code to execute if the previous condition is false and this condition is true
else:
    # Code to execute if none of the conditions are true
Example1:
x = 10
if x > 5:
    print("x is greater than 5")
elif x == 5:
    print("x is equal to 5")
else:
    print("x is less than 5")
Example2:
age = 20
if age >= 18:
    if age < 21:
        print("You are an adult but not old enough to drink in some countries.")
    else:
        print("You are old enough to drink.")
else:
    print("You are a minor.")
Example3:
x = 10
message = "x is positive" if x > 0 else "x is negative or zero"
print(message)
Looping Statements
Looping statements are used to repeat a block of code multiple times until a certain condition is
met.
a) For Loop
A for loop iterates over a sequence (like a list, tuple, dictionary, set, or string) and executes a block
of code for each element in the sequence.
A list is a collection of items that is ordered and changeable. Here’s how a for loop can iterate
over a list.
Example:
# List of fruits
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
Output:
apple
banana
cherry
A tuple is similar to a list, but it is immutable (you cannot modify its elements). A for loop works
the same way as with a list.
Example:
# Tuple of numbers
numbers = (1, 2, 3, 4)
for number in numbers:
    print(number)
Output:
1
2
3
4
In a dictionary, each item is a key-value pair. You can iterate over keys, values, or both.
Example:
# Dictionary of student grades
grades = {"John": 85, "Jane": 90, "Doe": 78}
# Iterating over a dictionary directly yields its keys
for student in grades:
    print(student)
Output:
John
Jane
Doe
Example:
# Iterating over the dictionary values
for grade in grades.values():
    print(grade)
Output:
85
90
78
Iterating over both keys and values:
Example:
# Iterating over keys and values
for student, grade in grades.items():
    print(f"{student}: {grade}")
Output:
John: 85
Jane: 90
Doe: 78
A set is an unordered collection of unique items. Even though sets do not preserve order, a for loop
can iterate over the elements.
Example:
# Set of colors
colors = {"red", "green", "blue"}
for color in colors:
    print(color)
Output (the order may vary between runs):
red
green
blue
A string is a sequence of characters. You can loop through each character in a string.
Example:
# String
word = "Python"
for char in word:
    print(char)
Output:
P
y
t
h
o
n
b) While Loop
A while loop repeats a block of code as long as its condition is true.
Syntax:
while condition:
    # Code to execute as long as the condition is true
Example:
i = 1
while i < 5:
    print(i)
    i += 1
Output:
1
2
3
4
Jump Statements
Jump statements allow you to control the flow of loops or conditional statements. Python
provides three such statements: break, continue, and pass.
a) Break Statement
The break statement is used to exit a loop before it has looped through all the items.
Syntax:
for item in sequence:
    if condition:
        break
Example:
for i in range(10):
    if i == 5:
        break
    print(i)
Output:
0
1
2
3
4
b) Continue Statement
The continue statement skips the current iteration of a loop and proceeds to the next iteration.
Syntax:
for item in sequence:
    if condition:
        continue
Example:
for i in range(5):
    if i == 2:
        continue
    print(i)
Output:
0
1
3
4
c) Pass Statement
The pass statement is used as a placeholder. It doesn’t execute any code but allows you to write
empty code blocks that don’t cause errors.
Syntax:
if condition:
    pass  # Nothing happens here
Example:
for i in range(5):
    if i == 2:
        pass
    else:
        print(i)
Output:
0
1
3
4
Summary: