Data Analysis Bootcamp Week 1 Study Note
with Python. These weekly hands-on sessions will run from Sep
17-Oct 15, 2024, Tuesdays, 11 am EST (or 4 pm WAT, 5 pm SAST, or 6
pm EAT). Sign up here: https://chi2.io/Wp8g9BRo
Introduction to Python
Python is one of the most widely used programming languages globally,
known for its simplicity and versatility. Developed in the early 1990s, Python
can be applied to various domains, from automating simple tasks to building
complex machine learning models. It’s especially favored by researchers, data
scientists, and engineers due to its easy-to-read syntax and an extensive array
of open-source packages that make development more efficient.
Library: A library is like your toolbox full of handy gadgets for building or
maintaining the house. Need a hammer to hang a picture or a screwdriver to
fix a door? You don’t have to make these tools yourself—they’re ready to go
when you need them for specific tasks. You can think of libraries in
programming as similar to Excel functions like SUM(), VLOOKUP(), IF(), which
are all pre-built tools that you can use to perform specific tasks in your
spreadsheet without needing to write complex formulas yourself. Similarly,
libraries in programming are collections of pre-written code (functions,
classes, etc.) that you can use to perform specific tasks in your program
without writing everything from scratch.
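To make the toolbox idea concrete, here is a minimal sketch that borrows a ready-made tool from Python's built-in statistics library instead of writing the averaging logic by hand. The sample scores are made up for illustration.

```python
import statistics  # a library that ships with Python

scores = [70, 85, 90]  # made-up sample values

# mean() is a pre-built tool, much like AVERAGE() in Excel
average = statistics.mean(scores)
print(average)
```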
Python as an object-oriented language
Learning Python is like learning a regular language. You have to learn the
nouns (names of people, places, or things), the verbs (action words), and
punctuation (syntax).
The way nouns and verbs are used in English is not very different from how
we use them in programming. Refresh your memory of nouns and verbs with
the images below:
Similarly, everything in Python can be roughly classified as either a “noun”
(the thing that receives the action) or a “verb” (the action itself).
We call the nouns “objects”, while the verbs are called “functions”.
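As a small illustration of the noun/verb idea, the sketch below treats a piece of text as the object (noun) and applies two built-in functions (verbs) to it. The name used is only a sample value.

```python
name = "Alice"   # the object (noun): a piece of text

# Functions (verbs) act on the object
print(name)          # print() displays it
length = len(name)   # len() counts its characters
print(length)
```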
Types of variables
There are 6 different types of variables we need to be aware of before we can
start writing code.
#4 Mapping (dict)
A dictionary is like a real dictionary where you have words (keys) and their
meanings (values). In Python, you match things, like matching a name to a
phone number. You can also think of it as matching a key to a keyhole. Since
the curly brackets { } look like a keyhole, they are used to hold the
key-value pairs.
Think of a dictionary as a collection of labeled boxes. Each box (or key) has a
label, and inside each box is the value associated with that label. For example:
● "name": "John" means the box labeled name contains the value John.
● "age": 30 means the box labeled age contains the value 30.
● "city": "New York" means the box labeled city contains the value New
York.
In the format key: value, the key (like "name", "age", or "city") always comes
before the colon, telling you what kind of information is inside the box, and
the value (like "John", 30, or "New York") comes after the colon, which is the
actual information inside the box.
And it’s all stored inside curly braces {}, like a neat little collection of labeled
boxes that you can look inside whenever you need that information.
So, for the example person = {"name": "John", "age": 30, "city": "New York"}, it’s
like saying, "I have a person, and here are three labeled boxes: one for their
name, one for their age, and one for their city."
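The labeled-box picture can be sketched in code as follows, using the person dictionary from the text. Looking inside a box means asking for its label in square brackets. The "email" key and its value are made up for illustration.

```python
person = {"name": "John", "age": 30, "city": "New York"}

# Look inside a box by giving its label (the key)
print(person["name"])
print(person["age"])

# Add a new labeled box; "email" is a made-up example key
person["email"] = "john@example.com"
print(len(person))
```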
#5 Boolean
It’s like flipping a light switch: it’s either on (True) or off (False)—just two
choices!
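A minimal sketch of the light-switch idea, assuming a hypothetical variable named light_is_on. It also shows that comparisons produce Boolean values.

```python
light_is_on = True   # hypothetical variable name
print(type(light_is_on))

# Comparisons also produce Boolean values
is_adult = 30 >= 18
print(is_adult)
```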
#6 Set
A set is like a messy, scattered room where toys and objects are randomly
placed everywhere. Since the items are not in any particular order, you can’t
say, "Give me the fourth item," because there’s no fixed order like in a tidy line.
In Python, this is why we can't use indexing with sets, unlike with lists where
items are neatly arranged, and you can easily ask for the first or fourth item.
Sets only care about having unique things, not where they are!
Think of a set in Python like a Venn diagram. A Venn diagram shows groups of
unique items, and each group is inside its own circle. The things inside the
circles are all unique, just like in a Python set where no item repeats.
Now, the curly brackets { } are like the borders around these groups in Python.
They tell us that everything inside belongs to the same set, just like how the
circle in a Venn diagram contains all the items in a group.
So, whenever you see { }, picture it like a circle around a group of unique
things, just like in a Venn diagram!
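The points above (unique items, no indexing, Venn-diagram grouping) can be sketched as follows. The toy names are made-up sample values.

```python
toys = {"car", "ball", "car", "doll"}  # "car" is listed twice

print(len(toys))  # the duplicate is stored only once

# Sets have no positions, so indexing raises a TypeError
try:
    toys[0]
except TypeError as error:
    print("Cannot index a set:", error)

# Venn-diagram style operation: items that appear in both sets
other_toys = {"ball", "kite"}
shared = toys & other_toys
print(shared)
```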
Python Installation Guide
python --version
If Python is not installed, you can easily download it from the official Python
website.
python --version
● You should see the version of Python that was installed displayed on
the screen.
Quick Guide to Download and Install Anaconda
Even after installing Python, you may want to install Anaconda because it
simplifies working with data science tools. Anaconda comes with many
pre-installed libraries like NumPy, pandas, and Jupyter, which you’d otherwise
have to install separately. With Anaconda, you get a complete setup for data
science, including popular tools like Jupyter Notebook and Spyder, all in one
simple installation. This makes it a great choice for beginners looking for an
all-in-one solution. This is like buying a toolkit that already has all the tools you
need, instead of buying each tool one by one.
● After the download completes, open the setup file, run the installer, and
simply follow the installation prompts.
#3 Comments
In Python, you can add comments to your code for documentation purposes.
Comments help explain your code to others (or yourself) when you revisit it
later.
A comment starts with a #, and everything after the # on that line will be
ignored by Python.
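A short sketch of both common comment placements: on a line of its own, and after code on the same line.

```python
# This whole line is a comment and is ignored by Python
age = 30  # a comment can also follow code on the same line
print(age)
```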
In programming, data types define the kind of data that a variable can hold.
For example:
age = 30
Here, the integer value 30 is assigned to the variable age, making its data type
int.
Numeric    int, float     Holds numbers (integers, decimals)
Sequence   list, tuple    Contains ordered collections of items
Mapping    dict           Stores key-value pairs
Python’s numeric data types include integers, floats (decimals), and complex
numbers:
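As an illustration, here is one value of each numeric type. The variable names and values are made up for this sketch.

```python
whole = 42          # int
decimal = 19.99     # float
z = 2 + 3j          # complex: real part 2, imaginary part 3

print(z.real, z.imag)
```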
You can check the data type of a variable using the type() function:
a = 10
b = 3.14
print(type(a))  # Output: <class 'int'>
print(type(b))  # Output: <class 'float'>
A list is an ordered collection of items that can hold different data types. Items
in a list are enclosed in square brackets [ ] and separated by commas:
Lists are mutable, meaning you can modify their elements after they are
created.
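A minimal sketch of a list that mixes data types and is then modified in place. The values are made up for illustration.

```python
mixed = [1, "apple", 3.5, True]  # sample values of different types

mixed[1] = "banana"   # replace the second item (indexing starts at 0)
mixed.append("new")   # add an item to the end
print(mixed)
```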
Tuples are similar to lists but are immutable—once defined, their elements
cannot be changed. They are defined using parentheses ():
Like lists, tuple elements can also be accessed using their index:
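The sketch below, using made-up sample values, shows a tuple being indexed like a list and then refusing a change.

```python
coordinates = (10, 20, 30)  # sample values

print(coordinates[0])  # access by index, just like a list

# Trying to change an element raises a TypeError
try:
    coordinates[0] = 99
except TypeError as error:
    print("Tuples cannot be changed:", error)
```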
Strings are sequences of characters enclosed within single (') or double (")
quotes:
name = "Alice"
unique_numbers = {1, 2, 3, 4, 4}
print(unique_numbers)  # Output: {1, 2, 3, 4}
Sets do not allow duplicate values, and their elements cannot be accessed via
indexing.
Dictionaries store data in key-value pairs. Keys are unique, and values can be
of any data type. They are defined using curly braces {}:
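To illustrate that values can be of any type and that keys stay unique, here is a small sketch with a hypothetical student record.

```python
student = {
    "name": "Alice",
    "scores": [88, 92],   # a value can be a list
    "enrolled": True,     # or a Boolean
}

# Assigning to an existing key replaces its value; no duplicate key is created
student["name"] = "Bob"
print(student["name"])
print(len(student))
```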
Python automatically converts one data type to another when it makes sense
to do so. This process is known as implicit conversion. For example, if you add
an integer and a float, Python will automatically convert the integer to a float
to perform the operation.
Example:
x = 5    # int
y = 2.5  # float
result = x + y  # x is converted to a float automatically
print(result)   # Output: 7.5
Explicit conversion or casting is when you manually convert one data type to
another using built-in functions like int(), float(), str(), etc.
Example:
a = 7.8
b = int(a)
print(b) # Output: 7
num = 10
num_str = str(num)
print(num_str) # Output: 10
● The float value 7.8 is explicitly converted to an int, which results in 7 (the
decimal part is truncated).
● The integer 10 is converted to a string '10'.
For example, if you use input("Please enter your age: "), it will display
"Please enter your age:" and wait for the user to type their age.
In real life, this function is useful whenever you need to get information from
someone using a computer program. For instance, a program might ask for a
user's name, age, or any other details. The input function helps make
interactive programs that can respond to what the user provides.
When accepting input from a user, Python reads the input as a string by
default. To work with numerical data, you need to explicitly convert the input
to the desired data type.
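A minimal sketch of that conversion. To keep the example runnable without typing anything, the text "25" stands in for what input() would return.

```python
# input() always returns text, so an age typed as 25 arrives as the string "25"
raw_age = "25"  # stand-in for: raw_age = input("Please enter your age: ")

age = int(raw_age)  # explicit conversion from str to int
print(age + 1)      # arithmetic now works
```

Without the int() call, raw_age + 1 would raise a TypeError because Python cannot add a string and a number.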
Collection   Ordered   Mutable   Duplicates   Access Type   Use cases
You can also convert between different collection types like lists, tuples, and
sets.
fruits_tuple = tuple(fruits)
💡Proof of proper conversion: We should be able to modify the list “fruits”,
since a list is mutable. However, attempting to modify the tuple
“fruits_tuple” should give us an error, as a tuple is immutable:
So the operation for the list was a success as we were able to add “mango” to the end.
Now, let’s convert the list to a tuple and attempt doing the exact same thing. It should give us an
error.
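The check described above can be sketched as follows. The starting fruit names and the item added to the tuple are assumptions made for illustration.

```python
fruits = ["apple", "banana"]  # made-up starting values
fruits.append("mango")        # works: lists are mutable
print(fruits)

fruits_tuple = tuple(fruits)  # convert the list to a tuple

# Tuples have no append method, so this raises an AttributeError
try:
    fruits_tuple.append("kiwi")
except AttributeError as error:
    print("Error:", error)
```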
num_set = {1, 2, 3}
num_list = list(num_set)
print(num_list[0])  # indexing works on the list
Now, we are able to successfully index the first item because of having successfully converted
the set to a list.
By using implicit and explicit type conversions, you can ensure that your code
handles different types of data efficiently and avoids type errors in Python.
Python List Operations and Methods Explained as Arrays or Vectors
A list is a versatile and mutable collection that can store multiple values, including different data
types, just like an array in other programming languages. Python lists allow indexing, slicing,
and various operations like adding or removing elements.
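The operations mentioned above (indexing, slicing, adding, and removing) can be sketched with a list of made-up numbers.

```python
numbers = [10, 20, 30, 40, 50]  # sample values

first = numbers[0]      # indexing: the first item
middle = numbers[1:4]   # slicing: items at positions 1, 2, and 3
numbers.append(60)      # add an item to the end
numbers.remove(20)      # remove the first occurrence of 20

print(first, middle, numbers)
```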
Key Concepts:
Example:
a = 10
b = 3
print(a + b) # Output: 13
Example:
x = 5
y = 10
print(x == y) # Output: False
Example:
a = 5
b = 10
Example:
x = 5
x += 3 # Same as: x = x + 3
print(x) # Output: 8
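Basic if Statement:
age = 18
if age >= 18:
    print("You are an adult.")  # illustrative message
if/else Statement:
age = 16
if age >= 18:
    print("You are an adult.")
else:
    print("You are a minor.")
if/elif/else:
age = 17
if age >= 18:
    print("You are an adult.")
elif age >= 13:
    print("You are a teenager.")
else:
    print("You are a child.")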
y = 10
else:
Summary:
elif allows you to handle multiple conditions more flexibly, while else
provides a default action when no other conditions apply.
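One detail worth seeing in code is that Python checks the conditions from top to bottom and runs only the first branch that matches. The sketch below uses a made-up grading scale to show this.

```python
score = 85  # sample value

# 85 satisfies both "score >= 80" and "score >= 70",
# but only the first matching branch runs
if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
elif score >= 70:
    grade = "C"
else:
    grade = "F"  # default when no condition applies

print(grade)
```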
Python Loops
Loops in Python allow you to execute a block of code repeatedly. There are
two main types of loops: “for” and “while”. Here is how to remember each:
fruits = ["apple", "banana", "cherry"]  # sample list
for fruit in fruits:
    print(fruit)
for i in range(5):
    print(i) # Output: 0, 1, 2, 3, 4
count = 0
while count < 5:
    print(count)
    count += 1
Both for and while loops allow you to repeat a block of code multiple times.
💡Key structural difference between for and while loops and the implications
for infinite loops
When using a while loop, you must manually start it (initialize the loop
variable) and manually move it toward its end (update the variable, for
example with += 1). Here is the importance of each:
Once your code gets into an infinite loop, the only way to stop it is to
interrupt the program by force (for example, by pressing Ctrl+C in a
terminal or interrupting the kernel in Jupyter), because the condition
stays True forever. In the above example, the loop will keep printing 0
forever because the value of count is never changed, so count < 5 always
remains True. This is why we change the value of the loop variable inside
the loop. In contrast, a for loop does not need this because it
automatically updates the loop variable:
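The contrast can be sketched by collecting the values each loop visits. The list names are hypothetical and exist only so the two results can be compared.

```python
# while loop: we initialize and update the loop variable ourselves
while_values = []
count = 0                  # initialize
while count < 5:
    while_values.append(count)
    count += 1             # update; without this line the loop never ends

# for loop: range(5) supplies the next value automatically
for_values = []
for i in range(5):
    for_values.append(i)

print(while_values == for_values)
```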
In R, writing 0:5 generates a sequence that includes both the start and end
values, producing 0, 1, 2, 3, 4, 5. This behavior is different from Python,
where the range stops just before the end value.
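A quick sketch of the Python side of this comparison. To get the same values as R's 0:5, the end value passed to range() has to be one higher.

```python
python_default = list(range(0, 5))   # stops before 5
like_r = list(range(0, 6))           # end + 1 includes 5, matching R's 0:5

print(python_default)
print(like_r)
```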
What is a Function in Python?
A function is a reusable block of code that only runs when it’s called by its
name. You can send inputs (known as parameters) to the function, and it
can return an output (result) back to you.
Creating a Function:
To create a function in Python, use the def keyword, followed by the function
name and parentheses. Here’s an example:
def my_function():
    print("Hello from a function")  # illustrative body
Calling a Function:
my_function()
Arguments in Functions:
You can pass information (called arguments) into a function to customize its
behavior. The arguments are placed inside the parentheses when you define
and call the function. For example:
def greet(name):
    print("Hello, " + name)  # illustrative greeting text
💡HINT: Note that the second line of the function (the print statement)
must be indented. Indentation is how Python identifies blocks of code that
belong together, such as the body of a function. Without proper indentation,
Python will raise an error. Because the print statement is indented, Python
recognizes it as part of the greet function and produces the expected output.
If the function has multiple lines, every line in the function's body must
be indented consistently. The call to the function, however, should not be
indented, because it sits outside the function body.
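The indentation rule can be sketched with a two-line function body. The greeting text and the name passed in are assumptions made for illustration.

```python
def greet(name):
    # Both lines below are indented, so both belong to the function body
    message = "Hello, " + name + "!"
    print(message)

# The call is not indented because it sits outside the function
greet("Alice")
```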
Number of Arguments:
def add(a, b):  # hypothetical function name
    return a + b
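A function must be called with the number of arguments it expects. The sketch below assumes a hypothetical function named full_name that takes two parameters, and shows what happens when only one argument is supplied.

```python
def full_name(first, last):
    return first + " " + last

# Correct: two parameters, two arguments
print(full_name("Ada", "Lovelace"))

# Incorrect: one argument is missing, so Python raises a TypeError
try:
    full_name("Ada")
except TypeError as error:
    print("TypeError:", error)
```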
Homework
Assignment 1: Looping Over a List
Objective: Write a Python program to loop over a list of country names and
print a sentence indicating the number of letters in each country name.
Instructions:
Replace {country} with the country name and {len(country)} with the
number of letters in the country name.
Example Output:
Instructions:
Parameters:
Error Handling:
● If the from_unit or to_unit is not in this list ['km', 'm', 'cm'], raise a
ValueError with the message: "Invalid units. Supported units are 'km',
'm', and 'cm'."
Return Value: