
Week 6, Python and excel, pandas read and write

The document covers essential Python data types, including single value types (integers, floats, strings, booleans) and collection types (lists, tuples, sets, dictionaries, ranges). It emphasizes the importance of self-directed learning in programming and provides practical tasks related to data manipulation using the Pandas library. Additionally, it encourages students to utilize online resources and AI tools for further learning and understanding.

Data Analytics and Visualisation for Public Affairs, PIA 3608

Week 6

Dr. Li Ruozhu
City University of Hong Kong
ruozhuli@cityu.edu.hk
Data Type
• Variables can store data of different types, and different types can do
different things.
• Python has the following data types built-in by default, in these categories:

• 9 of them are the most common, and they're the ones we need to remember and use
in this course.
Data Type
• 9 most commonly used data types:

• For single value:


• Integers
• Float
• String
• Boolean

• For a set of values (collection data types):


• List
• Tuple
• Set
• Dictionary
• Range
Data Type- For single value
• Integers
• There is no limit to how long an integer value can be. It is
constrained only by the amount of memory your system has, but
beyond that an integer can be as long as you need it to be.
• Floating-Point Numbers
• Float values are written with a decimal point.
• String
• Strings are sequences of character data.
• String literals may be delimited using either single or
double quotes. All the characters between the opening
delimiter and matching closing delimiter are part of the
string.
• Boolean Type
• Objects of Boolean type may have one of two values,
True or False (Be careful, the first letter should be
capitalized.)
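
A minimal sketch of the four single-value types described above. The variable names and values are illustrative assumptions, not taken from the slides; type() reports which data type each value has.

```python
# One example value for each single-value type (names are illustrative).
count = 42          # integer
big = 2 ** 100      # integers can grow as large as memory allows
price = 3.5         # float: written with a decimal point
name = 'Alice'      # string: single or double quotes both work
passed = True       # Boolean: True or False, first letter capitalized

for value in (count, big, price, name, passed):
    print(value, type(value))
```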
Boolean Type
• It is special, so take time to understand it.
• In programming you often need to know if an expression is True or False.

• Any value or expression can be evaluated as a Boolean with bool(), giving one of
two answers, True or False.

• When you compare two values, the expression is evaluated and Python
returns the Boolean answer:

• An if ... else statement chooses its branch based on whether the condition is True or False.
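
A short sketch of how comparisons produce Boolean values and how if ... else uses them. The numbers and variable names are made up for illustration.

```python
a = 10
b = 7

print(a > b)    # True
print(a == b)   # False
print(a < b)    # False

# The result of a comparison can be stored like any other value.
is_larger = a > b

# if ... else picks a branch depending on whether the condition is True or False.
if is_larger:
    message = "a is larger than b"
else:
    message = "a is not larger than b"
print(message)
```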


Data Type for a set of values
List
• List is used to store multiple items in a single variable.

• List items are ordered, changeable, and allow duplicate values.

• List items are indexed, the first item has index [0], the second
item has index [1] etc.

• Lists are created using square brackets:

• names = ["Andy", "Alice", "cherry"]
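
A small sketch of the list properties listed above (ordered, indexed, changeable, duplicates allowed). The variable is called names here, an illustrative choice that keeps Python's built-in list() available for later use.

```python
names = ["Andy", "Alice", "cherry"]

print(names[0])        # Andy (the first item has index 0)
names[2] = "Cherry"    # lists are changeable
names.append("Andy")   # duplicate values are allowed
print(names)           # ['Andy', 'Alice', 'Cherry', 'Andy']
print(type(names))     # <class 'list'>
```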


Data Type for a set of values
Tuple
• Tuples are used to store multiple items in a single variable
• A tuple is a collection which is ordered and unchangeable.

• Tuples are written with round brackets (parentheses).

• fruits = ("apple", "banana", "cherry")


• Tuple items are ordered, unchangeable, and allow duplicate values.

• Tuple items are indexed, the first item has index [0], the second item has
index [1] etc.
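
A brief sketch of the tuple properties above. The variable name fruits is an illustrative assumption; trying to change an item raises a TypeError, which is caught here so the script keeps running.

```python
fruits = ("apple", "banana", "cherry", "apple")

print(fruits[0])              # apple (indexed like a list)
print(fruits.count("apple"))  # 2 (duplicates are allowed)

# Tuples are unchangeable: assigning to an item raises TypeError.
try:
    fruits[0] = "kiwi"
except TypeError as error:
    print("Cannot change a tuple:", error)
```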
Data Type for a set of values
Set
• Sets are used to store multiple items in a single variable.
• A set is a collection which is unordered and unindexed, and it
does not allow duplicate values.
• Set items cannot be changed in place, but you can remove items and add
new items.
• Sets are written with curly brackets.
• fruits = {"apple", "banana", "cherry"}
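
A short sketch of how a set behaves, using illustrative names and values: a repeated value is stored only once, items can be added and removed, and there is no index.

```python
fruits = {"apple", "banana", "cherry", "apple"}

print(len(fruits))        # 3 (the repeated "apple" is stored once)
fruits.add("kiwi")        # add a new item
fruits.remove("banana")   # remove an existing item
print("kiwi" in fruits)   # True

# Sets are unindexed: fruits[0] raises TypeError.
try:
    fruits[0]
except TypeError as error:
    print("Sets have no index:", error)
```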
Data Type for a set of values
Dictionary
• Dictionaries are used to store data values in key:value pairs.

• A dictionary is a collection which is ordered (since Python 3.7), changeable, and does not
allow duplicate keys.
• The values in dictionary items can be of any data type.
• Each item has the form key : value. The key plays the role of the index.
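
A minimal sketch of a dictionary, with keys and values invented for illustration. It shows looking up a value by key, changing and adding items, and what happens when the same key is written twice.

```python
student = {"name": "Alice", "score1": 90, "courses": ["PIA 3608"]}

print(student["name"])                # Alice (look up a value by its key)
student["score1"] = 95                # change an existing value
student["major"] = "Public Policy"    # add a new key:value pair
print(student)

# A key cannot appear twice: the later value replaces the earlier one.
repeated = {"a": 1, "a": 2}
print(repeated)                       # {'a': 2}
```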
Data Type for a set of values
Range
• range() is a built-in Python function used to
generate a sequence of integers.
• A range is a series of values between a start value and a stop value.
• If you want to generate a sequence of numbers given the starting and
the ending values then you can give these values as parameters of the
range() function. For example:

• But you can also consider it as a data type. For example:

• The result is “range”, so range can be seen as a data type.
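
A small sketch of both ideas, with start and stop values chosen only for illustration: range() generates a sequence of numbers, and type() shows that the result is a range object.

```python
numbers = range(3, 8)

print(list(numbers))   # [3, 4, 5, 6, 7] (the stop value 8 is not included)
print(type(numbers))   # <class 'range'>
```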


More about collection data types
• Learning programming is a long-term and even lifelong journey of self-directed learning. This
course is designed to lay the foundation for you. Based on this foundation, it's important for you
to develop the ability to learn independently, especially by utilizing online resources and AI to
further your education.

• At this point, for instance, you can explore more about “collection data types” if you
want to. You can read information available online, such as this link:
• https://medium.com/analytics-vidhya/collection-data-types-in-python-3a3f9c0b554
• Additionally, you can search the web and engage with AI to gain further insights.

• Ultimately, I hope you'll gradually build the awareness and ability to self-learn, which will benefit
you throughout your life.
• After all, one of the key goals of this course is to foster your ability to be lifelong learners in the
information age.
More about range
range parameters
• The range() function can be called in three different ways, or you can
think of them as three sets of range parameters:

• range(stop_value): The default starting point is zero. The stop value is not included.

• range(start_value, stop_value): This generates the sequence based on the start and
stop values. The stop value is not included.

• range(start_value, stop_value, step_size): This generates the sequence by incrementing
the start value by the step size until it reaches the stop value. The default step size is 1.

• range() only works when the specified values are integers (whole numbers). It does not support
the float data type or the string data type. However, you can pass both positive and
negative integer values to it.
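
The three forms can be sketched as follows. The numbers are arbitrary examples; list() is used only to make the generated values visible.

```python
only_stop = list(range(5))           # [0, 1, 2, 3, 4]
start_stop = list(range(2, 6))       # [2, 3, 4, 5]
with_step = list(range(0, 10, 3))    # [0, 3, 6, 9]

print(only_stop)
print(start_stop)
print(with_step)

# range() accepts integers only: a float raises TypeError.
try:
    range(0.5, 3)
except TypeError as error:
    print("range() needs integers:", error)
```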
How to use parameters of range
• specify both start and the stop value

• negative integer values also can work

• The start value should be smaller than the stop value, unless the step size is negative
How to use parameters of range
• add the third parameter, i.e., the step size

• The step size can be negative (when start value > stop value)
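
A short sketch of these cases with arbitrary example values: negative integers, a start value larger than the stop value (which gives an empty range when the step is positive), and a negative step size.

```python
negatives = list(range(-3, 2))          # [-3, -2, -1, 0, 1]
empty = list(range(5, 1))               # [] because start > stop with the default step of 1
counting_down = list(range(5, 1, -1))   # [5, 4, 3, 2]
big_steps = list(range(10, 0, -3))      # [10, 7, 4, 1]

print(negatives)
print(empty)
print(counting_down)
print(big_steps)
```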
Play with range (may appear in the test)

• Apply equality comparisons between ranges. Given two
ranges, if they represent the same sequence of values, then
they are considered to be equal (they don't need to have the same start,
stop, and step values)

• Check the length of a range: len()
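
A brief sketch of range equality and len(), using example values chosen so that two ranges with different stop values still contain the same numbers.

```python
first = range(0, 10, 3)    # 0, 3, 6, 9
second = range(0, 12, 3)   # 0, 3, 6, 9 (different stop value, same sequence)

same_values = first == second
print(same_values)         # True

# Two empty ranges are also equal, whatever their parameters.
both_empty = range(0) == range(5, 1)
print(both_empty)          # True

print(len(first))          # 4
```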


len()
• By the way, since we mentioned len() here, let’s get to know more about it.

• The len() function returns the number of items in an object. The object
must be a sequence or a collection.

• When the object is a string, the len() function returns the number of
characters in the string.

• Please also try other data types: list, tuple, dictionary, range.
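
A quick sketch of len() applied to each of these types. The example values are invented for illustration.

```python
lengths = {
    "string": len("hello"),               # 5 characters
    "list": len([1, 2, 3]),               # 3 items
    "tuple": len((1, 2)),                 # 2 items
    "dictionary": len({"a": 1, "b": 2}),  # 2 key:value pairs
    "range": len(range(3, 8)),            # 5 numbers: 3, 4, 5, 6, 7
}

for kind, length in lengths.items():
    print(kind, length)
```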
Play with range
• Let’s continue to study range (it may appear in the final!)
• Accessing range() with an Index Value
• range is indexed
• For example:
• range (3,8)
• Items are: 3, 4, 5, 6, 7
• Their index: [0] [1] [2] [3] [4]

• You can access an item by referring to its index number, using [ ]:
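
A minimal sketch using the range(3, 8) example from this slide. The negative index in the last line is an extra illustration that counts from the end.

```python
numbers = range(3, 8)

print(numbers[0])    # 3 (first item)
print(numbers[4])    # 7 (fifth item)
print(numbers[-1])   # 7 (a negative index counts from the end)
```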
Accessing range() with an Index Value
Use range() function together with List
• We can create a list and print its elements one by one by
taking advantage of range().

The index of a list:

It follows the same logic as range()’s index. The first item of a
list has index [0], the second item has index [1], and so on.
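
One way this could look, as a sketch with an illustrative list: range(len(names)) produces the index numbers 0, 1, 2, and each index is used to fetch one element.

```python
names = ["Andy", "Alice", "cherry"]

visited = []
for index in range(len(names)):
    print(index, names[index])    # print each index with its element
    visited.append(names[index])
```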
Use range() to create list
• We can put a range directly into list() to create a list.
For example:
list(range(10))

• Please try this block of code and explain the result:


Task:
• Please print the largest item of this range:

range(3, 666, 70)


Task:
• Please print the largest item of this range:

range(3, 6666, 70)


Task:
• Please print the largest item of this range:

range(3, 6666555555555555, 70)


Task:
• Create a function to auto print the largest item of any range.
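
One possible approach, offered as a sketch rather than the official solution: for a positive step the largest item is the last one, and for a negative step it is the first one. The function name largest_item is hypothetical.

```python
def largest_item(numbers):
    """Print and return the largest item of a non-empty range."""
    if len(numbers) == 0:
        raise ValueError("an empty range has no largest item")
    # With a positive step the last item is the largest;
    # with a negative step the first item is the largest.
    largest = numbers[-1] if numbers.step > 0 else numbers[0]
    print(largest)
    return largest


small = largest_item(range(3, 666, 70))
medium = largest_item(range(3, 6666, 70))
huge = largest_item(range(3, 6666555555555555, 70))
```

Indexing with [-1] avoids building a list, so the function stays fast even for the very long range in the third task.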
Task:
What is the result of the following block of code? Why?
Task:
What is the result of the following block of code? Why?
(A string also has an index.
Its logic is the same as range()’s index: each character has an index.)
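
A small illustration of string indexing. This is not the code block from the slide, only a sketch with an example word.

```python
word = "Python"

first_letter = word[0]   # 'P'
last_letter = word[5]    # 'n'
print(first_letter, last_letter, len(word))

# Visit every character through its index.
for index in range(len(word)):
    print(index, word[index])
```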
Task:
What is the result of the following block of code? Why?
(We can also use a string to create a list directly.
Each item is one character of the string.)
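
A small illustration of creating a list from a string. Again, this is a sketch with an example word, not the code block from the slide.

```python
letters = list("pandas")

print(letters)        # ['p', 'a', 'n', 'd', 'a', 's']
print(len(letters))   # 6
```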
Pandas: read and write to excel
How to start to use a third party package and its modules?
• First, install the package/library.
In the terminal window, key in:
For Windows:
pip install package name
For Mac:
pip3 install package name
For example, install pandas:


In Windows system:
pip install pandas
In Mac:
pip3 install pandas

Be careful: pip is itself a piece of software, so please keep your pip at the latest version. If an upgrade prompt appears, copy the green words
and paste them into the terminal window, then press the Enter key to run them:

If done:
How to start to use a third party package and its modules?

• After installing, call the package/library

(in the programming window)

import package name as a short nickname

• Give the package a short nickname

• Keep it short and easy to use in the following steps.
• This nickname fully represents the package.
• When you need to use the package, just call the nickname
instead of calling the full name.
Pandas: data frame
• Print a data frame
• Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
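
A minimal sketch of creating and printing a DataFrame, assuming pandas is installed. The nickname pd is the common convention; the column names and values are invented for illustration.

```python
import pandas as pd   # pd is the short nickname for pandas

# Each dictionary key becomes a column; each list holds that column's values.
df = pd.DataFrame({
    "name": ["Andy", "Alice", "Cherry"],
    "university": ["X", "Y", "Y"],
    "score1": [88, 92, 79],
})

print(df)
print(df.shape)   # (3, 3): three rows and three columns
```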
Pandas: data frame
• Create an empty DataFrame: no data, only columns (column names in
the first line). For example:

• Or, no columns at all.
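
Both kinds of empty DataFrame can be sketched as follows, assuming pandas is installed. The column names are hypothetical.

```python
import pandas as pd

# No data, only column names.
columns_only = pd.DataFrame(columns=["name", "university", "score1"])
print(columns_only)
print(columns_only.shape)   # (0, 3): no rows, three columns

# No data and no columns at all.
nothing = pd.DataFrame()
print(nothing.shape)        # (0, 0)
```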


To read Excel files, we need to install openpyxl (a package)
in VS Code
Pandas: read (may appear in the final!)
• Read and view an Excel file (fill in the blanks)
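
A sketch of reading an Excel file with pd.read_excel(), assuming pandas and openpyxl are installed. The file name students.xlsx and its contents are hypothetical; the file is first written to a temporary folder so that there is something to read.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file, created in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "students.xlsx")
pd.DataFrame({"name": ["Andy", "Alice"], "score1": [88, 92]}).to_excel(path, index=False)

# Read the first sheet of the Excel file into a DataFrame and view it.
df = pd.read_excel(path)
print(df)
```

With a file of your own, pass its path to pd.read_excel() instead of the temporary one.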
pandas: write
• Write to excel (create a new excel file)

• You will get a new excel file in the same folder, and the content of
the first sheet is df.
pandas: write
• You can specify the name of the sheet

• You will get a new excel file in the same folder, and the content of
the first sheet is df, the name of this sheet is “try”.
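
A sketch of writing a DataFrame to a new Excel file, with and without a sheet name, assuming pandas and openpyxl are installed. The file names and data are hypothetical, and a temporary folder is used here instead of the current folder.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Andy", "Alice"], "score1": [88, 92]})
folder = tempfile.mkdtemp()

# Without a sheet name, pandas calls the sheet "Sheet1".
default_path = os.path.join(folder, "output.xlsx")
df.to_excel(default_path)

# With sheet_name, the sheet gets the name you choose.
named_path = os.path.join(folder, "output_try.xlsx")
df.to_excel(named_path, sheet_name="try")

# sheet_name=None reads every sheet into a dictionary keyed by sheet name.
default_sheets = list(pd.read_excel(default_path, sheet_name=None))
named_sheets = list(pd.read_excel(named_path, sheet_name=None))
print(default_sheets, named_sheets)
```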
pandas: write
• You can write a DataFrame with no data (only defined columns in
the first line) to Excel

• And without the index column:
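
A sketch of writing a columns-only DataFrame without the index column, assuming pandas and openpyxl are installed. The column names and file name are hypothetical; index=False is the to_excel() option that leaves the index out.

```python
import os
import tempfile

import pandas as pd

folder = tempfile.mkdtemp()
path = os.path.join(folder, "header_only.xlsx")

# Only column names, no rows. index=False leaves out the index column.
header_only = pd.DataFrame(columns=["name", "university", "score1"])
header_only.to_excel(path, index=False)

# Read the file back to check what was written.
check = pd.read_excel(path)
print(list(check.columns))
print(len(check))   # 0 rows
```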


pandas: write
• You can write a totally empty DataFrame, with no data and no columns,
to a new Excel file

• In this way, you can create a new Excel file with new sheets.
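
One way to create a blank Excel file with several sheets, sketched under the assumption that pandas and openpyxl are installed. Using pd.ExcelWriter to hold more than one sheet is an addition here, not necessarily what the slide shows; the file and sheet names are hypothetical.

```python
import os
import tempfile

import pandas as pd

folder = tempfile.mkdtemp()
path = os.path.join(folder, "blank.xlsx")

# Each totally empty DataFrame becomes one empty sheet in the new file.
with pd.ExcelWriter(path) as writer:
    pd.DataFrame().to_excel(writer, sheet_name="first")
    pd.DataFrame().to_excel(writer, sheet_name="second")

# List the sheets in the file that was just created.
with pd.ExcelFile(path) as workbook:
    sheet_names = workbook.sheet_names
print(sheet_names)
```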
pandas: Filter
• You can filter the data based on a certain condition. For example:

• You can get all the students who come from Y university, and put their
information into a new DataFrame.
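
A minimal sketch of filtering, assuming pandas is installed. The data and the column name university are hypothetical. The comparison produces one True or False per row, and only the True rows are kept.

```python
import pandas as pd

students = pd.DataFrame({
    "name": ["Andy", "Alice", "Cherry", "Dan"],
    "university": ["X", "Y", "Y", "Z"],
    "score1": [88, 92, 79, 85],
})

# One Boolean per row: True where the student comes from Y university.
is_from_y = students["university"] == "Y"

# Keep only the rows where the condition is True.
from_y = students[is_from_y]
print(from_y)
```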
Task
• In the student score file (download from Canvas), please find all the
students whose score1 is larger than 85, and then put their
information into a new Excel file (named “high score1”)
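
One possible approach, as a sketch that assumes pandas and openpyxl are installed. The real file from Canvas is not available here, so a small made-up DataFrame stands in for it; with the real file, students would come from pd.read_excel(). The result is written to a temporary folder.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the student score file.
students = pd.DataFrame({
    "name": ["Andy", "Alice", "Cherry", "Dan"],
    "score1": [88, 92, 79, 85],
})

# Keep the rows where score1 is larger than 85 (85 itself is not included).
high = students[students["score1"] > 85]

folder = tempfile.mkdtemp()
path = os.path.join(folder, "high score1.xlsx")
high.to_excel(path, index=False)

# Read the new file back to check its contents.
saved = pd.read_excel(path)
print(saved)
```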
Task:
• You have a student score file. Please split this file into multiple
files by university name so that each file contains information on
students from only one university:

• Please find all the students who come from X university, and put their
information into a new Excel file named “X university students.xlsx”;
• Please find all the students who come from Y university, and put their
information into a new Excel file named “Y university students.xlsx”;
• Please find all the students who come from Z university, and put their
information into a new Excel file named “Z university students.xlsx”.
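
One possible approach, as a sketch that assumes pandas and openpyxl are installed. The data and the column name university are hypothetical stand-ins for the real student score file. Instead of writing three separate filters, the loop repeats the same filter once for each university name found in the column.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the student score file.
students = pd.DataFrame({
    "name": ["Andy", "Alice", "Cherry", "Dan"],
    "university": ["X", "Y", "Y", "Z"],
    "score1": [88, 92, 79, 85],
})

folder = tempfile.mkdtemp()

# unique() lists each university name once.
for university in students["university"].unique():
    group = students[students["university"] == university]
    path = os.path.join(folder, f"{university} university students.xlsx")
    group.to_excel(path, index=False)

created = sorted(os.listdir(folder))
print(created)
```

The same loop works for any other column, such as the major, by changing the column name and the file name.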
Task:
• You have a student score file. Please split this file into multiple
files by major so that each file contains information of students
from only one major.
Lifelong study
• Want to learn more Pandas skills by yourself?

• Learning to code can be a lifelong study, so in this course you are also
encouraged to build self-learning skills to support your sustainable lifelong
learning ambitions.

• Please try to learn a little more by yourself:

• 1 Search online: Google/social media
• 2 Take an AI assistant/ChatGPT as your tutor
• 3 Go directly to the official website tutorial
• https://pandas.pydata.org/
See you in the face-to-face workshop
• In the following workshop, the lecturer will also guide you hand in
hand through all the tasks above again, step by step.

• Before attending the workshop, you can try using an AI assistant to
help you figure out what you're confused about. You are
encouraged to record your learning process with the AI assistant
and write a report to the lecturer.

• Remember this star button in anaconda.cloud? It's your AI assistant
