How To Work With Excel Spreadsheets Using Python
How To Work With Excel Spreadsheets Using Python
Python Programming
by
Priyankur Sarkar (https://www.knowledgehut.com/blog/author/priyankur-sarkar)
31st Oct, 2019
Last updated on 07th Apr, 2021
16 mins read
(http://www.facebook.com/sharer/sharer.php?u=https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python)
(http://twitter.com/share?via=Knowledgehut&url=https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python&text=How to Work with Excel
Spreadsheets using Python&hashtags=)
(https://www.linkedin.com/cws/share?url=https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python)
(https://wa.me/?text=https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python)
tnuocsiD %02
Excel is considered as one of the most popular and widely used spreadsheet applications developed by Microsoft. You can organize, analyze and store your
data into tabular sheets with the help of Excel. From analysts and sales managers, to CEOs, professionals from every field use Excel for creating quick statistics
and for data crunching.
Spreadsheets are commonly used in the present world because of their intuitive nature and the ability to handle large datasets. Most importantly, they can
work without any prior technical background.
(https://www.knowledgehut.com/programming/python-programming-certification-training?
utm_source=blog&utm_medium=contentcta&utm_campaign=blogs2020)
Finding different ways to work with Excel using code is essential since working with data and in Python has some serious advantages in comparison with Excel’s
UI. Developers of Python have implemented ways to read, write and manipulate Excel documents.
You can check the quality of your spreadsheet application by going over the checklist below:
This checklist will help you in verifying the qualitative nature of the spreadsheet application you’re going to work on.
Subscribe to our newsletter. Enter Your E-mail SUBSCRIBE
https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python 1/21
7/10/2021 How to Work with Excel Spreadsheets using Python?
Practical Applications
In this article, we would be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet or
you can also convert an Excel spreadsheet into a programmatic format. There can be a lot of possible situations where you might feel the need to use a
package like openpyxl. Let us discuss a few of them to get a comprehensive overview of it.
(https://www.knowledgehut.com/data-science/machine-learning-with-python-certification-training)
Now, if you want to import this particular data, you need to iterate over each row of the spreadsheet and then add each of the products into the database of the
online store.
In this situation, openpyxl comes to play. You can use it effectively to iterate over each User record and transform the required information into an Excel
tnuocsiD %02
spreadsheet.
In order to perform this, you have to read the spreadsheet first and then iterate through each row and fetch the total amount spent from the Database. Finally,
you need to write it back to the spreadsheet.
Starting openpyxl
You can install the openpyxl package using pip. Open your terminal and write the following command:
After you have installed the spreadsheet, you can make up your own simple spreadsheet:
workbook = Workbook()
spreadsheet = workbook.active
spreadsheet["A1"] = "Hello"
spreadsheet["B1"] = "World!"
workbook.save(filename="HelloWorld.xlsx")
Let us start with the most important thing that you can do with a spreadsheet,i.e. read it. We will be using a Watch Sample Dataset
(https://github.com/realpython/materials/raw/master/openpyxl-excel-spreadsheets-python/reviews-sample.xlsx) which contains a list of 100 watches with
information like product name, product ID, review and so forth.
>>> workbook.sheetnames
['Sheet 1']
>>> spreadsheet
>>> spreadsheet.title
In the example code above, we open the spreadsheet using load_workbook and then we check all the sheets that are available to work with using
workbook.sheetnames . Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of
opening a spreadsheet.
Now, let us see the code to retrieve data from the spreadsheet:
>>> spreadsheet["A1"]
tnuocsiD %02
>>> spreadsheet["A1"].value
'marketplace'
>>> spreadsheet["F10"].value
You can retrieve the actual value and the cell value both. To get the actual value, use .value and to get the cell, you can use .cell() :
Let us first start with iterating through the data. There are a number of iterating methods that depend solely on the user.
You can slice the data with a combination of rows and columns:
>>> spreadsheet["A1:C2"]
You can also iterate through the dataset by ranging between rows and columns:
>>> spreadsheet["A"]
...
>>> spreadsheet["A:B"]
...
...
>>> spreadsheet[5]
tnuocsiD %02
...
>>> spreadsheet[5:6]
...
...
Python offers arguments by which you can set limits to the iteration with the help of Python generators like .iter_rows() and .iter_cols() :
... max_row=2,
... min_col=1,
... max_col=3):
... print(row)
... max_row=2,
... min_col=1,
... max_col=3):
... print(column)
You can also add Boolean values_only in the above example and set it to True to get the values of cell:
... max_row=2,
... min_col=1,
... max_col=3,
... values_only=True):
... print(value)
tnuocsiD %02
('US', 3653882, 'R3O9SGZBVQBV76')
Since we are now done with iterating the data, let us now manipulate data using Python’s primitive data structures.
Consider a situation where you want to extract information of a product from the sample spreadsheet and then store it into the dictionary. The key to the
dictionary would be the product ID.
The two essential elements that can be extracted from the data are as follows:
• ID • ID
• Title • Customers ID
• Parent • Headline
• Category • Body
• Date
import datetime
@dataclass
class Product:
id: str
parent: str
title: str
category: str
@dataclass
class Review:
id: str
customer_id: str
stars: int
headline: str
body: str
date: datetime.datetime
The next step is to create a mapping between columns and the required fields:
... max_row=1,
tnuocsiD %02
... values_only=True):
... print(value)
>>> # Or an alternative
... print(cell.value)
marketplace
Customer_ID
Review_ID
Product_ID
Product_Parent
...
Finally, let us convert the data into new structures which will parse the data in spreadsheet into a list of products and review objects:
PRODUCT_CATEGORY,REVIEW_DATE,REVIEW_ID,REVIEW_CUSTOMER, \
REVIEW_STARS,REVIEW_HEADLINE,REVIEW_BODY
# Using the read_only method since you're not gonna be editing the spreadsheet
workbook = load_workbook(filename="watch_sample.xlsx",read_only=True)
spreadsheet = workbook.active
products = []
reviews = []
# Using the values_only because you just want to return the cell value
parent=row[PRODUCT_PARENT],
title=row[PRODUCT_TITLE],
category=row[PRODUCT_CATEGORY])
products.append(product)
# You need to parse the date from the spreadsheet into a datetime format
spread_date = row[REVIEW_DATE]
parsed_date = datetime.strptime(spread_date,"%Y-%m-%d")
tnuocsiD %02
review = Review(id=row[REVIEW_ID],
Customer_ID=row[REVIEW_CUSTOMER],
stars=row[REVIEW_STARS],
headline=row[REVIEW_HEADLINE],
body=row[REVIEW_BODY],
date=parsed_date)
reviews.append(review)
print(products[0])
print(reviews[0])
After you execute the code, you will get an output that looks like this:
Product(id='A90FALZ1ZC',parent=937111370,...)
Review(id='D3O9OGZVVQBV76',customer_id=3903882,...)
Appending Data
To understanding how to append data, let us hover back to the first sample spreadsheet. We will open the document and append some data to it:
spreadsheet = workbook.active
spreadsheet["C1"]="Manipulating_Data ;)"
workbook.save(filename="hello_world_append.xlsx"
If you open your Excel file, you will notice the additional Manipulating_Data being added to an adjacent cell.
So, let’s begin with some simple Spreadsheets and understand what each line means.
2
3 filename = "first_program.xlsx"
4
tnuocsiD %02
5 workbook = Workbook()
6 spreadsheet = workbook.active
7
8 sheet["A1"] = "first"
9 sheet["B1"] = "program!"
10
11 workbook.save(filename=filename)
Line 5: In order to make a Spreadsheet, at first, we have to create an Empty workbook to perform further operations.
Lines 8 and 9 : We can add data to a specific cell as per our requirement. In this example, we can see that two values “first” and “program” have been added to
specific cells in the sheet.
Line 11: The line shows how to save data after all the operations we have done.
We have already explained the following code by which we can add values to a Spreadsheet:
>>> cell
<Cell 'Sheet'.A1>
>>> cell.value
'hello'
>>> cell.value
'hey'
Line 1: In the first line at first we have declared the cell and updated its value.
Line 5: We have printed the value of the cell as “first” because in the first program we have already assigned sheet["A1"] with “first”
Line 8 : We have updated the value of the cell as "second" by simply assigning it to cell.value .
Lines 9 : In this line, we have just printed the updated value of cell.
Finally, you have to save all the operations you have performed into the spreadsheet once you call workbook.save() .
If the cell didn’t exist while adding a value ,then openpyxl creates a cell:
>>> print_rows()
('first', 'program!')
tnuocsiD %02
>>> spreadsheet["B10"] = "test"
>>> print_rows()
('first', 'program!')
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, 'test')
.insert_rows()
.delete_rows()
.insert_cols()
.delete_cols()
1. idx
2. amount
Idx stands for index position and amount refers to the number of values we can store in the Spreadsheet.
Subscribe to our newsletter. Enter Your E-mail SUBSCRIBE
https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python 9/21
7/10/2021 How to Work with Excel Spreadsheets using Python?
Using our basic knowledge based on the first simple program, let’s see how we can use these methods inside the program:
>>> print_rows()
('first', 'program!')
>>> spreadsheet.insert_cols(idx=1)
>>> print_rows()
>>> spreadsheet.insert_cols(idx=3,amount=5)
>>> print_rows()
>>> spreadsheet.delete_cols(idx=3,amount=5)
>>> v.delete_cols(idx=1)
>>> print_rows()
('first', 'program!')
>>> spreadsheet.insert_rows(idx=1)
>>> print_rows()
(None, None)
tnuocsiD %02
('first', 'program!')
>>> spreadsheet.insert_rows(idx=1,amount=3)
>>> print_rows()
(None, None)
(None, None)
(None, None)
(None, None)
('first', 'program!')
>>> spreadsheet.delete_rows(idx=1,amount=4)
>>> print_rows()
('first', 'program!')
Managing Sheets
We have seen the following recurring piece of code in our previous examples .This is one of the ways of selecting the default sheet from the Spreadsheet:
spreadsheet = workbook.active
However, if you want to open a spreadsheet with multiple sheets, you can write the following command:
>>> # Let's say you have two sheets: "Products" and "Company Sales"
>>> workbook.sheetnames
If we want to change the title of the Sheet, execute the following code:
>>> workbook.sheetnames
>>> workbook.sheetnames
We can CREATE / DELETE Sheets also with the help of two methods - .create_sheet() and .remove() :
>>> workbook.sheetnames
tnuocsiD %02
>>> #To print the updated available sheet names
>>> workbook.sheetnames
>>> # To define the position Where we want to create the Sheet(here “HR” sheet is created at the first position .H
ere index 0 represents the first position)
>>> workbook.sheetnames
>>> # To remove them,we just have to send the sheet names as an argument which we want to delete to the method .r
emove()
>>> workbook.remove(Operations_Sheet)
>>> workbook.sheetnames
>>> workbook.remove(hr_sheet)
>>> workbook.sheetnames
When you’re programmatically creating a spreadsheet and it is going to be sent and used by someone else, it is a good practice to add different filters and
allow people to use it afterward.
In the code below there is a simple example which shows how to add a simple filter to your spreadsheet:
>>> # Check the used spreadsheet space using the attribute "dimensions"
>>> spreadsheet.dimensions
'A1:O100'
>>> spreadsheet.auto_filter.ref="A1:O100"
>>> workbook.save(filename="watch_sample_with_filters.xlsx")
>>> FORMULAE
frozenset({'ABS',
'AMORLINC',
'ACCRINT',
'ACOS',
'ACCRINTM',
'ACOSH',
tnuocsiD %02
...,
'AND',
'YEARFRAC',
'YIELDDISC',
'AMORDEGRC',
'YIELDMAT',
'YIELD',
'ZTEST'})
Let’s check the average star rating of the 99 reviews within the spreadsheet:
Now, if we open your spreadsheet and go to cell P2, you can see the value to be 4.18181818181818.
Similarly, we can use this methodology to include any formulas for our requirements in our spreadsheet. For example, if we want to count the number of
helpful reviews:
Using openpyxl , we get multiple styling options such as including fonts, colors, borders,and so on.
Subscribe to our newsletter. Enter Your E-mail SUBSCRIBE
https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python 12/21
7/10/2021 How to Work with Excel Spreadsheets using Python?
... right=double_border_side,
... bottom=double_border_side,
... left=double_border_side)
>>> workbook.save(filename="sample_styles.xlsx")
If you want to apply multiple styles to one or several cells in our spreadsheets,you can use NamedStyle class:
tnuocsiD %02
>>> # Let's create a style template for the header row
>>> # Now let's apply this to all first row (header) cells
>>> workbook.save(filename="sample_styles.xlsx")
3
4 workbook = Workbook()
5 spreadsheet = workbook.active
6
8 rows = [
9 ["Product","Online","Store"],
10 [1,30,45],
11 [2,40,30],
12 [3,40,25],
13 [4,50,30],
14 [5,30,25],
15 [6,25,35],
16 [7,20,40],
17 ]
18
20 spreadsheet .append(row)
Now let us create a bar chart that will show the total number of sales per product:
22 chart = BarChart()
23 data = Reference(worksheet=sheet,
24 min_row=1,
tnuocsiD %02
25 max_row=8,
26 min_col=2,
27 max_col=3)
28
29 chart.add_data(data,titles_from_data=True)
31
32 workbook.save("chart.xlsx")
You can also create a line chart by simply making some changes to the data:
1 import random
4
5 workbook = Workbook()
6 sheet = workbook.active
7
9 rows= [
13 [1, ],
14 [2, ],
15 [3, ],
16 ]
17
19 sheet.append(row)
20
22 max_row=4,
23 min_col=2,
24 max_col=13):
tnuocsiD %02
26 cell.value = random.randrange(5,100)
There are numerous types of charts and various types of customizations you can apply to your spreadsheet to make it more attractive.
Assume we have a database and we use some Object Relational mapping to map the database into Python classes and then export the objects into
spreadsheets:
@dataclass
class Sale:
id: str
quantity: int
@dataclass
class Product:
id: str
name: str
sales:List[Sale]
Now, to generate some random data, let’s assume that the above classes are stored in db_classes.py file then:
1 import random
2
6
8
9 products_range = []
10
13 sales = []
14
16 for_in range(5):
17 sale_range = Sale(quantity=random.randrange(5,100))
18 sales.append(sale)
19
20 product = Product(id=str(idx),
22 sales=sales)
23 products_range.append(product)
By running this code, we will get 5 products in 5 months of sale with a random quantity of sales for each month.
tnuocsiD %02
Now, we have to convert this into a spreadsheet in which we need to iterate over the data:
25 workbook = Workbook()
26 spreadsheet = workbook.active
27
31
34 data = [product.id,product.name]
36 data.append(sale.quantity)
37 spreadsheet.append(data)
This will create a spreadsheet with some data coming from your database.
We have learned to work with Excel in Python because Excel is one of the most popular tools and finding a way to work with Excel is critical. Pandas is a great
tool to work with Excel in Python. It has unique methods to read all kinds of data in an Excel file and we can export items back to Excel using it.
2
3 data = {
7 }
8 dataframe = pd.DataFrame(data)
Now we have some data, and to convert it from a DataFrame into a worksheet we generally use . dataframe_to_rows() :
12
13 workbook = Workbook()
14 spreadsheet = workbook.active
15
tnuocsiD %02
17 spreadsheet .append(row)
18
19 workbook.save("pandas_spreadsheet.xlsx")
We need to use read_excel method to read data from pandas DataFrame object.
excel_file =’movies.xls’
movies=pd.read_excel(excel_file)
We can also use Excel file class to use multiple sheets from the same excel file:
movies_sheets = []
movies_sheets.append(xlsx.parse(sheet))
movies = pd.concat(movies_sheets))
Indexes and columns allows you to access data from your DataFrame easily:
>>> df.columns
dtype='object')
>>> df["star_rating"][:10]
R3O9SGZBVQBV76 5
RKH8BNC3L5DLF 5
R2HLE8WKZSU3NL 2
R31U3UH5AZ42LL 5
R2SV659OUJ945Y 4
RA51CP8TR5A2L 5
RB2Q7DLDN6TH6 5
R2RHFJV0UYBK3Y 1
R2Z6JOQ94LFHEP 5
RX27XIIWY5JPB 4
>>> df.loc["R2EQL1V1L6E0C9"]
marketplace US
customer_id 15305006
review_id R2EQL1V1L6E0C9
tnuocsiD %02
product_id B004LURNO6
product_parent 892860326
review_date 2015-08-31
Summary
In this article we have covered:
Now you are well aware of the different types of implementations you can perform with spreadsheets using Python. However, if you are willing to gather more
information on this topic, you can always rely on the official documentation (https://openpyxl.readthedocs.io/en/stable/index.html) of openpyxl. To gain more
knowledge about Python tips and tricks, check out our Python tutorial (https://www.knowledgehut.com/tutorials/python-tutorial). To gain mastery
over Python coding,join ourPython certification course (https://www.knowledgehut.com/programming/python-programming-certification-training).
Priyankur Sarkar
Data Science Enthusiast
Priyankur Sarkar loves to play with data and get insightful results out of it, then turn those data insights and results in business growth. He is an electronics engineer
with a versatile experience as an individual contributor and leading teams, and has actively worked towards building Machine Learning capabilities for organizations.
comment*
Name*
Email*
COMMENT
Your email address will not be published. Required fields are marked *
SUGGESTED BLOGS
(https://www.knowledgehut.com/blog/programming/popular-programming-certifications)
BLOGS
(HTTPS://WWW.KNOWLE…
3470
tnuocsiD %02
Top-Paying Programming Certifications for 2021
(https://www.knowledgehut.com/blog/programming/popular-programming-certifications)
by KnowledgeHut (https://www.knowledgehut.com/blog/author/knowledgehut-editor)
09 Jun 2021
6 mins read
Programming is at the core of software development...
READ MORE (HTTPS://WWW.KNOWLEDGEHUT.COM/BLOG/PROGRAMMING/POPULAR-PROGRAMMING-
CERTIFICATIONS)
(https://www.knowledgehut.com/blog/programming/top-java-developers-certifications)
BLOGS
(HTTPS://WWW.KNOWLE…
5680
(https://www.knowledgehut.com/blog/programming/best-python-certifications)
BLOGS
(HTTPS://WWW.KNOWLE…
9303
LOAD MORE
Connect with us
(https://www.linkedin.com/company/knowledgehut) (https://www.facebook.com/KnowledgeHut.Global)
(https://www.instagram.com/knowledgehut.global) (https://www.youtube.com/user/TheKnowledgehut)
(https://twitter.com/KnowledgeHut)
Get Our Weekly Newsletter
tnuocsiD %02
Enter Your E-mail SUBSCRIBE
We Accept
Company
Offerings
Resources
Partner with us
Support
Disclaimer: KnowledgeHut reserves the right to cancel or reschedule events in case of insufficient registrations, or if presenters cannot attend
due to unforeseen circumstances. You are therefore advised to consult a KnowledgeHut agent prior to making any travel arrangements for a
workshop. For more details, please refer Cancellation & Refund Policy (https://www.knowledgehut.com/refund-policy).
Subscribe to our newsletter. Enter Your E-mail
CSM®, CSPO®, CSD®, CSP®, A-CSPO®, A-CSM® are registered trademarks of Scrum All READ MORE
SUBSCRIBE
https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python 20/21
7/10/2021 How to Work with Excel Spreadsheets using Python?
tnuocsiD %02