
Tutorial – VLOOKUP in Pandas

Shiksha Online
Updated on Jan 24, 2023 11:30 IST
VLOOKUP is a common Excel function that stands for ‘Vertical Lookup’. The article
discusses the use of VLOOKUP in Pandas.

We already know that Pandas DataFrames are tabular data structures that store
data in rows and columns, much like an Excel or CSV file. VLOOKUP is a common
Excel function for vertically arranged data that lets you map data from one table
to another. Pandas has no function named VLOOKUP; the equivalent operation
combines two DataFrames that share a common attribute (column). You can perform
it using the map() and merge() methods, as discussed in this article:
The map() method

The merge() method

For our purpose today, let’s create a sample DataFrame as shown below:

Disclaimer: This PDF is auto-generated based on the information available on Shiksha as
on 01-Nov-2023.
# Importing the pandas library
import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({
    'name': ['Bob', 'Tom', 'Rob', 'Ben', 'Pam'],
    'age': [10, 12, 13, 11, 12],
    'gender': ['M', 'M', 'M', 'M', 'F'],
    'birthmonth': ['Jan', 'Aug', 'Oct', 'Dec', 'Dec']
})

df

Our dummy dataset comprises four columns: 'name', 'age', 'gender', and 'birthmonth'.
As you can observe, it contains both numerical and categorical variables.

Now, let's see how we can emulate the VLOOKUP function in Pandas using this
dataset.

The map() method

The pandas .map() method allows us to map values onto a Pandas Series, that is, a
column in a Pandas DataFrame. This can be done using a dictionary, where each key
is a value that appears in our Pandas column and each value is what we want to
replace it with.

To understand this better, let’s create a dictionary that contains our mapping
values:

birthmonth_map = {
    'Jan': 'January',
    'Aug': 'August',
    'Oct': 'October',
    'Dec': 'December'
}

Now, we will apply the map() method to the column that we want to map into:

df['birthmonth'] = df['birthmonth'].map(birthmonth_map)

df

Thus, we have performed VLOOKUP using a dictionary.
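
One detail worth knowing: when a value in the column has no key in the dictionary, map() returns NaN for it, much as VLOOKUP returns #N/A. The sketch below uses made-up data (the `months` Series and the 'Unknown' default are assumptions for illustration, not part of the article's dataset) to show this and one way to supply a fallback.

```python
import pandas as pd

# Hypothetical data: 'Mar' is deliberately absent from the mapping
months = pd.Series(['Jan', 'Mar', 'Dec'])
month_names = {'Jan': 'January', 'Dec': 'December'}

# 'Mar' has no key in the dictionary, so it becomes NaN
mapped = months.map(month_names)

# Replace the unmatched entries with a default label
with_default = mapped.fillna('Unknown')

print(with_default.tolist())
```

Filling with a default is optional; leaving the NaN in place can be useful as a signal that the mapping is incomplete.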

But what if the lookup data is stored in another DataFrame, as is often the case
when working with data from relational (SQL) databases? In such cases, instead of
working with Python dictionaries, we use the merge() method.
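
As a side note, when only a single column needs to be looked up, map() can still work with a lookup DataFrame if that DataFrame is first turned into a Series indexed by the key. This is a minimal sketch with hypothetical names (`people`, `lookup`, `age_to_year`), and it assumes each key appears only once in the lookup table.

```python
import pandas as pd

# Hypothetical tables for illustration
people = pd.DataFrame({'name': ['Bob', 'Tom'], 'age': [10, 12]})
lookup = pd.DataFrame({'age': [10, 11, 12], 'birthyear': [2012, 2011, 2010]})

# Build a Series whose index is the key (age) and whose values are the birth years
age_to_year = lookup.set_index('age')['birthyear']

# map() accepts a Series and looks each age up in its index
people['birthyear'] = people['age'].map(age_to_year)

print(people)
```

For more than one lookup column, or for keys that repeat, merge() is the more natural fit.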


The merge() method

The pandas .merge() method allows us to merge two DataFrames together.

In the DataFrame we created above, we have a column ‘age’ that corresponds to the
year a child was born in. Let’s create another DataFrame that contains the mapping
values (birth year) for the age:

# Creating another DataFrame
df2 = pd.DataFrame({
    'age': [10, 11, 12, 13, 14, 15],
    'birthyear': [2012, 2011, 2010, 2009, 2008, 2007]
})

df2

Now, let’s see how we can merge the two different DataFrames using the merge()
method:

# Keep the result under a new name so that df stays unchanged for the later joins
left_join = pd.merge(left=df, right=df2, how='left')

left_join

Note that VLOOKUP is essentially a left join between two tables, that is, the output
consists of all the rows in the left table and only the matched rows from the right
table.
The left and right arguments choose which DataFrames to use as the left and right
tables in the join. They are the first two parameters of pd.merge(), so they can be
passed by keyword, as above, or by position.

The how parameter sets how the tables are joined, for example left, right, inner, or
outer. Because no on argument is given above, pandas joins on the columns the two
DataFrames have in common, here 'age'.
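
VLOOKUP returns a single match per lookup value, whereas a left join repeats a left row once for every matching row in the right table. The sketch below is one way to guard against that, using a cut-down version of the sample data: on names the key column explicitly, and validate='many_to_one' asks pandas to raise an error if the key is duplicated in the lookup table. Whether this check is wanted depends on the data; it is shown here as an option, not as part of the original tutorial.

```python
import pandas as pd

# Cut-down versions of the sample tables
df = pd.DataFrame({'name': ['Bob', 'Tom', 'Pam'], 'age': [10, 12, 12]})
df2 = pd.DataFrame({'age': [10, 11, 12], 'birthyear': [2012, 2011, 2010]})

# on='age' names the join key explicitly.
# validate='many_to_one' raises pandas.errors.MergeError if 'age'
# appears more than once in the right (lookup) table.
result = pd.merge(df, df2, on='age', how='left', validate='many_to_one')

print(result)
```

With a unique key in the lookup table, the result has exactly as many rows as the left DataFrame, which is the behaviour most people expect from a VLOOKUP.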


Performing VLOOKUP on right join

In the right join, the output DataFrame consists of all the rows in the right
DataFrame and only the matched rows from the left DataFrame. For right rows with
no match, the columns that come from the left DataFrame are filled with NaN.

right_join = pd.merge(left=df, right=df2, how='right')

right_join

Performing VLOOKUP on inner join

By setting the how parameter to inner, the final DataFrame will contain only the rows
whose key (here 'age') appears in both DataFrames.

inner_join = pd.merge(df, df2, on='age', how='inner')

inner_join
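
Every age in the sample DataFrame has a match in the lookup table, so the inner and left joins return the same rows there. The difference only shows when a key is missing. The sketch below uses hypothetical data (`children`, `years`, and the unmatched age 16 are assumptions for illustration) to make that visible.

```python
import pandas as pd

# Hypothetical data: age 16 has no entry in the lookup table
children = pd.DataFrame({'name': ['Bob', 'Sam'], 'age': [10, 16]})
years = pd.DataFrame({'age': [10, 11], 'birthyear': [2012, 2011]})

# Left join keeps Sam and fills his birthyear with NaN
left_join = pd.merge(children, years, on='age', how='left')

# Inner join drops Sam because age 16 is not in the lookup table
inner_join = pd.merge(children, years, on='age', how='inner')

print(len(left_join), len(inner_join))
```

An inner join therefore silently discards unmatched rows, which is worth keeping in mind when the row count of the result matters.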


Performing VLOOKUP on outer join

By setting the how parameter to outer, the final DataFrame will contain rows
from both DataFrames. Where rows match, their values are combined; where a row
has no match in the other DataFrame, the missing columns are filled with NaN.

outer_join = pd.merge(df, df2, on='age', how='outer')

outer_join
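
To see which rows of an outer join actually matched, pd.merge() accepts indicator=True, which adds a '_merge' column with the values 'both', 'left_only', or 'right_only'. The sketch below uses small hypothetical tables (`left_table` and `right_table` are illustrative names) rather than the article's data.

```python
import pandas as pd

# Hypothetical tables: age 16 exists only on the left, age 11 only on the right
left_table = pd.DataFrame({'name': ['Bob', 'Sam'], 'age': [10, 16]})
right_table = pd.DataFrame({'age': [10, 11], 'birthyear': [2012, 2011]})

# indicator=True adds a '_merge' column describing where each row came from
checked = pd.merge(left_table, right_table, on='age', how='outer', indicator=True)

print(checked[['age', '_merge']])
```

Filtering on the '_merge' column is a convenient way to list the lookup values that failed to match.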

Thus, we have performed VLOOKUP on four types of joins.

Endnotes

The Pandas library makes it incredibly easy to emulate VLOOKUP functions.


Mapping and merging data are essential steps during your data preparation,
especially if you’re working with normalized datasets from databases. Pandas is a
very powerful data processing tool and provides a rich set of functions to process
and manipulate data for analysis.
