Python pandas Tutorial: The Ultimate Guide for Beginners
Vidhi Chugh
An AI leader adept at building scalable machine-learning systems
TOPICS
Python
Data Science
pandas is arguably the most important Python package for data analysis. With over 100
million downloads per month, it is the de facto standard package for data manipulation and
exploratory data analysis. Its ability to read from and write to an extensive list of formats
makes it a versatile tool for data science practitioners. Its data manipulation functions make
it a highly accessible and practical tool for aggregating, analyzing, and cleaning data.
In our blog post on how to learn pandas, we discussed the learning path you may take to
master this package. This beginner-friendly tutorial will cover all the basic concepts and
illustrate pandas' different functions. You can also check out our course on pandas
Foundations for further details.
This article is aimed at beginners with basic knowledge of Python and no prior experience
with pandas to help you get started.
What is pandas?
pandas is a data manipulation package in Python for tabular data. That is, data in the form
of rows and columns, also known as DataFrames. Intuitively, you can think of a DataFrame
as an Excel sheet.
pandas’ functionality ranges from data transformations, like sorting rows and taking subsets,
to calculating summary statistics such as the mean, reshaping DataFrames, and joining
DataFrames together. pandas works well with other popular Python data science packages,
often called the PyData ecosystem, including
https://www.datacamp.com/tutorial/pandas 1/32
2/17/25, 9:07 AM Python pandas Tutorial: The Ultimate Guide for Beginners | DataCamp
pandas is used throughout the data analysis workflow. With pandas, you can:
Tidy datasets by reshaping their structure into a suitable format for analysis.
pandas also contains functionality for time series analysis and analyzing text data.
Made for Python: Python is the world's most popular language for machine learning and
data science.
Less verbose per unit operations: Code written in pandas is less verbose, requiring fewer
lines of code to get the desired output.
Intuitive view of data: pandas offers exceptionally intuitive data representation that
facilitates easier data understanding and analysis.
Extensive feature set: It supports an extensive set of operations from exploratory data
analysis, dealing with missing values, calculating statistics, visualizing univariate and bivariate
data, and much more.
Works with large data: pandas handles large data sets with ease. It offers speed and
efficiency while working with datasets of the order of millions of records and hundreds of
columns, depending on the machine.
Install pandas
Installing pandas is straightforward; just run pip install pandas in your terminal.
After installing pandas, it's good practice to check the installed version to ensure everything
is working correctly:
import pandas as pd
print(pd.__version__) # Prints the pandas version
This confirms that pandas is installed correctly and lets you verify compatibility with other
packages.
import pandas as pd
df = pd.read_csv("diabetes.csv")
This read operation loads the CSV file diabetes.csv to generate a pandas DataFrame object,
df. Throughout this tutorial, you'll see how to manipulate such DataFrame objects.
df = pd.read_csv("diabetes.txt", sep=r"\s+")  # raw-string regex: one or more whitespace characters
df = pd.read_excel('diabetes.xlsx')
You can also specify other arguments, such as header, which sets the row that becomes
the DataFrame's header. It has a default value of 0, which denotes the first row as headers
or column names. You can also specify column names as a list in the names argument. The
index_col argument (default is None) can be used if the file contains a row index.
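A minimal sketch of how these arguments interact, using an in-memory string in place of a file. The CSV text and the column names are hypothetical and chosen only for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk; the first column is a row ID.
csv_text = "id,Glucose,BMI\n101,148,33.6\n102,85,26.6\n"

# header=0 (the default) takes column names from the first row;
# index_col=0 uses the first column as the row index.
with_header = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

# For a file with no header row, pass header=None and supply names explicitly.
no_header_text = "101,148,33.6\n102,85,26.6\n"
named = pd.read_csv(
    io.StringIO(no_header_text),
    header=None,
    names=["id", "glucose", "bmi"],
    index_col="id",
)

print(with_header)
print(named)
```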
Note: In a pandas DataFrame or Series, the index is an identifier that points to the location
of a row or column in a pandas DataFrame. In a nutshell, the index labels the row or column
of a DataFrame and lets you access a specific row or column by using its index (you will see
this later on). A DataFrame’s row index can be a range (e.g., 0 to 303), a time series (dates or
timestamps), a unique identifier (e.g., employee_ID in an employees table), or other types
of data. For columns, it's usually a string (denoting the column name).
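As a small illustration of the note above, the sketch below builds a hypothetical employees table whose row index is a unique identifier, then looks up a row by its index label and a column by its name. All names and values are made up:

```python
import pandas as pd

# Hypothetical employees table; employee_ID serves as the row index.
employees = pd.DataFrame(
    {"name": ["Ana", "Ben", "Chloe"], "salary": [52000, 48000, 61000]},
    index=pd.Index(["E01", "E02", "E03"], name="employee_ID"),
)

row = employees.loc["E02"]    # look up a row by its index label
column = employees["salary"]  # look up a column by its name

print(row)
print(column)
```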
df = pd.read_json("diabetes.json")
If you want to learn more about importing data with pandas, check out this cheat sheet on
importing various file types with Python.
import sqlite3
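The sqlite3 import above points to reading from a SQL database. A minimal sketch of that workflow, assuming an in-memory SQLite database and a hypothetical patients table (neither comes from the tutorial's dataset), might look like this:

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical table, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (Glucose INTEGER, Outcome INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?)", [(148, 1), (85, 0), (183, 1)])
conn.commit()

# read_sql_query runs the SQL statement and returns the result as a DataFrame.
sql_df = pd.read_sql_query("SELECT * FROM patients WHERE Outcome = 1", conn)
conn.close()

print(sql_df)
```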
df = pd.read_json("https://api.example.com/data.json")
If the API response is paginated, or in a nested JSON format, you may need additional
processing using pd.json_normalize().
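A minimal sketch of flattening nested JSON with the top-level pd.json_normalize() function. The records below are hypothetical and only mimic the shape of a typical API response:

```python
import pandas as pd

# Hypothetical nested records, shaped like a typical API response body.
records = [
    {"id": 1, "vitals": {"glucose": 148, "bmi": 33.6}},
    {"id": 2, "vitals": {"glucose": 85, "bmi": 26.6}},
]

# json_normalize flattens nested dictionaries into dotted column names.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```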
df.to_csv("diabetes_out.csv", index=False)
df.to_json("diabetes_out.json")
Note: A JSON file stores a tabular object like a DataFrame as key-value pairs. Thus, you
would observe repeating column headers in a JSON file.
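How the keys repeat depends on the orient argument of .to_json(). The sketch below, on a small made-up DataFrame, compares the row-wise "records" layout with the default layout for a DataFrame:

```python
import json

import pandas as pd

small = pd.DataFrame({"Glucose": [148, 85], "Outcome": [1, 0]})

# orient="records" writes one JSON object per row, so each column name repeats per row.
as_records = json.loads(small.to_json(orient="records"))

# The default orient for a DataFrame ("columns") nests values under each column name instead.
as_columns = json.loads(small.to_json())

print(as_records)
print(as_columns)
```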
df.to_excel("diabetes_out.xlsx", index=False)
df.head()
df.tail(n = 10)
df.describe()
It gives a quick look at the scale, skew, and range of numeric data.
You can also modify the quartiles using the percentiles argument. Here, for example, we’re
looking at the 30%, 50%, and 70% percentiles of the numeric columns in DataFrame df .
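A minimal sketch of the percentiles argument, using a small hypothetical column rather than the tutorial's dataset:

```python
import pandas as pd

# Hypothetical numeric column standing in for the tutorial's DataFrame.
sample = pd.DataFrame({"Glucose": [85, 89, 116, 137, 148, 183]})

# percentiles takes fractions between 0 and 1 rather than percentages.
summary = sample.describe(percentiles=[0.3, 0.5, 0.7])
print(summary)
```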
You can also isolate specific data types in your summary output by using the include
argument. Here, for example, we’re only summarizing the columns with the integer data
type.
df.describe(include=[int])
Similarly, you might want to exclude certain data types using the exclude argument.
df.describe(exclude=[int])
Often, practitioners find it easier to view such statistics by transposing them with the .T
attribute.
df.describe().T
For more on describing DataFrames, check out the following cheat sheet.
(768,9)
768
9
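The three values above are consistent with the (rows, columns) tuple reported by the .shape attribute, followed by its two elements. A minimal sketch on a small hypothetical DataFrame:

```python
import pandas as pd

# Small hypothetical frame; the tutorial's dataset has 768 rows and 9 columns.
small = pd.DataFrame({"Glucose": [148, 85, 183], "Outcome": [1, 0, 1]})

print(small.shape)     # (rows, columns) tuple
print(small.shape[0])  # number of rows
print(small.shape[1])  # number of columns
```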
df.columns
list(df.columns)
The sample DataFrame does not have any missing values. Let's introduce a few to make
things interesting. The .copy() method makes a copy of the original DataFrame. This is done
to ensure that any changes to the copy don’t reflect in the original DataFrame. Using .loc
(to be discussed later), you can set rows two to five of the Pregnancies column to NaN
values, which denote missing values.
df2 = df.copy()
df2.loc[2:5,'Pregnancies'] = None
df2.head(7)
You can check whether each element in a DataFrame is missing using the .isnull() method.
df2.isnull().head(7)
Given it's often more useful to know how much missing data you have, you can combine
.isnull() with .sum() to count the number of nulls in each column.
df2.isnull().sum()
Pregnancies 4
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
You can also do a double sum to get the total number of nulls in the DataFrame.
df2.isnull().sum().sum()
Sorting data
To sort a DataFrame by a specific column:
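A minimal sketch using .sort_values(), on a small hypothetical DataFrame whose column names mirror the tutorial's dataset:

```python
import pandas as pd

people = pd.DataFrame({"Age": [50, 31, 32, 21], "Glucose": [148, 85, 183, 89]})

# sort_values returns a new DataFrame; ascending=False puts the largest values first.
by_age_desc = people.sort_values(by="Age", ascending=False)

# Passing lists sorts by several columns, each with its own direction.
by_two = people.sort_values(by=["Glucose", "Age"], ascending=[True, False])

print(by_age_desc)
```

Because .sort_values() returns a new object, the original DataFrame keeps its row order unless you reassign the result.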
df['Outcome']
df[['Pregnancies', 'Outcome']]
df[df.index==1]
df[df.index.isin(range(2,10))]
df2.index = range(1,769)
The example below returns a pandas Series instead of a DataFrame. The 1 in .loc[] is the
row index (label), whereas the 1 in .iloc[] is the row position (the second row, since positions start at 0).
df2.loc[1]
Pregnancies 6.000
Glucose 148.000
BloodPressure 72.000
SkinThickness 35.000
Insulin 0.000
BMI 33.600
DiabetesPedigreeFunction 0.627
Age 50.000
Outcome 1.000
Name: 1, dtype: float64
df2.iloc[1]
Pregnancies 1.000
Glucose 85.000
BloodPressure 66.000
SkinThickness 29.000
Insulin 0.000
BMI 26.600
DiabetesPedigreeFunction 0.351
Age 31.000
Outcome 0.000
Name: 2, dtype: float64
You can also fetch multiple rows by providing a range in square brackets.
df2.loc[100:110]
df2.iloc[100:110]
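The two slices above do not return the same rows. A minimal sketch on a hypothetical DataFrame whose index labels start at 1 shows the difference: .loc[] slices by label and includes the end label, while .iloc[] slices by position and excludes the end position.

```python
import pandas as pd

# Hypothetical frame whose index labels run from 1 to 200.
frame = pd.DataFrame({"value": range(200)}, index=range(1, 201))

by_label = frame.loc[100:110]      # labels 100 through 110, end label included
by_position = frame.iloc[100:110]  # positions 100 through 109, end position excluded

print(len(by_label), len(by_position))
```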
You can also subset with .loc[] and .iloc[] by using a list instead of a range.
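A minimal sketch of list-based selection on a small hypothetical DataFrame with index labels 1 to 5:

```python
import pandas as pd

frame = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=range(1, 6))

picked_by_label = frame.loc[[1, 3, 5]]      # rows labeled 1, 3, and 5
picked_by_position = frame.iloc[[0, 2, 4]]  # first, third, and fifth rows

print(picked_by_label)
print(picked_by_position)
```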
You can also select specific columns along with rows. This is where .iloc[] is different from
.loc[] – it requires column location and not column labels.
df2.iloc[100:110, :3]
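To make the label-versus-position distinction concrete, the sketch below selects the same rows and columns both ways on a small hypothetical DataFrame:

```python
import pandas as pd

frame = pd.DataFrame(
    {"Pregnancies": [6, 1, 8, 1], "Glucose": [148, 85, 183, 89], "BMI": [33.6, 26.6, 23.3, 28.1]},
    index=range(1, 5),
)

# .loc[] selects columns by label ...
by_label = frame.loc[2:3, ["Pregnancies", "Glucose"]]

# ... while .iloc[] selects them by integer position.
by_position = frame.iloc[1:3, :2]

print(by_label.equals(by_position))
```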
For faster workflows, you can pass in the starting index of a row as a range.
df2.iloc[760:, :3]
df2.loc[df2['Age']==81, ['Age']] = 80
df[df.BloodPressure == 122]
The example below fetches all rows where Outcome is 1. Here, df.Outcome selects that
column, df.Outcome == 1 returns a Series of Boolean values indicating which Outcomes
are equal to 1, and [] takes the subset of df where that Boolean Series is True.
df[df.Outcome == 1]
You can use a > operator to draw comparisons. The below code fetches Pregnancies ,
Glucose , and BloodPressure for all records with BloodPressure greater than 100.
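One way to write such a filter is to combine a Boolean mask with a column list inside .loc[]. This is a minimal sketch on a small hypothetical DataFrame, not necessarily the exact code the tutorial used:

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "Pregnancies": [6, 1, 8],
        "Glucose": [148, 85, 183],
        "BloodPressure": [72, 110, 104],
        "BMI": [33.6, 26.6, 23.3],
    }
)

# The Boolean mask picks the rows; the list picks the columns.
high_bp = frame.loc[
    frame["BloodPressure"] > 100, ["Pregnancies", "Glucose", "BloodPressure"]
]
print(high_bp)
```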
df2.isnull().sum()
Pregnancies 4
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
df3 = df2.copy()
df3 = df3.dropna()
df3.shape
The axis argument lets you specify whether you are dropping rows, or columns, with
missing values. The default axis removes the rows containing NaNs. Use axis = 1 to remove
the columns with one or more NaN values. Also, notice how we are using the argument
inplace=True which lets you skip saving the output of .dropna() into a new DataFrame.
df3 = df2.copy()
df3.dropna(inplace=True, axis=1)
df3.head()
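By default, .dropna() uses how='any' and drops a row (or column) that contains at least one
missing value. Set the how argument to 'all' to drop only the rows (or columns) in which
every value is missing.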
df3 = df2.copy()
df3.dropna(inplace=True, how='all')
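A minimal sketch of how the how and axis arguments change what .dropna() removes, using a small hypothetical DataFrame in which the last row is entirely missing:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [4.0, 5.0, np.nan], "c": [7.0, 8.0, np.nan]}
)

any_rows = frame.dropna()           # default how="any": drop rows with at least one NaN
all_rows = frame.dropna(how="all")  # drop only rows in which every value is NaN
any_cols = frame.dropna(axis=1)     # drop columns with at least one NaN

print(any_rows.shape, all_rows.shape, any_cols.shape)
```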
df3 = df2.copy()
# Get the mean of Pregnancies
mean_value = df3['Pregnancies'].mean()
# Fill missing values in the Pregnancies column using .fillna()
df3['Pregnancies'] = df3['Pregnancies'].fillna(mean_value)
(1536, 9)
By default, the .drop_duplicates() method removes all duplicate rows from the DataFrame,
keeping the first occurrence of each.
df3 = df3.drop_duplicates()
df3.shape
(768, 9)
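A minimal sketch of how duplicate rows can arise and be removed. It assumes the duplicates come from stacking a DataFrame on itself with pd.concat(), which would be consistent with the doubled row count shown above; the data is hypothetical:

```python
import pandas as pd

frame = pd.DataFrame({"Glucose": [148, 85, 183], "Outcome": [1, 0, 1]})

# Stack the frame on top of itself so that every row appears twice.
doubled = pd.concat([frame, frame])
print(doubled.shape)

# drop_duplicates keeps the first occurrence of each repeated row by default.
deduped = doubled.drop_duplicates()
print(deduped.shape)
```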
Renaming columns
A common data cleaning task is renaming columns. With the .rename() method, you can
use columns as an argument to rename specific columns. The below code shows the
dictionary for mapping old and new column names.
You can also directly assign column names as a list to the DataFrame.
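Both renaming approaches can be sketched as follows. The replacement names (DPF, BP, pedigree, blood_pressure) are hypothetical and chosen only for illustration:

```python
import pandas as pd

frame = pd.DataFrame({"DiabetesPedigreeFunction": [0.627, 0.351], "BloodPressure": [72, 66]})

# Map old names to new ones; columns not mentioned keep their names.
# The new names here are hypothetical.
renamed = frame.rename(columns={"DiabetesPedigreeFunction": "DPF", "BloodPressure": "BP"})

# Assigning a list replaces every column name at once; its length must match the column count.
relabeled = frame.copy()
relabeled.columns = ["pedigree", "blood_pressure"]

print(renamed.columns.tolist())
print(relabeled.columns.tolist())
```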
For more on data cleaning, and for easier, more predictable data cleaning workflows, check
out the following checklist, which provides you with a comprehensive set of common data
cleaning tasks.
df.mean()
df.mode()
Similarly, the median of each column is computed with the .median() method.
df.median()
df2['Glucose_Insulin_Ratio'] = df2['Glucose']/df2['Insulin']
df2.head()
df['Outcome'].value_counts()
df['Outcome'].value_counts(normalize=True)
Turn off automatic sorting of results using the sort argument (True by default). The default
sorting is based on the counts in descending order.
df['Outcome'].value_counts(sort=False)
You can also apply .value_counts() to a DataFrame object and specific columns within it
instead of just a column. Here, for example, we are applying value_counts() on df with the
subset argument, which takes in a list of columns.
df.value_counts(subset=['Pregnancies', 'Outcome'])
df.groupby('Outcome').mean()
.groupby() enables grouping by more than one column by passing a list of column names,
as shown below.
df.groupby(['Pregnancies', 'Outcome']).mean()
Many summary methods can be used alongside .groupby(), including .min(), .max(),
.mean(), .median(), .sum(), and more.
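When several summaries are needed at once, .agg() accepts a list of function names. A minimal sketch on a small hypothetical DataFrame:

```python
import pandas as pd

frame = pd.DataFrame({"Outcome": [0, 0, 1, 1, 1], "BMI": [26.6, 28.1, 33.6, 23.3, 43.1]})

# .agg() applies several summary functions to each group in one call.
bmi_stats = frame.groupby("Outcome")["BMI"].agg(["min", "max", "mean"])
print(bmi_stats)
```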
Pivot tables
pandas also enables you to calculate summary statistics as pivot tables. This makes it easy
to draw conclusions based on a combination of variables. The below code picks the rows as
unique values of Pregnancies , the column values are the unique values of Outcome , and
the cells contain the average value of BMI in the corresponding group.
For example, for Pregnancies = 5 and Outcome = 0 , the average BMI turns out to be 31.1.
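A pivot table like the one described can be sketched with pd.pivot_table(). The DataFrame below is hypothetical, so its averages differ from the figure quoted for the tutorial's dataset:

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "Pregnancies": [1, 1, 5, 5, 5],
        "Outcome": [0, 1, 0, 0, 1],
        "BMI": [26.6, 30.0, 30.0, 32.0, 35.0],
    }
)

# Rows: unique Pregnancies values; columns: unique Outcome values; cells: mean BMI.
table = pd.pivot_table(
    frame, values="BMI", index="Pregnancies", columns="Outcome", aggfunc="mean"
)
print(table)
```

Here, the cell for Pregnancies = 5 and Outcome = 0 averages the two matching rows (30.0 and 32.0).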
df[['BMI', 'Glucose']].plot.line()
You can select the choice of colors by using the color argument.
All the columns of df can also be plotted on different scales and axes by using the
subplots argument.
df.plot.line(subplots=True)
df['Outcome'].value_counts().plot.bar()
Barplots in pandas
df.boxplot(column=['BMI'], by='Outcome')
Boxplots in pandas
using machine learning, automate data workflows, and more. Check out the resources below
to accelerate your pandas learning journey:
[Cheat Sheets] A plethora of Python and pandas cheat sheets to reference throughout
your learning
[Live trainings] Check out our free live-code-along sessions, many of which leverage
pandas
[More tutorials] Check out our remaining tutorials on pandas and the PyData
ecosystem, including how to implement moving averages in pandas and using the
pandas .apply() method
pandas FAQs
AUTHOR
Vidhi Chugh
I am an AI Strategist and Ethicist working at the intersection of data science, product, and
engineering to build scalable machine learning systems. Listed as one of the "Top 200
Business and Technology Innovators" in the world, I am on a mission to democratize machine
learning and break the jargon for everyone to be a part of this transformation.