
DataVisualizationUsingPython

This document provides a guide on using Python and Matplotlib for data visualization, specifically with a dataset of student test scores. It covers creating various plots including bar charts, histograms, pie charts, and Pareto charts, using a DataFrame created from a dictionary. Each visualization is explained with code examples and descriptions of the data being represented.

Uploaded by

dancer.shore

ECON 2123

DR. RAHMAN

Using Python and Matplotlib for Data Visualization
This document explains how to use Python and Matplotlib to create several kinds of plots from a
dictionary converted to a pandas DataFrame. The examples use the following dataset of 30 students:

studentsnumbers = {
    'PreTestScore': [88, 82, 84, 93, 75, 78, 84, 87, 95, 91, 83, 89, 77, 68, 91, 99, 56, 54, 78, 89, 87, 55,
                     75, 95, 75, 66, 76, 85, 95, 77],
    'PostTestScore': [91, 84, 88, 91, 79, 80, 88, 90, 90, 96, 88, 89, 81, 74, 92, 89, 76, 54, 78, 79, 97, 65,
                      95, 85, 65, 86, 56, 95, 85, 87],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Female',
               'Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female',
               'Female', 'Female', 'Male', 'Male', 'Male', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Year': ['Freshman', 'Senior', 'Freshman', 'Sophomore', 'Senior', 'Sophomore', 'Senior', 'Freshman',
             'Sophomore', 'Sophomore', 'Sophomore', 'Freshman', 'Freshman', 'Freshman', 'Freshman', 'Junior',
             'Sophomore', 'Sophomore', 'Sophomore', 'Freshman', 'Senior', 'Senior', 'Sophomore', 'Freshman',
             'Freshman', 'Junior', 'Sophomore', 'Freshman', 'Senior', 'Sophomore']
}

First, let’s create a DataFrame using the pandas library:


# import pandas to your current Python environment
import pandas as pd
# convert the dictionary to a pandas dataframe
df = pd.DataFrame(studentsnumbers)
# print the table
print(df)
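
Before plotting, it can help to check that the DataFrame has the expected shape and column types. The sketch below is an illustration, not part of the original handout. To stay short it uses only the first five rows of the dataset, stored under the hypothetical names sample and sample_df.

```python
import pandas as pd

# First five rows of the dataset above (a shortened sample, for illustration only)
sample = {
    'PreTestScore': [88, 82, 84, 93, 75],
    'PostTestScore': [91, 84, 88, 91, 79],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'Year': ['Freshman', 'Senior', 'Freshman', 'Sophomore', 'Senior'],
}
sample_df = pd.DataFrame(sample)

# (number of rows, number of columns) of the table
print(sample_df.shape)
# data type of each column: integers for the scores, text for the categories
print(sample_df.dtypes)
# summary statistics (count, mean, min, max, ...) for the numerical columns
print(sample_df.describe())
```

The same three calls work on the full df; only the row count changes.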

Bar Chart
A bar chart represents categorical data with rectangular bars, where the length of each bar
corresponds to the count (frequency) or percentage of a category.
While any categorical variable works for this visualization, let’s use the Gender column for
demonstration. We first need the count of each gender before creating the bar chart:
# import the pyplot module from matplotlib under the name plt
import matplotlib.pyplot as plt

# use value_counts() to count how many times each gender appears in the column
gender_counts = df['Gender'].value_counts()
# create a figure and set its size to 8 by 8 inches (change these numbers to resize it)
plt.figure(figsize=(8, 8))
# draw a bar plot with the gender categories on the X axis and their counts on the Y axis
plt.bar(gender_counts.index, gender_counts.values)
# set the plot title with plt.title(); any appropriate title can go inside the quotes
plt.title("Gender Distribution")
# set the X axis label with plt.xlabel("")
plt.xlabel("Gender")
# similarly, set the Y axis label with plt.ylabel("")
plt.ylabel("Count")
# display the figure with plt.show()
plt.show()
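
The definition above mentions that bars can also show percentages. One possible way to do that, sketched here as an assumption rather than as the handout's method, is to pass normalize=True to value_counts(). The example uses only the first ten Gender values from the dataset, and the names gender and gender_pct are made up for this illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First ten Gender values from the dataset (a shortened sample, for illustration only)
gender = pd.Series(['Male', 'Female', 'Male', 'Male', 'Female',
                    'Female', 'Female', 'Female', 'Female', 'Female'])

# normalize=True returns proportions instead of counts; multiply by 100 for percentages
gender_pct = gender.value_counts(normalize=True) * 100

plt.figure(figsize=(8, 8))
# same bar plot as before, but the bar heights are now percentages
plt.bar(gender_pct.index, gender_pct.values)
plt.title("Gender Distribution (Percent)")
plt.xlabel("Gender")
plt.ylabel("Percent of students")
plt.show()
```

With the full DataFrame, replacing gender with df['Gender'] should give the percentage version of the chart above.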

Histogram
A histogram visualizes the frequency distribution of a numerical variable by dividing its values
into intervals (bins) and displaying the count in each bin as adjacent bars.

While we can use any numerical variable for this, let’s visualize the distribution of PostTest
scores using a histogram:

# Creates a histogram of the "PostTestScore" column with 10 bins and blue bars at 70% opacity (alpha=0.7).
plt.hist(df.PostTestScore, bins=10, color='blue', alpha=0.7)
# Sets the title of the histogram to "Histogram of PostTest Scores."
plt.title("Histogram of PostTest Scores")
# Labels the x-axis as "Scores," representing the intervals of PostTest scores.
plt.xlabel("Scores")
# Labels the y-axis as "Frequency," indicating how many scores fall into each bin.
plt.ylabel("Frequency")
# Adds a grid to the plot for easier interpretation of the histogram.
plt.grid()
# Displays the histogram.
plt.show()
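
To see what bins=10 does to the data, the bin edges and counts can be computed directly with NumPy. This is a minimal sketch, not part of the original handout: numpy.histogram splits the range from the smallest to the largest score into 10 equal-width intervals, which should match the bins that plt.hist draws. The variable names post, counts and edges are chosen for this example.

```python
import numpy as np

# PostTestScore values from the dataset above
post = [91, 84, 88, 91, 79, 80, 88, 90, 90, 96, 88, 89, 81, 74, 92, 89, 76, 54, 78, 79, 97, 65,
        95, 85, 65, 86, 56, 95, 85, 87]

# counts: number of scores in each bin; edges: the 11 boundaries of the 10 bins
counts, edges = np.histogram(post, bins=10)

print(edges)   # runs from the minimum score to the maximum score in equal steps
print(counts)  # one count per bin; the counts add up to the number of students
```

Each bar in the histogram corresponds to one entry of counts, drawn between two neighboring entries of edges.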

Pie Chart
A pie chart visualizes proportions of a whole using slices of a circle, where each slice represents
a category's percentage contribution to the total.

For example, to visualize the distribution of students’ years, let’s create a pie chart:
# first, count the number of students in each year
# use value_counts() as in the previous example and save the result as year_counts
year_counts = df.Year.value_counts()
# create a figure and set its size to 8 by 8 inches (change these numbers to resize it)
plt.figure(figsize=(8, 8))
# plt.pie() takes the values to plot first: year_counts.values holds the count for each year
# labels=year_counts.index: uses the year names as the slice labels
# autopct='%1.1f%%': displays the percentage of each slice with one decimal place
# startangle=140: rotates the chart so the first slice starts at 140 degrees, which can improve readability
plt.pie(year_counts.values, labels=year_counts.index, autopct='%1.1f%%', startangle=140)
# write a title like before inside plt.title(“”)
plt.title("Year Distribution")
# Now, visualize your plot using plt.show()
plt.show()
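
The percentages that autopct prints are each slice's share of the total. As a rough check, they can be computed by hand; the sketch below does this for the first ten Year values of the dataset only, with year and year_pct as names invented for this example.

```python
import pandas as pd

# First ten Year values from the dataset (a shortened sample, for illustration only)
year = pd.Series(['Freshman', 'Senior', 'Freshman', 'Sophomore', 'Senior',
                  'Sophomore', 'Senior', 'Freshman', 'Sophomore', 'Sophomore'])

# count of students in each year
year_counts = year.value_counts()
# each count divided by the total, expressed as a percentage
year_pct = year_counts / year_counts.sum() * 100

# rounded to one decimal place, like autopct='%1.1f%%'
print(year_pct.round(1))
```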

Pareto Chart
A Pareto chart combines bars and a line to display individual values in descending order
alongside their cumulative total, highlighting the most significant factors in a dataset.
Let’s create a Pareto chart of the average PreTest score (Y axis) by students’ year (X axis). For
that, we need to do the following:

# Group the PreTestScore by Year and calculate the average score for each Year
average_scores_by_year = df.groupby('Year')['PreTestScore'].mean().sort_values(ascending=False)
# Calculate the cumulative percentage of the average scores
cum_percentage = average_scores_by_year.cumsum() / average_scores_by_year.sum() * 100
# Creates a new figure with a specified size (10x6 inches)
plt.figure(figsize=(10, 6))
# Plots the average scores by Year as a bar chart with an appropriate label
plt.bar(average_scores_by_year.index, average_scores_by_year, label="Average PreTest Scores")
# Adds a line plot showing the cumulative percentage with red color and circle markers
plt.plot(average_scores_by_year.index, cum_percentage, color="r", marker="o", label="Cumulative Percentage")
# Sets the title of the chart
plt.title("Pareto Chart of Average PreTest Scores by Year")
# Labels the x-axis as "Year"
plt.xlabel("Year")
# Labels the y-axis as "Score / Cumulative Percentage"
plt.ylabel("Score / Cumulative Percentage")
# Displays the legend for the bar and line plots
plt.legend()
# Adds a grid to the chart for better readability
plt.grid()
# Displays the plot
plt.show()
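
Pareto charts are more often drawn from counts than from averages, with the cumulative percentage on its own right-hand axis so that the two scales do not mix. The following is one possible variant under those assumptions, not the handout's own method: it counts students per year and uses Matplotlib's twinx() to add the second axis. The names year, ax_count and ax_pct are chosen for this sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Year values from the dataset above
year = pd.Series(['Freshman', 'Senior', 'Freshman', 'Sophomore', 'Senior', 'Sophomore', 'Senior', 'Freshman',
                  'Sophomore', 'Sophomore', 'Sophomore', 'Freshman', 'Freshman', 'Freshman', 'Freshman', 'Junior',
                  'Sophomore', 'Sophomore', 'Sophomore', 'Freshman', 'Senior', 'Senior', 'Sophomore', 'Freshman',
                  'Freshman', 'Junior', 'Sophomore', 'Freshman', 'Senior', 'Sophomore'])

# value_counts() returns the counts sorted from largest to smallest by default
year_counts = year.value_counts()
# running total of the counts as a percentage of all students
cum_pct = year_counts.cumsum() / year_counts.sum() * 100

fig, ax_count = plt.subplots(figsize=(10, 6))
# bars: number of students per year on the left axis
ax_count.bar(year_counts.index, year_counts.values)
ax_count.set_xlabel("Year")
ax_count.set_ylabel("Number of students")
ax_count.set_title("Pareto Chart of Students by Year")

# second Y axis that shares the same X axis, used for the cumulative percentage line
ax_pct = ax_count.twinx()
ax_pct.plot(year_counts.index, cum_pct.values, color="r", marker="o")
ax_pct.set_ylabel("Cumulative percentage")
ax_pct.set_ylim(0, 110)

plt.show()
```

The line always ends at 100%, and the leftmost bars show which years account for most of the students.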

Additional Details
• df.groupby('Year'): Groups the DataFrame rows by the unique values in the "Year"
column (e.g., "Freshman", "Sophomore", etc.). This creates groups where each "Year" is
treated as a category.
• ['PreTestScore']: Selects the "PreTestScore" column from each group, so operations can
be performed on this specific column (writing .PreTestScore instead is equivalent).
• .mean(): Calculates the mean (average) of the "PreTestScore" values for each group
(each year).
• .sort_values(ascending=False): Sorts the resulting averages in descending order
(highest to lowest) for easier visualization or analysis.
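
The four steps listed above can be run one at a time to see what each one returns. This is a small illustration using only the first six rows of the dataset; the intermediate names grouped, means and sorted_means are introduced here and do not appear in the original code.

```python
import pandas as pd

# First six rows of the PreTestScore and Year columns (a shortened sample, for illustration only)
sample_df = pd.DataFrame({
    'PreTestScore': [88, 82, 84, 93, 75, 78],
    'Year': ['Freshman', 'Senior', 'Freshman', 'Sophomore', 'Senior', 'Sophomore'],
})

# Step 1: group the rows by the values in the Year column
grouped = sample_df.groupby('Year')
# Steps 2 and 3: select the PreTestScore column and average it within each group
means = grouped['PreTestScore'].mean()
# Step 4: sort the averages from highest to lowest
sorted_means = means.sort_values(ascending=False)

print(sorted_means)
# Freshman     86.0
# Sophomore    85.5
# Senior       78.5
```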
