Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
7 views

Python Case Study

Study please python case study so that you could learn about pandas

Uploaded by

sophiekumar78
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

Python Case Study

Study please python case study so that you could learn about pandas

Uploaded by

sophiekumar78
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 7

ROHINI COLLEGE OF ENGINEERING AND TECHNOLOGY

(AUTONOMOUS)
Accredited by NAAC with A+ Grade
DEPARTMENT OF COMPUTER APPLICATIONS
CASE STUDY
Course Code & Name: 24CA106 Python Programming

Highest Mapping of
Sl.No Assignment Question CO
Cognitive level PO/PSO

Case Study: Utilize Pandas library in


PO1, PO2, PO3,
1. python to explore and analyze the data K4 CO4
in a dataset PO4, PO8

Submitted By:
Name : MENEESH M
Class : I-MCA
Register Number :
24MCA010 Assignment Date :
14/10/2024 Submission Date :
14-10-2024

EVALUATION SHEET

Marks
Sl .No Criteria Max Marks
Obtained
The research work and content about the
1. 10
knowledge of the topic
The manner of presentation of the study and
2. 6
the writing style.
3. Prompt Submission 4

TOTAL

Signature of the course coordinator


Student Academic Performance Analysis Using Pandas
Objective
The goal of this case study is to analyze a dataset related to student academic performance and extract
meaningful insights using the Python Pandas library. Through this case study, we will demonstrate
how Pandas can be utilized to clean, analyze, and visualize student performance data.
Dataset
For this case study, assume we are working with a dataset that includes information about students'
performance in various subjects along with demographic and socio-economic factors. The dataset
contains the following columns:
1. Student_ID: Unique identifier for each student
2. Gender: Male or Female
3. Age: Age of the student
4. Parental_Education_Level: Parent's education level
5. Study_Hours: Average study hours per day
6. Attendance: Attendance percentage
7. Math_Score: Final exam score in Math
8. Reading_Score: Final exam score in Reading
9. Writing_Score: Final exam score in Writing
1. Student_ID: Unique identifier for each student
 Type: Integer
 Description: A unique number assigned to each student for identification. This column
ensures that every student has a distinct record in the dataset. It plays a crucial role in data
integrity and is not used for analysis.
 Example: 101, 102, 103
2. Gender: Gender of the student
 Type: Categorical (Male, Female)
 Description: Indicates the gender of each student. This data allows us to analyze
performance differences based on gender and explore whether there are any trends or biases
in academic outcomes.
 Example: Male, Female
Relevance: Gender-based analysis can help understand if one gender consistently outperforms the
other, which can lead to insights about societal, cultural, or educational influences.
3. Age: Age of the student
 Type: Integer
 Description: The age of the student, typically in years. This variable can be useful to see
if older or younger students have different performance trends, such as academic maturity
or learning development.
 Example: 15, 16, 17
Relevance: Age might correlate with performance, as older students might have more academic
experience, or younger students may show better adaptation in certain subjects.
4. Parental_Education_Level: Parent's education level
 Type: Categorical (e.g., High School, Bachelor's, Master's, PhD)
 Description: Represents the highest level of education achieved by a student's parent(s). This
column is key for socio-economic analysis, as parental education is often linked to a
student’s access to academic resources and support.
 Example: High School, Bachelor's, Master's
Relevance: Parental education can be a significant factor in student academic success. Students from
highly educated families might have more academic support at home, potentially leading to better
performance.
5. Study_Hours: Average study hours per day
 Type: Float (hours per day)
 Description: Indicates the average amount of time (in hours) that the student dedicates to
studying each day outside of school. This column helps in analyzing whether the amount
of study time directly impacts performance in various subjects.
 Example: 2.5, 3.0, 4.5
Relevance: A higher number of study hours might correlate with better scores, but there could be
diminishing returns or negative effects like burnout if the study time is excessive.
6. Attendance: Attendance percentage
 Type: Float (percentage)
 Description: Percentage of classes attended by the student throughout the academic
year. Regular attendance is often linked with better academic outcomes as students are
more engaged with the course content.
 Example: 85.0, 92.5, 98.0
Relevance: Attendance is a key factor in academic performance. Students with high attendance are
more likely to succeed, while those with low attendance might struggle to keep up with the curriculum.
7. Math_Score: Final exam score in Math
 Type: Float (score between 0 and 100)
 Description: This column represents the student's score in their final Math exam,
typically ranging from 0 to 100. This serves as an indicator of their performance in
mathematical subjects.
 Example: 75.0, 88.5, 95.0
Relevance: Math is often seen as a core subject, and the performance in this area can reflect the
student’s analytical and problem-solving skills. Analyzing trends in math scores can reveal patterns
based on gender, study hours, or other socio-economic factors.
8. Reading_Score: Final exam score in Reading
 Type: Float (score between 0 and 100)
 Description: This column contains the final exam score of students in reading
comprehension or related language skills. It is a measure of their understanding and ability to
interpret written content.
 Example: 80.0, 90.0, 85.0
Relevance: Reading skills are fundamental to overall academic success. This data helps analyze
whether factors like parental education, gender, or study habits affect a student’s reading abilities.
9. Writing_Score: Final exam score in Writing
 Type: Float (score between 0 and 100)
 Description: This column reflects the student’s performance in their writing exam,
measuring their ability to express ideas and arguments clearly and effectively in writing.
 Example: 78.0, 87.5, 91.0
Relevance: Like reading, writing skills are crucial for communication and success in many academic
areas. Analyzing writing scores can help explore trends related to literacy and overall academic
performance.

Steps to Perform the Analysis


1. Importing Required Libraries
python
Copy code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
2. Loading the Dataset
python
Copy code
# Assuming the dataset is a CSV file
df = pd.read_csv("student_performance.csv")
3. Understanding the Data
python
# Preview the first few rows of the dataset
df.head()
# Get an overview of the dataset
df.info()
# Summary statistics
df.describe()
4. Data Cleaning
Before analysis, it's essential to clean the dataset. Let's handle missing data, correct data types, and
remove outliers.
python
# Checking for missing values
df.isnull().sum()
# Drop rows with missing data or fill them with appropriate values
df['Study_Hours'].fillna(df['Study_Hours'].mean(), inplace=True)
# Handle categorical data for analysis
df['Gender'] = df['Gender'].astype('category')
df['Parental_Education_Level'] = df['Parental_Education_Level'].astype('category')
5. Data Exploration
5.1. Gender Distribution of Students
python
df['Gender'].value_counts().plot(kind='bar', title="Gender Distribution")
plt.show()
5.2. Analyzing the Average Scores in Each Subject
python
# Mean scores in Math, Reading, and Writing
df[['Math_Score', 'Reading_Score', 'Writing_Score']].mean().plot(kind='bar', title="Average Scores by
Subject")
plt.show()
5.3. Correlation Between Study Hours and Scores
python
# Checking correlation between study hours and scores
df[['Study_Hours', 'Math_Score', 'Reading_Score', 'Writing_Score']].corr()
# Scatter plot of Study Hours vs Math Score
sns.scatterplot(data=df, x='Study_Hours', y='Math_Score')
plt.title('Study Hours vs Math Score')
plt.show()
5.4. Impact of Parental Education on Student Performance
python
# Grouping by Parental Education Level and analyzing average scores
parental_education_scores = df.groupby('Parental_Education_Level')[['Math_Score', 'Reading_Score',
'Writing_Score']].mean()
parental_education_scores.plot(kind='bar', title="Average Scores by Parental Education Level")
plt.show()
6. Attendance vs Academic Performance
python
Copy code
# Scatter plot of Attendance vs Math Score
sns.scatterplot(data=df, x='Attendance', y='Math_Score')
plt.title('Attendance vs Math Score')
plt.show()
7. Finding Insights
7.1. Gender and Academic Performance
We can check whether gender has any significant influence on academic performance by comparing
the average scores for male and female students.
python
# Grouping by Gender and calculating average scores
gender_scores = df.groupby('Gender')[['Math_Score', 'Reading_Score', 'Writing_Score']].mean()
gender_scores.plot(kind='bar', title="Average Scores by Gender")
plt.show()
7.2. Performance Based on Study Hours
Students with higher study hours may perform better, so it is important to analyze this correlation.
python
# Categorizing students based on study hours
df['Study_Category'] = pd.cut(df['Study_Hours'], bins=[0, 2, 4, 6, 8, 10], labels=["<2 hours", "2-4
hours", "4-6 hours", "6-8 hours", "8-10 hours"])

# Grouping by study categories and checking performance


study_hours_scores = df.groupby('Study_Category')[['Math_Score', 'Reading_Score',
'Writing_Score']].mean()
study_hours_scores.plot(kind='bar', title="Average Scores by Study Hours")
plt.show()
8. Conclusion
From this analysis using Pandas, we can derive several insights:
1. Gender and Performance: There may be slight differences in scores based on gender,
but they are not significant enough to generalize.
2. Study Hours and Scores: Students who study more tend to score higher in all subjects.
3. Parental Education Impact: Students whose parents have higher education levels
generally perform better, indicating a socio-economic influence.
4. Attendance and Performance: Higher attendance is positively correlated with better
performance, emphasizing the importance of regular attendance.
This case study highlights how Pandas can be an effective tool for data analysis in educational
contexts, providing meaningful insights into factors that affect student performance.

You might also like