Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
78 views

CC7182 - Programming For Data Analytics

This document outlines a coursework assignment for a Programming for Data Analytics module. The coursework is worth 100% of the total module grade and is due on May 12th, 2023. It consists of two parts involving data analysis and visualization of (1) a student performance dataset and (2) livestock data from Nepal. Students are expected to write Python code to clean, analyze, and visualize the data, and submit a technical report and code files. Plagiarism is strictly prohibited and penalties will be enforced.

Uploaded by

Kai Enezhu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
78 views

CC7182 - Programming For Data Analytics

This document outlines a coursework assignment for a Programming for Data Analytics module. The coursework is worth 100% of the total module grade and is due on May 12th, 2023. It consists of two parts involving data analysis and visualization of (1) a student performance dataset and (2) livestock data from Nepal. Students are expected to write Python code to clean, analyze, and visualize the data, and submit a technical report and code files. Plagiarism is strictly prohibited and penalties will be enforced.

Uploaded by

Kai Enezhu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

1st SIT COURSEWORK QUESTION PAPER: Spring Semester 2023

Module Code: CC7182NI

Module Title: Programming for Data Analytics

Module Leader: Sukrit Shakya (Islington College)

Coursework Type: Individual

Coursework Weight: This coursework accounts for 100% of your total module
grades.

Submission Date: Week 12 (12th May, 2023)

When Coursework is Week 8


given out:

Submission Submit the following to Islington College RTE department


 Report in PDF format
 Source code
Instructions:

Warning: London Metropolitan University and Islington College


takes Plagiarism seriously. Offenders will be dealt with
sternly.

© London Metropolitan University


CC7182NI Programming for Data Analytics
Spring Semester 2023

Coursework Assignment

The coursework is an individual assessment weighted 100% of the marks for


the module. It is primarily an exercise in applying programming knowledge and
skills to data analysis tasks, demonstrating your skills for problem-solving and
critical thinking/evaluation.
The assignment consists of 2 parts. The first part involves the analysis of a
student performance dataset from two Portuguese schools. The second part
involves the analysis of livestock data of Nepal.
You are expected to write Python code and a technical report on data
understanding, preparation, exploration, analysis and visualization.

Coursework Submission

The course work is due on Week 12 (12th May, 2023). You need to submit the
following items via Google Classroom:
 Technical report as a single PDF file
 Associated Python files as a ZIP file

Please note that plagiarism is a serious academic offence, for which


penalties are severe. All suspected cases of plagiarism will be reported.
Part 1. Analysis of a Student Performance Dataset

Data Description

The data given here is of student achievements in secondary education of two


Portuguese schools namely “Gabriel Pereira” and “Mousinho da Silveira”. The
data attributes include student grades (in mathematics), demographic, social and
school related features and it was collected by using school reports and
questionnaires.

The detailed data description of the data file is given below:

Filename: student.csv

The data set contains 395 student records. Each record consists of 33 variables,
which includes information about the students.
Variable 33, G3 – final grade (numeric: 0 - 20), is the target variable.

The dataset is in CSV format with following attributes:


1. school - student's school ( "GP" - Gabriel Pereira or "MS" - Mousinho da

Silveira)

2. sex - student's sex

3. age - student's age

4. address - student's home address type ("U" - urban or "R" - rural)

5. famsize - family size ("LE3" - less or equal to 3 or "GT3" - greater than 3)

6. Pstatus - parent's cohabitation status ("T" - living together or "A" - apart)

7. Medu - mother's education


8. Fedu - father's education

9. Mjob - mother's job ("teacher", "health" care related, civil "services" (e.g.

administrative or police), "at_home" or "other")

10. Fjob - father's job ("teacher", "health" care related, civil "services" (e.g.

administrative or police), "at_home" or "other")

11. reason - reason to choose this school (close to "home", school

"reputation", "course" preference or "other")

12. guardian - student's guardian ("mother", "father" or "other")

13. traveltime - home to school travel time

14. studytime - weekly study time

15. failures - number of past class failures

16. schoolsup - extra educational support

17. famsup - family educational support

18. paid - extra paid classes within the course

19. activities - extra-curricular activities

20. nursery - attended nursery school

21. higher - wants to take higher education

22. internet - Internet access at home

23. romantic - with a romantic relationship

24. famrel - quality of family relationships


25. freetime - free time after school

26. goout - going out with friends

27. Dalc - workday alcohol consumption

28. Walc - weekend alcohol consumption

29. health - current health status

30. absences - number of school absences

31. G1 - first period grade (from 0 to 20)

32. G2 - second period grade (from 0 to 20)

33. G3 - final grade (from 0 to 20, target variable)

The tasks are as follows:

1. Data Understanding

 Understand what your data resources are and what the characteristics
of those resources are. Write down your findings including the
characteristics of the different columns in the dataset.
(4 marks)
2. Data Transformation

 Write code to transform variables according to the following guidelines:


a) school, sex, address, famsize, Pstatus, schoolsup, famsup,
paid, activities, nursery, higher, internet and romantic into
binary; 0 or 1 (create new columns without overwriting the existing
ones)
(1+1+1+1+1+1+1+1+1+1+1+1+1 = 13 marks)
b) Medu, Fedu, Mjob, Fjob, reason, guardian, traveltime,
studytime, famrel, freetime, gout, Dalc, Walc and health into
ordinal numbers based on number of cases in the data set (create
new columns without overwriting the existing ones).
(1+1+1+1+1+1+1+1+1+1+1+1+1+1 = 14 marks)
c) create a new column named age_category whose values should
be based on the values in the age column, divide the values into 3
ordinal numbers; 1 – 15 to 17, 2 – 18 to 20, 3 – 21 and over
(1 mark)
d) create a new column named passed (yes or no) whose values
should be based on the values present in the G3 column (>=8 –
yes, <8 – no)
(1 mark)
3. Initial Data Analysis

 Write code to show the summary statistics (sum, mean, median,


standard deviation, max and min) of the variables age, absences, G1,
G2 and G3.
(1+1+1+1+1 = 5 marks)
 Write code to calculate and show the correlation between the variables
absences, failures, G1, G2 and G3. Present the result using a
heatmap and interpret the results.
(2 marks)
4. Data Exploration and Visualization

 Write code to show histogram plots and boxplots to visualize the


distribution of the variables age, absences and G3. Interpret the
results and comment about the distribution of each variable.
(2 + 2 + 2 = 6 marks)
 Write code to show a bar graph of the total number of students who
passed the final term grouped according to the school that they belong
to. Use proper labels in the graph and interpret the results.
(2 marks)
 Write code to show a bar graph of the total number of students who
failed the final term grouped according to their weekly study time. Use
proper labels in the graph and interpret the results.
(2 marks)
5. Further Analysis

 For this task, you need to further explore the given dataset on your
own by using different analysis and visualization techniques and then
present the insights that you have gained. Be creative and come up
with interesting insights and draw conclusions from the data.
(20 marks)
Part 2. Analysis of Livestock Data of Nepal

Data Description

The data given here is about livestock raised across Nepal according to different
districts/regions and the commodities produced by them. The overall data is
spread across multiple files. The list of files are as follows:
1. horseasses-population-in-nepal-by-district.csv
2. milk-animals-and-milk-production-in-nepal-by-district.csv
3. net-meat-production-in-nepal-by-district.csv
4. production-of-cotton-in-nepal-by-district.csv
5. production-of-egg-in-nepal-by-district.csv
6. rabbit-population-in-nepal-by-district.csv
7. wool-production-in-nepal-by-district.csv
8. yak-nak-chauri-population-in-nepal-by-district.csv

You are required to study the data and understand its structure and properties.
Then using python you should clean and merge all the data sources and perform
EDA (Exploratory Data Analysis) on it.

The tasks are as follows:

1. Data Understanding
 Understand what your data resources are and what the characteristics
of those resources are. Write down your findings including the
characteristics of the different columns in the dataset.
(5 marks)

2. Data Merging and Cleaning


 Read all the data sources and merge them into a single data source.
Do necessary data cleaning and pre-processing.
(5 marks)
3. Exploratory Data Analysis
 Explore the data using different analysis and visualization techniques
and then present the insights that you have gained. Be creative and
come up with interesting insights
(20 marks)

The technical report should have screen shots of the code. The results achieved
and the interpretations of the results should also be included in the technical
report. Python code should include adequate comments as well.

For task 5 in part 1 and task 3 in part 2, 20 marks are allocated according to the
following categories

 creativity (5 marks)
 quality of analysis performed/interpretations of the results/presentation
of insights (10 marks)
 programming style/use of tools (5 marks)

END

You might also like