6CS030 Big Data 2019/0 Portfolio - Part 1: Worksheet Three - 5% Hand-Out: Week 9. Demo: Week 10 Workshop
1. This worksheet uses three CSV exports generated from the Employment Rate &
Qualifications Profile of Adults spreadsheet seen in Worksheet One.
They have been cleaned to remove non-numeric values from any fields that contain
figures. There is no header row, so the first record is already data.
The files are available on hpd-srv.wlv.ac.uk in the /home/6cs030/Worksheet3
directory.
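Because the exports have no header row, a reader must treat every line as a data record. The plain Java sketch below shows this. It assumes comma-separated fields with no quoted commas inside a field. The class name HeaderlessCsv and the sample rows are hypothetical and are not taken from the actual files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class HeaderlessCsv {

    // Splits one record on commas. Assumes no field contains a quoted comma.
    // The -1 limit keeps trailing empty fields rather than dropping them.
    public static String[] splitRecord(String line) {
        return line.split(",", -1);
    }

    // Reads every non-empty line as a data record. There is no header row,
    // so the first line is NOT skipped.
    public static List<String[]> readAll(BufferedReader reader) {
        List<String[]> records = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    records.add(splitRecord(line));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows for illustration only.
        String sample = "Area A,70.5\nArea B,64.0\n";
        List<String[]> rows = readAll(new BufferedReader(new StringReader(sample)));
        System.out.println(rows.size() + " records; first area = " + rows.get(0)[0]);
    }
}
```

To read one of the real exports, the StringReader could be replaced with java.nio.file.Files.newBufferedReader on the file's path in /home/6cs030/Worksheet3. In a Hadoop mapper, the same rule applies to each input line.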
An updated version of Population.java from Hadoop Workbook 2 (Week 9) is also
available in the /home/6cs030/Worksheet3 directory. It has been amended to check
whether the figures found are integers or floats.
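The amended Population.java is not reproduced here. As a minimal sketch of the kind of check it describes, the hypothetical helpers below test whether a field parses as a whole number (Long.parseLong) or as a floating-point value (Double.parseDouble). This is an assumption about the approach, not the code in the file.

```java
public class NumericCheck {

    // True if the field parses as a whole number, e.g. "1200".
    public static boolean isInteger(String field) {
        if (field == null) {
            return false;
        }
        try {
            Long.parseLong(field.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the field parses as a decimal number, e.g. "68.4".
    // Whole numbers such as "1200" also pass this check.
    public static boolean isFloat(String field) {
        if (field == null) {
            return false;
        }
        try {
            double value = Double.parseDouble(field.trim());
            // parseDouble also accepts "NaN" and "Infinity"; reject them as figures.
            return !Double.isNaN(value) && !Double.isInfinite(value);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical field values: two figures and two placeholders.
        String[] fields = {"1200", "68.4", "-", "n/a"};
        for (String f : fields) {
            System.out.println(f + " integer=" + isInteger(f) + " float=" + isFloat(f));
        }
    }
}
```

The integer check runs first in this design, so a value such as "1200" can be summed as a long. Anything that only passes isFloat is handled as a double.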
You need to analyse only one of the CSV datasets.
First, take your student number and divide it by 3. Use the remainder (modulus) to
pick one of the following datasets:
Remainder value   CSV dataset to use    Java class name
0                 Employment_Rate       EmpRate
1                 Degree-Level_Quals    DegreeQuals
2                 No_Quals              NoQuals
For example, if your student number is 1712345, then 1712345 mod 3 = 2
(1712345 = 3 × 570781 + 2), so you would use the No_Quals dataset and the NoQuals
class. See the Remainder spreadsheet if you are not sure how to do this.
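The remainder rule can also be checked in Java. The small sketch below uses Math.floorMod, which gives the same result as the % operator for positive student numbers. The class DatasetPicker and its methods are hypothetical. The arrays mirror the table above, indexed by remainder.

```java
public class DatasetPicker {

    // Index = remainder after dividing the student number by 3.
    private static final String[] DATASETS = {"Employment_Rate", "Degree-Level_Quals", "No_Quals"};
    private static final String[] CLASS_NAMES = {"EmpRate", "DegreeQuals", "NoQuals"};

    // floorMod keeps the result in the range 0..2.
    public static int remainder(long studentNumber) {
        return (int) Math.floorMod(studentNumber, 3L);
    }

    public static String datasetFor(long studentNumber) {
        return DATASETS[remainder(studentNumber)];
    }

    public static String classNameFor(long studentNumber) {
        return CLASS_NAMES[remainder(studentNumber)];
    }

    public static void main(String[] args) {
        long studentNumber = 1712345L; // the example number from the worksheet
        System.out.println(remainder(studentNumber));    // 2
        System.out.println(datasetFor(studentNumber));   // No_Quals
        System.out.println(classNameFor(studentNumber)); // NoQuals
    }
}
```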
2. Examine your dataset and carry out the following tasks:
Task No.   Task
Note: this is an individual assessment. Any group answers will be classed as plagiarism.
For this exercise you can use either the Mongo Shell or a Python Notebook to carry out the
commands.
Upload
Demonstration
During the demonstration you will be asked to show what you have done for one of the above
tasks.