6CS030 Big Data Coursework - Part 1 Worksheet One - 5% Hand-Out: Week 2. Demo: Week 3 Workshop
Coursework – Part 1
Worksheet One – 5%
Hand-out: Week 2. Demo: Week 3 Workshop
Group: [C3G1]
Remainder value: 1
b) Convert the worksheet so that it is in a suitable format for a relational table and
import into Oracle.
Choose appropriate names for the table and column names.
Document what changes you have made.
Oracle expects the first row to contain the column headings. Because this file
contained some unnecessary lines, these lines were removed.
After removing the unnecessary data and the column total, the bottom line now looks
like this:
Now that the data is in an appropriate format, a pivot table is created using the
Pivot Table Wizard.
After this, a new worksheet is presented. The row and column buttons are then
unselected, which reduces the data to just one value:
The table and column names are changed before importing the data into Oracle:
Row: Local_Authority
Column: Q_Year
Value.
Since this data has been put into a new worksheet, known as Sheet, the worksheet is
renamed to Qualification Sheet.
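The reshaping described above (one row for each combination of Local_Authority and
Q_Year, with a single value) can also be sketched in Python. This is only an
illustration of the idea, not the spreadsheet steps that were actually used. The
sample rows, their values and the `unpivot` helper are hypothetical.

```python
# Hypothetical wide-format rows: one row per local authority, one column per year.
# The names and numbers below are made up purely for illustration.
wide_rows = [
    {"Local_Authority": "Authority A", "2009": "10", "2010": "12"},
    {"Local_Authority": "Authority B", "2009": "7", "2010": "9"},
]


def unpivot(rows, id_column="Local_Authority"):
    """Turn every (authority, year) cell into its own record."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is copied into each record instead
            long_rows.append(
                {"Local_Authority": row[id_column], "Q_Year": column, "Value": value}
            )
    return long_rows


long_rows = unpivot(wide_rows)
for record in long_rows:
    print(record)
```

Each output record has the three columns Local_Authority, Q_Year and Value, which is
the layout a relational table needs.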
Missing Values
The data in the spreadsheet has some missing values. Before editing the information,
a copy of the worksheet was created.
Data with some missing values was filled in using AVERAGE and MEDIAN, whereas the
data sheet that contained no values at all was removed completely.
After the process was completed, the sheet was checked for any remaining null
values:
To visualise the data, a graph was created:
The chart above shows that the data fluctuates and that data is missing for the
years 2010 and 2011, so the solution is to replace the hash values with the average.
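The replacement of missing values with the average (or the median) can be sketched
in Python as follows. This is a minimal illustration only, assuming a missing cell
is marked with "#". The `fill_missing` helper and the sample numbers are
hypothetical and are not taken from the worksheet.

```python
from statistics import mean, median

MISSING = "#"  # assumed marker for a missing cell


def fill_missing(values, strategy=mean):
    """Replace missing cells with the mean (or median) of the known cells."""
    known = [float(v) for v in values if v != MISSING]
    if not known:
        return None  # nothing to estimate from, so such data would be removed
    fill = strategy(known)
    return [fill if v == MISSING else float(v) for v in values]


series = ["10", "#", "12", "20"]  # made-up values for illustration
filled_mean = fill_missing(series)
filled_median = fill_missing(series, strategy=median)
empty = fill_missing(["#", "#"])

print(filled_mean)
print(filled_median)
print(empty)
```

The mean and the median can give different replacement values for the same data, so
the choice of strategy affects the cleaned result.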
Once all the data was cleaned, it was imported into Oracle.
An OLAP system helps a great deal in organizing huge amounts of information into
a convenient form for the end user. It suits all users, from small and
medium-sized groups to large corporate groups.
e) Name one disadvantage to using this approach. For future reference include a
brief explanation of why you think this is a disadvantage
Despite the advantages of this tool, online analytical processing, like every
other technology, has disadvantages. The major disadvantage of this system is
its limitations.
Potential Risk
Because of its limited computation and low interactive analysis ability, the OLAP
tool carries a large potential risk. The implementation also relies heavily on IT.
The poor computational ability of the system can result in failure to deliver
large amounts of data and may sometimes make decision making difficult. Being
unable to provide a reference or solve complex problems may result in great
loss. This potential risk may lead to the failure of the OLAP project. It also
means that providing a valuable link to the decision maker can be difficult,
although this depends on the system type and the OLAP software.
The process of data cleaning helps to avoid dangerous problems, but data
cleaning itself can sometimes be more dangerous. There is no obvious
pattern to the missing data, so changing this data is not a simple matter.
Performing statistical analysis on this data is also difficult, since the
numerical data is loaded as VARCHAR when it is imported into Oracle.
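The problem with numbers stored as text can be illustrated with a short Python
sketch. It assumes that the values arrive as strings, as they would from a VARCHAR
column, and that some of them are not numbers at all. The `to_number` helper and
the sample values are hypothetical.

```python
def to_number(text):
    """Return a float for numeric text, or None when the text is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


raw_values = ["12.5", " 7 ", "#", ""]  # made-up text values for illustration
numbers = [to_number(v) for v in raw_values]
usable = [n for n in numbers if n is not None]

print(numbers)
print(sum(usable) / len(usable))
```

Only the values that convert successfully can be used in a calculation such as an
average, so every text value has to be checked before any statistics are produced.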