Data Processing For Big Data Report
Technical Design
LAB 05 - Group 02
Business Case
Design Analyses
Costing
References
Minutes of Meeting
Reflective Diaries
In this proposal we suggest a suitable framework that can manage and run the required
operations of the given Investment Banking Enterprise (IBE) at minimal cost with efficient results.
The proposal describes the advised technologies to address the three main areas of concern,
which are:
- Use streaming data to analyse and monitor active credit cards in real time to combat
possible fraud.
- Set up a suitable environment to store and analyse relevant publicly available datasets in order
to extract insight from them and assist in investment decisions.
- Upgrade the IBE's computer hardware to accommodate the capacity required to store data
and execute advanced analytics that further aid business decisions.
Business Case
The proposed solutions are constructed to achieve the defined business goals in the most
efficient form at the lowest possible cost. The aim of the project is to understand the
three points of interest carefully and design a strategic solution that addresses all the problems with
the minimum required architecture, faster integration, room for scalability, and well-built security.
The proposed project keeps the data ingestion and processing for fraud detection (i.e. real-time
action requirements) separate from the analysis of publicly available datasets (i.e. non-real-time,
static action). This allows either of the two to be scaled or altered without disturbing the other,
giving the IBE enough room to make flexible changes to either of the two separate systems in
the future.
It is well understood that the customers of the IBE are a very important part of its stakeholders.
With that in mind, the risks associated with the foundations of the project have been considered.
Once the aforementioned split system architecture (real-time action and static action) is integrated
and carried out well, the whole project will bring a beneficial change to the IBE. It will improve
fraud detection by using suitable algorithms on appropriate hardware. This system
architecture will also allow the static-action design to deploy state-of-the-art advanced
data analytics on open-source datasets, utilising them to the fullest in order to gain insight that
guides the IBE towards the correct investment decisions via predictions and inferences.
Design Analyses
- IDENTIFICATION OF DATA SOURCES
For the airport data we have multiple publicly available sources to refer to. The project
proposes using the US Bureau of Transportation Statistics' data to analyse the history and statistics
of US airports and the possible revenue models these airports follow. The data holds information
such as the number of arrivals and departures, number of scheduled flights, number of carriers,
top destination, etc., for each US airport for every month since December 2006 [1].
This static data can be used to derive the information required to make investment decisions in
the sector.
Additional data sources that may influence business decisions include the historical records of the
US census (https://www.census.gov/), which make very detailed demographic and social information
openly available. Linking these to US shapefiles could give significant insight into the geospatial
aspect.
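As an illustration, a minimal sketch of such a linkage is shown below. The file names, the GEOID join key, and the column names are assumptions for illustration, not details of the actual datasets:

```python
# Hedged sketch: joining census demographics to US shapefiles.
# Paths and column names (e.g. "GEOID", "median_income") are hypothetical.
import pandas as pd
import geopandas as gpd

# Demographic indicators exported from census.gov (hypothetical extract)
census = pd.read_csv("census_county_demographics.csv", dtype={"GEOID": str})

# County boundary shapefile (hypothetical path)
counties = gpd.read_file("us_counties.shp")

# Link the tabular demographics to geometries on a shared FIPS/GEOID key
enriched = counties.merge(census, on="GEOID", how="left")

# The enriched GeoDataFrame can now drive geospatial analysis, e.g. mapping
# income or population density around candidate airports.
enriched.plot(column="median_income", legend=True)
```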
For the purpose of this project we propose to use a Hadoop (with Spark) environment as the
underlying structure to run the required computations. As the organisation has to deal with
importing both batch and streaming data, we consider this process separately for each. As
discussed, the proposal suggests keeping the ingestion and processing of the two data types
separate by using two Hadoop clusters (HDFS-A and HDFS-B).
HDFS-A is therefore used to process and monitor the streaming credit card data. Real-time
fraud detection can generally be categorised into two steps:
The first step is to use the historical data to analyse transactions and build the machine
learning model.
The second step is to use the model in production to make predictions or decisions on live
events.
Thus, the proposed architecture checks events for fraud with Spark Streaming, using the
deployed Spark machine learning model [2].
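A minimal sketch of the first step follows, training and persisting such a model with Spark MLlib. The transaction columns, the 0/1 fraud label, and the HDFS paths are illustrative assumptions:

```python
# Hedged sketch of step one: fitting a fraud classifier on historical
# transactions with Spark MLlib. Column names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LinearSVC

spark = SparkSession.builder.appName("fraud-model-training").getOrCreate()

# Historical, labelled transactions stored on HDFS-A (hypothetical path)
history = spark.read.parquet("hdfs:///fraud/history.parquet")

# Assemble numeric transaction attributes into a single feature vector
assembler = VectorAssembler(
    inputCols=["amount", "merchant_risk", "hour_of_day"],
    outputCol="features",
)
train = assembler.transform(history)

# Linear support vector machine, matching the SVM classifier proposed below;
# "is_fraud" is assumed to be a 0/1 double label
svm = LinearSVC(featuresCol="features", labelCol="is_fraud")
model = svm.fit(train)

# Persist the fitted model so the streaming job can load it in production
model.write().overwrite().save("hdfs:///fraud/models/svm")
```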
The project also suggests the use of Kafka as a high-throughput distributed messaging system to
deliver events to the streaming layer. It is suitable because it integrates with Spark through
external modules and is fully functional for streaming large volumes of data in a fast-paced
environment.
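A minimal sketch of the second step is shown below, using Spark Structured Streaming as one way to realise this pipeline. The broker address, topic name, and event schema are illustrative assumptions:

```python
# Hedged sketch of step two: scoring live card transactions from Kafka with
# the saved model. Requires the spark-sql-kafka connector on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, DoubleType, StringType
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LinearSVCModel

spark = SparkSession.builder.appName("fraud-scoring").getOrCreate()

# Hypothetical JSON layout of a transaction event
schema = StructType([
    StructField("card_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("merchant_risk", DoubleType()),
    StructField("hour_of_day", DoubleType()),
])

# Subscribe to the transaction topic (hypothetical broker and topic names)
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "card-transactions")
       .load())

events = (raw.select(from_json(col("value").cast("string"), schema).alias("t"))
             .select("t.*"))

features = VectorAssembler(
    inputCols=["amount", "merchant_risk", "hour_of_day"],
    outputCol="features",
).transform(events)

# Load the model trained on historical data and flag suspect transactions
model = LinearSVCModel.load("hdfs:///fraud/models/svm")
alerts = model.transform(features).filter(col("prediction") == 1.0)

alerts.writeStream.format("console").start().awaitTermination()
```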
For the batch processing of the static data, HDFS-B is used to store and process the information.
For this project, HDFS-A and HDFS-B have different requirements when it comes to data storage.
As HDFS-A has to deal with high-throughput data, it requires storage that is physically compatible
with its needs. For high-speed data streaming, SSDs are therefore suggested in order to avoid
I/O bottlenecks. As the enterprise is large, it must handle millions of transactions per day, which
requires suitably fast and compatible disk storage. FlashBlades are not suggested due to their
exceptionally high price compared to SSD storage, though FlashBlades can certainly be
integrated in the future if the need arises.
On the other hand, HDFS-B is not required to have high-speed data storage or processing and
can run on ordinary disks to handle the analysis of the suggested datasets.
Logically, the clusters will work as described and take care of their assigned duties while running on
the specified hardware. The data is to be converted into Parquet format so that Spark can
use it more efficiently for modelling and processing.
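A minimal conversion sketch, assuming the BTS extract lands as CSV at a hypothetical HDFS path:

```python
# Hedged sketch: converting an ingested CSV extract to Parquet so Spark can
# scan it column-wise. The input path and options are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-conversion").getOrCreate()

# Raw BTS airport statistics landed on HDFS-B as CSV (hypothetical path)
airports = spark.read.csv("hdfs:///bts/airport_stats.csv",
                          header=True, inferSchema=True)

# Columnar, compressed Parquet gives Spark faster scans and smaller storage
airports.write.mode("overwrite").parquet("hdfs:///bts/airport_stats.parquet")
```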
Expanding into the area of processing, we first define the processing requirements, which fall
under two categories: static analytics and streaming (real-time) analytics.
Static data analytics for our business case can be thought of as classic BI, but with parallel
processing. Given the vast number of publicly available data points on which we would like to
run clustering, regression, and classification, we need parallelism integrated into the
processing environment.
Given that HDFS-A is more powerful, streaming the data into the SVM classifier will be done on this
cluster. The data for static operations will reside in HDFS-B. This cluster can indeed handle not
only storage but also the lighter computations required for the static analytics, streaming heavier
jobs to HDFS-A when needed (see Costing).
Given these requirements, the parallelism requires GraphX for graph-related computations. Spark
Streaming is also needed for the fraud detection task, and one major module needed is MLlib for
machine learning and advanced analytics. Spark also integrates with Python and R, which are
state of the art in data science methodologies and deep learning (TensorFlow).
Costing
As the organisation intends to use the data for future decisions and investments, it is safely
assumed that the data is to remain private. Considering this, and the possibility of upscaling in
the near future, it is a better option to set up the enterprise's own computational and storage
environment rather than rely on commercial rental clouds. It is also assumed that this data will be
entirely restricted within safe parameters under the control of the enterprise itself. For the
foundation of the processing framework, Hadoop with Spark is a comfortable approach, as it
can accommodate both static and streaming data and provides the ability to integrate
further technologies, both proprietary and publicly licensed software.
Hadoop's storage costs are also significantly lower than any other solution, and it provides more
failure-tolerant storage. A Hadoop cluster is constructed from racks of Intel servers, each with
internal hard disk storage. It is assumed that the enterprise already has sufficient floor space to
accommodate more hardware: about 30U (units of rack space) in one cage, or a total of 30U across
multiple cages. It is also assumed that the enterprise already uses ordinary storage for retaining
long-term data, amounting to more than 350-400 TB of disk storage. Electricity costs are
calculated from the current figures maintained by the US Department of Energy, since the
enterprise has its roots in the US and is going to operate there. As of October 2018 the cost of
energy is around $0.135/kWh, and rates are only expected to increase given current trends [4].
As the enterprise is said to be large, the services for HDFS-A are expected to run 24 hours a day,
7 days a week, while HDFS-B is expected to run for around 18 hours a day at 95% TDP (Thermal
Design Power).
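As a back-of-envelope illustration of how the energy bill scales, the sketch below assumes a hypothetical 5 kW draw per cluster; only the $0.135/kWh rate and the duty cycles above come from the proposal:

```python
# Hedged sketch of the annual electricity estimate. The per-cluster power
# draw (5 kW) is an illustrative assumption, not a specified figure.
RATE_USD_PER_KWH = 0.135

def annual_cost(draw_kw, hours_per_day, utilisation=1.0):
    """Yearly energy bill for a cluster running `hours_per_day` at the
    given fraction of its rated power draw."""
    return draw_kw * utilisation * hours_per_day * 365 * RATE_USD_PER_KWH

# HDFS-A: 24/7 streaming duty; HDFS-B: 18 h/day at 95% TDP
print(f"HDFS-A: ${annual_cost(5.0, 24):,.0f}/year")        # ~$5,900
print(f"HDFS-B: ${annual_cost(5.0, 18, 0.95):,.0f}/year")  # ~$4,200
```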
To maintain the whole architecture, a Data Scientist and a Data Engineer are proposed. Their
salaries add approximately US$132,699 and US$103,788 respectively to the costs [5].
To satisfy the connection requirements, dual 10 Gigabit Ethernet is preferred, as the enterprise
will deal with large input and output operations where each link can complete a sub-10-minute
import/duplication job on a dataset of 300 GB or more [6].
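A quick check of this requirement, under the stated figures:

```python
# Hedged back-of-envelope check that dual 10 GbE covers the stated job:
# a 300 GB import in under 10 minutes.
dataset_gb = 300
seconds = 10 * 60

required_gbit_s = dataset_gb * 8 / seconds      # 4.0 Gbit/s sustained
link_gbit_s = 2 * 10                            # dual 10 GbE, nominal

print(f"required: {required_gbit_s:.1f} Gbit/s "
      f"of {link_gbit_s} Gbit/s available")
```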
Professional support: $1,000.00.
On the other hand, the proposed clusters are expected to cost around US$70,000 for the
purchase of the necessary equipment.
Finally, when looking at these costs, it is important to note that these figures only take into
account direct usage and do not include data transfer to and from the cluster, which would
likely increase the costs somewhat.
The costs of setting up the on-site clusters are incurred only once; until any upgrade or
upscaling, only the supporting costs (such as power, networking, and technical support) will
rise.
HDFS-B, which handles the static data, is designed for low-complexity calculations; in case of
heavy computational requirements it can stream data to HDFS-A. Cluster HDFS-B is thus more
storage-centric than compute-centric, but can work either way if the need arises. To cover these
situations, HDFS-B is provisioned with up to 100 TB of storage, making it sufficient to handle all
the data.
As the streaming workload can involve processing 1000 or more datasets per minute, each of
around hundreds of KB in size, the cluster is required to be suitably fast. It is therefore safe to
assume that around 100 GB of RAM is suitable for the cluster's performance. HDFS-A will be
able to hold around 18 hours of streaming data with access to 5 TB of storage. The 8-core Xeon
CPUs are suitable to withstand high-computation tasks and are adjustable for upscaling, making
the cluster capable of sub-5-second processing.
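A rough sizing check, assuming an average record size of 200 KB within the stated "hundreds of KB" range:

```python
# Hedged sizing check for HDFS-A. The 200 KB average record size is an
# assumption within the proposal's stated range.
records_per_min = 1000
avg_record_mb = 0.2                     # 200 KB

ingest_mb_per_min = records_per_min * avg_record_mb        # 200 MB/min
window_gb = ingest_mb_per_min * 60 * 18 / 1024             # ~211 GB per 18 h

print(f"{ingest_mb_per_min:.0f} MB/min ingest, "
      f"{window_gb:.0f} GB per 18 h window")
# ~211 GB sits comfortably inside the 5 TB allocated to HDFS-A.
```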
The said model can be upscaled, upgraded, and utilised as needed and does not require any
additional cost coverage. These are fully capable clusters designed to run the specified
operations, but they can be adjusted to support other functionality.
Itemised cost figures from the costing table: $6,730.30, $6,265.92, $3,908.94, $3,908.94, $3,698.96, $7,854.44.
References
[1] The U.S. Department of Transportation's Bureau of Transportation Statistics (BTS).
Retrieved from: https://www.bts.gov/
[2] Carol McDonald (MapR), 2016. Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming.
Retrieved from: https://mapr.com/blog/real-time-credit-card-fraud-detection-apache-spark-and-event-streaming/
[4] ElectricChoice, 2018. Electricity Rates by State.
Retrieved from: https://www.electricchoice.com/electricity-prices-by-state/
[5] Indeed, 2018. Big Data Salaries in the United States.
Retrieved from: https://www.indeed.com/salaries/Big-Data-Salaries
[6] Arista, 2012. Hadoop* Clusters Built on 10 Gigabit Ethernet.
Retrieved from: https://www.arista.com/assets/data/pdf/Whitepapers/Hadoop_WP_final.pdf
[7] Microsoft Azure.
Retrieved from: https://azure.microsoft.com/en-au/pricing/calculator/?service=hdinsight
[8] AWS.
Retrieved from: https://aws.amazon.com/ec2/pricing/on-demand/
[9] Google Cloud.
Retrieved from: https://cloud.google.com/compute/pricing
[10] Alibaba Cloud.
Retrieved from: https://www.alibabacloud.com/product/e-mapreduce?spm=a3c0i.7911826.1160486.dproducth1.2d85737bARb2gy#pricing
Minutes of Meeting
Meeting No: 1
Meeting Date: 04/10/2018
Location: MPA Lounge, Building H, Monash University, Caulfield
Attending: Aayush Kapoor, Jaideep Singh Dang, Xiangtan Lin, Varun Mathur, Faik Canberk
Apologies: None
Meeting start time: 2:00 PM
Matters arising from previous minutes: YES  NO ☑
Confirmation of minutes from last meeting: YES  NO ☑
Outcome of Meeting:
ISSUE | DISCUSSION IN BRIEF | OUTCOME | ACTION: NAME & TIMELINE
Meeting Schedule | Set up a meeting schedule for the entire project | Meeting days will be Thursday and Friday, as decided | Team
Team Leader | Selecting a team leader | Jaideep Singh Dang will lead the project | Project leader will be the same throughout the assessment
Programmer | Selecting a team for Proofs of Concept | Xiangtan and Varun will explore the coding domain | Programmers will be the same throughout the assessment and will coordinate with the other members on designing proofs of concept
Actions in brief: Team leader and various roles finalised; meeting days selected when all 5 members are available. The assessment was discussed in detail to identify the areas that need more focus and time.
Next meeting date, time, and location: 11/10/2018, 2:00 PM at MPA Lounge, Building H, Monash University, Caulfield
Meeting No: 2
Meeting Date: 11/10/2018
Outcome of Meeting:
ISSUE | DISCUSSION IN BRIEF | OUTCOME | ACTION: NAME & TIMELINE
Business Case | To ensure that the business case is streamlined with the bank's strategic vision/direction | All the IBE stakeholders must get the best services/results available | In progress, to be finalised by 18th October
Actions in brief: The majority of time was spent on understanding the requirements and proposing an optimum solution for the given investment-bank situation. The PoC02 task was divided among teammates to ensure everyone participated and the deliverables are completed by the next meeting.
Next meeting date, time, and location: 18/10/2018, 2:00 PM at MPA Lounge, Building H, Monash University, Caulfield
Meeting No: 3
Meeting Date: 18/10/2018
Outcome of Meeting:
ISSUE | DISCUSSION IN BRIEF | OUTCOME | ACTION: NAME & TIMELINE
Actions in brief: Minor work on PoC01 and PoC02 is left; Xiangtan and Varun will work on the remaining part and complete it by the next meeting. Business requirements and the business case are completed.
Next meeting date, time, and location: 22/10/2018, 2:00 PM at MPA Lounge, Building H, Monash University, Caulfield
Meeting No: 4
Meeting Date: 22/10/2018
Outcome of Meeting:
ISSUE | DISCUSSION IN BRIEF | OUTCOME | ACTION: NAME & TIMELINE
Cost Management | Jaideep and Ezra validate all the researched cost options available for the bank | Proposed components shown in a tabular format along with the prices | In progress, to be completed by 24th October
Actions in brief: The majority of time was spent completing cost management; the whole team worked on content creation and on researching the different viable options available in the current market that would best suit our client (the investment bank) in the given scenario. The team decided to meet again on 24th October 2018 to finalise the content and make final changes to the format of the report.
Next meeting date, time, and location: 24/10/2018, 2:00 PM at MPA Lounge, Building H, Monash University, Caulfield
Meeting No: 5
Meeting Date: 24/10/2018
Outcome of Meeting:
ISSUE | DISCUSSION IN BRIEF | OUTCOME | ACTION: NAME & TIMELINE
Cost Management | Finalised tabular format and optimum cost options shown | Added to the main report | Completed
Actions in brief: Finalised the optimum cost option available in the market and formatted the report for delivery.
Next meeting date, time, and location: Only electronic communication if any further changes are required to the report or content.
Reflective Diaries
Diary One
• What you feel that you have learnt from this group work?
o The first thing I learnt was managing a team and bringing everyone together on
the final decisions. It also helped me learn task allocation and working as a team.
Every member thinks in a different, out-of-the-box manner, and a leader is
responsible for getting the most out of these thoughts and turning them into
something dependable.
o Secondly, I learnt a lot about hardware when I began researching the possible
component options for the proposed hardware framework. The project not
only helped me solidify my understanding of the basic concepts taught in our
lectures and lab assessments but also taught me something outside the
curriculum: how an on-site storage and processing framework is generally built
and maintained.
• Your overall conclusion about the project. How would you do it, if asked to
do it again?
o In my view, every team member gave 100% to carry this project forward
and complete it. If the project were to be done again, I would not change anything,
as imperfections become perfections when we work as a team, and this is
what happened. Helping each other and coming to a common conclusion that all
can agree on makes any good project, and we made sure to use every
member's input to decide on our solution. I feel it had a perfect blend of business
case and programming, building a wholesome learning experience.
Diary Two
• What you feel that you have learnt from this group work?
o Getting a chance to apply the concepts learned in class and labs (especially labs) to
the project really made me understand what these concepts do in practice. It is one
thing to understand and memorise concepts, but this work, especially in a group
setting and the discussion it invoked, cemented what these things are and how they
harmonise with each other to solve potential real-world problems.
• Your overall conclusion about the project. How would you do it, if asked to
do it again?
o I think it was very useful and a pivotal part of the unit in terms of connecting
theory and application. My only feedback would be that it could have been spread
between weeks 6 and 12; this felt much more concentrated.
Diary Three
• What you feel that you have learnt from this group work?
o I feel it is important to work in a group when time and resources are
limited. Good teamwork is the key to succeeding in design/programming
exercises. Many different ideas were generated when everyone sat together, and
this really helped and motivated me to contribute well to the project.
o Solving the tutorial exercises over the different weeks has really helped me
develop my programming skills. Every design has its own pros and cons, and
working together as a group can really help to discover them.
o All the group members were able to challenge each other's opinions and
preconceptions about what can and cannot work for the project.
• Your overall conclusion about the project. How would you do it, if asked to
do it again?
o Our team put in a good effort to come up with the deliverables as the
project progressed.
o Personally, I feel that if the project had been given a bit earlier, more research
could have been done to come up with better solutions and designs. Other than
that, our team gave 100% for the success of this project.
Diary Four
• Full name (as it appears in Moodle)
o Aayush Kapoor
• What you feel that you have learnt from this group work?
o The most important thing I found was the benefit of working in a team. I
learned that if you are aiming for success, teamwork is the key when time and
resources are restricted. As everyone had their own view, different ideas could be
exchanged, and we were able to work on those ideas. Working in a team made me
more involved in the assessment, as every team member was very helpful.
o I applied what I learned in labs and lectures while working on the assessment; it
went beyond the classroom experience to see how much data engineers have to
work and research when planning to implement big data technologies for a client.
• Your overall conclusion about the project. How would you do it, if asked to
do it again?
o I believe we as a team put a lot of effort into producing the deliverables, and we
took care of all the small factors to get the best output. I explored the role of
coordinator in a deeper sense, i.e. laying out the information ahead of time to
ensure the team performed well.
o I think we really did great as a team. If I could do it again, I would be more than
happy to invest time and energy in the project, and I would explore the technical
domain more, rather than the business domain.
Diary Five