113 Ce 74

CE441: BIG DATA ANALYTICS
Credits and Hours:

Teaching Scheme   Theory   Practical   Tutorial   Total   Credit
Hours/week        3        4           0          7       5
Marks             100      100         0          200
Pre-requisite courses:
● Linux Operating System
● Database Management System
Outline of the Course:
Sr. No.   Title of the unit                                           Minimum number of hours
1.        Big Data and Analytics                                      02
2.        Data Collection, Sampling and Preprocessing                 06
3.        Predictive Analytics, Descriptive Analytics,                08
          Survival Analysis
4.        Introduction to Hadoop and Hadoop Architecture              08
5.        HDFS, HIVE and HIVEQL, HBASE                                08
6.        Apache Spark and MongoDB                                    08
7.        Big Data Applications and Visualization                     05
Total hours (Theory) : 45
Total hours (Lab) : 30
Total hours : 75
Detailed Syllabus:
1. Big Data and Analytics 02 Hours 4%
Introduction to Big Data, Big Data Characteristics, Types of Big Data,
Traditional Versus Big Data Approach, Technologies Available for Big
Data, Infrastructure for Big Data, Use of Data Analytics, Big
Data Challenges.
2. Data Collection, Sampling and Preprocessing 06 Hours 13%
Types of Data Sources, Sampling, Types of Data Elements, Visual
Data Exploration and Exploratory Statistical Analysis, Missing
Values, Outlier Detection and Treatment, Standardizing Data,
Categorization, Weights of Evidence Coding, Variable Selection,
Segmentation
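
Two of the preprocessing topics above, Standardizing Data and Weights of Evidence Coding, can be illustrated with a short sketch. It is not taken from the course material. It assumes z-score standardization with the population standard deviation, and the convention WoE = ln(share of goods / share of bads) per category. The function names, the `age_band` values and the good/bad labels are hypothetical.

```python
import math
import statistics
from collections import Counter


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [(v - mean) / std for v in values]


def weights_of_evidence(categories, targets):
    """Return {category: WoE}, with WoE = ln(dist_good / dist_bad).

    targets uses 1 for "good" and 0 for "bad" (an illustrative labelling).
    """
    goods = Counter(c for c, t in zip(categories, targets) if t == 1)
    bads = Counter(c for c, t in zip(categories, targets) if t == 0)
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for cat in sorted(set(categories)):
        dist_good = goods[cat] / total_good
        dist_bad = bads[cat] / total_bad
        # Undefined if a category has no goods or no bads; real pipelines
        # usually merge or smooth such categories first.
        woe[cat] = math.log(dist_good / dist_bad)
    return woe


# Hypothetical toy data.
z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
age_band = ["young", "young", "young", "old", "old", "old", "old", "old"]
is_good = [1, 0, 0, 1, 1, 1, 0, 0]
woe = weights_of_evidence(age_band, is_good)
```

Under this convention a negative WoE marks a category with relatively more bads than goods, so `woe["young"]` is negative and `woe["old"]` is positive for the toy data.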
3. Predictive Analytics, Descriptive Analytics & Survival Analysis 08 Hours 18%
Predictive Analytics: Target Definition, Linear Regression, Logistic
Regression, Decision Trees, Neural Networks, Support Vector
Machines, Ensemble Methods, Multiclass Classification Techniques,
Evaluating Predictive Models
Descriptive Analytics: Association Rules, Sequence Rules,
Segmentation
Survival Analysis: Survival Analysis Measurements, Kaplan-Meier
Analysis, Parametric Survival Analysis, Proportional Hazards
Regression, Extensions of Survival Analysis Models, Evaluating
Survival Analysis Models
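
As an illustration of the Kaplan-Meier topic listed above, the following is a minimal sketch of the product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk. The function name and the toy durations are hypothetical. Subjects censored at an event time are assumed to be still at risk at that time.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    observed[i] is 1 if the event occurred at durations[i], 0 if censored.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather every subject whose duration equals t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: five subjects, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For this toy data the estimate steps down at times 2, 3 and 5. The censored observation at time 8 adds no step.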
4. Introduction to Hadoop and Hadoop Architecture 08 Hours 18%
Big Data: Apache Hadoop & Hadoop Ecosystem, Moving
Data in and out of Hadoop, Understanding Inputs and Outputs of
MapReduce, Data Serialization
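
The inputs and outputs of MapReduce mentioned above can be sketched with a single-process word count. This is plain Python, not the Hadoop API. The phase names `map_phase`, `shuffle` and `reduce_phase` are hypothetical labels for the three stages: the map step emits (key, value) pairs, the shuffle step groups values by key, and the reduce step aggregates each group.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single output pair."""
    return (key, sum(values))


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical input records, one per line.
counts = word_count(["big data big ideas", "data in data out"])
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network. The sketch keeps only the data flow.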
5. HDFS, HIVE AND HIVEQL, HBASE 08 Hours 18%
HDFS: Overview, Installation and Shell, Java API; Hive Architecture
and Installation, Comparison with Traditional Database, HiveQL:
Querying Data, Sorting and Aggregating, MapReduce Scripts, Joins
& Subqueries; HBase Concepts, Advanced Usage, Schema Design,
Advanced Indexing; Pig; ZooKeeper: how it helps in monitoring a
cluster, how HBase uses ZooKeeper, and how to build applications
with ZooKeeper
6. Apache Spark, MongoDB and Neo4j 08 Hours 18%
Introduction to Data Analysis with Spark, Downloading Spark and
Getting Started, Programming with RDD, Spark SQL, Spark
Streaming.
Introduction to MongoDB: Key Features, Core Server Tools, MongoDB
through the JavaScript Shell, Creating and Querying through
Indexes, Document-Oriented Principles of Schema Design,
Constructing Queries on Databases, Collections and Documents,
MongoDB Query Language
7. Graph Analytics and Data Visualization 05 Hours 11%
Apache Spark GraphX: Property Graph, Graph Operators, Subgraph,
Triplets.
Neo4j: Modeling Data with Neo4j, Cypher Query Language:
General Clauses, Read and Write Clauses.
Big Data Visualization with D3.js, Kibana and Grafana
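
The property graph, triplet and subgraph ideas listed in this unit can be illustrated with a plain-Python analogue. This is not the GraphX API. The vertex names, edge labels and both helper functions are hypothetical. A triplet joins each edge with the attributes of its source and destination vertices. A subgraph keeps only the vertices and edges that satisfy given predicates.

```python
# Property graph: vertices carry attributes, directed edges carry labels.
vertices = {1: "alice", 2: "bob", 3: "carol"}
edges = [(1, 2, "follows"), (2, 3, "follows"), (3, 1, "blocks")]


def triplets(vertices, edges):
    """Join each edge with its source and destination vertex attributes."""
    return [(vertices[src], label, vertices[dst]) for src, dst, label in edges]


def subgraph(vertices, edges, vertex_pred=lambda vid, attr: True,
             edge_pred=lambda edge: True):
    """Keep vertices passing vertex_pred, and edges passing edge_pred
    whose two endpoints were both kept."""
    kept_vertices = {vid: attr for vid, attr in vertices.items()
                     if vertex_pred(vid, attr)}
    kept_edges = [e for e in edges
                  if e[0] in kept_vertices and e[1] in kept_vertices
                  and edge_pred(e)]
    return kept_vertices, kept_edges


all_triplets = triplets(vertices, edges)
follow_vertices, follow_edges = subgraph(
    vertices, edges, edge_pred=lambda e: e[2] == "follows")
```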
Course Outcomes (COs):

At the end of the course, the students will be able to:
CO1 Understand the key issues in big data management and its associated
applications in intelligent business and scientific computing.
CO2 Acquire fundamental enabling techniques and scalable algorithms like Hadoop,
MapReduce and NoSQL in big data analytics.
CO3 Interpret business models and scientific computing paradigms and apply
software tools for big data analytics. Skill development
CO4 Achieve adequate perspectives of big data analytics in various applications like
recommender systems and social media applications.
CO5 Evaluate and apply appropriate principles, techniques and theories to large-scale
data science problems using various databases with
analytics and visualizations. Employability
Course Articulation Matrix:

      PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2
CO1   2    2    1    -    -    -    -    -    -    -     -     -     1     1
CO2   1    2    3    1    3    -    -    -    -    -     -     -     2     -
CO3   -    1    3    3    3    -    -    -    -    -     -     -     2     -
CO4   1    3    3    3    1    -    -    -    -    -     -     -     1     1
CO5   1    2    1    2    3    -    -    -    -    -     -     -     2     1
Enter correlation levels 1, 2 or 3 as defined below:

1: Slight (Low) 2: Moderate (Medium) 3: Substantial (High)
If there is no correlation, put “-”
Recommended Study Material:
❖ Text book:
1. Bart Baesens, Analytics in a Big Data World: The Essential
Guide to Data Science and its Applications, Wiley, 2014.
❖ Reference book:
1. Dirk deRoos et al., Hadoop for Dummies, Dreamtech Press, 2014.
2. Chuck Lam, Hadoop in Action, December, 2010.
3. Leskovec, Rajaraman, Ullman, Mining of Massive
Datasets, Cambridge University Press.
4. I.H. Witten and E. Frank, Data Mining: Practical Machine
Learning Tools and Techniques.
❖ Web material:
1. https://cognitiveclass.ai/
2. https://codelabs.developers.google.com/
❖ Software & Platform:
1. R & SPSS
2. Hadoop, HBase, Hive, Pig, Spark
3. Cassandra, Neo4j, NoSQL
