113 Ce 74
Hours/week 3 4 0 7
5
Marks 100 100 0 200
Pre-requisite courses:
● Linux Operating System
● Database Management System
Outline of the Course:
Sr. No. Title of the unit Minimum number of hours
1. Big Data and Analytics 02
2. Data Collection, Sampling and Preprocessing 06
3. Predictive Analytics, Descriptive Analytics, Survival Analysis 08
4. Introduction to Hadoop and Hadoop Architecture 08
5. HDFS, HIVE and HIVEQL, HBASE 08
6. Apache Spark and MongoDB 08
7. Big Data Applications and Visualization 05
Total hours (Theory) : 45
Total hours (Lab) : 30
Total hours : 75
Detailed Syllabus:
1. Big Data and Analytics 02 Hours 4%
Introduction to Big Data, Big Data Characteristics, Types of Big Data,
Traditional Versus Big Data Approach, Technologies Available for Big
Data, Infrastructure for Big Data, Use of Data Analytics, Big
Data Challenges.
2. Data Collection, Sampling and Preprocessing 06 Hours 13%
Types of Data Sources, Sampling, Types of Data Elements, Visual
Data Exploration and Exploratory Statistical Analysis, Missing
Values, Outlier Detection and Treatment, Standardizing Data,
Categorization, Weights of Evidence Coding, Variable Selection,
Segmentation
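As an illustration of Weights of Evidence coding, the Python sketch below computes one WoE value per category. It uses WoE = ln(share of all "good" cases in the category / share of all "bad" cases in the category). It assumes a binary target where 0 means "good" and 1 means "bad"; some texts flip the ratio, which only flips the sign. The function name `weights_of_evidence` is hypothetical, and no smoothing is applied.

```python
import math
from collections import Counter


def weights_of_evidence(categories, targets):
    """Return {category: WoE} with WoE = ln(dist_good / dist_bad).

    Assumptions (illustrative only): target 0 = "good", 1 = "bad";
    every category contains at least one good and one bad case,
    because no smoothing is applied (otherwise log/division fails).
    """
    good = Counter(c for c, y in zip(categories, targets) if y == 0)
    bad = Counter(c for c, y in zip(categories, targets) if y == 1)
    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe = {}
    for cat in sorted(set(categories)):
        dist_good = good[cat] / total_good  # share of all goods in this category
        dist_bad = bad[cat] / total_bad     # share of all bads in this category
        woe[cat] = math.log(dist_good / dist_bad)
    return woe


if __name__ == "__main__":
    cats = ["A", "A", "A", "B", "B", "B"]
    ys = [0, 0, 1, 0, 1, 1]
    print(weights_of_evidence(cats, ys))
```

A positive WoE means the category holds relatively more "good" cases than "bad" ones. The coded value can then replace the raw category as a numeric input to, for example, a logistic regression.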
3. Predictive Analytics, Descriptive Analytics & Survival Analysis 08 Hours 18%
Predictive Analytics: Target Definition, Linear Regression, Logistic
Regression, Decision Trees, Neural Networks, Support Vector
Machines, Ensemble Methods, Multiclass Classification Techniques,
Evaluating Predictive Models
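For "Evaluating Predictive Models", a minimal Python sketch of confusion-matrix metrics for a binary classifier is shown below. It assumes labels are 0/1 with 1 as the positive class. Reporting 0.0 when precision or recall is undefined is a convention chosen here, not a universal rule. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Convention (assumption): report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```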
Descriptive Analytics: Association Rules, Sequence Rules,
Segmentation
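For association rules, the three standard measures of a rule X => Y can be sketched in Python as follows:

- support = P(X and Y)
- confidence = P(Y | X)
- lift = confidence / P(Y)

The transaction data and the function name `rule_metrics` are hypothetical. The sketch assumes the antecedent occurs in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))         # X present
    count_c = sum(1 for t in transactions if c <= set(t))         # Y present
    count_ac = sum(1 for t in transactions if (a | c) <= set(t))  # X and Y present

    support = count_ac / n
    confidence = count_ac / count_a  # assumes X occurs at least once
    lift = confidence / (count_c / n)
    return support, confidence, lift


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

A lift below 1, as in this toy data, suggests the two item sets co-occur less often than they would if they were independent.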
Survival Analysis: Survival Analysis Measurements, Kaplan-Meier
Analysis, Parametric Survival Analysis, Proportional Hazards
Regression, Extensions of Survival Analysis Models, Evaluating
Survival Analysis Models
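Kaplan-Meier analysis estimates the survival function as S(t) = product over event times t_i <= t of (1 - d_i / n_i). Here d_i is the number of events at t_i, and n_i is the number of subjects still at risk just before t_i. The Python sketch below assumes right-censored data, with events[i] = 1 for an observed event and 0 for censoring. Subjects censored at an event time are counted as at risk at that time, which is the usual convention. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects (events and censored) sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored both leave the risk set
    return curve


if __name__ == "__main__":
    print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```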
4. Introduction to Hadoop and Hadoop Architecture 08 Hours 18%
Big Data: Apache Hadoop and the Hadoop Ecosystem, Moving
Data into and out of Hadoop: Understanding Inputs and Outputs of
MapReduce, Data Serialization
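To make the inputs and outputs of MapReduce concrete, the sketch below runs the classic word-count dataflow in plain Python: map emits (key, value) pairs, shuffle groups the values by key, and reduce aggregates each group. It is an in-memory simulation of the programming model only. It does not use Hadoop's Java API or Hadoop Streaming, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every whitespace-separated word.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle/sort: group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: aggregate the grouped values for one key.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
    # {'big': 2, 'data': 2, 'hadoop': 1}
```

In a real Hadoop job, each phase runs distributed across the cluster. The framework performs the shuffle between mapper and reducer tasks, and the input and output formats determine how the (key, value) pairs are read and written.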
5. HDFS, HIVE and HIVEQL, HBASE 08 Hours 18%
HDFS Overview, Installation and Shell, Java API; Hive Architecture
and Installation, Comparison with Traditional Databases, HiveQL:
Querying Data, Sorting and Aggregating, MapReduce Scripts, Joins
and Subqueries; HBase Concepts, Advanced Usage, Schema Design,
Advanced Indexing; Pig; ZooKeeper: How It Helps in Monitoring a
Cluster, How HBase Uses ZooKeeper, and How to Build Applications
with ZooKeeper
6. Apache Spark and MongoDB 08 Hours 18%
Introduction to Data Analysis with Spark, Downloading Spark and
Getting Started, Programming with RDDs, Spark SQL, Spark
Streaming.
Introduction to MongoDB: Key Features, Core Server Tools, MongoDB
through the JavaScript Shell, Creating and Querying through
Indexes, Document-Oriented Data, Principles of Schema Design,
Constructing Queries on Databases, Collections and Documents,
MongoDB Query Language
7. Graph Analytics and Data Visualization 05 Hours 11%
Apache Spark GraphX: Property Graph, Graph Operators, Subgraphs,
Triplets; Neo4j: Modeling Data with Neo4j, Cypher Query Language:
General Clauses, Read and Write Clauses.
Big Data Visualization with D3.js, Kibana and Grafana
CO4 Gain an adequate perspective on big data analytics in various applications such as
recommender systems and social media applications.
CO5 Evaluate and apply appropriate principles, techniques and theories to large-scale
data science problems using various databases with analytics and visualizations.
Employability