Data Analytics Sys

Faculty of Commerce OU

Paper DSE 603(c): DATA ANALYTICS


Hours Per Week: 7 (3T+4P) Credits: 5
Exam Hours: 1 ½ Marks: 50U+35P+15I

Objective: To learn the different approaches to data analysis, data stream mining, clustering, and
visualization.

UNIT-I: INTRODUCTION TO BIG DATA:


Introduction to Big Data Platform – Challenges of conventional systems – Web data – Evolution of
analytic scalability, analytic processes and tools, Analysis vs reporting – Modern data analytic tools,
Statistical concepts: Sampling distributions, resampling, statistical inference, prediction error.

UNIT-II: DATA ANALYSIS:


Regression modeling, Multivariate analysis, Bayesian modeling, inference and Bayesian networks,
Support vector and kernel methods, Analysis of time series: linear systems analysis, nonlinear dynamics –
Rule induction – Neural networks: learning and generalization, competitive learning, principal component
analysis and neural networks; Fuzzy logic: extracting fuzzy models from data, fuzzy decision trees,
Stochastic search methods.

UNIT-III: MINING DATA STREAMS:


Introduction to Streams Concepts – Stream data model and architecture – Stream Computing, Sampling
data in a stream – Filtering streams – Counting distinct elements in a stream – Estimating moments –
Counting ones in a window – Decaying window – Real-time Analytics Platform (RTAP) applications –
case studies – real-time sentiment analysis, stock market predictions.

UNIT-IV: FREQUENT ITEMSETS AND CLUSTERING:


Mining frequent itemsets – Market basket model – Apriori algorithm – Handling large data sets in main
memory – Limited-pass algorithms – Counting frequent itemsets in a stream – Clustering techniques –
Hierarchical – K-Means – Clustering high-dimensional data – CLIQUE and PROCLUS – Frequent
pattern-based clustering methods – Clustering in non-Euclidean space – Clustering for streams and
parallelism.

UNIT-V: FRAMEWORKS AND VISUALIZATION:


MapReduce – Hadoop, Hive, MapR – Sharding – NoSQL databases – S3 – Hadoop Distributed File
System – Visualizations – Visual data analysis techniques, interaction techniques; Systems and
applications:

SUGGESTED READINGS:

1) Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
2) Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University
Press, 2012.
3) Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with
Advanced Analytics, John Wiley & Sons, 2012.
4) Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007; Pete Warden, Big Data Glossary,
O'Reilly, 2011.
5) Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Elsevier,
Reprinted 2008.
