Unit 1 Assignment

This document is a data mining assignment with questions across five units, covering star schema design, OLAP operations, data warehousing, data preprocessing, association rules, classification algorithms, clustering, and privacy issues in data mining. Answering the questions requires drawing schemas, applying algorithms, and discussing technical concepts in detail.

Uploaded by Vishnu Karthik

U15IT601 Data Mining Assignment - 1

Unit 1

1. Illustrate the star-net query model for an electronics data warehouse with four dimensions
(location, customer, item, and time), showing suitable footprints for each dimension.
2. How many cuboids are generated for ‘n’ dimensions with ‘l’ levels each?
3. Suppose that a data warehouse for Big University consists of the following four dimensions:
student, course, semester, and instructor, and two measures, count and avg grade. At the
lowest conceptual level (e.g., for a given student, course, semester, and instructor
combination), the avg grade measure stores the actual course grade of the student. At higher
conceptual levels, avg grade stores the average grade for the given combination.
(a) Draw a snowflake schema diagram for the data warehouse.
(b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP
operations (e.g., roll-up from semester to year) should one perform in order to list the
average grade of CS courses for each Big University student?
(c) If each dimension has five levels (including all), such as “student < major < status <
university < all”, how many cuboids will this cube contain (including the base and apex
cuboids)?
4. Suppose that a data warehouse consists of the three dimensions time, doctor, and
patient, and the two measures count and charge, where charge is the fee that a doctor
charges a patient for a visit.
(a) Enumerate three classes of schemas that are popularly used for modeling data
warehouses.
(b) Draw a schema diagram for the above data warehouse using one of the schema
classes listed in (a).
(c) Starting with the base cuboid {day, doctor, patient}, what specific OLAP
operations should be performed in order to list the total fee collected by each
doctor in 2015?
5. Suppose that the following table is derived by attribute-oriented induction (AOI):

Class        Birth_Place   Count
Programmer   Canada        215
             Others        120
DBA          Canada         50
             Others         80

(a) Transform the table into a crosstab showing the associated t-weights and d-weights.
(b) Map the class Programmer into a bidirectional quantitative descriptive rule.
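A short Python sketch of the arithmetic behind questions 2, 3(c) and 5, offered as a study aid rather than a model answer. It assumes the commonly used formula in which a cube has the product of (L_i + 1) cuboids, where L_i is the number of levels of dimension i excluding the virtual top level all; under that reading question 2 gives (l + 1)^n, while if l already counts all the result is l^n. The helper names total_cuboids, t_weights and d_weights are hypothetical. The t-weight is computed row-wise (a cell's share of its class total) and the d-weight column-wise (a class's share of the birth-place total).

```python
from math import prod


def total_cuboids(levels_per_dim):
    """Number of cuboids: product of (L_i + 1), where L_i counts the levels of
    dimension i excluding the virtual top level 'all' (the + 1 adds it back)."""
    return prod(levels + 1 for levels in levels_per_dim)


def t_weights(counts):
    """t-weight: a cell's count as a share of its class (row) total."""
    return {cls: {place: n / sum(row.values()) for place, n in row.items()}
            for cls, row in counts.items()}


def d_weights(counts):
    """d-weight: a class's count as a share of the birth-place (column) total."""
    places = list(next(iter(counts.values())))
    col_totals = {p: sum(row[p] for row in counts.values()) for p in places}
    return {cls: {p: row[p] / col_totals[p] for p in places}
            for cls, row in counts.items()}


# Question 3(c): four dimensions with five levels each *including* 'all', so L_i = 4.
print(total_cuboids([4, 4, 4, 4]))  # 625

# Question 5: counts from the AOI table (class -> birth place -> count).
counts = {"Programmer": {"Canada": 215, "Others": 120},
          "DBA": {"Canada": 50, "Others": 80}}
t, d = t_weights(counts), d_weights(counts)
for cls, row in counts.items():
    for place, n in row.items():
        print(f"{cls:<10} {place:<6} count={n:>3}  "
              f"t-weight={t[cls][place]:.2%}  d-weight={d[cls][place]:.2%}")
```

With the question 5 counts, the Programmer/Canada cell gets a t-weight of 215/335 ≈ 64.18% and a d-weight of 215/265 ≈ 81.13%.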

Unit 2

7 Marks

1. Data quality can be assessed in terms of several issues, including accuracy, completeness, and
consistency. For each of these issues, discuss, with suitable examples, how data quality assessment
can depend on the intended use of the data. Suggest other dimensions of data quality.
2. Analyze the kinds of patterns that can be mined from different types of data.
3. The contingency table below records the preferred reading of 1500 people along with their
gender. Apply the chi-square test to determine whether gender and preferred reading are
correlated.
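The contingency table referred to in question 3 is not reproduced in this document, so the sketch below is a generic Pearson chi-square computation in Python, exercised on a small hypothetical 2 x 2 table rather than the assignment's data. Each expected count is (row total x column total) / N, and the statistic sums (observed - expected)^2 / expected over all cells. For a 2 x 2 table (1 degree of freedom), a statistic above 10.828 rejects independence at the 0.001 significance level.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts, given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # e_ij = row_i * col_j / N
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts for illustration only (not the assignment's table).
stat, dof = chi_square([[10, 20], [20, 10]])
print(round(stat, 4), dof)  # 6.6667 1
```

Substituting the real gender by reading counts and comparing the statistic with the critical value for its degrees of freedom answers the question.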

Unit 3

2 Marks

1. Suppose that the data for analysis include the attribute age. The age values for the
data tuples are (in increasing order)
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25.
i. Smooth the data by binning with bin means.
ii. Smooth the data by min-max binning (smoothing by bin boundaries).
2. Why is tree pruning useful in decision tree induction? What is a drawback of using a
separate set of tuples to evaluate pruning?
3. Why are decision tree classifiers so popular?
4. Distinguish between lazy learners and eager learners.
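The question does not fix the bin depth, so this Python sketch assumes equal-frequency (equal-depth) bins of 4 values each, which splits the 12 ages into three bins; pass a different depth to try other partitions. Smoothing by boundaries replaces each value with the closer of its bin's minimum or maximum; the tie rule used here (ties go to the minimum) is an assumption, since the question does not specify one. The function names are illustrative.

```python
def equal_depth_bins(values, depth):
    """Partition the sorted values into consecutive bins of `depth` values."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(values, depth):
    """Replace every value with the mean of its bin."""
    return [sum(b) / len(b) for b in equal_depth_bins(values, depth) for _ in b]


def smooth_by_boundaries(values, depth):
    """Replace every value with the nearer of its bin's min or max (ties -> min)."""
    out = []
    for b in equal_depth_bins(values, depth):
        lo, hi = b[0], b[-1]
        out.extend(lo if v - lo <= hi - v else hi for v in b)
    return out


ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25]
print(equal_depth_bins(ages, 4))
print(smooth_by_means(ages, 4))
print(smooth_by_boundaries(ages, 4))
```

With depth 4 the bins are [13, 15, 16, 16], [19, 20, 20, 21] and [22, 22, 25, 25], so smoothing by means gives 15, 20 and 23.5 for the three bins.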
7 Marks

1. Predict the salary of a person with 10 years of experience by applying linear regression, using
the least squares method on the given salary data.
2. Apply naïve Bayes classification to classify a Red Domestic SUV, using the given car
theft training data set.
3. Compare the advantages and disadvantages of eager classification versus lazy
classification.
4. What problem can association rules lead to? Show with an example how it can be
tackled by correlation analysis.
5. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas
of naïve Bayesian classification.
6. Use these methods to normalize the following group of data:
200, 300, 400, 600, 1000
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
7. A database has five transactions. Let min_sup = 60% and min_conf = 80%.

Find all frequent itemsets using Apriori and FP-growth, respectively.

8. Construct the decision tree using the ID3 algorithm for the given data.
9. The transactional data for AllElectronics is shown in the table above. The data contain the
frequent itemset X = {l1, l2, l5}. What are the association rules that can be generated from X?
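A Python sketch for questions 6 and 9. The z-score uses the population standard deviation (statistics.pstdev); the sample standard deviation would give slightly different values, so treat that choice as an assumption. For question 9, the hypothetical helper candidate_rules only enumerates the rules s => (X - s) for every non-empty proper subset s of X; their confidences, support_count(X) / support_count(s), need the transaction table, which is not reproduced here, and only rules meeting min_conf would be kept.

```python
from itertools import combinations
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """(v - mean) / sigma, using the population standard deviation (an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def candidate_rules(itemset):
    """Yield (antecedent, consequent) for every non-empty proper subset of itemset."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


data = [200, 300, 400, 600, 1000]
print(min_max(data))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(z, 2) for z in z_score(data)])

for lhs, rhs in candidate_rules({"l1", "l2", "l5"}):
    print("{" + ", ".join(lhs) + "} => {" + ", ".join(rhs) + "}")
```

For the data 200, 300, 400, 600, 1000 the mean is 500 and the population standard deviation is about 282.84, so the z-scores come out to roughly -1.06, -0.71, -0.35, 0.35 and 1.77. The itemset {l1, l2, l5} yields six candidate rules.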

Unit 4

2 Marks

1. Differentiate between the agglomerative and divisive hierarchical clustering methods.
2. Clustering has been popularly recognized as an important data mining task with
broad applications. Give one application example that takes clustering as a major
data mining function.

7 Marks
1. Given the following measurements for the variable age:
18, 22, 25, 42, 28, 43, 33, 35, 56, 28. Standardize the variable as follows:
(a) Compute the mean absolute deviation of age.
(b) Compute the z-score for the first four measurements.
2. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using p = 3.
3. Suppose that the data mining task is to cluster the following eight points into three
clusters: A1(3,9), A2(2,5), A3(7,5), B1(8,9), B2(6,5), B3(6,5), C1(1,2), C2(5,10). The
distance function is Euclidean distance. The initial cluster centers are A1, B1, and
C1. Use the k-means algorithm to show only the three cluster centers after the first
iteration of the algorithm.
4. The data mining task is to cluster the following eight points into three clusters:
A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9).
The distance function is Euclidean distance. The initial cluster centers are A1, B1,
and C1. Use the k-means algorithm to show only the three cluster centers after the first
iteration of the algorithm.
5. Both k-means and k-medoids algorithms can perform effective clustering. Illustrate
the strengths and weaknesses of k-means in comparison with the k-medoids algorithm.
Also, illustrate the strengths and weaknesses of these schemes in comparison with a
hierarchical clustering scheme.
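A Python sketch covering questions 1 to 4. The standardization follows the variant that divides by the mean absolute deviation instead of the standard deviation, which is how question 1 is framed; the helper names are hypothetical, and math.dist (Python 3.8+) supplies the Euclidean distance. The k-means helper performs exactly one assign-then-update step and assumes no cluster becomes empty, which holds here because each initial center is itself one of the points.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def mean_abs_deviation(xs):
    """Mean absolute deviation: average of |x - mean|."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def standardize(xs):
    """z = (x - mean) / s, where s is the mean absolute deviation."""
    m, s = sum(xs) / len(xs), mean_abs_deviation(xs)
    return [(x - m) / s for x in xs]


def minkowski(a, b, p):
    """Minkowski distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def kmeans_one_iteration(points, centers):
    """One k-means step: assign each point to its nearest center, then
    return the mean of each cluster as the new center (assumes no empty cluster)."""
    clusters = [[] for _ in centers]
    for pt in points.values():
        nearest = min(range(len(centers)), key=lambda c: dist(pt, centers[c]))
        clusters[nearest].append(pt)
    return [tuple(sum(coord) / len(members) for coord in zip(*members))
            for members in clusters]


# Question 1: age measurements.
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]
print(mean_abs_deviation(ages))  # 8.8
print([round(z, 2) for z in standardize(ages)[:4]])

# Question 2: the two objects (Euclidean, Manhattan, Minkowski with p = 3).
a, b = (22, 1, 42, 10), (20, 0, 36, 8)
print(dist(a, b), minkowski(a, b, 1), minkowski(a, b, 3))

# Question 4: eight points, initial centers A1, B1, C1.
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}
print(kmeans_one_iteration(points, [points["A1"], points["B1"], points["C1"]]))
# [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
```

For question 1 the mean age is 33 and the mean absolute deviation is 8.8, giving z-scores of about -1.70, -1.25, -0.91 and 1.02 for the first four measurements. For question 2 the distances are sqrt(45) ≈ 6.71, 11 and 233^(1/3) ≈ 6.15. For question 4 the centers after one iteration are (2, 10), (6, 6) and (1.5, 3.5); the same helper gives (4, 9.5), (6.75, 6) and (1.5, 3.5) for the points in question 3.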

Unit 5

2 Marks

1. “Is data mining a threat to privacy and data security?” Comment on the statement.
2. List out the differences between row scalability and column scalability.
3. What are the differences between visual data mining and data visualization?

7 Marks

1. Assume that your local bank has a data mining system. The bank has been studying your debit
card usage patterns. Noticing that you make many transactions at home renovation stores, the
bank decides to contact you, offering information regarding their special loans for home
improvements.
a) Discuss how this may conflict with your right to privacy.
b) Describe another situation where you feel that data mining can infringe on your
privacy.
2. What are the major challenges faced in bringing data mining research to market? Illustrate
one data mining research issue that, in your view, may have a strong impact on the market
and on society. Discuss how to approach such a research issue.
3. Propose a few implementation methods for audio data mining. Can we integrate audio and
visual data mining to bring fun and power to data mining? Is it possible to develop some
video data mining methods? State some scenarios and your solutions to make such integrated
audiovisual mining effective.
4. Assume that your local bank has a data mining system. The bank has been studying your debit
card usage patterns. Noticing that you make many transactions at home renovation stores, the
bank decides to contact you, offering information regarding their special loans for home
improvements.
a) Describe a privacy-preserving data mining method that may allow the bank to
perform customer pattern analysis without infringing on customers' right to privacy.
b) What are some examples where data mining could be used to help society? Can you
think of ways it could be used that may be detrimental to society?
