Part Four PDF
PART 4
◼ Find all rules X → Y that correlate the presence of one set of items X with the presence of another set of items Y
❑ Example: When a customer buys bread and butter, they buy milk 85% of the time
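A rule's support and confidence can be computed directly from the transactions. The following minimal Python sketch assumes the usual definitions, support = count(X ∪ Y) / n and confidence = count(X ∪ Y) / count(X). The function names and the toy transactions are hypothetical and serve only to illustrate the bread-and-butter example.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]

def support_count(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(x: Iterable[str], y: Iterable[str],
                 transactions: List[Itemset]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule X -> Y.

    support    = count(X ∪ Y) / n
    confidence = count(X ∪ Y) / count(X)
    """
    x, y = frozenset(x), frozenset(y)
    n = len(transactions)
    xy_count = support_count(x | y, transactions)
    x_count = support_count(x, transactions)
    sup = xy_count / n if n else 0.0
    conf = xy_count / x_count if x_count else 0.0
    return sup, conf

# Hypothetical toy transactions, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk", "eggs"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
)]

sup, conf = rule_metrics({"bread", "butter"}, {"milk"}, transactions)
print(f"sup = {sup:.2f}, conf = {conf:.2f}")
```

Here {bread, butter} occurs in 3 of the 5 transactions and {bread, butter, milk} in 2, so the rule has support 2/5 and confidence 2/3.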
Prof. Ahmed Sultan Al-Hegami
The model: data
◼ Key Features
❑ Completeness: find all rules.
❑ No target item(s) on the right-hand-side
❑ Mining with data on hard disk (not in memory)
[Figure: itemset lattice over items A, B, C, D, showing the 1-itemsets A, B, C, D and the 2-itemsets AB, AC, AD, BC, BD, CD]
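The lattice makes the completeness requirement concrete: over m items there are 2^m − 1 nonempty itemsets, so a complete miner must, in principle, account for every one of them. The following sketch enumerates the lattice level by level with `itertools.combinations`. The function name `lattice_levels` is hypothetical.

```python
from itertools import combinations

def lattice_levels(items):
    """Yield (k, list of k-itemsets) for k = 1..len(items), in lexicographic order."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        yield k, ["".join(c) for c in combinations(items, k)]

levels = dict(lattice_levels("ABCD"))
for k, itemsets in levels.items():
    print(k, " ".join(itemsets))
```

Level 2 reproduces the six 2-itemsets in the figure, and the four levels contain 2^4 − 1 = 15 itemsets in total. Level-wise pruning exists to avoid counting most of this space explicitly.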
◼ After join:
❑ C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
◼ After pruning:
❑ C4 = {{1, 2, 3, 4}}
❑ {1, 3, 4, 5} is removed because its subset {1, 4, 5} is not in F3
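The join and prune steps can be sketched as follows. The slides do not list F3, so this sketch assumes F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}}, a set consistent with the join result shown. Itemsets are stored as sorted tuples, and the function names `join` and `prune` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Itemset = Tuple[int, ...]  # items stored in ascending order

def join(f_prev: List[Itemset]) -> List[Itemset]:
    """Join step: merge pairs of (k-1)-itemsets that share their first k-2 items."""
    f_sorted = sorted(f_prev)
    joined: List[Itemset] = []
    for i, a in enumerate(f_sorted):
        for b in f_sorted[i + 1:]:
            # With distinct, sorted itemsets, matching prefixes imply a[-1] < b[-1].
            if a[:-1] == b[:-1]:
                joined.append(a + (b[-1],))
    return joined

def prune(candidates: List[Itemset], f_prev: List[Itemset]) -> List[Itemset]:
    """Prune step: keep a candidate only if all of its (k-1)-subsets are frequent."""
    f_set = set(f_prev)
    return [c for c in candidates
            if all(s in f_set for s in combinations(c, len(c) - 1))]

# Assumed F3 (not listed in the slides), consistent with the join result above.
F3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
C4_joined = join(F3)
C4 = prune(C4_joined, F3)
print("After join:   ", C4_joined)
print("After pruning:", C4)
```

With this F3, the join yields {1, 2, 3, 4} and {1, 3, 4, 5}. The prune step then drops {1, 3, 4, 5} because (1, 4, 5) is not in F3.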
◼ Pruning:
❑ acde is removed because its subset ade is not in L3
◼ C4 = {abcd}
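Candidate generation (join, then prune) is one half of the level-wise Apriori loop. The other half counts the surviving candidates with one pass over the data per level, which suits data that stays on disk. The following is a minimal sketch of the whole loop, not a definitive implementation. The function names, the six toy transactions, and minsup = 50% are hypothetical. The final check assumes L3 = {abc, abd, acd, ace, bcd}, which the slides do not list.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = Tuple[str, ...]  # items stored in sorted order

def apriori_gen(f_prev: List[Itemset]) -> List[Itemset]:
    """Join F_{k-1} with itself, then prune candidates with an infrequent (k-1)-subset."""
    f_set = set(f_prev)
    f_sorted = sorted(f_prev)
    out: List[Itemset] = []
    for i, a in enumerate(f_sorted):
        for b in f_sorted[i + 1:]:
            if a[:-1] != b[:-1]:
                continue
            c = a + (b[-1],)
            if all(s in f_set for s in combinations(c, len(c) - 1)):
                out.append(c)
    return out

def apriori(transactions: List[FrozenSet[str]], minsup: float) -> Dict[Itemset, int]:
    """Return every frequent itemset with its support count (level-wise search)."""
    n = len(transactions)
    if n == 0:
        return {}
    # Level 1: count single items in one pass over the data.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: k for c, k in counts.items() if k / n >= minsup}
    result = dict(frequent)
    while frequent:
        candidates = apriori_gen(list(frequent))
        # One pass over the data per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: k for c, k in counts.items() if k / n >= minsup}
        result.update(frequent)
    return result

# Hypothetical toy data: each string is one transaction of single-letter items.
transactions = [frozenset(t) for t in ("abc", "abcd", "acd", "abc", "bd", "cde")]
result = apriori(transactions, minsup=0.5)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("".join(itemset), count)

# Assumed L3 for the lettered example: acde is pruned and only abcd survives.
L3 = [tuple(s) for s in ("abc", "abd", "acd", "ace", "bcd")]
print(apriori_gen(L3))
```

On the toy data, ad and bd are infrequent (2 of 6), so abc is the only 3-candidate, and it is frequent with count 3. Counting candidates by scanning each transaction against every candidate keeps the sketch short. Practical implementations typically use a hash tree or a similar index for this step.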
On Apriori Algorithm
Transaction form:
(Attr1, a), (Attr2, b), (Attr3, d)
(Attr1, b), (Attr2, c), (Attr3, e)
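Converting a table into transactions amounts to pairing each value with its attribute name. This keeps equal values from different columns distinct: in the two rows shown, b under Attr1 and b under Attr2 become different items. The following is a minimal sketch, where `table_to_transactions` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]  # (attribute, value)

def table_to_transactions(rows: List[Dict[str, str]]) -> List[FrozenSet[Item]]:
    """Turn each table row into a transaction of (attribute, value) items."""
    return [frozenset(row.items()) for row in rows]

rows = [
    {"Attr1": "a", "Attr2": "b", "Attr3": "d"},
    {"Attr1": "b", "Attr2": "c", "Attr3": "e"},
]
transactions = table_to_transactions(rows)
for t in transactions:
    print(sorted(t))
```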
◼ Let minsup = 20% and minconf = 60%. The following are two
examples of class association rules:
Student, School → Education [sup = 2/7, conf = 2/2]
Game → Sport [sup = 2/7, conf = 2/3]
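A class association rule restricts the right-hand side to a single class label, so its support and confidence are counted over (item set, class) pairs. The following sketch checks the two rules above against a hypothetical seven-document dataset. The dataset is an assumption, not taken from these slides, and was constructed only to reproduce the stated values. `fractions.Fraction` keeps results in the slide's x/7 form, although Fraction(2, 2) normalizes to 1.

```python
from fractions import Fraction
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical dataset: (words in the document, class label).
data: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"Student", "Teach", "School"}), "Education"),
    (frozenset({"Student", "School"}), "Education"),
    (frozenset({"Teach", "School", "City", "Game"}), "Education"),
    (frozenset({"Baseball", "Basketball"}), "Sport"),
    (frozenset({"Basketball", "Player", "Spectator"}), "Sport"),
    (frozenset({"Baseball", "Coach", "Game", "Team"}), "Sport"),
    (frozenset({"Basketball", "Team", "City", "Game"}), "Sport"),
]

def car_metrics(condset: Iterable[str], label: str,
                data: List[Tuple[FrozenSet[str], str]]) -> Tuple[Fraction, Fraction]:
    """Support and confidence of the class association rule condset -> label."""
    condset = frozenset(condset)
    covered = [y for x, y in data if condset <= x]  # classes of covering documents
    hits = sum(1 for y in covered if y == label)
    sup = Fraction(hits, len(data))
    conf = Fraction(hits, len(covered)) if covered else Fraction(0)
    return sup, conf

MINSUP, MINCONF = Fraction(20, 100), Fraction(60, 100)
for cond, label in [({"Student", "School"}, "Education"), ({"Game"}, "Sport")]:
    sup, conf = car_metrics(cond, label, data)
    print(sorted(cond), "->", label, sup, conf, sup >= MINSUP and conf >= MINCONF)
```

Both rules pass the thresholds: 2/7 ≈ 28.6% ≥ 20%, and the confidences 1 and 2/3 ≈ 66.7% are both ≥ 60%.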
Programming assignment!
◼ Implement an algorithm for sequential pattern mining that handles
❑ multiple minimum supports
❑ the support difference constraint
◼ Algorithms: (1) MS-GSP and (2) MSprefixSpan
◼ Each group implements only one of the two algorithms
❑ Deadline: May 27, 2010 (Demo your program on that day)
❑ Test data sequences will be in one file in the same format
as those in the book.