Data Mining - Mining Sequential Patterns
A sequence database, S, is a set of tuples <SID, s>,
where SID is a sequence ID and s is a sequence
Sequential Patterns
Length of Sequence 1: 9
It contains item a multiple times, but Sequence 1
contributes only one to the support of <a>
<a(bc)df> is a subsequence of Sequence 1
Support for <(ab)c> is 2 (Present in 1 and 3)
Frequent as it satisfies minimum support of 2
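A minimal Python sketch of the support count used above. The four-sequence database DB is an assumed example, chosen only to be consistent with the statements above (Sequence 1 has length 9, <(ab)c> is present in sequences 1 and 3). The names parse, is_subsequence and support are illustrative.

```python
def parse(text):
    """Parse a string such as 'a(abc)(ac)d(cf)' into a list of item sets."""
    elements, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            elements.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            elements.append(frozenset(text[i]))
            i += 1
    return elements


def is_subsequence(pattern, sequence):
    """Greedy left-to-right match: each pattern element must be a subset
    of a distinct, later element of the sequence."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database):
    """Number of sequences containing the pattern (each counts once)."""
    return sum(is_subsequence(pattern, seq) for seq in database.values())


# Assumed example database (SID -> sequence), not taken from the text.
DB = {
    1: parse("a(abc)(ac)d(cf)"),
    2: parse("(ad)c(bc)(ae)"),
    3: parse("(ef)(ab)(df)cb"),
    4: parse("eg(af)cbc"),
}

print(sum(len(e) for e in DB[1]))                # length of Sequence 1
print(is_subsequence(parse("a(bc)df"), DB[1]))   # subsequence test
print(support(parse("(ab)c"), DB))               # support of <(ab)c>
```

With a minimum support of 2, a pattern is frequent when support(...) >= 2.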
Scalable Methods for Mining Sequential Patterns
Full Set Vs Closed Set
A sequential pattern s is closed if there exists no
s' such that s' is a proper super-sequence of s and s'
has the same support as s
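The closedness test can be sketched as a filter over a full set of frequent patterns. This is an illustrative Python sketch under two assumptions: every element is a single item (so a pattern is a plain string), and the support values in FULL_SET are hypothetical.

```python
def is_subseq(p, q):
    """True if p is a subsequence of q (single-item elements only)."""
    it = iter(q)
    return all(x in it for x in p)


def closed_patterns(supports):
    """Keep a pattern only if no proper super-sequence has the same support."""
    closed = {}
    for s, sup in supports.items():
        absorbed = any(
            len(t) > len(s) and sup_t == sup and is_subseq(s, t)
            for t, sup_t in supports.items()
        )
        if not absorbed:
            closed[s] = sup
    return closed


# Hypothetical full set: pattern -> support.
FULL_SET = {"a": 3, "b": 4, "ab": 3, "abc": 2}

print(closed_patterns(FULL_SET))
```

Here <a> is dropped because its super-sequence <ab> has the same support, while <b> stays because every super-sequence of it has lower support.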
PrefixSpan
Given a sequence α = <e1 e2 ... en> (where each ei
corresponds to a frequent event), a sequence
β = <e'1 e'2 ... e'm> (m <= n) is called a prefix of α if and only if:
e'i = ei for i <= m-1
e'm ⊆ em
All frequent items in (em - e'm) are alphabetically
after those in e'm
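For a one-item prefix, the matching suffix can be sketched in Python as follows. The representation (each element stored as an alphabetically sorted string), the names suffix and show, and the two sample sequences are assumptions made for illustration. Items of the matched element that sort before the prefix item are dropped, in line with the alphabetical-order condition above.

```python
def suffix(sequence, item):
    """Suffix of `sequence` with respect to the one-item prefix <item>.

    Returns (leftover, rest): `leftover` holds the items of the matched
    element that come alphabetically after `item`; `rest` holds the
    remaining elements. Returns None if `item` does not occur.
    """
    for pos, element in enumerate(sequence):
        if item in element:
            leftover = "".join(x for x in element if x > item)
            return leftover, sequence[pos + 1:]
    return None


def show(leftover, rest):
    """Render a suffix, marking a partial first element as (_xy)."""
    head = f"(_{leftover})" if leftover else ""
    body = "".join(e if len(e) == 1 else f"({e})" for e in rest)
    return "<" + head + body + ">"


# Assumed sample sequences, one sorted string per element.
s1 = ["a", "abc", "ac", "d", "cf"]   # <a(abc)(ac)d(cf)>
s2 = ["ad", "c", "bc", "ae"]         # <(ad)c(bc)(ae)>

print(show(*suffix(s1, "a")))
print(show(*suffix(s2, "a")))
print(show(*suffix(s1, "b")))
```

Collecting these suffixes over every sequence that contains the prefix gives the projected database for that prefix.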
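The complete set of sequential patterns can be divided into
disjoint subsets: the jth subset contains the patterns
that share the jth prefix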
PrefixSpan Example
Step 1: find length-1 sequential patterns
<a>, <b>, <c>, <d>, <e>, <f>
Step 2: divide the search space. The complete set of sequential
patterns can be partitioned into 6 subsets:
The ones having prefix <a>;
The ones having prefix <b>;
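Step 1 and the partition in Step 2 can be sketched in Python. The database DB below is an assumed example (one sorted string per element) that yields the six length-1 patterns listed above at a minimum support of 2. The name frequent_items is illustrative.

```python
from collections import Counter

# Assumed example database (SID -> sequence), not taken from the text.
DB = {
    1: ["a", "abc", "ac", "d", "cf"],
    2: ["ad", "c", "bc", "ae"],
    3: ["ef", "ab", "df", "c", "b"],
    4: ["e", "g", "af", "c", "b", "c"],
}


def frequent_items(db, min_sup):
    """Length-1 sequential patterns: each sequence counts an item once."""
    counts = Counter()
    for seq in db.values():
        counts.update(set("".join(seq)))
    return {item: n for item, n in sorted(counts.items()) if n >= min_sup}


items = frequent_items(DB, min_sup=2)
print(items)

# Step 2: one subset of the search space per frequent length-1 prefix.
for item in items:
    print(f"patterns having prefix <{item}>")
```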
PrefixSpan
No candidate sequence needs to be generated
Projected databases keep shrinking
Major cost of PrefixSpan: constructing projected
databases
Can be improved by pseudo-projections
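The points above can be seen in a compact Python sketch of the recursion. It is a simplification, not a full implementation: every element is assumed to be a single item, so a sequence is a plain string, and the toy database is hypothetical. Each recursive call copies the suffixes into a new, smaller projected database, which is the cost noted above.

```python
from collections import Counter


def prefixspan(db, min_sup, prefix=""):
    """Return {pattern: support} for all frequent patterns extending `prefix`.

    `db` is the projected database for `prefix`: a list of suffix strings.
    """
    patterns = {}
    counts = Counter()
    for seq in db:
        counts.update(set(seq))          # each sequence counts an item once
    for item, n in sorted(counts.items()):
        if n < min_sup:
            continue
        new_prefix = prefix + item       # grow the pattern, no candidates generated
        patterns[new_prefix] = n
        # Physical projection: copy the suffix after the first occurrence.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        patterns.update(prefixspan(projected, min_sup, new_prefix))
    return patterns


# Hypothetical toy database of single-item sequences.
toy_db = ["abc", "acb", "ab"]
print(prefixspan(toy_db, min_sup=2))
```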
Pseudo-projection
Registers the index of the corresponding sequence
and the starting position of the projected suffix in
that sequence
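A minimal Python sketch of this idea, assuming the whole database fits in memory and every element is a single item (sequences are plain strings). The names pseudo_project and toy_db are illustrative. A projected database is just a list of (sequence index, start position) pairs, so no suffix is copied.

```python
def pseudo_project(db, pointers, item):
    """Project on `item` using (sequence index, start position) pointers."""
    projected = []
    for sid, start in pointers:
        pos = db[sid].find(item, start)   # first occurrence at or after start
        if pos != -1:
            projected.append((sid, pos + 1))
    return projected


# Hypothetical toy database of single-item sequences.
toy_db = ["abc", "acb", "ab"]

everything = [(sid, 0) for sid in range(len(toy_db))]
after_a = pseudo_project(toy_db, everything, "a")
after_ac = pseudo_project(toy_db, after_a, "c")

print(after_a)
print(after_ac)
print([toy_db[sid][start:] for sid, start in after_ac])   # suffixes, on demand
```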
CloSpan
Periodicity Analysis
Synchronous periodicity: the event occurs at a relatively
fixed offset in each stable period
Example: 3 pm every day
Asynchronous periodicity: the event fluctuates in a
loosely defined period
Periodicity can be precise or approximate, depending on the
data values or the offsets within a period
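The synchronous case can be sketched in Python as a search for offsets at which an event recurs in most periods. This is an illustrative sketch: the function name periodic_offsets, the confidence threshold, and the timestamps (hours since the start of observation) are all hypothetical.

```python
from collections import Counter


def periodic_offsets(timestamps, period, min_conf):
    """Offsets within `period` at which the event occurs in at least
    `min_conf` of the observed periods. Returns {offset: confidence}."""
    n_periods = max(timestamps) // period + 1
    hits = Counter()
    # Count each (period, offset) pair once.
    for _, offset in {(t // period, t % period) for t in timestamps}:
        hits[offset] += 1
    return {
        offset: n / n_periods
        for offset, n in sorted(hits.items())
        if n / n_periods >= min_conf
    }


# Hypothetical event times in hours: 3 pm on four consecutive days,
# plus one stray occurrence at 9 am on the second day.
hours = [15, 33, 39, 63, 87]
print(periodic_offsets(hours, period=24, min_conf=0.75))
```

Lowering min_conf admits approximate or partial periodicity, where the event is missing from some periods.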
Mining partial periodicity leads to the discovery of
cyclic or periodic association rules (rules that
associate a set of events that occur periodically)
Example: If tea sells well between 3-5 pm,
dinner will also sell well between 7-9 pm on
weekends