A Subsequence Interleaving Model for Sequential Pattern Mining

Fowkes, Jaroslav; Sutton, Charles

doi:10.1145/2939672.2939787


arXiv:1602.05012 (stat)

[Submitted on 16 Feb 2016 (v1), last revised 11 Nov 2016 (this version, v2)]


Authors:Jaroslav Fowkes, Charles Sutton


Abstract: Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. Efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real-world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, and with high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, that of existing state-of-the-art sequential pattern mining algorithms.
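To make the idea of covering a sequence with interleaved subsequence patterns concrete, the following is a minimal sketch only; it is not taken from the paper. It assumes each candidate pattern has a known probability in (0, 1], uses the negative log-probability as a cost, and applies the standard greedy heuristic for weighted covering problems (repeatedly pick the pattern with the lowest cost per newly covered item). The names `match_positions`, `greedy_cover` and `pattern_probs` are hypothetical, and the authors' actual inference procedure may differ.

```python
import math


def match_positions(pattern, sequence, covered):
    """Return the leftmost occurrence of `pattern` as a (possibly gapped)
    subsequence of `sequence`, using only positions not in `covered`.
    Returns None if no such occurrence exists."""
    positions = []
    start = 0
    for item in pattern:
        for i in range(start, len(sequence)):
            if i not in covered and sequence[i] == item:
                positions.append(i)
                start = i + 1
                break
        else:
            # No uncovered position matched this item.
            return None
    return positions


def greedy_cover(sequence, pattern_probs):
    """Greedily cover every position of `sequence` with pattern occurrences.

    `pattern_probs` maps pattern tuples to assumed probabilities in (0, 1].
    The cost of a pattern is -log(probability); this is an illustrative
    choice, not the paper's stated objective.
    """
    covered = set()
    chosen = []
    while len(covered) < len(sequence):
        best = None
        for pattern, prob in pattern_probs.items():
            positions = match_positions(pattern, sequence, covered)
            if positions is None:
                continue
            # Cost per newly covered item; lower is better.
            ratio = -math.log(prob) / len(positions)
            if best is None or ratio < best[0]:
                best = (ratio, pattern, positions)
        if best is None:
            raise ValueError("remaining items cannot be covered")
        _, pattern, positions = best
        covered.update(positions)
        chosen.append(pattern)
    return chosen


# Hypothetical pattern probabilities, chosen only for illustration.
probs = {(1, 2): 0.5, (3,): 0.4, (1,): 0.1, (2,): 0.1}
print(greedy_cover((1, 3, 2), probs))        # [(1, 2), (3,)]
print(greedy_cover((1, 2, 3, 1, 2), probs))  # [(1, 2), (1, 2), (3,)]
```

In the first call, the pattern `(1, 2)` is matched around the interleaved item `3`, which is then covered by the singleton `(3,)`. Including singleton patterns for every item is what guarantees that a full cover exists in this sketch.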

Comments:	10 pages in KDD 2016: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1602.05012 [stat.ML]
	(or arXiv:1602.05012v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1602.05012
Related DOI:	https://doi.org/10.1145/2939672.2939787

Submission history

From: Jaroslav Fowkes
[v1] Tue, 16 Feb 2016 13:30:10 UTC (188 KB)
[v2] Fri, 11 Nov 2016 10:43:36 UTC (108 KB)
