Automatic Dataset Type Recognition for Association Rule Mining

Published: 27 December 2024


Association Rule Mining is an important subfield of data mining that extracts interesting associations between items that co-occur in database transactions. A transaction dataset may take different forms, such as (a) a market basket list, where each line represents a transaction, (b) invoice detail, derived directly from company ERP printouts, (c) a sparse matrix with as many columns as there are distinct item types considered for mining, and (d) nominal attributes, mainly consisting of categorical features. Classifying a given input into the correct dataset type is crucial in automated machine learning tasks. In this paper, we report on the development of an automatic dataset type recognition mechanism. A specialized "Dataset of Datasets" is created from a variety of datasets distributed by well-known repositories. Ultimately, we build a hybrid classification model consisting of a procedural programming component and a pre-trained Supervised Machine Learning model based on the Random Forest algorithm. The model achieves a classification accuracy of about 98%. The Random Forest algorithm was chosen after evaluating a number of popular machine learning algorithms, such as Naïve Bayes, Decision Tree, K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM), as well as their variants.
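The hybrid idea described above (a procedural pass, with a learned model for the remaining cases) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the feature names, the thresholds, the two rules in `procedural_guess`, and the `model` callable are all assumptions introduced here, since the abstract does not specify them. In practice the `model` argument would be where a pre-trained classifier such as a Random Forest plugs in.

```python
from statistics import pvariance

# The four dataset types named in the abstract (label strings are hypothetical).
DATASET_TYPES = ("market_basket", "invoice_detail",
                 "sparse_matrix", "nominal_attributes")


def extract_features(rows):
    """Compute simple shape statistics from parsed rows (lists of string cells).

    These features are assumptions for illustration, not the paper's feature set.
    """
    if not rows or not any(rows):
        raise ValueError("rows must contain at least one non-empty row")
    lengths = [len(r) for r in rows]
    cells = [c.strip() for r in rows for c in r]
    binary = sum(1 for c in cells if c in ("0", "1"))
    return {
        "n_rows": len(rows),
        "mean_row_len": sum(lengths) / len(lengths),
        "row_len_variance": pvariance(lengths),
        "binary_ratio": binary / len(cells),
    }


def procedural_guess(features):
    """Rule-based first pass; returns None when no rule applies.

    Both rules and the 0.95 threshold are hypothetical.
    """
    fixed_width = features["row_len_variance"] == 0
    if fixed_width and features["binary_ratio"] >= 0.95:
        return "sparse_matrix"
    if not fixed_width:
        return "market_basket"
    return None


def classify(rows, model=None):
    """Try the procedural rules first, then defer to `model` if one is given.

    `model` is any callable mapping the feature dict to a label, for example
    a thin wrapper around a pre-trained classifier.
    """
    features = extract_features(rows)
    label = procedural_guess(features)
    if label is None and model is not None:
        label = model(features)
    return label
```

For example, `classify([["milk", "bread"], ["beer"]])` is resolved by the variable-row-length rule, whereas a fixed-width table of categorical values falls through to `model`. Keeping the rules separate from the learned component means clear-cut inputs never reach the classifier, which is one plausible reading of why a hybrid design is used.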


        Author Tags

        1. Association Rules
        2. AutoML
        3. Dataset Types
        4. Feature Extraction
        5. Supervised Machine Learning


