Association Rule discovery has been an important problem of investigation in knowledge discovery and data mining. An association rule describes associations among sets of items that occur together in the transactions of a database. The Association Rule mining task consists of finding the frequent itemsets and the rules, in the form of conditional implications, that satisfy prespecified threshold values of support and confidence. The interestingness of Association Rules is determined by these two measures, although other measures of interestingness such as lift and conviction are also used. However, the number of discovered association rules can grow explosively, and many of these rules are insignificant. In this paper we introduce a new measure of interestingness called Inter Itemset Distance, or Spread, and implement this notion using the approach of the apriori algorithm, with a view to reducing the number of discovered Association Rules in a meaningful manner. The working of the new algorithm is analysed, and the results are presented and compared with those of the conventional apriori algorithm.
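The four measures named above have standard textbook definitions: support(X) is the fraction of transactions containing X, confidence(X → Y) = support(X ∪ Y) / support(X), lift = confidence / support(Y), and conviction = (1 − support(Y)) / (1 − confidence). The following is a minimal Python sketch of those definitions, not the authors' implementation; the function names and the four-transaction example database are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Sequence

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_measures(antecedent: Iterable[str], consequent: Iterable[str],
                  transactions: Sequence[Transaction]) -> Dict[str, float]:
    """Support, confidence, lift and conviction of the rule antecedent -> consequent."""
    x, y = frozenset(antecedent), frozenset(consequent)
    s_x, s_y = support(x, transactions), support(y, transactions)
    if s_x == 0 or s_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_xy = support(x | y, transactions)
    confidence = s_xy / s_x
    lift = confidence / s_y
    # Conviction is unbounded when the rule never fails (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - s_y) / (1 - confidence)
    return {"support": s_xy, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

if __name__ == "__main__":
    print(rule_measures({"bread"}, {"milk"}, transactions))
```

For the rule bread → milk in this toy data, confidence is 2/3 but lift is 8/9, below 1, so milk is slightly less likely given bread than its baseline frequency suggests. This illustrates why support and confidence alone can overstate how interesting a rule is.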
Association rules discovered from transaction databases can be large in number, and reducing their number has become an important issue in recent times. Conventionally, the number of rules is increased or decreased by varying the support and confidence thresholds. Combining an additional constraint with support reduces the number of frequent itemsets, which in turn leads to the generation of fewer rules. Average Inter Itemset Distance (IID), or Spread, which is the intervening separation of itemsets in the transactions, has been used as a measure of interestingness for association rules with a view to reducing the number of association rules. In this paper, a complete algorithm based on apriori that uses average Inter Itemset Distance is designed and implemented to reduce the number of frequent itemsets and association rules, and also to find the distribution pattern of the association rules in terms of the number of transactions in which the frequent itemsets do not occur. The conventional apriori algorithm is also implemented and the results are compared. The theoretical concepts related to inter itemset distance are also put forward.
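The abstract describes IID only as the intervening separation of an itemset's occurrences. The sketch below assumes one concrete reading: the average IID of an itemset is the mean number of transactions lying between consecutive transactions that contain it, in database order. It also assumes that frequent itemsets whose average IID exceeds a threshold are discarded. Both the definition and the direction of the threshold, as well as the names `average_iid`, `apriori_with_iid`, and `max_avg_iid`, are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import inf
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def occurrence_positions(itemset: Itemset, transactions: Sequence[Itemset]) -> List[int]:
    """Indices (in database order) of the transactions containing the whole itemset."""
    return [i for i, t in enumerate(transactions) if itemset <= t]


def average_iid(itemset: Itemset, transactions: Sequence[Itemset]) -> float:
    """ASSUMED definition: mean count of intervening transactions between
    consecutive occurrences of the itemset; inf if it occurs fewer than twice."""
    pos = occurrence_positions(itemset, transactions)
    if len(pos) < 2:
        return inf
    gaps = [b - a - 1 for a, b in zip(pos, pos[1:])]
    return sum(gaps) / len(gaps)


def apriori(transactions: Sequence[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Plain level-wise Apriori: every itemset with support count >= min_count."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent: Dict[Itemset, int] = {}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def apriori_with_iid(transactions: Sequence[Itemset], min_count: int,
                     max_avg_iid: float) -> Dict[Itemset, float]:
    """Frequent itemsets whose ASSUMED average IID is at most max_avg_iid."""
    result: Dict[Itemset, float] = {}
    for itemset in apriori(transactions, min_count):
        spread = average_iid(itemset, transactions)
        if spread <= max_avg_iid:
            result[itemset] = spread
    return result


# Hypothetical toy database; list order stands in for transaction order.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b", "c"}, {"c"}, {"c"}, {"a", "c"}, {"a", "b"},
)]

if __name__ == "__main__":
    print(apriori_with_iid(transactions, min_count=2, max_avg_iid=1.0))
```

In this sketch the IID constraint filters the frequent itemsets instead of taking part in candidate pruning. That choice matters because, unlike support, the average gap is not anti-monotone: an itemset occurring at transactions 0, 1 and 100 has an average IID of 49, while a superset occurring only at 0 and 1 has an average IID of 0. Pruning on IID during the level-wise search could therefore discard supersets that would pass the filter.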
Since the early 1990s, Data Base Mining has grown tremendously as a field of research. It has broadened its horizon from the mining of market basket data alone to nearly every conceivable domain where decision making depends, or could depend, on the patterns formed within large databases. Interestingly, data accumulation in databases is governed by the architectures of the various DBMS models and by the tools developed and implemented under them. The situation has become even more diverse with the rapid proliferation of the Internet and its distributed repositories, along with the emergence of other data models such as Object Relational, Active, Deductive, and Temporal. As a result, Data Base Mining has also become a hugely diverse sphere of activity, which in turn has made the various Data Mining tasks either too specific to a particular DBMS model or too abstract to relate directly to a particular DBMS or its instance. While data will continue to grow under every DBMS model, the Data Mining tasks remain similar irrespective of the model on which the repositories are built. That is, the various Data Mining tasks are general in nature and applicable to data repositories under any DBMS model. Unfortunately, the algorithms for these tasks are not generic enough to be applicable to all kinds of databases. In this paper the issues related to these problems are closely examined, and we analyse whether a foundation for a unified theoretical model of the Data Mining tasks is possible, so that it can be applied uniformly to any underlying data model.
Papers by Pankaj Kumar Deva Sarma