Constrained Logistic Regression for Discriminative Pattern Mining

Anand, Rajul; Reddy, Chandan K.

doi:10.1007/978-3-642-23780-5_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6911)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Analyzing differences between multivariate datasets is a challenging problem. Earlier work studied it by characterizing changes in the data distribution, either as patterns representing conjunctions of attribute-value pairs or through univariate statistical analysis of each attribute. All such methods focus only on changes in the attributes and do not take the class labels associated with the data into account. In this paper, we pose the difference in distribution in a supervised scenario, where the change in the data distribution is measured in terms of the change in the corresponding classification boundary. We propose a new constrained logistic regression model that measures such a difference between multivariate data distributions based on the predictive models induced on them: the difference between the distributions is quantified by the change in the classification boundary of the constrained models. We demonstrate the advantages of the proposed work over other methods available in the literature using both synthetic and real-world datasets.
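
The general idea of comparing two datasets through constrained classifiers can be illustrated with a minimal sketch. The abstract does not state the form of the constraints, so everything specific here is an assumption made for illustration: the box constraints around a base model fitted on the pooled data, the half-width `eps`, the synthetic data, and the helper names `fit_logreg` and `make_data`. Projected gradient descent is used only for brevity and is not claimed to be the optimizer used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lower=None, upper=None, lr=0.1, n_iter=2000):
    """Logistic regression by (projected) gradient descent.

    If lower/upper are given, the weights are clipped into the box
    [lower, upper] after every step (hypothetical constraint form).
    """
    n, d = X.shape
    w = np.zeros(d)
    if lower is not None:
        w = np.clip(w, lower, upper)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n  # gradient of mean log-loss
        w = w - lr * grad
        if lower is not None:
            w = np.clip(w, lower, upper)       # project back into the box
    return w

def make_data(rng, n, w_true):
    """Synthetic labeled data; the first column of X is an intercept term."""
    X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, len(w_true) - 1))])
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    return X, y

rng = np.random.default_rng(0)
# Two datasets whose class boundary differs only in the first feature.
X1, y1 = make_data(rng, 500, np.array([0.0, 2.0, -1.0]))
X2, y2 = make_data(rng, 500, np.array([0.0, -2.0, -1.0]))

# Base model fitted on the pooled data (assumption, not from the abstract).
w_base = fit_logreg(np.vstack([X1, X2]), np.concatenate([y1, y2]))

eps = 1.0  # hypothetical half-width of the box around the base model
w1 = fit_logreg(X1, y1, w_base - eps, w_base + eps)
w2 = fit_logreg(X2, y2, w_base - eps, w_base + eps)

# Per-coefficient change in the classification boundary.
diff = np.abs(w1 - w2)
print("base:", w_base)
print("per-coefficient difference:", diff)
```

Keeping both models inside the same box around a shared base model makes their coefficients directly comparable, so a large entry in `diff` points to an attribute whose relationship with the class label differs between the two datasets.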

Author information

Authors and Affiliations

Department of Computer Science, Wayne State University, Detroit, MI, USA
Rajul Anand & Chandan K. Reddy

Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece
Dimitrios Gunopulos
Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland
Thomas Hofmann
Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Donato Malerba
Department of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece
Michalis Vazirgiannis

About this paper

Cite this paper

Anand, R., Reddy, C.K. (2011). Constrained Logistic Regression for Discriminative Pattern Mining. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_16

DOI: https://doi.org/10.1007/978-3-642-23780-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5
eBook Packages: Computer Science, Computer Science (R0)
