Structured Sparse Boosting for Graph Classification

Published: 25 August 2014

Abstract

Boosting is a highly effective algorithm that produces a linear combination of weak classifiers (a.k.a. base learners) to obtain high-quality classification models. In this article, we propose a generalized logit boost algorithm in which the base learners have structural relationships in the functional space. Although such relationships are generic, our work is particularly motivated by the emerging topic of pattern-based classification for semistructured data, including graphs. To incorporate the structure information efficiently, we design a general model in which an undirected graph captures the relationships among subgraph-based base learners. Our method applies both L1 and Laplacian-based L2 regularization to logit boosting to achieve model sparsity and smoothness in the functional space spanned by the base learners. We derive efficient coordinate-descent optimization algorithms for the new boosting formulation and prove theoretically that it exhibits a natural grouping effect for spatially nearby or overlapping base learners and that the resulting estimator is consistent. Additionally, motivated by the connection between logit boosting and logistic regression, we extend our structured sparse regularization framework to logistic regression for vectorial data whose features are structured. Through a comprehensive experimental study and comparison with the state of the art, we demonstrate the effectiveness of the proposed learning method.
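
As a rough sketch of the objective the abstract describes (the precise formulation and notation appear in the article body; the symbols below are illustrative assumptions, not the authors' own notation), the method can be read as minimizing the logit (logistic) loss of a linear combination of subgraph-based base learners h_j, penalized by an L1 term for sparsity and a Laplacian-based L2 term for smoothness over the base-learner graph:

    \min_{\beta}\; \sum_{i=1}^{n} \log\Big(1 + \exp\big(-y_i \sum_{j} \beta_j h_j(x_i)\big)\Big) \;+\; \lambda_1 \lVert \beta \rVert_1 \;+\; \lambda_2\, \beta^{\top} L\, \beta

Here L denotes the Laplacian of the undirected graph over base learners, \lambda_1 controls sparsity, and \lambda_2 encourages nearby or overlapping base learners to receive similar coefficients; the interplay of the two penalties is what yields the grouping effect mentioned above, in the same spirit as the Elastic Net.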

References

[1]
C. Baldassano, M. C. Iordan, D. M. Beck, and F.-F. Li. 2012. Voxel-level functional connectivity using spatial regularization. NeuroImage 63, 3 (2012), 1099--1106.
[2]
P. L. Bartlett and M. Traskin. 2006. AdaBoost is consistent. In Advances in Neural Information Processing Systems (NIPS’06). 105--112.
[3]
A. Bhaduri, R. Ravishankar, and R. Sowdhamini. 2004. Conserved spatially interacting motifs of protein superfamilies: Application to fold recognition and function annotation of genome data. Proteins 4, 54 (2004), 657--670.
[4]
C. Chang and C. Lin. 2001. LIBSVM: A Library for Support Vector Machines. Available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.
[5]
F. Chung. 1997. Spectral graph theory. CBMS Regional Conferences Series 92 (1997).
[6]
Wenyuan Dai, Qiang Yang, Gui rong Xue, and Yong Yu. 2007. Boosting for transfer learning. In Proceedings of the International Conference on Machine Learning. 193--200.
[7]
M. Deshpande, M. Kuramochi, and G. Karypis. 2005. Frequent sub-structure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering (2005).
[8]
John Duchi and Yoram Singer. 2009. Boosting with structural sparsity. In Proceedings of the 26th Annual International Conference on Machine Learning. 297--304.
[9]
H. Fei and J. Huan. 2008. Structure feature selection for graph classification. In Proceedings of the ACM 17th Conference on Information and Knowledge Management.
[10]
H. Fei and J. Huan. 2009. L2 norm regularized feature kernel regression for graph data. In Proceeding of the 18th ACM Conference on Information and Knowledge Management. 593--600.
[11]
Y. Freund. 1995. Boosting a weak learning algorithm by majority. Information and Computation 121 (1995), 256--285.
[12]
Y. Freund and R. Shapire. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the 2nd European Conference on Computational Learning Theory. 23--37.
[13]
J. Friedman, T. Hastie, and R. Tibshirani. 2000. Additive logistic regression: A statistical view of boosting. Annals of Statistics 28, 2 (2000), 337--407.
[14]
J. Friedman, T. Hastie, and R. Tibshirani. 2009. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 (2009).
[15]
J. J. Goeman, S. A. van de Geer, F. de Kort, and H. C. van Houwelingen. 2004. A global test for groups of genes: Testing association with a clinical outcome. Bioinformatics 20, 1 (2004), 93--99.
[16]
Y. Guo, S. Mahony, and D. K. Gifford. 2012. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Computational Biology 8, 8 (Aug. 2012).
[17]
I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. 2002 January. Gene selection for cancer classification using support vector machines. Machine Learning 46 (Jan. 2002), 389--422.
[18]
G. Haffari, Y. Wang, S. Wang, G. Mori, and F. Jiao. 2008. Boosting with incomplete information. In Proceedings of the International Conference on Machine Learning. 368--375.
[19]
T. Hastie, R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning. Springer-Verlag.
[20]
J. Huan, W. Wang, and J. Prins. 2003. Efficient mining of frequent subgraph in the presence of isomorphism. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03). 549--552.
[21]
L. Jacob, F. Bach, and J.-P. Vert. 2009. Clustered multi-task learning: A convex formulation. In Neural Information Processing Systems (NIPS).
[22]
N. Jin and W. Wang. 2011. LTS: Discriminative subgraph mining by learning from search history. In Proceedings of International Conference on Data Engineering (ICDE’11). 207--218.
[23]
N. Jin, C. Young, and W. Wang. 2009. Graph classification based on pattern co-occurrence. In Proceeding of the 18th ACM Conference on Information and Knowledge Management. 573--582.
[24]
R. Jorissen and M. Gilson. 2005. Virtual screening of molecular databases using a support vector machine. Journal of Chemical Information Modeling 45(3) (2005), 549--561.
[25]
M. Kanehisa, S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki, and M. Hirakawa. 2006. From genomics to chemical genomics: New developments in KEGG. Nucleic Acids Research 34 (2006), D354--357.
[26]
H. Kashima, K. Tsuda, and A. Inokuchi. 2003. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning (ICML). 321--328.
[27]
K. Knight and W. Fu. 2000. Asymptotics for lasso-type estimators. Journal of the Royal Statisical Society 28, 5 (2000), 1356--1378.
[28]
R. I. Kondor, N. Shervashidze, and K. M. Borgwardt. 2009. The graphlet spectrum. In Proceedings of the International Conference of Machine Learning, Vol. 382. ACM, 67.
[29]
X. Kong, W. Fan, and P. S. Yu. 2011. Dual active feature and sample selection for graph classification. In Proceedings of ACM Knowledge Discovery and Data Mining (KDD’11). 654--662.
[30]
X. Kong and P. S. Yu. 2010. Semi-supervised feature selection for graph classification. In Proceedings of ACM Knowledge Discovery and Data Mining (KDD’10). 793--802.
[31]
Taku Kudo, Eisaku Maeda, and Yuji Matsumoto. 2004. An application of boosting to graph classification. In The Neural Information Processing Systems (NIPS’04).
[32]
C. Leslie, E. Eskin, and W. S. Noble. 2002. The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing. 564--75.
[33]
C. Li and H. Li. 2008. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24, 9 (2008), 1175--1182.
[34]
P. Li. 2008. Adaptive base class boost for multi-class classification. In Proceedings of the International Conference on Machine Learning. 79--88.
[35]
L. Liang, V. Mandal, Y. Lu, and D. Kumar. 2008. Mcm-test: A fuzzy-set-theory-based approach to differential analysis of gene pathways. BMC Bioinformatics 9 (Suppl 6) (2008), S16.
[36]
C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. 2005. Mining behavior graphs for “backtrace” of noncrashing bugs. In SDM.
[37]
P. M. Long and R. A. Servedio. 2008. Random classification noise defeats all convex potential boosters. In Proceedings of the International Conference on Machine Learning. 608--615.
[38]
T. M. Mitchell and McGraw Hill. 2010. Chapter 1 of Machine Learning: Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression. Preprint. 12--16 pages. Available at http://www.cs.cmu.edu/∼tom/mlbook/NBayesLogReg.pdf.
[39]
H. D. K. Moonesinghe, H. Valizadegan, S. Fodeh, and P.-N. Tan. 2007. A probabilistic substructure-based approach for graph classification. In Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, Vol. 1. 346--349.
[40]
V. Mootha, C. Lindgren, K. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstraale, E. Laurila et al. 2003. PGC-1: A-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 34, 3 (2003), 267C273.
[41]
A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247 (1995), 536--540.
[42]
S. Nowozin, K. Tsuda, T. Uno, T. Kudo, and G. Bakir. 2007. Weighted substructure mining for image analysis. In Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR’07). 1--8.
[43]
G. Pandey, S. Chawla, S. Poon, B. Arunasalam, and J. G. Davis. 2009. Association rules network: Definition and applications. Statistics Analysis Data Mining 1, 4 (2009), 260--279.
[44]
H. Saigo et al. 2007. gBoost: Graph Learning Toolbox for Matlab. Available at http://www.kyb.tuebingen.mpg.de/bs/people/nowozin/gboost/.
[45]
H. Saigo, N. Krämer, and K. Tsuda. 2008. Partial least squares regression for graph mining. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’08).
[46]
H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda. 2009. gBoost: A mathematical programming approach to graph classification and regression. Journal of Machine Learning 75, 1 (2009), 69--89.
[47]
T. Sandler, P. P. Talukdar, and L. H. Ungar. 2008. Regularized learning with networks of features. In The Neural Information Processing Systems (NIPS).
[48]
R. Schapire. 1990. The strength of weak learnability. Machine Learning 5 (1990), 197--227.
[49]
R. Schapire and Y. Singer. 1999. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37 (1999), 297--336.
[50]
M. Thoma, H. Cheng, A. Gretton, J. Han, H.-P. Kriegel, A. J. Smola, L. Song, P. S. Yu, X. Yan, and K. M. Borgwardt. 2009. Near-optimal supervised feature selection among frequent subgraphs. In Proceedings of the 2009 SIAM Conference on Data Mining (SDM’09). Philadelphia, PA, 1076--1087.
[51]
R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. 2005. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society 67, 1 (2005), 91--108.
[52]
K. Tsuda. 2007. Entire regularization paths for graph data. In Proceedings of the International Conference on Machine Learning. 919--926.
[53]
Z. Xiang, Y. Xi, U. Hasson, and P. Ramadge. 2009. Boosting with spatial regularization. In Proceedings of the Neural Information Processing Systems (NIPS’09). 2107--2115.
[54]
R. Yan, J. Tesic, and J. R. Smith. 2007. Model-shared subspace boosting for multi-label classification. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 834--843.
[55]
X. Yan, H. Cheng, J. Han, and P. Yu. 2008. Mining significant graph patterns by leap search. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 433--444.
[56]
M. Yuan and Y. Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68 (2006), 49--67.
[57]
P. Zhao and B. Yu. 2006. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics (2006), 3468--3497.
[58]
Y. Zhao, X. Kong, and P. S. Yu. 2011. Positive and unlabeled learning for graph classification. In Proceedings of the International Conference on Data Mining. 962--971.
[59]
L. Zheng, S. Wang, C. H. Lee, and Y. Liu. 2009. Information theoretic regularization for semi-supervised boosting. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1017--1026.
[60]
H. Zou and T. Hastie. 2005. Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society B 67 (2005), 301--320.
[61]
H. Zou and M. Yuan. 2008. F∞ norm support vector machine. Statistica Sinica 18 (2008), 379--398.

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 9, Issue 1
    October 2014, 209 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/2663598

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 August 2014
    Accepted: 01 February 2014
    Revised: 01 November 2013
    Received: 01 March 2013
    Published in TKDD Volume 9, Issue 1

    Author Tags

    1. Regularization
    2. boosting
    3. feature selection
    4. graph classification
    5. logistic regression
    6. semistructured data
    7. structural sparsity

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2018) An efficient heuristic approach for learning a set of composite graph classification rules. Intelligent Data Analysis 22, 3, 581-596. DOI: 10.3233/IDA-163343
    • (2018) Time-Variant Graph Classification. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 1-14. DOI: 10.1109/TSMC.2018.2830792
    • (2018) Representing Graphs as Bag of Vertices and Partitions for Graph Classification. Data Science and Engineering 3, 2, 150-165. DOI: 10.1007/s41019-018-0065-5
    • (2017) Incremental Subgraph Feature Selection for Graph Classification. IEEE Transactions on Knowledge and Data Engineering 29, 1, 128-142. DOI: 10.1109/TKDE.2016.2616305
    • (2016) Edge classification in networks. 2016 IEEE 32nd International Conference on Data Engineering (ICDE), 1038-1049. DOI: 10.1109/ICDE.2016.7498311
    • (2016) Fast training of a graph boosting for large-scale text classification. Proceedings of the 14th Pacific Rim International Conference on Trends in Artificial Intelligence, 638-650. DOI: 10.1007/978-3-319-42911-3_53
    • (2015) XEarth: A 3D GIS platform for managing massive city information. 2015 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 1-6. DOI: 10.1109/CIVEMSA.2015.7158625
    • (2015) Virtual geographic environment based coach passenger flow forecasting. 2015 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 1-6. DOI: 10.1109/CIVEMSA.2015.7158618
    • (2015) Traffic management and forecasting system based on 3D GIS. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 991-998. DOI: 10.1109/CCGrid.2015.62
