Semi-supervised Collective Classification in Multi-attribute Network Data

Neural Processing Letters

Abstract

A multi-attribute network is network data with multiple attribute views and a relational view. Although semi-supervised collective classification has been investigated extensively, this kind of network data has received little attention. In this paper, we study the semi-supervised learning problem for multi-attribute networks. There are two main challenges: (1) how to extract effective information from the rich multi-attribute and relational information; (2) how to make use of unlabeled data in the network. We propose a new generative model with network regularization, called MARL, which addresses both challenges. In this approach, a generative model based on probabilistic latent semantic analysis leverages the attribute information, and a network regularizer smooths the label probabilities using relational information and unlabeled data. Comprehensive experiments on various data sets demonstrate the effectiveness of MARL: it outperforms existing collective classification methods and multi-view classification methods in terms of accuracy.

Notes

  1. http://mlg.ucd.ie/aggregation/index.html.

  2. http://archive.ics.uci.edu/ml/datasets/Multiple+Features.

  3. http://web.engr.illinois.edu/~mingji1/DBLP_four_area.zip.

References

  1. McDowell L, Gupta K, Aha D (2009) Cautious collective classification. J Mach Learn Res 10:2777–2836

  2. Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in network data. AI Mag 29(3):93

  3. Shi R, Wu Q, Ye Y, Ho S (2014) A generative model with network regularization for semi-supervised collective classification. In: Proceedings of the 2014 SIAM international conference on data mining, pp 64–72

  4. Kato T, Kashima H, Sugiyama M (2009) Robust label propagation on multiple networks. IEEE Trans Neural Netw 20(1):35–44

  5. Xu C, Tao D, Xu C (2013) A survey on multi-view learning. arXiv preprint arXiv:1304.5634

  6. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on computational learning theory, pp 92–100

  7. Zhuang F, Karypis G, Ning X, He Q, Shi Z (2012) Multi-view learning via probabilistic latent semantic analysis. Inf Sci 199:20–30

  8. Shi X, Paiement J, Grangier D, Yu PS (2012) Learning from heterogeneous sources via gradient boosting consensus. In: Proceedings of the SIAM international conference on data mining, pp 224–235

  9. Vijayan P, Subramanian S, Ravindran B (2014) Multi-label collective classification in multi-attribute multi-relational network data. In: Advances in social networks analysis and mining, pp 509–514

  10. McDowell L, Aha D (2012) Semi-supervised collective classification via hybrid label regularization. In: Proceedings of the 29th international conference on machine learning, pp 975–982

  11. Shi X, Li Y, Yu P (2011) Collective prediction with latent graphs. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 1127–1136

  12. Bilgic M, Mihalkova L, Getoor L (2010) Active learning for networked data. In: Proceedings of the 27th international conference on machine learning, pp 79–86

  13. McDowell L, Aha D (2013) Labels or attributes? Rethinking the neighbors for collective classification in sparsely-labeled networks. In: Proceedings of the 22nd ACM international conference on information and knowledge management, pp 847–852

  14. Kumar A, Rai P, Daumé H (2011) Co-regularized multi-view spectral clustering. In: Advances in neural information processing systems, pp 1413–1421

  15. Liu J, Jiang Y, Li Z, Zhou Z, Lu H (2014) Partially shared latent factor learning with multiview data. IEEE Trans Neural Netw Learn Syst. doi:10.1109/TNNLS.2014.2335234

  16. Liu J, Wang C, Gao J, Han J (2013) Multi-view clustering via joint nonnegative matrix factorization. In: Proceedings of the SIAM international conference on data mining, pp 252–260

  17. Cai D, Wang X, He X (2009) Probabilistic dyadic data analysis with local and global consistency. In: Proceedings of the 26th annual international conference on machine learning, pp 105–112

  18. Paige C, Saunders M (1982) LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans Math Softw 8(1):43–71

  19. Jacob Y, Denoyer L, Gallinari P (2011) Classification and annotation in social corpora using multiple relations. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 1215–1220

  20. Greene D, Cunningham P (2013) Producing a unified graph representation from multiple social network views. In: Proceedings of the 5th annual ACM web science conference, pp 118–121

  21. Macskassy SA, Provost F (2007) Classification in networked data: a toolkit and a univariate case study. J Mach Learn Res 8:935–983

  22. Neville J, Jensen D (2000) Iterative classification in relational data. In: Proceedings of AAAI-2000 workshop on learning statistical models from relational data, pp 13–20

  23. McCallum A, Nigam K (1998) A comparison of event models for naive bayes text classification. In: Proceedings of the AAAI workshop on learning for text categorization, pp 41–48

Acknowledgments

This research was supported in part by NSFC under Grant Nos. 61572158, 61272538 and 61562027, Shenzhen Science and Technology Program under Grant No. JCYJ20140417172417128, Shenzhen Strategic Emerging Industries Program under Grant No. JCYJ20130329142551746 and Social Science Planning Project of Jiangxi Province under Grant No. 15XW12. Raymond Y.K. Lau’s work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project: CityU11502115), and the Shenzhen Municipal Science and Technology R&D Funding–Basic Research Program (Project No. JCYJ20140419115614350).

Author information

Corresponding author

Correspondence to Yunming Ye.

Appendix

Definition 1

\({\mathcal {U}}(\varTheta , {\varTheta }^{\prime })\) is an auxiliary function for \({\mathcal {O}}(\varTheta )\) if the following conditions are satisfied:

$$\begin{aligned} {\mathcal {U}}(\varTheta , {\varTheta }^{\prime }) \le {\mathcal {O}}(\varTheta ),\quad {\mathcal {U}}(\varTheta , {\varTheta }) = {\mathcal {O}}(\varTheta ) \end{aligned}$$
(15)

Lemma 1

If \({\mathcal {U}}\) is an auxiliary function of \({\mathcal {O}}\), then \({\mathcal {O}}\) is non-decreasing under the update

$$\begin{aligned} {\varTheta }^{r+1} = \mathop {{{\mathrm{argmax}}}}_{\varTheta }{{\mathcal {U}}(\varTheta , {\varTheta }^{r})} \end{aligned}$$
(16)

Proof

$$\begin{aligned} {\mathcal {O}}\big ({\varTheta }^{r+1}\big ) \ge {\mathcal {U}}(\varTheta ^{r+1}, {\varTheta }^{r}) \ge {\mathcal {U}}(\varTheta ^{r}, {\varTheta }^{r}) = {\mathcal {O}}({\varTheta }^{r}). \end{aligned}$$

\(\square \)
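Lemma 1 can be illustrated on a small problem that is unrelated to MARL itself. The sketch below assumes a toy objective, the log-likelihood of a single mixing weight `theta` between two fixed components, and uses the standard Jensen lower bound as the auxiliary function. The likelihood values in `P1` and `P2` and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-observation likelihoods under two fixed components.
P1 = [0.9, 0.8, 0.2, 0.7, 0.1]
P2 = [0.1, 0.3, 0.6, 0.2, 0.9]


def objective(theta):
    """Toy objective O(theta): log-likelihood of the mixing weight."""
    return sum(math.log(theta * a + (1 - theta) * b) for a, b in zip(P1, P2))


def responsibilities(theta_r):
    """Posterior weight of component 1 for each observation at theta_r."""
    return [theta_r * a / (theta_r * a + (1 - theta_r) * b)
            for a, b in zip(P1, P2)]


def auxiliary(theta, theta_r):
    """Jensen lower bound U(theta, theta_r) of the toy objective."""
    total = 0.0
    for a, b, g in zip(P1, P2, responsibilities(theta_r)):
        total += g * math.log(theta * a / g)
        total += (1 - g) * math.log((1 - theta) * b / (1 - g))
    return total


def mm_step(theta_r):
    """Closed-form argmax of U(., theta_r): the mean responsibility."""
    g = responsibilities(theta_r)
    return sum(g) / len(g)


def run(theta0=0.05, steps=20):
    """Iterate the update of Eq. (16) on the toy problem."""
    thetas = [theta0]
    for _ in range(steps):
        thetas.append(mm_step(thetas[-1]))
    return thetas


if __name__ == "__main__":
    for theta in run():
        print(f"theta={theta:.6f}  O(theta)={objective(theta):.6f}")
```

Printing the trace should show the objective value never decreasing from one iterate to the next, which is the behavior the lemma guarantees for any auxiliary function.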

Lemma 2

The function

$$\begin{aligned} {\mathcal {U}}(\varTheta , \varTheta ^r)=&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r \\&\times \log \frac{P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i \big )}{P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r} - \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is} \end{aligned}$$
(17)

is an auxiliary function for the objective function \({\mathcal {O}}(\varTheta )\) in Eq. (5), where

$$\begin{aligned} \varTheta = \{P(c_k|x_i), P\big (z_q^t|c_k,A_t\big ), P\big (w_j^t|z_q^t,A_t\big )\} \end{aligned}$$

is the set of parameters, and the estimates at iteration \(r\) and the corresponding posterior are

$$\begin{aligned} \varTheta ^{r} =&\ \{P(c_k|x_i)^{r}, P\big (z_q^t|c_k,A_t\big )^{r}, P\big (w_j^t|z_q^t,A_t\big )^{r}\},\\ P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r =&\ \frac{P\big (w_j^t|z_q^t,A_t\big )^r P\big (z_q^t|c_k,A_t\big )^r P\big (c_k |x_i \big )^r}{\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (w_j^t|z_q^t,A_t\big )^r P\big (z_q^t|c_k,A_t\big )^r P\big (c_k |x_i\big )^r}. \end{aligned}$$
(18)
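The posterior in Eq. (18) is a normalized product of the three conditional probabilities. As a minimal sketch, the function below evaluates it for one node \(x_i\) and one attribute \(w_j^t\) of a single view. The function name, the list layout of the arguments, and the example numbers are assumptions made here for illustration and are not taken from the paper.

```python
def posterior(p_w_given_z, p_z_given_c, p_c_given_x):
    """Posterior over (topic q, class k) for one node x_i and one attribute w_j.

    p_w_given_z[q]    = P(w_j | z_q, A_t) for the fixed attribute w_j
    p_z_given_c[q][k] = P(z_q | c_k, A_t)
    p_c_given_x[k]    = P(c_k | x_i)
    """
    joint = [
        [p_w_given_z[q] * p_z_given_c[q][k] * p_c_given_x[k]
         for k in range(len(p_c_given_x))]
        for q in range(len(p_w_given_z))
    ]
    norm = sum(sum(row) for row in joint)  # denominator of Eq. (18)
    return [[value / norm for value in row] for row in joint]


# Hypothetical numbers: 2 topics and 2 classes.
example = posterior(
    p_w_given_z=[0.5, 0.1],
    p_z_given_c=[[0.8, 0.3], [0.2, 0.7]],
    p_c_given_x=[0.6, 0.4],
)

if __name__ == "__main__":
    for q, row in enumerate(example):
        print(q, [round(value, 4) for value in row])
```

The entries of the returned table sum to one over all (topic, class) pairs, which is the property the Jensen step in the proof of Lemma 2 relies on.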

Proof

According to Jensen’s inequality, we have

$$\begin{aligned} \begin{aligned} {\mathcal {U}}(\varTheta , \varTheta ^r)=&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r \\&\times \log \frac{P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i \big )}{P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r} - \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is}\\ \le&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big ) \log \sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r\\&\frac{P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big ) P\big (c_k |x_i \big )}{P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r} - \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is} \\ =&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big ) \log \sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i\big ) \\&- \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is} \\ =\,&{\mathcal {O}}(\varTheta ), \end{aligned} \end{aligned}$$

It is also easy to verify that

$$\begin{aligned} \begin{aligned} {\mathcal {U}}(\varTheta , \varTheta )=&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big ) \\&\times \log \frac{P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i \big )}{P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )} - \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is}\\ =&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big ) \\&\times \log \sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i\big ) \\&- \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is} \\ =\,&{\mathcal {O}}(\varTheta ). \end{aligned} \end{aligned}$$

Hence, the result follows. \(\square \)
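The Jensen step used in this proof can be spelled out for a single triple \((x_i, w_j^t, A_t)\). The shorthand \(a_{qk}\) and \(\gamma _{qk}\) is introduced here only for illustration and does not appear in the original derivation.

$$\begin{aligned} a_{qk} =&\ P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i \big ),\quad \gamma _{qk} = P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r,\quad \sum _{k=1}^K\sum _{q=1}^{Q^t} \gamma _{qk} = 1,\\ \sum _{k=1}^K\sum _{q=1}^{Q^t} \gamma _{qk} \log \frac{a_{qk}}{\gamma _{qk}} \le&\ \log \sum _{k=1}^K\sum _{q=1}^{Q^t} \gamma _{qk}\, \frac{a_{qk}}{\gamma _{qk}} = \log \sum _{k=1}^K\sum _{q=1}^{Q^t} a_{qk}. \end{aligned}$$

The inequality holds because \(\log \) is concave and the weights \(\gamma _{qk}\) form a probability distribution. Equality holds when the ratio \(a_{qk}/\gamma _{qk}\) takes the same value for every pair \((q,k)\). This is the case for \(\varTheta ^r = \varTheta \), since Eq. (18) then gives \(\gamma _{qk} = a_{qk}/\sum _{k=1}^K\sum _{q=1}^{Q^t} a_{qk}\).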

Lemma 3

The update

$$\begin{aligned} {\varTheta }^{r+1} = \mathop {{{\mathrm{argmax}}}}_{\varTheta }{\mathcal {Q}}(\varTheta ) \end{aligned}$$
(19)

is equivalent to the update in Eq. (16), where \({\mathcal {Q}}(\varTheta )\) is the expected complete data log-likelihood in Eq. (7).

Proof

$$\begin{aligned} \begin{aligned} {\mathcal {U}}(\varTheta , \varTheta ^r)=&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r \\&\times \log \frac{P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i \big )}{P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r} - \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is}\\ =&\sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r \\&\times \log P\big (w_j^t|z_q^t,A_t\big )P\big (z_q^t|c_k,A_t\big )P\big (c_k |x_i\big ) \\&- \sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r\\&\quad \times \log P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r- \lambda \sum _{i=1}^N \sum _{s=1}^N D\big (P_i(c), P_s(c)\big ) E_{is}\\ =&{\mathcal {Q}}(\varTheta )- \sum _{i=1}^N \sum _{t=1}^T \sum _{j=1}^{M_t} n\big (x_i,w_j^t,A_t\big )\\&\sum _{k=1}^K\sum _{q=1}^{Q^t} P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r \log P\big (z_q^t,c_k|x_i,w_j^t,A_t\big )^r \\ \end{aligned} \end{aligned}$$

The second term depends only on \(\varTheta ^r\) and not on \(\varTheta \), so it can be treated as a constant. Thus, maximizing the expected complete data log-likelihood function \({\mathcal {Q}}(\varTheta )\) is equivalent to maximizing \({\mathcal {U}}(\varTheta , {\varTheta }^{r})\).

\(\square \)

Next, we give the proof of Theorem 1.

Proof

According to Lemma 2, \({\mathcal {U}}(\varTheta , \varTheta ^r)\) is an auxiliary function for \({\mathcal {O}}(\varTheta )\). By Lemma 1, iteratively maximizing \({\mathcal {U}}(\varTheta , \varTheta ^r)\) therefore never decreases \({{\mathcal {O}}}(\varTheta )\). Lemma 3 shows that maximizing \({\mathcal {U}}(\varTheta , \varTheta ^r)\) is equivalent to maximizing \({{\mathcal {Q}}}(\varTheta )\). As Eqs. (8) and (14) are the exact update rules for maximizing \({\mathcal {Q}}(\varTheta )\), \({\mathcal {O}}(\varTheta )\) is also non-decreasing under these two update rules. The result follows. \(\square \)
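The monotonicity argument can be checked numerically in a simplified setting. The sketch below is not the MARL update of Eqs. (8) and (14), which are not reproduced in this section. It assumes a single attribute view, no labeled nodes, and \(\lambda = 0\), so the regularizer vanishes and the standard closed-form EM updates apply. The count matrix `COUNTS` and all function names are hypothetical.

```python
import math
import random

# Hypothetical count matrix n(x_i, w_j): 4 nodes, 5 attribute values.
COUNTS = [[3, 1, 0, 0, 2], [2, 2, 1, 0, 0], [0, 0, 3, 2, 1], [0, 1, 2, 3, 0]]


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def log_likelihood(n, pw, pz, pc):
    """Objective with lambda = 0. pw[q][j]=P(w_j|z_q), pz[k][q]=P(z_q|c_k), pc[i][k]=P(c_k|x_i)."""
    total = 0.0
    for i, row in enumerate(n):
        for j, count in enumerate(row):
            if count:
                mix = sum(pw[q][j] * pz[k][q] * pc[i][k]
                          for q in range(len(pw)) for k in range(len(pz)))
                total += count * math.log(mix)
    return total


def em_step(n, pw, pz, pc):
    """One EM iteration: posterior of Eq. (18), then normalized expected counts."""
    Q, K, M = len(pw), len(pz), len(n[0])
    new_pw = [[0.0] * M for _ in range(Q)]
    new_pz = [[0.0] * Q for _ in range(K)]
    new_pc = [[0.0] * K for _ in n]
    for i, row in enumerate(n):
        for j, count in enumerate(row):
            if not count:
                continue
            joint = [[pw[q][j] * pz[k][q] * pc[i][k] for k in range(K)]
                     for q in range(Q)]
            norm = sum(map(sum, joint))
            for q in range(Q):
                for k in range(K):
                    r = count * joint[q][k] / norm  # expected count
                    new_pw[q][j] += r
                    new_pz[k][q] += r
                    new_pc[i][k] += r
    return ([normalize(v) for v in new_pw],
            [normalize(v) for v in new_pz],
            [normalize(v) for v in new_pc])


def run_em(steps=30, K=2, Q=3, seed=0):
    """Run EM from a random start and record the objective after each step."""
    rng = random.Random(seed)

    def rand_dist(size):
        return normalize([rng.random() + 0.1 for _ in range(size)])

    pw = [rand_dist(len(COUNTS[0])) for _ in range(Q)]
    pz = [rand_dist(Q) for _ in range(K)]
    pc = [rand_dist(K) for _ in COUNTS]
    history = [log_likelihood(COUNTS, pw, pz, pc)]
    for _ in range(steps):
        pw, pz, pc = em_step(COUNTS, pw, pz, pc)
        history.append(log_likelihood(COUNTS, pw, pz, pc))
    return history


if __name__ == "__main__":
    for step, value in enumerate(run_em()):
        print(step, round(value, 6))
```

In this reduced setting the recorded objective values should form a non-decreasing sequence, matching the statement of the theorem. With \(\lambda > 0\) the closed-form normalization of \(P(c_k|x_i)\) used here no longer applies, and the paper's own update rules would be needed.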

Cite this article

Wang, S., Ye, Y., Li, X. et al. Semi-supervised Collective Classification in Multi-attribute Network Data. Neural Process Lett 45, 153–172 (2017). https://doi.org/10.1007/s11063-016-9517-y

  • DOI: https://doi.org/10.1007/s11063-016-9517-y
