DOI: 10.1145/3110025.3110128
research-article

Collective classification in social networks

Published: 31 July 2017

    Abstract

    Classification is one of the most studied subjects in machine learning. Most classification methods developed over the last decade account for either structure (interactions, relationships) or attributes (text, numerical values, etc.), but not both. They therefore miss significant patterns in a dataset that can only be captured by analyzing the features of an item together with its interactions. Collective classification methods use both structure and attributes, often by aggregating data from the neighbors of a node and learning a model on the aggregated data. In social networks, the degree distribution of nodes follows a power law: a few nodes have many neighbors, and many nodes have very few edges. High-degree nodes have incoming links from low-degree nodes of different classes. Hence, using only local structure may lead to poor predictions. In addition, many social networks allow for different types of interactions (retweet, reply, like, etc.) that affect classification differently. This article proposes a collective classification method that uses the structure of a network to determine the neighbors of each node. It then presents experiments aimed at detecting jihadi propagandists and malware distributors on social networks.
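
    The aggregation step that the abstract attributes to collective classification methods in general can be illustrated with a minimal sketch. It is not the method proposed in the article: the function name `aggregate_features`, the toy graph, the feature values, and the choice of a per-interaction-type mean are all assumptions made for illustration. Each node's own attributes are concatenated with the mean attribute vector of its neighbors, computed separately for each interaction type, so that a downstream classifier could weigh retweets, replies, and likes differently.

```python
from collections import defaultdict


def aggregate_features(node, features, edges, edge_types):
    """Concatenate a node's own features with the mean feature vector of its
    neighbors, computed separately for each interaction type.

    Hypothetical helper for illustration; edges are treated as undirected.
    """
    dim = len(features[node])
    sums = {t: [0.0] * dim for t in edge_types}
    counts = defaultdict(int)

    for src, dst, etype in edges:
        # Pick the endpoint opposite to `node`; skip edges not touching it.
        if src == node:
            neighbor = dst
        elif dst == node:
            neighbor = src
        else:
            continue
        for i, value in enumerate(features[neighbor]):
            sums[etype][i] += value
        counts[etype] += 1

    vector = list(features[node])
    for t in edge_types:
        n = counts[t]
        # A node with no neighbors of type t contributes a zero block.
        vector.extend(s / n if n else 0.0 for s in sums[t])
    return vector


# Toy data (assumed): four users with two numeric attributes each.
features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [0.0, 0.0],
}
# Each edge is (source, destination, interaction type).
edges = [
    ("a", "b", "retweet"),
    ("a", "c", "retweet"),
    ("a", "d", "reply"),
    ("b", "c", "like"),
]
edge_types = ["retweet", "reply", "like"]

aggregated = {
    node: aggregate_features(node, features, edges, edge_types)
    for node in features
}
print(aggregated["a"])
```

    The resulting vectors have a fixed length (own attributes plus one block per interaction type) and could be passed to any standard classifier. Note that a plain mean over direct neighbors is exactly the kind of purely local aggregation the abstract warns may perform poorly when degrees are highly skewed.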

    Cited By

    • (2019) Collective Classification for Social Opinion Spam Detection. Proceedings of the 2019 2nd International Conference on Data Science and Information Technology, pp. 181-186. DOI: 10.1145/3352411.3352440. Online publication date: 19-Jul-2019
    • (2018) Multimodality Fusion for Node Classification in D2D Communications. IEEE Access, vol. 6, pp. 63748-63756. DOI: 10.1109/ACCESS.2018.2877715. Online publication date: 2018

    Published In

    ASONAM '17: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
    July 2017
    698 pages
    ISBN:9781450349932
    DOI:10.1145/3110025

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Semi-supervised classification
    2. collective classification
    3. multi-layer networks
    4. social networks
    5. surveillance
    6. unbalanced data

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ASONAM '17
    Acceptance Rates

    Overall Acceptance Rate 116 of 549 submissions, 21%
