DOI: 10.1145/3366423.3380263

eDarkFind: Unsupervised Multi-view Learning for Sybil Account Detection

Published: 20 April 2020

    Abstract

    Darknet crypto markets are online marketplaces that use cryptocurrencies (e.g., Bitcoin, Monero) and advanced encryption techniques to offer anonymity to vendors and consumers trading illegal goods or services. The exact volume of substances advertised and sold through these crypto markets is difficult to assess, at least in part because vendors tend to maintain multiple accounts (Sybil accounts) within and across different crypto markets. Linking these accounts would allow the volume of substances advertised by each vendor across the different crypto markets to be evaluated accurately. In this paper, we present eDarkFind, a multi-view unsupervised framework that models vendor characteristics and facilitates Sybil account detection. We employ a multi-view learning paradigm to generalize and improve performance by exploiting diverse views from multiple rich sources, such as BERT, stylometric, and location representations. Our model is further tailored to take advantage of domain-specific knowledge, such as the Drug Abuse Ontology, to take substance information into consideration. We performed extensive experiments and demonstrated that the multiple views obtained from diverse sources can be effective in linking Sybil accounts. Our proposed eDarkFind model achieves an accuracy of 98% on three real-world datasets, which shows the generality of the approach.
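
    The multi-view idea in the abstract can be illustrated with a minimal sketch. This is not the eDarkFind implementation: the abstract does not specify how the views are combined, so the sketch swaps in the simplest possible fusion (normalize each view, concatenate, compare accounts by cosine similarity). The function names `fuse_views` and `link_accounts`, the 0.9 threshold, and the toy vectors standing in for text, stylometric, and location representations are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def fuse_views(views: Sequence[np.ndarray]) -> np.ndarray:
    """Normalize each view on its own, then concatenate per account.

    Per-view normalization keeps a high-magnitude view from dominating.
    """
    return np.hstack([l2_normalize(v) for v in views])


def link_accounts(
    views: Sequence[np.ndarray], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return account index pairs whose fused cosine similarity >= threshold.

    The threshold is a hypothetical value chosen for this toy example.
    """
    fused = l2_normalize(fuse_views(views))
    sim = fused @ fused.T  # cosine similarity between all account pairs
    n = sim.shape[0]
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i, j] >= threshold
    ]


# Toy data: four accounts, three views (one row per account in each view).
# Accounts 0 and 2 are constructed to be near-duplicates in every view.
text_view = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.05], [-1.0, 0.0]])
style_view = np.array([[0.2, 0.8], [0.9, 0.1], [0.25, 0.75], [0.5, -0.5]])
location_view = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

links = link_accounts([text_view, style_view, location_view])
print(links)  # [(0, 2)]
```

    With equal-norm views, the fused cosine similarity is the average of the per-view cosine similarities, so a pair must agree across most views to be linked. A correlation-based fusion would replace `fuse_views` while leaving the linking step unchanged.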

            Published In

            WWW '20: Proceedings of The Web Conference 2020
            April 2020
            3143 pages
            ISBN: 9781450370233
            DOI: 10.1145/3366423
            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Author Tags

            1. Correlation Analysis
            2. Darknet Market
            3. Drug Trafficker Identification
            4. Multi-view Learning
            5. Stylometry
            6. Sybil Detection

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            WWW '20: The Web Conference 2020
            April 20 - 24, 2020
            Taipei, Taiwan

            Acceptance Rates

            Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

            Cited By

            • (2024) Detecting Substance Use Disorder Using Social Media Data and the Dark Web: Time- and Knowledge-Aware Study. JMIRx Med 5 (e48519). DOI: 10.2196/48519. Online publication date: 1-May-2024
            • (2023) Forensic investigation of the dark web on the Tor network: pathway toward the surface web. International Journal of Information Security 23:1 (331-346). DOI: 10.1007/s10207-023-00745-4. Online publication date: 22-Aug-2023
            • (2023) Link Prediction-Based Multi-Identity Recognition of Darknet Vendors. Information and Communications Security (317-332). DOI: 10.1007/978-981-99-7356-9_19. Online publication date: 20-Oct-2023
            • (2022) CAMul: Calibrated and Accurate Multi-view Time-Series Forecasting. Proceedings of the ACM Web Conference 2022 (3174-3185). DOI: 10.1145/3485447.3512037. Online publication date: 25-Apr-2022
            • (2022) Identification of Chinese dark jargons in Telegram underground markets using context-oriented and linguistic features. Information Processing and Management: an International Journal 59:5. DOI: 10.1016/j.ipm.2022.103033. Online publication date: 1-Sep-2022
            • (2021) “When they say weed causes depression, but it’s your fav antidepressant”: Knowledge-aware attention framework for relationship extraction. PLOS ONE 16:3 (e0248299). DOI: 10.1371/journal.pone.0248299. Online publication date: 25-Mar-2021
            • (2021) Assessing the Severity of Health States based on Social Media Posts. 2020 25th International Conference on Pattern Recognition (ICPR) (5728-5735). DOI: 10.1109/ICPR48806.2021.9411980. Online publication date: 10-Jan-2021
            • (2021) Hidden Buyer Identification in Darknet Markets via Dirichlet Hawkes Process. 2021 IEEE International Conference on Big Data (Big Data) (581-589). DOI: 10.1109/BigData52589.2021.9671406. Online publication date: 15-Dec-2021
            • (2020) Drug Abuse Ontology to Harness Web-Based Data for Substance Use Epidemiology Research: Ontology Development Study (Preprint). JMIR Public Health and Surveillance. DOI: 10.2196/24938. Online publication date: 10-Oct-2020
