Research article, DOI: 10.1145/3503823.3503828

AMiner Citation-Data Preprocessing for Recommender Systems on Scientific Publications

Published: 22 February 2022

Abstract

Recommender Systems (RS) help users find items of interest among the huge amounts of digital information now commonly called Big Data, with the aim of making valuable personalized recommendations. These systems use data from online digital libraries to train, test, and evaluate the system's efficiency. In this setting, data preprocessing is an essential step: it achieves information-preserving data reduction and produces input files in the format an RS requires. This paper describes our approach to preprocessing a dataset of scientific publications (Computer Science) from AMiner (https://www.aminer.org/). The proposed approach consists of two phases: creating a collection of articles based on user preferences, and preprocessing this collection. The experimental results demonstrate the value of our approach, with at least 79.8% information-preserving data reduction.
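
The two phases and the reduction figure can be illustrated with a minimal Python sketch. The abstract does not specify the filtering rules, so everything here is an assumption made for illustration: the record fields (`id`, `title`, `abstract`), the function names, the rule that phase 1 keeps only articles some user has expressed a preference for, the rule that phase 2 drops records with an empty title or abstract, and the toy data. The reduction is computed as the share of original records removed, which is one plausible reading of "data reduction".

```python
from typing import Dict, Iterable, List, Tuple

Article = Dict[str, str]  # hypothetical record layout: id, title, abstract


def build_collection(preferences: Iterable[Tuple[str, str]],
                     articles: Iterable[Article]) -> List[Article]:
    """Phase 1 (assumed rule): keep articles that appear in some (user, article_id) pair."""
    wanted = {article_id for _user, article_id in preferences}
    return [a for a in articles if a["id"] in wanted]


def preprocess(collection: Iterable[Article]) -> List[Article]:
    """Phase 2 (assumed rule): normalize whitespace, drop records with empty title or abstract."""
    cleaned = []
    for a in collection:
        title = " ".join(a.get("title", "").split())
        abstract = " ".join(a.get("abstract", "").split())
        if title and abstract:
            cleaned.append({"id": a["id"], "title": title, "abstract": abstract})
    return cleaned


def reduction_percent(original_count: int, kept_count: int) -> float:
    """Percentage of the original records removed by the pipeline."""
    if original_count <= 0:
        raise ValueError("original_count must be positive")
    if not 0 <= kept_count <= original_count:
        raise ValueError("kept_count must be between 0 and original_count")
    return 100.0 * (original_count - kept_count) / original_count


# Toy data, invented for this example only.
articles = [
    {"id": "a1", "title": "Graph  mining", "abstract": "Mining large graphs."},
    {"id": "a2", "title": "Stemming", "abstract": "Suffix stripping rules."},
    {"id": "a3", "title": "No abstract", "abstract": "   "},
    {"id": "a4", "title": "Unreferenced", "abstract": "Nobody selected this."},
]
preferences = [("u1", "a1"), ("u1", "a2"), ("u2", "a2"), ("u2", "a3")]

collection = build_collection(preferences, articles)
cleaned = preprocess(collection)
print(len(collection), len(cleaned), reduction_percent(len(articles), len(cleaned)))
```

Keeping the two phases as separate functions lets the size of the collection be measured after each step, so the contribution of each phase to the overall reduction can be reported separately.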


Cited By

  • (2024) An academic recommender system on large citation data based on clustering, graph modeling and deep learning. Knowledge and Information Systems 66:8 (4463-4496). DOI: 10.1007/s10115-024-02094-7. Online publication date: 1-Aug-2024
  • (2023) Academic Recommender Systems, Status, Challenges and Opportunities. 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA) (1-8). DOI: 10.1109/IISA59645.2023.10345971. Online publication date: 10-Jul-2023
  • (2022) Embedding Representation of Academic Heterogeneous Information Networks Based on Federated Learning. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS) (516-520). DOI: 10.1109/CCIS57298.2022.10016355. Online publication date: 26-Nov-2022
  • (2022) Hyper-parameters Tuning of Artificial Neural Networks: An Application in the Field of Recommender Systems. New Trends in Database and Information Systems (266-276). DOI: 10.1007/978-3-031-15743-1_25. Online publication date: 29-Aug-2022


            Published In

            PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics
            November 2021
            499 pages

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Author Tags

            1. AMiner citation network dataset
            2. data preprocessing
            3. scientific publications

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            PCI 2021

            Acceptance Rates

            Overall Acceptance Rate 190 of 390 submissions, 49%
