Algorithms and Corpora for Persian Plagiarism Detection

Overview of PAN at FIRE 2016

  • Conference paper
Text Processing (FIRE 2016)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10478))

Abstract

The task of plagiarism detection is to find passages of reused text in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 to promote the comparative assessment of NLP techniques for plagiarism detection, with a special focus on plagiarism in a Persian text corpus. The goal of this shared task is to bring together researchers and practitioners working on plagiarism detection and text-reuse detection. We report on the outcome of the shared task, which is divided into two subtasks: text alignment and corpus construction. Nine teams participated in the first subtask, and the best result achieved was a PlagDet score of 0.92. For the second subtask, corpus construction, five teams each submitted a corpus; these corpora were evaluated using the systems submitted for the first subtask. The results show that significant challenges remain in evaluating newly constructed corpora.
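
The abstract reports a PlagDet score without defining it. As a minimal sketch, assuming the definition commonly used in the PAN evaluation framework (Potthast et al., 2010), PlagDet divides the F1 measure of precision and recall by log2(1 + granularity), where granularity is at least 1 and penalizes a detector for reporting one plagiarism case as several fragments. The function name and the input values below are illustrative and are not taken from the paper.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Combine precision, recall, and granularity into one score.

    Assumed definition: F1(precision, recall) / log2(1 + granularity).
    A granularity of 1 means every case was reported as a single
    detection, so the score then equals plain F1.
    """
    if granularity < 1.0:
        raise ValueError("granularity is at least 1 by definition")
    if precision + recall == 0.0:
        return 0.0  # nothing detected correctly; avoid division by zero
    f1 = 2.0 * precision * recall / (precision + recall)
    return f1 / math.log2(1.0 + granularity)


# Hypothetical values for illustration only.
unfragmented = plagdet(precision=0.9, recall=0.9, granularity=1.0)
fragmented = plagdet(precision=0.9, recall=0.9, granularity=3.0)
print(unfragmented, fragmented)
```

With these made-up inputs, the second call scores half as much as the first, because log2(1 + 3) = 2: identical precision and recall are penalized when each case is split into three detections on average.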

Notes

  1. http://ictrc.ac.ir/plagdet

Acknowledgments

This work has been funded by the ICT Research Institute, ACECR, with partial support from the Vice Presidency for Science and Technology of Iran (Grant No. 1164331). The work of Paolo Rosso has been partially funded by the SomEMBED MINECO TIN2015-71147-C2-1-P research project and by the Generalitat Valenciana under the grant ALMAMATER (PrometeoII/2014/030). We thank the participants of the competition for their dedicated work. Our special thanks go to the renowned experts who served on the organizing committee for their contributions and devoted work in making this shared task possible. We thank Javad Rafiei and Khadijeh Khoshnava for their help in constructing the evaluation corpus. We are also immensely grateful to Vahid Zarrabi for his comments and valuable help along the way, which greatly assisted this challenging shared task.

Author information

Corresponding author

Correspondence to Omid Fatemi.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Asghari, H., Mohtaj, S., Fatemi, O., Faili, H., Rosso, P., Potthast, M. (2018). Algorithms and Corpora for Persian Plagiarism Detection. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_5

  • DOI: https://doi.org/10.1007/978-3-319-73606-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73605-1

  • Online ISBN: 978-3-319-73606-8

  • eBook Packages: Computer Science, Computer Science (R0)
