Abstract: Plagiarism is a growing problem. It is widely regarded as literary theft and academic dishonesty, and it should be avoided at all costs. This paper presents a survey of plagiarism detection tools and outlines the common characteristics of several detection systems.
I. INTRODUCTION
Plagiarism, typically characterized as literary theft, is one of the growing challenges facing publishers, researchers, and educational institutions worldwide. It involves presenting someone else's ideas, papers, code, images, and other work as one's own. Such behaviour constitutes academic and literary dishonesty and must therefore be avoided.
Plagiarism in academic documents occurs most frequently among students, particularly undergraduates and postgraduates who modify existing work and submit it as their own. This undermines the assessor's ability to evaluate a student's actual performance, so plagiarism must be detected in documents at an early stage. Systems that may be utilised for this purpose include:
1. Web-enabled systems
2. Stand-alone systems
Feature-based techniques [1][3] use software metrics to create a feature vector from the input program, which can be mapped to an n-dimensional point in Cartesian space. The distance between the points indicates how similar two programs are to one another. Lutz Prechelt et al. [8] have noted that feature-based approaches do not take important structural information about the programs into account, and that adding more metrics to the comparison does not improve accuracy.
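To illustrate the idea, the following Python sketch maps two programs to small metric vectors and compares them by Euclidean distance. The particular metrics and the tokenization used here are assumptions made for the example and are not the exact feature set of [1][3].

import math
import re

def feature_vector(source):
    """Map a program to a small metric vector (illustrative metrics only):
    lines of code, total tokens, vocabulary size, operator/punctuation count."""
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", source)
    operators = [t for t in tokens if not re.match(r"[A-Za-z_]\w*|\d", t)]
    return [
        float(len(source.splitlines())),  # lines of code
        float(len(tokens)),               # total token count
        float(len(set(tokens))),          # vocabulary size
        float(len(operators)),            # operator/punctuation occurrences
    ]

def distance(p, q):
    """Euclidean distance between two programs in metric space:
    a smaller distance suggests more similar programs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

prog_a = "int add(int a, int b) { return a + b; }"
prog_b = "int sum(int x, int y) { return x + y; }"
print(distance(feature_vector(prog_a), feature_vector(prog_b)))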
Alan Parker et al. [4] provide an overview of algorithms that can be used to detect plagiarism. The algorithm examined in their study is based on string comparison: it removes comments, blank lines, and white space, compares the resulting strings, and keeps track of the proportion of characters that are the same.
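A minimal sketch of this style of string comparison is shown below. The comment syntax handled and the position-by-position counting of matching characters are illustrative assumptions rather than the exact procedure of [4].

import re

def normalize(source):
    """Strip C/C++-style comments and all white space (illustrative only)."""
    no_comments = re.sub(r"//[^\n]*|/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"\s+", "", no_comments)

def char_match_ratio(a, b):
    """Proportion of positions holding the same character, relative to the
    longer of the two normalized strings."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))

print(char_match_ratio("int x = 1; // counter", "int  x = 1;  /* counter */"))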
The authors also discussed six different levels of plagiarism and provided examples for each level. Their algorithms were built on Halstead's metric theories, which are closely related to software metrics. Although this is an older work, it showed that the detection of textual plagiarism can be automated, saving time and resources.
Plagiarism in student assignments has caused considerable problems for assessors, which led Michael J. Wise [5] to devise YAP3, the third version of YAP, which generally operates in two phases. In the first phase the source is tokenized: comments and string constants are removed, tokens that are not reserved words are discarded, uppercase letters are converted to lowercase, synonyms are mapped to a common form, and functions are rearranged into their calling order. The second phase uses Running-Karp-Rabin Greedy-String-Tiling (RKR-GST), a matching algorithm developed after observing YAP and other detection systems, which can also detect transposed subsequences. The paper also reports that applying YAP3 to English texts was successful.
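The matching phase can be illustrated with a simplified Greedy String Tiling sketch in Python. It captures the greedy tiling idea over token streams, but it omits the Karp-Rabin hashing that gives RKR-GST its speed, and the example token streams are invented for illustration.

def greedy_string_tiling(a, b, min_match=3):
    """Simplified Greedy String Tiling over two token lists: repeatedly take
    the longest common unmarked substring as a tile until no match of at
    least min_match tokens remains (Karp-Rabin hashing omitted for clarity)."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        best_len = min_match - 1
        best = []  # (i, j, length) matches of the current maximal length
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best = k, [(i, j, k)]
                elif k == best_len and k >= min_match:
                    best.append((i, j, k))
        if not best:
            break
        for i, j, k in best:
            # skip matches that now overlap a tile marked earlier in this pass
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for offset in range(k):
                marked_a[i + offset] = True
                marked_b[j + offset] = True
            tiles.append((i, j, k))
    return tiles

def similarity(a, b, tiles):
    """Similarity as the fraction of tokens covered by tiles."""
    covered = sum(length for _, _, length in tiles)
    return 2 * covered / (len(a) + len(b))

tokens_a = "BEGIN ASSIGN ID NUM ASSIGN ID ID END".split()
tokens_b = "BEGIN ASSIGN ID ID ASSIGN ID NUM END".split()
tiles = greedy_string_tiling(tokens_a, tokens_b)
print(tiles, similarity(tokens_a, tokens_b, tiles))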
A. Collection of assignments: All the assignments or documents are collected in electronic format so that plagiarism can be detected efficiently.
B. Pre-processing: Pre-processing is a major step in which all the assignments are converted into a single, appropriate format. Numbers, figure values, pictures, and everything else outside the a-z character set are removed from the documents.
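A minimal pre-processing sketch along these lines is shown below; lower-casing the text before filtering and treating runs of removed characters as word boundaries are assumptions made for the example.

import re

def preprocess(text):
    """Keep only a-z words: lower-case the text, replace every character
    outside a-z (digits, punctuation, figure values, etc.) with a space,
    and collapse repeated spaces."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z]+", " ", lowered)
    return re.sub(r"\s+", " ", letters_only).strip()

print(preprocess("Figure 3: Accuracy was 98.2% (see Table 1)."))
# -> "figure accuracy was see table"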
C. Classification: Text classification is performed to split each sentence into its constituent words, which makes it possible to identify the keywords of a sentence.
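One simple way to realise this step is to split each pre-processed sentence into words and drop common stop words, keeping the rest as keywords. The small stop-word list below is an assumed example, not one prescribed here.

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in",
              "on", "and", "or", "to", "for", "with", "this", "that"}

def keywords(sentence):
    """Split a pre-processed sentence into words and keep only the
    non-stop-words as candidate keywords."""
    return [w for w in sentence.split() if w not in STOP_WORDS]

print(keywords("plagiarism is the act of presenting the work of others"))
# -> ['plagiarism', 'act', 'presenting', 'work', 'others']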
D. Text analysis: The data is then passed through a text-analysis step, which can be repeated as needed. Different text-analysis techniques can be applied depending on the nature of the text and the aims of the institution.
E. Processing and analyzing the tri-grams: Every sequence of three successive words in a line is treated as a tri-gram, and tri-grams are generated for the whole collection of assignments.
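A minimal sketch of tri-gram extraction is given below, assuming tri-grams are generated line by line from the pre-processed text.

def line_trigrams(line):
    """Return every sequence of three successive words in a line as a tri-gram."""
    words = line.split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

print(line_trigrams("students copy text from online sources"))
# -> [('students', 'copy', 'text'), ('copy', 'text', 'from'),
#     ('text', 'from', 'online'), ('from', 'online', 'sources')]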
F. Similarity measures: The sequences of tri-grams created from the processed documents are then compared using sequence-comparison methods.
G. Clustering the plagiarized data: Clusters are created from the similar tri-grams in order to calculate the similarity score; clustering simplifies the calculations and accelerates the process.
H. Similarity score: The similarity score is computed from the clusters of similar tri-grams and expressed as a percentage; a higher percentage indicates greater similarity. A minimal sketch of the tri-gram comparison and scoring (steps F-H) is given below.
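The following sketch of steps F-H intersects the tri-gram sets of two documents, treats the shared tri-grams as the cluster of matching material, and reports the score as a percentage. Using a Jaccard-style ratio of shared to total tri-grams is an assumption for the example rather than the exact measure described above.

def trigram_set(text):
    """Collect the tri-grams of every line of a pre-processed document."""
    grams = set()
    for line in text.splitlines():
        words = line.split()
        grams.update(tuple(words[i:i + 3]) for i in range(len(words) - 2))
    return grams

def similarity_percent(doc_a, doc_b):
    """Percentage similarity: shared tri-grams over the union of tri-grams.
    A higher percentage indicates a higher similarity score."""
    a, b = trigram_set(doc_a), trigram_set(doc_b)
    if not a or not b:
        return 0.0
    shared = a & b  # the cluster of matching tri-grams
    return 100.0 * len(shared) / len(a | b)

doc1 = "students copy text from online sources\nthis must be detected early"
doc2 = "students copy text from printed sources\nthis must be detected early"
print(f"{similarity_percent(doc1, doc2):.1f}% similar")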
According to other research papers and data from other sources, techniques that make use of linguistic frameworks produce results that are approximately 98% accurate.
VI. REFERENCES
[1] Keerthana T V, Pushti Dixit, Rhuthu Hegde, Sonali S K, Prameetha Pai, “A Literature Review on Plagiarism Detection in Computer Programming Assignments”, IRJET, Volume 9, Issue 3, March 2022, Pages 1-4.
[2] Joseph L.F. De Kerf, “APL and Halstead's theory of software metrics”, APL '81: Proceedings of the international conference
on APL, October 1981, Pages 89–93.
[3] John L Donaldson, Ann Marie Lancaster and Paula H Sposato, “A plagiarism detection system”, SIGCSE '81: Proceedings of
the twelfth SIGCSE technical symposium on Computer science education, February 1981, Pages 21–25.
[4] Alan Parker and James O. Hamblen, “Computer Algorithms for Plagiarism Detection”, IEEE Transactions On Education, Vol.
32, No. 2. May 1989.
[5] Michael J Wise, “YAP3: improved detection of similarities in computer program and other texts”, SIGCSE '96: Proceedings of
the twenty-seventh SIGCSE technical symposium on Computer science education, March 1996, Pages 130–134.
[6] Saul Schleimer, Daniel S. Wilkerson and Alex Aiken, “Winnowing: Local Algorithms for Document Fingerprinting”, SIGMOD 2003, June 9-12, 2003, San Diego, CA.
[7] Richard M. Karp and Michael O. Rabin, “Efficient randomized pattern-matching algorithms”, Published in: IBM Journal of
Research and Development (Volume: 31, Issue: 2, March 1987), Page(s): 249 - 260.
[8] Lutz Prechelt and Guido Malpohl, “Finding Plagiarisms among a Set of Programs with JPlag”, Journal of Universal Computer Science, 8(11), March 2003.
[9] Sven Meyer zu Eissen and Benno Stein, “Intrinsic Plagiarism Detection”, M. Lalmas et al. (Eds.): ECIR 2006, LNCS 3936, pp.
565–569, 2006.
[10] Liang Zhang, Yue-ting Zhuang and Zhen-ming Yuan, “A Program Plagiarism Detection Model Based on Information Distance and Clustering”, The 2007 International Conference on Intelligent Pervasive Computing (IPC 2007), IEEE, Print ISBN: 978-0-7695-3006-2.