
THE UNIVERSITY OF DODOMA

COLLEGE OF INFORMATICS AND VIRTUAL EDUCATION

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

FINAL YEAR PROGRESS ACADEMIC REPORT

YEAR: 2023/2024

TITLE: ONLINE ACADEMIC PLAGIARISM CHECKER


GROUP MEMBERS
STUDENT’S NAME REGISTRATION NUMBER PROGRAMME

PERIS HIYOUB T21-03-01602 BSc-CS

SAMSON MZAVA T21-03-10564 BSc-CS

CHARLES MAHENGE T21-03-05695 BSc-CS

YOHANA SANGA T21-03-12266 BSc-CS

RIDHIWANI MBAZO T21-03-05193 BSc-CS

NAME OF SUPERVISOR SIGNATURE

Dr. DEWA ………

Contents
LIST OF FIGURES AND TABLES
CHAPTER ONE: INTRODUCTION
1.1 Project Overview
1.3 Objectives
1.3.1 Main objective
1.3.2 Specific objective
1.4 Project Significance
1.5 Project Scope
CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Definition of Key Terms
2.3 Theoretical Framework
2.4 Related (Similar) Work
2.5 Innovation/Research Gap
CHAPTER THREE: METHODOLOGY
3.1 Introduction
3.2 Research Approach
3.3 Research Method
3.4 Study Area/Location
3.5 Data Collection
3.5.1 Data Collection Techniques/Methods
3.5.2 Data Collection Tools
3.6 System Requirements
3.7 System Architecture
3.7.1 Logical Architecture
3.7.2 Physical Architecture
3.8 System Implementation
3.8.1 Coding
3.8.2 Testing/Evaluation
3.9 System Requirements
3.9.1 Hardware Requirements
3.9.2 Software Tools Requirements
CHAPTER FOUR: PROJECT ACTIVITIES AND MILESTONES (WORK DONE)

LIST OF FIGURES AND TABLES
Figure 1: Comparison Table
Figure 2: Agile Method
Figure 3: Flow Chart for Instructor
Figure 4: Flow Chart for Student
Figure 5: Context Diagram
Figure 6: Use Case
Figure 7: Level 0 Diagram
Figure 8: System Architecture
CHAPTER ONE
INTRODUCTION

1.1 Project Overview


The Online Academic Plagiarism Checker is a web-based system that detects similarities
between texts and reports the percentage of plagiarism. Plagiarism is a serious issue that
needs to be controlled and monitored. It refers to the act of copying someone else’s work
and presenting it as one's own original work. Plagiarism is commonly done by paraphrasing
sentences, using similar keywords, changing the form of sentences, and so on. In this
sense, plagiarism is like theft of intellectual property. This plagiarism detector uses
data mining methods and machine learning algorithms.
In this plagiarism checker, users register with their basic registration details and
create a valid login ID and password. Using these credentials, students log in to their
personal accounts. A student can then upload an assignment or IPT report file, which the
system divides into content and reference links. The web application processes the
content, visits each reference link, and scans the content of that webpage to match it
against the submission. It also compares the submission with existing works that have
already been submitted, to check the authenticity of the content. Students can also view
the history of their previously submitted documents.
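
The step that divides an uploaded file into content and reference links could look roughly like the sketch below. It assumes the file has already been converted to plain text and that references appear as http(s) URLs; the function name `split_content_and_links` and the regular expression are illustrative assumptions, not the project's actual code.

```python
import re

# Assumption: reference links appear as plain http(s) URLs in the extracted text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def split_content_and_links(text):
    """Return (content_without_links, list_of_unique_links)."""
    links = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    # Remove the links from the body and collapse the leftover whitespace.
    content = re.sub(r"\s+", " ", URL_PATTERN.sub(" ", text)).strip()
    return content, links


report = "Plagiarism is theft. See https://example.org/a, and https://example.org/b."
content, links = split_content_and_links(report)
```

Each extracted link would then be fetched and its page text compared with `content`; the fetching step is left out here because it depends on the crawler the project chooses.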
1.2 Problem Statement
A serious challenge facing many academic institutions today, UDOM included, is
plagiarism, which undermines the value of original work. The most common forms of
plagiarism include paraphrasing, indirect copying, copying from colleagues, and copying
from text generators; this project focuses mainly on students' assignments and IPT projects.
Detecting such plagiarism will help ensure academic integrity and fair competition, and will
leave students, educators and professionals less vulnerable to plagiarism and its consequences.

1.3 Objectives

1.3.1 Main objective.


The main objective of this project is to develop a robust system that provides
plagiarism detection services utilizing advanced data mining techniques, Natural Language
Processing (NLP) and Machine Learning (ML) algorithms. This initiative aims to uphold
academic integrity and enhance the quality of education.

1.3.2 Specific objective.


1. To perform System Requirement Gathering
2. To design system model and information flow
3. To develop and implement a plagiarism detection software

1.4 Project Significance


This project is significant in the following ways:

• Firstly, the Online Academic Plagiarism Checker plays a pivotal role in
upholding academic integrity. By ensuring that students submit original work,
the tool becomes a guardian of honest learning and evaluation processes,
thereby contributing substantially to the overall credibility of academic
institutions. This emphasis on integrity not only fosters a culture of trust but
also reinforces the fundamental values of education.

• Secondly, the project directly impacts the quality of education. Plagiarism
poses a significant threat to the educational experience by discouraging
critical thinking and authentic research. The proposed plagiarism checker is
positioned as a solution to this challenge, actively encouraging students to
engage in independent research. By fostering a culture of intellectual
curiosity and knowledge exploration, the tool becomes an ally in enhancing
the overall quality of education.

1.5 Project Scope
The scope of this AI-based plagiarism detector for academic works is to identify and
flag any instances of plagiarism in the submitted works. The detector uses artificial
intelligence to compare the submitted works with a vast database of previously
published works, academic papers, and other sources to identify any similarities. The
plagiarism detector can be used by students and instructors to verify the originality and
authenticity of a text. The detector can also help prevent academic misconduct and
protect academic integrity.

CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
This chapter focuses on plagiarism detection software. We'll look into how it
works, what it can and can't do, and how it's changing to keep things original. We'll
check out the ethics and technical side, exploring algorithms, big databases, and the
small ways plagiarism can happen. By studying current research, we'll see how these
tools affect school and work, and what it means for staying original in the digital world.
The goal is not just to point out problems but also to start a conversation about using
these tools responsibly and ethically.
2.2 Definition of Key Terms
Machine learning - is a type of artificial intelligence that enables computers to learn and
improve from experience without being explicitly programmed. It involves the development
of algorithms that allow systems to automatically learn and make predictions or decisions
based on data.

Artificial Intelligence - refers to the simulation of human intelligence in machines that are
programmed to think and learn. AI systems can perform tasks that typically require human
intelligence, such as problem-solving, speech recognition, and decision-making.

Plagiarism - is the act of using someone else's work, ideas, or intellectual property without
proper attribution or permission, presenting it as one's own. It is considered a form of
academic or intellectual dishonesty.

Plagiarism checker - is a tool or software designed to detect and identify instances of
plagiarism in a given text. It compares the submitted content against a database of sources to
highlight similarities and help prevent academic or professional misconduct.

Database of sources - is the collection of reference material, such as previously published
works, academic papers, and previously submitted documents, against which a plagiarism
checker compares submitted content.

Text match algorithm - is a set of rules or procedures used to identify similarities between
pieces of text. It compares one piece of text to another, highlighting matches or similarities
based on predefined criteria. Text match algorithms are commonly used in plagiarism
checkers to identify and flag potential instances of plagiarism.

2.3 Theoretical Framework


The project “Online Academic Plagiarism Checker” revolves around employing
advanced machine learning techniques to detect plagiarism in student work, for
example assignments and field reports. The project aims to develop a system that produces
accurate results.

2.4 Related (Similar) Work


A. Plagiarism Detection in Computer Programming Using Feature Extraction From Ultra-Fine-Grained
Repositories:

• Authors: Vedran Ljubovic and Enil Pajic

• Year: 2020

• Key idea: Tracks student activities in a cloud-based IDE to create "developer profiles" for
plagiarism detection and identifying students needing extra attention.

• Strengths:

o Ultra-fine-grained code repository captures detailed changes.

o Applicable to other educational purposes beyond plagiarism.

o Compatible with existing IDEs.

• Weaknesses:

o Relies on student consent and a cloud-based environment.

o Requires analysis of high-resolution logs using machine learning.

B. Online Assignment plagiarism checking using Data mining & NLP:

• Authors: Taresh Bokade, Tejas Chede, Dhanashri Kuwar, Prof. Rasika Shintre

• Year: 2021

• Key idea: Combines text mining and NLP to detect plagiarism in submitted assignments,
including paraphrased content and semantic similarities.

• Strengths:

o User-friendly web interface with login and document upload.

o Compares document content with uploaded reference links and webpages.

o Checks for grammar mistakes and semantic plagiarism.

• Weaknesses:

o May not handle advanced plagiarism techniques like code reuse or structural
mimicking.

o Relies on accurate reference links and website crawling.

Comparison and Insights:

Both studies offer valuable approaches to plagiarism detection. Here's a comparison:

Feature | Ljubovic & Pajic (2020) | Bokade et al. (2021)
Focus | Programming assignments | General assignments
Approach | Developer profiling from IDE activity | Text mining and NLP analysis
Strengths | Highly detailed code analysis, applicable beyond plagiarism | User-friendly interface, checks grammar and semantic plagiarism
Weaknesses | Cloud-based, requires student consent, high-resolution log analysis | May not handle advanced plagiarism, relies on reference links and website crawling

Figure 1:Comparison Table

2.5 Innovation/Research Gap


Despite significant advancements in plagiarism detection tools, several critical gaps and
opportunities remain unaddressed. The key points of our project's research gap are as follows:

Limitations of Existing Tools:

• Accuracy beyond keyword matching: Current tools often struggle with sophisticated
forms of plagiarism such as paraphrasing, structural mimicking, and semantic similarity.
Our project can address this by implementing advanced AI techniques such as natural
language processing and sentence embeddings.

• Relying only on online sources: Most plagiarism checkers scan submitted documents
against a vast database of academic sources and flag matches, without considering that
a student can copy from colleagues. There is therefore a need for a plagiarism checker
that can also compare submitted documents against each other.
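
Comparing submissions against each other can be sketched as a pairwise loop over all documents handed in for the same assignment. The example below is an illustration under stated assumptions: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the names `flag_similar_pairs` and the 0.6 threshold are hypothetical choices, not values from this project.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a, text_b):
    """Word-sequence similarity in [0, 1] (stand-in for the project's own measure)."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def flag_similar_pairs(submissions, threshold=0.6):
    """Compare every pair of submissions and return those at or above the threshold."""
    flagged = []
    for first, second in combinations(sorted(submissions), 2):
        score = similarity(submissions[first], submissions[second])
        if score >= threshold:
            flagged.append((first, second, score))
    return flagged


submissions = {
    "student_a": "data mining finds patterns in large data sets",
    "student_b": "data mining finds patterns in very large data sets",
    "student_c": "the agile model promotes continuous improvement",
}
flagged = flag_similar_pairs(submissions)
```

Pairwise comparison grows quadratically with the number of submissions, so for large classes an index-based pre-filter would probably be needed before scoring pairs.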

CHAPTER THREE
METHODOLOGY
3.1 Introduction
This chapter describes the methodology, that is, the formalized approach used to develop
an online academic plagiarism checker for students’ assignments and IPT reports, in which
data will be collected and analyzed by the system.

3.2 Research Approach


This project will adopt a mixed research approach for the following reasons:

- It allows us to collect both qualitative and quantitative data, which in turn supports both
qualitative and quantitative data analysis.

- By combining quantitative and qualitative data, we gain a richer and more complete
understanding of plagiarism and the needs of stakeholders. This leads to a more accurate,
user-friendly, and impactful online plagiarism checker system, promoting wider adoption
and successful implementation.

3.3 Research Method


This project will use the Agile model because Agile promotes collaboration, flexibility,
and continuous improvement. Given the dynamic nature of technology and user expectations,
the Agile model is often a good fit for projects that use AI. It allows for regular feedback
loops, enabling adjustments based on user testing and changing requirements.

Figure 2:Agile Method

3.4 Study Area/ Location


The project will be conducted at our university, UDOM, where there is a high rate
of academic plagiarism due to the rising use of technology. The system will assist instructors
in detecting forms of academic plagiarism ranging from assignments to practical training
report files.

3.3 Data Collection / Requirements Gathering


3.3.1 Data Collection Techniques/Methods
In this project, data will be collected through various data collection techniques. The aim
of data collection is to gather plagiarism checker system requirements from various
stakeholders. The techniques and methods that will be used to collect data are described below.
3.3.2 Data Collection Tools
Quantitative and Qualitative Data Collection:

For both qualitative and quantitative data collection we will use a Google Forms questionnaire
because it is easy to implement, inexpensive, saves time, and works well for remote access.

A number of open-ended and closed-ended questions will be structured, then the Google Forms
questionnaire will be sent to the stakeholders and the responses will be saved in real
time.

Literature review and Competitive analysis
Studies of others' works will be conducted to gain insights that will help in the
development of our project, such as how different authors have written about online
plagiarism checkers. A number of currently working systems will be critically analysed to
identify gaps, which will help in analysing the system requirements.

3.4 System/Requirements/Data Analysis

Functional Requirements:

1. The system should allow administrators to manage user accounts, including registration,
login, password management, and user roles (e.g., student, instructor, administrator)

2. The system should be able to handle a large volume of document submissions
simultaneously

3. The system should maintain a history of submitted documents and plagiarism reports,
allowing users to track changes over time and access previous reports as needed

4. The system should generate detailed plagiarism reports for each submitted document,
highlighting detected similarities and providing information on the original sources

5. The system should analyse submitted documents to identify similarities with existing
sources from databases

6. The system should provide feedback to users on the originality of their work

Non-Functional Requirements:

Performance: Fast and efficient text analysis without significant delays.

Scalability: Ability to handle large volumes of text submissions and reports

Accessibility: User-friendly interface accessible to users with diverse technical skills and
abilities.

Security: Secure storage and transmission of user data and submitted text.
Privacy: Protection of user privacy and compliance with relevant data protection
regulations.

Customization: Configurability to adapt to specific needs and workflows of different

Maintainability: Regular updates and improvements to maintain functionality and
address new plagiarism techniques.

3.4 System/Requirements/Data Analysis

System requirements and data will be analyzed using the Entity Relationship Diagram (ERD),
Data Flow Diagram, Use Case Diagram and Flow Chart Diagram.

ERD

USE CASE

FLOW CHART
STUDENT

Figure 2: Flow Chart for Student

TEACHER

Figure 3: Flow Chart for Instructor

3.5 System/Model Design/Architecture

The system/model will be designed …….. (Modularization details, Data integrity and
constraints, Database design/Procedural Design/Object Oriented Design, User Interface
Design

How it Works:
➢ Students submit assignments/IPT documents through the frontend, which are stored in
MongoDB collections.
➢ The scheduler within the API server triggers document processing after deadlines or within a
processing window.
➢ Documents are retrieved from MongoDB (source materials) and preprocessed documents are
sent to the comparison service.
➢ The comparison service analyzes documents against source materials
➢ Plagiarism report data is stored in the MySQL database.

➢ Reports are generated and potentially made available through the frontend.
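
The flow above can be sketched end to end with in-memory stand-ins. In this illustration the two lists in `InMemoryStores` take the place of the MongoDB collections and the MySQL report table, `compare` is a placeholder for the comparison service, and every class and function name is a hypothetical choice made for the example rather than part of the actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Submission:
    student_id: str
    text: str
    submitted_at: datetime


@dataclass
class InMemoryStores:
    """Stand-ins for the MongoDB document collection and the MySQL report table."""
    documents: list = field(default_factory=list)
    reports: list = field(default_factory=list)


def submit(stores, submission):
    """The frontend stores an uploaded document."""
    stores.documents.append(submission)


def compare(text, sources):
    """Placeholder comparison service: share of words also found in any source."""
    words = text.lower().split()
    source_words = set()
    for source in sources:
        source_words.update(source.lower().split())
    if not words:
        return 0.0
    return sum(word in source_words for word in words) / len(words)


def process_after_deadline(stores, deadline, now):
    """Scheduler step: after the deadline, score each document and save a report."""
    if now < deadline:
        return 0  # nothing is processed before the deadline
    for doc in stores.documents:
        others = [d.text for d in stores.documents if d is not doc]
        stores.reports.append({"student_id": doc.student_id,
                               "similarity": compare(doc.text, others)})
    return len(stores.documents)


stores = InMemoryStores()
deadline = datetime(2024, 5, 1, 23, 59)
submit(stores, Submission("student_a", "data mining finds patterns", datetime(2024, 5, 1, 10, 0)))
submit(stores, Submission("student_b", "data mining needs cleaning", datetime(2024, 5, 1, 11, 0)))
processed_early = process_after_deadline(stores, deadline, now=datetime(2024, 5, 1, 12, 0))
processed = process_after_deadline(stores, deadline, now=datetime(2024, 5, 2, 0, 0))
```

Keeping storage, scheduling and comparison behind separate functions mirrors the separation between the API server, the databases and the comparison service described in the list.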

3.5.1 Logical Design/Architecture


Context diagram

Level 0 diagram

3.5.2 Physical Architecture

3.6 System Implementation


The proposed project will be implemented using the following techniques, tools and methods:

i. Data Collection and Preparation


Gather text documents: Access documents from user uploads, learning management
systems, online repositories, or web crawling (if applicable). This enables the
system to handle plagiarism across different platforms and scenarios.
Clean and preprocess text: Normalize text, remove punctuation, handle
special characters, and stem or lemmatize words. Consistent, standardized
data improves model performance and feature extraction.
Split data: Divide the data into training and testing sets (e.g., an 80/20 split)
for model development and evaluation. Keeping the testing set separate gives
an unbiased evaluation and helps detect overfitting of the model to the
training data.
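
The cleaning and splitting steps can be illustrated with the standard library alone. This is a minimal sketch: `crude_stem` is a toy suffix stripper used only as an assumption in place of a real stemmer or lemmatizer from an NLP library, and the function names are hypothetical.

```python
import random
import re
import string


def crude_stem(token):
    """Toy stemmer (assumption): a real system would use a proper stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lower-case, strip punctuation, tokenize, and stem."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = re.findall(r"[a-z0-9]+", text)
    return [crude_stem(token) for token in tokens]


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of the items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]


tokens = preprocess("Students copied the reports!")
train, test = train_test_split(range(10))
```

Fixing the random seed makes the 80/20 split reproducible, which keeps later accuracy figures comparable between runs.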
ii. Feature Extraction
This is the process of transforming raw data into numerical features that the machine
learning model can understand and use for making predictions. The text representation
techniques to be applied are Bag-of-Words, TF-IDF, and word embeddings (e.g.,
Word2Vec, GloVe). These are widely used and effective for capturing word frequency,
importance, and semantic relationships.
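
To make Bag-of-Words and TF-IDF concrete, the sketch below builds term counts per document, weights them with one common smoothed IDF variant, and compares two documents by cosine similarity. It is a hand-written illustration, not the output of any particular library, and it does not cover word embeddings, which need a trained model.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document using tf * smoothed idf."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)  # bag-of-words term counts
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["data", "mining"], ["data", "mining"], ["agile", "model"]]
vectors = tf_idf_vectors(docs)
same = cosine(vectors[0], vectors[1])
different = cosine(vectors[0], vectors[2])
```

Identical documents score close to 1 and documents with no shared terms score 0, which is the behaviour the comparison step relies on.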
iii. Machine Learning Algorithms
Supervised learning: Training models for classification (plagiarism vs. original)
provides a direct and efficient approach.

Algorithms such as SVMs, Naive Bayes, and Random Forests: These are established and
reliable choices for text classification tasks.
Neural networks: Their ability to learn complex patterns makes them
suitable for advanced plagiarism detection, especially for paraphrasing and
subtle plagiarism.

Unsupervised learning: Useful for identifying potential plagiarism cases through
anomaly detection, which can complement supervised learning.
The algorithm to be used for this approach is clustering (K-means).
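
As a deliberately small illustration of the supervised idea, the sketch below learns a single similarity threshold from labelled document pairs. This one-feature threshold classifier is a stand-in chosen for brevity, not one of the algorithms named above; the training scores and labels are invented for the example.

```python
def fit_threshold(scores, labels):
    """Pick the similarity threshold that classifies the labelled pairs best.

    scores: similarity values in [0, 1]; labels: True for plagiarised pairs.
    """
    best_threshold, best_correct = 0.0, -1
    for candidate in sorted(set(scores)):
        correct = sum((score >= candidate) == label
                      for score, label in zip(scores, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def predict(score, threshold):
    """Classify a pair as plagiarised when its score reaches the threshold."""
    return score >= threshold


# Invented training data: similarity score per pair and an expert label.
train_scores = [0.05, 0.10, 0.30, 0.65, 0.80, 0.95]
train_labels = [False, False, False, True, True, True]
threshold = fit_threshold(train_scores, train_labels)
```

A real model would use many features per pair (for example TF-IDF cosine, shingle overlap and length ratios) and be evaluated on the held-out testing set rather than on the data it was fitted to.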

iv. Similarity Measures


Compare text similarity: The algorithm to be used is Jaccard similarity, which is the
size of the intersection of two sets divided by the size of their union. It is a
versatile method for comparing text similarity at different levels (word, sentence,
document).
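
Jaccard similarity can be applied to sets of words or to sets of overlapping word n-grams (shingles). The sketch below shows both; the shingle size of 3 and the sample sentences are assumptions made for the illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shingles(tokens, size=3):
    """Overlapping word n-grams; comparing these is sensitive to word order."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


original = "plagiarism is the theft of intellectual property".split()
suspect = "plagiarism is the theft of ideas and property".split()
word_score = jaccard(original, suspect)
shingle_score = jaccard(shingles(original), shingles(suspect))
```

The word-level score stays high when a few words are swapped, while the shingle-level score drops faster, so the two levels give complementary signals.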

3.6.1 Coding
The proposed project will be implemented using the following tools, techniques and methods;
i. Front-End:

• JavaScript - Makes things move and interact, works with popular frameworks for
smooth UI.

• HTML5/CSS3 - The building blocks: HTML structures the content, CSS styles it.

ii. Back-End:

• Python - Easy to read and powerful, great for machine learning and web development.

iii. Tools:

• NLP libraries - Understand and analyze the language in the documents.

• Machine learning frameworks - Train the model to recognize plagiarism patterns.

• Text processing tools - Extract useful information from text data.

iv. Other Essentials:

• Database - Store all the information about documents and plagiarism scores.

• Frameworks/Libraries - Bootstrap/jQuery for UI enhancements, MySQL for a reliable
database.

3.6.2 Testing/Evaluation
The proposed project will be tested using the following testing and evaluation techniques;
i. Unit Testing

• Test individual text processing functions: Ensure they correctly extract
words, stems, or embeddings.

• Test similarity comparison algorithms: Verify they produce accurate
similarity scores for various text pairs.

• Test database interactions: Check successful storage and retrieval of
assignments and results.
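
A unit test for a text processing function might look like the sketch below, written with Python's built-in `unittest` module. The `tokenize` function is a hypothetical function under test, included only so the example is self-contained.

```python
import re
import unittest


def tokenize(text):
    """Hypothetical function under test: lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenizeTests(unittest.TestCase):
    def test_punctuation_is_removed(self):
        self.assertEqual(tokenize("Hello, World!"), ["hello", "world"])

    def test_empty_text_gives_no_tokens(self):
        self.assertEqual(tokenize(""), [])

    def test_hyphenated_words_are_split(self):
        self.assertEqual(tokenize("IPT-report 2024"), ["ipt", "report", "2024"])


# Run the tests programmatically so the example works inside a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TokenizeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same pattern extends to the similarity functions and, with a test database, to the storage and retrieval checks.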

ii. Integration Testing

• Test document submission and processing: Verify seamless flow from upload
to analysis.

• Test report generation: Ensure accurate plagiarism scores, highlighted
matches, and comprehensive feedback.

• Test user authentication and authorization: Verify only authorized users can
access and submit assignments.

iii. System Testing

• Simulate high-volume submissions: Assess performance and identify
bottlenecks.

• Test different file formats and languages: Ensure compatibility with diverse
assignment types.

• Test security measures: Attempt unauthorized access or data
breaches to evaluate system resilience.

iv. Accuracy Evaluation

• Compare system results with expert-labeled assignments: Calculate
precision, recall, and F1-score to measure plagiarism detection accuracy.

• Identify common errors and false positives: Analyze patterns to refine
algorithms and improve accuracy.
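
Precision, recall and F1-score can be computed directly from the system's flags and the expert labels. The sketch below uses invented labels for eight assignments purely to show the arithmetic.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F1 from boolean predictions and labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))       # correctly flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))   # flagged but original
    fn = sum(a and not p for p, a in zip(predicted, actual))   # missed plagiarism
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented example: True means "plagiarised".
expert_labels = [True, True, True, False, False, False, True, False]
system_flags = [True, True, False, False, True, False, True, False]
precision, recall, f1 = precision_recall_f1(system_flags, expert_labels)
```

Here three plagiarised assignments are caught, one is missed and one original assignment is wrongly flagged, so precision, recall and F1 all equal 0.75.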

v. User Acceptance Testing

• Gather feedback from educators and students: Evaluate usability, clarity of
reports, and overall satisfaction.

• Identify areas for improvement: Incorporate user insights to enhance user
experience and effectiveness.

• Security testing: Conduct thorough penetration testing to uncover
vulnerabilities.

3.7 System Requirements


The project will employ the hardware components and software resources needed to
accomplish the work. The following are the system requirements for the
successful deployment of the project.

3.7.1 Hardware Requirements

• i3 based processor or higher

• Disk Space - 500 GB.

• Memory at least 4 GB

• Monitor

3.7.2 Software Tools Requirements


• Windows 7 or higher

• Code editor, e.g. PyCharm and/or Microsoft Visual Studio Code

• SQL Server

• Web Browser

CHAPTER FOUR
PROJECT ACTIVITIES AND MILESTONES
(WORK DONE)

Objective One
To perform system requirement gathering
S/N | Activities | Output | Progress Status
1 | Perform literature review | Research gap | 100%
2 | Perform critical competitive analysis | Research gap | 100%
3 | Collect data through questionnaire guide | System requirements | 80%
4 | Data analysis and evaluation | System requirements | 80%

Descriptions (or illustrations) of the Outputs (milestones) achieved in objective one


Under objective one, activities were carried out to gather data from stakeholders, competitive
analysis and literature review in order to obtain the system requirements. Different methods and
techniques were employed to obtain detailed functional and non-functional system requirements.

Objective Two
To design system model and information flow
S/N | Activities | Output | Progress Status
1 | Design the system architecture | System architecture | 100%
2 | Design ERD, Use case, Flow charts and DFD | System design models | 100%

Descriptions (or illustrations) of the Outputs (milestones) achieved in objective two;


Under this objective we have designed the system architecture and the system design in general.
It involves detailed diagrams such as the ERD, DFD, use case diagram, deployment diagram,
flow charts and system architecture diagram.
Objective Three
To develop and implement a plagiarism detection software
S/N | Activities | Output | Progress Status
1 | Develop the plagiarism detection software, i.e. front end, back end, databases and algorithms | Plagiarism detection software | 30%
2 | Obtain and preprocess datasets to train a model for plagiarism | Datasets | 0%
3 | Train and test the model using the datasets obtained | Trained model | 0%
4 | Unit testing and system integration | Accurate model | 0%

Descriptions (or illustrations) of the Outputs (milestones) achieved in objective three;


This objective is based on the actual development of the system using web technologies, AI
techniques, libraries and frameworks. It also involves testing and integration of the system
components.
