
Lung_Cancer_Prediction_and_Classification_Using_Machine_Learning_Algorithms

This research presents an automated system for lung cancer detection using various machine learning algorithms and image processing techniques. The study focuses on accurately classifying different types of lung cancer through a comprehensive dataset of medical images, employing methods such as Random Forest, Decision Trees, and Support Vector Machines. The proposed system achieved a high accuracy of 99.7% in distinguishing between malignant and healthy lung tissues, contributing significantly to early lung cancer diagnosis.


2024 International Conference on Expert Clouds and Applications (ICOECA) | 979-8-3503-8579-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICOECA62351.2024.00176

Lung Cancer Prediction and Classification using Machine Learning Algorithms

Dr Satyanarayana Murthy Nimmagadda
Electronics and Communication
VR Siddhartha Engineering College (JNTUK)
Vijayawada, India
nsmmit@gmail.com

Kalli Likhitha
Electronics and Communication
VR Siddhartha Engineering College (JNTUK)
Vijayawada, India
Kallilikhitha@gmail.com

Gurindapalli Srilatha
Electronics and Communication
VR Siddhartha Engineering College (JNTUK)
Vijayawada, India
gurindapallisrilatha@gmail.com

Sajja Monika Sree
Electronics and Communication
VR Siddhartha Engineering College (JNTUK)
Vijayawada, India
monikasajja123@gmail.com

ABSTRACT: This research addresses the challenges in the early identification of lung cancer by proposing an automated system using various Machine Learning (ML) and image processing techniques. The study uses a collection of images covering different lung cancer types and healthy lung tissue. Analyzing this data gave a deeper understanding of its properties. The study then tested various machine learning models and selected the most accurate one for real-time implementation. The effectiveness of the proposed system in lung cancer detection is assessed through performance evaluation. Finally, the system is deployed with a user-friendly interface to aid healthcare professionals in diagnosing lung cancer.

Keywords: Healthcare, Machine Learning, Lung Care, Disease Detection

I. INTRODUCTION
Lung cancer remains a major challenge in the field of oncology and the leading cause of cancer-related deaths worldwide. Despite significant advances in medical technology and therapy, the outlook for lung cancer patients continues to depend largely on prompt and accurate diagnosis. In our research, we present an automatic lung cancer detection system that combines machine learning algorithms and image processing techniques. Our main goal is to create a robust and efficient solution that can accurately distinguish adenocarcinoma, large cell carcinoma and squamous cell carcinoma from normal lung tissue. Using a comprehensive set of medical images, we train and evaluate multiple machine learning models as candidates for automatic lung cancer diagnosis. Careful data collection, preprocessing and visualization form the foundation of the system. We then train the models and rigorously test state-of-the-art machine learning algorithms. Rather than relying on accuracy alone, we give added weight to performance metrics such as precision, recall and F1 score, which are crucial in assessing the diagnostic properties of a model for the various cancer types.

II. LITERATURE SURVEY

The research paper "Automatic detection of lung cancer in chest X-rays using deep learning" by Sefat et al. discussed the use of deep learning techniques in the automated detection of lung cancer using chest X-ray images. The article analyzed different deep learning architectures, examined relevant datasets, and analyzed the performance metrics used in previous studies.

Kumar et al. offered valuable insights into CAD systems created to detect lung cancer. The article covered various aspects, including image preprocessing techniques, feature extraction techniques, and machine learning algorithms used in CAD systems.

Gupta et al. provided a comprehensive survey of machine learning methods applied to automated lung cancer detection. The work covers topics such as image preprocessing, feature extraction, classification algorithms, and performance evaluation metrics.

A research paper titled "Automatic detection and classification of lung cancer using machine learning techniques" by Vignesh et al. provided a comprehensive overview of machine learning methods applied to the automated detection and classification of lung cancer. The study discussed various aspects including feature extraction, dimensionality reduction and classification algorithms.

Ahmadi et al. focused on a critical aspect of lung cancer diagnosis: automatic detection and classification of lung nodules. The article explored several topics, including image processing techniques, feature extraction methods, and machine learning algorithms specifically used for lung nodule detection.

Mohammed et al. described the application of machine learning techniques in diagnosing lung cancer. The article covered topics such as feature selection methods, classification algorithms, and performance evaluation measures for lung cancer detection systems.

Authorized licensed use limited to: MVSR Engineering College. Downloaded on October 19,2024 at 07:38:01 UTC from IEEE Xplore. Restrictions apply.
III. PROPOSED SYSTEM
Figure 1 depicts the proposed research flow as a flowchart.

Fig. 1. System Diagram

IV. DESIGN METHODOLOGY
This research study focuses on building a robust framework for lung cancer detection using machine learning (ML) algorithms. This structured approach involves several steps, each of which is key to the efficiency and reliability of our proposed method.

1. Dataset Description

The dataset is divided into three main folders: training, test and validation. Each of these folders has subfolders representing different types of lung tissue, including adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal tissue. These subfolders contain medical images depicting the corresponding lung tissue types.

Tissue type                 Number of images
Adenocarcinoma              474
Large cell carcinoma        332
Squamous cell carcinoma     360
Normal                      363

The ordered structure of the dataset allows it to be divided into separate subsets for training, testing and validation. This approach provides comprehensive coverage and extensive evaluation of machine learning models for lung cancer detection.

Fig. 2. Types of Lung Cancer

2. Data Acquisition and Preprocessing
The first step involves collecting medical imaging data for the different types of lung cancer. This dataset contains CT scan images representing adenocarcinoma, large cell carcinoma, squamous cell carcinoma and normal lung tissue. Each image is labeled with the category corresponding to its tissue type. To prepare the images as input to a machine learning model, we resize them to standardized dimensions and convert them to numeric arrays. In addition, we use data augmentation techniques such as rotation, scaling and translation to improve the robustness of the model and reduce overfitting. The major steps of data preprocessing are: collecting the data, checking for noisy or missing values, resolving the missing value issue, arranging the data, and scaling and distributing the data into the respective sets.

3. Feature Selection

Feature selection is a crucial step in machine learning where the most relevant features are chosen to build robust models while reducing complexity. This process involves evaluating the importance of each feature based on various metrics such as correlation, information gain, or model coefficients. Filter methods, wrapper methods, and embedded methods are commonly employed to identify and retain the most informative features, enhancing model performance and interpretability.

4. Machine Learning Algorithms

1. Random Forest: Random Forest, an ensemble learning technique, builds multiple decision trees during training. It then combines their predictions by taking the class mode (for classification) or the mean prediction (for regression). It should be noted that Random Forest is robust to overfitting and works efficiently with high-dimensional data. The method can be used for classification and regression, making it a versatile approach.
2. Decision Tree: A decision tree is a simple but effective supervised learning method for solving classification and regression problems. To maximize information gain or minimize impurity, it recursively partitions the data set based on feature values to create a hierarchical tree structure. Decision trees can handle categorical and numerical data and

are interpretable.
3. Logistic Regression: Logistic regression is a linear classification algorithm commonly used for binary classification tasks. It uses a logistic (sigmoid) function to predict the probability that a case belongs to a certain class. It works well with linearly separable data and is interpretable and computationally economical.
4. Support Vector Machine (SVM): SVM is a powerful supervised learning method that can be applied to regression and classification problems. It works well with both linearly and nonlinearly separable data because it finds an optimal hyperplane that maximizes the margin between classes in the feature space. SVM handles high-dimensional data well and is robust to overfitting.
5. K-Nearest Neighbors (KNN): KNN is an instance-based, non-parametric learning technique used to solve regression and classification problems. It classifies a new case by assigning the majority class label among its K nearest neighbors in the feature space. KNN is easy to understand, easy to use and suitable for datasets with complex decision boundaries.

5. Data Acquisition
Data acquisition involves collecting relevant data from various sources to be used for analysis or training machine learning models. This process typically begins with defining the data requirements, identifying potential sources, and then collecting the data. Techniques for data acquisition include web scraping, API calls, database queries, sensor data collection, and manual data entry. It is essential to ensure data quality, validity, and compliance with privacy regulations throughout the acquisition process. Once the data is acquired, it can be preprocessed, cleaned, and transformed to prepare it for analysis or model training.

6. Cross-Validation

Cross-validation is a technique used to assess the performance of a machine learning model by partitioning the available data into subsets for training and testing. The process involves splitting the data into k folds, where the model is trained on k-1 folds and evaluated on the remaining fold. This process is repeated k times, with each fold serving as the test set exactly once. Cross-validation helps to ensure that the model's performance is not overly dependent on a particular subset of the data, providing a more reliable estimate of its generalization ability.

V. IMPLEMENTATION AND RESULTS

The preprocessed dataset was used to train the selected models. The model parameters and hyperparameters were adjusted to reduce overfitting and increase performance. Evaluation metrics were used to compare the performance of each model. This comparison made it possible to determine which algorithm is the best for lung cancer detection.

i. Evaluating accuracies

Fig. 3. Accuracies for different models

The Random Forest algorithm classifies the different lung tissue types from medical images with an accuracy of 99.7 percent. This level of accuracy shows the stability and reliability of the model in distinguishing between malignant and healthy tissue types.

ii. Evaluating parameters
Accuracy alone is not sufficient to evaluate a classification model, so several additional performance criteria are needed. F1 score, precision, recall and support are some of these metrics, as shown in Figure 4. Taken as a whole, these metrics provide a comprehensive assessment of the model's performance in lung cancer detection and provide insight into its ability to accurately detect different types of lung tissue while reducing false positives and false negatives.

Fig. 4. Performance metrics

iii. Confusion matrix
The confusion matrix is a practical instrument for evaluating the performance of classification models. As shown in Figure 5, it provides a comprehensive analysis of the differences between the model predictions and the actual classes in the dataset. This is particularly useful for finding out what mistakes the model makes and where it needs to be improved. By examining the confusion matrix, we can gain insight into the strengths and weaknesses of a classification model, identify patterns of misclassification, and modify the model to optimize its performance in real-world scenarios.

Fig. 5. Confusion Matrix

iv. Predicting the lung cancer

Figure 6 shows the predicted class of lung cancer overlaid on the input image, which lets users see the model's classification decision directly. To facilitate medical diagnosis and decision-making, the predicted class label provides important information about the type of lung tissue detected by the model. Seeing the predicted class on the image helps users understand the model's output and supports its use in clinical practice.
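The evaluation measures described in this section (accuracy, per-class precision, recall, F1 score, support, and the confusion matrix) can all be derived from pairs of true and predicted labels. The following is a minimal sketch in plain Python, not the authors' implementation. The class names come from the dataset description, while the function names and the eight example labels are hypothetical and chosen only for illustration. They are not results from this study.

```python
# Class names as listed in the dataset description.
CLASSES = ["adenocarcinoma", "large cell carcinoma",
           "squamous cell carcinoma", "normal"]


def confusion_matrix(y_true, y_pred, classes):
    """matrix[i][j] counts samples whose true class is i and predicted class is j."""
    index = {name: k for k, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, classes):
    """Precision, recall, F1 and support for each class, from the confusion matrix."""
    report = {}
    for k, name in enumerate(classes):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp  # predicted as k, truly another class
        fn = sum(matrix[k]) - tp                 # truly k, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": sum(matrix[k])}
    return report


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical labels for eight images (illustrative only).
y_true = ["adenocarcinoma", "adenocarcinoma",
          "large cell carcinoma", "large cell carcinoma",
          "squamous cell carcinoma", "squamous cell carcinoma",
          "normal", "normal"]
y_pred = ["adenocarcinoma", "large cell carcinoma",
          "large cell carcinoma", "large cell carcinoma",
          "squamous cell carcinoma", "adenocarcinoma",
          "normal", "normal"]

cm = confusion_matrix(y_true, y_pred, CLASSES)
report = per_class_metrics(cm, CLASSES)
for name in CLASSES:
    print(name, report[name])
print("accuracy:", accuracy(cm))
```

Off-diagonal entries of the matrix show which tissue types are confused with each other. In a diagnostic setting, a malignant class predicted as normal (a false negative) is usually the costliest error, which is why recall per class is reported alongside accuracy.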

Fig. 6. Lung Cancer Prediction

VI. CONCLUSION

This study has demonstrated the ability of Machine Learning (ML) algorithms to accurately classify several different types of lung tissue using medical images. By achieving high accuracy with advanced approaches, the research aims to contribute to the ongoing fight against lung cancer and to improved patient outcomes. The Random Forest (RF) algorithm was the most effective model, correctly classifying lung tissue types with 99.7% accuracy. In addition to accuracy, other performance measures such as precision, recall and F1 score also provide insights into the performance of classification models.

REFERENCES

[1] Yu, K. H., Lee, T. L. M., Yen, M. H., Kou, S. C., Rosen, B., Chiang, J. H., & Kohane, I. S. (2020). Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation. Journal of medical Internet research, 22(8), e16709.
[2] Radhika, P. R., Nair, R. A., & Veena, G. (2019, February). A Comparative Study of Lung Cancer Detection using Machine Learning Algorithms. In 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT) (pp. 1-4). IEEE.
[3] Hussain, L., Rathore, S., Abbasi, A. A., & Saeed, S. (2019, March). Automated lung cancer detection based on multimodal features extracting strategy using machine learning techniques. In Medical Imaging 2019: Physics of Medical Imaging (Vol. 10948, p. 109483Q). International Society for Optics and Photonics.
[4] Siegel, R. L., Miller, K. D. and Jemal, A., “Cancer statistics, 2018,” CA. Cancer J. Clin. 68(1), 7–30 (2018). Günaydin, Ö., Günay, M., & Şengel, Ö. (2019, April). Comparison of lung cancer detection algorithms. In 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT) (pp. 1-4). IEEE.
[5] Zeebaree, D. Q., Haron, H., Abdulazeez, A. M., & Zebari, D. A. (2019, April). Trainable Model Based on New Uniform LBP Feature to Identify the Risk of the Breast Cancer. In 2019 International Conference on Advanced Science and Engineering (ICOASE) (pp. 106-111). IEEE.
[6] Zebari, D. A., Zeebaree, D. Q., Abdulazeez, A. M., Haron, H., & Hamed, H. N. A. (2020). Improved Threshold Based and Trainable Fully Automated Segmentation for Breast Cancer Boundary and Pectoral Muscle in Mammogram Images. IEEE Access, 8, 203097-203116.
[7] Alakwaa, W., Nassef, M., & Badr, A. (2017). Lung cancer detection and classification with 3D convolutional neural network (3D-CNN). Lung Cancer, 8(8), 409.
[8] Somvanshi, M., Chavan, P., Tambade, S., & Shinde, S. V. (2016, August). A review of machine learning techniques using decision tree and support vector machine. In 2016 International Conference on Computing Communication Control and automation (ICCUBEA) (pp. 1-7). IEEE.
[9] Maione, C., Barbosa Jr, F., & Barbosa, R. M. (2019). Predicting the botanical and geographical origin of honey with multivariate data analysis and machine learning techniques: A review. Computers and Electronics in Agriculture, 157, 436-446.
[10] Sulaiman, D. M., Abdulazeez, A. M., Haron, H., & Sadiq, S. S. (2019, April). Unsupervised Learning Approach-Based New Optimization K-Means Clustering for Finger Vein Image Localization. In 2019 International Conference on Advanced Science and Engineering (ICOASE) (pp. 82-87). IEEE.
[11] Huang, C. H., Zeng, C., Wang, Y. C., Peng, H. Y., Lin, C. S., Chang, C. J., & Yang, H. Y. (2018). A study of diagnostic accuracy using a chemical sensor array and a machine learning technique to detect lung cancer. Sensors, 18(9), 2845.
[12] Singh, G. A. P., & Gupta, P. K. (2019). Performance analysis of various machine learning-based approaches for detection and classification of lung cancer in humans. Neural Computing and Applications, 31(10), 6863-6877.
[13] Alam, J., Alam, S., & Hossan, A. (2018, February). Multi-stage lung cancer detection and prediction using multi-class SVM classifier. In 2018 International Conference on Computer, Communication, Chemical, Engineering (IC4ME2) (pp. 1-4). IEEE.
[14] Reddy, U., Reddy, B., & Reddy, B. (2019). Recognition of Lung Cancer Using Machine Learning Mechanisms with Fuzzy Neural Networks. Traitement du Signal, 36(1), 87-91.
[15] Bhatia, S., Sinha, Y., & Goel, L. (2019). Lung cancer detection: A deep learning approach. In Soft Computing for Problem Solving (pp. 699-705). Springer, Singapore.
