Lung Cancer Detection Using Machine Learning Algorithms and Neural Network on a Conducted Survey Dataset
ISSN No:-2456-2165
Abstract:- Lung cancer is the growth of malignant cells in the lungs. Owing to the rising frequency of cancer, the death rate for both men and women has increased. Lung cancer is a condition in which lung cells proliferate uncontrollably. Although lung cancer cannot be prevented, the risk can be decreased. Therefore, early identification of lung cancer is essential for improving patient survival. Lung cancer incidence is directly correlated with the frequency of heavy smoking. Various classification techniques, including Naive Bayes, Random Forest, Logistic Regression, KNN, Kernel SVM and an Artificial Neural Network, were used to investigate lung cancer prediction. The primary goal of this study is to investigate the effectiveness of classification algorithms and neural networks in the early identification of lung cancer.

Keywords:- Naive Bayes, Random Forest, Logistic Regression, KNN, Kernel SVM, Artificial Neural Network, Machine Learning, Lung Cancer.

I. INTRODUCTION

Lung cancer is among the most dangerous diseases for human beings. It is responsible for more deaths than colon, prostate, ovarian and breast cancer combined. Lung cancer is a serious health concern, with a count of 225,000 people each year in the United States of America alone. The main factor causing lung cancer is smoking, and the duration of smoking is directly proportional to the likelihood of a person being affected by cancer. Detecting lung cancer manually is a tedious and risky job even for specialists. To gain deeper insight and to identify lung cancer in its early stages, different machine learning methods are used for classification. By applying techniques such as random forest and other classification algorithms, an automated system can be built that classifies with a higher accuracy rate.

Lung cancer is the leading cause of cancer death in both men and women in the United States. The main objective of this paper is to analyze the available lung cancer data and to develop accurate survival prediction models using machine learning. Logistic Regression, Naive Bayes, KNN, Random Forest (RF), Kernel SVM and an Artificial Neural Network have been applied to construct a lung cancer survivability prediction model. The classifiers developed in this work predict the various factors that influence survival time, which would help doctors make more informed decisions about treatment plans and help patients make more educated decisions about different treatment options.

This study covers the survival rate analysis of patients with advanced lung cancer who did not receive any therapeutic modality, and evaluates performance scores for daily activities; the results show a slight improvement in survival rates. Random Forest was found to give good prediction performance, with an accuracy of 88%, and the Artificial Neural Network gave the best prediction, with an accuracy of 89%.

II. LITERATURE REVIEW

In paper [11], Pankaj Nanglia, Sumit Kumar, and others introduced a novel hybrid technique known as the Kernel Attribute Selected Classifier, in which they integrate SVM with a Feed-Forward Back Propagation Neural Network (FFBPNN), which helps lower the computational complexity of the classification. They suggested a three-block process for classifying the dataset. The first block involves feature extraction using the SURF method, the second block involves optimization using a genetic algorithm, and the third block involves classification using FFBPNN.

Chao Zhang, Xing Sun, Kang Dang, and others use a multicenter data set to conduct a sensitivity analysis in paper [12]. The two categories they selected were diameter and pathological outcome.

In paper [18], K. Mohanambal, Y. Nirosha et al. studied the structural co-occurrence matrix (SCM) to extract features from the images and, based on these features, categorized them as malignant or benign. An SVM classifier is used to classify lung nodules according to their malignancy level (1 to 5).

Radhika P. R. and Rakhi A. S. Nair's paper [16] primarily focused on the prediction and categorization of medical imaging data. They made use of the data.world dataset and the UCI Machine Learning Repository. According to a comparative study using several machine learning algorithms, support vector machines had superior accuracy (99.2%). Naive Bayes provides 10%, Decision Tree
IJISRT23JUN281 www.ijisrt.com 68
Volume 8, Issue 6, June – 2023 International Journal of Innovative Science and Research Technology
provides 80% 87.87%, and logistic regression provides 66.7%.

The algorithm for lung cancer detection was examined in paper [17] by Vaishnavi D., Arya K. S., Devi Abirami T., and M. N. Kavitha. They used the discretely sampled Dual-tree Complex Wavelet Transform (DTCWT) for pre-processing. The second-order statistical texture analysis approach known as GLCM provides a table of the co-occurrence of various combinations of gray levels in an image.

III. METHODOLOGY

The overall economic development of a developing country such as India, where much depends on the health of the population, is threatened by lung cancer. Therefore, lung cancer detection ought to be more precise and reliable. The lung cancer parameters are gathered from an open source. Python is the programming language in use.

Numerous parameters, such as smoking, anxiety, peer pressure, chronic disease, fatigue, allergy and alcohol consumption, are used to predict lung cancer. The user starts by supplying a lung cancer dataset. The data gathered from the user during the data collection and pre-processing steps is used. This initial data is then analyzed and split into training and testing datasets; the model is fitted to the training dataset and evaluated on the testing dataset, and the accuracy is reported to the user.

The system is shown as a block diagram:
Collection of data -> Selection of data -> Processing data -> Apply machine learning algorithms -> Evaluate result -> Display final result
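The stages of the block diagram can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the data is synthetic (the survey dataset itself is not reproduced here), the four yes/no features and the labelling rule are assumptions made for the example, and the helper names (`make_synthetic_survey`, `train_test_split`, `knn_predict`, `accuracy`) are hypothetical. A small hand-written k-nearest-neighbours classifier stands in for the "apply machine learning algorithms" stage.

```python
import random

def make_synthetic_survey(n_rows, seed=0):
    """Generate toy yes/no survey rows. This is NOT the paper's dataset."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        # Assumed binary features, e.g. smoking, chronic disease, fatigue, alcohol
        x = [rng.randint(0, 1) for _ in range(4)]
        # Toy labelling rule with 10% label noise, chosen only for illustration
        y = 1 if sum(x) >= 3 else 0
        if rng.random() < 0.1:
            y = 1 - y
        rows.append(x)
        labels.append(y)
    return rows, labels

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the row indices and split them into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training rows closest to the query."""
    # Squared Euclidean distance; for 0/1 features this equals Hamming distance
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if votes * 2 > k else 0

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Collect -> split -> apply algorithm -> evaluate -> display
rows, labels = make_synthetic_survey(200)
train_x, train_y, test_x, test_y = train_test_split(rows, labels)
predictions = [knn_predict(train_x, train_y, q) for q in test_x]
test_accuracy = accuracy(test_y, predictions)
print(f"test accuracy: {test_accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the results reported in this paper.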
Hidden Layers: There may be one or more hidden layers between the input and output layers. Each hidden layer contains multiple artificial neurons, or units, which process data and transmit it to subsequent layers.

Weights and Bias: Each connection between neurons in the network has a corresponding weight. These weights are modified during the training phase to enhance the performance of the network. Each neuron also has a bias, which can be thought of as an activation threshold.

Activation Function: The activation function determines a neuron's output from its inputs and internal state. Sigmoid, ReLU (Rectified Linear Unit), and tanh (hyperbolic tangent) are commonly used activation functions. They give the network non-linearities, which help it learn intricate patterns.

Loss Function: A loss function measures the discrepancy between the neural network's output and the expected output. Whether the problem to be solved is regression or classification determines which loss function is used.

Backpropagation: The primary algorithm used to train the neural network is backpropagation. In order to minimise the loss, it calculates the gradient of the loss function with respect to the network weights and modifies the weights in the opposite direction of the gradient. Usually, optimisation methods such as stochastic gradient descent (SGD) or its variants are used for this process.

Training: The neural network is trained by repeatedly supplying training examples to the network, modifying the weights via backpropagation, and optimising the loss function. The aim is to reduce the loss and improve the prediction accuracy of the network.

Prediction: After the neural network has been trained, predictions can be made on new, unseen data. Forward propagation is used to feed the input data through the network, and the output layer delivers the predicted outcome.
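The components listed above can be tied together in a minimal sketch written with the Python standard library only. It is an illustration under stated assumptions rather than the network used in this study: the architecture (one hidden layer of three sigmoid units), the learning rate, the number of epochs and the toy data (label 1 when at least two of three yes/no inputs are 1) are all choices made for the example, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic activation function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_params(n_in, n_hidden, seed=0):
    """Random weights and zero biases for one hidden layer and one output neuron."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0

def forward(x, params):
    """Forward propagation: input -> hidden layer -> output probability."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output

def mean_loss(data, params):
    """Binary cross-entropy loss averaged over the dataset."""
    total = 0.0
    for x, y in data:
        _, p = forward(x, params)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, params, lr=0.5, epochs=2000):
    """Stochastic gradient descent with backpropagation; returns new parameters."""
    w1, b1, w2, b2 = params
    w1, b1, w2 = [row[:] for row in w1], b1[:], w2[:]  # leave the inputs untouched
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(x, (w1, b1, w2, b2))
            # Sigmoid output with cross-entropy: gradient at the output is p - y
            d_out = p - y
            # Propagate the error back through the hidden sigmoid units
            d_hidden = [d_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(w2))]
            # Move every weight and bias against its gradient
            for j in range(len(w2)):
                w2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hidden[j]
                for i in range(len(x)):
                    w1[j][i] -= lr * d_hidden[j] * x[i]
            b2 -= lr * d_out
    return w1, b1, w2, b2

# Toy data (an assumption): label is 1 when at least two of three inputs are 1
data = [([a, b, c], 1 if a + b + c >= 2 else 0)
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]

initial = init_params(n_in=3, n_hidden=3)
loss_before = mean_loss(data, initial)
trained = train(data, initial)
loss_after = mean_loss(data, trained)
print(f"loss before training: {loss_before:.3f}, after: {loss_after:.3f}")

# Prediction on one input via forward propagation
_, probability = forward([1, 1, 0], trained)
print(f"predicted probability for [1, 1, 0]: {probability:.3f}")
```

In practice a library such as Keras or scikit-learn would normally handle these steps; writing them out by hand only makes the roles of the weights, activation function, loss function and backpropagation visible.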
The confusion matrix depicts the true label vs. the predicted label.

A second application of this research would be a full-scale system to assist radiologists and doctors in making better decisions. In future work, more datasets and parameters should be taken into consideration, which can benefit the classifiers.
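To make the confusion matrix mentioned earlier concrete, the following short sketch counts true labels against predicted labels for a binary classifier. The label lists are made-up illustrative values, not results from this study, and the function name `confusion_matrix` is a hypothetical helper defined here rather than a library call.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predicted labels."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Illustrative labels only (1 = lung cancer, 0 = no lung cancer)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = cm
accuracy = (tp + tn) / len(y_true)

print(cm)        # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```

Accuracy is the sum of the diagonal entries divided by the total number of samples; the off-diagonal entries separate false positives from false negatives, which matter differently in a screening setting.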
REFERENCES