Disease Drug Prediction Using ML
CHAPTER 1
INTRODUCTION
1.1 Problem Statement:
With the advent of sophisticated computing, doctors have come to rely on technology in a
number of areas, such as surgical imaging and X-rays. Treatment often depends on the
physician's experience and awareness of different factors, from medical history to
temperature, environment, blood pressure and several other variables. This vast number of
variables is assumed to cover the whole workflow, yet no model has been tested successfully
against all of them. To counter this drawback, clinical decision support systems must be used.
Such a system can help physicians make the correct decision. A medical decision support
system covers both the procedure of detecting or identifying suspected diseases or symptoms
and the opinion reached through that process.
1.3 Objective:
Reduce the number of variables and identify diseases using the K-means algorithm where
feasible. This algorithm is well suited to grouping additional pathogens. K-means is one of
the algorithms that solve the clustering problem. Its main principle is to hold k centroids, one
per cluster. Different patient checks can serve as clustering features. The algorithm reduces
the number of iterations and establishes cluster boundaries without overlapping, so that each
diagnosis delivers a precise outcome. The system uses a service-oriented architecture (SOA),
so anyone with internet connectivity can log in, and the LAMSTAR network can adjust the
weights, improve the algorithm's precision, increase overall performance and achieve the best
results.
1.3.1 Proposed System:
Reduce the number of variables and classify the most probable diseases using the K-means
algorithm. This algorithm is well adapted to grouping further pathogens. K-means is one of
the unsupervised learning algorithms that address the clustering problem. The key concept is
to create k centroids, one per cluster. Various checks on the patients are used as clustering
attributes.
The algorithm decreases the number of iterations and establishes cluster boundaries without
overlapping, so that any diagnosis produces a consistent outcome. The system uses a
service-oriented architecture (SOA), so anybody can access it over the web, and the
LAMSTAR network can adjust the weights, improve the algorithm's precision, speed up the
system overall and deliver better results.
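As an illustration of the clustering step, the sketch below groups patients by binary symptom vectors with scikit-learn's KMeans. The feature matrix and the choice of k are hypothetical and stand in for the project's actual dataset.

# Hypothetical sketch: clustering patients by binary symptom vectors with K-means.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one patient; each column is one symptom check (1 = present, 0 = absent).
patients = np.array([
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
])

# Hold k centroids, one per cluster; k = 2 is an assumption for this sketch.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(patients)

print(kmeans.labels_)           # cluster assignment for each patient
print(kmeans.cluster_centers_)  # one centroid per cluster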
CHAPTER 2
TECHNOLOGIES LEARNT
What is Python :-
Python requires programmers to type relatively little, and its indentation requirement keeps
the code readable at all times.
Python's greatest asset is the vast collection of standard libraries that can be used for the
following –
• Machine Learning
• Multimedia
Advantages of Python :-
1. Extensive Libraries
Python ships with a wide repository of code for different uses such as regular expressions,
documentation generation, unit testing, web browsing, threading, databases, CGI, email,
image handling and more, so we do not have to write all of that code manually.
2. Extensible
Python can be extended to other languages; parts of your code can be written in languages
like C++ or C.
3. Embeddable
Complementing its extensibility, Python is also embeddable. You can place Python code
inside the source code of another language, like C++. This allows us to add scripting
capabilities to code written in other languages.
4. Improved Productivity
The simplicity of the language and its large libraries make programmers more productive
than languages such as Java and C++. You simply have to write less to get more done.
5. IoT Opportunities
Since Python forms the backbone of platforms such as the Raspberry Pi, it has a promising
future in the Internet of Things. This is one way the language connects to the physical world.
6. Simple and Easy
When working with Java, you need to create a class just to print 'Hello World'. In Python, a
single print statement will do. Python is also very easy to read, understand and write. This is
why, once people pick up Python, they can find it hard to adjust to more verbose languages
like Java.
7. Readable
Because it is not very verbose, reading Python is quite close to reading English. This is why
it is so simple to learn, understand and write. No curly braces are required to delimit blocks,
and indentation is compulsory, which also helps keep the code readable.
8. Object-Oriented
The language supports both the procedural and the object-oriented paradigms. While
functions help us with code reusability, classes and objects let us model the real world. A
class encapsulates data and functions into a single unit.
9. Free and Open-Source
As mentioned before, Python is free to use. You can not only use Python for free, but can
also download its source code, modify it, and even distribute it. It ships with an extensive
set of libraries to support you in your work.
10. Portable
If you code a project in a language such as C++, you may have to modify it to run it on
another platform. Python is not like that. Here you only need to code once, and you can run
it anywhere. This is called Write Once, Run Anywhere (WORA). However, you must be
careful not to use system-dependent features.
11. Interpreted
Finally, Python is an interpreted language. Since statements are executed one by one,
debugging is easier than in compiled languages.
Advantages of Python Over Other Languages
1. Less Coding
Almost any task takes less code in Python than the same task in other languages. Python
also has excellent standard library support, so you rarely need to hunt for third-party
libraries to get the job done. This is why so many people recommend Python to newcomers.
2. Affordable
Python is free and open source, so individuals, small businesses and large corporations can
create applications with the freely available tools. Python is popular and widely used, so it
enjoys strong community support.
GitHub's 2019 annual survey found that Python had overtaken Java in popularity among
programming languages.
Disadvantages of Python
So far, we have seen why Python is a great option for your project. But if you choose it, you
should also be aware of its limitations. Let's now look at Python's downsides compared with
other languages.
1. Speed Limitations
We saw that Python code is executed line by line. Because Python is interpreted, execution
is slow. This is not a concern unless speed is a priority for the project. In other words, unless
high speed is required, the benefits offered by Python are enough to distract us from its
speed limitations.
3. Design Restrictions
As you know, Python is dynamically typed. This means you do not have to declare the type
of a variable when you write the code; it uses duck typing. But wait, what is that? It simply
means that if it looks like a duck, it must be a duck. While this makes coding easy for
programmers, it can increase runtime errors.
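A minimal sketch of what duck typing means in practice (the function below is purely illustrative): the type error only appears when the offending line actually runs.

# Dynamic typing: a name can hold any type; type errors surface only at runtime.
def double(x):
    return x * 2

print(double(4))        # 8
print(double("quack"))  # "quackquack" - strings also support *
print(double(None))     # raises TypeError, but only when this line executes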
4. Underdeveloped Database Access Layers
Python's database access layers are rather underdeveloped compared with more commonly
used technologies such as JDBC (Java DataBase Connectivity) and ODBC (Open DataBase
Connectivity). Consequently, it is less often used in large enterprises.
5. Easy to Use
No, we're not joking. Python's simplicity can actually be a problem. Take my example: I'm
mainly a Python user and I don't do much Java. I find the syntax so plain that the verbosity
of Java code seems excessive by comparison.
That was a look at the advantages and disadvantages of the Python programming language.
Python History :-
What do the alphabet and the Python programming language have in common? Right, both
start with ABC. If we talk about ABC in the context of Python, it is clear that ABC is a
programming language. ABC is a general-purpose programming language and programming
environment built at CWI (Centrum Wiskunde & Informatica) in Amsterdam, the Netherlands.
ABC's biggest success was shaping Python's design. Python was conceptualised in the late
1980s, when Guido van Rossum was working on the distributed operating system Amoeba at
CWI. In an interview with Bill Venners, Guido van Rossum said: "In the early 1980s, I
worked as an implementer on a team building a language called ABC at Centrum voor
Wiskunde en Informatica (CWI). I'm not sure how well people know the influence ABC had
on Python. I try to mention ABC's influence because I'm indebted to everything I learned
during that project and to the people who worked on it."
Later in the same interview, Guido van Rossum added: "I remembered all my experience and
some of my frustration with ABC. I decided to try to design a simple scripting language that
possessed some of ABC's better properties, but without its problems. So I started typing. I
created a simple virtual machine, a simple parser and a simple runtime. I made my own
version of the various ABC parts that I liked. Instead of curly braces or begin-end blocks, I
built a simple syntax, used indentation for grouping statements, and developed a small
number of powerful data types: a hash table (or dictionary), a list, strings, and numbers."
Before we get into the specifics of the different approaches, let's begin by looking at what
machine learning is and what it is not. Machine learning is often classified as a subfield of
artificial intelligence, but at first brush that categorisation can be misleading. Research in AI
has undoubtedly contributed to the study of machine learning, but it is more useful to think
of machine learning as a means of building models that 'learn' from data. Once these models
have been fitted to previously observed data, they can be used to predict and explain newly
observed data. I will leave to the reader the more philosophical digression about how close
this form of model-based 'learning' is to the 'learning' performed by the human brain. To use
these methods effectively, it is crucial to understand the problem setting in machine
learning, so we will start with some broad categorisations of the types of approaches
discussed here.
At the most basic level, machine learning can be divided into two major forms: supervised
learning and unsupervised learning.
Unsupervised learning involves modelling the features of a dataset without reference to any
label, and is sometimes described as "letting the dataset speak for itself." These models
include tasks such as clustering and dimensionality reduction. Clustering algorithms identify
distinct groups of data, while dimensionality reduction algorithms search for more concise
representations of the data. In the following section, we will see examples of both forms of
unsupervised learning.
Human beings are currently the most intelligent and evolved species on earth, because they
can think, evaluate and solve complex problems. AI, on the other hand, is still in its initial
stage and has not surpassed human intelligence in several respects. The question, then, is
why does the machine need to learn? The most suitable answer is: "to make decisions, with
efficiency and at scale, based on data." Organisations increasingly apply knowledge extracted
from data to various real-world challenges and problem solving. We can call these
data-driven machine decisions, particularly when automating a process. Rather than using
programming logic, such data-driven decisions are needed for problems that cannot be
coded inherently. In reality we cannot do without human intelligence, but another point is
that we all need to tackle problems efficiently at a vast scale in the modern world. That is
why machine learning is required.
Although machine learning is advancing rapidly, with cryptography and autonomous cars
making major strides, this branch of AI as a whole still has a long way to go. The reason is
that ML has not yet been able to overcome a number of obstacles. The challenges that ML
currently faces are:
Data quality − One of the greatest problems is providing good data to ML algorithms.
Poor-quality data leads to problems with data preprocessing and feature extraction.
Time-consuming task − Another difficulty for ML models is the time it takes to collect,
extract and prepare the data.
No clear objective when formulating business problems − Having no definite target and
established goal is another main challenge for ML, as this technology is not yet mature.
Overfitting and underfitting − If the model overfits or underfits, it cannot represent the
problem adequately.
Curse of dimensionality − Too many features per data point is another obstacle for an ML
model and can be a real hurdle.
Machine Learning Applications:-
Machine learning is the fastest-growing technology, and according to researchers we are in
the golden era of AI and ML. It is used to solve many complex real-world problems that
cannot be solved with traditional approaches. Some real-world ML applications are:
• Sentiment analysis
• Emotion analysis
• Speech synthesis
• Speech recognition
• Customer segmentation
• Object detection
• Fraud detection
• Fraud prevention
The term "machine learning" was coined by Arthur Samuel in 1959 and described as "a field
of study that gives computers the ability to learn without being explicitly programmed."
And that was the beginning of machine learning! In modern times, machine learning is one
of the most popular (if not the most popular) career choices. Machine Learning Engineer was
named the best job of 2019, with 344% growth and an average annual base salary of
$146,085.
But there is still a lot of doubt about what machine learning is and how to get started. This
section covers the basics of machine learning and the path to eventually becoming a
full-fledged Machine Learning Engineer. Let's get started!
Here is a rough roadmap you can follow on your journey to becoming a professional ML
engineer. Of course, you can always adjust the steps to reach your ultimate goal!
If you are a genius, you could start ML right away, but normally there are some prerequisites
you need to know first: linear algebra, multivariate calculus, statistics and Python. And
don't be scared if you don't know them yet! You don't need a PhD in these subjects, but you
do need a working knowledge.
Both linear algebra and multivariate calculus are important in machine learning. However,
how much you need them depends on your role as a data scientist. If you focus mainly on
application-heavy machine learning, you will not need to focus too deeply on the
mathematics, as many convenient libraries are available. But if you want to work on
research and development in machine learning, mastery of linear algebra and multivariate
calculus is very important, since ML algorithms often have to be implemented from scratch.
Data plays a large part in machine learning. In fact, around 80% of your time as an ML
expert will be spent collecting and cleaning data. And statistics is the field that handles the
collection, analysis and presentation of data. So it is no surprise that you need to learn it!
Some of the key concepts in statistics are statistical significance, probability distributions,
hypothesis testing, regression, and so on. Bayesian thinking is also a very important part of
ML; it covers topics such as conditional probability, priors and posteriors, maximum
likelihood, etc.
Some people choose to skip linear algebra, multivariate calculus and statistics, and learn
them through trial and error as they go along. But the one thing you absolutely cannot skip
is Python! While you can use other languages for machine learning, such as R or Scala,
Python is currently the most popular ML language. There are many Python libraries that are
particularly useful for artificial intelligence and machine learning, including Keras,
TensorFlow, scikit-learn, and so on.
So if you want to learn ML, the best thing to do is to learn Python! You can do that using
various online resources and courses, such as the free Python material on GeeksforGeeks.
Now that the prerequisites are covered, you can finally start learning ML (which is the fun
part!!!). It is best to begin with the basics and then move on to the more advanced material.
Some of the fundamental concepts in ML are:
• Model – A model is a specific representation learned from data by applying a machine
learning algorithm. A model is also called a hypothesis.
• Target (Label) – A target attribute or label is the value our model is expected to predict.
For the fruit example mentioned in the features section, the label for each input set would be
the name of the fruit, such as apple, orange, banana, etc.
• Training – The idea is to provide a set of inputs along with the expected outputs (labels).
After training, we obtain a model (hypothesis) that maps new data to one of the learned
categories.
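As a small hedged sketch of these concepts (the fruit features below are invented for illustration), training maps feature rows to target labels, and the fitted model then assigns new inputs to one of the learned categories.

# Illustrative only: toy fruit features (weight in grams, colour code) and labels.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[150, 0], [170, 0], [120, 1], [110, 1]]  # inputs (features)
y_train = ["apple", "apple", "banana", "banana"]     # targets (labels)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)  # training
print(model.predict([[160, 0]]))  # maps new data to a learned label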
Machine learning can examine vast amounts of data and uncover trends and patterns that are
not obvious to humans. For example, for an e-commerce platform like Amazon, it helps to
understand its customers' browsing and purchase histories in order to serve them suitable
products, deals and reminders. It uses this data to show them relevant advertisements.
Advantages of Machine Learning
2. No Human Intervention Needed
With ML you don't have to babysit your project every step of the way. Since it gives
machines the ability to learn, it lets them make predictions and improve the algorithms on
their own. A typical example is anti-virus software: it learns to filter out new threats as they
are identified. ML is also good at recognising spam.
3. Continuous Improvement
5. Wide Applications
Whether you are an e-tailer or a healthcare provider, you can make ML work for you. It has
the potential to give customers a far more personal experience and therefore to target the
right customers.
Disadvantages of Machine Learning
1. Data Acquisition
Machine learning requires training on massive data sets, which should be inclusive/unbiased
and of good quality. There can also be times when we must wait for new data to be
generated.
2. Time and Resources
ML needs enough time to let the algorithms learn and develop enough to fulfil their purpose
with a considerable amount of accuracy and relevance. It also needs massive resources to
run. This can mean additional requirements of computing power for you.
3. Interpretation of Results
Another major challenge is the ability to interpret the results generated by the algorithms
accurately. You must also carefully choose the algorithms for your purpose.
In February 1991, Guido van Rossum published the first version of the Python code
(version 0.9.0) at alt.sources. This release already included exception handling, functions,
and the core data types list, dict, str and so on. It was also object-oriented and had a module
system.
Python version 1.0 was released in January 1994. The major new features included in this
release were the functional programming tools lambda, map, filter and reduce, which Guido
van Rossum never liked. Six and a half years later, in October 2000, Python 2.0 was
introduced. This release included list comprehensions, a full garbage collector and Unicode
support. Python flourished for another 8 years in the 2.x versions until the next big release,
Python 3.0 (also known as "Python 3000" and "Py3K"), appeared. Python 3 is not backwards
compatible with Python 2.x. The main focus of Python 3 was to remove duplicate
programming constructs and modules, in keeping with the rule from the Zen of Python:
"There should be one – and preferably only one – obvious way to do it." Some changes in
Python 3.0:
• There is only one integer type left, i.e. int; long has been merged into int.
• Dividing two integers returns a float instead of an integer. "//" can be used to get the
"old" behaviour.
Python
Python has a dynamic type system and automatic memory management. It supports a
number of programming paradigms, including object-oriented, imperative, procedural and
functional programming.
• Python is interpreted − Python is processed at runtime by the interpreter. You do not need
to compile your program before executing it. This is similar to PERL and PHP.
• Python is interactive − You can actually sit at a Python prompt and interact with the
interpreter directly as you write your programs.
Python also recognises the importance of development speed. This includes readable and
terse code, and access to powerful constructs that avoid tedious repetition of code.
Maintainability may be an almost meaningless metric, but it does say something about how
much code you have to scan, read and/or understand to troubleshoot a problem or tweak a
behaviour. This speed of development, the ease with which a programmer coming from
another language can pick up basic Python skills, and the huge standard library are key to
another area in which Python excels: all of its tools have been quick to integrate, saved a lot
of time, and several people with no Python experience have later patched and extended them
without anything breaking.
TensorFlow
TensorFlow is a free and open-source software library for dataflow programming across a
range of tasks. It is a symbolic math library, and is also used for machine learning
applications such as neural networks. It is used at Google for both research and production.
TensorFlow was developed by the Google Brain team for internal Google use. It was
released under the Apache 2.0 open-source licence on 9 November 2015.
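A minimal sketch in the TensorFlow 1.x graph-and-session style that the implementation chapter later uses; the values are arbitrary.

# TensorFlow 1.x sketch: build a dataflow graph, then run it in a session.
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    total = a + b

with tf.Session(graph=graph) as sess:
    print(sess.run(total))  # 5.0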
NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays.
It is the fundamental package for scientific computing with Python. It contains, among other
things, a powerful N-dimensional array object and useful numerical routines, and it can also
integrate with a wide variety of databases easily and rapidly.
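A short sketch of the array object and the vectorised operations described above.

# NumPy sketch: create an N-dimensional array and apply vectorised operations.
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D array object
print(a.shape)         # (2, 3)
print(a * 2)           # element-wise arithmetic, no explicit loops
print(a.mean(axis=0))  # column means: [2.5 3.5 4.5]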
Pandas
Pandas is an open-source Python library providing high-performance data manipulation and
analysis through its versatile data structures. Python was previously used mainly for data
munging and preparation and contributed very little to data analysis itself; Pandas solved
this problem. Using Pandas we can accomplish five typical steps in the processing and
analysis of data, regardless of the origin of the data: load, prepare, manipulate, model and
analyse. Python with Pandas is used in a wide range of fields including finance, economics,
statistics, analytics and more.
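A short sketch of the load-prepare-analyse workflow with a DataFrame; the table contents are invented for illustration.

# pandas sketch: build a small DataFrame, then summarise and group it.
import pandas as pd

df = pd.DataFrame({
    "patient": ["A", "B", "C"],
    "symptom_count": [3, 5, 2],
    "disease": ["Hypertension", "Diabetes", "Hypertension"],
})

print(df.describe())                                  # quick numeric summary
print(df.groupby("disease")["symptom_count"].mean())  # analyse by group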
Matplotlib
The pyplot module provides a MATLAB-like interface for simple plotting, particularly in
combination with IPython. For the power user, you have full control of line styles, font
properties, axis properties and so on, via an object-oriented interface or a set of
MATLAB-style functions.
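A short pyplot sketch of the MATLAB-like plotting interface.

# matplotlib.pyplot sketch: plot a curve, label the axes and save the figure.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y, "o-", label="y = x^2")  # line style, markers and a legend label
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("plot.png")  # or plt.show() in an interactive session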
Scikit-learn
Scikit-learn is a free machine learning library for the Python programming language. It
provides a range of supervised and unsupervised learning algorithms, including
classification, regression and clustering methods such as support vector machines, logistic
regression, random forests and k-means, and it is built on top of NumPy, SciPy and
matplotlib.
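A short hedged sketch of a typical scikit-learn workflow on a bundled example dataset (not this project's data): split, fit, score.

# scikit-learn sketch: train/test split, fit a classifier, measure accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data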
Python may not come pre-installed on your machine. First released in 1991, Python remains
a highly successful high-level programming language today. Its design philosophy
emphasises code readability through its prominent use of whitespace.
How to install Python on Windows and Mac:
Over the years there have been many releases of Python. The question is, how should
Python be installed? It can be confusing for a beginner starting to study Python, but this
tutorial will solve that problem. The current (latest) release of Python is version 3.7.4, i.e.
Python 3.
Before beginning the Python installation process, you must first know your device
specifications. You must download the Python version appropriate for your machine, i.e.
your operating system and processor. My machine is a 64-bit Windows system. The
following steps install Python version 3.7.4 (Python 3) on Windows; the procedure for
Windows 10, 8 and 7 is divided into 4 parts for easier understanding.
Step 1: Download and install Python from the official website using Google Chrome or any
other web browser, or click on the link below: https://www.python.org
Now look for the latest and correct release for your operating system.
Step 3: You can either click the yellow "Download Python 3.7.4" button for Windows, or
scroll down and click the download link for a specific version. Here we download the most
recent Python release for Windows, 3.7.4.
Step 4: Scroll down the page until you find the Files section.
Step 5: Here you can see a different Python download for each OS.
• To download 32-bit Python for Windows you can use any of three options: Windows x86
embeddable zip file, Windows x86 executable installer, or Windows x86 web-based
installer.
• To download 64-bit Python you can likewise use any of three options: Windows x86-64
embeddable zip file, Windows x86-64 executable installer, or Windows x86-64 web-based
installer.
Here we install the Windows x86-64 web-based installer. This completes the first part,
choosing which version of Python to download. Now we continue with the second part,
installing Python.
Note: You can click on the Release Notes option to see the changes or improvements made
in this release.
Python installation
Step 1: Go to Downloads and open the downloaded Python version to start the installation.
Step 2: Before clicking Install Now, make sure to tick "Add Python 3.7 to PATH".
With these steps you have installed Python successfully and properly. Now it is time to
verify the installation.
Step 3: Open the Command Prompt.
Step 4: Let us check whether Python is configured correctly. Type python -V and press
Enter.
Note: If you have previously installed an older version of Python, uninstall the previous
version first and then install the current one.
Checking how the Python IDLE works
Step 3: Click on IDLE (Python 3.7 64-bit) and open the program.
Step 4: To work in IDLE you must first save the file. Click File > Save.
Step 5: Name the file and save it as a Python file by clicking SAVE. Here I named the file
Hey World.
Step 6: Now type, for example, print("Hey World") and press Enter.
You will see that the given command runs. That concludes our tutorial on installing Python:
you have learned how to download and install Python for your Windows operating system.
Note: Unlike Java, Python does not require semicolons at the end of statements.
Advantages of Django
Here are a few advantages of using Django:
As you know, Django is a web framework for Python, and like most modern frameworks it
supports the MVC pattern. Let's first look at what the Model-View-Controller (MVC)
pattern is, and then at how Django's MVT pattern differs.
MVT Pattern
The Model-View-Template (MVT) pattern differs slightly from MVC. The main difference
between the two patterns is that the controller part (the code that controls the interaction
between the model and the view) is handled by Django itself, leaving us with the template.
The template is an HTML file mixed with the Django Template Language (DTL).
The diagram below shows the interaction between each component of the MVT pattern and
the user's request. The developer provides the model, the view and the template, then simply
maps them to a URL, and Django does the magic of serving the page to the user.
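A minimal hedged sketch of the MVT split; the app, model and template names below are hypothetical and are not taken from this project.

# Hypothetical Django sketch: the model and the view; Django itself plays the
# controller role, and the template is an HTML file written in the DTL.

# models.py
from django.db import models

class Patient(models.Model):
    name = models.CharField(max_length=100)

# views.py
from django.shortcuts import render

def patient_list(request):
    patients = Patient.objects.all()
    # "patients.html" is a Django Template Language (DTL) template.
    return render(request, "patients.html", {"patients": patients})

# urls.py would map a URL to the view, e.g.:
# urlpatterns = [path("patients/", patient_list)]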
Jupyter Notebook
Jupyter Notebook is an open-source web application for creating and sharing documents
containing live code, equations, visualisations and text. It is maintained by the people at
Project Jupyter.
Jupyter Notebook is a spin-off of the IPython project, which used to be called the IPython
Notebook. The name Jupyter comes from the core languages it supports: Julia, Python, and
R. Jupyter ships with the IPython kernel, which lets you write programs in Python, but there
are currently more than 100 other kernels you can use as well.
Anaconda
Python distributions include the Python interpreter together with a list of Python packages
and tools such as editors. Anaconda is one of several distributions of Python.
Anaconda is a Python and R distribution for data science, from the company formerly
known as Continuum Analytics. Anaconda ships with more than 100 new packages. It is
used for scientific computing, data science, statistical analysis and machine learning. The
latest version at the time, Anaconda 5.0.1, was released in October 2017.
The 5.0.1 release addresses a few minor bugs and adds useful features, such as updated
support for the R language, that were not available in the original 5.0.0 release.
The distribution also includes a package manager and an open-source package collection
containing over 1000 R and Python data science packages.
If you are happy with plain Python, there is no big reason to switch to Anaconda. But some
people, such as data scientists who are not full-time developers, find Anaconda very useful
because it simplifies a lot of the common problems a beginner runs into.
CHAPTER 3
SYSTEM DESIGN
3.1 System Architecture
3.2 Module Description
1. Chemical structure
2. Drug targets
Chemical structure: At the molecular level, the structure of a drug describes its binding
activity. The most commonly used structural profiling markers for drugs are chemical
fingerprints [13]. Fingerprints are bit vectors indicating the presence (1) or absence (0) of
certain chemical characteristics (e.g. a C=N group, a 6-membered ring). We take an input
chemical formula (SMILES) and use the OpenBabel 2.3 library to generate binary Molecular
ACCess System (MACCS) fingerprints of length 166.
Drug targets: A set of drug targets can highlight the affected biological processes. We
represent the set of drug targets from DrugBank and KEGG as a bit vector in which 1
indicates a drug target and 0 indicates a non-target. This leads to a sparse matrix, as the
median number of putative targets per drug is one.
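The report generates MACCS fingerprints with OpenBabel 2.3; as a hedged stand-in, the sketch below uses RDKit instead (an assumption, not the project's actual tool) to compute the 166 informative MACCS bits, and builds a drug-target bit vector from a hypothetical target list.

# Sketch only: RDKit stands in for OpenBabel; the SMILES and target names are examples.
import numpy as np
from rdkit import Chem
from rdkit.Chem import MACCSkeys

# Structural profile: MACCS fingerprint from a SMILES string (aspirin as an example).
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
fp = MACCSkeys.GenMACCSKeys(mol)  # 167-bit vector; bit 0 is unused
structure_bits = np.array([int(fp[i]) for i in range(fp.GetNumBits())])[1:]

# Target profile: 1 where the drug hits a known target, 0 elsewhere (hypothetical lists).
all_targets = ["PTGS1", "PTGS2", "ACE", "ADRB1"]
drug_targets = {"PTGS1", "PTGS2"}
target_bits = np.array([1 if t in drug_targets else 0 for t in all_targets])

feature_vector = np.concatenate([structure_bits, target_bits])
print(feature_vector.shape)  # 166 structure bits + 4 target bits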
3.3 System Specification
1. The service should be able to store the user data;
2. The data should be available via any Internet-connected device;
3. The service should be able to sync user data between several devices (notebooks, smart
phones, etc.);
7. Interoperability with other cloud storage services should be permitted, allowing data
migration from one CSP to another.
3.3.1 Software Requirements:
• Operating System : Windows
• Script : Python
• Database :
3.3.2 Hardware Requirements:
• Processor - Pentium III
• Speed - 2.4 GHz
• Hard Disk - 20 GB
• Monitor - 15'' VGA Colour
3.4 Detailed Design
The UML was developed by three software engineers working for Rational Software. It was
adopted as a standard in 1997 and has remained the norm since then, with only a few
updates.
GOALS:
The main purposes of the UML design are to:
Give users a ready-to-use, expressive visual modelling language so they can develop and
exchange meaningful models.
Provide extensibility and specialisation mechanisms to extend the core concepts.
i. USE CASE DIAGRAM:
[Use case diagram: the admin and patient actors are connected to the use cases login,
symptom 1 to symptom 5, logistic regression and drug prediction.]
ii. SEQUENCE DIAGRAM:
[Sequence diagram: the patient logs in, submits symptom 1 to symptom 5, the system runs
logistic regression, and the drug prediction is returned.]
iii. CLASS DIAGRAM:
In software engineering, a Unified Modeling Language (UML) class diagram is a type of
static structure diagram that describes the structure of a system by showing its classes, their
attributes, their operations (or methods), and the relationships among the classes. It shows
which class contains which information.
[Class diagram: a patient class with the attributes user name and password and the
operations name of the patient(), symptom 1() to symptom 5(), logistic regression(),
support vector machine() and drug prediction(); and an admin class with the attributes
user name and password and the operations logistic regression(), support vector machine()
and drug prediction().]
iv. DATA FLOW DIAGRAM:
Data flow diagrams represent the flow of data in a business information system graphically.
A DFD describes how data is transferred through the system, from input to file storage and
report generation.
Data flow diagrams can be divided into logical and physical diagrams. The logical data flow
diagram describes how data flows through a system to perform certain business functions.
The physical data flow diagram describes how that logical flow is implemented.
A DFD graphically depicts the functions or processes that capture, manipulate, store and
distribute data within a system, between the system and its environment, and between the
components of the system. Its visual representation makes it a good communication tool
between the user and the system designer. The structure of DFDs allows starting from a
broad overview and expanding it into a hierarchy of detailed diagrams. DFDs are widely
used for these reasons.
CHAPTER 4
IMPLEMENTATION
import argparse
import collections
from datetime import datetime
import hashlib
import os.path
import random
import re
import sys
import tarfile
import numpy as np
from six.moves import urllib
import tensorflow as tf
# gfile and compat are used below but were missing from the original imports.
from tensorflow.python.platform import gfile
from tensorflow.python.util import compat
FLAGS = None
MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1 # ~134M
def create_image_lists(image_dir, testing_percentage, validation_percentage):
if not gfile.Exists(image_dir):
tf.logging.error("Image directory '" + image_dir + "' not found.")
return None
result = collections.OrderedDict()
sub_dirs = [
os.path.join(image_dir,item)
for item in gfile.ListDirectory(image_dir)]
sub_dirs = sorted(item for item in sub_dirs
if gfile.IsDirectory(item))
for sub_dir in sub_dirs:
extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
file_list = []
dir_name = os.path.basename(sub_dir)
if dir_name == image_dir:
continue
tf.logging.info("Looking for images in '" + dir_name + "'")
for extension in extensions:
file_glob = os.path.join(image_dir, dir_name, '*.' + extension)
file_list.extend(gfile.Glob(file_glob))
if not file_list:
tf.logging.warning('No files found')
continue
if len(file_list) < 20:
tf.logging.warning(
'WARNING: Folder has less than 20 images, which may cause issues.')
elif len(file_list) > MAX_NUM_IMAGES_PER_CLASS:
tf.logging.warning(
'WARNING: Folder {} has more than {} images. Some images will '
'never be selected.'.format(dir_name, MAX_NUM_IMAGES_PER_CLASS))
label_name = re.sub(r'[^a-z0-9]+', ' ', dir_name.lower())
training_images = []
testing_images = []
validation_images = []
for file_name in file_list:
base_name = os.path.basename(file_name)
hash_name = re.sub(r'_nohash_.*$', '', file_name)
hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
percentage_hash = ((int(hash_name_hashed, 16) %
(MAX_NUM_IMAGES_PER_CLASS + 1)) *
(100.0 / MAX_NUM_IMAGES_PER_CLASS))
if percentage_hash < validation_percentage:
validation_images.append(base_name)
elif percentage_hash < (testing_percentage + validation_percentage):
testing_images.append(base_name)
else:
training_images.append(base_name)
result[label_name] = {
'dir': dir_name,
'training': training_images,
'testing': testing_images,
'validation': validation_images,
}
return result
def create_model_graph(model_info):
with tf.Graph().as_default() as graph:
model_path = os.path.join(FLAGS.model_dir, model_info['model_file_name'])
with gfile.FastGFile(model_path, 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
bottleneck_tensor, resized_input_tensor = (tf.import_graph_def(
graph_def,
name='',
return_elements=[
model_info['bottleneck_tensor_name'],
model_info['resized_input_tensor_name'],
]))
return graph, bottleneck_tensor, resized_input_tensor
def maybe_download_and_extract(data_url):
  dest_directory = FLAGS.model_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = data_url.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    # Download the model archive if it is not already cached locally.
    filepath, _ = urllib.request.urlretrieve(data_url, filepath)
  # Unpack the downloaded archive into the model directory.
  tarfile.open(filepath, 'r:gz').extractall(dest_directory)
def ensure_dir_exists(dir_name):
  if not os.path.exists(dir_name):
    os.makedirs(dir_name)
bottleneck_path_2_bottleneck_values = {}
def create_bottleneck_file(bottleneck_path, image_lists, label_name, index,
image_dir, category, sess, jpeg_data_tensor,
decoded_image_tensor, resized_input_tensor,
bottleneck_tensor):
tf.logging.info('Creating bottleneck at ' + bottleneck_path)
image_path = get_image_path(image_lists, label_name, index,
image_dir, category)
if not gfile.Exists(image_path):
tf.logging.fatal('File does not exist %s', image_path)
image_data = gfile.FastGFile(image_path, 'rb').read()
try:
bottleneck_values = run_bottleneck_on_image(
sess, image_data, jpeg_data_tensor, decoded_image_tensor,
resized_input_tensor, bottleneck_tensor)
except Exception as e:
raise RuntimeError('Error during processing file %s (%s)' % (image_path,
str(e)))
bottleneck_string = ','.join(str(x) for x in bottleneck_values)
with open(bottleneck_path, 'w') as bottleneck_file:
bottleneck_file.write(bottleneck_string)
  # Progress counter from the enclosing cache_bottlenecks() loop in the original
  # retraining script, kept here as excerpted.
  how_many_bottlenecks += 1
  if how_many_bottlenecks % 100 == 0:
    tf.logging.info(
        str(how_many_bottlenecks) + ' bottleneck files created.')
def get_random_distorted_bottlenecks(
sess, image_lists, how_many, category, image_dir, input_jpeg_tensor,
distorted_image, resized_input_tensor, bottleneck_tensor):
class_count = len(image_lists.keys())
bottlenecks = []
ground_truths = []
for unused_i in range(how_many):
label_index = random.randrange(class_count)
label_name = list(image_lists.keys())[label_index]
image_index = random.randrange(MAX_NUM_IMAGES_PER_CLASS + 1)
image_path = get_image_path(image_lists, label_name, image_index, image_dir,
category)
if not gfile.Exists(image_path):
tf.logging.fatal('File does not exist %s', image_path)
jpeg_data = gfile.FastGFile(image_path, 'rb').read()
distorted_image_data = sess.run(distorted_image,
{input_jpeg_tensor: jpeg_data})
bottleneck_values = sess.run(bottleneck_tensor,
{resized_input_tensor: distorted_image_data})
bottleneck_values = np.squeeze(bottleneck_values)
ground_truth = np.zeros(class_count, dtype=np.float32)
ground_truth[label_index] = 1.0
bottlenecks.append(bottleneck_values)
ground_truths.append(ground_truth)
return bottlenecks, ground_truths
def variable_summaries(var):
with tf.name_scope('summaries'):
mean = tf.reduce_mean(var)
tf.summary.scalar('mean', mean)
with tf.name_scope('stddev'):
stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
tf.summary.scalar('stddev', stddev)
tf.summary.scalar('max', tf.reduce_max(var))
tf.summary.scalar('min', tf.reduce_min(var))
tf.summary.histogram('histogram', var)
def add_final_training_ops(class_count, final_tensor_name, bottleneck_tensor,
                           bottleneck_tensor_size):
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[None, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')
  # The rest of this function (the new softmax layer and its training ops) is not
  # reproduced; the excerpt continues inside the main training loop, which
  # periodically logs training accuracy and cross entropy.
  if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
    train_accuracy, cross_entropy_value = sess.run(
        [evaluation_step, cross_entropy],
        feed_dict={bottleneck_input: train_bottlenecks,
                   ground_truth_input: train_ground_truth})
tf.logging.info('%s: Step %d: Train accuracy = %.1f%%' %
(datetime.now(), i, train_accuracy * 100))
tf.logging.info('%s: Step %d: Cross entropy = %f' %
(datetime.now(), i, cross_entropy_value))
validation_bottlenecks, validation_ground_truth, _ = (
get_random_cached_bottlenecks(
sess, image_lists, FLAGS.validation_batch_size, 'validation',
FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
FLAGS.architecture))
# Run a validation step and capture training summaries for TensorBoard
# with the `merged` op.
validation_summary, validation_accuracy = sess.run(
[merged, evaluation_step],
feed_dict={bottleneck_input: validation_bottlenecks,
ground_truth_input: validation_ground_truth})
validation_writer.add_summary(validation_summary, i)
tf.logging.info('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
(datetime.now(), i, validation_accuracy * 100,
len(validation_bottlenecks)))
# Command-line flags; the opening lines of this definition are restored so the
# excerpt parses.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--eval_step_interval',
    type=int,
    default=10,
    help='How often to evaluate the training results.'
)
parser.add_argument(
'--train_batch_size',
type=int,
default=100,
help='How many images to train on at a time.'
)
parser.add_argument(
'--test_batch_size',
type=int,
default=-1,
help="""\
How many images to test on. This test set is only used once, to evaluate
the final accuracy of the model after training completes.
A value of -1 causes the entire test set to be used, which leads to more
stable results across runs.\
"""
)
parser.add_argument(
'--validation_batch_size',
type=int,
default=100,
help="""\
How many images to use in an evaluation batch. This validation set is
used much more often than the test set, and is an early indicator of how
accurate the model is during training.
A value of -1 causes the entire validation set to be used, which leads to
more stable results across training iterations, but may be slower on large
training sets.\
"""
CHAPTER 5
SYSTEM TESTING
The aim of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies and/or the finished product. It is the
process of exercising software to ensure that the software system meets its requirements and
user expectations and does not fail in an unacceptable manner. There are various types of
tests. Each test type addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves designing test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of the individual software units
of the application. It is done after the completion of an individual unit and before
integration. This is a structural test that relies on knowledge of the construction and is
invasive. Unit tests perform basic tests at the component level and test a specific business
process, application and/or system configuration. Unit tests ensure that each unique path of
a business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
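As a small hedged sketch (the helper under test is hypothetical, not part of this project's code), a unit test checks one unit of logic against clearly defined inputs and expected results.

# Hypothetical sketch of a unit test using Python's built-in unittest module.
import unittest

def has_symptom(symptoms, name):
    # Unit under test: case-insensitive symptom lookup (illustrative helper).
    return name.strip().lower() in {s.strip().lower() for s in symptoms}

class TestHasSymptom(unittest.TestCase):
    def test_present(self):
        self.assertTrue(has_symptom(["headache", "chills"], "Headache"))

    def test_absent(self):
        self.assertFalse(has_symptom(["headache"], "cough"))

if __name__ == "__main__":
    unittest.main()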
Integration testing
Functional test
System Test
System testing ensures that the entire integrated software system meets the requirements. It
tests a configuration to ensure known and predictable results. An example of system testing
is the configuration-oriented system integration test. System testing is based on process
descriptions and flows, emphasising pre-driven process links and integration points.
Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is a test in which the software under test is treated as a black box:
you cannot "see" into it. The test provides inputs and responds to outputs without
considering how the software works.
5.1 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test phase of the
software life cycle, although it is not uncommon for coding and unit testing to be conducted
as two distinct phases.
Field tests will be carried out manually, and functional tests will be written in detail.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format
Test results: all the test cases mentioned above passed successfully. No defects were
encountered.
RESULTS
itching,
skin_rash
nodal_skin_eruptions
continuous_sneezing
shivering
chills
joint_pain
stomach_pain
acidity
ulcers_on_tongue
muscle_wasting
vomiting
burning_micturition
spotting_ urination
fatigue
weight_gain
anxiety
cold_hands_and_feets
mood_swings
weight_loss
restlessness
lethargy
patches_in_throat
irregular_sugar_level
cough
high_fever
sunken_eyes
breathlessness
sweating
dehydration
indigestion
headache
yellowish_skin
dark_urine
nausea
loss_of_appetite
pain_behind_the_eyes
back_pain
constipation
abdominal_pain
diarrhoea
mild_fever
yellow_urine
yellowing_of_eyes
acute_liver_failure
fluid_overload
swelling_of_stomach
swelled_lymph_nodes
malaise
blurred_and_distorted_vision
phlegm
throat_irritation
redness_of_eyes
sinus_pressure
runny_nose
congestion
chest_pain
weakness_in_limbs
fast_heart_rate
pain_during_bowel_movements
pain_in_anal_region
bloody_stool
irritation_in_anus
neck_pain
dizziness
cramps
bruising
obesity
swollen_legs
swollen_blood_vessels
puffy_face_and_eyes
enlarged_thyroid
brittle_nails
swollen_extremeties
excessive_hunger
extra_marital_contacts
drying_and_tingling_lips
slurred_speech
knee_pain
hip_joint_pain
muscle_weakness
stiff_neck
swelling_joints
movement_stiffness
spinning_movements
loss_of_balance
unsteadiness
weakness_of_one_body_side
loss_of_smell
bladder_discomfort
foul_smell_of urine
continuous_feel_of_urine
passage_of_gases
internal_itching
toxic_look_(typhos)
depression
irritability
muscle_pain
altered_sensorium
red_spots_over_body
We predict the disease from the dataset, and we have implemented drug suggestions for the
diabetes and hypertension problems.
Execution:
Click on the run.bat file in your project directory.
Enter the name of the patient and the symptoms of the patient to predict the disease, and
then click on the algorithm you want to use for the prediction.
In the figure above, for the given symptoms, the system predicted Hypertension using
logistic regression.
Now we test SVM as well. For the given symptoms, SVM also predicted Hypertension.
Next we predict the drug for the disease.
For Hypertension it suggests 2 drugs to relieve the condition.
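As a hedged sketch of this prediction step (the toy symptom rows, labels and drug table below are illustrative and stand in for the project's dataset and GUI code), both logistic regression and an SVM are fitted on binary symptom vectors, and a drug suggestion is looked up for the predicted disease.

# Illustrative sketch only: toy data standing in for the project's symptom dataset.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Columns: [headache, dizziness, chest_pain, excessive_hunger, fatigue]
X = [[1, 1, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 1, 0, 1, 1]]
y = ["Hypertension", "Hypertension", "Diabetes", "Diabetes"]

new_patient = [[1, 1, 1, 0, 1]]

lr = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear").fit(X, y)
print("Logistic regression:", lr.predict(new_patient)[0])
print("SVM:", svm.predict(new_patient)[0])

# Hypothetical drug lookup for the predicted disease.
drugs = {"Hypertension": ["drug_1", "drug_2"], "Diabetes": ["drug_3"]}
print("Suggested drugs:", drugs[lr.predict(new_patient)[0]])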
CHAPTER 9
CONCLUSIONS AND FUTURE SCOPE
Researchers have used publicly accessible data sets to validate their drug prediction
hypotheses. However, the data sets differ and can change over time, which can lead to
different conclusions for the same hypotheses. We use Semantic Web technologies, in
particular Linked Data, to represent, link and access drug and disease data based on
Bio2RDF. We use SPARQL queries for the classification of drugs and diseases. If a new
version of the data is released, the queries can simply be executed again to obtain the
updated data. We have collected a broader data set containing 816 drugs and 1393 diseases.
Predictions were evaluated against gold standard data generated by combining multiple
drug data sources. We also tested our method on a separate dataset [23], which demonstrates
the predictive power of our method independently of the compiled data. A crucial flaw of
typical evaluation schemes for drug prediction, which leads to unrealistically optimistic
results, is that the paired nature of the inputs is not considered [15]. We therefore divided
the data into different train and test sets in which neither the pairs nor the drugs/diseases
overlap, as proposed in [14] for drug interaction prediction. We tested several classifiers
under various cross-validation schemes and compared our approach with the existing
methods PREDICT and SLAMS. We found that, in the disjoint cross-validation settings, our
approach achieved better predictive performance than PREDICT and SLAMS.
BIBLIOGRAPHY
1. Brown, A.S., Patel, C.J.: A standard database for drug repositioning. Scientific Data 4,
170029 (2017)
2. Callahan, A., Cruz-Toledo, J., Ansell, P., Dumontier, M.: Bio2RDF release 2: improved
coverage, interoperability and provenance of life science linked data. In: Extended Semantic
Web Conference, pp. 200-212. Springer (2013)
3. Campillos, M., Kuhn, M., Gavin, A.C., Jensen, L.J., Bork, P.: Drug target identification
using side-effect similarity. Science 321(5886), 263-266 (2008)
4. Chiang, A.P., Butte, A.J.: Systematic evaluation of drug-disease relationships to identify
leads for novel drug uses. Clinical Pharmacology & Therapeutics 86(5), 507-510 (2009)
5. Gottlieb, A., Stein, G.Y., Ruppin, E., Sharan, R.: PREDICT: a method for inferring novel
drug indications with application to personalized medicine. Molecular Systems Biology
7(1), 496 (2011)
6. Guney, E.: Reproducible drug repurposing: When similarity does not suffice. In: Pacific
Symposium on Biocomputing 2017, pp. 132-143 (2017)
7. Hay, P.J., Claudino, A.M.: Bulimia nervosa: online interventions. BMJ Clinical Evidence
2015 (2015)
8. Hu, G., Agarwal, P.: Human disease-drug network based on genomic expression profiles.
PLoS ONE 4(8), e6536 (2009)
9. Kuhn, M., Letunic, I., Jensen, L.J., Bork, P.: The SIDER database of drugs and side
effects. Nucleic Acids Research 44(D1), D1075-D1079 (2015)
10. Lamb, J., Crawford, E.D., Peck, D., Modell, J.W., Blat, I.C., Wrobel, M.J., Lerner, J.,
Brunet, J.P., Subramanian, A., Ross, K.N., et al.: The Connectivity Map: using
gene-expression signatures to connect small molecules, genes, and disease. Science
313(5795), 1929-1935 (2006)
11. Larrosa, O., de la Llave, Y., Barrio, S., Granizo, J.J., Garcia-Borreguero, D.: Stimulant
and anticataplectic effects of reboxetine in patients with narcolepsy: a pilot study. Sleep
24(3), 282-285 (2001)
12. Lemke, M.R.: Effect of reboxetine on depression in Parkinson's disease patients. The
Journal of Clinical Psychiatry 63(4), 300-304 (2002)
13. Melville, J.L., Hirst, J.D.: TMACC: Interpretable correlation descriptors for quantitative
structure-activity relationships. J. Chem. Inf. Model. 47(2), 626-634 (Mar 2007),
http://dx.doi.org/10.1021/ci6004178
14. Pahikkala, T., Airola, A., Pietila, S., Shakyawar, S., Szwajda, A., Tang, J., Aittokallio,
T.: Toward more realistic drug-target interaction predictions. Briefings in Bioinformatics
16(2), 325-337 (2014)
15. Park, Y., Marcotte, E.M.: Flaws in evaluation schemes for pair-input computational
predictions. Nature Methods 9(12), 1134-1136 (2012)
16. Ratner, S., Laor, N., Bronstein, Y., Weizman, A., Toren, P.: Six-week open-label
reboxetine treatment in children and adolescents with attention-deficit/hyperactivity
disorder. Journal of the American Academy of Child & Adolescent Psychiatry 44(5),
428-433 (2005)
17. Schmidt, C., Leibiger, J., Fendt, M.: The norepinephrine reuptake inhibitor reboxetine is
more potent in treating murine narcoleptic episodes than the serotonin reuptake inhibitor
escitalopram. Behavioural Brain Research 308, 205-210 (2016)
18. Silveira, R.O., Zanatto, V., Appolinario, J., Kapczinski, F.: An open trial of reboxetine
in obese patients with binge eating disorder. Eating and Weight Disorders - Studies on
Anorexia, Bulimia and Obesity 10(4), e93-e96 (2005)
19. Tehrani-Doost, M., Moallemi, S., Shahrivar, Z.: An open-label trial of reboxetine in
children and adolescents with attention-deficit/hyperactivity disorder. Journal of Child and
Adolescent Psychopharmacology 18(2), 179-184 (2008)
20. Versiani, M., Cassano, G., Perugi, G., Benedetti, A., Mastalli, L., Nardi, A., Savino, M.:
Reboxetine, a selective norepinephrine reuptake inhibitor, is an effective and well-tolerated
treatment for panic disorder. The Journal of Clinical Psychiatry (2002)
21. Wilkinson, M., Dumontier, M., Aalbersberg, I., Appleton, G., Axton, M., Baak, A.,
Blomberg, N., Boiten, J., da Silva Santos, L., Bourne, P., Bouwman, J., Brookes, A., Clark,
T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C., Finkers, R., Gonzalez-Beltran,
A., Gray, A., Groth, P., Goble, C., Grethe, J., Heringa, J., 't Hoen, P., Hooft, R., Kuhn, T.,
Kok, R., Kok, J., Lusher, S., Martone, M., Mons, A., Packer, A., Persson, B., Rocca-Serra,
P., Roos, M., van Schaik, R., Sansone, S., Schultes, E., Sengstag, T., Slater, T., Strawn, G.,
Swertz, M., Thompson, M., Van Der Lei, J., Van Mulligen, E., Velterop, J., Waagmeester,
A., Wittenburg, P., Wolstencroft, K., Zhao, J., Mons, B.: The FAIR guiding principles for
scientific data management and stewardship. Scientific Data 3 (2016)
22. Yang, L., Agarwal, P.: Systematic drug repositioning based on clinical side-effects.
PLoS ONE 6(12), e28025 (2011)
23. Zhang, P., Agarwal, P., Obradovic, Z.: Computational drug repositioning by ranking and
integrating multiple data sources. In: Joint European Conference on Machine Learning and
Knowledge Discovery in Databases, pp. 579-594. Springer (2013)