End Sem Presentation
Deep learning: machine learning methods based on artificial neural networks with representation learning.
Learning can be supervised, semi-supervised, or unsupervised.
The Integrated Mitochondrial Protein Index (IMPI) is a curated collection of human genes
encoding proteins with evidence for associating with mitochondria and affecting their form and
function. The aim of IMPI is to define a mitochondrial proteome for studying mitochondrial
(dys)function and disease, including during DNA sequencing of patients with mitochondrial
disease.
F1 score: In statistical analysis of binary classification, the F-score or F-measure is a measure
of a test's accuracy. It is calculated from the precision and recall of the test.
Precision - Precision is the ratio of correctly predicted positive observations to the
total predicted positive observations, i.e. the number of true positives divided by the number
of all positive predictions.
Sensitivity (recall) is a measure of the proportion of actual positive cases that got predicted as
positive (true positives). This implies that the remaining proportion of actual positive
cases gets predicted incorrectly as negative, and these are termed the
false negatives.
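The three metrics above can be sketched in plain Python; the confusion counts here are made-up values for illustration only.

```python
# Hypothetical confusion-matrix counts (illustrative, not from any real model).
tp, fp, fn = 8, 2, 4  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # correct positives / all predicted positives
recall = tp / (tp + fn)     # correct positives / all actual positives (sensitivity)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

With these counts, precision is 0.8 and recall is 8/12, so F1 lands between the two, closer to the smaller value.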
Data preprocessing steps:
● Data cleaning.
● Data integration.
● Data reduction.
● Data transformation.
BioVec algorithm:
It characterizes biological sequences in terms of the biochemical and biophysical
interpretation of the underlying patterns.
(A distinct, similarly named tool, Biomolecule Visualization with Ellipsoidal Coarse-graining
(BioVEC), is used for visualizing molecular dynamics simulation data while allowing
coarse-grained residues to be rendered as ellipsoids.)
ProtVec is a representation of proteins through protein sequences. First, we need
a large corpus to train the distributed representation of biological sequences. Then
the sequences are broken into subsequences, generating three lists of shifted
non-overlapping words.
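The "three lists of shifted non-overlapping words" step can be sketched as follows; the function name and the toy sequence are illustrative, not from the ProtVec codebase.

```python
def shifted_ngrams(seq, n=3):
    """Return n lists of non-overlapping n-grams, one per reading frame.

    Shifting the start position by 0, 1, ..., n-1 residues yields the
    shifted lists described for ProtVec (n=3 gives three lists of 3-mers).
    """
    lists = []
    for shift in range(n):
        shifted = seq[shift:]
        lists.append([shifted[i:i + n]
                      for i in range(0, len(shifted) - n + 1, n)])
    return lists

# Toy protein fragment for illustration.
frames = shifted_ngrams("MKTAYIAKQR")
```

Each frame covers the sequence without overlap; together the three frames cover every 3-mer start position.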
KNN, SVM: these can solve both classification and regression problems.
A kernel is a function used in SVM to help solve problems: it computes similarities as if the
data had been mapped into a higher-dimensional feature space, providing a shortcut that
avoids the complex explicit calculation (the kernel trick).
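As a sketch of the kernel idea, here is an RBF (Gaussian) kernel in plain Python; the gamma value and sample points are illustrative assumptions.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).

    Returns a similarity corresponding to an inner product in a
    high-dimensional feature space, without ever mapping the points
    there explicitly -- the "shortcut" of the kernel trick.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

same = rbf_kernel((0.0, 0.0), (0.0, 0.0))   # identical points
far = rbf_kernel((0.0, 0.0), (1.0, 1.0))    # more distant points
```

Identical points score 1.0; similarity decays smoothly toward 0 with distance.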
In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized
subsamples. Of the k subsamples, a single subsample is retained as the validation data for
testing the model, and the remaining k-1 subsamples are used as training data. The process
is repeated k times so that each subsample serves once as the validation data.
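The partitioning scheme can be sketched in plain Python; the fold-assignment strategy here (shuffle, then stride) is one simple choice among several.

```python
import random

def k_fold_splits(data, k, seed=0):
    """Yield (train, validation) pairs for k-fold cross-validation.

    The data is shuffled once, split into k subsamples; each subsample
    is used once as validation while the other k-1 form the training set.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal subsamples
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

splits = list(k_fold_splits(list(range(10)), k=5))
```

With 10 items and k=5, every split trains on 8 items and validates on 2, and the five validation folds together cover the whole dataset exactly once.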
We can still use cross-entropy with a little trick. This loss can be computed with the
cross-entropy function, since we are now comparing just two probability vectors, or even with
categorical cross-entropy, since our target is a one-hot vector.
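A minimal sketch of categorical cross-entropy with a one-hot target; the probability vectors are made-up example values.

```python
import math

def categorical_cross_entropy(target, predicted):
    """CCE loss: -sum(t * log(p)) over the classes.

    With a one-hot target, all terms but one vanish, so the loss
    reduces to -log(p) of the true class.
    """
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# One-hot target for class 1; illustrative predicted distribution.
loss = categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

Here the loss is -log(0.7); a more confident correct prediction drives it toward 0.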
No-sampling technique: the minority-class data is simply left smaller than the
majority-class data.
A non-sampling error is a term used in statistics that refers to an error that occurs during
data collection, causing the data to differ from the true values. A non-sampling error refers to
either random or systematic errors.
SMOTE is an oversampling technique where the synthetic samples are generated for the
minority class. This algorithm helps to overcome the overfitting problem posed by random
oversampling.
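A toy sketch of the SMOTE idea in plain Python: instead of duplicating minority points (random oversampling), interpolate between a minority point and one of its nearest minority neighbours. The k, sample points, and number of new samples are illustrative, and this omits the refinements of the full algorithm.

```python
import random

def smote_sample(minority, k=2, n_new=3, seed=0):
    """Generate synthetic minority-class points, SMOTE-style.

    For each new sample: pick a minority point, pick one of its k nearest
    minority neighbours, and place a point at a random position on the
    line segment between them.
    """
    rng = random.Random(seed)
    new_points = []
    for _ in range(n_new):
        p = rng.choice(minority)
        # k nearest neighbours of p among the other minority points
        neighbours = sorted(
            (q for q in minority if q != p),
            key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))[:k]
        q = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        new_points.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return new_points

# Illustrative 2-D minority-class points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sample(minority)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the minority class's region rather than being exact duplicates.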
A layer is the highest-level building block in deep learning. A layer is a container that usually
receives weighted input, transforms it with a set of mostly non-linear functions and then
passes these values as output to the next layer.
Convolutional layers are the major building blocks used in convolutional neural
networks. A convolution is the simple application of a filter to an input that results in an
activation.
A dense layer is a simple layer of neurons in which each neuron receives input from all the
neurons of the previous layer, hence the name "dense". A dense layer can be used to classify
images based on the output from convolutional layers.
A deep learning node is "a computational unit that has one or more weighted input
connections, a transfer function that combines the inputs in some way, and an output
connection. Nodes are then organized into layers to comprise a network."
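The node and dense-layer definitions above can be sketched as a forward pass in plain Python; the weights, biases, and choice of sigmoid as the transfer function are illustrative assumptions.

```python
import math

def dense_layer(inputs, weights, biases):
    """Forward pass through one dense layer.

    Every neuron sees every input: each node combines its weighted
    inputs, adds a bias, applies a non-linear transfer function
    (sigmoid here), and passes the result on to the next layer.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1 / (1 + math.exp(-z)))  # sigmoid activation
    return outputs

# Two inputs feeding two neurons; values chosen for illustration.
out = dense_layer([1.0, 2.0],
                  weights=[[0.5, -0.25], [1.0, 1.0]],
                  biases=[0.0, -3.0])
```

Both neurons here happen to receive a weighted sum of exactly 0, and sigmoid(0) = 0.5, so both outputs are 0.5.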
Hidden Layer :They are “hidden” because the true values of their nodes are unknown in
the training dataset.
An epoch is a term that indicates the number of passes of the entire training dataset the
machine learning algorithm has completed. Datasets are usually grouped into batches
Window size is the length of a (sliding) cutout of a time sequence of data.
E.g., if you have data x(t) that you want to model, you could use a k-size window
x(n), x(n+1), ..., x(n+k-1). This is a method commonly used in non-recursive approximators.
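The sliding-window cutout can be sketched in a few lines; the sequence and window size are illustrative.

```python
def sliding_windows(x, k):
    """Return every length-k window x(n), x(n+1), ..., x(n+k-1)
    as the window start n slides along the sequence."""
    return [x[n:n + k] for n in range(len(x) - k + 1)]

windows = sliding_windows([1, 2, 3, 4, 5], k=3)
```

A sequence of 5 points with window size 3 yields 3 windows, each shifted by one step from the previous one.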
The Adaptive Moment Estimation (Adam) optimizer [46] was used to minimize
the Categorical Cross Entropy (CCE) loss function. All the models were trained for 50
epochs with a batch size of 128.
ML model:
Ada boost
AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm. It
can be used with many other types of learning algorithms to improve performance.
BN-D
A classifier of this kind is a probabilistic classifier, which means that, given an input, it
predicts the probability of that input belonging to each of the classes. These predictions
are conditional probabilities.
LGBM
LightGBM, short for Light Gradient Boosting Machine, is a free and open-source
distributed gradient boosting framework for machine learning. It is based on decision tree
algorithms and used for ranking, classification and other machine learning tasks. The
development focus is on performance.
KNN
In statistics, the k-nearest neighbors algorithm is a nonparametric classification method. It
is used for both classification and regression.
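A minimal KNN classifier sketch in plain Python: classify a query point by majority vote among its k nearest training points. The training points, labels, and k are illustrative.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; nearness is measured by
    squared Euclidean distance (the square root is unnecessary for ranking).
    """
    neighbours = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Illustrative 2-D training data with two classes.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
label = knn_predict(train, (0.2, 0.1), k=3)
```

The query lies near the two "A" points, so with k=3 the vote is two "A" against one "B".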
LSTM stands for long short-term memory networks, used in the field of Deep Learning.
It is a variety of recurrent neural networks (RNNs) that are capable of learning long-term
dependencies, especially in sequence prediction problems.