
End Sem Presentation


Deep learning (also known as deep structured learning) is part of a broader family of
machine learning methods based on artificial neural networks with representation learning.
Learning can be supervised, semi-supervised or unsupervised.

MitoCarta (www.broadinstitute.org/pubs/MitoCarta) is a database of mitochondrial-localized
protein-coding genes and their expression in 14 mouse tissues. Based on (i) improved gene
transcript models, (ii) updated literature curation, including results from proteomic analyses of
mitochondrial subcompartments, (iii) improved homology mapping, and (iv) updated versions of
all seven original data sets, the authors have reconstructed this inventory separately for human
and mouse.

Bayesian integration: Bayesian networks can be used to predict functional connections between
proteins. The approach assesses the contribution of each input data source and evaluates the
results both broadly and in the context of specific biological processes.

The Integrated Mitochondrial Protein Index (IMPI) is a curated collection of human genes
encoding proteins with evidence for associating with mitochondria and affecting their form and
function. The aim of IMPI is to define a mitochondrial proteome for studying mitochondrial
(dys)function and disease, including during DNA sequencing of patients with mitochondrial
disease.

The seven underlying data sources have been substantially updated using
improved transcript models, MS/MS search algorithms,
database versions and homology detection methods. Furthermore,
the mitochondrial training set was increased by
60%. The MitoCarta 2.0 inventory consists of 1158 human
genes and 1158 mouse genes encoding mitochondrial
proteins.

Cysteine importance
Cysteine is important because its side chain contains a very reactive sulfhydryl group. This puts
cysteine in a distinct position where no other amino acid can replace or substitute for it, because
cysteine residues form disulfide bridges, which become a lasting, covalent part of the protein's
structure. These bonds improve the protein's conformational stability by reducing the entropy of
the unfolded state and by forming stabilising interactions in the native state.

F1 score: In statistical analysis of binary classification, the F-score or F-measure is a measure
of a test's accuracy. It is calculated from the precision and recall of the test, where the
precision is the number of true positive results divided by the number of all results predicted
as positive.
Precision: the ratio of correctly predicted positive observations to the
total predicted positive observations.
Sensitivity (recall) is a measure of the proportion of actual positive cases that got predicted as
positive (or true positive). ... This implies that there will be another proportion of actual positive
cases, which would get predicted incorrectly as negative (and, thus, could also be termed
false negatives).
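
The definitions above can be checked with a small sketch. The function name and the counts (8 true positives, 2 false positives, 4 false negatives) are illustrative assumptions, not values from any dataset in these notes; F1 is taken as the harmonic mean of precision and recall.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Precision: true positives over everything predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): true positives over all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))
```

The same F1 value can be written directly as 2TP / (2TP + FP + FN), which is a quick way to sanity-check the result.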

Major Tasks in Data Preprocessing:

● Data cleaning.
● Data integration.
● Data reduction.
● Data transformation.

BioVec algorithm:
It characterizes biological sequences in terms of biochemical and biophysical
interpretations of the underlying patterns.
A similarly named but separate tool, Biomolecule Visualization with Ellipsoidal Coarse-graining
(BioVEC), visualizes molecular dynamics simulation data while allowing coarse-grained residues
to be rendered as ellipsoids.
ProtVec is a representation of proteins learned from protein sequences. First, we need
a large corpus to train a distributed representation of biological sequences. Then
we break each sequence into subsequences (words), which generates 3 lists of shifted
non-overlapping words.
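
The splitting step can be sketched as follows. This assumes a word length of 3 (which is what yields 3 shifted lists); the function name and the toy sequence are hypothetical, and the later training of the embedding on these word lists is not shown.

```python
def shifted_kmers(sequence, k=3):
    """Split a sequence into k lists of non-overlapping k-mers, one list per starting offset."""
    lists = []
    for offset in range(k):
        # Step by k so that words within one list never overlap;
        # trailing residues that do not fill a whole word are dropped.
        words = [sequence[i:i + k] for i in range(offset, len(sequence) - k + 1, k)]
        lists.append(words)
    return lists


toy = "MKTAYIAKQR"  # made-up peptide, for illustration only
for words in shifted_kmers(toy):
    print(words)
# ['MKT', 'AYI', 'AKQ']
# ['KTA', 'YIA', 'KQR']
# ['TAY', 'IAK']
```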
KNN, SVM: both can solve classification and regression problems.
A kernel is a function used in an SVM to measure the similarity of two inputs as if they had
been mapped into a higher-dimensional space. Kernels provide a shortcut that avoids
computing that mapping explicitly.

10-fold Cross validation


Cross-validation is a technique to evaluate predictive models by partitioning the original sample
into a training set to train the model, and a test set to evaluate it.

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized
subsamples. Of the k subsamples, a single subsample is retained as the validation data for
testing the model, and the remaining k-1 subsamples are used as training data. The process is
repeated k times so that each subsample serves as the validation data exactly once.
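
A minimal sketch of the partitioning, using only the standard library. The function name, the seed and the sample count are assumptions for illustration; when the sample count is not divisible by k, the fold sizes here differ by at most one.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random partition
    folds = [indices[i::k] for i in range(k)]     # k near-equal subsamples
    for i in range(k):
        validation = folds[i]                     # one fold held out
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


for train, val in k_fold_indices(n_samples=20, k=10):
    print(len(train), len(val))   # 18 training and 2 validation indices per fold
```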

We can still use cross-entropy with a little trick. ... This loss can be computed with the
cross-entropy function since we are now comparing just two probability vectors or even with
categorical cross-entropy since our target is a one-hot vector.
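
As a small illustration of the last point, categorical cross-entropy against a one-hot target reduces to the negative log of the probability assigned to the true class. The function name, the clipping constant and the example vectors below are assumptions made for this sketch.

```python
import math


def categorical_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    # Clip predictions away from zero so that log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


one_hot = [0, 1, 0]           # true class is index 1
probs = [0.1, 0.7, 0.2]       # hypothetical model output
loss = categorical_cross_entropy(one_hot, probs)
print(round(loss, 4))         # equals -log(0.7)
```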

No sampling technique: the data are used as they are, so the minority class has fewer samples
than the majority class.
A non-sampling error is a term used in statistics that refers to an error that occurs during
data collection, causing the data to differ from the true values. A non-sampling error can be
either random or systematic.
SMOTE is an oversampling technique where the synthetic samples are generated for the
minority class. This algorithm helps to overcome the overfitting problem posed by random
oversampling.
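
The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration and not the reference implementation; the function name, the neighbour count k=2 and the toy points are assumptions.

```python
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]  # toy minority class
for point in smote_sketch(minority, n_synthetic=3):
    print(point)
```

Because each synthetic point is new rather than an exact copy, the classifier sees more varied minority examples than it would with random oversampling.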

A layer is the highest-level building block in deep learning. A layer is a container that usually
receives weighted input, transforms it with a set of mostly non-linear functions and then
passes these values as output to the next layer.

Convolutional layers are the major building blocks used in convolutional neural
networks. A convolution is the simple application of a filter to an input that results in an
activation. Applying the same filter repeatedly across the input produces a map of activations.
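
A one-dimensional sketch of applying a filter to an input. As is common in deep learning libraries, the filter is slid over the input without being flipped (strictly, a cross-correlation). The function names, the filter values and the ReLU step are assumptions chosen for illustration.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1) and return the responses."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Non-linear activation applied element-wise."""
    return [max(0, v) for v in values]


signal = [1, 2, 3, 4, 5]
edge_filter = [-1, 0, 1]      # responds to increasing values
print(relu(conv1d(signal, edge_filter)))   # [2, 2, 2]
```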

A dense layer is a simple layer of neurons in which each neuron receives input from all the
neurons of the previous layer, hence the name. In a convolutional network, dense layers are
used to classify the image based on the output of the convolutional layers.

A deep learning node is "a computational unit that has one or more weighted input
connections, a transfer function that combines the inputs in some way, and an output
connection. Nodes are then organized into layers to comprise a network."

Hidden layer: these layers are “hidden” because the true values of their nodes are unknown in
the training dataset.
An epoch is one complete pass of the entire training dataset through the machine learning
algorithm; the number of epochs is the number of such passes completed. Datasets are usually
grouped into batches.

Window size, as I know it, is the length of a (sliding) cutout of a time sequence of
data. E.g., if you have data x(t) that you want to model, you could use a k-size window
x(n), x(n+1), ..., x(n+k-1). This is a method commonly used in non-recursive approximators.
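
A short sketch of cutting a series into sliding windows of length k. The function name and the example series are illustrative assumptions.

```python
def sliding_windows(series, k):
    """Return every contiguous window of length k, moving one step at a time."""
    return [series[n:n + k] for n in range(len(series) - k + 1)]


print(sliding_windows([10, 20, 30, 40, 50], k=3))
# [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
```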

The Adaptive Moment Estimation (Adam) optimizer [46] was used to minimize
the Categorical Cross Entropy (CCE) loss function. All the models were trained for 50
epochs with a batch size of 128.

ML model:
AdaBoost
AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm. It
can be used with many other types of learning algorithms to improve performance.
BN-D
The BN-D classifier is a probabilistic classifier, which means that, given an input, it predicts
the probability of the input belonging to each of the classes. These are conditional
probabilities of the class given the input.
LGBM
LightGBM, short for Light Gradient Boosting Machine, is a free and open source
distributed gradient boosting framework for machine learning. It is based on decision tree
algorithms and used for ranking, classification and other machine learning tasks. The
development focus is on performance.
KNN
In statistics, the k-nearest neighbors algorithm is a nonparametric classification method. It
is used for classification and regression.
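
For the classification case, a k-nearest neighbors prediction can be sketched as a majority vote among the k closest training points. The function name, the Euclidean distance choice and the toy data are assumptions for this illustration.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by squared Euclidean distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_points[i], query)),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


points = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5], [5, 6]]   # toy data
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, [0.5, 0.5]))   # a
print(knn_predict(points, labels, [5.5, 5.5]))   # b
```

For regression, the vote would be replaced by an average of the neighbours' target values.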

Nonparametric machine learning algorithms:
Algorithms that do not make strong assumptions about the form of the mapping
function.

LSTM stands for long short-term memory, a type of network used in the field of deep learning.
It is a variety of recurrent neural network (RNN) that is capable of learning long-term
dependencies, especially in sequence prediction problems.
