Advanced Data Analytics
Data mining. The data mining process sorts through large data sets to identify
patterns and establish relationships. It's a key part of successful analytics
operations because BI and advanced analytics applications use the data that
mining generates to solve problems. It has applications across a variety of
industries including healthcare, government, scientific research, mathematics and
sports.
Time series analysis. Time series analysis focuses on data changes over time. It
looks at patterns, trends and cycles in the data to predict future points. For
instance, a retailer might use time series analysis to forecast future sales based on
past sales data. The results can help the retailer plan stock levels and manage
resources efficiently.
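A minimal sketch of this idea in Python, assuming hypothetical monthly sales figures and using a simple moving average as a stand-in for more sophisticated forecasting models:
Python
# naive time series forecast with pandas (sales figures below are made up)
import pandas as pd

sales = pd.Series(
    [200, 220, 250, 240, 260, 300, 310, 305, 330, 360, 400, 420],
    index=pd.date_range("2023-01-01", periods=12, freq="MS"),
)
# a 3-month moving average smooths out short-term fluctuations
trend = sales.rolling(window=3).mean()
# use the last smoothed value as a naive forecast for the next month
print("Naive next-month forecast:", trend.iloc[-1])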
Big data analytics. Big data analytics is the process of examining large volumes
of structured, semistructured and unstructured data to uncover information such
as hidden patterns, correlations, market trends and customer preferences. It uses
analytics systems to power predictive models, statistical algorithms and what-if
analysis.
Hiring new staff brings in advanced skills quickly, but it costs more, since experienced data scientists command high salaries. Hiring can also pose integration challenges: new staff with specialized skills bring immediate access to advanced capabilities, but they will need time to understand the nuances of the business and its data.
A mix of skills is essential in an analytics team. Each data team needs people who
can understand the data, interpret the analysis and translate insights into business
strategies. Perhaps one or two strategic hires can integrate effectively into an
existing team and help bring team members up to speed.
Proprietary tools
On the proprietary side, vendors such as Microsoft, IBM and the SAS Institute
all offer advanced analytics tools. Most require a deep technical background and
understanding of mathematical techniques.
Discover the different types of data analytics technologies and tools used across
various industries, including statistical analysis tools, business intelligence tools,
database management tools, and machine learning tools. Learn how these tools
are used for finance and investing, academic research, and in broader business
contexts.
Key Takeaways
Data analytics technologies are any data science tool or software used to
manipulate and analyze a dataset. While some data analytics technologies use
complex algorithms and artificial intelligence to draw conclusions about a
dataset, more traditional tools rely on statistical analysis and mathematical
calculations to return more descriptive analytics.
There are several common types of data analytics technologies depending on the
industry that you are in and the type of data that you are working on. Generally,
data analytics technologies can be categorized as statistical analysis tools,
business intelligence tools, database management tools, and machine learning
tools. While these categories do not include all of the tools available to data
scientists, the following list offers a general introduction to data analytics
technologies and their uses within the industry.
Data analytics technologies that rely on statistical analysis are the most traditional
tools within the data science industry, and they are also the most common tools
used for finance and investing, as well as academic research. From beginner-
friendly spreadsheet software, which is useful for returning descriptive analytics,
to more advanced software, which focuses on prescriptive analytics, statistical
analysis tools have many uses across fields. As the name suggests, statistical
analysis tools rely on mathematical functions and formulas in order to learn more
about a dataset. But, these tools can also require the use of statistics-friendly
programming languages like R. And, while many statistical analysis tools only
require the use of calculations and theories to analyze data, many of these tools
now include features that automate the data analysis process. In addition, many
of these tools can transform data analytics into visualizations, such as charts and
graphs, as well as organize and clean data.
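A small sketch of the kind of descriptive analytics such tools return, using pandas on a hypothetical dataset (column names and values are made up for illustration):
Python
# descriptive statistics with pandas
import pandas as pd

df = pd.DataFrame({
    "revenue": [120.5, 98.0, 143.2, 110.7, 150.9],
    "units_sold": [12, 9, 15, 11, 16],
})
# count, mean, standard deviation, min/max and quartiles for each column
print(df.describe())
# a simple statistical relationship: correlation between the two columns
print(df["revenue"].corr(df["units_sold"]))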
Business intelligence (BI) tools have also gained considerable popularity within
the data science industry because these tools are more versatile than most
traditional data analytics technologies. And, while the name might suggest
otherwise, business intelligence tools are not just used in business and finance,
but also for any team or individual that requires some form of comparative data
analysis—the process of analyzing different parts of a single dataset or comparing
multiple datasets at the same time or in the same space. So, many business
intelligence tools can be used to not only analyze a dataset but also to create
reports and dashboards which compare several sheets and workbooks full of data.
Many data scientists utilize BI tools when they have a larger collection of data,
but also when they need to make decisions by weighing the costs, benefits, and
risks of various scenarios and potential outcomes.
Prior to analyzing data, data scientists must collect and store that data, which is
generally when database management tools are used. Database management tools
primarily include software for managing and manipulating data, as well as storing
that data in a secure database management system. However, database
management tools have features that go beyond data storage and include tools for
exploratory and diagnostic data analysis. In addition, most database management
tools are categorized based on data type, with some tools relying on SQL
databases and others utilizing NoSQL databases. But, regardless of the
designation, database management tools allow data scientists to search and query
data, as well as pinpoint any early data analytics issues within a dataset, such as
missing values or other errors.
Examples of Database Management Tools: Microsoft SQL Server, MySQL, and MongoDB
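A minimal sketch of using a database management tool from Python: here sqlite3 (part of the standard library) stores the data and a query flags rows with missing values; the table and column names are hypothetical.
Python
# store data in a small SQL database and pinpoint missing values
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", 34), (2, "Ravi", None), (3, None, 29)],
)
# query the data and flag early data-quality issues such as missing values
rows = conn.execute(
    "SELECT id FROM customers WHERE name IS NULL OR age IS NULL"
).fetchall()
print("Rows with missing values:", rows)
conn.close()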
UNIT – 5
Machine learning is further divided into categories based on the data on which we are training our model (a short code sketch contrasting the two settings follows this list).
• Supervised Learning – This method is used when we have training data along with the labels for the correct answers.
• Unsupervised Learning – In this task our main objective is to find
the patterns or groups in the dataset at hand because we don’t have any
particular labels in this dataset.
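The sketch below contrasts the two settings using scikit-learn (assumed to be available): a supervised classifier is trained on labelled data, while an unsupervised algorithm finds groups without any labels.
Python
# supervised vs. unsupervised learning with scikit-learn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# supervised learning: features X together with the correct labels y
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised accuracy on training data:", clf.score(X, y))

# unsupervised learning: only X is used, and the algorithm finds 3 groups
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster labels found without y:", km.labels_[:10])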
What is Deep Learning?
Deep learning, on the other hand, is a subset of machine learning that uses neural
networks with multiple layers to analyze complex patterns and relationships in
data. It is inspired by the structure and function of the human brain and has been
successful in a variety of tasks, such as computer vision, natural language
processing, and speech recognition.
Deep learning models are trained using large amounts of data and algorithms
that are able to learn and improve over time, becoming more accurate as they
process more data. This makes them well-suited to complex, real-world
problems and enables them to learn and adapt to new situations.
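A minimal sketch of a small multi-layer neural network using scikit-learn's MLPClassifier, standing in for the much deeper networks described above; real deep learning work typically uses dedicated frameworks and far larger datasets.
Python
# a small neural network with two hidden layers
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# two hidden layers of 64 units each ("multiple layers" in miniature)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))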
Now let’s look at the difference between Machine Learning and Deep
Learning:
Machine Learning: The model takes less time to train because the datasets are comparatively small.
Deep Learning: Training takes a huge amount of time because of the very large number of data points.
Machine Learning: The results of an ML model are easy to explain.
Deep Learning: The results of a deep learning model are difficult to explain.
CLUSTERING:
What is Clustering?
The task of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis. This method falls under the branch of Unsupervised Learning, which aims at gaining insights from unlabelled data points; that is, unlike supervised learning, we don't have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates similarity using a metric such as Euclidean distance, cosine similarity or Manhattan distance, and then groups the points with the highest similarity scores together.
For example, in the graph given below we can clearly see three circular clusters forming on the basis of distance.
It is not necessary that the clusters formed are circular in shape; the shape of clusters can be arbitrary, and there are many algorithms that work well at detecting arbitrarily shaped clusters. For example, in the graph given below we can see that the clusters formed are not circular in shape.
Types of Clustering:
Broadly speaking, there are 2 types of clustering that can be performed to group
similar data points:
• Hard Clustering: In this type of clustering, each data point either belongs to a cluster completely or does not belong to it at all. For example, say there are 4 data points and we have to cluster them into 2 clusters. Each data point will then belong to either cluster 1 or cluster 2.
Data Points Clusters
A C1
B C2
C C2
D C1
• Soft Clustering: In this type of clustering, instead of assigning each data point to exactly one cluster, a probability or likelihood of that point belonging to each cluster is evaluated. For example, say there are 4 data points and we have to cluster them into 2 clusters. We then evaluate, for every data point, a probability of it belonging to each of the two clusters (a short code sketch contrasting hard and soft clustering follows the probability table below).
Data Points Probability of C1 Probability of C2
A 0.91 0.09
B 0.3 0.7
C 0.17 0.83
D 1 0
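A brief sketch of the two ideas with scikit-learn on synthetic data: KMeans assigns each point to exactly one cluster (hard), while a Gaussian mixture model returns a probability for each cluster (soft).
Python
# hard vs. soft clustering on synthetic data
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=10, centers=2, random_state=42)

# hard clustering: every point gets exactly one cluster label
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
print("Hard assignments:", hard_labels)

# soft clustering: every point gets a probability for each cluster
gmm = GaussianMixture(n_components=2, random_state=42).fit(X)
print("Soft assignments (probabilities):")
print(gmm.predict_proba(X).round(2))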
Uses of Clustering:
Now before we begin with types of clustering algorithms, we will go through
the use cases of Clustering algorithms. Clustering algorithms are majorly used
for:
• Market Segmentation – Businesses use clustering to group their
customers and use targeted advertisements to attract more audience.
• Market Basket Analysis – Shop owners analyze their sales and figure
out which items are majorly bought together by the customers. For
example, In USA, according to a study diapers and beers were usually
bought together by fathers.
• Social Network Analysis – Social media sites use your data to
understand your browsing behaviour and provide you with targeted
friend recommendations or content recommendations.
• Medical Imaging – Doctors use Clustering to find out diseased areas
in diagnostic images like X-rays.
• Anomaly Detection – Clustering can be used to find outliers in a stream of real-time data or to flag potentially fraudulent transactions.
• Simplify working with large datasets – Each cluster is given a cluster ID after clustering is complete. A whole feature set can then be reduced to its cluster ID, so a complicated case can be represented by a straightforward cluster ID. In this way, clustering can make complex datasets simpler.
There are many more use cases for clustering, but these are some of the major and most common ones. Moving forward, we will discuss clustering algorithms that help you perform the above tasks.
Association rule learning is one of the very important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Market basket analysis is a technique used by various big retailers to discover the associations between items. We can understand it by taking the example of a supermarket, where products that are frequently purchased together are placed together.
For example, if a customer buys bread, he will most likely also buy butter, eggs, or milk, so these products are stored on the same shelf or mostly nearby. Consider the diagram below:
Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
Association rule learning works on the concept of if-then rules, such as: if A, then B.
Support
Support indicates how frequently the itemset appears in the dataset. It is the fraction of transactions that contain the item or itemset X:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often items X and Y occur together in the dataset given that X occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Support(X and Y) / Support(X)
Lift
Lift is the ratio of the observed support to the expected support if X and Y were independent of each other:
Lift(X → Y) = Support(X and Y) / (Support(X) × Support(Y))
It has three possible outcomes: Lift = 1 means X and Y are independent; Lift > 1 means X and Y are positively related (buying X makes Y more likely); Lift < 1 means X and Y are negatively related.
Apriori Algorithm
The Apriori algorithm finds frequent itemsets level by level, using the fact that any subset of a frequent itemset must itself be frequent to prune the search. It is mainly used for market basket analysis and helps to understand the products that can be bought together. It can also be used in the healthcare field to find drug reactions for patients.
Eclat Algorithm
The Eclat algorithm stands for Equivalence Class Transformation. It uses a depth-first search over a vertical (transaction-id list) representation of the database to find frequent itemsets, and is generally faster than the Apriori algorithm.
F-P Growth Algorithm
The F-P growth algorithm stands for Frequent Pattern growth, and it is an improved version of the Apriori algorithm. It represents the database in the form of a tree structure known as a frequent pattern tree. The purpose of this frequent pattern tree is to extract the most frequent patterns.
Association rule learning has various applications in machine learning and data mining, including market basket analysis, web usage mining, and healthcare.
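A hedged sketch of mining association rules in Python with the third-party mlxtend library (assumed to be installed); the tiny one-hot encoded basket data below is made up for illustration.
Python
# frequent itemsets and association rules with mlxtend
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

baskets = pd.DataFrame(
    [
        {"bread": True, "butter": True, "milk": True, "eggs": False},
        {"bread": True, "butter": True, "milk": False, "eggs": False},
        {"bread": False, "butter": False, "milk": True, "eggs": True},
        {"bread": True, "butter": True, "milk": True, "eggs": True},
    ]
)
# frequent itemsets with at least 50% support, then rules with support,
# confidence and lift for each "if X then Y" pattern
frequent = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])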
Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:
y = β0 + β1X
where:
• β0 is the intercept
• β1 is the slope
Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + ... + βnXn
where:
• β0 is the intercept
• β1, β2, ..., βn are the coefficients (slopes) of the independent variables X1, X2, ..., Xn
The goal of the algorithm is to find the best-fit line equation that can predict the values based on the independent variables.
In regression, a set of records is present with X and Y values, and these values are used to learn a function; if you then want to predict Y from an unknown X, this learned function can be used. In regression we have to find the value of Y, so a function is required that predicts a continuous Y given X as the independent features.
Our primary objective while using linear regression is to locate the best-fit line, which implies that the error between the predicted and actual values should be kept to a minimum; the best-fit line is the one with the least error.
The best-fit line equation provides a straight line that represents the relationship between the dependent and independent variables. The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).
Linear Regression
Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x); hence the name linear regression. In the figure above, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best-fit line for our model.
We utilize the cost function to compute the best values in order to get the best fit
line since different values for weights or the coefficient of lines result in different
regression lines.
As we have assumed earlier that our independent feature is the experience i.e X
and the respective salary Y is the dependent variable. Let's assume there is a linear
relationship between X and Y then the salary can be predicted using:
Ŷ = θ1 + θ2X
or, for individual data points,
ŷi = θ1 + θ2xi
where ŷi ∈ Ŷ (i = 1, 2, ..., n) are the predicted values.
The model gets the best regression fit line by finding the best θ1 and θ2 values.
• θ1: intercept
• θ2: coefficient of x
Once we find the best θ1 and θ2 values, we get the best-fit line. So when we are
finally using our model for prediction, it will predict the value of y for the input
value of x.
To achieve the best-fit regression line, the model aims to predict the target value Ŷ such that the error difference between the predicted value Ŷ and the true value Y is minimum. So, it is very important to update the θ1 and θ2 values to reach the best values that minimize the error between the predicted y value (pred) and the true y value (y):
minimize (1/n) Σi=1..n (ŷi − yi)²
The cost function or the loss function is nothing but the error or difference between the predicted value Ŷ and the true value Y.
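One common way to minimize this cost is gradient descent. Below is a small NumPy sketch on made-up experience/salary data, using the same θ1 (intercept) and θ2 (coefficient) notation as the text; the data, learning rate and iteration count are illustrative assumptions.
Python
# minimizing the mean squared error with gradient descent
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)        # years of experience
Y = np.array([30, 35, 41, 44, 50], dtype=float)   # salary (hypothetical units)

theta1, theta2 = 0.0, 0.0   # intercept and coefficient
lr = 0.01                   # learning rate
n = len(X)

for _ in range(5000):
    Y_pred = theta1 + theta2 * X
    error = Y_pred - Y
    # gradients of the mean squared error with respect to theta1 and theta2
    theta1 -= lr * (2 / n) * error.sum()
    theta2 -= lr * (2 / n) * (error * X).sum()

print("Best-fit line: y = %.2f + %.2f x" % (theta1, theta2))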
Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to give accurate and dependable results. For Simple Linear Regression these are the four standard assumptions: linearity, independence of the errors, homoscedasticity (constant error variance), and normality of the errors.
For Multiple Linear Regression, all four of the assumptions from Simple Linear Regression apply. In addition, below are a few more:
Overfitting: Overfitting occurs when the model fits the training data too closely, capturing noise or random fluctuations that do not represent the true underlying relationship between variables. This can lead to poor generalization performance on new, unseen data.
Multicollinearity: Multicollinearity occurs when two or more independent variables are highly correlated with each other, which makes it difficult to estimate the individual effect of each variable on the dependent variable.
To judge how well a fitted model performs, several evaluation metrics are used.
Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared differences between the actual and predicted values for all the data points. The difference is squared to ensure that negative and positive differences don't cancel each other out:
MSE = (1/n) Σi=1..n (yi − ŷi)²
where n is the number of data points, yi is the actual value and ŷi is the predicted value for the ith data point.
Mean Absolute Error (MAE) measures the average of the absolute differences between the actual and predicted values:
MAE = (1/n) Σi=1..n |yi − ŷi|
A lower MAE value indicates better model performance. MAE is not sensitive to outliers, as we consider absolute differences.
Root Mean Squared Error (RMSE) is the square root of the mean of the squared residuals:
RMSE = sqrt( (1/n) Σi=1..n (yi − ŷi)² )
It describes how well the observed data points match the expected values, i.e., the model's absolute fit to the data.
On its own, RMSE is not as informative a metric as R-squared, because its value depends on the units of the variables (it is not a normalized measure) and can fluctuate when those units vary.
R-Squared is a statistic that indicates how much variation the developed model
can explain or capture. It is always in the range of 0 to 1. In general, the better the
model matches the data, the greater the R-squared number.
In mathematical notation, it can be expressed as:
R² = 1 − (RSS / TSS)
• Residual Sum of Squares (RSS): The sum of the squares of the residuals for each data point is known as the residual sum of squares, or RSS. It is a measurement of the difference between the observed output and what was predicted.
RSS = Σi=1..n (yi − b0 − b1xi)²
• Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the mean of the response variable is known as the total sum of squares, or TSS.
TSS = Σi=1..n (yi − ȳ)²
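A short sketch computing MSE, MAE, RMSE and R-squared with scikit-learn's metrics module, using a hypothetical set of actual and predicted values:
Python
# regression evaluation metrics
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.3, 10.5])

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mse)            # RMSE is the square root of the MSE
r2 = r2_score(y_true, y_pred)  # 1 - RSS/TSS

print("MSE:", mse, "MAE:", mae, "RMSE:", rmse, "R^2:", r2)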
Regularized variants of linear regression, such as Lasso regression, modify this cost function by adding a penalty term. In that case:
• the first term is the least squares loss, representing the squared difference between predicted and actual values.
• the second term is the L1 regularization term, which penalizes the sum of the absolute values of the regression coefficients θj.
• Linear regression is a well-established algorithm with a rich history and is widely available in various machine learning libraries and software packages.
LOGISTIC REGRESSION:
What is Logistic Regression?
Logistic regression is used for binary classification. It uses the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.
For example, if we have two classes, Class 0 and Class 1, and the value of the logistic function for an input is greater than 0.5 (the threshold value), then the input belongs to Class 1; otherwise it belongs to Class 0. It's referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
Key Points:
• Logistic regression predicts the output of a categorical dependent
variable. Therefore, the outcome must be a categorical or discrete
value.
• It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
• In Logistic regression, instead of fitting a regression line, we fit an
“S” shaped logistic function, which predicts two maximum values (0
or 1).
Logistic Function – Sigmoid Function
• The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
• It maps any real value into another value within the range of 0 and 1. Because the output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, it forms a curve like the "S" shape.
• The S-form curve is called the Sigmoid function or the logistic
function.
• In logistic regression, we use the concept of a threshold value, which defines the cutoff between predicting 0 or 1: values above the threshold tend to be classified as 1, and values below the threshold tend to be classified as 0.
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three
types:
1. Binomial: In binomial Logistic regression, there can be only two
possible types of the dependent variables, such as 0 or 1, Pass or Fail,
etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
3. Ordinal: In ordinal Logistic regression, there can be 3 or more
possible ordered types of dependent variables, such as “low”,
“Medium”, or “High”.
Assumptions of Logistic Regression
We will explore the assumptions of logistic regression, as understanding them is important to ensure that we are applying the model appropriately. The assumptions include:
1. Independent observations: Each observation is independent of the others, meaning there is no correlation or dependence between the observations.
2. Binary dependent variables: It takes the assumption that the
dependent variable must be binary or dichotomous, meaning it can
take only two values. For more than two categories SoftMax functions
are used.
3. Linearity relationship between independent variables and log odds:
The relationship between the independent variables and the log odds
of the dependent variable should be linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size should be sufficiently large.
Terminologies involved in Logistic Regression
Here are some common terms involved in logistic regression:
• Independent variables: The input characteristics or predictor factors
applied to the dependent variable’s predictions.
• Dependent variable: The target variable in a logistic regression
model, which we are trying to predict.
• Logistic function: The formula used to represent how the
independent and dependent variables relate to one another. The
logistic function transforms the input variables into a probability value
between 0 and 1, which represents the likelihood of the dependent
variable being 1 or 0.
• Odds: The ratio of the probability of something occurring to the probability of it not occurring. It is different from probability, because probability is the ratio of something occurring to everything that could possibly occur.
• Log-odds: The log-odds, also known as the logit function, is the
natural logarithm of the odds. In logistic regression, the log odds of
the dependent variable are modeled as a linear combination of the
independent variables and the intercept.
• Coefficient: The logistic regression model's estimated parameters, which show how the independent and dependent variables relate to one another.
• Intercept: A constant term in the logistic regression model, which
represents the log odds when all independent variables are equal to
zero.
• Maximum likelihood estimation: The method used to estimate the
coefficients of the logistic regression model, which maximizes the
likelihood of observing the data given the model.
How does Logistic Regression work?
The logistic regression model transforms the linear regression function
continuous value output into categorical value output using a sigmoid function,
which maps any real-valued set of independent variables input into a value
between 0 and 1. This function is known as the logistic function.
Let the independent input features be:
X = [[x11 ... x1m],
     [x21 ... x2m],
     ...,
     [xn1 ... xnm]]
and the dependent variable is Y having only binary value i.e. 0 or 1.
Y = 0 if Class 1, and Y = 1 if Class 2
Then, apply the multi-linear function to the input variables X:
z = (Σi=1..m wi·xi) + b
Here xi is the ith input feature, w = [w1, w2, w3, ..., wm] is the vector of weights or coefficients, and b is the bias term, also known as the intercept. This can simply be represented as the dot product of the weights and the inputs plus the bias:
z = w·X + b
Everything we have discussed up to this point is just linear regression.
Sigmoid Function
Now we apply the sigmoid function to z to obtain a probability between 0 and 1, i.e. the predicted y:
σ(z) = 1 / (1 + e^(−z))
Sigmoid function
As shown in the figure above, the sigmoid function converts the continuous input into a probability, i.e. a value between 0 and 1.
• σ(z) tends towards 1 as z → ∞
• σ(z) tends towards 0 as z → −∞
• σ(z) is always bounded between 0 and 1
The probability of belonging to a class can then be measured as:
P(y = 1) = σ(z)
P(y = 0) = 1 − σ(z)
Python
# import the necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load the data, fit a logistic regression model, and evaluate its accuracy
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)
clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
The difference between linear regression and logistic regression is that the output of linear regression is a continuous value that can be anything, while logistic regression predicts the probability that an instance belongs to a given class.
Linear Regression: It requires a linear relationship between the dependent and independent variables.
Logistic Regression: It does not require a linear relationship between the dependent and independent variables.
There are times when we would like to analyze the effect of different independent features on the target (dependent) feature. This helps us make decisions that can move the target variable in the desired direction. Regression analysis is heavily based on statistics and hence gives quite reliable results; for this reason, regression models are used to find both linear and non-linear relationships between the independent variables and the dependent or target variable.
Along with the development of the machine learning domain, regression analysis techniques have gained popularity and have developed well beyond just y = mx + c. There are several types of regression techniques, each suited to different types of data and different types of relationships. The main types of regression techniques are:
1. Linear Regression
2. Polynomial Regression
3. Stepwise Regression
4. Decision Tree Regression
5. Random Forest Regression
6. Support Vector Regression
7. Ridge Regression
8. Lasso Regression
9. ElasticNet Regression
10. Bayesian Linear Regression
Linear Regression
Linear regression is used for predictive analysis. Linear regression is a linear
approach for modeling the relationship between the criterion or the scalar
response and the multiple predictors or explanatory variables. Linear regression
focuses on the conditional probability distribution of the response given the
values of the predictors. For linear regression, there is a danger of overfitting. The
formula for linear regression is:
Syntax:
y = θx + b
where,
• θ – It is the model weights or parameters
• b – It is known as the bias.
This is the most basic form of regression analysis and is used to model a linear
relationship between a single dependent variable and one or more independent
variables.
Here, a linear regression model is instantiated to fit a linear relationship between
input features (X) and target values (y). This code is used for simple
demonstration of the approach.
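A minimal sketch of the demonstration described above, assuming synthetic data: a LinearRegression model is instantiated and fitted to input features (X) and target values (y).
Python
# simple linear regression with scikit-learn
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # single input feature
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly y = 2x (made-up values)

model = LinearRegression()
model.fit(X, y)
print("Coefficient:", model.coef_, "Intercept:", model.intercept_)
print("Prediction for x = 6:", model.predict([[6]]))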
Polynomial Regression
This is an extension of linear regression and is used to model a non-linear relationship between the dependent variable and the independent variables. The syntax remains the same, but now the input variables include polynomial or higher-degree terms of some already existing features. Linear regression was only able to fit a linear model to the data at hand, but with polynomial features we can easily fit some non-linear relationship between the target and the input features.
Here is the code for simple demonstration of the Polynomial regression approach.
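A small sketch of the approach, assuming synthetic data: polynomial features are generated from the original input and a linear model is fitted on the expanded feature set.
Python
# polynomial regression via a pipeline
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + X.ravel() + np.random.RandomState(0).normal(0, 0.3, 30)

# degree-2 polynomial features let a linear model capture the curved relation
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print("R^2 on training data:", poly_model.score(X, y))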
Stepwise Regression
Stepwise regression is used for fitting regression models with predictive models, and it is carried out automatically. At each step, a variable is added to or subtracted from the set of explanatory variables. The main approaches for stepwise regression are forward selection, backward elimination, and bidirectional elimination.
Here is the code for simple demonstration of the stepwise regression approach.
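A hedged sketch of stepwise-style selection using scikit-learn's SequentialFeatureSelector (available in recent scikit-learn versions), which adds or removes one variable at a time in the spirit of forward selection and backward elimination; the dataset and number of selected features are illustrative choices.
Python
# forward selection of features for a linear model
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# forward selection: start with no variables and add the most useful ones
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward"
)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))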
Ridge Regression
Ridge regression is a technique for analyzing multiple regression data. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true values. Ridge regression is a regularized linear regression model: it tries to reduce the model complexity by adding a penalty term to the cost function. A degree of bias is added to the regression estimates, and as a result, ridge regression reduces the standard errors.
Here is the code for simple demonstration of the Ridge regression approach.
from sklearn.linear_model import Ridge
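A minimal sketch continuing from the import above, assuming synthetic data and an illustrative penalty strength (alpha is an assumption, not a recommended value).
Python
# ridge regression with an L2 penalty
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 50)

# alpha controls the penalty term added to the least squares cost function
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print("Ridge coefficients:", ridge.coef_)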
Lasso Regression
Lasso regression is a regression analysis method that performs both variable
selection and regularization. Lasso regression uses soft thresholding. Lasso
regression selects only a subset of the provided covariates for use in the final
model.
This is another regularized linear regression model, it works by adding a penalty
term to the cost function, but it tends to zero out some features’ coefficients,
which makes it useful for feature selection.
Here is the code for simple demonstration of the Lasso regression approach.
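A short sketch of the approach, assuming synthetic data and an illustrative alpha: some coefficients are driven to zero, which is why Lasso is useful for feature selection.
Python
# lasso regression with an L1 penalty
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
# only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 100)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print("Lasso coefficients:", lasso.coef_)  # irrelevant features shrink toward 0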
ElasticNet Regression
Linear regression suffers from overfitting and can't deal with collinear data. When there are many features in the dataset, and even some of them are not relevant to the predictive model, the model becomes more complex and makes overly inaccurate predictions on the test set (overfitting). Such a model with high variance does not generalize to new data. So, to deal with these issues, we include both the L2 and L1 norm regularization terms to get the benefits of both Ridge and Lasso at the same time. The resultant model has better predictive power than Lasso: it performs feature selection and also makes the hypothesis simpler. The modified cost function for Elastic-Net regression is given below:
Cost = (sum of squared errors over the data points) + lambda1 Σj=1..n |w(j)| + lambda2 Σj=1..n (w(j))²
where,
• w(j) represents the weight for the jth feature.
• n is the number of features in the dataset.
• lambda1 is the regularization strength for the L1 norm.
• lambda2 is the regularization strength for the L2 norm.
Here is the code for simple demonstration of the Elasticnet regression approach.
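A brief sketch of the approach, assuming synthetic, partly collinear data; alpha sets the overall penalty strength and l1_ratio mixes the L1 and L2 terms described above (both values are illustrative).
Python
# elastic-net regression combining L1 and L2 penalties
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + rng.normal(0, 0.01, 100)   # a collinear feature
y = 4 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print("Elastic-Net coefficients:", enet.coef_)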