
1.3 Impact of Applying Data Science in Business Scenario


Applied Data Science

Dr. Payal Varangaonkar

Assistant Professor
Department of Electronics and Computer Engineering
K. J. Somaiya College of Engineering
Somaiya Vidyavihar University

07/06/2024 1
Impact of applying data science in business scenario

• Data science for business decision-making: Companies
can evaluate the efficiency of their operations by basing
their reporting on accurate and up-to-date data. Business
intelligence provides essential data on the company's
recent and historical productivity, as well as future
projections, anticipated demand, and customers' shopping
habits, to help businesses make informed planning
decisions. Business analytics teams make sure the
organization gets real-time, enhanced reports so it can
use the available data to operate more effectively.

• Making quality products: Businesses need data to
align product development with the needs and
expectations of the client. With the help of data analysis,
companies can develop products that better match those needs.

• Effective business management: Small and large
businesses can efficiently manage their operations and
develop themselves through data science. Using data
science, companies can predict the success of their
strategies.

• Forecasting using predictive analysis: Forecasting is a
significant application of data science in business.
Companies use analytical tools and technologies to develop
their data mining proficiency. Businesses can use predictive
analysis to extract insights that could affect their operations
and then take the appropriate action.

• Leveraging data for business decisions: Businesses need
projections to understand the future. Without data such as
surveys to base those projections on, they risk making
poor decisions and incurring losses.

• Evaluating business resolutions: Companies can make
accurate business decisions quickly by projecting future
events and trends. The business should also be fully
aware of how the resolutions it implements affect its
growth and performance.

• Fraud and risk management: Data scientists are
trained to recognize data that stands out. They can
then build network-based, path-based, and other
data-driven methods to foresee fraud.
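
One simple way to "recognize data that stands out" is a robust outlier score. The sketch below is only an illustration, not the method used in the slides: it assumes the modified z-score based on the median absolute deviation (MAD) with the commonly used cutoff of 3.5. The transaction amounts and the function name `flag_outliers` are hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a common convention,
    assumed here for illustration.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All deviations collapse to zero; no meaningful score exists.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [52, 48, 50, 47, 51, 49, 53, 50, 500, 46]
flagged = flag_outliers(amounts)
print("flagged transactions:", flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in a small sample.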

• Recruiting automation: In an era of intense competition
for high performers, organizations understand that
the usual hiring process simply doesn't work as effectively
as it once did. To meet their hiring objectives, these
businesses seek to produce greater results in less time
and frequently with fewer resources.

Need of Estimation and Validation in Data Science
• There is always a need to validate the stability of your
machine learning model. You cannot simply fit the
model to your training data and hope it will work
accurately on real data it has never seen before. You
need some assurance that your model has captured most
of the patterns in the data correctly and is not picking
up too much of the noise; in other words, that it is low
on both bias and variance.

Validation
• Validation is the process of deciding whether the numerical results
quantifying hypothesized relationships between variables are acceptable
as descriptions of the data.
• Generally, an error estimate for the model is made after training,
better known as evaluation of residuals. In this process, a numerical
estimate of the difference between the predicted and original responses
is computed, also called the training error. However, this only tells us
how well the model does on the data used to train it. It is still
possible that the model is underfitting or overfitting the data.
• The problem with this evaluation technique is that it does not
indicate how well the learner will generalize to an
independent, unseen data set. Estimating this generalization
performance is known as cross validation.
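
The evaluation of residuals described above can be sketched as follows. This is a minimal illustration, assuming a one-variable least-squares line as the model and mean squared error as the error estimate; the data values and the names `fit_line` and `mean_squared_error` are hypothetical and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, xs, ys):
    """Average squared residual (original response minus prediction)."""
    slope, intercept = model
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)

# Hypothetical training data (illustrative only).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_line(train_x, train_y)
# Training error: measured on the same data the model was fitted to,
# so it says nothing about performance on unseen data.
training_error = mean_squared_error(model, train_x, train_y)
print(f"training error (MSE): {training_error:.4f}")
```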

Holdout Method
A basic remedy is to remove a part of the training data
and use it to get predictions from the model trained on the rest of the
data. The error estimate then tells us how the model is doing on unseen
data, that is, on the validation set. This is a simple kind of cross
validation technique, known as the holdout method. Although this method
adds no computational overhead and is better than evaluating on the
training data alone, it still suffers from high variance. This is because
it is not certain which data points will end up in the validation set, and
the result might be entirely different for different sets.
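
The holdout method and its sensitivity to the split can be sketched as below. This is a minimal illustration under stated assumptions: a least-squares line as the model, mean squared error as the error estimate, a 20% holdout fraction, and synthetic noisy data. The names `fit_line`, `mse`, and `holdout_error` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def holdout_error(xs, ys, holdout_fraction, seed):
    """Train on a random subset and report the error on the held-out part."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(xs) * holdout_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])

# Synthetic data (assumed for illustration): a noisy straight line.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 3.0) for x in xs]

# Same data, same model, different random splits.
errors = [holdout_error(xs, ys, 0.2, seed) for seed in range(5)]
for seed, err in enumerate(errors):
    print(f"split {seed}: validation MSE = {err:.3f}")
```

Running the same procedure with different seeds gives different validation errors, which is the high variance the holdout method is criticized for.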

K-Fold Cross Validation
• Because there is rarely enough data to train a model, removing a part of it
for validation risks underfitting. By reducing the training data, we
risk losing important patterns and trends in the data set, which in turn
increases the error induced by bias. What we need is a method that provides
ample data for training the model and also leaves ample data for validation.
K-fold cross validation does exactly that.
• In k-fold cross validation, the data is divided into k subsets. The holdout
method is then repeated k times, such that each time one of the k subsets is
used as the test/validation set and the other k-1 subsets are put together to
form the training set. The error estimate is averaged over all k trials to get
the overall effectiveness of the model. Every data point is in a
validation set exactly once and in a training set k-1 times. This
significantly reduces bias, as most of the data is used for fitting, and
also significantly reduces variance, as most of the data is also used in a
validation set. Interchanging the training and test sets also adds to the
effectiveness of this method. As a rule of thumb supported by empirical
evidence, k = 5 or 10 is generally preferred, but nothing is fixed and k can
take other values.
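
The k-fold procedure described above can be sketched as follows. This is a minimal illustration, assuming a least-squares line as the model, mean squared error as the error estimate, and synthetic data; the names `fit_line`, `mse`, `k_fold_indices`, and `k_fold_cv` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_indices(n_samples, k):
    """Split positions 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def k_fold_cv(xs, ys, k, seed=0):
    """Return (mean validation error, per-fold validation errors)."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)  # shuffle once, then cut into folds
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        val_idx = [order[p] for p in fold]
        val_set = set(val_idx)
        train_idx = [i for i in order if i not in val_set]  # the other k-1 folds
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(fold_errors) / k, fold_errors

# Synthetic data (assumed for illustration): a noisy straight line.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 3.0) for x in xs]

mean_error, fold_errors = k_fold_cv(xs, ys, k=5)
print(f"5-fold mean validation MSE: {mean_error:.3f}")
```

Each index appears in exactly one validation fold, so every data point is validated on once and trained on k-1 times, matching the description above.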

K-Fold Cross Validation

Errors in Machine Learning
While making predictions, a difference occurs between the prediction values
made by the model and the actual/expected values. This difference is known
as bias error, or error due to bias.
• Low Bias: A low bias model will make fewer assumptions about the form of
the target function.
• High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias
model also cannot perform well on new data.

Low variance means there is a small variation in the prediction of the
target function with changes in the training data set. At the same time,
high variance shows a large variation in the prediction of the target
function with changes in the training data set.

A model that exhibits small variance and high bias will underfit the target, while a
model with high variance and little bias will overfit the target. A model with high
variance may represent the data set accurately but could lead to overfitting to noisy or
otherwise unrepresentative training data.
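
The trade-off described above is often summarized by the standard bias-variance decomposition of the expected squared error. The notation is assumed here for illustration and does not appear in the slides: $f$ is the true target function, $\hat{f}$ is the model learned from a random training set, and the observations are $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

An underfitting model is dominated by the first term and an overfitting model by the second; the noise term cannot be reduced by any choice of model.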

Questions?
