
Unit 5 BDTT


UNIT 5

ENTERPRISE DATA SCIENCE


Enterprise data science combines data scientists, data engineers, software
engineers, information architects, IT teams, and more to generate value from big
data.
Key features:

1. Enterprise data science involves cloud computing, because laptops and on-
prem servers can’t process the huge datasets that are needed and end up
generating bottlenecks. Serverless data pipelines remove the headache of
managing infrastructure.
2. Data science for enterprise applies data insights and predictions to the
entire organization to provoke strategic, holistic change, instead of using
data analytics on a case-by-case basis.
3. Enterprise data science uses ML and cloud computing to drive
change across the organization. Data science that’s carried out in isolation
often results in abstruse models that don’t serve a practical business purpose.

Need for Enterprise data science


Businesses want to tap into big data to improve strategic decision-making, but big
data is larger than most organizations realize. Enterprises have massive amounts of
uncategorized, unindexed, inaccessible data, and they need data science pipelines to
convert it into insights and predictions that add value to the business.
Who uses enterprise data science?
Enterprise data automation involves citizen data scientists, ML experts, software
engineers, data scientists, analysts, information architects, and more. To succeed,
you’ll also need input from domain experts like C-suite executives who can help
identify the most relevant datasets.

An enterprise data science platform is particularly relevant when corporations have
a large amount of semi-structured, structured, and unstructured data that they want
to combine to generate predictions that guide business decision-making. It brings
benefits to enterprises across industries and verticals.

There are many use cases for a data science platform:

• When you have too much data coming from too many disparate sources and you need to combine them without slowing down your analytics machines
• When data is incomplete, imprecise, inaccurate, or varies too much between datasets
• When you need to accelerate knowledge discovery across the organization
• When you need to flatten the silos that obstruct departments from accessing the data insights that power better decisions

Benefits of Enterprise Data Science


Speed up time to valuable insights
An enterprise data science platform provides a fast, effective way for businesses to process
and analyze large volumes of data, even when the data types vary and the data comes from
disparate sources. The right platform provides better data visualization to help
departments mine actionable insights from their data.

Mitigate risk and fraud

Data pipeline automation helps enterprises mine data in real time so that they can
prevent fraud. By spotting risks well in advance, business leaders can develop ways
to prevent them or mitigate their impact and keep the enterprise safe.

Improve product-market fit

With serverless data pipelines, enterprises can explore the pain points and needs of their
target market, forecast demand, and apply data-driven predictions to create a better,
more relevant product.

Cut the right costs

With the help of data pipeline automation, enterprises can analyze their
organizational costs and expenditure to find the best places to make efficiencies and
reduce costs. These insights guide you to allocate resources in the right way so that
you can cut costs without undermining productivity or performance.

DATA SCIENCE SOLUTIONS IN THE ENTERPRISE

In biomedical research and health, advanced Data Science and big data analytics
techniques are used for

• increasing online revenue,
• reducing customer complaints, and
• enhancing customer experience through personalized services.

In the hospitality and food services industries, once again big data analytics is
used for

• studying customers’ behavior through shopping data, and
• improving organizational effectiveness.

In the insurance sector, big data analytics is used

• to analyze large volumes of data at high speed during the underwriting stage, and
• to give claims analysts access to algorithms that help identify fraudulent behaviors.

Data Science teams can now train deep learning systems to identify contracts and
invoices in a stack of documents, as well as to identify the different types of
information they contain.

Big data analytics has the potential to unlock great insights into data across social
media channels and platforms, enabling marketing, customer support, and advertising
to improve and align more closely with corporate goals.

Big data analytics improves research results and helps organizations use
research more effectively by allowing them to identify specific test cases and user
settings.

Specialized Data Science Use Cases with Examples

Data cleansing:

In Data Science, the first step is data cleansing, which involves identifying and
cleaning up any incorrect or incomplete data sets.

Data cleansing is critical to identify errors and inconsistencies that can skew your
data analysis and lead to poor business decisions.

The most important thing about data cleansing is that it’s an ongoing process.

Business data is always changing, which means the data you have today might
not be correct tomorrow.

The best data scientists know that data cleansing isn’t done just once; it’s an
ongoing process that starts with the very first data set you collect.
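As an illustration, here is a minimal data cleansing sketch in Python using pandas. The table, column names, and values are invented for demonstration only.

import pandas as pd
import numpy as np

# Hypothetical customer data with duplicates, impossible values, and gaps
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, -1, -1, 45, np.nan],
    "country": ["US", "us", "us", "UK", None],
})

df = df.drop_duplicates(subset="customer_id")        # drop duplicate records
df["age"] = df["age"].replace(-1, np.nan)            # treat impossible ages as missing
df["age"] = df["age"].fillna(df["age"].median())     # impute missing ages with the median
df["country"] = df["country"].str.upper().fillna("UNKNOWN")  # normalize category labels
print(df)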

Prediction and forecasting:


The next step in Data Science is data analysis, prediction, and forecasting.

You can do this on an individual level or on a larger scale for your entire customer
base.

Prediction and forecasting help you understand how your customers behave and
what they may do next.

You can use these insights to create better products, marketing campaigns, and
customer support.
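As a sketch only, the example below fits a simple trend to hypothetical monthly sales with scikit-learn and forecasts the next month; real forecasting work would use richer features and models, and the numbers here are invented.

import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)          # months 1 through 12
sales = np.array([200, 210, 215, 230, 240, 255,
                  260, 270, 285, 300, 310, 325])  # hypothetical monthly sales

model = LinearRegression().fit(months, sales)     # fit a linear trend
forecast = model.predict(np.array([[13]]))        # predict month 13
print(f"Forecast for month 13: {forecast[0]:.0f} units")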

Fraud detection:

Fraud detection is a highly specialized use of Data Science that relies on many
techniques to identify inconsistencies.

With fraud detection, you’re trying to find any transactions that are incorrect or
fraudulent.

It’s an important use case because it can significantly reduce the costs of business
operations.
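One common technique (offered here as an assumption, not as the only approach) is unsupervised anomaly detection. The sketch below uses scikit-learn's IsolationForest to flag an unusual transaction amount in invented data.

import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts; one value is clearly out of pattern
amounts = np.array([[25], [40], [32], [28], [35], [5000], [30], [27]])

detector = IsolationForest(contamination=0.1, random_state=42).fit(amounts)
labels = detector.predict(amounts)   # -1 means flagged as anomalous, 1 means normal
print(amounts[labels == -1])         # prints the suspicious transaction(s)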

Data Science for business growth:

Every business wants to grow, and this is a natural outcome of doing business.

Yet many businesses struggle to keep up with their competitors.

Data Science can help you understand your potential customers and improve your
services.

It can also help you identify new opportunities and explore different areas you
can expand into.

Use Data Science to identify your target audience and their needs.

Then create products and services that serve those needs better than your
competitors can.

You can also use Data Science to identify new markets, explore new areas for
growth, and expand into new industries.
MACHINE LEARNING AND AI INFRASTRUCTURE
SOLUTIONS
Machine learning is currently being used effectively in enterprise software to
automate routine processes. Rather than starting with the data, the initial step is
defining a business issue to solve and then the exact process to enhance. Explore
the problems first by discussing essential questions with your specialists who
know the processes from A to Z.
• Does the business problem you are trying to solve look like a process?
• Is there a clear beginning and an end with logical steps in between?
• How are the decisions made, and what kinds of data are used to make those decisions?

The processes most suitable for AI implementation have the following features:
1. mostly routine and standard
2. possibility to add new data to train and improve an ML model, updating the learning loop
3. interactions with the physical world are nonexistent or simple
4. performed frequently and require many people
5. could be improved if there was more time for analysis before decision-making
6. could be improved if all relevant data sources could be taken into account
You have to discuss four main questions to identify the right process to enhance
with AI.
• Could the process be automated?
• Should the process be automated?
• Should you use AI to automate the process?
• Can you use AI to automate the process?

Problem-first approach
According to Gartner’s hype cycle, different AI and ML technologies are
currently trending. However, it is important to understand that machine
learning cannot be a solution to every problem. Moreover, you cannot solve a
problem simply by feeding training data to machine learning algorithms and
expecting them to magically deliver perfect business results.

First, analyze the steps in your business processes that require any prediction or
judgment. Then find out where any (even slight) improvement in the accuracy of
those predictions would significantly benefit the business results.
The number of enterprises with more than five machine learning use cases
increased by 74% over a year. The top priorities have been customer service,
safety, and process automation.
For example, AI image recognition is used for quality control, while ML
models that detect objects in video can identify employees who lack the required
equipment for specific processes. Furthermore, by developing natural language
processing (NLP) in customer service, enterprises have improved client experience
and enhanced retention.

However, several challenges remain for all enterprise machine-learning projects:

• Enterprises are usually already equipped with different technologies and complex machines. Such legacy components need integration with a new machine-learning solution. Hence, any changes in processes require approval and testing time.

• Enterprises have both internal processes for employees and processes for customers. Thus, they generally require both employee-facing and customer-facing apps. This means machine learning and data science teams should pay attention to both and consider their integration.
• The bigger the enterprise, the more complex its infrastructure. Infrastructure peculiarities can cause issues with data collection and storage or with machine learning model implementation. For example, data produced on the production floor might be needed for ML models in a customer service department located elsewhere. Because cloud services are not always an option for enterprises (due to security requirements or the structure of their processes), transferring siloed data, as in this example, can become a struggle for data scientists.

• New technology adoption could be a challenge as well. Changes are always complex, but, for larger enterprises, such issues become exponentially more complicated.

DATA VISUALIZATION USING PYTHON


Data visualization provides a good, organized pictorial representation of the data,
which makes it easier to understand, observe, and analyze. In this tutorial, we will
discuss how to visualize data using Python.
Python provides various libraries for visualizing data, each with different features
and support for different types of graphs. In this tutorial, we will be discussing
four such libraries.
• Matplotlib
• Seaborn
• Bokeh
• Plotly

Matplotlib

Matplotlib is an easy-to-use, low-level data visualization library that is built on
NumPy arrays. It provides various plots such as scatter plots, line plots, and
histograms, and offers a lot of flexibility.
To install it, type the command below in the terminal.
pip install matplotlib

Scatter Plot

Scatter plots are used to observe relationships between variables, using dots to
represent the relationship between them. The scatter() method in the matplotlib
library is used to draw a scatter plot.
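A minimal example with invented data (not from the original tutorial) might look like this:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 7, 4, 8, 6]

plt.scatter(x, y)        # draw one dot per (x, y) pair
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot")
plt.show()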
Line Chart

A line chart is used to represent a relationship between two sets of data, X and Y,
on different axes. It is plotted using the plot() function. Let’s see the example below.
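For instance, a small line chart with made-up values could be drawn as follows:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 6, 3]

plt.plot(x, y)           # connect the points with a line
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Line chart")
plt.show()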

Bar Chart

A bar plot or bar chart is a graph that represents categories of data with
rectangular bars whose lengths are proportional to the values they represent. It can
be created using the bar() method.
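A short illustrative example (categories and values are invented):

import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [10, 24, 36, 18]

plt.bar(categories, values)   # one bar per category, height equal to its value
plt.title("Bar chart")
plt.show()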
Histogram

A histogram is used to represent data in the form of groups. It is a type of bar plot
where the X-axis represents the bin ranges while the Y-axis gives information about
frequency. The hist() function is used to compute and create a histogram. If we pass
categorical data to a histogram, it will automatically compute the frequency of that
data, i.e. how often each value occurs.
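For example, with a small made-up sample:

import matplotlib.pyplot as plt

data = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 9]

plt.hist(data, bins=5)        # group values into 5 bins and count them
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram")
plt.show()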

Seaborn

Seaborn is a high-level interface built on top of Matplotlib. It provides beautiful
design styles and color palettes to make graphs more attractive.
To install Seaborn, type the command below in the terminal.
pip install seaborn
Because Seaborn is built on top of Matplotlib, it can also be used together with
Matplotlib. Using the two together is a very simple process: we just invoke the
Seaborn plotting function as normal, and then we can use Matplotlib’s customization
functions.

Line Plot

A line plot in Seaborn is plotted using the lineplot() method. In this, we can also
pass only the data argument.
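A small sketch with an invented DataFrame, also showing Matplotlib customization applied on top of a Seaborn plot:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.DataFrame({"day": [1, 2, 3, 4, 5],
                     "visits": [100, 120, 90, 150, 130]})   # hypothetical values

sns.lineplot(data=data, x="day", y="visits")   # Seaborn draws the line plot
plt.title("Daily visits")                      # Matplotlib customizes it
plt.show()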

Bokeh
Let’s move on to the third library on our list. Bokeh is best known for its
interactive chart visualizations. Bokeh renders its plots using HTML and
JavaScript in modern web browsers, presenting elegant, concise constructions of
novel graphics with high-level interactivity.
To install it, type the command below in the terminal.
pip install bokeh

Scatter Plot
A scatter plot in Bokeh can be drawn using the scatter() method of a figure from the
plotting module. Here we pass the x and y coordinates respectively.
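An illustrative sketch with invented coordinates; show() opens the interactive plot in a browser:

from bokeh.plotting import figure, show

x = [1, 2, 3, 4, 5]
y = [4, 7, 2, 6, 5]

p = figure(title="Bokeh scatter plot")
p.scatter(x, y, size=10)   # draw one marker per (x, y) pair
show(p)                    # renders the chart as interactive HTML/JavaScript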

Line Chart

A line plot can be created using the line() method of the plotting module.
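For example, with made-up data:

from bokeh.plotting import figure, show

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 7, 6]

p = figure(title="Bokeh line chart")
p.line(x, y, line_width=2)   # connect the points with a line glyph
show(p)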

Plotly

This is the last library on our list, and you might be wondering why Plotly. Here’s why:
• Plotly has hover tool capabilities that allow us to detect any outliers or anomalies in numerous data points.
• It allows more customization.
• It makes graphs visually more attractive.
To install it, type the command below in the terminal.
pip install plotly
Scatter Plot

A scatter plot in Plotly can be created using the scatter() method of plotly.express.
As with Seaborn, a data argument is also required here.
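A minimal sketch with an invented DataFrame passed as the data argument:

import pandas as pd
import plotly.express as px

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 6, 4, 8, 7]})

fig = px.scatter(df, x="x", y="y", title="Plotly scatter plot")
fig.show()   # interactive chart with hover tooltips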

Line Chart

A line plot in Plotly is an accessible and versatile addition that can manage a
variety of data types and assemble easy-to-style charts. With px.line, each data
position is represented as a vertex of a polyline.
Bar Chart

A bar chart in Plotly can be created using the bar() method of the plotly.express module.
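A short example with made-up categories:

import pandas as pd
import plotly.express as px

df = pd.DataFrame({"category": ["A", "B", "C"], "count": [12, 30, 21]})

fig = px.bar(df, x="category", y="count", title="Plotly bar chart")
fig.show()
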
DATA VISUALIZATION USING R
R is a language designed for statistical computing, graphical data analysis,
and scientific research. It is often preferred for data visualization because its
packages offer flexibility and require minimal coding.

Bar Plot

There are two types of bar plots, horizontal and vertical, which represent data points
as horizontal or vertical bars of lengths proportional to the value of the data item.
They are generally used for plotting continuous and categorical variables. By setting
the horiz parameter to TRUE or FALSE, we can get horizontal or vertical bar plots
respectively.

Histogram

A histogram is like a bar chart as it uses bars of varying height to represent data
distribution. However, in a histogram values are grouped into consecutive intervals
called bins. In a Histogram, continuous values are grouped and displayed in these
bins whose size can be varied.

Box Plot

The statistical summary of the given data is presented graphically using a boxplot.
A boxplot depicts information like the minimum and maximum data point, the
median value, first and third quartile, and interquartile range.

Scatter Plot

A scatter plot is composed of many points on a Cartesian plane. Each point denotes
the value taken by two parameters and helps us easily identify the relationship
between them.
Heat Map

A heatmap is a graphical representation of data that uses colors to visualize the
values of a matrix. The heatmap() function is used to plot a heatmap.

Advantages of Data Visualization in R:

• R offers a broad collection of visualization libraries along with extensive online guidance on their usage.
• R also offers data visualization in the form of 3D models and multipanel charts.
• Through R, we can easily customize our data visualization by changing axes, fonts, legends, annotations, and labels.
Disadvantages of Data Visualization in R:
• R is only preferred for data visualization when done on an individual standalone server.
• Data visualization using R is slow for large amounts of data compared to its counterparts.

Application Areas:

• Presenting analytical conclusions of the data to the non-analyst departments of your company.
• Health monitoring devices use data visualization to track any anomaly in blood pressure, cholesterol, and other readings.
• Discovering repeating patterns and trends in consumer and marketing data.
• Meteorologists use data visualization for assessing prevalent weather changes throughout the world.
• Real-time maps and geo-positioning systems use visualization for traffic monitoring and estimating travel time.

DATA VISUALIZATION USING TABLEAU


Tableau is a Data visualization tool widely used for Business Intelligence but not
limited to it. It helps create interactive graphs and charts in dashboards and
worksheets to gain business insights. And all of this is made possible with
gestures as simple as drag and drop!

Visualization in Tableau is possible through dragging and dropping
Measures and Dimensions onto different shelves.
Rows and Columns: Represent the x- and y-axes of your graphs/charts.
Filter: Filters help you view a restricted version of your data. For example,
instead of seeing the combined Sales of all the Categories, you can look at a
specific one, such as just Furniture.
Pages: Pages work on the same principle as Filters, with the difference that
you can actually see the changes as you shift between the paged values.
Remember that Rosling chart? You can easily make one of your own using Pages.
Marks: The Marks property is used to control the mark types of your data.
You may choose to represent your data using different shapes, sizes, or text.
