Unit 5 BDTT
1. Enterprise data science involves cloud computing, because laptops and on-
prem servers can’t process the huge datasets that are needed and end up
generating bottlenecks. Serverless data pipelines remove the headache of
managing infrastructure.
2. Data science for enterprise applies data insights and predictions to the
entire organization to provoke strategic, holistic change, instead of using
data analytics on a case-by-case basis.
3. Enterprise data science uses ML and cloud computing to drive
change across the organization. Data science that’s carried out in isolation
often results in abstruse models that don’t serve a practical business purpose.
When you have too much data coming from too many disparate sources and
you need to combine them without slowing down your analytics machines.
When data is incomplete, imprecise, inaccurate, or varies too much between
datasets.
When you need to accelerate knowledge discovery across the organization.
When you need to flatten the silos that obstruct departments from accessing
the data insights that power better decisions.
Data pipeline automation helps enterprises mine data in real time so that they can
prevent fraud. By spotting risks well in advance, business leaders can develop ways
to prevent them, or to mitigate and soften their impact, and so protect the enterprise.
With serverless data pipelines, enterprises can explore the pain points and needs of their
target market, forecast demand, and apply data-driven predictions to create a better,
more relevant product.
With the help of data pipeline automation, enterprises can analyze their
organizational costs and expenditure to find the best places to make efficiencies and
reduce costs. These insights guide you to allocate resources in the right way so that
you can cut costs without undermining productivity or performance.
In biomedical research and health, advanced Data Science and big data analytics
techniques are used for
In the hospitality and food services industries, once again big data analytics is
used for
for analyzing large volumes of data at high speed during the underwriting
stage.
Insurance claims analysts now have access to algorithms that help identify
fraudulent behaviors.
Data Science teams can now train deep learning systems to identify contracts and
invoices in a stack of documents, as well as to identify different types of
information within them.
Big data analytics has the potential to unlock great insights into data across social
media channels and platforms
Big data analytics makes research results better and helps organizations use
research more effectively by allowing them to identify specific test cases and user
settings.
Data cleansing:
In Data Science, the first step is data cleansing, which involves identifying and
cleaning up any incorrect or incomplete data sets.
Data cleansing is critical to identify errors and inconsistencies that can skew your
data analysis and lead to poor business decisions.
The most important thing about data cleansing is that it’s an ongoing process.
Business data is always changing, which means the data you have today might
not be correct tomorrow.
The best data scientists know that data cleansing isn’t done just once; it’s an
ongoing process that starts with the very first data set you collect.
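As a rough illustration of the cleansing step described above, the sketch below
normalizes, validates, and de-duplicates a handful of records using only the
Python standard library. The records, field names, and validation rules (for
example the 0 to 120 age range) are assumptions made up for this example and do
not come from the notes.

```python
# Hypothetical customer records; field names and values are illustrative only.
raw_records = [
    {"id": 1, "email": " Ana@Example.com ", "age": "34"},
    {"id": 2, "email": "bob@example.com", "age": ""},       # missing age
    {"id": 3, "email": "ana@example.com", "age": "34"},     # duplicate email
    {"id": 4, "email": "carl@example.com", "age": "-5"},    # impossible age
    {"id": 5, "email": "dina@example.com", "age": "41"},
]


def clean(records):
    """Return (clean_rows, rejected) after basic normalization and validation."""
    seen_emails = set()
    clean_rows, rejected = [], []
    for rec in records:
        # Normalize formatting so equal emails compare equal.
        email = rec["email"].strip().lower()
        # Incomplete or non-numeric values are rejected, not guessed.
        try:
            age = int(rec["age"])
        except ValueError:
            rejected.append((rec["id"], "missing or non-numeric age"))
            continue
        # Assumed plausibility rule for this example.
        if not 0 <= age <= 120:
            rejected.append((rec["id"], "age out of range"))
            continue
        # Keep only the first record seen for each email.
        if email in seen_emails:
            rejected.append((rec["id"], "duplicate email"))
            continue
        seen_emails.add(email)
        clean_rows.append({"id": rec["id"], "email": email, "age": age})
    return clean_rows, rejected


clean_rows, rejected = clean(raw_records)
print("kept:", [row["id"] for row in clean_rows])
print("rejected:", rejected)
```

Because cleansing is ongoing, a function like this would typically be rerun
every time new data arrives, and the rejected list reviewed rather than
silently discarded.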
Prediction and forecasting:
Prediction and forecasting help you understand how your customers behave and
what they may do next.
You can do this on an individual level or on a larger scale for your entire customer
base.
You can use these insights to create better products, marketing campaigns, and
customer support.
Fraud detection:
Fraud detection is a highly specialized use of Data Science that relies on many
techniques to identify inconsistencies.
With fraud detection, you’re trying to find any transactions that are incorrect or
fraudulent.
It’s an important use case because it can significantly reduce the costs of business
operations.
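The notes do not name a specific fraud detection technique. As one simple way
to "identify inconsistencies", the sketch below flags transaction amounts that
sit far from the median, using a modified z-score based on the median absolute
deviation. The amounts are invented, and the 3.5 threshold is a commonly used
default rather than something taken from the source. Real fraud systems usually
combine many more signals than a single amount.

```python
from statistics import median

# Hypothetical transaction amounts for one account; values are made up.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 980.0, 41.0, 43.5]


def flag_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be ranked as unusual
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


suspicious = flag_outliers(amounts)
print("suspicious indices:", suspicious)
```

A flagged transaction is only a candidate for review; whether it is actually
fraudulent still has to be confirmed by other checks.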
Every business wants to grow, and this is a natural outcome of doing business.
Data Science can help you understand your potential customers and improve your
services.
It can also help you identify new opportunities and explore different areas you
can expand into.
Use Data Science to identify your target audience and their needs.
Then create products and services that serve those needs better than your
competitors can.
You can also use Data Science to identify new markets, explore new areas for
growth, and expand into new industries.
MACHINE LEARNING AND AI INFRASTRUCTURE SOLUTIONS
Machine learning is currently being used effectively in enterprise software to
automate routine processes. Rather than starting with the data, the initial step is
defining a business issue to solve and then the exact process to enhance. Explore
the problems first by discussing essential questions with your specialists who
know the processes from A to Z.
Does the business problem you are trying to solve look like a process?
How are the decisions made, and what kinds of data are used to make
those decisions?
The processes more suitable for AI implementation have the following features:
1. mostly routine and standard
5. could be improved if there was more time for analysis before decision-
making
6. could be improved if all relevant data sources could be taken into account
You have to discuss four main questions to identify the right process to enhance
with AI.
Could the process be automated?
Should the process be automated?
Should you use AI to automate the process?
Can you use AI to automate the process?
Problem-first approach
According to Gartner’s hype cycle, several AI and ML technologies are
currently trending. However, machine learning cannot be a solution to every
problem. Moreover, you cannot solve a problem simply by feeding training
data to machine learning algorithms and expecting them to magically deliver
perfect business results.
First, analyze the steps in your business processes that require any prediction or
judgment. Then find out where any (even slight) improvement in the accuracy of
those predictions would significantly benefit the business results.
The number of enterprises with more than five machine learning use cases
increased by 74% over a year. The top priorities have been customer service,
safety, and process automation.
For example, AI image recognition is used for quality control, whereas ML
models detecting objects in video identify employees under-equipped for specific
processes. Furthermore, by developing natural language processing (NLP) in
customer service, enterprises have improved client experience and enhanced
retention.
Enterprises have processes for employees only and processes for customers.
Thus, generally, they require both employee-facing and customer-facing
apps. This means machine learning and data science teams should pay
attention to both and consider their integration.
The bigger the enterprise is, the more complex its infrastructure.
Infrastructure peculiarities can cause issues with data collection and
storage or with machine learning model implementation. For example, data
produced on the production floor might be needed for ML models used by a
customer service team based elsewhere. As cloud services are not always a
solution for enterprises (due to security or the structure of their
processes), transferring data can become a struggle for data scientists when
it is siloed, as in this example.
Matplotlib
Scatter Plot
Scatter plots are used to observe relationships between variables and use dots to
represent the relationship between them. The scatter() method in the matplotlib
library is used to draw a scatter plot.
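A minimal sketch of a scatter plot drawn with scatter() follows. The data and
labels are invented for illustration, and the Agg backend is selected only so
that the script runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

# Made-up sample data
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 71, 80]

fig, ax = plt.subplots()
points = ax.scatter(hours_studied, exam_score)  # one dot per (x, y) pair
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Scatter plot")

# fig.savefig("scatter.png")  # or plt.show() in an interactive session
```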
Line Chart
Bar Chart
A bar plot or bar chart is a graph that represents categories of data with
rectangular bars whose lengths or heights are proportional to the values
they represent. It can be created using the bar() method.
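As a small sketch of the bar() method, the example below draws one bar per
category. The category names and counts are assumptions invented for the
example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

# Made-up sample data
categories = ["Electronics", "Clothing", "Grocery"]
units_sold = [12, 7, 15]

fig, ax = plt.subplots()
bars = ax.bar(categories, units_sold)  # bar height is proportional to the value
ax.set_ylabel("Units sold")
ax.set_title("Bar chart")

# fig.savefig("bar.png")  # or plt.show() in an interactive session
```

For horizontal bars, matplotlib also provides ax.barh(), which takes the same
categories and values.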
Histogram
Seaborn
Line Plot
A line plot in Seaborn is drawn with the lineplot() function. It can also be
called with only the data argument.
Bokeh
Bokeh is the third library on our list. It is best known for interactive
visualizations. Bokeh renders its plots with HTML and JavaScript, so they
display in modern web browsers, and it supports elegant, concise construction
of novel graphics with high-level interactivity.
To install it, type the command below in the terminal.
pip install bokeh
Scatter Plot
A scatter plot in Bokeh can be drawn using the scatter() method of a figure
created with the bokeh.plotting module. Pass the x and y coordinates, in that
order.
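A minimal sketch of a Bokeh scatter plot is given below. The coordinates,
title, and marker size are assumptions chosen for illustration. The plot is
only built here, not opened in a browser.

```python
from bokeh.plotting import figure

# Made-up sample coordinates
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

p = figure(title="Bokeh scatter plot", x_axis_label="x", y_axis_label="y")
scatter_renderer = p.scatter(x, y, size=10)  # x first, then y

# To view the interactive plot in a browser:
# from bokeh.plotting import show
# show(p)
```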
Line Chart
A line plot can be created using the line() method of a figure from the
bokeh.plotting module.
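The line() call can be sketched as follows. The data values and the line_width
setting are illustrative assumptions.

```python
from bokeh.plotting import figure

# Made-up sample data
days = [1, 2, 3, 4, 5]
visits = [120, 135, 128, 150, 162]

p_line = figure(title="Bokeh line chart", x_axis_label="day", y_axis_label="visits")
line_renderer = p_line.line(days, visits, line_width=2)  # connects the points in order

# To view the interactive plot in a browser:
# from bokeh.plotting import show
# show(p_line)
```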
Plotly
This is the last library on our list, and you might be wondering why Plotly.
Here’s why:
Plotly has hover tool capabilities that allow us to detect any outliers or
anomalies in numerous data points.
It allows more customization.
It makes the graph visually more attractive.
To install it, type the command below in the terminal.
pip install plotly
Scatter Plot
A scatter plot in Plotly can be created using the scatter() function of
plotly.express. As with Seaborn, a data argument (such as a DataFrame) is
typically passed along with the names of the columns to plot.
Line Chart
Line plots are an accessible part of Plotly that handle a variety of data
types and produce easy-to-style statistical charts.
With px.line, each data point is represented as a vertex of a polyline.
Bar Chart
A bar chart in Plotly can be created using the bar() function of the
plotly.express module.
DATA VISUALIZATION USING R
R is a language that is designed for statistical computing, graphical data analysis,
and scientific research. It is usually preferred for data visualization as it offers
flexibility and minimum required coding through its packages.
Bar Plot
There are two types of bar plots, horizontal and vertical, which represent data points
as horizontal or vertical bars whose lengths are proportional to the value of the data
item. They are generally used to plot a continuous variable against a categorical one.
Setting the horiz parameter to TRUE or FALSE gives a horizontal or vertical bar
plot, respectively.
Histogram
A histogram is like a bar chart as it uses bars of varying height to represent data
distribution. However, in a histogram values are grouped into consecutive intervals
called bins. In a Histogram, continuous values are grouped and displayed in these
bins whose size can be varied.
Box Plot
The statistical summary of the given data is presented graphically using a boxplot.
A boxplot depicts information like the minimum and maximum data point, the
median value, first and third quartile, and interquartile range.
Scatter Plot
A scatter plot is composed of many points on a Cartesian plane. Each point denotes
the value taken by two parameters and helps us easily identify the relationship
between them.
Heat Map
Application Areas: