Module For Data Science
MODULE 1: MULTIMEDIA
MODULE 1
Data Science
Learning Competencies
2.1 Identify the organizational standards for assessing “data-driven” maturity.
2.2 Understand the relationship of Data Science to Big Data.
2.3 Become familiar with the Data Science process and how it is conducted.
2.4 Recognize how Data Science Platforms help businesses turn data into insights faster
and more efficiently.
G11: 16
MODULE 2:
INTRODUCTION
Data is the universal thread in today’s increasingly technologically advanced world, and avant-garde
businesses are transforming it into valuable knowledge and powerful capabilities.
While these leading businesses increasingly rely on data to make decisions, others struggle to extract
value from it and fail to realize their data-driven ambitions. These organizations face a growing
need to adapt in order to stay ahead of the competition. Even so, they frequently lack the
knowledge needed to move away from their original, intuition-based way of operating.
Furthermore, humans are sometimes irrational, which can hinder decision-making and lead to
inferior decisions. For this reason, it has been argued that computers, with their information
processing capabilities, can play an essential role as decision support systems. The invention of
the computer sparked interest in how computers might aid business processes by quickly converting
information into economic value.
The concept of, and desire for, a data-driven organization did not emerge from thin air. For a long
time, organizations have conducted data-driven operations in order to make more factual and
economical decisions. Being data-driven brings many benefits to businesses trying to keep up with a
rapidly changing digital world, but it also creates challenges in collecting and processing data.
In this session, we will go further into Data Science and offer answers to the following
questions:
What are the standards in assessing organizational maturity?
How do data scientists use Big Data?
How are Data Science procedures carried out?
How do Data Science Platforms assist businesses in becoming more efficient?
DATA INPUT
In their 2007 book "Competing on Analytics: The New Science of Winning," Thomas Davenport and
Jeanne Harris established the Five Stages of Analytics Maturity. In 2010, in their book "Analytics at
Work: Smarter Decisions, Better Results," Tom and Jeanne were joined by Robert Morison in presenting
the DELTA Model. Tom and Jeanne revised both frameworks in the 2017 edition of "Competing on
Analytics," creating the DELTA+ Model by adding two components to the DELTA Model.
The DELTA+ Model and the Five Stages of Analytics Maturity have become industry-standard
frameworks for measuring corporate analytics maturity. Let us highlight the essential features of these
frameworks to get an idea of an organization's level of analytics maturity.
The DELTA+ Model comprises seven parts that must develop and mature for businesses to be
successful with their analytics projects.
The "+" in the DELTA+ Model reflects the continuing development of big data and the adoption of
new analytics methods such as machine learning:
T stands for the technology used to enable analytics throughout the company.
A stands for the analytical techniques available, which range from simple descriptive statistics
to machine learning.
Big data refers to the massive amount of data generated every day. It is data that is extremely
large and complex, and it is generated at an incredible rate. The process of gathering, storing, and
analyzing data to derive insights has been practiced for a long time. However, the phrase "Big Data"
only came into wide use in the late 2000s.
The most meaningful description is that data becomes "big data" when its sheer size becomes part
of the problem. We are talking about data ranging from gigabytes to petabytes, at which point
traditional approaches to working with data eventually run out of steam. Even so, the significance of
big data rests in what you can do with it rather than in the amount of data you have.
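To make "the amount of the data itself becomes part of the problem" more concrete, here is a minimal Python sketch of one common workaround: processing a file in a single streaming pass and keeping only running totals, so the whole dataset never has to fit in memory. The column names `region` and `amount` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def totals_by_key(lines, key_field, value_field):
    """Sum value_field for each distinct key_field in one pass.

    `lines` can be any iterable of CSV text lines (for example an open file),
    so only the running totals are held in memory, not the whole dataset.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)

# Hypothetical sample data; in practice this could be a very large file
# opened with open("sales.csv", newline="").
sample = io.StringIO(
    "region,amount\n"
    "north,120.5\n"
    "south,80\n"
    "north,30\n"
)
print(totals_by_key(sample, "region", "amount"))  # {'north': 150.5, 'south': 80.0}
```

Streaming like this works for simple aggregates; once even a single pass on one machine is too slow, organizations typically turn to distributed processing tools.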
For a long time, industries such as manufacturing, retail, oil, telecommunications, financial
services, health care, and other data-centric sectors have possessed massive databases. As storage
capacity grows, today's "big" is almost certainly tomorrow's "medium" and next week's "little."
Let's look at some big data use cases to see how companies are using big data more than ever
before. Each use case demonstrates how businesses use data insights to enhance decision-making,
penetrate new markets, and provide better customer experiences. For now, we will focus on health
care, whose use cases are especially essential and timely.
Healthcare companies use big data for everything from boosting profitability to saving lives.
Healthcare businesses, hospitals, and researchers collect massive volumes of data. However, none of
this information is helpful on its own. It becomes critical when the data is examined to identify
trends and risk patterns and to develop predictive models.
Genomic research
Using big data, researchers can uncover disease genes and biomarkers that help patients
identify potential health concerns. The findings may enable healthcare organizations to
develop tailored therapies.
Patient experience and outcomes
Healthcare companies strive to deliver better treatment and higher quality care
while keeping costs down. Big data enables them to improve the patient experience in the
most cost-effective way possible. Healthcare companies can use big data to build a
360-degree view of patient care as a patient moves through various therapies and
departments.
Claims fraud
Each healthcare claim may include hundreds of related reports in a variety of
formats. This makes verifying the integrity of insurance incentive schemes, and identifying
trends that suggest fraudulent conduct, exceedingly challenging. Big data helps
healthcare companies detect possible fraud by highlighting specific patterns for
further investigation.
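As a simplified illustration of "highlighting specific patterns for further investigation," the Python sketch below flags claims whose amount is unusually high compared with other claims for the same procedure, using a z-score threshold. This is a generic outlier check chosen for clarity, not the method of any particular insurer; the field names, the threshold of 2.0, and the sample claims are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def flag_unusual_claims(claims, threshold=2.0):
    """Return the IDs of claims whose amount lies more than `threshold`
    population standard deviations above the mean for the same procedure."""
    amounts_by_procedure = defaultdict(list)
    for claim in claims:
        amounts_by_procedure[claim["procedure"]].append(claim["amount"])

    flagged = []
    for claim in claims:
        amounts = amounts_by_procedure[claim["procedure"]]
        spread = pstdev(amounts)
        if spread == 0:
            continue  # every amount is identical, so nothing stands out
        z_score = (claim["amount"] - mean(amounts)) / spread
        if z_score > threshold:
            flagged.append(claim["id"])
    return flagged

# Hypothetical claims: C6 is far above the other X-ray claims.
claims = [
    {"id": "C1", "procedure": "XRAY", "amount": 100},
    {"id": "C2", "procedure": "XRAY", "amount": 110},
    {"id": "C3", "procedure": "XRAY", "amount": 90},
    {"id": "C4", "procedure": "XRAY", "amount": 105},
    {"id": "C5", "procedure": "XRAY", "amount": 95},
    {"id": "C6", "procedure": "XRAY", "amount": 1000},
    {"id": "C7", "procedure": "LAB", "amount": 50},
    {"id": "C8", "procedure": "LAB", "amount": 50},
]
print(flag_unusual_claims(claims))  # ['C6']
```

A flagged claim is only a lead for a human investigator, not proof of fraud. With only a handful of claims per procedure a z-score is unstable, so real systems would work with far more data and richer patterns.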
Healthcare billing analytics
Big data has the potential to boost the bottom line. Organizations can uncover
missed revenue opportunities and areas where payment cash flows may be improved by
examining billing and claims data. This use case necessitates integrating billing data from
multiple payers, evaluating a massive volume of that data, and then detecting activity
patterns in the billing data.
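The three steps named above (integrating billing data from multiple payers, evaluating it, and detecting patterns) can be sketched in plain Python. Everything here is hypothetical: the payer names, the record fields `claim`, `billed`, and `paid`, and the rule that a claim paid at less than 80% of its billed amount deserves a closer look.

```python
def combine_payer_records(records_by_payer):
    """Merge billing records from several payers into one list,
    tagging each record with the payer it came from."""
    combined = []
    for payer, records in records_by_payer.items():
        for record in records:
            combined.append({**record, "payer": payer})
    return combined

def collection_rate_by_payer(records):
    """Fraction of the billed amount that each payer actually paid."""
    billed, paid = {}, {}
    for r in records:
        billed[r["payer"]] = billed.get(r["payer"], 0) + r["billed"]
        paid[r["payer"]] = paid.get(r["payer"], 0) + r["paid"]
    return {payer: paid[payer] / billed[payer] for payer in billed}

def flag_underpaid(records, min_ratio=0.8):
    """Claims paid at less than `min_ratio` of the billed amount."""
    return [r["claim"] for r in records if r["paid"] < min_ratio * r["billed"]]

# Hypothetical billing exports from two payers.
records = combine_payer_records({
    "Payer A": [{"claim": "A1", "billed": 1000, "paid": 900},
                {"claim": "A2", "billed": 500, "paid": 500}],
    "Payer B": [{"claim": "B1", "billed": 800, "paid": 400}],
})
print(collection_rate_by_payer(records))
print(flag_underpaid(records))  # ['B1']
```

A low collection rate for one payer, or a cluster of underpaid claims, is the kind of pattern that could point to missed revenue or slow cash flow.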
Big Data can benefit every industry and every organization, but it has no value unless you know
how to put it to work.
Lesson 1 described data scientists and briefly discussed what they do daily. Data Science is a
multidisciplinary field that uses scientific methods, tools, and algorithms to extract knowledge and
insights from structured and unstructured data. In truth, however, a data scientist does far more than
merely analyze data. The work is all about data, but it also incorporates a variety of other data-driven
procedures.
Suppose that, following a meeting with the marketing team, you decide to concentrate on the question:
"How can we identify potential customers who are more likely to buy our product?" The next step is to
determine what data you have available to answer it.
The majority of the customer-related data can be found in the company's Customer
Relationship Management (CRM) software, maintained by the sales team. SQL databases,
which consist of many tables, serve as the backbone of CRM software. Going through the SQL
database, you discover that the system maintains extensive identification, contact, and
demographic information about clients (information they provided to the firm), as well as
records of their entire sales process.
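For readers who have not queried a CRM-style database before, the Python sketch below uses the standard-library `sqlite3` module with an in-memory database. The two tables (`customers` and `deals`), their columns, and the rows are hypothetical and far simpler than a real CRM; the point is only to show how joining tables combines demographic information with sales-process records.

```python
import sqlite3

# In-memory database with a hypothetical, heavily simplified CRM schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT);
CREATE TABLE deals (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    stage TEXT
);
INSERT INTO customers VALUES (1, 'Ana', 34, 'Manila'), (2, 'Ben', 27, 'Cebu'),
                             (3, 'Carla', 45, 'Davao');
INSERT INTO deals VALUES (1, 1, 'won'), (2, 1, 'lead'), (3, 2, 'lost'), (4, 3, 'won');
""")

# Join the two tables to count how many deals each customer has won.
# In SQLite, the comparison d.stage = 'won' evaluates to 1 or 0, so SUM counts wins.
query = """
SELECT c.name, c.city, SUM(d.stage = 'won') AS deals_won
FROM customers AS c
JOIN deals AS d ON d.customer_id = c.id
GROUP BY c.id
ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Running it prints one tuple per customer, such as `('Ana', 'Manila', 1)`. The same `SELECT ... JOIN ... GROUP BY` pattern applies to a real CRM database, with its own table and column names.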
If the current data is insufficient, make plans to acquire more, for example by displaying or
distributing a feedback form to your visitors and customers. This can be a significant amount of
engineering work that takes time and effort. The information gathered is 'raw data,' which often
contains mistakes and missing values, so before examining the data, you must first clean it.
Once the missing and incorrect values in your data have been identified and corrected or removed,
the data is ready for analysis. Remember that having the wrong insights from data is worse than not
having any insights at all.
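Identifying missing and incorrect values can be partly automated. The minimal Python sketch below assumes hypothetical feedback-form records with `email`, `age`, and `rating` fields, and assumes plausible ranges (age from 1 to 119, rating from 1 to 5). It reports each problem it finds, then applies one simple cleaning strategy: dropping flagged rows.

```python
def find_problems(rows):
    """Return (row_index, field, reason) for every missing or invalid value."""
    problems = []
    for i, row in enumerate(rows):
        for field in ("email", "age", "rating"):
            if row.get(field) in (None, ""):
                problems.append((i, field, "missing"))
        age = row.get("age")
        if isinstance(age, int) and not 0 < age < 120:
            problems.append((i, "age", "out of range"))
        rating = row.get("rating")
        if isinstance(rating, int) and not 1 <= rating <= 5:
            problems.append((i, "rating", "out of range"))
    return problems

def clean(rows):
    """Keep only rows with no detected problems (one simple strategy among many)."""
    bad_rows = {i for i, _, _ in find_problems(rows)}
    return [row for i, row in enumerate(rows) if i not in bad_rows]

# Hypothetical raw feedback-form data.
raw = [
    {"email": "ana@example.com", "age": 34, "rating": 5},
    {"email": "", "age": 27, "rating": 4},                   # missing email
    {"email": "carla@example.com", "age": -3, "rating": 3},  # impossible age
    {"email": "dan@example.com", "age": 51, "rating": 9},    # rating outside 1-5
]
for problem in find_problems(raw):
    print(problem)
print(len(clean(raw)), "of", len(raw), "rows kept")  # 1 of 4 rows kept
```

Dropping rows is the bluntest option. When data is scarce, correcting or filling in values (for example, following up with the customer) is often preferable.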
Answering these questions, however, will only provide clues and hypotheses. Data modeling
is a primary method of representing data as a suitable equation that a computer can work with.
Use the model to make predictions, and try multiple models to find the best fit.
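The advice to try multiple models and keep the best fit can be sketched in a few lines of Python. Using a hypothetical dataset relating a customer's website visits to purchases, the example fits two very simple models (a baseline that always predicts the average, and a least-squares straight line), scores both on held-out data with mean squared error, and keeps the one with the lower error. Real projects would use richer models and a dedicated library; this shows only the selection idea.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average

def fit_line(xs, ys):
    """Ordinary least-squares straight line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: website visits (x) and purchases (y).
train_x, train_y = [1, 2, 3, 4, 5, 6], [2, 4, 5, 8, 10, 12]
test_x, test_y = [7, 8], [14, 15]  # held out to judge the fit fairly

models = {
    "mean baseline": fit_mean(train_x, train_y),
    "straight line": fit_line(train_x, train_y),
}
scores = {name: mse(model, test_x, test_y) for name, model in models.items()}
best = min(scores, key=scores.get)
print(scores)
print("best fit:", best)  # best fit: straight line
```

Scoring on data the model did not see during fitting is what makes the comparison fair; a model can look excellent on its own training data and still predict poorly.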
Graph or chart the data for presentation using tools such as R, Python, Tableau, and Excel.
Use "storytelling" to present the findings.
Be ready to respond to follow-up questions.
Data may be presented in a variety of forms, including reports and web pages.
Answers will always elicit new questions, and the cycle will repeat itself.
The data science platform delivers new capabilities. Many businesses recognized that data
science activity was inefficient, insecure, and impossible to scale without an integrated platform.
This discovery prompted the creation of data science platforms, which serve as software hubs for all
data science activity. According to https://solutionsreview.com/, the best data science platforms of
2021 are Altair, Alteryx, Anaconda, Databricks, Dataiku, DataRobot, Domino Data Lab, Google, H2O,
IBM, KNIME, MATLAB, RapidMiner, SAS, and TIBCO.
A good platform mitigates many problems associated with deploying data science and enables
organizations to convert their data into insights more quickly and efficiently. Data scientists may work
in a collaborative environment using their favorite open-source tools on a centralized machine
learning platform. All of their work is synchronized via a version control system.
Expert data scientists, citizen data scientists, data engineers, and machine learning engineers all
use data science platforms to collaborate. A data science platform, for example, may allow data
scientists to deploy models as APIs, making it simple to integrate them into other applications. Data
scientists can also access tools, data, and infrastructure without having to wait for IT.
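The idea of deploying a model as an API can be illustrated without any particular platform. The following is a minimal, hypothetical sketch using only Python's standard library (`wsgiref`): it wraps a stand-in `predict` function, whose coefficients are made up, in a small web application that other programs could call over HTTP. Data science platforms provide their own deployment tooling, which this sketch does not represent.

```python
import json
from wsgiref.simple_server import make_server

def predict(visits):
    """Hypothetical stand-in model: a fixed straight line with made-up coefficients."""
    return -0.27 + 2.03 * visits

def app(environ, start_response):
    """Minimal WSGI app: send a JSON body such as {"visits": 7} to get a prediction."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = json.loads(environ["wsgi.input"].read(length) or b"{}")
        result = {"prediction": predict(float(body["visits"]))}
        status = "200 OK"
    except (KeyError, ValueError, TypeError):
        result = {"error": 'expected JSON such as {"visits": 7}'}
        status = "400 Bad Request"
    payload = json.dumps(result).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]

def serve(host="127.0.0.1", port=8000):
    """Start a development server (blocks until stopped); call serve() to try it."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

With `serve()` running, another application could POST `{"visits": 7}` to `http://127.0.0.1:8000/` and receive a JSON prediction. `wsgiref` is intended for development; production deployments usually run behind a dedicated WSGI server.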
Market demand for data science platforms has skyrocketed. When choosing one, consider the following:
Select a project-based UI that promotes collaboration. The platform should enable users to
collaborate on a model from conception to completion and give all team members self-service
access to data and resources.
Prioritize integration and flexibility. Ensure that the platform supports the latest open-source
technologies, popular version control providers such as GitHub, GitLab, and Bitbucket, and tight
integration with other resources.
Include enterprise-grade capabilities. As your team grows, make sure the platform can scale
with it. The platform should be highly available, have strict access controls, and be able to
handle a large number of concurrent users.
Increase the level of self-service in data science. Look for a platform that relieves IT and
engineering of the load by allowing data scientists to spin up environments immediately, track
all of their work, and push models into production.
You will write a blog entry for this activity. Blogging is the term used to
describe writing, photography, and other forms of media that are self-published online.
Blogging began as a tool for individuals to write diary-style entries, but it has since
been integrated into the websites of many businesses. If you're unfamiliar with blogs,
take a look at the following:
Careathers, L. (2021, June 29). How to Write a Blog Post in 2021: The Ultimate Guide.
https://smartblogger.com/how-to-write-a-blog-post/
The content of your blog will be recorded as your Written Work (15 points). It can cover any of the
topics taught in the lesson. You will need to conduct additional research on your chosen topics,
consider applicable examples from everyday life, and consider unique case studies that your readers
may not be aware of. Look for research, examples, and case studies you can link to in order to illustrate
data science. The creativity with which you convey your ideas through graphics, illustrations, and so on
will be graded as your Performance Task, along with navigation and team collaboration. This will be
your second entry in your group’s virtual expo.
Please use the rubric provided below as a guide to how you will be graded.
VIRTUAL EXPO RUBRIC
Levels: Exemplary (15), Proficient (12), Partially Proficient (9), Incomplete (5)

Content
  Exemplary (15): The content is rich, concise, and straightforward. It is relevant to the discussed
  topics and thoroughly answers the questions.
  Proficient (12): Content is complete and includes relevant detail.
  Partially Proficient (9): There is adequate detail. Some extraneous information and minor gaps
  are included.
  Incomplete (5): There is insufficient detail, or the detail is irrelevant and extraneous.

Creativity/Visual
  Exemplary (15): The expo is visually effective. The graphics/images/photographs relate seamlessly
  to the content.
  Proficient (12): The expo is visually sensible. Graphics/images/photographs are included and
  appropriate.
  Partially Proficient (9): The main theme is still discernible, but graphics/images/photographs
  are used randomly.
  Incomplete (5): Lacks visual clarity. The graphics/images/photographs distract from the content
  of the expo.

Navigation
  Exemplary (15): The document is fully hyperlinked. The index is well organized and easy to navigate.
  Proficient (12): Hyperlinks are organized into logical groups. Not all possible features have
  been employed.
  Partially Proficient (9): Hyperlinks are good but lack organization.
  Incomplete (5): There are few links. Some links are “broken.”

Team Collaboration
  Exemplary (15): The group establishes and documents clear and formal roles for each member and
  distributes the workload equally.
  Proficient (12): The group establishes clear and formal roles for each member and distributes the
  workload equally.
  Partially Proficient (9): The group establishes informal roles for each member. The workload could
  be more equally distributed.
  Incomplete (5): The group does not establish roles for each member and/or the workload is
  distributed unequally.
MODULE CREATORS
ANSWER KEY