Assessment 1&2 in Data Science

DOCENA, FRANCIS C.
BSIT 3D
MODULE 1 Answer:
Assessment 1. Introduction to Data Science/ Evolution of Data Science
1. Identify at least five skill areas of a data scientist
Teamwork
Python
Advanced Statistics
Data Visualization
Business Savvy
2. Identify the seven main categories of data.
Nominal
Ordinal
Binary
Count
Time
Interval
Useless
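As a rough illustration of the seven categories above, the sketch below pairs each category with one made-up column. Every column name and value is an assumption invented for this example, not data from the module.

```python
# Hypothetical columns; every field name and value is invented for illustration.
example_columns = {
    "Nominal": ("favorite_color", ["red", "blue", "green"]),    # unordered labels
    "Ordinal": ("satisfaction", ["low", "medium", "high"]),     # ordered labels
    "Binary": ("is_subscribed", [True, False, True]),           # only two possible values
    "Count": ("purchases", [0, 3, 12]),                         # non-negative whole numbers
    "Time": ("signup_date", ["2021-01-05", "2021-02-17", "2021-03-09"]),
    "Interval": ("temperature_c", [21.5, 19.0, 25.3]),          # differences are meaningful
    "Useless": ("row_id", ["a91f", "07c2", "e5b8"]),            # arbitrary IDs with no signal
}

# Print one line per category: category name, example column, sample values.
for category, (column, values) in example_columns.items():
    print(f"{category:<9}{column:<16}{values}")
```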
3. Identify the year when the significant events in the evolution of data science took place.
Event | Year
Leo Breiman published the paper "Statistical Modeling: The Two Cultures", drawing a distinction between a statistical focus on models that explain the data and an algorithmic focus on models that can actually predict. The role of a data scientist has become very broad. | 2001
The term “data science” came to prominence in discussions of the need for statisticians to join with computer scientists to bring mathematical rigor to computational analysis of large data sets. | 1990
C.F. Jeff Wu’s public lecture "Statistics = Data Science?" | 1997
William S. Cleveland published an action plan for creating a university department. | 2001
Papers by Alan Turing on the topics of computable numbers and artificial intelligence were published (2 different years). | 1936, 1950
Assessment 2. Introduction to Data Science (2)
1. List down major differences between Supervised and Unsupervised Machine Learning
Supervised Machine Learning
- The goal of supervised learning is to train the model so that it can predict the output when it is given new data.
- A supervised learning model takes direct feedback to check whether it is predicting the correct output or not.
- A supervised learning model predicts the output.
- In supervised learning, input data is provided to the model along with the output.
Unsupervised Machine Learning
- The goal of unsupervised learning is to find the hidden patterns and useful insights in the unknown dataset.
- An unsupervised learning model does not take any feedback.
- An unsupervised learning model finds the hidden patterns in data.
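The contrast above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the toy data is invented, a 1-nearest-neighbor rule stands in for "a supervised model", and a basic two-cluster k-means loop stands in for "an unsupervised model". The function names are hypothetical helpers, not part of any library.

```python
# Toy 1-D data (invented for illustration): three small values, three large values.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # used only by the supervised model


def predict_nearest_neighbor(train_x, train_y, new_x):
    """Supervised: return the label of the closest labeled training point."""
    nearest_index = min(range(len(train_x)), key=lambda i: abs(train_x[i] - new_x))
    return train_y[nearest_index]


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups with a basic k-means (k=2) loop."""
    centers = [min(xs), max(xs)]  # simple deterministic starting centers
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            closer = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[closer].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Supervised: inputs AND known outputs are given; the model predicts a label for new data.
supervised_prediction = predict_nearest_neighbor(points, labels, 4.7)

# Unsupervised: only inputs are given; the model groups them without any labels.
clusters = two_means(points)

print(supervised_prediction)
print(clusters)
```

Note that `labels` is passed only to the supervised function; the clustering function never sees it, which mirrors the "input data along with the output" versus "no feedback" distinction.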
2. What are the drawbacks of having too much information?
Information overload has many disadvantages: it can make our brains less
productive, more easily tired, and more easily distracted. There are several
ways a student or a researcher can manage information and make better use of
internet resources in order to avoid information overload.
Module Assessment
1. Identify and discuss the facets of data in Data Science.

- Data science is focused on making sense of complex datasets and on building predictive
models from that data. There are many facets of data science, including:
  - Cleaning, filtering, reorganizing, augmenting, and aggregating data
  - Visualizing data
  - Data analysis, statistics, and modeling
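The facets listed above can be sketched on a tiny dataset using only the Python standard library. The records, field names, and the bar-chart scale are assumptions invented for this example; a real project would more likely use dedicated data and plotting libraries.

```python
from statistics import mean

# Hypothetical raw records (invented for illustration); some are messy or invalid.
raw_records = [
    {"city": "Manila", "sales": "120"},
    {"city": "manila ", "sales": "80"},   # inconsistent spelling and spacing
    {"city": "Cebu", "sales": None},      # missing value
    {"city": "Cebu", "sales": "200"},
    {"city": "Davao", "sales": "-5"},     # invalid negative amount
]

# Cleaning: drop records with missing sales, normalize city names, convert types.
cleaned = [
    {"city": r["city"].strip().title(), "sales": float(r["sales"])}
    for r in raw_records
    if r["sales"] is not None
]

# Filtering: keep only non-negative amounts.
filtered = [r for r in cleaned if r["sales"] >= 0]

# Aggregating: total sales per city.
totals = {}
for r in filtered:
    totals[r["city"]] = totals.get(r["city"], 0.0) + r["sales"]

# Analysis: a simple summary statistic over the filtered records.
average_sale = mean(r["sales"] for r in filtered)

# Visualizing: a minimal text bar chart, one '#' per 20 units of sales.
for city, total in sorted(totals.items()):
    print(f"{city:<8}{'#' * int(total // 20)} {total:.0f}")
print(f"Average sale: {average_sale:.2f}")
```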
2. Among the data scientists, who do you think has made the greatest contribution to the
existence of data science? Support your answer with a brief explanation.
- Over the past few years, there’s been a lot of hype in the media about “data science”.
Geoffrey Hinton has made the greatest contribution to the existence of data science
because he is called the Godfather of Deep Learning in the field of data science. Mr.
Hinton is best known for his work on neural networks and artificial intelligence. He holds
a Ph.D. in artificial intelligence and is credited for his exemplary work on neural nets.
3. What is data science, and what are the skills needed for you to be a data scientist?
- A Data Scientist is responsible for compiling and analyzing large data sets, both
structured and unstructured. The role combines math, statistics, and computer
science skills to make sense of big data and then use the information to create business
solutions.
4. Enumerate the 4 V’s in big data, and expound on why data science is essential.
- Velocity is accelerating. Streams of tweets, Facebook posts, financial data, and other
data are being generated at an ever-increasing rate by more individuals. While velocity
increases data volume (sometimes enormously), it also has the potential to shorten the
data retention or application window.
- Variety is much greater than ever before. As processing power has increased, models
that formerly relied on only a few variables now have access to hundreds of them.
- Volume: You may have heard on more than one occasion that Big Data is nothing more
than business intelligence in a very large format. More data, however, does not
necessarily mean it is Big Data. Big Data obviously needs a certain amount of data,
but having a huge amount of data does not necessarily mean that you are working with
Big Data.
- Veracity: This V refers to both data quality and availability. In traditional business
analytics, the data sources are much smaller in both quantity and variety. However, the
organization has more control over them, and their veracity is greater.
