
Completed Data Mining Midterm


Name: José Germán    Student ID: 18-0200

Iberoamericana University
School of Engineering in Information
Technology and Communication
TI1144 - Data Mining
1st Quiz

1. Explain the importance of statistics in business.


Statistics is the science of collecting, organizing, analyzing, interpreting, and
presenting data. In business, statistics is important because it allows managers to
make fact-based decisions instead of “gut feel” decisions. In addition, when claims
are made about a product or service, statistics can be used to support or refute them,
which can prevent legal issues and lead to sound, ethical decisions about whether a
hypothesis holds.

2. Explain the difference between data and information.


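Data refers to raw facts and figures, such as numbers or observations, that have not yet
been processed; it is the raw material for producing useful information. Information is
data that has been organized and analyzed so that managers can use it to make
appropriate decisions.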

3. What is statistical thinking? Why is it an important managerial skill?


Statistical thinking is a philosophy of learning and action for improvement, based
on three principles:
a. All work occurs in a system of interconnected processes.
b. Variation exists in all processes.
c. Understanding and reducing variation are keys to success.
Statistical thinking is an important management skill because managers need to be
able to understand the difference between common and special causes of variation in the
business processes they are responsible for. This mindset allows managers to make
decisions that help reduce variation and deliver more consistent performance over the
long term.
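As a hedged illustration of principle (c), the sketch below uses the sample standard deviation as one common measure of variation (an assumption; the answer does not name a measure) and compares two hypothetical processes that have the same average. The delivery-time data are made up for illustration.

```python
import statistics

# Hypothetical daily delivery times (hours) for two processes; both the data
# and the choice of standard deviation as the variation measure are assumptions.
process_a = [24, 30, 18, 27, 21, 24, 33, 15]
process_b = [23, 25, 24, 24, 25, 23, 24, 24]

for name, times in [("A", process_a), ("B", process_b)]:
    # Same average, but the sample standard deviation shows how far
    # individual results scatter around that average.
    print(name, statistics.mean(times), round(statistics.stdev(times), 2))
```

Both processes average 24 hours, but process A's standard deviation is 6 hours while process B's is under 1 hour, so B delivers the more consistent performance.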
4. What is the difference between a population and a sample?
A population consists of all items of interest for a particular decision or investigation,
such as all the residents of a county or all the students at a university. A sample is a subset
of a population, such as the residents in a neighborhood or the students in a business
statistics class.
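One common way to obtain a sample is simple random sampling, sketched below in Python. The definition above only requires a subset, so randomness is an added assumption here, and the population of 12,000 student ID numbers is hypothetical.

```python
import random

# Hypothetical population: ID numbers for every student at a university
# (the size 12,000 is made up for illustration).
population = list(range(1, 12_001))

# A simple random sample of 100 students, drawn without replacement.
# The fixed seed only makes the draw repeatable.
rng = random.Random(42)
sample = rng.sample(population, k=100)

print(len(population), len(sample))
```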
5. What types of chart would be best for displaying the data in each of the following
data sets on the Companion Website? If several charts are appropriate, state this,
but justify your best choice.
a. Mortgage Rates
b. Census Education Data
c. Consumer Transportation Survey
d. MBA Student Survey
e. Vacation Survey
f. Washington, DC, Weather

The types of charts best for the following would be:


a. Mortgage rates – stock chart if the high, low, and close rates are of interest, line
chart to show trends over time, and area chart for rates over time.
b. Census Education Data – scatter diagram to show relationships between age,
gender, race, or marital status and type of degree held, pie charts for proportions of
degrees, age, and marital status as part of the population, radar chart for multiple
dimensions of demographics, a surface chart to plot three demographic dimensions at
once, and doughnut charts to show more than one data series.
c. Consumer transportation survey – pie charts to show proportions of demographics
and types of vehicles among consumers, and scatter diagrams to show the relationship
between hours per week in the car and miles driven.
d. MBA student survey – scatter diagram to plot relationship between age and nights
out/week or undergraduate concentration and nights out/week. Pie charts for
proportion of international students. A bubble chart for major vs. nights out/week
vs. study hours/week.
e. Vacation survey – scatter chart of vacations per year vs. marital status, bar or
column charts of the number of vacations per year, pie charts of proportions by gender,
age, or marital status.
f. Washington DC average temperatures – line chart for temperatures over time, area
chart for temperatures over time. Scatter chart for month vs. temperature.
6. Construct a column chart for the data in the Excel file State Unemployment Rates
to allow comparison of the June rate with the historical highs and lows. Would any
other charts be better to visually convey this information? Why or why not?

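Answer:
[Figure: Column chart]

A line graph would also work well, because it shows all three series (the June rate,
the historical high, and the historical low) on one chart.

[Figure: Line graph]

A bar chart would be another option, but it would be more complicated to read.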


7. The Excel file Internet Usage provides data about users of the Internet.
a. Construct appropriate charts that will allow you to compare any differences due
to age or educational attainment.
b. What conclusions can you draw from these charts?
Answer:
a.

b. The charts suggest that more than half of those aged 45 and under have Internet
accounts, and that those with a bachelor's degree or higher have the highest proportion
of Internet users.
8. The Excel file Freshman College Data provides data from different colleges and
branch campuses within one university over four years.
a. Construct appropriate charts that allow you to contrast the differences among the
colleges and branch campuses.
b. Write a report to the academic vice president explaining the information.

Answer:
a.
b. Dear Vice President Smith,
Attached you will find summaries that compare the scores of branch campus students
with those of main campus students.
The ACT and SAT averages are lower for the branch campus students.
The average ACT scores are 3.75 points less, or 16% lower.
The SAT scores average about 150 points less, or 14% lower.
High school GPAs are comparable, but the percentages of students in the top 10% and
top 20% are dramatically lower for the branch campuses.
The first-year retention rate is also much lower for branch campus students:
approximately 63%, compared to 75% for the main campus.

Sincerely,

José Germán, University statistician.
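The "points less, or % lower" figures in the letter come from a simple difference and a relative difference. The Python sketch below assumes the percentage is measured relative to the main-campus mean; the helper name gap and the example means are hypothetical and are not the Freshman College Data values.

```python
def gap(main_mean, branch_mean):
    """Return (points lower, percent lower) for the branch campus.

    Assumes the percentage is relative to the main-campus mean,
    i.e. 100 * (main - branch) / main.
    """
    diff = main_mean - branch_mean
    return diff, 100 * diff / main_mean

# Hypothetical means, for illustration only.
points, pct = gap(25.0, 21.0)
print(f"{points:.2f} points lower, or {pct:.0f}% lower")
```

With these made-up inputs the sketch reports a 4-point gap, which is 16% of the main-campus mean.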

9. Construct whatever charts you deem appropriate to convey comparative
information on the two categories of televisions in the Excel file Hi-Definition
Televisions. What conclusions can you draw from these?
Answer:

Comparison of Big Screen/Projection with LCD/Plasma TVs

Measure          Big Screen/Projection   LCD/Plasma
Price            $2,124.00               $2,472.00
Screen size      52.76                   29.08
Sound quality    4.71                    5
Ease of use      3.4                     3
Overall score    50.76                   52

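A short Python sketch can turn the television comparison into explicit conclusions by reporting, for each measure, which category has the higher value and by how much. The values are copied from the comparison above; the dictionary and the compare helper are hypothetical names. Note that "higher" is not always better: a higher price means more expensive, and an exact tie would be reported as LCD/Plasma.

```python
# Values copied from the television comparison above,
# stored as (Big Screen/Projection, LCD/Plasma).
MEASURES = {
    "Price ($)": (2124.00, 2472.00),
    "Screen size": (52.76, 29.08),
    "Sound quality": (4.71, 5.0),
    "Ease of use": (3.4, 3.0),
    "Overall score": (50.76, 52.0),
}

def compare(measures):
    """Map each measure to (category with the higher value, Big Screen minus LCD)."""
    result = {}
    for name, (big_screen, lcd_plasma) in measures.items():
        higher = "Big Screen/Projection" if big_screen > lcd_plasma else "LCD/Plasma"
        result[name] = (higher, big_screen - lcd_plasma)
    return result

for name, (higher, diff) in compare(MEASURES).items():
    print(f"{name:<14} higher: {higher:<22} difference: {diff:+.2f}")
```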
a) We need to construct a double-stem-and-leaf plot for the life, in seconds, of 50 fruit
flies subject to a new spray, using the stems 0*, 0., 1*, 1., 2*, 2., and 3*, where the
symbols * and . are associated, respectively, with leaves 0-4 and 5-9.
The required double-stem-and-leaf plot is:
Stem  Leaf               Frequency
0*    34                 2
0.    56667777777889999  17
1*    0000001223333344   16
1.    5555788899         10
2*    034                3
2.    7                  1
3*    2                  1
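To make the construction concrete, here is a minimal Python sketch that rebuilds the display from raw values. LIFETIMES is simply the 50 values read back from the leaves of the plot above, and the helper name double_stem_and_leaf is hypothetical. Stems with no leaves are omitted, which does not affect this data set.

```python
from collections import defaultdict

# The 50 lifetimes (seconds), read back from the leaves of the plot above.
LIFETIMES = (
    [3, 4]
    + [5] + [6] * 3 + [7] * 7 + [8] * 2 + [9] * 4
    + [10] * 6 + [11] + [12] * 2 + [13] * 5 + [14] * 2
    + [15] * 4 + [17] + [18] * 3 + [19] * 2
    + [20, 23, 24] + [27] + [32]
)

def double_stem_and_leaf(values):
    """Return (stem label, leaves, frequency) rows of a double-stem display.

    Each tens digit gets two stems: '*' holds leaves 0-4, '.' holds leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = "*" if leaf <= 4 else "."
        rows[(stem, half)].append(str(leaf))
    # Order by tens digit, with the '*' stem before the '.' stem.
    ordered = sorted(rows, key=lambda k: (k[0], k[1] == "."))
    return [(f"{s}{h}", "".join(rows[(s, h)]), len(rows[(s, h)])) for s, h in ordered]

for stem, leaves, freq in double_stem_and_leaf(LIFETIMES):
    print(f"{stem:<6}{leaves:<19}{freq}")
```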

b) Relative frequency distribution:


Class Interval   Class Midpoint   Frequency   Relative Frequency
0-4              2                2           0.04
5-9              7                17          0.34
10-14            12               16          0.32
15-19            17               10          0.20
20-24            22               3           0.06
25-29            27               1           0.02
30-34            32               1           0.02
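The table can be reproduced with a short Python sketch that groups the data into classes of width 5, takes each midpoint as the average of the class limits, and divides each frequency by n = 50. The data list is recovered from the stem-and-leaf leaves, and the function name frequency_table is hypothetical.

```python
# The 50 lifetimes (seconds), recovered from the stem-and-leaf leaves.
LIFETIMES = (
    [3, 4]
    + [5] + [6] * 3 + [7] * 7 + [8] * 2 + [9] * 4
    + [10] * 6 + [11] + [12] * 2 + [13] * 5 + [14] * 2
    + [15] * 4 + [17] + [18] * 3 + [19] * 2
    + [20, 23, 24] + [27] + [32]
)

def frequency_table(values, width=5, start=0):
    """Group integer data into classes lo..lo+width-1 and return
    (interval, midpoint, frequency, relative frequency) rows."""
    n = len(values)
    rows = []
    lo = start
    while lo <= max(values):
        hi = lo + width - 1
        freq = sum(lo <= v <= hi for v in values)
        rows.append((f"{lo}-{hi}", (lo + hi) / 2, freq, freq / n))
        lo += width
    return rows

for interval, mid, freq, rel in frequency_table(LIFETIMES):
    print(f"{interval:<8}{mid:<6g}{freq:<4}{rel:.2f}")
```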

c) A relative frequency histogram
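Since the histogram image is not reproduced here, the Python sketch below draws a plain-text substitute from the relative frequencies in the table of part (b). The text rendering and the helper name text_histogram are assumptions, not the original chart; with scale=50 each bar has one '#' per observation.

```python
# Class intervals and relative frequencies copied from the table in part (b).
CLASSES = ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34"]
REL_FREQ = [0.04, 0.34, 0.32, 0.20, 0.06, 0.02, 0.02]

def text_histogram(labels, heights, scale=50):
    """Return one line per class, with a bar of '#' characters whose
    length is proportional to the relative frequency."""
    lines = []
    for label, h in zip(labels, heights):
        bar = "#" * round(h * scale)
        lines.append(f"{label:>6} | {bar} {h:.2f}")
    return lines

print("\n".join(text_histogram(CLASSES, REL_FREQ)))
```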


d) Because the number of observations (n = 50) is even, the sample median is the
average of the two middle values, $\tilde{x} = \frac{x_{25} + x_{26}}{2}$.

We can read the 25th and 26th values of the ordered data from the double-stem-and-leaf
plot in part (a): stems 0* and 0. hold the first 2 + 17 = 19 values, so the six leaves of 0
on stem 1* (the value 10) occupy positions 20 through 25, and the next leaf, 1 (the value
11), is in position 26.

So, the 25th and 26th values are $x_{25} = 10$ and $x_{26} = 11$. Hence the sample median is
$\tilde{x} = \frac{10 + 11}{2} = 10.5$ seconds.
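The same reading of the plot can be checked with a small Python sketch: it rebuilds the ordered data as value = 10 × tens digit + leaf from the stem-and-leaf rows of part (a), then applies the even-n median rule. The names PLOT, values_from_plot, and median are hypothetical.

```python
# Rows of the double-stem-and-leaf plot from part (a): (stem, leaves).
PLOT = [
    ("0*", "34"),
    ("0.", "56667777777889999"),
    ("1*", "0000001223333344"),
    ("1.", "5555788899"),
    ("2*", "034"),
    ("2.", "7"),
    ("3*", "2"),
]

def values_from_plot(rows):
    """Rebuild the ordered data; the '*' or '.' symbol is dropped and
    each value is 10 * tens digit + leaf."""
    return [10 * int(stem[:-1]) + int(leaf) for stem, leaves in rows for leaf in leaves]

def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:  # odd n: the single middle value
        return sorted_values[mid]
    # even n: average of positions n/2 and n/2 + 1 (1-based)
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

data = values_from_plot(PLOT)
print(len(data), data[24], data[25], median(data))
```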
