
Icte Lesson


WEEK 1

BASIC STATISTICAL CONCEPTS


What is Statistics?
Statistics has become an integral part of our everyday life. If you read any newspaper, watch television, or use the
Internet or other types of media, you will see statistical information. Such information has become highly influential in our
lives. With this information, we make decisions about the correctness of a statement, claim, or "fact." Statistical methods
can help us make the "best educated guess."

Statistical Data
- Facts about, or summaries of data (e.g. sales statistics, employment statistics, etc.)

Statistical Method
- Statistics is a science that deals with the method of collection, organization, presentation, analysis and
interpretation of quantitative data.

Stages in Statistical Investigation

Classification of Statistics

Descriptive Statistics - involves organizing, summarizing, and displaying data.


- Collect data – (ex) Survey

- Present data – (ex) Tables and graphs

- Characterize data – (ex) Sample mean

Inferential Statistics - uses sample data to draw conclusions about a population.


- Estimation – (ex) Estimate the population mean weight using the sample mean weight

- Hypothesis testing – (ex) Test the claim that the population mean weight is 120 pounds

Important Terms
• Population
- The collection of all responses, measurements, or counts that are of interest.
- The population must be defined explicitly before the study begins and the research hypothesis/questions specify
the population being studied
- Defined by certain characteristics: - Inclusion criteria - Exclusion criteria

• Sample - A portion or subset of the population.


• Sampling Error - reflects the fact that the result we get from our sample will not be exactly equal to the
result we would have obtained had we been able to measure the entire population. Each possible sample we
could take gives a different result.
• Parameter - A number that describes a population characteristic.
Example: Average gross income of all people in the Philippines in 2019
• Statistic - A number that describes a sample characteristic.
Example: Average 2019 gross income of people in a sample of 3 regions
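
The difference between a parameter, a statistic, and sampling error can be illustrated with a small simulation. This is only a sketch: the population of incomes, the seed, and the sample size of 50 are all invented for illustration and do not come from any real survey.

```python
import random
import statistics

# Hypothetical population: gross incomes (in thousand pesos) of 1,000 people.
rng = random.Random(2019)  # fixed seed so the run is repeatable
population = [rng.randint(100, 900) for _ in range(1000)]

# Parameter: a number describing the whole population.
parameter = statistics.mean(population)

# Statistic: the same quantity computed from a sample of 50 people.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

# Sampling error: the gap between the statistic and the parameter.
sampling_error = statistic - parameter
print(parameter, statistic, sampling_error)
```

Drawing a different sample (for example, by changing the seed) would generally give a different statistic, which is the point made in the definition of sampling error.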

Applications of Statistics

DATA AND SAMPLING

Types of Data

Data Sources
Sampling Techniques
- Sampling is the process of selecting a portion (or subset) of the larger population and studying that portion (the
sample) to gain information about the population.

- When gathering data, you will not always have the luxury of collecting all available data. For example, economists
cannot survey the entire population to measure unemployment, so they must take a random sample instead.
Likewise, in a manufacturing facility, quality control managers do not have the resources to test every product that
comes off the line; it is simply not feasible. Instead, they take samples at various points during the production
process to test the quality of the products the firm produces.
Sampling
There are a number of methods employed in sampling data. It is important that the sampling method fits the application.

Random Sampling - each member of a population initially has an equal chance of being selected for the sample
(1) Simple Random Sampling
- Every possible sample of a given size has an equal chance of being selected

- The sample can be obtained using


✓ Lottery Method
✓ a table of random numbers
✓ computer random number generator

e.g. A guidance counselor uses a computer to generate 50 random numbers and then picks students whose names
correspond to the numbers.
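
The guidance counselor example could be sketched in Python as follows. The roster of 500 numbered students and the seed are assumptions made for illustration; only the sample size of 50 comes from the example.

```python
import random

# Hypothetical roster of 500 students, numbered 1 to 500.
student_ids = list(range(1, 501))

rng = random.Random(1)  # seeded only to make the example repeatable
# random.sample draws without replacement, so no student is picked twice.
chosen = rng.sample(student_ids, 50)
print(sorted(chosen))
```

Because every subset of 50 numbers is equally likely to be drawn, this matches the definition of a simple random sample.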

(2) Stratified Sampling


- Divide population into subgroups (called strata) according to some common characteristic
- Take a proportionate sample from each subgroup using simple random sampling
- Combine samples from subgroups into one

e.g. We could stratify (group) the college population by department and then choose a proportionate simple random
sample from each stratum (each department) to get a stratified random sample. The students picked from the SABH
department, those picked from the SEAITE department, and so on together make up the stratified sample.
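
A minimal sketch of proportionate stratified sampling is given below. The enrollment counts, the "Other" stratum, and the total sample size of 100 are hypothetical; only the department names SABH and SEAITE appear in the example above.

```python
import random

# Hypothetical enrollment per department (stratum); the counts are made up.
strata = {
    "SABH": [f"SABH-{i}" for i in range(1, 301)],      # 300 students
    "SEAITE": [f"SEAITE-{i}" for i in range(1, 501)],  # 500 students
    "Other": [f"Other-{i}" for i in range(1, 201)],    # 200 students
}
sample_size = 100
population_size = sum(len(members) for members in strata.values())

rng = random.Random(7)
stratified_sample = []
for name, members in strata.items():
    # Proportionate allocation: each stratum's share of the sample
    # matches its share of the population.
    n_stratum = round(sample_size * len(members) / population_size)
    # Simple random sample within the stratum, then combine.
    stratified_sample.extend(rng.sample(members, n_stratum))

print(len(stratified_sample))
```

With these counts the strata contribute 30, 50, and 20 students. With less convenient counts the rounded allocations may not add up exactly to the target sample size and would need a small adjustment.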

(3) Systematic Sampling


- Decide on the sample size n and compute the sampling interval k = N / n, where N is the population size
- Randomly select a starting point
- Select every kth individual thereafter

e.g. Suppose you have to do a phone survey. Your phone book contains 20,000 residence listings, and you must choose 400
names for the sample, so k = 20,000 / 400 = 50. Number the population 1 - 20,000 and then use a simple random sample
to pick a number that represents the first name of the sample. Then choose every 50th name thereafter until you have a
total of 400 names (you might have to wrap around to the beginning of your phone list).
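
The phone survey example can be sketched as follows. The population size (20,000) and sample size (400) come from the example; the seed and the use of modulo arithmetic to wrap around the list are choices made for this illustration.

```python
import random

population_size = 20_000
sample_size = 400
k = population_size // sample_size  # sampling interval: 20000 / 400 = 50

rng = random.Random(3)
start = rng.randint(1, population_size)  # random starting listing

# Take every k-th listing; the modulo wraps around to the top of the list
# when the count runs past listing 20,000.
sample = [(start - 1 + i * k) % population_size + 1 for i in range(sample_size)]
print(k, sample[:5])
```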

(4) Cluster Sampling


- Divide population into several “clusters,” each representative of the population
- Select a simple random sample of clusters
- The sample consists of all members from selected cluster/s
e.g. A pollster interviews all human resource personnel in five different high tech companies
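
A short sketch of the pollster example follows. The eight companies and their five-person HR staff lists are invented; the point is only that clusters are sampled first and then every member of each chosen cluster is taken.

```python
import random

# Hypothetical clusters: HR staff lists of eight companies (names invented).
companies = {
    f"Company {letter}": [f"{letter}-HR{i}" for i in range(1, 6)]
    for letter in "ABCDEFGH"
}

rng = random.Random(5)
# Step 1: simple random sample of clusters, not of individuals.
selected = rng.sample(sorted(companies), 5)
# Step 2: every member of each selected cluster enters the sample.
cluster_sample = [person for name in selected for person in companies[name]]
print(selected, len(cluster_sample))
```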

Non-Random Sampling

(1) Convenience Sampling


- Choose readily available members of the population for your sample
e.g. A computer software store conducts a marketing study by interviewing potential customers who happen to be in the
store browsing through the available software

VARIABLES AND MEASUREMENT SCALES


Variable
- a characteristic or an attribute that can assume different values.

Dichotomous Variable
- A variable that can have only two values

Qualities of Variables
• Exhaustive – Should include all possible responses
• Mutually exclusive – No respondent should be able to have two attributes simultaneously
e.g. Employed vs. Unemployed - these are not mutually exclusive if a respondent can be both, such as someone looking
for a second job while employed

Types of Variables

(1) Qualitative Variables


- Variables that do not assume numeric values
- Variable whose observations vary in kind but not in degree
- e.g. Sex, Religion, Marital status

(2) Quantitative Variables


- Variables which assume numeric values
- Variable whose observations vary in magnitude
- e.g. Age, No. of children, Income

• (a) Discrete Variable
- Quantitative variables whose observations can assume only a countable number of values
- Values are obtained by counting
- e.g. No. of children in the family, No. of COVID-19 cases in the Philippines, No. of dates in the past month

• (b) Continuous Variable
- Quantitative variables whose observations can assume any value within an interval (infinitely many possible values)
- Values are obtained by measuring
- e.g. Height, Weight, Time

• Independent Variable - Causes, determines, or influences the dependent variable(s)

• Dependent Variable - Presumed outcome of the influence of the independent variable(s)
Direct relationship between Independent and Dependent variables

Intervening Variable
- Sometimes referred to as test or control variables.
- Used to test whether the observed relations between the independent and dependent variables are spurious
- Serve either to increase or decrease the effect the independent variable has on the dependent variable

Scales of Measurement
Measurement - refers to the procedure of attributing qualities or quantities to specific characteristics of objects, persons
or events. Measurement is a key process in quantitative research and evaluation. If the measurement procedures are
inadequate, the usefulness of the research will be limited (Polgar & Thomas, 2008).

Levels of Measurement
❑ Nominal ❑ Ordinal ❑ Interval ❑ Ratio

(1) Nominal Level


- A measurement level in which numbers are used as labels or names rather than to reflect quantitative information
- (ex) Sex: 1 = Male, 2 = Female
- (ex) Marital status, Religion, Type of car used

(2) Ordinal Level


- A measurement level in which values reflect only rank order
- (ex) Educational attainment: 1 = Elementary, 2 = High School, 3 = College
- (ex) Service quality rating, Opinion on an issue (Strongly agree, Agree, Neutral, Disagree, Strongly disagree)

(3) Interval Level


- A measurement level with an arbitrary zero point in which numerically equal intervals at different locations on the
scale reflect the same quantitative difference
- (ex) Temperature in Celsius or Fahrenheit, IQ level, Standardized exam score
(4) Ratio Level
- The highest level of measurement that has all the characteristics of the interval scale plus a true zero point
- (ex) Income, No. of children, Weekly mobile data load spending

Properties held by each level of measurement

Levels of Measurement Guidelines


- It is usually best to gather data at highest level of measurement possible because one can perform more
mathematical operations and gain greater precision of measurement

❑ Interval and ratio variables can be changed to become ordinal or nominal variables, but not vice versa
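
The one-way nature of this conversion can be seen in a small sketch. The incomes and the bracket cut-offs below are hypothetical, chosen only to show a ratio-level variable being collapsed into an ordinal one.

```python
# Hypothetical monthly incomes in pesos (ratio level: true zero, real amounts).
incomes = [8_000, 15_500, 32_000, 61_000, 120_000]

def income_bracket(amount):
    """Collapse a ratio-level income into an ordinal bracket (cut-offs assumed)."""
    if amount < 20_000:
        return "Low"
    if amount < 60_000:
        return "Middle"
    return "High"

brackets = [income_bracket(x) for x in incomes]
print(brackets)  # ['Low', 'Low', 'Middle', 'High', 'High']
```

Going the other way is not possible: knowing only that a respondent is in the "Low" bracket does not recover the original amount, which is why gathering data at the highest level is usually preferred.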

WEEK 2
DATA COLLECTION
Data collection is the first stage in any statistical investigation. It is the process of obtaining (gathering) a set of related
measurements or counts to meet predetermined objectives.

Types of Data
The statistical data may be classified into two categories depending upon the sources utilized. These categories are
primary and secondary data.

(1) Primary Data – data which is collected by the investigator himself for the purpose of a specific inquiry or study.
These data are those collected for the first time (original/first-hand data) either through surveys or direct observation.

Examples:
• Data on banking and finance collected by the Central Bank
• Data on opinions and sentiments of people on current issues collected by Pulse Asia

(2) Secondary Data – data which has been collected by others and used by an investigator for his own purposes. Such
data can be obtained from journals, official reports, government publications, publications of professional and research
organizations and so on.

Examples:
• Documented data used by a medical researcher for his research which was originally collected or published by
the Department of Health.
• Data published in business journals

Remember!
Secondary data should be used with utmost care. So before using this data, the following three points should be
considered.

1. Whether the data are suitable for the purpose of investigation. This can be judged in the light of the nature and
scope of investigation.

2. If the data obtained is suitable for our purpose it should be assessed whether the data are adequate for the
purpose of investigation. This can be determined based on the time period and geographical area covered by the
data.

3. Whether the data are reliable. The data obtained should be checked for accuracy. If the data are based on a
sample, one should see whether the sample is representative of the population.

Data Collection Methods


Data collection, as previously discussed, is the first stage in statistical investigation. Before the actual data collection, four
important points should be considered.

1. The purpose of data collection (why we need to collect data),


2. The kind of data to be collected (what type of data to be collected),
3. The source of data (where we can get the data), and
4. The methods of data collection (how can we collect this data).
Once these questions are answered, it becomes necessary to collect the information needed. This information can be
collected directly or indirectly.

The most commonly used methods of data collection are the following:

(a) Survey Method – a data collection technique in which information is collected directly or indirectly from certain
individuals.
The most common methods of data collection for survey are self-administered questionnaire and personal interview.

(b) Questionnaire Method – a data collection method in which respondents are given a prepared questionnaire containing
a series of questions to answer; the questionnaire is also used to record the responses.

(c) Interview Method – this method involves a face-to-face interaction between the interviewer and the interviewee.
Apart from face-to-face interviews, it can also be conducted over the phone or the computer terminal via video
conferencing technology.

(d) Observation Method - this method involves collecting information without asking questions. It is used when the
study relates to behavioral science. This method is more subjective, as it requires the researcher (or observer) to add their
judgement to the data.

(e) Experiment Method – this method is used if the researcher would like to determine the cause and effect
relationship of certain phenomena under investigation. It is also used in making scientific inquiry.

(f) Document Review Method - a method of utilizing existing data, facts, or information for a study.

WEEK 3
STATISTICAL ANALYSIS
DATA PRESENTATION

METHODS OF DATA PRESENTATION

(1) Textual Method


- is a narrative description of data gathered

(2) Tabular Method


- is a systematic arrangement of information into columns and rows

(3) Graphical Method


- is an illustrative description of data

THE FREQUENCY DISTRIBUTION TABLE (FDT)


The most convenient way of organizing data is by constructing frequency distribution. A frequency distribution is a
collection of observations produced by sorting them into classes and showing their frequency (or numbers) of occurrences
in each class.

Parts of a Statistical Table


• Table Heading - includes the table number and the title of the table

• Body - main part of the table that contains the information or figures

• Stubs or Classes - classification or categories describing the data and usually found at the leftmost side of the table

• Caption - designations or identifications of the information contained in a column, usually found at the topmost part of
the column
THREE Types of Frequency Distribution Table (FDT)
(1) The Qualitative or Categorical frequency distribution table
- is used for data that can be placed in specific categories, such as nominal- or ordinal-level data.

EXAMPLE:

(2) The Quantitative or Grouped Frequency Distribution Table


- is a frequency distribution table where the data or observations are grouped according to some numerical or
quantitative characteristics.

EXAMPLE:

(3) The Quantitative or Ungrouped Frequency Distribution Table


- is a frequency distribution table where the data or observations are sorted into classes of single values
EXAMPLE:

BASIC TERMINOLOGIES associated with frequency tables


Lower class limit
- the smallest data value that can be included in the class.

Upper class limit


- the largest data value that can be included in the class.
Class boundaries (CB)
- are used to separate the classes so that there are no gaps in the frequency distribution
Class marks (CM)
- the midpoints of the classes

$$\text{CM} = \frac{\text{lower limit} + \text{upper limit}}{2}$$
Class Width
- the difference between two consecutive lower class limits (or two consecutive upper class limits)

Relative Frequency (RF)


- the proportion of observations falling in a class; it may also be expressed as a percentage

$$\text{RF} = \frac{\text{Frequency}}{N}$$

$$\%\text{RF} = \frac{\text{Frequency}}{N} \times 100\%$$
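
The class mark, class width, and relative frequency formulas can be checked with a short sketch. The class limits and frequencies below are hypothetical values chosen for illustration.

```python
# Class limits are illustrative; the frequencies are hypothetical.
classes = [(1, 7), (8, 14), (15, 21), (22, 28), (29, 35), (36, 42)]
frequencies = [8, 12, 9, 6, 3, 2]
n = sum(frequencies)  # N, the total number of observations (40 here)

# CM = (lower limit + upper limit) / 2
class_marks = [(lower + upper) / 2 for lower, upper in classes]
# Class width = difference between consecutive lower limits
class_width = classes[1][0] - classes[0][0]
# RF = Frequency / N, and %RF = RF x 100%
relative_freqs = [f / n for f in frequencies]
percent_rfs = [rf * 100 for rf in relative_freqs]

for (lower, upper), cm, pct in zip(classes, class_marks, percent_rfs):
    print(f"{lower}-{upper}: CM = {cm}, %RF = {pct:.1f}%")
```

The relative frequencies always add up to 1 (or 100%), which is a quick way to check a finished table.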

Cumulative Frequency (CF) - accumulated frequency of the classes

(1) Less than CF (<CF)


- total number of observations whose values do not exceed the upper limit of the class

(2) Greater than CF (>CF)


- total number of observations whose values are not less than the lower limit of the class

STEPS in constructing a frequency table


STEP 1
Determine the range, R.
R = highest score (HS) – lowest score (LS)

STEP 2
Determine the number of classes, k.
$$k = \sqrt{N}$$
where N is the total number of observations/ scores in a given set of data
STEP 3
Determine the class size, c.
$$c = \frac{R}{k}$$
The class width should be an odd number. This ensures that the midpoint of each class has the same place
value as the data.

STEP 4
Determine the lower limit (LL) of the first interval.
LL = Lowest score

STEP 5 Determine the upper limit UL of the first interval.


UL = LL + c – 1

STEP 6 Enumerate the classes or categories.

STEP 7 Tally the observations / scores.
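
Steps 1 to 7 can be put together in a short function. This is a sketch under stated assumptions: the scores are whole numbers, k is rounded to the nearest integer, and c is rounded up. The function name `frequency_table` and the sample scores are invented for illustration, and the odd-width guideline is not enforced by the code.

```python
import math

def frequency_table(scores):
    """Build a grouped frequency table for whole-number scores (Steps 1 to 7)."""
    n = len(scores)
    r = max(scores) - min(scores)        # Step 1: range
    k = round(math.sqrt(n))              # Step 2: number of classes
    c = max(1, math.ceil(r / k))         # Step 3: class size, rounded up (assumption)
    lower = min(scores)                  # Step 4: first lower limit
    table = []
    # Steps 5-6: UL = LL + c - 1, then shift by c until the highest score is covered.
    while lower <= max(scores):
        upper = lower + c - 1
        # Step 7: tally the scores that fall inside this class.
        count = sum(lower <= x <= upper for x in scores)
        table.append((lower, upper, count))
        lower += c
    return table

# Hypothetical whole-number scores, invented for illustration.
scores = [12, 15, 15, 18, 21, 22, 22, 25, 27, 30, 31, 33, 35, 36, 37, 38]
table = frequency_table(scores)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

For these 16 scores the range is 26, k is 4, and c is 7, giving the classes 12-18, 19-25, 26-32, and 33-39.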

Example 1
When 40 people were surveyed in Tuguegarao City, they reported the distance they drove to the mall, and the results (in
kilometers) are given below. Construct a frequency distribution table.

Solution
1. Determine the range, R.
R = highest score (HS) – lowest score (LS)
R = 40 – 1
R = 39

2. Determine the number of classes or categories or intervals, k.


$$k = \sqrt{N}, \quad \text{where } N = 40$$

$$k = \sqrt{40} \approx 6.32 \approx 6$$
3. Determine the class size, c.
$$c = \frac{R}{k} = \frac{39}{6} = 6.5 \approx 7$$
The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the
data.

4. Determine the lower limit (LL) of the first interval.


LL = Lowest score
LL = 1

5. Determine the upper limit UL of the first interval.


UL = LL + c – 1
UL = 1 + 7 – 1
UL = 7
6. Enumerate the classes or categories or intervals.
Distance (in km)
1 – 7
8 – 14
15 – 21
22 – 28
29 – 35
36 – 42

To determine the second interval, just add the value of the class size to the lower limit and upper limit of the first
interval. Repeat the process.

Sometimes the number of classes or categories or intervals (k) is NOT followed. An extra class will be added to
accommodate the highest observed value in the data set and a class will be deleted if it turns out to be empty.
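
The arithmetic in Steps 1 to 6 of this example can be reproduced with a few lines of Python. The values N = 40, highest score 40, and lowest score 1 are taken from the solution; rounding c up with `math.ceil` is an assumption that matches the 6.5 ≅ 7 step.

```python
import math

n, highest, lowest = 40, 40, 1        # values stated in Example 1
r = highest - lowest                   # R = 39
k = round(math.sqrt(n))                # sqrt(40) = 6.32..., rounded to 6
c = math.ceil(r / k)                   # 39 / 6 = 6.5, rounded up to 7

classes = []
lower = lowest
while lower <= highest:
    classes.append((lower, lower + c - 1))  # UL = LL + c - 1
    lower += c                              # next class starts c higher

print(r, k, c)   # 39 6 7
print(classes)   # [(1, 7), (8, 14), (15, 21), (22, 28), (29, 35), (36, 42)]
```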

7. Tally the observations / scores.


TABLE: Frequency Distribution of Distance Driven (in kilometers) to the Mall by 40 People

8. Construct the frequency distribution table


TABLE: Frequency Distribution of Distance Driven (in kilometers) to the Mall by 40 People

LESS THAN Cumulative Frequency


The LESS THAN Cumulative Frequency is obtained by adding successively the frequencies of all the previous (lower)
classes/intervals, including the class against which it is written. The cumulation starts from the lowest class/interval and
proceeds to the highest class/interval.

GREATER THAN Cumulative Frequency


The GREATER THAN Cumulative Frequency is obtained by adding successively the frequencies of all the higher
classes/intervals, including the class against which it is written. The cumulation starts from the highest class/interval and
proceeds to the lowest class/interval.
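
Both cumulative frequencies are running totals, so they can be sketched with `itertools.accumulate`. The frequencies below are hypothetical and are listed from the lowest class to the highest.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
frequencies = [8, 12, 9, 6, 3, 2]

# <CF: running total from the lowest class upward.
less_than_cf = list(accumulate(frequencies))

# >CF: running total from the highest class downward, then flipped back
# so that each entry lines up with its own class.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)     # [8, 20, 29, 35, 38, 40]
print(greater_than_cf)  # [40, 32, 20, 11, 5, 2]
```

As a check, the last <CF entry and the first >CF entry should both equal N, the total number of observations.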

When constructing frequency tables,


• The classes must be mutually exclusive; each score must belong to only one class.
• Include all classes, even if their frequency is zero.
• Make sure that all classes have the same width.
• Try to select convenient numbers for class limits.
• Keep the number of classes between 5 and 20.
Example 2
Determine the less than CF (<CF) of the given table:

NOTE
To determine the entries in the LESS THAN Cumulative Frequency ( < CF), add successively the frequencies of all the
previous (lower) classes/intervals, including the class against which it is written. The cumulation starts from the frequency
of the lowest class/interval and proceeds to the frequency of the highest class/interval.

Example 3
Determine the greater than CF (>CF) of the given table:

NOTE
To determine the entries in the GREATER THAN Cumulative Frequency ( > CF), add successively the frequencies of all
the higher classes/intervals, including the class against which it is written. The cumulation starts from the frequency of
the highest class/interval and proceeds to the frequency of the lowest class/interval.

Example 4
Construct a grouped frequency table for the given data below.

Solution
Following the steps discussed earlier, the frequency distribution of the given data is shown below. The classes are
arranged in increasing order.
GRAPHICAL PRESENTATION OF DATA
- A graph or a chart is a device for showing numerical values or relationships in pictorial form.

Advantages
1. Main features and implications of a body of data can be seen at once
2. Can attract attention and hold the reader’s interest
3. Simplifies concepts that would otherwise have been expressed in so many words
4. Can readily clarify data and frequently bring out hidden facts and relationships

QUALITIES OF A GOOD GRAPH

It is accurate.
A good graph should not be deceptive, distorted or misleading, or in any way susceptible to wrong interpretations as a
result of inaccurate or careless construction.

It is clear.
An effective graph can be easily read and understood. The graph should focus on the message it is trying to
communicate.

It is simple.
The basic design of a statistical graph should be simple and straightforward, not loaded with irrelevant or trivial symbols.
There should be no distracting elements in a chart that inhibit effective visual communication.

It has a good appearance.


A good graph is one that is designed and constructed to attract or catch attention by holding a neat, dignified and
professional appearance.

COMMON TYPES OF GRAPH


Scatter Graph
- A graph used to present measurements or values that are thought to be related.
Line Chart / Graph
- Graphical representation of data especially useful for showing trends over a period of time

Pie Chart
- A circular graph that is useful in showing how a total quantity is distributed among a group of categories. The
pieces of the pie represent the proportions of the total that fall into each category.

Column or Bar Graph


- Like pie charts, column charts and bar graphs are applicable only to grouped data. They should be used for discrete,
grouped data of ordinal or nominal scale.

Graphical Presentation of the Frequency Distribution Table
(1) Frequency Histogram
A bar graph that displays the classes on the horizontal axis and the frequencies of the classes on the vertical axis. The
vertical sides of the bars are erected at the class boundaries, and the height of each bar corresponds to its class frequency.
(2) Relative Frequency histogram
A graph that displays the classes on the horizontal axis and the relative frequencies on the vertical axis.

(3) Frequency Polygon


A line chart that is constructed by plotting the frequencies at the class marks and connecting the plotted points by means
of straight lines; the polygon is closed by considering an additional class at each end and the ends of the lines are brought
down to the horizontal axis at the midpoints of the additional classes.

(4) Ogives

(a) Less than Ogive


- the less than CF is plotted against the Upper True Class Boundaries (UTCB)

(b) Greater than Ogive


- the greater than CF is plotted against the Lower True Class Boundaries (LTCB)
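
The points to plot for the two ogives can be computed as follows. This is a sketch with hypothetical classes and frequencies, and it assumes whole-number data so that each true class boundary lies 0.5 beyond the class limit.

```python
from itertools import accumulate

# Hypothetical classes and frequencies, lowest class first.
classes = [(1, 7), (8, 14), (15, 21), (22, 28), (29, 35), (36, 42)]
frequencies = [8, 12, 9, 6, 3, 2]

# For whole-number data, true boundaries sit 0.5 below/above the class limits.
upper_boundaries = [upper + 0.5 for _, upper in classes]
lower_boundaries = [lower - 0.5 for lower, _ in classes]

less_than_cf = list(accumulate(frequencies))
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# (x, y) pairs: <CF against upper boundaries, >CF against lower boundaries.
less_than_ogive = list(zip(upper_boundaries, less_than_cf))
greater_than_ogive = list(zip(lower_boundaries, greater_than_cf))

print(less_than_ogive)
print(greater_than_ogive)
```

Joining each list of points with straight lines gives a rising curve for the less than ogive and a falling curve for the greater than ogive.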
WEEK 4
Measures of Central Tendency
• Statistics that describe the location of the distribution
• They indicate the extent to which all the data values group around a typical or central value.

Most common measures of central tendency


• Mean … average value
• Median … middle value of a rank-ordered group
• Mode … most frequent value

(1) The Mean


- the most common measure of central tendency
- the only common measure in which all the values play an equal role
- serves as a balance point in a set of data (like a fulcrum on a seesaw)
- can be applied to data of at least interval level
- greatly affected by extreme values (avoid using the mean when the data contain extreme values)
- the sum of all the data entries divided by the number of entries

Example 1. The Statistics exam scores for a sample of 20 students are as follows: 50 53 59 59 63 63 72 72 72 72 72 76
78 81 83 84 84 87 90 93. Calculate the mean score of the students.
Solution: The sum of the scores is

Summation of X = 50 + 53 + 59 + 59 + 63 + 63 + 72 + 72 + 72 + 72 + 72 + 76 + 78 + 81 + 83 + 84 + 84 + 87 + 90 + 93 = 1463

To find the mean score, divide the sum of the scores by the number of scores in the sample data
1463/20 = 73.15
The mean score of the students in the Statistics exam is 73.15
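
The computation in Example 1 can be verified with a few lines of Python, using the 20 exam scores given in the example.

```python
# The 20 Statistics exam scores from Example 1.
scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72,
          72, 76, 78, 81, 83, 84, 84, 87, 90, 93]

total = sum(scores)          # sum of all the data entries
mean = total / len(scores)   # divided by the number of entries
print(total, mean)           # 1463 73.15
```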

(2) The Median


- the middle value in a set of data that has been ordered from smallest to largest

If the data set has


• an odd number of entries, the median is the middle data entry
• an even number of entries, the median is the mean of the 2 middle data entries.

- a positional measure; not affected by extreme values


- can be applied to data of at least ordinal level
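
As an illustration, the median rule for an even number of entries can be applied to the 20 exam scores from Example 1 of the mean. Reusing that data set here is a choice made for this sketch; the notes do not work out its median.

```python
import statistics

# The 20 exam scores from Example 1 (an even number of entries).
scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72,
          72, 76, 78, 81, 83, 84, 84, 87, 90, 93]

ordered = sorted(scores)  # the data must be ordered first
n = len(ordered)
# With an even n, average the two middle entries (the 10th and 11th here).
middle_two = (ordered[n // 2 - 1], ordered[n // 2])
median = sum(middle_two) / 2

print(middle_two, median)          # (72, 72) 72.0
print(statistics.median(scores))   # library cross-check
```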
(3) The Mode
- The data entry that occurs most frequently
- A data set can have one mode, more than one mode, or no mode.
- Not affected by extreme values
- Can be used for qualitative as well as quantitative data
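
The mode can be found by counting how often each entry occurs. The sketch below reuses the 20 exam scores from Example 1 of the mean and adds a small list of marital statuses that is invented for illustration, to show that the mode also works for qualitative data.

```python
import statistics
from collections import Counter

# Quantitative data: the 20 exam scores from Example 1.
scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72,
          72, 76, 78, 81, 83, 84, 84, 87, 90, 93]
# Qualitative data: hypothetical marital statuses.
statuses = ["Single", "Married", "Married", "Widowed", "Single", "Married"]

score_counts = Counter(scores)
# multimode returns every value tied for the highest count.
score_modes = statistics.multimode(scores)
status_modes = statistics.multimode(statuses)

print(score_modes, score_counts[72])  # [72] 5
print(status_modes)                   # ['Married']
```

Note that `statistics.multimode` returns all tied values, so a data set in which every entry occurs once comes back as a list of all entries; in the terms used above, such a data set has no mode.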
Practice Drill

ANSWERS:
