BSGPT Notes
**Example**:
Imagine you want to survey the academic performance of high school students in a large city.
You divide the city into different neighborhoods (clusters). Instead of selecting students from the
entire city, you randomly select a few neighborhoods and survey all the students within those
neighborhoods.
**Steps**:
1. **Divide the population into clusters**: Neighborhoods in a city.
2. **Randomly select clusters**: Randomly choose a few neighborhoods.
3. **Survey all individuals within the selected clusters**: Survey all high school students in
those selected neighborhoods.
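The three steps above can be sketched in Python. This is a minimal illustration, not a survey tool: the neighborhood names, the student IDs, and the `cluster_sample` helper are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and return every member of each."""
    rng = random.Random(seed)
    # Step 2: randomly select whole clusters (not individuals).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: survey everyone inside the selected clusters.
    surveyed = [student for name in chosen for student in clusters[name]]
    return chosen, surveyed

# Step 1: the population divided into clusters (made-up data).
neighborhoods = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1", "W2"],
}

chosen, surveyed = cluster_sample(neighborhoods, n_clusters=2, seed=0)
print("Selected neighborhoods:", chosen)
print("Students surveyed:", surveyed)
```

Note that the randomness applies only to the choice of neighborhoods; once a neighborhood is chosen, every student in it is included.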
**Example**:
Suppose you want to study the dietary habits of adults in a country. You divide the population
into strata based on age groups (e.g., 18-30, 31-50, 51-70, 71+). Then, you randomly select
individuals from each age group to ensure representation from all age categories.
**Steps**:
1. **Divide the population into strata**: Age groups (18-30, 31-50, 51-70, 71+).
2. **Randomly sample from each stratum**: Randomly select individuals from each age group.
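A rough Python sketch of these two steps follows. The respondent IDs, the stratum sizes, and the `stratified_sample` helper are assumptions made for illustration, and the sketch draws the same number of people from each age group, which is only one possible allocation.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum members from every stratum."""
    rng = random.Random(seed)
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }

# Step 1: the population divided into strata by age group (made-up IDs).
strata = {
    "18-30": [f"A{i}" for i in range(50)],
    "31-50": [f"B{i}" for i in range(40)],
    "51-70": [f"C{i}" for i in range(30)],
    "71+": [f"D{i}" for i in range(10)],
}

# Step 2: randomly sample from each stratum.
sample = stratified_sample(strata, n_per_stratum=5, seed=0)
for label, members in sample.items():
    print(label, members)
```

Unlike cluster sampling, every stratum contributes to the sample, and the randomness applies to individuals within each stratum.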
### Comparison
**Cluster Sampling**:
- Used when the population is naturally divided into clusters.
- More cost-effective and practical for large, geographically dispersed populations.
- Example: Surveying students from selected neighborhoods in a city.
In summary, cluster sampling suits geographically dispersed populations where cost and
practicality are the main constraints, while stratified random sampling is ideal for ensuring
that every subgroup is represented in the sample.
Let's go through a real-life example with data to illustrate the concept of an estimator in
statistics.
### Scenario
Suppose we want to estimate the average amount of time people spend on social media per day
in a particular city. To do this, we conduct a survey and collect data from a random sample of
residents.
\[ \text{Sample data (minutes per day): } [120, 150, 80, 200, 90, 160, 110, 140, 130, 100] \]
To estimate the population mean (average time spent on social media per day by all residents in
the city), we use the sample mean as our estimator:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
where \( n \) is the sample size, and \( x_i \) represents each observation in the sample.
\[ n = 10 \]
\[ \sum_{i=1}^{10} x_i = 120 + 150 + 80 + 200 + 90 + 160 + 110 + 140 + 130 + 100 = 1280 \]
So, the sample mean is
\[ \bar{x} = \frac{1280}{10} = 128 \text{ minutes per day} \]
Let's discuss some properties of the sample mean as an estimator for the population mean:
1. **Unbiasedness**: The sample mean is an unbiased estimator of the population mean. This
means that, on average, the sample mean equals the population mean. If we were to take many
samples and calculate their means, the average of those sample means would be the true
population mean.
2. **Consistency**: As the sample size increases, the sample mean converges to the
population mean. With a larger sample, the estimate varies less from sample to sample and so
becomes more reliable.
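3. **Efficiency**: Among all linear unbiased estimators of the population mean, the sample mean
has the smallest variance. When the data are normally distributed, it has the smallest variance
among all unbiased estimators, making it the most efficient.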
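The first two properties can be checked with a small simulation. The sketch below is only illustrative: the population is made up (normally distributed values with mean 130 and standard deviation 40 are an assumption, not survey data), and the `sample_means` helper is hypothetical.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of daily minutes (invented for this illustration).
population = [rng.gauss(130, 40) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_means(n, repetitions):
    """Means of `repetitions` simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

means_small = sample_means(n=10, repetitions=2000)
means_large = sample_means(n=1000, repetitions=200)

# Unbiasedness: the average of many sample means sits close to the population mean.
average_of_means = statistics.fmean(means_small)

# Consistency: sample means scatter less around the population mean as n grows.
spread_small = statistics.pstdev(means_small)
spread_large = statistics.pstdev(means_large)

print(f"Population mean:         {population_mean:.2f}")
print(f"Average of sample means: {average_of_means:.2f}")
print(f"Spread with n=10:        {spread_small:.2f}")
print(f"Spread with n=1000:      {spread_large:.2f}")
```

Any single sample mean can still miss the population mean; unbiasedness is a statement about the average over many samples, and consistency about what happens as the sample grows.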
Suppose we also want to estimate the proportion of residents who spend more than 100 minutes
per day on social media. In our sample, we count how many residents spend more than 100
minutes per day.
The sample proportion serves as our estimator:
\[ \hat{p} = \frac{x}{n} \]
where \( x \) is the number of successes (residents who spend more than 100 minutes per day),
and \( n \) is the sample size.
\[ \text{Residents spending more than 100 minutes: } [120, 150, 200, 160, 110, 140, 130] \]
Number of successes \( x = 7 \)
So,
\[ \hat{p} = \frac{7}{10} = 0.70 \]
The sample proportion \( \hat{p} = 0.70 \) is our estimate for the proportion of residents who
spend more than 100 minutes per day on social media.
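Both estimates can be reproduced from the sample data in a few lines of Python. The variable names are illustrative, and the values are assumed to be minutes per day.

```python
# Sample data from the survey, assumed to be minutes per day.
minutes = [120, 150, 80, 200, 90, 160, 110, 140, 130, 100]
n = len(minutes)

# Sample mean: the sum of the observations divided by the sample size.
sample_mean = sum(minutes) / n

# Sample proportion: the share of residents strictly above 100 minutes.
successes = sum(1 for m in minutes if m > 100)
p_hat = successes / n

print(sample_mean)  # 128.0
print(successes)    # 7
print(p_hat)        # 0.7
```

The resident who reported exactly 100 minutes is not counted as a success, because the condition is "more than 100 minutes".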
### Summary
By using these estimators, we can make informed inferences about the population based on our
sample data.