Session 3.2, Summaries
29th March 2010
Confidence Estimation of the Population Mean
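In theory, we can compute the population mean face value of a fair, six-sided die (d6) with face values 1, 2, 3, 4, 5, 6 as
$$M = 1\cdot\Pr\{\text{d6 shows } 1\} + 2\cdot\Pr\{\text{d6 shows } 2\} + 3\cdot\Pr\{\text{d6 shows } 3\} + 4\cdot\Pr\{\text{d6 shows } 4\} + 5\cdot\Pr\{\text{d6 shows } 5\} + 6\cdot\Pr\{\text{d6 shows } 6\}$$
$$M = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = \tfrac{21}{6} = 3.5$$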
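As a quick check of the arithmetic above, here is a minimal Python sketch that uses exact fractions, so the 1/6 weights carry no rounding error:

```python
from fractions import Fraction

# A fair d6: each face 1..6 has probability exactly 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# Population mean M = sum over faces of (face value * Pr{d6 shows that face}).
M = sum(face * p for face in faces)
print(M, float(M))  # 7/2 3.5
```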
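Our 95% confidence interval estimation process should produce intervals that contain this population mean in approximately 95% of samples. Here is an example from Summer 2009. Each row is one sample of n = 50 rolls: columns 1 to 6 give how often each face appeared; m, sd and se are the sample mean, sample standard deviation and standard error (sd/sqrt(n)); and [lower, upper] = m ± 1.96*se. The last column records whether the interval captured M = 3.5 (Hit) or not (Miss). The Perfect50 row is an idealized sample in which every face appears exactly 50/6 ≈ 8.333 times.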
Sample | 1 | 2 | 3 | 4 | 5 | 6 | n | m | sd | se | lower | upper | M | Result |
Perfect50 | 8.333 | 8.333 | 8.333 | 8.333 | 8.333 | 8.333 | 50 | 3.5 | 1.725 | 0.244 | 3.022 | 3.978 | 3.5 | |
1 | 11 | 5 | 7 | 7 | 10 | 10 | 50 | 3.6 | 1.852 | 0.262 | 3.087 | 4.113 | 3.5 | Hit |
2 | 7 | 5 | 13 | 10 | 11 | 4 | 50 | 3.5 | 1.502 | 0.212 | 3.084 | 3.916 | 3.5 | Hit |
3 | 11 | 5 | 7 | 13 | 7 | 7 | 50 | 3.42 | 1.715 | 0.243 | 2.945 | 3.895 | 3.5 | Hit |
4 | 7 | 9 | 7 | 15 | 8 | 4 | 50 | 3.4 | 1.512 | 0.214 | 2.981 | 3.819 | 3.5 | Hit |
5 | 11 | 6 | 10 | 9 | 8 | 6 | 50 | 3.3 | 1.693 | 0.239 | 2.831 | 3.769 | 3.5 | Hit |
6 | 7 | 3 | 7 | 13 | 10 | 10 | 50 | 3.92 | 1.639 | 0.232 | 3.466 | 4.374 | 3.5 | Hit |
7 | 8 | 5 | 9 | 10 | 10 | 8 | 50 | 3.66 | 1.673 | 0.237 | 3.196 | 4.124 | 3.5 | Hit |
8 | 6 | 8 | 9 | 10 | 6 | 11 | 50 | 3.7 | 1.693 | 0.239 | 3.231 | 4.169 | 3.5 | Hit |
9 | 14 | 10 | 7 | 9 | 3 | 7 | 50 | 2.96 | 1.749 | 0.247 | 2.475 | 3.445 | 3.5 | Miss |
10 | 9 | 7 | 12 | 9 | 9 | 4 | 50 | 3.28 | 1.565 | 0.221 | 2.846 | 3.714 | 3.5 | Hit |
11 | 9 | 10 | 4 | 3 | 11 | 13 | 50 | 3.72 | 1.938 | 0.274 | 3.183 | 4.257 | 3.5 | Hit |
12 | 7 | 7 | 7 | 10 | 8 | 11 | 50 | 3.76 | 1.733 | 0.245 | 3.28 | 4.24 | 3.5 | Hit |
13 | 10 | 11 | 8 | 8 | 5 | 8 | 50 | 3.22 | 1.741 | 0.246 | 2.737 | 3.703 | 3.5 | Hit |
14 | 9 | 6 | 11 | 11 | 6 | 7 | 50 | 3.4 | 1.641 | 0.232 | 2.945 | 3.855 | 3.5 | Hit |
15 | 7 | 5 | 8 | 7 | 12 | 11 | 50 | 3.9 | 1.729 | 0.245 | 3.421 | 4.379 | 3.5 | Hit |
16 | 10 | 12 | 7 | 12 | 3 | 6 | 50 | 3.08 | 1.627 | 0.23 | 2.629 | 3.531 | 3.5 | Hit |
17 | 5 | 9 | 6 | 7 | 14 | 9 | 50 | 3.86 | 1.666 | 0.236 | 3.398 | 4.322 | 3.5 | Hit |
18 | 9 | 5 | 8 | 9 | 10 | 9 | 50 | 3.66 | 1.745 | 0.247 | 3.176 | 4.144 | 3.5 | Hit |
19 | 9 | 9 | 9 | 5 | 9 | 9 | 50 | 3.46 | 1.787 | 0.253 | 2.965 | 3.955 | 3.5 | Hit |
20 | 7 | 8 | 7 | 8 | 12 | 8 | 50 | 3.68 | 1.696 | 0.24 | 3.21 | 4.15 | 3.5 | Hit |
21 | 7 | 10 | 6 | 10 | 9 | 8 | 50 | 3.56 | 1.692 | 0.239 | 3.091 | 4.029 | 3.5 | Hit |
22 | 3 | 12 | 15 | 8 | 3 | 9 | 50 | 3.46 | 1.528 | 0.216 | 3.036 | 3.884 | 3.5 | Hit |
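Each row of the table can be recomputed from its six face counts. The Python sketch below is an illustration under stated assumptions, not the procedure that produced the original table: it assumes the sample SD uses the n − 1 divisor and the multiplier is 1.96, which reproduces the tabled values (for example, [3.087, 4.113] for sample 1 and [2.475, 3.445] for sample 9). The helper name summarize is hypothetical.

```python
import math

# Face counts (faces 1..6) for the 22 samples of n = 50 rolls in the table above.
SAMPLES = [
    [11, 5, 7, 7, 10, 10], [7, 5, 13, 10, 11, 4], [11, 5, 7, 13, 7, 7],
    [7, 9, 7, 15, 8, 4], [11, 6, 10, 9, 8, 6], [7, 3, 7, 13, 10, 10],
    [8, 5, 9, 10, 10, 8], [6, 8, 9, 10, 6, 11], [14, 10, 7, 9, 3, 7],
    [9, 7, 12, 9, 9, 4], [9, 10, 4, 3, 11, 13], [7, 7, 7, 10, 8, 11],
    [10, 11, 8, 8, 5, 8], [9, 6, 11, 11, 6, 7], [7, 5, 8, 7, 12, 11],
    [10, 12, 7, 12, 3, 6], [5, 9, 6, 7, 14, 9], [9, 5, 8, 9, 10, 9],
    [9, 9, 9, 5, 9, 9], [7, 8, 7, 8, 12, 8], [7, 10, 6, 10, 9, 8],
    [3, 12, 15, 8, 3, 9],
]

TRUE_MEAN = 3.5  # population mean M of a fair d6
Z = 1.96         # multiplier for approximately 95% confidence


def summarize(counts, z=Z, true_mean=TRUE_MEAN):
    """Return (n, m, sd, se, lower, upper, result) for one sample's face counts."""
    faces = range(1, 7)
    n = sum(counts)
    m = sum(f * c for f, c in zip(faces, counts)) / n
    # Sample standard deviation with the n - 1 divisor.
    sd = math.sqrt(sum(c * (f - m) ** 2 for f, c in zip(faces, counts)) / (n - 1))
    se = sd / math.sqrt(n)
    lower, upper = m - z * se, m + z * se
    result = "Hit" if lower <= true_mean <= upper else "Miss"
    return n, m, sd, se, lower, upper, result


rows = [summarize(c) for c in SAMPLES]
for i, (n, m, sd, se, lo, hi, res) in enumerate(rows, start=1):
    print(f"{i:2d} n={n} m={m:.2f} sd={sd:.3f} se={se:.3f} [{lo:.3f}, {hi:.3f}] {res}")

hits = sum(r[-1] == "Hit" for r in rows)
print(f"{hits} hits out of {len(rows)} samples")  # 21 hits out of 22 samples
```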
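Success Rate | 95% | 20.9 | 20 or 21 | Sample Success Rate | 0.955 |
Failure Rate | 5% | 1.1 | 1 or 2 | Sample Failure Rate | 0.045 |
With 22 samples, a 95% success rate predicts 0.95*22 = 20.9 hits (so 20 or 21) and 0.05*22 = 1.1 misses (so 1 or 2). The class observed 21 hits and 1 miss (sample 9), a sample success rate of 21/22 ≈ 0.955.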
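The success rate can also be explored by simulation. This Python sketch is an illustrative assumption, not part of the original class exercise: Python's random module stands in for the class, drawing many samples of 50 rolls of a fair d6, building m ± 1.96*(sd/sqrt(n)) for each, and reporting the fraction of intervals that capture M = 3.5. The seed (2009), the 2000 repetitions and the function names are arbitrary, hypothetical choices.

```python
import math
import random


def covers(sample, true_mean, z=1.96):
    """True if the interval m ± z*(sd/sqrt(n)) contains true_mean."""
    n = len(sample)
    m = sum(sample) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    half_width = z * sd / math.sqrt(n)
    return m - half_width <= true_mean <= m + half_width


def simulated_coverage(reps=2000, n=50, seed=2009):
    """Fraction of simulated intervals (one per sample of n fair-d6 rolls) containing 3.5."""
    rng = random.Random(seed)
    hits = sum(covers([rng.randint(1, 6) for _ in range(n)], 3.5) for _ in range(reps))
    return hits / reps


# Expected counts for 22 samples, versus the 21 hits and 1 miss observed in class.
print(round(0.95 * 22, 1), round(0.05 * 22, 1))  # 20.9 1.1
print(round(21 / 22, 3), round(1 / 22, 3))       # 0.955 0.045
print(simulated_coverage())  # typically close to 0.95
```

Rerunning with a different seed gives a slightly different rate, just as a different class would produce a different table.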
Confidence Interval | General Mean | Diseased Monkeys
Objective: Be able to
perform interval estimation of the population mean using the confidence
interval method. Be able to fully discuss the confidence interval. This
discussion must include a clear description of the population and the population
mean, the family of samples, the family of intervals and how the confidence
applies to the family of intervals.
A random sample of Lab Monkeys is
infected with the agent that causes Disease X. The time (in hours) from
infection to the appearance of symptoms of Disease X is measured for each
monkey. The sample of monkeys yields the following times (in hours):
12, 26, 36, 38, 40, 42, 44, 48, 52, 62, |
13, 27, 37, 38, 41, 42, 44, 49, 55, 65, |
15, 30, 37, 39, 41, 44, 46, 50, 56, 70, |
16, 32, 38, 40, 42, 44, 48, 50, 58, 72, |
18, 35, 40, 41, 42, 45, 48, 52, 58, 75 |
Follow the steps:
Edit the data into your calculator, and
compute the following statistics: sample size, sample mean, sample standard
deviation.
N | M | SD | Z | LOBOUND | HIBOUND |
50 | 42.66 | 14.0968 | 1.96 | 38.7526 | 46.5674 |
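The calculator step can be mirrored in Python with the standard-library statistics module. This is a sketch: statistics.stdev uses the n − 1 divisor, which is the convention that reproduces the SD reported above, and the bounds use the multiplier 1.96.

```python
import math
import statistics

# Hours from infection to symptoms of Disease X for the n = 50 Lab Monkeys above.
times = [
    12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
    13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
    15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
    16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
    18, 35, 40, 41, 42, 45, 48, 52, 58, 75,
]

n = len(times)
m = statistics.mean(times)    # sample mean
sd = statistics.stdev(times)  # sample SD with the n - 1 divisor
z = 1.96                      # multiplier for approximately 95% confidence
se = sd / math.sqrt(n)
lobound, hibound = m - z * se, m + z * se
print(n, round(m, 2), round(sd, 4), round(lobound, 4), round(hibound, 4))
# 50 42.66 14.0968 38.7526 46.5674
```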
Identify the Population Mean for
this Sample.
We seek the population mean time (in hours) from infection to symptoms of Disease X among the population of Lab Monkeys.
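Consult the Normal Table, and determine the SD Multiplier required to ensure 95% Confidence. Justify the approach.
Since we need approximately 95% confidence, we need a multiplier whose PROBCENT is at least 0.95. The row for 1.95 falls just short (0.94882), while the row for 2.00 gives a little more than we need (0.95450). In practice, the number that we need is 1.96, and that is the value used in the computation below; if you are restricted to the table, use 2.00. Here are the rows from the table:
Z(k) | PROBRT | PROBCENT |
1.95 | 0.025588 | 0.94882 |
2.00 | 0.022750 | 0.95450 |
The approach works because we are working with the sample mean of a large random sample. The cost of this approach is the need for large random samples; n > 30 will usually suffice.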
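The two table rows, and the exact 95% multiplier, can be reproduced with Python's standard library. This sketch relies on the identity PROBCENT = Pr{−k < Z < k} = erf(k/sqrt(2)) for a standard normal Z, and on statistics.NormalDist (Python 3.8 or later) for the inverse; probcent is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def probcent(k):
    """Central probability Pr{-k < Z < k} for a standard normal Z."""
    return math.erf(k / math.sqrt(2))


for k in (1.95, 2.00):
    print(k, round(probcent(k), 5))  # 1.95 0.94882, then 2.0 0.9545

# Exact multiplier: leave 2.5% in each tail, i.e. the 97.5th percentile of Z.
z95 = NormalDist().inv_cdf(0.975)
print(round(z95, 4))  # 1.96
```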
Compute a 95% Confidence Interval
for the true but unknown population mean in this problem.
Compute
LOBOUND = M − Z*(SD/sqrt(N)) = 42.66 − 1.96*(14.0968/sqrt(50)) ≈ 38.7526
and
HIBOUND = M + Z*(SD/sqrt(N)) = 42.66 + 1.96*(14.0968/sqrt(50)) ≈ 46.5674.
Write the approximate interval as [38.8, 46.6].
Discuss the Family of Intervals for
this problem.
Each member of the Family of Samples (FoS) is a single random sample of n=50 Lab Monkeys, and the FoS consists of every possible random sample of n=50 Lab Monkeys. Each member of the FoS yields the following statistics: n (sample size), m (sample mean) and sd (sample standard deviation). For this FoS, n=50 for every member sample, but m and sd vary from member to member. Each member of the FoS yields an interval of the form:
[m − 1.96*(sd/sqrt(n)), m + 1.96*(sd/sqrt(n))].
These intervals collectively form a Family of Intervals (FoI); each member of the FoI is an interval derived from a member of the FoS. Approximately 95% of these intervals contain the true population mean time (in hours) to symptoms of Disease X in Lab Monkeys, and approximately 5% fail to contain it.
Interpret the Single Confidence Interval
for this problem.
If our interval captures
the true population mean, then the mean time to symptoms of Disease X in Lab
Monkeys is between 38.8 and 46.6 hours.
Confidence Interval | General Mean | Generic Fictitious Spiders
Objective: Be able to
perform interval estimation of the population mean using the confidence
interval method. Be able to fully discuss the confidence interval.
This discussion must include a clear description of the population and the population
mean, the family of samples, the family of intervals and how the confidence
applies to the family of intervals.
Generic Fictitious Spiders
We have a sample of Generic Fictitious Spiders. For each spider we record its diameter (maximum length, in cm, from leg tip to leg tip).
The spider diameters (cm) are listed below:
16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4 |
24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7 |
32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4 |
38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4 |
Follow the steps:
Edit the
data into your calculator, and compute the following statistics: sample size,
sample mean, sample standard deviation.
Statistic | Value | Comment |
Sample Size | n=40 | There are n=40 spiders in the sample. |
Sample Mean | m=32.4 | The average diameter of spiders in the sample is 32.4 cm. |
Sample SD | sd=8.09 | The sample standard deviation of the diameters is 8.09 cm. |
Identify the Population Mean for
this Sample.
We are after the population mean maximum diameter (in cm) of Generic Fictitious Spiders.
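Consult the Normal Table, and determine the SD Multiplier required to ensure 90% Confidence. Justify the approach.
We want z(k)=1.65 from the table, as discussed in class: the row for 1.60 gives PROBCENT = 0.89040, just short of 0.90, while 1.65 gives 0.90106, the smallest tabled value that reaches 90%.
We are working with large (n>=30) random samples, and with sample means, so the normal table applies.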
Compute a 90% Confidence Interval
for the true but unknown population mean in this problem.
m − 1.65*(sd/sqrt(n)) = 32.4 − 1.65*(8.09/sqrt(40)) ≈ 30.29;
m + 1.65*(sd/sqrt(n)) = 32.4 + 1.65*(8.09/sqrt(40)) ≈ 34.51.
Write the interval as [30.29, 34.51].
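A Python sketch of the spider computation, working from the raw diameters with the standard library only. Because it carries unrounded m and sd through the arithmetic, its upper bound comes out near 34.52 rather than the 34.51 obtained from the rounded values 32.4 and 8.09; the difference is rounding, not a change in method.

```python
import math
import statistics

# Maximum diameters (cm) of the n = 40 Generic Fictitious Spiders listed above.
diameters = [
    16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4,
    24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7,
    32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4,
    38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4,
]

n = len(diameters)
m = statistics.mean(diameters)    # sample mean
sd = statistics.stdev(diameters)  # sample SD with the n - 1 divisor
z = 1.65                          # tabled multiplier for 90% confidence
se = sd / math.sqrt(n)
lower, upper = m - z * se, m + z * se
print(n, round(m, 3), round(sd, 3), round(lower, 2), round(upper, 2))
```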
Discuss the Family of Intervals for
this problem.
Each member of our Family of Samples is a random sample of 40 spiders from our population; the Family of Samples consists of every possible sample of this type.
From each member, compute the interval
m ± 1.65*(sd/sqrt(40)),
where m is the sample mean and sd is the sample standard deviation for that sample.
The Family of Intervals consists of all such intervals, each obtained in this way from a member of the Family of Samples.
Approximately 90% of the Family of Intervals captures the population mean maximum spider diameter; the remaining intervals (approximately 10%) do not capture the population mean.
Interpret the Single Confidence
Interval for this problem.
If our interval contains
the population mean, then the true population mean maximum diameter for Generic
Fictitious Spiders is between 30.29 and 34.51 centimeters.
From here: http://www.mindspring.com/~cjalverson/3rdhourlyfall2008versionA_key.htm
Case One | Confidence Interval, Mean | Glioblastoma Multiforme
Glioblastoma multiforme (GBM) is the highest grade glioma tumor and the most malignant form of astrocytoma. These tumors originate in the brain. GBM tumors grow rapidly, invade nearby tissue and contain highly malignant cells. GBM are among the most common and devastating primary brain tumors in adults.
Suppose that we
have a random sample of GBM patients, with survival time (in weeks) listed
below:
3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27,
30, 31, 37, 38, 45, 48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66,
67, 68, 69, 72, 72, 73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83, 85, 86, 87,
90, 150, 180
Estimate the population mean
survival time for Glioblastoma multiforme patients with 99% confidence. That
is, compute and discuss a 99% confidence interval for this population mean.
Provide concise and complete details and discussion as demonstrated in the case
study summaries.
Table 1. Means and Proportions
(PROBRT = Pr{Z > k}, the right-tail probability; PROBCENT = Pr{−k < Z < k}, the central probability; Z is a standard normal variable.)
Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT |
0.05 | 0.48006 | 0.03988 | 1.05 | 0.14686 | 0.70628 | 2.05 | 0.020182 | 0.95964 |
0.10 | 0.46017 | 0.07966 | 1.10 | 0.13567 | 0.72867 | 2.10 | 0.017864 | 0.96427 |
0.15 | 0.44038 | 0.11924 | 1.15 | 0.12507 | 0.74986 | 2.15 | 0.015778 | 0.96844 |
0.20 | 0.42074 | 0.15852 | 1.20 | 0.11507 | 0.76986 | 2.20 | 0.013903 | 0.97219 |
0.25 | 0.40129 | 0.19741 | 1.25 | 0.10565 | 0.78870 | 2.25 | 0.012224 | 0.97555 |
0.30 | 0.38209 | 0.23582 | 1.30 | 0.09680 | 0.80640 | 2.30 | 0.010724 | 0.97855 |
0.35 | 0.36317 | 0.27366 | 1.35 | 0.088508 | 0.82298 | 2.35 | 0.009387 | 0.98123 |
0.40 | 0.34458 | 0.31084 | 1.40 | 0.080757 | 0.83849 | 2.40 | 0.008198 | 0.98360 |
0.45 | 0.32636 | 0.34729 | 1.45 | 0.073529 | 0.85294 | 2.45 | 0.007143 | 0.98571 |
0.50 | 0.30854 | 0.38292 | 1.50 | 0.066807 | 0.86639 | 2.50 | 0.006210 | 0.98758 |
0.55 | 0.29116 | 0.41768 | 1.55 | 0.060571 | 0.87886 | 2.55 | 0.005386 | 0.98923 |
0.60 | 0.27425 | 0.45149 | 1.60 | 0.054799 | 0.89040 | 2.60 | 0.004661 | 0.99068 |
0.65 | 0.25785 | 0.48431 | 1.65 | 0.049471 | 0.90106 | 2.65 | 0.004025 | 0.99195 |
0.70 | 0.24196 | 0.51607 | 1.70 | 0.044565 | 0.91087 | 2.70 | 0.0034670 | 0.99307 |
0.75 | 0.22663 | 0.54675 | 1.75 | 0.040059 | 0.91988 | 2.75 | 0.0029798 | 0.99404 |
0.80 | 0.21186 | 0.57629 | 1.80 | 0.035930 | 0.92814 | 2.80 | 0.0025551 | 0.99489 |
0.85 | 0.19766 | 0.60467 | 1.85 | 0.032157 | 0.93569 | 2.85 | 0.0021860 | 0.99563 |
0.90 | 0.18406 | 0.63188 | 1.90 | 0.028717 | 0.94257 | 2.90 | 0.0018658 | 0.99627 |
0.95 | 0.17106 | 0.65789 | 1.95 | 0.025588 | 0.94882 | 2.95 | 0.0015889 | 0.99682 |
1.00 | 0.15866 | 0.68269 | 2.00 | 0.022750 | 0.95450 | 3.00 | 0.0013499 | 0.99730 |
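The table can be regenerated, and the lookup rule used in these summaries (take the smallest tabled k whose PROBCENT reaches the desired confidence) written out, in a few lines of Python. This is a sketch: it assumes PROBRT = Pr{Z > k} and PROBCENT = Pr{−k < Z < k} for a standard normal Z, which matches the tabled values, and the helper names are hypothetical. For 90%, 93%, 95% and 99% the rule returns 1.65, 1.85, 2.00 and 2.60; note that the Lab Monkey exercise uses the exact 95% multiplier 1.96 instead of the tabled 2.00.

```python
import math


def probrt(k):
    """Right-tail probability Pr{Z > k} for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def probcent(k):
    """Central probability Pr{-k < Z < k} for a standard normal Z."""
    return math.erf(k / math.sqrt(2))


# Tabled multipliers 0.05, 0.10, ..., 3.00 (rounded to avoid floating-point drift).
TABLE_K = [round(0.05 * i, 2) for i in range(1, 61)]


def table_multiplier(confidence):
    """Smallest tabled k whose PROBCENT is at least the requested confidence."""
    for k in TABLE_K:
        if probcent(k) >= confidence:
            return k
    raise ValueError("confidence level is beyond the range of the table")


for k in (1.65, 1.85, 2.00, 2.60):
    print(f"{k:.2f} {probrt(k):.6f} {probcent(k):.5f}")

for conf in (0.90, 0.93, 0.95, 0.99):
    print(f"{conf:.0%} confidence -> k = {table_multiplier(conf):.2f}")
```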
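Numbers
n | m | sd | se | Z | lower | upper |
58 | 56.91 | 33.12 | 4.35 | 2.60 | 45.61 | 68.22 |
se = sd/sqrt(n) ≈ 33.12/sqrt(58) ≈ 4.35
z ≈ 2.60 for 99% confidence, from the table row 2.60 | 0.004661 | 0.99068 (the row for 2.55 gives PROBCENT = 0.98923, just short of 0.99)
lower = m − (z*se) ≈ 56.91 − (2.60*4.35) ≈ 45.61
upper = m + (z*se) ≈ 56.91 + (2.60*4.35) ≈ 68.22
(The bounds are computed from unrounded m and se; with the rounded values shown, the lower bound comes out as 45.60.)
Report the interval as [45.6, 68.2].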
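The same steps in a Python sketch, working from the 58 survival times listed above with the standard library only. The multiplier 2.60 is taken from Table 1, as in the hand computation, and the bounds are computed from unrounded m and se.

```python
import math
import statistics

# Survival times (weeks) of the n = 58 GBM patients listed above.
weeks = [
    3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27,
    30, 31, 37, 38, 45, 48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66,
    67, 68, 69, 72, 72, 73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83, 85, 86, 87,
    90, 150, 180,
]

n = len(weeks)
m = statistics.mean(weeks)    # sample mean
sd = statistics.stdev(weeks)  # sample SD with the n - 1 divisor
se = sd / math.sqrt(n)        # standard error of the mean
z = 2.60                      # tabled multiplier for 99% confidence
lower, upper = m - z * se, m + z * se
print(n, round(m, 2), round(sd, 2), round(se, 2), round(lower, 2), round(upper, 2))
# 58 56.91 33.12 4.35 45.61 68.22
```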
Interpretation
Our population is
the population of Glioblastoma multiforme patients and our population mean is the mean
survival time (weeks).
Our Family of Samples (FoS) consists of every possible random sample of 58 Glioblastoma multiforme patients. From each sampled patient, survival time in weeks is obtained.
From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for GBM survival time, and then compute the interval
[m − 2.60*(sd/sqrt(n)), m + 2.60*(sd/sqrt(n))].
Computing this interval
for each member sample of the FoS, we obtain a Family of Intervals (FoI),
approximately 99% of which cover the true population mean survival time in
weeks for Glioblastoma
multiforme patients.
If our interval, [45.6, 68.2], is among the approximately 99% super-majority of intervals that cover the population mean, then the true population mean survival time for Glioblastoma multiforme patients is between 45.6 and 68.2 weeks.
From here: http://www.mindspring.com/~cjalverson/CompFinalSpring2008verMondayKey.htm
Case Four | Confidence Interval for Mean | Gestational Age
Consider the population mean gestational age (in weeks) at birth of Year 2005 US Resident Live Births. Using the data from Case Two, compute and interpret a 93% confidence interval for this population mean.
Numbers
From the table row 1.85 | 0.032157 | 0.93569, z = 1.85 (the row for 1.80 gives PROBCENT = 0.92814, just short of 0.93).
n = 56
m ≈ 37.34
sd ≈ 3.9278
lowCI = m − z*(sd/sqrt(n)) ≈ 37.34 − 1.85*(3.9278/sqrt(56)) ≈ 36.37
highCI = m + z*(sd/sqrt(n)) ≈ 37.34 + 1.85*(3.9278/sqrt(56)) ≈ 38.31
Report the interval as [36.4, 38.3].
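Only summary statistics are given for this case (the data are in Case Two, which is not reproduced here), so this Python sketch works from n, m and sd directly. The helper name ci_from_summary is hypothetical; it implements m ± z*(sd/sqrt(n)). The same helper reproduces the other intervals in these summaries from their reported, rounded statistics; with rounded inputs the GBM lower bound prints as 45.60 rather than 45.61.

```python
import math


def ci_from_summary(m, sd, n, z):
    """Return (lower, upper) for the interval m ± z*(sd/sqrt(n))."""
    half_width = z * sd / math.sqrt(n)
    return m - half_width, m + half_width


low, high = ci_from_summary(37.34, 3.9278, 56, 1.85)
print(round(low, 2), round(high, 2))  # 36.37 38.31

# Other cases from these summaries, using their reported (rounded) statistics.
cases = {
    "Lab Monkeys, 95%": (42.66, 14.0968, 50, 1.96),
    "Spiders, 90%": (32.4, 8.09, 40, 1.65),
    "GBM, 99%": (56.91, 33.12, 58, 2.60),
}
for name, args in cases.items():
    lo, hi = ci_from_summary(*args)
    print(f"{name}: [{lo:.2f}, {hi:.2f}]")
```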
Interpretation
Our population is the
population of year 2005 US resident live born infants and our population mean
is the mean gestational age (weeks).
Our Family of Samples (FoS)
consists of every possible random sample of 56 year 2005 US resident live born
infants. From each individual sampled live born infant, gestational age in
weeks is obtained.
From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for gestational age, and then compute the interval
[m − 1.85*(sd/sqrt(n)), m + 1.85*(sd/sqrt(n))].
Computing this interval
for each member sample of the FoS, we obtain a Family of Intervals (FoI),
approximately 93% of which cover the true population mean gestational age in
weeks for year 2005 US resident live born infants.
If our interval, [36.4, 38.3], is among the approximately 93% super-majority of intervals that cover the population mean, then the true population mean gestational age is between 36.4 and 38.3 weeks for year 2005 US resident live born infants.