Session 2.2, Summaries
29th June 2009
Confidence Estimation of the Population Mean
In theory, we can compute the population mean face value of a fair, six-sided d6 with face values 1, 2, 3, 4, 5, 6 as
M = 1*Pr{d6 shows 1} + 2*Pr{d6 shows 2} + 3*Pr{d6 shows 3} + 4*Pr{d6 shows 4} + 5*Pr{d6 shows 5} + 6*Pr{d6 shows 6}
M = 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6) = 3.5
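The same weighted sum can be checked in a few lines of Python. This is only a sketch of the arithmetic above; the names faces, prob and M are chosen here for illustration.

```python
from fractions import Fraction

# Face values of a fair six-sided die; each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

# Population mean M = sum over faces of (face value * probability of that face).
M = sum(face * prob for face in faces)

print(M, float(M))
```

Using Fraction keeps the arithmetic exact, so the result is exactly 7/2 rather than a rounded decimal.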
Our 95% confidence interval estimation process should
produce intervals containing this population mean in approximately 95% of
samples.
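One way to see this claim in action is to simulate the process. The sketch below is not part of the original session: it assumes samples of n=50 rolls, the multiplier 1.96, and an arbitrary seed and number of simulated samples, all chosen here for illustration.

```python
import math
import random
import statistics

def die_interval(rolls, z=1.96):
    """Return (lower, upper) = m -/+ z*(sd/sqrt(n)) for a list of die rolls."""
    n = len(rolls)
    m = statistics.mean(rolls)
    sd = statistics.stdev(rolls)      # sample sd (n - 1 divisor)
    se = sd / math.sqrt(n)
    return m - z * se, m + z * se

def hit_rate(num_samples=2000, n=50, M=3.5, seed=2009):
    """Fraction of simulated samples whose interval contains M."""
    rng = random.Random(seed)         # hypothetical seed, only for repeatability
    hits = 0
    for _ in range(num_samples):
        rolls = [rng.randint(1, 6) for _ in range(n)]
        lower, upper = die_interval(rolls)
        if lower <= M <= upper:
            hits += 1
    return hits / num_samples

rate = hit_rate()
print(f"hit rate over 2000 simulated samples: {rate:.3f}")
```

The printed rate should land near 0.95, though any single run will differ somewhat from that target.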
Sample | 1 | 2 | 3 | 4 | 5 | 6 | n | m | sd | se | lower | upper | M | Result
Perfect50 | 8.333 | 8.333 | 8.333 | 8.333 | 8.333 | 8.333 | 50 | 3.5 | 1.725 | 0.244 | 3.022 | 3.978 | 3.5 |
1 | 11 | 5 | 7 | 7 | 10 | 10 | 50 | 3.6 | 1.852 | 0.262 | 3.087 | 4.113 | 3.5 | Hit
2 | 7 | 5 | 13 | 10 | 11 | 4 | 50 | 3.5 | 1.502 | 0.212 | 3.084 | 3.916 | 3.5 | Hit
3 | 11 | 5 | 7 | 13 | 7 | 7 | 50 | 3.42 | 1.715 | 0.243 | 2.945 | 3.895 | 3.5 | Hit
4 | 7 | 9 | 7 | 15 | 8 | 4 | 50 | 3.4 | 1.512 | 0.214 | 2.981 | 3.819 | 3.5 | Hit
5 | 11 | 6 | 10 | 9 | 8 | 6 | 50 | 3.3 | 1.693 | 0.239 | 2.831 | 3.769 | 3.5 | Hit
6 | 7 | 3 | 7 | 13 | 10 | 10 | 50 | 3.92 | 1.639 | 0.232 | 3.466 | 4.374 | 3.5 | Hit
7 | 8 | 5 | 9 | 10 | 10 | 8 | 50 | 3.66 | 1.673 | 0.237 | 3.196 | 4.124 | 3.5 | Hit
8 | 6 | 8 | 9 | 10 | 6 | 11 | 50 | 3.7 | 1.693 | 0.239 | 3.231 | 4.169 | 3.5 | Hit
9 | 14 | 10 | 7 | 9 | 3 | 7 | 50 | 2.96 | 1.749 | 0.247 | 2.475 | 3.445 | 3.5 | Miss
10 | 9 | 7 | 12 | 9 | 9 | 4 | 50 | 3.28 | 1.565 | 0.221 | 2.846 | 3.714 | 3.5 | Hit
11 | 9 | 10 | 4 | 3 | 11 | 13 | 50 | 3.72 | 1.938 | 0.274 | 3.183 | 4.257 | 3.5 | Hit
12 | 7 | 7 | 7 | 10 | 8 | 11 | 50 | 3.76 | 1.733 | 0.245 | 3.28 | 4.24 | 3.5 | Hit
13 | 10 | 11 | 8 | 8 | 5 | 8 | 50 | 3.22 | 1.741 | 0.246 | 2.737 | 3.703 | 3.5 | Hit
14 | 9 | 6 | 11 | 11 | 6 | 7 | 50 | 3.4 | 1.641 | 0.232 | 2.945 | 3.855 | 3.5 | Hit
15 | 7 | 5 | 8 | 7 | 12 | 11 | 50 | 3.9 | 1.729 | 0.245 | 3.421 | 4.379 | 3.5 | Hit
16 | 10 | 12 | 7 | 12 | 3 | 6 | 50 | 3.08 | 1.627 | 0.23 | 2.629 | 3.531 | 3.5 | Hit
17 | 5 | 9 | 6 | 7 | 14 | 9 | 50 | 3.86 | 1.666 | 0.236 | 3.398 | 4.322 | 3.5 | Hit
18 | 9 | 5 | 8 | 9 | 10 | 9 | 50 | 3.66 | 1.745 | 0.247 | 3.176 | 4.144 | 3.5 | Hit
19 | 9 | 9 | 9 | 5 | 9 | 9 | 50 | 3.46 | 1.787 | 0.253 | 2.965 | 3.955 | 3.5 | Hit
20 | 7 | 8 | 7 | 8 | 12 | 8 | 50 | 3.68 | 1.696 | 0.24 | 3.21 | 4.15 | 3.5 | Hit
21 | 7 | 10 | 6 | 10 | 9 | 8 | 50 | 3.56 | 1.692 | 0.239 | 3.091 | 4.029 | 3.5 | Hit
22 | 3 | 12 | 15 | 8 | 3 | 9 | 50 | 3.46 | 1.528 | 0.216 | 3.036 | 3.884 | 3.5 | Hit
Success Rate | 95% | 20.9 | 20 or 21 | Sample Success Rate | 0.955
Failure Rate | 5% | 1.1 | 1 or 2 | Sample Failure Rate | 0.045
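Any row of the table can be recomputed from its six face counts. The sketch below assumes that the columns headed 1 through 6 hold the number of times each face appeared, that sd uses the n - 1 divisor, and that the multiplier is 1.96; these readings reproduce the tabled values for sample 9, but the function name interval_from_counts is introduced here for illustration only.

```python
import math

def interval_from_counts(counts, z=1.96):
    """counts[i] is the number of times face i+1 appeared in the sample."""
    n = sum(counts)
    m = sum(face * c for face, c in enumerate(counts, start=1)) / n
    # Sum of squared deviations from the sample mean, weighted by face counts.
    ss = sum(c * (face - m) ** 2 for face, c in enumerate(counts, start=1))
    sd = math.sqrt(ss / (n - 1))      # sample sd (n - 1 divisor)
    se = sd / math.sqrt(n)
    return n, m, sd, se, m - z * se, m + z * se

# Face counts for sample 9, the one Miss in the table.
n, m, sd, se, lower, upper = interval_from_counts([14, 10, 7, 9, 3, 7])
result = "Hit" if lower <= 3.5 <= upper else "Miss"
print(n, round(m, 2), round(sd, 3), round(se, 3),
      round(lower, 3), round(upper, 3), result)
```

For sample 9 the upper bound falls just below 3.5, which is why that interval is recorded as a Miss.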
Confidence Interval
General Mean
Diseased Monkeys
Objective: Be able to
perform interval estimation of the population mean using the confidence
interval method. Be able to fully discuss the confidence interval.
This discussion must include a clear description of the population and the population mean, the family of samples, the family of
intervals and how the confidence applies to the family of intervals.
A random sample of Lab Monkeys is
infected with the agent that causes Disease X. The time (in hours) from
infection to the appearance of symptoms of Disease X is measured for each
monkey. The sample of monkeys yields the following times (in hours):
12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
18, 35, 40, 41, 42, 45, 48, 52, 58, 75
Follow the steps:
Edit the data into your calculator, and
compute the following statistics: sample size, sample mean, sample standard
deviation.
N | M | SD | Z | LOBOUND | HIBOUND
50 | 42.66 | 14.0968 | 1.96 | 38.7526 | 46.5674
Identify the Population Mean for
this Sample.
We seek the population mean time to symptoms, in hours, for Disease X among the population of Lab Monkeys.
Consult the Normal Table, and determine the SD Multiplier required to ensure 95% Confidence. Justify the approach.
Since we need approximately 95% confidence, we need a number somewhat larger than 1.95, but 2.00 is more than we need. In practice, the number that we need is 1.96. But you should use 2.00 from your table…Here are the rows from the table:
1.95 0.025588 0.94882
2.00 0.022750 0.95450
The cost of this approach is that it requires large random samples – n > 30 will usually suffice.
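The relationship between a multiplier and its central probability (the last column of the table rows quoted above) can be checked with Python's standard library. This is a sketch under the assumption that the table is based on the standard normal curve; the helper names central_area and exact_multiplier are introduced here and are not part of the course materials.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

def central_area(z):
    """Probability that a standard normal value lies between -z and +z."""
    return 2 * std_normal.cdf(z) - 1

def exact_multiplier(confidence):
    """Multiplier z whose central area equals the requested confidence."""
    return std_normal.inv_cdf((1 + confidence) / 2)

for z in (1.95, 1.96, 2.00):
    print(f"z = {z:.2f}: central area = {central_area(z):.5f}")
print(f"multiplier for 95% confidence: {exact_multiplier(0.95):.4f}")
```

The central areas for 1.95 and 2.00 agree with the table rows to five decimals, and the exact multiplier rounds to 1.96.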
Compute a 95% Confidence Interval
for the true but unknown population mean in this problem.
Compute
LOBOUND ≈ M - Z*(SD/sqrt(N)) = 42.66 - 1.96*(14.0968/sqrt(50)) ≈ 38.7526
and
HIBOUND ≈ M + Z*(SD/sqrt(N)) = 42.66 + 1.96*(14.0968/sqrt(50)) ≈ 46.5674.
Write the approximate interval as: [38.8, 46.6].
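For readers working in software rather than on a calculator, the whole computation can be sketched as follows. The variable names are chosen here for illustration; the data are the 50 times listed earlier, and the multiplier 1.96 is the one used in the worked solution.

```python
import math
import statistics

# Times (in hours) from infection to symptoms for the 50 sampled monkeys.
times = [
    12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
    13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
    15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
    16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
    18, 35, 40, 41, 42, 45, 48, 52, 58, 75,
]

n = len(times)                  # sample size
m = statistics.mean(times)      # sample mean
sd = statistics.stdev(times)    # sample standard deviation (n - 1 divisor)
z = 1.96                        # multiplier for approximately 95% confidence

se = sd / math.sqrt(n)
lobound = m - z * se
hibound = m + z * se
print(n, round(m, 2), round(sd, 4), round(lobound, 4), round(hibound, 4))
```

The printed values should agree with the N, M, SD, LOBOUND and HIBOUND figures reported above.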
Discuss the Family of Intervals for
this problem.
Each member of the Family of Samples (FoS) is a single random sample of n=50 Lab Monkeys; the FoS consists of every possible random sample of n=50 Lab Monkeys. Each member of the FoS yields the following statistics: {n (sample size), m (sample mean) and sd (sample standard deviation)}. For this FoS, n=50 for every member sample, but m and sd will vary from member to member. Each member of the FoS yields an interval of the form:
[m - 1.96*(sd/sqrt(n)), m + 1.96*(sd/sqrt(n))].
These intervals collectively form a Family of Intervals (FoI): each member of the FoI is an interval derived from a member of the FoS. Approximately 95% of these intervals contain the true population mean time (in hours) to symptoms of Disease X in Lab Monkeys, and approximately 5% fail to contain it.
Interpret the Single Confidence
Interval for this problem.
If our interval captures
the true population mean, then the mean time to symptoms of Disease X in Lab
Monkeys is between 38.8 and 46.6 hours.
Confidence Interval
General Mean
Generic Fictitious Spiders
Objective: Be able to perform
interval estimation of the population mean using the confidence interval
method. Be able to fully discuss the confidence interval. This
discussion must include a clear description of the population and the population mean, the family of samples, the family of
intervals and how the confidence applies to the family of intervals.
We have a sample of Generic Fictitious Spiders. Each spider's diameter (maximum length, in cm, from leg tip to leg tip) is measured.
The spider diameters are listed below:
16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4,
24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7,
32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4,
38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4
Follow the steps:
Edit the data into your calculator,
and compute the following statistics: sample size, sample mean, sample standard
deviation.
Statistic | Value | Comment
Sample Size | n=40 | There are n=40 spiders in the sample.
Sample Mean | m=32.4 | The average diameter of spiders in the sample is 32.4 cm.
Sample SD | sd=8.09 | No comment here.
Identify the Population Mean for
this Sample.
We are after the
Population Mean Maximum Spider Diameter.
Consult the Normal Table, and determine the SD Multiplier required to ensure 90% Confidence.
We want z(k)=1.65 from the table, as discussed in class.
Justify the approach.
We are working with large (n>=30) random samples, and are working with sample means.
Compute a 90% Confidence Interval
for the true but unknown population mean in this problem.
m - 1.65*(sd/sqrt(n)) = 32.4 - 1.65*(8.09/sqrt(40)) ≈ 30.29;
m + 1.65*(sd/sqrt(n)) = 32.4 + 1.65*(8.09/sqrt(40)) ≈ 34.51.
Write the interval as [30.29, 34.51].
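Because the interval depends only on m, sd, n and the multiplier, the arithmetic can be wrapped in a small reusable function. The sketch below uses the rounded summary statistics reported for the spiders; the function name z_interval is hypothetical.

```python
import math

def z_interval(m, sd, n, z):
    """Large-sample interval m -/+ z*(sd/sqrt(n)) for a population mean."""
    se = sd / math.sqrt(n)       # standard error of the sample mean
    return m - z * se, m + z * se

# Spider summary statistics and the 90% multiplier from the worked solution.
lower, upper = z_interval(m=32.4, sd=8.09, n=40, z=1.65)
print(f"[{lower:.2f}, {upper:.2f}]")
```

Starting from unrounded statistics instead of the rounded m and sd may shift the last reported digit slightly.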
Discuss the Family of Intervals for
this problem.
Each member of our Family
is a random sample of 40 spiders from our population. The Family of Samples
consists of every possible sample of this type.
From each member, compute the interval
m ± 1.65*(sd/sqrt(40)),
where m is the sample mean and sd is the sample standard deviation for the sample.
Each member of the Family of Intervals is obtained in this way from a member of the Family of Samples; the Family of Intervals consists of all such intervals.
Approximately 90% of the
Family of Intervals captures the population mean maximum spider diameter. The
remaining intervals do not capture the population mean.
Interpret the Single Confidence
Interval for this problem.
If our interval contains
the population mean, then the true population mean
maximum diameter for Generic Fictitious Spiders is between 30.29 and
34.51 centimeters.
From here: http://www.mindspring.com/~cjalverson/3rdhourlyfall2008versionA_key.htm
Case One | Confidence Interval, Mean | Glioblastoma Multiforme
Glioblastoma multiforme (GBM) is the highest grade glioma tumor and is the most malignant form of astrocytoma. These tumors originate in the brain. GBM tumors grow rapidly, invade nearby tissue and contain cells that are very malignant. GBM tumors are among the most common and devastating primary brain tumors in adults.
Suppose that we
have a random sample of GBM patients, with survival time (in weeks) listed
below:
3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 37, 38, 45,
48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66, 67, 68, 69, 72, 72,
73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83, 85, 86, 87, 90, 150, 180
Estimate the population
mean survival time for Glioblastoma multiforme patients
with 99% confidence. That is, compute and discuss a 99%
confidence interval for this population mean. Provide concise and complete
details and discussion as demonstrated in the case study summaries.
Table 1. Means and Proportions
Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT
0.05 | 0.48006 | 0.03988 | 1.05 | 0.14686 | 0.70628 | 2.05 | 0.020182 | 0.95964
0.10 | 0.46017 | 0.07966 | 1.10 | 0.13567 | 0.72867 | 2.10 | 0.017864 | 0.96427
0.15 | 0.44038 | 0.11924 | 1.15 | 0.12507 | 0.74986 | 2.15 | 0.015778 | 0.96844
0.20 | 0.42074 | 0.15852 | 1.20 | 0.11507 | 0.76986 | 2.20 | 0.013903 | 0.97219
0.25 | 0.40129 | 0.19741 | 1.25 | 0.10565 | 0.78870 | 2.25 | 0.012224 | 0.97555
0.30 | 0.38209 | 0.23582 | 1.30 | 0.09680 | 0.80640 | 2.30 | 0.010724 | 0.97855
0.35 | 0.36317 | 0.27366 | 1.35 | 0.088508 | 0.82298 | 2.35 | 0.009387 | 0.98123
0.40 | 0.34458 | 0.31084 | 1.40 | 0.080757 | 0.83849 | 2.40 | 0.008198 | 0.98360
0.45 | 0.32636 | 0.34729 | 1.45 | 0.073529 | 0.85294 | 2.45 | 0.007143 | 0.98571
0.50 | 0.30854 | 0.38292 | 1.50 | 0.066807 | 0.86639 | 2.50 | 0.006210 | 0.98758
0.55 | 0.29116 | 0.41768 | 1.55 | 0.060571 | 0.87886 | 2.55 | 0.005386 | 0.98923
0.60 | 0.27425 | 0.45149 | 1.60 | 0.054799 | 0.89040 | 2.60 | 0.004661 | 0.99068
0.65 | 0.25785 | 0.48431 | 1.65 | 0.049471 | 0.90106 | 2.65 | 0.004025 | 0.99195
0.70 | 0.24196 | 0.51607 | 1.70 | 0.044565 | 0.91087 | 2.70 | 0.0034670 | 0.99307
0.75 | 0.22663 | 0.54675 | 1.75 | 0.040059 | 0.91988 | 2.75 | 0.0029798 | 0.99404
0.80 | 0.21186 | 0.57629 | 1.80 | 0.035930 | 0.92814 | 2.80 | 0.0025551 | 0.99489
0.85 | 0.19766 | 0.60467 | 1.85 | 0.032157 | 0.93569 | 2.85 | 0.0021860 | 0.99563
0.90 | 0.18406 | 0.63188 | 1.90 | 0.028717 | 0.94257 | 2.90 | 0.0018658 | 0.99627
0.95 | 0.17106 | 0.65789 | 1.95 | 0.025588 | 0.94882 | 2.95 | 0.0015889 | 0.99682
1.00 | 0.15866 | 0.68269 | 2.00 | 0.022750 | 0.95450 | 3.00 | 0.0013499 | 0.99730
Numbers
n | m | sd | se | Z | lower | upper
58 | 56.91 | 33.12 | 4.35 | 2.60 | 45.61 | 68.22
se = sd/sqrt(n) ≈ 33.12/sqrt(58) ≈ 4.35
z ≈ 2.60 for 99% confidence, from the table row 2.60 0.004661 0.99068
lower = m - (z*se) ≈ 56.91 - (2.60*4.35) ≈ 45.61
upper = m + (z*se) ≈ 56.91 + (2.60*4.35) ≈ 68.22
Report the interval as [45.6, 68.2].
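The choice of z=2.60 is consistent with picking the smallest tabled Z(k) whose PROBCENT reaches the target confidence. That rule is an inference from the worked solutions, not something the source states outright. The sketch below rebuilds the PROBCENT column from the standard normal curve instead of typing in the table, then applies the rule to the summary statistics above; table_multiplier is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def table_multiplier(confidence, step=0.05, z_max=3.00):
    """Smallest z on the 0.05 grid whose central area reaches `confidence`.

    Assumed rule (inferred, not stated in the notes): scan the table upward
    and stop at the first row whose PROBCENT is at least the target.
    """
    std_normal = NormalDist()
    k = 1
    while round(k * step, 2) <= z_max:
        z = round(k * step, 2)
        probcent = 2 * std_normal.cdf(z) - 1   # central area between -z and +z
        if probcent >= confidence:
            return z
        k += 1
    raise ValueError("confidence level is beyond the range of the table")

z = table_multiplier(0.99)
n, m, sd = 58, 56.91, 33.12          # summary statistics reported above
se = sd / math.sqrt(n)
lower, upper = m - z * se, m + z * se
print(z, round(se, 2), round(lower, 1), round(upper, 1))
```

Because the rounded mean 56.91 is used here, the second decimal of the lower bound can differ slightly from the tabled 45.61, but the reported interval [45.6, 68.2] is unchanged.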
Interpretation
Our population is the population of Glioblastoma multiforme patients and our population mean is the mean survival time (weeks).
Our Family of Samples (FoS) consists of every possible random sample of 58 Glioblastoma multiforme patients. From each individual sampled Glioblastoma multiforme patient, survival time in weeks is obtained.
From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for GBM survival time, and then compute the interval
[m - 2.60*(sd/sqrt(n)), m + 2.60*(sd/sqrt(n))].
Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 99% of which cover the true population mean survival time in weeks for Glioblastoma multiforme patients.
If our interval, [45.6, 68.2], is among the approximate 99% super-majority of intervals that cover the population mean, then the true population mean survival time for Glioblastoma multiforme patients is between 45.6 and 68.2 weeks.
From here: http://www.mindspring.com/~cjalverson/CompFinalSpring2008verMondayKey.htm
Case Four | Confidence Interval for Mean | Gestational Age
Consider the population mean gestational age (in weeks) at birth of Year 2005 US Resident Live Births. Using the data from Case Two, compute and interpret a 93% confidence interval for this population mean.
Numbers
From the table row 1.85 0.032157 0.93569, z=1.85.
n = 56
m ≈ 37.34
sd ≈ 3.9278
lowCI = m - z*(sd/sqrt(n)) ≈ 37.34 - 1.85*(3.9278/sqrt(56)) ≈ 36.37
highCI = m + z*(sd/sqrt(n)) ≈ 37.34 + 1.85*(3.9278/sqrt(56)) ≈ 38.31
Report the interval as [36.4, 38.3].
Interpretation
Our population is the population of year 2005 US resident live born infants and our population mean is the mean gestational age (weeks).
Our Family of Samples (FoS) consists of every possible random sample of 56 year 2005 US resident live born infants. From each individual sampled live born infant, gestational age in weeks is obtained.
From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for gestational age, and then compute the interval
[m - 1.85*(sd/sqrt(n)), m + 1.85*(sd/sqrt(n))].
Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 93% of which cover the true population mean gestational age in weeks for year 2005 US resident live born infants.
If our interval, [36.4, 38.3], is among the approximate 93% super-majority of intervals that cover the population mean, then the true population mean gestational age is between 36.4 and 38.3 weeks for year 2005 US resident live born infants.