Session 3.3
2nd November 2009
Confidence Interval – Population Proportion
Population Level
E = Event = Definition of Event of Interest
P = Pr{E} = Population Proportion
Sample Level
e = Number of Observed Events in Sample
n = Number of Observations / Sample Size / Number of Trials
p = e/n = Proportion of Total Sample Observing Event E
sdp = sqrt(p*(1 – p)/n) = Sample Standard Error for the Proportion
Z = Confidence Coefficient
Confidence Interval is given as [ (p – Z*sdp), (p + Z*sdp) ]
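The definitions above translate directly into a few lines of code. The following is a minimal sketch, not part of the original notes; the function name proportion_ci is an illustrative choice, and the example call (11 events in 50 trials with Z = 2) is only a sample input.

```python
import math

def proportion_ci(e, n, z):
    """Return (p, sdp, lower, upper) for e observed events in n trials.

    p     = e/n, the sample proportion
    sdp   = sqrt(p*(1 - p)/n), the sample standard error for the proportion
    lower = p - z*sdp, upper = p + z*sdp
    """
    p = e / n
    sdp = math.sqrt(p * (1 - p) / n)
    return p, sdp, p - z * sdp, p + z * sdp

# Illustrative input: 11 events observed in 50 trials, Z = 2.
p, sdp, lower, upper = proportion_ci(11, 50, 2)
print(f"p = {p:.2f}, sdp = {sdp:.7f}, interval = [{lower:.6f}, {upper:.6f}]")
```

Note that the interval is built from the sample proportion p, not from the unknown population proportion P.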
Validation of the Confidence Interval Process – Population Proportion (from Spring 2009)
Track the event “Sum = 7” in n=50 tosses of a pair of fair, six-sided dice (face values 1,2,3,4,5,6 per face).
We know that P = Pr{Sum = 7} = 1/6 ≈ 0.1667. In the 26 samples tabulated below, we have one failure (an interval that misses P) and 25 successes (intervals that contain P), yielding a failure rate of approximately 3.8% (versus expected 5%).
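The value P = 1/6 used in the table can be checked by listing all 36 equally likely outcomes for a pair of fair dice. This short sketch is an added illustration, not part of the original notes.

```python
from fractions import Fraction
from itertools import product

# All ordered (first die, second die) outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes in which the two faces sum to 7.
favorable = [pair for pair in outcomes if sum(pair) == 7]

P = Fraction(len(favorable), len(outcomes))
print(len(favorable), "of", len(outcomes), "outcomes, so P =", P)
```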
Event Count for (Sum=7) | p=e/50 | sdp = sqrt(p*(1-p)/50) | lower95 = p-2*sdp | upper95 = p+2*sdp | P=Pr{Sum=7} | Result (Contains P=1/6?)
11 | 0.22 | 0.0585833 | 0.102833452 | 0.337166548 | (1/6) ≈ 0.1667 | Hit
10 | 0.2 | 0.0565685 | 0.086862915 | 0.313137085 | (1/6) ≈ 0.1667 | Hit
9 | 0.18 | 0.0543323 | 0.071335378 | 0.288664622 | (1/6) ≈ 0.1667 | Hit
8 | 0.16 | 0.0518459 | 0.056308149 | 0.263691851 | (1/6) ≈ 0.1667 | Hit
4 | 0.08 | 0.0383667 | 0.003266696 | 0.156733304 | (1/6) ≈ 0.1667 | Miss
13 | 0.26 | 0.0620322 | 0.135935501 | 0.384064499 | (1/6) ≈ 0.1667 | Hit
9 | 0.18 | 0.0543323 | 0.071335378 | 0.288664622 | (1/6) ≈ 0.1667 | Hit
7 | 0.14 | 0.0490714 | 0.041857247 | 0.238142753 | (1/6) ≈ 0.1667 | Hit
7 | 0.14 | 0.0490714 | 0.041857247 | 0.238142753 | (1/6) ≈ 0.1667 | Hit
6 | 0.12 | 0.0459565 | 0.028086998 | 0.211913002 | (1/6) ≈ 0.1667 | Hit
9 | 0.18 | 0.0543323 | 0.071335378 | 0.288664622 | (1/6) ≈ 0.1667 | Hit
12 | 0.24 | 0.0603987 | 0.119202649 | 0.360797351 | (1/6) ≈ 0.1667 | Hit
7 | 0.14 | 0.0490714 | 0.041857247 | 0.238142753 | (1/6) ≈ 0.1667 | Hit
9 | 0.18 | 0.0543323 | 0.071335378 | 0.288664622 | (1/6) ≈ 0.1667 | Hit
5 | 0.1 | 0.0424264 | 0.015147186 | 0.184852814 | (1/6) ≈ 0.1667 | Hit
7 | 0.14 | 0.0490714 | 0.041857247 | 0.238142753 | (1/6) ≈ 0.1667 | Hit
12 | 0.24 | 0.0603987 | 0.119202649 | 0.360797351 | (1/6) ≈ 0.1667 | Hit
10 | 0.2 | 0.0565685 | 0.086862915 | 0.313137085 | (1/6) ≈ 0.1667 | Hit
10 | 0.2 | 0.0565685 | 0.086862915 | 0.313137085 | (1/6) ≈ 0.1667 | Hit
8 | 0.16 | 0.0518459 | 0.056308149 | 0.263691851 | (1/6) ≈ 0.1667 | Hit
12 | 0.24 | 0.0603987 | 0.119202649 | 0.360797351 | (1/6) ≈ 0.1667 | Hit
13 | 0.26 | 0.0620322 | 0.135935501 | 0.384064499 | (1/6) ≈ 0.1667 | Hit
5 | 0.1 | 0.0424264 | 0.015147186 | 0.184852814 | (1/6) ≈ 0.1667 | Hit
7 | 0.14 | 0.0490714 | 0.041857247 | 0.238142753 | (1/6) ≈ 0.1667 | Hit
10 | 0.2 | 0.0565685 | 0.086862915 | 0.313137085 | (1/6) ≈ 0.1667 | Hit
10 | 0.2 | 0.0565685 | 0.086862915 | 0.313137085 | (1/6) ≈ 0.1667 | Hit
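The Hit/Miss column can be recomputed from the event counts alone. The sketch below is an added check, not part of the original notes: it takes the 26 event counts from the table, rebuilds each interval with n = 50 and Z = 2, and tests whether the interval contains P = 1/6.

```python
import math

# Event counts for "Sum = 7" in each of the 26 samples, in table order.
event_counts = [11, 10, 9, 8, 4, 13, 9, 7, 7, 6, 9, 12, 7,
                9, 5, 7, 12, 10, 10, 8, 12, 13, 5, 7, 10, 10]
n, z, P = 50, 2, 1 / 6

def covers(e):
    """True when [p - z*sdp, p + z*sdp] contains the population proportion P."""
    p = e / n
    sdp = math.sqrt(p * (1 - p) / n)
    return p - z * sdp <= P <= p + z * sdp

results = ["Hit" if covers(e) else "Miss" for e in event_counts]
misses = results.count("Miss")
failure_rate = misses / len(results)
print(f"{len(results)} samples, {misses} miss, failure rate {failure_rate:.1%}")
```

The single miss is the sample with only 4 events, whose upper bound (about 0.157) falls just short of 1/6.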
From http://www.mindspring.com/~cjalverson/2ndhourlySummer2008Key.htm
Second Hourly, Summer 2008, Version A
The top number is the systolic blood pressure reading. It represents the maximum pressure exerted when the heart contracts. The bottom number is the diastolic blood pressure reading. It represents the pressure in the arteries when the heart is at rest. A sample of Framingham Heart Study (FHS) adult subjects yields the following readings:
124/88, 140/90, 156/108, 130/70, 175/75, 136/84, 124/84, 144/88,
128/74, 154/90, 160/92, 210/120, 110/75, 166/108, 100/70, 172/110,
160/90, 145/75, 122/84, 162/80, 156/84, 120/65, 128/84, 130/90, 210/110,
110/68, 160/106, 140/90, 132/72, 120/80, 200/100, 165/105,
132/88, 134/84, 120/75, 138/85, 118/86, 152/74, 138/70, 124/74, 122/80, 155/90,
160/100, 294/144, 140/82, 132/86, 120/80, 200/130, 126/86,
150/100, 135/75, 140/78, 142/85, 146/94, 185/90, 166/78, 190/100,
160/80, 140/80, 120/80, 150/95, 124/75, 150/110, 140/84, 130/82, 130/80, 230/124,
128/72, 220/118, 130/80, 165/95, 208/114, 126/80, 140/90, 166/104,
130/70, 130/80, 120/90
Case Two | Confidence Interval: Population Proportion
Using the data and context from Case One, compute and interpret a 95% confidence interval for the population proportion of Framingham Heart Study subjects with Systolic Blood Pressure strictly greater than 160 mm Hg. Show your work. Fully discuss the results. This discussion must include a clear discussion of the population and the population proportion, the family of samples, the family of intervals and the interpretation of the interval.
Numbers
n | event | p | sdp | z | lower | upper
78 | 18 | 0.23077 | 0.047706 | 2 | 0.13536 | 0.32618
event = number of FHS subjects in the sample with SBP > 160 = 18
p = event/n = 18/78 ≈ 0.23077
sdp = sqrt(p*(1-p)/n) = sqrt((18/78)*(60/78)/78) ≈ 0.047706
Table 1. Means and Proportions
Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT
0.05 | 0.48006 | 0.03988 | 1.05 | 0.14686 | 0.70628 | 2.05 | 0.020182 | 0.95964
0.10 | 0.46017 | 0.07966 | 1.10 | 0.13567 | 0.72867 | 2.10 | 0.017864 | 0.96427
0.15 | 0.44038 | 0.11924 | 1.15 | 0.12507 | 0.74986 | 2.15 | 0.015778 | 0.96844
0.20 | 0.42074 | 0.15852 | 1.20 | 0.11507 | 0.76986 | 2.20 | 0.013903 | 0.97219
0.25 | 0.40129 | 0.19741 | 1.25 | 0.10565 | 0.78870 | 2.25 | 0.012224 | 0.97555
0.30 | 0.38209 | 0.23582 | 1.30 | 0.09680 | 0.80640 | 2.30 | 0.010724 | 0.97855
0.35 | 0.36317 | 0.27366 | 1.35 | 0.088508 | 0.82298 | 2.35 | 0.009387 | 0.98123
0.40 | 0.34458 | 0.31084 | 1.40 | 0.080757 | 0.83849 | 2.40 | 0.008198 | 0.98360
0.45 | 0.32636 | 0.34729 | 1.45 | 0.073529 | 0.85294 | 2.45 | 0.007143 | 0.98571
0.50 | 0.30854 | 0.38292 | 1.50 | 0.066807 | 0.86639 | 2.50 | 0.006210 | 0.98758
0.55 | 0.29116 | 0.41768 | 1.55 | 0.060571 | 0.87886 | 2.55 | 0.005386 | 0.98923
0.60 | 0.27425 | 0.45149 | 1.60 | 0.054799 | 0.89040 | 2.60 | 0.004661 | 0.99068
0.65 | 0.25785 | 0.48431 | 1.65 | 0.049471 | 0.90106 | 2.65 | 0.004025 | 0.99195
0.70 | 0.24196 | 0.51607 | 1.70 | 0.044565 | 0.91087 | 2.70 | 0.0034670 | 0.99307
0.75 | 0.22663 | 0.54675 | 1.75 | 0.040059 | 0.91988 | 2.75 | 0.0029798 | 0.99404
0.80 | 0.21186 | 0.57629 | 1.80 | 0.035930 | 0.92814 | 2.80 | 0.0025551 | 0.99489
0.85 | 0.19766 | 0.60467 | 1.85 | 0.032157 | 0.93569 | 2.85 | 0.0021860 | 0.99563
0.90 | 0.18406 | 0.63188 | 1.90 | 0.028717 | 0.94257 | 2.90 | 0.0018658 | 0.99627
0.95 | 0.17106 | 0.65789 | 1.95 | 0.025588 | 0.94882 | 2.95 | 0.0015889 | 0.99682
1.00 | 0.15866 | 0.68269 | 2.00 | 0.022750 | 0.95450 | 3.00 | 0.0013499 | 0.99730
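The tabulated values are consistent with standard normal probabilities: PROBRT matches the right-tail probability Pr{Z ≥ k} and PROBCENT matches the central probability Pr{−k ≤ Z ≤ k}. The sketch below is an added illustration under that reading. The function names are hypothetical, and the lookup rule (take the smallest tabulated Z whose PROBCENT reaches the requested confidence level) is inferred from the worked cases in these notes rather than stated in them.

```python
import math

def probcent(z):
    """Central standard normal probability Pr{-z <= Z <= z}."""
    return math.erf(z / math.sqrt(2))

def probrt(z):
    """Right-tail standard normal probability Pr{Z >= z}."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_for_confidence(level):
    """Smallest tabulated z (0.05, 0.10, ..., 3.00) with PROBCENT >= level."""
    for k in range(1, 61):
        z = round(0.05 * k, 2)
        if probcent(z) >= level:
            return z
    raise ValueError("confidence level is beyond the range of the table")

for level in (0.95, 0.97, 0.98):
    z = z_for_confidence(level)
    print(f"{level:.0%}: z = {z:.2f}, PROBRT = {probrt(z):.6f}, PROBCENT = {probcent(z):.5f}")
```

Because the table moves in steps of 0.05, the chosen Z slightly overshoots the requested level; for example Z = 2.00 gives 95.45% rather than exactly 95%.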
From the row 2.00 0.022750 0.95450, Z ≈ 2.00
lower = p – z*sdp ≈ 0.23077 – 2*0.047706 ≈ 0.13536
upper = p + z*sdp ≈ 0.23077 + 2*0.047706 ≈ 0.32618
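The whole calculation can be reproduced from the raw readings. This sketch is an added check, not part of the original key; it parses the 78 systolic/diastolic readings listed above, counts the subjects with SBP strictly greater than 160, and rebuilds the interval with Z = 2.

```python
import math

readings = """
124/88 140/90 156/108 130/70 175/75 136/84 124/84 144/88
128/74 154/90 160/92 210/120 110/75 166/108 100/70 172/110
160/90 145/75 122/84 162/80 156/84 120/65 128/84 130/90 210/110
110/68 160/106 140/90 132/72 120/80 200/100 165/105
132/88 134/84 120/75 138/85 118/86 152/74 138/70 124/74 122/80 155/90
160/100 294/144 140/82 132/86 120/80 200/130 126/86
150/100 135/75 140/78 142/85 146/94 185/90 166/78 190/100
160/80 140/80 120/80 150/95 124/75 150/110 140/84 130/82 130/80 230/124
128/72 220/118 130/80 165/95 208/114 126/80 140/90 166/104
130/70 130/80 120/90
""".split()

# The systolic reading is the number before the slash.
sbp = [int(reading.split("/")[0]) for reading in readings]

n = len(sbp)
event = sum(1 for value in sbp if value > 160)  # strictly greater than 160
p = event / n
sdp = math.sqrt(p * (1 - p) / n)
z = 2
lower, upper = p - z * sdp, p + z * sdp
print(f"n = {n}, event = {event}, p = {p:.5f}, sdp = {sdp:.6f}")
print(f"interval = [{lower:.5f}, {upper:.5f}]")
```

Readings of exactly 160 are excluded from the event count because the case asks for SBP strictly greater than 160 mm Hg.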
We estimate the population proportion of Framingham Heart Study (FHS) subjects whose systolic blood pressure (SBP) strictly exceeds 160 mm Hg.
Each member of our family of samples is a single random sample of 78 Framingham Heart Study (FHS) subjects, and the family of samples consists of all possible samples of this type.
From each member of the family of samples, we compute event (= number of FHS subjects in the sample with SBP > 160), p (= event/n), sdp (= sqrt(p*(1-p)/n)) and the interval [p – z*sdp, p + z*sdp].
Approximately 95% of the member samples yield intervals containing the true population proportion of FHS subjects whose SBP strictly exceeds 160 mm Hg.
If our interval resides within this super-majority, then between 13.5% and 32.6% of FHS subjects have SBP strictly exceeding 160 mm Hg.
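The family-of-samples argument can be illustrated by simulation. The sketch below is an added illustration, not part of the original key. The true population proportion is unknown, so the value P = 0.25 is a purely hypothetical assumption, as are the function name, the 2000 replications and the random seed.

```python
import math
import random

def coverage_rate(P, n, z, replications, seed=0):
    """Share of simulated samples whose interval [p - z*sdp, p + z*sdp] contains P."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        # One member of the family of samples: n independent subjects,
        # each showing the event with probability P.
        e = sum(1 for _subject in range(n) if rng.random() < P)
        p = e / n
        sdp = math.sqrt(p * (1 - p) / n)
        if p - z * sdp <= P <= p + z * sdp:
            hits += 1
    return hits / replications

# Hypothetical population proportion P = 0.25; n = 78 and z = 2 as in the case.
rate = coverage_rate(P=0.25, n=78, z=2, replications=2000)
print(f"share of intervals covering P: {rate:.3f}")
```

Under these assumptions the printed share should land near 95%. Any single interval either covers P or does not, which is why the interpretation above is conditional on the interval residing within the super-majority.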
From http://www.mindspring.com/~cjalverson/3rd%20Hourly%20Spring%202007%20Version%20A%20Key.htm
Third Hourly, Spring 2007, Version A
Case Three
Confidence Interval: Population Proportion
Traumatic Brain Injury (TBI) and Glasgow Coma Scale (GCS)
The Glasgow Coma Scale (GCS) is the most widely used system for scoring the level of consciousness of a patient who has had a traumatic brain injury. GCS is based on the patient's best eye-opening, verbal, and motor responses. Each response is scored and then the sum of the three scores is computed. That is, GCS = eye-opening score + verbal score + motor score.
Glasgow Coma Scale Categories: Mild (13-15); Moderate (9-12) and Severe/Coma (3-8)
Traumatic brain injury (TBI) [1, 2] is an insult to the brain from an external mechanical force, possibly leading to permanent or temporary impairments of cognitive, physical, and psychosocial functions with an associated diminished or altered state of consciousness. Consider a random sample of patients with TBI, with GCS at initial treatment and diagnosis listed below:
3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8,
9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15
Consider the proportion of TBI patients presenting severe GCS. Compute and interpret a 97% confidence interval for this population proportion. Show your work. Fully discuss the results. This discussion must include a clear discussion of the population and the population proportion, the family of samples, the family of intervals and the interpretation of the interval.
1: http://www2.state.tn.us/health/statistics/PdfFiles/TBI_Rpt_2000-2004.pdf
2: http://www.aemj.org/cgi/content/abstract/10/5/491
Numbers
3, 3, 3, 4, 4 | 4, 4, 4, 4, 5 | 5, 5, 5, 5, 5 | 5, 5, 5, 5, 6 | 6, 6, 6, 6, 6 | 7, 7, 7, 7, 8 | 8, 8, 9, 9, 9 | 9, 9, 9, 9, 9 | 10, 10, 10, 10, 10 | 11, 11, 11, 12, 12 | 13, 13, 13, 14, 14 | 14, 14, 14, 14, 15
n = sample size = 60
Event = “TBI Patient Initially Presents with GCS Severe (3 ≤ GCS ≤ 8).”
e = sample event count = 32
p = e/n = 32/60 ≈ 0.5333
sdp = sqrt(p*(1–p)/n) = sqrt((32/60)*(28/60)/60) ≈ 0.064406
Table 1. Means and Proportions
Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT
0.05 | 0.48006 | 0.03988 | 1.05 | 0.14686 | 0.70628 | 2.05 | 0.020182 | 0.95964
0.10 | 0.46017 | 0.07966 | 1.10 | 0.13567 | 0.72867 | 2.10 | 0.017864 | 0.96427
0.15 | 0.44038 | 0.11924 | 1.15 | 0.12507 | 0.74986 | 2.15 | 0.015778 | 0.96844
0.20 | 0.42074 | 0.15852 | 1.20 | 0.11507 | 0.76986 | 2.20 | 0.013903 | 0.97219
0.25 | 0.40129 | 0.19741 | 1.25 | 0.10565 | 0.78870 | 2.25 | 0.012224 | 0.97555
0.30 | 0.38209 | 0.23582 | 1.30 | 0.09680 | 0.80640 | 2.30 | 0.010724 | 0.97855
0.35 | 0.36317 | 0.27366 | 1.35 | 0.088508 | 0.82298 | 2.35 | 0.009387 | 0.98123
0.40 | 0.34458 | 0.31084 | 1.40 | 0.080757 | 0.83849 | 2.40 | 0.008198 | 0.98360
0.45 | 0.32636 | 0.34729 | 1.45 | 0.073529 | 0.85294 | 2.45 | 0.007143 | 0.98571
0.50 | 0.30854 | 0.38292 | 1.50 | 0.066807 | 0.86639 | 2.50 | 0.006210 | 0.98758
0.55 | 0.29116 | 0.41768 | 1.55 | 0.060571 | 0.87886 | 2.55 | 0.005386 | 0.98923
0.60 | 0.27425 | 0.45149 | 1.60 | 0.054799 | 0.89040 | 2.60 | 0.004661 | 0.99068
0.65 | 0.25785 | 0.48431 | 1.65 | 0.049471 | 0.90106 | 2.65 | 0.004025 | 0.99195
0.70 | 0.24196 | 0.51607 | 1.70 | 0.044565 | 0.91087 | 2.70 | 0.0034670 | 0.99307
0.75 | 0.22663 | 0.54675 | 1.75 | 0.040059 | 0.91988 | 2.75 | 0.0029798 | 0.99404
0.80 | 0.21186 | 0.57629 | 1.80 | 0.035930 | 0.92814 | 2.80 | 0.0025551 | 0.99489
0.85 | 0.19766 | 0.60467 | 1.85 | 0.032157 | 0.93569 | 2.85 | 0.0021860 | 0.99563
0.90 | 0.18406 | 0.63188 | 1.90 | 0.028717 | 0.94257 | 2.90 | 0.0018658 | 0.99627
0.95 | 0.17106 | 0.65789 | 1.95 | 0.025588 | 0.94882 | 2.95 | 0.0015889 | 0.99682
1.00 | 0.15866 | 0.68269 | 2.00 | 0.022750 | 0.95450 | 3.00 | 0.0013499 | 0.99730
Z = 2.20 from the row 2.20 0.013903 0.97219.
Lower Bound = p – 2.2*sdp ≈ 0.5333 – 2.2*0.064406 ≈ 0.39164
Upper Bound = p + 2.2*sdp ≈ 0.5333 + 2.2*0.064406 ≈ 0.67503
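The counts and bounds can be reproduced from the listed GCS scores. This sketch is an added check, not part of the original key; it tallies the sample by the Mild / Moderate / Severe categories defined earlier and rebuilds the interval with Z = 2.2.

```python
import math
from collections import Counter

gcs = [3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
       5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8,
       8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10,
       11, 11, 11, 12, 12, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15]

def category(score):
    """Glasgow Coma Scale category: Severe (3-8), Moderate (9-12), Mild (13-15)."""
    if score <= 8:
        return "Severe"
    if score <= 12:
        return "Moderate"
    return "Mild"

counts = Counter(category(score) for score in gcs)
n = len(gcs)
e = counts["Severe"]
p = e / n
sdp = math.sqrt(p * (1 - p) / n)
z = 2.2
lower, upper = p - z * sdp, p + z * sdp
print(dict(counts))
print(f"n = {n}, e = {e}, p = {p:.4f}, sdp = {sdp:.6f}")
print(f"interval = [{lower:.5f}, {upper:.5f}]")
```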
Discussion
Our population is the population of people with Traumatic Brain Injury (TBI).
Our Family of Samples (FoS) consists of every possible random sample of 60 people with TBI.
From each member sample of the FoS, we compute the interval [Lower Bound, Upper Bound] = [p – 2.2*sdp, p + 2.2*sdp], where sdp = sqrt(p*(1–p)/n). Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 97% of which cover the true population proportion of TBI cases with initially severe TBI.
If our interval, [.3916, .6750], is among the approximately 97% super-majority of intervals that cover the population proportion, then between 39.2% and 67.5% of TBI cases initially present with severe (3 ≤ GCS ≤ 8) TBI.
From http://www.mindspring.com/~cjalverson/CompFinalSpring2008verWednesdayKey.htm
Final Examination, Spring 2008, Version Wednesday
Case Three | Confidence Interval for Proportion | Gestational Age
Consider the proportion of Year 2005 US Resident Live Births that are “Full Term,” that is, births with [37,40] weeks of gestation at birth. Using the data from Case Two, compute and interpret a 98% confidence interval for this population proportion.
Gestational age is the time spent between conception and birth, usually measured in weeks. In general, infants born after 36 or fewer weeks of gestation are defined as premature, and may face significant challenges in health and development. Infants born after 37-40 weeks of gestation are generally viewed as full term, and those born after 41 or more weeks of gestation are generally viewed as post term. Suppose that a random sample of 2005 US resident live born infants yields the following gestational ages (in weeks):
25, 26, 27, 29, 30, 32, 33, 34, 34, 35, 35, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38,
38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 41, 41, 41, 42, 42, 42, 43, 43
Numbers
Table 1. Means and Proportions
Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT | Z(k) | PROBRT | PROBCENT
0.05 | 0.48006 | 0.03988 | 1.05 | 0.14686 | 0.70628 | 2.05 | 0.020182 | 0.95964
0.10 | 0.46017 | 0.07966 | 1.10 | 0.13567 | 0.72867 | 2.10 | 0.017864 | 0.96427
0.15 | 0.44038 | 0.11924 | 1.15 | 0.12507 | 0.74986 | 2.15 | 0.015778 | 0.96844
0.20 | 0.42074 | 0.15852 | 1.20 | 0.11507 | 0.76986 | 2.20 | 0.013903 | 0.97219
0.25 | 0.40129 | 0.19741 | 1.25 | 0.10565 | 0.78870 | 2.25 | 0.012224 | 0.97555
0.30 | 0.38209 | 0.23582 | 1.30 | 0.09680 | 0.80640 | 2.30 | 0.010724 | 0.97855
0.35 | 0.36317 | 0.27366 | 1.35 | 0.088508 | 0.82298 | 2.35 | 0.009387 | 0.98123
0.40 | 0.34458 | 0.31084 | 1.40 | 0.080757 | 0.83849 | 2.40 | 0.008198 | 0.98360
0.45 | 0.32636 | 0.34729 | 1.45 | 0.073529 | 0.85294 | 2.45 | 0.007143 | 0.98571
0.50 | 0.30854 | 0.38292 | 1.50 | 0.066807 | 0.86639 | 2.50 | 0.006210 | 0.98758
0.55 | 0.29116 | 0.41768 | 1.55 | 0.060571 | 0.87886 | 2.55 | 0.005386 | 0.98923
0.60 | 0.27425 | 0.45149 | 1.60 | 0.054799 | 0.89040 | 2.60 | 0.004661 | 0.99068
0.65 | 0.25785 | 0.48431 | 1.65 | 0.049471 | 0.90106 | 2.65 | 0.004025 | 0.99195
0.70 | 0.24196 | 0.51607 | 1.70 | 0.044565 | 0.91087 | 2.70 | 0.0034670 | 0.99307
0.75 | 0.22663 | 0.54675 | 1.75 | 0.040059 | 0.91988 | 2.75 | 0.0029798 | 0.99404
0.80 | 0.21186 | 0.57629 | 1.80 | 0.035930 | 0.92814 | 2.80 | 0.0025551 | 0.99489
0.85 | 0.19766 | 0.60467 | 1.85 | 0.032157 | 0.93569 | 2.85 | 0.0021860 | 0.99563
0.90 | 0.18406 | 0.63188 | 1.90 | 0.028717 | 0.94257 | 2.90 | 0.0018658 | 0.99627
0.95 | 0.17106 | 0.65789 | 1.95 | 0.025588 | 0.94882 | 2.95 | 0.0015889 | 0.99682
1.00 | 0.15866 | 0.68269 | 2.00 | 0.022750 | 0.95450 | 3.00 | 0.0013499 | 0.99730
From the row 2.35 0.009387 0.98123, z = 2.35
n = 56
e = 36
p = 36/56 ≈ 0.64286
sdp = sqrt(p*(1-p)/n) = sqrt((36/56)*(20/56)/56) ≈ 0.064030
lowCI = p − z*sdp = 0.64286 − 2.35*0.064030 ≈ 0.49239
highCI = p + z*sdp = 0.64286 + 2.35*0.064030 ≈ 0.79333
Report the interval as [.492, .793].
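The event count and the reported interval can be reproduced from the listed gestational ages. This sketch is an added check, not part of the original key; it counts the infants with 37 to 40 weeks of gestation and rebuilds the interval with z = 2.35.

```python
import math

weeks = [25, 26, 27, 29, 30, 32, 33, 34, 34, 35, 35, 36,
         37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
         38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
         39, 39, 39,
         40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
         41, 41, 41, 42, 42, 42, 43, 43]

n = len(weeks)
premature = sum(1 for w in weeks if w <= 36)       # 36 or fewer weeks
full_term = sum(1 for w in weeks if 37 <= w <= 40)  # the event of interest
post_term = sum(1 for w in weeks if w >= 41)        # 41 or more weeks

e = full_term
p = e / n
sdp = math.sqrt(p * (1 - p) / n)
z = 2.35
low_ci, high_ci = p - z * sdp, p + z * sdp
print(f"premature = {premature}, full term = {full_term}, post term = {post_term}")
print(f"n = {n}, p = {p:.5f}, sdp = {sdp:.6f}")
print(f"interval = [{low_ci:.3f}, {high_ci:.3f}]")
```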
Interpretation
Our population is the population of year 2005 US resident live born infants and our population proportion is the proportion of these infants born full term. Our event is that the live born infant was born with between 37 and 40 weeks of gestation.
Our Family of Samples (FoS) consists of every possible random sample of 56 year 2005 US resident live born infants. From each individual sampled live born infant, gestational age in weeks is obtained.
From each member sample of the FoS, we compute the sample proportion p of infants in the sample with between 37 and 40 weeks of gestation at birth and sdp, where sdp = sqrt(p*(1-p)/56), and then compute the interval [p – 2.35*sdp, p + 2.35*sdp].
Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 98% of which cover the true population proportion of year 2005 US resident live born infants with between 37 and 40 weeks of gestation.
If our interval, [.492, .793], is among the approximately 98% super-majority of intervals that cover the population proportion, then between 49.2% and 79.3% of year 2005 US resident live born infants have gestational ages between 37 and 40 weeks.