Session 3.3

2nd November 2009

 

Confidence Interval – Population Proportion

 

Population Level

 

E = Event = Definition of Event of Interest

P = Pr{E} = True Probability for Event E

 

Sample Level

 

e = Number of Observed Events in Sample

n = Number of Observations / Sample Size / Number of Trials

p = e/n = Proportion of Total Sample Observing Event E

sdp = sqrt(p*(1 – p)/n) = Sample Standard Error for the Proportion

Z = Confidence Coefficient

Confidence Interval is given as [ (p – Z*sdp), (p + Z*sdp) ]
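
As a minimal sketch of the recipe above (not part of the original notes), the sample-level quantities can be computed in a few lines of Python. The function name `proportion_interval` is an illustrative choice; the formulas are exactly p = e/n, sdp = sqrt(p*(1 – p)/n) and [p – Z*sdp, p + Z*sdp].

```python
import math

def proportion_interval(e, n, z):
    """Return (p, sdp, lower, upper) for e observed events in n trials."""
    if n <= 0 or not 0 <= e <= n:
        raise ValueError("need n > 0 and 0 <= e <= n")
    p = e / n                          # sample proportion
    sdp = math.sqrt(p * (1 - p) / n)   # sample standard error for the proportion
    return p, sdp, p - z * sdp, p + z * sdp

# Example: 11 events in 50 trials with confidence coefficient Z = 2
p, sdp, lower, upper = proportion_interval(e=11, n=50, z=2)
print(f"p={p:.2f} sdp={sdp:.7f} interval=[{lower:.6f}, {upper:.6f}]")
```

With e = 11, n = 50 and Z = 2 this gives p = 0.22, sdp ≈ 0.0585833 and an interval of roughly [0.1028, 0.3372].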

 

Validation of the Confidence Interval Process – Population Proportion (from Spring 2009)

 

Track the event “Sum = 7” in n=50 tosses of a pair of fair, six-sided dice (face values 1,2,3,4,5,6 per die).

We know that P = Pr{Sum = 7} = 6/36 = 1/6 ≈ 0.1667, so we can check our intervals for accuracy. In 26 intervals, we have one failure and 25 successes, yielding a failure rate of approximately 3.8% (versus expected 5%).

 

Event Count (Sum=7)   p=e/50   sdp=sqrt(p*(1-p)/50)   lower95=p-2*sdp   upper95=p+2*sdp   P=Pr{Sum=7}      Result (Contains P=1/6?)
11                    0.22     0.0585833              0.102833452       0.337166548       (1/6) ≈ 0.1667   Hit
10                    0.2      0.0565685              0.086862915       0.313137085       (1/6) ≈ 0.1667   Hit
9                     0.18     0.0543323              0.071335378       0.288664622       (1/6) ≈ 0.1667   Hit
8                     0.16     0.0518459              0.056308149       0.263691851       (1/6) ≈ 0.1667   Hit
4                     0.08     0.0383667              0.003266696       0.156733304       (1/6) ≈ 0.1667   Miss
13                    0.26     0.0620322              0.135935501       0.384064499       (1/6) ≈ 0.1667   Hit
9                     0.18     0.0543323              0.071335378       0.288664622       (1/6) ≈ 0.1667   Hit
7                     0.14     0.0490714              0.041857247       0.238142753       (1/6) ≈ 0.1667   Hit
7                     0.14     0.0490714              0.041857247       0.238142753       (1/6) ≈ 0.1667   Hit
6                     0.12     0.0459565              0.028086998       0.211913002       (1/6) ≈ 0.1667   Hit
9                     0.18     0.0543323              0.071335378       0.288664622       (1/6) ≈ 0.1667   Hit
12                    0.24     0.0603987              0.119202649       0.360797351       (1/6) ≈ 0.1667   Hit
7                     0.14     0.0490714              0.041857247       0.238142753       (1/6) ≈ 0.1667   Hit
9                     0.18     0.0543323              0.071335378       0.288664622       (1/6) ≈ 0.1667   Hit
5                     0.1      0.0424264              0.015147186       0.184852814       (1/6) ≈ 0.1667   Hit
7                     0.14     0.0490714              0.041857247       0.238142753       (1/6) ≈ 0.1667   Hit
12                    0.24     0.0603987              0.119202649       0.360797351       (1/6) ≈ 0.1667   Hit
10                    0.2      0.0565685              0.086862915       0.313137085       (1/6) ≈ 0.1667   Hit
10                    0.2      0.0565685              0.086862915       0.313137085       (1/6) ≈ 0.1667   Hit
8                     0.16     0.0518459              0.056308149       0.263691851       (1/6) ≈ 0.1667   Hit
12                    0.24     0.0603987              0.119202649       0.360797351       (1/6) ≈ 0.1667   Hit
13                    0.26     0.0620322              0.135935501       0.384064499       (1/6) ≈ 0.1667   Hit
5                     0.1      0.0424264              0.015147186       0.184852814       (1/6) ≈ 0.1667   Hit
7                     0.14     0.0490714              0.041857247       0.238142753       (1/6) ≈ 0.1667   Hit
10                    0.2      0.0565685              0.086862915       0.313137085       (1/6) ≈ 0.1667   Hit
10                    0.2      0.0565685              0.086862915       0.313137085       (1/6) ≈ 0.1667   Hit
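
The validation exercise above can also be imitated by computer. The sketch below is an illustration under assumptions, not the procedure used in class: it uses Python's pseudo-random generator in place of physical dice, and the function name `simulate_coverage`, the seed, and the choice of 2000 replicates are all hypothetical. Each replicate tosses a pair of dice 50 times, builds the interval [p – 2*sdp, p + 2*sdp], and records whether it contains P = 1/6.

```python
import math
import random

def simulate_coverage(replicates, n=50, z=2, seed=2009):
    """Fraction of simulated intervals [p - z*sdp, p + z*sdp] containing P = 1/6."""
    rng = random.Random(seed)          # seeded only so the run is repeatable
    true_p = 6 / 36                    # Pr{Sum = 7} for a pair of fair dice
    hits = 0
    for _ in range(replicates):
        # e = number of tosses (out of n) on which the two faces sum to 7
        e = sum(rng.randint(1, 6) + rng.randint(1, 6) == 7 for _ in range(n))
        p = e / n
        sdp = math.sqrt(p * (1 - p) / n)
        if p - z * sdp <= true_p <= p + z * sdp:
            hits += 1
    return hits / replicates

coverage = simulate_coverage(replicates=2000)
print(f"hit rate over 2000 simulated intervals: {coverage:.3f}")
```

With many replicates the hit rate should settle in the neighborhood of the nominal level, though not necessarily exactly at it; for n = 50 the approximation is only rough.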

 

From http://www.mindspring.com/~cjalverson/2ndhourlySummer2008Key.htm

Second Hourly, Summer 2008, Version A

The top number is the systolic blood pressure reading. It represents the maximum pressure exerted when the heart contracts.  The bottom number is the diastolic blood pressure reading. It represents the pressure in the arteries when the heart is at rest. A sample of FHS adult subjects yields the following readings:

124/88, 140/90, 156/108, 130/70, 175/75, 136/84, 124/84, 144/88, 128/74, 154/90, 160/92, 210/120, 110/75, 166/108, 100/70, 172/110, 160/90, 145/75, 122/84, 162/80, 156/84, 120/65, 128/84, 130/90, 210/110, 110/68, 160/106, 140/90, 132/72, 120/80, 200/100, 165/105, 132/88, 134/84, 120/75, 138/85, 118/86, 152/74, 138/70, 124/74, 122/80, 155/90, 160/100, 294/144, 140/82, 132/86, 120/80, 200/130, 126/86, 150/100, 135/75, 140/78, 142/85, 146/94, 185/90, 166/78, 190/100, 160/80, 140/80, 120/80,150/95, 124/75, 150/110, 140/84, 130/82, 130/80, 230/124, 128/72, 220/118, 130/80, 165/95, 208/114, 126/80, 140/90, 166/104, 130/70, 130/80, 120/90

Case Two | Confidence Interval: Population Proportion |

Using the data and context from Case One, compute and interpret a 95% confidence interval for the population proportion of Framingham Heart Study subjects with Systolic Blood Pressure strictly greater than 160 mm Hg. Show your work. Fully discuss the results. This discussion must include a clear discussion of the population and the population proportion, the family of samples, the family of intervals and the interpretation of the interval.

Numbers

n    event       p          sdp      z     lower      upper

78      18     0.23077    0.047706    2    0.13536    0.32618

event = number of FHS subjects in the sample with SBP > 160 = 18

p = event/n = 18/78 ≈ 0.23077

sdp = sqrt(p*(1-p)/n) = sqrt((18/78)*(60/78)/78) ≈ 0.047706

Table 1. Means and Proportions

 

Z(k) PROBRT PROBCENT

0.05 0.48006 0.03988

0.10 0.46017 0.07966

0.15 0.44038 0.11924

0.20 0.42074 0.15852

0.25 0.40129 0.19741

0.30 0.38209 0.23582

0.35 0.36317 0.27366

0.40 0.34458 0.31084

0.45 0.32636 0.34729

0.50 0.30854 0.38292

0.55 0.29116 0.41768

0.60 0.27425 0.45149

0.65 0.25785 0.48431

0.70 0.24196 0.51607

0.75 0.22663 0.54675

0.80 0.21186 0.57629

0.85 0.19766 0.60467

0.90 0.18406 0.63188

0.95 0.17106 0.65789

1.00 0.15866 0.68269

Z(k) PROBRT PROBCENT

1.05 0.14686 0.70628

1.10 0.13567 0.72867

1.15 0.12507 0.74986

1.20 0.11507 0.76986

1.25 0.10565 0.78870

1.30 0.09680 0.80640

1.35 0.088508 0.82298

1.40 0.080757 0.83849

1.45 0.073529 0.85294

1.50 0.066807 0.86639

1.55 0.060571 0.87886

1.60 0.054799 0.89040

1.65 0.049471 0.90106

1.70 0.044565 0.91087

1.75 0.040059 0.91988

1.80 0.035930 0.92814

1.85 0.032157 0.93569

1.90 0.028717 0.94257

1.95 0.025588 0.94882

2.00 0.022750 0.95450

Z(k) PROBRT PROBCENT

2.05 0.020182 0.95964

2.10 0.017864 0.96427

2.15 0.015778 0.96844

2.20 0.013903 0.97219

2.25 0.012224 0.97555

2.30 0.010724 0.97855

2.35 0.009387 0.98123

2.40 0.008198 0.98360

2.45 0.007143 0.98571

2.50 0.006210 0.98758

2.55 0.005386 0.98923

2.60 0.004661 0.99068

2.65 0.004025 0.99195

2.70 .0034670 0.99307

2.75 .0029798 0.99404

2.80 .0025551 0.99489

2.85 .0021860 0.99563

2.90 .0018658 0.99627

2.95 .0015889 0.99682

3.00 .0013499 0.99730
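
The table values appear to be standard normal probabilities: PROBRT matches the right-tail probability Pr{Z > Z(k)} and PROBCENT matches the central probability Pr{–Z(k) < Z < Z(k)}. That reading of the column names is an inference from the numbers, not something the table states. Under that assumption, a short Python sketch using the error function reproduces sample rows; the function names `probrt` and `probcent` simply echo the column headers.

```python
import math

def probrt(z):
    """Right-tail probability Pr{Z > z} for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def probcent(z):
    """Central probability Pr{-z < Z < z} for a standard normal Z."""
    return math.erf(z / math.sqrt(2))

# Print a few rows in the same column order as the table: Z(k) PROBRT PROBCENT
for z in (1.00, 2.00, 3.00):
    print(f"{z:.2f} {probrt(z):.6f} {probcent(z):.5f}")
```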

 

 

 

From the row 2.00 0.022750 0.95450, Z ≈ 2.00

lower = p − z*sdp ≈ 0.23077 − 2*0.047706 ≈ 0.13536

upper = p + z*sdp ≈ 0.23077 + 2*0.047706 ≈ 0.32618
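
The Numbers above can be re-derived directly from the listed readings. The following sketch is one way to do it, assuming each reading is written as systolic/diastolic and using Z = 2 as in the table lookup; the variable names are illustrative.

```python
import math

READINGS = """
124/88, 140/90, 156/108, 130/70, 175/75, 136/84, 124/84, 144/88, 128/74, 154/90,
160/92, 210/120, 110/75, 166/108, 100/70, 172/110, 160/90, 145/75, 122/84, 162/80,
156/84, 120/65, 128/84, 130/90, 210/110, 110/68, 160/106, 140/90, 132/72, 120/80,
200/100, 165/105, 132/88, 134/84, 120/75, 138/85, 118/86, 152/74, 138/70, 124/74,
122/80, 155/90, 160/100, 294/144, 140/82, 132/86, 120/80, 200/130, 126/86, 150/100,
135/75, 140/78, 142/85, 146/94, 185/90, 166/78, 190/100, 160/80, 140/80, 120/80,
150/95, 124/75, 150/110, 140/84, 130/82, 130/80, 230/124, 128/72, 220/118, 130/80,
165/95, 208/114, 126/80, 140/90, 166/104, 130/70, 130/80, 120/90
"""

# Keep only the systolic (top) number of each "systolic/diastolic" reading
sbp = [int(r.strip().split("/")[0]) for r in READINGS.split(",")]

n = len(sbp)
event = sum(1 for s in sbp if s > 160)   # strictly greater than 160 mm Hg
p = event / n
sdp = math.sqrt(p * (1 - p) / n)
z = 2                                    # confidence coefficient from the table
lower, upper = p - z * sdp, p + z * sdp

print(n, event, round(p, 5), round(sdp, 6), round(lower, 5), round(upper, 5))
```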

We estimate the population proportion of Framingham Heart Study (FHS) subjects whose systolic blood pressure (SBP) strictly exceeds 160 mm Hg.

Each member of our family of samples is a single random sample of 78 Framingham Heart Study (FHS) subjects, and the family of samples consists of all possible samples of this type.

From each member of the family of samples, we compute event (= number of FHS subjects in the sample with SBP > 160), p (= event/n), sdp (= sqrt(p*(1-p)/n)) and the interval [p − z*sdp, p + z*sdp].

Approximately 95% of the member samples yield intervals containing the true population proportion of FHS subjects whose SBP strictly exceeds 160 mm Hg.

If our interval resides within this super-majority, then between 13.5% and 32.6% of FHS subjects have SBP strictly exceeding 160 mm Hg.

 

From http://www.mindspring.com/~cjalverson/3rd%20Hourly%20Spring%202007%20Version%20A%20Key.htm

 

Third Hourly, Spring 2007, Version A

 

Case Three

Confidence Interval: Population Proportion

Traumatic Brain Injury (TBI) and Glasgow Coma Scale (GCS)

The Glasgow Coma Scale (GCS) is the most widely used system for scoring the level of consciousness of a patient who has had a traumatic brain injury. GCS is based on the patient's best eye-opening, verbal, and motor responses. Each response is scored and then the sum of the three scores is computed. That is, GCS = eye-opening score + verbal score + motor score.

Glasgow Coma Scale Categories: Mild (13-15); Moderate (9-12) and Severe/Coma (3-8)

Traumatic brain injury (TBI) [1, 2] is an insult to the brain from an external mechanical force, possibly leading to permanent or temporary impairments of cognitive, physical, and psychosocial functions with an associated diminished or altered state of consciousness. Consider a random sample of patients with TBI, with GCS at initial treatment and diagnosis listed below:

3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9,

9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15

 

Consider the proportion of TBI patients presenting severe GCS. Compute and interpret a 97% confidence interval for this population proportion. Show your work. Fully discuss the results. This discussion must include a clear discussion of the population and the population proportion, the family of samples, the family of intervals and the interpretation of the interval.

1: http://www2.state.tn.us/health/statistics/PdfFiles/TBI_Rpt_2000-2004.pdf

2: http://www.aemj.org/cgi/content/abstract/10/5/491

Numbers

3, 3, 3, 4, 4 | 4, 4, 4, 4, 5 | 5, 5, 5, 5, 5 | 5, 5, 5, 5, 6 | 6, 6, 6, 6, 6 | 7, 7, 7, 7, 8 | 8, 8, 9, 9, 9 |

9, 9, 9, 9, 9 | 10, 10, 10, 10, 10 | 11, 11, 11, 12, 12 | 13, 13, 13, 14, 14 | 14, 14, 14, 14, 15

n = sample size = 60

Event = “TBI Patient Initially Presents with GCS Severe (3 ≤ GCS ≤ 8).”

e = sample event  count = 32

p = e/n = 32/60  ≈ .5333

sdp = sqrt(p*(1–p)/n) = sqrt((32/60)*(28/60)/60)  ≈ 0.064406

Table 1. Means and Proportions

 

Z(k) PROBRT PROBCENT

0.05 0.48006 0.03988

0.10 0.46017 0.07966

0.15 0.44038 0.11924

0.20 0.42074 0.15852

0.25 0.40129 0.19741

0.30 0.38209 0.23582

0.35 0.36317 0.27366

0.40 0.34458 0.31084

0.45 0.32636 0.34729

0.50 0.30854 0.38292

0.55 0.29116 0.41768

0.60 0.27425 0.45149

0.65 0.25785 0.48431

0.70 0.24196 0.51607

0.75 0.22663 0.54675

0.80 0.21186 0.57629

0.85 0.19766 0.60467

0.90 0.18406 0.63188

0.95 0.17106 0.65789

1.00 0.15866 0.68269

Z(k) PROBRT PROBCENT

1.05 0.14686 0.70628

1.10 0.13567 0.72867

1.15 0.12507 0.74986

1.20 0.11507 0.76986

1.25 0.10565 0.78870

1.30 0.09680 0.80640

1.35 0.088508 0.82298

1.40 0.080757 0.83849

1.45 0.073529 0.85294

1.50 0.066807 0.86639

1.55 0.060571 0.87886

1.60 0.054799 0.89040

1.65 0.049471 0.90106

1.70 0.044565 0.91087

1.75 0.040059 0.91988

1.80 0.035930 0.92814

1.85 0.032157 0.93569

1.90 0.028717 0.94257

1.95 0.025588 0.94882

2.00 0.022750 0.95450

Z(k) PROBRT PROBCENT

2.05 0.020182 0.95964

2.10 0.017864 0.96427

2.15 0.015778 0.96844

2.20 0.013903 0.97219

2.25 0.012224 0.97555

2.30 0.010724 0.97855

2.35 0.009387 0.98123

2.40 0.008198 0.98360

2.45 0.007143 0.98571

2.50 0.006210 0.98758

2.55 0.005386 0.98923

2.60 0.004661 0.99068

2.65 0.004025 0.99195

2.70 .0034670 0.99307

2.75 .0029798 0.99404

2.80 .0025551 0.99489

2.85 .0021860 0.99563

2.90 .0018658 0.99627

2.95 .0015889 0.99682

3.00 .0013499 0.99730
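
Choosing a confidence coefficient from a table like this one amounts to finding the first row whose PROBCENT reaches the desired confidence level. The sketch below assumes that rule (it is consistent with the worked cases, which use Z = 2.00 for 95%, 2.20 for 97% and 2.35 for 98%) and assumes PROBCENT is the standard normal central probability; `table_z` is a hypothetical helper name.

```python
import math

def probcent(z):
    """Central probability Pr{-z < Z < z} for a standard normal Z (assumed meaning)."""
    return math.erf(z / math.sqrt(2))

def table_z(confidence, step=0.05, max_z=3.00):
    """Smallest tabled Z(k) whose central probability reaches `confidence`."""
    steps = round(max_z / step)
    for k in range(1, steps + 1):
        z = round(k * step, 2)         # table rows run 0.05, 0.10, ..., 3.00
        if probcent(z) >= confidence:
            return z
    raise ValueError("confidence level exceeds the range of the table")

for level in (0.95, 0.97, 0.98):
    print(level, table_z(level))
```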

 

Z = 2.20 from the row 2.20 0.013903 0.97219.

Lower Bound = p – 2.2*sdp ≈ .5333 – 2.2*0.064406  ≈ 0.39164

Upper Bound = p + 2.2*sdp ≈ .5333 + 2.2*0.064406  ≈ 0.67503
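
As a check on the Numbers for this case, the sketch below tallies the listed GCS scores into the three categories and recomputes the interval with Z = 2.2. It is an illustrative recomputation only; the helper name `gcs_category` is made up for this example.

```python
import math
from collections import Counter

GCS = [
    3, 3, 3, 4, 4, 4, 4, 4, 4, 5,
    5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
    6, 6, 6, 6, 6, 7, 7, 7, 7, 8,
    8, 8, 9, 9, 9, 9, 9, 9, 9, 9,
    10, 10, 10, 10, 10, 11, 11, 11, 12, 12,
    13, 13, 13, 14, 14, 14, 14, 14, 14, 15,
]

def gcs_category(score):
    """Map a GCS total to Mild (13-15), Moderate (9-12) or Severe (3-8)."""
    if 3 <= score <= 8:
        return "Severe"
    if 9 <= score <= 12:
        return "Moderate"
    if 13 <= score <= 15:
        return "Mild"
    raise ValueError(f"GCS score out of range: {score}")

counts = Counter(gcs_category(s) for s in GCS)
n = len(GCS)
e = counts["Severe"]                   # event: 3 <= GCS <= 8
p = e / n
sdp = math.sqrt(p * (1 - p) / n)
z = 2.2                                # confidence coefficient from the table
lower, upper = p - z * sdp, p + z * sdp

print(dict(counts))
print(n, e, round(p, 4), round(sdp, 6), round(lower, 5), round(upper, 5))
```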

Discussion

Our population is the population of people with Traumatic Brain Injury (TBI).

Our Family of Samples (FoS) consists of every possible random sample of 60 people with TBI.

From each member sample of the FoS, we compute the interval [Lower Bound, Upper Bound] =

[p – 2.2*sdp, p + 2.2*sdp], where sdp = sqrt(p*(1–p)/n). Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 97% of which cover the true population proportion of TBI cases with initially severe TBI.

If our interval, [.3916, .6750] is among the approximate 97% super-majority of intervals that cover the population proportion, then between 39.2% and 67.5% of TBI cases initially present with severe (3 ≤ GCS ≤ 8) TBI.

 

From http://www.mindspring.com/~cjalverson/CompFinalSpring2008verWednesdayKey.htm

Final Examination, Spring 2008, Version Wednesday

Case Three | Confidence Interval for Proportion | Gestational Age

Consider the proportion of Year 2005 US Resident Live Births that are “Full Term,” that is, births with [37,40] weeks of gestation at birth. Using the data from Case Two, compute and interpret a 98% confidence interval for this population proportion.

Gestational age is the time spent between conception and birth, usually measured in weeks. In general, infants born after 36 or fewer weeks of gestation are defined as premature, and may face significant challenges in health and development. Infants born after 37-40 weeks of gestation are generally viewed as full term, and those born after 41 or more weeks of gestation are generally viewed as post term. Suppose that a random sample of 2005 US resident live born infants yields the following gestational ages (in weeks):

 

25, 26, 27, 29, 30, 32, 33, 34, 34, 35, 35, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 42, 42, 42, 43, 43

 

Numbers

 Table 1. Means and Proportions

 

Z(k) PROBRT PROBCENT

0.05 0.48006 0.03988

0.10 0.46017 0.07966

0.15 0.44038 0.11924

0.20 0.42074 0.15852

0.25 0.40129 0.19741

0.30 0.38209 0.23582

0.35 0.36317 0.27366

0.40 0.34458 0.31084

0.45 0.32636 0.34729

0.50 0.30854 0.38292

0.55 0.29116 0.41768

0.60 0.27425 0.45149

0.65 0.25785 0.48431

0.70 0.24196 0.51607

0.75 0.22663 0.54675

0.80 0.21186 0.57629

0.85 0.19766 0.60467

0.90 0.18406 0.63188

0.95 0.17106 0.65789

1.00 0.15866 0.68269

Z(k) PROBRT PROBCENT

1.05 0.14686 0.70628

1.10 0.13567 0.72867

1.15 0.12507 0.74986

1.20 0.11507 0.76986

1.25 0.10565 0.78870

1.30 0.09680 0.80640

1.35 0.088508 0.82298

1.40 0.080757 0.83849

1.45 0.073529 0.85294

1.50 0.066807 0.86639

1.55 0.060571 0.87886

1.60 0.054799 0.89040

1.65 0.049471 0.90106

1.70 0.044565 0.91087

1.75 0.040059 0.91988

1.80 0.035930 0.92814

1.85 0.032157 0.93569

1.90 0.028717 0.94257

1.95 0.025588 0.94882

2.00 0.022750 0.95450

Z(k) PROBRT PROBCENT

2.05 0.020182 0.95964

2.10 0.017864 0.96427

2.15 0.015778 0.96844

2.20 0.013903 0.97219

2.25 0.012224 0.97555

2.30 0.010724 0.97855

2.35 0.009387 0.98123

2.40 0.008198 0.98360

2.45 0.007143 0.98571

2.50 0.006210 0.98758

2.55 0.005386 0.98923

2.60 0.004661 0.99068

2.65 0.004025 0.99195

2.70 .0034670 0.99307

2.75 .0029798 0.99404

2.80 .0025551 0.99489

2.85 .0021860 0.99563

2.90 .0018658 0.99627

2.95 .0015889 0.99682

3.00 .0013499 0.99730

 

From 2.35 0.009387 0.98123, z=2.35

n = 56

e = 36

p = 36/56 ≈ 0.64286

sdp = sqrt(p*(1-p)/n) = sqrt((36/56)*(20/56)/56) ≈ 0.064030

lowCI = p − z*sdp = 0.64286 − 2.35*0.064030 ≈ 0.49239

highCI = p + z*sdp = 0.64286 + 2.35*0.064030 ≈ 0.79333

Report the interval as [.492, .793 ].
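
The same arithmetic can be checked from the listed gestational ages. In this sketch the sample is rebuilt from the counts of each value in the list (for example, eleven 37s and twelve 38s), which is a convenience of the example rather than how the data were supplied, and Z = 2.35 is taken from the table.

```python
import math

# Gestational ages in weeks, rebuilt from the value counts in the listed sample
weeks = (
    [25, 26, 27, 29, 30, 32, 33, 34, 34, 35, 35, 36]
    + [37] * 11
    + [38] * 12
    + [39] * 3
    + [40] * 10
    + [41] * 3
    + [42] * 3
    + [43] * 2
)

n = len(weeks)
e = sum(1 for w in weeks if 37 <= w <= 40)   # event: full term, [37, 40] weeks
p = e / n
sdp = math.sqrt(p * (1 - p) / n)
z = 2.35                                     # confidence coefficient from the table
low_ci, high_ci = p - z * sdp, p + z * sdp

print(n, e, round(p, 5), round(sdp, 6), round(low_ci, 3), round(high_ci, 3))
```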

 

Interpretation

Our population is the population of year 2005 US resident live born infants and our population proportion is the proportion of these infants born full term. Our event is that the live born infant was born with between 37 and 40 weeks of gestation.

Our Family of Samples (FoS) consists of every possible random sample of 56 year 2005 US resident live born infants. From each individual sampled live born infant, gestational age in weeks is obtained.

From each member sample of the FoS, we compute the sample proportion p of infants in the sample with between 37 and 40 weeks of gestation at birth and sdp, where sdp=sqrt(p*(1-p)/56), and then compute the interval

[p – 2.35*sdp, p + 2.35*sdp].

Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 98% of which cover the true population proportion of year 2005 US resident live born infants with between 37 and 40 weeks of gestation.

If our interval, [.492, .793] is among the approximate 98% super-majority of intervals that cover the population proportion, then between 49.2% and 79.3% of year 2005 US resident live born infants have gestational ages between 37 and 40 weeks.