Session 3.2, Summaries

29th March 2010

Confidence Estimation of the Population Mean

In theory, we can compute the population mean face value of a fair six-sided die (d6) with face values 1, 2, 3, 4, 5, 6 as

M = 1*Pr{d6 shows 1}+2*Pr{d6 shows 2}+3*Pr{d6 shows 3}+4*Pr{d6 shows 4}+5*Pr{d6 shows 5}+6*Pr{d6 shows 6}

M = 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6) = 21/6 = 3.5
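The same weighted sum can be checked exactly in Python with the standard `fractions` module, so that 1/6 is never rounded. A minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Face values of a fair six-sided die; each has Pr{d6 shows k} = 1/6.
faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

# Population mean M = sum over faces of (face value * probability).
M = sum(face * prob for face in faces)
print(M, float(M))  # 7/2 3.5
```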

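Our 95% confidence interval estimation process should produce intervals containing this population mean in approximately 95% of samples. Here is an example from Summer 2009: 22 samples of n = 50 rolls each. Columns 1 to 6 count how often each face appeared, se = sd/sqrt(n), and each interval runs from lower = m − 1.96*se to upper = m + 1.96*se. The Perfect50 row shows an idealized sample in which every face appears equally often.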

Sample	1	2	3	4	5	6	n	m	sd	se	lower	upper	M	Result
Perfect50	8.333	8.333	8.333	8.333	8.333	8.333	50	3.5	1.725	0.244	3.022	3.978	3.5
1	11	5	7	7	10	10	50	3.6	1.852	0.262	3.087	4.113	3.5	Hit
2	7	5	13	10	11	4	50	3.5	1.502	0.212	3.084	3.916	3.5	Hit
3	11	5	7	13	7	7	50	3.42	1.715	0.243	2.945	3.895	3.5	Hit
4	7	9	7	15	8	4	50	3.4	1.512	0.214	2.981	3.819	3.5	Hit
5	11	6	10	9	8	6	50	3.3	1.693	0.239	2.831	3.769	3.5	Hit
6	7	3	7	13	10	10	50	3.92	1.639	0.232	3.466	4.374	3.5	Hit
7	8	5	9	10	10	8	50	3.66	1.673	0.237	3.196	4.124	3.5	Hit
8	6	8	9	10	6	11	50	3.7	1.693	0.239	3.231	4.169	3.5	Hit
9	14	10	7	9	3	7	50	2.96	1.749	0.247	2.475	3.445	3.5	Miss
10	9	7	12	9	9	4	50	3.28	1.565	0.221	2.846	3.714	3.5	Hit
11	9	10	4	3	11	13	50	3.72	1.938	0.274	3.183	4.257	3.5	Hit
12	7	7	7	10	8	11	50	3.76	1.733	0.245	3.28	4.24	3.5	Hit
13	10	11	8	8	5	8	50	3.22	1.741	0.246	2.737	3.703	3.5	Hit
14	9	6	11	11	6	7	50	3.4	1.641	0.232	2.945	3.855	3.5	Hit
15	7	5	8	7	12	11	50	3.9	1.729	0.245	3.421	4.379	3.5	Hit
16	10	12	7	12	3	6	50	3.08	1.627	0.23	2.629	3.531	3.5	Hit
17	5	9	6	7	14	9	50	3.86	1.666	0.236	3.398	4.322	3.5	Hit
18	9	5	8	9	10	9	50	3.66	1.745	0.247	3.176	4.144	3.5	Hit
19	9	9	9	5	9	9	50	3.46	1.787	0.253	2.965	3.955	3.5	Hit
20	7	8	7	8	12	8	50	3.68	1.696	0.24	3.21	4.15	3.5	Hit
21	7	10	6	10	9	8	50	3.56	1.692	0.239	3.091	4.029	3.5	Hit
22	3	12	15	8	3	9	50	3.46	1.528	0.216	3.036	3.884	3.5	Hit
Expected Success Rate	95%	about 20.9 of 22 intervals (20 or 21)		Sample Success Rate	21/22 ≈ 0.955
Expected Failure Rate	5%	about 1.1 of 22 intervals (1 or 2)		Sample Failure Rate	1/22 ≈ 0.045
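To see where a row of the table comes from, the sketch below recomputes samples 1 and 9 from their face counts, using the sample standard deviation (divisor n − 1) and the multiplier 1.96, and labels each interval Hit or Miss against M = 3.5. The helper `row_stats` is a hypothetical name introduced here for illustration.

```python
import math

M = 3.5   # population mean of a fair d6
Z = 1.96  # multiplier used in the table

def row_stats(counts, z=Z):
    """Given counts of faces 1..6, return (n, m, sd, se, lower, upper)."""
    faces = range(1, 7)
    n = sum(counts)
    m = sum(f * c for f, c in zip(faces, counts)) / n
    # Sample standard deviation with divisor n - 1, as a calculator reports it.
    ss = sum(c * (f - m) ** 2 for f, c in zip(faces, counts))
    sd = math.sqrt(ss / (n - 1))
    se = sd / math.sqrt(n)
    return n, m, sd, se, m - z * se, m + z * se

for label, counts in [("1", [11, 5, 7, 7, 10, 10]), ("9", [14, 10, 7, 9, 3, 7])]:
    n, m, sd, se, lower, upper = row_stats(counts)
    verdict = "Hit" if lower <= M <= upper else "Miss"
    print(label, n, round(m, 3), round(sd, 3), round(se, 3),
          round(lower, 3), round(upper, 3), verdict)

# Observed success rate in the table: 21 hits out of 22 samples.
print(round(21 / 22, 3))  # 0.955
```

Sample 9 is the single Miss: its upper bound falls just below 3.5.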

Confidence Interval

General Mean

Diseased Monkeys

Objective: Be able to perform interval estimation of the population mean using the confidence interval method. Be able to fully discuss the confidence interval. This discussion must include a clear description of the population and the population mean, the family of samples, the family of intervals and how the confidence applies to the family of intervals.

A random sample of Lab Monkeys is infected with the agent that causes Disease X. The time (in hours) from infection to the appearance of symptoms of Disease X is measured for each monkey. The sample of monkeys yields the following times (in hours):

12, 26, 36, 38, 40, 42, 44, 48, 52, 62,

13, 27, 37, 38, 41, 42, 44, 49, 55, 65,

15, 30, 37, 39, 41, 44, 46, 50, 56, 70,

16, 32, 38, 40, 42, 44, 48, 50, 58, 72,

18, 35, 40, 41, 42, 45, 48, 52, 58, 75

Follow the steps:

Edit the data into your calculator, and compute the following statistics: sample size, sample mean, sample standard deviation.

N M SD Z LOBOUND HIBOUND

50 42.66 14.0968 1.96 38.7526 46.5674
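If a calculator is not at hand, the same three statistics can be computed with Python's standard `statistics` module. `statistics.stdev` uses the n − 1 divisor, which matches the calculator's sample standard deviation. A minimal sketch:

```python
import statistics

# Hours from infection to symptoms for the n = 50 sampled Lab Monkeys.
times = [
    12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
    13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
    15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
    16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
    18, 35, 40, 41, 42, 45, 48, 52, 58, 75,
]

n = len(times)
m = statistics.mean(times)
sd = statistics.stdev(times)  # sample SD, divisor n - 1
print(n, m, round(sd, 4))  # 50 42.66 14.0968
```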

Identify the Population Mean for this Sample.

We seek the population mean time to symptoms (in hours) of Disease X among the population of Lab Monkeys.

Consult the Normal Table, and determine the SD Multiplier required to ensure 95% Confidence. Justify the approach.

Since we need approximately 95% confidence, we need a multiplier somewhat larger than 1.95 (PROBCENT 0.94882), but 2.00 (PROBCENT 0.95450) is more than we need. In practice, the number we need is 1.96, which is used in the computations below; if you work only from your table, use 2.00. Here are the rows from the table:

Z(k)	PROBRT	PROBCENT
1.95	0.025588	0.94882
2.00	0.022750	0.95450

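The cost of this approach is that it requires large random samples; n > 30 will usually suffice. With a large random sample, the sample mean is approximately normally distributed, which is what justifies using the Normal Table.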
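Rather than reading between table rows, the exact multiplier can be obtained from the inverse normal CDF. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` provides `cdf` and `inv_cdf`; `probcent` is a hypothetical helper that reproduces the table's PROBCENT column.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def probcent(z):
    """Central area P(-z < Z < z), the PROBCENT column of the Normal Table."""
    return 2 * std_normal.cdf(z) - 1

# Exact multiplier for 95% confidence: leave 2.5% in each tail.
z95 = std_normal.inv_cdf(0.975)
print(round(z95, 2))             # 1.96
print(round(probcent(1.95), 5))  # 0.94882
print(round(probcent(2.00), 5))  # 0.9545
```

The two table rows bracket 95%, and the inverse CDF lands between them at about 1.96.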

Compute a 95% Confidence Interval for the true but unknown population mean in this problem.

Compute

LOBOUND = M − Z*(SD/sqrt(N)) = 42.66 − 1.96*(14.0968/sqrt(50)) ≈ 38.7526

and

HIBOUND = M + Z*(SD/sqrt(N)) = 42.66 + 1.96*(14.0968/sqrt(50)) ≈ 46.5674.

Report the approximate interval as [38.8, 46.6].
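The interval arithmetic can be sketched in Python. `mean_ci` is a hypothetical helper name; the inputs are the statistics computed above, with z = 1.96 as used in this computation.

```python
import math

def mean_ci(m, sd, n, z):
    """Large-sample interval m ± z*(sd/sqrt(n)) for a population mean."""
    se = sd / math.sqrt(n)
    return m - z * se, m + z * se

low, high = mean_ci(m=42.66, sd=14.0968, n=50, z=1.96)
print(round(low, 4), round(high, 4))  # 38.7526 46.5674
print(f"[{low:.1f}, {high:.1f}]")     # [38.8, 46.6]
```

Passing z=2.00 (the table value) instead widens the interval slightly.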

Discuss the Family of Intervals for this problem.

Each member of the Family of Samples (FoS) is a single random sample of n = 50 Lab Monkeys, and the FoS consists of every possible such sample. Each member of the FoS yields the statistics n (sample size), m (sample mean) and sd (sample standard deviation). For this FoS, n = 50 for every member sample, but m and sd vary from member to member. Each member of the FoS yields an interval of the form:

[m − 1.96*(sd/sqrt(n)), m + 1.96*(sd/sqrt(n))].

These intervals collectively form a Family of Intervals (FoI); each member of the FoI is an interval derived from a member of the FoS. Approximately 95% of these intervals contain the true population mean time (in hours) to symptoms of Disease X in Lab Monkeys, and approximately 5% do not.
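The true mean time to symptoms in Lab Monkeys is unknown, so the 95% claim cannot be checked against this population directly. As a stand-in, the sketch below reuses the fair d6 from the start of this session (known M = 3.5): it draws many members of a Family of Samples of size n = 50, builds the interval m ± 1.96*(sd/sqrt(n)) from each, and reports the fraction that capture M. The function name `coverage`, the number of repetitions and the seed are illustrative assumptions, not part of the course method.

```python
import math
import random

def coverage(reps=10000, n=50, z=1.96, seed=1):
    """Fraction of simulated intervals m ± z*(sd/sqrt(n)) that contain M."""
    rng = random.Random(seed)
    M = 3.5  # known population mean of a fair d6 (stand-in population)
    hits = 0
    for _ in range(reps):
        # One member of the Family of Samples: n rolls of a fair d6.
        sample = [rng.randint(1, 6) for _ in range(n)]
        m = sum(sample) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        half_width = z * sd / math.sqrt(n)
        # The matching member of the Family of Intervals: does it capture M?
        if m - half_width <= M <= m + half_width:
            hits += 1
    return hits / reps

print(coverage())  # roughly 0.95; the exact value depends on the seed
```

Changing z to 1.65 should bring the fraction down to roughly 90%, matching the 90% intervals used for the spiders.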

Interpret the Single Confidence Interval for this problem.

If our interval captures the true population mean, then the mean time to symptoms of Disease X in Lab Monkeys is between 38.8 and 46.6 hours.

Confidence Interval

General Mean

Generic Fictitious Spiders

We have a random sample of Generic Fictitious Spiders. For each spider we record its diameter (maximum length, in cm, from leg tip to leg tip).

The spider diameters are listed below:

16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4,

24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7,

32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4,

38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4

Follow the steps:

Edit the data into your calculator, and compute the following statistics: sample size, sample mean, sample standard deviation.

Statistic	Value	Comment
Sample Size	n=40	There are n=40 spiders in the sample.
Sample Mean	m=32.4	The average diameter of spiders in the sample is 32.4 cm.
Sample SD	sd=8.09	The sample standard deviation of the diameters is 8.09 cm.

Identify the Population Mean for this Sample.

We seek the population mean maximum diameter (in cm) of Generic Fictitious Spiders.

Consult the Normal Table, and determine the SD Multiplier required to ensure 90% Confidence. Justify the approach.

We want z(k) = 1.65 from the table (PROBCENT 0.90106), as discussed in class.

We are working with large (n ≥ 30) random samples and with sample means, so the Normal Table applies.

Compute a 90% Confidence Interval for the true but unknown population mean in this problem.

m − 1.65*(sd/sqrt(n)) = 32.4 − 1.65*(8.09/sqrt(40)) ≈ 30.29;

m + 1.65*(sd/sqrt(n)) = 32.4 + 1.65*(8.09/sqrt(40)) ≈ 34.51.

Write the interval as [30.29, 34.51].
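The whole computation can also be sketched from the raw diameters, with the missing commas in the data listing restored. This assumes Python's standard `statistics` module (sample SD with divisor n − 1) and the table multiplier 1.65.

```python
import math
import statistics

# Maximum diameters (cm) of the n = 40 sampled spiders.
diameters = [
    16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4,
    24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7,
    32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4,
    38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4,
]

z = 1.65  # table multiplier for 90% confidence
n = len(diameters)
m = statistics.mean(diameters)
sd = statistics.stdev(diameters)
half_width = z * sd / math.sqrt(n)
print(n, round(m, 3), round(sd, 2))                            # 40 32.405 8.09
print(round(m - half_width, 2), round(m + half_width, 2))      # 30.29 34.52
```

With the unrounded statistics the upper bound comes out near 34.52 rather than 34.51; the difference comes only from rounding m to 32.4 and sd to 8.09 before computing.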

Discuss the Family of Intervals for this problem.

Each member of our Family is a random sample of 40 spiders from our population. The Family of Samples consists of every possible sample of this type.

From each member, compute the interval

m ± 1.65*(sd/sqrt(40)),

where m is the sample mean and sd is the sample standard deviation for the sample.

The Family of Intervals consists of all such intervals, each obtained in this way from a member of the Family of Samples.

Approximately 90% of the intervals in the Family of Intervals capture the population mean maximum spider diameter; the remaining intervals (approximately 10%) do not.

Interpret the Single Confidence Interval for this problem.

If our interval contains the population mean, then the true population mean maximum diameter for Generic Fictitious Spiders is between 30.29 and 34.51 centimeters.

From here: http://www.mindspring.com/~cjalverson/3rdhourlyfall2008versionA_key.htm

Case One | Confidence Interval, Mean | Glioblastoma Multiforme

Glioblastoma multiforme (GBM) is the highest grade glioma tumor and is the most malignant form of astrocytomas. These tumors originate in the brain. GBM tumors grow rapidly, invade nearby tissue and contain cells that are very malignant. GBM tumors are among the most common and devastating primary brain tumors in adults.

Suppose that we have a random sample of GBM patients, with survival time (in weeks) listed below:

3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 37, 38, 45, 48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66, 67, 68, 69, 72, 72, 73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83, 85, 86, 87, 90, 150, 180

Estimate the population mean survival time for Glioblastoma multiforme patients with 99% confidence. That is, compute and discuss a 99% confidence interval for this population mean. Provide concise and complete details and discussion as demonstrated in the case study summaries.

Table 1. Means and Proportions

PROBRT is the right-tail area P(Z > z(k)); PROBCENT is the central area P(−z(k) < Z < z(k)) = 1 − 2*PROBRT.

Z(k)	PROBRT	PROBCENT
0.05	0.48006	0.03988
0.10	0.46017	0.07966
0.15	0.44038	0.11924
0.20	0.42074	0.15852
0.25	0.40129	0.19741
0.30	0.38209	0.23582
0.35	0.36317	0.27366
0.40	0.34458	0.31084
0.45	0.32636	0.34729
0.50	0.30854	0.38292
0.55	0.29116	0.41768
0.60	0.27425	0.45149
0.65	0.25785	0.48431
0.70	0.24196	0.51607
0.75	0.22663	0.54675
0.80	0.21186	0.57629
0.85	0.19766	0.60467
0.90	0.18406	0.63188
0.95	0.17106	0.65789
1.00	0.15866	0.68269
1.05	0.14686	0.70628
1.10	0.13567	0.72867
1.15	0.12507	0.74986
1.20	0.11507	0.76986
1.25	0.10565	0.78870
1.30	0.09680	0.80640
1.35	0.088508	0.82298
1.40	0.080757	0.83849
1.45	0.073529	0.85294
1.50	0.066807	0.86639
1.55	0.060571	0.87886
1.60	0.054799	0.89040
1.65	0.049471	0.90106
1.70	0.044565	0.91087
1.75	0.040059	0.91988
1.80	0.035930	0.92814
1.85	0.032157	0.93569
1.90	0.028717	0.94257
1.95	0.025588	0.94882
2.00	0.022750	0.95450
2.05	0.020182	0.95964
2.10	0.017864	0.96427
2.15	0.015778	0.96844
2.20	0.013903	0.97219
2.25	0.012224	0.97555
2.30	0.010724	0.97855
2.35	0.009387	0.98123
2.40	0.008198	0.98360
2.45	0.007143	0.98571
2.50	0.006210	0.98758
2.55	0.005386	0.98923
2.60	0.004661	0.99068
2.65	0.004025	0.99195
2.70	0.0034670	0.99307
2.75	0.0029798	0.99404
2.80	0.0025551	0.99489
2.85	0.0021860	0.99563
2.90	0.0018658	0.99627
2.95	0.0015889	0.99682
3.00	0.0013499	0.99730
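Each row of Table 1 can be regenerated from the standard normal CDF, which is a useful check when a row is hard to read. A sketch assuming Python 3.8 or later (`statistics.NormalDist`); `table_row` is a hypothetical helper. Rounded to the table's precision, the printed rows for 1.65, 2.00 and 2.60 should match the rows above; 1.96 is included to show why it is the exact 95% multiplier.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def table_row(z):
    """Return (z, PROBRT, PROBCENT) for a standard normal multiplier z."""
    probrt = 1 - std_normal.cdf(z)  # right-tail area P(Z > z)
    probcent = 1 - 2 * probrt       # central area P(-z < Z < z)
    return z, probrt, probcent

for z in (1.65, 1.96, 2.00, 2.60):
    z, probrt, probcent = table_row(z)
    print(f"{z:.2f}  {probrt:.6f}  {probcent:.5f}")
```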

Numbers

n m sd se Z lower upper

58 56.91 33.12 4.35 2.60 45.61 68.22

se = sd/sqrt(n) ≈ 33.12/sqrt(58) ≈ 4.35

z ≈ 2.60 for 99% confidence, from the table row 2.60 0.004661 0.99068 (the smallest tabled z(k) with PROBCENT of at least 0.99)

lower = m − (z*se) ≈ 56.91 − (2.60*4.35) ≈ 45.61

upper = m + (z*se) ≈ 56.91 + (2.60*4.35) ≈ 68.22

Report the interval as [45.6, 68.2].
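A sketch of the same computation from the raw survival times, assuming Python's standard `statistics` module (sample SD with divisor n − 1) and the table multiplier 2.60.

```python
import math
import statistics

# Survival times (weeks) of the n = 58 sampled GBM patients.
weeks = [3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 37,
         38, 45, 48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66,
         67, 68, 69, 72, 72, 73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83,
         85, 86, 87, 90, 150, 180]

z = 2.60  # table multiplier: PROBCENT 0.99068 >= 0.99
n = len(weeks)
m = statistics.mean(weeks)
sd = statistics.stdev(weeks)
se = sd / math.sqrt(n)
lower, upper = m - z * se, m + z * se
print(n, round(m, 2), round(sd, 2), round(se, 2))  # 58 56.91 33.12 4.35
print(round(lower, 2), round(upper, 2))            # 45.61 68.22
```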

Interpretation

Our population is the population of Glioblastoma multiforme patients and our population mean is the mean survival time (weeks).

Our Family of Samples (FoS) consists of every possible random sample of 58 Glioblastoma multiforme patients. From each sampled Glioblastoma multiforme patient, survival time in weeks is obtained.

From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for GBM survival time, and then compute the interval

[m − 2.60*(sd/sqrt(n)), m + 2.60*(sd/sqrt(n))].

Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 99% of which cover the true population mean survival time in weeks for Glioblastoma multiforme patients.

If our interval, [45.6, 68.2], is among the approximately 99% super-majority of intervals that cover the population mean, then the true population mean survival time for Glioblastoma multiforme patients is between 45.6 and 68.2 weeks.

From here: http://www.mindspring.com/~cjalverson/CompFinalSpring2008verMondayKey.htm

Case Four | Confidence Interval for Mean | Gestational Age

Consider the population mean gestational age (in weeks) at birth of Year 2005 US Resident Live Births. Using the data from Case Two, compute and interpret a 93% confidence interval for this population mean.

Numbers

From the table row 1.85 0.032157 0.93569 (the smallest tabled z(k) with PROBCENT of at least 0.93), z = 1.85.

n = 56

m ≈ 37.34

sd ≈ 3.9278

lowCI = m − z*(sd/sqrt(n)) ≈ 37.34 − 1.85*(3.9278/sqrt(56)) ≈ 36.37

highCI = m + z*(sd/sqrt(n)) ≈ 37.34 + 1.85*(3.9278/sqrt(56)) ≈ 38.31

Report the interval as [36.4, 38.3].
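The table lookup used throughout these summaries (take the smallest tabled z(k) whose PROBCENT is at least the desired confidence) can be sketched as a small function. This assumes the 0.05-step layout of Table 1 and recomputes PROBCENT with Python's `statistics.NormalDist` (Python 3.8 or later); `table_z` is a hypothetical helper, and the summary statistics below are the ones reported above for the Case Two data, which are not reproduced here.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def table_z(confidence):
    """Smallest Normal Table multiplier (0.05, 0.10, ..., 3.00) whose
    central area PROBCENT = P(-z < Z < z) is at least `confidence`."""
    for k in range(1, 61):
        z = round(0.05 * k, 2)
        if 2 * std_normal.cdf(z) - 1 >= confidence:
            return z
    raise ValueError("confidence too high for this table")

z = table_z(0.93)             # 1.85, as read from the table
n, m, sd = 56, 37.34, 3.9278  # summary statistics reported above
half_width = z * sd / math.sqrt(n)
print(z, round(m - half_width, 2), round(m + half_width, 2))  # 1.85 36.37 38.31
```

The same rule gives table_z(0.99) = 2.6 and table_z(0.90) = 1.65, the multipliers used in the GBM and spider cases.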

Interpretation

Our population is the population of year 2005 US resident live born infants and our population mean is the mean gestational age (weeks).

Our Family of Samples (FoS) consists of every possible random sample of 56 year 2005 US resident live born infants. From each individual sampled live born infant, gestational age in weeks is obtained.

From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for gestational age, and then compute the interval

[m − 1.85*(sd/sqrt(n)), m + 1.85*(sd/sqrt(n))].

Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 93% of which cover the true population mean gestational age in weeks for year 2005 US resident live born infants.

If our interval, [36.4, 38.3], is among the approximately 93% super-majority of intervals that cover the population mean, then the true population mean gestational age is between 36.4 and 38.3 weeks for year 2005 US resident live born infants.