Session 3.2

27th October 2010

Descriptive Summary Intervals

Links

http://www.pages.drexel.edu/~tpm23/Stat201Spr04/EmpiricalTchebysheff.pdf

http://knowledgerush.com/kr/encyclopedia/Tchebysheff's_theorem/

http://faculty.roosevelt.edu/currano/M347/Lectures/3.11.Example.pdf

http://www.mathstat.carleton.ca/~lhaque/2507-chap2a.pdf

http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section4/roughcuts/roughcuts.htm

From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2008verB_key.htm:

Case Four | Summary Intervals | Fictitious Striped Lizard

The Fictitious Striped Lizard is a native species of Lizard Island, and is noteworthy for both the quantity and quality of its stripes. Consider a random sample of Fictitious Striped Lizards, in which the number of stripes per lizard is noted:

1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 17, 17, 18, 21, 21, 21, 22, 24, 24, 24, 25, 25, 27

Let m denote the sample mean, and sd the sample standard deviation. Compute and interpret the intervals m±2sd and m±3sd, using Tchebysheff’s Inequalities and the Empirical Rule. Be specific and complete. Show your work, and discuss completely for full credit.

Numbers

n       m             sd             lower2     upper2     lower3      upper3

51    13.5294    6.49724    0.53493    26.5239    -5.96231    33.0211

 

We’re working with counts….

 

Short Interval, Raw: [0.53493, 26.5239], restricted to [1, 26].

 

0 [ ||1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26|| ] 27 28 29 30

 

Long Interval, Raw: [-5.96231, 33.0211], restricted to [0, 33].

 

-6 [ -5 -4 -3 -2 -1 ||0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33|| ] 34

 

Short Interval: m ± (2*sd)

 

Lower Bound = m ─ (2*sd)  ≈ 13.5294     ─ (2*6.49724) ≈ 0.53493 [1]

Upper Bound = m + (2*sd)  ≈ 13.5294     + (2*6.49724) ≈ 26.5239 [26]

Long Interval: m ± (3*sd)

 

Lower Bound = m ─ (3*sd)  ≈ 13.5294     ─ (3*6.49724) ≈ -5.96231 [0]

Upper Bound = m + (3*sd)  ≈ 13.5294     + (3*6.49724) ≈ 33.0211 [33]

 

Interpretation

 

There are 51 Fictitious Striped lizards in our sample.

 

At least 75% of the lizards in our sample have between 1 and 26 stripes.

At least 89% of the lizards in our sample have between 0 and 33 stripes.

 

If the Fictitious Striped lizard stripe counts cluster symmetrically around a central value, becoming rare with increasing distance from the central value, then:

 

approximately 95% of the lizards in our sample have between 1 and 26 stripes, and approximately 100% of the lizards in our sample have between 0 and 33 stripes.
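The Numbers table and the interpretation above can be checked with a short script. This is a minimal sketch that assumes the sample (n-1) standard deviation and uses only Python's standard library; the helper names `summary_interval` and `observed_fraction` are illustrative and are not part of the original key. It also reports the fraction of the sample that actually falls inside each interval, which can be compared against the Tchebysheff minimums (75%, 89%) and the Empirical Rule values (95%, 100%).

```python
import statistics

# Stripe counts copied from the case above.
stripes = [1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 11,
           11, 11, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17,
           17, 17, 17, 18, 21, 21, 21, 22, 24, 24, 24, 25, 25, 27]


def summary_interval(data, k):
    """Return (lower, upper) for m ± k*sd, using the sample (n-1) sd."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    return m - k * sd, m + k * sd


def observed_fraction(data, lower, upper):
    """Fraction of the observations that fall inside [lower, upper]."""
    return sum(lower <= x <= upper for x in data) / len(data)


n = len(stripes)
m = statistics.mean(stripes)
sd = statistics.stdev(stripes)
short = summary_interval(stripes, 2)   # m ± 2sd
long_ = summary_interval(stripes, 3)   # m ± 3sd
frac_short = observed_fraction(stripes, *short)
frac_long = observed_fraction(stripes, *long_)

print(n, round(m, 4), round(sd, 5))
print(short, frac_short)
print(long_, frac_long)
```

In this sample only the lizard with 27 stripes falls outside the short interval, so the observed fraction is well above the 75% that Tchebysheff guarantees.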

 

From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2006versionA_key.htm:

Case One

Descriptive Statistics

Serum Creatinine and Kidney (Renal) Function

Healthy kidneys remove wastes and excess fluid from the blood. Blood tests show whether the kidneys are failing to remove wastes. Urine tests can show how quickly body wastes are being removed and whether the kidneys are also leaking abnormal amounts of protein. The nephron is the basic structure in the kidney that produces urine. In a healthy kidney there may be as many as 1,000,000 nephrons. Loss of nephrons reduces the ability of the kidney to function by reducing the kidney’s ability to produce urine. Progressive loss of nephrons leads to kidney failure. Serum creatinine. Creatinine is a waste product that comes from meat protein in the diet and also comes from the normal wear and tear on muscles of the body. Creatinine is produced at a continuous rate and is excreted only through the kidneys. When renal dysfunction occurs, the kidneys are impaired in their ability to excrete creatinine and the serum creatinine rises. As kidney disease progresses, the level of creatinine in the blood increases.

Suppose that we sample serum creatinine levels in a random sample of adults. Serum creatinine (as mg/dL) for each sampled subject follows:

15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1, 7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9, 1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6

Compute and interpret the following statistics: sample size (n), p00, p25, p50, p75, p100, (p75-p00), (p100-p25), (p75-p50), (p50-p25). Be specific and complete. Show your work, and discuss completely for full credit.

Case Two

Summary Intervals

Serum Creatinine and Kidney (Renal) Function

 

Using the context and data from Case One, let m denote the sample mean, and sd the sample standard deviation. Compute and interpret the intervals  m ± 2sd and m ± 3sd, using Tchebysheff’s Inequalities and the Empirical Rule. Be specific and complete. Show your work, and discuss completely for full credit.

 

Numbers

n (nonmissing values, sercreat)    mean, sercreat    standard deviation, sercreat    m-3*sd    m+3*sd    m-2*sd    m+2*sd

69                                 3.4               4.2                             -9.2      16.0      -5.0      11.8

 

n=69

m=3.4

sd=4.2

 

 

“Short Interval”

Lower2 = m – 2*sd = 3.4 – 2*4.2 = -5.0[0] (Negative concentrations don’t make sense here.)

Upper2 = m + 2*sd = 3.4 + 2*4.2 = 11.8

 

“Long Interval”

Lower3 = m – 3*sd = 3.4 – 3*4.2 = -9.2[0] (Negative concentrations don’t make sense here.)

Upper3 = m + 3*sd = 3.4 + 3*4.2 = 16.0

 

Interpretation

 

Tchebysheff’s Inequalities

 

At least 75% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine per deciliter serum. 

At least 89% of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine per deciliter serum.

 

Empirical Rule

 

If the serum creatinine levels cluster symmetrically around a central value, with values becoming progressively and symmetrically rarer with increasing distance from the central value, then …

 

approximately 95% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine per deciliter serum and  

approximately 100% of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine per deciliter serum.
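The 75% and 89% figures come from Tchebysheff's bound, which guarantees that at least 1 - 1/k² of any data set lies within k standard deviations of the mean (k = 2 gives 0.75, k = 3 gives 8/9 ≈ 0.889). The sketch below illustrates that bound together with the restriction of a negative lower limit to zero. The function names are hypothetical, and the inputs are the rounded summary values m = 3.4 and sd = 4.2 from the Numbers table rather than values recomputed from the raw data.

```python
def tchebysheff_minimum(k):
    """Minimum fraction of any data set within k sd of the mean (k > 1)."""
    return 1 - 1 / k**2


def clamped_interval(m, sd, k, floor=0.0):
    """m ± k*sd, with the lower bound raised to `floor` when it falls below it."""
    return max(floor, m - k * sd), m + k * sd


m, sd = 3.4, 4.2  # rounded summary values reported in the Numbers table
short = clamped_interval(m, sd, 2)  # concentrations cannot be negative
long_ = clamped_interval(m, sd, 3)

print(tchebysheff_minimum(2), short)
print(tchebysheff_minimum(3), long_)
```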

 

 

Diseased Monkeys

A random sample of Lab Monkeys is infected with the agent that causes Disease X. The time (in hours) from infection to the appearance of symptoms of Disease X is measured for each monkey. The sample of monkeys yields the following times (in hours):

12, 26, 36, 38, 40, 42, 44, 48, 52, 62, 13, 27, 37, 38, 41, 42, 44, 49, 55, 65, 15, 30, 37, 39, 41, 44, 46, 50, 56, 70

16, 32, 38, 40, 42, 44, 48, 50, 58, 72, 18, 35, 40, 41, 42, 45, 48, 52, 58, 75

Edit the data into your calculator, and compute the following statistics: sample size (n), sample mean (m) and sample standard deviation (sd).

Compute the intervals m ± 2sd and m ± 3sd.

Apply and discuss the Empirical Rule for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval, interpret it. Be specific and complete.

Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval, interpret it. Be specific and complete.

Short Interval: m ± (2*sd)

 

Lower Bound = m ─ (2*sd)  ≈ 42.66 ─ (2*14.0968) ≈ 14.5

Upper Bound = m + (2*sd)  ≈ 42.66  + (2*14.0968) ≈ 70.8

 

Long Interval: m ± (3*sd)

 

Lower Bound = m ─ (3*sd)  ≈ 42.66 ─ (3*14.0968) ≈ 0.37

Upper Bound = m + (3*sd)  ≈ 42.66  + (3*14.0968) ≈ 84.9

 

At least 75% of the monkeys in the sample showed symptoms between 14.5 and 70.8 hours after exposure. 

At least 89% of the monkeys in the sample showed symptoms between 0.37 and 84.9 hours after exposure.

 

If the monkey times-to-symptom cluster symmetrically around a central value, becoming rare with increasing distance from the central value, then:

 

Approximately 95% of the monkeys in the sample showed symptoms between 14.5 and 70.8 hours after exposure, and

Approximately 100% of the monkeys in the sample showed symptoms between 0.37 and 84.9 hours after exposure.

 

Barrel of Monkeys™

A random sample of people are selected, and their performance on the Barrel of Monkeys™ game is measured.

Here are the instructions for this game: "Dump monkeys onto table. Pick up one monkey by an arm. Hook other arm through a second monkey's arm. Continue making a chain. Your turn is over when a monkey is dropped."

Each person makes one chain of monkeys, and the number of monkeys in each chain is recorded:

1, 2, 5, 2, 9, 12, 8, 7, 10, 9, 6, 4, 6, 9, 3, 12, 11, 10, 8, 4, 12, 7, 8, 6, 7, 8, 6, 5, 9, 10, 7, 5, 4, 3, 10, 7

7, 6, 8, 6, 6, 6, 6, 7, 8, 8, 7, 8

 

Edit the data into your calculator, and compute the following statistics: sample size (n), sample mean (m) and sample standard deviation (sd).

Compute the intervals m ± 2sd and m ± 3sd.

Apply and discuss the Empirical Rule for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval, interpret it. Be specific and complete.

Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval, interpret it. Be specific and complete.

n       m              sd              Lower2SD       Upper2SD       Lower3SD        Upper3SD

48    6.97917    2.59697     1.78523[2]     12.1731[12]    -0.81173[0]     14.7701[14]

 

We’re working with counts….

 

Short Interval, Raw: [1.78523, 12.1731], restricted to [2, 12].

 

-1 --- 0 --- 1 - [-- ||2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12|| -] -- 13 --- 14 --- 15

 

Long Interval, Raw: [-0.81173, 14.7701], restricted to [0, 14] or to [1, 14].

 

-1 -- [- ||0 --- 1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12 --- 13 --- 14|| -- ] - 15

 

Short Interval: m ± (2*sd)

 

Lower Bound = m ─ (2*sd)  ≈ 6.97917 ─ (2*2.59697) ≈ 2

Upper Bound = m + (2*sd)  ≈ 6.97917  + (2*2.59697) ≈ 12

 

Long Interval: m ± (3*sd)

 

Lower Bound = m ─ (3*sd)  ≈ 6.97917 ─ (3*2.59697) ≈ 0 (or 1)

Upper Bound = m + (3*sd)  ≈ 6.97917  + (3*2.59697) ≈ 14

 

At least 75% of the monkey chains in the sample had between 2 and 12 monkeys.

At least 89% of the monkey chains in the sample had between 0 (or 1) and 14 monkeys.

 

If the monkey chain counts cluster symmetrically around a central value, becoming rare with increasing distance from the central value, then:

 

approximately 95% of the monkey chains in the sample showed between 2 and 12 monkeys and 

approximately 100% of the monkey chains in the sample showed between 0 (or 1) and 14 monkeys.
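Because chain lengths are counts, the raw bounds are restricted to the integers that lie inside them: round the lower bound up, round the upper bound down, and do not go below the smallest possible count. A minimal sketch of that restriction follows; the helper name `count_interval` and its `minimum` parameter are illustrative assumptions, not part of the original key.

```python
import math
import statistics

# Chain lengths copied from the case above.
chains = [1, 2, 5, 2, 9, 12, 8, 7, 10, 9, 6, 4, 6, 9, 3, 12, 11, 10, 8, 4,
          12, 7, 8, 6, 7, 8, 6, 5, 9, 10, 7, 5, 4, 3, 10, 7,
          7, 6, 8, 6, 6, 6, 6, 7, 8, 8, 7, 8]


def count_interval(data, k, minimum=0):
    """Integer bounds inside m ± k*sd, never going below `minimum`."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)          # sample (n-1) standard deviation
    lower = max(minimum, math.ceil(m - k * sd))
    upper = math.floor(m + k * sd)
    return lower, upper


short = count_interval(chains, 2)
long_zero = count_interval(chains, 3)             # allow an empty chain
long_one = count_interval(chains, 3, minimum=1)   # require at least one monkey

print(short, long_zero, long_one)
```

The two versions of the long interval correspond to the "0 (or 1)" wording above: the choice depends on whether a chain of zero monkeys is considered possible.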

Confidence Estimation of the Population Mean

 

In theory, we can compute the population mean face value of a fair, six-sided d6 with face values 1,2,3,4,5,6 as

M = 1*Pr{d6 shows 1}+2*Pr{d6 shows 2}+3*Pr{d6 shows 3}+4*Pr{d6 shows 4}+5*Pr{d6 shows 5}+6*Pr{d6 shows 6}

M = 1*(1/6) +2*(1/6)+3*(1/6)+4*(1/6)+5*(1/6)+6*(1/6) = 3.5

 

Our 95% confidence interval estimation process should produce intervals containing this population mean in approximately 95% of samples.
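The claim can be explored by simulation. The sketch below is an illustration under stated assumptions: it draws pseudo-random samples of n = 50 rolls with Python's `random` module, builds the interval m ± 1.96*(sd/sqrt(n)) for each, and counts how often the interval contains 3.5. The number of samples, the seed, and the function names are arbitrary choices made for this example.

```python
import random
import statistics
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]

# Population mean of a fair d6: each face has probability 1/6.
population_mean = sum(face * Fraction(1, 6) for face in FACES)


def interval_covers(sample, target, z=1.96):
    """True when m ± z*(sd/sqrt(n)) contains `target`."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - z * se <= target <= m + z * se


def coverage_rate(samples=2000, n=50, seed=2010):
    """Fraction of simulated samples whose interval covers the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        sample = [rng.choice(FACES) for _ in range(n)]
        hits += interval_covers(sample, float(population_mean))
    return hits / samples


rate = coverage_rate()
print(population_mean, rate)
```

The observed rate should land near 0.95 but will not equal it exactly; it varies with the seed and the number of simulated samples.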

Here is an example from Summer 2009:

 

Sample     1      2      3      4      5      6      n    m     sd     se     lower  upper  M    Status
Perfect50  8.333  8.333  8.333  8.333  8.333  8.333  50   3.5   1.725  0.244  3.022  3.978  3.5
1          11     5      7      7      10     10     50   3.6   1.852  0.262  3.087  4.113  3.5  Hit
2          7      5      13     10     11     4      50   3.5   1.502  0.212  3.084  3.916  3.5  Hit
3          11     5      7      13     7      7      50   3.42  1.715  0.243  2.945  3.895  3.5  Hit
4          7      9      7      15     8      4      50   3.4   1.512  0.214  2.981  3.819  3.5  Hit
5          11     6      10     9      8      6      50   3.3   1.693  0.239  2.831  3.769  3.5  Hit
6          7      3      7      13     10     10     50   3.92  1.639  0.232  3.466  4.374  3.5  Hit
7          8      5      9      10     10     8      50   3.66  1.673  0.237  3.196  4.124  3.5  Hit
8          6      8      9      10     6      11     50   3.7   1.693  0.239  3.231  4.169  3.5  Hit
9          14     10     7      9      3      7      50   2.96  1.749  0.247  2.475  3.445  3.5  Miss
10         9      7      12     9      9      4      50   3.28  1.565  0.221  2.846  3.714  3.5  Hit
11         9      10     4      3      11     13     50   3.72  1.938  0.274  3.183  4.257  3.5  Hit
12         7      7      7      10     8      11     50   3.76  1.733  0.245  3.28   4.24   3.5  Hit
13         10     11     8      8      5      8      50   3.22  1.741  0.246  2.737  3.703  3.5  Hit
14         9      6      11     11     6      7      50   3.4   1.641  0.232  2.945  3.855  3.5  Hit
15         7      5      8      7      12     11     50   3.9   1.729  0.245  3.421  4.379  3.5  Hit
16         10     12     7      12     3      6      50   3.08  1.627  0.23   2.629  3.531  3.5  Hit
17         5      9      6      7      14     9      50   3.86  1.666  0.236  3.398  4.322  3.5  Hit
18         9      5      8      9      10     9      50   3.66  1.745  0.247  3.176  4.144  3.5  Hit
19         9      9      9      5      9      9      50   3.46  1.787  0.253  2.965  3.955  3.5  Hit
20         7      8      7      8      12     8      50   3.68  1.696  0.24   3.21   4.15   3.5  Hit
21         7      10     6      10     9      8      50   3.56  1.692  0.239  3.091  4.029  3.5  Hit
22         3      12     15     8      3      9      50   3.46  1.528  0.216  3.036  3.884  3.5  Hit

Success Rate    95%    20.9    20 or 21    Sample Success Rate    0.955
Failure Rate    5%     1.1     1 or 2      Sample Failure Rate    0.045
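Each row of the Summer 2009 table can be rebuilt from its six face counts. The sketch below assumes that columns 1 through 6 hold the number of times each face appeared in 50 rolls, that sd is the sample (n-1) standard deviation, and that the interval uses a multiplier of 1.96; these assumptions are inferred from the tabled values rather than stated in the notes. The function name `stats_from_counts` is illustrative.

```python
import math


def stats_from_counts(counts, z=1.96):
    """Rebuild n, m, sd, se and the interval from face counts.

    counts[i] is how many times face i+1 appeared in the sample.
    """
    n = sum(counts)
    total = sum(face * c for face, c in enumerate(counts, start=1))
    m = total / n
    ss = sum(c * (face - m) ** 2 for face, c in enumerate(counts, start=1))
    sd = math.sqrt(ss / (n - 1))   # sample standard deviation
    se = sd / math.sqrt(n)         # standard error of the mean
    return n, m, sd, se, m - z * se, m + z * se


# Face counts for sample 1 in the table.
n, m, sd, se, lower, upper = stats_from_counts([11, 5, 7, 7, 10, 10])
status = "Hit" if lower <= 3.5 <= upper else "Miss"

print(n, round(m, 2), round(sd, 3), round(se, 3),
      round(lower, 3), round(upper, 3), status)
```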

 Here are our current samples:

 

Sample   1         2         3         4         5         6         mean  sd        se        Lower95   Upper95   Mean  Status
Perfect  8.333333  8.333333  8.333333  8.333333  8.333333  8.333333  3.5   1.725164  0.243975  3.01205   3.98795   3.5   Perfect
#1       8         10        6         11        5         10        3.5   1.752549  0.247848  3.004304  3.995696  3.5   Hit
#2       11        11        7         6         7         8         3.22  1.798979  0.254414  2.711172  3.728828  3.5   Hit
#3       6         8         8         6         10        12        3.84  1.75383   0.248029  3.343942  4.336058  3.5   Hit
#4       6         12        6         9         12        5         3.48  1.606619  0.22721   3.02558   3.93442   3.5   Hit
#5       8         6         7         10        12        7         3.66  1.673442  0.23666   3.186679  4.133321  3.5   Hit
#6       9         9         4         11        7         10        3.56  1.797504  0.254205  3.051589  4.068411  3.5   Hit
#7       10        7         5         9         9         10        3.6   1.829464  0.258725  3.082549  4.117451  3.5   Hit
#8       14        3         5         12        10        6         3.38  1.794436  0.253772  2.872457  3.887543  3.5   Hit
#9       7         14        6         7         10        6         3.34  1.673442  0.23666   2.866679  3.813321  3.5   Hit
#10      12        14        6         11        2         5         2.84  1.595402  0.225624  2.388752  3.291248  3.5   Miss
#11      8         7         15        3         4         13        3.54  1.809386  0.255886  3.028228  4.051772  3.5   Hit
#12      12        10        11        2         7         8         3.12  1.802945  0.254975  2.61005   3.62995   3.5   Hit
#13      4         12        8         11        5         10        3.62  1.627443  0.230155  3.15969   4.08031   3.5   Hit
#14      9         3         12        9         8         9         3.62  1.70102   0.240561  3.138879  4.101121  3.5   Hit
#15      6         9         11        9         6         9         3.54  1.643913  0.232484  3.075031  4.004969  3.5   Hit
#16      6         14        6         8         8         8         3.44  1.692239  0.239319  2.961362  3.918638  3.5   Hit
#17      6         9         8         5         6         16        3.88  1.847668  0.2613    3.357401  4.402599  3.5   Hit
#18      8         7         8         8         9         6         3.18  1.63464   0.231173  2.717654  3.642346  3.5   Hit
#19      11        12        7         4         5         11        3.26  1.893167  0.267734  2.724531  3.795469  3.5   Hit
#20      10        8         4         8         8         12        3.64  1.892628  0.267658  3.104684  4.175316  3.5   Hit

 

Confidence Interval

General Mean

Diseased Monkeys

Objective: Be able to perform interval estimation of the population mean using the confidence interval method. Be able to fully discuss the confidence interval. This discussion must include a clear description of the population and the population mean, the family of samples, the family of intervals and how the confidence applies to the family of intervals.

A random sample of Lab Monkeys is infected with the agent that causes Disease X. The time (in hours) from infection to the appearance of symptoms of Disease X is measured for each monkey. The sample of monkeys yields the following times (in hours):

12, 26, 36, 38, 40, 42, 44, 48, 52, 62,

13, 27, 37, 38, 41, 42, 44, 49, 55, 65,

15, 30, 37, 39, 41, 44, 46, 50, 56, 70,

16, 32, 38, 40, 42, 44, 48, 50, 58, 72,

18, 35, 40, 41, 42, 45, 48, 52, 58, 75

Follow the steps:

Edit the data into your calculator, and compute the following statistics: sample size, sample mean, sample standard deviation.

 

N      M         SD        Z      LOBOUND    HIBOUND

50    42.66    14.0968    1.96    38.7526    46.5674

Identify the Population Mean for this Sample.

We seek the population mean time to symptoms, in hours for disease X among the population of Lab Monkeys.

Consult the Normal Table, and determine the SD Multiplier required to ensure 95% Confidence. Justify the approach.

Since we need approximately 95% confidence, we need a number somewhat larger than 1.95, but 2.00 is more than we need. In practice, the number that we need is 1.96. But you should use 2.00 from your table. Here are the rows from the table:

1.95 0.025588 0.94882

2.00 0.022750 0.95450

The cost of this approach is the availability of large random samples – n > 30 will usually suffice.

Compute a 95% Confidence Interval for the true but unknown population mean in this problem.

Compute

 

LOBOUND ≈ M – Z*(SD/sqrt(N)) = 42.66 – 1.96*(14.0968/sqrt(50)) ≈ 38.7526

and

HIBOUND ≈ M + Z*(SD/sqrt(N)) = 42.66 + 1.96*(14.0968/sqrt(50)) ≈ 46.5674.

 

Write the approximate interval as [38.8, 46.6].

Discuss the Family of Intervals for this problem.

Each member of this Family is a single random sample of n=50 Lab Monkeys. The Family of Samples (FoS) consists of every possible random sample of n=50 Lab Monkeys. Each member of the FoS yields the following statistics: {n (sample size), m (sample mean) and sd (sample standard deviation)}. For this FoS, n=50 for every member sample, but m and sd vary from member to member. Each member of the FoS yields an interval of the form:

[m – 1.96*(sd/sqrt(n)), m + 1.96*(sd/sqrt(n))].

These intervals collectively form a Family of Intervals (FoI): each member of the FoI is an interval derived from a member of the FoS. Approximately 95% of these intervals contain the true population mean time (in hours) to symptoms of Disease X in Lab Monkeys, and approximately 5% do not.

Interpret the Single Confidence Interval for this problem.

If our interval captures the true population mean, then the mean time to symptoms of Disease X in Lab Monkeys is between 38.8 and 46.6 hours.
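The steps of this case can be reproduced from the raw times. This is a minimal sketch using Python's standard `statistics` module; the function name `mean_confidence_interval` is illustrative. It also asks `statistics.NormalDist` for the multiplier that leaves 2.5% in each tail, which is where the value 1.96 comes from.

```python
import statistics

# Times to symptoms (hours), copied from the case above.
times = [12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
         13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
         15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
         16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
         18, 35, 40, 41, 42, 45, 48, 52, 58, 75]


def mean_confidence_interval(data, z):
    """Large-sample interval m ± z*(sd/sqrt(n)) for the population mean."""
    n = len(data)
    m = statistics.mean(data)
    sd = statistics.stdev(data)   # sample (n-1) standard deviation
    se = sd / n ** 0.5
    return m - z * se, m + z * se


# Multiplier with 95% of the standard normal curve between -z and +z.
z_exact = statistics.NormalDist().inv_cdf(0.975)

lo, hi = mean_confidence_interval(times, 1.96)
print(round(z_exact, 4), round(lo, 4), round(hi, 4))
```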

 

 

Confidence Interval

General Mean

Generic Fictitious Spiders

Objective: Be able to perform interval estimation of the population mean using the confidence interval method. Be able to fully discuss the confidence interval. This discussion must include a clear description of the population and the population mean, the family of samples, the family of intervals and how the confidence applies to the family of intervals.

Generic Fictitious Spiders

We have a sample of Generic Fictitious Spiders. Each spider's diameter (maximum length, in cm, from leg tip to leg tip) is measured.

The spider diameters are listed below:

16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4,

24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7,

32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4,

38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4

Follow the steps:

Edit the data into your calculator, and compute the following statistics: sample size, sample mean, sample standard deviation.

Statistic      Value      Comment

Sample Size    n=40       There are n=40 spiders in the sample.
Sample Mean    m=32.4     The average diameter of spiders in the sample is 32.4 cm.
Sample SD      sd=8.09    No comment here.

 

Identify the Population Mean for this Sample.

We are after the Population Mean Maximum Spider Diameter.

Consult the Normal Table, and determine the SD Multiplier required to ensure 90% Confidence. Justify the approach.

We want z(k)=1.65 from the table, as discussed in class.

We are working with large (n>=30) random samples, and are working with sample means.

Compute a 90% Confidence Interval for the true but unknown population mean in this problem.

 

m – 1.65*(sd/sqrt(n)) = 32.4 – 1.65*(8.09/sqrt(40)) = 30.29;

m + 1.65*(sd/sqrt(n)) = 32.4 + 1.65*(8.09/sqrt(40)) = 34.51;

 

Write the interval as [30.29,34.51].

Discuss the Family of Intervals for this problem.

Each member of our Family is a random sample of 40 spiders from our population. The Family of Samples consists of every possible sample of this type.

From each member, compute the interval

m ± 1.65*(sd/sqrt(40));

where m is the sample mean and sd is the sample standard deviation for the sample.

The Family of Intervals consists of all intervals obtained in this way from the members of the Family of Samples.

Approximately 90% of the Family of Intervals captures the population mean maximum spider diameter. The remaining intervals do not capture the population mean.

Interpret the Single Confidence Interval for this problem.

If our interval contains the population mean, then the true population mean maximum diameter for Generic Fictitious Spiders is between 30.29 and 34.51 centimeters.
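When only the summary statistics are at hand, the same interval follows directly from n, m, sd and the multiplier. The sketch below uses the rounded values from the table in this case (n = 40, m = 32.4, sd = 8.09, z = 1.65); the helper name `interval_from_summary` is an assumption made for illustration.

```python
import math


def interval_from_summary(n, m, sd, z):
    """Interval m ± z*(sd/sqrt(n)) computed from summary statistics only."""
    se = sd / math.sqrt(n)   # standard error of the sample mean
    return m - z * se, m + z * se


lower, upper = interval_from_summary(n=40, m=32.4, sd=8.09, z=1.65)
print(round(lower, 2), round(upper, 2))
```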

 

From here: http://www.mindspring.com/~cjalverson/3rdhourlyfall2008versionA_key.htm

 

Case One | Confidence Interval, Mean | Glioblastoma Multiforme

Glioblastoma multiforme (GBM) is the highest grade glioma tumor and is the most malignant form of astrocytomas. These tumors originate in the brain. GBM tumors grow rapidly, invade nearby tissue and contain cells that are very malignant. GBM are among the most common and devastating primary brain tumors in adults.

Suppose that we have a random sample of GBM patients, with survival time (in weeks) listed below:

3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 37, 38, 45, 48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66, 67, 68, 69, 72, 72, 73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83, 85, 86, 87, 90, 150, 180

Estimate the population mean survival time for Glioblastoma multiforme patients with 99% confidence. That is, compute and discuss a 99% confidence interval for this population mean. Provide concise and complete details and discussion as demonstrated in the case study summaries.

Table 1. Means and Proportions

 Z(k) PROBRT PROBCENT

0.05 0.48006 0.03988

0.10 0.46017 0.07966

0.15 0.44038 0.11924

0.20 0.42074 0.15852

0.25 0.40129 0.19741

0.30 0.38209 0.23582

0.35 0.36317 0.27366

0.40 0.34458 0.31084

0.45 0.32636 0.34729

0.50 0.30854 0.38292

0.55 0.29116 0.41768

0.60 0.27425 0.45149

0.65 0.25785 0.48431

0.70 0.24196 0.51607

0.75 0.22663 0.54675

0.80 0.21186 0.57629

0.85 0.19766 0.60467

0.90 0.18406 0.63188

0.95 0.17106 0.65789

1.00 0.15866 0.68269

Z(k) PROBRT PROBCENT

1.05 0.14686 0.70628

1.10 0.13567 0.72867

1.15 0.12507 0.74986

1.20 0.11507 0.76986

1.25 0.10565 0.78870

1.30 0.09680 0.80640

1.35 0.088508 0.82298

1.40 0.080757 0.83849

1.45 0.073529 0.85294

1.50 0.066807 0.86639

1.55 0.060571 0.87886

1.60 0.054799 0.89040

1.65 0.049471 0.90106

1.70 0.044565 0.91087

1.75 0.040059 0.91988

1.80 0.035930 0.92814

1.85 0.032157 0.93569

1.90 0.028717 0.94257

1.95 0.025588 0.94882

2.00 0.022750 0.95450

Z(k) PROBRT PROBCENT

2.05 0.020182 0.95964

2.10 0.017864 0.96427

2.15 0.015778 0.96844

2.20 0.013903 0.97219

2.25 0.012224 0.97555

2.30 0.010724 0.97855

2.35 0.009387 0.98123

2.40 0.008198 0.98360

2.45 0.007143 0.98571

2.50 0.006210 0.98758

2.55 0.005386 0.98923

2.60 0.004661 0.99068

2.65 0.004025 0.99195

2.70 .0034670 0.99307

2.75 .0029798 0.99404

2.80 .0025551 0.99489

2.85 .0021860 0.99563

2.90 .0018658 0.99627

2.95 .0015889 0.99682

3.00 .0013499 0.99730
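The columns of Table 1 appear to be standard normal probabilities: PROBRT matches the area to the right of Z(k), and PROBCENT matches the area between -Z(k) and +Z(k). That reading is inferred from the tabled values, not stated in the notes. Under that assumption, the sketch below reproduces a row and picks the smallest tabled multiplier whose central probability reaches a requested confidence level; the function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prob_right(z):
    """Area under the standard normal curve to the right of z (PROBRT)."""
    return 1 - STANDARD_NORMAL.cdf(z)


def prob_central(z):
    """Area between -z and +z (PROBCENT)."""
    return 2 * STANDARD_NORMAL.cdf(z) - 1


def table_multiplier(confidence):
    """Smallest tabled z (steps of 0.05 up to 3.00) with PROBCENT >= confidence."""
    for step in range(1, 61):
        z = round(0.05 * step, 2)
        if prob_central(z) >= confidence:
            return z
    raise ValueError("confidence level is beyond the table")


print(round(prob_right(2.0), 6), round(prob_central(2.0), 5))
print(table_multiplier(0.90), table_multiplier(0.95), table_multiplier(0.99))
```

This lookup rule agrees with the multipliers used in these cases: 1.65 for 90%, 2.00 for 95% when reading from the table, and 2.60 for 99%.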

Numbers

n           m          sd          se           Z       lower       upper

58       56.91       33.12        4.35        2.60       45.61       68.22

 

se = sd/sqrt(n) ≈ 33.12/sqrt(58) ≈ 4.35

z ≈ 2.60 for 99% confidence, from the table row 2.60 0.004661 0.99068

lower = m – (z*se) ≈ 56.91 – (2.60*4.35) ≈ 45.61

upper = m + (z*se) ≈ 56.91 + (2.60*4.35) ≈ 68.22

Report the interval as [45.6, 68.2].

 

Interpretation

 

Our population is the population of Glioblastoma multiforme patients and our population mean is the mean survival time (weeks).

Our Family of Samples (FoS) consists of every possible random sample of 58 Glioblastoma multiforme patients. From each individual sampled Glioblastoma multiforme patient, survival time in weeks is obtained.

From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for GBM survival time, and then compute the interval

 

[m – 2.60*( sd/sqrt(n)), m + 2.60*( sd/sqrt(n))].

Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 99% of which cover the true population mean survival time in weeks for Glioblastoma multiforme patients.

If our interval, [45.6, 68.2] is among the approximate 99% super-majority of intervals that cover the population mean, then the true population mean survival time for Glioblastoma multiforme patients is between 45.6 and 68.2 weeks.

From here: http://www.mindspring.com/~cjalverson/CompFinalSpring2008verMondayKey.htm


 

Case Four | Confidence Interval for Mean | Gestational Age

Consider the population mean gestational ages (in weeks) at birth of Year 2005 US Resident Live Births. Using the data from Case Two, compute and interpret a 93% confidence interval for this population mean.

Numbers

 

From the table row 1.85 0.032157 0.93569, z=1.85.

n = 56

m ≈ 37.34

sd ≈ 3.9278

lowCI = m – z*(sd/sqrt(n)) ≈ 37.34 – 1.85*(3.9278/sqrt(56)) ≈ 36.37

highCI = m + z*(sd/sqrt(n)) ≈ 37.34 + 1.85*(3.9278/sqrt(56)) ≈ 38.31

 

Report the interval as [36.4, 38.3].

 

Interpretation

Our population is the population of year 2005 US resident live born infants and our population mean is the mean gestational age (weeks).

Our Family of Samples (FoS) consists of every possible random sample of 56 year 2005 US resident live born infants. From each individual sampled live born infant, gestational age in weeks is obtained.

From each member sample of the FoS, we compute the sample mean (m) and standard deviation (sd) for gestational age, and then compute the interval

[m – 1.85*( sd/sqrt(n)), m + 1.85*( sd/sqrt(n))].

Computing this interval for each member sample of the FoS, we obtain a Family of Intervals (FoI), approximately 93% of which cover the true population mean gestational age in weeks for year 2005 US resident live born infants.

If our interval, [36.4, 38.3] is among the approximate 93% super-majority of intervals that cover the population mean, then the true population mean gestational age is between 36.4 and 38.3 weeks for year 2005 US resident live born infants.