Session 2.4

Descriptive Summary Intervals

14th October 2009

Links

http://www.pages.drexel.edu/~tpm23/Stat201Spr04/EmpiricalTchebysheff.pdf

http://knowledgerush.com/kr/encyclopedia/Tchebysheff's_theorem/

http://faculty.roosevelt.edu/currano/M347/Lectures/3.11.Example.pdf

http://www.mathstat.carleton.ca/~lhaque/2507-chap2a.pdf

http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section4/roughcuts/roughcuts.htm

From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2008verB_key.htm:

Case Four | Summary Intervals | Fictitious Striped Lizard

The Fictitious Striped Lizard is a native species of Lizard Island, and is noteworthy for both the quantity and quality of its stripes. Consider a random sample of Fictitious Striped Lizards, in which the number of stripes per lizard is noted:

1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 17, 17, 18, 21, 21, 21, 22, 24, 24, 24, 25, 25, 27

Let m denote the sample mean, and sd the sample standard deviation. Compute and interpret the intervals m±2sd and m±3sd, using Tchebysheff’s Inequalities and the Empirical Rule. Be specific and complete. Show your work, and discuss completely for full credit.
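The arithmetic in the answer below can be reproduced with a short Python sketch. It assumes Python 3's standard `statistics` module, whose `stdev` uses the n - 1 divisor (which matches the sd reported here), and a hypothetical helper `count_interval` that moves each bound inward to a whole number of stripes and never below zero. That rounding rule is our reading of the "restricted to" step, not something the case spells out.

```python
import math
import statistics

# Stripe counts for the 51 sampled Fictitious Striped Lizards, copied from the case.
stripes = [1, 2, 3, 3, 4, 5, 6, 6, 7, 8,
           9, 9, 9, 10, 10, 10, 11, 11, 11, 11,
           11, 11, 12, 13, 13, 14, 14, 14, 14, 15,
           15, 15, 15, 16, 16, 16, 17, 17, 17, 17,
           18, 21, 21, 21, 22, 24, 24, 24, 25, 25,
           27]

n = len(stripes)
m = statistics.mean(stripes)    # sample mean
sd = statistics.stdev(stripes)  # sample standard deviation (n - 1 divisor)

def count_interval(k):
    """Return the raw interval m ± k*sd and a whole-count version of it.

    Assumption (hypothetical helper): the lower bound is rounded up and the
    upper bound rounded down, so the count interval stays inside the raw one,
    and a count is never allowed to go below 0.
    """
    lo, hi = m - k * sd, m + k * sd
    return (lo, hi), (max(0, math.ceil(lo)), math.floor(hi))

print(n, round(m, 4), round(sd, 5))
for k in (2, 3):
    raw, counts = count_interval(k)
    print(k, [round(b, 4) for b in raw], counts)
```

Running it should give n = 51, m ≈ 13.5294 and sd ≈ 6.49724, with whole-count intervals (1, 26) for k = 2 and (0, 33) for k = 3.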

Numbers

n    m         sd        lower2    upper2    lower3     upper3
51   13.5294   6.49724   0.53493   26.5239   -5.96231   33.0211

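We’re working with counts, so each raw bound is moved inward to the nearest whole number of stripes, and a count can never be negative.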

Short Interval, Raw: [0.53493, 26.5239], restricted to [1, 26].

0 [ ||1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26|| ] 27 28 29 30

Long Interval, Raw: [-5.96231, 33.0211], restricted to [0, 33].

-6 [ -5 -4 -3 -2 -1 ||0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33|| ] 34

Short Interval: m ± (2*sd)

Lower Bound = m - (2*sd) ≈ 13.5294 - (2*6.49724) ≈ 0.53493 [1]

Upper Bound = m + (2*sd) ≈ 13.5294 + (2*6.49724) ≈ 26.5239 [26]

Long Interval: m ± (3*sd)

Lower Bound = m - (3*sd) ≈ 13.5294 - (3*6.49724) ≈ -5.96231 [0]

Upper Bound = m + (3*sd) ≈ 13.5294 + (3*6.49724) ≈ 33.0211 [33]

Interpretation

There are n = 51 Fictitious Striped Lizards in our sample.

Tchebysheff’s Inequalities

At least 75% (3/4) of the lizards in our sample have between 1 and 26 stripes.

At least 88.9% (8/9) of the lizards in our sample have between 0 and 33 stripes.
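The 75% and 89% (more precisely 8/9 ≈ 88.9%) figures are Tchebysheff's lower bound 1 − 1/k² evaluated at k = 2 and k = 3. The bound holds for any data set, whatever its shape; with m and sd as defined in the case:

```latex
\begin{aligned}
\text{proportion of observations with } |x - m| < k\cdot sd \;&\ge\; 1 - \frac{1}{k^{2}}, \qquad k > 1 \\
k = 2:\quad 1 - \frac{1}{2^{2}} &= \frac{3}{4} = 75\% \\
k = 3:\quad 1 - \frac{1}{3^{2}} &= \frac{8}{9} \approx 88.9\%
\end{aligned}
```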

Empirical Rule

If the Fictitious Striped Lizard stripe counts cluster symmetrically around a central value, becoming rarer with increasing distance from the central value, then:

approximately 95% of the lizards in our sample have between 1 and 26 stripes, and

approximately 99.7% (essentially all) of the lizards in our sample have between 0 and 33 stripes.
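One way to see how these statements compare with what the sample actually does is to count the lizards falling inside each raw interval. The sketch below does that with the standard `statistics` module; `share_within` is a hypothetical helper name, and 0.95 and 0.997 are the usual Empirical Rule figures.

```python
import statistics

# Stripe counts for the 51 sampled lizards, copied from the case.
stripes = [1, 2, 3, 3, 4, 5, 6, 6, 7, 8,
           9, 9, 9, 10, 10, 10, 11, 11, 11, 11,
           11, 11, 12, 13, 13, 14, 14, 14, 14, 15,
           15, 15, 15, 16, 16, 16, 17, 17, 17, 17,
           18, 21, 21, 21, 22, 24, 24, 24, 25, 25,
           27]

def share_within(data, k):
    """Fraction of observations x with m - k*sd <= x <= m + k*sd (raw bounds)."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    lo, hi = m - k * sd, m + k * sd
    return sum(lo <= x <= hi for x in data) / len(data)

# (k, Tchebysheff minimum 1 - 1/k**2, usual Empirical Rule figure)
for k, cheb, emp in ((2, 1 - 1 / 2**2, 0.95), (3, 1 - 1 / 3**2, 0.997)):
    print(f"k={k}: observed {share_within(stripes, k):.3f}, "
          f"Tchebysheff minimum {cheb:.3f}, Empirical Rule about {emp}")
```

For this sample, 50 of the 51 lizards (about 98%) fall inside m ± 2sd (only the lizard with 27 stripes lies outside), and all 51 fall inside m ± 3sd, comfortably above the Tchebysheff minimums.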

From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2006versionA_key.htm:

Case One

Descriptive Statistics

Serum Creatinine and Kidney (Renal) Function

Healthy kidneys remove wastes and excess fluid from the blood. Blood tests show whether the kidneys are failing to remove wastes. Urine tests can show how quickly body wastes are being removed and whether the kidneys are also leaking abnormal amounts of protein. The nephron is the basic structure in the kidney that produces urine. In a healthy kidney there may be as many as 1,000,000 nephrons. Loss of nephrons reduces the kidney’s ability to function by reducing its ability to produce urine. Progressive loss of nephrons leads to kidney failure.

Serum creatinine: Creatinine is a waste product that comes from meat protein in the diet and also from the normal wear and tear on the muscles of the body. Creatinine is produced at a continuous rate and is excreted only through the kidneys. When renal dysfunction occurs, the kidneys are impaired in their ability to excrete creatinine, and the serum creatinine level rises. As kidney disease progresses, the level of creatinine in the blood increases.

Suppose that we sample serum creatinine levels in a random sample of adults. Serum creatinine (as mg/dL) for each sampled subject follows:

15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1, 7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9, 1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6

Compute and interpret the following statistics: sample size (n), p₀₀, p₂₅, p₅₀, p₇₅, p₁₀₀, (p₇₅-p₀₀), (p₁₀₀-p₂₅), (p₇₅-p₅₀), (p₅₀-p₂₅). Be specific and complete. Show your work, and discuss completely for full credit.
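Percentiles can be computed under several conventions, so the sketch below fixes one explicitly: the median-of-halves rule, in which the median is left out of both halves when n is odd. We assume this is the rule used by TI-83/84 "1-Var Stats"; other software may report slightly different p₂₅ and p₇₅. `five_number_summary` is a hypothetical helper name.

```python
import statistics

# Serum creatinine (mg/dL) for the 69 sampled adults, as listed in Case One.
creat = [15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8,
         8.1, 7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2,
         3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9, 1.9, 1.8, 1.6,
         1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1,
         1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8,
         0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7,
         0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]

def five_number_summary(data):
    """Return (p00, p25, p50, p75, p100) using the median-of-halves rule.

    When n is odd the overall median is excluded from both halves
    (assumed to match TI-83/84 '1-Var Stats').
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

p00, p25, p50, p75, p100 = five_number_summary(creat)
print(len(creat), p00, p25, p50, p75, p100)
# The four differences requested in Case One, in the order listed there.
print(round(p75 - p00, 2), round(p100 - p25, 2), round(p75 - p50, 2), round(p50 - p25, 2))
```

With this convention the five-number summary is 0.6, 0.8, 1.3, 3.75 and 15.0 mg/dL.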

Case Two

Summary Intervals

Serum Creatinine and Kidney (Renal) Function

Using the context and data from Case One, let m denote the sample mean, and sd the sample standard deviation. Compute and interpret the intervals m ± 2sd and m ± 3sd, using Tchebysheff’s Inequalities and the Empirical Rule. Be specific and complete. Show your work, and discuss completely for full credit.

Numbers

Summary statistics for sercreat (serum creatinine, mg/dL), where n is the number of nonmissing values, m the mean and sd the standard deviation:

n     m     sd    m-3*sd   m+3*sd   m-2*sd   m+2*sd
69    3.4   4.2   -9.2     16.0     -5.0     11.8

n=69

m=3.4

sd=4.2
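The summary numbers and the zero floor on the lower bounds can be checked with a sketch like this one. `clipped_interval` is a hypothetical helper; clamping at 0 simply encodes the remark that negative concentrations don't make sense.

```python
import statistics

# Serum creatinine (mg/dL) for the 69 sampled adults, from Case One.
creat = [15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8,
         8.1, 7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2,
         3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9, 1.9, 1.8, 1.6,
         1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1,
         1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8,
         0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7,
         0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]

n = len(creat)
m = statistics.mean(creat)    # sample mean
sd = statistics.stdev(creat)  # sample standard deviation (n - 1 divisor)

def clipped_interval(k, floor=0.0):
    """Return (lower, upper) for m ± k*sd with the lower bound raised to `floor`.

    Clamping at 0 reflects that a concentration cannot be negative.
    """
    return max(floor, m - k * sd), m + k * sd

print(f"n = {n}, m = {m:.1f}, sd = {sd:.1f}")
for k in (2, 3):
    lo, hi = clipped_interval(k)
    print(f"m ± {k}sd -> [{lo:.1f}, {hi:.1f}] mg/dL")
```

Both lower bounds come out as 0.0, and the upper bounds round to 11.8 and 16.0 mg/dL, in line with the table.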

“Short Interval”

Lower2 = m - 2*sd = 3.4 - 2*4.2 = -5.0 [0] (Negative concentrations don’t make sense here, so the lower bound is taken as 0.)

Upper2 = m + 2*sd = 3.4 + 2*4.2 = 11.8

“Long Interval”

Lower3 = m - 3*sd = 3.4 - 3*4.2 = -9.2 [0] (Negative concentrations don’t make sense here, so the lower bound is taken as 0.)

Upper3 = m + 3*sd = 3.4 + 3*4.2 = 16.0

Interpretation

Tchebysheff’s Inequalities

At least 75% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine per deciliter serum.

At least 88.9% (8/9) of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine per deciliter serum.

Empirical Rule

If the serum creatinine levels cluster symmetrically around a central value, with values becoming progressively and symmetrically rarer with increasing distance from the central value, then:

approximately 95% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine per deciliter serum, and

approximately 99.7% (essentially all) of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine per deciliter serum.
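The Empirical Rule statements are conditional on a symmetric, mound-shaped distribution. A short standard-library sketch can test that condition against the data by comparing the mean with the median and counting how many subjects actually fall inside each interval.

```python
import statistics

# Serum creatinine (mg/dL) for the 69 sampled adults, from Case One.
creat = [15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8,
         8.1, 7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2,
         3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9, 1.9, 1.8, 1.6,
         1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1,
         1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8,
         0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7,
         0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]

m = statistics.mean(creat)
sd = statistics.stdev(creat)
med = statistics.median(creat)

# Count subjects inside each interval; the lower bound is clipped to 0 as in the text.
for k in (2, 3):
    hi = m + k * sd
    inside = sum(0 <= x <= hi for x in creat)
    print(f"[0, {hi:.1f}]: {inside} of {len(creat)} subjects ({inside / len(creat):.1%})")

# A mean far above the median is a sign of right skew.
print(f"mean = {m:.2f}, median = {med:.2f}")
```

Here 62 of the 69 subjects (about 89.9%) lie in [0, 11.8] and all 69 lie in [0, 16.0], while the mean (about 3.4) sits well above the median (1.3). That right skew suggests the Empirical Rule's symmetry condition is doubtful for these data, whereas the Tchebysheff statements hold regardless.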

Diseased Monkeys

A random sample of Lab Monkeys is infected with the agent that causes Disease X. The time (in hours) from infection to the appearance of symptoms of Disease X is measured for each monkey. The sample of monkeys yields the following times (in hours):

12, 26, 36, 38, 40, 42, 44, 48, 52, 62, 13, 27, 37, 38, 41, 42, 44, 49, 55, 65, 15, 30, 37, 39, 41, 44, 46, 50, 56, 70

16, 32, 38, 40, 42, 44, 48, 50, 58, 72, 18, 35, 40, 41, 42, 45, 48, 52, 58, 75

Edit the data into your calculator, and compute the following statistics: sample size (n), sample mean (m) and sample standard deviation (sd).

Compute the intervals m ± 2sd and m ± 3sd.

Apply and discuss the Empirical Rule for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval, interpret it. Be specific and complete.

Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval, interpret it. Be specific and complete.
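The calculator work can be mirrored in Python. This sketch uses the standard `statistics` module (variable names are our own) to compute n, m and sd and to count how many monkeys land in each interval.

```python
import statistics

# Hours from infection to symptoms, both data lines of the case combined.
hours = [12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
         13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
         15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
         16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
         18, 35, 40, 41, 42, 45, 48, 52, 58, 75]

n = len(hours)
m = statistics.mean(hours)    # sample mean
sd = statistics.stdev(hours)  # sample standard deviation (n - 1 divisor)
print(f"n = {n}, m = {m:.2f}, sd = {sd:.4f}")

for k in (2, 3):
    lo, hi = m - k * sd, m + k * sd
    inside = sum(lo <= x <= hi for x in hours)  # times within the raw interval
    print(f"m ± {k}sd = [{lo:.2f}, {hi:.2f}] hours, containing {inside} of {n} monkeys")
```

It should report n = 50, m = 42.66 and sd ≈ 14.0968; 46 of the 50 times (92%) fall inside m ± 2sd and all 50 fall inside m ± 3sd.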

n = 50, m = 42.66, sd ≈ 14.0968

Short Interval: m ± (2*sd)

Lower Bound = m - (2*sd) ≈ 42.66 - (2*14.0968) ≈ 14.47

Upper Bound = m + (2*sd) ≈ 42.66 + (2*14.0968) ≈ 70.85

Long Interval: m ± (3*sd)

Lower Bound = m - (3*sd) ≈ 42.66 - (3*14.0968) ≈ 0.37

Upper Bound = m + (3*sd) ≈ 42.66 + (3*14.0968) ≈ 84.95

Tchebysheff’s Inequalities

At least 75% (3/4) of the monkeys in the sample showed symptoms between 14.47 and 70.85 hours after infection.

At least 88.9% (8/9) of the monkeys in the sample showed symptoms between 0.37 and 84.95 hours after infection.

Empirical Rule

If the monkey times-to-symptom cluster symmetrically around a central value, becoming rarer with increasing distance from the central value, then:

approximately 95% of the monkeys in the sample showed symptoms between 14.47 and 70.85 hours after infection, and

approximately 99.7% (essentially all) of the monkeys in the sample showed symptoms between 0.37 and 84.95 hours after infection.

Barrel of Monkeys™

A random sample of people is selected, and each person’s performance on the Barrel of Monkeys™ game is measured.

Here are the instructions for this game: "Dump monkeys onto table. Pick up one monkey by an arm. Hook other arm through a second monkey's arm. Continue making a chain. Your turn is over when a monkey is dropped."

Each person makes one chain of monkeys, and the number of monkeys in each chain is recorded:

1, 2, 5, 2, 9, 12, 8, 7, 10, 9, 6, 4, 6, 9, 3, 12, 11, 10, 8, 4, 12, 7, 8, 6, 7, 8, 6, 5, 9, 10, 7, 5, 4, 3, 10, 7

7, 6, 8, 6, 6, 6, 6, 7, 8, 8, 7, 8

Edit the data into your calculator, and compute the following statistics: sample size (n), sample mean (m) and sample standard deviation (sd).

Compute the intervals m ± 2sd and m ± 3sd.
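A sketch of the same computation for the chain counts, assuming (as in the lizard case) that bounds are moved inward to whole numbers. The hypothetical `min_chain` parameter lets you choose between the two readings of the smallest possible chain: 0, or 1 if the first monkey picked up already counts as a chain.

```python
import math
import statistics

# Monkeys per chain for the 48 sampled players, both data lines combined.
chains = [1, 2, 5, 2, 9, 12, 8, 7, 10, 9,
          6, 4, 6, 9, 3, 12, 11, 10, 8, 4,
          12, 7, 8, 6, 7, 8, 6, 5, 9, 10,
          7, 5, 4, 3, 10, 7,
          7, 6, 8, 6, 6, 6, 6, 7, 8, 8, 7, 8]

n = len(chains)
m = statistics.mean(chains)    # sample mean
sd = statistics.stdev(chains)  # sample standard deviation (n - 1 divisor)

def chain_interval(k, min_chain=0):
    """Whole-count version of m ± k*sd (hypothetical helper).

    The lower bound is rounded up and not allowed below `min_chain`;
    the upper bound is rounded down.
    """
    lo, hi = m - k * sd, m + k * sd
    return max(min_chain, math.ceil(lo)), math.floor(hi)

print(n, round(m, 5), round(sd, 5))
print(chain_interval(2), chain_interval(3), chain_interval(3, min_chain=1))
```

With the default it returns (2, 12) and (0, 14); with `min_chain=1` the long interval becomes (1, 14), matching the two options given below.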

n    m         sd        Lower2SD      Upper2SD       Lower3SD       Upper3SD
48   6.97917   2.59697   1.78523 [2]   12.1731 [12]   -0.81173 [0]   14.7701 [14]

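We’re working with counts, so each raw bound is moved inward to the nearest whole number of monkeys, and a count can never be negative.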

Short Interval, Raw: [1.78523, 12.1731], restricted to [2, 12].

-1 --- 0 --- 1 - [-- ||2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12|| -] -- 13 --- 14 --- 15

Long Interval, Raw: [-0.81173, 14.7701], restricted to [0, 14], or to [1, 14] if every chain must contain at least the one monkey first picked up.

-1 -- [- ||0 --- 1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12 --- 13 --- 14|| -- ] - 15

Short Interval: m ± (2*sd)

Lower Bound = m - (2*sd) ≈ 6.97917 - (2*2.59697) ≈ 1.78523 [2]

Upper Bound = m + (2*sd) ≈ 6.97917 + (2*2.59697) ≈ 12.1731 [12]

Long Interval: m ± (3*sd)

Lower Bound = m - (3*sd) ≈ 6.97917 - (3*2.59697) ≈ -0.81173 [0 (or 1)]

Upper Bound = m + (3*sd) ≈ 6.97917 + (3*2.59697) ≈ 14.7701 [14]

Tchebysheff’s Inequalities

At least 75% (3/4) of the monkey chains in the sample had between 2 and 12 monkeys.

At least 88.9% (8/9) of the monkey chains in the sample had between 0 (or 1) and 14 monkeys.

Empirical Rule

If the monkey chain counts cluster symmetrically around a central value, becoming rarer with increasing distance from the central value, then:

approximately 95% of the monkey chains in the sample had between 2 and 12 monkeys, and

approximately 99.7% (essentially all) of the monkey chains in the sample had between 0 (or 1) and 14 monkeys.