Session 2.4
Descriptive Summary Intervals
14th October 2009
Links
http://www.pages.drexel.edu/~tpm23/Stat201Spr04/EmpiricalTchebysheff.pdf
http://knowledgerush.com/kr/encyclopedia/Tchebysheff's_theorem/
http://faculty.roosevelt.edu/currano/M347/Lectures/3.11.Example.pdf
http://www.mathstat.carleton.ca/~lhaque/2507-chap2a.pdf
http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section4/roughcuts/roughcuts.htm
From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2008verB_key.htm:
Case Four | Summary Intervals | Fictitious Striped Lizard
The Fictitious Striped Lizard is a native species of Lizard Island, and is noteworthy for both the quantity and quality of its stripes. Consider a random sample of Fictitious Striped Lizards, in which the number of stripes per lizard is noted:
1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 11,
11, 11, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 17, 17,
18, 21, 21, 21, 22, 24, 24, 24, 25, 25, 27
Let m denote the sample mean, and sd the sample standard deviation. Compute
and interpret the intervals m±2sd and m±3sd, using Tchebysheff’s
Inequalities and the Empirical Rule. Be specific and complete. Show your
work, and discuss completely for full credit.
Numbers
n    m        sd       lower2   upper2   lower3    upper3
51   13.5294  6.49724  0.53493  26.5239  -5.96231  33.0211
We’re working with counts….
Short Interval, Raw: [0.53493, 26.5239], restricted to [1, 26].
0 [ ||1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26|| ] 27 28 29 30
Long Interval, Raw: [-5.96231, 33.0211], restricted to [0, 33].
-6 [ -5 -4 -3 -2 -1 ||0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33|| ] 34
Short Interval: m ± (2*sd)
Lower Bound = m − (2*sd) ≈ 13.5294 − (2*6.49724) ≈ 0.53493 [1]
Upper Bound = m + (2*sd) ≈ 13.5294 + (2*6.49724) ≈ 26.5239 [26]
Long Interval: m ± (3*sd)
Lower Bound = m − (3*sd) ≈ 13.5294 − (3*6.49724) ≈ -5.96231 [0]
Upper Bound = m + (3*sd) ≈ 13.5294 + (3*6.49724) ≈ 33.0211 [33]
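The arithmetic above can be reproduced from the raw stripe counts. The following is a minimal sketch in Python, assuming the sample standard deviation uses the n − 1 divisor (which matches the sd reported in the Numbers table); the variable names are illustrative and not part of the original notes.

```python
import math

# Stripe counts for the 51 sampled lizards, copied from the case.
stripes = [
    1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 11,
    11, 11, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 17, 17,
    18, 21, 21, 21, 22, 24, 24, 24, 25, 25, 27,
]

n = len(stripes)
m = sum(stripes) / n
# Sample standard deviation: sum of squared deviations divided by n - 1.
sd = math.sqrt(sum((x - m) ** 2 for x in stripes) / (n - 1))

print(f"n = {n}, m = {m:.4f}, sd = {sd:.5f}")
for k in (2, 3):
    lower, upper = m - k * sd, m + k * sd
    print(f"m ± {k}sd: [{lower:.5f}, {upper:.4f}]")
```

The printed raw intervals still need to be restricted to whole, non-negative stripe counts, as the notes do in square brackets.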
Interpretation
There are 51 Fictitious Striped lizards in our sample.
At least 75% of the lizards in our sample have between 1 and 26 stripes.
At least 89% of the lizards in our sample have between 0 and 33 stripes.
If the Fictitious Striped lizard stripe counts cluster symmetrically around a central value, becoming rare with increasing distance from the central value, then:
approximately 95% of the lizards in our sample have between 1 and 26 stripes, and
approximately 100% of the lizards in our sample have between 0 and 33 stripes.
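For reference, the "at least 75%" and "at least 89%" figures come from Tchebysheff's bound evaluated at two and three standard deviations. A brief worked form follows; the symbol k (the number of standard deviations) is introduced here for illustration and does not appear in the notes.

```latex
\[
\text{proportion of observations in } [\,m - k\,sd,\; m + k\,sd\,] \;\ge\; 1 - \frac{1}{k^{2}}, \qquad k > 1
\]
\[
k = 2:\quad 1 - \frac{1}{4} = 0.75
\qquad\qquad
k = 3:\quad 1 - \frac{1}{9} \approx 0.889
\]
```

The bound makes no assumption about the shape of the data, which is why it is stated as "at least" rather than "approximately".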
From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2006versionA_key.htm:
Case One
Descriptive Statistics
Serum Creatinine and Kidney (Renal) Function
Healthy kidneys remove wastes and excess fluid from the blood. Blood tests show whether the kidneys are failing to remove wastes. Urine tests can show how quickly body wastes are being removed and whether the kidneys are also leaking abnormal amounts of protein. The nephron is the basic structure in the kidney that produces urine. In a healthy kidney there may be as many as 1,000,000 nephrons. Loss of nephrons reduces the kidney’s ability to function by reducing its ability to produce urine. Progressive loss of nephrons leads to kidney failure.
Serum creatinine: Creatinine is a waste product that comes from meat protein in the diet and also from the normal wear and tear on muscles of the body. Creatinine is produced at a continuous rate and is excreted only through the kidneys. When renal dysfunction occurs, the kidneys are impaired in their ability to excrete creatinine, and the serum creatinine rises. As kidney disease progresses, the level of creatinine in the blood increases.
Suppose that we sample serum creatinine levels in a random sample of adults. Serum creatinine (in mg/dL) for each sampled subject follows:
15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1,
7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9,
1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.12, 1.09,
1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7,
0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6
Compute and interpret the following statistics: sample size (n), p00, p25, p50, p75, p100, (p75-p00), (p100-p25), (p75-p50), (p50-p25). Be specific and complete. Show your work, and discuss completely for full credit.
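The requested percentiles can be sketched in Python with the standard library. Quartile conventions differ between textbooks, calculators and software, so the "inclusive" method used below is an assumption rather than the convention of the course; p25 and p75 may differ slightly under another rule, while n, p00, p50 and p100 do not depend on that choice.

```python
import statistics

# Serum creatinine (mg/dL) for the 69 sampled subjects, copied from the case.
sercreat = [
    15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1,
    7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9,
    1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.12, 1.09,
    1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7,
    0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6,
]

n = len(sercreat)
p00, p100 = min(sercreat), max(sercreat)   # minimum and maximum
p50 = statistics.median(sercreat)          # median
# Assumed quartile rule: "inclusive" (treats the data as the whole population).
p25, _, p75 = statistics.quantiles(sercreat, n=4, method="inclusive")

print(f"n={n}  p00={p00}  p25={p25}  p50={p50}  p75={p75}  p100={p100}")
# Differences requested in the case, rounded for display.
print("p75-p00 =", round(p75 - p00, 2))
print("p100-p25 =", round(p100 - p25, 2))
print("p75-p50 =", round(p75 - p50, 2))
print("p50-p25 =", round(p50 - p25, 2))
```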
Case Two
Summary Intervals
Serum Creatinine and Kidney (Renal) Function
Using the context and data from Case One, let m denote the sample mean, and sd the sample standard deviation. Compute and interpret the intervals m ± 2sd and m ± 3sd, using Tchebysheff’s Inequalities and the Empirical Rule. Be specific and complete. Show your work, and discuss completely for full credit.
Numbers
n    m     sd    m-3*sd   m+3*sd   m-2*sd   m+2*sd
69   3.4   4.2   -9.2     16.0     -5.0     11.8
(n = number of nonmissing values, m = the mean, sd = the standard deviation, all for sercreat)
n=69
m=3.4
sd=4.2
“Short Interval”
Lower2 = m − 2*sd = 3.4 − 2*4.2 = -5.0 [0] (Negative concentrations don’t make sense here.)
Upper2 = m + 2*sd = 3.4 + 2*4.2 = 11.8
“Long Interval”
Lower3 = m − 3*sd = 3.4 − 3*4.2 = -9.2 [0] (Negative concentrations don’t make sense here.)
Upper3 = m + 3*sd = 3.4 + 3*4.2 = 16.0
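The same steps, including the replacement of a negative lower bound by 0, can be written as a short function. This is a sketch that starts from the rounded values m = 3.4 and sd = 4.2 in the Numbers table; the function name `summary_interval` and its `minimum` parameter are hypothetical and chosen only for illustration.

```python
def summary_interval(m, sd, k, minimum=0.0):
    """Return (raw_lower, raw_upper, restricted_lower) for m ± k*sd.

    restricted_lower replaces any raw lower bound below `minimum`
    (here 0, since a concentration cannot be negative).
    """
    raw_lower = m - k * sd
    raw_upper = m + k * sd
    return raw_lower, raw_upper, max(minimum, raw_lower)


m, sd = 3.4, 4.2  # rounded values from the Numbers table

for k in (2, 3):
    raw_lower, raw_upper, lower = summary_interval(m, sd, k)
    print(f"k={k}: raw [{raw_lower:.1f}, {raw_upper:.1f}] -> "
          f"restricted [{lower:.1f}, {raw_upper:.1f}]")
```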
Interpretation
Tchebysheff’s Inequalities
At least 75% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine per deciliter serum.
At least 89% of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine per deciliter serum.
Empirical Rule
If the serum creatinine levels cluster symmetrically around a central value, with values becoming progressively and symmetrically rarer with increasing distance from the central value, then …
approximately 95% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine per deciliter serum, and
approximately 100% of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine per deciliter serum.
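Because the Empirical Rule statements are conditional on the shape of the data, it can be instructive to count how many sampled values actually fall inside each interval. The sketch below does that for the restricted intervals [0, 11.8] and [0, 16.0]; the helper `proportion_within` is a hypothetical name, not something from the original notes.

```python
# Serum creatinine (mg/dL) for the 69 sampled subjects, copied from Case One.
sercreat = [
    15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1,
    7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9,
    1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.12, 1.09,
    1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7,
    0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6,
]


def proportion_within(values, lower, upper):
    """Fraction of values lying in the closed interval [lower, upper]."""
    return sum(lower <= v <= upper for v in values) / len(values)


# Restricted intervals from the notes, keyed by the number of sd (k).
intervals = {2: (0.0, 11.8), 3: (0.0, 16.0)}

for k, (lower, upper) in intervals.items():
    guaranteed = 1 - 1 / k**2  # Tchebysheff's guaranteed minimum proportion
    observed = proportion_within(sercreat, lower, upper)
    print(f"k={k}: guaranteed >= {guaranteed:.3f}, observed = {observed:.3f}")
```

For this sample, 62 of the 69 values (about 90%) lie in the short interval, which satisfies Tchebysheff's 75% guarantee but falls short of the Empirical Rule's 95%. That is consistent with the data being strongly right-skewed rather than symmetric.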
Diseased Monkeys
A random sample of Lab Monkeys is
infected with the agent that causes Disease X. The time (in hours) from
infection to the appearance of symptoms of Disease X is measured for each
monkey. The sample of monkeys yields the following times (in hours):
12, 26, 36, 38, 40, 42, 44, 48, 52, 62
13, 27, 37, 38, 41, 42, 44, 49, 55, 65
15, 30, 37, 39, 41, 44, 46, 50, 56, 70
16, 32, 38, 40, 42, 44, 48, 50, 58, 72
18, 35, 40, 41, 42, 45, 48, 52, 58, 75
Enter the data into your calculator, and compute the following statistics: sample size (n), sample mean (m) and sample standard deviation (sd).
Compute the intervals m ± 2sd and m ± 3sd.
Apply and discuss the Empirical Rule for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval; interpret it. Be specific and complete.
Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval; interpret it. Be specific and complete.
Short Interval: m ± (2*sd)
Lower Bound = m − (2*sd) ≈ 42.66 − (2*14.0968) ≈ 14.5
Upper Bound = m + (2*sd) ≈ 42.66 + (2*14.0968) ≈ 70.8
Long Interval: m ± (3*sd)
Lower Bound = m − (3*sd) ≈ 42.66 − (3*14.0968) ≈ 0.37
Upper Bound = m + (3*sd) ≈ 42.66 + (3*14.0968) ≈ 84.9
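As a cross-check on the calculator work, the statistics can be recomputed from the raw times with Python's standard `statistics` module, where `stdev` is the sample standard deviation (n − 1 divisor). This is a sketch only; the variable names are illustrative.

```python
import statistics

# Hours from infection to symptoms for the 50 sampled monkeys.
hours = [
    12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
    13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
    15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
    16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
    18, 35, 40, 41, 42, 45, 48, 52, 58, 75,
]

n = len(hours)
m = statistics.mean(hours)
sd = statistics.stdev(hours)  # sample standard deviation

print(f"n = {n}, m = {m:.2f}, sd = {sd:.4f}")
for k in (2, 3):
    print(f"m ± {k}sd: [{m - k * sd:.2f}, {m + k * sd:.2f}]")
```

The upper bounds evaluate to roughly 70.85 and 84.95 hours; the notes report them as 70.8 and 84.9.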
At least 75% of the monkeys in the sample showed symptoms between 14.5 and 70.8 hours after exposure.
At least 89% of the monkeys in the sample showed symptoms between 0.37 and 84.9 hours after exposure.
If the monkey times-to-symptom cluster symmetrically around a central value, becoming rare with increasing distance from the central value, then:
approximately 95% of the monkeys in the sample showed symptoms between 14.5 and 70.8 hours after exposure, and
approximately 100% of the monkeys in the sample showed symptoms between 0.37 and 84.9 hours after exposure.
Barrel of Monkeys™
A random sample of people is selected, and their performance on the Barrel of Monkeys™ game is measured.
Here are the instructions for this
game: "Dump monkeys onto table. Pick up one monkey by an arm. Hook other
arm through a second monkey's arm. Continue making a chain. Your turn is over
when a monkey is dropped."
Each person makes one chain of
monkeys, and the number of monkeys in each chain is recorded:
1, 2, 5, 2, 9, 12, 8, 7, 10, 9, 6, 4, 6, 9, 3, 12, 11, 10, 8, 4, 12, 7, 8, 6, 7,
8, 6, 5, 9, 10, 7, 5, 4, 3, 10, 7,
7, 6, 8, 6, 6, 6, 6, 7, 8, 8, 7, 8
Enter the data into your calculator, and compute the following statistics: sample size (n), sample mean (m) and sample standard deviation (sd).
Compute the intervals m ± 2sd and m ± 3sd.
Apply and discuss the Empirical Rule for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval; interpret it. Be specific and complete.
Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each interval, using the context of the data. Do not simply state the value of the interval; interpret it. Be specific and complete.
n    m        sd       Lower2SD      Upper2SD      Lower3SD       Upper3SD
48   6.97917  2.59697  1.78523 [2]   12.1731 [12]  -0.81173 [0]   14.7701 [14]
We’re working with counts….
Short Interval, Raw: [1.78523, 12.1731], restricted to [2, 12].
-1 --- 0 --- 1 - [-- ||2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12|| -] -- 13 --- 14 --- 15
Long Interval, Raw: [-0.81173, 14.7701], restricted to [0, 14] or to [1, 14].
-1 -- [- ||0 --- 1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12 --- 13 --- 14|| -- ] - 15
Short Interval: m ± (2*sd)
Lower Bound = m − (2*sd) ≈ 6.97917 − (2*2.59697) ≈ 2
Upper Bound = m + (2*sd) ≈ 6.97917 + (2*2.59697) ≈ 12
Long Interval: m ± (3*sd)
Lower Bound = m − (3*sd) ≈ 6.97917 − (3*2.59697) ≈ 0 (or 1)
Upper Bound = m + (3*sd) ≈ 6.97917 + (3*2.59697) ≈ 14
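The restriction of a raw interval to whole-number counts follows one rule in both count examples in these notes: round the lower bound up, round the upper bound down, and do not go below the smallest possible count. A minimal sketch of that rule follows; the function name `restrict_to_counts` and its `minimum` parameter are hypothetical. Passing `minimum=1` gives the "(or 1)" variant, on the assumption that a chain must contain at least one monkey.

```python
import math


def restrict_to_counts(lower, upper, minimum=0):
    """Shrink a raw interval to the whole-number counts it contains.

    The lower bound is rounded up and the upper bound rounded down, so the
    restricted interval never extends beyond the raw one; `minimum` is the
    smallest count that makes sense for the data.
    """
    return max(minimum, math.ceil(lower)), math.floor(upper)


# Raw intervals taken from the tables in these notes.
print(restrict_to_counts(1.78523, 12.1731))              # chains, m ± 2sd
print(restrict_to_counts(-0.81173, 14.7701))             # chains, m ± 3sd
print(restrict_to_counts(-0.81173, 14.7701, minimum=1))  # "(or 1)" variant
print(restrict_to_counts(0.53493, 26.5239))              # lizards, m ± 2sd
print(restrict_to_counts(-5.96231, 33.0211))             # lizards, m ± 3sd
```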
At least 75% of the monkey chains in the sample had between 2 and 12 monkeys.
At least 89% of the monkey chains in the sample had between 0 (or 1) and 14 monkeys.
If the monkey chain counts cluster symmetrically around a central value, becoming rare with increasing distance from the central value, then:
approximately 95% of the monkey chains in the sample had between 2 and 12 monkeys, and
approximately 100% of the monkey chains in the sample had between 0 (or 1) and 14 monkeys.