Session 2.2
28th June 2010
Descriptive Summary Intervals
Links
http://www.pages.drexel.edu/~tpm23/Stat201Spr04/EmpiricalTchebysheff.pdf
http://knowledgerush.com/kr/encyclopedia/Tchebysheff's_theorem/
http://faculty.roosevelt.edu/currano/M347/Lectures/3.11.Example.pdf
http://www.mathstat.carleton.ca/~lhaque/2507-chap2a.pdf
http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section4/roughcuts/roughcuts.htm
From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2008verB_key.htm:
Case Four | Summary Intervals | Fictitious Striped Lizard
The Fictitious Striped Lizard is a native species of Lizard Island, and is
noteworthy for both the quantity and quality of its stripes. Consider a random
sample of Fictitious Striped Lizards, in which the number of stripes per lizard
is noted:
1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10,
10, 11, 11, 11, 11, 11, 11, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16,
16, 17, 17, 17, 17, 18, 21, 21, 21, 22, 24, 24, 24, 25, 25, 27
Let m denote the sample mean, and sd the sample standard deviation. Compute
and interpret the intervals m±2sd and m±3sd, using Tchebysheff’s
Inequalities and the Empirical Rule. Be specific and complete. Show your
work, and discuss completely for full credit.
Numbers
n    m        sd       lower2   upper2   lower3    upper3
51   13.5294  6.49724  0.53493  26.5239  -5.96231  33.0211
We’re working with counts…
Short Interval, Raw: [0.53493, 26.5239], restricted to [1, 26].
0 [ ||1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26|| ] 27 28 29 30
Long Interval, Raw: [-5.96231, 33.0211], restricted to [0, 33].
-6 [ -5 -4 -3 -2 -1 ||0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33|| ] 34
Short Interval: m ± (2*sd)
Lower Bound = m - (2*sd) ≈ 13.5294 - (2*6.49724) ≈ 0.53493 [1]
Upper Bound = m + (2*sd) ≈ 13.5294 + (2*6.49724) ≈ 26.5239 [26]
Long Interval: m ± (3*sd)
Lower Bound = m - (3*sd) ≈ 13.5294 - (3*6.49724) ≈ -5.96231 [0]
Upper Bound = m + (3*sd) ≈ 13.5294 + (3*6.49724) ≈ 33.0211 [33]
Interpretation
There are 51 Fictitious
Striped lizards in our sample.
At least 75% of the
lizards in our sample have between 1 and 26 stripes.
At least 89% of the
lizards in our sample have between 0 and 33 stripes.
If the Fictitious Striped
lizard stripe counts cluster symmetrically around a central value, becoming
rare with increasing distance from the central value, then:
approximately 95% of the lizards in our sample have between 1
and 26 stripes, and approximately 100% of the lizards
in our sample have between 0 and 33 stripes.
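The lizard calculation above can be checked with a short Python sketch. It is only an illustration: the function names are made up for this example, the sample standard deviation is assumed to use the n - 1 divisor (which reproduces sd ≈ 6.49724), and restricting to whole counts no smaller than 0 follows the bracketed values in the worked solution.

```python
import math
import statistics

# Stripe counts copied from the case above.
stripes = [1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 9, 10, 10,
           10, 11, 11, 11, 11, 11, 11, 12, 13, 13, 14, 14, 14, 14, 15,
           15, 15, 15, 16, 16, 16, 17, 17, 17, 17, 18, 21, 21, 21, 22,
           24, 24, 24, 25, 25, 27]


def summary_interval(data, k):
    """Raw interval (m - k*sd, m + k*sd), with sd computed using n - 1."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    return m - k * sd, m + k * sd


def restrict_to_counts(lower, upper):
    """Whole-number counts inside the raw interval, never below 0."""
    return max(0, math.ceil(lower)), math.floor(upper)


def observed_fraction(data, lower, upper):
    """Fraction of the sample that actually falls in [lower, upper]."""
    return sum(lower <= x <= upper for x in data) / len(data)


for k in (2, 3):
    raw = summary_interval(stripes, k)
    lo, hi = restrict_to_counts(*raw)
    print(f"k={k}: raw={raw}, restricted=[{lo}, {hi}], "
          f"observed fraction={observed_fraction(stripes, lo, hi):.3f}")
```

Comparing the observed fraction with the 75% and 89% guarantees shows how conservative Tchebysheff’s Inequalities can be for a particular sample.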
From http://www.mindspring.com/~cjalverson/_2ndhourlyfall2006versionA_key.htm:
Case One
Descriptive Statistics
Serum Creatinine
and Kidney (Renal) Function
Healthy kidneys remove wastes and
excess fluid from the blood. Blood tests show whether the kidneys are failing
to remove wastes. Urine tests can show how quickly body
wastes are being removed and whether the kidneys are also leaking abnormal
amounts of protein. The nephron is the basic
structure in the kidney that produces urine. In a healthy kidney there may be
as many as 1,000,000 nephrons. Loss of nephrons reduces the ability of the kidney to function by
reducing the kidney’s ability to produce urine. Progressive loss of nephrons leads to kidney failure. Serum
creatinine. Creatinine
is a waste product that comes from meat protein in the diet and also comes from
the normal wear and tear on muscles of the body. Creatinine
is produced at a continuous rate and is excreted only through the kidneys. When
renal dysfunction occurs, the kidneys are impaired in their ability to excrete creatinine and the serum creatinine
rises. As kidney disease progresses, the level of creatinine in the blood increases.
Suppose that we sample serum creatinine levels in a random sample of adults. Serum creatinine (as mg/dL) for each
sampled subject follows:
15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1,
7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3, 2.1, 2.0, 1.9,
1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.12, 1.09,
1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7,
0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6
Compute and interpret
the following statistics: sample size (n), p00, p25, p50,
p75, p100, (p75-p00), (p100-p25),
(p75-p50), (p50-p25). Be specific and complete. Show your work, and discuss
completely for full credit.
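One way to obtain these percentiles is sketched below in Python. Quartile conventions differ between textbooks, calculators and software, so the "inclusive" method used here is an assumption and may not match the method used in class; the minimum, median and maximum do not depend on that choice.

```python
import statistics

# Serum creatinine values (mg/dL) copied from Case One.
creatinine = [15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1,
              7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3,
              2.1, 2.0, 1.9, 1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3,
              1.2, 1.2, 1.1, 1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9,
              0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7,
              0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]

n = len(creatinine)
p00 = min(creatinine)                 # minimum
p100 = max(creatinine)                # maximum
p50 = statistics.median(creatinine)   # median

# Assumption: "inclusive" quartiles; other conventions can give other values.
p25, _, p75 = statistics.quantiles(creatinine, n=4, method="inclusive")

print("n =", n)
print("p00, p25, p50, p75, p100 =", p00, p25, p50, p75, p100)
print("p75 - p00 =", p75 - p00, " p100 - p25 =", p100 - p25)
print("p75 - p50 =", p75 - p50, " p50 - p25 =", p50 - p25)
```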
Case Two
Summary Intervals
Serum Creatinine
and Kidney (Renal) Function
Using the context and
data from Case One,
let m denote the sample mean, and sd the sample standard deviation.
Compute and interpret the intervals m ± 2sd and
m ± 3sd, using Tchebysheff’s Inequalities and the Empirical
Rule. Be specific and complete. Show your work, and discuss completely for full
credit.
Numbers (sercreat = serum creatinine, mg/dL)
n (nonmissing values)   mean (m)   standard deviation (sd)   m-3*sd   m+3*sd   m-2*sd   m+2*sd
69                      3.4        4.2                       -9.2     16.0     -5.0     11.8
n=69
m=3.4
sd=4.2
“Short Interval”
Lower2 = m - 2*sd = 3.4 - 2*4.2 = -5.0 [0] (Negative concentrations don’t make sense here.)
Upper2 = m + 2*sd = 3.4 + 2*4.2 = 11.8
“Long Interval”
Lower3 = m - 3*sd = 3.4 - 3*4.2 = -9.2 [0] (Negative concentrations don’t make sense here.)
Upper3 = m + 3*sd = 3.4 + 3*4.2 = 16.0
Interpretation
Tchebysheff’s Inequalities
At least 75% of the
subjects in the sample have serum creatinine levels
between 0 and 11.8 mg creatinine per deciliter serum.
At least 89% of the
subjects in the sample have serum creatinine levels
between 0 and 16.0 mg creatinine per deciliter serum.
Empirical Rule
If the serum creatinine levels cluster symmetrically around a central
value, with values becoming progressively and symmetrically rarer with
increasing distance from the central value, then …
approximately 95% of the subjects in the sample have serum creatinine levels between 0 and 11.8 mg creatinine
per deciliter serum and
approximately 100% of the subjects in the sample have serum creatinine levels between 0 and 16.0 mg creatinine
per deciliter serum.
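The 75% and 89% figures come from Tchebysheff’s bound 1 - 1/k² with k = 2 and k = 3. The Python sketch below (function names are illustrative only) compares that guarantee, the Empirical Rule figures, and the fraction actually observed in the creatinine sample. The 0.997 used for three standard deviations is the usual textbook value behind "approximately 100%". Because these data are strongly right-skewed, the "if" condition of the Empirical Rule is doubtful here, and the observed two-sd coverage falls short of 95% while still satisfying Tchebysheff’s guarantee.

```python
import statistics


def tchebysheff_bound(k):
    """Minimum fraction of any data set within k sd of the mean (k > 1)."""
    return 1 - 1 / k ** 2


# Serum creatinine values (mg/dL) copied from Case One.
creatinine = [15.0, 14.5, 14.2, 13.8, 13.5, 13.1, 12.2, 11.1, 10.1, 9.8, 8.1,
              7.3, 5.1, 5.0, 4.9, 4.8, 4.0, 3.5, 3.3, 3.2, 3.2, 2.9, 2.5, 2.3,
              2.1, 2.0, 1.9, 1.9, 1.8, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.3,
              1.2, 1.2, 1.1, 1.12, 1.09, 1.05, 0.95, 0.92, 0.9, 0.9, 0.9, 0.9,
              0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7,
              0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]

m = statistics.mean(creatinine)
sd = statistics.stdev(creatinine)   # sample sd, n - 1 divisor


def coverage(k):
    """Observed fraction of the sample within k sd of the sample mean."""
    return sum(m - k * sd <= x <= m + k * sd for x in creatinine) / len(creatinine)


for k, empirical in ((2, 0.95), (3, 0.997)):
    print(f"k={k}: guaranteed >= {tchebysheff_bound(k):.3f}, "
          f"Empirical Rule ~ {empirical}, observed = {coverage(k):.3f}")
```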
Diseased Monkeys
A random sample of Lab Monkeys is
infected with the agent that causes Disease X. The time (in hours) from
infection to the appearance of symptoms of Disease X is measured for each monkey.
The sample of monkeys yields the following times (in hours):
12, 26, 36, 38, 40, 42,
44, 48, 52, 62, 13, 27, 37, 38, 41, 42, 44, 49, 55, 65, 15, 30, 37, 39, 41, 44,
46, 50, 56, 70
16, 32, 38, 40, 42, 44,
48, 50, 58, 72, 18, 35, 40, 41, 42, 45, 48, 52, 58, 75
Edit the data into your calculator,
and compute the following statistics: sample size (n), sample mean (m) and
sample standard deviation (sd).
Compute the intervals m ±
2sd and m ± 3sd.
Apply and discuss the Empirical Rule
for these intervals. Interpret each interval, using the context of the data. Do
not simply state the value of the interval, interpret it. Be specific and
complete.
Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each
interval, using the context of the data. Do not simply state the value of the
interval, interpret it. Be specific and complete.
Short Interval: m ± (2*sd)
Lower Bound = m - (2*sd) ≈ 42.66 - (2*14.0968) ≈ 14.5
Upper Bound = m + (2*sd) ≈ 42.66 + (2*14.0968) ≈ 70.8
Long Interval: m ± (3*sd)
Lower Bound = m - (3*sd) ≈ 42.66 - (3*14.0968) ≈ 0.37
Upper Bound = m + (3*sd) ≈ 42.66 + (3*14.0968) ≈ 84.9
At least 75% of the
monkeys in the sample showed symptoms between 14.5 and 70.8 hours after
exposure.
At least 89% of the
monkeys in the sample showed symptoms between 0.37 and 84.9 hours after
exposure.
If the monkey
times-to-symptom cluster symmetrically around a central value, becoming rare
with increasing distance from the central value, then:
Approximately 95% of the
monkeys in the sample showed symptoms between 14.5 and 70.8 hours after
exposure, and
Approximately 100% of the
monkeys in the sample showed symptoms between 0.37 and 84.9 hours after
exposure.
Barrel of Monkeys™
A random sample of people is selected, and their performance on the
Barrel of Monkeys™ game is measured.
Here are the instructions for this
game: "Dump monkeys onto table. Pick up one monkey by an arm. Hook other
arm through a second monkey's arm. Continue making a chain. Your turn is over
when a monkey is dropped."
Each person makes one chain of
monkeys, and the number of monkeys in each chain is recorded:
1, 2, 5, 2, 9, 12, 8, 7, 10, 9, 6, 4, 6, 9, 3, 12, 11, 10, 8, 4, 12, 7, 8, 6, 7,
8, 6, 5, 9, 10, 7, 5, 4, 3, 10, 7,
7, 6, 8, 6, 6, 6, 6, 7, 8, 8, 7, 8
Edit the data into your calculator,
and compute the following statistics: sample size (n), sample mean (m) and
sample standard deviation (sd).
Compute the intervals m ±
2sd and m ± 3sd.
Apply and discuss the Empirical Rule
for these intervals. Interpret each interval, using the context of the data. Do
not simply state the value of the interval, interpret it. Be specific and
complete.
Apply and discuss Tchebysheff’s Theorem for these intervals. Interpret each
interval, using the context of the data. Do not simply state the value of the
interval, interpret it. Be specific and complete.
n    m        sd       Lower2SD      Upper2SD       Lower3SD       Upper3SD
48   6.97917  2.59697  1.78523 [2]   12.1731 [12]   -0.81173 [0]   14.7701 [14]
We’re working with counts…
Short Interval, Raw: [1.78523, 12.1731], restricted to [2, 12].
-1 --- 0 --- 1 - [-- ||2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12|| -] -- 13 --- 14 --- 15
Long Interval, Raw: [-0.81173, 14.7701], restricted to [0, 14] or to [1, 14].
-1 -- [- ||0 --- 1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12 --- 13 --- 14|| -- ] - 15
Short Interval: m ± (2*sd)
Lower Bound = m - (2*sd) ≈ 6.97917 - (2*2.59697) ≈ 2
Upper Bound = m + (2*sd) ≈ 6.97917 + (2*2.59697) ≈ 12
Long Interval: m ± (3*sd)
Lower Bound = m - (3*sd) ≈ 6.97917 - (3*2.59697) ≈ 0 (or 1)
Upper Bound = m + (3*sd) ≈ 6.97917 + (3*2.59697) ≈ 14
At least 75% of the monkey
chains in the sample had between 2 and 12 monkeys.
At least 89% of the
monkey chains in the sample had between 0 (or 1) and
14 monkeys.
If the monkey chain
counts cluster symmetrically around a central value, becoming rare with
increasing distance from the central value, then:
approximately 95% of the monkey chains in the sample showed
between 2 and 12 monkeys and
approximately 100% of the monkey chains in the sample showed
between 0 (or 1) and 14 monkeys.
Confidence Estimation of the
Population Mean
In theory, we can compute the
population mean face value of a fair, six-sided d6 with face values 1,2,3,4,5,6 as
M = 1*Pr{d6 shows 1}+2*Pr{d6 shows
2}+3*Pr{d6 shows 3}+4*Pr{d6 shows 4}+5*Pr{d6 shows 5}+6*Pr{d6 shows 6}
M = 1*(1/6) +2*(1/6)+3*(1/6)+4*(1/6)+5*(1/6)+6*(1/6) = 3.5
Our 95% confidence
interval estimation process should produce intervals containing this population
mean in approximately 95% of samples.
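The claim about the interval process can be explored by simulation. The Python sketch below is an illustration only: the number of simulated samples, the seed, and the helper names are arbitrary choices for this example, and 1.96 is assumed as the multiplier. It computes the population mean exactly, then rolls many samples of n=50 and counts how often the interval contains M.

```python
import math
import random
import statistics
from fractions import Fraction

FACES = (1, 2, 3, 4, 5, 6)

# Population mean of a fair d6: each face value times its probability 1/6.
M = sum(face * Fraction(1, 6) for face in FACES)   # exactly 7/2 = 3.5


def interval(sample, z=1.96):
    """Interval m +/- z*(sd/sqrt(n)) for one sample."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


def hit_rate(num_samples=2000, n=50, seed=2010):
    """Fraction of simulated samples whose interval contains M."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        sample = [rng.choice(FACES) for _ in range(n)]
        lower, upper = interval(sample)
        hits += lower <= M <= upper
    return hits / num_samples


print("M =", float(M))
print("simulated hit rate:", hit_rate())
```

The simulated hit rate should land near 95%, though it will vary somewhat with the seed and the number of samples.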
Here is an example from Summer 2009:
Sample | 1 | 2 | 3 | 4 | 5 | 6 | n | m | sd | se | lower | upper | M | Status
Perfect50 | 8.333 | 8.333 | 8.333 | 8.333 | 8.333 | 8.333 | 50 | 3.5 | 1.725 | 0.244 | 3.022 | 3.978 | 3.5 |
1 | 11 | 5 | 7 | 7 | 10 | 10 | 50 | 3.6 | 1.852 | 0.262 | 3.087 | 4.113 | 3.5 | Hit
2 | 7 | 5 | 13 | 10 | 11 | 4 | 50 | 3.5 | 1.502 | 0.212 | 3.084 | 3.916 | 3.5 | Hit
3 | 11 | 5 | 7 | 13 | 7 | 7 | 50 | 3.42 | 1.715 | 0.243 | 2.945 | 3.895 | 3.5 | Hit
4 | 7 | 9 | 7 | 15 | 8 | 4 | 50 | 3.4 | 1.512 | 0.214 | 2.981 | 3.819 | 3.5 | Hit
5 | 11 | 6 | 10 | 9 | 8 | 6 | 50 | 3.3 | 1.693 | 0.239 | 2.831 | 3.769 | 3.5 | Hit
6 | 7 | 3 | 7 | 13 | 10 | 10 | 50 | 3.92 | 1.639 | 0.232 | 3.466 | 4.374 | 3.5 | Hit
7 | 8 | 5 | 9 | 10 | 10 | 8 | 50 | 3.66 | 1.673 | 0.237 | 3.196 | 4.124 | 3.5 | Hit
8 | 6 | 8 | 9 | 10 | 6 | 11 | 50 | 3.7 | 1.693 | 0.239 | 3.231 | 4.169 | 3.5 | Hit
9 | 14 | 10 | 7 | 9 | 3 | 7 | 50 | 2.96 | 1.749 | 0.247 | 2.475 | 3.445 | 3.5 | Miss
10 | 9 | 7 | 12 | 9 | 9 | 4 | 50 | 3.28 | 1.565 | 0.221 | 2.846 | 3.714 | 3.5 | Hit
11 | 9 | 10 | 4 | 3 | 11 | 13 | 50 | 3.72 | 1.938 | 0.274 | 3.183 | 4.257 | 3.5 | Hit
12 | 7 | 7 | 7 | 10 | 8 | 11 | 50 | 3.76 | 1.733 | 0.245 | 3.28 | 4.24 | 3.5 | Hit
13 | 10 | 11 | 8 | 8 | 5 | 8 | 50 | 3.22 | 1.741 | 0.246 | 2.737 | 3.703 | 3.5 | Hit
14 | 9 | 6 | 11 | 11 | 6 | 7 | 50 | 3.4 | 1.641 | 0.232 | 2.945 | 3.855 | 3.5 | Hit
15 | 7 | 5 | 8 | 7 | 12 | 11 | 50 | 3.9 | 1.729 | 0.245 | 3.421 | 4.379 | 3.5 | Hit
16 | 10 | 12 | 7 | 12 | 3 | 6 | 50 | 3.08 | 1.627 | 0.23 | 2.629 | 3.531 | 3.5 | Hit
17 | 5 | 9 | 6 | 7 | 14 | 9 | 50 | 3.86 | 1.666 | 0.236 | 3.398 | 4.322 | 3.5 | Hit
18 | 9 | 5 | 8 | 9 | 10 | 9 | 50 | 3.66 | 1.745 | 0.247 | 3.176 | 4.144 | 3.5 | Hit
19 | 9 | 9 | 9 | 5 | 9 | 9 | 50 | 3.46 | 1.787 | 0.253 | 2.965 | 3.955 | 3.5 | Hit
20 | 7 | 8 | 7 | 8 | 12 | 8 | 50 | 3.68 | 1.696 | 0.24 | 3.21 | 4.15 | 3.5 | Hit
21 | 7 | 10 | 6 | 10 | 9 | 8 | 50 | 3.56 | 1.692 | 0.239 | 3.091 | 4.029 | 3.5 | Hit
22 | 3 | 12 | 15 | 8 | 3 | 9 | 50 | 3.46 | 1.528 | 0.216 | 3.036 | 3.884 | 3.5 | Hit
Success Rate | 95% | 20.9 | 20 or 21 | Sample Success Rate | 0.955
Failure Rate | 5% | 1.1 | 1 or 2 | Sample Failure Rate | 0.045
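Each row of the table above can be rebuilt from its six face counts. The Python sketch below is a hedged reconstruction, not the original spreadsheet: the function name is made up, the sd is assumed to use the n - 1 divisor, and the multiplier z = 1.96 is an assumption that appears to reproduce the listed bounds. Passing z=2.0 instead applies the table-based multiplier.

```python
import math


def row_from_counts(counts, M=3.5, z=1.96):
    """Rebuild (n, m, sd, se, lower, upper, status) from face counts.

    counts[i] is the number of times face i + 1 appeared in the sample.
    """
    n = sum(counts)
    m = sum(face * c for face, c in enumerate(counts, start=1)) / n
    # Sum of squared deviations from the sample mean, weighted by count.
    ss = sum(c * (face - m) ** 2 for face, c in enumerate(counts, start=1))
    sd = math.sqrt(ss / (n - 1))
    se = sd / math.sqrt(n)
    lower, upper = m - z * se, m + z * se
    status = "Hit" if lower <= M <= upper else "Miss"
    return n, m, sd, se, lower, upper, status


# Sample 1 and Sample 9 from the Summer 2009 table.
print(row_from_counts([11, 5, 7, 7, 10, 10]))
print(row_from_counts([14, 10, 7, 9, 3, 7]))
```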
Here are our
current samples:
Sample | 1 | 2 | 3 | 4 | 5 | 6 | mean | sd | se | Lower95 | Upper95 | Mean | Status
Perfect | 8.333333 | 8.333333 | 8.333333 | 8.333333 | 8.333333 | 8.333333 | 3.5 | 1.725164 | 0.243975 | 3.01205 | 3.98795 | 3.5 | Perfect
#1 | 8 | 10 | 6 | 11 | 5 | 10 | 3.5 | 1.752549 | 0.247848 | 3.004304 | 3.995696 | 3.5 | Hit
#2 | 11 | 11 | 7 | 6 | 7 | 8 | 3.22 | 1.798979 | 0.254414 | 2.711172 | 3.728828 | 3.5 | Hit
#3 | 6 | 8 | 8 | 6 | 10 | 12 | 3.84 | 1.75383 | 0.248029 | 3.343942 | 4.336058 | 3.5 | Hit
#4 | 6 | 12 | 6 | 9 | 12 | 5 | 3.48 | 1.606619 | 0.22721 | 3.02558 | 3.93442 | 3.5 | Hit
#5 | 8 | 6 | 7 | 10 | 12 | 7 | 3.66 | 1.673442 | 0.23666 | 3.186679 | 4.133321 | 3.5 | Hit
#6 | 9 | 9 | 4 | 11 | 7 | 10 | 3.56 | 1.797504 | 0.254205 | 3.051589 | 4.068411 | 3.5 | Hit
#7 | 10 | 7 | 5 | 9 | 9 | 10 | 3.6 | 1.829464 | 0.258725 | 3.082549 | 4.117451 | 3.5 | Hit
#8 | 14 | 3 | 5 | 12 | 10 | 6 | 3.38 | 1.794436 | 0.253772 | 2.872457 | 3.887543 | 3.5 | Hit
#9 | 7 | 14 | 6 | 7 | 10 | 6 | 3.34 | 1.673442 | 0.23666 | 2.866679 | 3.813321 | 3.5 | Hit
#10 | 12 | 14 | 6 | 11 | 2 | 5 | 2.84 | 1.595402 | 0.225624 | 2.388752 | 3.291248 | 3.5 | Miss
#11 | 8 | 7 | 15 | 3 | 4 | 13 | 3.54 | 1.809386 | 0.255886 | 3.028228 | 4.051772 | 3.5 | Hit
#12 | 12 | 10 | 11 | 2 | 7 | 8 | 3.12 | 1.802945 | 0.254975 | 2.61005 | 3.62995 | 3.5 | Hit
#13 | 4 | 12 | 8 | 11 | 5 | 10 | 3.62 | 1.627443 | 0.230155 | 3.15969 | 4.08031 | 3.5 | Hit
#14 | 9 | 3 | 12 | 9 | 8 | 9 | 3.62 | 1.70102 | 0.240561 | 3.138879 | 4.101121 | 3.5 | Hit
#15 | 6 | 9 | 11 | 9 | 6 | 9 | 3.54 | 1.643913 | 0.232484 | 3.075031 | 4.004969 | 3.5 | Hit
#16 | 6 | 14 | 6 | 8 | 8 | 8 | 3.44 | 1.692239 | 0.239319 | 2.961362 | 3.918638 | 3.5 | Hit
#17 | 6 | 9 | 8 | 5 | 6 | 16 | 3.88 | 1.847668 | 0.2613 | 3.357401 | 4.402599 | 3.5 | Hit
#18 | 8 | 7 | 8 | 8 | 9 | 6 | 3.18 | 1.63464 | 0.231173 | 2.717654 | 3.642346 | 3.5 | Hit
#19 | 11 | 12 | 7 | 4 | 5 | 11 | 3.26 | 1.893167 | 0.267734 | 2.724531 | 3.795469 | 3.5 | Hit
#20 | 10 | 8 | 4 | 8 | 8 | 12 | 3.64 | 1.892628 | 0.267658 | 3.104684 | 4.175316 | 3.5 | Hit
Confidence Interval
General Mean
Diseased Monkeys
Objective: Be able to
perform interval estimation of the population mean using the confidence
interval method. Be able to fully discuss the confidence interval.
This discussion must include a clear description of the population and the population mean, the family of samples, the family of
intervals and how the confidence applies to the family of intervals.
A random sample of Lab Monkeys is
infected with the agent that causes Disease X. The time (in hours) from
infection to the appearance of symptoms of Disease X is measured for each
monkey. The sample of monkeys yields the following times (in hours):
12, 26, 36, 38, 40, 42,
44, 48, 52, 62, |
13, 27, 37, 38, 41, 42,
44, 49, 55, 65, |
15, 30, 37, 39, 41, 44,
46, 50, 56, 70, |
16, 32, 38, 40, 42, 44,
48, 50, 58, 72, |
18, 35, 40, 41, 42, 45,
48, 52, 58, 75 |
Follow the steps:
Edit the data into
your calculator, and compute the following statistics: sample size, sample
mean, sample standard deviation.
N    M      SD       Z     LOBOUND  HIBOUND
50   42.66  14.0968  1.96  38.7526  46.5674
Identify the Population
Mean for this Sample.
We seek the population
mean time to symptoms, in hours for disease X among the population of Lab
Monkeys.
Consult the Normal Table, and determine the SD Multiplier required to ensure
95% Confidence. Justify the approach.
Since we need approximately 95% confidence, we need a multiplier somewhat
larger than 1.95, but 2.00 is more than we need. In practice, the number that
we need is 1.96. But you should use 2.00 from your table. Here are the rows
from the table:
Z(k)   PROBRT     PROBCENT
1.95   0.025588   0.94882
2.00   0.022750   0.95450
The price of this approach is that it requires large random samples; n > 30
will usually suffice.
Compute a 95%
Confidence Interval for the true but unknown population mean in this problem.
Compute
LOBOUND ≈ M - Z*(SD/sqrt(N)) = 42.66 - 1.96*(14.0968/sqrt(50)) ≈ 38.7526
and
HIBOUND ≈ M + Z*(SD/sqrt(N)) = 42.66 + 1.96*(14.0968/sqrt(50)) ≈ 46.5674.
Write the approximate
interval as: [38.8,46.6]. This is our approximate
interval.
Discuss the Family of
Intervals for this problem.
Each member of this Family is a single random sample of n=50 Lab Monkeys. The
Family of Samples (FoS) consists of every possible random sample of n=50 Lab
Monkeys. Each member of the FoS yields the following statistics: n (sample
size), m (sample mean) and sd (sample standard deviation). For this FoS, n=50
for every member sample, but m and sd will vary from member to member. Each
member of the FoS yields an interval of the form:
[m - 1.96*(sd/sqrt(n)), m + 1.96*(sd/sqrt(n))].
These intervals collectively form a Family of Intervals (FoI): each member of
the FoI is an interval derived from a member of the FoS. Approximately 95% of
these intervals contain the true population mean time (in hours) to symptoms
of Disease X in Lab Monkeys, and approximately 5% fail to contain it.
Interpret the Single
Confidence Interval for this problem.
If our interval
captures the true population mean, then the mean time to symptoms of Disease X
in Lab Monkeys is between 38.8 and 46.6 hours.
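The steps above can be sketched in Python, working directly from the raw times rather than from calculator output. The function name is illustrative, and the sample sd is assumed to use the n - 1 divisor, which matches SD ≈ 14.0968.

```python
import math
import statistics

# Times (hours) from infection to symptoms, copied from the case above.
times = [12, 26, 36, 38, 40, 42, 44, 48, 52, 62,
         13, 27, 37, 38, 41, 42, 44, 49, 55, 65,
         15, 30, 37, 39, 41, 44, 46, 50, 56, 70,
         16, 32, 38, 40, 42, 44, 48, 50, 58, 72,
         18, 35, 40, 41, 42, 45, 48, 52, 58, 75]


def z_interval(data, z):
    """Large-sample interval for the population mean: m +/- z*(sd/sqrt(n))."""
    n = len(data)
    m = statistics.mean(data)
    sd = statistics.stdev(data)   # sample sd, n - 1 divisor
    se = sd / math.sqrt(n)        # standard error of the sample mean
    return m - z * se, m + z * se


lobound, hibound = z_interval(times, 1.96)
print(f"[{lobound:.4f}, {hibound:.4f}]")
```

Calling z_interval(times, 2.00) instead gives the slightly wider interval that corresponds to the table multiplier.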
Confidence Interval
General Mean
Generic Fictitious
Spiders
Objective: Be able to
perform interval estimation of the population mean using the confidence
interval method. Be able to fully discuss the confidence interval.
This discussion must include a clear description of the population and the population mean, the family of samples, the family of
intervals and how the confidence applies to the family of intervals.
Generic Fictitious Spiders
We have a sample of Generic Fictitious Spiders. Each spider's diameter
(maximum length, in cm, from leg tip to leg tip) is measured.
The spider diameters are listed
below:
16.5, 21.9, 22.0, 22.8, 22.8, 23.4, 23.5, 23.7, 24.4, 24.4,
24.7, 28.1, 28.3, 28.3, 28.6, 29.1, 30.3, 30.4, 31.7, 31.7,
32.4, 33.1, 34.1, 35.0, 35.0, 35.3, 35.8, 36.7, 37.1, 38.4,
38.4, 38.6, 38.7, 39.5, 40.2, 41.9, 43.3, 43.6, 50.1, 52.4
Follow the steps:
Edit the
data into your calculator, and compute the following statistics: sample size,
sample mean, sample standard deviation.
Statistic | Value | Comment
Sample Size | n=40 | There are n=40 spiders in the sample.
Sample Mean | m=32.4 | The average diameter of spiders in the sample is 32.4 cm.
Sample SD | sd=8.09 | No comment here.
Identify the
Population Mean for this Sample.
We are after the
Population Mean Maximum Spider Diameter.
Consult the Normal Table, and determine the SD Multiplier required to ensure
90% Confidence. Justify the approach.
We want z(k)=1.65 from the table, as discussed in class.
We are working with
large (n>=30) random samples, and are working with sample means.
Compute a 90%
Confidence Interval for the true but unknown population mean in this problem.
m - 1.65*(sd/sqrt(n)) = 32.4 - 1.65*(8.09/sqrt(40)) ≈ 30.29;
m + 1.65*(sd/sqrt(n)) = 32.4 + 1.65*(8.09/sqrt(40)) ≈ 34.51.
Write the interval as [30.29, 34.51].
Discuss the Family of
Intervals for this problem.
Each member of our
Family is a random sample of 40 spiders from our population. The Family of
Samples consists of every possible sample of this type.
From each member,
compute the interval
m ± 1.65*(sd/sqrt(40)),
where m is the sample mean and sd
is the sample standard deviation for the sample.
Each member of the Family of Intervals is obtained in this way from a member
of the Family of Samples, and the Family of Intervals consists of all such
intervals.
Approximately 90% of
the Family of Intervals captures the population mean maximum spider diameter.
The remaining intervals do not capture the population mean.
Interpret the Single
Confidence Interval for this problem.
If our interval
contains the population mean, then the true population mean
maximum diameter for Generic Fictitious Spiders is between 30.29 and
34.51 centimeters.
From here: http://www.mindspring.com/~cjalverson/3rdhourlyfall2008versionA_key.htm
Case One | Confidence Interval, Mean | Glioblastoma Multiforme
Glioblastoma multiforme
(GBM) is the highest
grade glioma tumor and is the most malignant form of astrocytomas. These tumors originate in the brain. GBM
tumors grow rapidly, invade nearby tissue and contain cells that are very
malignant. GBM are among the most common and devastating primary brain tumors
in adults.
Suppose that we
have a random sample of GBM patients, with survival time (in weeks) listed
below:
3, 4, 5, 5, 12, 15, 17, 20, 21, 22, 23, 24, 25, 26, 27,
30, 31, 37, 38, 45, 48, 51, 53, 53, 57, 60, 61, 62, 63, 64, 65, 65, 65, 66, 66,
67, 68, 69, 72, 72, 73, 74, 76, 77, 78, 79, 80, 80, 81, 82, 83, 83, 85, 86, 87,
90, 150, 180
Estimate the population
mean survival time for Glioblastoma multiforme patients
with 99% confidence. That is, compute and discuss a 99%
confidence interval for this population mean. Provide concise and complete
details and discussion as demonstrated in the case study summaries.
Table 1. Means and Proportions
Z(k)  PROBRT   PROBCENT | Z(k)  PROBRT    PROBCENT | Z(k)  PROBRT     PROBCENT
0.05  0.48006  0.03988  | 1.05  0.14686   0.70628  | 2.05  0.020182   0.95964
0.10  0.46017  0.07966  | 1.10  0.13567   0.72867  | 2.10  0.017864   0.96427
0.15  0.44038  0.11924  | 1.15  0.12507   0.74986  | 2.15  0.015778   0.96844
0.20  0.42074  0.15852  | 1.20  0.11507   0.76986  | 2.20  0.013903   0.97219
0.25  0.40129  0.19741  | 1.25  0.10565   0.78870  | 2.25  0.012224   0.97555
0.30  0.38209  0.23582  | 1.30  0.09680   0.80640  | 2.30  0.010724   0.97855
0.35  0.36317  0.27366  | 1.35  0.088508  0.82298  | 2.35  0.009387   0.98123
0.40  0.34458  0.31084  | 1.40  0.080757  0.83849  | 2.40  0.008198   0.98360
0.45  0.32636  0.34729  | 1.45  0.073529  0.85294  | 2.45  0.007143   0.98571
0.50  0.30854  0.38292  | 1.50  0.066807  0.86639  | 2.50  0.006210   0.98758
0.55  0.29116  0.41768  | 1.55  0.060571  0.87886  | 2.55  0.005386   0.98923
0.60  0.27425  0.45149  | 1.60  0.054799  0.89040  | 2.60  0.004661   0.99068
0.65  0.25785  0.48431  | 1.65  0.049471  0.90106  | 2.65  0.004025   0.99195
0.70  0.24196  0.51607  | 1.70  0.044565  0.91087  | 2.70  0.0034670  0.99307
0.75  0.22663  0.54675  | 1.75  0.040059  0.91988  | 2.75  0.0029798  0.99404
0.80  0.21186  0.57629  | 1.80  0.035930  0.92814  | 2.80  0.0025551  0.99489
0.85  0.19766  0.60467  | 1.85  0.032157  0.93569  | 2.85  0.0021860  0.99563
0.90  0.18406  0.63188  | 1.90  0.028717  0.94257  | 2.90  0.0018658  0.99627
0.95  0.17106  0.65789  | 1.95  0.025588  0.94882  | 2.95  0.0015889  0.99682
1.00  0.15866  0.68269  | 2.00  0.022750  0.95450  | 3.00  0.0013499  0.99730
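The columns of Table 1 appear to be the standard Normal right-tail probability (PROBRT) and the central probability between -Z(k) and +Z(k) (PROBCENT). Under that reading, the Python sketch below reproduces table entries and picks the multiplier the way the case studies do: the smallest Z(k) on the table's 0.05 grid whose central probability reaches the requested confidence. The function names are illustrative only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def probrt(z):
    """Right-tail probability Pr{Z > z}."""
    return 1 - STANDARD_NORMAL.cdf(z)


def probcent(z):
    """Central probability Pr{-z < Z < z}."""
    return 2 * STANDARD_NORMAL.cdf(z) - 1


def table_multiplier(confidence):
    """Smallest Z(k) in 0.05, 0.10, ..., 3.00 with PROBCENT >= confidence."""
    for step in range(1, 61):
        z = step / 20
        if probcent(z) >= confidence:
            return z
    raise ValueError("confidence exceeds the range of the table")


for confidence in (0.90, 0.93, 0.95, 0.99):
    z = table_multiplier(confidence)
    print(f"{confidence:.2f}: Z(k) = {z:.2f}, PROBRT = {probrt(z):.6f}, "
          f"PROBCENT = {probcent(z):.5f}")
```

Because the table moves in steps of 0.05, the multiplier it yields is a little larger than the exact value (for example 2.00 rather than 1.96 for 95% confidence), so the resulting interval is slightly wider than strictly necessary.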
Numbers
n    m      sd     se    Z     lower  upper
58   56.91  33.12  4.35  2.60  45.61  68.22
se = sd/sqrt(n) ≈ 33.12/sqrt(58) ≈ 4.35
z ≈ 2.60 for 99% confidence, from the Table 1 row 2.60 0.004661 0.99068
lower = m - (z*se) ≈ 56.91 - (2.60*4.35) ≈ 45.61
upper = m + (z*se) ≈ 56.91 + (2.60*4.35) ≈ 68.22
Report the interval as [45.6, 68.2].
Interpretation
Our population is
the population of Glioblastoma multiforme patients and our population mean is the mean survival time (weeks).
Our Family of Samples (FoS) consists of every possible random sample of 58
Glioblastoma multiforme patients. From each individual sampled Glioblastoma
multiforme patient, survival time in weeks is obtained.
From each member sample
of the FoS, we compute the
sample mean (m) and standard deviation (sd) for GBM
survival time, and then compute the interval
[m - 2.60*(sd/sqrt(n)), m + 2.60*(sd/sqrt(n))].
Computing this interval
for each member sample of the FoS,
we obtain a Family of Intervals (FoI),
approximately 99% of which cover the true population mean survival time in
weeks for Glioblastoma multiforme patients.
If our interval, [45.6,
68.2] is among the approximate 99% super-majority of intervals that cover
the population mean, then the true population mean survival time for Glioblastoma multiforme patients is between 45.6 and 68.2 weeks.
From here: http://www.mindspring.com/~cjalverson/CompFinalSpring2008verMondayKey.htm
Case Four | Confidence Interval for Mean | Gestational Age
Consider the population mean gestational age (in weeks) at birth of Year 2005
US Resident Live Births. Using the data from Case Two, compute and interpret a
93% confidence interval for this population mean.
Numbers
From the Table 1 row 1.85 0.032157 0.93569, z=1.85.
n = 56
m ≈ 37.34
sd ≈ 3.9278
lowCI = m − z*(sd/sqrt(n)) ≈ 37.34 − 1.85*(3.9278/sqrt(56)) ≈ 36.37
highCI = m + z*(sd/sqrt(n)) ≈ 37.34 + 1.85*(3.9278/sqrt(56)) ≈ 38.31
Report the interval as [36.4, 38.3].
Interpretation
Our population is the
population of year 2005 US resident live born infants and our population mean
is the mean gestational age (weeks).
Our Family of Samples (FoS) consists of every possible
random sample of 56 year 2005 US resident live born infants. From each
individual sampled live born infant, gestational age in weeks is obtained.
From each member sample of the FoS, we compute the sample mean (m) and
standard deviation (sd) for gestational age, and then compute the interval
[m – 1.85*(sd/sqrt(n)), m + 1.85*(sd/sqrt(n))].
Computing this interval
for each member sample of the FoS,
we obtain a Family of Intervals (FoI), approximately
93% of which cover the true population mean gestational age in weeks for year
2005 US resident live born infants.
If our interval, [36.4,
38.3] is among the approximate 93% super-majority of intervals that cover the
population mean, then the true population mean gestational age is between 36.4
and 38.3 weeks for year 2005 US resident live born infants.