Summaries
Session 1.2
3rd June 2009
Predicting Sample Behavior from a Model
We use a population model to predict the behavior of random samples, and we
check the predictions by direct inspection of samples. We repeat sampling with
replacement, obtaining multiple random samples from the same population by the
same process. We combine (pool) compatible samples to form larger samples:
pooling samples of size 50, we obtain samples of size 100, 150 and 300. In
general, as sample size increases, samples become more precise and reliable,
provided that the sampling process is reliable.
In general, if we are working with
the correct model, then the predicted sample behavior reliably describes
observed samples.
Session Overview
Estimation: Previously, we saw how random samples drawn from a
population could be used to estimate the structure of a population as an
empirical model.
Prediction: In this session, we continue our study of probability. We
begin with a very basic example of a population with known structure, and use
that structure to predict the behavior of random samples from that
population.
We begin by constructing a probability model for our population, and define
the concept of a perfect sample. We then relate the perfect sample to random
samples, both observed and unobserved. Finally, we obtain real random samples
and check them against the perfect samples.
Exclude Case Study 1.2.1
Case Study 1.2.2
In this case study we introduce the idea of a perfect sample: a sample that
matches perfectly the population from which it is drawn. On average, real
samples agree well, though not perfectly, with their corresponding perfect
samples.
We begin by building a color bowl. We then compute a probability model for
draws with replacement (DWR) from the bowl, and then perfect samples of size
50, 100, 150, 200, 250 and 300. We then engage six groups to generate six
samples, each of n=50 DWR. We then compare sample frequencies and proportions
to the model and to the perfect samples.
Bowl Counts and Perfect Sample Calculations
Expected Count Blue (for Sample Size n) = n*PBlue
Expected Count Green (for Sample Size n) = n*PGreen
Expected Count Red (for Sample Size n) = n*PRed
Expected Count Yellow (for Sample Size n) = n*PYellow
The Bowl
Color | N | Probability (P) | Percent
Blue | 3 | 3/21 ≈ .1429 | 14.29
Green | 10 | 10/21 ≈ .4762 | 47.62
Red | 3 | 3/21 ≈ .1429 | 14.29
Yellow | 5 | 5/21 ≈ .2381 | 23.81
Total | 21 | 1 | 100
Probabilities with Long Run Interpretation
PBlue = 3/21 ≈ .1429
In long runs of draws with replacement from the bowl, approximately 14.3% of
draws show blue.
PGreen = 10/21 ≈ .4762
In long runs of draws with replacement from the bowl, approximately 47.6% of
draws show green.
PRed = 3/21 ≈ .1429
In long runs of draws with replacement from the bowl, approximately 14.3% of
draws show red.
PYellow = 5/21 ≈ .2381
In long runs of draws with replacement from the bowl, approximately 23.8% of
draws show yellow.
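The long run interpretation can be checked informally by simulation. The sketch below is only an illustration, assuming a bowl of 3 blue, 10 green, 3 red and 5 yellow items (21 in total); the function name draw_proportions and the fixed seed are our own choices, not part of the session materials.

```python
import random
from collections import Counter

# Assumed bowl: 3 blue, 10 green, 3 red, 5 yellow (21 items in total).
BOWL = ["Blue"] * 3 + ["Green"] * 10 + ["Red"] * 3 + ["Yellow"] * 5
COLORS = ("Blue", "Green", "Red", "Yellow")

def draw_proportions(n, seed=0):
    """Simulate n draws with replacement and return the sample proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.choice(BOWL) for _ in range(n))
    return {color: counts[color] / n for color in COLORS}

# Longer runs should sit closer to 3/21, 10/21, 3/21 and 5/21.
for n in (50, 5000, 50000):
    print(n, draw_proportions(n))
```

Short runs can wander noticeably from the model probabilities; longer runs are expected to settle near them.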
Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement
Color | P | E50=50*P | E100=100*P | E150=150*P | E200=200*P | E250=250*P | E300=300*P
Blue | 0.1429 | 7.143 | 14.286 | 21.4286 | 28.5714 | 35.7143 | 42.857
Green | 0.4762 | 23.81 | 47.619 | 71.4286 | 95.2381 | 119.048 | 142.86
Red | 0.1429 | 7.143 | 14.286 | 21.4286 | 28.5714 | 35.7143 | 42.857
Yellow | 0.2381 | 11.9 | 23.81 | 35.7143 | 47.619 | 59.5238 | 71.429
Total | 1 | 50 | 100 | 150 | 200 | 250 | 300
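The perfect counts are simply n*P for each color. As a minimal sketch, assuming the bowl counts 3, 10, 3 and 5 out of 21, the table can be recomputed as follows; the helper name perfect_counts is hypothetical, and exact fractions are used so that rounding happens only when printing.

```python
from fractions import Fraction

# Assumed bowl counts (21 items in total).
bowl = {"Blue": 3, "Green": 10, "Red": 3, "Yellow": 5}
total = sum(bowl.values())

def perfect_counts(n):
    """Return the expected (perfect) count n*P for each color, as exact fractions."""
    return {color: n * Fraction(count, total) for color, count in bowl.items()}

for n in (50, 100, 150, 200, 250, 300):
    row = perfect_counts(n)
    # Round only for display; the counts in each row sum exactly to n.
    print(n, {color: round(float(e), 2) for color, e in row.items()})
```

Perfect counts are usually not whole numbers, so no real sample can match them exactly; they serve as a reference point for the observed counts.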
Perfect Samples
n=50
E50Blue = n*PBlue = 50*(3/21) ≈ 7.1
E50Green = n*PGreen = 50*(10/21) ≈ 23.8
E50Red = n*PRed = 50*(3/21) ≈ 7.1
E50Yellow = n*PYellow = 50*(5/21) ≈ 11.9
In samples of 50 draws with replacement from the bowl, we expect approximately
7 or 8 blue draws, 23 or 24 green draws, 7 or 8 red draws, and 11 or 12 yellow
draws.
n=100
E100Blue = n*PBlue = 100*(3/21) ≈ 14.3
E100Green = n*PGreen = 100*(10/21) ≈ 47.6
E100Red = n*PRed = 100*(3/21) ≈ 14.3
E100Yellow = n*PYellow = 100*(5/21) ≈ 23.8
In samples of 100 draws with replacement from the bowl, we expect approximately
14 or 15 blue draws, 47 or 48 green draws, 14 or 15 red draws, and 23 or 24
yellow draws.
n=150
E150Blue = n*PBlue = 150*(3/21) ≈ 21.4
E150Green = n*PGreen = 150*(10/21) ≈ 71.4
E150Red = n*PRed = 150*(3/21) ≈ 21.4
E150Yellow = n*PYellow = 150*(5/21) ≈ 35.7
In samples of 150 draws with replacement from the bowl, we expect approximately
21 or 22 blue draws, 71 or 72 green draws, 21 or 22 red draws, and 35 or 36
yellow draws.
n=200
E200Blue = n*PBlue = 200*(3/21) ≈ 28.6
E200Green = n*PGreen = 200*(10/21) ≈ 95.2
E200Red = n*PRed = 200*(3/21) ≈ 28.6
E200Yellow = n*PYellow = 200*(5/21) ≈ 47.6
In samples of 200 draws with replacement from the bowl, we expect approximately
28 or 29 blue draws, 95 or 96 green draws, 28 or 29 red draws, and 47 or 48
yellow draws.
n=250
E250Blue = n*PBlue = 250*(3/21) ≈ 35.7
E250Green = n*PGreen = 250*(10/21) ≈ 119.0
E250Red = n*PRed = 250*(3/21) ≈ 35.7
E250Yellow = n*PYellow = 250*(5/21) ≈ 59.5
In samples of 250 draws with replacement from the bowl, we expect approximately
35 or 36 blue draws, 119 green draws, 35 or 36 red draws, and 59 or 60 yellow
draws.
n=300
E300Blue = n*PBlue = 300*(3/21) ≈ 42.9
E300Green = n*PGreen = 300*(10/21) ≈ 142.9
E300Red = n*PRed = 300*(3/21) ≈ 42.9
E300Yellow = n*PYellow = 300*(5/21) ≈ 71.4
In samples of 300 draws with replacement from the bowl, we expect approximately
42 or 43 blue draws, 142 or 143 green draws, 42 or 43 red draws, and 71 or 72
yellow draws.
Samples
#1
Color | E50 | n | p | %
Blue | 7.14 | 7 | 0.14 | 14
Green | 23.8 | 21 | 0.42 | 42
Red | 7.14 | 4 | 0.08 | 8
Yellow | 11.9 | 18 | 0.36 | 36
Total | 50 | 50 | 1 | 100
#2
Color | E50 | n | p | %
Blue | 7.14 | 4 | 0.08 | 8
Green | 23.8 | 22 | 0.44 | 44
Red | 7.14 | 7 | 0.14 | 14
Yellow | 11.9 | 17 | 0.34 | 34
Total | 50 | 50 | 1 | 100
Pooled 12
Color | E100 | n | p | %
Blue | 14.29 | 11 | 0.11 | 11
Green | 47.62 | 43 | 0.43 | 43
Red | 14.29 | 11 | 0.11 | 11
Yellow | 23.81 | 35 | 0.35 | 35
Total | 100 | 100 | 1 | 100
#3
Color | E50 | n | p | %
Blue | 7.14 | 8 | 0.16 | 16
Green | 23.8 | 23 | 0.46 | 46
Red | 7.14 | 7 | 0.14 | 14
Yellow | 11.9 | 12 | 0.24 | 24
Total | 50 | 50 | 1 | 100
#4
Color | E50 | n | p | %
Blue | 7.14 | 6 | 0.12 | 12
Green | 23.8 | 26 | 0.52 | 52
Red | 7.14 | 7 | 0.14 | 14
Yellow | 11.9 | 11 | 0.22 | 22
Total | 50 | 50 | 1 | 100
Pooled 34
Color | E100 | n | p | %
Blue | 14.29 | 14 | 0.14 | 14
Green | 47.62 | 49 | 0.49 | 49
Red | 14.29 | 14 | 0.14 | 14
Yellow | 23.81 | 23 | 0.23 | 23
Total | 100 | 100 | 1 | 100
#5
Color | E50 | n | p | %
Blue | 7.14 | 2 | 0.04 | 4
Green | 23.8 | 33 | 0.66 | 66
Red | 7.14 | 4 | 0.08 | 8
Yellow | 11.9 | 11 | 0.22 | 22
Total | 50 | 50 | 1 | 100
#6
Color | E50 | n | p | %
Blue | 7.14 | 6 | 0.12 | 12
Green | 23.8 | 33 | 0.66 | 66
Red | 7.14 | 6 | 0.12 | 12
Yellow | 11.9 | 5 | 0.1 | 10
Total | 50 | 50 | 1 | 100
Pooled 56
Color | E100 | n | p | %
Blue | 14.29 | 8 | 0.08 | 8
Green | 47.62 | 66 | 0.66 | 66
Red | 14.29 | 10 | 0.1 | 10
Yellow | 23.81 | 16 | 0.16 | 16
Total | 100 | 100 | 1 | 100
Pooled 135
Color | E150 | n | p | %
Blue | 21.4 | 17 | 0.113 | 11.33
Green | 71.4 | 77 | 0.513 | 51.33
Red | 21.4 | 15 | 0.1 | 10
Yellow | 35.7 | 41 | 0.273 | 27.33
Total | 150 | 150 | 1 | 100
Pooled 246
Color | E150 | n | p | %
Blue | 21.4 | 16 | 0.107 | 10.67
Green | 71.4 | 81 | 0.54 | 54
Red | 21.4 | 20 | 0.133 | 13.33
Yellow | 35.7 | 33 | 0.22 | 22
Total | 150 | 150 | 1 | 100
Pooled All
Color | E300 | n | p | %
Blue | 42.86 | 33 | 0.11 | 11
Green | 142.9 | 158 | 0.527 | 52.67
Red | 42.86 | 35 | 0.117 | 11.67
Yellow | 71.43 | 74 | 0.247 | 24.67
Total | 300 | 300 | 1 | 100
The structure of the
bowl, expressed as color proportions, determines the basic structure of samples
drawn from the bowl. The perfect sample is a blueprint for the actual samples.
The actual samples show choppy resemblance to the perfect sample, with samples
#3, #4, pooled 34, pooled 135, pooled 246 and pooled all showing the best
overall agreement. Probability models allow the prediction of sample behavior,
but said predictions are only as reliable as the validity of the original model
and of the sampling procedures.
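Pooling amounts to adding the color counts of compatible samples and recomputing the proportions. The sketch below uses the observed counts of samples #1 and #2 from the tables above; the helper name pool and the dictionary layout are our own assumptions, chosen only for illustration.

```python
# Model probabilities, assuming the bowl counts 3, 10, 3 and 5 out of 21.
model_p = {"Blue": 3 / 21, "Green": 10 / 21, "Red": 3 / 21, "Yellow": 5 / 21}

# Observed counts for samples #1 and #2 (n = 50 each).
sample_1 = {"Blue": 7, "Green": 21, "Red": 4, "Yellow": 18}
sample_2 = {"Blue": 4, "Green": 22, "Red": 7, "Yellow": 17}

def pool(*samples):
    """Add the counts of several samples color by color."""
    return {color: sum(s[color] for s in samples) for color in samples[0]}

pooled = pool(sample_1, sample_2)
n = sum(pooled.values())

# Compare each observed count with the perfect count n*P.
for color, count in pooled.items():
    print(color, count, round(count / n, 2), round(n * model_p[color], 2))
```

The same helper can pool three or six samples at once, for example to rebuild the Pooled 135 or Pooled All tables from the individual counts.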
The foundation of statistical applications is the careful preparation of a
study population and of the random sampling procedures that go with it. Proper
execution of the sampling procedure ensures a usable sample.
You are now ready to learn the Long
Run Argument and Perfect Sample case types in 1st
Hourly Stuff.
Exclude Case Study 1.5
We now extend our study of probability
to dice. We revisit the idea of a model or population proportion as a
probability, and introduce the idea of a random variable.
Models
A Fair, Six-sided Die
Face Value, d6 (FV d6) | Probability
1 | 1/6
2 | 1/6
3 | 1/6
4 | 1/6
5 | 1/6
6 | 1/6
A Fair, Three-sided Die
Face Value, d3 (FV d3) | Probability
1 | 1/3
2 | 1/3
3 | 1/3
Using a Fair, Six-sided Die to Simulate A Fair, Three-sided Die
Face Value, d6 (FV d6) | Mapped Face Value, d3 (FV d3)
1 | 1
2 | 1
3 | 2
4 | 2
5 | 3
6 | 3
Probability Calculations (fair d6 → fair d3)
Pr{E} denotes the probability of the event E.
The Fair d6 Model
FV: Face Values: 1, 2, 3, 4, 5, 6
Fair Model: Equally likely face values – 1/6 per face value
Pr{d6 Shows 1} = (1/6) ≈ .1667 or 16.67%
In long runs of tosses, approximately 1 toss in 6 shows “1”.
Pr{d6 Shows 2} = (1/6) ≈ .1667 or 16.67%
In long runs of tosses, approximately 1 toss in 6 shows “2”.
Pr{d6 Shows 3} = (1/6) ≈ .1667 or 16.67%
In long runs of tosses, approximately 1 toss in 6 shows “3”.
Pr{d6 Shows 4} = (1/6) ≈ .1667 or 16.67%
In long runs of tosses, approximately 1 toss in 6 shows “4”.
Pr{d6 Shows 5} = (1/6) ≈ .1667 or 16.67%
In long runs of tosses, approximately 1 toss in 6 shows “5”.
Pr{d6 Shows 6} = (1/6) ≈ .1667 or 16.67%
In long runs of tosses, approximately 1 toss in 6 shows “6”.
The Fair d3 Model Nested within a Fair d6 Model
FV: Face Values: 1(1,2), 2(3,4), 3(5,6)
Fair Model: Equally likely face values – (2/6 =) 1/3 per face value.
Pr{d3 shows “1”} = Pr{d6 Shows 1} + Pr{d6 Shows 2} [1] = (1/6) + (1/6) = 2/6 = 1/3 ≈ .3333 or 33.33%
In long runs of tosses, approximately 1 toss in 3 shows “1”.
Pr{d3 shows “2”} = Pr{d6 Shows 3} + Pr{d6 Shows 4} = (1/6) + (1/6) [2] = 2/6 = 1/3 ≈ .3333 or 33.33%
In long runs of tosses, approximately 1 toss in 3 shows “2”.
Pr{d3 shows “3”} = Pr{d6 Shows 5} + Pr{d6 Shows 6} = (1/6) + (1/6) = 2/6 = 1/3 [3] ≈ .3333 or 33.33%
In long runs of tosses, approximately 1 toss in 3 shows “3”.
[1] Additive Rule – Map Faces to Faces
[2] Inheritance of Fair Model
[3] Fair d3 Model from Fair d6 Model
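The nesting of the d3 model inside the d6 model can be written out as a short calculation. This is a sketch only: the function name to_d3 and the arithmetic shortcut (face + 1) // 2 are our own way of encoding the mapping 1,2 → 1; 3,4 → 2; 5,6 → 3.

```python
from fractions import Fraction

# Fair d6 model: each face value has probability 1/6.
d6 = {face: Fraction(1, 6) for face in range(1, 7)}

def to_d3(face):
    """Map a d6 face to a d3 face: 1,2 -> 1; 3,4 -> 2; 5,6 -> 3."""
    return (face + 1) // 2

# Additive rule: add the probabilities of the d6 faces that map to each d3 face.
d3 = {}
for face, p in d6.items():
    d3[to_d3(face)] = d3.get(to_d3(face), 0) + p

print(d3)
```

Because each d3 face collects exactly two equally likely d6 faces, the d3 model inherits fairness from the d6 model.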
In the samples, compare the sample
proportions (p) to the model probabilities (P) listed above.
Samples
#1
FV d6 | n | p | FV d3 | n | p
1 | 7 | 0.14 | | |
2 | 8 | 0.16 | 1 | 15 | 0.3
3 | 8 | 0.16 | | |
4 | 12 | 0.24 | 2 | 20 | 0.4
5 | 6 | 0.12 | | |
6 | 9 | 0.18 | 3 | 15 | 0.3
Total | 50 | 1 | Total | 50 | 1
#2
FV d6 | n | p | FV d3 | n | p
1 | 8 | 0.16 | | |
2 | 7 | 0.14 | 1 | 15 | 0.3
3 | 10 | 0.2 | | |
4 | 10 | 0.2 | 2 | 20 | 0.4
5 | 8 | 0.16 | | |
6 | 7 | 0.14 | 3 | 15 | 0.3
Total | 50 | 1 | Total | 50 | 1
#3
FV d6 | n | p | FV d3 | n | p
1 | 9 | 0.18 | | |
2 | 9 | 0.18 | 1 | 18 | 0.36
3 | 6 | 0.12 | | |
4 | 8 | 0.16 | 2 | 14 | 0.28
5 | 10 | 0.2 | | |
6 | 8 | 0.16 | 3 | 18 | 0.36
Total | 50 | 1 | Total | 50 | 1
#4
FV d6 | n | p | FV d3 | n | p
1 | 5 | 0.1 | | |
2 | 8 | 0.16 | 1 | 13 | 0.26
3 | 11 | 0.22 | | |
4 | 8 | 0.16 | 2 | 19 | 0.38
5 | 7 | 0.14 | | |
6 | 11 | 0.22 | 3 | 18 | 0.36
Total | 50 | 1 | Total | 50 | 1
#5
FV d6 | n | p | FV d3 | n | p
1 | 5 | 0.1 | | |
2 | 9 | 0.18 | 1 | 14 | 0.28
3 | 13 | 0.26 | | |
4 | 3 | 0.06 | 2 | 16 | 0.32
5 | 7 | 0.14 | | |
6 | 13 | 0.26 | 3 | 20 | 0.4
Total | 50 | 1 | Total | 50 | 1
#6
FV d6 | n | p | FV d3 | n | p
1 | 10 | 0.2 | | |
2 | 9 | 0.18 | 1 | 19 | 0.38
3 | 7 | 0.14 | | |
4 | 8 | 0.16 | 2 | 15 | 0.3
5 | 10 | 0.2 | | |
6 | 6 | 0.12 | 3 | 16 | 0.32
Total | 50 | 1 | Total | 50 | 1
Pooled 1,3,5
FV d6 | n | p | FV d3 | n | p
1 | 21 | 0.14 | | |
2 | 26 | 0.173 | 1 | 47 | 0.313
3 | 27 | 0.18 | | |
4 | 23 | 0.153 | 2 | 50 | 0.333
5 | 23 | 0.153 | | |
6 | 30 | 0.2 | 3 | 53 | 0.353
Total | 150 | 1 | Total | 150 | 1
Pooled 2,4,6
FV d6 | n | p | FV d3 | n | p
1 | 23 | 0.153 | | |
2 | 24 | 0.16 | 1 | 47 | 0.313
3 | 28 | 0.187 | | |
4 | 26 | 0.173 | 2 | 54 | 0.36
5 | 25 | 0.167 | | |
6 | 24 | 0.16 | 3 | 49 | 0.327
Total | 150 | 1 | Total | 150 | 1
Pooled 1,2
FV d6 | n | p | FV d3 | n | p
1 | 15 | 0.15 | | |
2 | 15 | 0.15 | 1 | 30 | 0.3
3 | 18 | 0.18 | | |
4 | 22 | 0.22 | 2 | 40 | 0.4
5 | 14 | 0.14 | | |
6 | 16 | 0.16 | 3 | 30 | 0.3
Total | 100 | 1 | Total | 100 | 1
Pooled 3,4
FV d6 | n | p | FV d3 | n | p
1 | 14 | 0.14 | | |
2 | 17 | 0.17 | 1 | 31 | 0.31
3 | 17 | 0.17 | | |
4 | 16 | 0.16 | 2 | 33 | 0.33
5 | 17 | 0.17 | | |
6 | 19 | 0.19 | 3 | 36 | 0.36
Total | 100 | 1 | Total | 100 | 1
Pooled 5,6
FV d6 | n | p | FV d3 | n | p
1 | 15 | 0.15 | | |
2 | 18 | 0.18 | 1 | 33 | 0.33
3 | 20 | 0.2 | | |
4 | 11 | 0.11 | 2 | 31 | 0.31
5 | 17 | 0.17 | | |
6 | 19 | 0.19 | 3 | 36 | 0.36
Total | 100 | 1 | Total | 100 | 1
Pooled All
FV d6 | n | p | FV d3 | n | p
1 | 44 | 0.147 | | |
2 | 50 | 0.167 | 1 | 94 | 0.313
3 | 55 | 0.183 | | |
4 | 49 | 0.163 | 2 | 104 | 0.347
5 | 48 | 0.16 | | |
6 | 54 | 0.18 | 3 | 102 | 0.34
Total | 300 | 1 | Total | 300 | 1
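The d3 columns in these tables follow from the d6 columns by adding the counts of paired faces. As an illustrative sketch, using the Pooled All d6 counts and our own encoding (face + 1) // 2 of the mapping 1,2 → 1; 3,4 → 2; 5,6 → 3:

```python
# Observed d6 counts from the Pooled All table (n = 300 tosses).
d6_counts = {1: 44, 2: 50, 3: 55, 4: 49, 5: 48, 6: 54}
n = sum(d6_counts.values())

# Collapse paired d6 faces into d3 faces by adding their counts.
d3_counts = {}
for face, count in d6_counts.items():
    d3_face = (face + 1) // 2
    d3_counts[d3_face] = d3_counts.get(d3_face, 0) + count

# Compare each sample proportion p with the model probability P = 1/3.
for d3_face, count in d3_counts.items():
    print(d3_face, count, round(count / n, 3), round(1 / 3, 3))
```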