Summaries
Session 1.2
26th January 2011
Predicting Sample Behavior from a Model
We use a population model to predict the behavior of random samples, and we
check the predictions by direct inspection of samples. We repeat the sampling
(draws with replacement), obtaining multiple random samples from the same
population by the same process. We combine (pool) compatible samples to form
larger samples: pooling samples of size 50 gives samples of size 100, 150, 200
and 300. In general, as sample size increases, samples become more precise and
reliable, provided that the sampling process is reliable.
If we are working with the correct model, then the predicted sample behavior
reliably describes observed samples.
Session Overview
Estimation: Previously, we saw how random samples drawn from a
population could be used to estimate the structure of a population as an
empirical model.
Prediction: In this session, we continue our study of probability. We
begin with a very basic example of a population with known structure, and use
that structure to predict the behavior of random samples from that
population.
We begin by constructing a probability model for our population and defining
the concept of a perfect sample. We then relate the perfect sample to random
samples, both observed and unobserved. Finally, we obtain real random samples
and check them against the perfect samples.
Exclude Case Study 1.2.1
Case Study 1.2.2
In this case study the idea of a perfect sample is introduced: a perfect sample
matches the population from which it is drawn exactly, so its color proportions
equal the population proportions. On average, real samples corresponded well,
though not perfectly, with their perfect samples.
We begin by building a color bowl. Next, we compute a probability model for
draws with replacement (DWR) from the bowl and use it to compute perfect samples
of size 50, 100, 150, 200, 250 and 300. Six groups then generate six samples,
each of n = 50 DWR. Finally, we compare the sample frequencies and proportions
to the model and to the perfect samples.
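The procedure can be imitated in software. The following is a minimal Python sketch, not part of the original exercise: it assumes the 6:30 bowl (2 blue, 1 green, 6 red, 6 yellow, taken from the model table below), represents the bowl as a hypothetical `BOWL` dictionary, and uses `random.choices` with weights proportional to the color counts to stand in for physical draws with replacement. The helper names `draw_with_replacement` and `pool` are illustrative only.

```python
import random
from collections import Counter

# Hypothetical representation of the 6:30 bowl: color -> number of items.
BOWL = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}


def draw_with_replacement(bowl, n, rng):
    """Return color counts for n draws with replacement (weights = item counts)."""
    colors = list(bowl)
    weights = [bowl[c] for c in colors]
    return Counter(rng.choices(colors, weights=weights, k=n))


def pool(*samples):
    """Pool samples by adding their color counts."""
    return sum(samples, Counter())


rng = random.Random(2011)  # fixed seed only so a run can be repeated
samples = [draw_with_replacement(BOWL, 50, rng) for _ in range(6)]

pooled = {
    "Pooled 12": pool(samples[0], samples[1]),
    "Pooled 135": pool(samples[0], samples[2], samples[4]),
    "Pooled 1234": pool(*samples[:4]),
    "Pooled All": pool(*samples),
}

for name, counts in pooled.items():
    print(name, sum(counts.values()), {c: counts[c] for c in BOWL})
```

Because the draws are random, the simulated counts will differ from the class data; only the sample sizes (50 per sample, 100, 150, 200 and 300 when pooled) are fixed by the design.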
Bowl Counts and Perfect Sample Calculations
Expected Count Blue (for Sample Size n) = n*PBlue
Expected Count Green (for Sample Size n) = n*PGreen
Expected Count Red (for Sample Size n) = n*PRed
Expected Count Yellow (for Sample Size n) = n*PYellow
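These formulas can be checked mechanically. Below is a small Python sketch, assuming the bowl composition from the 6:30 model table; `probability_model` and `perfect_counts` are hypothetical helper names. Exact `Fraction` arithmetic postpones rounding to the final display, so a value such as (2/15)*150 comes out as exactly 20.

```python
from fractions import Fraction

# The 6:30 bowl from the model table: color -> number of items (N).
BOWL = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}


def probability_model(bowl):
    """P(color) = N(color) / total items, kept as exact fractions."""
    total = sum(bowl.values())
    return {color: Fraction(count, total) for color, count in bowl.items()}


def perfect_counts(bowl, n):
    """Expected (perfect-sample) count for each color: n * P(color)."""
    return {color: n * p for color, p in probability_model(bowl).items()}


for n in (50, 100, 150, 200, 250, 300):
    counts = perfect_counts(BOWL, n)
    print(n, {color: round(float(e), 2) for color, e in counts.items()})
```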
6:30 Model
N = number of items of each color in the bowl; P = N/15; En = n*P, the perfect-sample (expected) count for n draws with replacement.
Color | N | P | E50 | E100 | E150 | E200 | E250 | E300
Blue | 2 | 2/15 = 0.133 | (2/15)*50 = 6.67 | (2/15)*100 = 13.33 | (2/15)*150 = 20 | (2/15)*200 = 26.67 | (2/15)*250 = 33.33 | (2/15)*300 = 40
Green | 1 | 1/15 = 0.067 | (1/15)*50 = 3.33 | (1/15)*100 = 6.67 | (1/15)*150 = 10 | (1/15)*200 = 13.33 | (1/15)*250 = 16.67 | (1/15)*300 = 20
Red | 6 | 6/15 = 0.4 | (6/15)*50 = 20 | (6/15)*100 = 40 | (6/15)*150 = 60 | (6/15)*200 = 80 | (6/15)*250 = 100 | (6/15)*300 = 120
Yellow | 6 | 6/15 = 0.4 | (6/15)*50 = 20 | (6/15)*100 = 40 | (6/15)*150 = 60 | (6/15)*200 = 80 | (6/15)*250 = 100 | (6/15)*300 = 120
Total | 15 | 1 | 50 | 100 | 150 | 200 | 250 | 300
Probabilities with Long Run Interpretation
Color | N | P
Blue | 2 | 2/15 = 4/30 = 0.133
Green | 1 | 1/15 = 2/30 = 0.067
Red | 6 | 6/15 = 12/30 = 0.4
Yellow | 6 | 6/15 = 12/30 = 0.4
Total | 15 | 1
PBlue = 2/15 = 0.133
In long runs of draws with replacement from the bowl, approximately 13.3% of draws show blue.
PGreen = 1/15 = 0.067
In long runs of draws with replacement from the bowl, approximately 6.7% of draws show green.
PRed = 6/15 = 0.40
In long runs of draws with replacement from the bowl, approximately 40% of draws show red.
PYellow = 6/15 = 0.40
In long runs of draws with replacement from the bowl, approximately 40% of draws show yellow.
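The long-run interpretation can be illustrated by simulation. This Python sketch assumes the 6:30 bowl and models draws with replacement with `random.choices`; the function name `long_run_proportions` and the seeds are arbitrary choices for illustration. As the number of draws grows, the observed proportions should settle near 0.133, 0.067, 0.4 and 0.4, although any single short run can be well off.

```python
import random

BOWL = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}  # 6:30 bowl
TOTAL = sum(BOWL.values())


def long_run_proportions(bowl, draws, seed=0):
    """Simulate `draws` draws with replacement and return observed proportions."""
    rng = random.Random(seed)
    colors = list(bowl)
    picks = rng.choices(colors, weights=[bowl[c] for c in colors], k=draws)
    return {c: picks.count(c) / draws for c in colors}


for draws in (50, 1_000, 100_000):
    observed = long_run_proportions(BOWL, draws)
    print(draws, {c: round(observed[c], 3) for c in BOWL})

# Model probabilities P = N / 15, for comparison.
print("model", {c: round(BOWL[c] / TOTAL, 3) for c in BOWL})
```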
Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement
Color | N | P | E50 | E100 | E150 | E200 | E250 | E300
Blue | 2 | 2/15 = 0.133 | (2/15)*50 = 6.67 | (2/15)*100 = 13.33 | (2/15)*150 = 20 | (2/15)*200 = 26.67 | (2/15)*250 = 33.33 | (2/15)*300 = 40
Green | 1 | 1/15 = 0.067 | (1/15)*50 = 3.33 | (1/15)*100 = 6.67 | (1/15)*150 = 10 | (1/15)*200 = 13.33 | (1/15)*250 = 16.67 | (1/15)*300 = 20
Red | 6 | 6/15 = 0.4 | (6/15)*50 = 20 | (6/15)*100 = 40 | (6/15)*150 = 60 | (6/15)*200 = 80 | (6/15)*250 = 100 | (6/15)*300 = 120
Yellow | 6 | 6/15 = 0.4 | (6/15)*50 = 20 | (6/15)*100 = 40 | (6/15)*150 = 60 | (6/15)*200 = 80 | (6/15)*250 = 100 | (6/15)*300 = 120
Total | 15 | 1 | 50 | 100 | 150 | 200 | 250 | 300
Perfect Samples
In samples of 50 draws with replacement from the bowl, we expect approximately
6 or 7 blue draws, 3 or 4 green draws, 20 red draws, and 20 yellow draws.
In samples of 100 draws with replacement from the bowl, we expect approximately
13 or 14 blue draws, 6 or 7 green draws, 40 red draws, and 40 yellow draws.
In samples of 150 draws with replacement from the bowl, we expect approximately
20 blue draws, 10 green draws, 60 red draws, and 60 yellow draws.
In samples of 200 draws with replacement from the bowl, we expect approximately
26 or 27 blue draws, 13 or 14 green draws, 80 red draws, and 80 yellow draws.
In samples of 250 draws with replacement from the bowl, we expect approximately
33 or 34 blue draws, 16 or 17 green draws, 100 red draws, and 100 yellow draws.
In samples of 300 draws with replacement from the bowl, we expect approximately
40 blue draws, 20 green draws, 120 red draws, and 120 yellow draws.
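Each "a or b" statement above brackets the expected count n*P between the whole numbers just below and just above it; when n*P is already a whole number, only that number is given. A minimal Python sketch of that reading, assuming the 6:30 bowl (the helper names `nearest_whole_counts` and `describe` are hypothetical):

```python
import math
from fractions import Fraction

BOWL = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}  # 6:30 bowl
TOTAL = sum(BOWL.values())


def nearest_whole_counts(expected):
    """Whole counts bracketing an expected count: (k,) if exact, else (floor, ceil)."""
    lo, hi = math.floor(expected), math.ceil(expected)
    return (lo,) if lo == hi else (lo, hi)


def describe(n):
    """Build a 'perfect sample' sentence for n draws with replacement."""
    parts = []
    for color, count in BOWL.items():
        whole = nearest_whole_counts(Fraction(n * count, TOTAL))  # n * P exactly
        parts.append(" or ".join(map(str, whole)) + f" {color.lower()} draws")
    return (f"In samples of {n} draws with replacement from the bowl, "
            f"we expect approximately {', '.join(parts[:-1])}, and {parts[-1]}.")


for n in (50, 100, 150, 200, 250, 300):
    print(describe(n))
```

For n = 50 the green count is 50*(1/15) = 3.33, which this rule turns into "3 or 4 green draws".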
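Samples – 6:30
n = observed count; p = observed proportion (n divided by the sample size); E50, E100, E150, E200, E300 = perfect-sample (expected) counts. "Pooled 12" combines Samples #1 and #2, "Pooled 135" combines Samples #1, #3 and #5, and so on; "Pooled All" combines all six samples.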
Color | Sample #1 n | Sample #1 p | E50 | Sample #2 n | Sample #2 p | E50 | Pooled 12 n | Pooled 12 p | E100
Blue | 6 | 0.12 | 6.67 | 1 | 0.02 | 6.67 | 7 | 0.07 | 13.33
Green | 7 | 0.14 | 3.33 | 4 | 0.08 | 3.33 | 11 | 0.11 | 6.67
Red | 20 | 0.4 | 20 | 20 | 0.4 | 20 | 40 | 0.4 | 40
Yellow | 17 | 0.34 | 20 | 25 | 0.5 | 20 | 42 | 0.42 | 40
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Sample #3 n | Sample #3 p | E50 | Sample #4 n | Sample #4 p | E50 | Pooled 34 n | Pooled 34 p | E100
Blue | 9 | 0.18 | 6.67 | 8 | 0.16 | 6.67 | 17 | 0.17 | 13.33
Green | 1 | 0.02 | 3.33 | 2 | 0.04 | 3.33 | 3 | 0.03 | 6.67
Red | 19 | 0.38 | 20 | 16 | 0.32 | 20 | 35 | 0.35 | 40
Yellow | 21 | 0.42 | 20 | 24 | 0.48 | 20 | 45 | 0.45 | 40
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Sample #5 n | Sample #5 p | E50 | Sample #6 n | Sample #6 p | E50 | Pooled 56 n | Pooled 56 p | E100
Blue | 14 | 0.28 | 6.67 | 7 | 0.14 | 6.67 | 21 | 0.21 | 13.33
Green | 1 | 0.02 | 3.33 | 4 | 0.08 | 3.33 | 5 | 0.05 | 6.67
Red | 19 | 0.38 | 20 | 23 | 0.46 | 20 | 42 | 0.42 | 40
Yellow | 16 | 0.32 | 20 | 16 | 0.32 | 20 | 32 | 0.32 | 40
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Pooled 135 n | Pooled 135 p | E150 | Pooled 246 n | Pooled 246 p | E150 | Pooled All n | Pooled All p | E300
Blue | 29 | 0.193 | 20 | 16 | 0.107 | 20 | 45 | 0.15 | 40
Green | 9 | 0.06 | 10 | 10 | 0.067 | 10 | 19 | 0.063 | 20
Red | 58 | 0.387 | 60 | 59 | 0.393 | 60 | 117 | 0.39 | 120
Yellow | 54 | 0.36 | 60 | 65 | 0.433 | 60 | 119 | 0.397 | 120
Total | 150 | 1 | 150 | 150 | 1 | 150 | 300 | 1 | 300
Color | Pooled 1234 n | Pooled 1234 p | E200
Blue | 24 | 0.12 | 26.67
Green | 14 | 0.07 | 13.33
Red | 75 | 0.375 | 80
Yellow | 87 | 0.435 | 80
Total | 200 | 1 | 200
Color | Pooled 3456 n | Pooled 3456 p | E200
Blue | 38 | 0.19 | 26.67
Green | 8 | 0.04 | 13.33
Red | 77 | 0.385 | 80
Yellow | 77 | 0.385 | 80
Total | 200 | 1 | 200
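The pooled columns are sums of the individual sample counts, and each expected column is the pooled size times P. The sketch below re-derives the 6:30 pooled tables from the six observed samples, copied from the tables above; the helper names `pooled` and `compare` are hypothetical, and the rounding (3 places for p, 2 for E) is only a display choice.

```python
from collections import Counter
from fractions import Fraction

BOWL = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}  # 6:30 bowl
TOTAL = sum(BOWL.values())

# Observed 6:30 samples (n = 50 each), copied from the tables above.
SAMPLES = {
    1: Counter(Blue=6, Green=7, Red=20, Yellow=17),
    2: Counter(Blue=1, Green=4, Red=20, Yellow=25),
    3: Counter(Blue=9, Green=1, Red=19, Yellow=21),
    4: Counter(Blue=8, Green=2, Red=16, Yellow=24),
    5: Counter(Blue=14, Green=1, Red=19, Yellow=16),
    6: Counter(Blue=7, Green=4, Red=23, Yellow=16),
}

POOLS = {
    "Pooled 12": (1, 2), "Pooled 34": (3, 4), "Pooled 56": (5, 6),
    "Pooled 135": (1, 3, 5), "Pooled 246": (2, 4, 6),
    "Pooled 1234": (1, 2, 3, 4), "Pooled 3456": (3, 4, 5, 6),
    "Pooled All": (1, 2, 3, 4, 5, 6),
}


def pooled(*ids):
    """Pool the listed samples by summing their color counts."""
    return sum((SAMPLES[i] for i in ids), Counter())


def compare(counts):
    """(color, observed n, observed p, expected E = size * P) for each color."""
    size = sum(counts.values())
    return [
        (c, counts[c], round(counts[c] / size, 3),
         round(float(Fraction(size * BOWL[c], TOTAL)), 2))
        for c in BOWL
    ]


for label, ids in POOLS.items():
    print(label, compare(pooled(*ids)))
```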
8:00 Model
Color | N | P | E50 | E100 | E150 | E200 | E250 | E300
Blue | 8 | 8/28 = 0.2857 | 50*(8/28) = 14.29 | 28.57 | 42.86 | 57.14 | 71.43 | 85.71
Green | 10 | 10/28 = 0.3571 | 50*(10/28) = 17.86 | 35.71 | 53.57 | 71.43 | 89.29 | 107.14
Red | 8 | 8/28 = 0.2857 | 50*(8/28) = 14.29 | 28.57 | 42.86 | 57.14 | 71.43 | 85.71
Yellow | 2 | 2/28 = 0.0714 | 50*(2/28) = 3.57 | 7.14 | 10.71 | 14.29 | 17.86 | 21.43
Total | 28 | 1 | 50 | 100 | 150 | 200 | 250 | 300
Probabilities with Long Run Interpretation
PBlue = 8/28 = 0.2857
In long runs of draws with replacement from the bowl, approximately 28.57% of draws show blue.
PGreen = 10/28 = 0.3571
In long runs of draws with replacement from the bowl, approximately 35.71% of draws show green.
PRed = 8/28 = 0.2857
In long runs of draws with replacement from the bowl, approximately 28.57% of draws show red.
PYellow = 2/28 = 0.0714
In long runs of draws with replacement from the bowl, approximately 7.14% of draws show yellow.
Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement
Color | N | P | E50 | E100 | E150 | E200 | E250 | E300
Blue | 8 | 8/28 = 0.2857 | 50*(8/28) = 14.29 | 100*(8/28) = 28.57 | 150*(8/28) = 42.86 | 200*(8/28) = 57.14 | 250*(8/28) = 71.43 | 300*(8/28) = 85.71
Green | 10 | 10/28 = 0.3571 | 50*(10/28) = 17.86 | 35.71 | 53.57 | 71.43 | 89.29 | 107.14
Red | 8 | 8/28 = 0.2857 | 50*(8/28) = 14.29 | 28.57 | 42.86 | 57.14 | 71.43 | 85.71
Yellow | 2 | 2/28 = 0.0714 | 50*(2/28) = 3.57 | 7.14 | 10.71 | 14.29 | 17.86 | 21.43
Total | 28 | 1 | 50 | 100 | 150 | 200 | 250 | 300
Perfect Samples
In samples of 50 draws with replacement from the bowl, we expect approximately
14 or 15 blue draws, 17 or 18 green draws, 14 or 15 red draws, and 3 or 4 yellow draws.
In samples of 100 draws with replacement from the bowl, we expect approximately
28 or 29 blue draws, 35 or 36 green draws, 28 or 29 red draws, and 7 or 8 yellow draws.
In samples of 150 draws with replacement from the bowl, we expect approximately
42 or 43 blue draws, 53 or 54 green draws, 42 or 43 red draws, and 10 or 11 yellow draws.
In samples of 200 draws with replacement from the bowl, we expect approximately
57 or 58 blue draws, 71 or 72 green draws, 57 or 58 red draws, and 14 or 15 yellow draws.
In samples of 250 draws with replacement from the bowl, we expect approximately
71 or 72 blue draws, 89 or 90 green draws, 71 or 72 red draws, and 17 or 18 yellow draws.
In samples of 300 draws with replacement from the bowl, we expect approximately
85 or 86 blue draws, 107 or 108 green draws, 85 or 86 red draws, and 21 or 22 yellow draws.
Samples – 8:00
Color | Sample #1 n | Sample #1 p | E50 | Sample #2 n | Sample #2 p | E50 | Pooled 12 n | Pooled 12 p | E100
Blue | 13 | 0.26 | 14.29 | 11 | 0.22 | 14.29 | 24 | 0.24 | 28.57
Green | 26 | 0.52 | 17.86 | 17 | 0.34 | 17.86 | 43 | 0.43 | 35.71
Red | 8 | 0.16 | 14.29 | 17 | 0.34 | 14.29 | 25 | 0.25 | 28.57
Yellow | 3 | 0.06 | 3.57 | 5 | 0.1 | 3.57 | 8 | 0.08 | 7.14
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Sample #3 n | Sample #3 p | E50 | Sample #4 n | Sample #4 p | E50 | Pooled 34 n | Pooled 34 p | E100
Blue | 8 | 0.16 | 14.29 | 16 | 0.32 | 14.29 | 24 | 0.24 | 28.57
Green | 20 | 0.4 | 17.86 | 19 | 0.38 | 17.86 | 39 | 0.39 | 35.71
Red | 22 | 0.44 | 14.29 | 13 | 0.26 | 14.29 | 35 | 0.35 | 28.57
Yellow | 0 | 0 | 3.57 | 2 | 0.04 | 3.57 | 2 | 0.02 | 7.14
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Sample #5 n | Sample #5 p | E50 | Sample #6 n | Sample #6 p | E50 | Pooled 56 n | Pooled 56 p | E100
Blue | 14 | 0.28 | 14.29 | 10 | 0.2 | 14.29 | 24 | 0.24 | 28.57
Green | 16 | 0.32 | 17.86 | 26 | 0.52 | 17.86 | 42 | 0.42 | 35.71
Red | 15 | 0.3 | 14.29 | 11 | 0.22 | 14.29 | 26 | 0.26 | 28.57
Yellow | 5 | 0.1 | 3.57 | 3 | 0.06 | 3.57 | 8 | 0.08 | 7.14
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Pooled 135 n | Pooled 135 p | E150 | Pooled 246 n | Pooled 246 p | E150 | Pooled All n | Pooled All p | E300
Blue | 35 | 0.233 | 42.86 | 37 | 0.247 | 42.86 | 72 | 0.24 | 85.71
Green | 62 | 0.413 | 53.57 | 62 | 0.413 | 53.57 | 124 | 0.413 | 107.14
Red | 45 | 0.3 | 42.86 | 41 | 0.273 | 42.86 | 86 | 0.287 | 85.71
Yellow | 8 | 0.053 | 10.71 | 10 | 0.067 | 10.71 | 18 | 0.06 | 21.43
Total | 150 | 1 | 150 | 150 | 1 | 150 | 300 | 1 | 300
Color | Pooled 1234 n | Pooled 1234 p | E200
Blue | 48 | 0.24 | 57.14
Green | 82 | 0.41 | 71.43
Red | 60 | 0.3 | 57.14
Yellow | 10 | 0.05 | 14.29
Total | 200 | 1 | 200
Color | Pooled 3456 n | Pooled 3456 p | E200
Blue | 48 | 0.24 | 57.14
Green | 81 | 0.405 | 71.43
Red | 61 | 0.305 | 57.14
Yellow | 10 | 0.05 | 14.29
Total | 200 | 1 | 200
The structure of the bowl, expressed as color proportions, determines the basic
structure of samples drawn from the bowl. Probability models allow us to predict
sample behavior, but these predictions are only as reliable as the original
model and the sampling procedure are valid.
In both models, the results were uneven: the sample sizes were too small to
fully stabilize the sample frequencies. As a result, in most samples one or two
colors were appreciably off. Despite the volatility seen in the samples, we did
see improvements as sample size increased.
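One way to put a number on the improvement with sample size is to measure how far each sample's proportions are from the model proportions. The notes do not specify a measure; the sketch below assumes total variation distance (half the sum of |p - P| over the four colors), a swapped-in choice, and applies it to the 8:00 counts copied from the tables. `distance_from_model` and `pooled` are hypothetical helper names.

```python
from fractions import Fraction
from statistics import mean

BOWL = {"Blue": 8, "Green": 10, "Red": 8, "Yellow": 2}  # 8:00 bowl
TOTAL = sum(BOWL.values())
COLORS = list(BOWL)

# Observed 8:00 counts in the order (Blue, Green, Red, Yellow), from the tables.
SAMPLES = {
    1: (13, 26, 8, 3), 2: (11, 17, 17, 5), 3: (8, 20, 22, 0),
    4: (16, 19, 13, 2), 5: (14, 16, 15, 5), 6: (10, 26, 11, 3),
}


def pooled(ids):
    """Add the color counts of the listed samples."""
    return tuple(sum(SAMPLES[i][k] for i in ids) for k in range(len(COLORS)))


def distance_from_model(counts):
    """Total variation distance: half the sum of |observed p - model P|."""
    size = sum(counts)
    return sum(abs(Fraction(n, size) - Fraction(BOWL[c], TOTAL))
               for n, c in zip(counts, COLORS)) / 2


by_size = {
    50: [distance_from_model(SAMPLES[i]) for i in SAMPLES],
    150: [distance_from_model(pooled(ids)) for ids in [(1, 3, 5), (2, 4, 6)]],
    300: [distance_from_model(pooled(range(1, 7)))],
}

for size, distances in by_size.items():
    print(size, round(float(mean(distances)), 4))
```

With these data the mean distance falls from about 0.118 for the six single samples (n = 50) to about 0.063 for the two pooled samples of 150 and 0.057 for all 300 draws. The decrease is not strict for every individual sample: Pooled 246 (about 0.056) is slightly closer to the model than Pooled All.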
The foundation of statistical applications is the careful preparation of a
study population and of the random sampling procedure that goes with it. Proper
execution of this sampling procedure ensures a usable sample.
You are now ready to learn the Long
Run Argument and Perfect Sample case types in 1st
Hourly Stuff.