Summaries
Session 1.2
23rd August 2010
Predicting Sample Behavior from a Model
We use a population model to predict the behavior of random samples, and we
check the predictions by direct inspection of samples. We sample repeatedly
with replacement, obtaining multiple random samples from the same population by
the same process. We combine (pool) compatible samples to form larger samples:
pooling samples of size 50, we obtain samples of size 100, 150, 200 and 300. In
general, as sample size increases, samples become more precise and reliable,
provided that the sampling process is reliable.
In general, if we are working with
the correct model, then the predicted sample behavior reliably describes observed
samples.
Session Overview
Estimation: Previously, we saw how random samples drawn from a
population could be used to estimate the structure of a population as an
empirical model.
Prediction: In this session, we continue our study of probability. We begin
with a very basic example of a population with known structure, and use that
structure to predict the behavior of random samples from that
population.
We begin by constructing a
probability model for our population, and define the concept of perfect sample.
We then relate the perfect sample to random samples, both observed and
unobserved. We then obtain real random samples, and check them against the
perfect samples.
Exclude Case Study 1.2.1
Case Study 1.2.2
In this case study we introduce the idea of a perfect sample: a sample that
matches perfectly the population from which it is drawn. On average, the real
samples corresponded nicely, though not perfectly, with their perfect samples.
We begin by building a color bowl. We then compute a probability model for
draws with replacement (DWR) from the bowl, and from it compute perfect samples
of size 50, 100, 150, 200, 250 and 300. We then engage six groups to generate
six samples, each of n=50 DWR. Finally, we compare sample frequencies and
proportions to the model and to the perfect samples.
Bowl Counts and Perfect Sample Calculations
Expected Count Blue (for Sample Size n) = n*PBlue
Expected Count Green (for Sample Size n) = n*PGreen
Expected Count Red (for Sample Size n) = n*PRed
Expected Count Yellow (for Sample Size n) = n*PYellow
6:30 Model
Color | N | P | E50 | E100 | E150 | E200 | E250 | E300
Blue | 3 | 3/12 = 0.25 | 50*(3/12) = 12.5 | 25 | 37.5 | 50 | 62.5 | 75
Green | 3 | 3/12 = 0.25 | 50*(3/12) = 12.5 | 25 | 37.5 | 50 | 62.5 | 75
Red | 2 | 2/12 = 0.167 | 50*(2/12) = 8.333 | 16.67 | 25 | 33.33 | 41.67 | 50
Yellow | 4 | 4/12 = 0.333 | 50*(4/12) = 16.67 | 33.33 | 50 | 66.67 | 83.33 | 100
Total | 12 | 1 | 50 | 100 | 150 | 200 | 250 | 300
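The expected-count rule n*P can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original session materials: the names `perfect_sample` and `bowl_630` are hypothetical, and the bowl counts are the ones in the 6:30 model.

```python
from fractions import Fraction

# 6:30 bowl counts from the model table (names are illustrative only).
bowl_630 = {"Blue": 3, "Green": 3, "Red": 2, "Yellow": 4}

def perfect_sample(bowl_counts, n):
    """Return the expected count n * P(color) for every color in the bowl."""
    total = sum(bowl_counts.values())
    return {color: n * Fraction(count, total)
            for color, count in bowl_counts.items()}

# Print one perfect sample per sample size, rounded to two decimals.
for n in (50, 100, 150, 200, 250, 300):
    row = perfect_sample(bowl_630, n)
    print(n, {color: round(float(e), 2) for color, e in row.items()})
```

Exact fractions are used so that counts such as 150*(2/12) come out as exactly 25 before any rounding for display.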
Probabilities with Long Run Interpretation
PBlue = 3/12 = 0.25
In long runs of draws with replacement from the bowl, approximately 25% of
draws show blue.
PGreen = 3/12 = 0.25
In long runs of draws with replacement from the bowl, approximately 25% of
draws show green.
PRed = 2/12 = 0.1667
In long runs of draws with replacement from the bowl, approximately 16.67% of
draws show red.
PYellow = 4/12 = 0.3333
In long runs of draws with replacement from the bowl, approximately 33.33% of
draws show yellow.
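The long run interpretation can be explored with a small simulation. The sketch below is an assumption-laden stand-in for the physical bowl: it uses Python's pseudo-random generator, and the names `long_run_proportions` and `bowl_630` are hypothetical. It draws with replacement and reports the observed proportion of each color.

```python
import random

# 6:30 bowl counts (names are illustrative only).
bowl_630 = {"Blue": 3, "Green": 3, "Red": 2, "Yellow": 4}

def long_run_proportions(bowl_counts, n_draws, seed=0):
    """Simulate n_draws draws with replacement; return observed proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    colors = list(bowl_counts)
    weights = [bowl_counts[c] for c in colors]
    # choices() samples with replacement, weighted by the bowl counts.
    draws = rng.choices(colors, weights=weights, k=n_draws)
    return {c: draws.count(c) / n_draws for c in colors}

for n_draws in (50, 300, 120_000):
    print(n_draws, long_run_proportions(bowl_630, n_draws))
```

In a very long run the simulated proportions should settle near 0.25, 0.25, 0.1667 and 0.3333, while short runs of 50 can stray noticeably.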
Perfect Counts for the Bowl: n = 50, 100, 150, 200, 250 and 300 Draws with Replacement
Color | E50 | E100 | E150 | E200 | E250 | E300
Blue | 50*(3/12) = 12.5 | 100*(3/12) = 25 | 150*(3/12) = 37.5 | 200*(3/12) = 50 | 250*(3/12) = 62.5 | 300*(3/12) = 75
Green | 50*(3/12) = 12.5 | 100*(3/12) = 25 | 150*(3/12) = 37.5 | 200*(3/12) = 50 | 250*(3/12) = 62.5 | 300*(3/12) = 75
Red | 50*(2/12) = 8.333 | 100*(2/12) = 16.67 | 150*(2/12) = 25 | 200*(2/12) = 33.33 | 250*(2/12) = 41.67 | 300*(2/12) = 50
Yellow | 50*(4/12) = 16.67 | 100*(4/12) = 33.33 | 150*(4/12) = 50 | 200*(4/12) = 66.67 | 250*(4/12) = 83.33 | 300*(4/12) = 100
Total | 50 | 100 | 150 | 200 | 250 | 300
Perfect Samples
n=50
E50Blue = n*PBlue = 50*(3/12) = 12.5
E50Green = n*PGreen = 50*(3/12) = 12.5
E50Red = n*PRed = 50*(2/12) = 8.333
E50Yellow = n*PYellow = 50*(4/12) = 16.67
In samples of 50 draws with replacement
from the bowl, we expect approximately 12 or 13 blue
draws, 12 or 13 green draws, 8 or 9 red draws,
and 16 or 17 yellow draws.
n=100
E100Blue = n*PBlue = 100*(3/12) = 25
E100Green = n*PGreen = 100*(3/12) = 25
E100Red = n*PRed = 100*(2/12) = 16.67
E100Yellow = n*PYellow = 100*(4/12) = 33.33
In samples of 100 draws with
replacement from the bowl, we expect approximately 25
blue draws, 25 green draws, 16 or 17 red draws,
and 33 or 34 yellow draws.
n=150
E150Blue = n*PBlue = 150*(3/12) = 37.5
E150Green = n*PGreen = 150*(3/12) = 37.5
E150Red = n*PRed = 150*(2/12) = 25
E150Yellow = n*PYellow = 150*(4/12) = 50
In samples of 150 draws with
replacement from the bowl, we expect approximately 37
or 38 blue draws, 37 or 38 green draws, 25 red draws, and 50 yellow draws.
n=200
E200Blue = n*PBlue = 200*(3/12) = 50
E200Green = n*PGreen = 200*(3/12) = 50
E200Red = n*PRed = 200*(2/12) = 33.33
E200Yellow = n*PYellow = 200*(4/12) = 66.67
In samples of 200 draws with
replacement from the bowl, we expect approximately 50
blue draws, 50 green draws, 33 or 34 red draws,
and 66 or 67 yellow draws.
n=250
E250Blue = n*PBlue = 250*(3/12) = 62.5
E250Green = n*PGreen = 250*(3/12) = 62.5
E250Red = n*PRed = 250*(2/12) = 41.67
E250Yellow = n*PYellow = 250*(4/12) = 83.33
In samples of 250 draws with
replacement from the bowl, we expect approximately 62
or 63 blue draws, 62 or 63 green draws, 41 or 42 red draws,
and 83 or 84 yellow draws.
n=300
E300Blue = n*PBlue = 300*(3/12) = 75
E300Green = n*PGreen = 300*(3/12) = 75
E300Red = n*PRed = 300*(2/12) = 50
E300Yellow = n*PYellow = 300*(4/12) = 100
In samples of 300 draws with
replacement from the bowl, we expect approximately 75
blue draws, 75 green draws, 50 red draws, and 100 yellow draws.
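The step from an expected count such as 12.5 to the statement "12 or 13 draws" amounts to taking the whole numbers on either side of the expected count. A minimal sketch of that step follows; the helper name `whole_draw_range` is hypothetical and the wording of its output is only an approximation of the sentences above.

```python
import math
from fractions import Fraction

# 6:30 bowl counts (names are illustrative only).
bowl_630 = {"Blue": 3, "Green": 3, "Red": 2, "Yellow": 4}

def whole_draw_range(expected):
    """Describe an expected count by the whole-number counts that bracket it."""
    low, high = math.floor(expected), math.ceil(expected)
    # A whole-number expected count needs no "or".
    return str(low) if low == high else f"{low} or {high}"

n = 50
total = sum(bowl_630.values())
for color, count in bowl_630.items():
    expected = n * Fraction(count, total)  # exact n * P(color)
    print(color, round(float(expected), 2), whole_draw_range(expected))
```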
Samples – 6.30
Color | Sample #1 n | Sample #1 p | Sample #1 E50 | Sample #2 n | Sample #2 p | Sample #2 E50 | Pooled 12 n | Pooled 12 p | Pooled 12 E100
Blue | 12 | 0.24 | 12.5 | 8 | 0.16 | 12.5 | 20 | 0.20 | 25
Green | 15 | 0.30 | 12.5 | 9 | 0.18 | 12.5 | 24 | 0.24 | 25
Red | 12 | 0.24 | 8.33 | 12 | 0.24 | 8.33 | 24 | 0.24 | 16.67
Yellow | 11 | 0.22 | 16.67 | 21 | 0.42 | 16.67 | 32 | 0.32 | 33.33
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Sample #3 n | Sample #3 p | Sample #3 E50 | Sample #4 n | Sample #4 p | Sample #4 E50 | Pooled 34 n | Pooled 34 p | Pooled 34 E100 | Pooled 1234 n | Pooled 1234 p | Pooled 1234 E200
Blue | 8 | 0.16 | 12.5 | 11 | 0.22 | 12.5 | 19 | 0.19 | 25 | 39 | 0.195 | 50
Green | 21 | 0.42 | 12.5 | 15 | 0.30 | 12.5 | 36 | 0.36 | 25 | 60 | 0.30 | 50
Red | 6 | 0.12 | 8.33 | 10 | 0.20 | 8.33 | 16 | 0.16 | 16.67 | 40 | 0.20 | 33.33
Yellow | 15 | 0.30 | 16.67 | 14 | 0.28 | 16.67 | 29 | 0.29 | 33.33 | 61 | 0.305 | 66.67
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100 | 200 | 1 | 200
Color | Sample #5 n | Sample #5 p | Sample #5 E50 | Sample #6 n | Sample #6 p | Sample #6 E50 | Pooled 56 n | Pooled 56 p | Pooled 56 E100 | Pooled 3456 n | Pooled 3456 p | Pooled 3456 E200
Blue | 14 | 0.28 | 12.5 | 14 | 0.28 | 12.5 | 28 | 0.28 | 25 | 47 | 0.235 | 50
Green | 10 | 0.20 | 12.5 | 10 | 0.20 | 12.5 | 20 | 0.20 | 25 | 56 | 0.28 | 50
Red | 10 | 0.20 | 8.33 | 6 | 0.12 | 8.33 | 16 | 0.16 | 16.67 | 32 | 0.16 | 33.33
Yellow | 16 | 0.32 | 16.67 | 20 | 0.40 | 16.67 | 36 | 0.36 | 33.33 | 65 | 0.325 | 66.67
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100 | 200 | 1 | 200
Color | Pooled 135 n | Pooled 135 p | Pooled 135 E150 | Pooled 246 n | Pooled 246 p | Pooled 246 E150 | Pooled All n | Pooled All p | Pooled All E300
Blue | 34 | 0.2267 | 37.5 | 33 | 0.22 | 37.5 | 67 | 0.2233 | 75
Green | 46 | 0.3067 | 37.5 | 34 | 0.2267 | 37.5 | 80 | 0.2667 | 75
Red | 28 | 0.1867 | 25 | 28 | 0.1867 | 25 | 56 | 0.1867 | 50
Yellow | 42 | 0.28 | 50 | 55 | 0.3667 | 50 | 97 | 0.3233 | 100
Total | 150 | 1 | 150 | 150 | 1 | 150 | 300 | 1 | 300
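Pooling, as used in the tables above, adds the color counts of compatible samples and then recomputes the proportions on the larger total. A minimal sketch follows, using Sample #1 and Sample #2 from the 6:30 session; the function names `pool` and `proportions` are hypothetical.

```python
from collections import Counter

# Observed counts for Sample #1 and Sample #2 of the 6:30 session.
sample_1 = {"Blue": 12, "Green": 15, "Red": 12, "Yellow": 11}
sample_2 = {"Blue": 8, "Green": 9, "Red": 12, "Yellow": 21}

def pool(*samples):
    """Add the color counts of several samples into one pooled sample."""
    pooled = Counter()
    for sample in samples:
        pooled.update(sample)  # Counter.update adds counts, it does not overwrite
    return dict(pooled)

def proportions(sample):
    """Convert counts to sample proportions p = count / n."""
    n = sum(sample.values())
    return {color: count / n for color, count in sample.items()}

pooled_12 = pool(sample_1, sample_2)
print(pooled_12)
print(proportions(pooled_12))
```

The same `pool` call with three or six samples gives the pooled samples of size 150 and 300.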
8:00 Model
Color | N | P | E50 | E100 | E150 | E200 | E250 | E300
Blue | 8 | 8/28 = 0.2857 | 50*(8/28) = 14.29 | 100*(8/28) = 28.57 | 150*(8/28) = 42.86 | 200*(8/28) = 57.14 | 250*(8/28) = 71.43 | 300*(8/28) = 85.71
Green | 8 | 8/28 = 0.2857 | 50*(8/28) = 14.29 | 100*(8/28) = 28.57 | 150*(8/28) = 42.86 | 200*(8/28) = 57.14 | 250*(8/28) = 71.43 | 300*(8/28) = 85.71
Red | 2 | 2/28 = 0.0714 | 50*(2/28) = 3.57 | 100*(2/28) = 7.14 | 150*(2/28) = 10.71 | 200*(2/28) = 14.29 | 250*(2/28) = 17.86 | 300*(2/28) = 21.43
Yellow | 10 | 10/28 = 0.3571 | 50*(10/28) = 17.86 | 100*(10/28) = 35.71 | 150*(10/28) = 53.57 | 200*(10/28) = 71.43 | 250*(10/28) = 89.29 | 300*(10/28) = 107.14
Total | 28 | 1 | 50 | 100 | 150 | 200 | 250 | 300
Probabilities with Long Run Interpretation
PBlue = 8/28 = 0.2857
In long runs of draws with replacement from the bowl, approximately 28.57% of
draws show blue.
PGreen = 8/28 = 0.2857
In long runs of draws with replacement from the bowl, approximately 28.57% of
draws show green.
PRed = 2/28 = 0.0714
In long runs of draws with replacement from the bowl, approximately 7.14% of
draws show red.
PYellow = 10/28 = 0.3571
In long runs of draws with replacement from the bowl, approximately 35.71% of
draws show yellow.
Perfect Counts for the Bowl: n = 50, 100, 150, 200, 250 and 300 Draws with Replacement
Color | E50 | E100 | E150 | E200 | E250 | E300
Blue | 50*(8/28) = 14.29 | 100*(8/28) = 28.57 | 150*(8/28) = 42.86 | 200*(8/28) = 57.14 | 250*(8/28) = 71.43 | 300*(8/28) = 85.71
Green | 50*(8/28) = 14.29 | 100*(8/28) = 28.57 | 150*(8/28) = 42.86 | 200*(8/28) = 57.14 | 250*(8/28) = 71.43 | 300*(8/28) = 85.71
Red | 50*(2/28) = 3.57 | 100*(2/28) = 7.14 | 150*(2/28) = 10.71 | 200*(2/28) = 14.29 | 250*(2/28) = 17.86 | 300*(2/28) = 21.43
Yellow | 50*(10/28) = 17.86 | 100*(10/28) = 35.71 | 150*(10/28) = 53.57 | 200*(10/28) = 71.43 | 250*(10/28) = 89.29 | 300*(10/28) = 107.14
Total | 50 | 100 | 150 | 200 | 250 | 300
Perfect Samples
n=50
E50Blue = n*PBlue = 50*(8/28) = 14.29
E50Green = n*PGreen = 50*(8/28) = 14.29
E50Red = n*PRed = 50*(2/28) = 3.57
E50Yellow = n*PYellow = 50*(10/28) = 17.86
In samples of 50 draws with replacement
from the bowl, we expect approximately 14 or 15 blue
draws, 14 or 15 green draws, 3 or 4 red draws,
and 17 or 18 yellow draws.
n=100
E100Blue = n*PBlue = 100*(8/28) = 28.57
E100Green = n*PGreen = 100*(8/28) = 28.57
E100Red = n*PRed = 100*(2/28) = 7.14
E100Yellow = n*PYellow = 100*(10/28) = 35.71
In samples of 100 draws with
replacement from the bowl, we expect approximately 28
or 29 blue draws, 28 or 29 green draws, 7 or 8 red draws,
and 35 or 36 yellow draws.
n=150
E150Blue = n*PBlue = 150*(8/28) = 42.86
E150Green = n*PGreen = 150*(8/28) = 42.86
E150Red = n*PRed = 150*(2/28) = 10.71
E150Yellow = n*PYellow = 150*(10/28) = 53.57
In samples of 150 draws with
replacement from the bowl, we expect approximately 42
or 43 blue draws, 42 or 43 green draws, 10 or 11 red draws,
and 53 or 54 yellow draws.
n=200
E200Blue = n*PBlue = 200*(8/28) = 57.14
E200Green = n*PGreen = 200*(8/28) = 57.14
E200Red = n*PRed = 200*(2/28) = 14.29
E200Yellow = n*PYellow = 200*(10/28) = 71.43
In samples of 200 draws with
replacement from the bowl, we expect approximately 57
or 58 blue draws, 57 or 58 green draws, 14 or 15 red draws,
and 71 or 72 yellow draws.
n=250
E250Blue = n*PBlue = 250*(8/28) = 71.43
E250Green = n*PGreen = 250*(8/28) = 71.43
E250Red = n*PRed = 250*(2/28) = 17.86
E250Yellow = n*PYellow = 250*(10/28) = 89.29
In samples of 250 draws with
replacement from the bowl, we expect approximately 71
or 72 blue draws, 71 or 72 green draws, 17 or 18 red draws,
and 89 or 90 yellow draws.
n=300
E300Blue = n*PBlue = 300*(8/28) = 85.71
E300Green = n*PGreen = 300*(8/28) = 85.71
E300Red = n*PRed = 300*(2/28) = 21.43
E300Yellow = n*PYellow = 300*(10/28) = 107.14
In samples of 300 draws with
replacement from the bowl, we expect approximately 85
or 86 blue draws, 85 or 86 green draws, 21 or 22 red draws,
and 107 or 108 yellow
draws.
Samples – 8.00
Color | Sample #1 n | Sample #1 p | Sample #1 E50 | Sample #2 n | Sample #2 p | Sample #2 E50 | Pooled 12 n | Pooled 12 p | Pooled 12 E100
Blue | 12 | 0.24 | 14.29 | 13 | 0.26 | 14.29 | 25 | 0.25 | 28.57
Green | 14 | 0.28 | 14.29 | 17 | 0.34 | 14.29 | 31 | 0.31 | 28.57
Red | 5 | 0.10 | 3.57 | 5 | 0.10 | 3.57 | 10 | 0.10 | 7.14
Yellow | 19 | 0.38 | 17.86 | 15 | 0.30 | 17.86 | 34 | 0.34 | 35.71
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100
Color | Sample #3 n | Sample #3 p | Sample #3 E50 | Sample #4 n | Sample #4 p | Sample #4 E50 | Pooled 34 n | Pooled 34 p | Pooled 34 E100 | Pooled 1234 n | Pooled 1234 p | Pooled 1234 E200
Blue | 19 | 0.38 | 14.29 | 11 | 0.22 | 14.29 | 30 | 0.30 | 28.57 | 55 | 0.275 | 57.14
Green | 13 | 0.26 | 14.29 | 21 | 0.42 | 14.29 | 34 | 0.34 | 28.57 | 65 | 0.325 | 57.14
Red | 6 | 0.12 | 3.57 | 3 | 0.06 | 3.57 | 9 | 0.09 | 7.14 | 19 | 0.095 | 14.29
Yellow | 12 | 0.24 | 17.86 | 15 | 0.30 | 17.86 | 27 | 0.27 | 35.71 | 61 | 0.305 | 71.43
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100 | 200 | 1 | 200
Color | Sample #5 n | Sample #5 p | Sample #5 E50 | Sample #6 n | Sample #6 p | Sample #6 E50 | Pooled 56 n | Pooled 56 p | Pooled 56 E100 | Pooled 3456 n | Pooled 3456 p | Pooled 3456 E200
Blue | 15 | 0.30 | 14.29 | 17 | 0.34 | 14.29 | 32 | 0.32 | 28.57 | 62 | 0.31 | 57.14
Green | 21 | 0.42 | 14.29 | 17 | 0.34 | 14.29 | 38 | 0.38 | 28.57 | 72 | 0.36 | 57.14
Red | 2 | 0.04 | 3.57 | 2 | 0.04 | 3.57 | 4 | 0.04 | 7.14 | 13 | 0.065 | 14.29
Yellow | 12 | 0.24 | 17.86 | 14 | 0.28 | 17.86 | 26 | 0.26 | 35.71 | 53 | 0.265 | 71.43
Total | 50 | 1 | 50 | 50 | 1 | 50 | 100 | 1 | 100 | 200 | 1 | 200
Color | Pooled 135 n | Pooled 135 p | Pooled 135 E150 | Pooled 246 n | Pooled 246 p | Pooled 246 E150 | Pooled All n | Pooled All p | Pooled All E300
Blue | 46 | 0.3067 | 42.86 | 41 | 0.2733 | 42.86 | 87 | 0.29 | 85.71
Green | 48 | 0.32 | 42.86 | 55 | 0.3667 | 42.86 | 103 | 0.3433 | 85.71
Red | 13 | 0.0867 | 10.71 | 10 | 0.0667 | 10.71 | 23 | 0.0767 | 21.43
Yellow | 43 | 0.2867 | 53.57 | 44 | 0.2933 | 53.57 | 87 | 0.29 | 107.14
Total | 150 | 1 | 150 | 150 | 1 | 150 | 300 | 1 | 300
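Comparing a sample with its perfect sample comes down to subtracting the expected count from the observed count, color by color. The sketch below does this for the 8:00 Pooled All sample; the function name `observed_minus_expected` is hypothetical, and the signed difference is just one simple way to make the comparison, not a method prescribed by the session.

```python
from fractions import Fraction

# 8:00 bowl counts and the observed Pooled All counts (n = 300).
bowl_800 = {"Blue": 8, "Green": 8, "Red": 2, "Yellow": 10}
pooled_all_800 = {"Blue": 87, "Green": 103, "Red": 23, "Yellow": 87}

def observed_minus_expected(bowl_counts, sample_counts):
    """Return observed count minus perfect-sample count n * P for each color."""
    total = sum(bowl_counts.values())
    n = sum(sample_counts.values())
    return {color: sample_counts[color] - n * Fraction(bowl_counts[color], total)
            for color in bowl_counts}

differences = observed_minus_expected(bowl_800, pooled_all_800)
for color, diff in differences.items():
    print(color, round(float(diff), 2))
```

Positive values mean the color showed up more often than the perfect sample predicts, negative values less often; the differences always sum to zero because both the sample and the perfect sample total n.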
The structure of the bowl, expressed as color proportions, determines the basic
structure of samples drawn from the bowl. Probability models allow the
prediction of sample behavior, but those predictions are only as reliable as
the original model and the sampling procedures are valid.
In both models, the results were choppy: the sample sizes were insufficient to
fully stabilize the sample frequencies. As a result, in most samples, one or
two colors were appreciably off. Despite the volatility seen in the samples, we
did see improvements with increasing sample size.
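One way to put a number on "appreciably off" is the largest gap between a sample proportion and the model probability. The sketch below applies that measure to the six 6:30 samples and to their pooled total. The measure and the name `max_deviation` are assumptions made for illustration, not the session's own method.

```python
# 6:30 bowl counts, the six observed samples of n = 50, and the pooled total.
bowl_630 = {"Blue": 3, "Green": 3, "Red": 2, "Yellow": 4}
samples_630 = [
    {"Blue": 12, "Green": 15, "Red": 12, "Yellow": 11},
    {"Blue": 8, "Green": 9, "Red": 12, "Yellow": 21},
    {"Blue": 8, "Green": 21, "Red": 6, "Yellow": 15},
    {"Blue": 11, "Green": 15, "Red": 10, "Yellow": 14},
    {"Blue": 14, "Green": 10, "Red": 10, "Yellow": 16},
    {"Blue": 14, "Green": 10, "Red": 6, "Yellow": 20},
]
pooled_all_630 = {"Blue": 67, "Green": 80, "Red": 56, "Yellow": 97}

def max_deviation(bowl_counts, sample_counts):
    """Largest absolute gap between sample proportion and model probability."""
    total = sum(bowl_counts.values())
    n = sum(sample_counts.values())
    return max(abs(sample_counts[c] / n - bowl_counts[c] / total)
               for c in bowl_counts)

for i, sample in enumerate(samples_630, start=1):
    print(f"Sample #{i} (n=50):", round(max_deviation(bowl_630, sample), 4))
print("Pooled All (n=300):", round(max_deviation(bowl_630, pooled_all_630), 4))
```

For the 6:30 data, the pooled sample of 300 sits closer to the model than any single sample of 50. With samples this small that ordering is not guaranteed, and an individual sample can occasionally land closer than a pooled one.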
The foundation of statistical applications is the careful preparation of a
study population and the random sampling procedures to go with it. Proper
execution of this sampling procedure ensures a usable sample.
You are now ready to learn the Long
Run Argument and Perfect Sample case types in 1st Hourly Stuff.