Summaries
Session 1.2
20th January 2010
Predicting Sample Behavior from a Model
We use a population model to predict the behavior of random samples, and we check the predictions by direct inspection of samples. We sample repeatedly with replacement, obtaining multiple random samples from the same population by the same process. We combine (pool) compatible samples to form larger samples: pooling samples of size 50, we obtain samples of size 100, 150 and 300. In general, as sample size increases, samples become more precise and reliable, provided that the sampling process is reliable.
In general, if we are working with the correct model, then the predicted sample behavior reliably describes observed samples.
Session Overview
Estimation: Previously, we saw how random samples drawn from a
population could be used to estimate the structure of a population as an
empirical model.
Prediction: In this session, we continue our study of probability. We
begin with a very basic example of a population with known structure, and use
that structure to predict the behavior of random samples from that
population.
We begin by constructing a probability model for our population and defining the concept of a perfect sample. We then relate the perfect sample to random samples, both observed and unobserved. Finally, we obtain real random samples and check them against the perfect samples.
Exclude Case Study 1.2.1
Case Study 1.2.2
This case study introduces the idea of a perfect sample – a sample that perfectly matches the population from which it is drawn. On average, the real samples corresponded closely, though not perfectly, to their perfect samples.
We begin by building a color bowl. We then compute a probability model for draws with replacement (DWR) from the bowl, and then compute perfect samples of size 50, 100, 150, 200, 250 and 300. We then engage six groups to generate six samples, each of n=50 DWR. We then compare sample frequencies and proportions to the model and to the perfect samples.
Bowl Counts and Perfect Sample Calculations
Expected Count Blue (for Sample Size n) = n*PBlue
Expected Count Green (for Sample Size n) = n*PGreen
Expected Count Red (for Sample Size n) = n*PRed
Expected Count Yellow (for Sample Size n) = n*PYellow
Revise for Spring 2010
The Bowl
Color | Count | Proportion | Percent |
Blue | 8 | 8/20 = 0.4 | 100*.40 = 40 |
Green | 3 | 3/20 = 0.15 | 100*.15 = 15 |
Red | 2 | 2/20 = 0.1 | 100*.10 = 10 |
Yellow | 7 | 7/20 = 0.35 | 100*.35 = 35 |
Total | 20 | 20/20 = 1 | 100*1 = 100 |
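The proportions in the bowl table can be recomputed directly from the counts. The following is a minimal sketch, not part of the original session materials; the names `bowl` and `probability_model` are illustrative assumptions.

```python
from fractions import Fraction

# Bowl composition, copied from the table above.
bowl = {"Blue": 8, "Green": 3, "Red": 2, "Yellow": 7}


def probability_model(counts):
    """Return each color's share of the bowl as an exact fraction."""
    total = sum(counts.values())
    return {color: Fraction(count, total) for color, count in counts.items()}


model = probability_model(bowl)
for color, p in model.items():
    # Fractions print in lowest terms (8/20 appears as 2/5).
    print(f"{color}: {p} = {float(p):.2f} ({float(100 * p):.0f}%)")
```

Exact fractions are used here so that the four proportions sum to exactly 1, with no floating-point rounding.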
Probabilities with Long Run Interpretation
PBlue = 8/20 = .40
In long runs of draws with replacement from the bowl, approximately 40% of draws show blue.
PGreen = 3/20 = .15
In long runs of draws with replacement from the bowl, approximately 15% of draws show green.
PRed = 2/20 = .10
In long runs of draws with replacement from the bowl, approximately 10% of draws show red.
PYellow = 7/20 = .35
In long runs of draws with replacement from the bowl, approximately 35% of draws show yellow.
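The long run interpretation can be explored with a small simulation. This is a sketch under stated assumptions: the computer's pseudo-random generator stands in for physical draws from the bowl, and the seed, the run lengths, and the function names are arbitrary choices made for illustration.

```python
import random

COLORS = ("Blue", "Green", "Red", "Yellow")

# One list entry per item in the bowl: 8 blue, 3 green, 2 red, 7 yellow.
bowl = ["Blue"] * 8 + ["Green"] * 3 + ["Red"] * 2 + ["Yellow"] * 7


def draw_with_replacement(n, rng):
    """Make n draws; the bowl is unchanged between draws."""
    return [rng.choice(bowl) for _ in range(n)]


def proportions(draws):
    """Return the observed proportion of each color in a list of draws."""
    return {color: draws.count(color) / len(draws) for color in COLORS}


rng = random.Random(2010)  # fixed, arbitrary seed so the run is repeatable
for n in (50, 300, 30000):
    print(n, proportions(draw_with_replacement(n, rng)))
```

Short runs can wander noticeably from 40/15/10/35 percent, while the longest run should sit close to those values.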
Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement
Color | E50 | E100 | E150 | E200 | E250 | E300 |
Blue | 50*.40 = 20 | 100*.40 = 40 | 150*.40 = 60 | 200*.40 = 80 | 250*.40 = 100 | 300*.40 = 120 |
Green | 50*.15 = 7.5 | 100*.15 = 15 | 150*.15 = 22.5 | 200*.15 = 30 | 250*.15 = 37.5 | 300*.15 = 45 |
Red | 50*.10 = 5 | 100*.10 = 10 | 150*.10 = 15 | 200*.10 = 20 | 250*.10 = 25 | 300*.10 = 30 |
Yellow | 50*.35 = 17.5 | 100*.35 = 35 | 150*.35 = 52.5 | 200*.35 = 70 | 250*.35 = 87.5 | 300*.35 = 105 |
Total | 50 | 100 | 150 | 200 | 250 | 300 |
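The perfect counts follow from the rule Expected Count = n*P applied to each color and each sample size. A minimal sketch of that calculation is given here; the function name `perfect_sample` is an assumption made for illustration.

```python
from fractions import Fraction

# Probability model for the bowl (8 blue, 3 green, 2 red, 7 yellow out of 20).
model = {
    "Blue": Fraction(8, 20),
    "Green": Fraction(3, 20),
    "Red": Fraction(2, 20),
    "Yellow": Fraction(7, 20),
}


def perfect_sample(n, model):
    """Return the expected count n * P for every color."""
    return {color: n * p for color, p in model.items()}


for n in (50, 100, 150, 200, 250, 300):
    counts = perfect_sample(n, model)
    row = ", ".join(f"{color} {float(count):g}" for color, count in counts.items())
    print(f"E{n}: {row}")
```

Within each sample size the expected counts add up to n, which gives a quick check on any row of the table.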
Perfect Samples
n=50
E50Blue = n*PBlue = 50*(8/20) = 20
E50Green = n*PGreen = 50*(3/20) = 7.5
E50Red = n*PRed = 50*(2/20) = 5
E50Yellow = n*PYellow = 50*(7/20) = 17.5
In samples of 50 draws with replacement from the bowl, we expect approximately 20 blue draws, 7 or 8 green draws, 5 red draws, and 17 or 18 yellow draws.
n=100
E100Blue = n*PBlue = 100*(8/20) = 40
E100Green = n*PGreen = 100*(3/20) = 15
E100Red = n*PRed = 100*(2/20) = 10
E100Yellow = n*PYellow = 100*(7/20) = 35
In samples of 100 draws with replacement from the bowl, we expect approximately 40 blue draws, 15 green draws, 10 red draws, and 35 yellow draws.
n=150
E150Blue = n*PBlue = 150*(8/20) = 60
E150Green = n*PGreen = 150*(3/20) = 22.5
E150Red = n*PRed = 150*(2/20) = 15
E150Yellow = n*PYellow = 150*(7/20) = 52.5
In samples of 150 draws with replacement from the bowl, we expect approximately 60 blue draws, 22 or 23 green draws, 15 red draws, and 52 or 53 yellow draws.
n=200
E200Blue = n*PBlue = 200*(8/20) = 80
E200Green = n*PGreen = 200*(3/20) = 30
E200Red = n*PRed = 200*(2/20) = 20
E200Yellow = n*PYellow = 200*(7/20) = 70
In samples of 200 draws with replacement from the bowl, we expect approximately 80 blue draws, 30 green draws, 20 red draws, and 70 yellow draws.
n=250
E250Blue = n*PBlue = 250*(8/20) = 100
E250Green = n*PGreen = 250*(3/20) = 37.5
E250Red = n*PRed = 250*(2/20) = 25
E250Yellow = n*PYellow = 250*(7/20) = 87.5
In samples of 250 draws with replacement from the bowl, we expect approximately 100 blue draws, 37 or 38 green draws, 25 red draws, and 87 or 88 yellow draws.
n=300
E300Blue = n*PBlue = 300*(8/20) = 120
E300Green = n*PGreen = 300*(3/20) = 45
E300Red = n*PRed = 300*(2/20) = 30
E300Yellow = n*PYellow = 300*(7/20) = 105
In samples of 300 draws with replacement from the bowl, we expect approximately 120 blue draws, 45 green draws, 30 red draws, and 105 yellow draws.
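Observed counts are whole numbers, so a fractional expected count such as 7.5 is read as "7 or 8" in the interpretations above. The sketch below shows one way to turn an expected count into that whole-number reading; the helper name `whole_count_range` is hypothetical, and exact fractions are used to avoid floating-point rounding at the half values.

```python
import math
from fractions import Fraction

model = {
    "Blue": Fraction(8, 20),
    "Green": Fraction(3, 20),
    "Red": Fraction(2, 20),
    "Yellow": Fraction(7, 20),
}


def whole_count_range(expected):
    """Return the whole counts on either side of an expected count.

    A whole expected count gives a single value; a fractional one gives
    the two neighboring whole numbers.
    """
    low, high = math.floor(expected), math.ceil(expected)
    return (low,) if low == high else (low, high)


for n in (50, 150, 250):
    reading = {
        color: " or ".join(str(c) for c in whole_count_range(n * p))
        for color, p in model.items()
    }
    print(n, reading)
```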
Samples – 6:30
#1 | #2 | Pooled 12 |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 22 | 0.44 | 44 | Blue | 24 | 0.48 | 48 | Blue | 46 | 0.46 | 46 | 40 |
Green | 10 | 0.2 | 20 | Green | 9 | 0.18 | 18 | Green | 19 | 0.19 | 19 | 15 |
Red | 3 | 0.06 | 6 | Red | 6 | 0.12 | 12 | Red | 9 | 0.09 | 9 | 10 |
Yellow | 15 | 0.3 | 30 | Yellow | 11 | 0.22 | 22 | Yellow | 26 | 0.26 | 26 | 35 |
Total | 50 | 1 | 100 | Total | 50 | 1 | 100 | Total | 100 | 1 | 100 | 100 |
#3 | #4 | Pooled 34 |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 23 | 0.46 | 46 | Blue | 18 | 0.36 | 36 | Blue | 41 | 0.41 | 41 | 40 |
Green | 7 | 0.14 | 14 | Green | 9 | 0.18 | 18 | Green | 16 | 0.16 | 16 | 15 |
Red | 2 | 0.04 | 4 | Red | 3 | 0.06 | 6 | Red | 5 | 0.05 | 5 | 10 |
Yellow | 18 | 0.36 | 36 | Yellow | 20 | 0.4 | 40 | Yellow | 38 | 0.38 | 38 | 35 |
Total | 50 | 1 | 100 | Total | 50 | 1 | 100 | Total | 100 | 1 | 100 | 100 |
#5 | #6 | Pooled 56 |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 22 | 0.44 | 44 | Blue | 18 | 0.36 | 36 | Blue | 40 | 0.4 | 40 | 40 |
Green | 6 | 0.12 | 12 | Green | 9 | 0.18 | 18 | Green | 15 | 0.15 | 15 | 15 |
Red | 5 | 0.1 | 10 | Red | 4 | 0.08 | 8 | Red | 9 | 0.09 | 9 | 10 |
Yellow | 17 | 0.34 | 34 | Yellow | 19 | 0.38 | 38 | Yellow | 36 | 0.36 | 36 | 35 |
Total | 50 | 1 | 100 | Total | 50 | 1 | 100 | Total | 100 | 1 | 100 | 100 |
Pooled 135 | Pooled 246 | Pooled All |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 67 | 0.4466667 | 44.6667 | Blue | 60 | 0.4 | 40 | Blue | 127 | 0.4233333 | 42.3333 | 40 |
Green | 23 | 0.1533333 | 15.3333 | Green | 27 | 0.18 | 18 | Green | 50 | 0.1666667 | 16.6667 | 15 |
Red | 10 | 0.0666667 | 6.66667 | Red | 13 | 0.086667 | 8.6667 | Red | 23 | 0.0766667 | 7.66667 | 10 |
Yellow | 50 | 0.3333333 | 33.3333 | Yellow | 50 | 0.333333 | 33.333 | Yellow | 100 | 0.3333333 | 33.3333 | 35 |
Total | 150 | 1 | 100 | Total | 150 | 1 | 100 | Total | 300 | 1 | 100 | 100 |
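Pooling compatible samples amounts to adding their color counts and recomputing the proportions on the larger total. The sketch below uses the six 6:30 samples of size 50 tabulated above; the names `samples`, `pool`, and `proportions` are assumptions made for illustration.

```python
from collections import Counter

# Color counts for the six 6:30 samples of n = 50, copied from the tables above.
samples = {
    1: Counter(Blue=22, Green=10, Red=3, Yellow=15),
    2: Counter(Blue=24, Green=9, Red=6, Yellow=11),
    3: Counter(Blue=23, Green=7, Red=2, Yellow=18),
    4: Counter(Blue=18, Green=9, Red=3, Yellow=20),
    5: Counter(Blue=22, Green=6, Red=5, Yellow=17),
    6: Counter(Blue=18, Green=9, Red=4, Yellow=19),
}


def pool(*sample_ids):
    """Add the color counts of the chosen samples into one larger sample."""
    total = Counter()
    for sample_id in sample_ids:
        total += samples[sample_id]
    return total


def proportions(counts):
    """Return each color's share of the pooled sample."""
    n = sum(counts.values())
    return {color: count / n for color, count in counts.items()}


for label, ids in [("Pooled 12", (1, 2)), ("Pooled 135", (1, 3, 5)),
                   ("Pooled 246", (2, 4, 6)), ("Pooled All", (1, 2, 3, 4, 5, 6))]:
    pooled = pool(*ids)
    print(label, dict(pooled), proportions(pooled))
```

The same functions apply to the 8:00 samples once their counts are entered in place of the 6:30 counts.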
Samples – 8:00
#1 | #2 | Pooled 12 |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 22 | 0.44 | 44 | Blue | 26 | 0.52 | 52 | Blue | 48 | 0.48 | 48 | 40 |
Green | 10 | 0.2 | 20 | Green | 3 | 0.06 | 6 | Green | 13 | 0.13 | 13 | 15 |
Red | 6 | 0.12 | 12 | Red | 2 | 0.04 | 4 | Red | 8 | 0.08 | 8 | 10 |
Yellow | 12 | 0.24 | 24 | Yellow | 19 | 0.38 | 38 | Yellow | 31 | 0.31 | 31 | 35 |
Total | 50 | 1 | 100 | Total | 50 | 1 | 100 | Total | 100 | 1 | 100 | 100 |
#3 | #4 | Pooled 34 |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 20 | 0.4 | 40 | Blue | 21 | 0.42 | 42 | Blue | 41 | 0.41 | 41 | 40 |
Green | 8 | 0.16 | 16 | Green | 6 | 0.12 | 12 | Green | 14 | 0.14 | 14 | 15 |
Red | 3 | 0.06 | 6 | Red | 3 | 0.06 | 6 | Red | 6 | 0.06 | 6 | 10 |
Yellow | 19 | 0.38 | 38 | Yellow | 20 | 0.4 | 40 | Yellow | 39 | 0.39 | 39 | 35 |
Total | 50 | 1 | 100 | Total | 50 | 1 | 100 | Total | 100 | 1 | 100 | 100 |
#5 | #6 | Pooled 56 |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 15 | 0.3 | 30 | Blue | 23 | 0.46 | 46 | Blue | 38 | 0.38 | 38 | 40 |
Green | 11 | 0.22 | 22 | Green | 12 | 0.24 | 24 | Green | 23 | 0.23 | 23 | 15 |
Red | 6 | 0.12 | 12 | Red | 2 | 0.04 | 4 | Red | 8 | 0.08 | 8 | 10 |
Yellow | 18 | 0.36 | 36 | Yellow | 13 | 0.26 | 26 | Yellow | 31 | 0.31 | 31 | 35 |
Total | 50 | 1 | 100 | Total | 50 | 1 | 100 | Total | 100 | 1 | 100 | 100 |
Pooled 135 | Pooled 246 | Pooled All |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 57 | 0.38 | 38 | Blue | 70 | 0.466667 | 46.667 | Blue | 127 | 0.4233333 | 42.333333 | 40 |
Green | 29 | 0.193333 | 19.333 | Green | 21 | 0.14 | 14 | Green | 50 | 0.1666667 | 16.666667 | 15 |
Red | 15 | 0.1 | 10 | Red | 7 | 0.046667 | 4.6667 | Red | 22 | 0.0733333 | 7.3333333 | 10 |
Yellow | 49 | 0.326667 | 32.667 | Yellow | 52 | 0.346667 | 34.667 | Yellow | 101 | 0.3366667 | 33.666667 | 35 |
Total | 150 | 1 | 100 | Total | 150 | 1 | 100 | Total | 300 | 1 | 100 | 100 |
Pooled across Sessions (6:30 + 8:00)
Super Pool |
6:30 |
Pooled 135 | Pooled 246 | Pooled All |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 67 | 0.4467 | 44.667 | Blue | 60 | 0.4 | 40 | Blue | 127 | 0.42 | 42.33 | 40 |
Green | 23 | 0.1533 | 15.333 | Green | 27 | 0.18 | 18 | Green | 50 | 0.17 | 16.67 | 15 |
Red | 10 | 0.0667 | 6.6667 | Red | 13 | 0.087 | 8.66667 | Red | 23 | 0.08 | 7.667 | 10 |
Yellow | 50 | 0.3333 | 33.333 | Yellow | 50 | 0.333 | 33.3333 | Yellow | 100 | 0.33 | 33.33 | 35 |
Total | 150 | 1 | 100 | Total | 150 | 1 | 100 | Total | 300 | 1 | 100 | 100 |
8:00 |
Pooled 135 | Pooled 246 | Pooled All |
Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Color | Count | Proportion | Percent | Truth |
Blue | 57 | 0.38 | 38 | Blue | 70 | 0.467 | 46.6667 | Blue | 127 | 0.42 | 42.33 | 40 |
Green | 29 | 0.1933 | 19.333 | Green | 21 | 0.14 | 14 | Green | 50 | 0.17 | 16.67 | 15 |
Red | 15 | 0.1 | 10 | Red | 7 | 0.047 | 4.66667 | Red | 22 | 0.07 | 7.333 | 10 |
Yellow | 49 | 0.3267 | 32.667 | Yellow | 52 | 0.347 | 34.6667 | Yellow | 101 | 0.34 | 33.67 | 35 |
Total | 150 | 1 | 100 | Total | 150 | 1 | 100 | Total | 300 | 1 | 100 | 100 |
Pooled 300 | Pooled 300 | Pooled 600 |
Color | n | p | n | p | n | p | Truth |
Blue | 124 | 0.4133 | 130 | 0.433 | 254 | 0.423 | 0.4 |
Green | 52 | 0.1733 | 48 | 0.16 | 100 | 0.167 | 0.15 |
Red | 25 | 0.0833 | 20 | 0.067 | 45 | 0.075 | 0.1 |
Yellow | 99 | 0.33 | 102 | 0.34 | 201 | 0.335 | 0.35 |
Total | 300 | 1 | 300 | 1 | 600 | 1 | 1 |
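One simple way to compare samples of different sizes with the model is to look at the largest gap between an observed proportion and the corresponding bowl proportion. The sketch below does this for a chain of progressively larger 6:30 samples and for the 600-draw super pool, using counts copied from the tables above. The gap measure and the function name `largest_gap` are illustrative assumptions, not a method prescribed in this session.

```python
# Bowl proportions ("Truth").
MODEL = {"Blue": 0.40, "Green": 0.15, "Red": 0.10, "Yellow": 0.35}

# Observed counts, copied from the tables above.
SAMPLES = {
    "6:30 #1 (n=50)": {"Blue": 22, "Green": 10, "Red": 3, "Yellow": 15},
    "6:30 Pooled 12 (n=100)": {"Blue": 46, "Green": 19, "Red": 9, "Yellow": 26},
    "6:30 Pooled 135 (n=150)": {"Blue": 67, "Green": 23, "Red": 10, "Yellow": 50},
    "6:30 Pooled All (n=300)": {"Blue": 127, "Green": 50, "Red": 23, "Yellow": 100},
    "Super pool (n=600)": {"Blue": 254, "Green": 100, "Red": 45, "Yellow": 201},
}


def largest_gap(counts, model=MODEL):
    """Return the largest absolute difference between observed and model proportions."""
    n = sum(counts.values())
    return max(abs(counts[color] / n - model[color]) for color in model)


for label, counts in SAMPLES.items():
    print(f"{label}: largest gap = {largest_gap(counts):.4f}")
```

The gap does not have to shrink at every step: Pooled 12 is further from the model than sample #1 on its own, because its yellow proportion is 0.26 against 0.35. The tendency toward the model shows up in general as sample size grows, not as a guarantee for each individual pooling.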
The structure of the bowl, expressed as color proportions, determines the basic structure of samples drawn from the bowl. Probability models allow us to predict sample behavior, but those predictions are only as reliable as the original model and the sampling procedures.
The foundation of statistical applications is the careful preparation of a study population and of the random sampling procedures that go with it. Proper execution of the sampling procedure ensures a reliable sample.
You are now ready to learn the Long Run Argument and Perfect Sample case types in 1st Hourly Stuff.