Summaries
Session 0.1
18th August 2010
Session 1.1
Sampling a Simple Population
We use random sampling to estimate
an empirical model of a population. We check the empirical model by direct
inspection of the population. We repeat sampling with replacement, obtaining
multiple random samples from the same population by the same process.
We combine (pool) compatible samples to form larger samples. Pooling samples of
size 50, we obtain samples of size 100, 150 and 300. In general, as sample size
increases, samples become more precise and reliable, provided that the sampling
process is reliable.
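The claim that larger samples are more precise can be checked by simulation. The sketch below is illustrative only: it assumes a fair six-sided die (the population used later in this session), and the function name `mean_abs_error`, the number of trials and the seed are our own choices, not part of the class exercise. For each sample size it reports the average gap between the sample proportion of face "1" and the model value 1/6.

```python
import random

def mean_abs_error(n, trials=1000, seed=0):
    """Average absolute gap between the sample proportion of face 1 and 1/6.

    Assumes a fair die; 'trials' simulated samples of size n are averaged.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ones = sum(1 for _ in range(n) if rng.randint(1, 6) == 1)
        total += abs(ones / n - 1 / 6)
    return total / trials

# Sample sizes used in this session: single samples and pooled samples.
for n in (50, 100, 150, 300):
    print(n, round(mean_abs_error(n), 4))
```

Under these assumptions the average gap should shrink as n grows, which is the sense in which pooled samples are more precise.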
Random sampling is the basis for
obtaining information in statistical activities. Sampling is necessary, tedious,
time-consuming and expensive. Random sampling incorporates reliability,
precision and uncertainty.
In this session, we begin the study
of probability. We begin with a very basic example of a population, and explore
the process of sampling a population.
In our first case, we begin with a
fair, six-sided die. We track predicted and observed face values in six random
samples of 50 tosses of the die. We then compare our samples to what is
expected under the fair model.
We examine two modes of sampling a
population: census (total enumeration), in which every member of the
population is examined; and random sampling with replacement (SRS/WR),
in which single members are repeatedly selected from the population. One
practical reason why we would want a sampling process is that we wish to
estimate some property of the population. Total enumeration allows a definitive
settling of the question, and random sampling allows an approximate answer. In
most practical settings, the populations of interest are too difficult to
totally enumerate – the population is too large, or too complex, or cannot be
accessed in total. In practical applications, it is sufficient (and usually
necessary) to use a suitable random sample in lieu of the total population.
In our second case, we begin with a
color bowl whose true color frequencies are known. We compute a population
frequency model and then compute the expected structure for random samples from
that bowl. We obtain six (6) random samples, each consisting of 50 draws with replacement
(SRS/WR). We then compute sample color frequencies and compare them to the true
structure of the bowl.
We then explore a bit of decision
theory by playing with Ellsberg’s Urns.
Prediction and Probabilistic Randomness: Predicting the Behavior of a Six-sided Die
Process
We have a fair, six-sided die, with face values 1, 2, 3, 4, 5 and 6.
Prior to each toss of the die, a member of the group predicts the face value
that will be observed on that toss. Upon tossing the die, the group notes the
observed face value, as well as the correctness (or lack thereof) of the
prediction. Each group produces a sample of 50 tosses.
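The process above can be imitated in a few lines of Python. This is a sketch under stated assumptions, not the class procedure itself: the die is simulated with a pseudo-random generator, and the predictor is assumed to guess a face uniformly at random (a human predictor may follow any rule). The names `toss_with_predictions` and `seed` are hypothetical.

```python
import random
from collections import Counter

def toss_with_predictions(n_tosses=50, seed=1):
    """Simulate n_tosses of a fair die, each preceded by a prediction."""
    rng = random.Random(seed)
    faces = Counter()
    hits = 0
    for _ in range(n_tosses):
        prediction = rng.randint(1, 6)  # assumption: the guess is random
        observed = rng.randint(1, 6)    # fair die: faces 1..6 equally likely
        faces[observed] += 1
        if prediction == observed:
            hits += 1
    return faces, hits

faces, hits = toss_with_predictions()
for face in range(1, 7):
    print(face, faces[face], faces[face] / 50)
print("Hit", hits, "Miss", 50 - hits)
```

Each run with a different seed plays the role of one group's sample of 50 tosses.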
Sample Worksheet
Prediction and the Fair Die
Sample Grid (n=50)
Each cell corresponds to a single toss of the die.
X |  |  |  |  |  |  |  |  |  |
X |  |  |  |  |  |  |  |  |  |
X |  |  |  |  |  |  |  |  |  |
X |  |  |  |  |  |  |  |  |  |
X | X | X | X | X | X | X | X | X | X |
Sample results are tabulated in the form below.
Face Value | Count | Prediction | Count
1 |  | Hit (Correct) |
2 |  | Miss (Incorrect) |
3 |  | Total |
4 |  |  |
5 |  |  |
6 |  |  |
Total |  |  |
Samples – Face Values and Predictions
Here are the results for our six samples. You should be able to begin
with the counts in the table and work out the proportions and percentages.
6:30 Samples
Prediction and the Fair Die
Face Value | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled #1+#2 Count | Pooled #1+#2 Proportion | Pooled #1+#2 Percent
1 | 14 | 14/50 = 0.28 | 100*0.28 = 28 | 7 | 0.14 | 14 | 14+7 = 21 | 21/100 = 0.21 | 21
2 | 6 | 6/50 = 0.12 | 100*0.12 = 12 | 10 | 0.2 | 20 | 6+10 = 16 | 16/100 = 0.16 | 16
3 | 9 | 9/50 = 0.18 | 100*0.18 = 18 | 8 | 0.16 | 16 | 9+8 = 17 | 17/100 = 0.17 | 17
4 | 6 | 6/50 = 0.12 | 100*0.12 = 12 | 10 | 0.2 | 20 | 6+10 = 16 | 16/100 = 0.16 | 16
5 | 9 | 9/50 = 0.18 | 100*0.18 = 18 | 5 | 0.1 | 10 | 9+5 = 14 | 14/100 = 0.14 | 14
6 | 6 | 6/50 = 0.12 | 100*0.12 = 12 | 10 | 0.2 | 20 | 6+10 = 16 | 16/100 = 0.16 | 16
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Prediction | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled #1+#2 Count | Pooled #1+#2 Proportion | Pooled #1+#2 Percent
Hit | 9 | 0.18 | 18 | 13 | 0.26 | 26 | 22 | 0.22 | 22
Miss | 41 | 0.82 | 82 | 37 | 0.74 | 74 | 78 | 0.78 | 78
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Face Value | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled #3+#4 Count | Pooled #3+#4 Proportion | Pooled #3+#4 Percent
1 | 4 | 0.08 | 8 | 10 | 0.2 | 20 | 14 | 0.14 | 14
2 | 12 | 0.24 | 24 | 6 | 0.12 | 12 | 18 | 0.18 | 18
3 | 7 | 0.14 | 14 | 8 | 0.16 | 16 | 15 | 0.15 | 15
4 | 6 | 0.12 | 12 | 12 | 0.24 | 24 | 18 | 0.18 | 18
5 | 14 | 0.28 | 28 | 9 | 0.18 | 18 | 23 | 0.23 | 23
6 | 7 | 0.14 | 14 | 5 | 0.1 | 10 | 12 | 0.12 | 12
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Prediction | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled #3+#4 Count | Pooled #3+#4 Proportion | Pooled #3+#4 Percent
Hit | 8 | 0.16 | 16 | 6 | 0.12 | 12 | 14 | 0.14 | 14
Miss | 42 | 0.84 | 84 | 44 | 0.88 | 88 | 86 | 0.86 | 86
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Face Value | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled #5+#6 Count | Pooled #5+#6 Proportion | Pooled #5+#6 Percent
1 | 8 | 0.16 | 16 | 8 | 0.16 | 16 | 16 | 0.16 | 16
2 | 10 | 0.2 | 20 | 6 | 0.12 | 12 | 16 | 0.16 | 16
3 | 10 | 0.2 | 20 | 5 | 0.1 | 10 | 15 | 0.15 | 15
4 | 6 | 0.12 | 12 | 9 | 0.18 | 18 | 15 | 0.15 | 15
5 | 9 | 0.18 | 18 | 8 | 0.16 | 16 | 17 | 0.17 | 17
6 | 7 | 0.14 | 14 | 14 | 0.28 | 28 | 21 | 0.21 | 21
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Prediction | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled #5+#6 Count | Pooled #5+#6 Proportion | Pooled #5+#6 Percent
Hit | 13 | 0.26 | 26 | 8 | 0.16 | 16 | 21 | 0.21 | 21
Miss | 37 | 0.74 | 74 | 42 | 0.84 | 84 | 79 | 0.79 | 79
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Face Value | Pooled #1+#3+#5 Count | Pooled #1+#3+#5 Proportion | Pooled #1+#3+#5 Percent | Pooled #2+#4+#6 Count | Pooled #2+#4+#6 Proportion | Pooled #2+#4+#6 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent
1 | 26 | 0.173333333 | 17.333 | 25 | 0.166666667 | 16.667 | 51 | 51/300 = 0.17 | 17
2 | 28 | 0.186666667 | 18.667 | 22 | 0.146666667 | 14.667 | 50 | 50/300 = 0.1666667 | 16.667
3 | 26 | 0.173333333 | 17.333 | 21 | 0.14 | 14 | 47 | 47/300 = 0.1566667 | 15.667
4 | 18 | 0.12 | 12 | 31 | 0.206666667 | 20.667 | 49 | 49/300 = 0.1633333 | 16.333
5 | 32 | 0.213333333 | 21.333 | 22 | 0.146666667 | 14.667 | 54 | 54/300 = 0.18 | 18
6 | 20 | 0.133333333 | 13.333 | 29 | 0.193333333 | 19.333 | 49 | 49/300 = 0.1633333 | 16.333
Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100
Prediction | Pooled #1+#3+#5 Count | Pooled #1+#3+#5 Proportion | Pooled #1+#3+#5 Percent | Pooled #2+#4+#6 Count | Pooled #2+#4+#6 Proportion | Pooled #2+#4+#6 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent
Hit | 30 | 0.2 | 20 | 27 | 0.18 | 18 | 57 | 0.19 | 19
Miss | 120 | 0.8 | 80 | 123 | 0.82 | 82 | 243 | 0.81 | 81
Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100
In the fair die model for this case, in long runs of tosses of the die,
approximately 16⅔% of tosses show each of the face values 1, 2, 3, 4, 5 and 6.
The sample data are generally compatible with a fair die assumption
(equally likely face values) and with a baseline expected prediction success
rate of 1/6, or 16⅔%. The fit to the fair model tends to improve with
increasing sample size, but the samples do not fit the fair assumption
exactly.
Sample versus Fair Model
6:30
Face Value 1: 17% (Sample) versus 16.67% (Fair Model)
Face Value 2: 16.67% (Sample) versus 16.67% (Fair Model)
Face Value 3: 15.67% (Sample) versus 16.67% (Fair Model)
Face Value 4: 16.33% (Sample) versus 16.67% (Fair Model)
Face Value 5: 18% (Sample) versus 16.67% (Fair Model)
Face Value 6: 16.33% (Sample) versus 16.67% (Fair Model)
Prediction “Hit”: 19% (Sample) versus 16.67% (Fair Model)
Prediction “Miss”: 81% (Sample) versus 83.33% (Fair Model)
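Pooling is simply the addition of counts, face by face, across compatible samples. The following Python sketch reproduces the pooled 6:30 figures from the face-value counts reported in the tables above; the helper name `pool` is our own.

```python
# Face-value counts (faces 1..6) for the six 6:30 samples, copied from the tables.
samples = [
    [14, 6, 9, 6, 9, 6],    # sample #1
    [7, 10, 8, 10, 5, 10],  # sample #2
    [4, 12, 7, 6, 14, 7],   # sample #3
    [10, 6, 8, 12, 9, 5],   # sample #4
    [8, 10, 10, 6, 9, 7],   # sample #5
    [8, 6, 5, 9, 8, 14],    # sample #6
]

def pool(*count_lists):
    """Add counts face by face across the given samples."""
    return [sum(column) for column in zip(*count_lists)]

pooled_all = pool(*samples)
n = sum(pooled_all)
fair_percent = 100 / 6  # fair model: each face shows in 16 2/3 % of tosses
for face, count in enumerate(pooled_all, start=1):
    print(f"Face {face}: count={count}, percent={100 * count / n:.2f}, "
          f"fair={fair_percent:.2f}")
```

Calling `pool(samples[0], samples[1])` gives the "Pooled #1+#2" column, and pooling the odd-numbered or even-numbered samples gives the size-150 columns.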
8:00 Samples
Prediction and the Fair Die
Face Value | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled #1+#2 Count | Pooled #1+#2 Proportion | Pooled #1+#2 Percent
1 | 7 | 0.14 | 14 | 6 | 0.12 | 12 | 13 | 0.13 | 13
2 | 6 | 0.12 | 12 | 5 | 0.1 | 10 | 11 | 0.11 | 11
3 | 14 | 0.28 | 28 | 10 | 0.2 | 20 | 24 | 0.24 | 24
4 | 9 | 0.18 | 18 | 7 | 0.14 | 14 | 16 | 0.16 | 16
5 | 6 | 0.12 | 12 | 16 | 0.32 | 32 | 22 | 0.22 | 22
6 | 8 | 0.16 | 16 | 6 | 0.12 | 12 | 14 | 0.14 | 14
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Prediction | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled #1+#2 Count | Pooled #1+#2 Proportion | Pooled #1+#2 Percent
Hit | 10 | 0.2 | 20 | 8 | 0.16 | 16 | 18 | 0.18 | 18
Miss | 40 | 0.8 | 80 | 42 | 0.84 | 84 | 82 | 0.82 | 82
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Face Value | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled #3+#4 Count | Pooled #3+#4 Proportion | Pooled #3+#4 Percent
1 | 6 | 0.12 | 12 | 7 | 0.14 | 14 | 13 | 0.13 | 13
2 | 5 | 0.1 | 10 | 7 | 0.14 | 14 | 12 | 0.12 | 12
3 | 12 | 0.24 | 24 | 9 | 0.18 | 18 | 21 | 0.21 | 21
4 | 8 | 0.16 | 16 | 3 | 0.06 | 6 | 11 | 0.11 | 11
5 | 5 | 0.1 | 10 | 13 | 0.26 | 26 | 18 | 0.18 | 18
6 | 14 | 0.28 | 28 | 11 | 0.22 | 22 | 25 | 0.25 | 25
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Prediction | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled #3+#4 Count | Pooled #3+#4 Proportion | Pooled #3+#4 Percent
Hit | 8 | 0.16 | 16 | 6 | 0.12 | 12 | 14 | 0.14 | 14
Miss | 42 | 0.84 | 84 | 44 | 0.88 | 88 | 86 | 0.86 | 86
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Face Value | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled #5+#6 Count | Pooled #5+#6 Proportion | Pooled #5+#6 Percent
1 | 6 | 0.12 | 12 | 12 | 0.24 | 24 | 18 | 0.18 | 18
2 | 8 | 0.16 | 16 | 8 | 0.16 | 16 | 16 | 0.16 | 16
3 | 7 | 0.14 | 14 | 4 | 0.08 | 8 | 11 | 0.11 | 11
4 | 11 | 0.22 | 22 | 6 | 0.12 | 12 | 17 | 0.17 | 17
5 | 12 | 0.24 | 24 | 9 | 0.18 | 18 | 21 | 0.21 | 21
6 | 6 | 0.12 | 12 | 11 | 0.22 | 22 | 17 | 0.17 | 17
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Prediction | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled #5+#6 Count | Pooled #5+#6 Proportion | Pooled #5+#6 Percent
Hit | 7 | 0.14 | 14 | 9 | 0.18 | 18 | 16 | 0.16 | 16
Miss | 43 | 0.86 | 86 | 41 | 0.82 | 82 | 84 | 0.84 | 84
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100
Face Value | Pooled #1+#3+#5 Count | Pooled #1+#3+#5 Proportion | Pooled #1+#3+#5 Percent | Pooled #2+#4+#6 Count | Pooled #2+#4+#6 Proportion | Pooled #2+#4+#6 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent
1 | 19 | 0.126667 | 12.667 | 25 | 0.166667 | 16.667 | 44 | 0.146667 | 14.667
2 | 19 | 0.126667 | 12.667 | 20 | 0.133333 | 13.333 | 39 | 0.13 | 13
3 | 33 | 0.22 | 22 | 23 | 0.153333 | 15.333 | 56 | 0.186667 | 18.667
4 | 28 | 0.186667 | 18.667 | 16 | 0.106667 | 10.667 | 44 | 0.146667 | 14.667
5 | 23 | 0.153333 | 15.333 | 38 | 0.253333 | 25.333 | 61 | 0.203333 | 20.333
6 | 28 | 0.186667 | 18.667 | 28 | 0.186667 | 18.667 | 56 | 0.186667 | 18.667
Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100
Prediction | Pooled #1+#3+#5 Count | Pooled #1+#3+#5 Proportion | Pooled #1+#3+#5 Percent | Pooled #2+#4+#6 Count | Pooled #2+#4+#6 Proportion | Pooled #2+#4+#6 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent
Hit | 25 | 0.166667 | 16.667 | 23 | 0.153333 | 15.333 | 48 | 0.16 | 16
Miss | 125 | 0.833333 | 83.333 | 127 | 0.846667 | 84.667 | 252 | 0.84 | 84
Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100
In the fair die model for this case, in long runs of tosses of the die,
approximately 16⅔% of tosses show each of the face values 1, 2, 3, 4, 5 and 6.
The sample data are generally compatible with a fair die assumption
(equally likely face values) and with a baseline expected prediction success
rate of 1/6, or 16⅔%. The fit to the fair model tends to improve with
increasing sample size, but the samples do not fit the fair assumption
exactly.
Sample versus Fair Model
8:00
Face Value 1: 14.67% (Sample) versus 16.67% (Fair Model)
Face Value 2: 13% (Sample) versus 16.67% (Fair Model)
Face Value 3: 18.67% (Sample) versus 16.67% (Fair Model)
Face Value 4: 14.67% (Sample) versus 16.67% (Fair Model)
Face Value 5: 20.33% (Sample) versus 16.67% (Fair Model)
Face Value 6: 18.67% (Sample) versus 16.67% (Fair Model)
Prediction “Hit”: 16% (Sample) versus 16.67% (Fair Model)
Prediction “Miss”: 84% (Sample) versus 83.33% (Fair Model)
Case Study 1.1: A Color Bowl
In random sampling, we might or might not obtain a complete list of the
colors in the bowl; only a total sample (census) guarantees that kind of
listing. The sample proportion of each listed color approximates
the corresponding model proportion in the bowl itself. In census sampling,
every object in the bowl is counted. The listing is complete, and the model
proportions may be calculated directly.
The basic idea in case study 1.1 is
that random samples give imperfect pictures of what is being sampled. However,
with sufficiently large samples, these samples can reliably yield good pictures
of the processes or populations being sampled. And the essence of many
statistical applications is the study of selected processes or populations. For
a sense of the efficiency of the samples, compare sample and true percentages.
Process
We have a four-color bowl, with blue, green, red and yellow marbles. Prior to each draw from the bowl,
the bowl is thoroughly mixed, giving each resident marble an approximately
equal chance of selection. After mixing, a blind (made without looking into the
bowl) draw of a single marble is made. The group notes the color of the marble,
and the marble is returned to the bowl – this is sampling with replacement. The
mixing makes the sampling random. Each group produces a sample of 50 draws.
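The drawing process can be imitated in Python. This is a sketch under assumptions: the marble counts are those of the 6:30 bowl reported later in this summary, the physical mixing is replaced by a pseudo-random choice, and the names `BOWL`, `draw_sample` and `seed` are hypothetical.

```python
import random
from collections import Counter

# Marble counts of the 6:30 bowl, as listed in "The True State of the Bowl".
BOWL = {"Blue": 12, "Green": 6, "Red": 6, "Yellow": 4}

def draw_sample(bowl, n_draws=50, seed=2):
    """Draw n_draws marbles with replacement and count the colors seen."""
    rng = random.Random(seed)
    marbles = [color for color, k in bowl.items() for _ in range(k)]
    # Choosing from the full list every time is what "with replacement" means:
    # the drawn marble is returned before the next draw.
    draws = [rng.choice(marbles) for _ in range(n_draws)]
    return Counter(draws)

counts = draw_sample(BOWL)
for color in BOWL:
    print(color, counts[color], counts[color] / 50)
```

A color that is rare in the bowl may fail to appear in a given sample, which is why a random sample cannot guarantee a complete listing of colors.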
Each cell corresponds to a single draw with
replacement from the bowl.
Sample Grid (n=50)
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
 |  |  |  |  |  |  |  |  |  |
 |  |  |  |  |  |  |  |  |  |
 |  |  |  |  |  |  |  |  |  |
 |  |  |  |  |  |  |  |  |  |
Sample results are tabulated in the form below.
Table – Draws with Replacement
Color | Count
Blue |
Green |
Red |
Yellow |
Total |
Samples from the Color Bowl
Here are the six samples from our groups. You should be able to begin
with the counts in the table and work out the proportions and percentages.
6:30 Samples
Color Bowl I - Sampling with Replacement
Color | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled #1+#2 Count | Pooled #1+#2 Proportion | Pooled #1+#2 Percent | Truth (Percent)
Blue | 20 | 20/50 = 0.4 | 100*0.4 = 40 | 25 | 0.5 | 50 | 20+25 = 45 | 45/100 = 0.45 | 45 | 42.9
Green | 12 | 12/50 = 0.24 | 100*0.24 = 24 | 11 | 0.22 | 22 | 12+11 = 23 | 23/100 = 0.23 | 23 | 21.4
Red | 15 | 15/50 = 0.3 | 100*0.3 = 30 | 5 | 0.1 | 10 | 15+5 = 20 | 20/100 = 0.2 | 20 | 21.4
Yellow | 3 | 3/50 = 0.06 | 100*0.06 = 6 | 9 | 0.18 | 18 | 3+9 = 12 | 12/100 = 0.12 | 12 | 14.3
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100
Color | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled #3+#4 Count | Pooled #3+#4 Proportion | Pooled #3+#4 Percent | Truth (Percent)
Blue | 27 | 0.54 | 54 | 21 | 0.42 | 42 | 48 | 0.48 | 48 | 42.9
Green | 7 | 0.14 | 14 | 13 | 0.26 | 26 | 20 | 0.2 | 20 | 21.4
Red | 11 | 0.22 | 22 | 9 | 0.18 | 18 | 20 | 0.2 | 20 | 21.4
Yellow | 5 | 0.1 | 10 | 7 | 0.14 | 14 | 12 | 0.12 | 12 | 14.3
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100
Color | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled #5+#6 Count | Pooled #5+#6 Proportion | Pooled #5+#6 Percent | Truth (Percent)
Blue | 22 | 0.44 | 44 | 26 | 0.52 | 52 | 48 | 0.48 | 48 | 42.9
Green | 9 | 0.18 | 18 | 11 | 0.22 | 22 | 20 | 0.2 | 20 | 21.4
Red | 12 | 0.24 | 24 | 11 | 0.22 | 22 | 23 | 0.23 | 23 | 21.4
Yellow | 7 | 0.14 | 14 | 2 | 0.04 | 4 | 9 | 0.09 | 9 | 14.3
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100
Color | Pooled #1+#3+#5 Count | Pooled #1+#3+#5 Proportion | Pooled #1+#3+#5 Percent | Pooled #2+#4+#6 Count | Pooled #2+#4+#6 Proportion | Pooled #2+#4+#6 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent | Truth (Percent)
Blue | 69 | 0.46 | 46 | 72 | 0.48 | 48 | 141 | 0.47 | 47 | 42.9
Green | 28 | 0.1866667 | 18.6667 | 35 | 0.2333333 | 23.3333 | 63 | 0.21 | 21 | 21.4
Red | 38 | 0.2533333 | 25.3333 | 25 | 0.1666667 | 16.6667 | 63 | 0.21 | 21 | 21.4
Yellow | 15 | 0.1 | 10 | 18 | 0.12 | 12 | 33 | 0.11 | 11 | 14.3
Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100 | 100
Blue: 47% (Sample) versus 42.9% (Model)
Green: 21% (Sample) versus 21.4% (Model)
Red: 21% (Sample) versus 21.4% (Model)
Yellow: 11% (Sample) versus 14.3% (Model)
8:00 Samples
Color Bowl I - Sampling with Replacement
Color | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled #1+#2 Count | Pooled #1+#2 Proportion | Pooled #1+#2 Percent | Truth (Percent)
Blue | 15 | 0.3 | 30 | 7 | 0.14 | 14 | 22 | 0.22 | 22 | 25
Green | 15 | 0.3 | 30 | 23 | 0.46 | 46 | 38 | 0.38 | 38 | 35.714286
Red | 12 | 0.24 | 24 | 12 | 0.24 | 24 | 24 | 0.24 | 24 | 25
Yellow | 8 | 0.16 | 16 | 8 | 0.16 | 16 | 16 | 0.16 | 16 | 14.285714
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100
Color | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled #3+#4 Count | Pooled #3+#4 Proportion | Pooled #3+#4 Percent | Truth (Percent)
Blue | 16 | 0.32 | 32 | 9 | 0.18 | 18 | 25 | 0.25 | 25 | 25
Green | 16 | 0.32 | 32 | 20 | 0.4 | 40 | 36 | 0.36 | 36 | 35.714286
Red | 10 | 0.2 | 20 | 11 | 0.22 | 22 | 21 | 0.21 | 21 | 25
Yellow | 8 | 0.16 | 16 | 10 | 0.2 | 20 | 18 | 0.18 | 18 | 14.285714
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100
Color | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled #5+#6 Count | Pooled #5+#6 Proportion | Pooled #5+#6 Percent | Truth (Percent)
Blue | 14 | 0.28 | 28 | 12 | 0.24 | 24 | 26 | 0.26 | 26 | 25
Green | 14 | 0.28 | 28 | 20 | 0.4 | 40 | 34 | 0.34 | 34 | 35.714286
Red | 13 | 0.26 | 26 | 11 | 0.22 | 22 | 24 | 0.24 | 24 | 25
Yellow | 9 | 0.18 | 18 | 7 | 0.14 | 14 | 16 | 0.16 | 16 | 14.285714
Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100
Color | Pooled #1+#3+#5 Count | Pooled #1+#3+#5 Proportion | Pooled #1+#3+#5 Percent | Pooled #2+#4+#6 Count | Pooled #2+#4+#6 Proportion | Pooled #2+#4+#6 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent | Truth (Percent)
Blue | 45 | 0.3 | 30 | 28 | 0.1866667 | 18.667 | 73 | 0.243333 | 24.333 | 25
Green | 45 | 0.3 | 30 | 63 | 0.42 | 42 | 108 | 0.36 | 36 | 35.714286
Red | 35 | 0.2333333 | 23.333 | 34 | 0.2266667 | 22.667 | 69 | 0.23 | 23 | 25
Yellow | 25 | 0.1666667 | 16.667 | 25 | 0.1666667 | 16.667 | 50 | 0.166667 | 16.667 | 14.285714
Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100 | 100
Blue: 24.3% (Sample) versus 25% (Model)
Green: 36% (Sample) versus 35.7% (Model)
Red: 23% (Sample) versus 25% (Model)
Yellow: 16.7% (Sample) versus 14.3% (Model)
Some Formulas – Proportions, Percentages, Counts
The class represents some property or attribute, for example, blue, or red.
Each member, or unit, of a sample can be classified – the result of the
classification of the unit is the unit’s class.
Sample Proportion (p)
nclass ~ number of units of sample in class
ntotal ~ total number of units in sample
pclass = nclass / ntotal
pclass ~ proportion of sample in class
Sample Percent (pct)
nclass ~ number of units of sample in class
ntotal ~ total number of units in sample
pclass = nclass / ntotal
pctclass = 100*(nclass / ntotal)
pctclass = 100*pclass
pctclass ~ percent of sample in class
Population Proportion (P)
Nclass ~ number of units of population in class
Ntotal ~ total number of units in population
Pclass = Nclass / Ntotal
Pclass ~ proportion of population in class
Population Percent (PCT)
Nclass ~ number of units of population in class
Ntotal ~ total number of units in population
Pclass = Nclass / Ntotal
PCTclass = 100*(Nclass / Ntotal)
PCTclass = 100*Pclass
PCTclass ~ percent of population in class
In this setting,
nblue ~ number of blue draws in sample
ntotal ~ total number of draws per sample
pblue = nblue / ntotal
pblue ~ proportion of sample draws showing blue
pctblue = 100*pblue
pctblue ~ percent of sample draws showing blue
Nblue ~ number of blue marbles in bowl
Ntotal ~ total number of marbles in bowl
Pblue = Nblue / Ntotal
Pblue ~ proportion of marbles in bowl that are blue
ngreen ~ number of green draws in sample
ntotal ~ total number of draws per sample
pgreen = ngreen / ntotal
pgreen ~ proportion of sample draws showing green
pctgreen = 100*pgreen
pctgreen ~ percent of sample draws showing green
Ngreen ~ number of green marbles in bowl
Ntotal ~ total number of marbles in bowl
Pgreen = Ngreen / Ntotal
Pgreen ~ proportion of marbles in bowl that are green
nred ~ number of red draws in sample
ntotal ~ total number of draws per sample
pred = nred / ntotal
pred ~ proportion of sample draws showing red
pctred = 100*pred
pctred ~ percent of sample draws showing red
Nred ~ number of red marbles in bowl
Ntotal ~ total number of marbles in bowl
Pred = Nred / Ntotal
Pred ~ proportion of marbles in bowl that are red
nyellow ~ number of yellow draws in sample
ntotal ~ total number of draws per sample
pyellow = nyellow / ntotal
pyellow ~ proportion of sample draws showing yellow
pctyellow = 100*pyellow
pctyellow ~ percent of sample draws showing yellow
Nyellow ~ number of yellow marbles in bowl
Ntotal ~ total number of marbles in bowl
Pyellow = Nyellow / Ntotal
Pyellow ~ proportion of marbles in bowl that are yellow
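The sample formulas translate directly into code. The sketch below applies them to the counts of 6:30 sample #1 from the color bowl tables; the function names `proportion` and `percent` are our own and simply mirror pclass and pctclass.

```python
def proportion(n_class, n_total):
    """pclass = nclass / ntotal"""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_class / n_total

def percent(n_class, n_total):
    """pctclass = 100 * pclass"""
    return 100 * proportion(n_class, n_total)

# Counts from 6:30 sample #1 (50 draws with replacement).
sample_counts = {"Blue": 20, "Green": 12, "Red": 15, "Yellow": 3}
n_total = sum(sample_counts.values())
for color, n_class in sample_counts.items():
    print(color, n_class, proportion(n_class, n_total),
          round(percent(n_class, n_total), 2))
```

The same two functions give the population values when called with Nclass and Ntotal for the bowl.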
The True State of the Bowl
The 6:30 Bowl
Color | Count | Proportion | Percent
Blue | 12 | 12/28 = 0.4285714 | 42.8571
Green | 6 | 6/28 = 0.2142857 | 21.4286
Red | 6 | 6/28 = 0.2142857 | 21.4286
Yellow | 4 | 4/28 = 0.1428571 | 14.2857
Total | 28 | 1 | 100
The true proportions are probabilities:
Pr{Blue Shows} = 12/28 ≈ 0.4286
In long runs of draws with replacement from the bowl, approximately 42.9 percent of draws show blue.
Pr{Green Shows} = 6/28 ≈ 0.2143
In long runs of draws with replacement from the bowl, approximately 21.4 percent of draws show green.
Pr{Red Shows} = 6/28 ≈ 0.2143
In long runs of draws with replacement from the bowl, approximately 21.4 percent of draws show red.
Pr{Yellow Shows} = 4/28 ≈ 0.1429
In long runs of draws with replacement from the bowl, approximately 14.3 percent of draws show yellow.
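The expected structure of a sample follows from these probabilities: in n draws with replacement, the expected count for a color is n times its probability. A minimal Python sketch, using the 6:30 bowl counts from the table above and n = 50 as in the class samples; the variable names are illustrative only.

```python
from fractions import Fraction

# True marble counts of the 6:30 bowl.
bowl = {"Blue": 12, "Green": 6, "Red": 6, "Yellow": 4}
N_total = sum(bowl.values())
n_draws = 50  # size of one class sample

for color, N_class in bowl.items():
    P = Fraction(N_class, N_total)   # exact population proportion
    expected = float(P * n_draws)    # expected count = n * P
    print(f"{color}: P = {P} ~ {float(P):.4f}, "
          f"expected count in {n_draws} draws ~ {expected:.1f}")
```

Expected counts need not be whole numbers; they describe the long-run average over many samples of 50, not what any single sample must show.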
The 8:00 Bowl
Color | Count | Proportion | Percent
Blue | 7 | 7/28 = 0.25 | 25
Green | 10 | 10/28 = 0.3571429 | 35.714
Red | 7 | 7/28 = 0.25 | 25
Yellow | 4 | 4/28 = 0.1428571 | 14.286
Total | 28 | 1 | 100
The true proportions are probabilities:
Pr{Blue Shows} = 7/28 = 0.25
In long runs of draws with replacement from the bowl, approximately 25 percent of draws show blue.
Pr{Green Shows} = 10/28 ≈ 0.3571
In long runs of draws with replacement from the bowl, approximately 35.7 percent of draws show green.
Pr{Red Shows} = 7/28 = 0.25
In long runs of draws with replacement from the bowl, approximately 25 percent of draws show red.
Pr{Yellow Shows} = 4/28 ≈ 0.1429
In long runs of draws with replacement from the bowl, approximately 14.3 percent of draws show yellow.
We see reasonable, but not exact, matches between the pooled sample
percentages and the model percentages (100*P).
6:30
Blue: 47% (Sample) versus 42.9% (Model)
Green: 21% (Sample) versus 21.4% (Model)
Red: 21% (Sample) versus 21.4% (Model)
Yellow: 11% (Sample) versus 14.3% (Model)
8:00
Blue: 24.3% (Sample) versus 25% (Model)
Green: 36% (Sample) versus 35.7% (Model)
Red: 23% (Sample) versus 25% (Model)
Yellow: 16.7% (Sample) versus 14.3% (Model)
We didn’t get to these, so read up on
Ellsberg I and Ellsberg II.
Regarding Ellsberg I
The 1st Game: The first bowl is 50%/50% split between blue and green. The best we can do is break even, regardless of strategy.
The simplest strategy involves picking one of the colors and always betting on
that color.
The 2nd Game: The second bowl is an unknown mixture of red and yellow.
We might be able to win this game if 1) there is a
dominant color and 2) we can determine that dominant color. A simple strategy
here is to pick one color and ride it for a while. Then stop betting and check
the number of winning bets. If the color being bet on is losing on a regular
basis, switch colors.
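The pick-check-switch strategy can be sketched as a short simulation. Everything numeric here is an assumption made for illustration: the true share of red (`p_red`) is unknown in the actual game, and the trial length (`n_trial`), total number of bets (`n_total`) and the "losing more than half" switching rule are our own choices. The function name `explore_then_commit` is hypothetical.

```python
import random

def explore_then_commit(p_red, n_trial=20, n_total=100, seed=3):
    """Bet on red for n_trial draws, then keep or switch color for the rest.

    Returns the color being bet on at the end and the number of winning bets.
    """
    rng = random.Random(seed)
    color = "red"
    wins = 0
    trial_wins = 0
    for i in range(n_total):
        # After the trial period, check the record once and switch if
        # red has lost more often than it has won.
        if i == n_trial and trial_wins < n_trial / 2:
            color = "yellow"
        drawn = "red" if rng.random() < p_red else "yellow"
        won = drawn == color
        wins += won
        if i < n_trial:
            trial_wins += won
    return color, wins

# Hypothetical bowl that is 30% red: the strategy should usually end on yellow.
print(explore_then_commit(p_red=0.3))
```

The trial bets are the price paid for information: when the bowl is close to evenly split, the check can point to the wrong color, and the advantage gained is small in any case.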
The 3rd Game: This game only makes sense if the second bowl is dominant
in red. Bet on red; if red consistently shows, stay on the second
bowl. Otherwise, either stop playing or stick with the first bowl.
Regarding Ellsberg II
The 1st Game: The first bowl is 20% red / 40% black / 40% white.
The simplest strategy involves
picking one of the colors and always betting on that color. Regardless of
betting choice, there is a 40% chance of losing a single bet and a 20% chance
of getting kicked off the game.
The 2nd Game: The second bowl is 20% red /
80% black or white. The simplest strategy involves
picking one of the colors and always betting on that color. If either white or
black is sufficiently dominant, this game might be worth playing. The problem
is that regardless of the possible advantage in the white/black part of the
bowl, there is still a 20% chance of getting killed (permanently losing). But
to detect this advantage, one is forced to pick a betting color (white or
black) and spend some money.
The idea underlying the Ellsberg
games is to illustrate how random samples can be used to make decisions
about selected processes or populations.