Summaries

Session 1.2

3rd June 2009

Predicting Sample Behavior from a Model

We use a population model to predict the behavior of random samples, and we check the predictions by direct inspection of samples. We sample repeatedly with replacement, obtaining multiple random samples from the same population by the same process. We combine (pool) compatible samples to form larger samples: pooling samples of size 50, we obtain samples of size 100, 150 and 300. In general, as sample size increases, samples become more precise and reliable, provided that the sampling process is reliable.

In general, if we are working with the correct model, then the predicted sample behavior reliably describes observed samples.

Session Overview

Estimation: Previously, we saw how random samples drawn from a population could be used to estimate the structure of a population as an empirical model.

Prediction: In this session, we continue our study of probability. We begin with a very basic example of a population with known structure, and use that structure to predict the behavior of random samples from that population.

We begin by constructing a probability model for our population and defining the concept of a perfect sample. We then relate the perfect sample to random samples, both observed and unobserved. Finally, we obtain real random samples and check them against the perfect samples.

Exclude Case Study 1.2.1

Case Study 1.2.2

This case study introduces the idea of a perfect sample: a sample that perfectly matches the population from which it is drawn. On average, real samples correspond nicely, though not perfectly, with their corresponding perfect samples.

We begin by building a color bowl. We then compute a probability model for draws with replacement (DWR) from the bowl, followed by perfect samples of size 50, 100, 150, 200, 250 and 300. We then engage six groups to generate six samples, each of n=50 DWR. Finally, we compare sample frequencies and proportions to the model and to the perfect samples.
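The sampling step can be mimicked in software. The following is a minimal Python sketch, not the classroom procedure itself. It assumes the bowl composition tabulated in this case study (3 blue, 10 green, 3 red, 5 yellow); the function name `draw_with_replacement` and the seed value are illustrative choices.

```python
import random
from collections import Counter

# Bowl composition from the bowl table in this case study.
BOWL = {"Blue": 3, "Green": 10, "Red": 3, "Yellow": 5}

def draw_with_replacement(bowl, n, rng):
    """Return color counts for n draws with replacement (DWR) from the bowl."""
    colors = list(bowl)
    weights = [bowl[c] for c in colors]
    # random.choices samples with replacement, weighted by bowl counts.
    draws = rng.choices(colors, weights=weights, k=n)
    counts = Counter(draws)
    return {c: counts.get(c, 0) for c in colors}

rng = random.Random(2009)  # arbitrary seed, used only for repeatability
sample = draw_with_replacement(BOWL, 50, rng)
total = sum(BOWL.values())
for color, count in sample.items():
    # p is the sample proportion, P is the model probability.
    print(f"{color:<7} n={count:>2}  p={count / 50:.2f}  P={BOWL[color] / total:.4f}")
```

Each run with a different seed plays the role of one group's sample of n=50; the printed sample proportions p will usually differ somewhat from the model probabilities P.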

Bowl Counts and Perfect Sample Calculations

Expected Count Blue (for Sample Size n) = n*PBlue

Expected Count Green (for Sample Size n) = n*PGreen

Expected Count Red (for Sample Size n) = n*PRed

Expected Count Yellow (for Sample Size n) = n*PYellow

The Bowl

Color    N    Probability (P)    Percent
Blue     3    3/21 ≈ .1429       14.29
Green    10   10/21 ≈ .4762      47.62
Red      3    3/21 ≈ .1429       14.29
Yellow   5    5/21 ≈ .2381       23.81
Total    21   1                  100

 

Probabilities with Long Run Interpretation

 

PBlue = 3/21 ≈ .1429

In long runs of draws with replacement from the bowl, approximately 14.3% of draws show blue.

 

PGreen = 10/21 ≈ .4762

In long runs of draws with replacement from the bowl, approximately 47.6% of draws show green.

 

PRed = 3/21 ≈ .1429

In long runs of draws with replacement from the bowl, approximately 14.3% of draws show red.

 

PYellow = 5/21 ≈ .2381

In long runs of draws with replacement from the bowl, approximately 23.8% of draws show yellow.
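The model probabilities follow directly from the bowl counts: each is a color count divided by the bowl total of 21. As a minimal Python sketch (the variable names are illustrative, not part of the course material), exact fractions make it easy to confirm that the four probabilities sum to 1:

```python
from fractions import Fraction

# Bowl counts from the bowl table above.
BOWL = {"Blue": 3, "Green": 10, "Red": 3, "Yellow": 5}
TOTAL = sum(BOWL.values())  # 21 items in the bowl

# Model probability for each color: count / total, kept as an exact fraction.
model = {color: Fraction(count, TOTAL) for color, count in BOWL.items()}

for color, count in BOWL.items():
    p = float(model[color])
    print(f"P{color} = {count}/{TOTAL} approx {p:.4f} ({p * 100:.2f}%)")

print("Sum of probabilities:", sum(model.values()))
```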

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement

Color    P        E50=50*P   E100=100*P   E150=150*P   E200=200*P   E250=250*P   E300=300*P
Blue     0.1429   7.143      14.286       21.4286      28.5714      35.7143      42.857
Green    0.4762   23.81      47.619       71.4286      95.2381      119.048      142.86
Red      0.1429   7.143      14.286       21.4286      28.5714      35.7143      42.857
Yellow   0.2381   11.9       23.81        35.7143      47.619       59.5238      71.429
Total    1        50         100          150          200          250          300

Perfect Samples 

n=50

E50Blue  = n*PBlue= 50*(3/21) ≈ 7.1

E50Green  = n*PGreen = 50*(10/21) ≈ 23.8

E50Red = n*PRed = 50*(3/21) ≈ 7.1

E50Yellow = n*PYellow = 50*(5/21) ≈ 11.9

In samples of 50 draws with replacement from the bowl, we expect approximately 7 or 8 blue draws, 23 or 24  green draws, 7 or 8 red draws, and 11 or 12 yellow draws.

 

n=100

E100Blue  = n*PBlue= 100*(3/21) ≈ 14.3

E100Green  = n*PGreen = 100*(10/21) ≈ 47.6

E100Red = n*PRed = 100*(3/21) ≈ 14.3

E100Yellow = n*PYellow = 100*(5/21) ≈ 23.8

In samples of 100 draws with replacement from the bowl, we expect approximately 14 or 15 blue draws, 47 or 48  green draws, 14 or 15 red draws, and 23 or 24 yellow draws.

 

n=150

E150Blue  = n*PBlue= 150*(3/21) ≈ 21.4

E150Green  = n*PGreen = 150*(10/21) ≈ 71.4

E150Red = n*PRed = 150*(3/21) ≈ 21.4

E150Yellow = n*PYellow = 150*(5/21) ≈ 35.7

In samples of 150 draws with replacement from the bowl, we expect approximately 21 or 22 blue draws, 71 or 72 green draws, 21 or 22 red draws, and 35 or 36 yellow draws.

 

n=200

E200Blue  = n*PBlue= 200*(3/21) ≈ 28.6

E200Green  = n*PGreen = 200*(10/21) ≈ 95.2

E200Red = n*PRed = 200*(3/21) ≈ 28.6

E200Yellow = n*PYellow = 200*(5/21) ≈ 47.6

In samples of 200 draws with replacement from the bowl, we expect approximately 28 or 29 blue draws, 95 or 96 green draws, 28 or 29 red draws, and 47 or 48 yellow draws.

 

n=250

E250Blue  = n*PBlue= 250*(3/21) ≈ 35.7

E250Green  = n*PGreen = 250*(10/21) ≈ 119

E250Red = n*PRed = 250*(3/21) ≈ 35.7

E250Yellow = n*PYellow = 250*(5/21) ≈ 59.5

In samples of 250 draws with replacement from the bowl, we expect approximately 35 or 36 blue draws, 119 green draws, 35 or 36 red draws, and 59 or 60 yellow draws.

 

n=300

E300Blue  = n*PBlue= 300*(3/21) ≈ 42.9

E300Green  = n*PGreen = 300*(10/21) ≈ 142.9

E300Red = n*PRed = 300*(3/21) ≈ 42.9

E300Yellow = n*PYellow = 300*(5/21) ≈ 71.4

In samples of 300 draws with replacement from the bowl, we expect approximately 42 or 43 blue draws, 142 or 143 green draws, 42 or 43 red draws, and 71 or 72 yellow draws.
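The perfect-sample calculations above all apply the same rule, expected count = n*P. A minimal Python sketch of that rule follows; the function name `perfect_sample` is an illustrative choice, and the bowl counts are those given in the bowl table.

```python
from fractions import Fraction

# Bowl counts from the bowl table.
BOWL = {"Blue": 3, "Green": 10, "Red": 3, "Yellow": 5}
TOTAL = sum(BOWL.values())

def perfect_sample(n):
    """Expected (perfect) count for each color in n draws with replacement."""
    return {color: n * Fraction(count, TOTAL) for color, count in BOWL.items()}

for n in (50, 100, 150, 200, 250, 300):
    expected = perfect_sample(n)
    row = "  ".join(f"{color}={float(e):.1f}" for color, e in expected.items())
    print(f"n={n:<3} {row}")
```

Because the probabilities sum to 1, the expected counts for any n sum to n exactly, even though the individual expected counts are usually not whole numbers.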

Samples

#1

Color    E50     n     p       %
Blue     7.14    7     0.14    14
Green    23.8    21    0.42    42
Red      7.14    4     0.08    8
Yellow   11.9    18    0.36    36
Total    50      50    1       100

#2

Color    E50     n     p       %
Blue     7.14    4     0.08    8
Green    23.8    22    0.44    44
Red      7.14    7     0.14    14
Yellow   11.9    17    0.34    34
Total    50      50    1       100

Pooled 12

Color    E100    n     p       %
Blue     14.29   11    0.11    11
Green    47.62   43    0.43    43
Red      14.29   11    0.11    11
Yellow   23.81   35    0.35    35
Total    100     100   1       100

#3

Color    E50     n     p       %
Blue     7.14    8     0.16    16
Green    23.8    23    0.46    46
Red      7.14    7     0.14    14
Yellow   11.9    12    0.24    24
Total    50      50    1       100

#4

Color    E50     n     p       %
Blue     7.14    6     0.12    12
Green    23.8    26    0.52    52
Red      7.14    7     0.14    14
Yellow   11.9    11    0.22    22
Total    50      50    1       100

Pooled 34

Color    E100    n     p       %
Blue     14.29   14    0.14    14
Green    47.62   49    0.49    49
Red      14.29   14    0.14    14
Yellow   23.81   23    0.23    23
Total    100     100   1       100

#5

Color    E50     n     p       %
Blue     7.14    2     0.04    4
Green    23.8    33    0.66    66
Red      7.14    4     0.08    8
Yellow   11.9    11    0.22    22
Total    50      50    1       100

#6

Color    E50     n     p       %
Blue     7.14    6     0.12    12
Green    23.8    33    0.66    66
Red      7.14    6     0.12    12
Yellow   11.9    5     0.1     10
Total    50      50    1       100

Pooled 56

Color    E100    n     p       %
Blue     14.29   8     0.08    8
Green    47.62   66    0.66    66
Red      14.29   10    0.1     10
Yellow   23.81   16    0.16    16
Total    100     100   1       100

Pooled 135

Color    E150    n     p       %
Blue     21.4    17    0.113   11.33
Green    71.4    77    0.513   51.33
Red      21.4    15    0.1     10
Yellow   35.7    41    0.273   27.33
Total    150     150   1       100

Pooled 246

Color    E150    n     p       %
Blue     21.4    16    0.107   10.67
Green    71.4    81    0.54    54
Red      21.4    20    0.133   13.33
Yellow   35.7    33    0.22    22
Total    150     150   1       100

Pooled All

Color    E300    n     p       %
Blue     42.86   33    0.11    11
Green    142.9   158   0.527   52.67
Red      42.86   35    0.117   11.67
Yellow   71.43   74    0.247   24.67
Total    300     300   1       100

The structure of the bowl, expressed as color proportions, determines the basic structure of samples drawn from the bowl. The perfect sample is a blueprint for the actual samples. The actual samples show a rough resemblance to the perfect sample, with samples #3, #4, pooled 34, pooled 135, pooled 246 and pooled all showing the best overall agreement. Probability models allow the prediction of sample behavior, but those predictions are only as reliable as the original model and the sampling procedures.
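Pooling amounts to adding the color counts of compatible samples and comparing the result with the perfect sample of the pooled size. The sketch below, in Python, uses the six observed samples of n=50 from the tables above; the helper names `pool` and `compare` are illustrative and not part of the course material.

```python
# Bowl counts and the six observed samples (n=50 each) from the tables above.
BOWL = {"Blue": 3, "Green": 10, "Red": 3, "Yellow": 5}
TOTAL = sum(BOWL.values())

SAMPLES = {
    1: {"Blue": 7, "Green": 21, "Red": 4, "Yellow": 18},
    2: {"Blue": 4, "Green": 22, "Red": 7, "Yellow": 17},
    3: {"Blue": 8, "Green": 23, "Red": 7, "Yellow": 12},
    4: {"Blue": 6, "Green": 26, "Red": 7, "Yellow": 11},
    5: {"Blue": 2, "Green": 33, "Red": 4, "Yellow": 11},
    6: {"Blue": 6, "Green": 33, "Red": 6, "Yellow": 5},
}

def pool(sample_ids):
    """Add the color counts of the listed samples."""
    pooled = {color: 0 for color in BOWL}
    for sample_id in sample_ids:
        for color, count in SAMPLES[sample_id].items():
            pooled[color] += count
    return pooled

def compare(sample_ids):
    """Print pooled counts next to the perfect sample of the same size."""
    pooled = pool(sample_ids)
    n = sum(pooled.values())
    print(f"Pooled {list(sample_ids)} (n={n})")
    for color, count in pooled.items():
        expected = n * BOWL[color] / TOTAL
        print(f"  {color:<7} E={expected:7.2f}  n={count:>3}  p={count / n:.3f}")

compare([1, 2])
compare([1, 3, 5])
compare(range(1, 7))
```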

The foundation of statistical applications is the careful preparation of a study population and the random sampling procedures to go with it. Proper execution of the sampling procedure ensures a usable sample.

You are now ready to learn the Long Run Argument and Perfect Sample case types in 1st Hourly Stuff.

Exclude Case Study 1.5

We now extend our study of probability to dice. We revisit the idea of a model or population proportion as a probability, and introduce the idea of a random variable.

Models

A Fair, Six-sided Die

Face Value, d6 (FV d6)   Probability
1                        1/6
2                        1/6
3                        1/6
4                        1/6
5                        1/6
6                        1/6

A Fair, Three-sided Die

Face Value, d3 (FV d3)   Probability
1                        1/3
2                        1/3
3                        1/3

Using a Fair, Six-sided Die to Simulate A Fair, Three-sided Die

Face Value, d6 (FV d6)   Mapped Face Value, d3 (FV d3)
1                        1
2                        1
3                        2
4                        2
5                        3
6                        3

Probability Calculations (fair d6 → fair d3)

Pr{E} denotes Probability for the event E.

The Fair d6 Model

FV: Face Values: 1,2,3,4,5,6

Fair Model: Equally likely face values – 1/6 per face value

 

Pr{d6 Shows 1} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “1”.

Pr{d6 Shows 2} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “2”.

Pr{d6 Shows 3} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “3”.

Pr{d6 Shows 4} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “4”.

Pr{d6 Shows 5} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “5”.

Pr{d6 Shows 6} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “6”.

 

The Fair d3 Model Nested within a Fair d6 Model

 

FV: Face Values: 1(1,2), 2(3,4), 3(5,6)

Fair Model: Equally likely face values, (2/6 =) 1/3 per face value.

Pr{d3 shows “1”} = Pr{d6 Shows 1} + Pr{d6 Shows 2} [1] = (1/6) + (1/6) = 2/6 = 1/3 ≈ .3333 or 33.33%

In long runs of tosses, approximately 1 toss in 3 shows “1”.

Pr{d3 shows “2”} = Pr{d6 Shows 3} + Pr{d6 Shows 4} = (1/6) + (1/6) [2] = 2/6 = 1/3 ≈ .3333 or 33.33%

In long runs of tosses, approximately 1 toss in 3 shows “2”.

Pr{d3 shows “3”} = Pr{d6 Shows 5} + Pr{d6 Shows 6} = (1/6) + (1/6) = 2/6 = 1/3 [3] ≈ .3333 or 33.33%

In long runs of tosses, approximately 1 toss in 3 shows “3”.

Notes:

[1] Additive Rule: Map Faces to Faces

[2] Inheritance of Fair Model

[3] Fair d3 Model from Fair d6 Model
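The three steps above (map faces, add the probabilities of the mapped faces, obtain the d3 model) can be sketched in Python. This is an illustration only: the function name `d6_to_d3` and the integer formula used for the mapping are assumptions chosen to reproduce the mapping table, not notation from the course.

```python
from fractions import Fraction

# Fair d6 model: each face value has probability 1/6.
D6_MODEL = {face: Fraction(1, 6) for face in range(1, 7)}

def d6_to_d3(face):
    """Map d6 faces (1,2) -> 1, (3,4) -> 2, (5,6) -> 3."""
    return (face + 1) // 2

# Additive rule: the probability of a d3 face is the sum of the
# probabilities of the d6 faces that map to it.
d3_model = {}
for face, p in D6_MODEL.items():
    d3_face = d6_to_d3(face)
    d3_model[d3_face] = d3_model.get(d3_face, Fraction(0)) + p

for d3_face, p in d3_model.items():
    print(f"Pr{{d3 shows {d3_face}}} = {p} approx {float(p):.4f}")
```

The nested d3 model is fair only because the d6 model is fair and each d3 face collects the same number of d6 faces.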

In the samples, compare the sample proportions (p) to the model probabilities (P) listed above.

Samples

#1

FV d6   n     p        FV d3   n     p
1       7     0.14
2       8     0.16     1       15    0.3
3       8     0.16
4       12    0.24     2       20    0.4
5       6     0.12
6       9     0.18     3       15    0.3
Total   50    1        Total   50    1

#2

FV d6   n     p        FV d3   n     p
1       8     0.16
2       7     0.14     1       15    0.3
3       10    0.2
4       10    0.2      2       20    0.4
5       8     0.16
6       7     0.14     3       15    0.3
Total   50    1        Total   50    1

#3

FV d6   n     p        FV d3   n     p
1       9     0.18
2       9     0.18     1       18    0.36
3       6     0.12
4       8     0.16     2       14    0.28
5       10    0.2
6       8     0.16     3       18    0.36
Total   50    1        Total   50    1

#4

FV d6   n     p        FV d3   n     p
1       5     0.1
2       8     0.16     1       13    0.26
3       11    0.22
4       8     0.16     2       19    0.38
5       7     0.14
6       11    0.22     3       18    0.36
Total   50    1        Total   50    1

#5

FV d6   n     p        FV d3   n     p
1       5     0.1
2       9     0.18     1       14    0.28
3       13    0.26
4       3     0.06     2       16    0.32
5       7     0.14
6       13    0.26     3       20    0.4
Total   50    1        Total   50    1

#6

FV d6   n     p        FV d3   n     p
1       10    0.2
2       9     0.18     1       19    0.38
3       7     0.14
4       8     0.16     2       15    0.3
5       10    0.2
6       6     0.12     3       16    0.32
Total   50    1        Total   50    1

Pooled 1,3,5

FV d6   n     p        FV d3   n     p
1       21    0.14
2       26    0.173    1       47    0.313
3       27    0.18
4       23    0.153    2       50    0.333
5       23    0.153
6       30    0.2      3       53    0.353
Total   150   1        Total   150   1

Pooled 2,4,6

FV d6   n     p        FV d3   n     p
1       23    0.153
2       24    0.16     1       47    0.313
3       28    0.187
4       26    0.173    2       54    0.36
5       25    0.167
6       24    0.16     3       49    0.327
Total   150   1        Total   150   1

 

Pooled 1,2

FV d6   n     p        FV d3   n     p
1       15    0.15
2       15    0.15     1       30    0.3
3       18    0.18
4       22    0.22     2       40    0.4
5       14    0.14
6       16    0.16     3       30    0.3
Total   100   1        Total   100   1

Pooled 3,4

FV d6   n     p        FV d3   n     p
1       14    0.14
2       17    0.17     1       31    0.31
3       17    0.17
4       16    0.16     2       33    0.33
5       17    0.17
6       19    0.19     3       36    0.36
Total   100   1        Total   100   1

Pooled 5,6

FV d6   n     p        FV d3   n     p
1       15    0.15
2       18    0.18     1       33    0.33
3       20    0.2
4       11    0.11     2       31    0.31
5       17    0.17
6       19    0.19     3       36    0.36
Total   100   1        Total   100   1

Pooled All

FV d6   n     p        FV d3   n     p
1       44    0.147
2       50    0.167    1       94    0.313
3       55    0.183
4       49    0.163    2       104   0.347
5       48    0.16
6       54    0.18     3       102   0.34
Total   300   1        Total   300   1
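The pooled dice tables can be reproduced by adding the d6 counts face by face and then collapsing pairs of d6 faces into d3 faces. The Python sketch below uses the six observed samples of n=50 tosses from the tables above; the helper names `pool_d6` and `to_d3` are illustrative and not part of the course material.

```python
# Observed d6 counts for faces 1..6 in each of the six samples (n=50 each).
D6_SAMPLES = {
    1: [7, 8, 8, 12, 6, 9],
    2: [8, 7, 10, 10, 8, 7],
    3: [9, 9, 6, 8, 10, 8],
    4: [5, 8, 11, 8, 7, 11],
    5: [5, 9, 13, 3, 7, 13],
    6: [10, 9, 7, 8, 10, 6],
}

def pool_d6(sample_ids):
    """Add d6 face counts across the listed samples."""
    pooled = [0] * 6
    for sample_id in sample_ids:
        for i, count in enumerate(D6_SAMPLES[sample_id]):
            pooled[i] += count
    return pooled

def to_d3(d6_counts):
    """Collapse d6 counts into d3 counts: faces (1,2), (3,4), (5,6)."""
    return [d6_counts[i] + d6_counts[i + 1] for i in (0, 2, 4)]

d6_all = pool_d6(range(1, 7))
d3_all = to_d3(d6_all)
n = sum(d6_all)
for face, count in enumerate(d6_all, start=1):
    print(f"d6 face {face}: n={count:>3}  p={count / n:.3f}  P={1 / 6:.3f}")
for face, count in enumerate(d3_all, start=1):
    print(f"d3 face {face}: n={count:>3}  p={count / n:.3f}  P={1 / 3:.3f}")
```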