Summaries

Session 1.2

26th January 2011

Predicting Sample Behavior from a Model

We use a population model to predict the behavior of random samples, and we check the predictions by direct inspection of samples. We sample repeatedly with replacement, obtaining multiple random samples from the same population by the same process. We combine (pool) compatible samples to form larger samples: pooling samples of size 50, we obtain samples of size 100, 150, 200 and 300. In general, as sample size increases, samples become more precise and reliable, provided that the sampling process is reliable.

In general, if we are working with the correct model, then the predicted sample behavior reliably describes observed samples.

Session Overview

Estimation: Previously, we saw how random samples drawn from a population could be used to estimate the structure of a population as an empirical model.

Prediction: In this session, we continue our study of probability. We begin with a very basic example of a population with known structure, and use that structure to predict the behavior of random samples from that population.

We begin by constructing a probability model for our population and defining the concept of a perfect sample. We then relate the perfect sample to random samples, both observed and unobserved. Finally, we obtain real random samples and check them against the perfect samples.

Exclude Case Study 1.2.1

Case Study 1.2.2

This case study introduces the idea of a perfect sample: a sample that perfectly matches the population from which it is drawn. On average, real samples corresponded nicely, though not perfectly, to their perfect samples.

We begin by building a color bowl. We then compute a probability model for draws with replacement (DWR) from the bowl, and from it compute perfect samples of size 50, 100, 150, 200, 250 and 300. Next, six groups each generate one sample of n = 50 DWR, giving six samples in all. Finally, we compare sample frequencies and proportions to the model and to the perfect samples.

Bowl Counts and Perfect Sample Calculations

Expected Count Blue (for Sample Size n) = n*PBlue

Expected Count Green (for Sample Size n) = n*PGreen

Expected Count Red (for Sample Size n) = n*PRed

Expected Count Yellow (for Sample Size n) = n*PYellow

6:30 Model

 

Color    N    P              E50                 E100                  E150               E200                  E250                  E300
Blue     2    2/15 = 0.133   (2/15)*50 = 6.67    (2/15)*100 = 13.33    (2/15)*150 = 20    (2/15)*200 = 26.67    (2/15)*250 = 33.33    (2/15)*300 = 40
Green    1    1/15 = 0.067   (1/15)*50 = 3.33    (1/15)*100 = 6.67     (1/15)*150 = 10    (1/15)*200 = 13.33    (1/15)*250 = 16.67    (1/15)*300 = 20
Red      6    6/15 = 0.4     (6/15)*50 = 20      (6/15)*100 = 40       (6/15)*150 = 60    (6/15)*200 = 80       (6/15)*250 = 100      (6/15)*300 = 120
Yellow   6    6/15 = 0.4     (6/15)*50 = 20      (6/15)*100 = 40       (6/15)*150 = 60    (6/15)*200 = 80       (6/15)*250 = 100      (6/15)*300 = 120
Total    15   1              50                  100                   150                200                   250                   300
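The expected counts in this table can be reproduced with a few lines of Python. This is only a minimal sketch of the n*P rule: the names bowl, sample_sizes and expected_counts are our own choices for illustration, and exact fractions are used so that values such as 20 and 40 come out without rounding noise.

```python
from fractions import Fraction

# Bowl composition for the 6:30 model (counts N from the table).
bowl = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}
sample_sizes = [50, 100, 150, 200, 250, 300]


def expected_counts(bowl, n):
    """Return the perfect-sample count n * P(color) for every color."""
    total = sum(bowl.values())
    return {color: n * Fraction(count, total) for color, count in bowl.items()}


for n in sample_sizes:
    counts = expected_counts(bowl, n)
    row = ", ".join(f"{color} {float(e):.2f}" for color, e in counts.items())
    print(f"n={n}: {row}")
```

Within each sample size the expected counts add up to n, which is a quick check on the arithmetic.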

Probabilities with Long Run Interpretation

 

Color    N    P
Blue     2    2/15 = 4/30 = 0.133
Green    1    1/15 = 2/30 = 0.067
Red      6    6/15 = 12/30 = 0.4
Yellow   6    6/15 = 12/30 = 0.4
Total    15   1

 

PBlue = 2/15 = .133 

In long runs of draws with replacement from the bowl, approximately 13.3% of draws show blue.

 

PGreen = 1/15 = .067

In long runs of draws with replacement from the bowl, approximately 6.7% of draws show green.

 

PRed = 6/15 = .40 

In long runs of draws with replacement from the bowl, approximately 40% of draws show red.

 

PYellow = 6/15 = .40 

In long runs of draws with replacement from the bowl, approximately 40% of draws show yellow.
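The long run interpretation can be illustrated by simulating draws with replacement from the 6:30 bowl. The sketch below is not part of the class procedure: it assumes Python's random.choices as a stand-in for physically drawing from the bowl, and the function name simulate_proportions and the seed are arbitrary choices of ours.

```python
import random

colors = ["Blue", "Green", "Red", "Yellow"]
counts = [2, 1, 6, 6]  # 6:30 bowl composition


def simulate_proportions(n_draws, seed=0):
    """Draw n_draws times with replacement and return the observed proportions."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    draws = rng.choices(colors, weights=counts, k=n_draws)
    return {color: draws.count(color) / n_draws for color in colors}


for n in (50, 300, 100_000):
    observed = simulate_proportions(n)
    print(n, {color: round(p, 3) for color, p in observed.items()})
```

With a very long run, the printed proportions should sit close to 0.133, 0.067, 0.4 and 0.4, while short runs such as n = 50 can wander noticeably.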

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement

Color    N    P              E50                 E100                  E150               E200                  E250                  E300
Blue     2    2/15 = 0.133   (2/15)*50 = 6.67    (2/15)*100 = 13.33    (2/15)*150 = 20    (2/15)*200 = 26.67    (2/15)*250 = 33.33    (2/15)*300 = 40
Green    1    1/15 = 0.067   (1/15)*50 = 3.33    (1/15)*100 = 6.67     (1/15)*150 = 10    (1/15)*200 = 13.33    (1/15)*250 = 16.67    (1/15)*300 = 20
Red      6    6/15 = 0.4     (6/15)*50 = 20      (6/15)*100 = 40       (6/15)*150 = 60    (6/15)*200 = 80       (6/15)*250 = 100      (6/15)*300 = 120
Yellow   6    6/15 = 0.4     (6/15)*50 = 20      (6/15)*100 = 40       (6/15)*150 = 60    (6/15)*200 = 80       (6/15)*250 = 100      (6/15)*300 = 120
Total    15   1              50                  100                   150                200                   250                   300

Perfect Samples 

In samples of 50 draws with replacement from the bowl, we expect approximately 6 or 7 blue draws, 3 or 4 green draws, 20 red draws, and 20 yellow draws.

In samples of 100 draws with replacement from the bowl, we expect approximately 13 or 14 blue draws, 6 or 7 green draws, 40 red draws, and 40 yellow draws.

In samples of 150 draws with replacement from the bowl, we expect approximately 20 blue draws, 10 green draws, 60 red draws, and 60 yellow draws.

In samples of 200 draws with replacement from the bowl, we expect approximately 26 or 27 blue draws, 13 or 14 green draws, 80 red draws, and 80 yellow draws.

In samples of 250 draws with replacement from the bowl, we expect approximately 33 or 34 blue draws, 16 or 17 green draws, 100 red draws, and 100 yellow draws.

In samples of 300 draws with replacement from the bowl, we expect approximately 40 blue draws, 20 green draws, 120 red draws, and 120 yellow draws.
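The "6 or 7" style of statement comes from bracketing a fractional expected count between the whole numbers just below and just above it. A minimal sketch of that rounding step follows, assuming the 6:30 bowl counts; the helper name perfect_sample_statement is hypothetical.

```python
import math
from fractions import Fraction

bowl = {"Blue": 2, "Green": 1, "Red": 6, "Yellow": 6}  # 6:30 bowl composition


def perfect_sample_statement(bowl, n):
    """Describe the perfect sample of size n in whole draws."""
    total = sum(bowl.values())
    parts = []
    for color, count in bowl.items():
        expected = Fraction(n * count, total)  # exact value of n * P
        low, high = math.floor(expected), math.ceil(expected)
        # A whole-number expectation needs no bracketing.
        text = f"{low}" if low == high else f"{low} or {high}"
        parts.append(f"{text} {color.lower()} draws")
    return f"n={n}: " + ", ".join(parts)


for n in (50, 100, 150, 200, 250, 300):
    print(perfect_sample_statement(bowl, n))
```

For n = 50 this gives 6 or 7 blue draws, 3 or 4 green draws, 20 red draws and 20 yellow draws.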

 

Samples – 6:30

 

         Sample #1              Sample #2              Pooled 12
Color    n     p       E50      n     p       E50      n     p       E100
Blue     6     0.12    6.67     1     0.02    6.67     7     0.07    13.33
Green    7     0.14    3.33     4     0.08    3.33     11    0.11    6.67
Red      20    0.40    20.00    20    0.40    20.00    40    0.40    40.00
Yellow   17    0.34    20.00    25    0.50    20.00    42    0.42    40.00
Total    50    1       50       50    1       50       100   1       100

         Sample #3              Sample #4              Pooled 34
Color    n     p       E50      n     p       E50      n     p       E100
Blue     9     0.18    6.67     8     0.16    6.67     17    0.17    13.33
Green    1     0.02    3.33     2     0.04    3.33     3     0.03    6.67
Red      19    0.38    20.00    16    0.32    20.00    35    0.35    40.00
Yellow   21    0.42    20.00    24    0.48    20.00    45    0.45    40.00
Total    50    1       50       50    1       50       100   1       100

         Sample #5              Sample #6              Pooled 56
Color    n     p       E50      n     p       E50      n     p       E100
Blue     14    0.28    6.67     7     0.14    6.67     21    0.21    13.33
Green    1     0.02    3.33     4     0.08    3.33     5     0.05    6.67
Red      19    0.38    20.00    23    0.46    20.00    42    0.42    40.00
Yellow   16    0.32    20.00    16    0.32    20.00    32    0.32    40.00
Total    50    1       50       50    1       50       100   1       100

         Pooled 135             Pooled 246             Pooled All
Color    n     p       E150     n     p       E150     n     p       E300
Blue     29    0.193   20       16    0.107   20       45    0.150   40
Green    9     0.060   10       10    0.067   10       19    0.063   20
Red      58    0.387   60       59    0.393   60       117   0.390   120
Yellow   54    0.360   60       65    0.433   60       119   0.397   120
Total    150   1       150      150   1       150      300   1       300

 

         Pooled 1234            Pooled 3456
Color    n     p       E200     n     p       E200
Blue     24    0.120   26.67    38    0.190   26.67
Green    14    0.070   13.33    8     0.040   13.33
Red      75    0.375   80.00    77    0.385   80.00
Yellow   87    0.435   80.00    77    0.385   80.00
Total    200   1       200      200   1       200
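Pooling is simply the addition of color counts across compatible samples, followed by recomputing the proportions on the larger total. The sketch below uses the six observed 6:30 samples; the names samples, pool and proportions are our own, and collections.Counter is just one convenient way to add the counts.

```python
from collections import Counter

# Observed counts for the six 6:30 samples (n = 50 each).
samples = {
    1: Counter(Blue=6, Green=7, Red=20, Yellow=17),
    2: Counter(Blue=1, Green=4, Red=20, Yellow=25),
    3: Counter(Blue=9, Green=1, Red=19, Yellow=21),
    4: Counter(Blue=8, Green=2, Red=16, Yellow=24),
    5: Counter(Blue=14, Green=1, Red=19, Yellow=16),
    6: Counter(Blue=7, Green=4, Red=23, Yellow=16),
}


def pool(sample_ids):
    """Add the color counts of the chosen samples."""
    pooled = Counter()
    for i in sample_ids:
        pooled.update(samples[i])  # update() adds counts and keeps zeros
    return pooled


def proportions(counts):
    """Convert counts to sample proportions."""
    n = sum(counts.values())
    return {color: count / n for color, count in counts.items()}


for label, ids in [("Pooled 12", [1, 2]), ("Pooled 135", [1, 3, 5]),
                   ("Pooled 1234", [1, 2, 3, 4]), ("Pooled All", range(1, 7))]:
    pooled = pool(ids)
    print(label, dict(pooled), proportions(pooled))
```

The pooled counts printed here can be checked line by line against the pooled tables.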

 

8:00 Model

Color    N    P                 E50                   E100     E150     E200     E250     E300
Blue     8    8/28 = 0.2857     50*(8/28) = 14.29     28.57    42.86    57.14    71.43    85.71
Green    10   10/28 = 0.3571    50*(10/28) = 17.86    35.71    53.57    71.43    89.29    107.14
Red      8    8/28 = 0.2857     50*(8/28) = 14.29     28.57    42.86    57.14    71.43    85.71
Yellow   2    2/28 = 0.0714     50*(2/28) = 3.57      7.14     10.71    14.29    17.86    21.43
Total    28   1                 50                    100      150      200      250      300

Probabilities with Long Run Interpretation

 

PBlue = 8/28 = 0.2857 

In long runs of draws with replacement from the bowl, approximately 28.57% of draws show blue.

 

PGreen = 10/28 = 0.3571 

In long runs of draws with replacement from the bowl, approximately 35.71% of draws show green.

 

PRed = 8/28 = 0.2857 

In long runs of draws with replacement from the bowl, approximately 28.57% of draws show red.

 

PYellow = 2/28 = 0.0714

In long runs of draws with replacement from the bowl, approximately 7.14% of draws show yellow.

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement

Color    N    P                 E50                   E100                  E150                  E200                  E250                  E300
Blue     8    8/28 = 0.2857     50*(8/28) = 14.29     100*(8/28) = 28.57    150*(8/28) = 42.86    200*(8/28) = 57.14    250*(8/28) = 71.43    300*(8/28) = 85.71
Green    10   10/28 = 0.3571    50*(10/28) = 17.86    35.71                 53.57                 71.43                 89.29                 107.14
Red      8    8/28 = 0.2857     50*(8/28) = 14.29     28.57                 42.86                 57.14                 71.43                 85.71
Yellow   2    2/28 = 0.0714     50*(2/28) = 3.57      7.14                  10.71                 14.29                 17.86                 21.43
Total    28   1                 50                    100                   150                   200                   250                   300

 

Perfect Samples 

In samples of 50 draws with replacement from the bowl, we expect approximately 14 or 15 blue draws, 17 or 18 green draws, 14 or 15 red draws, and 3 or 4 yellow draws.

In samples of 100 draws with replacement from the bowl, we expect approximately 28 or 29 blue draws, 35 or 36 green draws, 28 or 29 red draws, and 7 or 8 yellow draws.

In samples of 150 draws with replacement from the bowl, we expect approximately 42 or 43 blue draws, 53 or 54 green draws, 42 or 43 red draws, and 10 or 11 yellow draws.

In samples of 200 draws with replacement from the bowl, we expect approximately 57 or 58 blue draws, 71 or 72 green draws, 57 or 58 red draws, and 14 or 15 yellow draws.

In samples of 250 draws with replacement from the bowl, we expect approximately 71 or 72 blue draws, 89 or 90 green draws, 71 or 72 red draws, and 17 or 18 yellow draws.

In samples of 300 draws with replacement from the bowl, we expect approximately 85 or 86 blue draws, 107 or 108 green draws, 85 or 86 red draws, and 21 or 22 yellow draws.

Samples – 8:00

         Sample #1              Sample #2              Pooled 12
Color    n     p       E50      n     p       E50      n     p       E100
Blue     13    0.26    14.29    11    0.22    14.29    24    0.24    28.57
Green    26    0.52    17.86    17    0.34    17.86    43    0.43    35.71
Red      8     0.16    14.29    17    0.34    14.29    25    0.25    28.57
Yellow   3     0.06    3.57     5     0.10    3.57     8     0.08    7.14
Total    50    1       50       50    1       50       100   1       100

         Sample #3              Sample #4              Pooled 34
Color    n     p       E50      n     p       E50      n     p       E100
Blue     8     0.16    14.29    16    0.32    14.29    24    0.24    28.57
Green    20    0.40    17.86    19    0.38    17.86    39    0.39    35.71
Red      22    0.44    14.29    13    0.26    14.29    35    0.35    28.57
Yellow   0     0.00    3.57     2     0.04    3.57     2     0.02    7.14
Total    50    1       50       50    1       50       100   1       100

         Sample #5              Sample #6              Pooled 56
Color    n     p       E50      n     p       E50      n     p       E100
Blue     14    0.28    14.29    10    0.20    14.29    24    0.24    28.57
Green    16    0.32    17.86    26    0.52    17.86    42    0.42    35.71
Red      15    0.30    14.29    11    0.22    14.29    26    0.26    28.57
Yellow   5     0.10    3.57     3     0.06    3.57     8     0.08    7.14
Total    50    1       50       50    1       50       100   1       100

         Pooled 135             Pooled 246             Pooled All
Color    n     p       E150     n     p       E150     n     p       E300
Blue     35    0.233   42.86    37    0.247   42.86    72    0.240   85.71
Green    62    0.413   53.57    62    0.413   53.57    124   0.413   107.14
Red      45    0.300   42.86    41    0.273   42.86    86    0.287   85.71
Yellow   8     0.053   10.71    10    0.067   10.71    18    0.060   21.43
Total    150   1       150      150   1       150      300   1       300

 

         Pooled 1234            Pooled 3456
Color    n     p       E200     n     p       E200
Blue     48    0.240   57.14    48    0.240   57.14
Green    82    0.410   71.43    81    0.405   71.43
Red      60    0.300   57.14    61    0.305   57.14
Yellow   10    0.050   14.29    10    0.050   14.29
Total    200   1       200      200   1       200

 

The structure of the bowl, expressed as color proportions, determines the basic structure of samples drawn from the bowl. Probability models allow the prediction of sample behavior, but said predictions are only as reliable as the validity of the original model and of the sampling procedures.

In both models, the results were choppy – the sample sizes were insufficient to fully stabilize the sample frequencies. As a result, in most samples, one or two colors were appreciably off. Despite the volatility seen in the samples, we did see improvements with increasing sample size.
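One rough way to see the effect of sample size is to measure, for each sample, the largest gap between an observed color proportion and the model probability. The sketch below does this for a few of the 8:00 samples. The largest-gap yardstick and the name max_abs_error are assumptions of ours, not part of the session's method, and this is not a formal test.

```python
# 8:00 bowl composition and a selection of observed 8:00 samples.
model = {"Blue": 8, "Green": 10, "Red": 8, "Yellow": 2}
observed = {
    "Sample #1 (n=50)": {"Blue": 13, "Green": 26, "Red": 8, "Yellow": 3},
    "Pooled 12 (n=100)": {"Blue": 24, "Green": 43, "Red": 25, "Yellow": 8},
    "Pooled 135 (n=150)": {"Blue": 35, "Green": 62, "Red": 45, "Yellow": 8},
    "Pooled 1234 (n=200)": {"Blue": 48, "Green": 82, "Red": 60, "Yellow": 10},
    "Pooled All (n=300)": {"Blue": 72, "Green": 124, "Red": 86, "Yellow": 18},
}


def max_abs_error(counts):
    """Largest absolute gap between a sample proportion and its model probability."""
    n = sum(counts.values())
    total = sum(model.values())
    return max(abs(counts[c] / n - model[c] / total) for c in model)


for label, counts in observed.items():
    print(f"{label}: largest gap = {max_abs_error(counts):.4f}")
```

For these samples the largest gap falls sharply from the single sample of 50 to the pooled samples, but it does not shrink at every step, which fits the choppy behavior described above.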

The foundation of statistical applications is the careful preparation of a study population and of the random sampling procedures to go with it. Proper execution of the sampling procedure ensures a usable sample.

You are now ready to learn the Long Run Argument and Perfect Sample case types in 1st Hourly Stuff.