Summaries

Session 1.2

24th August 2009

Predicting Sample Behavior from a Model

We use a population model to predict the behavior of random samples, and we check the predictions by direct inspection of samples. We sample repeatedly with replacement, obtaining multiple random samples from the same population by the same process. We combine (pool) compatible samples to form larger samples: pooling samples of size 50, we obtain samples of size 100, 150 and 300. In general, as sample size increases, samples become more precise and reliable, provided that the sampling process is reliable.

In general, if we are working with the correct model, then the predicted sample behavior reliably describes observed samples.
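The sampling and pooling process described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the classroom procedure: the bowl composition (8 blue, 3 green, 2 red, 7 yellow) is taken from Case Study 1.2.2, while the function names `draw_sample`, `pool` and `proportions` and the random seed are assumptions made for this example.

```python
import random
from collections import Counter

# Bowl composition from Case Study 1.2.2: 8 blue, 3 green, 2 red, 7 yellow.
BOWL = ["blue"] * 8 + ["green"] * 3 + ["red"] * 2 + ["yellow"] * 7
COLORS = ("blue", "green", "red", "yellow")

def draw_sample(n, rng):
    """Return color counts for n draws with replacement from the bowl."""
    return Counter(rng.choice(BOWL) for _ in range(n))

def pool(samples):
    """Combine compatible samples by adding their color counts."""
    total = Counter()
    for sample in samples:
        total.update(sample)
    return total

def proportions(counts):
    """Convert color counts to sample proportions."""
    n = sum(counts.values())
    return {color: counts[color] / n for color in COLORS}

rng = random.Random(2009)  # arbitrary seed, chosen only for repeatability
samples = [draw_sample(50, rng) for _ in range(6)]  # six samples of size 50
pooled_all = pool(samples)                          # one pooled sample of size 300

print("Sample 1 (n=50): ", proportions(samples[0]))
print("Pooled   (n=300):", proportions(pooled_all))
```

Running the sketch with different seeds typically shows the pooled proportions sitting closer to .40, .15, .10 and .35 than the proportions of any single sample of 50, although an individual run can depart from that pattern.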

Session Overview

Estimation: Previously, we saw how random samples drawn from a population could be used to estimate the structure of a population as an empirical model.

Prediction: In this session, we continue our study of probability. We begin with a very basic example of a population with known structure, and use that structure to predict the behavior of random samples from that population.

We begin by constructing a probability model for our population, and define the concept of a perfect sample. We then relate the perfect sample to random samples, both observed and unobserved. Finally, we obtain real random samples and check them against the perfect samples.

Exclude Case Study 1.2.1

Case Study 1.2.2

This case study introduces the idea of a perfect sample: a sample that matches perfectly the population from which it is drawn. On average, real samples correspond nicely, though not perfectly, with their perfect samples.

We begin by building a color bowl. We then compute a probability model for draws with replacement (DWR) from the bowl, and then compute perfect samples of size 50, 100, 150, 200, 250 and 300. We then engage six groups to generate six samples, each of n = 50 draws with replacement. Finally, we compare sample frequencies and proportions to the model and to the perfect samples.

Bowl Counts and Perfect Sample Calculations

Expected Count Blue (for Sample Size n) = n*PBlue

Expected Count Green (for Sample Size n) = n*PGreen

Expected Count Red (for Sample Size n) = n*PRed

Expected Count Yellow (for Sample Size n) = n*PYellow

Revise for Fall 2009

The Bowl

| Color | Count | Proportion | Percent |
|---|---|---|---|
| Blue | 8 | 8/20 = 0.40 | 100*.40 = 40 |
| Green | 3 | 3/20 = 0.15 | 100*.15 = 15 |
| Red | 2 | 2/20 = 0.10 | 100*.10 = 10 |
| Yellow | 7 | 7/20 = 0.35 | 100*.35 = 35 |
| Total | 20 | 20/20 = 1 | 100*1 = 100 |

 

Probabilities with Long Run Interpretation

 

PBlue = 8/20 = .40

In long runs of draws with replacement from the bowl, approximately 40% of draws show blue.

 

PGreen = 3/20 = .15

In long runs of draws with replacement from the bowl, approximately 15% of draws show green.

 

PRed = 2/20 = .10

In long runs of draws with replacement from the bowl, approximately 10% of draws show red.

 

PYellow = 7/20 = .35

In long runs of draws with replacement from the bowl, approximately 35% of draws show yellow.
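As a quick check on the arithmetic, the probability model can be computed directly from the bowl counts. The sketch below uses Python's `fractions.Fraction` to keep the proportions exact; the variable names `bowl_counts` and `model` are illustrative choices, not part of the course materials.

```python
from fractions import Fraction

# Bowl counts from the table above.
bowl_counts = {"Blue": 8, "Green": 3, "Red": 2, "Yellow": 7}
total = sum(bowl_counts.values())  # 20 items in the bowl

# Probability of each color on a single draw with replacement.
model = {color: Fraction(count, total) for color, count in bowl_counts.items()}

for color, p in model.items():
    count = bowl_counts[color]
    print(f"P{color} = {count}/{total} = {float(p):.2f}")
```

The four probabilities sum to exactly 1, which is a useful sanity check whenever a model is built from counts.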

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement


| Color | E50 | E100 | E150 | E200 | E250 | E300 |
|---|---|---|---|---|---|---|
| Blue | 50*.40 = 20 | 100*.40 = 40 | 150*.40 = 60 | 200*.40 = 80 | 250*.40 = 100 | 300*.40 = 120 |
| Green | 50*.15 = 7.5 | 100*.15 = 15 | 150*.15 = 22.5 | 200*.15 = 30 | 250*.15 = 37.5 | 300*.15 = 45 |
| Red | 50*.10 = 5 | 100*.10 = 10 | 150*.10 = 15 | 200*.10 = 20 | 250*.10 = 25 | 300*.10 = 30 |
| Yellow | 50*.35 = 17.5 | 100*.35 = 35 | 150*.35 = 52.5 | 200*.35 = 70 | 250*.35 = 87.5 | 300*.35 = 105 |
| Total | 50 | 100 | 150 | 200 | 250 | 300 |

 

Perfect Samples 

n=50

E50Blue = n*PBlue = 50*(8/20) = 20

E50Green = n*PGreen = 50*(3/20) = 7.5

E50Red = n*PRed = 50*(2/20) = 5

E50Yellow = n*PYellow = 50*(7/20) = 17.5

 

In samples of 50 draws with replacement from the bowl, we expect approximately 20 blue draws, 7 or 8 green draws, 5 red draws, and 17 or 18 yellow draws.

 

n=100

E100Blue = n*PBlue = 100*(8/20) = 40

E100Green = n*PGreen = 100*(3/20) = 15

E100Red = n*PRed = 100*(2/20) = 10

E100Yellow = n*PYellow = 100*(7/20) = 35

 

In samples of 100 draws with replacement from the bowl, we expect approximately 40 blue draws, 15 green draws, 10 red draws, and 35 yellow draws.

 

n=150

E150Blue = n*PBlue = 150*(8/20) = 60

E150Green = n*PGreen = 150*(3/20) = 22.5

E150Red = n*PRed = 150*(2/20) = 15

E150Yellow = n*PYellow = 150*(7/20) = 52.5

In samples of 150 draws with replacement from the bowl, we expect approximately 60 blue draws, 22 or 23 green draws, 15 red draws, and 52 or 53 yellow draws.

 

n=200

E200Blue = n*PBlue = 200*(8/20) = 80

E200Green = n*PGreen = 200*(3/20) = 30

E200Red = n*PRed = 200*(2/20) = 20

E200Yellow = n*PYellow = 200*(7/20) = 70

 

In samples of 200 draws with replacement from the bowl, we expect approximately 80 blue draws, 30 green draws, 20 red draws, and 70 yellow draws.

 

 

n=250

E250Blue = n*PBlue = 250*(8/20) = 100

E250Green = n*PGreen = 250*(3/20) = 37.5

E250Red = n*PRed = 250*(2/20) = 25

E250Yellow = n*PYellow = 250*(7/20) = 87.5

 

In samples of 250 draws with replacement from the bowl, we expect approximately 100 blue draws, 37 or 38 green draws, 25 red draws, and 87 or 88 yellow draws.

 

n=300

E300Blue = n*PBlue = 300*(8/20) = 120

E300Green = n*PGreen = 300*(3/20) = 45

E300Red = n*PRed = 300*(2/20) = 30

E300Yellow = n*PYellow = 300*(7/20) = 105

 

In samples of 300 draws with replacement from the bowl, we expect approximately 120 blue draws, 45 green draws, 30 red draws, and 105 yellow draws.
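The perfect-sample calculations above all follow the same rule, expected count = n times the color's probability, so they can be generated in one pass. The following is a minimal sketch under that rule; the function name `perfect_sample` is a label chosen for this example.

```python
from fractions import Fraction

# Probability model for the bowl (8 blue, 3 green, 2 red, 7 yellow out of 20).
model = {
    "Blue": Fraction(8, 20),
    "Green": Fraction(3, 20),
    "Red": Fraction(2, 20),
    "Yellow": Fraction(7, 20),
}

def perfect_sample(n):
    """Expected count of each color in n draws with replacement: n * P(color)."""
    return {color: float(n * p) for color, p in model.items()}

for n in (50, 100, 150, 200, 250, 300):
    counts = perfect_sample(n)
    row = ", ".join(f"{color} {count:g}" for color, count in counts.items())
    print(f"n={n}: {row}")
```

Fractional expected counts such as 7.5 or 52.5 cannot occur in a real sample; they are read as "7 or 8" and "52 or 53", as in the interpretations above.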

Samples – 6.30

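| Color | #1 E50 | #1 n | #1 p | #1 % | #2 E50 | #2 n | #2 p | #2 % | Pooled 12 E100 | Pooled 12 n | Pooled 12 p | Pooled 12 % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 18 | 0.36 | 36 | 20 | 22 | 0.44 | 44 | 40 | 40 | 0.40 | 40 |
| Green | 7.5 | 5 | 0.10 | 10 | 7.5 | 9 | 0.18 | 18 | 15 | 14 | 0.14 | 14 |
| Red | 5 | 11 | 0.22 | 22 | 5 | 2 | 0.04 | 4 | 10 | 13 | 0.13 | 13 |
| Yellow | 17.5 | 16 | 0.32 | 32 | 17.5 | 17 | 0.34 | 34 | 35 | 33 | 0.33 | 33 |
| Total | 50 | 50 | 1 | 100 | 50 | 50 | 1 | 100 | 100 | 100 | 1 | 100 |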

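| Color | #3 E50 | #3 n | #3 p | #3 % | #4 E50 | #4 n | #4 p | #4 % | Pooled 34 E100 | Pooled 34 n | Pooled 34 p | Pooled 34 % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 24 | 0.48 | 48 | 20 | 23 | 0.46 | 46 | 40 | 47 | 0.47 | 47 |
| Green | 7.5 | 9 | 0.18 | 18 | 7.5 | 8 | 0.16 | 16 | 15 | 17 | 0.17 | 17 |
| Red | 5 | 6 | 0.12 | 12 | 5 | 4 | 0.08 | 8 | 10 | 10 | 0.10 | 10 |
| Yellow | 17.5 | 11 | 0.22 | 22 | 17.5 | 15 | 0.30 | 30 | 35 | 26 | 0.26 | 26 |
| Total | 50 | 50 | 1 | 100 | 50 | 50 | 1 | 100 | 100 | 100 | 1 | 100 |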

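| Color | #5 E50 | #5 n | #5 p | #5 % | #6 E50 | #6 n | #6 p | #6 % | Pooled 56 E100 | Pooled 56 n | Pooled 56 p | Pooled 56 % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 22 | 0.44 | 44 | 20 | 19 | 0.38 | 38 | 40 | 41 | 0.41 | 41 |
| Green | 7.5 | 4 | 0.08 | 8 | 7.5 | 8 | 0.16 | 16 | 15 | 12 | 0.12 | 12 |
| Red | 5 | 6 | 0.12 | 12 | 5 | 3 | 0.06 | 6 | 10 | 9 | 0.09 | 9 |
| Yellow | 17.5 | 18 | 0.36 | 36 | 17.5 | 20 | 0.40 | 40 | 35 | 38 | 0.38 | 38 |
| Total | 50 | 50 | 1 | 100 | 50 | 50 | 1 | 100 | 100 | 100 | 1 | 100 |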

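| Color | Pooled 135 E150 | Pooled 135 n | Pooled 135 p | Pooled 135 % | Pooled 246 E150 | Pooled 246 n | Pooled 246 p | Pooled 246 % | Pooled All E300 | Pooled All n | Pooled All p | Pooled All % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 60 | 64 | 0.43 | 42.67 | 60 | 64 | 0.43 | 42.67 | 120 | 128 | 0.43 | 42.67 |
| Green | 22.5 | 18 | 0.12 | 12.00 | 22.5 | 25 | 0.17 | 16.67 | 45 | 43 | 0.14 | 14.33 |
| Red | 15 | 23 | 0.15 | 15.33 | 15 | 9 | 0.06 | 6.00 | 30 | 32 | 0.11 | 10.67 |
| Yellow | 52.5 | 45 | 0.30 | 30.00 | 52.5 | 52 | 0.35 | 34.67 | 105 | 97 | 0.32 | 32.33 |
| Total | 150 | 150 | 1 | 100 | 150 | 150 | 1 | 100 | 300 | 300 | 1 | 100 |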
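The pooled counts for the 6.30 session can be re-derived from the six individual samples by adding counts color by color. The sketch below does this with `collections.Counter`; the sample counts are copied from the tables above, while the names `samples_630`, `pool` and `percent` are assumptions made for this illustration.

```python
from collections import Counter

# Observed counts for the six samples of n=50 in the 6.30 session.
samples_630 = {
    1: Counter(Blue=18, Green=5, Red=11, Yellow=16),
    2: Counter(Blue=22, Green=9, Red=2, Yellow=17),
    3: Counter(Blue=24, Green=9, Red=6, Yellow=11),
    4: Counter(Blue=23, Green=8, Red=4, Yellow=15),
    5: Counter(Blue=22, Green=4, Red=6, Yellow=18),
    6: Counter(Blue=19, Green=8, Red=3, Yellow=20),
}

def pool(sample_ids):
    """Pool the listed samples by summing their color counts."""
    pooled = Counter()
    for sample_id in sample_ids:
        pooled.update(samples_630[sample_id])
    return pooled

def percent(counts):
    """Express color counts as percentages of the sample size, to 2 decimals."""
    n = sum(counts.values())
    return {color: round(100 * count / n, 2) for color, count in counts.items()}

pooled_135 = pool([1, 3, 5])           # n = 150
pooled_246 = pool([2, 4, 6])           # n = 150
pooled_all = pool(range(1, 7))         # n = 300

print("Pooled 135:", dict(pooled_135), percent(pooled_135))
print("Pooled 246:", dict(pooled_246), percent(pooled_246))
print("Pooled All:", dict(pooled_all), percent(pooled_all))
```

Pooling is only meaningful because all six samples come from the same bowl by the same draw-with-replacement process; the code adds counts, never proportions.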

Samples – 8.00

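| Color | #1 E50 | #1 n | #1 p | #1 % | #2 E50 | #2 n | #2 p | #2 % | Pooled 12 E100 | Pooled 12 n | Pooled 12 p | Pooled 12 % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 17 | 0.34 | 34 | 20 | 22 | 0.44 | 44 | 40 | 39 | 0.39 | 39 |
| Green | 7.5 | 8 | 0.16 | 16 | 7.5 | 8 | 0.16 | 16 | 15 | 16 | 0.16 | 16 |
| Red | 5 | 6 | 0.12 | 12 | 5 | 6 | 0.12 | 12 | 10 | 12 | 0.12 | 12 |
| Yellow | 17.5 | 19 | 0.38 | 38 | 17.5 | 14 | 0.28 | 28 | 35 | 33 | 0.33 | 33 |
| Total | 50 | 50 | 1 | 100 | 50 | 50 | 1 | 100 | 100 | 100 | 1 | 100 |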

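| Color | #3 E50 | #3 n | #3 p | #3 % | #4 E50 | #4 n | #4 p | #4 % | Pooled 34 E100 | Pooled 34 n | Pooled 34 p | Pooled 34 % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 17 | 0.34 | 34 | 20 | 17 | 0.34 | 34 | 40 | 34 | 0.34 | 34 |
| Green | 7.5 | 5 | 0.10 | 10 | 7.5 | 5 | 0.10 | 10 | 15 | 10 | 0.10 | 10 |
| Red | 5 | 6 | 0.12 | 12 | 5 | 7 | 0.14 | 14 | 10 | 13 | 0.13 | 13 |
| Yellow | 17.5 | 22 | 0.44 | 44 | 17.5 | 21 | 0.42 | 42 | 35 | 43 | 0.43 | 43 |
| Total | 50 | 50 | 1 | 100 | 50 | 50 | 1 | 100 | 100 | 100 | 1 | 100 |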

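| Color | #5 E50 | #5 n | #5 p | #5 % | #6 E50 | #6 n | #6 p | #6 % | Pooled 56 E100 | Pooled 56 n | Pooled 56 p | Pooled 56 % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 22 | 0.44 | 44 | 20 | 17 | 0.34 | 34 | 40 | 39 | 0.39 | 39 |
| Green | 7.5 | 7 | 0.14 | 14 | 7.5 | 4 | 0.08 | 8 | 15 | 11 | 0.11 | 11 |
| Red | 5 | 3 | 0.06 | 6 | 5 | 7 | 0.14 | 14 | 10 | 10 | 0.10 | 10 |
| Yellow | 17.5 | 18 | 0.36 | 36 | 17.5 | 22 | 0.44 | 44 | 35 | 40 | 0.40 | 40 |
| Total | 50 | 50 | 1 | 100 | 50 | 50 | 1 | 100 | 100 | 100 | 1 | 100 |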

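| Color | Pooled 135 E150 | Pooled 135 n | Pooled 135 p | Pooled 135 % | Pooled 246 E150 | Pooled 246 n | Pooled 246 p | Pooled 246 % | Pooled All E300 | Pooled All n | Pooled All p | Pooled All % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 60 | 56 | 0.37 | 37.33 | 60 | 56 | 0.37 | 37.33 | 120 | 112 | 0.37 | 37.33 |
| Green | 22.5 | 20 | 0.13 | 13.33 | 22.5 | 17 | 0.11 | 11.33 | 45 | 37 | 0.12 | 12.33 |
| Red | 15 | 15 | 0.10 | 10.00 | 15 | 20 | 0.13 | 13.33 | 30 | 35 | 0.12 | 11.67 |
| Yellow | 52.5 | 59 | 0.39 | 39.33 | 52.5 | 57 | 0.38 | 38.00 | 105 | 116 | 0.39 | 38.67 |
| Total | 150 | 150 | 1 | 100 | 150 | 150 | 1 | 100 | 300 | 300 | 1 | 100 |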

Pooled across Sessions (6.30 + 8.00)

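| Color | Pooled 300 (Samples 135x2) n | Pooled 300 (Samples 135x2) p | Pooled 300 (Samples 246x2) n | Pooled 300 (Samples 246x2) p | Pooled 600 (All 12 Samples) n | Pooled 600 (All 12 Samples) p |
|---|---|---|---|---|---|---|
| Blue | 120 | 0.4 (versus .40) | 120 | 0.4 (versus .40) | 240 | 0.4 (versus .40) |
| Green | 38 | 0.1267 (versus .15) | 42 | 0.14 (versus .15) | 80 | 0.133 (versus .15) |
| Red | 38 | 0.1267 (versus .10) | 29 | 0.097 (versus .10) | 67 | 0.112 (versus .10) |
| Yellow | 104 | 0.3467 (versus .35) | 109 | 0.363 (versus .35) | 213 | 0.355 (versus .35) |
| Total | 300 | 1 | 300 | 1 | 600 | 1 |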
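One simple way to read the "versus" comparisons is to compute, for each color, the difference between the observed pooled proportion and the model probability. The sketch below does this for the pooled sample of 600; the choice of a plain difference as the measure of agreement, and the names `model`, `pooled_600` and `deviation`, are assumptions for this illustration rather than a method prescribed by the course.

```python
# Model probabilities for the bowl and observed counts in the pooled sample of 600.
model = {"Blue": 0.40, "Green": 0.15, "Red": 0.10, "Yellow": 0.35}
pooled_600 = {"Blue": 240, "Green": 80, "Red": 67, "Yellow": 213}

n = sum(pooled_600.values())

# Observed proportion minus model probability, color by color.
deviation = {color: pooled_600[color] / n - model[color] for color in model}

for color, diff in deviation.items():
    observed = pooled_600[color] / n
    print(f"{color}: observed {observed:.3f}, model {model[color]:.2f}, difference {diff:+.3f}")

# Color whose observed proportion is farthest from the model.
worst = max(deviation, key=lambda color: abs(deviation[color]))
print("Largest gap:", worst)
```

In this pooled sample every color lies within about two percentage points of its model probability, with green showing the largest gap.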

 

The structure of the bowl, expressed as color proportions, determines the basic structure of samples drawn from the bowl. The perfect sample is a blueprint for the actual samples. The actual samples show a choppy resemblance to the perfect sample, with samples #3, #4, pooled 34, pooled 135, pooled 246 and pooled all showing the best overall agreement. Probability models allow the prediction of sample behavior, but those predictions are only as reliable as the original model and the sampling procedures behind them.

The foundation of statistical applications is the careful preparation of a study population and of the random sampling procedures to go with it. Proper execution of these sampling procedures ensures a usable sample.

You are now ready to learn the Long Run Argument and Perfect Sample case types in 1st Hourly Stuff.