Summaries

Session 1.2

20th January 2010

Predicting Sample Behavior from a Model

We use a population model to predict the behavior of random samples, and we check the predictions by direct inspection of samples. We sample repeatedly with replacement, obtaining multiple random samples from the same population by the same process. We combine (pool) compatible samples to form larger samples: pooling samples of size 50, we obtain samples of size 100, 150 and 300. In general, as sample size increases, samples become more precise and reliable, provided that the sampling process is reliable.

In general, if we are working with the correct model, then the predicted sample behavior reliably describes observed samples.

Session Overview

Estimation: Previously, we saw how random samples drawn from a population could be used to estimate the structure of a population as an empirical model.

Prediction: In this session, we continue our study of probability. We begin with a very basic example of a population with known structure, and use that structure to predict the behavior of random samples from that population.

We begin by constructing a probability model for our population and defining the concept of a perfect sample. We then relate the perfect sample to random samples, both observed and unobserved. Finally, we obtain real random samples and check them against the perfect samples.

Exclude Case Study 1.2.1

Case Study 1.2.2

In this case study, the idea of a perfect sample is introduced: a perfect sample perfectly matches the population from which it is sampled. On average, real samples correspond closely, though not perfectly, to their perfect samples.

We begin by building a color bowl. We then compute a probability model for draws with replacement (DWR) from the bowl, and then perfect samples of size 50, 100, 150, 200, 250 and 300. We then engage six groups to generate six samples, each of n=50 DWR. We then compare sample frequencies and proportions to the model and to the perfect samples.

Bowl Counts and Perfect Sample Calculations

Expected Count Blue (for Sample Size n) = n*PBlue

Expected Count Green (for Sample Size n) = n*PGreen

Expected Count Red (for Sample Size n) = n*PRed

Expected Count Yellow (for Sample Size n) = n*PYellow

Revise for Spring 2010

The Bowl

| Color | Count | Proportion | Percent |
|---|---|---|---|
| Blue | 8 | 8/20 = 0.4 | 100*.40 = 40 |
| Green | 3 | 3/20 = 0.15 | 100*.15 = 15 |
| Red | 2 | 2/20 = 0.1 | 100*.10 = 10 |
| Yellow | 7 | 7/20 = 0.35 | 100*.35 = 35 |
| Total | 20 | 20/20 = 1 | 100*1 = 100 |

 

Probabilities with Long Run Interpretation

 

PBlue = 8/20 = .40

In long runs of draws with replacement from the bowl, approximately 40% of draws show blue.

 

PGreen = 3/20 = .15

In long runs of draws with replacement from the bowl, approximately 15% of draws show green.

 

PRed = 2/20 = .10

In long runs of draws with replacement from the bowl, approximately 10% of draws show red.

 

PYellow = 7/20 = .35

In long runs of draws with replacement from the bowl, approximately 35% of draws show yellow.
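The long run interpretation can be checked by simulation. The sketch below is an illustration rather than part of the original case study: it assumes Python's standard `random` module as a stand-in for physically drawing from the bowl, and the run length of 100,000 draws and the seed are arbitrary choices.

```python
import random
from collections import Counter

# Bowl composition from the table above: 20 objects in total.
BOWL = {"Blue": 8, "Green": 3, "Red": 2, "Yellow": 7}


def draw_with_replacement(n, rng):
    """Return the color counts for n draws with replacement from the bowl."""
    colors = list(BOWL)
    weights = list(BOWL.values())
    return Counter(rng.choices(colors, weights=weights, k=n))


rng = random.Random(2010)  # fixed seed (arbitrary) so the run is repeatable
n_draws = 100_000          # a "long run"; the length is an assumption
counts = draw_with_replacement(n_draws, rng)
proportions = {color: counts[color] / n_draws for color in BOWL}

for color, count in BOWL.items():
    model = count / sum(BOWL.values())
    print(f"{color:6s} model={model:.2f} simulated={proportions[color]:.3f}")
```

In a run of this length, each simulated proportion should land close to its model probability, though not exactly on it.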

Perfect Counts for the Bowl – n = 50, 100, 150, 200, 250 and 300 Draws with Replacement

| Color | E50 | E100 | E150 | E200 | E250 | E300 |
|---|---|---|---|---|---|---|
| Blue | 50*.40 = 20 | 100*.40 = 40 | 150*.40 = 60 | 200*.40 = 80 | 250*.40 = 100 | 300*.40 = 120 |
| Green | 50*.15 = 7.5 | 100*.15 = 15 | 150*.15 = 22.5 | 200*.15 = 30 | 250*.15 = 37.5 | 300*.15 = 45 |
| Red | 50*.10 = 5 | 100*.10 = 10 | 150*.10 = 15 | 200*.10 = 20 | 250*.10 = 25 | 300*.10 = 30 |
| Yellow | 50*.35 = 17.5 | 100*.35 = 35 | 150*.35 = 52.5 | 200*.35 = 70 | 250*.35 = 87.5 | 300*.35 = 105 |
| Total | 50 | 100 | 150 | 200 | 250 | 300 |

 

Perfect Samples 

n=50

E50Blue = n*PBlue = 50*(8/20) = 20

E50Green = n*PGreen = 50*(3/20) = 7.5

E50Red = n*PRed = 50*(2/20) = 5

E50Yellow = n*PYellow = 50*(7/20) = 17.5

 

In samples of 50 draws with replacement from the bowl, we expect approximately 20 blue draws, 7 or 8 green draws, 5 red draws, and 17 or 18 yellow draws.

 

n=100

E100Blue = n*PBlue = 100*(8/20) = 40

E100Green = n*PGreen = 100*(3/20) = 15

E100Red = n*PRed = 100*(2/20) = 10

E100Yellow = n*PYellow = 100*(7/20) = 35

 

In samples of 100 draws with replacement from the bowl, we expect approximately 40 blue draws, 15 green draws, 10 red draws, and 35 yellow draws.

 

n=150

E150Blue = n*PBlue = 150*(8/20) = 60

E150Green = n*PGreen = 150*(3/20) = 22.5

E150Red = n*PRed = 150*(2/20) = 15

E150Yellow = n*PYellow = 150*(7/20) = 52.5

In samples of 150 draws with replacement from the bowl, we expect approximately 60 blue draws, 22 or 23 green draws, 15 red draws, and 52 or 53 yellow draws.

 

n=200

E200Blue = n*PBlue = 200*(8/20) = 80

E200Green = n*PGreen = 200*(3/20) = 30

E200Red = n*PRed = 200*(2/20) = 20

E200Yellow = n*PYellow = 200*(7/20) = 70

 

In samples of 200 draws with replacement from the bowl, we expect approximately 80 blue draws, 30 green draws, 20 red draws, and 70 yellow draws.

 

 

n=250

E250Blue = n*PBlue = 250*(8/20) = 100

E250Green = n*PGreen = 250*(3/20) = 37.5

E250Red = n*PRed = 250*(2/20) = 25

E250Yellow = n*PYellow = 250*(7/20) = 87.5

 

In samples of 250 draws with replacement from the bowl, we expect approximately 100 blue draws, 37 or 38 green draws, 25 red draws, and 87 or 88 yellow draws.

 

n=300

E300Blue = n*PBlue = 300*(8/20) = 120

E300Green = n*PGreen = 300*(3/20) = 45

E300Red = n*PRed = 300*(2/20) = 30

E300Yellow = n*PYellow = 300*(7/20) = 105

 

In samples of 300 draws with replacement from the bowl, we expect approximately 120 blue draws, 45 green draws, 30 red draws, and 105 yellow draws.
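The perfect sample calculations follow one rule, expected count = n*P(color), so they can be generated mechanically. The sketch below is a minimal illustration: the function name `perfect_sample` is hypothetical, and exact fractions are used only so that values such as 7.5 carry no rounding error.

```python
from fractions import Fraction

# Bowl composition from the case study: 8 blue, 3 green, 2 red, 7 yellow.
BOWL = {"Blue": 8, "Green": 3, "Red": 2, "Yellow": 7}
TOTAL = sum(BOWL.values())  # 20 objects in the bowl


def perfect_sample(n):
    """Expected count of each color in n draws with replacement: n * P(color)."""
    return {color: n * Fraction(count, TOTAL) for color, count in BOWL.items()}


for n in (50, 100, 150, 200, 250, 300):
    expected = perfect_sample(n)
    row = ", ".join(f"{color}={float(e):g}" for color, e in expected.items())
    print(f"n={n}: {row}")
```

For any n, the four expected counts add up to n, because the four probabilities add up to 1.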

Samples – 6:30

| Color | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled 12 Count | Pooled 12 Proportion | Pooled 12 Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 22 | 0.44 | 44 | 24 | 0.48 | 48 | 46 | 0.46 | 46 | 40 |
| Green | 10 | 0.2 | 20 | 9 | 0.18 | 18 | 19 | 0.19 | 19 | 15 |
| Red | 3 | 0.06 | 6 | 6 | 0.12 | 12 | 9 | 0.09 | 9 | 10 |
| Yellow | 15 | 0.3 | 30 | 11 | 0.22 | 22 | 26 | 0.26 | 26 | 35 |
| Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100 |

| Color | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled 34 Count | Pooled 34 Proportion | Pooled 34 Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 23 | 0.46 | 46 | 18 | 0.36 | 36 | 41 | 0.41 | 41 | 40 |
| Green | 7 | 0.14 | 14 | 9 | 0.18 | 18 | 16 | 0.16 | 16 | 15 |
| Red | 2 | 0.04 | 4 | 3 | 0.06 | 6 | 5 | 0.05 | 5 | 10 |
| Yellow | 18 | 0.36 | 36 | 20 | 0.4 | 40 | 38 | 0.38 | 38 | 35 |
| Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100 |

| Color | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled 56 Count | Pooled 56 Proportion | Pooled 56 Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 22 | 0.44 | 44 | 18 | 0.36 | 36 | 40 | 0.4 | 40 | 40 |
| Green | 6 | 0.12 | 12 | 9 | 0.18 | 18 | 15 | 0.15 | 15 | 15 |
| Red | 5 | 0.1 | 10 | 4 | 0.08 | 8 | 9 | 0.09 | 9 | 10 |
| Yellow | 17 | 0.34 | 34 | 19 | 0.38 | 38 | 36 | 0.36 | 36 | 35 |
| Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100 |

| Color | Pooled 135 Count | Pooled 135 Proportion | Pooled 135 Percent | Pooled 246 Count | Pooled 246 Proportion | Pooled 246 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 67 | 0.4466667 | 44.6667 | 60 | 0.4 | 40 | 127 | 0.4233333 | 42.3333 | 40 |
| Green | 23 | 0.1533333 | 15.3333 | 27 | 0.18 | 18 | 50 | 0.1666667 | 16.6667 | 15 |
| Red | 10 | 0.0666667 | 6.66667 | 13 | 0.086667 | 8.6667 | 23 | 0.0766667 | 7.66667 | 10 |
| Yellow | 50 | 0.3333333 | 33.3333 | 50 | 0.333333 | 33.333 | 100 | 0.3333333 | 33.3333 | 35 |
| Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100 | 100 |
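Pooling is simply adding the color counts of compatible samples and recomputing the proportions on the larger total. The sketch below illustrates this with the six 6:30 samples; the helper names `pool` and `proportions` are hypothetical, and the counts are copied from the 6:30 sample tables.

```python
from collections import Counter

# Counts from the six 6:30 samples, each of n = 50 draws with replacement.
samples = {
    1: {"Blue": 22, "Green": 10, "Red": 3, "Yellow": 15},
    2: {"Blue": 24, "Green": 9, "Red": 6, "Yellow": 11},
    3: {"Blue": 23, "Green": 7, "Red": 2, "Yellow": 18},
    4: {"Blue": 18, "Green": 9, "Red": 3, "Yellow": 20},
    5: {"Blue": 22, "Green": 6, "Red": 5, "Yellow": 17},
    6: {"Blue": 18, "Green": 9, "Red": 4, "Yellow": 19},
}


def pool(sample_ids):
    """Add up the color counts of the listed samples."""
    total = Counter()
    for sample_id in sample_ids:
        total.update(samples[sample_id])  # Counter.update adds counts
    return total


def proportions(counts):
    """Convert color counts to sample proportions."""
    n = sum(counts.values())
    return {color: count / n for color, count in counts.items()}


pooled_135 = pool([1, 3, 5])      # n = 150
pooled_246 = pool([2, 4, 6])      # n = 150
pooled_all = pool(range(1, 7))    # n = 300

for label, counts in (("135", pooled_135), ("246", pooled_246), ("All", pooled_all)):
    print(label, dict(counts), proportions(counts))
```

Pooling is only meaningful when the samples come from the same population by the same process, as they do here.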

Samples – 8:00

| Color | #1 Count | #1 Proportion | #1 Percent | #2 Count | #2 Proportion | #2 Percent | Pooled 12 Count | Pooled 12 Proportion | Pooled 12 Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 22 | 0.44 | 44 | 26 | 0.52 | 52 | 48 | 0.48 | 48 | 40 |
| Green | 10 | 0.2 | 20 | 3 | 0.06 | 6 | 13 | 0.13 | 13 | 15 |
| Red | 6 | 0.12 | 12 | 2 | 0.04 | 4 | 8 | 0.08 | 8 | 10 |
| Yellow | 12 | 0.24 | 24 | 19 | 0.38 | 38 | 31 | 0.31 | 31 | 35 |
| Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100 |

| Color | #3 Count | #3 Proportion | #3 Percent | #4 Count | #4 Proportion | #4 Percent | Pooled 34 Count | Pooled 34 Proportion | Pooled 34 Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 20 | 0.4 | 40 | 21 | 0.42 | 42 | 41 | 0.41 | 41 | 40 |
| Green | 8 | 0.16 | 16 | 6 | 0.12 | 12 | 14 | 0.14 | 14 | 15 |
| Red | 3 | 0.06 | 6 | 3 | 0.06 | 6 | 6 | 0.06 | 6 | 10 |
| Yellow | 19 | 0.38 | 38 | 20 | 0.4 | 40 | 39 | 0.39 | 39 | 35 |
| Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100 |

| Color | #5 Count | #5 Proportion | #5 Percent | #6 Count | #6 Proportion | #6 Percent | Pooled 56 Count | Pooled 56 Proportion | Pooled 56 Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 15 | 0.3 | 30 | 23 | 0.46 | 46 | 38 | 0.38 | 38 | 40 |
| Green | 11 | 0.22 | 22 | 12 | 0.24 | 24 | 23 | 0.23 | 23 | 15 |
| Red | 6 | 0.12 | 12 | 2 | 0.04 | 4 | 8 | 0.08 | 8 | 10 |
| Yellow | 18 | 0.36 | 36 | 13 | 0.26 | 26 | 31 | 0.31 | 31 | 35 |
| Total | 50 | 1 | 100 | 50 | 1 | 100 | 100 | 1 | 100 | 100 |

| Color | Pooled 135 Count | Pooled 135 Proportion | Pooled 135 Percent | Pooled 246 Count | Pooled 246 Proportion | Pooled 246 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 57 | 0.38 | 38 | 70 | 0.466667 | 46.667 | 127 | 0.4233333 | 42.333333 | 40 |
| Green | 29 | 0.193333 | 19.333 | 21 | 0.14 | 14 | 50 | 0.1666667 | 16.666667 | 15 |
| Red | 15 | 0.1 | 10 | 7 | 0.046667 | 4.6667 | 22 | 0.0733333 | 7.3333333 | 10 |
| Yellow | 49 | 0.326667 | 32.667 | 52 | 0.346667 | 34.667 | 101 | 0.3366667 | 33.666667 | 35 |
| Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100 | 100 |

Pooled across Sessions (6:30 + 8:00)

Super Pool

6:30

| Color | Pooled 135 Count | Pooled 135 Proportion | Pooled 135 Percent | Pooled 246 Count | Pooled 246 Proportion | Pooled 246 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 67 | 0.4467 | 44.667 | 60 | 0.4 | 40 | 127 | 0.42 | 42.33 | 40 |
| Green | 23 | 0.1533 | 15.333 | 27 | 0.18 | 18 | 50 | 0.17 | 16.67 | 15 |
| Red | 10 | 0.0667 | 6.6667 | 13 | 0.087 | 8.66667 | 23 | 0.08 | 7.667 | 10 |
| Yellow | 50 | 0.3333 | 33.333 | 50 | 0.333 | 33.3333 | 100 | 0.33 | 33.33 | 35 |
| Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100 | 100 |

8:00

| Color | Pooled 135 Count | Pooled 135 Proportion | Pooled 135 Percent | Pooled 246 Count | Pooled 246 Proportion | Pooled 246 Percent | Pooled All Count | Pooled All Proportion | Pooled All Percent | Truth (Percent) |
|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 57 | 0.38 | 38 | 70 | 0.467 | 46.6667 | 127 | 0.42 | 42.33 | 40 |
| Green | 29 | 0.1933 | 19.333 | 21 | 0.14 | 14 | 50 | 0.17 | 16.67 | 15 |
| Red | 15 | 0.1 | 10 | 7 | 0.047 | 4.66667 | 22 | 0.07 | 7.333 | 10 |
| Yellow | 49 | 0.3267 | 32.667 | 52 | 0.347 | 34.6667 | 101 | 0.34 | 33.67 | 35 |
| Total | 150 | 1 | 100 | 150 | 1 | 100 | 300 | 1 | 100 | 100 |

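| Color | Pooled 300 (135) n | Pooled 300 (135) p | Pooled 300 (246) n | Pooled 300 (246) p | Pooled 600 n | Pooled 600 p | Truth (p) |
|---|---|---|---|---|---|---|---|
| Blue | 124 | 0.4133 | 130 | 0.433 | 254 | 0.423 | 0.4 |
| Green | 52 | 0.1733 | 48 | 0.16 | 100 | 0.167 | 0.15 |
| Red | 25 | 0.0833 | 20 | 0.067 | 45 | 0.075 | 0.1 |
| Yellow | 99 | 0.33 | 102 | 0.34 | 201 | 0.335 | 0.35 |
| Total | 300 | 1 | 300 | 1 | 600 | 1 | 1 |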
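To compare the super pool with the model, one option is to set the observed counts beside a perfect sample of the same size. The sketch below is an illustration under one stated assumption: a perfect sample of size 600 is not tabulated in this session, so it is computed here with the same rule, n*P(color), that was used for sizes 50 through 300.

```python
from fractions import Fraction

# Model probabilities from the bowl: 8 blue, 3 green, 2 red, 7 yellow of 20.
MODEL = {
    "Blue": Fraction(8, 20),
    "Green": Fraction(3, 20),
    "Red": Fraction(2, 20),
    "Yellow": Fraction(7, 20),
}

# Observed counts in the super pool (all twelve samples, both sessions).
observed = {"Blue": 254, "Green": 100, "Red": 45, "Yellow": 201}
n = sum(observed.values())  # 600 draws in total

# Perfect sample of size n, and observed minus expected for each color.
expected = {color: n * p for color, p in MODEL.items()}
deviations = {color: observed[color] - expected[color] for color in MODEL}

for color in MODEL:
    print(
        f"{color:6s} observed={observed[color]:3d} "
        f"expected={float(expected[color]):g} "
        f"difference={float(deviations[color]):+g}"
    )
```

The differences always sum to zero, because the observed and expected counts both total n; a surplus in one color must be offset by a shortfall elsewhere.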

The structure of the bowl, expressed as color proportions, determines the basic structure of samples drawn from the bowl. Probability models allow us to predict sample behavior, but the predictions are only as reliable as the original model and the sampling procedures.

The foundation of statistical applications is the careful preparation of a study population and of the random sampling procedures that go with it. Proper execution of these sampling procedures ensures a usable sample.

You are now ready to learn the Long Run Argument and Perfect Sample case types in 1st Hourly Stuff.

Case Data (Excel)