31st August 2009

Session 1.4

Computing Probabilities Algebraically

 

Last Look at Long Run Interpretation / Perfect Samples

 

From here: http://www.mindspring.com/~cjalverson/_1sthourlyfall2008_VBKey.htm

Case Two | Long Run Argument, Perfect Samples | Birthweight

Low birthweight is a strong marker of complications in liveborn infants: it is strongly associated with infant mortality, incomplete or impaired organ development, and a number of birth defects. Suppose that the following probability model applies to year 2005 United States Resident Live Births:

Birthweight Status | Probability
Very Low Birthweight (<1500g) | .016
Low Birthweight (1500g ≤ Birthweight < 2500g) | .067
Full Birthweight (≥ 2500g) | .917
Total | 1.00

 

Each row of the model yields a statement about an event within a population.

 

Interpret each probability using the Long Run Argument.

 

Clearly specify both the event and the population in an indefinite random sampling context.

 

In long runs of random sampling of US resident Live Births during year 2005, approximately 1.6% of sampled births present birthweights strictly below 1500 grams.

 

In long runs of random sampling of US resident Live Births during year 2005, approximately 6.7% of sampled births present birthweights of 1500 grams or greater, but strictly below 2500 grams.

 

In long runs of random sampling of US resident Live Births during year 2005, approximately 91.7% of sampled births present birthweights of 2500 grams or greater.

 

Compute and discuss Perfect Samples for n=2000.

 

Show full detail in computing an expected count for each event in the model.

 

Very Low Birthweight: EVLB = 2000*Pr{VLB} = 2000*0.016 = 32

Low Birthweight: ELB = 2000*Pr{LB} = 2000*0.067 = 134

Full Birthweight: EFB = 2000*Pr{FB} = 2000*0.917 = 1834
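The expected-count arithmetic above can be checked with a short sketch. The function name `perfect_sample` and the dictionary layout are illustrative choices, not part of the course materials; exact fractions are used so the counts come out as whole numbers.

```python
from fractions import Fraction

# Birthweight model from the table above, as exact fractions.
model = {
    "Very Low Birthweight": Fraction(16, 1000),
    "Low Birthweight": Fraction(67, 1000),
    "Full Birthweight": Fraction(917, 1000),
}

def perfect_sample(model, n):
    """Expected count for each event: n * Pr{event}."""
    return {event: n * p for event, p in model.items()}

counts = perfect_sample(model, 2000)
for event, count in counts.items():
    print(f"{event}: {count}")
```

The three expected counts add back up to the sample size, 2000, which is a quick sanity check on any perfect sample.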

 

Clearly specify both the event and the population in the specific random sampling context.

 

In random samples of 2000 US resident Live Births during year 2005, approximately 32 of the sampled births present birthweights strictly below 1500 grams.

 

In random samples of 2000 US resident Live Births during year 2005, approximately 134 of sampled births present birthweights of 1500 grams or greater, but strictly below 2500 grams.

 

In random samples of 2000 US resident Live Births during year 2005, approximately 1834 of sampled births present birthweights of 2500 grams or greater.

 

Probability Rules

 

Computing Probabilities Algebraically

 

A Probability function Pr{*} is a relationship between a process or population and the real interval [0,1].

 

Domain: D{Pr}= “The set of events associated with a process or a population.”

 

Range: R{Pr} = “The set of probabilities associated with the events in D{Pr}.”

 

For each event E in D{Pr}, there is a value PE=Pr{E}, called the probability for event E.

 

Pr: D(Event Space) → PS(Probability Space)

 

Any event E with Pr{E} = 0 is impossible.

 

Events E with Pr{E} ≈ 0 are rare.

 

Any event E with Pr{E} = 1 is certain.
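As a rough illustration of these requirements, the sketch below checks that a table of basic-event probabilities stays inside [0,1] and totals 1, as the model tables in these notes do. The name `is_valid_model` and the tolerance are assumptions made for this example.

```python
import math

def is_valid_model(model, tol=1e-9):
    """Check that every probability lies in [0, 1] and that they total 1."""
    in_range = all(0.0 <= p <= 1.0 for p in model.values())
    total_is_one = math.isclose(sum(model.values()), 1.0, abs_tol=tol)
    return in_range and total_is_one

# Birthweight model from Case Two.
birthweight = {"VLB": 0.016, "LB": 0.067, "FB": 0.917}
print(is_valid_model(birthweight))

# A made-up table whose probabilities total 1.2, so it is not a model.
print(is_valid_model({"A": 0.7, "B": 0.5}))
```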

Probability Rules: A Fair, Six-sided Die

Begin with a die with six sides: 1,2,3,4,5,6. Suppose that this die is fair - that each face has an equal chance of showing in tosses of the die. From earlier discussions, this table shouldn't require much explanation:

Face Value | Probability (Proportion) | Probability (Percentage)
1 | 1/6 | 100*(1/6) ≈ 16.67%
2 | 1/6 | 100*(1/6) ≈ 16.67%
3 | 1/6 | 100*(1/6) ≈ 16.67%
4 | 1/6 | 100*(1/6) ≈ 16.67%
5 | 1/6 | 100*(1/6) ≈ 16.67%
6 | 1/6 | 100*(1/6) ≈ 16.67%
Total | 6/6 | 100*(6/6) = 100%

 

The Fair d6 Model

 

FV: Face Values: 1, 2, 3, 4, 5, 6

Fair Model: Equally likely face values – 1/6 per face value

 

Pr{d6 Shows 1} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “1”.

Pr{d6 Shows 2} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “2”.

Pr{d6 Shows 3} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “3”.

Pr{d6 Shows 4} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “4”.

Pr{d6 Shows 5} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “5”.

Pr{d6 Shows 6} = (1/6) ≈ .1667 or 16.67%

In long runs of tosses, approximately 1 toss in 6 shows “6”.
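The long run interpretation can be illustrated by simulation. This is a minimal sketch, not part of the original notes: the function name `toss_proportions`, the seed, and the run length of 60,000 tosses are all arbitrary choices.

```python
import random
from collections import Counter

def toss_proportions(n_tosses, seed=2009):
    """Toss a simulated fair d6 n_tosses times; return the proportion of each face."""
    rng = random.Random(seed)  # fixed (arbitrary) seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_tosses))
    return {face: counts[face] / n_tosses for face in range(1, 7)}

props = toss_proportions(60_000)
for face, prop in props.items():
    print(f"face {face}: {prop:.4f}  (model: {1/6:.4f})")
```

Each observed proportion should land close to .1667 without matching it exactly, which is the sense of "approximately" in the long run statements.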

Basic Events

In repeated tosses of our die, the most basic possible outcomes are the faces themselves - the individual face values are the basic events. Each basic event has the same probability - (1/6).

Additive Rule: first, write down the basic (simple) events which form the event. Then, add the probabilities for each of those basic events - this total is the probability for the event.

Pr{Event} = Pr{One of the Items in the List Occurs} = Pr{1st List Item} + Pr{2nd List Item} + … + Pr{Last List Item}

Define the event EVEN as follows: "an even face (2,4,6) shows". Then the probability of the event EVEN can be computed as

Pr{EVEN} = Pr{ exactly one of 2 or 4 or 6 shows } = Pr{2 shows} + Pr{4 shows} + Pr{6 shows} = (1/6) + (1/6) + (1/6) = 3/6 = .50
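The EVEN computation can be sketched in code as follows. The helper name `pr_additive` is made up for this example; exact fractions keep the result as 1/2 rather than a rounded decimal.

```python
from fractions import Fraction

# Fair d6 model: each basic event (face) has probability 1/6.
d6 = {face: Fraction(1, 6) for face in range(1, 7)}

def pr_additive(model, outcomes):
    """Additive Rule: sum the probabilities of the listed basic events."""
    # set() guards against listing the same basic event twice.
    return sum(model[o] for o in set(outcomes))

pr_even = pr_additive(d6, [2, 4, 6])
print(pr_even, float(pr_even))
```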

Complementary Rule: first, write down the opposite of the event. Next, write down the simple events which form the opposite event. Then, add the probabilities for each of those simple events - this total is the probability for the opposite event. Finally, subtract the probability for the opposite event from 1. The result of this subtraction is the probability for the original event.

Event=E

Opposite Event = ~E

Compute Pr{~E}

Then compute Pr{E} = 1 - Pr{~E}

Define the event 2PLUS as "a face greater than or equal to 2 shows". Its complementary event, not2PLUS, is "a face strictly less than 2 shows", and its probability can be computed as

Pr{not2PLUS} = Pr{ 1 shows } = 1/6.

Then compute the probability for the event 2PLUS as:

Pr{2PLUS} = 1 - Pr{not2PLUS} = 1 - (1/6) = 5/6 ≈ .8333 or 83.33%.

Working directly,

Pr{2PLUS} =  

Pr{one of 2,3,4,5,6 shows} =

Pr{2 shows}+ Pr{3 shows}+ Pr{4 shows}+ Pr{5 shows}+ Pr{6 shows} =

(1/6) + (1/6) + (1/6) + (1/6) + (1/6) = 5/6 ≈ .8333 or as 83.33%.
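Both routes to Pr{2PLUS} can be compared in a small sketch. Here an event is written as a true/false test on a face value; the helper names `pr_event` and `pr_by_complement` are illustrative only.

```python
from fractions import Fraction

d6 = {face: Fraction(1, 6) for face in range(1, 7)}

def pr_event(model, event):
    """Additive Rule: add Pr for every basic outcome where event(outcome) is true."""
    return sum(p for outcome, p in model.items() if event(outcome))

def pr_by_complement(model, event):
    """Complementary Rule: Pr{E} = 1 - Pr{~E}."""
    return 1 - pr_event(model, lambda o: not event(o))

def two_plus(face):
    return face >= 2

direct = pr_event(d6, two_plus)
via_complement = pr_by_complement(d6, two_plus)
print(direct, via_complement)
```

The Complementary Rule pays off when the opposite event is made of fewer basic events than the event itself, as here: one face versus five.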

 

A Color Sequence Experiment

Suppose that we have a special box - each time we press a button on the box, it prints out a sequence of colors, in order - it prints four colors at a time. Suppose the box follows the following Probabilities for each Color Sequence:

The Model 

Color Sequence (CS) | Probability (Proportion) | Probability (Percent)
BBBB | .10 | 100*(.10) = 10%
BGGB | .25 | 100*(.25) = 25%
RGGR | .05 | 100*(.05) = 5%
YYYY | .30 | 100*(.30) = 30%
BYRG | .15 | 100*(.15) = 15%
RYYB | .15 | 100*(.15) = 15%
Total | 1.00 | 100*(1.00) = 100%

 

Let's define the experiment: We push the button, and then the box prints out exactly one (1) of the above listed color sequences. We then note the resulting (printed out) color sequence.

Let's discuss the simple (or basic) events.

The simple events are the color sequences. The probabilities for each color sequence are given in the table.

 

Suppose we define the event E={Blue(B) is printed in the 2nd or 3rd slot}. Compute the probability for event E, and show me how you did it. Also, interpret the probability for event E.

 

The only color sequence meeting the definition of E is the sequence BBBB.

So, we write Pr{E} = Pr{BBBB} = .10 = 10%.

This means that, in long runs of box-prints, approximately 10% of prints will show Blue in the 2nd or 3rd slot (that is, will show as BBBB).

Suppose we define the event F={Yellow is printed at least once in the sequence}. Compute the probability for event F, and show me how you did it. Also, interpret the probability for event F.

The event F merely requires that Yellow be present.

Yellow is present in the following color sequences: YYYY, BYRG and RYYB.

So we write Pr{F} = Pr{exactly one of YYYY, BYRG or RYYB is printed} =

Pr{ YYYY }+Pr{ BYRG }+Pr{ RYYB } = .30+.15+.15 = .60 = 60%

In long runs of box-prints, approximately 60% of prints will contain at least one Yellow in the color sequence.

Suppose we define the event G={Green(G) is printed in the 2nd slot}. Compute the probability for event G, and show me how you did it. Also, interpret the probability for event G.

The event G requires that Green be present in the 2nd slot.

This requirement is met in the following color sequences: BGGB, RGGR.

So we write Pr{G} = Pr{exactly one of BGGB or RGGR is printed} =

Pr{ BGGB }+Pr{ RGGR } = .25+.05 = .30 = 30%

In long runs of box-prints, approximately 30% of prints will show green in the 2nd slot.
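The same two steps used for events E, F and G (list the qualifying color sequences, then add their probabilities) can be sketched in code. The predicate encoding and the helper name `pr_event` are assumptions made for illustration.

```python
from fractions import Fraction

# Color sequence model from the table above, in hundredths.
model = {
    "BBBB": Fraction(10, 100), "BGGB": Fraction(25, 100), "RGGR": Fraction(5, 100),
    "YYYY": Fraction(30, 100), "BYRG": Fraction(15, 100), "RYYB": Fraction(15, 100),
}

def pr_event(model, event):
    """Return (qualifying sequences, probability) for an event predicate."""
    matches = [cs for cs in model if event(cs)]
    return matches, sum(model[cs] for cs in matches)

# Slots are numbered from 1 in the notes, so the 2nd slot is index 1.
events = {
    "E": lambda cs: cs[1] == "B" or cs[2] == "B",  # Blue in the 2nd or 3rd slot
    "F": lambda cs: "Y" in cs,                     # Yellow at least once
    "G": lambda cs: cs[1] == "G",                  # Green in the 2nd slot
}

results = {name: pr_event(model, ev) for name, ev in events.items()}
for name, (matches, p) in results.items():
    print(name, matches, float(p))
```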

Suppose we define the event H={Red is not the 1st color}. Compute the probability for the event H, and use the Complementary Rule. Also, interpret the probability for H.

The event notH requires that the sequence lead off with Red.

This requirement is met in the following color sequences: RGGR, RYYB.

Pr{notH} =

Pr{Red is the first color } =

Pr{exactly one of RGGR or RYYB is printed } =

Pr{ RGGR }+Pr{ RYYB } = .05 + .15 = .20 = 20%.

So, Pr{H} = 1 - Pr{notH} = 1 - .20 = .80 = 80%

So in long runs of box-prints, approximately 80% of color sequences will not show Red as the first color in the sequence.

Suppose we define the event I={Blue(B) is not the 4th color}. Compute the probability for the event I, and use the Complementary Rule. Also, interpret the probability for I.

The event notI requires that the sequence end with Blue.

This requirement is met in the following color sequences: BBBB, BGGB and RYYB.

Pr{notI} =

Pr{Blue is the 4th color } =

Pr{exactly one of BBBB, BGGB or RYYB is printed } =

Pr{BBBB}+Pr{ BGGB }+Pr{ RYYB } = .10 + .25 + .15 = .50 = 50%.

So, Pr{I} = 1 - Pr{notI} = 1 - .50 = .50 = 50%

So in long runs of box-prints, approximately 50% of color sequences will not show Blue as the last color in the sequence.
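The Complementary Rule computations for H and I can be sketched as below. The helper `pr_by_complement` is a made-up name; it takes a test describing the opposite event, mirroring the way notH and notI are worked out above.

```python
from fractions import Fraction

# Color sequence model from the table above, in hundredths.
model = {
    "BBBB": Fraction(10, 100), "BGGB": Fraction(25, 100), "RGGR": Fraction(5, 100),
    "YYYY": Fraction(30, 100), "BYRG": Fraction(15, 100), "RYYB": Fraction(15, 100),
}

def pr_by_complement(model, opposite):
    """Pr{E} = 1 - Pr{~E}, where `opposite` is a predicate describing ~E."""
    pr_opposite = sum(p for cs, p in model.items() if opposite(cs))
    return 1 - pr_opposite

pr_h = pr_by_complement(model, lambda cs: cs[0] == "R")  # notH: Red is the 1st color
pr_i = pr_by_complement(model, lambda cs: cs[3] == "B")  # notI: Blue is the 4th color
print(float(pr_h), float(pr_i))
```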

 

Events and Random Variables

 

Color Sequence (CS) | Probability (Proportion) | Probability (Percent)
BBBB | .10 | 100*(.10) = 10%
BGGB | .25 | 100*(.25) = 25%
RGGR | .05 | 100*(.05) = 5%
YYYY | .30 | 100*(.30) = 30%
BYRG | .15 | 100*(.15) = 15%
RYYB | .15 | 100*(.15) = 15%
Total | 1.00 | 100*(1.00) = 100%

 

Consider the random variable Blue Count, defined as the number of blue slots showing in the sequence. Blue Count groups the color sequences based on common value.

 

Blue Count | Probability (Proportion) | Probability (Percent)
4 {BBBB} | .10 | 100*(.10) = 10%
2 {BGGB} | .25 | 100*(.25) = 25%
0 {RGGR, YYYY} | .05 + .30 = .35 | 35%
1 {BYRG, RYYB} | .15 + .15 = .30 | 30%
Total | 1.00 | 100*(1.00) = 100%

 

Blue Count induces four events: Blue Count = 0, Blue Count = 1, Blue Count = 2, Blue Count = 4. If you’re being picky, you might include an empty event for Blue Count = 3.

 

Pr{Blue Count = 0} = Pr{One of RGGR or YYYY Shows}  = Pr{RGGR} + Pr{YYYY} = .05 + .30 = .35

Pr{Blue Count = 1} = Pr{One of BYRG or RYYB Shows} = Pr{BYRG} + Pr{RYYB} = .15 + .15 = .30

Pr{Blue Count = 2} = Pr{ BGGB } = .25

Pr{Blue Count = 3} = Pr{No qualifying sequences} = 0

Pr{Blue Count = 4} = Pr{BBBB} = .10
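The grouping that a count variable performs can be sketched in code: compute the count for each color sequence, then add the probabilities of the sequences that share a value. The function name `count_distribution` is an assumption for this example, not notation from the course.

```python
from collections import defaultdict
from fractions import Fraction

# Color sequence model from the table above, in hundredths.
model = {
    "BBBB": Fraction(10, 100), "BGGB": Fraction(25, 100), "RGGR": Fraction(5, 100),
    "YYYY": Fraction(30, 100), "BYRG": Fraction(15, 100), "RYYB": Fraction(15, 100),
}

def count_distribution(model, color):
    """Group sequences by how many slots show `color`; add probabilities per group."""
    dist = defaultdict(Fraction)  # missing values start at Fraction(0)
    for cs, p in model.items():
        dist[cs.count(color)] += p
    return dict(sorted(dist.items()))

blue = count_distribution(model, "B")
for value in range(5):
    # A value with no qualifying sequences (here 3) gets probability 0.
    print(f"Pr{{Blue Count = {value}}} = {float(blue.get(value, 0)):.2f}")
```

Passing a different color letter to the same function groups the sequences by that color's count instead.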

 

Color Sequence (CS) | Probability (Proportion) | Probability (Percent)
BBBB | .10 | 100*(.10) = 10%
BGGB | .25 | 100*(.25) = 25%
RGGR | .05 | 100*(.05) = 5%
YYYY | .30 | 100*(.30) = 30%
BYRG | .15 | 100*(.15) = 15%
RYYB | .15 | 100*(.15) = 15%
Total | 1.00 | 100*(1.00) = 100%

 

Consider the random variable Green Count, defined as the number of green slots showing in the sequence. Green Count groups the color sequences based on common value.

 

Green Count | Probability (Proportion) | Probability (Percent)
0 {BBBB, YYYY, RYYB} | .10 + .30 + .15 = .55 | 55%
2 {BGGB, RGGR} | .25 + .05 = .30 | 30%
1 {BYRG} | .15 | 100*(.15) = 15%
Total | 1.00 | 100*(1.00) = 100%

 

Green Count induces three events: Green Count = 0, Green Count = 1 and Green Count = 2.

 

Pr{Green Count = 0} = Pr{One of BBBB, YYYY or RYYB Shows} = Pr{BBBB} + Pr{YYYY} + Pr{RYYB} = .10 + .30 + .15 = .55

Pr{Green Count = 1} = Pr{BYRG} = .15

Pr{Green Count = 2} = Pr{One of BGGB or RGGR Shows} = Pr{BGGB} + Pr{RGGR} = .25 + .05 = .30

 

Begin looking at the Color Sequence Case Type – we haven’t covered Conditional Probability yet.

 

 

Long Run Argument/Perfect Samples – should be finished

Probability Rules

Color Slot Machine

Pairs of Dice

Random Variables

Conditional Probability