**THE BASICS OF PROBABILITY DISTRIBUTIONS**

Under this scenario, there are ten possible finishing positions for each race. We say that there are ten bins in this distribution. What if, rather than using ten bins, we used five? The first bin would be for a first- or second-place finish, the second bin for a third- or fourth-place finish, and so on. What would have been the result? Using fewer bins on the same set of data would have resulted in a probability distribution with the same profile as one determined on the same data with more bins. That is, they would look pretty much the same graphically. However, using fewer bins does reduce the information content of a distribution. Likewise, using more bins increases the information content of a distribution. If, rather than recording the finishing position of the pole position horse in each race, we record the time the horse ran in, rounded to the nearest second, we will get more than ten bins; and thus the information content of the distribution obtained will be greater.

If we recorded the exact finish time, rather than rounding finish times to the nearest second, we would be creating what is called a continuous distribution. In a continuous distribution, there are no bins. Think of a continuous distribution as a series of infinitely thin bins. A continuous distribution differs from a discrete distribution, the type we discussed first, in that a discrete distribution is a binned distribution. Although binning does reduce the information content of a distribution, in real life it is often necessary to bin data. Therefore, in real life it is often necessary to lose some of the information content of a distribution, while keeping the profile of the distribution the same, so that you can process the distribution. Finally, you should know that it is possible to take a continuous distribution and make it discrete by binning it, but it is not possible to take a discrete distribution and make it continuous.

*Figure A continuous distribution is a series of infinitely thin bins*

When we are discussing the profits and losses of trades, we are essentially discussing a continuous distribution. A trade can take a multitude of values (although we could say that the data is binned to the nearest cent). In order to work with such a distribution, you may find it necessary to bin the data into, for example, one-hundred-dollar-wide bins. Such a distribution would have a bin for trades that made nothing to $99.99, the next bin would be for trades that made $100 to $199.99, and so on. There is a loss of information content in binning this way, yet the profile of the distribution of the trade profits and losses remains relatively unchanged.
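The binning just described can be sketched in a few lines of Python. This is only an illustration: the function name `bin_trades` and the trade values are made up for the example, and each bin is labeled by its lower edge (so the bin labeled 0.0 holds trades from nothing to $99.99).

```python
import math
from collections import Counter

def bin_trades(pnl, width=100.0):
    """Count trades per bin; each bin is keyed by its lower edge."""
    counts = Counter()
    for x in pnl:
        # floor() keeps losses in the correct bin (e.g. -45.10 -> -100.0)
        lower = math.floor(x / width) * width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical trade results in dollars (not taken from the text).
trades = [12.50, 99.99, 100.00, 150.25, 199.99, -45.10, 310.00]
binned = bin_trades(trades)
for lower, count in binned.items():
    print(f"{lower:8.2f} to {lower + 99.99:8.2f}: {count}")
```

Seven distinct values collapse into four bins, which is the loss of information content the text refers to, while the rough shape of the distribution survives.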

**DESCRIPTIVE MEASURES OF DISTRIBUTIONS**

Most people are familiar with the average, or more specifically the arithmetic mean. This is simply the sum of the data points in a distribution divided by the number of data points:

A = (∑[i = 1,N] Xi)/N

where,

A = The arithmetic mean.

Xi = The ith data point.

N = The total number of data points in the distribution.

The arithmetic mean is the most common of the types of measures of location, or central tendency of a body of data, a distribution. However, you should be aware that the arithmetic mean is not the only available measure of central tendency and often it is not the best. The arithmetic mean tends to be a poor measure when a distribution has very broad tails. Suppose you randomly select data points from a distribution and calculate their mean. If you continue to do this you will find that the arithmetic means thus obtained converge poorly, if at all, when you are dealing with a distribution with very broad tails.

Another important measure of location of a distribution is the median. The median is described as the middle value when data are arranged in an array according to size. The median divides a probability distribution into two halves such that the area under the curve of one half is equal to the area under the curve of the other half. The median is frequently a better measure of central tendency than the arithmetic mean. Unlike the arithmetic mean, the median is not distorted by extreme outlier values. Further, the median can be calculated even for open-ended distributions. An open-ended distribution is a distribution in which all of the values in excess of a certain bin are thrown into one bin.

An example of an open-ended distribution is the one we were compiling when we recorded the finishing position in horse racing for the horse starting out in the pole position. Any finishes worse than tenth place were recorded as a tenth place finish. Thus, we had an open distribution. The median is extensively used by the U.S. Bureau of the Census.

The third measure of central tendency is the mode, the most frequent occurrence. The mode is the peak of the distribution curve. In some distributions there is no mode and sometimes there is more than one mode. Like the median, the mode can often be regarded as a superior measure of central tendency. The mode is completely independent of extreme outlier values, and it is more readily obtained than the arithmetic mean or the median.
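As a small illustration of the median and mode on an open-ended distribution, the sketch below uses Python's standard `statistics` module. The finishing positions are invented for the example, with 10 standing for "tenth or worse."

```python
import statistics

# Hypothetical finishing positions; 10 means "tenth place or worse",
# so the true value behind each 10 is unknown (open-ended bin).
finishes = [1, 3, 2, 10, 4, 2, 10, 5, 2, 7, 3]

median = statistics.median(finishes)    # middle value of the sorted data
modes = statistics.multimode(finishes)  # list of all most-frequent values

print("median:", median)
print("mode(s):", modes)
```

Neither result depends on how far beyond tenth place the open-ended finishes really were, whereas an arithmetic mean of the same data would.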

We have seen how the median divides the distribution into two equal areas. In the same way a distribution can be divided by three quartiles, or nine deciles or 99 percentiles. The 50th percentile is the median, and along with the 25th and 75th percentiles give us the quartiles. Finally, another term you should become familiar with is that of a quantile. A quantile is any of the N-1 variate values that divide the total frequency into N equal parts. We now return to the mean. We have discussed the arithmetic mean as a measure of central tendency of a distribution. You should be aware that there are other types of means as well. These other means are less common, but they do have significance in certain applications.

First is the geometric mean, which we saw how to calculate. The geometric mean is simply the Nth root of all the data points multiplied together.

G = (∏[i = 1,N]Xi)^(1/N)

where,

G = The geometric mean.

Xi = The ith data point.

N = The total number of data points in the distribution.

The geometric mean cannot be used if any of the variate-values is zero or negative. We can state that the arithmetic mathematical expectation is the arithmetic average outcome of each play minus the bet size. Likewise, we can state that the geometric mathematical expectation is the geometric average outcome of each play minus the bet size.

Another type of mean is the harmonic mean. This is the reciprocal of the mean of the reciprocals of the data points.

1/H = 1/N ∑[i = 1,N] 1/Xi

where,

H = The harmonic mean.

Xi = The ith data point.

N = The total number of data points in the distribution.

The final measure of central tendency is the quadratic mean or root mean square.

R^2 = 1/N ∑[i = 1,N] Xi^2

where,

R = The root mean square.

Xi = The ith data point.

N = The total number of data points in the distribution.

You should realize that the arithmetic mean (A) is always greater than or equal to the geometric mean (G), and the geometric mean is always greater than or equal to the harmonic mean (H):

H<=G<=A

where,

H = The harmonic mean.

G = The geometric mean.

A = The arithmetic mean.
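The four means defined above can be compared side by side with a short Python sketch. The function names and the three data points are chosen only for illustration. The geometric mean is computed through logarithms, which is mathematically the same as the Nth root of the product but less prone to overflow.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Not defined if any value is zero or negative, as the text notes.
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive data")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # Reciprocal of the mean of the reciprocals.
    return len(xs) / sum(1.0 / x for x in xs)

def root_mean_square(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

data = [1.0, 2.0, 4.0]  # hypothetical positive data points
A = arithmetic_mean(data)
G = geometric_mean(data)
H = harmonic_mean(data)
R = root_mean_square(data)
print(f"H={H:.4f}  G={G:.4f}  A={A:.4f}  R={R:.4f}")
```

For this data the ordering H <= G <= A holds, as stated above.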

**MOMENTS OF A DISTRIBUTION**

The central value or location of a distribution is often the first thing you want to know about a group of data, and often the next thing you want to know is the data's variability or "width" around that central value. We call the measures of a distribution's central tendency the first moment of a distribution. The variability of the data points around this central tendency is called the second moment of a distribution. Hence the second moment measures a distribution's dispersion about the first moment. As with the measure of central tendency, many measures of dispersion are available. We cover seven of them here, starting with the least common measures and ending with the most common.

The range of a distribution is simply the difference between the largest and smallest values in a distribution. Likewise, the 10-90 percentile range is the difference between the 90th and 10th percentile points. These first two measures of dispersion measure the spread from one extreme to the other. The remaining five measures of dispersion measure the departure from the central tendency. The semi-interquartile range or quartile deviation equals one half of the distance between the first and third quartiles. This is similar to the 10-90 percentile range, except that with this measure the range is commonly divided by 2. The half-width is an even more frequently used measure of dispersion.

Here, we take the height of a distribution at its peak, the mode. If we find the point halfway up this vertical measure and run a horizontal line through it perpendicular to the vertical line, the horizontal line will touch the distribution at one point to the left and one point to the right. The distance between these two points is called the half-width. Next, the mean absolute deviation or mean deviation is the arithmetic average of the absolute value of the difference between the data points and the arithmetic average of the data points. In other words, as its name implies, it is the average distance that a data point is from the mean. Expressed mathematically:

M = 1/N ∑[i = 1,N] ABS (Xi-A)

where,

M = The mean absolute deviation.

N = The total number of data points.

Xi = The ith data point.

A = The arithmetic average of the data points.

ABS() = The absolute value function.

The equation above gives us what is known as the population mean absolute deviation. You should know that the mean absolute deviation can also be calculated as what is known as the sample mean absolute deviation. To calculate the sample mean absolute deviation, replace the term 1/N in the equation with 1/(N-1). You use the sample version when you are making judgments about the population based on a sample of that population.
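The population and sample versions of the mean absolute deviation differ only in the divisor, which a brief Python sketch makes plain. The function name, the `sample` flag, and the data are illustrative assumptions.

```python
def mean_absolute_deviation(xs, sample=False):
    """Average distance of the data points from their arithmetic mean."""
    n = len(xs)
    a = sum(xs) / n
    total = sum(abs(x - a) for x in xs)
    # Population version divides by N, sample version by N-1.
    return total / (n - 1 if sample else n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data, mean 5
pop_mad = mean_absolute_deviation(data)
sample_mad = mean_absolute_deviation(data, sample=True)
print(pop_mad, sample_mad)
```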

The next two measures of dispersion, variance and standard deviation, are the two most commonly used. Both are used extensively, so we cannot say that one is more common than the other; suffice to say they are both the most common. Like the mean absolute deviation, they can be calculated two different ways, for a population as well as a sample. The population version is shown, and again it can readily be altered to the sample version by replacing the term 1/N with 1/(N-1).

The variance is the same thing as the mean absolute deviation except that we square each difference between a data point and the average of the data points. As a result, we do not need to take the absolute value of each difference, since multiplying each difference by itself makes the result positive whether the difference was positive or negative. Further, since each distance is squared, extreme outliers will have a stronger effect on the variance than they would on the mean absolute deviation. Mathematically expressed:

V = 1/N ∑[i = 1,N] ((Xi-A)^2)

where,

V = The variance.

N = The total number of data points.

Xi = The ith data point.

A = The arithmetic average of the data points.
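A minimal Python sketch of the variance follows, with the square root taken afterward to give the standard deviation. The function name and data are illustrative. The standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import math
import statistics

def variance(xs, sample=False):
    """Mean squared distance of the data points from their arithmetic mean."""
    n = len(xs)
    a = sum(xs) / n
    total = sum((x - a) ** 2 for x in xs)
    return total / (n - 1 if sample else n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data, mean 5
v = variance(data)                 # population variance
sd = math.sqrt(v)                  # population standard deviation
v_sample = variance(data, sample=True)
print(v, sd, v_sample)
```

Because each distance is squared, the single point at 9 contributes half of the total here (16 of 32), far more than its share of the mean absolute deviation.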

Finally, the standard deviation is related to the variance in that the standard deviation is simply the square root of the variance. The third moment of a distribution is called skewness, and it describes the extent of asymmetry about a distribution's mean. Whereas the first two moments of a distribution have values that can be considered dimensional, skewness is defined in such a way as to make it nondimensional. It is a pure number that represents nothing more than the shape of the distribution.

*Figure Skewness*

A positive value for skewness means that the tails are thicker on the positive side of the distribution, and vice versa. A perfectly symmetrical distribution has a skewness of 0.

*Figure Skewness alters location.*

In a symmetrical distribution the mean, median, and mode are all at the same value. However, when a distribution has a nonzero value for skewness, these three measures no longer coincide. The approximate relationship for a skewed distribution (any distribution with a nonzero skewness) is:

Mean-Mode = 3*(Mean-Median)

As with the first two moments of a distribution, there are numerous measures for skewness, which most frequently will give different answers. These measures now follow:

S = (Mean-Mode)/Standard Deviation

S = (3*(Mean-Median))/Standard Deviation

These last two equations are often referred to as Pearson's first and second coefficients of skewness, respectively. Skewness is also commonly determined as:

S = 1/N ∑[i = 1,N] (((Xi-A)/D)^3)

where,

S = The skewness.

N = The total number of data points.

Xi = The ith data point.

A = The arithmetic average of the data points.

D = The population standard deviation of the data points.
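The moment-based skewness and Pearson's second coefficient can be sketched in Python as follows. The function names and both data sets are invented for illustration. As the text warns, the two measures generally give different numbers, though they should agree in sign for clearly skewed data.

```python
import math
import statistics

def population_sd(xs):
    a = sum(xs) / len(xs)
    return math.sqrt(sum((x - a) ** 2 for x in xs) / len(xs))

def skewness(xs):
    """Average cubed deviation, measured in standard units."""
    n = len(xs)
    a = sum(xs) / n
    d = population_sd(xs)
    return sum(((x - a) / d) ** 3 for x in xs) / n

def pearson_second(xs):
    """3*(Mean-Median)/Standard Deviation."""
    a = sum(xs) / len(xs)
    return 3 * (a - statistics.median(xs)) / population_sd(xs)

symmetric = [1, 2, 3, 4, 5]        # hypothetical, no skew
right_tail = [1, 1, 2, 2, 3, 10]   # hypothetical, thick positive tail
print(skewness(symmetric), skewness(right_tail), pearson_second(right_tail))
```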

*Figure Kurtosis.*

Finally, the fourth moment of a distribution, kurtosis, measures the peakedness or flatness of a distribution. Like skewness, it is a nondimensional quantity. A curve less peaked than the Normal is said to be platykurtic, and a curve more peaked than the Normal is called leptokurtic. When the peak of the curve resembles the Normal Distribution curve, kurtosis (as given by the second of the two measures below) equals zero, and we call this type of peak on a distribution mesokurtic. Like the preceding moments, kurtosis has more than one measure. The two most common are:

K = Q/P

where,

K = The kurtosis.

Q = The semi-interquartile range.

P = The 10-90 percentile range.

K = (1/N (∑[i = 1,N] (((Xi-A)/D)^ 4)))-3

where,

K = The kurtosis.

N = The total number of data points.

Xi = The ith data point.

A = The arithmetic average of the data points.

D = The population standard deviation of the data points.
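Both kurtosis measures can be sketched in Python. The function names are illustrative, and the percentile version relies on `statistics.quantiles` with the "inclusive" interpolation method, which is an assumption since the text does not say how percentiles between data points should be interpolated.

```python
import math
import statistics

def moment_kurtosis(xs):
    """Average fourth power of the standardized deviations, minus 3."""
    n = len(xs)
    a = sum(xs) / n
    d = math.sqrt(sum((x - a) ** 2 for x in xs) / n)
    return sum(((x - a) / d) ** 4 for x in xs) / n - 3

def percentile_kurtosis(xs):
    """K = Q/P: semi-interquartile range over the 10-90 percentile range."""
    # With n=100, quantiles() returns the 99 percentile cut points,
    # so index 9 is the 10th percentile, index 24 the 25th, and so on.
    pct = statistics.quantiles(xs, n=100, method="inclusive")
    q = (pct[74] - pct[24]) / 2
    p = pct[89] - pct[9]
    return q / p

flat = [1, 2, 3, 4, 5]            # hypothetical, flatter than the Normal
uniform = list(range(1, 102))     # hypothetical, the integers 1 to 101
print(moment_kurtosis(flat), percentile_kurtosis(uniform))
```

Note that the two measures are on different scales: the moment-based measure is zero for the Normal, while Q/P for the Normal works out to roughly .263.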

Finally, it should be pointed out there is a lot more "theory" behind the moments of a distribution than is covered here. For a more in-depth discussion you should consult one of the statistics books mentioned in the Bibliography. The depth of discussion about the moments of a distribution presented here will be more than adequate for our purposes throughout this text. Thus far, we have covered data distributions in a general sense. Now we will cover the specific distribution called the Normal Distribution.

**THE NORMAL DISTRIBUTION**

Frequently the Normal Distribution is referred to as the Gaussian distribution, or de Moivre's distribution, after those who are believed to have discovered it: Karl Friedrich Gauss (1777-1855) and, about a century earlier and far more obscurely, Abraham de Moivre (1667-1754). The Normal Distribution is considered to be the most useful distribution in modeling. This is due to the fact that the Normal Distribution accurately models many phenomena. Generally speaking, we can measure heights, weights, intelligence levels, and so on from a population, and these will very closely resemble the Normal Distribution.

Let's consider what is known as Galton's board. This is a vertically mounted board in the shape of an isosceles triangle. The board is studded with pegs, one on the top row, two on the second, and so on. Each row down has one more peg than the previous row. The pegs are arranged in a triangular fashion such that when a ball is dropped in, it has a 50/50 probability of going right or left with each peg it encounters. At the base of the board is a series of troughs to record the exit gate of each ball.

*Figure Galton's board.*

The balls falling through Galton's board and arriving in the troughs will begin to form a Normal Distribution. The "deeper" the board and the more balls are dropped through, the more closely the final result will resemble the Normal Distribution. The Normal is useful in its own right, but also because it tends to be the limiting form of many other types of distributions. For example, if X is distributed binomially, then as N tends toward infinity, X tends to be Normally distributed. Further, the Normal Distribution is also the limiting form of a number of other useful probability distributions such as the Poisson and the Student's t distribution. In other words, as the data (N) used in these other distributions increases, these distributions increasingly resemble the Normal Distribution.
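Galton's board is easy to simulate, which gives a feel for how quickly the bell shape emerges. The sketch below is a simplification under stated assumptions: each peg is an independent 50/50 left or right decision, a ball's trough is the number of times it went right, and the function name, row count, ball count, and seed are arbitrary.

```python
import random
from collections import Counter

def galton(rows, balls, seed=0):
    """Drop balls through a board with `rows` rows of pegs; return trough counts."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    troughs = Counter()
    for _ in range(balls):
        # Each peg sends the ball right with probability 0.5.
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        troughs[rights] += 1
    return [troughs[k] for k in range(rows + 1)]

counts = galton(rows=10, balls=5000)
for trough, count in enumerate(counts):
    print(f"{trough:2d} {'#' * (count // 25)}")
```

The trough counts follow a binomial distribution, so this is also a small demonstration of the binomial tending toward the Normal as the board gets deeper.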

**THE CENTRAL LIMIT THEOREM**

One of the most important applications for statistical purposes involving the Normal Distribution has to do with the distribution of averages. The averages of samples of a given size, taken such that each sampled item is selected independent of the others, will yield a distribution that is close to Normal. This is an extremely powerful fact, for it means that you can generalize about an actual random process from averages computed using sample data. Thus, we can state that if N random samples are drawn from a population, then the sums of the samples will be approximately Normally distributed, regardless of the distribution of the population from which the samples are drawn. The closeness to the Normal Distribution improves as N increases. As an example, consider the distribution of numbers from 1 to 100.

This is what is known as a uniform distribution: all elements occur only once. The number 82 occurs once and only once, as does 19, and so on. Suppose now that we take a sample of five elements and we take the average of these five sampled elements. Now, we replace those five elements back into the population, and we take another sample and calculate the sample mean. If we keep on repeating this process, we will see that the sample means are Normally distributed, even though the population from which they are drawn is uniformly distributed. Furthermore, this is true regardless of how the population is distributed! The Central Limit Theorem allows us to treat the distribution of sample means as being Normal without having to know the distribution of the population. This is an enormously convenient fact for many areas of study.
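The sampling experiment just described can be tried directly. The following Python sketch draws five elements at a time from the numbers 1 to 100, records each sample mean, and summarizes the results. The function name, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, trials, seed=0):
    """Repeatedly draw a sample and record its mean."""
    rng = random.Random(seed)
    # rng.sample() draws without repeats inside one sample; the sample is
    # effectively replaced afterward because the population is never altered.
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(trials)]

population = list(range(1, 101))  # uniform: every number occurs once
means = sample_means(population, sample_size=5, trials=10_000)

center = statistics.fmean(means)
spread = statistics.pstdev(means)
within_one = sum(abs(m - center) <= spread for m in means) / len(means)
print(f"mean of means {center:.2f}, std dev {spread:.2f}, "
      f"share within 1 std dev {within_one:.3f}")
```

If the sample means are close to Normal, roughly 68% of them should fall within one standard deviation of their center, even though the population itself is flat.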

If the population itself happens to be Normally distributed, then the distribution of sample means will be exactly Normal. This is true because how quickly the distribution of the sample means approaches the Normal, as N increases, is a function of how close the population is to Normal. As a general rule of thumb, if a population has a unimodal distribution (any type of distribution where there is a concentration of frequency around a single mode, and diminishing frequencies on either side of the mode; i.e., it is convex) or is uniformly distributed, using a value of 20 for N is considered sufficient, and a value of 10 for N is considered probably sufficient. However, if the population is distributed according to the Exponential Distribution, then it may be necessary to use an N of 100 or so.

*Figure The Exponential Distribution and the Normal.*

The Central Limit Theorem, this amazingly simple and beautiful fact, validates the importance of the Normal Distribution.

**WORKING WITH THE NORMAL DISTRIBUTION**

In using the Normal Distribution, we most frequently want to find the percentage of area under the curve at a given point along the curve. In the parlance of calculus this would be called the integral of the function for the curve itself. Likewise, we could call the function for the curve itself the derivative of the function for the area under the curve. Derivatives are often noted with a prime after the variable for the function. Therefore, if we have a function, N(X), that represents the percentage of area under the curve at a given point, X, we can say that the derivative of this function, N'(X) (called N prime of X), is the function for the curve itself at point X.

We will begin with the formula for the curve itself, N'(X). This function is represented as:

N'(X) = 1/(S*(2*3.1415926536)^(1/2))*EXP(-((X-U)^2)/(2*S^2))

where,

U = The mean of the data.

S = The standard deviation of the data.

X = The observed data point.

EXP() = The exponential function.

This formula will give us the Y axis value, or the height of the curve if you will, at any given X axis value. Often it is easier to refer to a point along the curve with reference to its X coordinate in terms of how many standard deviations it is away from the mean. Thus, a data point that was one standard deviation away from the mean would be said to be one standard unit from the mean. Further, it is often easier to subtract the mean from all of the data points, which has the effect of shifting the distribution so that it is centered over zero rather than over the mean.

Therefore, a data point that was one standard deviation to the right of the mean would now have a value of 1 on the X axis. When we make these conversions, subtracting the mean from the data points, then dividing the difference by the standard deviation of the data points, we are converting the distribution to what is called the standardized normal, which is the Normal Distribution with mean = 0 and variance = 1. Now, N'(Z) will give us the Y axis value (the height of the curve) for any value of Z:

N'(Z) = 1/((2*3.1415926536)^(1/2))*EXP(-(Z^2/2))

= .398942*EXP(-(Z^2/2))

where,

Z = (X-U)/S

and

U = The mean of the data.

S = The standard deviation of the data.

X = The observed data point.

EXP() = The exponential function.
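The two curve formulas translate directly into Python. In this sketch the function names and the example values of U, S, and X are assumptions made for illustration. It also shows one relationship worth keeping in mind: the height of the unstandardized curve at X equals the height of the standardized curve at Z divided by S.

```python
import math

def normal_curve(x, u, s):
    """N'(X): height of the Normal curve with mean u and standard deviation s."""
    return 1 / (s * math.sqrt(2 * math.pi)) * math.exp(-((x - u) ** 2) / (2 * s ** 2))

def standard_normal_curve(z):
    """N'(Z): height of the standardized Normal curve (mean 0, variance 1)."""
    return 1 / math.sqrt(2 * math.pi) * math.exp(-(z ** 2) / 2)

u, s = 50.0, 10.0   # hypothetical mean and standard deviation
x = 65.0            # hypothetical data point
z = (x - u) / s     # convert to standard units

print(z, normal_curve(x, u, s), standard_normal_curve(z) / s)
```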

The equation Z = (X-U)/S gives us the number of standard units that the data point corresponds to; in other words, how many standard deviations away from the mean the data point is. When Z equals 1, it is called the standard normal deviate. A standard deviation or a standard unit is sometimes referred to as a sigma. Thus, when someone speaks of an event being a "five sigma event," they are referring to an event whose probability of occurrence is the probability of being beyond five standard deviations.

Consider the standard Normal curve described by this equation. Notice that the height of the standard Normal curve at its peak is .39894. From the equation for N'(Z), the height is:

N'(Z) = .398942*EXP(-(Z^2/2))

N'(0) = .398942*EXP(-(0^2/2))

N'(0) = .398942

Notice that the curve is continuous; that is, there are no "breaks" in the curve as it runs from minus infinity on the left to positive infinity on the right. Notice also that the curve is symmetrical, the side to the right of the peak being the mirror image of the side to the left of the peak.

Suppose we had a group of data where the mean of the data was 11 and the standard deviation of the group of data was 20. To see where a data point in that set would be located on the curve, we could first calculate it as a standard unit. Suppose the data point in question had a value of -9. To calculate how many standard units this is we first must subtract the mean from this data point:

-9 -11 = -20

Next we need to divide the result by the standard deviation:

-20/20 = -1

We can therefore say that the number of standard units is -1, when the data point equals -9, and the mean is 11, and the standard deviation is 20. In other words, we are one standard deviation away from the peak of the curve, the mean, and since this value is negative we know that it means we are one standard deviation to the left of the peak. To see where this places us on the curve itself (i.e., how high the curve is at one standard deviation left of center, or what the Y axis value of the curve is for a corresponding X axis value of -1), we need to now plug this into the equation for N'(Z):

N'(Z) = .398942*EXP(-(Z^2/2))

= .398942*2.7182818285^(-((-1)^2/2))

= .398942*2.7182818285^(-1/2)

= .398942*.6065307

= .2419705705

Thus we can say that the height of the curve at X = -1 is .2419705705. The function N'(Z) is also often expressed as:

N'(Z) = EXP(-(Z^2/2))/((8*ATN(1))^(1/2))

= EXP(-(Z^2/2))/((8*.7853982)^(1/2))

= EXP(-(Z^2/2))/2.506628

where,

Z = (X-U)/S

and

ATN() = The arctangent function.

U = The mean of the data.

S = The standard deviation of the data.

X = The observed data point.

EXP() = The exponential function.
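The worked example and the arctangent form can both be checked with a few lines of Python. The function names are illustrative, and `math.atan` plays the role of ATN().

```python
import math

def standard_units(x, u, s):
    """Z = (X-U)/S"""
    return (x - u) / s

def n_prime(z):
    """N'(Z) in its arctangent form; 8*ATN(1) is just another way to write 2*pi."""
    return math.exp(-(z * z) / 2) / math.sqrt(8 * math.atan(1))

z = standard_units(-9, 11, 20)   # data point -9, mean 11, standard deviation 20
height = n_prime(z)
print(z, height)
```

The height comes out near .24197, matching the hand computation to about six decimal places. The small difference further out arises because the hand computation uses the rounded constant .398942.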

Nonstatisticians often find the concept of the standard deviation hard to envision. A remedy for this is to use what is known as the mean absolute deviation and convert it to and from the standard deviation in these equations. The mean absolute deviation is exactly what its name implies. The mean of the data is subtracted from each data point. The absolute values of each of these differences are then summed, and this sum is divided by the number of data points. What you end up with is the average distance each data point is away from the mean. The conversion for mean absolute deviation and standard deviation are given now:

M = S*((2/3.1415926536)^(1/2))

= S*.7978845609

where,

M = The mean absolute deviation.

S = The standard deviation.

Thus we can say that in the Normal Distribution, the mean absolute deviation equals the standard deviation times .7979. Likewise:

S = M*1/.7978845609

= M*1.253314137

where,

S = The standard deviation.

M = The mean absolute deviation.

So we can also say that in the Normal Distribution the standard deviation equals the mean absolute deviation times 1.2533. Since the variance is always the standard deviation squared, we can make the conversion between variance and mean absolute deviation.

M = V^(1/2)*((2/3.1415926536)^(1/2))

= V^(1/2)*.7978845609

where,

M = The mean absolute deviation.

V = The variance.

V = (M*1.253314137)^2

where,

V = The variance.

M = The mean absolute deviation.
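These conversions are simple enough to wrap in helper functions, and a quick simulation gives some reassurance that the .7979 factor really does describe Normally distributed data. In the Python sketch below, the function names, the sample size, the seed, and the standard deviation of 3 are all arbitrary assumptions.

```python
import math
import random

MAD_PER_SD = math.sqrt(2 / math.pi)   # about .7979, valid for the Normal

def sd_to_mad(s):
    return s * MAD_PER_SD

def mad_to_sd(m):
    return m / MAD_PER_SD             # same as m * 1.2533...

def variance_to_mad(v):
    return math.sqrt(v) * MAD_PER_SD

def mad_to_variance(m):
    return mad_to_sd(m) ** 2

# Check the factor against simulated Normally distributed data.
rng = random.Random(0)
xs = [rng.gauss(0.0, 3.0) for _ in range(50_000)]
a = sum(xs) / len(xs)
mad = sum(abs(x - a) for x in xs) / len(xs)
sd = math.sqrt(sum((x - a) ** 2 for x in xs) / len(xs))
ratio = mad / sd
print(f"observed M/S = {ratio:.4f}, theoretical = {MAD_PER_SD:.4f}")
```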

Since the standard deviation in the standard normal curve equals 1, we can state that the mean absolute deviation in the standard normal curve equals .7979. Further, in a bell-shaped curve like the Normal, the semi-interquartile range equals approximately two-thirds of the standard deviation, and therefore the standard deviation equals about 1.5 times the semi-interquartile range. This is true of most bell-shaped distributions, not just the Normal, as are the conversions given for the mean absolute deviation and standard deviation.

**NORMAL PROBABILITIES**

We now know how to convert our raw data to standard units and how to form the curve N'(Z) itself (i.e., how to find the height of the curve, or Y coordinate for a given standard unit) as well as N'(X). To really use the Normal Probability Distribution though, we want to know what the probabilities of a certain outcome happening are. This is not given by the height of the curve. Rather, the probabilities correspond to the area under the curve. These areas are given by the integral of this N'(Z) function, which we have thus far studied. We will now concern ourselves with N(Z), the integral of N'(Z), to find the areas under the curve (the probabilities).

N(Z)=1-N'(Z)*((1.330274429*Y^5)-(1.821255978*Y^4)+(1.781477937*Y^3)-(.356563782*Y^2)+(.31938153*Y))

If Z<0 then N(Z) = 1-N(Z)

N'(Z) = .398942*EXP(-(Z^2/2))

where,

Y = 1/(1+.2316419*ABS(Z))

and

ABS() = The absolute value function.

EXP() = The exponential function.

We will always convert our data to standard units when finding probabilities under the curve. That is, we will not describe an N(X) function, but rather we will use the N(Z) function where:

Z = (X-U)/S

and

U = The mean of the data.

S = The standard deviation of the data.

X = The observed data point.

Suppose we want to know what the probability is of an event not exceeding +2 standard units (Z = +2).

Y = 1/(1+.2316419*ABS(+2))

= 1/1.4632838

= .68339443311

N'(Z) = .398942*EXP(-(+2^2/2))

= .398942*EXP(-2)

= .398942*.1353353

= .05399093525

Notice that this tells us the height of the curve at +2 standard units. Plugging these values for Y and N'(Z) into the equation for N(Z), we can obtain the probability of an event not exceeding +2 standard units:

N(Z)=1-N'(Z)*((1.330274429*Y^5)-(1.821255978*Y^4)+(1.781477937*Y^3)-(.356563782*Y^2)+(.31938153*Y))

= 1-.05399093525*((1.330274429*.68339443311^5)-(1.821255978*.68339443311^4)+(1.781477937*.68339443311^3)-(.356563782*.68339443311^2)+(.31938153*.68339443311))

= 1-.05399093525*((1.330274429*.1490587)-(1.821255978*.2181151)+(1.781477937*.3191643)-(.356563782*.467028)+(.31938153*.68339443311))

= 1-.05399093525*(.198288977-.3972434298+.5685841587-.16652527+.2182635596)

= 1-.05399093525*.4213679955

= 1-.02275005216

= .9772499478

Thus we can say that we can expect 97.72% of the outcomes in a Normally distributed random process to fall shy of +2 standard units.
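The polynomial approximation for N(Z) is tedious by hand but short in code. The Python sketch below follows the formulas given above term by term. The function names are illustrative, and `math.erf` is brought in only as an independent check on the result.

```python
import math

def n_prime(z):
    """Height of the standardized Normal curve at z."""
    return 0.398942 * math.exp(-(z * z) / 2)

def n(z):
    """Approximate probability of not exceeding z standard units."""
    y = 1 / (1 + 0.2316419 * abs(z))
    poly = (1.330274429 * y ** 5 - 1.821255978 * y ** 4
            + 1.781477937 * y ** 3 - 0.356563782 * y ** 2
            + 0.31938153 * y)
    p = 1 - n_prime(z) * poly      # valid as written for z >= 0
    return 1 - p if z < 0 else p   # the "If Z<0" provision

def n_exact(z):
    """Reference value from the error function, for comparison only."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-1.0, 0.0, 2.0):
    print(f"Z={z:+.1f}  approx={n(z):.7f}  reference={n_exact(z):.7f}")
```

For Z = +2 the approximation gives a value near .97725, in line with the hand computation, and it stays within about a millionth of the reference value at the points shown.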

*Figure Equation showing probability with Z = +2.*

If we wanted to know what the probabilities were for an event equaling or exceeding a prescribed number of standard units (in this case +2), we would simply amend the equation for N(Z), taking out the 1- in the beginning of the equation and doing away with the -Z provision (i.e., doing away with "If Z < 0 then N(Z) = 1-N(Z)"). Therefore, the second to last line in the last computation would be changed from

= 1-.02275005216 to simply .02275005216

We would therefore say that there is about a 2.275% chance that an event in a Normally distributed random process would equal or exceed +2 standard units.

*Figure Doing away with the 1- and -Z provision in Equation .*

Thus far we have looked at areas under the curve (probabilities) where we are only dealing with what are known as "1-tailed" probabilities. That is to say, we have thus far looked to solve such questions as, "What are the probabilities of an event being less (more) than such-and-such standard units from the mean?" Suppose now we were to pose the question as, "What are the probabilities of an event being within so many standard units of the mean?" In other words, we wish to find out what the "2-tailed" probabilities are.

*Figure A two-tailed probability of an event being+or-2 sigma.*

This represents the probabilities of being within 2 standard units of the mean. Unlike the 1-tailed case, this probability computation does not include the extreme left tail area, the area of less than -2 standard units. To calculate the probability of being within Z standard units of the mean, you must first calculate the 1-tailed probability of the absolute value of Z. This will be your input to the next equation, which gives us the 2-tailed probabilities (i.e., the probabilities of being within ABS(Z) standard units of the mean):

2-tailed probability = 1-((1-N(ABS(Z)))*2)

If we are considering what our probabilities of occurrence within 2 standard deviations are (Z = 2), then from the earlier computation we know that N(2) = .9772499478, and using this as input to the 2-tailed equation:

2-tailed probability = 1-((1-.9772499478)*2)

= 1-(.02275005216* 2)

= 1-.04550010432

= .9544998957

Thus we can state from this equation that the probability of an event in a Normally distributed random process falling within 2 standard units of the mean is about 95.45%.

*Figure Two-tailed probability of an event being beyond 2 sigma.*

Just as with the equation for N(Z), we can eliminate the leading 1- in the 2-tailed equation to obtain (1-N(ABS(Z)))*2, which represents the probabilities of an event falling outside of ABS(Z) standard units of the mean. For the example where Z = 2, we can state that the probabilities of an event in a Normally distributed random process falling outside of 2 standard units is:

2 tailed probability (outside) = (1-.9772499478)*2

= .02275005216*2

= .04550010432
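The 1-tailed and 2-tailed variations can be collected into a few small Python functions. To keep the sketch short, N(Z) is computed here from `math.erf` rather than from the polynomial approximation used in the text, so the results may differ from the hand computations in the seventh decimal place or so. All function names are illustrative.

```python
import math

def n(z):
    """Probability of not exceeding z standard units (via the error function)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def upper_tail(z):
    """1-tailed: probability of equaling or exceeding z standard units."""
    return 1 - n(z)

def within(z):
    """2-tailed: probability of being within ABS(z) standard units of the mean."""
    return 1 - (1 - n(abs(z))) * 2

def outside(z):
    """2-tailed: probability of being outside ABS(z) standard units of the mean."""
    return (1 - n(abs(z))) * 2

print(upper_tail(2), within(2), outside(2))
```

For Z = 2 these come out near .02275, .9545, and .0455 respectively, and the inside and outside probabilities sum to 1.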

Finally, we come to the case where we want to find what the probabilities (areas under the N'(Z) curve) are for two different values of Z.

Suppose we want to find the area under the N'(Z) curve between -1 standard unit and +2 standard units. There are a couple of ways to accomplish this. To begin with, we can compute the probability of not exceeding +2 standard units with the equation for N(Z), and from this we can subtract the probability of not exceeding -1 standard units. This would give us:

.9772499478-.1586552595

= .8185946883

Another way we could have performed this is to take the number 1, representing the entire area under the curve, and then subtract the sum of the probability of not exceeding -1 standard unit and the probability of exceeding 2 standard units:

= 1-(.02275005216+.1586552595)

= 1-.1814053117

= .8185946883
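Both routes to the area between two Z values can be confirmed with Python's standard library, whose `statistics.NormalDist` class provides a `cdf` method that plays the role of N(Z). This is a swapped-in technique rather than the polynomial approximation from the text, so expect agreement to about six or seven decimal places rather than exactly. The raw-data check at the end reuses the earlier mean of 11 and standard deviation of 20 as an illustration.

```python
from statistics import NormalDist

std = NormalDist()  # standardized Normal: mean 0, standard deviation 1

# First way: N(+2) minus N(-1).
between_a = std.cdf(2) - std.cdf(-1)

# Second way: 1 minus the sum of the two excluded tails.
between_b = 1 - ((1 - std.cdf(2)) + std.cdf(-1))

# Same question asked in raw units: with mean 11 and standard deviation 20,
# Z = -1 is the value -9 and Z = +2 is the value 51.
raw = NormalDist(mu=11, sigma=20)
between_raw = raw.cdf(51) - raw.cdf(-9)

print(between_a, between_b, between_raw)
```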

With the basic mathematical tools regarding the Normal Distribution thus far, you can now use your powers of reasoning to figure any probabilities of occurrence for Normally distributed random variables.