## Parametric Techniques on Other Distributions- THE KOLMOGOROV-SMIRNOV (K-S) TEST

The chi-square test is no doubt the most popular of all methods of comparing two distributions, and it applies to many market-oriented applications beyond the ones discussed here. However, the best test for our purposes may well be the K-S test. This very efficient test is applicable to unbinned distributions that are a function of a single independent variable. All cumulative density functions have a minimum value of 0 and a maximum value of 1; what goes on in between differentiates them. The K-S test measures a very simple variable, D, defined as the maximum absolute value of the difference between two distributions' cumulative density functions. Performing the K-S test is relatively simple: the N objects are standardized and sorted in ascending order.

As we go through these sorted and standardized trades, the cumulative probability is however many trades we've gone through divided by N. When we get to our first trade in the sorted sequence, the trade with the lowest standard value, the cumulative density function (CDF) is equal to 1/N. With each standard value that we pass along the way up to our highest standard value, 1 is added to the numerator until, at the end of the sequence, our CDF is equal to N/N or 1. For each standard value we can compute the theoretical distribution that we wish to compare to. Thus, we can compare our actual cumulative density to any theoretical cumulative density.

The variable D, the K-S statistic, is equal to the greatest distance between any standard values of our actual cumulative density and the value of the theoretical distribution's CDF at that standard value. Whichever standard value results in the greatest difference is assigned to the variable D. When comparing our actual CDF at a given standard value to the theoretical CDF at that standard value, we must also compare the previous standard value's actual CDF to the current standard value's actual CDF. The reason is that the actual CDF breaks upward instantaneously at the data points, and, if the actual is below the theoretical, the difference between the lines is greater the instant before the actual jumps up.

Figure  The K-S test.

Notice that at point A the actual line is above the theoretical. Therefore, we want to compare the current actual CDF value to the current theoretical value to find the greatest difference. Yet at point B, the actual line is below the theoretical. Therefore, we want to compare the previous actual value to the current theoretical value. The rationale is that we are measuring the greatest distance between the two lines. Since the actual line jumps up instantaneously at each data point, its previous value serves as its value the instant before the jump.

In summary, then, for each standard value, we want to take the absolute value of the difference between the current actual CDF value and the current theoretical CDF value. We also want to take the absolute value of the difference between the previous actual CDF value and the current theoretical CDF value. By doing this for all standard values, all points where the actual CDF jumps up by 1/N, and taking the greatest difference, we will have determined the variable D.

The lower the value of D, the more the two distributions are alike. We can readily convert the D value to a significance level by the following formula:

SIG = ∑[J = 1, ∞] ((J % 2) * 4 - 2) * EXP(-2 * J^2 * (N^(1/2) * D)^2)

where,

SIG = The significance level for a given D and N.
D = The K-S statistic.
N = The number of trades that the K-S statistic is determined over.
% = The modulus operator, the remainder from division. As it is used here, J % 2 yields the remainder when J is divided by 2.
EXP() = The exponential function.

There is no need to keep summing the values until J gets to infinity. The equation converges to a value. Once the convergence is obtained to a close enough user tolerance, there is no need to continue summing values.

To illustrate the equation by example, suppose we had 100 trades that yielded a K-S statistic of .04:

J1 = ((1 % 2) * 4 - 2) * EXP(-2 * 1^2 * (100^(1/2) * .04)^2)
= (1 * 4 - 2) * EXP(-2 * 1 * (10 * .04)^2)
= 2 * EXP(-2 * 1 * .4^2)
= 2 * EXP(-2 * 1 * .16)
= 2 * EXP(-.32)
= 2 * .726149
= 1.452298

So our first value is 1.452298. To this we will add the next pass through the equation, for which we must increment J by 1, so that J now equals 2:

J2 = ((2 % 2) * 4 - 2) * EXP(-2 * 2^2 * (100^(1/2) * .04)^2)
= (0 * 4 - 2) * EXP(-2 * 4 * (10 * .04)^2)
= -2 * EXP(-2 * 4 * .4^2)
= -2 * EXP(-2 * 4 * .16)
= -2 * EXP(-1.28)
= -2 * .2780373
= -.5560746

Adding this value of -.5560746 to our running sum of 1.452298 gives us a new running sum of .8962234. We again increment J by 1, so it equals 3, perform the equation, and add the result to our running total of .8962234. We keep on doing this until we converge to a value within a close enough tolerance. For our example, this point of convergence will be right around .997, depending upon how many decimal places we want to be accurate to. This answer means that for 100 trades where the greatest difference between the two distributions was .04, we can be 99.7% certain that the actual distribution was generated by the theoretical distribution function. In other words, we can be 99.7% certain that the theoretical distribution function represents the actual distribution. Incidentally, this is a very good significance level.
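The whole convergence loop is easy to automate. Below is a Python sketch (the function name and the tolerance argument are my own additions); it keeps summing terms until they fall below the tolerance:

```python
import math

def ks_significance(d, n, tol=1e-8):
    """Significance level for a K-S statistic D measured over N trades.

    Sums ((J % 2) * 4 - 2) * EXP(-2 * J^2 * (N^(1/2) * D)^2) for
    J = 1, 2, ... until the terms become smaller than the tolerance.
    """
    lam = math.sqrt(n) * d          # N^(1/2) * D
    sig = 0.0
    j = 1
    while True:
        term = ((j % 2) * 4 - 2) * math.exp(-2 * j * j * lam * lam)
        sig += term
        if abs(term) < tol:
            break
        j += 1
    return sig
```

`ks_significance(.04, 100)` reproduces the running sums above (1.452298, then .8962234, and so on) and converges to roughly .997.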

CREATING OUR OWN CHARACTERISTIC DISTRIBUTION FUNCTION

We have determined that the Normal Probability Distribution is generally not a very good model of the distribution of trade profits and losses. Further, none of the more common probability distributions are either. Therefore, we must create a function to model the distribution of our trade profits and losses ourselves. The distribution of the logs of price changes is generally assumed to be of the stable Paretian variety. The distribution of trade P&L's can be regarded as a transformation of the distribution of prices. This transformation occurs as a result of trading techniques such as traders trying to cut their losses and let their profits run.

Hence, the distribution of trade P&L's can also be regarded as of the stable Paretian variety. What we are about to study, however, is not the stable Paretian. The stable Paretian, like all other distributional functions, models a specific probability phenomenon: the distribution of sums of independent, identically distributed random variables. The distributional function we are about to study does not model a specific probability phenomenon. Rather, it models other unimodal distributional functions. As such, it can replicate the shape, and therefore the probability densities, of the stable Paretian as well as any other unimodal distribution.

Now we will create this function. To begin with, consider the following equation:

Y = 1/(X^2+1)

This equation graphs as a general bell-shaped curve, symmetric about the Y axis, as shown in the figure.

Figure  LOC = 0 SCALE = 1 SKEW = 0 KURT = 2.

We will thus build from this general equation. The variable X can be thought of as the number of standard units we are either side of the mean, or Y axis. We can affect the first moment of this "distribution," the location, by adding a value to represent a change in location to X. Thus, the equation becomes:

Y = 1/((X-LOC)^2+1)

where,

Y = The ordinate of the characteristic function.
X = The standard value amount.
LOC = A variable representing the location, the first moment of the distribution.

Thus, if we wanted to alter location by moving it to the left by 1/2 of a standard unit, we would set LOC to -.5.

Figure  LOC =-.5 SCALE = 1 SKEW = 0 KURT = 2

Likewise, if we wanted to shift location to the right, we would use a positive value for the LOC variable. Keeping LOC at zero will result in no shift in location.

The exponent in the denominator affects kurtosis. Thus far, we have seen the distribution with the kurtosis set to a value of 2, but we can control the kurtosis of the distribution by changing the value of the exponent. This alters our characteristic function, which now appears as:

Y = 1/((X-LOC)^KURT+1)

where,

Y = The ordinate of the characteristic function.
X = The standard value amount.
LOC = A variable representing the location, the first moment of the distribution.
KURT = A variable representing kurtosis, the fourth moment of the distribution.

Figures demonstrate the effect of the kurtosis variable on our characteristic function. Note that the higher the exponent the more flat topped and thin-tailed the distribution (platykurtic), and the lower the exponent, the more pointed the peak and thicker the tails of the distribution (leptokurtic).

Figure  LOC = 0 SCALE = 1 SKEW =0 KURT = 3.

Figure  LOC = 0 SCALE = 1 SKEW = 0 KURT = 1

So that we do not run into problems with imaginary numbers (a negative base raised to a fractional exponent) when KURT < 1, we will use the absolute value of the term in the denominator. This does not affect the shape of the curve. Thus, we can rewrite the equation as:

Y = 1/(ABS(X-LOC)^KURT+1)

We can put a multiplier on the coefficient in the denominator to allow us to control the scale, the second moment of the distribution. Thus, our characteristic function has now become:

Y = 1/(ABS((X-LOC)*SCALE) ^ KURT+1)

where,

Y = The ordinate of the characteristic function.
X = The standard value amount.
LOC = A variable representing the location, the first moment of the distribution.
SCALE = A variable representing the scale, the second moment of the distribution.
KURT = A variable representing kurtosis, the fourth moment of the distribution.

The figures demonstrate the effect of the scale parameter, which can be thought of as moving the horizontal axis up or down on the distribution. When SCALE is less than 1, the effect is as though the horizontal axis were moved up and the distribution curve enlarged, so that we are looking at the "cap" of the distribution. When SCALE is greater than 1, the effect is as though the horizontal axis were moved down and the distribution curve shrunken.

Figure  LOC = 0 SCALE = .5 SKEW = 0 KURT = 2.

Figure  LOC = 0 SCALE = 2 SKEW = 0 KURT = 2.

We now have a characteristic function for a distribution in which we have complete control over three of the first four moments. Presently, the distribution is symmetric about the location. What we now need is to incorporate a variable for skewness, the third moment of the distribution, into this function. To account for skewness, we must amend our function further. Our characteristic function has now evolved to:

Y = (1/(ABS((X-LOC)*SCALE)^KURT+1))^C

where,

C = The exponent for skewness, calculated as:
C = (1 + ABS(SKEW) ^ ABS(1/(X - LOC)) * sign(X) * -sign(SKEW)) ^ .5
Y = The ordinate of the characteristic function.
X = The standard value amount.
LOC = A variable representing the location, the first moment of the distribution.
SCALE = A variable representing the scale, the second moment of the distribution.
SKEW = A variable representing the skewness, the third moment of the distribution.
KURT = A variable representing kurtosis, the fourth moment of the distribution.
sign() = The sign function, equal to 1 or -1. The sign of X is calculated as X/ABS(X) for X not equal to 0. If X is equal to zero, the sign should be regarded as positive.
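The full characteristic function, skew exponent included, can be coded directly. The following Python sketch is my own illustration (names hypothetical); note that C involves 1/(X - LOC), so the point X = LOC is handled separately (it is the peak, where Y = 1 regardless of C):

```python
def sign(x):
    # sign function: 1 or -1; the sign of zero is regarded as positive
    return 1.0 if x >= 0 else -1.0

def char_func(x, loc=0.0, scale=1.0, skew=0.0, kurt=2.0):
    """Ordinate Y of the adjustable distribution at standard value x."""
    if x == loc:
        return 1.0  # the peak of the curve; C is irrelevant here since the base is 1
    c = (1 + abs(skew) ** abs(1 / (x - loc)) * sign(x) * -sign(skew)) ** 0.5
    return (1 / (abs((x - loc) * scale) ** kurt + 1)) ** c
```

With LOC = 0, SCALE = 1, SKEW = 0, and KURT = 2, this reduces to the original Y = 1/(X^2 + 1).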

Figure  LOC = 0 SCALE = 1 SKEW = -.5 KURT = 2.

Figure  LOC = 0 SCALE = 1 SKEW = +.5 KURT = 2.

A few important notes on the four parameters LOC, SCALE, SKEW, and KURT are in order. With the exception of the variable LOC, the other three variables are nondimensional; that is, their values are pure numbers that have meaning only in a relative context, characterizing the shape of the distribution, and are relevant only to this distribution. Furthermore, the parameter values are not the same values you would get if you employed any of the standard measuring techniques detailed in "Descriptive Measures of Distributions".

For instance, if you determined one of Pearson's coefficients of skewness on a set of data, it would not be the same value that you would use for the variable SKEW in the adjustable distribution here. The values for the four variables are unique to our distribution and have meaning only in a relative context. Also of importance is the range that the variables can take. The SCALE variable must always be positive with no upper bound, and likewise with KURT. In application, though, you will generally use values between .5 and 3, and in extreme cases between .05 and 5. However, you can use values beyond these extremes, so long as they are greater than zero.

The LOC variable can be positive, negative, or zero. The SKEW parameter must be greater than or equal to -1 and less than or equal to +1. When SKEW equals +1, the entire right side of the distribution (right of the peak) is equal to the peak, and vice versa when SKEW equals -1. The ranges on the variables are summarized as:

-infinity < LOC < +infinity
SCALE > 0
-1 <= SKEW <= +1
KURT > 0

FITTING THE PARAMETERS OF THE DISTRIBUTION

Just as with the process for finding our optimal f on the Normal Distribution, we must convert our raw trades data over to standard units. We do this by first subtracting the mean from each trade, then dividing by the population standard deviation. From this point forward, we will be working with the data in standard units rather than in its raw form. After we have our trades in standard values, we can sort them in ascending order. With our trades data arranged this way, we will be able to perform the K-S test on it.
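As a small illustrative sketch (my own code, not the author's), the standardization step looks like this in Python:

```python
import math

def standardize(trades):
    """Convert raw trade P&Ls to sorted standard values.

    Subtract the mean from each trade, divide by the population
    standard deviation, then sort ascending for the K-S test.
    """
    n = len(trades)
    mean = sum(trades) / n
    var = sum((t - mean) ** 2 for t in trades) / n   # population variance
    sd = math.sqrt(var)
    return sorted((t - mean) / sd for t in trades)
```

Note the population (divide by N) rather than sample standard deviation, per the text.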

Our objective now is to find the values of LOC, SCALE, SKEW, and KURT that best fit our actual trades distribution. To determine this "best fit" we rely on the K-S test. We estimate the parameter values by employing the "twentieth-century brute force technique." We run every combination for KURT from 3 to .5 by -.1, and every combination for SCALE from 3 to .5 by -.1. For the time being we leave LOC and SKEW at 0. Thus, we are going to run the following combinations:

LOC    SCALE              SKEW    KURT
0      3 to .5 by -.1     0       3 to .5 by -.1

We perform the K-S test for each combination. The combination that results in the lowest K-S statistic we assume to be our optimal best-fitting parameter values for SCALE and KURT. To perform the K-S test for each combination, we need both the actual distribution and the theoretical distribution. We have already seen how to construct the actual cumulative density as X/N, where N is the total number of trades and X is the ranking of a given trade. Now we need to calculate the CDF for our theoretical distribution for the given LOC, SCALE, SKEW, and KURT parameter values we are presently looping through. We have the characteristic function for our adjustable distribution. To obtain a CDF from a distribution's characteristic function we must find the integral of the characteristic function. We define the integral, the percentage of area under the characteristic function at point X, as N(X).

Thus, since the characteristic function gives us the first derivative of this integral, we refer to it as N'(X). Often you may not be able to derive the integral of a function, even if you are proficient in calculus. Therefore, rather than determining the integral analytically, we are going to rely on a different technique, one that, although a bit more labor intensive, is hardier. The respective probabilities can always be estimated for any point on the function's characteristic line by treating the distribution as a series of many bars. Then, for any given bar on the distribution, you can calculate the probability associated with that bar by taking the sum of the areas of all those bars to the left of your bar, including your bar, and dividing it by the sum of the areas of all the bars in the distribution. The more bars you use, the more accurate your estimated probabilities will be.

If you could use an infinite number of bars, your estimate would be exact. We now discuss the procedure for finding the areas under our adjustable distribution by way of an example. Assume we wish to find the probabilities associated with every .1 increment in standard values from -3 to +3 sigmas of our adjustable distribution. Notice that our table starts at -5 standard units and ends at +5 standard units, the reason being that you should begin and end at least 2 sigmas beyond the bounding values to get more accurate results. Therefore, we begin our table at -5 sigmas and end it at +5 sigmas. Notice that X represents the number of standard units that we are away from the mean. This is then followed by the four parameter values. The next column is the N'(X) column, the height of the curve at point X given these parameter values, calculated from the characteristic function.

Assume that we want to calculate N'(X) for X at -3, with parameter values of .02, 2.76, 0, and 1.78 for LOC, SCALE, SKEW, and KURT respectively. First, we calculate the exponent of skewness, C, as:

C = (1 + ABS(SKEW) ^ ABS(1/(X - LOC)) * sign(X) * -sign(SKEW)) ^ .5
= (1 + (ABS(0) ^ ABS(1/(-3 - .02)) * -1 * -1)) ^ .5
= (1 + 0) ^ .5
= 1

Thus, substituting 1 for C in the characteristic function:

Y = (1/(ABS((X - LOC) * SCALE) ^ KURT + 1)) ^ C
= (1/(ABS((-3 - .02) * 2.76) ^ 1.78 + 1)) ^ 1
= (1/((3.02 * 2.76) ^ 1.78 + 1)) ^ 1
= (1/(8.3352 ^ 1.78 + 1)) ^ 1
= (1/(43.57431058 + 1)) ^ 1
= (1/44.57431058) ^ 1
= .02243444681 ^ 1
= .02243444681

Thus, at the point X = -3, the N'(X) value is .02243444681. (Notice that we calculate an N'(X) value corresponding to every value of X.) The next step, the next column, is the running sum of the N'(X)'s as we advance up through the X's. This is straightforward enough. Now we calculate the N(X) column, the resultant probabilities associated with each value of X for the given parameter values. To do this, we apply the following equation:

N(C) = ((∑[i = 1, C] N'(Xi) + ∑[i = 1, C-1] N'(Xi)) / 2) / ∑[i = 1, M] N'(Xi)

where,

C = The current X value.
M = The total count of X values.

The equation says, literally, to add the running sum at the current value of X to the running sum at the previous value of X as we advance up through the X's. Now divide this sum by 2. Then take the new quotient and divide it by the last value in the column of the running sum of the N'(X)'s. This gives us the resultant probability for a given value of X, for given parameter values. Thus, for the value of -3 for X, the running sum of the N'(X)'s at -3 is .302225586, and the previous X, -3.1, has a running sum value of .2797911392. Summing these two running sums together gives us .5820167252. Dividing this by 2 gives us .2910083626. Then dividing this by the last value in the running sum column, the total of all of the N'(X)'s, 11.8535923812, gives us a quotient of .02455022522.
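The whole table construction (the N'(X) column, the running sum, and N(X)) can be expressed compactly. The Python below is a sketch under the assumptions of the example (grid from -5 to +5 by .1); the function names are my own:

```python
def _sign(v):
    # sign function; the sign of zero is regarded as positive
    return 1.0 if v >= 0 else -1.0

def n_prime_adj(x, loc, scale, skew, kurt):
    # N'(X): the adjustable distribution's characteristic function
    if x == loc:
        return 1.0  # peak of the curve; the skew exponent C is moot here
    c = (1 + abs(skew) ** abs(1 / (x - loc)) * _sign(x) * -_sign(skew)) ** 0.5
    return (1 / (abs((x - loc) * scale) ** kurt + 1)) ** c

def adjustable_cdf(loc, scale, skew, kurt, lo=-5.0, hi=5.0, step=0.1):
    """Return (xs, probs): the grid of standard values and the N(X)
    probabilities per the running-sum bar method."""
    m = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(m)]
    running, total = [], 0.0
    for x in xs:
        total += n_prime_adj(x, loc, scale, skew, kurt)
        running.append(total)
    probs = [(running[i] + (running[i - 1] if i else 0.0)) / 2 / total
             for i in range(m)]
    return xs, probs
```

With the example's parameters (LOC = .02, SCALE = 2.76, SKEW = 0, KURT = 1.78) this reproduces a probability of roughly .02455 at X = -3.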

This is the associated probability, N(X), at the standard value of X = -3. Once we have constructed cumulative probabilities for each trade in the actual distribution and probabilities for each standard value increment in our adjustable distribution, we can perform the K-S test for the parameter values we are currently using. Before we do, however, we must make adjustments for a couple of other preliminary considerations. In the example of the table of cumulative probabilities shown earlier for our adjustable distribution, we calculated probabilities at every .1 increment in standard values. This was for the sake of simplicity. In practice, you can obtain a greater degree of accuracy by using a smaller step increment. I find that using .01 standard values is a good step increment.

A word on how to determine your bounding values in actual practice, that is, how many sigmas either side of the mean you should go in determining your probabilities for our adjustable distribution. In our example we were using 3 sigmas either side of the mean, but in reality you must use the absolute value of the farthest point from the mean. For our 232-trade example, the extreme left standard value is -2.96 standard units and the extreme right is 6.935321 standard units. Since 6.935321 is greater than ABS(-2.96), we must take the 6.935321. Now, we add at least 2 sigmas to this value, for the sake of accuracy, and construct probabilities for a distribution from -8.94 to +8.94 sigmas. Since we want a good deal of accuracy, we will use a step increment of .01. Therefore, we will figure probabilities for standard values of:

-8.94
-8.93
-8.92
-8.91
...
+8.94

Now, the last thing we must do before we can actually perform our K-S statistic is to round the actual standard values of the sorted trades to the nearest .01. For example, the value 6.935321 will not have a corresponding theoretical probability associated with it, since it is in between the step values 6.93 and 6.94. Since 6.94 is closer to 6.935321, we round 6.935321 to 6.94. Before we can begin the procedure of optimizing our adjustable distribution parameters to the actual distribution by employing the K-S test, we must round our actual sorted standardized trades to the nearest step increment.

In lieu of rounding the standard values of the trades to the nearest Xth decimal place you can use linear interpolation on your table of cumulative probabilities to derive probabilities corresponding to the actual standard values of the trades. For more on linear interpolation, consult a good statistics book, such as some of the ones suggested in the bibliography or Commodity Market Money Management by Fred Gehm. Thus far, we have been optimizing only for the best-fitting KURT and SCALE values. Logically, it would seem that if we standardized our data, as we have, then the LOC parameter should be kept at 0 and the SCALE parameter should be kept at 1. This is not necessarily true, as the true location of the distribution may not be the arithmetic mean, and the true optimal value for scale may not be at 1.

The KURT and SCALE values have a very strong relationship to one another. Thus, we first try to isolate the "neighborhood" of best-fitting parameter values for KURT and SCALE. For our 232 trades this occurs at SCALE equal to 2.7 and KURT equal to 1.9. Now we progressively try to zero in on the best-fitting parameter values. This is a computer-time-intensive process. We run our next pass through, cycling the LOC parameter from .1 to -.1 by -.05, the SCALE parameter from 2.6 to 2.8 by .05, the SKEW parameter from .1 to -.1 by -.05, and the KURT parameter from 1.86 to 1.92 by .02. The results of this cycle through give the optimal at LOC = 0, SCALE = 2.8, SKEW = 0, and KURT = 1.86. Thus we perform a third cycle through.

This time we run LOC from .04 to -.04 by -.02, SCALE from 2.76 to 2.82 by .02, SKEW from .04 to -.04 by -.02, and KURT from 1.8 to 1.9 by .02. The results of the third cycle through show optimal values at LOC = .02, SCALE = 2.76, SKEW = 0, and KURT = 1.8. Now we have zeroed right in on the optimal neighborhood, the area where the parameters make for the best fit of our adjustable characteristic function to the actual data. For our last cycle through we are going to run LOC from 0 to .03 by .01, SCALE from 2.76 to 2.73 by -.01, SKEW from .01 to -.01 by -.01, and KURT from 1.8 to 1.75 by -.01. The results of this final pass show optimal parameters for our 232 trades at LOC = .02, SCALE = 2.76, SKEW = 0, and KURT = 1.78.
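The cycling procedure is straightforward to mechanize. The sketch below compresses the first pass (SCALE and KURT each from 3 down to .5 by .1, LOC = SKEW = 0) into one function. The helper names and the rounding of trades to the .1 grid are my own illustration, not the author's program:

```python
def _sign(v):
    return 1.0 if v >= 0 else -1.0

def _char(x, loc, scale, skew, kurt):
    # N'(X), the adjustable distribution's characteristic function
    if x == loc:
        return 1.0
    c = (1 + abs(skew) ** abs(1 / (x - loc)) * _sign(x) * -_sign(skew)) ** 0.5
    return (1 / (abs((x - loc) * scale) ** kurt + 1)) ** c

def _cdf_table(loc, scale, skew, kurt, lo=-5.0, hi=5.0, step=0.1):
    # N(X) probabilities per the running-sum bar method
    m = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(m)]
    run, tot = [], 0.0
    for x in xs:
        tot += _char(x, loc, scale, skew, kurt)
        run.append(tot)
    return xs, [(run[i] + (run[i - 1] if i else 0.0)) / 2 / tot for i in range(m)]

def fit_scale_kurt(sorted_std_trades):
    """First brute-force pass: try every SCALE and KURT from 3 to .5 by -.1
    with LOC = SKEW = 0, keeping the pair with the lowest K-S statistic."""
    n = len(sorted_std_trades)
    grid = [round(3.0 - 0.1 * i, 1) for i in range(26)]   # 3.0, 2.9, ..., 0.5
    best = (float("inf"), None, None)
    for scale in grid:
        for kurt in grid:
            xs, probs = _cdf_table(0.0, scale, 0.0, kurt)
            d = 0.0
            for i, t in enumerate(sorted_std_trades):
                xq = min(max(round(t, 1), xs[0]), xs[-1])    # snap to the grid
                theo = probs[int(round((xq - xs[0]) / 0.1))]
                d = max(d, abs((i + 1) / n - theo), abs(i / n - theo))
            if d < best[0]:
                best = (d, scale, kurt)
    return best   # (lowest D, best SCALE, best KURT)
```

The later, finer passes simply repeat the same loop over narrower ranges of all four parameters.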


## Parametric Optimal f on the Normal Distribution- FURTHER DERIVATIVES OF THE NORMAL

Sometimes you may want to know the second derivative of the N(Z) function. Since the N(Z) function gives us the area under the curve at Z, and the N'(Z) function gives us the height of the curve itself at Z, the N"(Z) function gives us the instantaneous slope of the N'(Z) curve at a given Z:

N"(Z) = -Z/2.506628274 * EXP(-(Z^2)/2)

where,

EXP() = The exponential function.

To determine what the slope of the N'(Z) curve is at +2 standard units:

N"(Z) = -2/2.506628274 * EXP(-(+2^2)/2)
= -2/2.506628274 * EXP(-2)
= -2/2.506628274 * .1353353
= -.1079819

Therefore, we can state that the instantaneous rate of change in the N'(Z) function when Z = +2 is -.1079819. This represents rise/run, so we can say that when Z = +2, the N'(Z) curve is falling .1079819 for every 1 unit run in Z.

Figure  N"(Z) giving the slope of the line tangent to N'(Z) at Z = +2.

For the reader's own reference, further derivatives are now given. These will not be needed throughout the remainder of this text, but are provided for the sake of completeness:

N'''(Z) = (Z^2 - 1)/2.506628274 * EXP(-(Z^2)/2)
N''''(Z) = ((3*Z) - Z^3)/2.506628274 * EXP(-(Z^2)/2)
N'''''(Z) = (Z^4 - (6*Z^2) + 3)/2.506628274 * EXP(-(Z^2)/2)
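These derivatives translate directly into code. A small Python sketch (the function names are mine; 2.506628274 is the square root of 2 pi, as used throughout the text):

```python
import math

SQRT_2PI = 2.506628274   # (2 * pi) ^ .5

def n1(z):
    """N'(Z): the height of the Normal curve at Z."""
    return math.exp(-(z * z) / 2) / SQRT_2PI

def n2(z):
    """N''(Z): the slope of the N'(Z) curve at Z."""
    return -z / SQRT_2PI * math.exp(-(z * z) / 2)

def n3(z):
    """N'''(Z)."""
    return (z * z - 1) / SQRT_2PI * math.exp(-(z * z) / 2)

def n4(z):
    """N''''(Z)."""
    return (3 * z - z ** 3) / SQRT_2PI * math.exp(-(z * z) / 2)

def n5(z):
    """N'''''(Z)."""
    return (z ** 4 - 6 * z * z + 3) / SQRT_2PI * math.exp(-(z * z) / 2)
```

For instance, `n2(2)` gives the slope of roughly -.108 worked out in the example above.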

As a final note regarding the Normal Distribution, you should be aware that the distribution is nowhere near as "peaked" as the graphic examples suggest. The real shape of the Normal Distribution is depicted in the figure.

Figure  The real shape of the Normal Distribution.

Notice that here the scales of the two axes are the same, whereas in the other graphic examples they differ so as to exaggerate the shape of the distribution.

THE LOGNORMAL DISTRIBUTION

Many of the real-world applications in trading require a small but crucial modification to the Normal Distribution, changing it to what is known as the Lognormal Distribution. Consider that the price of any freely traded item has zero as a lower limit. Therefore, as the price of an item drops and approaches zero, it should in theory become progressively more difficult for the item to get lower. For example, consider the price of a hypothetical stock at \$10 per share.

If the stock were to drop \$5, to \$5 per share, a 50% loss, then according to the Normal Distribution it could just as easily drop from \$5 to \$0. However, under the Lognormal, a similar drop of 50%, from a price of \$5 per share to \$2.50 per share, would be about as probable as the drop from \$10 to \$5 per share. The Lognormal Distribution works exactly like the Normal Distribution except that with the Lognormal we are dealing with percentage changes rather than absolute changes.

Figure  The Normal and Lognormal distributions.

Consider now the upside. According to the Lognormal, a move from \$10 per share to \$20 per share is about as likely as a move from \$5 to \$10 per share, as both moves represent a 100% gain. That isn't to say that we won't be using the Normal Distribution. The purpose here is to introduce you to the Lognormal, show you its relationship to the Normal, and point out that it usually is used when talking about price moves, or anytime the Normal would apply but be bounded on the low end at zero.

To use the Lognormal distribution, you simply convert the data you are working with to natural logarithms. The converted data will be Normally distributed if the raw data was Lognormally distributed. For instance, if we are discussing the distribution of price changes as being Lognormal, we can use the Normal distribution on it. First, we must divide each closing price by the previous closing price. Suppose in this instance we are looking at the distribution of monthly closing prices. Suppose we now see \$10, \$5, \$10, \$10, then \$20 per share as our first five months' closing prices.

This would then equate to a loss of 50% going into the second month, a gain of 100% going into the third month, a gain of 0% going into the fourth month, and another gain of 100% into the fifth month. Respectively, then, we have quotients of .5, 2, 1, and 2 for the monthly price changes of months 2 through 5. These are the same as HPRs from one month to the next in succession. We must now convert them to natural logarithms in order to study their distribution under the math for the Normal Distribution. Thus, the natural log of .5 is -.6931472, of 2 it is .6931472, and of 1 it is 0. We are now able to apply the mathematics pertaining to the Normal distribution to this converted data.
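The conversion chain (prices, to quotients, to natural logs) is a one-liner at each step. A small Python sketch of the example above:

```python
import math

# The hypothetical monthly closing prices from the example
prices = [10.0, 5.0, 10.0, 10.0, 20.0]

# Quotient of each close to the previous close (the month-to-month HPRs)
quotients = [curr / prev for prev, curr in zip(prices, prices[1:])]

# Natural logs of the quotients; these are Normally distributed
# if the raw price changes are Lognormally distributed
logs = [math.log(q) for q in quotients]
```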

THE PARAMETRIC OPTIMAL F

Now that we have studied the mathematics of the Normal and Lognormal distributions, we will see how to determine an optimal f based on outcomes that are Normally distributed. The Kelly formula is an example of a parametric optimal f in that the optimal f returned is a function of two parameters. In the Kelly formula the input parameters are the percentage of winning bets and the payoff ratio. However, the Kelly formula only gives you the optimal f when the possible outcomes have a Bernoulli distribution. In other words, the Kelly formula will only give the correct optimal f when there are only two possible outcomes. When the outcomes do not have a Bernoulli distribution, such as Normally distributed outcomes, the Kelly formula will not give you the correct optimal f.

When they are applicable, parametric techniques are far more powerful than their empirical counterparts. Assume we have a situation that can be described completely by the Bernoulli distribution. We can derive our optimal f here by way of either the Kelly formula or the empirical technique detailed in Portfolio Management Formulas. Suppose we are tossing a coin that is biased such that, in the long run, 60% of the tosses will be heads. We are therefore going to bet that each toss will be heads, and the payoff is 1:1. The Kelly formula would tell us to bet a fraction of .2 of our stake on the next bet. Further suppose that of the last 20 tosses, 11 were heads and 9 were tails.

If we were to use these last 20 trades as the input into the empirical techniques, the result would be that we should risk .1 of our stake on the next bet. Which is correct, the .2 returned by the parametric technique or the .1 returned empirically by the last 20 tosses? The correct answer is .2, the answer returned from the parametric technique. The reason is that the next toss has a 60% probability of being heads, not a 55% probability as the last 20 tosses would indicate. Although we are only discussing a 5% probability difference, 1 toss in 20, the effect on how much we should bet is dramatic. Generally, the parametric techniques are inherently more accurate in this regard than are their empirical counterparts. This is the first advantage of the parametric to the empirical.
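The contrast between the two answers is easy to reproduce. A small sketch (assuming the familiar Kelly formula f = ((b + 1) * p - 1) / b, for payoff ratio b and win probability p):

```python
def kelly_f(p, b=1.0):
    """Kelly optimal fraction for Bernoulli outcomes:
    p = probability of winning, b = payoff ratio (1.0 for even money)."""
    return ((b + 1) * p - 1) / b
```

`kelly_f(.6)` gives the parametric answer of .2, while the same formula fed the empirical 55% win rate of the last 20 tosses, `kelly_f(11 / 20)`, reproduces the .1.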

This is also a critical proviso-that we must know what the distribution of outcomes is in the long run in order to use the parametric techniques. This is the biggest drawback to using the parametric techniques. The second advantage is that the empirical technique requires a past history of outcomes whereas the parametric does not. Further, this past history needs to be rather extensive. In the example just cited, we can assume that if we had a history of 50 tosses we would have arrived at an empirical optimal f closer to .2. With a history of 1,000 tosses, it would be even closer according to the law of averages. The fact that the empirical techniques require a rather lengthy stream of past data has almost restricted them to mechanical trading systems.

Someone trading anything other than a mechanical trading system, be it by Elliott Wave or fundamentals, has almost been shut out from using the optimal f technique. With the parametric techniques this is no longer true. Someone who wishes to blindly follow some market guru, for instance, now has a way to employ the power of optimal f. Therein lies the third advantage of the parametric technique over the empirical-it can be used by any trader in any market. There is a big assumption here, however, for someone not employing a mechanical trading system. The assumption is that the future distribution of profits and losses will resemble the distribution in the past. This may be less likely than with a mechanical system. This also sheds new light on the expected performance of any technique that is not purely mechanical.

Even the best practitioners of such techniques, be it by fundamentals, Gann, Elliott Wave, and so on, are doomed to fail if they are too far beyond the peak of the f curve. If they are too far to the left of the peak, they are going to end up with geometrically lower profits than their expertise in their area should have made for them. Furthermore, practitioners of techniques that are not purely mechanical must realize that everything said about optimal f and the purely mechanical techniques applies. This should be considered when contemplating expected drawdowns of such techniques. Remember that the drawdowns will be substantial, and this fact does not mean that the technique should be abandoned. The fourth and perhaps the biggest advantage of the parametric over the empirical method of determining optimal f, is that the parametric method allows you to do "What if" types of modeling.

For example, suppose you are trading a market system that has been running very hot. You want to be prepared for when that market system stops performing so well, as you know it inevitably will. With the parametric techniques, you can vary your input parameters to reflect this and thereby position yourself at what the optimal f will be when the market system cools down to the state that the parameters you input reflect. The parametric techniques are therefore far more powerful than the empirical ones. So why use the empirical techniques at all? The empirical techniques are more intuitively obvious than the parametric ones are. Hence, the empirical techniques are what one should learn first before moving on to the parametric. We have now covered the empirical techniques in detail and are therefore prepared to study the parametric techniques.

Consider the following sequence of 232 trade profits and losses in points. It doesn't matter what the commodity is or what system generated this stream-it could be any system on any market.

If we wanted to determine an equalized parametric optimal f we would now convert these trade profits and losses to percentage gains and losses. Next, we would convert these percentage profits and losses by multiplying them by the current price of the underlying instrument. For example, P&L #1 is .18. Suppose that the entry price to this trade was 100.50. Thus, the percentage gain on this trade would be .18/100.50 = .001791044776. Now suppose that the current price of this underlying instrument is 112.00.

Multiplying .001791044776 by 112.00 translates into an equalized P&L of .2005970149. If we were seeking to do this procedure on an equalized basis, we would perform this operation on all 232 trade profits and losses. Whether or not we are going to perform our calculations on an equalized basis, we must now calculate the arithmetic mean and population standard deviation of these 232 individual trade profits and losses, which are .330129 and 1.743232 respectively. With these two numbers we can use the following equation to translate each individual trade profit and loss into standard units.

Z = (X-U)/S

where,

U = The mean of the data.
S = The standard deviation of the data.
X = The observed data point.

Thus, to translate trade #1, a profit of .18, to standard units:

Z = (.18-.330129)/1.743232 = -.150129/1.743232 = -.08612106708

Likewise, the next three trades of -1.11, .42, and -.83 translate into -.8261258398, .05155423948, and -.6655046488 standard units respectively. If we are using equalized data, we simply standardize by subtracting the mean of the data and dividing by the data's standard deviation. Once we have converted all of our individual trade profits and losses over to standard units, we can bin the now standardized data. Recall that with binning there is a loss of information content about a particular distribution but the character of the distribution remains unchanged. Suppose we were to now take these 232 individual trades and place them into 10 bins.
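As a sketch, the standardization step can be written as follows, using the mean and standard deviation quoted above:

```python
# Standardizing trade P&L's per Z = (X - U) / S, using the mean
# (.330129) and population standard deviation (1.743232) computed
# from the 232 trades in the text.
MEAN = 0.330129
STDEV = 1.743232

def standardize(x, u=MEAN, s=STDEV):
    """Translate a trade P&L into standard units."""
    return (x - u) / s

# The first four trades from the example:
z_scores = [standardize(t) for t in (0.18, -1.11, 0.42, -0.83)]
```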

We are choosing arbitrarily here-we could have chosen 9 bins or 50 bins. In fact, one of the big arguments about binning data is that most frequently there is considerable arbitrariness as to how the bins should be chosen. Whenever we bin something, we must decide on the ranges of the bins. We will therefore select a range of -2 to +2 sigmas, or standard deviations. This means we will have 10 equally spaced bins between -2 standard units to +2 standard units. Since there are 4 standard units in total between -2 and +2 standard units and we are dividing this space into 10 equal regions, we have 4/10 = .4 standard units as the size or "width" of each bin.

Therefore, our first bin, the one "farthest to the left," will contain those trades that were within -2 to -1.6 standard units, the next one trades from -1.6 to -1.2, then -1.2 to -.8, and so on, until our final bin contains those trades that were 1.6 to 2 standard units. Those trades that are less than -2 standard units or greater than +2 standard units will not be binned in this exercise, and we will ignore them. If we so desired, we could have included them in the extreme bins, placing those data points less than -2 in the -2 to -1.6 bin, and likewise for those data points greater than 2. Of course, we could have chosen a wider range for binning, but since these trades are beyond the range of our bins, we have chosen not to include them.

In other words, we are eliminating from this exercise those trades with P&L's less than .330129- (1.743232*2) = -3.156335 or greater than .330129+(1.743232*2) = 3.816593. What we have created now is a distribution of this system's trade P&L's. Our distribution contains 10 data points because we chose to work with 10 bins. Each data point represents the number of trades that fell into that bin. Each trade could not fall into more than 1 bin, and if the trade was beyond 2 standard units either side of the mean (P&L's<-3.156335 or >3.816593), then it is not represented in this distribution.
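A minimal sketch of this binning step, using the bin boundaries and discard rule just described:

```python
# Place standardized trades into 10 equal-width bins spanning -2 to +2
# sigma; values outside that range are simply discarded, as in the text.
def bin_trades(z_scores, lo=-2.0, hi=2.0, n_bins=10):
    width = (hi - lo) / n_bins        # 4 / 10 = .4 sigma per bin
    counts = [0] * n_bins
    for z in z_scores:
        if lo <= z < hi:
            counts[int((z - lo) / width)] += 1
    return counts
```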

Figure: 232 individual trades in 10 bins from -2 to +2 sigma versus the Normal Distribution.

"Wait a minute," you say. "Shouldn't the distribution of a trading system's P&L's be skewed to the right because we are probably going to have a few large profits?" This particular distribution of 232 trade P&L's happens to be from a system that very often takes small profits via a target. Many people have the mistaken impression that P&L distributions are going to be skewed to the right for all trading systems. Different market systems will have different distributions, and you shouldn't expect them all to be the same.

Also in the figure, superimposed over the distribution we have just put together, is the Normal Distribution as it would look for 232 trade P&L's if they were Normally distributed. This was done so that you can compare, graphically, the trade P&L's as we have just calculated them to the Normal. The Normal Distribution here is calculated by first taking the boundaries of each bin. For the leftmost bin in our example this would be Z = -2 and Z = -1.6. Now we run these Z values through the equation for the cumulative Normal to convert these boundaries to a cumulative probability. In our example, this corresponds to .02275 for Z = -2 and .05479932 for Z = -1.6.

Next, we take the absolute value of the difference between these two values, which gives us ABS(.02275-.05479932) = .03204932 for our example. Last, we multiply this answer by the number of data points, which in this case is 232 because there are 232 total trades. Therefore, we can state that if the data were Normally distributed and placed into 10 bins of equal width between -2 and +2 sigmas, then the leftmost bin would contain .03204932*232 = 7.43544224 elements. If we were to calculate this for each of the 10 bins, we would calculate the Normal curve superimposed.
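The expected bin counts under Normality can be sketched as follows; the exact CDF via `math.erf` stands in here for the polynomial approximation given later in the chapter:

```python
import math

# Expected number of the 232 trades per bin if they were Normally
# distributed: |CDF(lower) - CDF(upper)| times the trade count.
def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_count(z_lo, z_hi, n=232):
    return abs(normal_cdf(z_lo) - normal_cdf(z_hi)) * n

# Leftmost bin, -2 to -1.6 sigma:
leftmost = expected_count(-2.0, -1.6)
```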

FINDING OPTIMAL F ON THE NORMAL DISTRIBUTION

Now we can construct a technique for finding the optimal f on Normally distributed data. Like the Kelly formula, this will be a parametric technique. However, this technique is far more powerful than the Kelly formula, because the Kelly formula allows for only two possible outcomes for an event whereas this technique allows for the full spectrum of the outcomes. The beauty of Normally distributed outcomes is that they can be described by 2 parameters. The Kelly formulas will give you the optimal f for Bernoulli distributed outcomes by inputting the 2 parameters of the payoff ratio and the probability of winning. The technique about to be described likewise only needs two parameters as input, the average and the standard deviation of the outcomes, to return the optimal f. Recall that the Normal Distribution is a continuous distribution. In order to use this technique we need to make this distribution discrete.

Further recall that the Normal Distribution is unbounded. That is, the distribution runs from minus infinity on the left to plus infinity on the right. Therefore, the first two steps that we must take to find the optimal f on Normally distributed data is that we must determine (1) at how many sigmas from the mean of the distribution we truncate the distribution, and (2) into how many equally spaced data points will we divide the range between the two extremes determined in (1). For instance, we know that 99.73% of all the data points will fall between plus and minus 3 sigmas of the mean, so we might decide to use 3 sigmas as our parameter for (1). In other words, we are deciding to consider the Normal Distribution only between minus 3 sigmas and plus 3 sigmas of the mean. In so doing, we will encompass 99.73% of all of the activity under the Normal Distribution. Generally we will want to use a value of 3 to 5 sigmas for this parameter.

Regarding step (2), the number of equally spaced data points, we will generally want to use a bare minimum of ten times the number of sigmas we are using in (1). If we select 3 sigmas for (1), then we should select at least 30 equally spaced data points for (2). This means that we are going to take the horizontal axis of the Normal Distribution, of which we are using the area from minus 3 sigmas to plus 3 sigmas from the mean, and divide that into 30 equally spaced points. Since there are 6 sigmas between minus 3 sigmas and plus 3 sigmas, and we want to divide this into 30 equally spaced points, we must divide 6 by 30-1, or 29. This gives us .2068965517. So, our first data point will be minus 3, and we will add .2068965517 to each previous point until we reach plus 3, at which point we will have created 30 equally spaced data points between minus 3 and plus 3. Therefore, our second data point will be -3+.2068965517 = -2.793103448, our third data point -2.793103448+.2068965517 = -2.586206896, and so on.
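The construction of the equally spaced data points can be sketched as:

```python
# 30 equally spaced standard values from -3 to +3 sigma; the spacing is
# (2 * 3) / (30 - 1) = .2068965517, as computed in the text.
def standard_values(sigmas=3.0, n_points=30):
    step = (2.0 * sigmas) / (n_points - 1)
    return [-sigmas + i * step for i in range(n_points)]

points = standard_values()
```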

In so doing, we will have determined the 30 horizontal input coordinates to this system. The more data points you decide on, the better will be the resolution of the Normal curve. Using ten times the number of sigmas is a rough rule for determining the bare minimum number of data points you should use. Recall that the Normal distribution is a continuous distribution. However, we must make it discrete in order to find the optimal f on it. The greater the number of equally spaced data points we use, the closer our discrete model will be to the actual continuous distribution itself, with the limit of the number of equally spaced data points approaching infinity where the discrete model approaches the continuous exactly. Why not use an extremely large number of data points? The more data points you use in the Normal curve, the more calculations will be required to find the optimal f on it. Even though you will usually be using a computer to solve for the optimal f, it will still be slower the more data points you use.

Further, each data point added resolves the curve further to a lesser degree than the previous data point did. We will refer to these first two input parameters as the bounding parameters. Now, the third and fourth steps are to determine the arithmetic average trade and the population standard deviation for the market system we are working on. If you do not have a mechanical system, you can get these numbers from your brokerage statements or you can estimate them. That is one of the real benefits of this technique: you don't need to have a mechanical system, you don't even need brokerage statements or paper trading results to use this technique. The technique can be used by simply estimating these two inputs, the arithmetic mean average trade and the population standard deviation of trades. Be forewarned, though, that your results will only be as accurate as your estimates.

If you are having difficulty estimating your population standard deviation, then simply try to estimate by how much, on average, a trade will differ from the average trade. By estimating the mean absolute deviation in this way, you can use equation to convert your estimated mean absolute deviation into an estimated standard deviation:

S = M*1/.7978845609 = M*1.253314137

where,

S = The standard deviation.
M = The mean absolute deviation.
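A sketch of this conversion; for the Normal Distribution, .7978845609 is sqrt(2/pi), so dividing the mean absolute deviation by it gives the standard deviation:

```python
import math

# Estimated standard deviation from an estimated mean absolute
# deviation: S = M / sqrt(2/pi) = M * 1.253314137.
def mad_to_stdev(m):
    return m / math.sqrt(2.0 / math.pi)
```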

We will refer to these two parameters, the arithmetic mean average trade and the standard deviation of the trades, as the actual input parameters. Now we want to take all of the equally spaced data points from step (2) and find their corresponding price values, based on the arithmetic mean and standard deviation. Recall that our equally spaced data points are expressed in terms of standard units. Now for each of these equally spaced data points we will find the corresponding price as:

D = U+(S*E)

where,

D = The price value corresponding to a standard unit value.
E = The standard unit value.
S = The population standard deviation.
U = The arithmetic mean.

Once we have determined all of the price values corresponding to each data point we have truly accomplished a great deal. We have now constructed the distribution that we expect the future data points to tend to.

However, this technique allows us to do a lot more than that. We can incorporate two more parameters that will allow us to perform "What if" types of scenarios about the future. These parameters, which we will call the "What if" parameters, allow us to see the effect of a change in our average trade or a change in the dispersion of our trades. The first of these parameters, called shrink, affects the average trade. Shrink is simply a multiplier on our average trade. Recall that when we find the optimal f we also obtain other calculations, which are useful by-products of the optimal f. Such calculations include the geometric mean, TWR, and geometric average trade. Shrink is the factor by which we will multiply our average trade before we perform the optimal f technique on it. Hence, shrink lets us see what the optimal f would be if our average trade were affected by shrink as well as how the other by-product calculations would be affected.

For example, suppose you are trading a system that has been running very hot lately. You know from past experience that the system is likely to stop performing so well in the future. You would like to see what would happen if the average trade were cut in half. By using a shrink value of .5 you can perform the optimal f technique to determine what your optimal f should be if the average trade were to be cut in half. Further, you can see how such changes affect your geometric average trade, and so on. By using a shrink value of 2, you can also see the effect that a doubling of your average trade would have. In other words, the shrink parameter can also be used to increase (unshrink?) your average trade. What's more, it lets you take an unprofitable system, and, by using a negative value for shrink, see what would happen if that system became profitable.

For example, suppose you have a system that shows an average trade of -\$100. If you use a shrink value of -.5, this will give you your optimal f for this distribution as if the average trade were \$50, since -100*-.5 = 50. If we used a shrink factor of -2, we would obtain the distribution centered about an average trade of \$200. You must be careful in using these "What if" parameters, for they make it easy to misjudge future performance. Mention was just made of how you can turn a system with a negative arithmetic average trade into a positive one. This can lead to problems if, for instance, you still have a negative expectation in the future. The other "What if" parameter is one called stretch. This is not, as its name would imply, the opposite of shrink. Rather, stretch is the multiplier to be used on the standard deviation.

You can use this parameter to determine the effect on f and its by-products of an increase or decrease in the dispersion. Unlike shrink, stretch must always be a positive number. If you want to see what will happen if your standard deviation doubles, simply use a value of 2 for stretch. To see what would happen if the dispersion quieted down, use a value less than 1. You will notice in using this technique that lowering the stretch toward zero will tend to increase the by-product calculations, resulting in a more optimistic assessment of the future, and vice versa. Shrink works in an opposite fashion: lowering the shrink toward zero will result in more pessimistic assessments about the future, and vice versa.

Once we have determined what values we want to use for stretch and shrink (and for the time being we will use values of 1 for both, which means to leave the actual parameters unaffected) we can amend equation to:

D = (U*Shrink)+(S*E*Stretch)

where,

D = The price value corresponding to a standard unit value.
E = The standard unit value.
S = The population standard deviation.
U = The arithmetic mean.
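The amended equation can be sketched as a function, using the chapter's arithmetic mean and standard deviation as defaults:

```python
# Associated P&L for a standard value E, per
# D = (U * shrink) + (S * E * stretch); U and S default to the
# chapter's figures of 330.129 and 1743.232.
def associated_pnl(e, u=330.129, s=1743.232, shrink=1.0, stretch=1.0):
    return (u * shrink) + (s * e * stretch)
```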

To summarize thus far, the first two steps are to determine the bounding parameters of the number of sigmas either side of the mean we are going to use, as well as how many equally spaced data points we are going to use within this range. The next two steps are the actual input parameters of the arithmetic average trade and population standard deviation. We can derive these parameters empirically by looking at the results of a given trading system or by using brokerage statements or paper trading results.

We can also derive these figures by estimation, but remember that the results obtained will only be as accurate as your estimates. The fifth and sixth steps are to determine the factors to use for stretch and shrink if you are going to perform a "What if" type of scenario. If you are not, simply use values of 1 for both stretch and shrink. Once you have completed these six steps, you can use the equation to perform the seventh step. The seventh step is to convert the equally spaced data points from standard values to an actual amount of either points or dollars.

Now the eighth step is to find the probability associated with each of the equally spaced data points. This probability is determined by using the equation:

N(Z)=1-N'(Z)*((1.330274429*Y^5)-(1.821255978*Y^4)+(1.781477937*Y^3)- (.356563782*Y^2)+(.31938153*Y))

If Z<0 then N(Z) = 1-N(Z)

where,

Y = 1/(1+.2316419*ABS(Z))
ABS() = The absolute value function.
N'(Z) = .398942*EXP(-(Z^2/2))
EXP() = The exponential function.

However, we will use the equation without its leading 1- term and without the -Z provision (i.e., without the "If Z<0 then N(Z) = 1-N(Z)"), since we want to know what the probabilities are for an event equaling or exceeding a prescribed amount of standard units. So we go along through each of our equally spaced data points. Each point has a standard value, which we will use as the Z parameter in the equation, and a dollar or point amount. Now there will be another variable corresponding to each equally spaced data point: the associated probability.
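A sketch of this modified form in code; with no leading 1- and no sign flip for negative Z, the function returns the probability of equaling or exceeding |Z| standard units:

```python
import math

# N'(Z), the Normal probability density (with the text's constant).
def n_prime(z):
    return 0.398942 * math.exp(-(z * z) / 2.0)

# The polynomial approximation, used without the leading "1 -" and
# without the negative-Z flip, per the text.
def assoc_prob(z):
    y = 1.0 / (1.0 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4
            + 1.781477937 * y**3 - 0.356563782 * y**2
            + 0.31938153 * y)
    return n_prime(z) * poly
```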

THE MECHANICS OF THE PROCEDURE

The procedure will now be demonstrated on the trading example introduced earlier in this chapter. Since our 232 trades are currently in points, we should convert them to their dollar representations. However, since the market is not specified, we will assign an arbitrary value of \$1,000 per point. Thus, the average trade of .330129 now becomes .330129*\$1000, or an average trade of \$330.13. Likewise the population standard deviation of 1.743232 is also multiplied by \$1,000 per point to give \$1,743.23. Now we construct the matrix. First, we must determine the range, in sigmas from the mean, that we want our calculations to encompass. For our example we will choose 3 sigmas, so our range will go from minus 3 sigmas to plus 3 sigmas.

Note that you should use the same amount to the left of the mean that you use to the right of the mean. That is, if you go 3 sigmas to the left then you should not go only 2 or 4 sigmas to the right, but rather you should go 3 sigmas to the right as well. Next we must determine how many equally spaced data points to divide this range into. Choosing 61 as our value gives a data point at every tenth of a standard unit-simple. Thus we can determine our column of standard values. Now we must determine the arithmetic mean that we are going to use as input. We determine this empirically from the 232 trades as \$330.13. Further, we must determine the population standard deviation, which we also determine empirically from the 232 trades as \$1,743.23.

Now to determine the column of associated P&L's. That is, we must determine a P&L amount for each standard value. Before we can determine our associated P&L column, we must decide on values for stretch and shrink. Since we are not going to perform any "What if" types of scenarios at this time, we will choose a value of 1 for both stretch and shrink.

Arithmetic mean = 330.13
Population Standard Deviation = 1743.23
Stretch = 1
Shrink = 1

Using equation we can calculate our associated P&L column. We do this by taking each standard value and using it as E in Equation to get the column of associated P&L's:

D = (U*Shrink)+(S*E*Stretch)

where,

D = The price value corresponding to a standard unit value.
E = The standard unit value.
S = The population standard deviation.
U = The arithmetic mean.

For the -3 standard value, the associated P&L is:

D = (U*Shrink)+(S*E*Stretch)
= (330.129*1)+(1743.232*(-3)*1)
= 330.129+(-5229.696)
= 330.129-5229.696
= -4899.567

Thus, our associated P&L column at a standard value of -3 equals -4899.567. We now want to construct the associated P&L for the next standard value, which is -2.9, so we simply perform the same equation again, only this time we use a value of -2.9 for E. Now to determine the associated probability column. This is calculated using the standard value column as the Z input to the equation without the preceding 1- and without the -Z provision (i.e., the "If Z < 0 then N(Z) = 1-N(Z)"). For the standard value of -3 (Z = -3), this is:

N(Z) = N'(Z)*((1.330274429*Y^5)-(1.821255978*Y^4)+(1.781477937*Y^3)-(.356563782*Y^2)+(.31938153*Y))

where,

Y = 1/(1+.2316419*ABS(Z))
ABS() = The absolute value function.
N'(Z) = .398942*EXP(-(Z^2/2))
EXP() = The exponential function.

Thus:

N'(3) = .398942*EXP(-((-3)^2/2))
= .398942*EXP(-(9/2))
= .398942*EXP(-4.5)
= .398942*.011109
= .004431846678

Y = 1/(1+(.2316419*ABS(-3)))
= 1/(1+(.2316419*3))
= 1/(1+.6949257)
= 1/1.6949257
= .5899963639

N(-3) = .004431846678*((1.330274429*.5899963639^5)-(1.821255978*.5899963639^4)+(1.781477937*.5899963639^3)-(.356563782*.5899963639^2)+(.31938153*.5899963639))
=.004431846678*((1.330274429*.07149022693)-(1.821255978*.1211706)+(1.781477937*.2053752)-(.356563782*.3480957094)+(.31938153*.5899963639))
= .004431846678*(.09510162081-.2206826796+.3658713876-.1241183226+.1884339414)
= .004431846678*.3046059476 = .001349966857

Note that even though Z is negative (Z = -3), we do not adjust N(Z) here by making N(Z) = 1-N(Z). Since we are not using the -Z provision, we just let the answer stand.

Now for each value in the standard value column there will be a corresponding entry in the associated P&L column and in the associated probability column. Once you have these three columns established you are ready to begin the search for the optimal f and its by-products.

By-products at f = .01:

TWR = 1.0053555695
Sum of the probabilities = 7.9791232176
Geomean = 1.0006696309
GAT = \$328.09

Here is how you go about finding the optimal f. First, you must determine the search method for f. You can simply loop from 0 to 1 by a predetermined amount (e.g., .01), use an iterative technique, or use the technique of parabolic interpolation described in Portfolio Management Formulas. What you seek to find is what value for f will result in the highest geometric mean. Once you have decided upon a search technique, you must determine what the worst-case associated P&L is in your table. In our example it is the P&L corresponding to -3 standard units, -4899.57. You will need to use this particular value repeatedly throughout the calculations.

In order to find the geometric mean for a given f value, for each value of f that you are going to process in your search for the optimal, you must convert each associated P&L and probability to an HPR.

Equation shows the calculation for the HPR:

HPR = (1+(L/(W/(-f))))^P

where,

L = The associated P&L.
W = The worst-case associated P&L in the table.
f = The tested value for f.
P = The associated probability.

Working through an example now where we use the value of .01 for the tested value for f, we will find the associated HPR at the standard value of -3. Here, our worst-case associated P&L is -4899.57, as is our associated P&L. Therefore, our HPR here is:

HPR = (1+(-4899.57/(-4899.57/(-.01))))^.001349966857
= (1+(-4899.57/489957))^.001349966857
= (1+(-.01))^.001349966857
= .99^.001349966857
= .9999864325

Now we move down to our next standard value, of -2.9, where we have an associated P&L of -4725.24 and an associated probability of .001866. Our associated HPR here will be:

HPR = (1+(-4725.24/(-4899.57/(-.01))))^.001866
= (1+(-4725.24/489957))^.001866
= (1+(-.009644193266))^.001866
= .990355807^.001866
= .9999819
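These HPR computations can be sketched as:

```python
# HPR = (1 + L/(W/(-f)))^P for one data point, with the chapter's
# worst-case P&L (-4899.57) and the tested f of .01 as defaults.
def hpr(l, p, w=-4899.57, f=0.01):
    return (1.0 + l / (w / (-f))) ** p

hpr_minus3 = hpr(-4899.57, 0.001349966857)   # standard value -3
hpr_minus29 = hpr(-4725.24, 0.001866)        # standard value -2.9
```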

Once we have calculated an associated HPR for each standard value for a given test value of f, we are ready to calculate the TWR. The TWR is simply the product of all of the HPRs for a given f value multiplied together:

TWR = (∏[i = 1,N]HPRi)

where,

N = The total number of equally spaced data points.
HPRi = The HPR corresponding to the i'th data point, given by equation.

So for our test value of f = .01, the TWR will be:

TWR = .9999864325*.9999819179*...*1.0000152327
= 1.0053555695

We can readily convert a TWR into a geometric mean by taking the TWR to the power of 1 divided by the sum of all of the associated probabilities.

G = TWR^(1/∑[i = 1,N] Pi)

where,

N = The number of equally spaced data points.
Pi = The associated probability of the ith data point.

Note that if we sum the column that lists the 61 associated probabilities it equals 7.979105. Therefore, our geometric mean at f = .01 is:

G = 1.0053555695^(1/7.979105)
= 1.0053555695^.1253273393
= 1.00066963
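Both conversions, HPRs to TWR and TWR to geometric mean, can be sketched together:

```python
# TWR is the product of the HPRs; the geometric mean is the TWR raised
# to 1 over the sum of the associated probabilities.
def twr_from_hprs(hprs):
    product = 1.0
    for h in hprs:
        product *= h
    return product

def geo_mean(twr, prob_sum):
    return twr ** (1.0 / prob_sum)
```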

We can also calculate the geometric average trade (GAT). This is the amount you would have made, on average per contract per trade, if you were trading this distribution of outcomes at a specified f value.

GAT = (G(f)-1)*(W/(-f))

where,

G(f) = The geometric mean for a given f value.
f = The given f value.
W = The worst-case associated P&L.

In the case of our example, the f value is .01:

GAT = (1.00066963-1)*(-4899.57/(-.01))
= .00066963*489957
= 328.09

Therefore, we would expect to make, on average per contract per trade, \$328.09. Now we go to our next value for f that must be tested according to our chosen search procedure for the optimal f. In the case of our example we are looping from 0 to 1 by .01 for f, so our next test value for f is .02. We will do the same thing again: calculate a new associated HPR column, and calculate our TWR and geometric mean. The f value that results in the highest geometric mean is that value for f which is the optimal based on the input parameters we have used.

In our example, if we were to continue with our search for the optimal f, we would find the optimal at f = .744. This results in a geometric mean of 1.0265. Therefore, the corresponding geometric average trade is \$174.45. It is important to note that the TWR itself doesn't have any real meaning as a by-product. Rather, when we are calculating our geometric mean parametrically, as we are here, the TWR is simply an interim step in obtaining that geometric mean. Now, we can figure what our TWR would be after X trades by taking the geometric mean to the power of X.

Therefore, if we want to calculate our TWR for 232 trades at a geometric mean of 1.0265, we would raise 1.0265 to the power of 232, obtaining 431.79. So we can state that trading at an optimal f of .744, we would expect to make 43,079% ((431.79-1)*100) on our stake after 232 trades.
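Pulling the steps together, here is a minimal end-to-end sketch of the search: build the 61-point discretized Normal from the chapter's mean and standard deviation, then loop f from .01 to .99 by .01 and keep the f with the highest geometric mean. With this coarse .01 grid the loop lands within one step of the text's optimal of f = .744.

```python
import math

# Associated probability: the polynomial approximation without the
# leading "1 -" and without the negative-Z flip, per the text.
def assoc_prob(z):
    y = 1.0 / (1.0 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4
            + 1.781477937 * y**3 - 0.356563782 * y**2
            + 0.31938153 * y)
    return 0.398942 * math.exp(-(z * z) / 2.0) * poly

def optimal_f(u=330.129, s=1743.232, sigmas=3.0, n=61,
              shrink=1.0, stretch=1.0):
    step = 2.0 * sigmas / (n - 1)
    zs = [-sigmas + i * step for i in range(n)]
    pnls = [(u * shrink) + (s * z * stretch) for z in zs]
    probs = [assoc_prob(z) for z in zs]
    worst, prob_sum = min(pnls), sum(probs)
    best_f = best_g = 0.0
    for i in range(1, 100):                 # loop f from .01 to .99
        f = i / 100.0
        twr = 1.0
        for l, p in zip(pnls, probs):
            twr *= (1.0 + l / (worst / (-f))) ** p
        g = twr ** (1.0 / prob_sum)
        if g > best_g:
            best_f, best_g = f, g
    return best_f, best_g
```

The shrink and stretch arguments reproduce the "What if" runs described in the text by rescaling the mean and dispersion before the search.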

Another by-product we will calculate is our threshold to the geometric, using the equation:

Threshold to geometric = (330.13/174.45)*(-4899.57/-.744)
= 12,462.32

Notice that the arithmetic average trade of \$330.13 is not something that we have calculated with this technique, rather it is a given as it is one of the input parameters. We can now convert our optimal f into how many contracts to trade by the equations:

K = E/Q

where,

K = The number of contracts to trade.
E = The current account equity.
Q = W/( -f)

where,

W = The worst-case associated P&L.
f = The optimal f value.

Note that this variable, Q, represents a number that you can divide your account equity by as your equity changes on a day-by-day basis to know how many contracts to trade. Returning now to our example:

Q = -4,899.57/-.744
= \$6,585.44

Therefore, we will trade 1 contract for every \$6,585.44 in account equity. For a \$25,000 account this means we would trade:

K = 25000/6585.44
= 3.796253553

Since we cannot trade in fractional contracts, we must round this figure of 3.796253553 down to the nearest integer. We would therefore trade 3 contracts for a \$25,000 account. The reason we always round down rather than up is that the price extracted for being slightly below optimal is less than the price for being slightly beyond it.
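This sizing rule can be sketched as:

```python
# Contracts to trade: K = E / Q where Q = W/(-f); fractional contracts
# are always rounded down, per the text. Defaults are the chapter's
# worst-case P&L of -4899.57 and optimal f of .744.
def contracts(equity, worst=-4899.57, f=0.744):
    q = worst / (-f)          # dollars of equity per contract
    return int(equity / q), q
```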

Notice how sensitive the optimal number of contracts to trade is to the worst loss. This worst loss is solely a function of how many sigmas you have decided to go to the left of the mean. This bounding parameter, the range of sigmas, is very important in this calculation. We have chosen three sigmas in our calculation. This means that we are, in effect, budgeted for a three-Sigma loss. However, a loss greater than three sigmas can really hurt us, depending on how far beyond three sigmas it is. Therefore, you should be very careful what value you choose for this range bounding parameter. You'll have a lot riding on it.

Notice that for the sake of simplicity in illustration, we have not deducted commissions and slippage from these figures. If you wanted to incorporate commissions and slippage, you should deduct X dollars in commissions and slippage from each of the 232 trades at the outset of this exercise. You would calculate your arithmetic average trade and population standard deviation from this set of 232 adjusted trades, and then perform the exercise exactly as described. We could now go back and perform a "What if" type of scenario here. Suppose we want to see what will happen if the system begins to perform at only half the profitability it is now (shrink = .5).

Further, assume that the market that the system we are looking at is in gets very volatile, and that as a consequence the dispersion among the trades increases by 60% (stretch = 1.6). By pumping these parameters through this system we can see what the optimal f will be so that we can make adjustments to our trading before these changes become history. In so doing we find that the optimal f now becomes .262, or to trade 1 contract for every \$31,305.92 in account equity. This is quite a change. This means that if these changes in the market system start to materialize, we are going to have to do some altering in our money management regarding that system.

The geometric mean will drop to 1.0027, the geometric average trade will be cut to \$83.02, and the TWR over 232 trades will be 1.869. This is not even close to what it presently would be. All of this is predicated upon a 50% decrease in average trade and a 60% increase in standard deviation. This quite possibly could happen. It is also quite possible that the future could work out more favorably than the past. We can test this out, too. Suppose we want to see what will happen if our average profit increases by only 10%. We can check this by inputting a shrink value of 1.1. These “What if” parameters, stretch and shrink, really give us a great deal of power in our money management.
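A minimal sketch of how the two "What if" parameters enter the calculation, assuming (as the text implies) that shrink scales the arithmetic mean and stretch scales the standard deviation when a standard value is converted back into a dollar P&L; the function name and the sample mean and standard deviation are hypothetical:

```python
def scenario_value(z, mean, stdev, shrink=1.0, stretch=1.0):
    """Convert a standard value z back into a dollar P&L under a
    'What if' scenario: the mean is scaled by shrink and the
    dispersion (standard deviation) by stretch."""
    return mean * shrink + stdev * stretch * z

# Hypothetical figures: mean trade $100, standard deviation $500.
# Half the profitability, 60% more dispersion, at the three-sigma bound:
print(scenario_value(-3.0, 100.0, 500.0, shrink=0.5, stretch=1.6))
# -2350.0  (the budgeted three-sigma worst-case loss under the scenario)
```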

The closer your distribution of trade P&L's is to Normal to begin with, the better the technique will work for you. The problem with almost any money management technique is that there is a certain amount of "slop" involved. Here, we can define slop as the difference between the Normal Distribution and the distribution we are actually using. The difference between the two is slop, and the more slop there is, the less effective the technique becomes. To illustrate, recall that using this method we have determined that to trade 1 contract for every \$6,585.44 in account equity is optimal. However, if we were to go over these trades and find our optimal f empirically, we would find that the optimal is to trade 1 contract for every \$7,918.04 in account equity.
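For comparison, finding the optimal f empirically means searching for the f that maximizes the TWR over the actual trades rather than over a fitted Normal. A minimal brute-force sketch (the function name is hypothetical; the two-trade list is the classic 2:1 coin toss, whose optimal f is known to be .25):

```python
def empirical_optimal_f(trades, step=0.001):
    """Brute-force search for the f that maximizes TWR, the product of
    holding-period returns 1 + f * (trade / -worst_loss)."""
    worst = min(trades)              # the worst loss; must be negative
    best_f, best_twr = 0.0, 0.0
    f = step
    while f < 1.0:
        twr = 1.0
        for t in trades:
            twr *= 1.0 + f * (t / -worst)
        if twr > best_twr:
            best_f, best_twr = f, twr
        f += step
    return best_f, best_twr

# Classic 2:1 coin toss (win $2, lose $1): optimal f = .25
f, twr = empirical_optimal_f([2.0, -1.0])
print(round(f, 3))  # 0.25
```

The dollars per contract then follow as the worst loss divided by -f, just as with the Normal-distribution technique.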

As you can see, using the Normal Distribution technique here would have us slightly to the right of the f curve, trading slightly more contracts than the empirical would suggest. However, as we shall see, there is a lot to be said for expecting the future distribution of prices to be Normally distributed. When someone buys or sells an option, the assumption that the future distribution of the log of price changes in the underlying instrument will be Normal is built into the price of the option. Along this same line of reasoning, someone who is entering a trade in a market and is not using a mechanical system can be said to be looking at the same possible future distribution.

The technique was shown using data that was not equalized. We can also use this very same technique on equalized data by incorporating the following changes:

1. Before the data is standardized, it should be equalized by first converting all of the trade profits and losses to percentage profits and losses. Then these percentage profits and losses should be restated in terms of the current price by simply multiplying them by the current price.

2. When you go to standardize this data, standardize the now equalized data by using the mean and standard deviation of the equalized data.

3. The rest of the procedure is the same as written in terms of determining the optimal f, geometric mean, and TWR. The geometric average trade, arithmetic average trade, and threshold to the geometric are only valid for the current price of the underlying instrument. When the price of the underlying instrument changes, the procedure must be done again, going back to step 1 and multiplying the percentage profits and losses by the new underlying price. When you redo the procedure with a different underlying price, you will obtain the same optimal f, geometric mean, and TWR; however, your arithmetic average trade, geometric average trade, and threshold to the geometric will differ, depending on the new price of the underlying instrument.

4. The number of contracts to trade, as given in the equation, must be changed. The worst-case associated P&L (the W variable in the equation) will differ as a result of the changes a different current price causes in the equalized data.
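The equalization steps above can be sketched as follows (function names are hypothetical; the entry prices and current price are illustrative):

```python
def equalize_trades(pnls, entry_prices, current_price):
    """Convert each raw dollar P&L to a percentage of its own entry
    price, then restate it in terms of the current price (step 1)."""
    return [(pnl / price) * current_price
            for pnl, price in zip(pnls, entry_prices)]

def standardize(values):
    """Standardize using the mean and POPULATION standard deviation
    of the equalized data (step 2)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

# A $10 profit on a $100 entry is 10%; restated at a current price
# of 200 it equalizes to $20:
print(equalize_trades([10.0, -5.0], [100.0, 50.0], 200.0))  # [20.0, -20.0]
```

When the underlying price changes, only `equalize_trades` needs to be rerun with the new current price; the standardized values, and hence the optimal f, come out the same.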
