Parametric Optimal f on the Normal Distribution

FURTHER DERIVATIVES OF THE NORMAL


Sometimes you may want to know the second derivative of the N(Z) function. Since the N(Z) function gives us the area under the curve up to Z, and the N'(Z) function gives us the height of the curve itself at Z, the N"(Z) function gives us the instantaneous slope of the N'(Z) curve at a given Z:

N"(Z) = -Z/2.506628274*EXP(-(Z^2)/2)


EXP() = The exponential function.

To determine what the slope of the N'(Z) curve is at +2 standard units:

N"(2) = -2/2.506628274*EXP(-(2^2)/2)
           = -2/2.506628274*EXP(-2)
           = -2/2.506628274*.1353352832
           = -.107981933

Therefore, we can state that the instantaneous rate of change in the N'(Z) function when Z = +2 is -.107981933. This represents rise/run, so we can say that when Z = +2, the N'(Z) curve is rising -.107981933 (that is, falling .107981933) for every 1 unit of run in Z.
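As a quick check on the arithmetic above, the following Python sketch evaluates N"(Z) directly and compares it with a numerical rise/run of N'(Z) over a very small interval around Z. The function names are hypothetical; the constant 2.506628274 is computed here as the square root of 2π, which it approximates.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)  # the 2.506628274 used in the text


def n_prime(z: float) -> float:
    """Height of the standard Normal curve at z, i.e. N'(Z)."""
    return math.exp(-(z * z) / 2.0) / SQRT_2PI


def n_double_prime(z: float) -> float:
    """Slope of the N'(Z) curve at z: N"(Z) = -Z/2.506628274*EXP(-(Z^2)/2)."""
    return -z / SQRT_2PI * math.exp(-(z * z) / 2.0)


def central_difference_slope(z: float, h: float = 1e-5) -> float:
    """Rise over run of N'(Z) across the small interval [z - h, z + h]."""
    return (n_prime(z + h) - n_prime(z - h)) / (2.0 * h)


if __name__ == "__main__":
    print(f'analytic  N"(2) = {n_double_prime(2.0):.9f}')
    print(f"numerical slope = {central_difference_slope(2.0):.9f}")
```

The two values should agree closely, which is a cheap way to catch transcription errors in a hand-worked figure.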

Figure  N"(Z) giving the slope of the line tangent to N'(Z) at Z = +2.

For the reader's own reference, further derivatives are now given. These will not be needed in the remainder of this text, but are provided for the sake of completeness:

N'"(Z) = (Z^2-1)/2.506628274*EXP(-(Z^2)/2)
N""(Z) = ((3*Z)-Z^3)/2.506628274*EXP(-(Z^2)/2)
N'""(Z) = (Z^4-(6*Z^2)+3)/2.506628274*EXP(-(Z^2)/2)
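Every derivative of N(Z) from N'(Z) onward has the form p(Z)*N'(Z) for some polynomial p, and differentiating p(Z)*N'(Z) gives (p'(Z)-Z*p(Z))*N'(Z). The Python sketch below uses that recurrence (a standard identity, not something taken from this text) to regenerate the polynomials listed above; the function names are hypothetical.

```python
import math


def next_poly(coeffs: list[int]) -> list[int]:
    """Return the coefficients of p'(Z) - Z*p(Z), lowest power first."""
    deriv = [i * c for i, c in enumerate(coeffs)][1:]  # p'(Z)
    minus_z_p = [0] + [-c for c in coeffs]  # -Z*p(Z)
    out = [0] * len(minus_z_p)  # -Z*p(Z) always has the higher degree
    for i, c in enumerate(deriv):
        out[i] += c
    for i, c in enumerate(minus_z_p):
        out[i] += c
    return out


def derivative_polys(max_order: int) -> dict[int, list[int]]:
    """Map k -> p_k, where the k-th derivative of N(Z) is p_k(Z)*N'(Z)."""
    polys = {1: [1]}  # N'(Z) = 1 * N'(Z)
    for k in range(2, max_order + 1):
        polys[k] = next_poly(polys[k - 1])
    return polys


def nth_derivative(order: int, z: float) -> float:
    """Evaluate the order-th derivative of N at z (order >= 1)."""
    coeffs = derivative_polys(order)[order]
    p = sum(c * z**i for i, c in enumerate(coeffs))
    return p * math.exp(-(z * z) / 2.0) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for k, p in derivative_polys(5).items():
        print(k, p)
```

Reading the coefficients lowest power first, orders 3, 4 and 5 come out as Z^2-1, (3*Z)-Z^3 and Z^4-(6*Z^2)+3, matching the three formulas above.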

As a final note regarding the Normal Distribution, you should be aware that the distribution is nowhere near as “peaked” as the preceding graphics suggest. The real shape of the Normal Distribution is depicted in the following figure.

Figure  The real shape of the Normal Distribution.

Notice that here the scales of the two axes are the same, whereas in the other graphic examples they differ so as to exaggerate the shape of the distribution.


Many of the real-world applications in trading require a small but crucial modification to the Normal Distribution. This modification takes the Normal and changes it into what is known as the Lognormal Distribution. Consider that the price of any freely traded item has zero as a lower limit.2 Therefore, as the price of an item drops and approaches zero, it should in theory become progressively more difficult for the item to get lower. For example, consider the price of a hypothetical stock at $10 per share. 

If the stock were to drop $5, to $5 per share (a 50% loss), then according to the Normal Distribution it could just as easily drop another $5, from $5 to $0. However, under the Lognormal, a similar drop of 50% from a price of $5 per share to $2.50 per share would be about as probable as the drop from $10 to $5 per share. The Lognormal Distribution works exactly like the Normal Distribution except that with the Lognormal we are dealing with percentage changes rather than absolute changes.

Figure  The Normal and Lognormal distributions.

Consider now the upside. According to the Lognormal, a move from $10 per share to $20 per share is about as likely as a move from $5 to $10 per share, as both moves represent a 100% gain. That isn't to say that we won't be using the Normal Distribution. The purpose here is to introduce you to the Lognormal, show you its relationship to the Normal, and point out that it is usually used when talking about price moves, or any time the Normal would apply but the values are bounded on the low end at zero.2 

To use the Lognormal distribution, you simply convert the data you are working with to natural logarithms.3 The converted data will then be Normally distributed if the raw data was Lognormally distributed. For instance, if we are discussing the distribution of price changes as being Lognormal, we can apply the Normal distribution to the logarithms of those changes. First, we must divide each closing price by the previous closing price. Suppose in this instance we are looking at the distribution of monthly closing prices, and we see $10, $5, $10, $10, then $20 per share as our first five months' closing prices. 

This would then equate to a loss of 50% going into the second month, a gain of 100% going into the third month, a gain of 0% going into the fourth month, and another gain of 100% into the fifth month. Respectively, then, we have quotients of .5, 2, 1, and 2 for the monthly price changes of months 2 through 5. These are the same as HPRs from one month to the next in succession. We must now convert them to natural logarithms in order to study their distribution under the math for the Normal Distribution. Thus, the natural log of .5 is -.6931472, of 2 it is .6931472, and of 1 it is 0. We are now able to apply the mathematics pertaining to the Normal distribution to this converted data.
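The conversion just described is easy to script. This Python sketch divides each close by the previous close to get the HPRs and then takes natural logarithms; the five closing prices are the hypothetical ones from the example, and the function names are illustrative.

```python
import math


def price_relatives(closes: list[float]) -> list[float]:
    """Each close divided by the previous close (the month-to-month HPRs)."""
    return [curr / prev for prev, curr in zip(closes, closes[1:])]


def log_changes(closes: list[float]) -> list[float]:
    """Natural logs of the price relatives: the data to treat as Normal."""
    return [math.log(r) for r in price_relatives(closes)]


if __name__ == "__main__":
    closes = [10.0, 5.0, 10.0, 10.0, 20.0]
    print(price_relatives(closes))  # [0.5, 2.0, 1.0, 2.0]
    print([round(x, 7) for x in log_changes(closes)])  # [-0.6931472, 0.6931472, 0.0, 0.6931472]
```

The 50% loss and the 100% gain map to logarithms of equal size and opposite sign, which is exactly the symmetry the Lognormal assumes.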


Now that we have studied the mathematics of the Normal and Lognormal distributions, we will see how to determine an optimal f based on outcomes that are Normally distributed. The Kelly formula is an example of a parametric optimal f in that the optimal f returned is a function of two parameters. In the Kelly formula the input parameters are the percentage of winning bets and the payoff ratio. However, the Kelly formula only gives you the optimal f when the possible outcomes have a Bernoulli distribution. In other words, the Kelly formula will only give the correct optimal f when there are only two possible outcomes. When the outcomes do not have a Bernoulli distribution, such as Normally distributed outcomes, the Kelly formula will not give you the correct optimal f.4

When they are applicable, parametric techniques are far more powerful than their empirical counterparts. Assume we have a situation that can be described completely by the Bernoulli distribution. We can derive our optimal f here by way of either the Kelly formula or the empirical technique detailed in Portfolio Management Formulas. Suppose in this instance we win 60% of the time. Say we are tossing a coin that is biased such that we know that in the long run 60% of the tosses will be heads. We are therefore going to bet that each toss will be heads, and the payoff is 1:1. The Kelly formula would tell us to bet a fraction of .2 of our stake on the next bet. Further suppose that of the last 20 tosses, 11 were heads and 9 were tails. 

If we were to use these last 20 tosses as the input into the empirical techniques, the result would be that we should risk .1 of our stake on the next bet. Which is correct, the .2 returned by the parametric technique or the .1 returned empirically by the last 20 tosses? The correct answer is .2, the answer returned from the parametric technique. The reason is that the next toss has a 60% probability of being heads, not a 55% probability as the last 20 tosses would indicate. Although we are only discussing a 5% probability difference, 1 toss in 20, the effect on how much we should bet is dramatic: the empirical answer is half the parametric one. Generally, the parametric techniques are inherently more accurate in this regard than are their empirical counterparts. This is the first advantage of the parametric over the empirical. 
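The two figures in this example can be reproduced with a few lines of Python. For a bet paying B to 1 with win probability p, the Kelly fraction is ((B+1)*p-1)/B. The empirical figure is found here by a simple grid search for the f with the highest TWR over the 20 tosses; that search stands in for, and is not, the method of Portfolio Management Formulas. Function names are hypothetical.

```python
import math


def kelly_f(p_win: float, payoff_ratio: float) -> float:
    """Kelly fraction for a two-outcome bet that pays payoff_ratio to 1."""
    return ((payoff_ratio + 1.0) * p_win - 1.0) / payoff_ratio


def empirical_f(outcomes: list[float], step: float = 0.01) -> float:
    """Grid-search the f in (0, 1) that maximises the TWR of past outcomes.

    Assumes at least one losing outcome. Each outcome becomes an HPR of
    1 + f * (outcome / -worst_loss); logs are summed instead of multiplying
    HPRs, which ranks the f values identically.
    """
    worst = min(outcomes)  # the biggest loss, a negative number
    best_f, best_log_twr = 0.0, float("-inf")
    for i in range(1, int(round(1.0 / step))):
        f = i * step
        log_twr = sum(math.log(1.0 + f * x / -worst) for x in outcomes)
        if log_twr > best_log_twr:
            best_f, best_log_twr = f, log_twr
    return best_f


if __name__ == "__main__":
    tosses = [1.0] * 11 + [-1.0] * 9  # 11 heads, 9 tails, even money
    print(f"Kelly (parametric): {kelly_f(0.60, 1.0):.2f}")  # 0.20
    print(f"Empirical (last 20): {empirical_f(tosses):.2f}")  # 0.10
```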

There is, however, a critical proviso: we must know what the distribution of outcomes is in the long run in order to use the parametric techniques. This is the biggest drawback to using the parametric techniques. The second advantage is that the empirical technique requires a past history of outcomes whereas the parametric does not. Further, this past history needs to be rather extensive. In the example just cited, we can assume that if we had a history of 50 tosses we would have arrived at an empirical optimal f closer to .2. With a history of 1,000 tosses, it would be even closer, according to the law of large numbers. The fact that the empirical techniques require a rather lengthy stream of past data has all but restricted them to mechanical trading systems. 

Someone trading anything other than a mechanical trading system, be it by Elliott Wave or fundamentals, has been all but shut out from using the optimal f technique. With the parametric techniques this is no longer true. Someone who wishes to blindly follow some market guru, for instance, now has a way to employ the power of optimal f. Therein lies the third advantage of the parametric technique over the empirical: it can be used by any trader in any market. There is a big assumption here, however, for someone not employing a mechanical trading system. The assumption is that the future distribution of profits and losses will resemble the distribution in the past. This may be less likely than with a mechanical system. This also sheds new light on the expected performance of any technique that is not purely mechanical. 

Even the best practitioners of such techniques, be it by fundamentals, Gann, Elliott Wave, and so on, are doomed to fail if they are too far to the right of the peak of the f curve. If they are too far to the left of the peak, they are going to end up with geometrically lower profits than their expertise in their area should have made for them. Furthermore, practitioners of techniques that are not purely mechanical must realize that everything said about optimal f and the purely mechanical techniques applies to them as well. This should be considered when contemplating expected drawdowns of such techniques. Remember that the drawdowns will be substantial, and this fact does not mean that the technique should be abandoned. The fourth and perhaps the biggest advantage of the parametric over the empirical method of determining optimal f is that the parametric method allows you to do "What if" types of modeling. 

For example, suppose you are trading a market system that has been running very hot. You want to be prepared for when that market system stops performing so well, as you know it inevitably will. With the parametric techniques, you can vary your input parameters to reflect this and thereby find what the optimal f will be once the market system cools down to the state your input parameters reflect. The parametric techniques are therefore far more powerful than the empirical ones. So why use the empirical techniques at all? The empirical techniques are more intuitively obvious than the parametric ones are. Hence, the empirical techniques are what one should learn first before moving on to the parametric. We have now covered the empirical techniques in detail and are therefore prepared to study the parametric techniques.


Consider the following sequence of 232 trade profits and losses in points. It doesn't matter what the commodity is or what system generated this stream; it could be any system on any market.

If we wanted to determine an equalized parametric optimal f we would now convert these trade profits and losses to percentage gains and losses. Next, we would convert these percentage profits and losses by multiplying them by the current price of the underlying instrument. For example, P&L #1 is .18. Suppose that the entry price to this trade was 100.50. Thus, the percentage gain on this trade would be .18/100.50 = .001791044776. Now suppose that the current price of this underlying instrument is 112.00. 

Multiplying .001791044776 by 112.00 translates into an equalized P&L of .2005970149. If we were seeking to do this procedure on an equalized basis, we would perform this operation on all 232 trade profits and losses. Whether or not we are going to perform our calculations on an equalized basis, we must now calculate the mean (arithmetic) and population standard deviation of these 232 individual trade profits and losses, which come to .330129 and 1.743232 respectively. With these two numbers we can use the following equation to translate each individual trade profit and loss into standard units.
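The equalization step and the two summary statistics can be sketched in Python as follows. The entry and current prices are the hypothetical 100.50 and 112.00 from the text; since the 232-trade list is not reproduced here, mean_and_population_sd is shown only as a helper, and both function names are illustrative. Note the use of the population standard deviation (statistics.pstdev), as the text specifies, rather than the sample version (statistics.stdev).

```python
import statistics


def equalize(pnl: float, entry_price: float, current_price: float) -> float:
    """Restate a trade P&L at today's price: percentage gain times current price."""
    return pnl / entry_price * current_price


def mean_and_population_sd(pnls: list[float]) -> tuple[float, float]:
    """Arithmetic mean and population (not sample) standard deviation."""
    return statistics.fmean(pnls), statistics.pstdev(pnls)


if __name__ == "__main__":
    print(f"{equalize(0.18, 100.50, 112.00):.10f}")  # 0.2005970149
```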

Z = (X-U)/S


U = The mean of the data.
S = The standard deviation of the data.
X = The observed data point.

Thus, to translate trade #1, a profit of .18, to standard units:

Z = (.18-.330129)/1.743232 = -.150129/1.743232 = -.08612106708

Likewise, the next three trades of -1.11, .42, and -.83 translate into -.8261258398, .05155423948, and -.6655046488 standard units respectively. If we are using equalized data, we simply standardize by subtracting the mean of the data and dividing by the data's standard deviation. Once we have converted all of our individual trade profits and losses over to standard units, we can bin the now standardized data. Recall that with binning there is a loss of information content about a particular distribution but the character of the distribution remains unchanged. Suppose we were to now take these 232 individual trades and place them into 10 bins. 
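In Python, standardizing the first four trades with the mean and population standard deviation quoted above looks like this (a sketch; the constant and function names are illustrative):

```python
MEAN = 0.330129  # arithmetic mean of the 232 trade P&Ls (from the text)
POP_SD = 1.743232  # population standard deviation (from the text)


def to_standard_units(x: float, mean: float = MEAN, sd: float = POP_SD) -> float:
    """Z = (X - U) / S."""
    return (x - mean) / sd


if __name__ == "__main__":
    for pnl in (0.18, -1.11, 0.42, -0.83):  # the first four trades
        print(f"{pnl:6.2f} -> {to_standard_units(pnl):.10f}")
```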

We are choosing arbitrarily here: we could have chosen 9 bins or 50 bins. In fact, one of the big arguments about binning data is that there is most often considerable arbitrariness as to how the bins should be chosen. Whenever we bin something, we must decide on the ranges of the bins. Here we will select a range of -2 to +2 sigmas, or standard deviations. This means we will have 10 equally spaced bins from -2 standard units to +2 standard units. Since there are 4 standard units in total between -2 and +2 standard units and we are dividing this space into 10 equal regions, we have 4/10 = .4 standard units as the size or "width" of each bin. 

Therefore, our first bin, the one "farthest to the left," will contain those trades that were within -2 to -1.6 standard units, the next one trades from -1.6 to -1.2, then -1.2 to -.8, and so on, until our final bin contains those trades that were 1.6 to 2 standard units. Those trades that are less than -2 standard units or greater than +2 standard units will not be binned in this exercise, and we will ignore them. If we so desired, we could have included them in the extreme bins, placing those data points less than -2 in the -2 to -1.6 bin, and likewise for those data points greater than 2. Of course, we could also have chosen a wider range for binning, but we have chosen not to include these outlying trades. 

In other words, we are eliminating from this exercise those trades with P&L's less than .330129- (1.743232*2) = -3.156335 or greater than .330129+(1.743232*2) = 3.816593. What we have created now is a distribution of this system's trade P&L's. Our distribution contains 10 data points because we chose to work with 10 bins. Each data point represents the number of trades that fell into that bin. Each trade could not fall into more than 1 bin, and if the trade was beyond 2 standard units either side of the mean (P&L's<-3.156335 or >3.816593), then it is not represented in this distribution. 
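A minimal Python sketch of this binning scheme follows. Because the 232 trades themselves are not listed here, the demonstration uses the four standardized values computed earlier plus two hypothetical outliers. How a value falling exactly on a bin edge is assigned is an assumption of the sketch, since the text does not specify it.

```python
def bin_counts(z_values: list[float], lo: float = -2.0, hi: float = 2.0,
               n_bins: int = 10) -> list[int]:
    """Count standardized values into n_bins equal-width bins spanning [lo, hi].

    Values below lo or above hi are skipped, as in the text. Each bin holds
    its lower edge; the top bin also holds hi itself.
    """
    width = (hi - lo) / n_bins  # .4 standard units for the defaults
    counts = [0] * n_bins
    for z in z_values:
        if z < lo or z > hi:
            continue  # beyond +/-2 sigma: not binned in this exercise
        idx = min(int((z - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Four standardized trades from the text plus two hypothetical outliers.
    sample = [-0.0861, -0.8261, 0.0516, -0.6655, 2.5, -3.1]
    print(bin_counts(sample))  # [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```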

Figure  232 individual trades in 10 bins from -2 to +2 sigma versus the Normal Distribution.

"Wait a minute," you say. "Shouldn't the distribution of a trading system's P&L's be skewed to the right because we are probably going to have a few large profits?" This particular distribution of 232 trade P&L's happens to be from a system that very often takes small profits via a target. Many people have the mistaken impression that P&L distributions are going to be skewed to the right for all trading systems. Different market systems will have different distributions, and you shouldn't expect them all to be the same.

Also in the figure, superimposed over the distribution we have just put together, is the Normal Distribution as it would look for 232 trade P&L's if they were Normally distributed. This was done so that you can compare, graphically, the trade P&L's as we have just calculated them to the Normal. The Normal Distribution here is calculated by first taking the boundaries of each bin. For the leftmost bin in our example this would be Z = -2 and Z = -1.6. Now we run these Z values through the equation for N(Z) to convert these boundaries to cumulative probabilities. In our example, this corresponds to .02275 for Z = -2 and .05479932 for Z = -1.6. 

Next, we take the absolute value of the difference between these two values, which gives us ABS(.02275-.05479932) = .03204932 for our example. Last, we multiply this answer by the number of data points, which in this case is 232 because there are 232 total trades. Therefore, we can state that if the data were Normally distributed and placed into 10 bins of equal width between -2 and +2 sigmas, then the leftmost bin would contain .03204932*232 = 7.43544224 elements. If we were to calculate this for each of the 10 bins, we would obtain the superimposed Normal curve.
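The expected Normal count for each bin can be computed as below. This sketch uses Python's math.erf for N(Z) instead of the polynomial approximation the text relies on, so its figures differ slightly in the later decimal places (about 7.4354 rather than 7.43544224 for the leftmost bin); the function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative standard Normal probability N(Z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_normal_counts(n_total: int, lo: float = -2.0, hi: float = 2.0,
                           n_bins: int = 10) -> list[float]:
    """Number of elements each equal-width bin would hold if the data were Normal."""
    width = (hi - lo) / n_bins
    counts = []
    for i in range(n_bins):
        left = lo + i * width
        right = left + width
        counts.append(abs(normal_cdf(right) - normal_cdf(left)) * n_total)
    return counts


if __name__ == "__main__":
    for i, c in enumerate(expected_normal_counts(232)):
        print(f"bin {i + 1:2d}: {c:8.4f}")
```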


Now we can construct a technique for finding the optimal f on Normally distributed data. Like the Kelly formula, this will be a parametric technique. However, this technique is far more powerful than the Kelly formula, because the Kelly formula allows for only two possible outcomes for an event whereas this technique allows for the full spectrum of outcomes. The beauty of Normally distributed outcomes is that they can be described by 2 parameters. The Kelly formula will give you the optimal f for Bernoulli distributed outcomes from the 2 input parameters of the payoff ratio and the probability of winning. The technique about to be described likewise needs only two parameters as input, the average and the standard deviation of the outcomes, to return the optimal f. Recall that the Normal Distribution is a continuous distribution. In order to use this technique we need to make this distribution discrete. 

Further recall that the Normal Distribution is unbounded. That is, the distribution runs from minus infinity on the left to plus infinity on the right. Therefore, the first two steps that we must take to find the optimal f on Normally distributed data are to determine (1) at how many sigmas from the mean of the distribution we truncate the distribution, and (2) into how many equally spaced data points we will divide the range between the two extremes determined in (1). For instance, we know that 99.73% of all the data points in a Normal Distribution will fall between plus and minus 3 sigmas of the mean, so we might decide to use 3 sigmas as our parameter for (1). In other words, we are deciding to consider the Normal Distribution only between minus 3 sigmas and plus 3 sigmas of the mean. In so doing, we will encompass 99.73% of all of the activity under the Normal Distribution. Generally we will want to use a value of 3 to 5 sigmas for this parameter. 

Regarding step (2), the number of equally spaced data points, we will generally want to use a bare minimum of ten times the number of sigmas we are using in (1). If we select 3 sigmas for (1), then we should select at least 30 equally spaced data points for (2). This means that we are going to take the horizontal axis of the Normal Distribution, of which we are using the area from minus 3 sigmas to plus 3 sigmas from the mean, and divide that into 30 equally spaced points. Since there are 6 sigmas between minus 3 sigmas and plus 3 sigmas, and we want to divide this into 30 equally spaced points, we must divide 6 by 30-1, or 29. This gives us .2068965517. So, our first data point will be minus 3, and we will add .2068965517 to each previous point until we reach plus 3, at which point we will have created 30 equally spaced data points between minus 3 and plus 3. Therefore, our second data point will be -3+.2068965517 = -2.793103448, our third data point -2.793103448+.2068965517 = -2.586206896, and so on. 
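Generating the equally spaced standard values takes only a line or two. This Python sketch (hypothetical function names) reproduces the 30 points from -3 to +3 and encodes the ten-points-per-sigma rule of thumb.

```python
import math


def equally_spaced_points(sigmas: float, n_points: int) -> list[float]:
    """n_points standard values from -sigmas to +sigmas inclusive."""
    step = 2.0 * sigmas / (n_points - 1)  # e.g. 6/29 = .2068965517
    return [-sigmas + i * step for i in range(n_points)]


def minimum_points(sigmas: float) -> int:
    """Rule of thumb from the text: at least ten points per sigma."""
    return math.ceil(10 * sigmas)


if __name__ == "__main__":
    pts = equally_spaced_points(3, minimum_points(3))
    print(len(pts), [round(p, 9) for p in pts[:3]], round(pts[-1], 9))
```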

In so doing, we will have determined the 30 horizontal input coordinates to this system. The more data points you decide on, the better the resolution of the Normal curve will be. Using ten times the number of sigmas is a rough rule for determining the bare minimum number of data points you should use. Recall that the Normal distribution is a continuous distribution. However, we must make it discrete in order to find the optimal f on it. The greater the number of equally spaced data points we use, the closer our discrete model will be to the actual continuous distribution itself; in the limit, as the number of equally spaced data points approaches infinity, the discrete model approaches the continuous one exactly. Why not use an extremely large number of data points? The more data points you use in the Normal curve, the more calculations will be required to find the optimal f on it. Even though you will usually be using a computer to solve for the optimal f, it will still be slower the more data points you use. 

Further, each data point added resolves the curve further, but to a lesser degree than the previous data point did. We will refer to these first two input parameters as the bounding parameters. Now, the third and fourth steps are to determine the arithmetic average trade and the population standard deviation for the market system we are working on. If you do not have a mechanical system, you can get these numbers from your brokerage statements or you can estimate them. That is one of the real benefits of this technique: you don't need to have a mechanical system, and you don't even need brokerage statements or paper trading results to use it. The technique can be used by simply estimating these two inputs, the arithmetic mean average trade and the population standard deviation of trades. Be forewarned, though, that your results will only be as accurate as your estimates.

If you are having difficulty estimating your population standard deviation, then simply try to estimate by how much, on average, a trade will differ from the average trade. Having estimated the mean absolute deviation in this way, you can use the following equation to convert it into an estimated standard deviation:

S = M*1/.7978845609 = M*1.253314137
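If you prefer to work from an estimated mean absolute deviation, the conversion is simple to script. The Python sketch below (function names are illustrative) also shows how the mean absolute deviation would be measured from actual trade results; the conversion factor holds only under the assumption that trades are Normally distributed.

```python
import math
import statistics


def mean_absolute_deviation(values: list[float]) -> float:
    """Average distance of each value from the arithmetic mean."""
    m = statistics.fmean(values)
    return statistics.fmean(abs(v - m) for v in values)


def sd_from_mad(mad: float) -> float:
    """Estimated standard deviation, assuming Normally distributed trades.

    For the Normal, M = S * sqrt(2/pi) = S * .7978845609,
    so S = M * sqrt(pi/2) = M * 1.253314137.
    """
    return mad * math.sqrt(math.pi / 2.0)


if __name__ == "__main__":
    # A hypothetical estimate: trades typically miss the average by $1,000.
    print(f"{sd_from_mad(1000.0):.2f}")
```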


S = The standard deviation.
M = The mean absolute deviation.

We will refer to these two parameters, the arithmetic mean average trade and the standard deviation of the trades, as the actual input parameters. Now we want to take all of the equally spaced data points from step (2) and find their corresponding price values, based on the arithmetic mean and standard deviation. Recall that our equally spaced data points are expressed in terms of standard units. Now for each of these equally spaced data points we will find the corresponding price as:

D = U+(S*E)


D = The price value corresponding to a standard unit value.
E = The standard unit value.
S = The population standard deviation.
U = The arithmetic mean.

Once we have determined all of the price values corresponding to each data point we have truly accomplished a great deal. We have now constructed the distribution that we expect the future data points to tend to.

However, this technique allows us to do a lot more than that. We can incorporate two more parameters that will allow us to perform "What if" types of scenarios about the future. These parameters, which we will call the "What if" parameters, allow us to see the effect of a change in our average trade or a change in the dispersion of our trades. The first of these parameters, called shrink, affects the average trade. Shrink is simply a multiplier on our average trade. Recall that when we find the optimal f we also obtain other calculations, which are useful by-products of the optimal f. Such calculations include the geometric mean, TWR, and geometric average trade. Shrink is the factor by which we will multiply our average trade before we perform the optimal f technique on it. Hence, shrink lets us see what the optimal f would be if our average trade were affected by shrink as well as how the other by-product calculations would be affected.

For example, suppose you are trading a system that has been running very hot lately. You know from past experience that the system is likely to stop performing so well in the future. You would like to see what would happen if the average trade were cut in half. By using a shrink value of .5 you can perform the optimal f technique to determine what your optimal f should be if the average trade were to be cut in half. Further, you can see how such changes affect your geometric average trade, and so on. By using a shrink value of 2, you can also see the effect that a doubling of your average trade would have. In other words, the shrink parameter can also be used to increase (unshrink?) your average trade. What's more, it lets you take an unprofitable system and, by using a negative value for shrink, see what would happen if that system became profitable. 

For example, suppose you have a system that shows an average trade of -$100. If you use a shrink value of -.5, this will give you your optimal f for this distribution as if the average trade were $50, since -100*-.5 = 50. If we used a shrink factor of -2, we would obtain the distribution centered about an average trade of $200. You must be careful in using these "What if" parameters, for they make it easy to mismanage performance. Mention was just made of how you can turn a system with a negative arithmetic average trade into a positive one. This can lead to problems if, for instance, in the future, you still have a negative expectation. The other "What if" parameter is one called stretch. This is not, as its name would imply, the opposite of shrink. Rather, stretch is the multiplier to be used on the standard deviation. 

You can use this parameter to determine the effect on f and its by-products of an increase or decrease in the dispersion. Also, unlike shrink, stretch must always be a positive number, whereas shrink can be positive or negative. If you want to see what will happen if your standard deviation doubles, simply use a value of 2 for stretch. To see what would happen if the dispersion quieted down, use a value less than 1. You will notice in using this technique that lowering the stretch toward zero will tend to increase the by-product calculations, resulting in a more optimistic assessment of the future, and vice versa. Shrink works in the opposite fashion, as lowering the shrink toward zero will result in more pessimistic assessments about the future, and vice versa.

Once we have determined what values we want to use for stretch and shrink (and for the time being we will use values of 1 for both, which means to leave the actual parameters unaffected) we can amend the equation for D to:

D = (U*Shrink)+(S*E*Stretch)


D = The price value corresponding to a standard unit value.
E = The standard unit value.
S = The population standard deviation.
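U = The arithmetic mean.
Shrink = The multiplier applied to the arithmetic mean.
Stretch = The multiplier applied to the standard deviation.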
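The amended equation translates directly into code. In this Python sketch the function name is hypothetical, and the sample figures are the example system's mean and population standard deviation in dollars ($1,000 per point, as assumed later in this section); shrink and stretch default to 1, leaving the actual parameters unaffected.

```python
def associated_pl(e: float, mean: float, sd: float,
                  shrink: float = 1.0, stretch: float = 1.0) -> float:
    """D = (U*Shrink)+(S*E*Stretch) for a standard unit value E."""
    return mean * shrink + sd * e * stretch


if __name__ == "__main__":
    U, S = 330.129, 1743.232  # the example system, at $1,000 per point
    print(associated_pl(-3.0, U, S))               # actual parameters
    print(associated_pl(-3.0, U, S, shrink=0.5))   # average trade cut in half
    print(associated_pl(-3.0, U, S, stretch=2.0))  # dispersion doubled
```

A negative shrink flips the sign of the mean: with an average trade of -100 and shrink = -.5, the distribution is centered on 50, as described above.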

To summarize thus far, the first two steps are to determine the bounding parameters of the number of sigmas either side of the mean we are going to use, as well as how many equally spaced data points we are going to use within this range. The next two steps are the actual input parameters of the arithmetic average trade and population standard deviation. We can derive these parameters empirically by looking at the results of a given trading system or by using brokerage statements or paper trading results. 

We can also derive these figures by estimation, but remember that the results obtained will only be as accurate as your estimates. The fifth and sixth steps are to determine the factors to use for stretch and shrink if you are going to perform a "What if" type of scenario. If you are not, simply use values of 1 for both stretch and shrink. Once you have completed these six steps, you can use the amended equation for D to perform the seventh step, which is to convert the equally spaced data points from standard values to an actual amount of either points or dollars.

Now the eighth step is to find the probability associated with each of the equally spaced data points. This probability is determined by using the equation for N(Z):

N(Z)=1-N'(Z)*((1.330274429*Y^5)-(1.821255978*Y^4)+(1.781477937*Y^3)-(.356563782*Y^2)+(.31938153*Y))

If Z<0 then N(Z) = 1-N(Z)


Y = 1/(1+.2316419*ABS(Z))
ABS() = The absolute value function.
N'(Z) = .398942*EXP(-(Z^2/2))
EXP() = The exponential function.

However, we will use this equation without its leading "1-" and without the -Z provision (i.e., without the "If Z<0 then N(Z) = 1-N(Z)"), since we want to know the probability of an event equaling or exceeding a prescribed number of standard units away from the mean. So we go through each of our equally spaced data points. Each point has a standard value, which we will use as the Z parameter in the equation, and a dollar or point amount. Now there will be another variable corresponding to each equally spaced data point: the associated probability.
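In Python, the associated probability described above might be computed as follows. The constants are those of the N(Z) approximation given earlier; the comparison column uses math.erfc to compute the exact one-tailed Normal probability, so you can see how close the approximation is. Function names are illustrative.

```python
import math


def n_prime(z: float) -> float:
    """N'(Z) = .398942*EXP(-(Z^2/2)), the height of the Normal curve."""
    return 0.398942 * math.exp(-(z * z) / 2.0)


def associated_probability(z: float) -> float:
    """Polynomial approximation of N(Z) with the leading "1-" removed and
    without the "If Z<0 then N(Z) = 1-N(Z)" provision, as the text directs.
    """
    y = 1.0 / (1.0 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4 + 1.781477937 * y**3
            - 0.356563782 * y**2 + 0.31938153 * y)
    return n_prime(z) * poly


if __name__ == "__main__":
    for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
        exact_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
        print(f"Z={z:+.1f}  approx={associated_probability(z):.9f}  erfc={exact_tail:.9f}")
```

Because only ABS(Z) and Z^2 enter the calculation, the result is the same for Z and -Z.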


The procedure will now be demonstrated on the trading example introduced earlier in this chapter. Since our 232 trades are currently in points, we should convert them to their dollar representations. However, since the market is not specified, we will assign an arbitrary value of $1,000 per point. Thus, the average trade of .330129 now becomes .330129*$1000, or an average trade of $330.13. Likewise the population standard deviation of 1.743232 is also multiplied by $1,000 per point to give $1,743.23. Now we construct the matrix. First, we must determine the range, in sigmas from the mean, that we want our calculations to encompass. For our example we will choose 3 sigmas, so our range will go from minus 3 sigmas to plus 3 sigmas. 

Note that you should use the same amount to the left of the mean that you use to the right of the mean. That is, if you go 3 sigmas to the left then you should not go only 2 or 4 sigmas to the right, but rather you should go 3 sigmas to the right as well. Next we must determine how many equally spaced data points to divide this range into. Choosing 61 as our value gives a data point at every tenth of a standard unit, which keeps things simple. Thus we can determine our column of standard values. Now we must determine the arithmetic mean that we are going to use as input. We determine this empirically from the 232 trades as $330.13. Further, we must determine the population standard deviation, which we also determine empirically from the 232 trades as $1,743.23.

Now we must determine the column of associated P&L's. That is, we must determine a P&L amount for each standard value. Before we can determine our associated P&L column, we must decide on values for stretch and shrink. Since we are not going to perform any "What if" types of scenarios at this time, we will choose a value of 1 for both stretch and shrink.

Arithmetic mean = 330.13
Population Standard Deviation = 1743.23
Stretch = 1
Shrink = 1

Using the amended equation for D, we can calculate our associated P&L column. We do this by taking each standard value and using it as E in the equation below to get the column of associated P&L's:

D = (U*Shrink)+(S*E*Stretch)


D = The price value corresponding to a standard unit value.
E = The standard unit value.
S = The population standard deviation.
U = The arithmetic mean.

For the -3 standard value, the associated P&L is:

D = (U*Shrink)+(S*E*Stretch)
    = (330.129*1)+(1743.232*(-3)*1)
    = 330.129+(-5229.696)
    = 330.129-5229.696
    = -4899.567

Thus, the entry in our associated P&L column at a standard value of -3 equals -4899.567. We now want to construct the associated P&L for the next standard value, which is -2.9, so we simply perform the same calculation again, only this time using a value of -2.9 for E. Next, we determine the associated probability column. This is calculated using the standard value column as the Z input to the equation for N(Z), without the preceding 1- and without the -Z provision (i.e., the "If Z < 0 then N(Z) = 1-N(Z)"). For the standard value of -3 (Z = -3), this is:

N(Z)=N'(Z)*((1.330274429*Y^5)-(1.821255978*Y^4)+(1.781477937*Y^3)-(.356563782*Y^2)+(.31938153*Y))


Y = 1/(1+.2316419*ABS(Z))
ABS() = The absolute value function.
N'(Z) = .398942*EXP(-(Z^2/2))
EXP() = The exponential function.


N'(-3) = .398942*EXP(-((-3)^2/2)) 
          = .398942*EXP(-(9/2)) 
          = .398942*EXP(-4.5) 
          = .398942*.011109 
          = .004431846678

Y = 1/(1+.2316419*ABS(-3))
    = 1/(1+.2316419*3)
    = 1/(1+.6949257)
    = 1/1.6949257
    = .5899963639

N(-3) = .004431846678*((1.330274429*.5899963639^5)-(1.821255978*.5899963639^4)+(1.781477937*.5899963639^3)-(.356563782*.5899963639^2)+(.31938153*.5899963639))
= .004431846678*(.09510162081-.2206826796+.3658713876-.1241183226+.1884339414)
= .004431846678*.3046059476 = .001349966857

Note that even though Z is negative (Z = -3), we do not adjust N(Z) here by making N(Z) = 1-N(Z). Since we are not using the -Z provision, we simply leave the answer as is.
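The truncated N(Z) calculation above can be sketched in Python. This is a minimal illustration that uses the same coefficients as the equation. The helper names `n_prime` and `associated_probability` are hypothetical. Y uses ABS(Z) and N'(Z) is symmetric, so the result is the same tail area for Z and -Z.

```python
import math


def n_prime(z):
    """Height of the Normal curve: N'(Z) = .398942*EXP(-(Z^2/2))."""
    return 0.398942 * math.exp(-(z ** 2) / 2)


def associated_probability(z):
    """N(Z) polynomial approximation with the leading 1- dropped and the
    'If Z < 0 then N(Z) = 1-N(Z)' provision not applied."""
    y = 1 / (1 + 0.2316419 * abs(z))
    poly = ((1.330274429 * y ** 5) - (1.821255978 * y ** 4)
            + (1.781477937 * y ** 3) - (0.356563782 * y ** 2)
            + (0.31938153 * y))
    return n_prime(z) * poly


print(associated_probability(-3))    # about .00134997
print(associated_probability(-2.9))  # about .001866
```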

Now, for each value in the standard value column there will be a corresponding entry in the associated P&L column and in the associated probability column. Once you have these three columns established, you are ready to begin the search for the optimal f and its by-products.
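The two calculations can be combined to build all three columns. The minimal sketch below uses only the Python standard library. Storing each row as a (standard value, P&L, probability) tuple is an assumption made for illustration; the text does not prescribe a format.

```python
import math

U, S = 330.129, 1743.232  # arithmetic mean, population standard deviation
STRETCH, SHRINK = 1.0, 1.0


def associated_probability(z):
    # Tail-area approximation of N(Z) without the leading 1- or the -Z provision
    y = 1 / (1 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4 + 1.781477937 * y**3
            - 0.356563782 * y**2 + 0.31938153 * y)
    return 0.398942 * math.exp(-z * z / 2) * poly


# Each row: (standard value, associated P&L, associated probability)
table = []
for k in range(-30, 31):  # -3.0 to +3.0 in steps of .1
    e = k / 10
    pnl = (U * SHRINK) + (S * e * STRETCH)
    table.append((e, pnl, associated_probability(e)))

total_probability = sum(p for _, _, p in table)
print(len(table), round(total_probability, 4))  # 61 rows, sum near 7.979
```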

By-products at f = .01:

TWR = 1.0053555695
Sum of the probabilities = 7.9791232176
Geomean = 1.0006696309 
GAT = $328.09

Here is how you go about finding the optimal f. First, you must determine the search method for f. You can simply loop from 0 to 1 by a predetermined amount (e.g., .01), use an iterative technique, or use the technique of parabolic interpolation described in Portfolio Management Formulas. What you seek is the value of f that results in the highest geometric mean. Once you have decided upon a search technique, you must determine the worst-case associated P&L in your table. In our example it is the P&L corresponding to -3 standard units, -4899.57. You will need to use this particular value repeatedly throughout the calculations.

To find the geometric mean for a given f value, you must convert each associated P&L and its probability into an HPR. This is done for every value of f that you process in your search for the optimal.

The HPR is calculated as:

HPR = (1+(L/(W/(-f))))^P


L = The associated P&L.
W = The worst-case associated P&L in the table.
f = The tested value for f.
P = The associated probability.

Working through an example now, we use .01 as the tested value for f and find the associated HPR at the standard value of -3. Here, our worst-case associated P&L is -4899.57, as is our associated P&L. Therefore, our HPR here is:

HPR = (1+(-4899.57/(-4899.57/(-.01))))^.001349966857
          = (1+(-4899.57/489957))^.001349966857
          = (1+(-.01))^.001349966857
          = .99^.001349966857
          = .9999864325

Now we move down to our next standard value, -2.9, where we have an associated P&L of -4725.24 and an associated probability of .001866. Our associated HPR here will be:

HPR = (1+(-4725.24/(-4899.57/(-.01))))^.001866
          = (1+(-4725.24/489957))^.001866
          = (1+(-.009644193266))^.001866
          = .990355807^.001866
          = .9999819
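The HPR equation translates directly into a one-line Python function. This sketch uses the hypothetical name `hpr` and reproduces the two worked HPRs above.

```python
def hpr(pnl, worst_pnl, f, probability):
    """HPR = (1+(L/(W/(-f))))^P for one row of the table."""
    return (1 + (pnl / (worst_pnl / (-f)))) ** probability


W = -4899.57  # worst-case associated P&L (at -3 standard units)
print(hpr(-4899.57, W, 0.01, 0.001349966857))  # about .9999864325
print(hpr(-4725.24, W, 0.01, 0.001866))        # about .9999819
```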

Once we have calculated an associated HPR for each standard value for a given test value of f, we are ready to calculate the TWR. The TWR is simply the product of all of the HPRs for a given f value:

TWR = (∏[i = 1,N]HPRi)


N = The total number of equally spaced data points.
HPRi = The HPR corresponding to the i'th data point, given by the HPR equation above.

So for our test value of f = .01, the TWR will be:

TWR = .9999864325*.9999819179*...*1.0000152327 
          = 1.0053555695

We can readily convert a TWR into a geometric mean by taking the TWR to the power of 1 divided by the sum of all of the associated probabilities.

G = TWR^(1/∑[i = 1,N] Pi)


N = The number of equally spaced data points.
Pi = The associated probability of the ith data point.

Note that the sum of the column of 61 associated probabilities is 7.979105. Therefore, our geometric mean at f = .01 is:

G = 1.0053555695^(1/7.979105) 
    = 1.0053555695^.1253273393 
    = 1.00066963
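Here is a minimal sketch of the TWR and geometric mean over the whole 61-row table. It assumes the same table construction as the worked examples (stretch = shrink = 1). The last digits may differ slightly from the text because of rounding in the probability approximation.

```python
import math

U, S = 330.129, 1743.232


def associated_probability(z):
    y = 1 / (1 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4 + 1.781477937 * y**3
            - 0.356563782 * y**2 + 0.31938153 * y)
    return 0.398942 * math.exp(-z * z / 2) * poly


def twr_and_geomean(rows, f):
    """TWR is the product of the HPRs; G = TWR^(1/sum of probabilities)."""
    worst = min(pnl for pnl, _ in rows)  # W, the worst-case associated P&L
    twr = 1.0
    for pnl, prob in rows:
        twr *= (1 + pnl / (worst / (-f))) ** prob
    total_prob = sum(prob for _, prob in rows)
    return twr, twr ** (1 / total_prob)


# Rows of (associated P&L, associated probability) for -3.0 .. +3.0 by .1
rows = [(U + S * (k / 10), associated_probability(k / 10)) for k in range(-30, 31)]
twr, g = twr_and_geomean(rows, 0.01)
print(round(twr, 6), round(g, 6))  # close to the text's 1.0053555695 and 1.00066963
```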

We can also calculate the geometric average trade (GAT). This is the amount you would have made, on average per contract per trade, if you were trading this distribution of outcomes at a specified f value.

GAT = (G(f)-1)*(W/(-f))


G(f) = The geometric mean for a given f value.
f = The given f value.
W = The worst-case associated P&L.

In the case of our example, the f value is .01:

GAT = (1.00066963-1)*(-4899.57/(-.01))
         = .00066963*489957
         = 328.09
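The GAT formula can be written as a small Python helper. The function name below is hypothetical, and the example checks it against the f = .01 figures above.

```python
def geometric_average_trade(g, worst_pnl, f):
    """GAT = (G(f)-1)*(W/(-f)), in dollars per contract per trade."""
    return (g - 1) * (worst_pnl / (-f))


gat = geometric_average_trade(1.00066963, -4899.57, 0.01)
print(round(gat, 2))  # 328.09
```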

Therefore, we would expect to make, on average per contract per trade, $328.09. Now we go to the next value for f that must be tested according to our chosen search procedure for the optimal f. In our example we are looping from 0 to 1 by .01, so our next test value for f is .02. We do the same thing again: calculate a new column of associated HPRs, then calculate the TWR and geometric mean. The f value that results in the highest geometric mean is the optimal f, based on the input parameters we have used.
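The brute-force loop just described can be sketched in Python. Two choices here are assumptions rather than part of the text. First, the geometric mean is computed in log space, which is mathematically equivalent to the product of HPRs but less prone to rounding. Second, the grid has 1000 steps instead of 100, so the result can be compared with the f = .744 reported below. f = 1 is excluded because it drives the worst-case HPR to zero.

```python
import math

U, S = 330.129, 1743.232


def associated_probability(z):
    y = 1 / (1 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4 + 1.781477937 * y**3
            - 0.356563782 * y**2 + 0.31938153 * y)
    return 0.398942 * math.exp(-z * z / 2) * poly


def geometric_mean(rows, f):
    """G for a tested f: exp(sum(P*ln(HPR base)) / sum(P))."""
    worst = min(pnl for pnl, _ in rows)
    log_twr = sum(prob * math.log(1 + pnl / (worst / (-f))) for pnl, prob in rows)
    return math.exp(log_twr / sum(prob for _, prob in rows))


def search_optimal_f(rows, steps=1000):
    """Loop over f = 1/steps, 2/steps, ..., (steps-1)/steps; keep the best G."""
    candidates = [i / steps for i in range(1, steps)]
    best_f = max(candidates, key=lambda f: geometric_mean(rows, f))
    return best_f, geometric_mean(rows, best_f)


rows = [(U + S * (k / 10), associated_probability(k / 10)) for k in range(-30, 31)]
best_f, g = search_optimal_f(rows)
print(best_f, round(g, 4))  # near .744 and 1.0265
```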

In our example, if we were to continue with our search for the optimal f, we would find the optimal at f = .744. This results in a geometric mean of 1.0265 and a corresponding geometric average trade of $174.45. It is important to note that the TWR itself doesn't have any real meaning as a by-product. Rather, when we are calculating our geometric mean parametrically, as we are here, the TWR is simply an interim step in obtaining that geometric mean. We can, however, figure what our TWR would be after X trades by taking the geometric mean to the power of X.

Therefore, if we want to calculate our TWR for 232 trades at a geometric mean of 1.0265, we would raise 1.0265 to the power of 232, obtaining 431.79. So we can state that trading at an optimal f of .744, we would expect to make 43,079% ((431.79-1)*100) on our stake after 232 trades.
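Raising the geometric mean to the power of the number of trades takes one line. This sketch uses a hypothetical function name and reproduces the 232-trade figures.

```python
def twr_after_trades(g, trades):
    """TWR after X trades is the geometric mean raised to the power X."""
    return g ** trades


twr = twr_after_trades(1.0265, 232)
print(round(twr, 2), round((twr - 1) * 100))  # 431.79 43079 (percent gain)
```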

Another by-product we will calculate is our threshold to the geometric:

Threshold to geometric = (330.13/174.45)*(-4899.57/-.744) 
                                            = 12,462.32
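The threshold to the geometric can be written as a small helper. The function and argument names are illustrative.

```python
def threshold_to_geometric(aat, gat, worst_pnl, f):
    """Threshold = (AAT/GAT) * (W/(-f))."""
    return (aat / gat) * (worst_pnl / (-f))


threshold = threshold_to_geometric(330.13, 174.45, -4899.57, 0.744)
print(round(threshold, 2))  # 12462.32
```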

Notice that the arithmetic average trade of $330.13 is not something that we have calculated with this technique; rather, it is a given, since it is one of the input parameters. We can now convert our optimal f into how many contracts to trade by the equations:

K = E/Q


K = The number of contracts to trade.
E = The current account equity.
Q = W/( -f)


W = The worst-case associated P&L.
f = The optimal f value.

Note that this variable, Q, represents a number that you can divide your account equity by as your equity changes on a day-by-day basis to know how many contracts to trade. Returning now to our example:

Q = -4,899.57/-.744 
    = $6,585.44

Therefore, we will trade 1 contract for every $6,585.44 in account equity. For a $25,000 account this means we would trade:

K = 25000/6585.44 
    = 3.796253553

Since we cannot trade in fractional contracts, we must round this figure of 3.796253553 down to the nearest integer. We would therefore trade 3 contracts for a $25,000 account. The reason we always round down rather than up is that the price exacted for being slightly below optimal is less than the price for being slightly beyond it.
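The conversion from f to a contract count, including the round-down rule, can be sketched as follows. `contracts_to_trade` is a hypothetical name.

```python
import math


def contracts_to_trade(equity, worst_pnl, f):
    """K = E/Q with Q = W/(-f), rounded down to a whole number of contracts."""
    q = worst_pnl / (-f)  # account equity required per contract
    return q, math.floor(equity / q)


q, k = contracts_to_trade(25000, -4899.57, 0.744)
print(round(q, 2), k)  # 6585.44 3
```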

Notice how sensitive the optimal number of contracts to trade is to the worst loss. This worst loss is solely a function of how many sigmas you have decided to go to the left of the mean. This bounding parameter, the range of sigmas, is very important in this calculation. We have chosen three sigmas in our calculation, which means that we are, in effect, budgeted for a three-sigma loss. However, a loss greater than three sigmas can really hurt us, depending on how far beyond three sigmas it is. Therefore, you should be very careful what value you choose for this range-bounding parameter. You'll have a lot riding on it.

Notice that for the sake of simplicity in illustration, we have not deducted commissions and slippage from these figures. If you want to incorporate commissions and slippage, deduct X dollars in commissions and slippage from each of the 232 trades at the outset of this exercise. You would then calculate your arithmetic average trade and population standard deviation from this set of 232 adjusted trades, and perform the exercise exactly as described.

We could now go back and perform a "what if" type of scenario. Suppose we want to see what will happen if the system begins to perform at only half the profitability it is now (shrink = .5). 

Further, assume that the market the system trades in becomes very volatile, and that as a consequence the dispersion among the trades increases by 60% (stretch = 1.6). By pumping these parameters through the procedure, we can see what the optimal f will be, so that we can adjust our trading before these changes become history. In so doing, we find that the optimal f now becomes .262, or 1 contract for every $31,305.92 in account equity. This is quite a change. It means that if these changes in the market and the system start to materialize, we are going to have to alter our money management regarding that system. 

The geometric mean will drop to 1.0027, the geometric average trade will be cut to $83.02, and the TWR over 232 trades will be 1.869. This is not even close to what it presently would be. All of this is predicated upon a 50% decrease in average trade and a 60% increase in standard deviation. This quite possibly could happen. It is also quite possible that the future could work out more favorably than the past. We can test this out, too. Suppose we want to see what will happen if our average profit increases by only 10%. We can check this by inputting a shrink value of 1.1. These “What if” parameters, stretch and shrink, really give us a great deal of power in our money management.
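A "what if" scenario amounts to rebuilding the table with different stretch and shrink values and repeating the search. The sketch below makes two assumptions. It uses the same 61-point table as before, and it replaces the search with a brute-force grid over f in steps of .001; any of the search methods mentioned earlier would also work. The function name `optimal_f` is hypothetical.

```python
import math

U, S = 330.129, 1743.232


def associated_probability(z):
    y = 1 / (1 + 0.2316419 * abs(z))
    poly = (1.330274429 * y**5 - 1.821255978 * y**4 + 1.781477937 * y**3
            - 0.356563782 * y**2 + 0.31938153 * y)
    return 0.398942 * math.exp(-z * z / 2) * poly


def optimal_f(stretch, shrink, steps=1000):
    """Rebuild the table with D = (U*Shrink)+(S*E*Stretch) and grid-search f."""
    rows = [((U * shrink) + (S * (k / 10) * stretch), associated_probability(k / 10))
            for k in range(-30, 31)]
    worst = min(pnl for pnl, _ in rows)
    total_prob = sum(prob for _, prob in rows)

    def geo_mean(f):
        # Log-space equivalent of TWR^(1/sum of probabilities)
        log_twr = sum(p * math.log(1 + pnl / (worst / (-f))) for pnl, p in rows)
        return math.exp(log_twr / total_prob)

    best = max((i / steps for i in range(1, steps)), key=geo_mean)
    return best, geo_mean(best), worst / (-best)  # f, G, and Q = W/(-f)


for stretch, shrink in [(1.0, 1.0), (1.6, 0.5), (1.0, 1.1)]:
    f, g, q = optimal_f(stretch, shrink)
    print(stretch, shrink, f, round(g, 4), round(q, 2))
```

The text reports f = .744 for the base case and f = .262 (G = 1.0027) for stretch = 1.6, shrink = .5.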

The closer your distribution of trade P&L's is to Normal to begin with, the better the technique will work for you. The problem with almost any money management technique is that there is a certain amount of "slop" involved. Here, we can define slop as the difference between the Normal Distribution and the distribution we are actually using; the more slop there is, the less effective the technique becomes. To illustrate, recall that using this method we have determined that trading 1 contract for every $6,585.44 in account equity is optimal. However, if we were to go over these trades and find our optimal f empirically, we would find that the optimal is to trade 1 contract for every $7,918.04 in account equity. 

As you can see, using the Normal Distribution technique here would put us slightly to the right of the peak of the f curve, trading slightly more contracts than the empirical optimal f would suggest. However, as we shall see, there is a lot to be said for expecting the future distribution of prices to be Normally distributed. When someone buys or sells an option, the assumption that the future distribution of the log of price changes in the underlying instrument will be Normal is built into the price of the option. Along this same line of reasoning, someone who is entering a trade in a market and is not using a mechanical system can be said to be looking at the same possible future distribution.

The technique was shown using data that was not equalized. We can also use this very same technique on equalized data by incorporating the following changes:

1. Before the data is standardized, it should be equalized by first converting all of the trade profits and losses to percentage profits and losses. Then these percentage profits and losses should be converted into P&L amounts at the current price by simply multiplying them by the current price.

2. When you go to standardize this data, standardize the now equalized data by using the mean and standard deviation of the equalized data.

3. The rest of the procedure is the same as written in terms of determining the optimal f, geometric mean, and TWR. The geometric average trade, arithmetic average trade, and threshold to the geometric are only valid for the current price of the underlying instrument. When the price of the underlying instrument changes, the procedure must be done again, going back to step 1 and multiplying the percentage profits and losses by the new underlying price. When you redo the procedure with a different underlying price, you will obtain the same optimal f, geometric mean, and TWR. However, your arithmetic average trade, geometric average trade, and threshold to the geometric will differ, depending on the new price of the underlying instrument.

4. The number of contracts to trade, as given by the K = E/Q equation, must be changed. The worst-case associated P&L, the W variable in the Q = W/(-f) equation, will be different as a result of the changes caused in the equalized data by a different current price.
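Here is a minimal sketch of the first two steps. It assumes each trade is recorded as a hypothetical (entry price, P&L) pair, and that the percentage P&L is measured against the entry price; the text does not specify the base. The trade data are made-up numbers for illustration only.

```python
import statistics


def equalize(trades, current_price):
    """Express each P&L as a fraction of its entry price (assumed base),
    then scale it to the current price."""
    return [(pnl / entry) * current_price for entry, pnl in trades]


# Hypothetical trades: (entry price, P&L in price units)
trades = [(100.0, 5.0), (120.0, -6.0), (80.0, 4.0), (110.0, -2.2)]
equalized = equalize(trades, current_price=150.0)

# Step 2: standardize using the mean and population standard deviation
# of the equalized data
mean = statistics.mean(equalized)
sd = statistics.pstdev(equalized)
standardized = [(x - mean) / sd for x in equalized]

print([round(x, 2) for x in equalized], round(mean, 4), round(sd, 4))
# [7.5, -7.5, 7.5, -3.0] 1.125 6.5705
```

Re-running with a new `current_price` changes the dollar figures. The standardized values do not change, because scaling every trade by the same factor also scales the mean and standard deviation by that factor.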

