## Parametric Techniques on Other Distributions

__THE KOLMOGOROV-SMIRNOV (K-S) TEST__
The chi-square test is no doubt the most popular of all methods of comparing two distributions, and it appears in many market-oriented applications. However, the best test for our purposes may well be the K-S test. This very efficient test is applicable to unbinned distributions that are a function of a single independent variable. All cumulative density functions have a minimum value of 0 and a maximum value of 1; what goes on in between differentiates them. The K-S test measures a very simple variable, D, defined as the maximum absolute value of the difference between two distributions' cumulative density functions. Performing the K-S test is relatively simple: the N objects are standardized and sorted in ascending order.

As we go through these sorted and standardized trades, the cumulative probability is however many trades we've gone through divided by N. When we get to our first trade in the sorted sequence, the trade with the lowest standard value, the cumulative density function (CDF) is equal to 1/N. With each standard value that we pass along the way up to our highest standard value, 1 is added to the numerator until, at the end of the sequence, our CDF is equal to N/N or 1. For each standard value we can compute the theoretical distribution that we wish to compare to. Thus, we can compare our actual cumulative density to any theoretical cumulative density.

The variable D, the K-S statistic, is equal to the greatest distance, taken over all standard values, between our actual cumulative density and the theoretical distribution's CDF. Whichever standard value results in the greatest difference is assigned to the variable D. When comparing our actual CDF at a given standard value to the theoretical CDF at that standard value, we must also compare the previous standard value's actual CDF to the current standard value's theoretical CDF. The reason is that the actual CDF breaks upward instantaneously at the data points, and, if the actual is below the theoretical, the difference between the lines is greater the instant before the actual jumps up.

*Figure The K-S test.*

Notice that at point A the actual line is above the theoretical. Therefore, we want to compare the current actual CDF value to the current theoretical value to find the greatest difference. Yet at point B, the actual line is below the theoretical. Therefore, we want to compare the previous actual value to the current theoretical value. The rationale is that we are measuring the greatest distance between the two lines. Since we are measuring at the instant the actual jumps up, we can consider using the previous value for the actual as the current value for the actual the instant before it jumps.

In summary, then, for each standard value, we want to take the absolute value of the difference between the current actual CDF value and the current theoretical CDF value. We also want to take the absolute value of the difference between the previous actual CDF value and the current theoretical CDF value. By doing this for all standard values, all points where the actual CDF jumps up by 1/N, and taking the greatest difference, we will have determined the variable D.
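The procedure just summarized can be sketched in code. This is a minimal illustration, not from the original text; the names `ks_d`, `trades_std`, and `theor_cdf` are assumptions, with `trades_std` holding the standardized trade P&L's and `theor_cdf` being whatever theoretical CDF we are comparing against.

```python
import numpy as np

def ks_d(trades_std, theor_cdf):
    """Greatest distance D between the empirical step CDF and a theoretical CDF.

    At each data point we check both the actual CDF just after the jump,
    (i + 1)/N, and the instant before it, i/N, as described above.
    """
    x = np.sort(np.asarray(trades_std, dtype=float))
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x):
        t = theor_cdf(xi)  # theoretical CDF at this standard value
        d = max(d, abs((i + 1) / n - t), abs(i / n - t))
    return d
```

For example, four evenly spaced points checked against the identity CDF give D = 0.2, the gap just before each jump.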

The lower the value of D, the more the two distributions are alike. We can readily convert the D value to a significance level by the following formula:

SIG = ∑[J = 1, ∞] ((J % 2) * 4 - 2) * EXP(-2 * J^2 * (N^(1/2) * D)^2)

where,

SIG = The significance level for a given D and N.

D = The K-S statistic.

N = The number of trades that the K-S statistic is determined over.

% = The modulus operator, the remainder from division. As it is used here, J % 2 yields the remainder when J is divided by 2.

EXP() = The exponential function.

There is no need to keep summing terms until J reaches infinity; the series converges. Once it has converged to within a close enough tolerance, we can stop summing terms.
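The sum-until-convergence can be sketched as follows; the function name `ks_sig` and the default tolerance are assumptions, not from the text.

```python
import math

def ks_sig(d, n, tol=1e-8):
    """Significance level for a K-S statistic d over n trades.

    Sums ((J % 2) * 4 - 2) * EXP(-2 * J^2 * (N^.5 * D)^2) until the
    terms become negligibly small.
    """
    lam_sq = (math.sqrt(n) * d) ** 2
    if lam_sq == 0:
        return 1.0  # D = 0: perfect agreement; the series does not apply
    total, j = 0.0, 1
    while True:
        term = ((j % 2) * 4 - 2) * math.exp(-2 * j * j * lam_sq)
        total += term
        if abs(term) < tol:
            return total
        j += 1
```

Running `ks_sig(0.04, 100)` reproduces the worked example that follows, converging to roughly .997.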

To illustrate by example, suppose we had 100 trades that yielded a K-S statistic of .04. For the first term, J = 1:

J1 = ((1 % 2) * 4 - 2) * EXP(-2 * 1^2 * (100^(1/2) * .04)^2)

= (1 * 4 - 2) * EXP(-2 * 1^2 * (10 * .04)^2)

= 2 * EXP(-2 * 1^2 * .4^2)

= 2 * EXP(-2 * 1 * .16)

= 2 * EXP(-.32)

= 2*.726149

= 1.452298

So our first term is 1.452298. Now to this we will add the next term, so we must increment J by 1, giving us J2:

J2 = ((2 % 2) * 4 - 2) * EXP(-2 * 2^2 * (100^(1/2) * .04)^2)

= (0 * 4 - 2) * EXP(-2 * 2^2 * (10 * .04)^2)

= -2 * EXP(-2 * 2^2 * .4^2)

= -2 * EXP(-2 * 4 * .16)

= -2 * EXP(-1.28)

= -2*.2780373

= -.5560746

Adding this value of -.5560746 to our running sum of 1.452298 gives us a new running sum of .8962234. We again increment J by 1, so it equals 3, evaluate the term, and add the result to our running total of .8962234. We keep doing this until we converge to within a close enough tolerance. For our example, this point of convergence will be right around .997, depending upon how many decimal places we want to be accurate to. This answer means that for 100 trades where the greatest distance between the two cumulative distributions was .04, we can be 99.7% certain that the actual distribution was generated by the theoretical distribution function. In other words, we can be 99.7% certain that the theoretical distribution function represents the actual distribution. Incidentally, this is a very good significance level.

**CREATING OUR OWN CHARACTERISTIC DISTRIBUTION FUNCTION**

We have determined that the Normal Probability Distribution is generally not a very good model of the distribution of trade profits and losses. Further, none of the more common probability distributions are either. Therefore, we must create a function to model the distribution of our trade profits and losses ourselves. The distribution of the logs of price changes is generally assumed to be of the stable Paretian variety. The distribution of trade P&L's can be regarded as a transformation of the distribution of prices. This transformation occurs as a result of trading techniques such as traders trying to cut their losses and let their profits run.

Hence, the distribution of trade P&L's can also be regarded as of the stable Paretian variety. What we are about to study, however, is not the stable Paretian. The stable Paretian, like all other distributional functions, models a specific probability phenomenon: the distribution of sums of independent, identically distributed random variables. The distributional function we are about to study does not model a specific probability phenomenon. Rather, it models other unimodal distributional functions. As such, it can replicate the shape, and therefore the probability densities, of the stable Paretian as well as any other unimodal distribution.

Now we will create this function. To begin with, consider the following equation:

Y = 1/(X^2+1)

This equation graphs as a general bell-shaped curve, symmetric about the Y axis, as shown in the accompanying figure.

*Figure LOC = 0 SCALE = 1 SKEW = 0 KURT = 2.*

We will thus build from this general equation. The variable X can be thought of as the number of standard units we are either side of the mean, or Y axis. We can affect the first moment of this "distribution," the location, by adding a value to represent a change in location to X. Thus, the equation becomes:

Y = 1/((X-LOC)^2+1)

where,

Y = The ordinate of the characteristic function.

X = The standard value amount.

LOC = A variable representing the location, the first moment of the distribution.

Thus, if we wanted to alter location by moving it to the left by 1/2 of a standard unit, we would set LOC to -.5.

*Figure LOC =-.5 SCALE = 1 SKEW = 0 KURT = 2*

Likewise, if we wanted to shift location to the right, we would use a positive value for the LOC variable. Keeping LOC at zero will result in no shift in location.

The exponent in the denominator affects kurtosis. Thus far, we have seen the distribution with the kurtosis set to a value of 2, but we can control the kurtosis of the distribution by changing the value of the exponent. This alters our characteristic function, which now appears as:

Y = 1/((X-LOC)^KURT+1)

where,

Y = The ordinate of the characteristic function.

X = The standard value amount.

LOC = A variable representing the location, the first moment of the distribution.

KURT = A variable representing kurtosis, the fourth moment of the distribution.

The accompanying figures demonstrate the effect of the kurtosis variable on our characteristic function. Note that the higher the exponent, the more flat-topped and thin-tailed the distribution (platykurtic), and the lower the exponent, the more pointed the peak and the thicker the tails of the distribution (leptokurtic).

*Figure LOC = 0 SCALE = 1 SKEW =0 KURT = 3.*

So that we do not run into problems with imaginary numbers when KURT is not an integer (a negative base raised to a non-integer power is undefined in the reals), we will use the absolute value of the base in the denominator. This does not affect the shape of the curve. Thus, we can rewrite the equation as:

Y = 1/(ABS(X-LOC)^KURT+1)

We can put a multiplier on the coefficient in the denominator to allow us to control the scale, the second moment of the distribution. Thus, our characteristic function has now become:

Y = 1/(ABS((X-LOC)*SCALE)^KURT+1)

where,

Y = The ordinate of the characteristic function.

X = The standard value amount.

LOC = A variable representing the location, the first moment of the distribution.

SCALE = A variable representing the scale, the second moment of the distribution.

KURT = A variable representing kurtosis, the fourth moment of the distribution.

The accompanying figures demonstrate the effect of the scale parameter. Its effect can be thought of as moving the horizontal axis up or down on the distribution. With a SCALE value less than 1, it is as though the horizontal axis had been moved up and the curve enlarged, so that we are looking only at the "cap" of the distribution. With a SCALE value greater than 1, it is as though the horizontal axis had been moved down and the distribution curve shrunken.

*Figure LOC = 0 SCALE = .5 SKEW = 0 KURT = 2.*

*Figure LOC = 0 SCALE = 2 SKEW = 0 KURT = 2.*

We now have a characteristic function to a distribution whereby we have complete control over three of the first four moments of the distribution. Presently, the distribution is symmetric about the location. What we now need is to be able to incorporate a variable for skewness, the third moment of the distribution, into this function. To account for skewness, we must amend our function further. Our characteristic function has now evolved to:

Y = (1/(ABS((X-LOC)*SCALE)^KURT+1))^C

where,

C = The exponent for skewness, calculated as:

C = (1 + (ABS(SKEW)^ABS(1/(X-LOC)) * sign(X) * -sign(SKEW)))^.5

Y = The ordinate of the characteristic function.

X = The standard value amount.

LOC = A variable representing the location, the first moment of the distribution.

SCALE = A variable representing the scale, the second moment of the distribution.

SKEW = A variable representing the skewness, the third moment of the distribution.

KURT = A variable representing kurtosis, the fourth moment of the distribution.

sign() = The sign function, equal to 1 or -1. The sign of X is calculated as X/ABS(X) for X not equal to 0. If X is equal to zero, the sign should be regarded as positive.
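The full characteristic function just defined can be sketched as follows. The function name `char_func` is an assumption; the special case at X = LOC is also an assumption, added because 1/(X-LOC) is undefined at the peak, where skewness has no effect anyway.

```python
import math

def sign(x):
    """Sign function; the sign of 0 is regarded as positive, per the text."""
    return 1.0 if x >= 0 else -1.0

def char_func(x, loc, scale, skew, kurt):
    """Ordinate Y of the adjustable distribution at standard value x."""
    if x == loc:
        c = 1.0  # 1/(X-LOC) is undefined at the peak; skew has no effect there
    else:
        c = (1 + abs(skew) ** abs(1.0 / (x - loc)) * sign(x) * -sign(skew)) ** 0.5
    return (1.0 / (abs((x - loc) * scale) ** kurt + 1.0)) ** c
```

With all parameters at their neutral values (LOC = 0, SCALE = 1, SKEW = 0, KURT = 2) this reduces to Y = 1/(X^2+1), and a positive SKEW raises the right side of the curve relative to the left.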

*Figure LOC = 0 SCALE = 1 SKEW = -.5 KURT = 2.*

A few important notes on the four parameters LOC, SCALE, SKEW, and KURT. With the exception of LOC, these variables are nondimensional; their values are pure numbers that characterize the shape of the distribution and have meaning only in a relative context. Furthermore, the parameter values are not the same values you would get if you employed any of the standard measuring techniques detailed in "Descriptive Measures of Distributions".

For instance, if you determined one of Pearson's coefficients of skewness on a set of data, it would not be the same value that you would use for the variable SKEW in the adjustable distribution here. The values for the four variables are unique to our distribution. Also of importance is the range that the variables can take. The SCALE variable must always be positive with no upper bound, and likewise with KURT. In application, though, you will generally use values between .5 and 3, and in extreme cases between .05 and 5. However, you can use values beyond these extremes, so long as they are greater than zero.

The LOC variable can be positive, negative, or zero. The SKEW parameter must be greater than or equal to -1 and less than or equal to +1. When SKEW equals +1, the entire right side of the distribution (right of the peak) is equal to the peak, and vice versa when SKEW equals -1. The ranges on the variables are summarized as:

-infinity<LOC<+infinity

SCALE>0

-1<=SKEW<=+1

KURT>0

**FITTING THE PARAMETERS OF THE DISTRIBUTION**

Just as with the process for finding our optimal f on the Normal Distribution, we must convert our raw trades data over to standard units. We do this by first subtracting the mean from each trade, then dividing by the population standard deviation. From this point forward, we will be working with the data in standard units rather than in its raw form. After we have our trades in standard values, we can sort them in ascending order. With our trades data arranged this way, we will be able to perform the K-S test on it.

Our objective now is to find the values of LOC, SCALE, SKEW, and KURT that best fit our actual trades distribution. To determine this best fit we rely on the K-S test, and we estimate the parameter values by employing the "twentieth-century brute force technique." We run every combination of KURT from 3 to .5 by -.1 against every combination of SCALE from 3 to .5 by -.1. For the time being we leave LOC and SKEW at 0. Thus, we are going to run the following combinations:

**LOC SCALE SKEW KURT**

*(table of combinations: LOC = 0 and SKEW = 0 throughout, with SCALE and KURT each stepping from 3 down to .5 by .1)*
We perform the K-S test for each combination. The combination that results in the lowest K-S statistic we assume to be our best-fitting parameter values for SCALE and KURT. To perform the K-S test for each combination, we need both the actual distribution and the theoretical distribution. We have already seen how to construct the actual cumulative density as X/N, where N is the total number of trades and X is the ranking of a given trade. Now we need to calculate the CDF for our theoretical distribution for the given LOC, SCALE, SKEW, and KURT parameter values we are presently looping through. We have the characteristic function for our adjustable distribution. To obtain a CDF from a distribution's characteristic function, we must find the integral of the characteristic function. We define the integral, the percentage of area under the characteristic function at point X, as N(X).
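The brute-force pass can be sketched end to end as follows. This is an illustrative assumption, not the author's code: the theoretical CDF is built by discretizing the characteristic function into bars (the technique the text develops next), the empirical CDF is X/N, and `np.interp` stands in for rounding trades to step increments; all function names are hypothetical.

```python
import numpy as np

def char_func(x, loc, scale, skew, kurt):
    # the adjustable distribution's characteristic function, per the text
    s = 1.0 if x >= 0 else -1.0      # sign(X), with 0 treated as positive
    ss = 1.0 if skew >= 0 else -1.0  # sign(SKEW)
    c = 1.0 if x == loc else (1 + abs(skew) ** abs(1 / (x - loc)) * s * -ss) ** 0.5
    return (1 / (abs((x - loc) * scale) ** kurt + 1)) ** c

def theor_cdf_table(loc, scale, skew, kurt, lo=-5.0, hi=5.0, n_pts=101):
    # discretize the curve into bars and form cumulative probabilities N(X)
    xs = np.linspace(lo, hi, n_pts)
    heights = np.array([char_func(x, loc, scale, skew, kurt) for x in xs])
    run = np.cumsum(heights)  # running sum of the N'(X)'s
    prev = np.concatenate(([0.0], run[:-1]))
    return xs, (run + prev) / 2.0 / run[-1]

def ks_d(trades_std, loc, scale, skew, kurt):
    xs, cdf = theor_cdf_table(loc, scale, skew, kurt)
    x = np.sort(np.asarray(trades_std, dtype=float))
    n = len(x)
    t = np.interp(x, xs, cdf)        # theoretical CDF at each trade
    steps = np.arange(1, n + 1) / n  # actual CDF just after each jump
    return max(np.abs(steps - t).max(), np.abs(steps - 1.0 / n - t).max())

def grid_fit(trades_std):
    # brute force: every SCALE/KURT pair from 3 down to .5 by .1, LOC = SKEW = 0
    best_d, best_params = np.inf, None
    for kurt in np.arange(3.0, 0.45, -0.1):
        for scale in np.arange(3.0, 0.45, -0.1):
            d = ks_d(trades_std, 0.0, scale, 0.0, kurt)
            if d < best_d:
                best_d, best_params = d, (0.0, scale, 0.0, kurt)
    return best_d, best_params
```

With real trade data you would standardize first (subtract the mean, divide by the population standard deviation) and then hand the standard values to `grid_fit`.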

Thus, since our characteristic function gives us the first derivative of this integral, we refer to the characteristic function as N'(X). Often you may not be able to derive the integral of a function, even if you are proficient in calculus. Therefore, rather than determining the integral analytically, we are going to rely on a different technique: one that, although a bit more labor intensive, is hardier than finding the integral. The respective probabilities can always be estimated for any point on the function's characteristic line by treating the distribution as a series of many bars. Then, for any given bar on the distribution, you can calculate the probability associated with that bar by taking the sum of the areas of all those bars to the left of your bar, including your bar, and dividing it by the sum of the areas of all the bars in the distribution. The more bars you use, the more accurate your estimated probabilities will be.

If you could use an infinite number of bars, your estimate would be exact. We now discuss the procedure for finding the areas under our adjustable distribution by way of an example. Assume we wish to find probabilities associated with every .1 increment in standard values from -3 to +3 sigmas of our adjustable distribution. Notice that our table starts at -5 standard units and ends at +5 standard units, the reason being that you should begin and end at least 2 sigmas beyond the bounding parameters to get more accurate results. Therefore, we begin our table at -5 sigmas and end it at +5 sigmas. Notice that X represents the number of standard units that we are away from the mean. This is then followed by the four parameter values. The next column is the N'(X) column, the height of the curve at point X given these parameter values, calculated directly from our characteristic function (the equation for Y).

Assume that we want to calculate N'(X) for X at -3, with parameter values of .02, 2.76, 0, and 1.78 for LOC, SCALE, SKEW, and KURT respectively. First, we calculate the exponent of skewness, C, as:

C = (1 + (ABS(SKEW)^ABS(1/(X-LOC)) * sign(X) * -sign(SKEW)))^.5

= (1 + (ABS(0)^ABS(1/(-3-.02)) * -1 * -1))^.5

= (1 + 0)^.5

= 1

Thus, substituting 1 for C in the equation:

Y = (1/(ABS((X-LOC)*SCALE)^KURT+1))^C

= (1/(ABS((-3-.02)*2.76)^1.78+1))^1

= (1/((3.02*2.76)^1.78+1))^1

= (1/(8.3352^1.78+1))^1

= (1/(43.57431058+1))^1

= (1/44.57431058)^1

= .02243444681^1

= .02243444681

Thus, at the point X = -3, the N'(X) value is .02243444681. (Notice that we calculate an N'(X) column, which corresponds to every value of X). The next step we must perform, the next column, is the running sum of the N'(X)'s as we advance up through the X's. This is straight forward enough. Now we calculate the N(X) column, the resultant probabilities associated with each value of X, for the given parameter values. To do this, we must perform equation :

N(C) = ((∑[i = 1,C] N'(Xi) + ∑[i = 1,C-1] N'(Xi)) / 2) / ∑[i = 1,M] N'(Xi)

where,

C = The index of the current X value in the ascending sequence.

M = The total count of X values.

The equation says, literally, to add the running sum at the current value of X to the running sum at the previous value of X. Divide this sum by 2. Then take the quotient and divide it by the last value in the column of the running sum of the N'(X)'s. This gives us the resultant probability for a given value of X, for the given parameter values. Thus, for the value of -3 for X, the running sum of the N'(X)'s at -3 is .302225586, and the previous X, -3.1, has a running sum value of .2797911392. Summing these two running sums gives us .5820167252. Dividing this by 2 gives us .2910083626. Then dividing this by the last value in the running sum column, the total of all of the N'(X)'s, 11.8535923812, gives us a quotient of .02455022522.
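As a sketch, the table construction and the N(X) computation just described can be reproduced in code with the example's parameter values (LOC = .02, SCALE = 2.76, SKEW = 0, KURT = 1.78). All variable names are assumptions; the table is taken to run from -5 to +5 sigmas in .1 steps, as in the example.

```python
import numpy as np

def char_func(x, loc, scale, skew, kurt):
    # N'(X): the characteristic function, as in the worked example above
    s = 1.0 if x >= 0 else -1.0
    ss = 1.0 if skew >= 0 else -1.0
    c = 1.0 if x == loc else (1 + abs(skew) ** abs(1 / (x - loc)) * s * -ss) ** 0.5
    return (1 / (abs((x - loc) * scale) ** kurt + 1)) ** c

xs = np.linspace(-5.0, 5.0, 101)       # -5 to +5 sigmas in .1 increments
n_prime = np.array([char_func(x, 0.02, 2.76, 0.0, 1.78) for x in xs])
run = np.cumsum(n_prime)               # running sum of the N'(X)'s
prev = np.concatenate(([0.0], run[:-1]))
n_of_x = (run + prev) / 2.0 / run[-1]  # the N(X) column
i = 20                                 # index of X = -3 in the table
```

Checking `n_prime[i]`, `run[i]`, `run[-1]`, and `n_of_x[i]` against the figures quoted above (.0224344, .302225586, 11.8535923812, and .02455022522) confirms the construction.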

This is the associated probability, N(X), at the standard value of X = -3. Once we have constructed cumulative probabilities for each trade in the actual distribution and probabilities for each standard value increment in our adjustable distribution, we can perform the K-S test for the parameter values we are currently using. Before we do, however, we must make adjustments for a couple of other preliminary considerations. In the example of the table of cumulative probabilities shown earlier for our adjustable distribution, we calculated probabilities at every .1 increment in standard values. This was for the sake of simplicity. In practice, you can obtain a greater degree of accuracy by using a smaller step increment. I find that using .01 standard values is a good step increment.

A word on how to determine your bounding parameters in actual practice, that is, how many sigmas either side of the mean you should go in determining your probabilities for our adjustable distribution. In our example we used 3 sigmas either side of the mean, but in reality you must use the absolute value of the farthest point from the mean. For our 232-trade example, the extreme left standard value is -2.96 standard units and the extreme right is 6.935321 standard units. Since 6.935321 is greater than ABS(-2.96), we take 6.935321. Now we add at least 2 sigmas to this value, for the sake of accuracy, and construct probabilities for a distribution from -8.94 to +8.94 sigmas. Since we want a good deal of accuracy, we will use a step increment of .01. Therefore, we will figure probabilities for standard values of:

-8.94

-8.93

-8.92

-8.91

…

+8.94

Now, the last thing we must do before we can actually perform our K-S statistic is to round the actual standard values of the sorted trades to the nearest .01. For example, the value 6.935321 will not have a corresponding theoretical probability associated with it, since it is in between the step values 6.93 and 6.94. Since 6.94 is closer to 6.935321, we round 6.935321 to 6.94. Before we can begin the procedure of optimizing our adjustable distribution parameters to the actual distribution by employing the K-S test, we must round our actual sorted standardized trades to the nearest step increment.
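This rounding step is a one-liner; the helper name `round_to_step` is an assumption, and note that Python's built-in `round` uses banker's rounding at exact midpoints.

```python
step = 0.01  # the chosen step increment in standard values

def round_to_step(value, step):
    """Round a standardized trade to the nearest step increment."""
    return round(value / step) * step

rounded = round_to_step(6.935321, step)  # 6.935321 lies between 6.93 and 6.94
```

Applied to the example value, this yields 6.94 (to within floating-point precision), matching the text.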

In lieu of rounding the standard values of the trades to the nearest Xth decimal place, you can use linear interpolation on your table of cumulative probabilities to derive probabilities corresponding to the actual standard values of the trades. For more on linear interpolation, consult a good statistics book, such as some of the ones suggested in the bibliography, or Commodity Market Money Management by Fred Gehm. Thus far, we have been optimizing only for the best-fitting KURT and SCALE values. Logically, it would seem that if we standardized our data, as we have, then the LOC parameter should be kept at 0 and the SCALE parameter should be kept at 1. This is not necessarily true, as the true location of the distribution may not be the arithmetic mean, and the true optimal value for scale may not be at 1.

The KURT and SCALE values have a very strong relationship to one another. Thus, we first try to isolate the "neighborhood" of best-fitting parameter values for KURT and SCALE. For our 232 trades this occurs at SCALE equal to 2.7 and KURT equal to 1.9. Now we progressively try to zero in on the best-fitting parameter values. This is a computer-time-intensive process. We run our next pass through, cycling the LOC parameter from .1 to -.1 by -.05, the SCALE parameter from 2.6 to 2.8 by .05, the SKEW parameter from .1 to -.1 by -.05, and the KURT parameter from 1.86 to 1.92 by .02. The results of this cycle through give the optimal at LOC = 0, SCALE = 2.8, SKEW = 0, and KURT = 1.86. Thus we perform a third cycle through.

This time we run LOC from .04 to -.04 by -.02, SCALE from 2.76 to 2.82 by .02, SKEW from .04 to -.04 by -.02, and KURT from 1.8 to 1.9 by .02. The results of the third cycle through show optimal values at LOC = .02, SCALE = 2.76, SKEW = 0, and KURT = 1.8. Now we have zeroed right in on the optimal neighborhood, the area where the parameters make for the best fit of our adjustable characteristic function to the actual data. For our last cycle through we run LOC from 0 to .03 by .01, SCALE from 2.76 to 2.73 by -.01, SKEW from .01 to -.01 by -.01, and KURT from 1.8 to 1.75 by -.01. The results of this final pass show optimal parameters for our 232 trades at LOC = .02, SCALE = 2.76, SKEW = 0, and KURT = 1.78.