Using statistics to set oil analysis alarms and limits is a powerful and time-saving tool that allows the analyst to focus attention on those machines that are believed to be in trouble. Typically, this approach is used to set level limits, which act as flags or trip wires alerting the analyst to nonconforming results. Level limits simply mean setting a maximum or minimum acceptable level for oil analysis parameters such as iron, copper, lead, etc.
In setting statistically derived level limits, it is usually assumed that the data is normally distributed (follows the familiar bell-shaped distribution curve), allowing caution and critical levels to be identified as a function of the mean and standard deviation of the data. For a large enough data set, this is generally the case. For more on using statistics to set level alarms, see "Use Statistical Analysis to Create Wear Debris Alarm Limits," by Jonathon Sowers, in the November-December 2001 issue of Practicing Oil Analysis magazine.
This article illustrates how these same powerful statistical principles can be applied to the calculation of precise rate-of-change alarms, which can offer far greater sensitivity when interpreting historical oil analysis data.
Calculating Rate-of-Change Alarms
Rate-of-change, as the name implies, focuses not on the actual measured level of the parameter, but rather on the rate at which the parameter is moving over a given period of time. With some work, rate-of-change analysis can be much more sensitive than level-based alarms, because the analyst tracks movement of the target parameter relative to time, miles or cycles.
Rate-of-change analysis is an evaluation of the slope of the data (Figure 1). In this model, the slope is simply the rise divided by the run: the change in the parameter since the last reading, divided by the hours, miles or cycles accumulated since that reading.
For example, if a machine generates 100 ppm of iron during a 100-hour run period, the slope is one (1.0), and it can be said that the machine has an iron production rate of 1.0 ppm/hour. If the same machine generates 160 ppm of iron during the next 100-hour run period, the slope is 1.6, or an iron production rate of 1.6 ppm/hour. Similarly, if the iron level, for some reason, decreases by 100 ppm during the following 100-hour run period, the slope is negative one (-1.0). So, a slope can be either positive or negative, depending on whether the measured parameter goes up or down.
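The slope arithmetic above can be sketched in Python. This is a minimal illustration. It assumes each sample is stored as a hypothetical (operating hours, iron ppm) pair, sorted by hours. The function name slopes() is illustrative and is not taken from any oil analysis software.

```python
from typing import List, Tuple


def slopes(readings: List[Tuple[float, float]]) -> List[float]:
    """Slope between consecutive samples: change in level / change in hours.

    readings: hypothetical (operating_hours, level_ppm) pairs, sorted by hours.
    """
    result = []
    for (h0, ppm0), (h1, ppm1) in zip(readings, readings[1:]):
        run = h1 - h0  # hours (or miles, cycles) since the last sample
        if run <= 0:
            raise ValueError(f"hour meter did not advance: {h0} -> {h1}")
        rise = ppm1 - ppm0  # change in the parameter since the last sample
        result.append(rise / run)
    return result


# The iron example from the text: +100 ppm, +160 ppm, then -100 ppm,
# each over a 100-hour run period (a starting level of 0 ppm is assumed).
history = [(0, 0), (100, 100), (200, 260), (300, 160)]
print(slopes(history))  # [1.0, 1.6, -1.0]
```

The run is checked before dividing. A missing or mistyped hour-meter reading would otherwise produce a meaningless or infinite slope.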
The first stage in a statistical analysis is to calculate the mean. However, because the slope can be either positive or negative, the standard formula for the arithmetic mean poses a problem. Positive and negative slopes cancel each other out, pulling the average toward zero and understating how fast the parameter is actually moving. To circumvent this problem, a slightly modified formula using the absolute value of the slope is required. To determine the absolute value of an observation, simply disregard the sign of the slope, so that slopes of +1.0 and -1.0 both have an absolute value of 1.0.
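A short Python sketch shows why the sign matters for averaging. It reuses the three slopes from the iron example (+1.0, +1.6 and -1.0). The helper name mean_abs_slope() is hypothetical.

```python
from statistics import mean


def mean_abs_slope(slope_values):
    """Arithmetic mean of the slopes with their signs disregarded."""
    return mean(abs(s) for s in slope_values)


observed = [1.0, 1.6, -1.0]  # slopes from the iron example, ppm/hour
print(mean(observed))            # about 0.53: the -1.0 cancels the first +1.0
print(mean_abs_slope(observed))  # about 1.2: the typical rate of movement
```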
Using the absolute value to calculate the average and standard deviation seems intuitively unsettling to many people. The natural reaction is that the analyst is interested in determining if the target parameter is moving up or down, so the sign has meaning. However, keep in mind that statistical alarms are simply a mathematical tool, designed to help the analyst differentiate normal variations from an abnormal event. Once the alarm sounds, so to speak, the mathematical tool prompts one to investigate and determine the nature of the problem, including the direction of movement.
For the purpose of setting oil analysis alarms, it is also necessary to calculate the standard deviation of the slope observations for the target oil analysis parameter. The mean defines what might be considered a normal slope for a given parameter. The standard deviation measures how widely individual slopes typically scatter around that mean, and therefore how likely a given departure from the mean is to occur by chance. Simply put, the larger the standard deviation, the wider the expected variation in the slope. A large standard deviation will result in a wider range of acceptable values, but with less precision as to what can be considered normal.
The standard deviation for rate-of-change slope observations is calculated with the standard formula, except that, again, one must use the absolute value of each observation.
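Written out, the calculations described above take the following form. Here x_i is the parameter level (for example, iron in ppm) at the i-th sample, t_i is the hours, miles or cycles recorded at that sample, and n is the number of slopes. The n − 1 divisor (the sample standard deviation) is our assumption; a population formula would divide by n instead.

```latex
\begin{align*}
s_i &= \frac{x_i - x_{i-1}}{t_i - t_{i-1}}, \qquad i = 1, \dots, n \\
\bar{s} &= \frac{1}{n} \sum_{i=1}^{n} \lvert s_i \rvert \\
\sigma_s &= \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \bigl( \lvert s_i \rvert - \bar{s} \bigr)^2}
\end{align*}
```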
Some machines will normally show increasing levels for some oil analysis parameters, such as the iron concentration in an unfiltered gearbox. In these instances, rate-of-change may be the only reasonable way to judge a machine's condition, because the level is in a constant state of flux. The statistically derived rate-of-change method accounts for this fluctuation. The table in Figure 2 shows a clearly escalating iron level which, judged against level limits alone, might indicate cause for concern. However, the corresponding rate-of-change trend plot is relatively flat, because the change per hour remains fairly stable at just under 0.1 ppm per hour. This indicates that nothing significant has changed with this gearbox.
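To make the gearbox scenario concrete, here is a Python sketch. It uses hypothetical readings (not the Figure 2 data) and a hypothetical 100 ppm level limit. The level check flags most samples, while the per-hour slope barely moves.

```python
# Hypothetical readings for an unfiltered gearbox (NOT the Figure 2 data):
# (operating hours, iron ppm)
readings = [(0, 20), (500, 65), (1000, 112), (1500, 158), (2000, 205)]
LEVEL_LIMIT_PPM = 100  # hypothetical fixed level alarm

# Level view: which samples exceed the fixed limit?
level_flags = [hours for hours, ppm in readings if ppm > LEVEL_LIMIT_PPM]

# Rate view: ppm per hour between consecutive samples.
rates = [
    (ppm1 - ppm0) / (h1 - h0)
    for (h0, ppm0), (h1, ppm1) in zip(readings, readings[1:])
]

print(level_flags)                   # [1000, 1500, 2000]
print([round(r, 3) for r in rates])  # [0.09, 0.094, 0.092, 0.094]
```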
Figure 2
Going one stage further, apply the statistical model to calculate the lower and upper caution and critical limits based on the mean and standard deviation of the slopes. The limits in Figure 2 were set at ± one and ± two standard deviations from the mean for caution and critical, respectively. As the figure illustrates, only the 1002- and 2240-hour readings fall outside the caution limits. In fact, only the 1002-hour reading is a high caution, indicating a potential problem.
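The limit calculation can be sketched as follows. Consistent with the absolute-value approach above, this assumes that each new slope is compared by magnitude against bands of mean ± k standard deviations. The function names and the k_caution/k_critical parameters are illustrative, and the baseline slopes are hypothetical. The defaults of 1 and 2 match the scheme used for Figure 2; passing 2 and 3 gives the more liberal scheme.

```python
from statistics import mean, stdev


def rate_limits(slope_values, k_caution=1.0, k_critical=2.0):
    """Caution and critical bands of mean +/- k * sd on the slope magnitudes."""
    magnitudes = [abs(s) for s in slope_values]
    m, sd = mean(magnitudes), stdev(magnitudes)  # stdev: sample (n - 1) formula
    return {
        "low_critical": m - k_critical * sd,
        "low_caution": m - k_caution * sd,
        "high_caution": m + k_caution * sd,
        "high_critical": m + k_critical * sd,
    }


def classify(slope, limits):
    """Alarm state of one new slope, compared by magnitude (sign disregarded)."""
    x = abs(slope)
    if x > limits["high_critical"]:
        return "high critical"
    if x > limits["high_caution"]:
        return "high caution"
    if x < limits["low_critical"]:
        return "low critical"
    if x < limits["low_caution"]:
        return "low caution"
    return "normal"


# Hypothetical baseline slopes (ppm/hour), for illustration only.
baseline = [0.02, -0.04, 0.04, 0.04, 0.05, -0.05, 0.07, 0.09]
limits = rate_limits(baseline)  # mean about 0.05, sd about 0.021
print(classify(0.08, limits))   # high caution
print(classify(-0.10, limits))  # high critical
```

Slope magnitudes can never fall below zero. If mean − k × sd comes out negative, the corresponding low alarm simply cannot trip.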
Setting alarms at the one and two standard deviation levels is very conservative, because even normal variation will trip them fairly often. Setting alarms at ± two and three standard deviations for caution and critical, respectively, is much more liberal. It also requires higher confidence in the data, typically as a result of a larger data set.
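How conservative each choice is can be estimated if the slope magnitudes are assumed to be roughly normally distributed. This is only an approximation, since absolute values cannot fall below zero. The sketch below uses Python's math.erfc to compute the share of normal readings expected to fall outside ± k standard deviations.

```python
import math


def fraction_outside(k):
    """Share of normally distributed values lying beyond mean +/- k * sd."""
    return math.erfc(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, f"{fraction_outside(k):.2%}")
# 1 31.73%
# 2 4.55%
# 3 0.27%
```

Under the one- and two-standard-deviation scheme, roughly one normal reading in three trips at least a caution, and about one in twenty-two trips a critical alarm. At two and three standard deviations, those figures fall to about one in twenty-two and one in 370.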
For statistical analysis, the larger the data set, the greater the precision to which limits can be set. The rule of thumb has always been that 30 observations is the point at which the sample begins to accurately estimate the population mean. Don't wait until you have 30 observations to begin; simply recognize that your confidence will grow as the data set becomes larger. For smaller data sets, a more sophisticated, but not terribly difficult, approach can be used to define the confidence interval using Student's t tables. This is beyond the scope of this article, but it can be applied in much the same way as the simple normal population approach outlined here.
Statistically derived rate-of-change oil analysis limits are very effective and easy to apply. Often, these alarms are more sensitive than simple level limits, but are most effective when used in conjunction with level limits. Here are some final tips for making this strategy work:
Because this method is based on the change in the target parameter relative to a given change in hours, miles or cycles, it is critical that hours, miles or cycles be accurately recorded. So, get into the habit of noting this information each and every time a sample is collected. When using a miles or kilometers basis, be sure to check the odometer at the time the sample is drawn and clearly record the reading. Likewise, for stationary equipment, record the precise operating hours at the time the sample is drawn. If the machine does not have an hour meter and the data can't be retrieved from the control system software, it may be necessary to install one.
Like other statistically derived alarms, a larger number of data points will produce higher quality inferences, but go ahead and run the statistics once four or five readings have been recorded. Simply improve the estimate of the mean and standard deviation over time as data is acquired.
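If the statistics are recomputed in software as each sample arrives, the mean and standard deviation do not have to be rebuilt from the full history each time. One option is Welford's online algorithm, sketched here in Python for the absolute slope values. This is our substitution, not a method described in this article. The class name RunningStats and the slope values are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm: update mean and sample sd one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def stdev(self):
        if self.n < 2:
            raise ValueError("need at least two observations")
        return math.sqrt(self._m2 / (self.n - 1))


stats = RunningStats()
for slope in [0.09, 0.094, -0.092, 0.094]:  # hypothetical slopes, ppm/hour
    stats.add(abs(slope))  # absolute value, as described in the text
    if stats.n >= 2:
        print(stats.n, round(stats.mean, 4), round(stats.stdev, 4))
```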
It is important to effectively group sampling points according to machine type, sample location, application, oil type, operating environment and other differentiating factors. Don’t simply lump everything together. In fact, two identical machines performing the same function in the same place might produce significantly different readings. Where possible, it is desirable to perform statistical analysis on a machine-by-machine basis.
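Grouping can be handled by keying each slope observation on whatever defines a comparable population. In this Python sketch, the key is a hypothetical (machine ID, sample point) tuple. The IDs and slopes shown are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def stats_by_group(observations):
    """Mean and sample sd of |slope| for each group.

    observations: iterable of (group_key, slope) pairs, where group_key is
    any hashable label, e.g. a hypothetical (machine_id, sample_point) tuple.
    """
    groups = defaultdict(list)
    for key, slope in observations:
        groups[key].append(abs(slope))
    return {
        key: (mean(vals), stdev(vals))
        for key, vals in groups.items()
        if len(vals) >= 2  # a standard deviation needs at least two values
    }


# Invented machine IDs and slopes (ppm/hour), for illustration only.
obs = [
    (("GB-101", "drain port"), 0.09),
    (("GB-101", "drain port"), -0.11),
    (("GB-102", "drain port"), 0.50),
    (("GB-102", "drain port"), 0.70),
    (("GB-103", "drain port"), 0.20),  # only one reading: no stats yet
]
for key, (m, sd) in stats_by_group(obs).items():
    print(key, round(m, 3), round(sd, 3))
```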
Edit (remove) unusually high or low readings, regardless of the cause of the abnormality. These readings distort the estimate of the average and inflate the standard deviation, reducing the sensitivity of the calculated alarms. By and large, data editing is a subjective activity. More sophisticated techniques are available for identifying outliers that require follow-up. Even so, one can usually (and quite effectively) remove the data points that look wrong without compromising the integrity of the data.
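Some analysts may prefer a repeatable screen over judgment by eye. One common rule is Tukey's fences, which drops values lying more than 1.5 interquartile ranges beyond the first or third quartile. This is a substitution, not the approach described in this article. The Python sketch below uses statistics.quantiles (Python 3.8 or later), and the slope values are hypothetical.

```python
from statistics import quantiles


def drop_outliers(values, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # needs at least two values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical absolute slopes (ppm/hour); 0.95 is the suspect reading.
slopes_abs = [0.09, 0.95, 0.08, 0.10, 0.11, 0.09, 0.10]
print(drop_outliers(slopes_abs))  # [0.09, 0.08, 0.1, 0.11, 0.09, 0.1]
```

With only a handful of readings, the quartiles themselves are rough. Treat anything this rule removes as a candidate for review rather than an automatic deletion.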
Any systemic changes (filter upgrade, lubricant upgrade, increased precision in alignment, etc.) will affect the slope and variability of the data. Because statistics always implies looking backward and using historical data to make a judgment call about the future, these systemic changes cloud one’s view, and statistical analysis must begin again after such a change.
Oil analysis software typically accommodates multiple alarms for a single parameter. Statistically derived rate-of-change alarms are most effective when coupled with goal-based proactive alarms or statistically derived level limits.
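As a final illustration, a single sample can be run against both kinds of alarm at once. The thresholds in this Python sketch are hypothetical placeholders. In practice, the level limit might be goal-based or statistically derived, and the rate limit would come from the slope statistics.

```python
def evaluate(level_ppm, slope, level_limit_ppm, rate_high_caution):
    """List every alarm one sample trips; all thresholds are caller-supplied."""
    alarms = []
    if level_ppm > level_limit_ppm:  # level (goal-based or statistical) limit
        alarms.append("level")
    if abs(slope) > rate_high_caution:  # statistically derived rate limit
        alarms.append("rate of change")
    return alarms


# Hypothetical thresholds: 100 ppm level limit, 0.12 ppm/hour rate caution.
print(evaluate(150, 0.09, 100, 0.12))  # ['level']
print(evaluate(60, 0.50, 100, 0.12))   # ['rate of change']
```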