Advantages of a Unified Condition Monitoring Approach

Jim Fitch, Noria Corporation

For most plants, condition monitoring consists of multiple technologies that are cobbled together in an attempt to enhance machine reliability. Clearly, these efforts are founded in good intentions, and many such programs enjoy considerable success. Still others languish due to a lack of symmetry and central focus. Money is spent and efforts expended, but results are too often disappointing.

Condition monitoring requires a proper foundation built on understanding and aligning criticality and failure mode analysis. This alignment helps optimize how activities and spending are deployed, minimizing waste and redundancy. It also keeps maintenance and reliability professionals on the same page by providing a clear understanding of what is being done and why.

This column is Part 3 on this topic. In Part 1, “A New Look at Criticality Analysis for Machinery Lubrication,” I discussed the concept of Overall Machine Criticality (OMC) and its importance to a wide range of decisions relating to machinery lubrication and oil analysis. When optimized, these decisions define the Optimum Reference State (ORS) needed to achieve the desired level of machine reliability. It is intuitively obvious that smart maintenance decisions require a heightened sense of both the probability and consequences of machine failure.

Figure 1. The relationship between machine criticality and lubricant criticality

Part 2, “Don’t Forget Lubricant Criticality When Designing Oil Analysis Programs,” explained that lubricant failure carries consequences that are, at least initially, independent of machine failure. These include the lubricant replacement costs (material, labor, flushing, etc.) and associated downtime.

These costs can arise even when the machine itself is perfectly healthy and operating. Of course, failing to replace a defective lubricant in time will invariably lead to dire machine failure consequences. For some machines, these cascading events can produce enormous collateral damage and financial hardship to an organization.

To my knowledge, the method presented in this article is the first truly rationalized and unified approach to condition monitoring based on both machine and lubricant failure mode ranking and criticality analysis.

The condition monitoring methods and technologies being integrated include oil analysis (real-time, portable and laboratory), field inspections (advanced methods providing frequent and comprehensive assessments), and other portable and real-time condition monitoring technologies (thermography, vibration, etc.).

This approach is important enough that it deserves a name: Unified Condition Monitoring (UCM). What makes UCM different from other strategies is the following:

Periodic condition monitoring technologies and methods for each machine are integrated and optimized.
Periodicity for each technology and method is optimized.
The method of optimization is based on criticality analysis and failure mode ranking.

Failure Mode Ranking

Ranking failure modes helps customize and optimize the condition monitoring strategy. In other words, it is a way to gain the greatest benefit for the least possible cost and risk. According to the Pareto principle, the top 20 percent of failure causes are responsible for roughly 80 percent of the failure occurrences. It only makes sense to focus resources and condition monitoring on that top 20 percent.
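As a rough illustration of this Pareto screening, the Python sketch below ranks failure causes by how often they occur and keeps the smallest set that accounts for a chosen share of all failures. The function name `pareto_vital_few`, the `coverage` parameter and the failure counts are all hypothetical. Real counts would come from a plant's own failure history.

```python
from typing import Dict, List, Tuple


def pareto_vital_few(
    failure_counts: Dict[str, int], coverage: float = 0.8
) -> List[Tuple[str, float]]:
    """Return the top causes, with cumulative share, needed to reach `coverage`."""
    total = sum(failure_counts.values())
    if total <= 0:
        return []
    # Most frequent causes first.
    ranked = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    cumulative = 0
    for cause, count in ranked:
        cumulative += count
        selected.append((cause, cumulative / total))
        if cumulative / total >= coverage:
            break
    return selected


# Hypothetical failure counts for illustration only (not data from this article).
example_counts = {
    "particle contamination": 45,
    "water contamination": 22,
    "misalignment": 15,
    "overloading": 8,
    "aeration and foam": 5,
    "heat": 3,
    "wrong lubricant": 2,
}

vital_few = pareto_vital_few(example_counts)
for cause, share in vital_few:
    print(f"{cause}: cumulative share {share:.0%}")
```

With these made-up numbers, three of the seven causes already cover more than 80 percent of the failures, which is the group that would receive the most surveillance.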

Failure modes and failure root causes are closely associated and are often the same. For instance, abrasive wear may be the failure mode, but particle contamination is the root cause. Ignorance, culture, insufficient maintenance and poor machine design are all possible pre-existing conditions that individually or collectively lead to contamination. Because you can always search for deeper levels of “cause,” for simplicity, the terms “failure mode” and “root cause” are used interchangeably.

Figure 1 shows the relationship between machine and lubricant failure. On the left are common causes (failure modes) of lubricant failure (LFM) and machine failure (MFM). For example, heat, aeration and contaminants are known to be highly destructive to lubricants.

In a similar sense, overloading, misalignment and contamination can abruptly cause a machine to fail. Note how contamination not only can fail a lubricant but also can fail a machine directly without the need to harm the lubricant first.

It is best not only to list failure causes but also to rank them in terms of probability and severity. This helps allocate resources by priority. From lubricant and machine failure come specific consequences, which are listed on the right in Figure 1.
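One simple way to turn probability and severity into a ranking is sketched below in Python. The scoring rule (probability multiplied by severity, each on a 1-to-10 scale) is an assumption made for illustration and is not the method from Parts 1 and 2. The class name, function name and scores are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FailureMode:
    name: str
    probability: int  # assumed scale: 1 (rare) to 10 (frequent)
    severity: int     # assumed scale: 1 (minor) to 10 (catastrophic)

    @property
    def priority(self) -> int:
        # Assumed scoring rule: probability times severity.
        return self.probability * self.severity


def rank_failure_modes(modes: List[FailureMode]) -> List[Tuple[int, str, int]]:
    """Return (rank, name, priority) tuples, highest priority first."""
    # Ties on priority are broken in favor of the more severe mode.
    ordered = sorted(modes, key=lambda m: (m.priority, m.severity), reverse=True)
    return [(rank, m.name, m.priority) for rank, m in enumerate(ordered, start=1)]


# Hypothetical scores for illustration only.
machine_failure_modes = [
    FailureMode("misalignment", probability=4, severity=7),
    FailureMode("particle contamination", probability=8, severity=8),
    FailureMode("overloading", probability=3, severity=9),
    FailureMode("water contamination", probability=6, severity=7),
]

ranking = rank_failure_modes(machine_failure_modes)
for rank, name, priority in ranking:
    print(f"{rank}. {name} (priority {priority})")
```

In practice the scores would be assigned by the people who know the machine best, and a different weighting could be substituted without changing the structure.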

Again, these two sets of consequences are distinct from one another. Lubricant failure consequences include oil replacement costs, downtime during the oil change, labor to change the oil and flushing costs. Machine failure consequences relate to safety, spare parts, labor to repair and downtime (e.g., production losses).

The Overall Lubricant Criticality (OLC) defines the importance of lubricant health and longevity as influenced by the probability of premature lubricant failure and the likely consequences (for both the lubricant and the machine).

The Overall Machine Criticality (OMC) defines the likelihood and consequences of machine failure alone. The methods for calculating OLC and OMC were previously discussed. Like many such methods, the approach is not an exact science but nevertheless is grounded in solid principles in applied tribology and machine reliability.

Building the Surveillance Planning Table

Figure 2 shows an example of a Surveillance Planning Table (SPT) for a given machine, e.g., a reciprocating compressor. The SPT is used to define the degree of surveillance (oil analysis and inspection, for instance) for each of the ranked failure modes.

These failure modes are ranked from one to seven on the left of the SPT. Tribo-analysts and reliability professionals are best suited to assign this ranking for individual machines. The list shown in Figure 2 is hypothetical for the compressor example to illustrate how to build an SPT.

Across the top is the OMC range (see Part 1 for calculating the OMC score) from 10 to 100. A score of 100 represents high criticality from the standpoint of probability of failure and consequences of failure. In this example, the arrow shows the compressor to have an OMC score of 80.

There are seven color-coded condition monitoring zones corresponding to time-based surveillance levels. These are also represented by the designations CM1 to CM7. The surveillance levels range from CM1 (real-time) to CM4 (monthly) to CM7 (never). For an OMC of 80, the condition monitoring zones range from CM1 to CM4.

The only things that change from machine to machine using the SPT are the failure mode rankings and the placement of the arrow corresponding to the OMC score. Otherwise, all SPTs look exactly the same. For instance, the compressor has particle contamination assigned as the highest-ranked failure mode. With an OMC of 80, the intersecting box shows a CM1 condition monitoring zone. This relates to real-time surveillance.

You can see in Figure 2 that “real time” refers to the use of real-time sensors (A) and monthly oil analysis (D) from the test and inspection categories list. There are numerous online particle counters on the market that could be conveniently used for CM1 surveillance. On the other hand, water contamination merits a CM2 surveillance level. This can be done using daily inspections and monthly oil analysis.
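The SPT lookup described above, where a ranked failure mode and a criticality score intersect at a condition monitoring zone, can be sketched as a small table structure in Python. Only two cells are taken from the text: at an OMC of 80, particle contamination (rank 1) falls in CM1 and water contamination in CM2. Everything else below is a placeholder, including the rank assumed for water contamination, the remaining zone values and the class and method names. A real table would be transcribed from the actual SPT.

```python
from typing import Dict, List


class SurveillancePlanningTable:
    """Maps (ranked failure mode, criticality score) to a CM zone label."""

    def __init__(self, ranked_modes: List[str], columns: Dict[int, List[int]]):
        # ranked_modes: failure modes in rank order (index 0 is rank 1).
        # columns: criticality score -> CM zone number for each rank.
        self.ranked_modes = ranked_modes
        self.columns = columns

    def zone_for(self, failure_mode: str, score: int) -> str:
        if score not in self.columns:
            raise KeyError(f"no SPT column has been entered for score {score}")
        rank_index = self.ranked_modes.index(failure_mode)
        return f"CM{self.columns[score][rank_index]}"


# Placeholder ranking: only rank 1 is stated in the text; water contamination
# at rank 2 and the generic entries for ranks 3 to 7 are assumptions.
compressor_modes = [
    "particle contamination",
    "water contamination",
    "failure mode 3 (placeholder)",
    "failure mode 4 (placeholder)",
    "failure mode 5 (placeholder)",
    "failure mode 6 (placeholder)",
    "failure mode 7 (placeholder)",
]

# Placeholder column for an OMC of 80. The text states only that the zones
# run from CM1 to CM4; the values for ranks 3 to 7 are illustrative guesses.
compressor_spt = SurveillancePlanningTable(
    ranked_modes=compressor_modes,
    columns={80: [1, 2, 2, 3, 3, 4, 4]},
)

print(compressor_spt.zone_for("particle contamination", 80))
print(compressor_spt.zone_for("water contamination", 80))
```

Because every SPT shares the same grid, the same structure could be reused for other machines by swapping in a different failure mode ranking and score.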

Figure 3 presents a similar SPT but specifically for the lubricant. The Lubricant Failure Mode ranking is on the left, and the Overall Lubricant Criticality is across the top. In this case, the OLC score is 70, which has condition monitoring zones ranging from CM2 to CM4.

Combining Machine and Lubricant SPTs

Figure 4 shows the SPTs for both the machine and lubricant in a single unified table. The failure modes for both MFM and LFM are listed across the top with the corresponding condition monitoring surveillance zones just below. Down the left are various oil analysis tests and inspections that satisfy the condition monitoring requirements for each failure mode. This list was developed based on the available and required technologies and methods. The legend lists specific surveillance types (e.g., lab testing or inspection) and periodicity (frequency of use).

Figure 4. Surveillance Planning Table for the Machine and Lubricant

By referring to the condition monitoring zones under each failure mode, the surveillance type(s) and periodicity can be properly selected and optimized. For instance, under particle contamination is the R designation (for real-time) and L4 (for monthly laboratory analysis). Under aeration and foam is the F3 designation for weekly field inspections of the compressor’s sight glass.
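The designations mentioned above (R, L4, F3) appear to pair a surveillance type with a periodicity level. A hedged Python sketch of decoding them follows. The letter meanings and the weekly and monthly levels come from the examples in the text. The daily level for digit 2 is an assumption based on the CM2 example, other digits and letters are left out because the legend is not reproduced here, and the function and type names are hypothetical.

```python
from typing import NamedTuple

# Surveillance types named in the text.
SURVEILLANCE_TYPES = {
    "R": "real-time sensor",
    "L": "laboratory analysis",
    "F": "field inspection",
}

# Periodicity levels: 3 and 4 follow from F3 and L4 in the text;
# 2 = daily is an assumption based on the CM2 example.
PERIODICITY = {
    "2": "daily",
    "3": "weekly",
    "4": "monthly",
}


class Designation(NamedTuple):
    surveillance_type: str
    periodicity: str


def parse_designation(code: str) -> Designation:
    """Decode a designation such as 'R', 'L4' or 'F3'."""
    code = code.strip().upper()
    if not code or code[0] not in SURVEILLANCE_TYPES:
        raise ValueError(f"unknown surveillance type in {code!r}")
    kind = SURVEILLANCE_TYPES[code[0]]
    suffix = code[1:]
    if code[0] == "R":
        # Real-time surveillance carries no periodicity digit.
        if suffix:
            raise ValueError(f"unexpected periodicity in {code!r}")
        return Designation(kind, "real-time")
    if suffix not in PERIODICITY:
        raise ValueError(f"unknown periodicity in {code!r}")
    return Designation(kind, PERIODICITY[suffix])


for code in ("R", "L4", "F3"):
    print(code, "->", parse_designation(code))
```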

Misalignment is monitored using multiple methods including elemental analysis of wear metals (monthly laboratory analysis), ferrous density analysis (also monthly), wear particle identification (on exception based on elemental analysis and ferrous density), magnetic plug inspections (weekly) and vibration analysis (weekly). These tests and inspections can easily be rationalized and streamlined to improve efficiency and reduce costs.

Figure 5. Condition Monitoring Work Plan

All of the tests and inspections can be condensed into a single condition monitoring work plan for the compressor, as seen in Figure 5. The tests and methods needed are clearly shown as well as the frequency for the four main monitoring categories: real-time sensors, field inspections/tests, onsite lab testing and full-service lab testing. This work plan is the final product of the UCM strategy.
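The condensing step can be sketched in Python as a merge of per-failure-mode requirements into one task list, where a test that serves several failure modes appears once at its shortest interval. This is a simplified sketch, not the layout of Figure 5: it groups tasks by periodicity rather than by the four monitoring categories, and the function name is hypothetical. The requirement entries follow the examples in the text, except that using a sight glass inspection for water contamination is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Shortest interval first; "on exception" tests run only when triggered.
PERIODICITY_ORDER = ["real-time", "daily", "weekly", "monthly", "on exception"]

# (failure mode, test or inspection, periodicity)
Requirement = Tuple[str, str, str]


def build_work_plan(requirements: List[Requirement]) -> Dict[str, Dict[str, List[str]]]:
    """Group tasks by periodicity, listing each test once at its shortest interval."""
    tightest: Dict[str, str] = {}
    covered: Dict[str, Set[str]] = defaultdict(set)
    for mode, test, periodicity in requirements:
        covered[test].add(mode)
        current = tightest.get(test)
        if current is None or (
            PERIODICITY_ORDER.index(periodicity) < PERIODICITY_ORDER.index(current)
        ):
            tightest[test] = periodicity
    plan: Dict[str, Dict[str, List[str]]] = {p: {} for p in PERIODICITY_ORDER}
    for test, periodicity in tightest.items():
        plan[periodicity][test] = sorted(covered[test])
    # Drop periodicities that ended up with no tasks.
    return {p: tasks for p, tasks in plan.items() if tasks}


requirements: List[Requirement] = [
    ("particle contamination", "real-time particle sensor", "real-time"),
    ("particle contamination", "laboratory oil analysis", "monthly"),
    # Assumption: the daily inspection for water is a sight glass check.
    ("water contamination", "sight glass inspection", "daily"),
    ("water contamination", "laboratory oil analysis", "monthly"),
    ("aeration and foam", "sight glass inspection", "weekly"),
    ("misalignment", "elemental analysis", "monthly"),
    ("misalignment", "ferrous density analysis", "monthly"),
    ("misalignment", "wear particle identification", "on exception"),
    ("misalignment", "magnetic plug inspection", "weekly"),
    ("misalignment", "vibration analysis", "weekly"),
]

work_plan = build_work_plan(requirements)
for periodicity, tasks in work_plan.items():
    print(periodicity)
    for test, modes in tasks.items():
        print(f"  {test}: {', '.join(modes)}")
```

Here the sight glass inspection is scheduled once, daily, and covers both water contamination and aeration, which is the kind of streamlining a unified work plan is meant to achieve.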

Using the Unified Condition Monitoring Model

From the preceding discussion, you can see how nearly all decisions related to periodic condition monitoring depend on four factors: Overall Machine Criticality, Overall Lubricant Criticality, Machine Failure Modes and Lubricant Failure Modes.

These factors influence what to test, when to test and how to test. In relation to oil analysis, these factors affect where to sample, how often to sample, which tests to conduct, which alarms to set and the general data-interpretation strategy.

UCM is an overarching principle that can be adapted for many applications and uses in the reliability field. The more you know about machine-specific failure modes and criticality, the better you can plan and optimize condition monitoring across multiple technologies within both predictive and proactive schemes.

On the surface, these foundation pieces can seem time-consuming and arduous, but in the long run you gain by reducing costs and optimizing the benefits. These are solid and wise reliability investments indeed.