A New Look at Criticality Analysis for Machinery Lubrication

Jim Fitch, Noria Corporation

For decades, reliability professionals have been stressing the importance of prioritizing new maintenance thrusts and investments based on need. The word they like to use is "criticality."

For any given machine, how critical is its reliability? What if it failed suddenly and catastrophically?

What would be the consequences - lost production, expensive repairs, fatality? Criticality is the logical starting point for all reliability initiatives.

There are many different ways to enhance reliability and improve the quality of maintenance. The best options should be risk-based. After all, if it doesn’t reduce risk, why do it? Why spend an incremental dollar to enhance a machine’s reliability if it doesn’t yield multiple dollars in return?

There’s also priority. What should be done first, second and third, and what should not be done at all? How do you know which machines return big dollars for enhanced reliability, which machines return marginal dollars and which machines return nothing at all?

Once you understand machine criticality and a machine’s risk profile, you can work smarter to customize improvements. For guidance, look to the Pareto principle, which states that 20 percent of the machines cause 80 percent of the reliability problems. Which machines are these?

In addition, consider that 20 percent of the causes of failure are responsible for 80 percent of the occurrences of failure. Which causes are these? It’s about precision - precision maintenance and precision lubrication. It’s also knowing how to make wise, risk-informed choices.

I’ve written previously about the Optimum Reference State (ORS). This is the prescribed state of machine configuration, operating conditions and maintenance activities required to achieve and sustain specific reliability objectives. As stated, defining the ORS requires a definition of the specific reliability objectives for a given machine. Defining the reliability objectives demands an understanding of failure modes and machine criticality.

This reminds me of the plant manager who told me years ago that he decided the best way to solve his lubrication problems was to put synthetic lubricants in every machine. Do you think he got the result he sought? Does paying a premium for synthetics guarantee a premium return in machine reliability and maintenance cost reduction? Do synthetics offer forgiveness for negligent and shoddy maintenance? Is this wise decision-making?

Understand the Reliability-Risk Connection

The probability of machine failure needs to be inversely proportional to the consequences of failure. There’s no better example than commercial aviation. Because the consequences of failure are extremely high (death), the probability of failure must be correspondingly low (extreme reliability).

Keeping the probability of failure low is the only practical means to hedge risk. Those responsible for maintenance usually have little control over the consequences of failure (often limited only to early detection technology). However, reliability maintainers frequently have considerable control over the probability of failure. Indeed, you can use risk and criticality to develop a master plan for lubrication-enabled machine reliability. That is the focus of this article.

Let’s begin with a list of common lubrication and oil analysis decisions (all attributes of the ORS) that can be customized (optimized) by understanding failure modes and machine criticality:

Lubricant selection, e.g., premium vs. economy-formulated lubricants
Filtration, including things such as filter quality, pore size, capture efficiency, location and flow rate
Lubricant preventive maintenance (daily PMs) and inspection strategy
Lubricant delivery method selection and use (e.g., circulating, auto-lube, mist, etc.)
Oil analysis (which machines are included and which are not?)
Oil sampling frequency (weekly, monthly, quarterly, never)
Laboratory and test slate selection
Oil analysis alarms and limits

All of these decisions and activities must be within the scope of the Optimum Reference State. For this reason, the importance of criticality should not be taken lightly. However, a practical means of assigning a value to criticality, customized to machine lubrication and tribology, has largely been elusive.

In fact, the fields of lubrication and tribology raise unique issues and questions related to criticality that aren’t typically addressed and aren’t common to other types of machinery.

Calculating Overall Machine Criticality

Overall Machine Criticality (OMC) is a risk-profile assessment that can be calculated to a single numerical value. The OMC is what you seek to know and control. The lower the OMC, the lower the risk. The OMC is the product of two factors: the Machine Criticality Factor (MCF) and the Failure Occurrence Factor (FOF).

The MCF relates to the consequences of machine failure, which combine both mission criticality and repair costs, while the FOF relates to the probability of machine failure. This probability is highly influenced by maintenance and lubrication practices and therefore is far more controllable.
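As a minimal sketch of this arithmetic, the OMC can be computed as a simple product. The function name `overall_machine_criticality` and the range check are illustrative assumptions, and the sketch assumes both factors are scored on a 1-to-10 scale.

```python
def overall_machine_criticality(mcf: int, fof: int) -> int:
    """Return OMC = MCF x FOF.

    Assumes both factors are scored on a 1-to-10 scale. The function
    name and the range check are illustrative, not a published standard.
    """
    for name, value in (("MCF", mcf), ("FOF", fof)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return mcf * fof


# With both factors on a 1-to-10 scale, the OMC ranges from 1 to 100.
print(overall_machine_criticality(1, 1))    # lowest possible risk: 1
print(overall_machine_criticality(10, 10))  # highest possible risk: 100
```

Because the two factors multiply, lowering either one lowers the OMC in proportion, which is why reducing the more controllable FOF can move a machine's risk profile so far.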

Figure 1. Machine Criticality Factor (MCF) (Relates to the consequences of machine failure)

Machine Criticality Factor

A simple method for estimating the Machine Criticality Factor is shown in Figure 1. It requires an understanding of mission criticality and repair costs. While you could call these SWAGs (educated guesses), it is far better to guess using a logical method than to apply dartboard science or do nothing at all.

The MCF is scaled 1 to 10, with 10 corresponding to extreme criticality (high risk). You start by answering the question of mission criticality. Machines that are process-critical can accumulate huge production losses as a result of sudden and prolonged failure.

Extremely high mission criticality relates to safety (injury or death). In the event there is minimal business interruption or safety risk, there might still be high repair costs. Although many processes have redundant systems or standby equipment in the event of failure, these systems don’t mitigate the cost of repair, which can be millions of dollars in some circumstances.

Figure 2. Use this table to determine the Failure Occurrence Factor, corresponding to the probability of failure.

The final consideration is the current or potential use of early detection technology (predictive maintenance) to annunciate alarms of impending or precipitous failure events. In such cases, both downtime and the cost of repair can be substantially reduced. Oil analysis (wear debris analysis), vibration analysis, bearing metal temperatures, proximity probes, motor current, etc., are all technologies that can offer real benefit in reducing the Machine Criticality Factor (see the adjusted scale at the bottom of Figure 1, which applies only if effective early warning systems are used).

Failure Occurrence Factor

As mentioned previously, the Failure Occurrence Factor relates to the probability of machine failure. This can be estimated from the machine’s failure history or statistical analysis of a group of identical machines. Machines that are inherently prone to failure (bad actors) get the highest rating on a scale of 1 to 10. High FOFs usually correspond to extreme and chronic conditions (see the table in Figure 2).

If you have good historical knowledge of the machine’s reliability, then use the descriptive rating scheme (Method A) under the “Machine Reliability History is Known” heading. If machine reliability is unknown or uncertain, go to the Reliability Elements Quotient (REQ) in Figure 3 (Method B). This is a scoring system that shows what causes and controls failure in lubricated machines. Most importantly, it reveals the fundamental strategy for optimizing machine reliability.

Reliability Elements Quotient

The REQ (Figure 3) tallies five critical elements to arrive at a customized composite score that will be used for the FOF in Figure 2. It gets down into the weeds of what causes a greater or lesser likelihood of machine failure. Let’s discuss these elements, starting at the top and working our way down.

Figure 3. An example of a pre-ORS Reliability Elements Quotient.

Machine Duty - Machine duty is a compilation of operational conditions that can induce premature machine failure. Machines that score high are those that run at or beyond rated loads (catalog loads), operate at high pressure, run at high speed, are exposed to high shock loads or duty cycles, and have other similar mechanical conditions.
Lubricant Quality/Performance - Good lubricant selection extends machine life, while poor lubricant selection shortens it. Good lubricants not only reduce friction and wear but can also protect the machine from corrosion, air entrainment, deposit formation and lubricant starvation. Therefore, lubricant quality directly influences the probability of failure.
Lubrication Effectiveness - More machines fail due to poor lubrication than poor lubricants. Lubrication relates to a range of activities and conditions including relubrication frequency, relubrication method, controlling lubricant levels, lubrication procedures, inspection methods and contamination control. For most plants, there is a large gap between doing lubrication and doing lubrication right.
Fluid Environment Severity - This is largely contamination control related. Contamination compromises the quality of the lubricant and the state of lubrication. It relates to what the machine is exposed to in its work environment (and the severity of exposure), plus the effectiveness of the machine in excluding and removing contaminants from the lubricant. Machines that are bombarded with dirt, water, corrosive materials, ambient heat/cold and process chemicals have high fluid environment severity.
Early Warning Systems - Early warning technology also impacts the probability of failure. This is done by catching incipient failures or root-cause conditions that are the precursors to failures. Oil analysis and comprehensive daily machine inspections are extremely effective at providing early warning to a host of problems.

The Reliability Elements Quotient is a scorecard that counts all five factors. For each element, the score range goes left to right, from very low (far left) to extremely high (far right). The numerical scale changes for each factor. The best way to use the REQ is to circle the assigned score for each factor and then write the score in the box to the right. The total score is tallied at the bottom. In the example, this total is 8, which designates high failure probability.
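The tally itself can be sketched in a few lines. The element scores below are hypothetical placeholders, chosen only so that the total matches the example total of 8; the real per-element scales and values come from the scorecard in Figure 3. The names `REQ_ELEMENTS` and `tally_req` are likewise assumptions made for illustration.

```python
# The five REQ elements named in the article.
REQ_ELEMENTS = (
    "machine_duty",
    "lubricant_quality_performance",
    "lubrication_effectiveness",
    "fluid_environment_severity",
    "early_warning_systems",
)


def tally_req(scores: dict[str, int]) -> int:
    """Sum the five element scores into a composite REQ total."""
    missing = set(REQ_ELEMENTS) - set(scores)
    if missing:
        raise ValueError(f"missing REQ elements: {sorted(missing)}")
    return sum(scores[name] for name in REQ_ELEMENTS)


# HYPOTHETICAL scores: placeholders that merely add up to the example
# total of 8. They are not taken from the actual scorecard.
pre_ors_scores = {
    "machine_duty": 2,
    "lubricant_quality_performance": 1,
    "lubrication_effectiveness": 2,
    "fluid_environment_severity": 2,
    "early_warning_systems": 1,
}

print(tally_req(pre_ors_scores))  # 8
```

Keeping the per-element scores rather than only the total makes it easier to see which element contributes most to the failure probability.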

Overall Machine Criticality Matrix and De-Risking Your Plant

The OMC is probably best viewed as a matrix. This is shown in Figure 4 with the MCF on the X-axis and the FOF on the Y-axis. The intersecting box reveals the OMC value (multiplication of the MCF and the FOF). The matrix has five color zones which are actually risk zones (the location of these zones on the grid can be customized). The highest risk is represented by the color red. Next is amber, followed by yellow, then green and finally blue (low risk).
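One way to picture the matrix is as a 10-by-10 grid of products with a zone lookup. The zone boundaries in this sketch are hypothetical: the zones can be customized, and the cut-offs here were chosen only so that an OMC of 40 falls in amber and an OMC of 5 falls in blue, consistent with the example discussed later in the article. Treating each zone as a simple OMC range is also an assumption, and the names `ZONE_UPPER_BOUNDS`, `risk_zone` and `build_omc_matrix` are illustrative.

```python
# HYPOTHETICAL zone boundaries (inclusive upper bounds on the OMC value).
ZONE_UPPER_BOUNDS = [
    (9, "blue"),     # low risk
    (19, "green"),
    (39, "yellow"),
    (69, "amber"),
    (100, "red"),    # highest risk
]


def risk_zone(omc: int) -> str:
    """Map an OMC value (1 to 100) to a color zone."""
    if not 1 <= omc <= 100:
        raise ValueError(f"OMC must be between 1 and 100, got {omc}")
    for upper, zone in ZONE_UPPER_BOUNDS:
        if omc <= upper:
            return zone
    raise AssertionError("unreachable: bounds cover 1 to 100")


def build_omc_matrix() -> list[list[int]]:
    """Rows are FOF 1..10 (Y-axis); columns are MCF 1..10 (X-axis)."""
    return [[mcf * fof for mcf in range(1, 11)] for fof in range(1, 11)]


matrix = build_omc_matrix()
omc = matrix[8 - 1][5 - 1]   # FOF of 8, MCF of 5
print(omc, risk_zone(omc))   # 40 amber
```

A plant would replace the boundary list with its own zone definitions; the lookup logic stays the same.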

Figure 4. The Overall Machine Criticality (OMC) matrix includes the Machine Criticality Factor on the X-axis, the Failure Occurrence Factor on the Y-axis and five risk zones, each represented by a different color.

Machines that fall in the amber or red zones are targeted for immediate remediation. This is best done by reducing the risk values of one or more of the four “addressable” reliability elements (see Figure 3), which are subcomponents of the FOF. These are lubricant quality/performance, lubrication effectiveness, fluid environment severity and effectiveness of early warning systems.

Figure 5. This table shows how the ORS performance attributes directly influence the elements in the Reliability Elements Quotient (REQ).

This is exactly the purpose of the Optimum Reference State. Figure 5 shows how key ORS performance attributes influence the addressable reliability elements that in turn influence Overall Machine Criticality. Everything is connected.

Additionally, failure modes and effects analysis (FMEA) can be used to assign priority to ORS attribute improvements. See the article “FMEA Process for Lubrication Failures” for more detail.

Figure 6. This OMC matrix illustrates how improvements in lubricant selection, lubrication methods, contamination control and oil analysis brought a machine’s risk profile down from 40 to 5.

It makes sense that all reliability initiatives need to adjust (improve) the OMC. This typically involves a range of modifications to the ORS performance attributes as shown in Figure 5. These can include machinery modifications, lubricant selection changes, people skills improvements, procedure modifications and others.

“Optimizing” the modification master plan through FMEA and criticality analysis achieves the lowest risk profile or OMC at the lowest possible cost.

An example of this is seen in Figures 6 and 7. By making modifications to lubricant selection, lubrication methods, contamination control and oil analysis, the Failure Occurrence Factor improved from 8 to 1. For a machine that has a Machine Criticality Factor of 5, this brought the risk profile down from 40 (amber, high-risk zone) to 5 (blue, low-risk zone).
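The numbers in this example can be checked with a short calculation. The variable names are illustrative, and the percentage reduction is derived here from the stated values rather than quoted from the figures.

```python
def omc(mcf: int, fof: int) -> int:
    """Overall Machine Criticality as the product of MCF and FOF."""
    return mcf * fof


mcf = 5                        # Machine Criticality Factor, unchanged
fof_before, fof_after = 8, 1   # Failure Occurrence Factor, pre- and post-ORS

omc_before = omc(mcf, fof_before)
omc_after = omc(mcf, fof_after)
reduction_pct = 100 * (omc_before - omc_after) / omc_before

print(f"OMC before: {omc_before}, after: {omc_after}, "
      f"reduction: {reduction_pct:.1f}%")
# OMC before: 40, after: 5, reduction: 87.5%
```

Note that the consequences of failure (the MCF) did not change at all; the entire improvement came from lowering the probability of failure.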

Figure 7. This post-ORS Reliability Elements Quotient shows how the Failure Occurrence Factor improved from 8 to 1 after several modifications were made.

What It All Means

In a past article, I wrote about the Technology Adoption Cycle and the impediments to adoption of the Optimum Reference State. People, especially managers, “go with what they know.”

If they don’t understand risk and reward as it relates to machine reliability, they will shy away from acceptance and adoption. The state of lubrication continues “business as usual.” This is a curse indeed, but one that can be remedied.

Figure 8. Illustration of how bringing a machine to the Optimum Reference State can reduce risk.

An excellent place to start is by developing a current risk profile of your critical machinery (pre-ORS). This reveals the opportunity and all the low-hanging fruit that no one has seemed to notice. Optimum is undefinable without understanding risk.

By using the tools described here, you not only can understand risk (criticality and occurrence), but you can also have a solid plan for remediation to de-risk your plant. Don’t fail to capitalize on the riches (collect the fruit) that can be gained by transformation to the Optimum Reference State.
