Structured Failure Analysis Strategies Solve Pump Problems

Heinz P. Bloch

Repeat pump failures are clear indications that the root causes of problems were not found or, where the cause was known, that someone decided to do nothing about it. Either way, pursuing a structured failure analysis approach is necessary to solve such problems. Guessing or “going by feel” will never do. Fortunately, there are powerful methods we can use to uncover the root causes of repeat problems, and these are highlighted in this article.

We define structured analysis as a repeatable approach which can be learned and employed by more than one person.¹

Once an accurate analysis is documented, remedial steps can be defined and implemented. Logic, common sense and a process of elimination are applied in this process. Suppose it could be established that a pump at location “A” suffers more failures than an identical pump at location “B.” One would determine what’s different on the pump in “A” and compare that to “B.” All differences are found in component deviations (or deviations from best practices) in one or more of the following seven cause categories:

Faulty Design
Material Defects
Fabrication and/or Processing (Machining) Errors
Assembly or Installation Defects
Off-Design or Unintended Service Conditions
Maintenance Deficiencies, including Neglect/Procedures
Improper Operation

Searching for additional cause categories will not add value because anything uncovered will, at best, be a subset of these seven. However, if one systematically concentrates on eliminating five or six of the seven categories in succession, one will arrive at the category where a deviation exists. That will make it possible to concentrate on understanding what led to the deviation.²

The pump person must pay close attention to the under-appreciated, generally non-glamorous “basics,” and do so before opting for the generally costly, sometimes unnecessary, and often unprofitable, high-tech solution. Pumps obey the laws of physics and there is always a cause-and-effect relationship. It follows that even seemingly elusive and generally costly repeat problems can very often be eliminated without spending much money.

An integrated, comprehensive approach to failure analysis starts by either describing the deviation, or by isolating the problem. Next, such an approach encourages, or even mandates, careful observation and definition of failure modes. The approach should employ pre-existing or developed-as-you-go checklists and troubleshooting tables. Many specific checklists have been provided by pump manufacturers and helpful generic varieties can also be found in a very large body of recent literature.¹⁻³

The “FRETT” Approach to Eradicating Repeat Failures of Pumps

From observation and examination of a failed pump part one identifies failure agent(s), realizing that there are only these four possibilities:³

Force
Reactive Environment
Time
Temperature

It is important to accept the basic premise that components will only fail due to one, or perhaps a combination of several, of these four failure agents. We use the acronym “FRETT” to recall these four agents.

Because there are no additional failure agents beyond “FRETT,” the troubleshooter must remain fully focused on these four agents. To re-emphasize by an example: A bearing can only fail if it has been subjected to a deviation (or deviations) in allowable force (“F”), or has been exposed to a reactive environment (“RE”), or has been in service beyond its design life (“T”), or was subjected to temperatures outside the permissible range (“T”).

The need for knowledge must not be overlooked. For instance, bearings can fail (overheat) when they are too lightly loaded. They will then skid, a topic that falls into the “elusive” category, yet one that is well documented in up-to-date, inexpensive texts.³ In turn, skidding is traceable to an inadequate force (“F”) and will manifest itself as a temperature excursion (“T”). Two of the four “FRETT” agents are at work in this example.

Each failure, and indeed each problem incident, is the effect of a causal event.

In other words, for every effect there is a cause; or, there is a reason for every failure. Here’s an example:

[Man Injured]-because man fell

[Man Fell]-because man slipped

[Man Slipped]-because there was oil on the floor

[Oil on Floor]-because a gasket leaked

By arriving at the word “gasket,” the cause-and-effect chain is focused at the component level. Once we have narrowed issues down to the component level, we know that one (or sometimes two) troublesome, unexpected or overlooked “FRETT” contributors must now be found. In this case, a gasket leaked. A gasket is clearly a component.

So:

[Gasket Leaked] - Must be due to: Force? Reactive Environment? Time? Temperature? We must check it out on the basis of data. Without data we would be guessing, and guessing does not lead to repeatable results.

Force: Too much - Why do we rule it in or rule it out? Not enough - Why do we rule it in or rule it out?
Reactive Environment: Wrong material selected for the medium transported in the pipe? Why rule it in, or rule it out?
Time: Was the same gasket left in place for many years? Why do we rule it in, or rule it out?
Temperature: Too high? Too low? Which one of these two (or perhaps why both) might be ruled in or can confidently be ruled out in a particular instance?
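
The chain and the four questions above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a tool described in the references: the names `Agent`, `trace_to_component` and `undecided_agents` are hypothetical, and the sample findings are invented. The one rule it tries to capture is that an agent counts as decided only when data supports ruling it in or out.

```python
from enum import Enum


class Agent(Enum):
    """The four FRETT failure agents named in the article."""
    FORCE = "Force"
    REACTIVE_ENVIRONMENT = "Reactive Environment"
    TIME = "Time"
    TEMPERATURE = "Temperature"


# Cause-and-effect chain from the article: each effect maps to its cause.
CHAIN = {
    "man injured": "man fell",
    "man fell": "man slipped",
    "man slipped": "oil on floor",
    "oil on floor": "gasket leaked",
}


def trace_to_component(effect, chain):
    """Follow the 'because' links until no further cause is recorded."""
    path = [effect]
    while path[-1] in chain:
        path.append(chain[path[-1]])
    return path


def undecided_agents(findings):
    """Return the agents not yet ruled in or out with supporting data.

    `findings` maps Agent -> (ruled_in, evidence). An entry whose
    evidence is empty is treated as a guess and stays undecided.
    """
    return [agent for agent in Agent
            if agent not in findings or not findings[agent][1].strip()]


path = trace_to_component("man injured", CHAIN)

# Hypothetical findings for the leaking gasket, for illustration only.
findings = {
    Agent.TIME: (False, "gasket was replaced at the last turnaround"),
    Agent.FORCE: (True, ""),  # a hunch without data does not count
}
open_items = undecided_agents(findings)

print(" <- ".join(path))
print("Still to be checked:", [agent.value for agent in open_items])
```

In this made-up case only Time has been settled with data, so Force, Reactive Environment and Temperature all remain on the list.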

The pump person must take a very similar approach with pumps and virtually all other machinery. Recall that for every effect there is a cause; accordingly, there is a reason for every failure and we now have to find it:

[Pump is down] - Because the shaft broke

[Shaft broke] - Our failure mode inventory was consulted; let’s assume we found the surface has fretting damage. That is a deviation from the norm.

[Surface damaged] - Because the coupling hub was loose. That would explain the fretting damage.

An analyst can now try to get to the root cause by remembering that all pump failure events fit in one or more of the seven cause categories listed above. If the coupling hub was found loose, what cause categories are likely and which ones can we reasonably rule out or eliminate?

This is how we would proceed:

Design Error? Unlikely, since other couplings are designed the same way and we have verified that they are holding quite well
Material Defects? No, since a thorough metallurgical exam checks OK
Fabrication Error? No, because the hardness checked OK; dimensional correctness was verified and had been recorded upon installation, 3 years ago
Assembly/Installation Defect? Suppose we have no data and therefore defer it for possible consideration later
Off-Design or Unintended Service Conditions? No; we rule it out
Maintenance Deficiencies (Neglect/Procedures)? No, since no maintenance (PM) is required on a coupling hub
Improper Operation? No, because we have ascertained operator activities were in accordance with our established standards
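
The elimination just described can be recorded as a small table and filtered. The sketch below is a minimal illustration in Python; the dictionary layout and the function name `remaining_categories` are assumptions made for this example, while the verdicts and reasons are transcribed from the list above. A verdict of `None` stands for "no data yet."

```python
# Verdicts for the loose coupling hub, transcribed from the list above.
# Each entry is (deviation_found, reason); None means "no data yet".
VERDICTS = {
    "Faulty Design": (False, "identical couplings are holding well"),
    "Material Defects": (False, "metallurgical exam checks OK"),
    "Fabrication and/or Processing Errors": (False, "hardness and dimensions verified"),
    "Assembly or Installation Defects": (None, "no data; deferred"),
    "Off-Design or Unintended Service Conditions": (False, "ruled out"),
    "Maintenance Deficiencies": (False, "no PM required on a coupling hub"),
    "Improper Operation": (False, "operator activities met standards"),
}


def remaining_categories(verdicts):
    """Return every category that has not been ruled out with data."""
    return [name for name, (deviation, _reason) in verdicts.items()
            if deviation is not False]


remaining = remaining_categories(VERDICTS)
print(remaining)
```

Six of the seven categories drop out, which leaves the deferred assembly/installation category as the one that deserves further investigation.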

At this point, the analyst would get back to what needs to be investigated further or requires follow-up examination. This might be a good time to start compiling:

(A) A checklist of possible assembly errors: From discussions with maintenance personnel we might conclude that none apply in this instance

(B) A checklist of possible installation errors:

Force:
- Could have overstretched hub
- Could have had insufficient axial advance on taper (insufficient interference fit)
Reactive Environment: None found; pump is at a standard plant location
Time: We ascertained that pump run length was not excessive; the hub had failed after just a few weeks of operation
Temperature: Suppose the coupling was heated to facilitate its installation.
- How was the heat applied? What tells us that the temperature was within limits?
- The temperature could have been too high (causing overstretch) or too low (insufficient dilation, resulting in insufficient axial advance)
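
The temperature question can be put into numbers with the standard linear thermal expansion relation, bore growth = α × d × ΔT. The Python sketch below is a rough illustration only. It treats the hub as a freely expanding ring, assumes a typical carbon steel expansion coefficient of about 11.7 × 10⁻⁶ per °C, and uses invented values for bore diameter, interference, assembly clearance and the upper temperature limit. None of these figures come from the references, and the coupling manufacturer’s installation procedure always governs.

```python
ALPHA_STEEL_PER_C = 11.7e-6  # assumed typical value for carbon steel


def bore_growth_mm(bore_mm, rise_c, alpha_per_c=ALPHA_STEEL_PER_C):
    """Diametral growth of a freely expanding bore: alpha * d * dT."""
    return alpha_per_c * bore_mm * rise_c


def required_rise_c(bore_mm, interference_mm, clearance_mm,
                    alpha_per_c=ALPHA_STEEL_PER_C):
    """Temperature rise that opens the bore by interference + clearance."""
    return (interference_mm + clearance_mm) / (alpha_per_c * bore_mm)


def classify_heating(actual_rise_c, needed_rise_c, max_rise_c):
    """Compare the recorded temperature rise with both limits."""
    if actual_rise_c < needed_rise_c:
        return "too low"    # not enough dilation, insufficient advance
    if actual_rise_c > max_rise_c:
        return "too high"   # risk of overstretch or material damage
    return "within limits"


# Hypothetical hub: 75 mm bore, 0.075 mm interference, 0.05 mm clearance.
needed = required_rise_c(75.0, 0.075, 0.05)
print(f"Required rise above ambient: {needed:.1f} C")
print(classify_heating(100.0, needed, max_rise_c=250.0))
```

Without a recorded temperature there is nothing to compare against these limits, which is exactly why the analyst asks how the heat was applied.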

In both of these examples, the pump failure analyst has to determine in which cause category there is a deviation from the norm, which item needs to be modified, and how this modification must be implemented to prevent a repeat failure. Data will be required to support any conclusions. With data one can define the root causes of a problem. Without data one can, at best, determine a probable cause.

Change analysis is one of the elements of any structured, comprehensive approach.¹ Change analysis seeks to identify what is different in the defective item as compared to an identical but unaffected item. The analyst probes into when, where and why the change occurred. The analyst then outlines a number of remedial action steps and will have to choose the steps that best meet defined objectives. Safety must always come first; beyond that, the analyst may pick from a list of objectives that includes lowest life-cycle cost, present value, highest initial quality, compliance with a certain industry standard, meeting a deadline, etc.

The objective of aiming for lowest life-cycle cost usually makes considerable sense. Calculating this parameter would include the cost of staffing a pump selection or reliability review with dedicated, knowledgeable individuals. Life-cycle cost analyses must also include the value of downtime avoidance and MTBF extensions, as well as the value of avoided fire and safety incidents.
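
A simple present-value comparison shows how MTBF extension enters a life-cycle cost calculation. The Python sketch below is a minimal illustration under stated assumptions: all prices, MTBF figures, the 10-year horizon and the 8% discount rate are invented, and the function names are hypothetical. It lumps repair and downtime into one yearly failure cost and leaves out energy, the staffing of reliability reviews, and the value of avoided fire and safety incidents, all of which a real analysis would have to add.

```python
def annual_failure_cost(mtbf_years, repair_cost, downtime_cost):
    """Expected yearly failure cost, assuming one failure per MTBF."""
    return (repair_cost + downtime_cost) / mtbf_years


def present_value_lcc(initial_cost, annual_costs, discount_rate):
    """Initial cost plus each year's cost discounted back to today."""
    return initial_cost + sum(
        cost / (1.0 + discount_rate) ** year
        for year, cost in enumerate(annual_costs, start=1)
    )


YEARS = 10   # assumed evaluation period
RATE = 0.08  # assumed discount rate

# Hypothetical standard pump: cheaper to buy, fails every 2 years.
standard = present_value_lcc(
    40_000, [annual_failure_cost(2.0, 12_000, 20_000)] * YEARS, RATE)

# Hypothetical upgraded pump: costs more, fails every 4 years.
upgraded = present_value_lcc(
    48_000, [annual_failure_cost(4.0, 12_000, 20_000)] * YEARS, RATE)

print(f"standard: {standard:,.0f}")
print(f"upgraded: {upgraded:,.0f}")
```

With these made-up inputs the upgraded pump has the lower life-cycle cost despite its higher purchase price, because doubling the MTBF halves the yearly failure cost.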

Recall that fewer pump failures translate into fewer fires and decreased insurance premiums. Failure avoidance creates goodwill and enhances a company’s reputation. Also, having to cope with fewer failures frees up personnel whose proactive activities avoid other failures, and so forth.

Making Good Choices

Needless to say, any choice we make will have its advantages and disadvantages. When pumps and pumping applications are involved, the most elementary choice requires opting for two out of three broad-brush deliverables: Good, Fast and Cheap. Take any two, but don’t expect to ever obtain all three.

Whenever we are confronted with the two-out-of-three choice, we should remember that for an analysis or repair to be good and fast, it probably will not be cheap. If we want it to be good and cheap, it probably will not be fast. And if we opt to pursue the fast and cheap paths, it probably will not be good. In case we are persuaded to go the fast and cheap route, let’s brace ourselves for repeat failures that can cost a small fortune and bring on all kinds of calamities.

Through the decades, we have come to realize that pump failure statistics are rarely very scientific. Still, they are experience-based and should not be disregarded. If your MTBF hovers around average, identify the repeat offenders and subject them to an uncompromising improvement program. In the hydrocarbon processing industry, about 7% of the pump population consumes 60% of the money spent on pump maintenance and repair. Getting at the root causes of failures on these 7% will save much money.
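
Finding the repeat offenders is a straightforward ranking exercise once repair costs are recorded per pump. The Python sketch below is one possible way to do it. The pump tags and cost figures are invented, `bad_actors` is a hypothetical helper, and the 60% threshold simply mirrors the share quoted above.

```python
def bad_actors(cost_by_pump, share=0.60):
    """Return the smallest set of top spenders covering `share` of cost.

    Pumps are ranked by repair cost, highest first, and collected until
    their cumulative cost reaches the requested share of the total.
    """
    total = sum(cost_by_pump.values())
    if total <= 0:
        return []
    ranked = sorted(cost_by_pump.items(), key=lambda item: item[1],
                    reverse=True)
    selected, running = [], 0.0
    for tag, cost in ranked:
        selected.append(tag)
        running += cost
        if running >= share * total:
            break
    return selected


# Hypothetical yearly repair costs per pump tag.
REPAIR_COSTS = {
    "P-101": 5_000, "P-102": 120_000, "P-103": 8_000, "P-104": 3_000,
    "P-105": 90_000, "P-106": 6_000, "P-107": 4_000, "P-108": 7_000,
    "P-109": 2_000, "P-110": 5_000,
}

offenders = bad_actors(REPAIR_COSTS)
fraction = len(offenders) / len(REPAIR_COSTS)
print(offenders, f"= {fraction:.0%} of the pumps")
```

The pumps returned by such a ranking are the ones to subject to the structured analysis described earlier in this article.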

A strategy that involves rational thinking is solidly supported by a minute’s worth of looking up vendor documentation. A sound strategy also mandates respect for the simple laws of physics. It’s a strategy that results in failure cause identification; it will lead to future failure avoidance and will extend pump MTBF.

It can be said that all successful and cost-effective failure analysis methods represent structured approaches that give focus to an otherwise scattered search for the causes of equipment failures. Structured analysis approaches are repeatable; they aren’t hit-or-miss guesses. A successful approach guides the user/analyst through a sequence of steps; it invariably accepts the premise that all problems are ultimately caused by the decisions, actions, inactions, omissions or commissions of human beings. A successful approach is objective; it seeks explanations but does not tolerate compromises and excuses.

It is fitting, then, to conclude by pointing to a very simple illustration, Figure 1. This illustration tries to convey that many parameters interact to cause repeat failures in pumps. Many of these are classified as hydraulic issues and much work has been done to improve pump hydraulics.

Figure 1. Staying near the center of this "Reliability Curve" is a wise course of action

However, the majority of what we chose to call elusive failure causes is linked to mechanical issues. We have become accustomed to maintenance routines that rarely question the adequacy of a vendor’s design. Failure causes have become elusive because we, users and vendors, sometimes overlook or forget (and even disregard) the laws of physics.

In this article we have also alluded to process pump vendors who often furnish barely adequate designs. Users may unwittingly contribute to the propagation of marginally acceptable designs by creating the impression of being unwilling to pay for a superior design. Add also the possibility that vendors and pump manufacturers benefit from the sale of replacement parts and are in business to generate income.

We must not forget that pump manufacturers have right-sized, down-sized and economized the way they do business. Few (if any) of these organizational re-alignments benefit the user and a preponderance of repeat failures attests to it. Some vendors and manufacturers no longer employ process pump experts and diligent craftsmen. The user-purchaser may belatedly come to realize that he has become the manufacturer’s quality control inspector. Many users allow hundreds of failures before they accept this fact. When they learn the hard way, they must allocate money to ward off this eventuality by suitable pre-delivery inspections.

Timely and competent up-front action by the owner-purchaser is one of the keys to failure avoidance. This up-front action includes development of detailed specifications for process pumps and for some of the key components that go into good process pumps. Once a process pump arrives in the field, it must be properly installed and maintained. To be effective, the facility must adopt work processes and procedures that harmonize with best-of-class thinking.

To avoid repeat failures, pump owner-operators must deliberately push certain routine maintenance actions into the superior maintenance category. Superior maintenance efforts will lead to (or are synonymous with) pump reliability upgrading.

In essence, the course of wisdom demands that we move away from “business as usual.” Before one can apply practical wisdom one must acquire knowledge and understanding.

References
1. Bloch, Heinz P., and F. K. Geitner; “Machinery Failure Analysis and Troubleshooting,” 3rd Edition, Butterworth-Heinemann Publishing Company, Woburn, MA, 1997, ISBN 0-88415-662-1

2. Bloch, Heinz P., and Allan R. Budris; “Pump User’s Handbook,” 2nd Edition, Fairmont Press, Lilburn, GA, 2006, ISBN 0-88173-517-5

3. Bloch, Heinz P.; “Pump Wisdom: Problem Solving for Operators and Specialists,” John Wiley & Sons, Hoboken, NJ, 2011, ISBN 978-1-118-04123-9

4. Barringer, Paul; “API Pump Curve Practices and Effects on Pump Life from Variability About BEP,” Weibull Analysis Course (see also Ref. 1, pg. 621)