Reliability-centered maintenance (RCM), as originally conceived, was not about maximizing reliability. Rather, the intent was to use reliability tools to minimize costly and ineffective scheduled maintenance. Indeed, the initial implementation of what became RCM massively reduced airline operators’ scheduled maintenance requirements and saved millions of dollars.
By: Peter Munson
Today, RCM is rarely implemented as the dynamic, cost-driven, continuous improvement program that it was intended to be. In most incarnations, it has become a toolkit of reliability-guided best practices under rule-of-thumb cost savings estimates. Demand-driven maintenance (DDM) corrects this trend, incorporating not only maintenance costs, but also the all-important lost production impacts, to put dollars and cents at the fore of your RCM program. While it is difficult to achieve and maintain momentum for ongoing purely reliability-focused analysis, bringing the cost element in literally changes the equation – and makes the value of the exercise explicit to plant leadership.
Demand-driven maintenance is the process of adapting equipment strategies and zero-based maintenance plans to align with dynamic plant operating conditions in order to meet forecasted production demand for any given time period. DDM, which can be modeled in T.A. Cook’s STRYVE tool, dynamically evaluates technical and cost data, including production impacts, in order to optimize the frequency and content of your scheduled maintenance actions. By modeling the balance between planned preventive maintenance actions and unplanned equipment failures, DDM allows you to adjust your maintenance strategies as equipment and market conditions change.
When reliability is not explicitly tethered to cost, it becomes a slogan at best. The value of keeping the throttles to the firewall on the production line is seemingly clear; the value of the next maintenance action on a piece of production-critical equipment is anything but. Demand-driven maintenance provides the data to make risk-informed decisions based on the latest conditions.
Applying demand-driven maintenance:
- Enables data-informed, risk-based decision-making about setting and adjusting equipment maintenance strategies.
- Provides explicit benchmarks and assumptions to guide in-process reviews, validate successes, and analyze deviations.
- Empowers organizations to rapidly and confidently adjust their maintenance plan execution and projections as market forces change, a major competitive advantage.
The Economic Roots of Reliability-Centered Maintenance
The 1950s and 60s saw massive changes in aircraft technology. Lessons of World War II, the new Cold War, and the space race drove a rapid transition from reciprocating propeller aircraft to jet transports, as well as leaps and bounds in system complexity and redundancy. The conventional wisdom of the time was that each system needed calendar-based preventive maintenance. The result was ballooning, yet curiously ineffective, maintenance man-hours per flight hour. Operators and the Federal Aviation Administration studied the problem in a series of maintenance strategy groups, which ultimately led to the birth of “reliability-centered maintenance.”
The new discipline was codified in a 1978 report titled “Reliability-Centered Maintenance,” written by United Airlines engineers F. Stanley Nowlan and Howard F. Heap. Their approach was pragmatic and economically grounded. “Scheduled maintenance is required for any item whose loss of function or mode of failure could have safety consequences.” It is also required “for any item whose functional failure would not be evident to the operating crew, and therefore reported for corrective action. … In all other cases the consequences of failure are economic, and maintenance tasks directed at preventing such failures must be justified on economic grounds.” They go on to note that “safety consequences can in nearly all cases be reduced to economic consequences by the use of redundancy” (xvii, emphasis added).
RCM then, per its foundational text, is driven almost wholly by economic considerations. This economic basis is nodded at in the literature of general RCM practice. There are general rules of thumb about percentages to be saved by RCM implementation and conceptual plots showing how to minimize total maintenance costs, but reliability and economics generally part ways when actual data is involved.
RCM is meant to maintain equipment at its inherent reliability and safety levels, to “obtain the information necessary for design improvement of those items whose inherent reliability proves inadequate,” and to achieve both of these goals at the minimum total cost (Nowlan and Heap, xvi). Achieving these goals is a tall order, and one that requires continuous refinement of the program through analysis of operational, maintenance, and cost data.
Optimizing Total Cost
Demand-driven maintenance is based on the total cost curve (fig. 1). This curve is most often seen as a conceptual figure used to illustrate the value of RCM. It consists of two input curves, the sum of which is the total cost curve. First is the planned cost curve, which is the frequency-weighted cost of scheduled preventive maintenance actions. Second is the unplanned cost curve, which is the risk-weighted cost of corrective maintenance for pending or breakdown failures. The unplanned cost curve should also include lost production opportunity costs for production-critical equipment.
Planned costs are high on the left side of the graph, falling as the frequency of the task is reduced. At the same time, the unplanned cost curve rises to the right as the risk-weighted likelihood of failure increases with reduced maintenance. The minimum value of the total maintenance cost curve corresponds with the optimal frequency of the preventive maintenance task.
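The shape of the two input curves can be sketched in a few lines of Python. This is a minimal illustration, not the STRYVE model: it assumes that each PM restores the item to as-good-as-new, that failures between PMs are fixed by minimal repair, and that failures follow a Weibull pattern, so the expected number of failures in an interval of length T is (T / eta) ** beta. The function names and the values used for `eta_months` and `beta` are hypothetical; only the $250 PM cost and $2,500 repair cost come from the figure 1 example.

```python
def annual_costs(interval_months, pm_cost, failure_cost, eta_months, beta):
    """Return (planned, unplanned, total) expected cost per year."""
    pms_per_year = 12.0 / interval_months
    # Planned curve: frequency-weighted cost of the scheduled task.
    planned = pm_cost * pms_per_year
    # Unplanned curve: risk-weighted cost of failures between PMs.
    # Assumed model: expected failures per interval = (T / eta) ** beta.
    failures_per_interval = (interval_months / eta_months) ** beta
    unplanned = failure_cost * failures_per_interval * pms_per_year
    return planned, unplanned, planned + unplanned


def best_interval(candidates, pm_cost, failure_cost, eta_months, beta):
    """Pick the candidate interval with the lowest total annual cost."""
    return min(
        candidates,
        key=lambda t: annual_costs(t, pm_cost, failure_cost, eta_months, beta)[2],
    )


if __name__ == "__main__":
    # eta_months=18 and beta=2.5 are illustrative assumptions, not fitted data.
    for t in (3, 6, 9, 12):
        planned, unplanned, total = annual_costs(t, 250.0, 2500.0, 18.0, 2.5)
        print(f"{t:>2} months: planned={planned:8.2f} "
              f"unplanned={unplanned:8.2f} total={total:8.2f}")
    print("lowest-cost interval (months):",
          best_interval(range(1, 25), 250.0, 2500.0, 18.0, 2.5))
```

With a wear-out pattern (beta greater than 1), the planned term falls and the unplanned term rises as the interval grows, which is what produces a minimum in the total. Real inputs would replace the assumed Weibull parameters.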
Here, it is important to note that the model balances different types of costs. Planned maintenance costs are directly controllable and relatively fixed. When you decide to do a preventive maintenance task at a 6-month interval, you know with good accuracy what that task entails, what it will cost, and that it is hitting your budget exactly twice a year. On the repair side, there are many more variables in play. You have a rough estimate of the probability of failure, but actual times to failure can vary widely around that probabilistic estimate. Likewise, repair costs and times vary much more widely than preventive maintenance costs. Finally, the amount of lost production opportunity depends on both the actual instances of failure and the actual time to repair. Thus, the left side of the total cost curve is more precisely determined than the right side.
There are several problems with the total maintenance cost curve as commonly used.
1. The total maintenance cost curve is most often offered as a conceptual representation of the value of RCM or PM optimization, rather than a plot of actual data.
2. When values are assigned, the unplanned maintenance curve rarely reflects the full impact of failure, which includes lost production.
3. Even should the lost production costs be included, they are not dynamically updated to reflect the latest asset data or, more importantly, changing operating context and production margins.
The curve tells a well-known story. Too-frequent PMs (left side of the curve) can be nearly as costly as running equipment to failure (right side of the curve). In between, there is a sweet spot where the optimal frequency of PMs yields the lowest total cost. The power of this method, however, is getting beyond the realm of concepts and conducting data-based optimization of maintenance actions.
In the example in figure 1, which assumes a $250 cost per PM and a $2,500 repair cost for unplanned maintenance, the optimal maintenance interval is every 6 months. At this interval, the annual cost for preventive maintenance is $500. The expected annual repair cost is $247, which means that roughly one in ten of these items of equipment is expected to require the $2,500 repair in a given year.
Conducting PMs just a bit too frequently at every 4 months results in a 22% cost increase – meaning that chasing zero failures is more costly than accepting a low failure rate in this case. Extending the PM interval from 6 to 9 months will cost 64% more than optimal (see fig. 2). The point here is that the cost savings are in the details. The savings are significant, but they cannot be won through rules of thumb and conceptual charts.
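The arithmetic behind these figures can be checked directly. The short sketch below uses only the numbers quoted for figure 1 and reads the 22% and 64% figures as increases in total annual cost (PM plus expected repair), which is an assumption about how the percentages were calculated. The variable names are illustrative.

```python
pm_cost = 250.0                  # cost per PM (from the example)
repair_cost = 2500.0             # cost per unplanned repair (from the example)
expected_annual_repair = 247.0   # expected annual repair cost at 6 months (quoted)

# Two PMs per year at a 6-month interval.
annual_pm_cost = pm_cost * 12 / 6

# Expected repair cost divided by cost per repair gives the implied
# annual chance that an item needs the repair.
implied_failure_probability = expected_annual_repair / repair_cost

total_at_6_months = annual_pm_cost + expected_annual_repair

# Assumption: the quoted percentages apply to total annual cost.
total_at_4_months = total_at_6_months * 1.22
total_at_9_months = total_at_6_months * 1.64

print(f"annual PM cost at 6 months:  ${annual_pm_cost:,.2f}")
print(f"implied failure probability: {implied_failure_probability:.1%}")
print(f"total at 6 months:           ${total_at_6_months:,.2f}")
print(f"implied total at 4 months:   ${total_at_4_months:,.2f}")
print(f"implied total at 9 months:   ${total_at_9_months:,.2f}")
```

Under that reading, erring a few months long is far more expensive than erring a couple of months short, which is why the interval has to come from data rather than a rule of thumb.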
It should be noted that, before including PM actions in this sort of analysis, they must be reviewed and verified as value-adding. A DDM analysis will eventually produce the data to show when a PM task is intrusive, ineffective, and causing more harm than good, but this lesson can be learned much more quickly through good preventive maintenance optimization.
Getting Down to Details
Starting a demand-driven maintenance program requires data. Some organizations with existing reliability programs may already have some refined data, while others will be starting from scratch. Initially, modeled or benchmark data can be substituted if quality historical data is not available. Inputs for a given equipment item include:
- Preventive and/or predictive maintenance tasks, including interval and cost data. (For predictive maintenance, the cost of any corrective action based on inspection findings is needed as well.)
- Mean time between failures
- Failure pattern (e.g., Weibull beta value)
- Effective age (age since installation or last major overhaul)
- Rate cut on failure
- Rate cost per hour
- Mean time to repair
- Failure task cost (labor and materials)
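One way to keep these inputs together is a small record per equipment item. The sketch below is hypothetical: the class name, field names, and units are assumptions, and it interprets "rate cut on failure" as the fraction of production rate lost and "rate cost per hour" as the value of full-rate production per hour, so that lost production per failure is their product times the mean time to repair. That interpretation is not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class EquipmentInputs:
    """Hypothetical container for the per-item inputs listed above."""
    pm_interval_months: float      # current PM/PdM task interval
    pm_cost: float                 # cost per scheduled task
    mtbf_months: float             # mean time between failures
    weibull_beta: float            # failure pattern (shape parameter)
    effective_age_months: float    # age since install or last major overhaul
    rate_cut_on_failure: float     # assumed: fraction of rate lost, 0.0 to 1.0
    rate_cost_per_hour: float      # assumed: value of full-rate output per hour
    mttr_hours: float              # mean time to repair
    failure_task_cost: float       # repair labor and materials

    def lost_production_per_failure(self) -> float:
        # Assumed relationship: rate lost x hourly value x hours down.
        return self.rate_cut_on_failure * self.rate_cost_per_hour * self.mttr_hours

    def cost_per_failure(self) -> float:
        # Full impact of a failure: repair cost plus lost production.
        return self.failure_task_cost + self.lost_production_per_failure()


if __name__ == "__main__":
    # Illustrative values only; not taken from any real asset.
    pump = EquipmentInputs(
        pm_interval_months=6.0, pm_cost=250.0, mtbf_months=60.0,
        weibull_beta=2.5, effective_age_months=18.0,
        rate_cut_on_failure=1.0, rate_cost_per_hour=2500.0,
        mttr_hours=8.0, failure_task_cost=2500.0,
    )
    print("lost production per failure:", pump.lost_production_per_failure())
    print("full cost per failure:", pump.cost_per_failure())
```

Keeping the inputs in one explicit structure makes it easier to see which values are measured and which are still benchmark placeholders awaiting refinement.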
The devil – and the savings – is in the detail of these inputs, so regardless of your starting point, ongoing refinement of your data will produce the best results. Unlike many “science projects” which quickly run out of steam, DDM provides the ability to continuously evaluate actual cost performance against the model, which should keep any manager’s interest. By continuing to refine both the model’s assumptions and the performance gaps that the model highlights, you have a powerful and sustainable continuous improvement tool.
As we have shown, DDM can drive significant savings by optimizing intervals based on maintenance costs alone. The real power of the tool, however, is unlocked when you integrate lost production opportunity factors. Maintenance costs are relatively static, whereas production volumes and margins can vary widely. DDM enables you to rapidly adjust to changing market conditions. Just as important, the flexibility of the model allows you to make data-informed decisions, rather than winging it based on conventional wisdom that no longer matches market reality. It is intuitive that production-critical assets should get more preventive attention, but this is an economic question that cannot be quantified without bringing in the lost production cost driver via DDM.
Figure 3 shows the effect of lost production costs on the total cost curve. Here, we take the previous example (left side of the figure) and add in lost production costs of $20,000 per failure (right side of the figure). The DDM model provides a data-backed answer to the question of just how much more preventive maintenance you should allocate to a production-critical asset. In this case, the answer is shortening the interval from 6 months to 3. Annual preventive maintenance costs are doubled, from $500 to $1,000. Expected repair costs are cut in half, from $247 to $122 (roughly 5% annual failure probability). In sum, annual maintenance costs are 50% higher than in the previous example. At this level of maintenance effort, we expect $975 in lost production costs. If we stuck to the 6-month maintenance interval recommended in figure 1, we would be facing a lost production risk of $1,973, nearly double. As this example demonstrates, you do not have the full picture on maintenance optimization if lost production opportunity is not factored in.
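Adding up the quoted figures makes the trade-off explicit. The sketch below uses only the dollar amounts given for the 3-month and 6-month intervals; the dictionary layout and function names are illustrative.

```python
# Annual figures quoted for the production-critical example, keyed by
# PM interval in months.
cases = {
    3: {"pm": 1000.0, "repair": 122.0, "lost_production": 975.0},
    6: {"pm": 500.0, "repair": 247.0, "lost_production": 1973.0},
}


def maintenance_cost(case):
    """Planned PM cost plus expected repair cost."""
    return case["pm"] + case["repair"]


def total_cost(case):
    """Maintenance cost plus expected lost production."""
    return maintenance_cost(case) + case["lost_production"]


if __name__ == "__main__":
    for interval, case in cases.items():
        print(f"{interval}-month interval: "
              f"maintenance=${maintenance_cost(case):,.0f} "
              f"total=${total_cost(case):,.0f}")
    ratio = maintenance_cost(cases[3]) / maintenance_cost(cases[6])
    print(f"maintenance cost ratio, 3-month vs 6-month: {ratio:.2f}")
    saving = total_cost(cases[6]) - total_cost(cases[3])
    print(f"annual total saved by the 3-month interval: ${saving:,.0f}")
```

The 3-month interval costs more in maintenance ($1,122 versus $747) but less in total ($2,097 versus $2,720) once lost production is counted.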
Adding lost production risk to the equation is what enables maintenance to become truly demand-driven. In figure 4, we see that DDM enables maintenance intervals to be dynamically optimized as margins and production levels change. The progression starts with our base case from figure 3 on the left, which is optimized at a 3-month maintenance interval. A 30% reduction in product margin lowers the total cost curve and extends the optimal maintenance interval to 4 months. On the right, not only is the margin reduced, but due to reduced demand, the equipment is running only 60% of the time. In this example, the optimal interval is extended to 6 months.
These may seem like minor adjustments, but the underlying economics are significant. Extending the interval from 3 to 4 months cuts PM costs by 25% and overall maintenance costs by 23%. With lower margins, the lower lost production risk warrants this reduction in maintenance effort, yet the model provides guidelines to guard against slashing maintenance to the point that would compromise reliability when demand peaks again.
In the rightmost example with both reduced margins and production, extending from the original 3-month interval to 6 months represents a 40% reduction in overall maintenance costs. Even with the production loss factored in, sticking with a 3-month interval would be 25% more costly than extending to 6 months.
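The direction of these adjustments can be illustrated with a simple re-optimization sketch. It is not the STRYVE model and is not calibrated to figure 4, so it will not reproduce the intervals quoted above. It assumes a Weibull failure pattern with minimal repair between PMs, hypothetical parameters (`eta_months=18`, `beta=2.5`), lost production per failure that scales with both margin and the share of time the equipment runs, and operating age that accrues only while the equipment is running.

```python
def total_annual_cost(interval_months, pm_cost, repair_cost, lost_production,
                      eta_months, beta, utilization=1.0):
    """Expected annual cost of PMs, repairs, and lost production."""
    pms_per_year = 12.0 / interval_months
    # Assumption: wear accrues only during operating time.
    operating_months = interval_months * utilization
    failures_per_interval = (operating_months / eta_months) ** beta
    cost_per_failure = repair_cost + lost_production
    planned = pm_cost * pms_per_year
    unplanned = cost_per_failure * failures_per_interval * pms_per_year
    return planned + unplanned


def best_interval(lost_production, utilization=1.0):
    """Search intervals from 1 to 24 months in steps of 0.1 month."""
    candidates = [m / 10 for m in range(10, 241)]
    return min(
        candidates,
        key=lambda t: total_annual_cost(
            t, 250.0, 2500.0, lost_production, 18.0, 2.5, utilization
        ),
    )


# Base case: $20,000 lost production per failure, running full time.
base = best_interval(20_000.0)
# Margin down 30%: each failure costs 30% less in lost production.
lower_margin = best_interval(20_000.0 * 0.7)
# Margin down 30% and equipment running 60% of the time (assumed scaling).
lower_margin_and_demand = best_interval(20_000.0 * 0.7 * 0.6, utilization=0.6)

if __name__ == "__main__":
    print("base case:", base, "months")
    print("30% lower margin:", lower_margin, "months")
    print("lower margin, 60% utilization:", lower_margin_and_demand, "months")
```

Under these assumptions the lowest-cost interval lengthens as margin and demand fall, matching the direction of the progression in figure 4. Rerunning the search whenever margin or demand forecasts change is what makes the plan demand-driven.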
Putting Operations and Maintenance on the Same Page
It can sometimes feel like Operations and Maintenance are working at cross purposes. Operations wants to run equipment full-out as continuously as possible and may turn a skeptical eye toward the value of proactive maintenance interventions. Maintenance wants to protect the equipment and occasionally take it offline – and spend money – to prevent breakdowns. These goals feel like cross purposes because organizations are forced to guess at the balance between preventive maintenance and production. The result is often too little preventive maintenance, too many breakdowns, and too much lost production – which makes Operations want to run the machines all the harder to make up lost ground.
DDM and T.A. Cook’s STRYVE modeling tool give Operations and Maintenance the common, data-driven picture they need to pull in the same direction. Ultimately, implementing DDM across most or all assets can change the way an organization plans, budgets for, and executes maintenance. The explicit linkage of economic benefits to reliability data will bring and sustain leadership attention on collecting and acting on quality data. This effort can bring Operations and Maintenance decision-makers into much more dynamic and adaptive cycles of improvement. An effort that starts off with benchmark data because a plant does not have quality maintenance and reliability data can ultimately lead to a data-rich organization that is using digital twinning and other advanced analytical techniques to fine-tune their maintenance planning and budgeting.
These bold statements are founded on solid precedent. Demand-driven supply chain (DDSC) has already shown the realm of the possible, taking consumer industries with a much more complex problem set from the realm of historical-plus spreadsheet forecasting to mind-boggling data-driven optimization in recent years. The key in this effort was linking profitability to the DDSC exercise, not only driving sustainment of the effort, but producing results that early proponents never would have believed possible. Managing and maintaining equipment for optimal availability is to the asset-intensive industries what supply chain management is to retail industries. In both cases, there are ever-changing weak links in the chain that feed the profit machine. Data enables the competitive advantages that only ongoing optimization can achieve.