Risk Management using Failure Mode and Effects Analysis (FMEA)

FMEA (Failure Mode and Effects Analysis) has been widely used in manufacturing industries to determine process failures, identify their causes and impact, and define mitigation strategies that reduce the failures and improve the overall end-to-end process. FMEA can be applied to both existing and new processes. This article describes the application of FMEA to existing operational risk processes.
 

How is FMEA different from the Risk Management process?

In general, the risk management process involves the following high-level steps:
  • Define the business context and scope of the risk assessment domains
     
  • Risk identification
     
  • Risk Assessment / Analysis 
     
  • Risk Response and Mitigation
     
  • Risk Monitoring and reporting to senior management

FMEA differs from other risk management processes in that it takes process failures as the starting point. It involves the following steps:

  • Define business boundaries.
     
  • Identify the failure events in the business processes.
     
  • Perform Root Cause Analysis (RCA) and identify the causes responsible for the failures.
     
  • Identify the impact and likelihood of such failures.
     
  • Identify and implement the controls to reduce the failures.
     
  • Monitor the controls and review them periodically.

The core objective of every risk assessment process is to identify risks and mitigate them to an acceptable level with appropriate, cost-effective controls.
 

FMEA process

The FMEA process can be used to improve current operational processes and to assess new processes before they are implemented. FMEA is applied as a sequence of steps; the steps for existing processes are listed below:
 
  • Define the boundary for the operational processes to be evaluated.
     
  • Collect current and historical high-risk and medium-risk IT infrastructure failure events from the Incident Management database.
     
  • Define the business owner / team for each such failure.
     
  • Understand the existing end-to-end operational process.
     
  • Perform Root Cause Analysis and identify potential causes of IT infrastructure failures. Also determine the sub-causes that may contribute to such failures.
     
  • Determine the impacts of such failures, including the impacts on downstream applications / business processes.
     
  • Identify the likelihood of such failures.
     
  • Determine the risk level from the likelihood and impact factors. The impact can be monetary, regulatory (penalties), business-related or reputational.
     
  • Prioritize the risks based on their risk level.
     
  • The RPN (Risk Priority Number) can also be used to determine the risk priority: RPN = Severity × Occurrence (likelihood) × Detection (the ability to detect the failure).
     
  • Document all the risk analysis data in the Risk Register and assign an owner to each risk.
     
  • Determine the controls needed to mitigate the failures. Perform a cost-benefit analysis and implement a control only when the cost of the impact exceeds the cost of the control.
     
  • Define Key Risk Indicators and alerts to detect such failures early and to predict potential risks.
     
  • Monitor the controls on a regular basis to ensure that they meet their intended objectives.
     
  • Finally, update the Risk Register for each mitigated risk.
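
The scoring, prioritization and cost-benefit steps in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 to 10 rating scales, the convention that a higher Detection rating means the failure is harder to detect, the names FailureMode and prioritize, and all the sample failures and figures are hypothetical and do not come from the process described in this article.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical risk register."""
    name: str
    severity: int        # assumed scale: 1 (negligible) to 10 (critical)
    occurrence: int      # assumed scale: 1 (rare) to 10 (frequent)
    detection: int       # assumed scale: 1 (easily detected) to 10 (hard to detect)
    impact_cost: float   # made-up estimate of the loss if the failure is not mitigated
    control_cost: float  # made-up estimate of the cost of the proposed control

    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection

    def control_justified(self) -> bool:
        # Cost-benefit rule from the list: implement the control only
        # when the cost of the impact exceeds the cost of the control.
        return self.impact_cost > self.control_cost


def prioritize(modes):
    """Return the failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Illustrative entries only; none of these figures come from real incident data.
register = [
    FailureMode("Storage array outage", 9, 3, 4, 500_000.0, 120_000.0),
    FailureMode("Expired TLS certificate", 6, 5, 2, 40_000.0, 5_000.0),
    FailureMode("Backup job failure", 7, 4, 7, 80_000.0, 95_000.0),
]

for mode in prioritize(register):
    print(mode.name, mode.rpn(), mode.control_justified())
```

With these made-up figures the backup job failure ranks first (RPN 196), yet its proposed control does not pass the cost-benefit check, which suggests looking for a cheaper control rather than dropping the risk from the register.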

Conclusion

The FMEA process makes an operational process more reliable because each failure is analyzed in detail, including the primary and secondary causes that may lead to it.
 
Authored by - Ananda Narayanan G
TCS Enterprise Security and Risk Management