An Approach to Integrated Problem Solving - Maretha Price Sasol Secunda Operations: Reliability Engineering

An Approach to Integrated Problem Solving

      Maretha Price
      Sasol Secunda Operations: Reliability Engineering

                                                          Copyright ©, 2018, Sasol

SMARTER approaches in Asset Management                    www.saama.org.za
Why Integrative Problem Solving?

•   RCA is often mistaken for problem solving – it is a tool to use within problem solving!
•   This leads to inadequate use of RCA in the larger context.
•   Latent cause = root cause… all types of causes are important!
•   Understand the whole picture and integrate the findings to explain the complete failure mechanism.
•   Knowing how to use the various methods, and how to integrate them, is key.
•   A single method often does not lead to a successful failure investigation.
•   In the context of reliability-centred maintenance (RCM), RCA can be a valuable proactive tool to:

         •   Understand consequences
         •   Understand degradation mechanisms
         •   Conduct a proper failure modes & effects analysis (FMEA) and update asset strategies

Fault Finding vs. Root Cause Analysis

Fault Finding
•   Fault finding is when you find the reason for a technical error.
•   It aims to find the component that failed / is causing the problem in equipment / systems.
•   Physical cause – technical and specialized knowledge is needed.
•   Normally you can identify one specific failure.
•   Technically oriented – basic and specialized technical knowledge is required.

Root Cause Analysis
•   Root Cause Analysis looks at the entire package, including the underlying reasons why the
    incident / failure was not prevented.
•   There is normally more than one cause – latent, systemic, etc.
•   Knowledge of the law, procedures, business, policies, people, maintenance, etc. is required.

When to do a Root Cause Analysis

•   Injuries / safety incidents
•   Process upsets (trips, blockages, bottlenecks, etc.)
•   Equipment failures (process / non-process equipment)
•   Process safety incidents (fires, product releases or explosions)
•   Environmental incidents
•   Health-related incidents
•   Security incidents
•   Quality-related incidents

                                To contribute to Plant Stability & Reliability
Typical Steps in an Investigation

  1   Problem Statement.

  2   History and Basic Conditions.

  3   Data Gathering and Multi-Disciplinary Interpretation.

  4   Sequence of Events.

  5   Generate all Possible Causes.

  6   The Formal RCA Evaluation against a Defined Method.

  7   Classify the Causes.

  8   Generate and Document Solutions and Recommendations.

The Importance of a Good Problem Statement

•   List the specific date and time.
•   List the conditions when the incident occurred.
•   Clarify the location of the incident.
•   Capture the initial observations.
    What?   When?   How?

On Friday evening (31 July 2016) around 20:14, high bypass pond levels were recorded. The booster pumps
were commissioned and the bypass pond level was returned to normal.
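To make the four elements of a problem statement concrete, the following is a minimal Python sketch, assuming a simple record type. The `ProblemStatement` class, its field names and the `missing_fields` / `render` helpers are hypothetical illustrations, not part of any Sasol tool; the sketch only shows how blank elements can be flagged before the statement is shared.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class ProblemStatement:
    """Hypothetical record holding the problem-statement elements listed above."""
    occurred_at: datetime  # specific date and time (When?)
    location: str          # where the incident occurred
    conditions: str        # conditions when the incident occurred (What?)
    observations: str      # initial observations (How?)

    def missing_fields(self) -> list[str]:
        """Return the names of text fields that were left blank."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str) and not getattr(self, f.name).strip()
        ]

    def render(self) -> str:
        """Compose a short statement from the captured fields."""
        when = self.occurred_at.strftime("%d %B %Y around %H:%M")
        return f"On {when}, at {self.location}, {self.conditions}. {self.observations}"
```

Applied to the bypass pond example (taking "the bypass pond" as the location), `render()` produces a statement in the same shape as the one above, while `missing_fields()` highlights any element that still needs to be captured.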

People like to be informed – always show them where they are in the process and what to expect.

Understand the History and Impact

•   Explain the function of the process and the function of the main equipment.
•   Clarify the specific products, temperatures and pressures.
•   List the critical control measures.
•   List the conditions that will lead to failures.
•   List the equipment / system criticality.

                                                  This criticality is based on …

Data Gathering and Multi-Disciplinary Interpretation

•   Capture the equipment history – installation, modifications, strategies, etc.
•   Review previous incidents and findings – ensure that the next steps were executed.
•   Capture all abnormalities, statistics and observations per discipline.
•   Discuss the findings as a team to understand how they interlink to cause the incident.

Sometimes asking “WHY?” can go a long way …

Don’t underestimate the value of a photo!

The All Important Sequence of Events

•   A sequence of events gives a chronological account of events.
•   It often contains the “hidden traces” of the solution to a problem.
•   It is important to be factual and only list items that can be proven as fact.
•   Start the sequence of events from the closest similar occurrence to the incident under investigation.
•   If the incident is truly the first of its kind, the equipment history and modifications may form the
    sequence of events.
•   Always play it back to the investigation team in as simple terms as possible, and let it follow a
    storyline flow.

DATE: 12 December 2017 (Tuesday)

TIME      EVENT
+ 12:22   A trip was logged due to a loss of the uninterrupted power supply (UPS). The Instrumentation
          department was called, as the operator reported that all HMI graphics were lost and that plant
          status could not be assured. Acid dosing control on another unit was also lost.
+ 12:30   The Instrumentation department reviewed and checked all alarm logs and received an indication
          that there was an apparent loss of power (mains) on the specified units. The cabinets and power
          supplies were physically opened. The power from the mains was confirmed to be in an off position.
          All other controllers and modules (inverter / rectifier) were also in the off position. All
          distribution boards were also confirmed to be off.
+ 12:50   The Electrical department was called out and found that the substation was in an alarm state.
          It further indicated that a rectifier / inverter was offline as a result of the loss of power from
          the UPS. The rectifier / inverter went offline as per the design of this circuit and stopped
          working when the UPS battery ran flat. The functionality of the rectifier / inverter was
          validated and confirmed to be in working condition.
+ 13:00   The Electrical department proceeded to check the substation and confirmed that the power was
          intact and that there was no loss of power.
+ 13:10   All systems and components were reset and production proceeded to start up the plant
          successfully.
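A sequence of events can be kept as a simple timeline. The Python sketch below assumes that each event carries a list of supporting evidence (for example an alarm log entry or a photo); the `Event` type and the `build_sequence` / `play_back` helpers are hypothetical. Events without evidence are set aside rather than deleted, so the team can prove or discard them before the timeline is played back.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """Hypothetical timeline entry."""
    when: datetime
    description: str
    evidence: list[str] = field(default_factory=list)  # e.g. alarm log, photo, logbook entry


def build_sequence(events: list[Event]) -> tuple[list[Event], list[Event]]:
    """Split events into proven (has evidence) and unproven, each in chronological order."""
    ordered = sorted(events, key=lambda e: e.when)
    proven = [e for e in ordered if e.evidence]
    unproven = [e for e in ordered if not e.evidence]
    return proven, unproven


def play_back(events: list[Event]) -> str:
    """Render a timeline as a simple storyline, one line per event."""
    return "\n".join(f"{e.when:%H:%M}  {e.description}" for e in events)
```

Only the proven list is played back; the unproven list becomes a set of questions for the team to follow up.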
Sometimes it is as Easy as 1, 2, 3 …

Generate ALL Causes, including the Mechanism

•   List all potential causes – even causes that are improbable.
•   Immediately evaluate improbable causes for elimination and capture the reasons for elimination.
•   Ensure that the remaining potential causes align with the sequence of events.
       Dip in Plant Air Supply
              A dip in plant air supply could have resulted in solenoid activation, causing the firewater system to be
              activated without sending an alarm to the control panel.

              Eliminated as a Root Cause: Apart from the fact that the solenoid was found to be in working
              order after both incidents, no fluctuations in plant air supply were observed during either incident.

       Switch Manually Pressed in Error
              If the switch in the control room was activated, it would have activated the firewater system without
              sending an alarm to the panel, as this switch is not linked to a panel alarm. This is, however, highly
              unlikely, as the switch is of a “sunken-in” type and has to be pressed with deliberate action. It is
              possible that the switch was pressed in error by someone who did not understand its purpose, but based
              on the probability of failure, the most probable cause is that this switch was faulty.

… and ALWAYS play it back …
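The elimination step can be supported by a cause register that refuses to drop a cause without a recorded reason. This is a minimal Python sketch under that assumption; `Cause`, `eliminate` and `remaining` are hypothetical names. Checking whether the remaining causes align with the sequence of events is still a team judgement and is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cause:
    """Hypothetical entry in a cause register."""
    name: str
    mechanism: str                            # how this cause would produce the incident
    eliminated_because: Optional[str] = None  # recorded reason for elimination, if any

    @property
    def is_open(self) -> bool:
        """True while the cause has not been eliminated."""
        return self.eliminated_because is None


def eliminate(cause: Cause, reason: str) -> None:
    """Eliminate a cause, refusing to do so without a recorded reason."""
    if not reason.strip():
        raise ValueError(f"A reason is required to eliminate '{cause.name}'")
    cause.eliminated_because = reason.strip()


def remaining(causes: list[Cause]) -> list[str]:
    """Names of the causes still under consideration, in their original order."""
    return [c.name for c in causes if c.is_open]
```

Keeping eliminated causes in the register, with their reasons, means the elimination can be played back to the team rather than silently disappearing.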

Which Method is Which …?

Barrier Analysis
● Identify barriers to prevent incidents
● Barriers are evaluated to see where they failed or worked less effectively
● Reasons for barrier failures and preventative actions are listed and implemented
● It often does not identify missing barriers

Causal Factor Tree
● Align the sequence of events with the conditions that caused them
● Indicate events / conditions supported by evidence with a solid line to represent the incident
● Evaluate the incident by evaluating changes in events / conditions
● Identify the root cause, as well as causal factors

Fault Tree Analysis
● Identify faults that could lead to the incident, as well as a cause for every fault
● Group logically related items using “AND” or “OR” between faults and causes
● Continue identifying causes for each fault until you reach a root cause
● List countermeasures for each root cause
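For Fault Tree Analysis, the “AND” / “OR” grouping can be evaluated mechanically. The sketch below is a generic Python illustration, not a specific FTA package: `Gate`, `occurs` and `minimal_cut_sets` are hypothetical names, and a basic event is represented simply as a string. `occurs` checks whether the top event follows from a set of true basic events, and `minimal_cut_sets` lists the smallest combinations of basic events that together produce it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Gate:
    """A logic gate in a fault tree: 'AND' needs all inputs, 'OR' needs any one."""
    kind: str
    inputs: list["Node"]


# A basic event (a cause with no further breakdown) is represented by its name.
Node = Union[Gate, str]


def occurs(node: Node, true_events: set[str]) -> bool:
    """Return True if the event at `node` follows from the given true basic events."""
    if isinstance(node, str):
        return node in true_events
    results = [occurs(child, true_events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"Unknown gate type: {node.kind}")


def _cut_sets(node: Node) -> list[frozenset[str]]:
    """All combinations of basic events that produce `node` (not yet minimised)."""
    if isinstance(node, str):
        return [frozenset([node])]
    child_sets = [_cut_sets(child) for child in node.inputs]
    if node.kind == "OR":
        # Any single input's cut set is enough on its own.
        return [s for sets in child_sets for s in sets]
    if node.kind == "AND":
        # One cut set from every input must occur together.
        combos = [frozenset()]
        for sets in child_sets:
            combos = [a | b for a in combos for b in sets]
        return combos
    raise ValueError(f"Unknown gate type: {node.kind}")


def minimal_cut_sets(node: Node) -> set[frozenset[str]]:
    """Cut sets that do not contain a smaller cut set."""
    sets = set(_cut_sets(node))
    return {s for s in sets if not any(other < s for other in sets)}
```

For instance, a hypothetical tree loosely based on the firewater example, `Gate("OR", [Gate("AND", ["air supply dip", "solenoid activates"]), "switch pressed in error", "switch faulty"])`, yields three minimal cut sets: the air-supply pair, and each switch cause on its own.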

Which Method is Which … (continued)?

Fish-bone Diagram
● Group all causes into categories (include people, process and procedures)
● Group causes into sub-causes and evaluate the causes against proof
● Can be used effectively with any other technique

Pareto Analysis
● The most important 20 % of failure causes account for 80 % of the incidents
● List all potential causes and rank the causes according to highest probability
● The top 20 % of the causes will be the most likely root causes
● Used effectively when multiple causes are to be evaluated

Failure Mode & Effects Analysis (FMEA)
● Identify all the various failure modes, as well as their frequency and preventative actions
● Evaluate whether all preventative actions were implemented and followed

                           (Insert Strategy Here …)
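The Pareto ranking described above can be sketched in a few lines of Python. This assumes each potential cause has already been given a numeric score (an incident count or an estimated probability); `pareto_rank` and `vital_few` are hypothetical helpers, and the 20 % cut-off is taken from the slide as a default rather than a fixed rule.

```python
import math


def pareto_rank(scores: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank causes by score, highest first, with the cumulative percentage of the total."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("Scores must sum to a positive value")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    result, running = [], 0.0
    for cause, score in ranked:
        running += score
        result.append((cause, score, 100.0 * running / total))
    return result


def vital_few(ranked: list[tuple[str, float, float]], fraction: float = 0.2) -> list[str]:
    """Return the top `fraction` of the ranked causes (rounded up, at least one)."""
    n = max(1, math.ceil(len(ranked) * fraction))
    return [cause for cause, _, _ in ranked[:n]]
```

With scores of 50, 30, 10, 5, 3 and 2 for six causes, the cumulative share reaches 80 % after the second cause, and `vital_few` returns those two causes (the top 20 % of six, rounded up).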
Which Method is Which … (continued)?

KEPNER-TREGOE
● Appraise the situation
● Compare the situation against a workable scenario
● List the differences and changes to derive a root cause
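The comparison step of Kepner-Tregoe can be illustrated by listing attribute differences between the problem case and a comparable case that works, then keeping only the recorded changes that touch those attributes. This Python sketch is loosely modelled on the “is / is not” comparison; `differences` and `changes_explaining` are hypothetical names, and the attributes are whatever the team chooses to record.

```python
MISSING = "<not recorded>"


def differences(problem: dict[str, str], reference: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Attributes whose values differ between the problem case and a comparable working case.

    Each result maps attribute -> (value in problem case, value in working case).
    """
    keys = sorted(problem.keys() | reference.keys())
    return {
        k: (problem.get(k, MISSING), reference.get(k, MISSING))
        for k in keys
        if problem.get(k, MISSING) != reference.get(k, MISSING)
    }


def changes_explaining(diffs: dict[str, tuple[str, str]], changes: dict[str, str]) -> list[str]:
    """Recorded changes that touch an attribute where the two cases differ."""
    return [description for attribute, description in changes.items() if attribute in diffs]
```

Attributes that are identical in both cases drop out, which narrows the search to the differences and changes that could explain the problem.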

If all else Fails …

                      … evaluate the
                      on the balance of
                      probabilities …

Understanding the Types of Causes

Root Cause
● Origin of an incident.
● Most basic cause which can be fixed and / or improved to prevent recurrences.

Direct Cause
● Primary cause leading to an incident.
● Incident would not have occurred without this cause.

Contributing Cause
● Not a self-sufficient cause.
● Contributes to the severity / frequency of an incident.

True Cause
● True reason for an incident.
● Can be physical or latent in nature.

Physical Roots
● Causes with tangible roots that are visible after the incident.
● In other words, causes that can be seen physically.

Human Roots
● Refers to human intervention actions.
● For instance, tasks that were executed / not executed.

Latent Cause
● Underlying reasons that explain physical & human causes.

Remember when Closing Out an Investigation

•   Verify your findings - follow the trail of cause and effect to verify that the root cause is correct and
    that all causes and effects are listed to prevent the failure from recurring.
•   Group all causes into the various disciplines.
•   List corrective actions for every cause to minimise severity or to prevent recurrence.
•   Develop an action plan with responsible people and due dates to implement findings.

•   Implement all actions and solutions.
•   Review performance and track effectiveness of solutions.
•   Remember to capture the lessons learnt from the investigation.
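Tracking the action plan can be as simple as checking that every cause has at least one action and listing open actions past their due date. The Python sketch below assumes a hypothetical `Action` record with a cause, owner and due date; reviewing the effectiveness of completed actions still requires follow-up on plant performance and is not modelled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Action:
    """Hypothetical corrective action in an investigation action plan."""
    cause: str
    description: str
    owner: str       # responsible person or department
    due: date
    done: bool = False


def causes_without_actions(causes: list[str], actions: list[Action]) -> list[str]:
    """Causes that have no corrective action assigned yet."""
    covered = {a.cause for a in actions}
    return [c for c in causes if c not in covered]


def overdue(actions: list[Action], today: date) -> list[Action]:
    """Open actions whose due date has passed, oldest first."""
    return sorted((a for a in actions if not a.done and a.due < today), key=lambda a: a.due)
```

Running both checks at each review meeting keeps the action plan honest: uncovered causes show gaps in the close-out, and overdue actions show where follow-up is needed.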

A Few Practical Learnings

• Capture information relating to the incident as soon as possible.
• Involve all relevant parties from the beginning of the investigation to ensure alignment.
• Identify an incident owner who will ensure that all next steps are executed.
• Ensure that the RCA is conducted within a multi-disciplinary team.
• Ensure that the Sequence of Events is accurate and captures all incident conditions.
• Ensure that the team is aligned with the RCA methodology to be followed.
• Ensure that dates and responsible people are assigned to next steps.
• Always intend to solve the problem, not to blame people.
• Ensure that everyone is allowed to voice their opinion.
• Evaluate all possible listed causes.
• Substantiate causes with facts.

In the Words of Jack Sparrow

Thank you

Maretha Price
Sasol Secunda Operations: Reliability Engineering
