Threat and Error Management

Taking CRM to the next level

Scott Stahl

January 7, 2018

Most people in aviation are familiar with the concept of Crew Resource Management (CRM) and the fact that it is designed to foster an open and inclusive operational policy, so that all viewpoints and perspectives may be considered in order to identify and execute the best course of action in any given situation. It seems very straightforward, but how is it actually achieved? How do multiple people from multiple disciplines come together to formulate a standardized method for identifying problems, errors, courses of action and possible remedies? Developed in the mid- to late 1990s, Threat and Error Management (TEM) is one of the main areas in which CRM is implemented to mitigate risk.

To understand the implementation of TEM, we must first understand that any Undesired Aircraft State (UAS) is precipitated by a series of occurrences, known as the “chain of events.” The number of completely unpreventable, catastrophic failures is nearly zero, while almost 100% of all incidents have one or more events that could have been avoided to prevent the situation from ever having occurred. Essentially, it is postulated that the vast majority of incidents could have been prevented at multiple points before they occurred.

Threat and Error Management acknowledges three main principles:

Threats exist in all operations and are constantly changing, and as such, identifying them and attempting to mitigate them is a constant and active process.
Error is unavoidable, and rather than focusing on eliminating error, TEM instead seeks to recognize, identify and remedy error before it creates a UAS. This is referred to as “trapping” the error.
If a crew fails to trap an error, it can result in a UAS. This is important because it also acknowledges that a UAS greatly increases the risk of an accident or incident. Once the UAS occurs, probability largely determines whether the result is an accident or an incident. The vast majority of UAS situations do not end this way, and are instead recovered to a desirable aircraft state. However, the overall goal is to avoid these situations through active management of the first two factors.
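
The idea that an error must slip past several layers before it becomes a UAS can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name `prob_untrapped`, the assumption that layers act independently, and the trap rates themselves are all hypothetical and do not come from any operational data.

```python
from math import prod


def prob_untrapped(trap_probs):
    """Chance that an error passes every layer of defense.

    Assumes (for illustration only) that each layer traps the error
    independently with the given probability.
    """
    return prod(1.0 - p for p in trap_probs)


# Hypothetical trap rates for three layers: briefing, cross-check, monitoring.
layers = [0.7, 0.8, 0.9]
print(f"{prob_untrapped(layers):.3f}")  # 0.3 * 0.2 * 0.1, printed as 0.006
```

Even with imperfect layers, stacking them shrinks the chance of an untrapped error considerably, which is the intuition behind breaking the chain of events at more than one point.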

The first step of the process is accomplished using a combination of thorough briefings and open discussion of the likely risks or situations that are expected to arise based on the factors for that operation. A thorough briefing will review the roles of each crew member, expected taxi routes, departures, weather, use of automation, equipment failure considerations and aircraft condition/equipment considerations as well as any applicable limitations. It will also include a thorough consideration of contingency actions should something unexpected occur, such as an engine failure at V1 (engine failure path, takeoff alternates, procedures to be accomplished, whether crew will change roles, etc.). This is often referred to as “plans stated” or “plans briefed.” It is expected that during this briefing, the pilot not flying will ask any questions or advocate discussion of any further questions or briefings that need to occur. For example, the possibility of more than one departure runway, a departure runway change, or any other factor that may not go as briefed should be discussed so that all crew members are prepared for the eventuality. Laying out the plans in this way provides a foundation for error management should actual events deviate from expected events.

Of course, this also applies to other phases of flight, such as arrival, approach and landing. Again, a thorough discussion of any possible courses of action, risks, and contingencies should be briefed before the phase of flight starts so that the entire crew is prepared for any expected contingencies.

The second step in the process adds another layer of adaptability and protection. This is the part of the process that works on the assumption that errors are inevitable. All humans make errors, so rather than focusing on an operational system that punishes error in an attempt to eliminate it, it has proven more effective to accept that errors will occur and additional measures must be used to identify and mitigate them to the lowest possible level. For instance, if the crew in the first step did a thorough job briefing the possible departures, but forgot to update the departure in the FMS after a runway change and the airplane is programmed to fly the wrong departure, TEM has additional measures to address this lapse. Effective TEM would have the crew stop the plane when the runway change is received and then reprogram, rebrief and reverify that everything is properly set up for departure. It does add a few minutes of time, but it also greatly reduces the risk of having the wrong altitudes or fixes in the box with multiple airplanes departing multiple runways simultaneously.

Another key factor in the second step is a shared mental model created by situational awareness. This seems pretty obvious, but is actually one of the more difficult parts of working in a multi-crew environment where different people have different roles. The first key to making this work is very high levels of standardization, from flows to checklists to briefing guidance. Another means by which the shared mental model is created outside of routine briefings is called Verbalize, Verify, Monitor (VVM). Essentially, anything that is going to be changed is verbalized as being changed. All crew members then verify that the change is appropriate and correct, and once the change is made, they monitor to confirm that the change occurred as intended. The main purpose of VVM is to keep everyone on the same page with regard to the progress and course of the flight, to serve as a check against unintentional errors, and to keep all parties apprised when stated plans change.
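
For readers who think in procedures, the VVM sequence can be sketched as a small routine. This is a loose analogy rather than any operator's actual procedure: `FlightDeck`, `vvm_change`, `matches_clearance` and the altitude values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlightDeck:
    """Hypothetical stand-in for the settings a crew manages."""
    settings: dict = field(default_factory=dict)
    callouts: list = field(default_factory=list)


def vvm_change(deck, key, value, cross_check, apply_change=None):
    """Run one change through Verbalize, Verify, Monitor.

    Returns True only if the change was accepted and took effect.
    """
    # Verbalize: announce the intended change before touching anything.
    deck.callouts.append(f"verbalize: set {key} to {value}")

    # Verify: the other crew member checks it against the clearance or plan.
    if not cross_check(key, value):
        deck.callouts.append(f"verify: {key}={value} rejected, nothing changed")
        return False

    # Make the change (a custom apply_change can simulate a faulty entry).
    if apply_change is None:
        deck.settings[key] = value
    else:
        apply_change(deck, key, value)

    # Monitor: confirm the resulting state matches what was announced.
    took_effect = deck.settings.get(key) == value
    deck.callouts.append(f"monitor: {key} {'confirmed' if took_effect else 'MISMATCH'}")
    return took_effect


# Hypothetical clearance used by the cross-checking crew member.
cleared_altitude = 10000
matches_clearance = lambda key, value: key == "altitude" and value == cleared_altitude

deck = FlightDeck()
print(vvm_change(deck, "altitude", 11000, matches_clearance))  # rejected at the verify step
print(vvm_change(deck, "altitude", 10000, matches_clearance))  # accepted, then confirmed
```

The point of the sketch is that the error can be trapped at two separate moments: before the change is made (verify) and after it (monitor).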

Another factor in trapping errors is using the appropriate level of automation for the situation and phase of flight. Data trends increasingly show that automation actually plays a large role in UAS occurrence because crews don’t properly manage the automation, which may actually require reducing the level used. The data show that in many cases, the crew tried to fix the automation in order to keep using it, which can actually exacerbate a problem, when simply turning the automation off, addressing the problem and then bringing the automation back online would have been the better course.

The third and final component of TEM is dealing with an error that is not successfully trapped by all of the other layers of protection and becomes a UAS. In such a situation, it becomes paramount for the crew to contain the undesired state, work diligently to return the aircraft to a desirable state as quickly as possible, and then deal with any associated reporting after the fact. To continue our example of the runway change, a UAS might be that on departure, because the runway change was never entered in the FMS, the aircraft begins to navigate to the wrong waypoints, following the departure for the wrong runway. It is obvious that this could very quickly lead to an extremely dangerous situation, and as such, timely detection, identification and correction are critical. Though unintentional, something was done incorrectly and not caught by the other crew member, or it wasn’t done at all and isn’t discovered until the aircraft doesn’t respond in the way it should.

Hopefully, if the crew is still utilizing the first two components of TEM, as they should be, the detection will occur very quickly because the process for monitoring is constant. In this particular situation, it would probably be best for the crew to discontinue the use of automation, as the risk associated with keeping it engaged is extremely high. They should then coordinate with air traffic control for a vector while they get the problem fixed in the FMS. Once the crew has verified that the corrected data is entered and that it is safe to continue using it, the automation may be reengaged one level at a time until full automation is back online. This crew would have to file a series of reports about the incident, but that is far better than many of the alternative outcomes.

The beauty of TEM is that it also works in those rare situations where the failure is not the crew’s fault. Perhaps there is some mechanical failure in flight and they find themselves in a completely unavoidable UAS. The TEM tools can again be put to use to identify the situation and correct it while allowing all further decisions to be adjusted according to the factors of the new conditions of flight.

Threat and Error Management is highly effective not only at keeping the crew working together, but also at identifying risk and mitigating it to the lowest possible level using available information. In those rare situations where something unpredictable happens, it also allows the crew to work together to deal with the situation and bring the aircraft back into a desired state of operation quickly and effectively.