Root Cause Analysis: Definition, Methods & Step-by-Step Guide
Root cause analysis step by step: 6 methods compared, decision framework, service industry example & honest limitations backed by academic evidence.
Root cause analysis (RCA) is a systematic process that identifies the fundamental causes of problems rather than treating their symptoms. The goal is not “What happened?” but “Why did it happen — and how do we prevent it from happening again?” Rooney and Vanden Heuvel define root causes as specific, identifiable causes “that are within management’s control to remedy and for which effective recommendations for preventing recurrence can be generated” [1].
What distinguishes root cause analysis from other problem-solving approaches is that it demands digging beyond the obvious. A complaint about long processing times is not the output of a root cause analysis; it is the starting point. RCA asks: Why are processing times long? Is a checklist missing? Why is it missing? Was it never created, or was it created and ignored? Only when you arrive at a cause you can actually fix have you found the root cause.
Search the web for “root cause analysis” and you will find dozens of pages with the same structure: definition, method list, a manufacturing example. Few provide a decision framework for choosing between methods. Few address the weaknesses of RCA documented in the academic literature. And few demonstrate the method in a service context such as insurance, banking, or telecommunications.
This guide closes those gaps. You get a method comparison with decision criteria, a step-by-step process, an example from claims processing, and an honest analysis of the situations where root cause analysis fails.
What Is Root Cause Analysis? Definition and Scope
Root cause analysis is an overarching problem-solving approach that encompasses multiple methods — from the simple 5 Whys to the complex Fault Tree Analysis. What all methods share is one principle: don’t treat the symptom, eliminate the cause.
Three levels of cause analysis:
| Level | Question | Example |
|---|---|---|
| Symptom | What do we observe? | “23% of claims submissions are incomplete” |
| Direct cause | What triggered the symptom? | “The submission form doesn’t require all mandatory fields” |
| Root cause | Why does the direct cause exist? | “The last form revision wasn’t coordinated with the claims processing team” |
Most teams stop at the direct cause level. They fix the direct cause (update the form) without addressing the root cause (no coordination process for form changes). The result: the problem returns with the next revision.
Distinguishing related terms:
| Term | Meaning | Difference from root cause analysis |
|---|---|---|
| Failure analysis | Systematic investigation of a specific failure | RCA goes deeper — it seeks the root cause, not just the failure mechanism |
| Problem solving | General process from problem to solution | RCA is a specific step within problem solving |
| FMEA | Proactive analysis of potential failures before they occur | RCA is reactive — it analyzes problems that have already occurred |
| Corrective actions | Measures to eliminate nonconformities (ISO 9001:2015, Section 10.2) | RCA is the analytical step before the corrective action |
6 Root Cause Analysis Methods Compared
There is no single “best” method for root cause analysis. Depending on the problem, available data, and team composition, different approaches are appropriate. The following overview compares six proven methods — from simple to complex.
| Method | Complexity | Duration | Team | Best for | Weakness |
|---|---|---|---|---|---|
| 5 Whys | Low | 15–30 min | 2–5 people | Single, linear cause chains | Fails with multiple, interacting causes [2] |
| Ishikawa Diagram | Medium | 60–90 min | 4–8 people | Problems with multiple cause categories (People, Method, Machine) | Doesn’t capture interactions between causes [3] |
| Fault Tree Analysis (FTA) | High | Half day+ | 3–6 specialists | Safety-critical systems requiring calculated failure probabilities | Requires statistical expertise and complete system knowledge |
| Barrier Analysis | Medium | 60–120 min | 3–6 people | Incidents where protective barriers failed | Focuses on barriers, not systemic causes |
| Events and Causal Factors Analysis | High | Half day+ | 4–8 people | Complex incidents with a chronological sequence | Resource-intensive; can produce complicated relationship diagrams [4] |
| Change Analysis | Low–Medium | 30–60 min | 2–5 people | Sudden performance changes after a process modification | Only effective when a change can be identified as the trigger |
Which Method Fits Your Problem?
Method selection depends on three factors: problem complexity, available data, and team expertise. Use this decision framework:
Decision tree:
1. Is the problem a single, linear cause chain?
   - Yes → 5 Whys (fast, simple, no preparation needed)
   - No → continue to 2
2. Does the problem have multiple potential cause categories (People, Process, Technology, Environment)?
   - Yes → Ishikawa Diagram (structured team brainstorming across categories)
   - No → continue to 3
3. Did a recent change trigger the problem?
   - Yes → Change Analysis (before-and-after comparison)
   - No → continue to 4
4. Did protective barriers (controls, checkpoints, approvals) fail?
   - Yes → Barrier Analysis (which barrier failed and why)
   - No → continue to 5
5. Is the incident complex with a chronological sequence of events?
   - Yes → Events and Causal Factors Analysis (timeline + causal factors)
   - No → continue to 6
6. Do you need quantitative failure probabilities for safety-critical systems?
   - Yes → Fault Tree Analysis (Boolean logic, calculable probabilities)
   - No → Start with 5 Whys and escalate to Ishikawa if needed
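The decision tree above can be sketched as a small function. This is a minimal illustration, assuming each of the six questions has already been answered with yes or no; the `ProblemProfile` fields and the `select_method` name are hypothetical and not part of any tool. The checks run in the same order as the tree, so the first question answered with yes determines the method.

```python
from dataclasses import dataclass


@dataclass
class ProblemProfile:
    """Yes/no answers to the six questions of the decision tree (hypothetical structure)."""
    single_linear_chain: bool
    multiple_cause_categories: bool
    recent_change_trigger: bool
    barrier_failed: bool
    complex_chronology: bool
    needs_failure_probabilities: bool


def select_method(p: ProblemProfile) -> str:
    """Walk the questions in tree order; the first 'yes' determines the method."""
    if p.single_linear_chain:
        return "5 Whys"
    if p.multiple_cause_categories:
        return "Ishikawa Diagram"
    if p.recent_change_trigger:
        return "Change Analysis"
    if p.barrier_failed:
        return "Barrier Analysis"
    if p.complex_chronology:
        return "Events and Causal Factors Analysis"
    if p.needs_failure_probabilities:
        return "Fault Tree Analysis"
    # Every answer was 'no': fall back to the tree's default recommendation
    return "5 Whys (escalate to Ishikawa if needed)"


# Usage: the claims processing problem has several cause categories
claims_problem = ProblemProfile(
    single_linear_chain=False,
    multiple_cause_categories=True,
    recent_change_trigger=False,
    barrier_failed=False,
    complex_chronology=False,
    needs_failure_probabilities=False,
)
print(select_method(claims_problem))  # prints "Ishikawa Diagram"
```

Encoding the tree this way makes its ordering explicit: a problem that is both linear and triggered by a recent change lands on 5 Whys because that question comes first.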
Practical tip: combine methods. Experienced teams rarely use a single method in isolation. A typical integrated workflow: Ishikawa for breadth (capture all cause categories), 5 Whys for depth (trace prioritized causes to the root), Pareto for prioritization (which root cause has the greatest leverage). To embed this workflow into a structured improvement cycle, use the PDCA cycle as a framework.
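The Pareto step of this workflow can be sketched in a few lines: sort causes by how often they occur and keep the smallest set that accounts for a chosen share of all occurrences. This is an illustration, assuming a conventional 80% cutoff and hypothetical callback-reason counts; neither value comes from this article.

```python
def pareto(counts: dict[str, int], cutoff: float = 0.8) -> list[tuple[str, int, float]]:
    """Return (cause, count, cumulative share) for the causes that reach the cutoff."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Most frequent causes first
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0
    for cause, n in ranked:
        cumulative += n
        share = cumulative / total
        selected.append((cause, n, share))
        if share >= cutoff:
            break  # the remaining causes are the "trivial many"
    return selected


# Hypothetical callback reasons counted over one month of auto claims
callback_reasons = {
    "Missing police report": 90,
    "Photos in insufficient resolution": 60,
    "Wrong policy number": 25,
    "Missing bank details": 15,
    "Other": 10,
}
for cause, n, share in pareto(callback_reasons):
    print(f"{cause}: {n} (cumulative {share:.0%})")
```

Frequency is only one measure of leverage; a team may instead weight causes by cost or customer impact before sorting.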
Step by Step: Conducting a Root Cause Analysis
Regardless of the chosen method, every root cause analysis follows a fundamental process. These six steps form the framework — the individual method (5 Whys, Ishikawa, FTA) is applied in Step 3.
Step 1: Define the Problem Precisely (30 minutes)
An imprecise problem definition is the most common reason root cause analyses fail. The Juran Institute warns: without a clear problem statement, the fishbone diagram becomes “unnecessarily large, complex, and difficult to use” [5] — this applies to every RCA method.
Checklist for a good problem definition:
- Measurable: “First-contact resolution rate in customer service is 42% versus the target of 70%”
- Specific: Not “Service is bad” but which service, which aspect, which metric
- Time-bound: Since when has the problem existed? Trend or sudden change?
- Symptom, not cause: The problem definition describes the observable symptom, not the suspected cause
| Poor | Better |
|---|---|
| “Customers are unhappy” | “NPS for auto claims dropped from 32 to 18 (Q2 vs. Q1)” |
| “The process doesn’t work” | “18% of claims require phone callbacks for missing information, extending processing time by an average of 4 days” |
| “Error rate is too high” | “Cancellation rate for new policies is 23% within the first 90 days” |
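A structured template can make the checklist hard to skip: a statement cannot be written without naming a scope, a metric, a current value, a target, and a period. A minimal sketch, assuming a hypothetical `ProblemStatement` dataclass; the period value is invented for illustration. The symptom-not-cause rule cannot be enforced by code and still needs a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    """Hypothetical template covering the measurable, specific and time-bound criteria."""
    scope: str     # which service or aspect is affected
    metric: str    # what is measured
    actual: float  # current value
    target: float  # agreed target value
    unit: str
    period: str    # since when, or which comparison period

    def render(self) -> str:
        return (f"{self.metric} in {self.scope} is {self.actual:g}{self.unit} "
                f"versus the target of {self.target:g}{self.unit} ({self.period})")

    def gap(self) -> float:
        """Distance between actual and target, in the metric's unit."""
        return self.actual - self.target


stmt = ProblemStatement(
    scope="customer service",
    metric="First-contact resolution rate",
    actual=42,
    target=70,
    unit="%",
    period="since January",
)
print(stmt.render())
print(stmt.gap())
```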
Step 2: Collect Data (1–5 days)
Before generating hypotheses about causes, you need facts. Root cause analysis is based on data, not opinions.
What to collect:
- Quantitative data: metrics, error logs, process times, samples
- Qualitative data: interviews with those affected, on-site process observation (a Gemba Walk is ideal for this), customer feedback
- Context data: What changed recently? New software, new process, staff turnover?
Important: Collect data before applying the analysis method. Many teams jump straight to the Ishikawa workshop without first gathering data — this produces opinions, not insights.
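Quantitative data from this step should be condensed into a baseline that Step 6 can later compare against. A minimal sketch, assuming claim records that carry processing time in business days and a callback count; the `ClaimRecord` fields and the five sample records are hypothetical, and a real baseline would use far more records. It relies only on Python's standard `statistics` module.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class ClaimRecord:
    claim_id: str
    processing_days: int  # business days from receipt to settlement
    callbacks: int        # callbacks needed because of missing information


def baseline(records: list[ClaimRecord]) -> dict[str, float]:
    """Summarize the current state of the process before any countermeasure."""
    if not records:
        raise ValueError("baseline needs at least one record")
    days = [r.processing_days for r in records]
    with_callback = sum(1 for r in records if r.callbacks > 0)
    return {
        "mean_days": mean(days),
        "median_days": median(days),  # robust against a few extreme claims
        "callback_rate": with_callback / len(records),
    }


# Toy sample of five hypothetical claims
sample = [
    ClaimRecord("C-001", 10, 0),
    ClaimRecord("C-002", 14, 1),
    ClaimRecord("C-003", 21, 2),
    ClaimRecord("C-004", 7, 0),
    ClaimRecord("C-005", 18, 1),
]
print(baseline(sample))
```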
Step 3: Identify Causes (60–180 minutes)
This is where you apply the chosen method (see decision tree above). Regardless of the method, three principles apply:
- Separate collection from evaluation. First capture all possible causes, then prioritize — not simultaneously.
- Ask “Why?” until you reach the root cause. Don’t stop at the first answer. Keep asking as long as the next, deeper answer still lies within your sphere of influence; the root cause is the deepest cause you can still remedy.
- Include cross-functional perspectives. Different departments see different causes. A pure IT team will miss process causes; a pure process team will underestimate technical causes.
Step 4: Verify Causes (1–2 weeks)
This step is missing from most guides — and is the reason many root cause analyses prove ineffective. The results of an Ishikawa workshop or 5 Whys session “reflect the team’s assumptions, not confirmed facts” [6].
Verification methods:
- Data analysis: Do available data confirm the hypothesis?
- Sample test: Can you reproduce the problem under controlled conditions?
- Counter-test: If you eliminate the suspected cause — does the problem disappear?
For each prioritized cause: assign a responsible person, define the verification method, set a deadline.
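For the data-analysis and counter-test checks, a comparison of error rates is often enough. The sketch below is a minimal illustration, assuming you have counts of incomplete claims before and after a pilot that removes the suspected cause; the function name and the pilot numbers are hypothetical. It applies a standard pooled two-proportion z-test using only Python's `math` module. The normal approximation needs reasonably large samples, and a before/after comparison is not a randomized experiment, so seasonal effects or other simultaneous changes can still explain a difference.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if min(n1, n2) <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("rate is 0% or 100% in both groups; test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counter-test: 220 of 1,000 claims incomplete with the old PDF form,
# 30 of 300 incomplete during a pilot with mandatory field validation
z, p = two_proportion_z_test(220, 1000, 30, 300)
print(f"z = {z:.2f}, p = {p:.2g}")
```

A small p-value supports the hypothesis that the suspected cause matters; it does not prove that it is the only cause.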
Step 5: Develop and Implement Countermeasures
Only now — after verification — do you develop solutions. Not before. The temptation to discuss solutions during Step 3 is strong. Resist it — solutions for unverified causes are waste.
Countermeasure hierarchy (from most to least effective):
| Strength | Type | Example | Effectiveness |
|---|---|---|---|
| Strong | Systemic elimination | Structurally remove the cause (redesign the process) | High — cause no longer exists |
| Medium | Technical control | Automated validation, plausibility check | Medium — error is detected and prevented |
| Weak | Administrative control | Training, work instruction, reminder | Low — depends on human discipline |
Peerally et al. (2017) observed that organizations preferentially choose “administrative and ‘weaker’ solutions such as reminders” rather than addressing deeper causes [7]. That is the more convenient but less effective path.
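The countermeasure hierarchy lends itself to a simple plan review: rank measures from strong to weak and flag plans that rely only on administrative controls, the pattern Peerally et al. describe. The class and function names are hypothetical; the three strength levels map directly onto the three rows of the hierarchy table.

```python
from dataclasses import dataclass
from enum import IntEnum


class Strength(IntEnum):
    """Countermeasure strength, mirroring the hierarchy table."""
    WEAK = 1    # administrative control: training, work instruction, reminder
    MEDIUM = 2  # technical control: automated validation, plausibility check
    STRONG = 3  # systemic elimination: the cause is structurally removed


@dataclass
class Countermeasure:
    description: str
    strength: Strength


def review_plan(plan: list[Countermeasure]) -> tuple[list[Countermeasure], bool]:
    """Sort measures strongest first and flag plans made only of weak controls.

    An empty plan is not flagged here; treat it as a separate problem.
    """
    ranked = sorted(plan, key=lambda m: m.strength, reverse=True)
    only_administrative = bool(plan) and all(m.strength is Strength.WEAK for m in plan)
    return ranked, only_administrative


plan = [
    Countermeasure("Send reminder email to claims adjusters", Strength.WEAK),
    Countermeasure("Add plausibility check to the intake form", Strength.MEDIUM),
]
ranked, needs_stronger_measure = review_plan(plan)
for m in ranked:
    print(m.strength.name, m.description)
print("Only administrative controls:", needs_stronger_measure)
```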
Step 6: Verify Effectiveness and Standardize
After implementation: measure whether the countermeasure works. Compare metrics with the baseline from Step 2. If it works — standardize it. If not — start a new analysis cycle with the insights from the first.
This step corresponds to the Check/Act phase in the PDCA cycle. Without effectiveness verification, root cause analysis is a one-time event, not an improvement process.
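The comparison with the baseline can be expressed as a small decision helper. This is a sketch under one stated assumption: a countermeasure counts as working once the KPI reaches its target. Teams may instead require a minimum relative improvement or a statistically verified change. The function name and the follow-up values are hypothetical; the baseline (14 business days) and SLA target (7 days) come from the claims processing example in this article.

```python
def effectiveness_decision(baseline: float, after: float, target: float,
                           lower_is_better: bool = True) -> tuple[float, str]:
    """Return (relative improvement, next step) for a KPI after implementation."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    if lower_is_better:
        improvement = (baseline - after) / baseline
        target_met = after <= target
    else:
        improvement = (after - baseline) / baseline
        target_met = after >= target
    return improvement, ("standardize" if target_met else "start new analysis cycle")


# Hypothetical follow-up: processing time fell from 14 to 8 business days (SLA: 7)
improvement, next_step = effectiveness_decision(baseline=14, after=8, target=7)
print(f"{improvement:.0%} improvement -> {next_step}")
```

In this case the countermeasure helped but did not reach the SLA, so the insights feed a new analysis cycle rather than standardization.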
Example: Root Cause Analysis in Claims Processing
Problem: “Average processing time for auto claims is 14 business days. The SLA target is 7 days. 22% of claims require at least one callback due to missing information.”
A cross-functional team from claims processing, customer service, and IT conducts a combined root cause analysis — Ishikawa for breadth, 5 Whys for depth:
Ishikawa results (7M categories):
| Category | Identified Causes |
|---|---|
| People | New claims adjusters don’t know all required fields; high team turnover |
| Method | No standardized callback process; callbacks via phone instead of email |
| Machine | Claims form has no mandatory field validation; upload limited to 1 file |
| Material | Claims often include photos in insufficient resolution |
| Environment | Customers don’t know which documents to include |
| Management | No KPI tracking at individual adjuster level |
| Measurement | Processing time measured from receipt, not from incident date |
5 Whys for the top-ranked cause (selected by dot voting):
Prioritized cause: “22% of claims have incomplete information”
| Why? | Answer |
|---|---|
| Why is the information incomplete? | The claims form has no mandatory field validation |
| Why does the form have no validation? | It was created as a PDF 5 years ago and never digitized |
| Why was it never digitized? | IT prioritization focused on the new CRM system |
| Why wasn’t the form prioritized in parallel? | No process existed for systematically reporting form issues to IT |
| Root cause: | No feedback loop between claims adjusters and IT for form deficiencies |
Countermeasures: a digital claims form with mandatory field validation (medium: technical control, which removes the direct cause) plus a standing monthly feedback meeting between claims processing and IT for form and tool issues, which creates the missing feedback loop and so targets the root cause itself.
Note: This example is illustrative, constructed to demonstrate the method in a service context. The figures are based on typical industry values.
The 6 Methods in Detail
5 Whys
The simplest and most commonly used method. You repeatedly ask “Why?” until you arrive at a root cause you can fix. The number 5 is a rule of thumb: depending on the problem, two questions may suffice or ten may be needed.
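Because the number of questions is not fixed, a 5 Whys protocol can be read as following recorded answers until none remains. A minimal sketch, assuming the answers have been written down as a mapping from each cause to the answer to “Why?”; the `five_whys` name is hypothetical, and the entries paraphrase the claims processing example. In a real session the stopping rule is a judgment call (a cause you can actually fix), not the absence of a recorded answer. The loop also rejects circular reasoning, where an answer leads back to an earlier cause.

```python
def five_whys(start: str, because: dict[str, str]) -> list[str]:
    """Follow 'Why?' from the starting problem until no further answer is recorded."""
    chain = [start]
    current = start
    while current in because:
        current = because[current]
        if current in chain:
            raise ValueError(f"circular reasoning: '{current}' already appears in the chain")
        chain.append(current)
    return chain


# Answers paraphrased from the claims processing example
because = {
    "Claim information is incomplete": "Claims form has no mandatory field validation",
    "Claims form has no mandatory field validation": "Form was created as a PDF 5 years ago and never digitized",
    "Form was created as a PDF 5 years ago and never digitized": "IT prioritization focused on the new CRM system",
    "IT prioritization focused on the new CRM system": "No process for reporting form issues to IT",
}
chain = five_whys("Claim information is incomplete", because)
for depth, cause in enumerate(chain):
    print(f"{depth}: {cause}")
```

Here four questions were enough to reach the root cause, which illustrates why the 5 is a rule of thumb rather than a quota.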
Strengths: Fast, no preparation, no materials. Ideal for simple, linear problems.
Weaknesses: Fails with multiple, interacting causes. Different people arrive at different answers for the same question — the result is subjective and depends on participants’ knowledge [2].
Ishikawa Diagram (Fishbone Diagram)
The Ishikawa Diagram organizes potential causes into categories (People, Machine, Material, Method, Environment, Measurement, Management) and displays them visually as a fishbone pattern. It was first used in 1943 by Kaoru Ishikawa at Kawasaki Steel Works [3].
Strengths: Forces systematic examination of all cause categories. Ideal for cross-functional teams. Visually traceable.
Weaknesses: Doesn’t capture interactions between causes. Produces hypotheses, not confirmed causes. For details, see our comprehensive Ishikawa Diagram guide.
Fault Tree Analysis (FTA)
Fault Tree Analysis starts with the undesired event (top event) and works deductively backward: What combination of sub-events must occur for the top event to happen? Connections are represented using Boolean operators (AND/OR). The method was developed in the 1960s at Bell Telephone Laboratories for the US Air Force [4].
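The AND/OR logic can be evaluated bottom-up to obtain the probability of the top event. The sketch assumes that all basic events are independent and that each appears only once in the tree; trees with shared events need minimal cut set analysis, which this sketch does not perform. The class names, event names, and probabilities are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BasicEvent:
    name: str
    probability: float  # assumed independent of every other basic event


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: list = field(default_factory=list)


def top_event_probability(node) -> float:
    """Evaluate the tree bottom-up (independent events, each used only once)."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [top_event_probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must occur: multiply their probabilities
        return math.prod(probs)
    if node.kind == "OR":
        # At least one input occurs: 1 minus the probability that none occurs
        return 1 - math.prod(1 - p for p in probs)
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical top event: "online claims portal unavailable"
tree = Gate("OR", [
    Gate("AND", [
        BasicEvent("Primary server fails", 0.01),
        BasicEvent("Failover fails", 0.1),
    ]),
    BasicEvent("Faulty configuration deployed", 0.001),
])
print(f"{top_event_probability(tree):.6f}")  # prints 0.001999
```

The result shows a typical FTA insight: the redundant server pair (0.001) contributes about as much risk as the single configuration error, which points to where an extra barrier would pay off.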
Strengths: Enables calculation of quantitative failure probabilities. Shows logical dependencies between causes. Standard in safety-critical industries (aviation, nuclear, chemical).
Weaknesses: Requires statistical expertise and complete system knowledge. Too resource-intensive for most service quality problems.
Barrier Analysis
Barrier Analysis is based on the principle that every process has protective barriers — physical, administrative, or procedural controls designed to prevent errors. When a problem occurs, it means one or more barriers failed, were bypassed, or were inadequate [4].
Strengths: Focuses directly on the question “Why didn’t our safeguard work?” Particularly effective for safety-critical incidents and near-misses.
Weaknesses: Assumes barriers are defined in the first place. Doesn’t capture systemic causes beyond the barriers.
Events and Causal Factors Analysis
This method creates a timeline of events leading to the incident and identifies causal factors and conditions for each event. It is frequently used in combination with Barrier Analysis or 5 Whys [4].
Strengths: Shows chronological sequence. Suited for complex incidents involving many actors and systems.
Weaknesses: Resource-intensive to create. Can produce complicated diagrams when many factors interact.
Change Analysis
Change Analysis compares the “before” state (when the problem didn’t exist) with the “after” state (when the problem appeared) and systematically examines what changed [4].
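At its core, Change Analysis is a systematic before/after comparison. The sketch treats each state as a dictionary of process attributes and lists every attribute that differs; the function name, attribute names, and values are hypothetical. The code only produces candidate changes. Deciding which change plausibly triggered the problem remains a team judgment that should then be verified as in Step 4.

```python
ABSENT = "<absent>"  # marker for attributes that exist in only one state


def change_analysis(before: dict, after: dict) -> dict:
    """List every attribute whose value differs between the 'before' and 'after' state."""
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old = before.get(key, ABSENT)
        new = after.get(key, ABSENT)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical process states before and after processing times jumped
before = {"claims_form": "PDF v3", "upload_limit_files": 1, "adjusters": 12}
after = {"claims_form": "PDF v4", "upload_limit_files": 1, "adjusters": 9,
         "intake_channel": "web portal"}

for attribute, (old, new) in change_analysis(before, after).items():
    print(f"{attribute}: {old} -> {new}")
```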
Strengths: Fast and focused when a recent change is suspected as the trigger. Intuitive method requiring no special training.
Weaknesses: Only effective when a change can be identified as the trigger. Unsuitable for gradual deterioration without a clear turning point.
When Root Cause Analysis Does NOT Work
Root cause analysis is a useful tool — but not a universal one. The academic literature documents systematic weaknesses you should understand before starting your next analysis.
Problem 1: The Illusion of a Single Root Cause
The name “root cause” implies that every problem has one single root cause. In reality, multiple factors almost always interact. Peerally et al. (2017) criticize that the terminology promotes “a simple linear narrative that displaces more complex — and potentially more fruitful — accounts of multiple and interacting contributions” [7].
Consequence: Accept that most problems have multiple root causes. Speak of “root causes” (plural), not “the root cause” (singular). Prioritize causes by leverage rather than searching for the one “true” cause.
Problem 2: Hindsight Bias
Root cause analysis is fundamentally retrospective — it analyzes what happened after the outcome is known. This distorts the analysis: when people already know what happened, they tend to judge decisions as “obviously wrong” that were ambiguous at the time they were made [7].
Consequence: For each identified cause, ask yourself: “Would a reasonable person with the information available at the time have acted differently?” If not, it may not be a cause but rather a result of systemic conditions.
Problem 3: The Implementation Gap
The greatest weakness of root cause analysis lies not in the analysis but in the implementation. Martin-Delgado et al. (2020) analyzed 21 studies on RCA effectiveness in a systematic review: only 9% of these studies demonstrated measurable improvement in patient safety [8]. According to the review, RCA is useful for identifying causes but “not for implementing effective measures to prevent their recurrence” [8].
Consequence: The bottleneck is not the brainstorming — it is the follow-through. Plan effectiveness verification (Step 6) as a fixed component, not an optional add-on.
Problem 4: Weak Countermeasures
Organizations preferentially choose administrative controls (training, reminders, work instructions) rather than systemic solutions [7]. This is because administrative measures are quick to implement and cheap — but they depend on human discipline and fail as soon as attention wanes.
Consequence: Use the countermeasure hierarchy from Step 5. For each measure, ask: “Would this measure also work if nobody remembers it?” If not, seek a stronger alternative.
Root Cause Analysis and ISO 9001
ISO 9001:2015, Section 10.2, requires certified organizations to conduct systematic cause analysis for nonconformities [9]. The standard does not prescribe which method to use — but it does require:
- Evaluating the nonconformity, including determining its causes
- Ensuring the nonconformity does not recur or occur elsewhere
- Reviewing the effectiveness of corrective actions
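Root cause analysis is thus not an optional best practice for ISO-certified organizations but a normative requirement. Applying and documenting the 6-step process described in this guide covers the cause analysis and effectiveness review that Section 10.2 calls for; the clause's other requirements, such as reacting to the nonconformity and retaining documented information as evidence, still apply.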
Download Template
We provide a free root cause analysis template you can use directly in your next analysis process. The template includes:
- The method selection decision framework
- A 5 Whys template with documentation fields
- An Ishikawa template with 7M categories and guiding questions
- A countermeasure matrix with effectiveness rating
- The 6-step checklist
Frequently Asked Questions
What is root cause analysis?
Root cause analysis (RCA) is a systematic process that identifies the fundamental causes of problems rather than just treating their symptoms. It encompasses various methods — from the simple 5 Whys to the complex Fault Tree Analysis — and is used in quality management to permanently eliminate recurring errors.
What methods are used in root cause analysis?
The six most important methods are: (1) 5 Whys for simple, linear cause chains, (2) Ishikawa Diagram for problems with multiple cause categories, (3) Fault Tree Analysis for safety-critical systems requiring calculated probabilities, (4) Barrier Analysis for incidents where protective barriers failed, (5) Events and Causal Factors Analysis for complex incidents with chronological sequences, (6) Change Analysis for problems following recent changes.
How do you conduct a root cause analysis?
In six steps: (1) Define the problem precisely and measurably, (2) Collect data (quantitative and qualitative), (3) Identify causes using the appropriate method, (4) Verify causes through data analysis, sample tests, or counter-tests, (5) Develop and implement countermeasures (systemic before administrative), (6) Verify effectiveness and standardize if successful. The entire process takes between one week and several months, depending on problem complexity.
What is the difference between Ishikawa Diagram and 5 Whys?
The Ishikawa Diagram works horizontally across cause categories: it collects possible causes from different areas (People, Method, Machine, etc.). The 5 Whys works vertically into depth: it traces a single cause chain step by step to the root cause. The two methods complement each other well: Ishikawa for breadth, 5 Whys for depth.
What is the difference between root cause analysis and FMEA?
Root cause analysis is reactive — it analyzes problems that have already occurred to find their root causes. FMEA (Failure Mode and Effects Analysis) is proactive — it identifies potential failures before they occur and evaluates their risk by severity, occurrence frequency, and detectability.
How do you document a root cause analysis?
Documentation should include: (1) Problem definition with a measurable statement, (2) Collected data and their sources, (3) Applied method and its results (Ishikawa diagram, 5 Whys protocol, etc.), (4) Verified root causes with supporting evidence, (5) Countermeasures with responsible persons and deadlines, (6) Effectiveness verification after implementation. For ISO 9001 audits, traceable documentation of all steps is required.
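The six documentation elements can double as a completeness check before an audit. A minimal sketch, assuming a hypothetical `RcaRecord` structure with one field per element; it only detects empty sections and cannot judge whether the content would satisfy an auditor.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RcaRecord:
    """Hypothetical record with one field per documentation element."""
    problem_definition: str = ""
    data_and_sources: list[str] = field(default_factory=list)
    method_results: str = ""
    verified_root_causes: list[str] = field(default_factory=list)
    # Each countermeasure as (measure, responsible person, deadline)
    countermeasures: list[tuple[str, str, str]] = field(default_factory=list)
    effectiveness_check: str = ""


def missing_sections(record: RcaRecord) -> list[str]:
    """Return the names of documentation elements that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = RcaRecord(
    problem_definition="22% of auto claims require at least one callback",
    method_results="Ishikawa diagram (7M) and 5 Whys protocol",
)
print(missing_sections(record))
```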
Related Methods
- Ishikawa Diagram: The most important single tool in root cause analysis — for structured identification of potential causes across categories
- PDCA Cycle: The improvement framework that embeds root cause analysis — RCA finds the cause, PDCA manages the entire improvement cycle
- Gemba Walk: For data collection in Step 2 — observe the actual process on-site before generating hypotheses
- Kano Model: When you want to prioritize customer needs rather than analyze problems — a complementary approach to root cause analysis
Research Methodology
This article synthesizes findings from two systematic reviews (Martin-Delgado et al. 2020, N=21 studies; Percarpio et al. 2008, N=73 articles), the BMJ Quality & Safety critique by Peerally et al. (2017), the DOE Root Cause Analysis Guidance Document (DOE-NE-STD-1004-92), Ishikawa’s original work, and the analysis of 10 German-language specialist publications on root cause analysis. Sources were selected based on methodological rigor, practical relevance, and recency.
Limitations: The academic literature on the effectiveness of root cause analysis comes predominantly from healthcare. Empirical studies on application in service innovation and insurance are limited. The practical example (claims processing) is illustratively constructed, not a documented case study.
Disclosure
SI Labs provides consulting services in the area of service innovation and uses root cause analysis as a tool in the analysis phase of the Integrated Service Development Process (iSEP). This practical experience informs the assessment of methods in this article. Readers should be aware of the potential perspective bias.
Bibliography
[1] Rooney, James J., and Lee N. Vanden Heuvel. “Root Cause Analysis for Beginners.” Quality Progress 37, no. 7 (July 2004): 45-56. https://asq.org/quality-progress/articles/root-cause-analysis-for-beginners [Practitioner Article | ASQ | Citations: 800+ | Quality: 80/100]
[2] Serrat, Olivier. “The Five Whys Technique.” In Knowledge Solutions, 307-310. Singapore: Springer, 2017. DOI: 10.1007/978-981-10-0983-9_32 [Book Chapter | Methodological | Quality: 70/100]
[3] Ishikawa, Kaoru. Guide to Quality Control. Tokyo: Asian Productivity Organization, 1968. [Foundational Work | Historical | Citations: 10,000+ | Quality: 95/100]
[4] US Department of Energy. DOE-NE-STD-1004-92: Root Cause Analysis Guidance Document. Washington, DC, 1992. https://www.standards.doe.gov/standards-documents/1000/1004-std-1992 [Government Standard | Comprehensive | Quality: 85/100]
[5] Juran Institute. “The Ultimate Guide to Cause and Effect Diagrams.” https://www.juran.com/blog/the-ultimate-guide-to-cause-and-effect-diagrams/ [Industry Authority | Quality: 85/100]
[6] Qualitaetsmanagement.me. “Ishikawa-Diagramm.” https://qualitaetsmanagement.me/kvp-einfuehren/ishikawa-diagramm/ [Practitioner | Quality: 70/100]
[7] Peerally, Mohammad Farhad, Susan Carr, Justin Waring, and Mary Dixon-Woods. “The problem with root cause analysis.” BMJ Quality & Safety 26, no. 5 (2017): 417-422. DOI: 10.1136/bmjqs-2016-005511 [Critical Review | BMJ | Citations: 200+ | Quality: 90/100]
[8] Martin-Delgado, Juan, Jesus Maria Aranaz-Andres, Jose Joaquin Mira, et al. “How Much of Root Cause Analysis Translates into Improved Patient Safety: A Systematic Review.” Medical Principles and Practice 29, no. 6 (2020): 524-531. DOI: 10.1159/000508677 [Systematic Review | N=21 studies | Citations: 100+ | Quality: 85/100]
[9] ISO 9001:2015. Quality management systems — Requirements. International Organization for Standardization, 2015. Section 10.2 (Nonconformity and corrective action). [International Standard | Authoritative Source | Quality: 95/100]
[10] Percarpio, Katherine B., Vicki S. Watts, and William B. Weeks. “The Effectiveness of Root Cause Analysis: What Does the Literature Tell Us?” Joint Commission Journal on Quality and Patient Safety 34, no. 7 (2008): 391-398. DOI: 10.1016/s1553-7250(08)34049-5 [Literature Review | N=73 articles | Citations: 200+ | Quality: 80/100]