Scoring Model: Guide, Practical Example & Method Comparison
The scoring model step by step: weighted evaluation with a practical insurance example, sensitivity analysis, and systematic method comparison.
The scoring model (also called scoring method, point-rating method, or weighted scoring) is a structured evaluation tool that systematically compares multiple alternatives against defined and weighted criteria. Each criterion receives a point score multiplied by its weight. The sum of weighted points — the score — makes transparent which alternative performs best under the chosen criteria [1].
What distinguishes the scoring model from a simple ranking: it forces an explicit separation between criteria selection, weighting, and rating. This three-way split reveals where team disagreements actually lie — on “What matters at all?” (criteria selection), “How important is it?” (weighting), or “How well does Alternative X fulfill the criterion?” (rating). Without this separation, these judgments blur together, and discussions go in circles.
Search for “scoring model” in the decision-making context, and you will find dozens of results with vacation-destination or smartphone-purchase examples. None demonstrates the method in a service process. None explains which cognitive biases systematically distort results — and how to concretely prevent them. None walks through a sensitivity analysis showing how robust your result actually is. And none systematically compares the scoring model with AHP, the Pugh matrix, RICE, or Kano.
This guide closes those gaps.
Definition: What is a scoring model?
A scoring model is a quantitative evaluation method that makes qualitative and quantitative criteria comparable on a common scale. The core mechanism:
- Define criteria — which dimensions are decision-relevant?
- Weight criteria — which dimensions are more important than others?
- Rate alternatives — how well does each alternative fulfill each criterion?
- Calculate scores — weighted point sum per alternative
- Check robustness — how stable is the result under changed assumptions?
Formula:
Score(Alternative) = Sum of all criteria (Rating_i x Weight_i)
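The formula can be sketched in a few lines of Python. The criterion names and numbers here are illustrative, not taken from any real evaluation:

```python
# Minimal sketch of the weighted sum model (WSM).
def weighted_score(ratings, weights):
    """Return the weighted point sum for one alternative.

    ratings and weights are dicts keyed by criterion name;
    weights are assumed to be normalized (sum = 1.0).
    """
    return sum(ratings[c] * weights[c] for c in weights)

# Illustrative criteria, weights, and 1-5 ratings.
weights = {"customer_impact": 0.30, "feasibility": 0.25, "economics": 0.45}
ratings = {"customer_impact": 4, "feasibility": 5, "economics": 3}

# Weighted sum: 1.20 + 1.25 + 1.35 = 3.80
print(round(weighted_score(ratings, weights), 2))  # 3.8
```

Note how the compensatory character mentioned above is visible in the code: a low rating on one criterion simply shrinks one summand and can be offset by any other summand.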
The scoring model belongs to the family of Multi-Criteria Decision Analysis (MCDA) — specifically, the Weighted Sum Model (WSM), the oldest and most widely used MCDA method [5]. Its strength lies in simplicity: everyone on the team understands how the result was derived. Its weakness lies in that same simplicity: it assumes that a low score on one criterion can be compensated by a high score on another — which is not always appropriate [9].
Scoring model, utility analysis, decision matrix — what is the difference?
These terms are frequently used interchangeably in practice but have different origins and nuances:
| Term | Origin | Distinguishing feature |
|---|---|---|
| Scoring model | English-language business literature | Most general term — any point-rating method |
| Utility analysis (Nutzwertanalyse) | Zangemeister, 1976 [1] | Formalized German methodology with defined process, weights in percentages (sum = 100%), documentation requirements |
| Decision matrix | Practitioner term | Tabular format — rows = criteria, columns = alternatives. May be weighted or unweighted |
| Pugh matrix | Pugh, 1981 [2] | Special form: relative rating against a reference concept (+/0/-), no absolute points |
| AHP | Saaty, 1980 [3] | Pairwise comparisons with consistency check — mathematically rigorous but more demanding |
The critical distinction: Zangemeister’s utility analysis requires that weights sum to 100% and produce a total utility value. A general scoring model can also work with raw points whose sum has no normalized meaning [1]. In practice, the difference is minor — this guide covers the weighted scoring model with normalized weights, because it has the broadest application range.
From Zangemeister to Saaty: The academic roots
Christof Zangemeister and utility analysis (1970/1976)
Christof Zangemeister, lecturer in systems engineering at TU Berlin, published Nutzwertanalyse in der Systemtechnik in 1970 — the first German-language methodology for systematically evaluating project alternatives against non-monetary criteria [1]. The fourth edition of 1976 became the standard reference. Zangemeister’s contribution was the formalization of a reproducible process: from objective-setting through criteria operationalization to sensitivity analysis. He elevated multi-dimensional evaluation from informal brainstorming to the rigor of an engineering methodology.
Zangemeister’s approach quickly spread through public planning, construction, and transport planning — everywhere decisions could not rely solely on cost-benefit analysis because important criteria (environmental impact, acceptance, safety) are not measurable in monetary terms [1].
Stuart Pugh and relative evaluation (1981)
Stuart Pugh (1929–1993), professor of design at the University of Strathclyde, developed an alternative to absolute scoring in 1981: the Pugh matrix [2]. Instead of rating each alternative in isolation, the team compared all alternatives against a reference concept (datum): better (+), same (0), or worse (-). Pugh’s innovation lay in cognitive relief: relative judgments (“Is A better than B on criterion X?”) are more reliable than absolute judgments (“How good is A on criterion X on a scale of 1–5?”) because they require less information processing.
Thomas L. Saaty and the Analytic Hierarchy Process (1980)
Thomas L. Saaty formalized the Analytic Hierarchy Process (AHP) in 1980 — a method that derives weights through pairwise comparisons and mathematically tests their consistency [3]. When a team says “customer impact is three times more important than cost” and simultaneously “cost is twice as important as feasibility,” AHP checks whether these statements are contradiction-free. If inconsistent, AHP demands correction. This makes AHP more precise than simple percentage allocation — but also significantly more labor-intensive. In practice, teams use AHP for strategically important decisions while the simple scoring model suffices for operational decisions.
When is the scoring model the right tool?
The scoring model is most valuable when you need to make a selection decision between clearly defined alternatives with multiple evaluation dimensions — and the decision must be traceably documented.
Use the scoring model when:
- You want to compare 3–7 alternatives (for 2, a pros-and-cons list suffices; beyond 7, evaluation becomes unwieldy)
- You need to consider qualitative and quantitative criteria simultaneously — e.g., cost (quantitative) and customer experience (qualitative)
- The decision is made in a team and different stakeholders have different priorities
- You need to justify the decision to management, clients, or oversight bodies
- You want to systematically evaluate concepts generated through a creative process — for example with the morphological box or in the service design process
Use a different tool when:
| Situation | Better alternative | Why |
|---|---|---|
| You want to understand how features affect customer satisfaction | Kano model | Kano classifies by satisfaction asymmetry — the scoring model only rates “how good” |
| You need a quick sprint prioritization | MoSCoW | Faster, no numerical effort, workshop format |
| You want to analyze strategic strengths and weaknesses | SWOT analysis | SWOT analyzes the starting position; the scoring model selects between options |
| You need mathematically consistent weights | AHP (Saaty) | Pairwise comparisons with consistency check, but more demanding |
| You want a quick pre-selection with relative evaluation | Pugh matrix | Relative rating (+/0/-) instead of absolute point-scoring |
| You want to iteratively improve a single concept | PDCA cycle | PDCA improves processes; the scoring model selects between alternatives |
Comparison: Scoring model vs. AHP vs. Pugh matrix vs. RICE vs. Kano
| Dimension | Scoring model | AHP | Pugh matrix | RICE | Kano |
|---|---|---|---|---|---|
| Focus | Selecting the best alternative from a defined set | Hierarchical decomposition of complex decisions | Quick concept selection against reference | Quantitative backlog prioritization | Satisfaction impact of individual features |
| Rating type | Absolute (points x weight) | Pairwise comparisons with consistency check | Relative to datum (+/0/-) | Formula: (R x I x C) / E | Customer survey (functional/dysfunctional) |
| Complexity | Low to medium | High — requires n x (n-1)/2 comparisons | Low | Medium — requires metrics | Medium — requires questionnaire design |
| Best for | One-time concept selection with 5–8 criteria | Strategic decisions with many criteria | Early concept phase with rough alternatives | Product backlog prioritization | Understanding WHY features matter |
| Greatest weakness | Weighting is subjective, compensatory | Time investment, cognitive load | No numerical total score | Confidence is often estimated | Requires customer survey |
| Origin | Zangemeister (1970/1976) [1] | Saaty (1980) [3] | Pugh (1981) [2] | Sean McBride / Intercom (2016) | Kano (1984) |
Step by step: Creating a scoring model
Time frame: 90–120 minutes as a team. Allow 20 minutes for knockout criteria and criteria definition, 20 minutes for weighting (ideally in a separate session), 30 minutes for rating, 15 minutes for calculation, 15 minutes for sensitivity check.
Step 1: Check knockout criteria first
Before scoring begins, define knockout criteria (exclusion criteria). A knockout criterion is a minimum requirement that every alternative must meet — regardless of how well it scores on other criteria. Alternatives that fail a knockout criterion are eliminated before scoring.
Why separate: The scoring model is a compensatory method — a low score on one criterion can be offset by high scores on others [9]. This is unacceptable for minimum requirements (e.g., regulatory compliance, technical prerequisites). Knockout criteria must be handled outside the scoring logic.
Example: For an insurer, “meets BaFin data security requirements” could be a knockout criterion. A service concept that fails this requirement is not scored — no matter how innovative it is.
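As a sketch, the knockout check is a plain filter applied before any points are awarded. The alternative names and pass/fail flags below are hypothetical:

```python
# Knockout criteria are hard gates, not scored criteria:
# an alternative failing any of them never enters the scoring phase.
alternatives = {
    "A: Self-service portal": {"bafin_compliant": True, "core_compatible": True},
    "B: Hybrid model":        {"bafin_compliant": True, "core_compatible": True},
    "X: External SaaS tool":  {"bafin_compliant": False, "core_compatible": True},
}

knockout_criteria = ["bafin_compliant", "core_compatible"]

# Only alternatives passing every knockout criterion are scored at all.
shortlist = [
    name for name, checks in alternatives.items()
    if all(checks[c] for c in knockout_criteria)
]
print(shortlist)  # X is eliminated before any points are awarded
```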
Step 2: Define evaluation criteria
List the criteria against which remaining alternatives will be evaluated. Good criteria are:
- Independent of each other — two criteria measuring the same thing (e.g., “costs” and “budget requirement”) distort results through double-counting. Montibeller and von Winterfeldt (2015) call this the splitting bias: when one aspect is split into multiple criteria, it receives disproportionate total weight without the team being aware [7].
- Measurable or ratable — the team must be able to provide an assessment for each alternative
- Differentiating — if all alternatives score equally on a criterion, it contributes nothing to the decision
- Complete — all decision-relevant aspects must be covered
Typical criteria for service decisions:
| Category | Example criteria |
|---|---|
| Customer impact | Customer satisfaction, usability, time-to-value |
| Economics | Implementation costs, ongoing costs, payback period |
| Feasibility | Technical complexity, resource availability, timeline |
| Strategic fit | Brand fit, scalability, differentiation potential |
| Risk | Implementation risk, regulatory requirements, dependencies |
Recommended number: 5–8 criteria. Fewer than 5 is too coarse and leaves important dimensions unaddressed. More than 10 exceeds the team’s rating capacity and creates false precision — the illusion that more criteria lead to better decisions [7].
Step 3: Weight the criteria
Weighting is the most critical step — and the most frequently underestimated. Weber and Borcherding (1993) showed empirically that different weighting methods produce different weights, even when the same people express the same preferences [8]. The elicitation method itself is not neutral.
Method 1: Percentage allocation (simple)
Distribute 100 percentage points across criteria. Each participant distributes individually, then average. Advantage: intuitive. Disadvantage: tendency toward equal distribution when participants hesitate to commit.
Method 2: Swing weighting (more thorough)
Imagine all alternatives have the worst possible value on every criterion. You may then set ONE criterion to its best possible value. Which do you choose? That is the most important criterion (100 points). Then: which comes second? How much is this “swing” worth relative to the first? Weber and Borcherding (1993) showed that swing weighting produces the most consistent results [8].
Method 3: Ranking (quick)
Sort criteria by importance. Assign the highest value to the most important criterion (e.g., 5 for 5 criteria), the next 4, and so on. Normalize to 100%. Advantage: fast. Disadvantage: no differentiation of gaps — the difference between rank 1 and rank 2 is treated as equal to that between rank 4 and rank 5.
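Method 3 can be sketched as follows: rank positions become raw points, which are then normalized so they sum to 100%. The criterion names are illustrative:

```python
# Ranking method (rank-sum weights): most important criterion first.
ranked = ["customer_impact", "feasibility", "economics", "strategic_fit", "risk"]

n = len(ranked)
raw = {criterion: n - i for i, criterion in enumerate(ranked)}  # 5, 4, 3, 2, 1
total = sum(raw.values())  # 15
weights = {c: raw[c] / total for c in ranked}  # normalized to sum = 1.0

print(round(weights["customer_impact"], 3))  # 0.333 (= 5/15)
```

The drawback named above shows directly in `raw`: the gap between every pair of adjacent ranks is exactly one point, regardless of how different their actual importance is.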
Critical principle: Define weights before alternatives are rated — and do not change them afterward. Tversky and Kahneman (1974) documented how anchoring values systematically distort subsequent judgments [4]. When the team sees ratings first and then “adjusts” weights, it unconsciously optimizes toward the desired result. The apparent objectivity of the method becomes a facade.
Example weighting for a service decision:

| Criterion | Weight |
|---|---|
| Customer impact | 30% |
| Feasibility | 25% |
| Economics | 20% |
| Strategic fit | 15% |
| Risk (inverted) | 10% |
| Total | 100% |
Step 4: Rate the alternatives
Rate each alternative against each criterion on a uniform scale.
Recommendation: 1–5 scale with verbal anchors. A 1–10 scale creates false precision — most teams cannot reliably distinguish between a 6 and a 7. Weber and Borcherding (1993) showed that central tendency (avoidance of scale endpoints) is more pronounced on wider scales, reducing effective differentiation [8].
| Value | Meaning | Example (Criterion: Customer impact) |
|---|---|---|
| 1 | Very poor | Worsens the existing customer experience |
| 2 | Poor | Minimal improvement, below industry average |
| 3 | Average | At industry level, no differentiation |
| 4 | Good | Noticeable improvement, above industry average |
| 5 | Very good | Excellent, potential competitive advantage |
Silent rating: Have each participant rate individually first — on their own sheet or in a separate spreadsheet — before ratings are revealed. This prevents groupthink [6] and the anchoring effect [4]: the first number spoken aloud influences all subsequent ratings in the room.
Step 5: Calculate weighted scores
Multiply each rating by the criterion’s weight and sum across all criteria:
Weighted Score = Sum (Rating_i x Weight_i)
The alternative with the highest score is — under the chosen criteria and weights — the best option.
Step 6: Conduct sensitivity analysis
This step is missing from nearly all scoring model guides — yet it is the most important. Without sensitivity analysis, you do not know whether your result is robust or whether a small change in assumptions would flip the ranking.
Three checks:
- Weight sensitivity: Increase and decrease the weight of the most important criterion by 10 percentage points (e.g., from 30% to 40% or 20%). Does the ranking change? If yes: the decision hinges on that weight — discuss it more thoroughly.
- Rating sensitivity: Identify ratings where the team was uncertain (“Was that a 3 or a 4?”). Change these by +/-1 point. Does the ranking flip? If yes: invest in better information for exactly that criterion and that alternative.
- Closeness test: If the gap between rank 1 and rank 2 is less than 5%, the result is not definitive. You then need either additional differentiating criteria, further information, or a deliberate qualitative decision beyond the numbers.
If the sensitivity analysis shows the result is fragile: This is not a method failure — it is a valuable insight. It means the alternatives are closer together than the numbers suggest. Supplement with qualitative factors (team capacity, political feasibility, timing) for the final decision.
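The weight-sensitivity check can be sketched as code: shift one weight by 10 percentage points, rescale the remaining weights proportionally so the total stays at 100%, and see whether the winner changes. The ratings and weights below are illustrative, and proportional rescaling is one assumption among several reasonable ones (rounding the rescaled weights is another):

```python
def score(ratings, weights):
    return sum(ratings[c] * weights[c] for c in weights)

def shift_weight(weights, criterion, delta):
    """Move `criterion` by `delta`, rescale the others proportionally."""
    new = dict(weights)
    new[criterion] = weights[criterion] + delta
    rest, old_rest = 1.0 - new[criterion], 1.0 - weights[criterion]
    for c in weights:
        if c != criterion:
            new[c] = weights[c] * rest / old_rest
    return new

weights = {"impact": 0.30, "feasibility": 0.25, "economics": 0.20,
           "fit": 0.15, "risk": 0.10}
a = {"impact": 4, "feasibility": 5, "economics": 4, "fit": 3, "risk": 5}
b = {"impact": 5, "feasibility": 4, "economics": 3, "fit": 4, "risk": 4}

base_winner = "A" if score(a, weights) > score(b, weights) else "B"
shifted = shift_weight(weights, "impact", 0.10)
new_winner = "A" if score(a, shifted) > score(b, shifted) else "B"
print(base_winner, new_winner)  # a flip between the two signals a fragile result
```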
Practical example: Scoring model in insurance
Context: An insurer has developed three concepts for a new claims portal as part of a digitalization project. Two service specialists, a UX designer, and an IT architect must decide which concept enters the pilot phase.
Knockout criteria check: All three concepts meet BaFin data security requirements and are compatible with the existing core system. None is eliminated.
The three concepts:
- A: Self-service portal — Customers report claims digitally, upload photos, track status in real time
- B: Hybrid model — Digital reporting with video-call option for complex claims (fire damage, liability cases)
- C: AI-powered portal — Automatic claims classification via image analysis, automated initial assessment through machine learning
Weights (set by the team in advance):
| Criterion | Weight |
|---|---|
| Customer impact | 30% |
| Feasibility (12 months) | 25% |
| Economics | 20% |
| Strategic fit | 15% |
| Risk (inverted: 5 = low risk) | 10% |
Rating (silent rating, then consensus):
| Criterion | Weight | A: Self-service | B: Hybrid | C: AI-powered |
|---|---|---|---|---|
| Customer impact | 30% | 4 | 5 | 3 |
| Feasibility | 25% | 5 | 4 | 2 |
| Economics | 20% | 4 | 3 | 2 |
| Strategic fit | 15% | 3 | 4 | 5 |
| Risk (inverted) | 10% | 5 | 4 | 2 |
| Weighted Score | | 4.20 | 4.10 | 2.75 |
Calculation Concept A: (4 x 0.30) + (5 x 0.25) + (4 x 0.20) + (3 x 0.15) + (5 x 0.10) = 1.20 + 1.25 + 0.80 + 0.45 + 0.50 = 4.20
Calculation Concept B: (5 x 0.30) + (4 x 0.25) + (3 x 0.20) + (4 x 0.15) + (4 x 0.10) = 1.50 + 1.00 + 0.60 + 0.60 + 0.40 = 4.10
Calculation Concept C: (3 x 0.30) + (2 x 0.25) + (2 x 0.20) + (5 x 0.15) + (2 x 0.10) = 0.90 + 0.50 + 0.40 + 0.75 + 0.20 = 2.75
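The three calculations above, reproduced as a short sketch (criterion labels shortened):

```python
# Weights and 1-5 ratings from the claims-portal example.
weights = {"impact": 0.30, "feasibility": 0.25, "economics": 0.20,
           "fit": 0.15, "risk": 0.10}
ratings = {
    "A": {"impact": 4, "feasibility": 5, "economics": 4, "fit": 3, "risk": 5},
    "B": {"impact": 5, "feasibility": 4, "economics": 3, "fit": 4, "risk": 4},
    "C": {"impact": 3, "feasibility": 2, "economics": 2, "fit": 5, "risk": 2},
}

scores = {
    concept: round(sum(r[c] * weights[c] for c in weights), 2)
    for concept, r in ratings.items()
}
print(scores)  # {'A': 4.2, 'B': 4.1, 'C': 2.75}
```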
Sensitivity analysis
Weight sensitivity: The gap between A (4.20) and B (4.10) is only 2.4%. What happens if “customer impact” rises from 30% to 40%? (The remaining weights are scaled down roughly proportionally to 20%, 17%, 13%, and 10%, so the total remains 100%.)
Concept A (new): (4 x 0.40) + (5 x 0.20) + (4 x 0.17) + (3 x 0.13) + (5 x 0.10) = 1.60 + 1.00 + 0.68 + 0.39 + 0.50 = 4.17
Concept B (new): (5 x 0.40) + (4 x 0.20) + (3 x 0.17) + (4 x 0.13) + (4 x 0.10) = 2.00 + 0.80 + 0.51 + 0.52 + 0.40 = 4.23
Result: With increased customer-impact weighting, B overtakes A. The result is fragile — the decision hinges on how customer impact is prioritized.
Rating sensitivity: The team was uncertain about Concept C (AI-powered) on “feasibility” — is that a 2 or a 3? With feasibility = 3, C rises to 3.00 instead of 2.75. The gap to A and B remains large enough: C stays in third place.
Team decision: Concept A is launched as the pilot, with the video-call component from Concept B added for complex claims as a Phase 2 extension. Concept C is earmarked as a strategic option for the following year when the AI infrastructure matures.
Note: This example is illustratively constructed to demonstrate the method in a service context. The ratings are based on typical industry values, not a documented case study.
Template: Scoring model ready to use
Use this checklist directly for your next scoring model:
Preparation
- Decision question formulated as a concrete question (not: “What do we do?” — rather: “Which of the three portal concepts should we pilot in Q3?”)
- Knockout criteria defined and alternatives pre-checked
- 3–7 alternatives identified
- 5–8 independent, ratable criteria defined
Weighting
- Weighting method chosen (percentage, swing, or ranking)
- Weights set BEFORE rating
- Sum of weights = 100%
- Weights documented in writing and declared fixed
Rating
- Scale with verbal anchors defined (recommended: 1–5)
- Each participant rated individually (silent rating)
- Deviations discussed in team and consensus reached
- Weighted scores calculated
Quality assurance
- Sensitivity check: weight +/-10 percentage points
- Sensitivity check: uncertain ratings +/-1 point
- Closeness test: gap between rank 1 and rank 2 > 5%?
- Qualitative factors supplemented that the scoring model does not capture
- Decision documented including rationale, weights, and alternative evaluations
5 cognitive biases that distort scoring results
1. Anchoring effect during rating
Symptom: The project leader states their rating first. Everyone else agrees or deviates minimally. The result reflects one person’s opinion, not collective intelligence.
Cause: The anchoring effect is one of the most robust cognitive biases: the first number mentioned systematically influences all subsequent estimates [4].
Countermeasure: Silent rating — everyone rates individually before ratings are revealed. Discuss only deviations, not agreements.
2. Splitting bias in criteria
Symptom: The team defines “implementation costs,” “operating costs,” and “total costs” as three separate criteria. All three measure the same aspect.
Cause: Montibeller and von Winterfeldt (2015) identified the splitting bias: when one aspect is split into multiple criteria, it receives disproportionate total weight without the team being aware [7].
Countermeasure: Test each criterion: “If Alternative X scores better on this criterion — would it automatically score better on another criterion too?” If yes: combine them.
3. Post-hoc weight adjustment
Symptom: The team sees the result — and changes the weighting until the “right” result appears. “Cost is actually more important than customer impact” is only said after it becomes clear that the cheapest concept would otherwise lose.
Cause: Confirmation bias — the tendency to interpret information in ways that confirm existing beliefs.
Countermeasure: Set weights in a separate session, document in writing, declare fixed before rating. New weights only for new factual information, not new preferences.
4. Central tendency on broad scales
Symptom: On a 1–10 scale, all participants assign values between 4 and 7. The scale endpoints are avoided. The effective differentiation is lower than on a 1–5 scale.
Cause: Risk aversion in rating — extreme values feel “daring.” Weber and Borcherding (1993) documented this effect empirically [8].
Countermeasure: 1–5 scale with clear verbal anchors for each value. The verbal definition (“Very good = competitive advantage”) reduces central tendency.
5. Proxy bias for qualitative criteria
Symptom: The team is asked to rate “customer impact” and uses Net Promoter Score as a proxy. But NPS measures recommendation likelihood, not customer impact overall.
Cause: Montibeller and von Winterfeldt (2015) described proxy bias: teams replace hard-to-measure criteria with easy-to-measure metrics that only partially capture the same thing [7].
Countermeasure: For each criterion, explicitly define: “What exactly are we measuring? And does our metric actually measure what the criterion describes?” If not: rate the criterion qualitatively rather than using a false proxy.
When the scoring model is the wrong choice
1. Radical innovation with unknown criteria. When evaluating an entirely novel service concept, you often do not yet know the relevant criteria. A scoring model with wrong criteria produces a precise but irrelevant result. Design thinking (prototyping + testing) or exploratory methods like the morphological box are better suited — learn first, then evaluate.
2. Compensation is unacceptable. The scoring model is compensatory: a low score on “data security” can be offset by a high score on “cost efficiency.” When certain criteria represent absolute minimums — e.g., regulatory compliance — these must be handled as knockout criteria outside scoring [9].
3. Political decisions. When the decision has already been made and the scoring model serves only post-hoc legitimation, it is a waste of time. Worse: it breeds cynicism among the team toward future “objective” evaluation processes.
4. Insufficient information. When the team does not know enough about the alternatives to rate them seriously, the scoring model hides ignorance behind numbers. First invest in information gathering — user research, stakeholder mapping, or desk research — and conduct scoring when you can rate with substance.
5. Criteria dependencies. The additive scoring formula assumes criteria are independent of each other. But when “time to market” and “feature scope” are inversely correlated — faster delivery only possible with fewer features — the scoring model cannot capture the interaction [5]. In such cases, an extended MCDA method like ELECTRE or PROMETHEE is better suited.
Variations and advanced techniques
Weighted vs. unweighted scoring
The simplest form of the scoring model dispenses with weighting: every criterion counts equally. This is rarely sensible — it implicitly assumes that “cost” and “brand fit” are equally important. Weighted scoring is superior in nearly all practical situations because it reflects the team’s actual priorities rather than ignoring them.
Scoring model + Kano: The combined method
Use the Kano model to classify features by satisfaction impact — and transfer the results as input into the scoring model. Must-be features automatically receive the highest rating on “customer impact” (their absence creates dissatisfaction), attractive features a high rating, indifferent features a low one. This connects empirical customer data with structured alternative evaluation.
Scoring model in project portfolio management
Beyond individual decisions, organizations use scoring models to prioritize entire project portfolios. Here, not service alternatives but projects are evaluated against each other — with criteria such as strategic fit, expected ROI, resource requirements, and risk. The principle is identical; the challenge lies in consistent evaluation across a larger number of objects.
Scoring model for supplier evaluation
In procurement and supply chain management, scoring models are the standard tool for systematic supplier evaluation. Criteria such as quality, delivery reliability, price level, innovation capability, and sustainability are weighted and rated. The advantage: the evaluation is documentable, repeatable, and can be conducted in regular cycles (e.g., annually).
Frequently asked questions
What is a scoring model in simple terms?
A scoring model is a point-rating method that compares multiple options against weighted criteria. Each option receives a score per criterion. This score is multiplied by the criterion’s weight. The sum of all weighted scores produces the total score — the option with the highest score is the best choice under the chosen criteria.
How do I create a scoring model?
In six steps: (1) Check knockout criteria and pre-select alternatives. (2) Define 5–8 independent evaluation criteria. (3) Weight criteria (sum = 100%) — before rating. (4) Rate alternatives on a 1–5 scale with clear anchors (silent rating, then team consensus). (5) Calculate weighted scores. (6) Conduct sensitivity analysis — check whether small changes flip the ranking.
What is the difference between a scoring model and a utility analysis?
In practice, the terms are often used interchangeably. Zangemeister’s utility analysis (Nutzwertanalyse, 1976) is the formalized German methodology with specific requirements: weights sum to 100%, the result is a total utility value, and the process is documented [1]. “Scoring model” is the broader umbrella term for any point-rating method — including those without normalized weights.
What is the difference between a scoring model and a decision matrix?
The decision matrix describes the tabular format — rows for criteria, columns for alternatives. The scoring model describes the calculation logic — points times weight. In practice, both terms are used synonymously for the same tool: a weighted tabular alternative evaluation.
What are the advantages and disadvantages of a scoring model?
Advantages: Transparent and traceable; integrates qualitative and quantitative criteria; structures team discussions; produces documentable results; easy to understand and implement.
Disadvantages: Weighting and rating remain subjective; compensatory (low scores can be offset by high ones); susceptible to cognitive biases (anchoring, splitting bias); assumes criteria independence that is not always given [5][7].
When should I use AHP instead of a simple scoring model?
AHP (Analytic Hierarchy Process) is worthwhile for strategically important decisions with more than 8 criteria where mathematical weight consistency is required — for example in public tenders or regulated decisions. For operational team decisions with 5–8 criteria, the simple scoring model is more efficient [3][5].
Related methods
A typical sequence in service development: Use the morphological box to systematically generate service concepts. Use the scoring model or decision matrix to select the most promising concept. Use the Kano model to refine features. In the service design process, you implement the concept.
- Decision matrix: Tabular format for weighted alternative evaluation — methodologically closely related
- Kano model: When you want to classify features by satisfaction impact rather than choose between alternatives
- Morphological box: When you want to systematically generate solution combinations before scoring
- SWOT analysis: When you want to analyze the strategic starting position before concept selection
- Service design: For the overall context in which the scoring model is embedded as an evaluation tool
Research Methodology
This article synthesizes insights from Zangemeister’s foundational publication on utility analysis (1970/1976), Pugh’s concept selection methodology (1981), Saaty’s AHP framework (1980), research on cognitive biases in decision processes (Tversky & Kahneman 1974; Montibeller & von Winterfeldt 2015; Weber & Borcherding 1993), and comparative MCDA literature (Velasquez & Hester 2013; Keeney & Raiffa 1993). Additionally, 10 German-language articles on the scoring model were analyzed. Sources were selected for methodological rigor, practical relevance, and citation frequency.
Limitations: Academic literature on the application of scoring models specifically in service development is limited — most empirical studies originate from engineering, product development, and public planning. The practical example (claims portal) is illustratively constructed, not a documented case study. Recommendations on scale length and weighting method are based on experimental evidence whose transferability to service decisions is plausible but not empirically verified.
Disclosure
SI Labs provides consulting services in service innovation. In the Integrated Service Development Process (iSEP), we use scoring models during the concept phase to select between service alternatives. This practical experience informs the positioning of the method in this article. Readers should be aware of the potential perspective bias.
Bibliography
[1] Zangemeister, Christof. Nutzwertanalyse in der Systemtechnik: Eine Methodik zur multidimensionalen Bewertung und Auswahl von Projektalternativen. Munich: Wittemann, 1976 (1st ed. 1970, 5th ed. 2014). [Foundational work | Utility analysis | Citations: 1,500+ | Quality: 90/100]
[2] Pugh, Stuart. “Concept Selection: A Method That Works.” Proceedings of the International Conference on Engineering Design (ICED), Rome, 1981. Later expanded in: Pugh, Stuart. Total Design: Integrated Methods for Successful Product Engineering. Wokingham: Addison-Wesley, 1991. ISBN: 978-0201416398 [Foundational work | Pugh matrix | Citations: 3,000+ | Quality: 88/100]
[3] Saaty, Thomas L. The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. New York: McGraw-Hill, 1980. ISBN: 978-0070543713 [Foundational work | AHP | Citations: 40,000+ | Quality: 92/100]
[4] Tversky, Amos, and Daniel Kahneman. “Judgment under Uncertainty: Heuristics and Biases.” Science 185, no. 4157 (1974): 1124–1131. DOI: 10.1126/science.185.4157.1124 [Foundational work | Cognitive biases | Citations: 45,000+ | Quality: 95/100]
[5] Velasquez, Mark, and Patrick T. Hester. “An Analysis of Multi-Criteria Decision Making Methods.” International Journal of Operations Research 10, no. 2 (2013): 56–66. [Journal article | MCDA comparison | Citations: 2,500+ | Quality: 78/100]
[6] Janis, Irving L. Groupthink: Psychological Studies of Policy Decisions and Fiascoes. Boston: Houghton Mifflin, 1982. ISBN: 978-0395317044 [Foundational work | Groupthink | Citations: 10,000+ | Quality: 85/100]
[7] Montibeller, Gilberto, and Detlof von Winterfeldt. “Cognitive and Motivational Biases in Decision and Risk Analysis.” Risk Analysis 35, no. 7 (2015): 1230–1251. DOI: 10.1111/risa.12360 [Journal article | Cognitive biases in MCDA | Citations: 500+ | Quality: 82/100]
[8] Weber, Martin, and Katrin Borcherding. “Behavioral influences on weight judgments in multiattribute decision making.” European Journal of Operational Research 67, no. 1 (1993): 1–12. [Journal article | Weighting methods | Citations: 300+ | Quality: 80/100]
[9] Keeney, Ralph L., and Howard Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Cambridge: Cambridge University Press, 1993 (orig. 1976). ISBN: 978-0521438834 [Foundational work | Multi-Attribute Utility Theory | Citations: 20,000+ | Quality: 93/100]