The criticism of Dempster-Shafer Theory (DST) by L. A. Zadeh ^{2} has
generated a lot of discussions and articles on the subject of
conflicting evidence. In 2005, R. Haenni ^{3} showed that the
surprising result obtained by applying Dempster’s rule to Zadeh’s
example was more due to the modelling of the situation than Dempster’s
rule not working. I add my grain of salt to the debate on Zadeh’s
example by showing a formulation of the problem as a small belief
network and using Dempster’s rule of combination to obtain a realistic
result. This belief network gives the same results as the combination of
the two evidences with the disjunctive rule of combination ^{4}. At the
same time, I show how to do the calculations using my *R* package
*dst*^{5}.

We suppose that a patient is examined by two doctors, A and B. A’s diagnosis is that P has either meningitis (M), with probability 0.99, or brain tumor (T), with probability 0.01. B agrees with A that the probability of a brain tumor is 0.01, but believes that it is the probability of concussion (C) rather than meningitis that is 0.99.

Zadeh considers the same space of diseases {M, T, C} for the two experts. Hence, after the combination of the two pieces of evidence by Dempster’s rule, we find as a result that the belief of a brain tumor is certain.

```
Space of possibilities and Basic Chance Assignment of Expert 1
$Diagnosis1
[1] "M" "T" "C"
```

```
Expert1 specnb mass
1 M 1 0.99
2 T 2 0.01
```

```
Space of possibilities and Basic Chance Assignment of Expert 2
$Diagnosis2
[1] "M" "T" "C"
```

```
Expert2 specnb mass
1 T 1 0.01
2 C 2 0.99
```

```
Combination of the two experts by Dempster's rule
M T C mass bel disbel unc plau rplau
M 1 0 0 0 0 1 -1.1e-13 -1.1e-13 -1.1e-13
T 0 1 0 1 1 0 -1.1e-13 1.0e+00 -9.1e+12
C 0 0 1 0 0 1 -1.1e-13 -1.1e-13 -1.1e-13
frame 1 1 1 0 1 0 -1.1e-13 1.0e+00 -9.1e+12
```

Indeed, the result does not reflect the opinions of the two experts. Before rejecting Dempster’s rule as inappropriate to this situation, let’s look more closely at the problem at hand.

Let’s take *Expert one*. *Expert number one*
distributes the whole mass between the two singletons {M} and {T}.
*Expert number one* does not consider {C} as a possibility. Hence
we conclude that the space of possibilities of *Expert number
one* cannot be {M, T, C}. We can say that *Expert number one*
has restricted the space of possibilities of his/her diagnostic to the
set {M, T}:

\(F(D1) = \{M, T\}\). For simplicity, we write \(D1 =\{M, T\}\).

The same line of reasoning is applied to *Expert number two*.
The whole mass of one is allotted to the set {T, C}, and the third
possibility (M) is not considered at all. Hence \(D2 = \{T, C\}\).

I show the coding of these two pieces of evidence with the function
*bca* of the package *dst*.

```
Space of possibilities and Basic Chance Assignment of Expert 1
$D1
[1] "M" "T"
```

```
e1 specnb mass
1 M 1 0.99
2 T 2 0.01
```

```
Space of possibilities and Basic Chance Assignment of Expert 2
$D2
[1] "C" "T"
```

```
e2 specnb mass
1 C 1 0.99
2 T 2 0.01
```

The two experts are reasoning in two different spaces of
possibilities. To be able to combine their diagnosis, we need a common
ground. This can be done if we introduce a third person, the patient,
with a variable of interest, his/her disease (D). Then it is natural to
take the union of the space of possibilities of *Expert number
one* and *Expert number two* as the space of possibilities of
the patient:

\(D = \{M, T\} \cup \{C, T\} = \{M, T, C\}\).

Thus, the diagnosis of the patient’s disease involves pooling the assessments of the two experts, using the “or” operator. This situation is described by a relation of implication between experts and patient:

r1: \(D1 \cup D2 \rightarrow D\).

The relation *r1* is represented in the product space \(\prod(D1, D2, D)\) by one focal set of mass
one:

\(m(M C M + M C C + M T M + M T T + T C T +
T C C + T T T) = 1\) (for simplicity, the “+” sign is used as the
\(\vee\) disjunctive operator in the
functions of the package dst). Now I use the function *bcaRel* to
code this relation.

```
The relation r1
r1 specnb mass
1 M C M + M C C + M T M + M T T + T C T + T C C + T T T 1 1
```

We now have all the elements of a small network made of one relation
(r1) between three variables: Disease (D), Diagnosis1 (D1), Diagnosis2
(D2), and two pieces of evidence coming from *Expert one* (e1)
and *Expert two* (e2).

The three variables *D*, *D1* and *D2* are the
nodes of the graph. The edges (hyperedges) are given by the relation
*r1* and the two pieces of evidence *e1* and
*e2*.

Using the igraph package, ^{6} a bipartite graph corresponding to the
hypergraph can be obtained.

Our goal is the calculation of the belief function of the variable of
interest “D” (Disease of the patient). We apply an algorithm called
“Peeling” to the belief network. This is a process of successive
elimination of variables (peeling) until only the variable of interest
(*D* here) remains. The elimination of a variable has the effect
of integrating its contribution to the reduced graph.

Four parameters are necessary to trigger the algorithm. The first three are already defined when constructing the hypergraph. The fourth parameter is an order of elimination of the variables that we have to set.

Identification of variables and their space of possibilities \[F(D1) = \{M, T\}\] \[F(D2) = \{M, C\}\] \[F(D) = \{M, T, C\}\]

Incidence matrix of the graph (nodes and edges)

```
Row names are variables names (nodes).
Column names are for pieces of evidence and relations (edges).
e1 e2 r1
D1 1 0 1
D2 0 1 1
D 0 0 1
```

- The names of data specifications (evidence and relations between variables)

`[1] "e1" "e2" "r1"`

- Variable numbers are used to fix the order of elimination. Here we eliminate D1 first, then D2.

```
varnb size V3
D1 1 2 D1
D2 2 2 D2
D 3 3 D
```

The calculations involved follow the principles of the valuation
language of Shenoy ^{7}; see also ^{8}. The variables are
linked to functions (called valuations). A function can be a piece of
evidence attached to a variable or a relation between two or more
variables.

Three kinds of operations are involved in the process of variable elimination: a) the minimal (vacuous) extension of a mass function to a larger space of possibilities; b) the combination of two mass functions by Dempster’s rule; c) the marginalization of a mass function, i.e. eliminating a variable to reduce the function to a smaller space of possibilities. Let’s do it.

First step: Eliminate variable D1 (Diagnosis1). The mass function
*e1* is extended to the space \(\prod(D1, D2, D)\); then *e1
extended* is combined with *r1* by Dempster’s rule; finally,
D1 is eliminated by marginalizing the result of the combination to \(\prod(D2, D)\). The mass function obtained
is named *rel_2*.

Second step: Eliminate variable D2 (Diagnosis2). Evidence *e2*
is extended to the space \(\prod(D2,
D)\); Then *e2 extended* is combined with *rel_2*
by Dempster’s rule; the result of the combination is marginalized to D
to produce the final result.

```
i = : 1 . Eliminating variable no 1 : D1
rels numbers to elim 1 3
i = : 2 . Eliminating variable no 2 : D2
rels numbers to elim 2 4
Peeling ended
```

```
M T C mass bel disbel unc plau rplau
T 0 1 0 0.0001 0.0001 0.9801 0.02 0.02 0.02
M 1 0 0 0.0000 0.0000 0.0100 0.99 0.99 0.99
C 0 0 1 0.0000 0.0000 0.0100 0.99 0.99 0.99
T + C 0 1 1 0.0099 0.0100 0.0000 0.99 1.00 1.01
M + C 1 0 1 0.9801 0.9801 0.0001 0.02 1.00 50.25
M + T 1 1 0 0.0099 0.0100 0.0000 0.99 1.00 1.01
frame 1 1 1 0.0000 1.0000 0.0000 0.00 1.00 Inf
```

The plausibility ratio column shows that the odds of \(M \vee C\) against T are 50:1. hence, the disease of the patient must be M or C. We also see that each single hypothesis, M and C, remains highly plausible (0.99). Although there is some support for T, its plausibility is very weak at 0.019.

Finally, we use the plausibility transformation ^{9} to look at the results
from the point of view of probability distribution. We see again that
the odds for M against T or for C against T are very similar.

```
M T C trplau
M 1 0 0 0.49502
T 0 1 0 0.00995
C 0 0 1 0.49502
```

Retired Statistician, Stat.ASSQ↩︎

L. A. Zadeh. A mathematical theory of evidence (book review). AI Magazine, 55(81—83), 1984↩︎

R. Haenni. Shedding New Light on Zadeh’s Criticism of Dempster’s Rule of Combination. Conference: Information Fusion, 2005 8th International Conference↩︎

P. Smets (1993). Belief Functions: The Disjunctive Rule of Combination and the Generalized Bayesian Theorem. IRIDIA - Université Libre de Bruxelles, Brussels, Belgium↩︎

Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. https://igraph.org↩︎

P. P. Shenoy. A Valuation-Based Language for Expert systems. International Journal of Approximate Reasoning 1989, 3 383–411↩︎

P. P. Shenoy. Valuation-Based Systems. Third School on Belief Functions and Their Applications, Stella Plage, France. September 30, 2015↩︎

Cobb, B. R. and Shenoy, P.P. (2006). On the plausibility transformation method for translating belief function models to probability models. Journal of Approximate Reasoning, 41(3), April 2006, 314–330↩︎