Generalised max entropy classifiers
Fabio Cuzzolin1
Oxford Brookes University, UK
fabio.cuzzolin@brookes.ac.uk
Abstract. In this paper we propose a generalised maximum-entropy classification framework, in which the empirical expectation of the feature functions is bounded by the lower and upper expectations associated with the lower and upper probabilities induced by a belief measure. This generalised setting permits a more cautious appreciation of the information content of a training set. We analytically derive the Karush-Kuhn-Tucker conditions for the generalised max-entropy classifier in the case in which a Shannon-like entropy is adopted.
Keywords: Classification · Max entropy · Constrained optimisation.
1 Introduction
The emergence of new challenging real-world applications has exposed serious
issues with current approaches to model adaptation in machine learning. Existing theory and algorithms focus on fitting the available training data, but
cannot provide worst-case guarantees in mission-critical applications. Vapnik’s
statistical learning theory is useless for model selection, as the bounds on generalisation errors it predicts are too wide to be useful, and rely on the assumption
that training and testing data come from the same (unknown) distribution. The
crucial question is: what exactly can one infer from a training set?
Max entropy classifiers [19] provide a significant example, due to their simplicity and widespread application. There, the entropy of the sought joint (or
conditional) probability distribution of data and class is maximised, following
the maximum entropy principle that the least informative distribution which
matches the available evidence should be chosen. Having picked a set of feature
functions, selected to efficiently encode the training information, the joint distribution is subject to the constraint that their empirical expectation equals that
associated with the max entropy distribution. The assumptions that (i) training
and test data come from the same probability distribution, and that (ii) the
empirical expectation of the training data is correct, and the model expectation
should match it, are rather strong, and work against generalisation power.
A way around this issue is to adopt as models convex sets of probability
distributions, rather than standard probability measures. Random sets, in particular, are mathematically equivalent to a special class of credal sets induced
by probability mass assignments on the power set of the sample space. When
random sets are defined on finite domains, they are often called belief functions
[20]. One can then envisage a robust theory of learning based on generalising
traditional statistical learning theory in order to allow for test data to be sampled from a different probability distribution than the training data, under the
weaker assumption that both belong to the same random set.
In this paper we make a step in that direction by generalising the max entropy classification framework. We take the view that a training set does not
provide, in general, sufficient information to precisely estimate the joint probability distribution of class and data. We assume instead that a belief measure
can be estimated, providing lower and upper bounds on the joint probability of
data and class. As in the classical case, an appropriate measure of entropy for
belief measures is maximised. In contrast to the classical case, however, the empirical expectation of the chosen feature functions is only assumed to be compatible with the lower and upper bounds associated with the sought belief measure. This leads to a constrained optimisation problem with inequality constraints, rather than equality ones, which needs to be solved by examining the Karush-Kuhn-Tucker (KKT) conditions. Due to the concavity of the objective function and the convexity of the constraints, the KKT conditions are both necessary and sufficient.
Related work. A significant amount of work has been conducted in the past on
machine learning approaches based on belief theory. Most efforts were directed
at developing clustering tools, including evidential clustering [4], evidential and
belief C-means [15]. Ensemble classification [23], in particular, has been extensively studied. Concerning classification, Denoeux [5] proposed in a seminal work a k-nearest neighbour classifier based on belief theory. Most relevantly to this paper, interesting work has been conducted to generalise the framework of decision trees to situations in which uncertainty is encoded by belief functions, mainly by Elouedi and co-authors [7], and by Vannoorenberghe and Denoeux [22].
Paper outline. After reviewing in Section 2 max-entropy classification, we recall in Section 3 the necessary notions of belief theory. In Section 4 the possible
generalisations of Shannon’s entropy to the case of belief measures are reviewed.
In Section 5 the generalised max-entropy problem is formulated, together with the associated Karush-Kuhn-Tucker conditions. It is shown that for several generalised measures of entropy the KKT conditions are necessary and sufficient for the optimality of the generalised max-entropy solution (Section 5.1). In Section 5.2 we
derive the analytical expression of the system of KKT conditions for the case of
a Shannon-like entropy for belief measures. Section 6 concludes the paper.
2 Max-entropy classifiers
The objective of maximum entropy classifiers is to maximise the Shannon entropy of the conditional classification distribution p(Ck|x), where x ∈ X is the observable and Ck ∈ C = {C1, ..., CK} is the associated class.
Given a training set in which each observation is attached a class, namely D = {(xi, yi), i = 1, ..., N | xi ∈ X, yi ∈ C}, a set of M feature maps φ(x, Ck) = [φ1(x, Ck), ..., φM(x, Ck)]′ is designed, whose values depend on both the object observed and its class. Each feature map φm : X × C → R is then a random variable whose expectation is E[φm] = Σ_{x,k} p(x, Ck) φm(x, Ck). In contrast, the empirical expectation of φm is Ê[φm] = Σ_{x,k} p̂(x, Ck) φm(x, Ck), where p̂ is a histogram constructed by counting occurrences of the pair (x, Ck) in the training set: p̂(x, Ck) = (1/N) Σ_{(xi,yi)∈D} δ(xi = x ∧ yi = Ck). The theoretical expectation E[φm] can be approximated by decomposing p(x, Ck) = p(x) p(Ck|x) via Bayes' rule, and approximating the (unknown) prior of the observations p(x) with the empirical prior p̂, i.e., the histogram of observed values in the training set: Ẽ[φm] = Σ_{x,k} p̂(x) p(Ck|x) φm(x, Ck).
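As an illustration, the empirical distribution p̂ and the empirical expectation Ê[φ] can be sketched in a few lines of Python; the toy training set and the indicator feature below are purely illustrative assumptions, not taken from the paper:

```python
from collections import Counter

def empirical_distribution(D):
    """Histogram p_hat(x, C_k): relative frequency of each (observation, class) pair."""
    counts = Counter(D)
    N = len(D)
    return {pair: c / N for pair, c in counts.items()}

def empirical_expectation(D, phi):
    """E_hat[phi] = sum over (x, C_k) of p_hat(x, C_k) * phi(x, C_k)."""
    return sum(p * phi(x, c) for (x, c), p in empirical_distribution(D).items())

# Toy training set (illustrative): observations 'x1', 'x2', classes 'C1', 'C2'
D = [('x1', 'C1'), ('x1', 'C1'), ('x2', 'C2'), ('x1', 'C2')]
# Indicator feature firing on the pair ('x1', 'C1')
phi = lambda x, c: 1.0 if (x, c) == ('x1', 'C1') else 0.0
print(empirical_expectation(D, phi))  # 0.5
```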
Definition 1. Given a training set D = {(xi, yi), i = 1, ..., N | xi ∈ X, yi ∈ C}, related to the problem of classifying x ∈ X as belonging to one of the classes C = {C1, ..., CK}, the max entropy classifier is the conditional probability p*(Ck|x) such that p*(Ck|x) ≐ arg max_{p(Ck|x)} Hs(P), where Hs is the traditional Shannon entropy, subject to Ẽp[φm] = Ê[φm] ∀m = 1, ..., M.
The constraint requires the classifier to be consistent with the empirical frequencies of the features in the training set, while seeking the least informative probability distribution that does so. The solution of the maximum entropy classification problem (Definition 1) is the so-called log-linear model:

p*(Ck|x) = (1/Zλ(x)) exp(Σ_m λm φm(x, Ck)),

where λ = [λ1, ..., λM]′ are the Lagrange multipliers associated with the linear constraints Ẽp[φm] = Ê[φm], and Zλ(x) is a normalisation factor. The related classification function is y(x) = arg max_k Σ_m λm φm(x, Ck), i.e., x is assigned the class which maximises the linear combination of the feature functions with coefficients λ.
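A minimal sketch of the resulting log-linear classifier follows; the feature maps and the multipliers λ below are hypothetical placeholders (in practice λ is fitted to the training data):

```python
import math

def log_linear_posterior(x, classes, phis, lam):
    """p*(C_k | x) = exp(sum_m lam_m * phi_m(x, C_k)) / Z_lam(x)."""
    scores = {c: math.exp(sum(l * phi(x, c) for l, phi in zip(lam, phis)))
              for c in classes}
    Z = sum(scores.values())  # normalisation factor Z_lam(x)
    return {c: s / Z for c, s in scores.items()}

def classify(x, classes, phis, lam):
    """y(x) = argmax_k sum_m lam_m * phi_m(x, C_k); Z is class-independent."""
    return max(classes, key=lambda c: sum(l * phi(x, c) for l, phi in zip(lam, phis)))

# Hypothetical features and multipliers, for illustration only
classes = ['C1', 'C2']
phis = [lambda x, c: 1.0 if c == 'C1' and x > 0 else 0.0,
        lambda x, c: 1.0 if c == 'C2' and x <= 0 else 0.0]
lam = [2.0, 2.0]
print(classify(3, classes, phis, lam))   # C1
print(classify(-1, classes, phis, lam))  # C2
```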
3 Belief functions
Definition 2. A basic probability assignment (BPA) [1] over a discrete set Θ is a function m : 2^Θ → [0, 1], defined on 2^Θ = {A ⊆ Θ}, such that m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1. The belief function (BF) associated with a BPA m : 2^Θ → [0, 1] is the set function Bel : 2^Θ → [0, 1] defined as Bel(A) = Σ_{B⊆A} m(B).

The elements of the power set 2^Θ associated with non-zero values of m are called the focal elements of m. For each subset ('event') A ⊆ Θ the quantity Bel(A) is called the degree of belief that the outcome lies in A, and represents the total belief committed to a set of outcomes A by the available evidence m. Dually, the upper probability of A, Pl(A) ≐ 1 − Bel(Ā), Ā = Θ \ A, expresses the 'plausibility' of a proposition A or, in other words, the amount of evidence not against A [3]. The plausibility function Pl : 2^Θ → [0, 1] thus conveys the same information as Bel, and can be expressed as Pl(A) = Σ_{B∩A≠∅} m(B) ≥ Bel(A).

Belief functions are mathematically equivalent to a special class of credal sets (convex sets of probability measures), as each BF Bel is associated with the set P[Bel] = {P : P(A) ≥ Bel(A)} of probabilities dominating it. Its centre of mass is the pignistic function BetP[Bel](x) = Σ_{A∋x} m(A)/|A|, x ∈ Θ. Given a function f : Θ → R, the lower and upper expectations of f w.r.t. Bel are, respectively:

E_*[f] ≐ inf_{P∈P[Bel]} E_P[f] = Σ_{A⊆Θ} m(A) inf_{x∈A} f(x),
E^*[f] ≐ sup_{P∈P[Bel]} E_P[f] = Σ_{A⊆Θ} m(A) sup_{x∈A} f(x).
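The set functions of this section can be sketched directly from their definitions; the mass assignment m below is a hypothetical example, with subsets of Θ represented as frozensets:

```python
def belief(m, A):
    """Bel(A): total mass of focal elements B contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    """Pl(A): total mass of focal elements B intersecting A."""
    return sum(v for B, v in m.items() if B & A)

def pignistic(m, theta):
    """BetP(x) = sum over focal elements A containing x of m(A)/|A|."""
    return {x: sum(v / len(A) for A, v in m.items() if x in A) for x in theta}

def lower_upper_expectation(m, f):
    """(E_*[f], E^*[f]): mass-weighted inf/sup of f over each focal element."""
    lo = sum(v * min(f[x] for x in A) for A, v in m.items())
    hi = sum(v * max(f[x] for x in A) for A, v in m.items())
    return lo, hi

theta = frozenset({'a', 'b', 'c'})
m = {frozenset({'a'}): 0.5, frozenset({'a', 'b'}): 0.3, theta: 0.2}
A = frozenset({'a', 'b'})
print(belief(m, A), plausibility(m, A))  # 0.8 1.0
```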
4 Measures of generalised entropy
The issue of how to assess the level of uncertainty associated with a belief function [10] is not trivial, as authors such as Yager and Klir argued that there
are several facets to uncertainty, such as conflict (or discord, dissonance) and
non-specificity (also called vagueness, ambiguity or imprecision).
Some measures are directly inspired by Shannon's entropy of probability measures, Hs[p] = −Σ_{x∈Θ} p(x) log p(x). Nguyen's measure is a direct generalisation in which probability values are replaced by mass values [17]: Hn[m] = −Σ_{A∈F} m(A) log m(A), where F is the list of focal elements of m. In Yager's entropy [24], probabilities are (partly) replaced by plausibilities: Hy[m] = −Σ_{A∈F} m(A) log Pl(A). Hohle's measure of confusion [9] is the dual measure Ho[m] = −Σ_{A∈F} m(A) log Bel(A). All such measures only capture the 'conflict' portion of uncertainty. Other measures are designed to capture the specificity of belief measures, i.e., the degree of concentration of the mass assigned to focal elements. A first such measure was due to Dubois and Prade [6]: Hd[m] = Σ_{A∈F} m(A) log |A|, which can be considered a generalisation of Hartley's entropy (H = log |Θ|) to belief functions. A more sophisticated proposal by Pal [18], Ha[m] = Σ_{A∈F} m(A)/|A|, assesses the dispersion of the evidence and is linked to the pignistic transform. A final proposal, based on the commonality function Q(A) = Σ_{B⊇A} m(B), is due to Smets: Ht = Σ_{A∈F} log(1/Q(A)).
Composite measures, such as Lamata and Moral's Hl[m] = Hy[m] + Hd[m] [14], are designed to capture both conflict and specificity. Klir & Ramer [13] proposed a 'global uncertainty measure' defined as Hk[m] = D[m] + Hd[m], where D(m) = −Σ_{A∈F} m(A) log[Σ_{B∈F} m(B) |A∩B|/|B|]. Pal et al. [18] argued that none of these composite measures is really satisfactory, as they do not admit a unique maximum and there is no sound rationale for simply adding conflict and non-specificity measures together.
In the credal interpretation of belief functions, Harmanec and Klir's aggregated uncertainty (AU) [8] is defined as the maximal Shannon entropy over all the probabilities consistent with the given BF: Hh[m] = max_{P∈P[Bel]} Hs[P]. Hh[m] is the minimal measure meeting a set of rationality requirements which include symmetry, continuity, expansibility, subadditivity, additivity, monotonicity and normalisation. Similarly, Maeda and Ichihashi [16] proposed a composite measure Hi[m] = Hh[m] + Hd[m], whose first component is the maximum entropy of the set of probability distributions consistent with m, and whose second component is the generalised Hartley entropy. As both Hh and Hi have high computational complexity, Jousselme et al. [11] proposed an ambiguity measure (AM), defined as the classical entropy of the pignistic function: Hj[m] = Hs[BetP[m]].

Jirousek and Shenoy [10] analysed all these proposals in 2016, assessing them against a number of significant properties, and concluded that only the Maeda-Ichihashi proposal meets all of them; the issue remains unsettled. In the following we adopt a straightforward generalisation of Shannon's entropy, together with a few selected proposals chosen for their concavity properties.
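Several of the measures above (Hn, Hd, Hy and Smets' Ht) can be sketched directly from their definitions; the two-focal-element mass assignment below is an illustrative assumption:

```python
import math

def H_nguyen(m):
    """H_n[m] = -sum_A m(A) log m(A): Shannon entropy of the mass assignment."""
    return -sum(v * math.log(v) for v in m.values() if v > 0)

def H_dubois_prade(m):
    """H_d[m] = sum_A m(A) log |A|: generalised Hartley (non-specificity)."""
    return sum(v * math.log(len(A)) for A, v in m.items())

def H_yager(m):
    """H_y[m] = -sum_A m(A) log Pl(A)."""
    pl = lambda A: sum(v for B, v in m.items() if B & A)
    return -sum(v * math.log(pl(A)) for A, v in m.items())

def H_smets(m):
    """H_t[m] = sum_A log(1/Q(A)), with commonality Q(A) = sum of m(B), B >= A."""
    q = lambda A: sum(v for B, v in m.items() if A <= B)
    return sum(math.log(1.0 / q(A)) for A in m)

# Illustrative mass assignment with two nested focal elements
m = {frozenset({'a'}): 0.6, frozenset({'a', 'b'}): 0.4}
print(round(H_dubois_prade(m), 4))  # 0.2773
```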
5 Generalised max-entropy problem
Technically, in order to generalise the max-entropy optimisation problem (Definition 1) to the case of belief functions, we need to: (i) choose an appropriate measure of entropy for belief functions as the objective function; (ii) revisit the constraints requiring that the (theoretical) expectations of the feature maps equal the empirical ones computed over the training set.
As for (ii), it is sensible to require that the empirical expectation of the feature functions is bracketed by the lower and upper expectations associated with the sought belief function Bel : 2^{X×C} → [0, 1]. In this paper we only make use of the 2-monotonicity of belief functions, and write:

Σ_{(x,Ck)} Bel(x, Ck) φm(x, Ck) ≤ Ê[φm] ≤ Σ_{(x,Ck)} Pl(x, Ck) φm(x, Ck)    (1)

∀m = 1, ..., M, as we only consider probability intervals on singleton elements (x, Ck) ∈ X × C. Fully fledged lower and upper expectations (cf. Section 3), which express the full monotonicity of BFs, will be considered in future work.
Going even further, should constraints of the form (1) be enforced on all possible subsets A ⊂ X × C, rather than just singleton pairs (x, Ck)? This goes back to the question of what information a training set actually carries. More general constraints would require extending the domain of the feature functions to set values; we will investigate this idea in the near future as well.
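The singleton bracketing constraint (1) can be checked numerically as follows; the joint mass assignment over pairs (x, Ck) and the indicator feature are hypothetical examples:

```python
def satisfies_bracketing(m, phi, emp, pairs, tol=1e-12):
    """Constraint (1) on singletons:
    sum Bel(x,Ck) phi(x,Ck) <= E_hat[phi] <= sum Pl(x,Ck) phi(x,Ck)."""
    bel = lambda p: m.get(frozenset({p}), 0.0)             # Bel of a singleton = its mass
    pl = lambda p: sum(v for B, v in m.items() if p in B)  # Pl of a singleton
    lower = sum(bel(p) * phi(*p) for p in pairs)
    upper = sum(pl(p) * phi(*p) for p in pairs)
    return lower - tol <= emp <= upper + tol

# Hypothetical joint mass on X x C and indicator feature on class C1
pairs = [('x1', 'C1'), ('x1', 'C2')]
m = {frozenset({('x1', 'C1')}): 0.4,
     frozenset({('x1', 'C1'), ('x1', 'C2')}): 0.6}
phi = lambda x, c: 1.0 if c == 'C1' else 0.0
print(satisfies_bracketing(m, phi, 0.7, pairs))  # True  (0.4 <= 0.7 <= 1.0)
print(satisfies_bracketing(m, phi, 0.2, pairs))  # False
```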
5.1 Formulation and Karush-Kuhn-Tucker (KKT) conditions
In the same classification setting of Section 2, the maximum belief entropy classifier is the joint belief measure Bel*(x, Ck) : 2^{X×C} → [0, 1] which solves the following optimisation problem: Bel*(x, Ck) ≐ arg max_{Bel(x,Ck)} H(Bel), subject to the inequality constraints (1), where H is an appropriate measure of entropy for belief measures. As this optimisation problem involves inequality constraints (1), as opposed to the equality constraints of traditional max entropy classifiers, we need to analyse the Karush-Kuhn-Tucker (KKT) [12] necessary conditions for a belief function Bel to be an optimal solution.
Definition 3. Suppose that the objective function f : R^n → R and the constraint functions gi : R^n → R and hj : R^n → R of a nonlinear optimisation problem arg max_x f(x), subject to gi(x) ≤ 0, i = 1, ..., m, and hj(x) = 0, j = 1, ..., l, are continuously differentiable at a point x*. If x* is a local optimum then, under appropriate regularity conditions, there exist constants μi (i = 1, ..., m) and λj (j = 1, ..., l), called KKT multipliers, such that the following conditions hold:

1. Stationarity: ∇f(x*) = Σ_{i=1}^m μi ∇gi(x*) + Σ_{j=1}^l λj ∇hj(x*);
2. Primal feasibility: gi(x*) ≤ 0 ∀i = 1, ..., m, and hj(x*) = 0 ∀j = 1, ..., l;
3. Dual feasibility: μi ≥ 0 for all i = 1, ..., m;
4. Complementary slackness: μi gi(x*) = 0 for all i = 1, ..., m.
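The four conditions of Definition 3 can be verified programmatically; the one-dimensional concave problem below, with a single inequality constraint and no equality constraints, is chosen purely for illustration:

```python
def kkt_check(x, mu, grad_f, gs, grad_gs, tol=1e-9):
    """Check the four KKT conditions for a problem with inequality constraints only."""
    stationarity = abs(grad_f(x) - sum(u * gg(x) for u, gg in zip(mu, grad_gs))) < tol
    primal = all(g(x) <= tol for g in gs)                      # g_i(x*) <= 0
    dual = all(u >= -tol for u in mu)                          # mu_i >= 0
    slack = all(abs(u * g(x)) < tol for u, g in zip(mu, gs))   # mu_i g_i(x*) = 0
    return stationarity and primal and dual and slack

# Toy concave problem: maximise f(x) = -(x - 2)^2 subject to g(x) = x - 1 <= 0.
# The constrained optimum is x* = 1 (active constraint) with multiplier mu = 2.
grad_f = lambda x: -2.0 * (x - 2.0)
g = lambda x: x - 1.0
grad_g = lambda x: 1.0
print(kkt_check(1.0, [2.0], grad_f, [g], [grad_g]))  # True
print(kkt_check(0.5, [2.0], grad_f, [g], [grad_g]))  # False
```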
Crucially, the KKT conditions are also sufficient whenever the objective function f is concave, the inequality constraints gi are continuously differentiable convex functions, and the equality constraints hj are affine¹.
Theorem 1. If either Ht, Hn, Hd, Hs[Bel] or Hs[Pl] is adopted as the measure of entropy, the generalised max entropy optimisation problem has a concave objective function and convex constraints. Therefore, the KKT conditions are sufficient for the optimality of its solution(s).
Concavity of the entropy objective function. It is well known that Shannon's entropy is a concave function of probability distributions, represented as vectors of probability values². Furthermore: any linear combination of concave functions is concave; a monotonic and concave function of a concave function is still concave; the logarithm is a concave function.
As shown by Smets [21], the transformations which map mass vectors to vectors of belief (and commonality) values are linear, as they can be expressed in the form of matrices. In particular, bel = BfrM m, where BfrM is a matrix whose (A, B) entry is BfrM(A, B) = 1 if B ⊆ A, and 0 otherwise, and bel, m are vectors collecting the belief (mass) values of all events A ⊆ Θ. The same can be said of the mapping q = QfrM m between a mass vector and the associated commonality vector. As a consequence, belief, plausibility and commonality are all linear (and therefore concave) functions of a mass vector.
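A sketch of the BfrM construction for a two-element frame, showing that belief values are obtained from mass values by a fixed 0/1 matrix-vector product; the frame and masses are illustrative:

```python
from itertools import combinations

def powerset(theta):
    """All subsets of theta, as frozensets."""
    s = sorted(theta)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def bfrm(events):
    """The matrix BfrM: entry (A, B) is 1 iff B is a subset of A."""
    return {(A, B): 1.0 if B <= A else 0.0 for A in events for B in events}

def bel_from_mass(m, events):
    """bel = BfrM m, computed as an explicit matrix-vector product."""
    M = bfrm(events)
    return {A: sum(M[(A, B)] * m.get(B, 0.0) for B in events) for A in events}

theta = {'a', 'b'}
events = powerset(theta)
m = {frozenset({'a'}): 0.7, frozenset({'a', 'b'}): 0.3}
bel = bel_from_mass(m, events)
print(bel[frozenset({'a'})], bel[frozenset({'a', 'b'})])  # 0.7 1.0
```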
Using this matrix representation, it is easy to conclude that several of the entropies defined in Section 4 are indeed concave. In particular, Smets' specificity measure Ht = Σ_A log(1/Q(A)) is concave, as a linear combination of concave functions. Nguyen's entropy Hn = −Σ_A m(A) log m(A) = Hs[m] is also concave, as the Shannon entropy of a mass assignment. Dubois and Prade's measure Hd = Σ_A m(A) log |A| is also concave with respect to m, as a linear combination of mass values. Direct applications of Shannon's entropy function to Bel and Pl,

HBel[m] = Hs[Bel] = Σ_{A⊆Θ} Bel(A) log(1/Bel(A)),   HPl[m] = Hs[Pl] = Σ_{A⊆Θ} Pl(A) log(1/Pl(A)),

are also trivially concave, due to the concavity of the entropy function and to the linearity of the mapping from m to Bel, Pl. Drawing conclusions on the other measures is less immediate, as they involve products of concave functions (which are not, in general, guaranteed to be concave).
Convexity of the interval expectation constraints. As for the constraints (1) of the generalised max entropy problem, we first note that (1) can be decomposed into the following pair of constraints:

g¹m(m) ≐ Σ_{x,k} Bel(x, Ck) φm(x, Ck) − Ê[φm] ≤ 0,
g²m(m) ≐ Σ_{x,k} φm(x, Ck) [p̂(x, Ck) − Pl(x, Ck)] ≤ 0,

for all m = 1, ..., M. The first inequality constraint is a linear combination of linear functions of the sought mass assignment m* : 2^{X×C} → [0, 1] (since Bel* results from applying a matrix transformation to m*). As pl = 1 − J bel = 1 − J BfrM m, constraint g²m is also a linear combination of mass values. Hence, as linear functions, constraints g¹m and g²m are both concave and convex.
¹ More general sufficient conditions can be given in terms of invexity [2] requirements.
² http://projecteuclid.org/euclid.lnms/1215465631
5.2 Belief max-entropy classifier for Shannon's entropy
For the Shannon-like entropy HBel, Condition 1 (stationarity), applied to the sought optimal BF Bel* : 2^{X×C} → [0, 1], reads as:

∇HBel(Bel*) = Σ_{m=1}^M [μ¹m ∇g¹m(Bel*) + μ²m ∇g²m(Bel*)].

The components of ∇HBel are the partial derivatives of the entropy with respect to the mass values m(B), for all B ⊆ Θ. They read as:

∂HBel/∂m(B) = Σ_{A⊇B} ∂/∂m(B) [ −(Σ_{B′⊆A} m(B′)) log(Σ_{B′⊆A} m(B′)) ] = −Σ_{A⊇B} [1 + log Bel(A)].

As for ∇g¹m(Bel*), we have:

∂g¹m/∂m(B) = ∂/∂m(B) [ Σ_{(x,Ck)∈Θ} Bel(x, Ck) φm(x, Ck) − Ê[φm] ] = ∂/∂m(B) [ Σ_{(x,Ck)∈Θ} m(x, Ck) φm(x, Ck) − Ê[φm] ],

which is equal to φm(x, Ck) for B = {(x, Ck)}, and to 0 otherwise³. As for the second set of constraints,

∂g²m/∂m(B) = ∂/∂m(B) Σ_{(x,Ck)∈Θ} φm(x, Ck) [p̂(x, Ck) − Pl(x, Ck)],

which, recalling that Pl(x, Ck) = Σ_{B′∩{(x,Ck)}≠∅} m(B′), becomes equal to −Σ_{(x,Ck)∈B} φm(x, Ck).
Assembling all our results, the KKT stationarity conditions for the generalised, belief-theoretical maximum entropy problem amount to, for all B ⊂ X × C:

−Σ_{A⊇B} [1 + log Bel(A)] = Σ_{m=1}^M φm(x, Ck) [μ¹m − μ²m],   B = {(x, Ck)} (i.e. |B| = 1),
−Σ_{A⊇B} [1 + log Bel(A)] = −Σ_{m=1}^M μ²m Σ_{(x,Ck)∈B} φm(x, Ck),   |B| > 1.    (2)
The other conditions are, ∀m = 1, ..., M: (1) (primal feasibility); μ¹m, μ²m ≥ 0 (dual feasibility); and complementary slackness:

μ¹m [ Σ_{(x,Ck)∈Θ} Bel(x, Ck) φm(x, Ck) − Ê[φm] ] = 0,   μ²m Σ_{(x,Ck)∈Θ} φm(x, Ck) [p̂(x, Ck) − Pl(x, Ck)] = 0.
6 Conclusions
In this paper we proposed a generalisation of the max entropy classifier in which the assumptions that test and training data are sampled from the same probability distribution, and that the empirical expectation of the feature functions is 'correct', are relaxed in the formalism of belief theory. We also studied the conditions under which the associated KKT conditions are necessary and sufficient for the optimality of the solution. Much work remains: (i) providing analytical model expressions, similar to log-linear models, for the Shannon-like and other major entropy measures for belief functions; (ii) analysing the case in which the full lower and upper expectations are plugged in; (iii) comparing the resulting classifiers; (iv) analysing a formulation based on the least commitment principle, rather than max entropy, for the objective function to optimise; and finally (v) relaxing the constraint that feature functions be defined on singleton pairs (x, Ck), in a further generalisation of this important framework.
³ If we could define feature functions over non-singleton subsets A ⊆ Θ, this would simply generalise to φ(B) for all B ⊆ Θ.
References
1. Augustin, T.: Modeling weak information with generalized basic probability assignments. In: Data Analysis and Information Systems, 101–113, Springer, 1996.
2. Ben-Israel, A. et al: What is invexity? J. Austral. Math. Soc. Ser. B 28:1–9, 1986.
3. Cuzzolin, F.: Three alternative combinatorial formulations of the theory of evidence. Intelligent Data Analysis 14(4):439–464, 2010.
4. Denoeux, T. and Masson, M.-H.: EVCLUS: Evidential Clustering of Proximity
Data. IEEE Trans Syst Man Cybern B 34(1):95-109, 2004.
5. Denœux, T.: A k-nearest neighbor classification rule based on Dempster-Shafer
theory. IEEE Trans Syst Man Cybern 25(5):804-813, 1995.
6. Dubois, D., Prade, H.: Properties of measures of information in evidence and possibility theories. Fuzzy Sets Syst 100:35–49, 1999.
7. Elouedi, Z., Mellouli, K., Smets, P.: Belief decision trees: theoretical foundations.
Int J Approx Reason 28(23):91–124, 2001.
8. Harmanec, D., Klir, G.J.: Measuring total uncertainty in Dempster-Shafer theory:
A novel approach. Int J Gen Syst 22(4):405–419, 1994.
9. Hohle, U.: Entropy with respect to plausibility measures. In: Proceedings of the
12th IEEE Symposium on Multiple-Valued Logic., pp. 167–169, 1982.
10. Jirousek, R., Shenoy, P.P.: Entropy of belief functions in the Dempster-Shafer theory: A new perspective. In: Proceedings of BELIEF, pp. 3-13, 2016.
11. Jousselme, A.L. et al: Measuring ambiguity in the evidence theory. IEEE Trans
Syst Man Cybern A 36(5):890–903, 2006.
12. Karush, W.: Minima of functions of several variables with inequalities as side constraints. MSc Dissertation, Dept. of Mathematics, Univ. of Chicago, 1939.
13. Klir, G.J.: Measures of uncertainty in the Dempster-Shafer theory of evidence. In:
Advances in the Dempster-Shafer theory of evidence, pp. 35–49, 1994.
14. Lamata, M.T., Moral, S.: Measures of entropy in the theory of evidence. Int J Gen
Syst 14(4):297–305, 1988.
15. Liu, Z. et al: Belief C-means: An extension of fuzzy c-means algorithm in belief
functions framework. Pattern Recognit Lett 33(3):291–300, 2012.
16. Maeda, Y., Ichihashi, H.: An uncertainty measure with monotonicity under the
random set inclusion. Int J Gen Syst 21(4):379–392, 1993.
17. Nguyen, H.: On entropy of random sets and possibility distributions. In: The Analysis of Fuzzy Information, pp. 145–156, 1985.
18. Pal, N.R., Bezdek, J.C., Hemasinha, R.: Uncertainty measures for evidential reasoning ii: A new measure of total uncertainty. Int J Approx Reason 8:1–16, 1993.
19. Pietra, S.D., et al: Inducing features of random fields. IEEE Trans Pattern Anal
Mach Intell 19(4):380–393, 1997.
20. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, 1976.
21. Smets, P.: The application of the matrix calculus to belief functions. Int J Approx
Reason 31(1-2):1–30, 2002.
22. Vannoorenberghe, P. et al: Handling uncertain labels in multiclass problems using
belief decision trees. In: Proceedings of IPMU, 2002.
23. Xu, L. et al: Methods of combining multiple classifiers and their applications to
handwriting recognition. IEEE Trans Syst Man Cybern 22(3):418–435, 1992.
24. Yager, R.R.: Entropy and specificity in a mathematical theory of evidence. Int J
Gen Syst 9:249–260, 1983.