Bayes-Adaptive POMDPs 2007
1 Introduction
In many real-world systems, uncertainty can arise in both the prediction of the system's behavior and
the observability of the system's state. Partially Observable Markov Decision Processes (POMDPs)
take both kinds of uncertainty into account and provide a powerful model for sequential decision
making under these conditions. However, most solution methods for POMDPs assume that the model
is known a priori, which is rarely the case in practice. For instance, in robotics, the POMDP must
exactly reflect the uncertainty in the robot's sensors and actuators. These parameters are rarely
known exactly and must often be approximated by a human designer, so that even if this approximate
POMDP could be solved exactly, the resulting policy may not be optimal. We therefore seek a
decision-theoretic planner that can take into account the uncertainty over model parameters during
the planning process, and that can learn the values of these unknown parameters from experience.
Bayesian Reinforcement Learning has investigated this problem in the context of fully observable
MDPs [1, 2, 3]. An extension to POMDPs has recently been proposed [4], but this method relies on
heuristics to select actions that will improve the model, forgoing any theoretical guarantee on
the quality of the approximation, and on an oracle that can be queried to provide the current state.
In this paper, we draw inspiration from the Bayes-Adaptive MDP framework [2], which is formulated
to provide an optimal solution to the exploration-exploitation trade-off. To extend these ideas
to POMDPs, we face two challenges: (1) how do we update the Dirichlet parameters when the state is
a hidden variable? (2) how do we approximate the infinite-dimensional belief space to perform belief
monitoring and compute the optimal policy? This paper tackles both problems jointly. The first
problem is solved by including the Dirichlet parameters in the state space and maintaining belief
states over these parameters. We address the second by bounding the space of Dirichlet parameters
to a finite subspace necessary for $\epsilon$-optimal solutions.
We provide theoretical results for bounding the state space while preserving the value function, and
we use these results to derive approximate solving and belief monitoring algorithms. We compare
several belief approximations in two problem domains. Empirical results show that the agent is able
to learn good POMDP models and improves its return as it learns better model estimates.
2 POMDP
A POMDP is defined by finite sets of states $S$, actions $A$ and observations $Z$. It has transition
probabilities $\{T^{sas'}\}_{s,s' \in S, a \in A}$, where $T^{sas'} = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$,
and observation probabilities $\{O^{saz}\}_{s \in S, a \in A, z \in Z}$, where
$O^{saz} = \Pr(z_t = z \mid s_t = s, a_{t-1} = a)$. The reward function $R : S \times A \to \mathbb{R}$
specifies the immediate reward obtained by the agent. In a POMDP, the state is never observed.
Instead the agent perceives an observation $z \in Z$ at each time step, which (along with the action
sequence) allows it to maintain a belief state $b \in \Delta S$. The belief state specifies the
probability of being in each state given the history of actions and observations experienced so far,
starting from an initial belief $b_0$. It can be updated at each time step using Bayes' rule:
$$b_{t+1}(s') = \frac{O^{s' a_t z_{t+1}} \sum_{s \in S} T^{s a_t s'} b_t(s)}{\sum_{s'' \in S} O^{s'' a_t z_{t+1}} \sum_{s \in S} T^{s a_t s''} b_t(s)}.$$
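This update is straightforward in code; a minimal sketch (not from the paper), assuming `T` and `O` are NumPy arrays indexed as `T[s, a, s']` and `O[s', a, z]`:

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayes' rule for POMDP belief tracking.

    b: current belief over states, shape (|S|,)
    T: transition probabilities, T[s, a, s2] = Pr(s2 | s, a)
    O: observation probabilities, O[s2, a, z] = Pr(z | s2, a)
    """
    # Predict: propagate the belief through the transition model.
    predicted = b @ T[:, a, :]          # shape (|S|,)
    # Correct: weight each next state by the observation likelihood.
    unnormalized = O[:, a, z] * predicted
    return unnormalized / unnormalized.sum()
```

The returned vector is the normalized posterior over states after taking action `a` and observing `z`.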
3 Bayes-Adaptive POMDP
In this section, we introduce the Bayes-Adaptive POMDP (BAPOMDP) model, a decision-theoretic
framework for optimal learning and planning in POMDPs under parameter uncertainty. Throughout, we
assume that the state, action, and observation spaces are finite and known, but that the transition
and observation probabilities are unknown or partially known. We also assume that the reward
function is known, as it is generally specified by the user for the specific task they want to
accomplish, but the model can easily be generalized to learn the reward function as well.
To model the uncertainty on the transition $T^{sas'}$ and observation $O^{saz}$ parameters, we use
Dirichlet distributions, which are probability distributions over the parameters of multinomial
distributions. Given $\phi_i$, the number of times event $e_i$ has occurred over $n$ trials, the
probabilities $p_i$ of each event follow a Dirichlet distribution, i.e.
$(p_1, \dots, p_k) \sim \mathrm{Dir}(\phi_1, \dots, \phi_k)$. This distribution represents the
probability that a discrete random variable behaves according to some probability distribution
$(p_1, \dots, p_k)$, given that the counts $(\phi_1, \dots, \phi_k)$ have been observed over $n$
trials ($n = \sum_{i=1}^k \phi_i$). Its probability density function is defined by
$f(p, \phi) = \frac{1}{B(\phi)} \prod_{i=1}^k p_i^{\phi_i - 1}$, where $B$ is the multinomial beta
function. The expected value of $p_i$ is $E(p_i) = \phi_i / \sum_{j=1}^k \phi_j$.
The BAPOMDP is constructed from the model of the POMDP with unknown parameters. Let
$(S, A, Z, T, O, R, \gamma)$ be that model. The uncertainty on the distributions $T^{sa\cdot}$ and
$O^{s'a\cdot}$ can be represented by experience counts: $\phi^a_{ss'}$ represents the number of
times the transition $(s, a, s')$ occurred; similarly, $\psi^a_{s'z}$ is the number of times
observation $z$ was made in state $s'$ after doing action $a$. Let $\phi$ be the vector of all
transition counts and $\psi$ be the vector of all observation counts. Given the count vectors
$\phi$ and $\psi$, the expected transition probability for $T^{sas'}$ is
$T^{sas'}_\phi = \phi^a_{ss'} / \sum_{s'' \in S} \phi^a_{ss''}$, and similarly for $O^{s'az}$:
$O^{s'az}_\psi = \psi^a_{s'z} / \sum_{z' \in Z} \psi^a_{s'z'}$.
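These expectations are just normalized counts; a small illustrative sketch (the dictionary layout `phi[(s, a)]` is our own assumption, not notation from the paper):

```python
import numpy as np

def expected_T(phi, s, a):
    """Expected transition distribution E[T^{sa.}]: normalized counts over s'."""
    counts = np.asarray(phi[(s, a)], dtype=float)
    return counts / counts.sum()

def expected_O(psi, s_next, a):
    """Expected observation distribution E[O^{s'a.}]: normalized counts over z."""
    counts = np.asarray(psi[(s_next, a)], dtype=float)
    return counts / counts.sum()
```

For example, counts `[3, 1]` over two successor states yield expected probabilities `[0.75, 0.25]`, matching the Dirichlet mean above.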
The objective of the BAPOMDP is to learn an optimal policy, such that actions are chosen to
maximize reward taking into account both state and parameter uncertainty. To model this, we
follow the Bayes-Adaptive MDP framework, and include the $\phi$ and $\psi$ vectors in the state of
the BAPOMDP. Thus, the state space $S'$ of the BAPOMDP is defined as $S' = S \times T \times O$,
where $T = \{\phi \in \mathbb{N}^{|S|^2|A|} \mid \forall (s, a), \sum_{s' \in S} \phi^a_{ss'} > 0\}$
represents the space in which $\phi$ lies, and
$O = \{\psi \in \mathbb{N}^{|S||A||Z|} \mid \forall (s, a), \sum_{z \in Z} \psi^a_{sz} > 0\}$ the
space in which $\psi$ lies. The BAPOMDP transition and observation functions, $T'$ and $O'$, are
defined over this extended state space so that state transitions update the counts accordingly.
Note here that the observation probabilities are folded into the transition function, and that the
observation function becomes deterministic. This happens because a state transition in the BAPOMDP
automatically specifies which observation is acquired after the transition, via the way the counts
are incremented. Since the counts do not affect the reward, the reward function of the BAPOMDP is
defined as $R'((s, \phi, \psi), a) = R(s, a)$; the discount factor of the BAPOMDP remains the same.
Using these definitions, the BAPOMDP has a known model specified by the tuple $(S', A, Z, T', O', R', \gamma)$.
The belief state of the BAPOMDP represents a distribution over both states and count values. The
model is learned by simply maintaining this belief state, as the distribution will concentrate over
the most likely models, given the prior and the experience so far. If $b_0$ is the initial belief
state of the unknown POMDP, and the count vectors $\phi_0 \in T$ and $\psi_0 \in O$ represent the
prior knowledge on this POMDP, then the initial belief of the BAPOMDP is
$b'_0(s, \phi, \psi) = b_0(s)$ if $(\phi, \psi) = (\phi_0, \psi_0)$, and $0$ otherwise. After
actions are taken, the uncertainty on the POMDP model is represented by mixtures of Dirichlet
distributions (i.e. mixtures of count vectors).
Note that the BAPOMDP is in fact a POMDP with a countably infinite state space. Hence the belief
update function and optimal value function are still defined as in Section 2. However, these
functions now require summations over $S' = S \times T \times O$. Maintaining the belief state is
practical only if the number of states with non-zero probability is finite. We prove this in the
following theorem:

Theorem 3.1. Let $(S', A, Z, T', O', R', \gamma)$ be a BAPOMDP constructed from the POMDP
$(S, A, Z, T, O, R, \gamma)$. If $S$ is finite, then at any time $t$, the set
$S'_{b'_t} = \{\sigma \in S' \mid b'_t(\sigma) > 0\}$ has size $|S'_{b'_t}| \le |S|^{t+1}$.

The proof of this theorem suggests that it is sufficient to iterate over $S$ and $S'_{b'_{t-1}}$ in
order to compute the belief state $b'_t$ when an action and observation are taken in the
environment. Hence, Algorithm 3.1 can be used to update the belief state.
The value function of a BAPOMDP for finite horizons can be represented by a finite set $\Gamma$ of
functions $\alpha : S' \to \mathbb{R}$, as in standard POMDPs. For example, an exact solution can
be computed using dynamic programming.

function $\tau(b, a, z)$
    Initialize $b'$ as a 0 vector.
    for all $(s, \phi, \psi, s') \in S'_b \times S$ do
        $b'(s', \phi + \delta^a_{ss'}, \psi + \delta^a_{s'z}) \leftarrow b'(s', \phi + \delta^a_{ss'}, \psi + \delta^a_{s'z}) + b(s, \phi, \psi)\, T^{sas'}_\phi O^{s'az}_\psi$
    end for
    return normalized $b'$

Algorithm 3.1: Exact Belief Update in BAPOMDP.
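Algorithm 3.1 can be sketched in Python; a minimal illustration, assuming the belief is a dictionary over hyper-states (s, φ, ψ) with counts stored as nested tuples (so they can serve as dictionary keys; this representation is our own choice, not the paper's):

```python
def increment(counts, i, j, k):
    """Return a copy of the nested count tuple with counts[i][j][k] incremented."""
    return tuple(
        tuple(
            tuple(c + 1 if (x, y, w) == (i, j, k) else c
                  for w, c in enumerate(row))
            for y, row in enumerate(mat))
        for x, mat in enumerate(counts))

def tau(b, a, z, n_states):
    """Exact BAPOMDP belief update (Algorithm 3.1).

    b maps hyper-states (s, phi, psi) to probabilities, where
    phi[s][a][s2] and psi[s2][a][z] are nested count tuples.
    """
    b_new = {}
    for (s, phi, psi), p in b.items():
        for s2 in range(n_states):
            # Expected model probabilities from the Dirichlet counts.
            t_prob = phi[s][a][s2] / sum(phi[s][a])
            o_prob = psi[s2][a][z] / sum(psi[s2][a])
            if t_prob * o_prob == 0.0:
                continue
            key = (s2, increment(phi, s, a, s2), increment(psi, s2, a, z))
            b_new[key] = b_new.get(key, 0.0) + p * t_prob * o_prob
    total = sum(b_new.values())
    return {k: v / total for k, v in b_new.items()}
```

Each belief particle spawns up to |S| successors with incremented counts, which is exactly the |S|^{t+1} growth of Theorem 3.1.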
We first present an upper bound on the value difference between two states that differ only by
their model estimates $\phi$ and $\psi$. This bound uses the following definitions: given
$\phi, \phi' \in T$ and $\psi, \psi' \in O$, define
$D^{sa}_S(\phi, \phi') = \sum_{s' \in S} |T^{sas'}_\phi - T^{sas'}_{\phi'}|$ and
$D^{sa}_Z(\psi, \psi') = \sum_{z \in Z} |O^{saz}_\psi - O^{saz}_{\psi'}|$.
Proof. Available in [11]; the proof finds a bound on a 1-step backup and solves the recurrence.
We now use this bound on the $\alpha$-vector values to approximate the space of Dirichlet
parameters within a finite subspace. We use the following definitions: given any $\epsilon > 0$,
define $\epsilon' = \frac{\epsilon(1-\gamma)^2}{8\gamma\|R\|_\infty}$,
$\epsilon'' = \frac{\epsilon(1-\gamma)^2\ln(\gamma^{-e})}{32\gamma\|R\|_\infty}$,
$N^\epsilon_S = \max\left(\frac{|S|(1+\epsilon')}{\epsilon'}, \frac{1}{\epsilon''} - 1\right)$ and
$N^\epsilon_Z = \max\left(\frac{|Z|(1+\epsilon')}{\epsilon'}, \frac{1}{\epsilon''} - 1\right)$.
Theorem 4.2. Given any $\epsilon > 0$ and $(s, \phi, \psi) \in S'$ such that
$\exists a \in A, s' \in S$ with $N^{s'a}_\phi > N^\epsilon_S$ or $N^{s'a}_\psi > N^\epsilon_Z$,
there exists $(s, \phi', \psi') \in S'$ such that $\forall a \in A, s' \in S$,
$N^{s'a}_{\phi'} \le N^\epsilon_S$ and $N^{s'a}_{\psi'} \le N^\epsilon_Z$, where
$|\alpha_t(s, \phi, \psi) - \alpha_t(s, \phi', \psi')| < \epsilon$ holds for all $t$ and
$\alpha_t \in \Gamma_t$.
Theorem 4.2 suggests that if we want a precision of $\epsilon$ on the value function, we need only
restrict the space of Dirichlet parameters to count vectors
$\phi \in \tilde{T}_\epsilon = \{\phi \in \mathbb{N}^{|S|^2|A|} \mid \forall a \in A, s \in S, 0 < N^{sa}_\phi \le N^\epsilon_S\}$
and
$\psi \in \tilde{O}_\epsilon = \{\psi \in \mathbb{N}^{|S||A||Z|} \mid \forall a \in A, s \in S, 0 < N^{sa}_\psi \le N^\epsilon_Z\}$.
Since $\tilde{T}_\epsilon$ and $\tilde{O}_\epsilon$ are finite, we can define a finite approximate
BAPOMDP as the tuple $(\tilde{S}_\epsilon, A, Z, \tilde{T}_\epsilon, \tilde{O}_\epsilon, \tilde{R}_\epsilon, \gamma)$,
where $\tilde{S}_\epsilon = S \times \tilde{T}_\epsilon \times \tilde{O}_\epsilon$ is the finite
state space. To define the transition and observation functions over that finite state space, we
need to make sure that when the count vectors are incremented, they stay within the finite space.
To achieve this, we define a projection operator $P_\epsilon : S' \to \tilde{S}_\epsilon$ that
simply projects every state in $S'$ to its closest state in $\tilde{S}_\epsilon$.
Definition 4.1. Let $d : S' \times S' \to \mathbb{R}$ be defined such that:
$$d(s, \phi, \psi, s', \phi', \psi') = \begin{cases}
\frac{2\gamma\|R\|_\infty}{(1-\gamma)^2} \sup_{s,s' \in S, a \in A} \Big[ D^{sa}_S(\phi, \phi') + D^{s'a}_Z(\psi, \psi') \\
\quad + \frac{4}{\ln(\gamma^{-e})} \Big( \frac{\sum_{s'' \in S} |\phi^a_{ss''} - \phi'^a_{ss''}|}{(N^{sa}_\phi + 1)(N^{sa}_{\phi'} + 1)} + \frac{\sum_{z \in Z} |\psi^a_{s'z} - \psi'^a_{s'z}|}{(N^{s'a}_\psi + 1)(N^{s'a}_{\psi'} + 1)} \Big) \Big], & \text{if } s = s', \\
\frac{8\gamma\|R\|_\infty}{(1-\gamma)^2}\left(1 + \frac{4}{\ln(\gamma^{-e})}\right) + \frac{2\|R\|_\infty}{1-\gamma}, & \text{otherwise.}
\end{cases}$$
The function $d$ uses the bound defined in Theorem 4.1 as a distance between states that differ
only by their $\phi$ and $\psi$ vectors, and uses an upper bound on that value when the states
differ. Thus $P_\epsilon$ always maps states $(s, \phi, \psi) \in S'$ to some state
$(s, \phi', \psi') \in \tilde{S}_\epsilon$. Note that if $\sigma \in \tilde{S}_\epsilon$, then
$P_\epsilon(\sigma) = \sigma$. Using $P_\epsilon$, the transition and observation functions are
defined as follows:
$$\tilde{T}_\epsilon((s, \phi, \psi), a, (s', \phi', \psi')) = \begin{cases} T^{sas'}_\phi O^{s'az}_\psi, & \text{if } (s', \phi', \psi') = P_\epsilon(s', \phi + \delta^a_{ss'}, \psi + \delta^a_{s'z}) \\ 0, & \text{otherwise.} \end{cases} \quad (4)$$
$$\tilde{O}_\epsilon((s, \phi, \psi), a, (s', \phi', \psi'), z) = \begin{cases} 1, & \text{if } (s', \phi', \psi') = P_\epsilon(s', \phi + \delta^a_{ss'}, \psi + \delta^a_{s'z}) \\ 0, & \text{otherwise.} \end{cases} \quad (5)$$
These definitions are the same as those in the infinite BAPOMDP, except that we now add an extra
projection to make sure that the incremented count vectors stay in $\tilde{S}_\epsilon$. Finally,
the reward function $\tilde{R}_\epsilon : \tilde{S}_\epsilon \times A \to \mathbb{R}$ is defined as
$\tilde{R}_\epsilon((s, \phi, \psi), a) = R(s, a)$.
Theorem 4.3 bounds the value difference between $\alpha$-vectors computed with this finite model
and the $\alpha$-vectors computed with the original model.

Theorem 4.3. Given any $\epsilon > 0$, $(s, \phi, \psi) \in S'$ and $\alpha_t \in \Gamma_t$
computed from the infinite BAPOMDP, let $\tilde{\alpha}_t$ be the $\alpha$-vector representing the
same conditional plan as $\alpha_t$ but computed with the finite BAPOMDP
$(\tilde{S}_\epsilon, A, Z, \tilde{T}_\epsilon, \tilde{O}_\epsilon, \tilde{R}_\epsilon, \gamma)$;
then $|\tilde{\alpha}_t(P_\epsilon(s, \phi, \psi)) - \alpha_t(s, \phi, \psi)| < \frac{\epsilon}{1-\gamma}$.

Proof. Available in [11]; the proof solves a recurrence over the 1-step approximation in Theorem 4.2.
Because the state space is now finite, solution methods from the literature on finite POMDPs could
theoretically be applied. This includes in particular the equations for $\tau(b, a, z)$ and
$V^*(b)$ that were presented in Section 2. In practice, however, even though the state space is
finite, it will generally be very large for small $\epsilon$, such that solving may remain
intractable even for small domains. We therefore favor a faster online solution approach, as
described below.
As shown in Theorem 3.1, the number of states with non-zero probability grows exponentially in
the planning horizon, thus exact belief monitoring can quickly become intractable. We now discuss
different particle-based approximations that allow polynomial-time belief tracking.
Monte Carlo sampling: Monte Carlo sampling algorithms have been widely used for sequential
state estimation [12]. Given a prior belief b, followed by action a and observation z, the new belief
b′ is obtained by first sampling K states from the distribution b, then for each sampled s a new state
s′ is sampled from T (s, a, ·). Finally, the probability O(s′ , a, z) is added to b′ (s′ ) and the belief b′
is re-normalized. This will capture at most $K$ states with non-zero probability. In the context of
BAPOMDPs, we use a slight variation of this method, where $(s, \phi, \psi)$ are first sampled from
$b$, and then a next state $s' \in S$ is sampled from the distribution proportional to
$T^{sas'}_\phi O^{s'az}_\psi$. The probability $1/K$ is then added directly to
$b'(s', \phi + \delta^a_{ss'}, \psi + \delta^a_{s'z})$.
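This Monte Carlo variant can be sketched as follows; an illustrative implementation (the nested-tuple count representation and helper names are our own assumptions, not the paper's data structures):

```python
import random

def increment(counts, i, j, k):
    """Copy of the nested count tuple with counts[i][j][k] incremented."""
    return tuple(tuple(tuple(c + 1 if (x, y, w) == (i, j, k) else c
                             for w, c in enumerate(row))
                       for y, row in enumerate(mat))
                 for x, mat in enumerate(counts))

def monte_carlo_update(b, a, z, n_states, K, rng=random):
    """Particle approximation of the BAPOMDP belief update.

    Samples K hyper-states (s, phi, psi) from b, then a next state s2
    proportionally to T_phi^{sas'} O_psi^{s'az}, and deposits probability
    1/K on the hyper-state with incremented counts.
    """
    hyper_states = list(b.keys())
    weights = list(b.values())
    b_new = {}
    for _ in range(K):
        s, phi, psi = rng.choices(hyper_states, weights=weights)[0]
        # Unnormalized joint likelihood of each candidate next state.
        likelihood = [phi[s][a][s2] / sum(phi[s][a])
                      * psi[s2][a][z] / sum(psi[s2][a])
                      for s2 in range(n_states)]
        s2 = rng.choices(range(n_states), weights=likelihood)[0]
        key = (s2, increment(phi, s, a, s2), increment(psi, s2, a, z))
        b_new[key] = b_new.get(key, 0.0) + 1.0 / K
    return b_new
```

The resulting belief has at most $K$ particles, independent of the exponential growth of the exact update.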
Most Probable: Alternately, we can do the exact belief update at a given time step, but then only
keep the K most probable states in the new belief b′ and renormalize b′ .
Weighted Distance Minimization: The two previous methods only try to approximate the distribution
$\tau(b, a, z)$. In practice, however, we care most about the agent's expected reward. Hence,
instead of keeping the $K$ most likely states, we can keep the $K$ states which best approximate
the belief's value. As in the Most Probable method, we do an exact belief update, but here we fit
the posterior distribution using a greedy $K$-means procedure, where distance is defined as in
Definition 4.1, weighted by the probability of the state to be removed. See [11] for algorithmic
details.
While the finite model presented in Section 4.1 can be used to find provably near-optimal policies
offline, this will likely be intractable in practice due to the very large state space required to
ensure good precision. Instead, we turn to online lookahead search algorithms, which have been
proposed for solving standard POMDPs [9]. Our approach simply performs dynamic programming over all
the beliefs reachable within some fixed finite planning horizon from the current belief. The action
with the highest return over that finite horizon is executed, and planning is then conducted again
on the next belief. To further limit the complexity of the online planning algorithm, we use the
approximate belief monitoring methods detailed above. The overall complexity is
$O((|A||Z|)^D C_b)$, where $D$ is the planning horizon and $C_b$ is the complexity of updating the
belief.
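The lookahead recursion can be sketched generically; an illustrative version, where `tau`, `reward`, and `obs_prob` are caller-supplied helpers (hypothetical names, not from the paper):

```python
def lookahead(b, depth, actions, observations, tau, reward, obs_prob, gamma):
    """D-step lookahead search: returns (best value, best action) from belief b.

    tau(b, a, z): belief update; reward(b, a): expected immediate reward;
    obs_prob(b, a, z): Pr(z | b, a). All three are assumed supplied by the caller.
    """
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in actions:
        value = reward(b, a)
        for z in observations:
            p = obs_prob(b, a, z)
            if p > 0:
                # Recurse on the updated belief, discounting future value.
                value += gamma * p * lookahead(tau(b, a, z), depth - 1,
                                               actions, observations,
                                               tau, reward, obs_prob, gamma)[0]
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```

The recursion visits $(|A||Z|)^D$ beliefs in the worst case, matching the stated complexity when each node also pays the belief-update cost $C_b$.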
5 Empirical Results
We begin by evaluating the different belief approximations introduced above. To do so, we use a
simple online d-step lookahead search, and compare the overall expected return and model accuracy
in two different problems: the well-known Tiger [5] and a new domain called Follow. Given
$T^{sas'}$ and $O^{s'az}$, the exact probabilities of the (unknown) POMDP, the model accuracy is
measured in terms of a weighted sum of L1-distances, denoted $WL1$, between the exact model and the
probable models in a belief state $b$:
$$WL1(b) = \sum_{(s,\phi,\psi) \in S'_b} b(s, \phi, \psi)\, L1(\phi, \psi) \quad (6)$$
$$L1(\phi, \psi) = \sum_{a \in A} \sum_{s' \in S} \Big[ \sum_{s \in S} |T^{sas'}_\phi - T^{sas'}| + \sum_{z \in Z} |O^{s'az}_\psi - O^{s'az}| \Big]$$
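Equation (6) can be computed directly on a particle belief; a sketch, assuming the belief is a dictionary mapping (s, φ, ψ) to probabilities with counts stored as nested tuples (an assumed representation, not the paper's):

```python
def wl1(b, T_true, O_true, n_states, n_actions, n_obs):
    """Weighted L1 model error (Eq. 6) of a particle belief b.

    b: dict mapping (s, phi, psi) -> probability; T_true[s][a][s2] and
    O_true[s2][a][z] are the exact (unknown) model probabilities.
    """
    total = 0.0
    for (s, phi, psi), p in b.items():
        l1 = 0.0
        for a in range(n_actions):
            for s2 in range(n_states):
                # Transition error summed over source states s1.
                l1 += sum(abs(phi[s1][a][s2] / sum(phi[s1][a]) - T_true[s1][a][s2])
                          for s1 in range(n_states))
                # Observation error summed over observations z.
                l1 += sum(abs(psi[s2][a][z] / sum(psi[s2][a]) - O_true[s2][a][z])
                          for z in range(n_obs))
        total += p * l1
    return total
```

A belief whose expected counts match the true model exactly yields $WL1 = 0$.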
5.1 Tiger
In the Tiger problem [5], we consider the case where the transition and reward parameters are
known, but the observation probabilities are not. Hence, there are four unknown parameters:
$O_{Ll}$, $O_{Lr}$, $O_{Rl}$, $O_{Rr}$ ($O_{Lr}$ stands for
$\Pr(z = \text{hear\_right} \mid s = \text{tiger\_left}, a = \text{Listen})$). We define the
observation count vector $\psi = (\psi_{Ll}, \psi_{Lr}, \psi_{Rl}, \psi_{Rr})$. We consider a prior
of $\psi_0 = (5, 3, 3, 5)$, which specifies an expected sensor accuracy of 62.5% (instead of the
correct 85%) in both states. Each simulation consists of 100 episodes. Episodes terminate when the
agent opens a door, at which point the POMDP state (i.e. the tiger's position) is reset, but the
distribution over count vectors is carried over to the next episode.
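The 62.5% figure follows directly from the Dirichlet expectation of the prior counts; a quick check (the dictionary keys are our own labels):

```python
# Prior observation counts: (state, heard) -> count, from psi_0 = (5, 3, 3, 5).
psi0 = {("L", "l"): 5, ("L", "r"): 3, ("R", "l"): 3, ("R", "r"): 5}

# Expected Pr(hear correct side | state, Listen) under the prior counts.
acc_left = psi0[("L", "l")] / (psi0[("L", "l")] + psi0[("L", "r")])
acc_right = psi0[("R", "r")] / (psi0[("R", "l")] + psi0[("R", "r")])
# Both equal 5/8 = 0.625, the 62.5% prior accuracy quoted above.
```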
Figures 1 and 2 show how the average return and model accuracy evolve over the 100 episodes
(results are averaged over 1000 simulations), using an online 3-step lookahead search with varying
belief approximations and parameters. Returns obtained by planning directly with the prior and ex-
act model (without learning) are shown for comparison. Model accuracy is measured on the initial
belief of each episode. Figure 3 compares the average planning time per action taken by each ap-
proach. We observe from these figures that the results for the Most Probable and Weighted Distance
approximations are very similar and perform well even with few particles (lines are overlapping in
many places, making Weighted Distance results hard to see). On the other hand, the performance
of Monte Carlo is significantly affected by the number of particles, and it had to use many more
particles (64) to obtain an improvement over the prior. This may be due to the sampling error that
is introduced when using fewer samples.

Figure 1: Return with different belief approximations.
Figure 2: Model accuracy with different belief approximations.
Figure 3: Planning time with different belief approximations.
5.2 Follow
We propose a new POMDP domain, called Follow, inspired by an interactive human-robot task. Such
domains are often particularly subject to parameter uncertainty (due to the difficulty of modelling
human behavior), so this environment motivates the utility of the Bayes-Adaptive
POMDP in a very practical way. The goal of the Follow task is for a robot to continuously follow one
of two individuals in a 2D open area. The two subjects have different motion behavior, requiring the
robot to use a different policy for each. At every episode, the target person is selected randomly
with $\Pr = 0.5$ (and the other is not present). The person's identity is not observable (except
through their motion). The state space has two features: a binary variable indicating which person
is being followed, and a position variable indicating the person's position relative to the robot
(a $5 \times 5$ square grid with the robot always at the center). Initially, the robot and person
are at the same position. Both the robot and the person can perform five motion actions
$\{NoAction, North, East, South, West\}$. The person follows a fixed stochastic policy (stationary
over space and time), but the parameters of this behavior are unknown. The robot perceives
observations indicating the person's position relative to the robot:
$\{Same, North, East, South, West, Unseen\}$. The robot perceives the correct observation with
$\Pr = 0.8$, and $Unseen$ with $\Pr = 0.2$. The reward is $R = +1$ if the robot and person are at
the same position (central grid cell), $R = 0$ if the person is one cell away from the robot, and
$R = -1$ if the person is two cells away. The task terminates if the person reaches a distance of 3
cells from the robot, which also incurs a reward of $-20$. We use a discount factor of 0.9.
When formulating the BAPOMDP, the robot's motion model (deterministic), the observation
probabilities and the rewards are assumed to be known. We maintain a separate count vector for each
person, representing the number of times they move in each direction, i.e.
$\phi^1 = (\phi^1_{NA}, \phi^1_N, \phi^1_E, \phi^1_S, \phi^1_W)$ and
$\phi^2 = (\phi^2_{NA}, \phi^2_N, \phi^2_E, \phi^2_S, \phi^2_W)$. We assume a prior
$\phi^1_0 = (2, 3, 1, 2, 2)$ for person 1 and $\phi^2_0 = (2, 1, 3, 2, 2)$ for person 2, while in
reality person 1 moves with probabilities $\Pr = (0.3, 0.4, 0.2, 0.05, 0.05)$ and person 2 with
$\Pr = (0.1, 0.05, 0.8, 0.03, 0.02)$. We run 200 simulations, each consisting of 100 episodes (of
at most 10 time steps). The count vectors' distributions are reset after every simulation, and the
target person is reset after every episode. We use a 2-step lookahead search for planning in the
BAPOMDP.
Figures 4 and 5 show how the average return and model accuracy evolve over the 100 episodes
(averaged over the 200 simulations) with different belief approximations. Figure 6 compares the
planning time taken by each approach. We observe from these figures that the results for the
Weighted Distance approximation are much better both in terms of return and model accuracy, even
with fewer particles (16). Monte Carlo fails to provide any improvement over the prior model, which
indicates it would require many more particles. Running Weighted Distance with 16 particles
requires less time than both Monte Carlo and Most Probable with 64 particles, showing that it can
be more time-efficient for the performance it provides in complex environments.
Figure 4: Return with different belief approximations.
Figure 5: Model accuracy with different belief approximations.
Figure 6: Planning time with different belief approximations.
6 Conclusion
The objective of this paper was to propose a preliminary decision-theoretic framework for learning
and acting in POMDPs under parameter uncertainty. This raises a number of interesting challenges,
including (1) defining the appropriate model for POMDP parameter uncertainty, (2) approximating
this model while maintaining performance guarantees, (3) performing tractable belief updating, and
(4) planning action sequences which optimally trade-off exploration and exploitation.
We proposed a new model, the Bayes-Adaptive POMDP, and showed that it can be approximated
to ǫ-precision by a finite POMDP. We provided practical approaches for belief tracking and online
planning in this model, and validated these using two experimental domains. Results in the Follow
problem showed that our approach is able to learn the motion patterns of two (simulated)
individuals. This suggests interesting applications in human-robot interaction, where it is often
essential to be able to reason and plan under parameter uncertainty.
Acknowledgments
This research was supported by the Natural Sciences and Engineering Research Council of Canada
(NSERC) and the Fonds Québécois de la Recherche sur la Nature et les Technologies (FQRNT).
References
[1] R. Dearden, N. Friedman, and N. Andre. Model based Bayesian exploration. In UAI, 1999.
[2] M. Duff. Optimal Learning: Computational Procedure for Bayes-Adaptive Markov Decision Processes.
PhD thesis, University of Massachusetts, Amherst, USA, 2002.
[3] P. Poupart, N. Vlassis, J. Hoey, and K. Regan. An analytic solution to discrete Bayesian reinforcement
learning. In Proc. ICML, 2006.
[4] R. Jaulmes, J. Pineau, and D. Precup. Active learning in partially observable Markov decision processes.
In ECML, 2005.
[5] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic
domains. Artificial Intelligence, 101:99–134, 1998.
[6] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: an anytime algorithm for POMDPs. In
IJCAI, pages 1025–1032, Acapulco, Mexico, 2003.
[7] M. Spaan and N. Vlassis. Perseus: randomized point-based value iteration for POMDPs. JAIR, 24:195–
220, 2005.
[8] T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In UAI, Banff, Canada, 2004.
[9] S. Paquet, L. Tobin, and B. Chaib-draa. An online POMDP algorithm for complex multiagent environ-
ments. In AAMAS, 2005.
[10] J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. JAIR, 15:319–350, 2001.
[11] S. Ross, B. Chaib-draa, and J. Pineau. Bayes-Adaptive POMDPs. Technical Report SOCS-TR-2007.6,
McGill University, 2007.
[12] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods In Practice. Springer, 2001.