Jaakkola, Singh, Jordan, 1994. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.