Partially Observed Markov Decision Processes: From Filtering to Control

Credits

6 hp / ECTS

Objectives

The course is aimed at PhD-level students in signal processing, control and telecommunications.

The goal is to learn useful tools for the estimation and optimization of stochastic dynamical systems. In particular, we will study discrete-time Markov decision processes.

The course will deal primarily with the synthesis of algorithms and with structural results. Examples will be drawn mainly from signal processing, telecommunications and sensor networks. For more details, see the lecture plan.

Examiner and course responsible

Vikram Krishnamurthy, vikramk@ece.ubc.ca.

Prerequisites

An engineering-level background in probability and random processes is adequate. No measure theory is required.

Literature

Much of the material on POMDPs has appeared recently in papers in IEEE Transactions on Signal Processing, IEEE Transactions on Information Theory, IEEE Transactions on Automatic Control, and Mathematics of Operations Research.

Dates (at 15:15 - 17:00, in Q2, Osquldas väg 10, floor 2, KTH)

- Tuesday April 9

- Friday April 12

- Tuesday April 16

- Friday April 19

- Monday April 22 (new time)

- Tuesday April 23

- Friday April 26

- Monday April 29

- Friday May 3

- Monday May 6 (new time)

- Tuesday May 14

- Friday May 17

Outline

The course totals 18 hours.

1. Stochastic State Space Models and Stochastic Simulation [3 hours], intro slides, part1 slides, homework1

* Stochastic Dynamical Systems

* Markov Models, Perron-Frobenius Theorem, Geometric Ergodicity

* Linear Gaussian Models

* Jump Markov Linear Systems and Target Tracking

* Stochastic Simulation: Acceptance-Rejection, Composition Method, Simulation-based Optimal Predictors

2. Optimal Filtering [4 hours], filter slides, homework2, ML parameter estimation

* What is Optimal State Estimation? (Conditional mean minimizes Bregman loss functions)

* Optimal Filtering: Bayes’ Recursion

* Optimal Predictors and Smoothers

* Kalman Filter

* Hidden Markov Model (HMM) Filter

* Geometric Forgetting of Initial Condition in Optimal Filter

* Particle Filter

* Data Augmentation Algorithm

* Reference Probability Method for Filtering

3. Filtering with Non-standard Information Patterns [2 hours]

* Non-universal Filters for the State

* Filtering with Social Learning

* Filtering of Reciprocal Markov Random Fields

4. Fully Observed Markov Decision Processes [2 hours], fullmdp slides

* Problem Formulation

* Stochastic Dynamic Programming

* Supermodularity and Structural Results

* Constrained Markov Decision Processes

5. Partially Observed Markov Decision Processes [3 hours], partialmdp slides, homework3

* Problem Formulation

* Information State

* Stochastic Dynamic Programming and Algorithms

6. Structural Results for POMDPs [4 hours]

* Stochastic Orders

* Stochastic Dominance of Filters

* Lattice Programming

* Example 1: Quickest Detection with Optimal Sampling

* Example 2: Optimized Social Learning

* Example 3: Global Games

* Multi-armed Bandits

Evaluation (pass/fail)

Homework assignments and a final exam.