Markov chain: a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event
defining property: a stochastic process (X_t)_{t\in\N_0} on a countable (finite or countably infinite) state space E is called a Markov chain if for every n \in \N and for every i,j,i_0,...,i_{n-1} \in E such that P(X_0 = i_0,...,X_n=i) > 0, it holds that P(X_{n+1}=j\;|\;X_0 = i_0,...,X_n=i) = P(X_{n+1}=j\;|\;X_n=i)
homogeneity: the transition probabilities are independent of the time t
formally: the Markov chain (X_t)_{t\in\N_0} is homogeneous if for every i,j\in E and every n,m\in\N, if P(X_{n-1}=i) > 0 and P(X_{m-1}=i) > 0, then P(X_n=j\;|\;X_{n-1}=i) = P(X_m=j\;|\;X_{m-1}=i)
stochastic process: a sequence (X_t)_{t\in\N_0} of random variables (t: time), all defined on the same probability space (\Omega, \mathcal{F}, P), all taking values in the same (finite or countably infinite) space E
in other words: a sequence \{X(t) : t \in T\}, with t meaning "(discrete) time" here
notation: X_t = i means that at time t, the process is in state i
stochastic matrix: \Pi\in[0,1]^{E\times E} with \sum_{j\in E} \Pi(i,j) = 1 for all i\in E (rows sum to 1)
doubly stochastic: \Pi stochastic and \sum_{i\in E} \Pi(i,j) = 1 for all j\in E (columns sum to 1 as well)
note¹: if a stochastic matrix is symmetric, it is also doubly stochastic
note²: for a doubly stochastic matrix on a finite state space, the stationary distribution is the uniform distribution (numerical check below)
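A quick numerical check of note², a sketch with an arbitrary 3×3 example matrix (not from the source):

```python
import numpy as np

# Example doubly stochastic matrix: rows and columns each sum to 1.
Pi = np.array([[0.5, 0.3, 0.2],
               [0.2, 0.5, 0.3],
               [0.3, 0.2, 0.5]])

uniform = np.full(3, 1/3)                  # uniform distribution on E = {0,1,2}
print(np.allclose(uniform @ Pi, uniform))  # True: uniform is stationary
```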
transition matrix: a square matrix used to describe the transitions of a homogeneous Markov chain
formally: \Pi\in[0,1]^{E\times E} stochastic with \Pi(i,j) = P(X_{n+1}=j\;|\;X_n=i) for all i,j\in E and n \in \N_0 with P(X_n = i) > 0
Existence, Markov Property
initial distribution: the distribution of X_0
formally: \mu: E \to \R, \mu(i) := P(X_0 = i)
\forall i: \mu(i) \geq 0
\sum_{i\in E} \mu(i) = 1
existence theorem: let \mu be a distribution on E and let \Pi \in [0,1]^{E\times E} be a stochastic matrix; then there exists a homogeneous Markov chain (X_t)_{t\in\N_0} with initial distribution \mu and transition matrix \Pi
lemma: let (X_t)_{t\in\N_0} be a stochastic process on E and let \Pi \in [0,1]^{E\times E} be a stochastic matrix; then (X_t)_{t\in\N_0} is a homogeneous Markov chain with transition matrix \Pi if and only if for all n \in \N and all i,j,i_0,...,i_{n-1} \in E such that P(X_0 = i_0, ..., X_n = i) > 0, it holds that P(X_{n+1}=j\;|\;X_0 = i_0,...,X_n=i) = \Pi(i,j)
random-mapping representation: every homogeneous Markov chain can be realized as X_{n+1}=f(X_n, Z_{n+1})
formally: let Z_n, n\in \N, be iid random variables taking values in F, let E be a countable state space, let f : E \times F \to E be a measurable function and let X_0: \Omega \to E be a random variable independent of (Z_n); set X_{n+1} = f(X_n, Z_{n+1}) for all n \in \N_0; then (X_n)_{n\in\N_0} is a homogeneous Markov chain on E with transition matrix \Pi(i,j) = P(f(i,Z_1)=j) for all i,j\in E
expanded: X_1 = f(X_0, Z_1);\; X_2=f(X_1,Z_2) = f(f(X_0,Z_1),Z_2) and so on...
on [0,1]: let E be a countable state space and let \Pi\in[0,1]^{E \times E} be a stochastic matrix; let Z_n, n\in \N, be iid, uniformly distributed on [0,1]; then there is a measurable function f : E \times [0,1] \to E (e.g. the inverse-CDF map of the rows of \Pi) such that, setting X_0 = i_0 and X_{n+1} = f(X_n, Z_{n+1}) for all n \in \N_0, the process (X_n)_{n\in\N_0} is a homogeneous Markov chain on E with transition matrix \Pi and P(X_0=i_0) = 1 (simulation sketch below)
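A minimal simulation sketch of this construction (function names like `simulate` are mine, not from the source): f(i, u) is the inverse-CDF map picking the smallest j whose cumulative row probability reaches u.

```python
import numpy as np

def simulate(Pi, i0, n_steps, rng=np.random.default_rng(0)):
    """Simulate a Markov chain via X_{n+1} = f(X_n, Z_{n+1}) with Z_n ~ U[0,1]."""
    cum = np.cumsum(Pi, axis=1)   # row-wise CDFs
    path = [i0]
    for _ in range(n_steps):
        z = rng.uniform()         # Z_{n+1} ~ U[0,1], iid
        # f(i, z): smallest j with cum[i, j] >= z
        path.append(int(np.searchsorted(cum[path[-1]], z)))
    return path

Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
print(simulate(Pi, i0=0, n_steps=10))
```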
corollary: if E is countable and \Pi\in[0,1]^{E \times E} is a stochastic matrix, then there exists a homogeneous Markov chain with transition matrix \Pi
Markov property: no matter what happened before time m, once we know X_m = k, the process restarts at k with the same law as the original chain started from k, dropping all history before m
in other words: the future only depends on the present
formally: let (X_n)_{n\in \N_0} be a Markov chain with transition matrix \Pi; fix m \in \N and k \in E such that P(X_m = k) > 0; then, under \tilde P := P(\cdot\;|\;X_m = k), the sequence (\tilde X_n := X_{n+m})_{n \in \N_0} is a Markov chain with transition matrix \Pi and starting distribution \delta_k (Dirac measure), independent of X_0,...,X_m
consequence (conditional independence of past and future): for any past event A determined by X_0,...,X_m and any future event B determined by X_m, X_{m+1},..., it holds that P(A \cap B \;|\; X_m = k) = P(A\;|\; X_m = k)\,P(B\;|\; X_m = k)
Finite Dimensional Distributions
what defines a Markov chain?: let E be a countable state space, let \Pi \in [0,1]^{E\times E} be a stochastic matrix, let \mu be a distribution on E, let (X_n)_{n \in \N_0} be a stochastic process on E; then the following are equivalent:
(X_n)_{n\in \N_0} is a Markov chain with transition matrix \Pi and initial distribution \mu
\forall n \in \N_0, i_0,...,i_n \in E: P(X_0 = i_0,...,X_n=i_n)=\mu(i_0)\Pi(i_0,i_1)\cdots\Pi(i_{n-1},i_n)
uniqueness in distribution: \mu and \Pi uniquely determine the distribution of a Markov chain
formally: let \mu be a distribution on E and \Pi \in [0,1]^{E\times E} be a stochastic matrix on E; then two Markov chains (X_n)_{n \in \N_0} and (Y_n)_{n \in \N_0}, both with initial distribution \mu and transition matrix \Pi, have the same distribution
Markov chain distribution at time n: let (X_n)_{n \in \N_0} be a Markov chain with initial distribution \mu and transition matrix \Pi; then for all n\in \N_0, the distribution of X_n is \mu^n = \mu\Pi^n (computation sketch below)
\mu^n: row vector (probability distribution of the chain at time n), \Pi^n: n-th power of \Pi (n-step transition probability matrix)
\mu^0 = \mu, \Pi^0 = I
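A sketch of computing \mu^n = \mu\Pi^n numerically (the two-state matrix is an arbitrary example):

```python
import numpy as np

Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
mu = np.array([1.0, 0.0])                    # initial distribution, as a row vector

mu_n = mu @ np.linalg.matrix_power(Pi, 10)   # distribution of X_10
print(mu_n, mu_n.sum())                      # a probability vector: sums to 1
```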
corollary: let (X_n)_{n \in \N_0} be a Markov chain with transition matrix \Pi; then for all m,n \in \N_0 and i,j \in E such that P(X_m = i) > 0, it holds that P(X_{m+n}=j\;|\;X_m=i)=\Pi^n(i,j) (see Markov property)
reachability: a state j is reachable from i if there exists n \in \N_0 such that \Pi^n(i,j) > 0
write: i \to j
formally: i \to j \iff \exists n \in \N_0: \Pi^n(i,j) > 0 \iff \sum_{n=0}^\infty\Pi^n(i,j) > 0
tip: fix i_1,...,i_{n-1}\in E, then \Pi^n(i,j) \geq \Pi(i,i_1)\cdots\Pi(i_{n-1},j) always; if the RHS is >0, then you have proven \Pi^n(i,j)>0
communication: i \leftrightarrow j \iff i \to j and j \to i; \leftrightarrow is an equivalence relation; the equivalence classes E/\leftrightarrow are called communication classes
irreducibility: a Markov chain is called irreducible if it has only one communication class, otherwise it is reducible
in other words: a Markov chain is irreducible if every state communicates with every other state
formally: \forall i,j\in E: i \to j
equivalently: \forall i,j \in E \;\exists n \in \N,\; i = i_0,i_1,...,i_{n-1},i_n=j: \Pi(i_0,i_1)\cdots\Pi(i_{n-1},i_n)>0
equivalent: \forall i,j \in E, i \neq j \;\exists n = n(i,j) \in \N : \Pi^n(i,j)>0 (reachability check below)
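Reachability only depends on which entries of \Pi are positive, so for finite E it reduces to a breadth-first search on the directed graph with an edge i \to j whenever \Pi(i,j) > 0 (a sketch; helper names are mine):

```python
import numpy as np
from collections import deque

def reachable(Pi, i):
    """Set of states j with i -> j (i -> i always holds, via n = 0)."""
    seen, queue = {i}, deque([i])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(Pi[u] > 0):   # one-step successors of u
            if v not in seen:
                seen.add(int(v)); queue.append(int(v))
    return seen

def is_irreducible(Pi):
    n = Pi.shape[0]
    return all(len(reachable(Pi, i)) == n for i in range(n))
```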
closed set: a non-empty set C \subseteq E is closed if the chain cannot leave it
formally: C \subseteq E closed \iff \forall i \in C: \sum_{j \in C}\Pi(i,j)=1
theorem: let (X_n)_{n\in\N_0} be an irreducible Markov chain with period d; then for all i,j \in E, there exist m = m(i,j) \in \N_0 and n_0 = n_0(i,j) \in \N_0 such that for all n \geq n_0: \Pi^{m+nd}(i,j)>0
choose m=0 if i=j
special case (corollary): let (X_n)_{n\in\N_0} be an irreducible, aperiodic Markov chain on a finite state space E; then there exists n_0 \in \N_0 such that for all i,j \in E and all n \geq n_0: \Pi^n(i,j)>0 (i.e. you can get from i to j in every sufficiently large number of steps)
lemma: let A \subseteq \N such that \gcd(A) = 1 and a+b\in A whenever a,b\in A; then there exists n_0 \in \N such that n \in A for all n \geq n_0
partitioning: let (X_n)_{n\in\N_0} be an irreducible Markov chain with period d; then there exists exactly one partition C_0,...,C_{d-1} of E such that for all k \in \{0,...,d-1\} and i \in C_k: \sum_{j \in C_{k+1}} \Pi(i,j) = 1, where C_d = C_0
in other words: choose a state i_0, group all states by "distance mod d" from i_0
in block form: \Pi^n is also a block matrix, and \Pi^{nd} is a block diagonal matrix (i.e. the diagonal blocks are square matrices)
Stationary Distributions
stationary distribution: a probability distribution that, once reached, remains unchanged over time as the chain evolves
formally: a probability measure \alpha is called stationary for a Markov chain with transition matrix \Pi if for all i\in E: \alpha(i)=\sum_{j\in E}\alpha(j)\Pi(j,i)
matrix form: \alpha\Pi = \alpha, with \alpha as a row vector (solver sketch below)
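For finite E, \alpha is a left eigenvector of \Pi for eigenvalue 1, normalized to sum to 1; one sketch replaces the eigenproblem by a least-squares linear system (function name is mine):

```python
import numpy as np

def stationary(Pi):
    """Solve alpha Pi = alpha with sum(alpha) = 1 for a finite stochastic matrix."""
    n = Pi.shape[0]
    # (Pi^T - I) alpha^T = 0, plus one extra row enforcing sum(alpha) = 1
    A = np.vstack([Pi.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha

Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
print(stationary(Pi))   # [5/6, 1/6]
```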
theorem: if the initial distribution \mu of a Markov chain (X_n)_{n \in \N_0} is stationary, then for all n \in \N_0 and A \subseteq E: P_\mu(X_n \in A) = \mu(A)
in other words: \mu\Pi^n = \mu^n = \mu
in particular, for a stationary \alpha: P_\alpha(X_n = i) = \alpha(i) for all n\in \N_0, i \in E
if a Markov chain has 2 stationary distributions, then it has infinitely many
formally: if \alpha, \beta are stationary, then so is every \mu \in \{\lambda\alpha+(1-\lambda)\beta:\lambda\in(0,1)\}
if |E| < \infty (or E countably infinite and the chain positive recurrent):
exactly one closed communication class \iff exactly one stationary distribution
two or more closed communication classes \iff infinitely many stationary distributions
reverse transitions (time-reversed chain): let (X_n)_{n\in\N_0} be a Markov chain with transition matrix \Pi and stationary distribution \alpha such that \alpha(i)>0 for all i\in E; define \Pi'(i,j):=\frac{\alpha(j)\Pi(j,i)}{\alpha(i)}; then for all i,j\in E, n \in \N_0: \Pi'(i,j)=P_\alpha(X_n=j\;|\;X_{n+1}=i) are the backwards (reverse) transition probabilities
reversibility: let (X_n)_{n\in\N_0} be a Markov chain with transition matrix \Pi; a distribution \alpha on E is called reversible if \alpha(i)\Pi(i,j) = \alpha(j)\Pi(j,i) for all i,j\in E (detailed balance; check below)
the Markov chain is called reversible if it has a reversible distribution
theorem: every reversible distribution is stationary
note: if (X_n)_{n\in\N_0} is reversible and \alpha(i)>0 for all i, then P_\alpha(X_n = j \;|\; X_{n+1}=i)=\Pi(i,j)=P_\alpha(X_{n+1}=j\;|\;X_n = i) (so if we start from \alpha, the forwards and backwards transition probabilities are the same)
Kolmogorov's criterion: an irreducible, positive recurrent, aperiodic Markov chain with transition matrix \Pi is reversible iff \Pi(i_1,i_2)\Pi(i_2,i_3)\cdots\Pi(i_n,i_1) = \Pi(i_1,i_n)\Pi(i_n,i_{n-1})\cdots\Pi(i_2,i_1) for all i_1,...,i_n\in E
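Detailed balance is a finite set of pairwise equations, so for finite E reversibility of a given \alpha can be checked directly (a sketch; the birth-death example is mine):

```python
import numpy as np

def is_reversible(Pi, alpha, tol=1e-12):
    """Check detailed balance: alpha(i) Pi(i,j) == alpha(j) Pi(j,i) for all i, j."""
    balance = alpha[:, None] * Pi       # entry (i,j) = alpha(i) Pi(i,j)
    return np.allclose(balance, balance.T, atol=tol)

# Birth-death chains are reversible; e.g. a lazy walk on {0,1,2} with reflection:
Pi = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.5, 0.5]])
alpha = np.full(3, 1/3)            # uniform (Pi is symmetric, hence doubly stochastic)
print(is_reversible(Pi, alpha))    # True
```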
Strong Markov Property
\sigma-algebra: given a set X, a collection \mathcal A \subseteq \mathcal P(X) of subsets of X is called a \sigma-algebra if:
1. contains the universe: X \in \mathcal A (and, by 2., the empty set \emptyset \in \mathcal A)
2. closed under complementation: if A \in \mathcal A, then X \backslash A = A^c \in \mathcal A
3. closed under countable unions: if A_n \in \mathcal A for all n \in \N, then \bigcup_{n=1}^\infty A_n \in \mathcal A
filtration: a growing sequence of information where past information does not get lost over time (it accumulates)
formally: let (\Omega, \mathcal F, P) be a probability space; a sequence \mathcal F_n \subseteq \mathcal F for n \in \N_0 is called a filtration if \mathcal F_n \subseteq \mathcal F_{n+1} for all n \in \N_0
natural filtration: all the information generated by the chain up to time n (i.e. the exact states of the chain, whether the chain visited certain subsets of the state space, complements, functions of the past etc., but not the model parameters themselves)
formally: let (X_n)_{n \in \N_0} be a Markov chain on E; define \mathcal F_n := \sigma(X_0,...,X_n) as the smallest \sigma-algebra containing all events of the type X_t^{-1}(A) for A \subseteq E and t\in\{0,...,n\}; then (\mathcal F_n)_{n\in\N_0} is the natural filtration of (X_n)_{n \in \N_0}
stopping time: the event of stopping at time n only depends on what happened up to that time
formally: a random variable \tau : \Omega \to \N_0 \cup \{\infty\} is called a stopping time with respect to the natural filtration (\mathcal F_n)_{n\in\N_0} if for all n \in \N_0: \{\tau = n\} \in \mathcal F_n (i.e. can you rewrite the event in a way that depends on X_0,...,X_n only?)
stopped Markov chain: let (X_n)_{n \in \N_0} be a Markov chain and let \tau be a stopping time w.r.t. the natural filtration (\mathcal F_n)_{n\in\N_0}; define a \land b := \min(a,b) for a,b\in \R; the stopped Markov chain is (X_{n \land \tau})_{n \in \N_0} with X_{n \land \tau} = \begin{cases}X_n & \text{if } n \leq \tau\\ X_\tau & \text{if } n \geq \tau\end{cases} (simulation sketch below)
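A simulation sketch of a stopped chain where \tau is the first hitting time of a target set (a stopping time, since \{\tau = n\} depends only on X_0,...,X_n; names and the example matrix are mine):

```python
import numpy as np

def run_stopped(Pi, i0, target, n_steps, rng=np.random.default_rng(0)):
    """Return (X_{n /\ tau})_{n=0..n_steps} with tau = inf{n >= 0 : X_n in target}."""
    x, path = i0, [i0]
    stopped = i0 in target
    for _ in range(n_steps):
        if not stopped:
            x = rng.choice(Pi.shape[0], p=Pi[x])   # one transition step
            stopped = x in target
        path.append(x)                             # frozen at X_tau once stopped
    return path

Pi = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0]])
print(run_stopped(Pi, i0=0, target={2}, n_steps=10))
```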
strong Markov property: generalization of the Markov property to random stopping times; the Markov property still holds even if you restart the process at a random stopping time \tau
formally: let (X_n)_{n \in \N_0} be a Markov chain with transition matrix \Pi, let \tau be a stopping time w.r.t. the natural filtration (\mathcal F_n)_{n\in\N_0}; fix k \in E such that P(\tau < \infty, X_\tau = k) > 0; then, under \tilde P := P(\cdot\;|\;\tau < \infty, X_\tau = k), the sequence (\tilde X_n := X_{n+\tau})_{n \in \N_0} is a Markov chain with transition matrix \Pi and starting distribution \delta_k (Dirac measure), independent of (X_{n \land \tau})_{n \in \N_0}
if \tau = \infty, choose (\tilde X_n) arbitrarily
Recurrence & Transience
recurrence and transience: a state i \in E is called...
...recurrent, if P_i(T_i<\infty) = 1, where T_i := \inf\{n \geq 1 : X_n = i\} is the first return time to i
in other words: starting from i, from wherever you can go there is always a path returning to i, and the chain returns in finite time with probability 1
positive recurrent: recurrent and E_i[T_i] < \infty
null recurrent: recurrent and E_i[T_i]=\infty
theorem: a state i \in E is recurrent if and only if \sum_{n=0}^\infty\Pi^n(i,i)=\infty
...transient, if P_i(T_i<\infty) < 1
in other words: starting from i, there is at least one path such that, if you take it, you will never return to i
if i is transient and E is finite, then \alpha(i) = 0 for every stationary distribution \alpha
hitting probability: h_i^A := P_i(H_A < \infty), where H_A := \inf\{n \geq 0 : X_n \in A\}; h_i^A = 0 if A is unreachable from i (e.g. an absorbing state elsewhere, or a state outside a closed set containing i)
mean hitting time vector: k^A=(k_i^A)_{i \in E} with, for each starting state i, k_i^A = E_i[H_A] = \text{expected number of steps to hit } A \text{ from } i (linear-system sketch below)
k_i^A = \infty if A is unreachable from i (i.e. P_i(H_A < \infty) = 0)
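For finite E, the mean hitting times solve the first-step linear system k_i^A = 0 for i \in A and k_i^A = 1 + \sum_j \Pi(i,j) k_j^A for i \notin A; a sketch assuming A is reachable from every state outside it, so all k_i^A are finite:

```python
import numpy as np

def mean_hitting_times(Pi, A):
    """Solve k_i = 0 for i in A, k_i = 1 + sum_j Pi(i,j) k_j otherwise."""
    n = Pi.shape[0]
    M, b = np.eye(n), np.zeros(n)
    for i in range(n):
        if i not in A:
            M[i] -= Pi[i]        # row encodes: k_i - sum_j Pi(i,j) k_j = 1
            b[i] = 1.0
    return np.linalg.solve(M, b)

Pi = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0]])
print(mean_hitting_times(Pi, A={2}))   # [6., 4., 0.]
```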
invariant measure: a function \alpha : E \to \R is called an invariant measure for a Markov chain (X_n)_{n \in \N_0} with transition matrix \Pi if:
\forall i \in E: \alpha(i) \in [0,\infty)
\exists i \in E: \alpha(i) > 0 (i.e. \alpha is not the null function)
\forall i \in E: \alpha(i) = \sum_{j\in E}\alpha(j)\Pi(j,i) (i.e. \alpha\Pi = \alpha)
if, additionally, \sum_{i\in E}\alpha(i) = 1, then this is the stationary (invariant) distribution
Markov chain recurrent \implies it has an invariant measure
Markov chain recurrent + irreducible \implies it has a unique invariant measure (up to multiplication by a constant)
existence theorem + construction: let (X_n)_{n \in \N_0} be an irreducible, recurrent Markov chain with transition matrix \Pi and state space E; then (X_n)_{n \in \N_0} has an invariant measure which can be constructed as follows (Monte Carlo sketch below):
pick an element 0 \in E (any element, call it "0")
let T_0 := \inf\{n \geq 1:X_n = 0\} (first return time to 0)
write \alpha(i) = E_0[\sum_{n=1}^\infty 1_{\{X_n=i\}}1_{\{n \leq T_0\}}] = E_0[\sum_{n=1}^{T_0}1_{\{X_n=i\}}] (expected number of visits to i between two visits to 0)
equivalently: \alpha(i) = \sum_{n=1}^\infty P_0(X_n = i, n \leq T_0)
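A Monte Carlo sketch of the construction: simulate excursions from state 0 and average the visit counts to each i (helper name and example matrix are mine; note \alpha(0) = 1 by construction, since each excursion visits 0 exactly once, at time T_0):

```python
import numpy as np

def invariant_by_excursions(Pi, n_excursions, rng=np.random.default_rng(0)):
    """Estimate alpha(i) = E_0[# visits to i in {1,...,T_0}] by simulation."""
    n = Pi.shape[0]
    visits = np.zeros(n)
    for _ in range(n_excursions):
        x = 0
        while True:                      # one excursion: step until return to 0
            x = rng.choice(n, p=Pi[x])
            visits[x] += 1
            if x == 0:
                break
    return visits / n_excursions

Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
print(invariant_by_excursions(Pi, 20_000))  # approx [1.0, 0.2], i.e. alpha/alpha(0)
```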
uniqueness up to a constant: let (X_n)_{n \in \N_0} be an irreducible, recurrent Markov chain with transition matrix \Pi, let \alpha, \beta be invariant measures for (X_n)_{n \in \N_0}; then \exists C > 0: \alpha = C\beta (unique invariant measure up to multiplication by a constant)
theorem: let (X_n)_{n \in \N_0} be an irreducible, recurrent Markov chain; then for all i,j \in E: P_i(T_j<\infty) = 1
corollary: an irreducible Markov chain is positive recurrent if and only if it has a stationary distribution
then, this stationary distribution \alpha is unique with \alpha(i) > 0 for all i\in E
theorem: let (X_n)_{n \in \N_0} be an irreducible, positive recurrent Markov chain with stationary distribution \alpha; then \alpha(i) = \frac{1}{E_i[T_i]} for all i \in E
theorem: every irreducible Markov chain with a finite state space is positive recurrent
reversed transition matrix: let (X_n)_{n \in \N_0} be a Markov chain with transition matrix \Pi, let \alpha be an invariant measure for (X_n)_{n \in \N_0} with \alpha(i) > 0 for all i \in E; define the \alpha-reversed transition matrix as \Pi^\alpha(i,j)=\frac{\alpha(j)}{\alpha(i)}\Pi(j,i)
\Pi^\alpha is a stochastic matrix
if (X_n)_{n \in \N_0} is recurrent, then so is a Markov chain with transition matrix \Pi^\alpha
let (X_n)_{n=0}^N be a Markov chain with initial distribution \delta_0, conditioned on X_N = 0; then (Y_n := X_{N-n})_{n=0}^N is a Markov chain with transition matrix \Pi^\alpha, initial distribution \delta_0, conditioned on Y_N = 0
in other words: (Y_n)_{n=0}^N is the time reversal of the Markov chain (X_n)_{n=0}^N, where we start at 0 and return to 0 at time N
corollary: assume (X_n) is recurrent and let (Z_n) have transition matrix \Pi^\alpha; then (Z_n) is also recurrent, and if you run both with initial distribution \delta_0 from time 0 to time T_0, then (Z_n)_{n=0}^{T_0} has the same distribution as (X_{T_0-n})_{n=0}^{T_0}
formally: let (X_n)_{n\in \N_0} be a Markov chain with initial distribution \delta_0 and transition matrix \Pi; let (Y_n)_{n\in \N_0} be a Markov chain with initial distribution \delta_0 and transition matrix \Pi^\alpha; if P(T_0 < \infty) = 1, then (Y_0,...,Y_{T_0}) has the same distribution as (X_{T_0},...,X_0)
Convergence
total variation metric: notion of convergence; the "distance" between \alpha and \beta in total variation
formally: let \alpha, \beta be distributions on E (countable); then the total variation distance is defined as d_{TV}(\alpha,\beta) = \frac12\sum_{i\in E}|\alpha(i) - \beta(i)| (i.e. half of the L^1 distance; computation sketch below)
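A one-liner sketch for finite E (function name is mine):

```python
import numpy as np

def tv_distance(alpha, beta):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * np.abs(np.asarray(alpha) - np.asarray(beta)).sum()

print(tv_distance([0.5, 0.5], [0.9, 0.1]))   # 0.4
```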
convergence theorem: let (X_n)_{n \in \N_0} be an irreducible, aperiodic, positive recurrent Markov chain with invariant distribution \alpha; then \lim_{n \to \infty}P_i(X_n = j) = \lim_{n \to \infty}\Pi^n(i,j) = \alpha(j) for all i,j \in E (numerical illustration below)
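A numerical illustration of the convergence theorem (a sketch reusing the earlier 2-state example, whose stationary distribution is (5/6, 1/6)): every row of \Pi^n approaches \alpha.

```python
import numpy as np

Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
for n in (1, 5, 20, 50):
    print(n, np.linalg.matrix_power(Pi, n))
# both rows converge to the stationary distribution [5/6, 1/6] ~ [0.833, 0.167]
```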
recipe: if the Markov chain is not irreducible, split it into its communication classes and look at the "restricted" transition matrices