Making the Most of MCMC

Sometimes in grad school you need to write about topics that you yourself have little to no clue about. Part of this learning process is figuring out how to teach yourself some of these very difficult concepts. This blog post comes from a blog post I co-wrote with my cohort chum, Justin, 
By: Justin and Meridith

Markov Chains, and particularly Markov Chains Monte Carlo, are a difficult concept to explain. In fact, Dr. Hanks has stated that they are “Easier done than said.” At the very basis of everything, Markov Chains are a system that transitions from one state to another state. It is a random memorylessness process, that is,  the next state depends only on the current state and not on the sequence of events that preceded it. I have scoured the web and believe the following to be the simplest visual introduction to Markov Chains. (Spoiler Alert: It arose from someone – Andrey Markov –  being a sassmaster.)

From here, it’s easy to start to gain an appreciation for the wide breadth of applications available for Markov Chains. However, if we want to transition, as it were, from Markov Chains to Markov Chain Monte Carlo simulations, we must first explore Monte Carlo methods. These methods are a class of computational algorithms that utilize repeated random sampling (simulations) to obtain the distribution of an unknown probabilistic entity. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam (coolest name ever), while he was working on nuclear weapons projects at the Los Alamos National Laboratory (think Manhattan Project). It was named by Nicholas Metropolis, after the Monte Carlo Casino, where Ulam’s uncle often gambled. Because reasons, apparently.  Peter Muller’s article gives a brief introduction of Markov Chain Monte Carlo simulation, a method that enables the simulation of Bayesian posterior distributions and thus facilitates the use of Bayesian inference.  According to Muller, the goal of MCMC is to set up a Markov Chain with an ergodic distribution and some initial state, where the term ergodic indicates that there is a non-zero probability of the process passing from one state to any other state at each step. Starting at the initial state, transitions (from one state to another) are simulated and the simulated states recorded. The ergodic sample average of simulated states will then converge to the value of the desired posterior integral. Muller notes that two conditions must be met in order to use the resulting integral estimates:

  1. As the number of simulated transitions, M approaches infinity, the chain must converge to the desired posterior distribution
  2. Some diagnostic must be found to determine when practical convergence occurs, i.e. when a sufficient number of simulations have been performed.

The first prerequisite, theoretical convergence, can be reduced to meeting the following three criteria: irreducibility, aperiodicity, and invariance. The second and more ambiguous of the two conditions—a criterion for practical convergence—has several proposed solutions in the literature. For example, Gelman and Rubin (1992) developed an “ANOVA type statistic [for considering] several independent parallel runs of the MCMC simulation,” and Geweke (1992) has suggested a comparison of an early-iterations ergodic average to an ergodic average of later iterations. However, Muller also suggests the simpler method of visual diagnosis via plotting the states for each iteration against the iteration number to judge convergence. 

Markov Chains and MCMC have many useful applications, ranging across a wide spectrum of fields. One such interesting application is in the game of baseball. When viewing a half-inning in a baseball game, there are 28 possible states based on number of outs–0, 1, 2, or 3–and runners on base–different combinations of having no runners, or having runners on first, second, and/or third base (see for a more detailed description of the transition matrix). This gives us a 28×28 transition matrix filled with the probabilities of being in each respective state. From here, we could calculate the expected value of runs scored from each state and analyze how this expectation changes from state to state. We could also extend this to analyzing the probability of scoring a single run by defining a slightly different transition matrix (again, see the link provided above for more detail). Due to its usefulness, MCMC has become a common tool for baseball analysts and sabermetricians (Editor’s Note: Totally had to Google sabermetrician.  I’m feeling I got short changed a little in the job naming category).

Another useful application is in the field of ecology. A useful paper, An Application of Markov Chain Monte Carlo to Community Ecology, serves as a wonderful walkthrough of MCMC with the easily conceptualized example of community assemblages (presence/absence) of birds among islands. As with our stated requirements for MCMC, the next state of bird species’ distribution among a set of islands only depends on their current state distribution. The article does a great jobs of connecting the dots from the ecological concept (birds disperse among islands, possibly due to competition) to an ecological question (given the starting state and some measures of competition among species, what is the probability that a random matrix will exhibit the same level of competition) to mathematical challenge (applying MCMC) and then loops the results back around to answer the ecological question (the distribution of finch species among the Galapagos shows evidence of competition!). If you’re feeling extra badass and want to make the jump from learning about MCMC to coding some examples in R be sure to check out the following blog posts!