Making the Most of MCMC

Sometimes in grad school you need to write about topics that you yourself have little to no clue about. Part of this learning process is figuring out how to teach yourself some of these very difficult concepts. This post is adapted from one I co-wrote with my cohort chum, Justin.
By: Justin and Meridith

Markov Chains, and particularly Markov Chain Monte Carlo, are difficult concepts to explain. In fact, Dr. Hanks has stated that they are “Easier done than said.” At its most basic, a Markov Chain is a system that transitions from one state to another. It is a random, memoryless process; that is, the next state depends only on the current state and not on the sequence of events that preceded it. I have scoured the web and believe the following to be the simplest visual introduction to Markov Chains. (Spoiler Alert: It arose from someone – Andrey Markov – being a sassmaster.)
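To make the "memoryless" idea concrete, here is a minimal sketch in Python. The two weather states and their transition probabilities are made up purely for illustration (they don't come from any source), and `step` and `simulate` are just names I picked. The thing to notice is that `step` only ever looks at the current state.

```python
import random

# Hypothetical two-state chain; these probabilities are invented for illustration.
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def step(state, rng):
    """Draw the next state using only the current state (memorylessness)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate(start, n_steps, seed=0):
    """Simulate n_steps transitions and return the full path of visited states."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path


path = simulate("sunny", 10_000)
frac_sunny = path.count("sunny") / len(path)
print(path[:10])
print(f"fraction of time sunny: {frac_sunny:.3f}")
```

For these particular made-up numbers, the long-run fraction of sunny days should settle near 5/6, no matter which state the chain starts in.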

From here, it’s easy to start to gain an appreciation for the wide breadth of applications available for Markov Chains. However, if we want to transition, as it were, from Markov Chains to Markov Chain Monte Carlo simulations, we must first explore Monte Carlo methods. These methods are a class of computational algorithms that use repeated random sampling (simulations) to approximate the distribution of an unknown probabilistic quantity. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam (coolest name ever), while he was working on nuclear weapons projects at the Los Alamos National Laboratory (think Manhattan Project). It was named by Nicholas Metropolis, after the Monte Carlo Casino, where Ulam’s uncle often gambled. Because reasons, apparently.

Peter Muller’s article gives a brief introduction to Markov Chain Monte Carlo simulation, a method that enables the simulation of Bayesian posterior distributions and thus facilitates the use of Bayesian inference. According to Muller, the goal of MCMC is to set up a Markov Chain with an ergodic distribution (the desired posterior) and some initial state, where the term ergodic indicates, roughly, that the process can eventually pass from any one state to any other state with non-zero probability. Starting at the initial state, transitions (from one state to another) are simulated and the simulated states recorded. The ergodic sample average of simulated states will then converge to the value of the desired posterior integral. Muller notes that two conditions must be met in order to use the resulting integral estimates:

  1. As the number of simulated transitions, M, approaches infinity, the chain must converge to the desired posterior distribution.
  2. Some diagnostic must be found to determine when practical convergence occurs, i.e. when a sufficient number of simulations have been performed.

The first prerequisite, theoretical convergence, can be reduced to meeting the following three criteria: irreducibility, aperiodicity, and invariance. The second and more ambiguous of the two conditions—a criterion for practical convergence—has several proposed solutions in the literature. For example, Gelman and Rubin (1992) developed an “ANOVA type statistic [for considering] several independent parallel runs of the MCMC simulation,” and Geweke (1992) has suggested a comparison of an early-iterations ergodic average to an ergodic average of later iterations. However, Muller also suggests the simpler method of visual diagnosis via plotting the states for each iteration against the iteration number to judge convergence. 
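Here is a rough sketch of what all of this can look like in code, written in Python. To be clear about assumptions: this is not Muller's code. I've swapped in a plain random-walk Metropolis sampler, the "posterior" is just a standard normal stand-in, and the step size, burn-in length, and starting points are arbitrary choices. The early-versus-late comparison is a crude nod to Geweke's idea (without his standardization), and `gelman_rubin` follows the usual within-chain versus between-chain variance recipe.

```python
import math
import random
import statistics


def log_target(x):
    """Log density of a standard normal, up to a constant (a stand-in posterior)."""
    return -0.5 * x * x


def metropolis(n_iter, start, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a nearby state, then accept it or stay put."""
    rng = random.Random(seed)
    x = start
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        chain.append(x)  # record the state whether or not we moved
    return chain


def gelman_rubin(chains):
    """Potential scale reduction factor from several equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


burn_in = 1_000                    # arbitrary choice for this sketch
starts = [-10.0, -3.0, 3.0, 10.0]  # deliberately spread-out initial states
chains = [metropolis(5_000, s, seed=i)[burn_in:] for i, s in enumerate(starts)]

first = chains[0]
early = statistics.fmean(first[: len(first) // 10])  # first 10% of kept draws
late = statistics.fmean(first[len(first) // 2:])     # last 50% of kept draws
rhat = gelman_rubin(chains)
posterior_mean = statistics.fmean([x for c in chains for x in c])

print(f"early vs. late average (chain 1): {early:.3f} vs. {late:.3f}")
print(f"Gelman-Rubin statistic: {rhat:.3f}")
print(f"ergodic average across chains: {posterior_mean:.3f}")
```

If the chains have mixed, the early and late averages should roughly agree and the Gelman-Rubin statistic should sit close to 1. Plotting each chain against its iteration number would be the visual version of the same check.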

Markov Chains and MCMC have many useful applications, ranging across a wide spectrum of fields. One such interesting application is in the game of baseball. When viewing a half-inning in a baseball game, there are 28 possible states based on the number of outs (0, 1, 2, or 3) and the runners on base: no runners, or runners on first, second, and/or third base (see http://www.pankin.com/markov/theory.htm for a more detailed description of the transition matrix). This gives us a 28×28 transition matrix filled with the probabilities of moving from each state to every other state. From here, we could calculate the expected number of runs scored from each state and analyze how this expectation changes from state to state. We could also extend this to analyzing the probability of scoring a single run by defining a slightly different transition matrix (again, see the link provided above for more detail). Due to their usefulness, Markov Chain methods have become a common tool for baseball analysts and sabermetricians (Editor’s Note: Totally had to Google sabermetrician. I’m feeling I got shortchanged a little in the job naming category).
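To show how "expected runs from each state" falls out of a transition matrix, here is a drastically simplified toy in Python. This is not the 28-state model from the link. I'm assuming a made-up world where every plate appearance is either an out or a solo home run, so the only thing that matters is the number of outs, and the probabilities are invented. The idea carries over though: the expected runs from a state equal the expected runs scored right now plus the expected runs from wherever you land next.

```python
# Toy assumption: each plate appearance is an out or a solo home run.
P_OUT = 0.75  # hypothetical probability of an out
P_HR = 0.25   # hypothetical probability of a solo home run

STATES = [0, 1, 2, 3]  # number of outs; 3 outs ends the half-inning (absorbing)


def transition_row(outs):
    """Probabilities of moving from this state to each reachable state."""
    if outs == 3:
        return {3: 1.0}
    return {outs: P_HR, outs + 1: P_OUT}


def immediate_runs(outs):
    """Expected runs scored on the next plate appearance from this state."""
    return 0.0 if outs == 3 else P_HR * 1.0


def expected_runs(tol=1e-12):
    """Iterate value = immediate runs + expected value of the next state."""
    value = {s: 0.0 for s in STATES}
    while True:
        new = {
            s: immediate_runs(s)
            + sum(p * value[t] for t, p in transition_row(s).items())
            for s in STATES
        }
        if max(abs(new[s] - value[s]) for s in STATES) < tol:
            return new
        value = new


runs = expected_runs()
for outs in STATES:
    print(f"{outs} outs: expected runs = {runs[outs]:.4f}")
```

With these toy numbers you can check the answer by hand: each remaining out is "worth" P_HR / P_OUT = 1/3 of a run, so the expectations are 1, 2/3, 1/3, and 0 for 0, 1, 2, and 3 outs.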

Another useful application is in the field of ecology. A useful paper, An Application of Markov Chain Monte Carlo to Community Ecology, serves as a wonderful walkthrough of MCMC with the easily conceptualized example of community assemblages (presence/absence) of birds among islands. As with our stated requirements for MCMC, the next state of the bird species’ distribution among a set of islands depends only on their current distribution. The article does a great job of connecting the dots from the ecological concept (birds disperse among islands, possibly due to competition) to an ecological question (given the starting state and some measures of competition among species, what is the probability that a random matrix will exhibit the same level of competition) to a mathematical challenge (applying MCMC) and then loops the results back around to answer the ecological question (the distribution of finch species among the Galapagos shows evidence of competition!). If you’re feeling extra badass and want to make the jump from learning about MCMC to coding some examples in R, be sure to check out the following blog posts!

Tricks of the Trade: LaTeX

Ok, guys. I’ve been studying as a baby statistician (scienctician? statscientist? ecologitician?) for a little while now and I’m here to share some of their secrets. Before I started here at Penn State I had a couple ideas about what other grad students in my department would be like. First, everyone would be computer masters of any and all statistical programs: R, SAS, others that I hadn’t even heard of yet. Second, they’d all be completely on top of everything in all of our classes because they all would’ve completed undergraduate and master’s programs also in statistics. And third, it’d be really hard to relate to other students because of my background in biology and my love for the outdoors (because clearly they’d all prefer sitting inside in front of their computers, right?). Thankfully, I was way off base and not only am I not left in the educational dust, but my cohort is full of awesome students with a wide variety of strengths and abilities. And I must collect them all. Yeah, my new goal is to be like some sort of awesome Anna-Paquin-as-Rogue statistician and glean all of the amazing abilities and knowledge while I can. Except I think I’ll stick to taking the time to learn and practice things…instead of the whole touchy hurty thing she does. One of my absolute favorite new acquisitions is the ability to write code in LaTeX.


Another one of my pre-stats misconceptions was that whenever you saw an equation in a journal article it was created with Word’s super difficult equation editor.  Hopefully I’m not the only one who thought this, because now I feel really silly (Editor’s Note: I assumed mathematical witchcraft, so joke’s on me really.). LaTeX is a document preparation system for high-quality typesetting often used for technical or scientific documents. Long story short, you could be creating completely badass documents with lots of equations and badassery like these: [Homework with R Code, Homework with crazy stat stuff!]. I received my intro to LaTeX during one of the Cohort Workshops I have been arranging on Fridays for my department. Another grad student gave us a very brief introduction and showed us some of the basics. A few downloads, a bunch of googling, and several hours of practice later (not to mention an uninstall and redownload…) I was really starting to get the hang of it! Anyone who’s learning to program knows that you experience some of the most frustrating moments during that initial learning curve. WHY WON’T YOU JUST COMPILE AND SHOW ME A PDF OF MY NAME AND ‘HELLO WORLD’? I DID EXACTLY WHAT YOU TOLD ME…*deletes comma* Oh…well then. BEHOLD MY BRILLIANCE! FOR I HATH CREATED A MASTERPIECE!

I would like to encourage everyone to give it a go! I can answer basic questions, but I’ve found that the vast majority of my own beginner’s questions have been answered by a few key resources, including the Great Googily Moogily. Behold your starting point!

What to Download
  1. TeX – LaTeX is actually built on top of TeX, sort of like GitHub is built on top of Git (which I also am just beginning to understand!). So you’ll actually need to download TeX in order to run LaTeX. Unfortunately, there are slightly different versions for Windows and Mac users, but both deal with the same underlying program (if you run anything else, my apologies for being completely unaware of how to guide you).
  2. An Editor – The TeX download comes with everything you absolutely need, but I like using an editor for extra pretty colors and the option to code in other programming languages. I like working in Aquamacs, which is a Mac version of Emacs. (Update: I now use Sublime Text because Aquamacs kept giving me unhelpful error messages and I wasn’t having none of that.)
Full disclosure: this took me a WHILE!
What to Try First
  1. Hello, World! – Your first task is to just compile and create a PDF file with the most basic of greetings. I used this website at Art of Problem Solving. Even so, I spent way too long before I got my first code to compile and a PDF produced. It’s a glorious achievement!
  2. Do a Homework in LaTeX – This is not applicable to everyone and for all classes. But if you have a math or stats course where the homework isn’t too intensive consider completing it in LaTeX! One of my professors even wrote lots of handy coding tips on one of my homeworks that helped me a lot the next time around. I love being able to feel accomplished at writing up a nice, clean looking final version even if the homework is crazy difficult. Helps me keep those imposter thoughts at bay!
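
For a sense of what that first "Hello, World!" file might look like, here is a minimal sketch. It isn't copied from the Art of Problem Solving tutorial; the file name `hello.tex` and the sample equation are just my own illustration, and I'm assuming you compile it with `pdflatex` from a standard TeX installation.

```latex
% hello.tex (hypothetical file name)
\documentclass{article}

\begin{document}

Hello, World!

% Math goes between dollar signs for inline equations...
The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

% ...or in a display environment for an equation on its own line.
\[
  s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\]

\end{document}
```

Running `pdflatex hello.tex` should produce `hello.pdf`. If it refuses to compile, a missing brace or a stray character is a good first thing to hunt for.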


Next Level Stuff
  1. Update your CV – This was one of my recommendations for our Motivation blog post last week. I used this one from Bradley P Carlin and you can check out my final form!
  2. Write and submit your next manuscript using LaTeX! – Now, I’m nowhere near this stage of my program, but I’d wager that there are quite a few templates or style formatting guidelines available for submitting a paper using LaTeX! Go, go, go!
  3. Combine with RStudio to work with knitr and Sweave to produce LaTeX documents with R code and results spliced in!


Basically part of my grand PhD scheme is to master a lot of the computing and presentation side of statistics so that I will be a valuable asset and worthy of ALL the jobs. At least a few options after graduating will be worth the toiling away finding that stray comma or misspelled command. Now that you’ve heard my favorite new tool I’ve learned so far please share yours! Or even if your favorite is also LaTeX tell me all the little tricks  you’ve picked up! I want all the tricks!

Is there a Doctor in the House?

I’m over a month into my PhD program and I’m still oscillating between wild, ecstatic optimism and stone-cold, stop-you-in-your-tracks fear of the route ahead. Completing a Master’s degree was two and a half years of hard work and setbacks culminating in one of the proudest, happiest moments of my life – successfully defending my thesis. I’m back on track for five more years of the grad student life, but these will be harder, faster, stronger times than before. Good thing I’ve got my Daft Punk Pandora station ready to go. My Master’s program didn’t entail any qualifying or comprehensive exams, so they seem like lofty, impassable goals now. It’s a sentiment shared by my cohort members, but we’ve found that the more information we have, the more confidence we gain. We here at STS would like to share what we know about our own roads to knowledge with you, the readers, so that you guys can find the confidence to face this journey too.

Not freaking out. I am not freaking out. I’m not. 

First things first, what exactly is the difference between quals, comps, and a thesis defense? Well, if you’re in grad school you at least know enough to be shaking in your boots at the prospect of any one of them. As you progress through your PhD program the powers that be (generally your advisors) will want to ensure that you’re advancing at the desired pace, thus a few intense, intimidating milestones are thrown at you. The first of these, the Qualifying exam, serves to assess whether the student is capable of conducting doctoral research/scholarship. Quals often also serve as the PhD candidacy examination. Qualifying exams are taken early in your program and are often based on required coursework. Once you pass your quals (and sometimes it takes a few tries, don’t worry!) feel free to relax a tiny bit and allow yourself to celebrate! Throw a wild soiree with your cohort! The PhD Comprehensive exam is given by members of your committee once a student has completed the required coursework (generally year 2 or 3, but it ultimately depends on your program) and serves to evaluate mastery of the major studied. Sometimes presenting your research proposal can be wrapped up within Comps, as a way to show you have mastered the content necessary to proceed. If you’ve passed your Comps go ahead and celebrate once more! Now all you have left is research, thesis writing, and a thesis defense! It’ll be tough, but you’re in the home stretch. A lot of students are terrified by the time they are fast approaching their thesis defense. A lot rides on that final presentation of research and oral examination by the committee, but honestly once your committee signs off on a date for you to present and defend you’re practically finished already! They don’t want to set you up to fail (it reflects poorly on them as well)! Smooth sailing on through to your doctorate! Congrats once more! You’re a doctor!!

Post Masters Celebrations!

If you picked up on how it sounds like your committee has a lot of power over your progress through your PhD project, then you’re not far from the truth! However, they will also be there to provide you with all of the guidance and insights that you could possibly need. After all, they’ve been in your shoes before and have helped others through your journey. The majority of your committee will be comprised of professors from your department, but if you’re one of those brave souls that goes for a more interdisciplinary approach you’ll likely find members from other departments or even other institutions. You are in charge of approaching and inviting generally four professors to serve on your committee. Something to keep in mind while forming your own band of professors is that you’ll want to ensure that you choose members that will have the time and resources to help you with your thesis research, writing, and defending. You’ll need to have a close working relationship with these people so don’t be afraid to choose based on how well you foresee getting along with them. A highfalutin big wig in your field sounds great to have involved, but if they have no time for you then maybe it’s best to find someone else to serve instead. You want people who are passionate about being on your team and helping you grow and develop to ensure that upon completion of your PhD you’ll be ready to find a postdoc or a job in a variety of fields!

Workin’ hard with the cohort.

If you are, like me, at the very beginning of your program with all of these hurdles strewn in your future it can be incredibly intimidating. A lot of doubts can creep into your mind about your ability to gain a mastery of the content, especially if you’ve changed fields! I’ve had quite a few chats with my cohort already about our looming quals at the end of this year. Our department recently changed its program for PhD students and we’re the first group to go through this new design! We feel a lot like guinea pigs – the kind that people eat rather than keep for pets! I have dealt with this nervousness by finding out as MUCH as I can about how I am expected to progress through each year. But what has really quelled my quals fears has been talking to my academic advisor and hearing his reassurance that no, the department really isn’t trying to scare anyone off or weed us out. They earnestly do want each and every one of us to pass and will provide us with all the resources to do so! Rather than being a weed-out process, the qualifying exam serves more as a way to ensure that WE are absolutely sure that we want to put in the work necessary to earn a PhD. I’m so grateful that I am part of a large, wonderfully supportive cohort that is already working hard to make sure no one falls behind. If I can recommend just one thing to new graduate students feeling that fear creep in, it’s to talk to your cohort, the grad students that are ahead of you, and professors in your department. The reassurance I’ve gotten from admitting my fears and insecurities to others and in turn hearing theirs has been a tremendous confidence booster!

You can check out my (Meridith’s) Statistics PhD program expectations in the slide included! If you are interested in hearing about Rachel’s Ecology program (she’s in her 3rd year and has just scheduled her comps!) you’ll want to keep an eye on our Sweet Tea, Science Tumblr this week! If you are also working on getting your PhD (or Master’s!) we’d love to hear how these major exams work in your field/department! There’s so much variety that we can’t hope to cover how these things work for everyone, but go ahead and let your experiences be known down in the comments.
