Sweet Link ParTEA (August 2018)

We hope everyone has had a great August. As always, this month has gone by too fast. It’s already time again for our collection of awesome links and videos that we found enjoyable and/or important this month. Let us know if we missed any super cool posts!

“She drew their attention as a wolf that had a lot of moxie and was very adventurous.” Check out this NatGeo article about Nate Blakeslee’s new book, American Wolf, whose central character was once “the most famous wolf in the world”.

This in-depth interview with Francis Weller, author of The Wild Edge of Sorrow: Rituals of Renewal and the Sacred Work of Grief, is a must-read when you have the time.

We are clearly fans of Priya Shukla’s Forbes articles. Check out this one about the ocean’s itty bitties and their important link to carbon cycling.



My #StatStud Starter Satchel Set-Up

The format for today’s blog post has been graciously borrowed from the Uses This website, which hosts a collection of nerdy interviews asking people from all walks of life what they use to get the job done. I first stumbled upon this type of blog post on Hilary Parker’s old blog (ummm…I don’t think there’s a part 2?). Finding that post as a fledgling statistics PhD student was highly informative. What DO we all use to get our research done? I’m pretty certain that if you asked grad students in my department they’d give (at least slightly) different answers every time. And that is most certainly true as you get into more specific focuses (e.g. genetics data, theoretical statistics, etc.). The outline below is certainly not the only/best setup, but it’s what I’ve got going on.


Who are you, and what do you do?

I am Meridith Bartley, one-half of Sweet Tea, Science, and I am an ecological statistician. Ecolostician? Staticologist? I studied biology and ecology for a bit (well…six and a half years! I have a Master’s degree in Wildlife Science and a BS in Biology) and now I am in my 5th year as a PhD student in the Statistics Department at Penn State. I’ve written about my experiences, daily life, tips, and reasons for this change of field a few times on this blog. I’m currently working on two projects: one modelling the feeding interactions of laboratory ants we have video-monitored (so much data!) and another exploring how to identify when one might be extrapolating in a multivariate response model, with a neat application to lake water quality data. I spend most of my time writing code and manuscripts and trying to understand what the heck it is I am coding and writing. Almost all of my work is done either on my computer or on scrap pieces of paper, which hopefully end up copied over to my “lab” notebook.

Making the Most of MCMC

Sometimes in grad school you need to write about topics that you yourself have little to no clue about. Part of the learning process is figuring out how to teach yourself some of these very difficult concepts. This post is adapted from one I co-wrote with my cohort chum, Justin.
By: Justin and Meridith

Markov Chains, and particularly Markov Chain Monte Carlo, are a difficult concept to explain. In fact, Dr. Hanks has stated that they are “easier done than said.” At its most basic, a Markov Chain is a system that transitions from one state to another. It is a memoryless random process: the next state depends only on the current state, not on the sequence of events that preceded it. I have scoured the web and believe the following to be the simplest visual introduction to Markov Chains. (Spoiler Alert: It arose from someone – Andrey Markov – being a sassmaster.)
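That memorylessness is easier to see in code than in prose. Here’s a toy two-state “weather” chain (the transition probabilities are made up purely for illustration): each step looks only at the current state, and the long-run fraction of time in each state settles to the chain’s stationary distribution.

```python
import random

# Hypothetical two-state Markov chain; transition probabilities are
# invented for illustration only.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng):
    """One memoryless transition: the next state depends only on `state`."""
    return "sunny" if rng.random() < P[state]["sunny"] else "rainy"

def simulate(start, n_steps, seed=42):
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n_steps):
        chain.append(step(chain[-1], rng))
    return chain

chain = simulate("sunny", 10_000)
frac_sunny = chain.count("sunny") / len(chain)
# The long-run fraction of sunny days approaches the stationary
# distribution: pi_sunny = 0.4 / (0.2 + 0.4) = 2/3.
```

Notice that `step` never looks at the chain’s history, only its most recent state; that’s the Markov property in one line.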

From here, it’s easy to gain an appreciation for the wide breadth of applications available for Markov Chains. However, if we want to transition, as it were, from Markov Chains to Markov Chain Monte Carlo simulations, we must first explore Monte Carlo methods. These methods are a class of computational algorithms that use repeated random sampling (simulations) to obtain the distribution of an unknown probabilistic entity. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam (coolest name ever) while he was working on nuclear weapons projects at the Los Alamos National Laboratory (think Manhattan Project). It was named by Nicholas Metropolis after the Monte Carlo Casino, where Ulam’s uncle often gambled. Because reasons, apparently.

Peter Muller’s article gives a brief introduction to Markov Chain Monte Carlo simulation, a method that enables the simulation of Bayesian posterior distributions and thus facilitates Bayesian inference. According to Muller, the goal of MCMC is to set up a Markov Chain with an ergodic distribution and some initial state, where ergodic means there is a non-zero probability of the process passing from one state to any other state at each step. Starting at the initial state, transitions (from one state to another) are simulated and the simulated states recorded. The ergodic sample average of the simulated states will then converge to the value of the desired posterior integral. Muller notes that two conditions must be met in order to use the resulting integral estimates:

  1. As the number of simulated transitions, M, approaches infinity, the chain must converge to the desired posterior distribution
  2. Some diagnostic must be found to determine when practical convergence occurs, i.e. when a sufficient number of simulations have been performed.

The first prerequisite, theoretical convergence, can be reduced to meeting the following three criteria: irreducibility, aperiodicity, and invariance. The second and more ambiguous of the two conditions—a criterion for practical convergence—has several proposed solutions in the literature. For example, Gelman and Rubin (1992) developed an “ANOVA type statistic [for considering] several independent parallel runs of the MCMC simulation,” and Geweke (1992) has suggested a comparison of an early-iterations ergodic average to an ergodic average of later iterations. However, Muller also suggests the simpler method of visual diagnosis via plotting the states for each iteration against the iteration number to judge convergence. 
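To make both conditions concrete, here is a minimal random-walk Metropolis sampler (the classic way to build such a chain; note the original post doesn’t specify an algorithm, so this is our sketch). The target is a hypothetical standard-normal “posterior,” and after the run we do a Geweke-flavored sanity check: compare an ergodic average from early (post-burn-in) iterations to one from late iterations.

```python
import math
import random

def metropolis(log_target, x0, n_iter, prop_sd=1.0, seed=1):
    """Random-walk Metropolis: a Markov chain whose ergodic
    distribution is the (unnormalized) target density."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_iter):
        x_new = x + rng.gauss(0, prop_sd)        # propose a transition
        lp_new = log_target(x_new)
        # Accept with probability min(1, target(x_new) / target(x))
        if math.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        chain.append(x)                          # record the simulated state
    return chain

# Hypothetical target: standard normal log-density (up to a constant),
# deliberately started far away at x0 = 10 to show burn-in.
chain = metropolis(lambda x: -0.5 * x * x, x0=10.0, n_iter=50_000)

# Condition 1: the ergodic average converges to the posterior mean (0 here).
post_mean = sum(chain) / len(chain)

# Condition 2 (Geweke-style diagnostic): compare an early post-burn-in
# average to a late one; they should roughly agree once converged.
early = sum(chain[5_000:10_000]) / 5_000
late = sum(chain[-5_000:]) / 5_000
```

Plotting `chain` against the iteration number gives exactly the visual trace-plot diagnosis Muller suggests: you can watch the chain wander down from 10 and then hover around 0.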

Markov Chains and MCMC have many useful applications across a wide spectrum of fields. One interesting application is in the game of baseball. When viewing a half-inning of a baseball game, there are 28 possible states based on the number of outs (0, 1, 2, or 3) and the runners on base (the different combinations of having no runners, or runners on first, second, and/or third base; see http://www.pankin.com/markov/theory.htm for a more detailed description of the transition matrix). This gives us a 28×28 transition matrix filled with the probabilities of moving from each state to each other state. From here, we could calculate the expected value of runs scored from each state and analyze how this expectation changes from state to state. We could also extend this to analyzing the probability of scoring a single run by defining a slightly different transition matrix (again, see the link above for more detail). Due to its usefulness, MCMC has become a common tool for baseball analysts and sabermetricians (Editor’s Note: Totally had to Google sabermetrician. I’m feeling I got shortchanged a little in the job-naming category).
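The full 28×28 matrix is a lot to type out, but a drastically simplified version shows the machinery. Suppose we track only the number of outs (ignoring baserunners entirely) and assume each plate appearance produces an out with a made-up probability of 0.7. The fundamental matrix of this absorbing Markov chain then gives the expected number of plate appearances before the inning ends:

```python
import numpy as np

# Toy half-inning chain: transient states are 0, 1, 2 outs; 3 outs absorbs.
# p_out is a hypothetical per-plate-appearance out probability.
p_out = 0.7
Q = np.array([               # transitions among the transient states only
    [1 - p_out, p_out,     0.0      ],
    [0.0,       1 - p_out, p_out    ],
    [0.0,       0.0,       1 - p_out],
])

# Fundamental matrix N = (I - Q)^-1; its row sums are the expected number
# of steps (plate appearances) before absorption (the third out).
N = np.linalg.inv(np.eye(3) - Q)
expected_pa = N.sum(axis=1)
# From 0 outs this works out to 3 / 0.7 ≈ 4.29 plate appearances.
```

The real baseball chain works the same way, just with richer states (outs plus base runners) and a reward (runs scored) attached to each transition.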

Another useful application is in the field of ecology. The paper An Application of Markov Chain Monte Carlo to Community Ecology serves as a wonderful walkthrough of MCMC with the easily conceptualized example of community assemblages (presence/absence) of birds among islands. As with our stated requirements for MCMC, the next state of the birds’ distribution among a set of islands depends only on their current distribution. The article does a great job of connecting the dots from the ecological concept (birds disperse among islands, possibly due to competition) to an ecological question (given the starting state and some measures of competition among species, what is the probability that a random matrix will exhibit the same level of competition?) to a mathematical challenge (applying MCMC), and then loops the results back around to answer the ecological question (the distribution of finch species among the Galapagos shows evidence of competition!). If you’re feeling extra badass and want to make the jump from learning about MCMC to coding some examples in R, be sure to check out the following blog posts!
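A common way to generate those random matrices is the “checkerboard swap” move, an MCMC whose states are presence/absence matrices with fixed row and column sums (species keep their island counts, islands keep their species counts). Here’s a sketch using a small made-up bird-by-island matrix; the swap move is one Markov chain transition, and the margins are preserved at every step.

```python
import random

def swap_step(m, rng):
    """One MCMC transition: pick two rows and two columns; if the 2x2
    submatrix is a checkerboard (1,0 / 0,1 or 0,1 / 1,0), flip it.
    This preserves all row and column sums."""
    r1, r2 = rng.sample(range(len(m)), 2)
    c1, c2 = rng.sample(range(len(m[0])), 2)
    sub = (m[r1][c1], m[r1][c2], m[r2][c1], m[r2][c2])
    if sub in ((1, 0, 0, 1), (0, 1, 1, 0)):
        m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
        m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]

# Hypothetical data: rows = bird species, columns = islands (1 = present).
matrix = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
row_sums = [sum(r) for r in matrix]
col_sums = [sum(c) for c in zip(*matrix)]

rng = random.Random(0)
for _ in range(1_000):
    swap_step(matrix, rng)

# After any number of swaps the margins are unchanged, so a competition
# statistic computed on these random matrices forms a null distribution
# to compare against the observed value.
new_row_sums = [sum(r) for r in matrix]
new_col_sums = [sum(c) for c in zip(*matrix)]
```

Run many swaps between recorded samples, compute your competition statistic (e.g. a checkerboard score) on each sampled matrix, and you have the null distribution the paper uses to ask whether the observed finch assemblage looks more competitive than chance.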