My #StatStud Starter Satchel Set-Up

The format for today’s blog post has been graciously borrowed from the Uses This website. This website hosts a collection of nerdy interviews asking people from all walks of life what they use to get the job done. I first stumbled upon this type of blog post on Hilary Parker’s old blog (ummm…I don’t think there’s a part 2?). Finding this blog post as a fledgling statistics PhD student was highly informative. What DO we all use to get our research done? I’m pretty certain if you asked grad students in my department they’d give (at least slightly) different answers every time. And that is most certainly true as you get into more specific focuses (e.g. genetics data, theoretical statistics, etc.). The outline below is certainly not the only/best set-up, but it’s what I’ve got going on.


Who are you, and what do you do?

I am Meridith Bartley, one-half of Sweet Tea, Science, and I am an ecological statistician. Ecolostician? Staticologist? I studied biology and ecology for a bit (well…six and a half years! I have a Master’s degree in Wildlife Science and a BS in Biology) and now I am in my 5th year as a PhD student in the Statistics Department at Penn State. I’ve written about my experiences, daily life, tips, and reasons for this change of field a few times on this blog. I’m currently working on two projects: one modelling the feeding interactions of laboratory ants we have video monitored (so much data!) and another exploring how to identify when one might be extrapolating in a multivariate response model, with a neat application to lake water quality data. I spend most of my time writing code and manuscripts and trying to understand what the heck it is I am coding and writing. Almost all of my work is done either on my computer OR on scrap pieces of paper, which hopefully end up copied over to my “lab” notebook.

When I’m not bent over at my desk, occasionally reminding myself to sit up straight, I am usually trying to find enjoyable things to take my mind off of work. I like to help out our department’s grad student association and have served as president for a few years now. I also like to stay active and have been getting more into cycling. For the sake of this post’s “What do you use” format, I’ll go ahead and say I have an old hand-me-down (possibly soon to be a hand-me-back to its previous owner!) Schwinn road bike for around town, and I just got a new Specialized Diverge E5 for longer rides with these never-ending Pennsylvania hills.

What hardware do you use?

I use a 13-inch, early-2014 MacBook Air for all of my computer work. It’s been going strong since the start of my PhD and I’m pleased to see that it may just make it all the way through to the end…whenever that might be! I haven’t regretted it one minute and love how light it is, especially compared to some of the monster machines my fellow grad students are lugging around on their backs. Speaking of sore backs, a fair amount of my gear is aimed at avoiding one myself. I have a Roost Laptop Stand and carrying case I got through a Kickstarter a few years ago. The case fits the stand, an Apple keyboard, and an Apple mouse (bought used), and has a pocket where I keep some commonly needed cords. All of these items fit nicely into my Chrome backpack (one kinda close to my older model can be found here) along with a National Brand Computation Notebook I use as a lab-style notebook. I keep notes from my weekly advisor meeting in here, and after I’ve worked out a model on scrap paper, I copy it over to this notebook. It’s not an ideal system, but it’s my system!

I’m often carrying a replacement MacBook Air charging cord from ZrtKe safely housed in a reused ipsy Glam Bag. (I used to get these bags monthly and they come with 5 makeup samples.) My original cord is retired from travels and lives on my office desk, held together precariously by Sugru Moldable Glue. Other items include an iPhone 8 Plus with an extra battery charge in its Mophie External Battery case, my stainless steel water bottle with nifty Netflix logo (similar here), and my Skullcandy Crusher Wireless headphones. I got the headphones in an airport shop en route to Scotland this summer and I do not regret it! Sometimes, I carry around reusable glass jars and cloth bags, but only if I’m planning on going to the farmers market.

On campus I have a desk in a 6-person graduate student office. I have a Dell external monitor that, right now, just decorates my desk as a recent macOS update broke the DisplayLink connection. Hopefully, the upcoming Mojave OS release will fix this issue, and I’ll once more be running at full visual capacity. When the connection IS working, I bridge the connection between my Mac and the DVI cable on the monitor with a Toshiba Dynadock docking station I’m borrowing from my husband. You can connect two monitors to it through a variety of inputs, and then it connects to a laptop through a USB port. I also have one of the comfier desk chairs in the department offices…simply because I asked for one at an opportune time, I think.

And what software?

I do all of my statistical coding in R with RStudio customized to have a dark background. Typically, when I’m doing “coding work” it’s either working out a simple version of some example code to figure out how a method works/runs, coding up my own (more complicated) model to use with/on my data, or trying to figure out WHY WON’T MY CODE WORK. If some of my code takes too long on my machine, my advisor has set it up so I can connect to his machine on campus remotely. Connecting to his machine requires a lot of process with lots of acronyms (e.g. SSH, probably others). Previously, I had to start code running via the terminal…but I am admittedly very lacking in know-how, so now I get to work through a remote RStudio server. My husband has installed iTerm2 on my Mac to use rather than the built-in terminal. The benefits seem to be keyboard shortcuts and customizability? I often have to ask him to help me with my computer (e.g. when an update breaks something, often the paths to the files that help convert files to PDFs), and he uses iTerm2, so I keep it. Other students in my department will deploy their code on a computing cluster (you can learn more in these slides from a fellow stats student). I don’t know much about this approach either, but it seems important to note this is a thing that exists!

I write all of my papers (and various class projects) in either RMarkdown (if I also want to include code) or LaTeX. I’ve written a longer blogpost about this but semi-simply put: LaTeX (yes, with those letters capitalized) is a markup language, but also there’s TeX, which is a typesetting system. You have to install TeX onto your computer, and then install a LaTeX text editor. For Mac users that means going to MacTeX and downloading the current distribution. Once this is installed, you may “code” documents in a LaTeX editor and then compile them into PDFs, where they look (as Hilary Parker puts it) pretty and professional and mathy. I honestly didn’t know this was a system that existed when I started this program. I just thought that mathy folk were super good at the equation editor in Word. I often use Overleaf to write LaTeX; it’s online and can be collaborative. I sometimes also use Sublime Text as my editor, especially when I’m writing offline. I use Mendeley to keep track of all of my papers to reference (Editor’s Note: I was starting to panic that I wouldn’t recognize anything in this post other than R…but look, a wild Mendeley appears!) and it’s super easy to import a .bib file through Overleaf’s interface. I’m not completely sold on Mendeley over other options (e.g. Zotero, Papers) but I haven’t been convinced to switch yet either (Editor’s Note: LOL Okay then).

I try to keep all of my work on GitHub for version control. Using this with RStudio is my approach but that’s a whole post on its own. I recommend starting here for learning more. Also if you’re a student check out this nifty student developer package. There’s a lot in there I’ve never used but it DOES have unlimited private repositories which can be very useful!

In my menu bar I have Caffeine to keep my screen from sleeping, f.lux to adapt my computer display colors throughout the day (but I’m always turning it off), and constant reminders that my Dropbox is full. Which is fine, I don’t know why it’s full, I usually use Google Drive. Which is also full.

I typically make my slides and posters for conferences within Overleaf using Beamer. That way I can use LaTeX code when creating the content and the formatting is all done behind the scenes. I typically try to change the template ever so slightly so mine don’t look similar to others (especially at a stats conference). I use this site to find various LaTeX templates.

On my phone, I use the Notes app to keep track of weekly and monthly goals. This is a new thing I’m trying out and I’m liking it so far! I also am a fan of the Snapseed app for quick photo edits before adding them to Instagram. I also try to do some daily morning Spanish Duolingo and take daily second-long videos using the 1SE app. As a bonus fun fact: my phone apps are organized by color and my background is a photo of a bunch of succulents.

What would be your dream setup?

Really looking forward to the day when I have room in an apartment/house for a home office. I enjoy working off campus, but our one bedroom doesn’t quite allow for both Benjamin and me to work there comfortably. And we certainly don’t have any room for a permanent set-up with external monitors. I’m sure I’ll need a new laptop eventually, but I don’t anticipate switching it up too much. I keep hearing whispers of a larger design update to the new MacBook Airs, so I suppose we’ll see where that goes once I’m in the market for a new laptop. I also enjoy the idea of replacing my notebooks and papers with an iPad or iPad Pro + Apple Pencil. This twitter thread about using a tablet to read/mark journal articles is great! I love that my fellow lefties LOVE writing on them. It seems that once the broken DisplayLink issues are fixed, one could also use an iPad as a second monitor (with an app purchase). In an ideal world (where we also have loads of windows and a yard for ALL THE PLANTS), I’d love a standing desk. Also, computer screens you can see in the sunlight. Also also, a dog.

Did you like this style of blog post? You can find some other interesting ones on the Uses This website. I highly recommend David L. Miller’s post – he’s also an ecological statistician! Other relevant ones are Yihui Xie from when he was a statistics PhD student (now at RStudio), Amelia Greenhall, the Executive Director of Double Union, a non-profit feminist hacker/maker space(!), and the entire Scientist category (so many great ones here!). If you want to do a guest post for us this would also be a straightforward post style to follow!

Making the Most of MCMC

Sometimes in grad school you need to write about topics that you yourself have little to no clue about. Part of this learning process is figuring out how to teach yourself some of these very difficult concepts. This blog post comes from a piece I co-wrote with my cohort chum, Justin.
By: Justin and Meridith

Markov Chains, and particularly Markov Chain Monte Carlo, are a difficult concept to explain. In fact, Dr. Hanks has stated that they are “Easier done than said.” At its most basic, a Markov Chain is a system that transitions from one state to another state. It is a memoryless random process, that is, the next state depends only on the current state and not on the sequence of events that preceded it. I have scoured the web and believe the following to be the simplest visual introduction to Markov Chains. (Spoiler Alert: It arose from someone – Andrey Markov – being a sassmaster.)
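To make the “next state depends only on the current state” idea concrete, here is a minimal Python sketch of a two-state chain. The weather states and the transition probabilities are made up purely for illustration; nothing here comes from a real data set.

```python
import random

# Hypothetical two-state weather chain; states and probabilities are invented.
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def next_state(current, rng):
    """Draw the next state using only the current state (memorylessness)."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

def simulate_chain(start, n_steps, seed=0):
    """Return the list of visited states, beginning with `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(next_state(path[-1], rng))
    return path

path = simulate_chain("rainy", 10_000)
share_sunny = path.count("sunny") / len(path)
print(f"Share of sunny days: {share_sunny:.3f}")
```

For these made-up probabilities the long-run share of sunny days works out to 5/6, so the printed value should land close to 0.83 no matter which state the chain starts in.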

From here, it’s easy to start to gain an appreciation for the wide breadth of applications available for Markov Chains. However, if we want to transition, as it were, from Markov Chains to Markov Chain Monte Carlo simulations, we must first explore Monte Carlo methods. These methods are a class of computational algorithms that use repeated random sampling (simulations) to approximate the distribution of an unknown random quantity. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam (coolest name ever), while he was working on nuclear weapons projects at the Los Alamos National Laboratory (think Manhattan Project). It was named by Nicholas Metropolis, after the Monte Carlo Casino, where Ulam’s uncle often gambled. Because reasons, apparently.

Peter Muller’s article gives a brief introduction to Markov Chain Monte Carlo simulation, a method that enables the simulation of Bayesian posterior distributions and thus facilitates the use of Bayesian inference. According to Muller, the goal of MCMC is to set up a Markov Chain, with some initial state, whose ergodic (long-run) distribution is the desired posterior. Here the term ergodic loosely indicates that the process can pass from any one state to any other state with non-zero probability. Starting at the initial state, transitions (from one state to another) are simulated and the simulated states recorded. The ergodic sample average of the simulated states will then converge to the value of the desired posterior integral. Muller notes that two conditions must be met in order to use the resulting integral estimates:

  1. As the number of simulated transitions, M, approaches infinity, the chain must converge to the desired posterior distribution.
  2. Some diagnostic must be found to determine when practical convergence occurs, i.e. when a sufficient number of simulations have been performed.

The first prerequisite, theoretical convergence, can be reduced to meeting the following three criteria: irreducibility, aperiodicity, and invariance. The second and more ambiguous of the two conditions—a criterion for practical convergence—has several proposed solutions in the literature. For example, Gelman and Rubin (1992) developed an “ANOVA type statistic [for considering] several independent parallel runs of the MCMC simulation,” and Geweke (1992) has suggested a comparison of an early-iterations ergodic average to an ergodic average of later iterations. However, Muller also suggests the simpler method of visual diagnosis via plotting the states for each iteration against the iteration number to judge convergence. 
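To see the simulate-then-diagnose recipe in action, here is a rough Python sketch. It assumes a stand-in “posterior” (a Normal with mean 3 and standard deviation 1), uses a random-walk Metropolis sampler as one common way to build the chain, and computes a simplified version of the Gelman and Rubin statistic from several parallel runs. The target, step size, starting points, and burn-in length are all illustrative choices, not anything taken from Muller’s article.

```python
import math
import random
import statistics

def log_target(x):
    """Log density (up to a constant) of the stand-in posterior, Normal(3, 1)."""
    return -0.5 * (x - 3.0) ** 2

def metropolis(start, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a nearby value, then accept or stay put."""
    rng = random.Random(seed)
    x = start
    draws = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        draws.append(x)
    return draws

def gelman_rubin(chains):
    """Simplified potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

# Deliberately spread-out starting points, one independent chain per start.
starts = [-10.0, 0.0, 10.0, 20.0]
chains = [metropolis(s, 5000, seed=i) for i, s in enumerate(starts)]
kept = [c[1000:] for c in chains]  # discard the early (burn-in) iterations

r_hat = gelman_rubin(kept)
posterior_mean = statistics.fmean([x for c in kept for x in c])
print(f"R-hat: {r_hat:.3f}, ergodic average: {posterior_mean:.3f}")
```

A value of the statistic close to 1 suggests the parallel runs agree with one another, which is the practical-convergence signal; the ergodic average should then sit near the true mean of 3. Plotting each chain against its iteration number would give the visual check Muller mentions.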

Markov Chains and MCMC have many useful applications, ranging across a wide spectrum of fields. One such interesting application is in the game of baseball. When viewing a half-inning in a baseball game, there are 28 possible states based on the number of outs (0, 1, 2, or 3) and the runners on base (different combinations of having no runners, or having runners on first, second, and/or third base; see http://www.pankin.com/markov/theory.htm for a more detailed description of the transition matrix). This gives us a 28×28 transition matrix filled with the probabilities of moving from each state to each of the others (or staying put). From here, we could calculate the expected number of runs scored from each state and analyze how this expectation changes from state to state. We could also extend this to analyzing the probability of scoring a single run by defining a slightly different transition matrix (again, see the link provided above for more detail). Due to their usefulness, Markov Chain methods have become a common tool for baseball analysts and sabermetricians (Editor’s Note: Totally had to Google sabermetrician. I’m feeling I got short changed a little in the job naming category).
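The expected-runs calculation can be sketched in Python on a drastically simplified half-inning. This toy version is an assumption made for brevity: every batter either hits a home run or makes an out, so the bases are always empty and the state is just the number of outs (4 states rather than 28). The out probability is hypothetical.

```python
# Toy half-inning: each batter either homers (a run scores, outs unchanged)
# or makes an out. Real models also track runners, giving many more states.
P_OUT = 0.7          # hypothetical probability that a batter makes an out
P_HOMER = 1 - P_OUT  # hypothetical probability of a home run

# States are the number of outs: 0, 1, 2 (transient) and 3 (inning over).
# transition[i][j] is the probability of moving from i outs to j outs.
transition = [
    [P_HOMER, P_OUT, 0.0, 0.0],
    [0.0, P_HOMER, P_OUT, 0.0],
    [0.0, 0.0, P_HOMER, P_OUT],
    [0.0, 0.0, 0.0, 1.0],
]
# Expected runs scored on the next plate appearance from each state.
runs_per_step = [P_HOMER, P_HOMER, P_HOMER, 0.0]

def expected_runs(transition, runs_per_step, n_sweeps=200):
    """Repeatedly apply E = r + P E; the finished-inning state stays at 0."""
    n = len(transition)
    expected = [0.0] * n
    for _ in range(n_sweeps):
        expected = [
            runs_per_step[i]
            + sum(transition[i][j] * expected[j] for j in range(n))
            for i in range(n)
        ]
    return expected

runs = expected_runs(transition, runs_per_step)
for outs, value in enumerate(runs):
    print(f"{outs} outs: {value:.3f} expected runs for the rest of the inning")
```

With these made-up numbers the expectation from zero outs is 9/7, or about 1.29 runs, and it shrinks as outs accumulate. The same recipe applies to the full 28-state matrix once realistic transition probabilities are filled in.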

Another useful application is in the field of ecology. The paper An Application of Markov Chain Monte Carlo to Community Ecology serves as a wonderful walkthrough of MCMC with the easily conceptualized example of community assemblages (presence/absence) of birds among islands. As with our stated requirements for MCMC, the next state of the bird species’ distribution among a set of islands depends only on their current distribution. The article does a great job of connecting the dots from the ecological concept (birds disperse among islands, possibly due to competition) to an ecological question (given the starting state and some measures of competition among species, what is the probability that a random matrix will exhibit the same level of competition) to a mathematical challenge (applying MCMC) and then loops the results back around to answer the ecological question (the distribution of finch species among the Galapagos shows evidence of competition!). If you’re feeling extra badass and want to make the jump from learning about MCMC to coding some examples in R, be sure to check out the following blog posts!