*22 Jun 2016*

I’m sitting here at the **International Coral Reef Symposium in Hawaii**. Looking at all the exciting graphs, data and analysis going up got me thinking: what are the key Rstats tools coral reef scientists should learn?

Starting out with Rstats can be overwhelming because there are so many tools and packages available (but check out some of the free courses on my page). So here is a quick list of the key tools (packages and R functions) you should learn.

A data visualisation tool tops the list. Data visualisation is the first step of analysis: you can quickly look for interesting patterns and check for errors. The `ggplot2` (‘grammar of graphics 2’) package makes dataviz quick and easy. You can rapidly slice and dice your data in different ways for viz and also overplot simple analyses, like regressions. Check out the guide here.
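To give a feel for how little code a first plot takes, here is a minimal sketch. The `corals` data frame and its columns (`temperature`, `growth`, `site`) are simulated for illustration and are not from a real survey.

```r
library(ggplot2)

# Hypothetical data: coral growth measured across a temperature gradient
# at three made-up sites (simulated, for illustration only)
set.seed(42)
corals <- data.frame(
  temperature = runif(60, min = 24, max = 31),
  site = rep(c("Reef A", "Reef B", "Reef C"), each = 20)
)
corals$growth <- 12 - 0.3 * corals$temperature + rnorm(60, sd = 0.5)

# Scatter plot with a fitted regression line overplotted,
# sliced into one panel per site
p <- ggplot(corals, aes(x = temperature, y = growth)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ site)
print(p)
```

Swapping `facet_wrap(~ site)` for `aes(colour = site)` would show the sites in one panel instead, which is the kind of quick re-slicing that makes the package handy for exploring data.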

Linear models are becoming the dominant statistical tool for univariate data analysis in ecology and have largely replaced ANOVA methods. For instance, if you want to know how coral growth relates to temperature, how fish biomass relates to fishing pressure and so on, you would probably use a linear model. Linear models are implemented in R with the function `lm()`.

A handy guide for analysis and interpretation of linear models has just been published. Also check out the books by the same author for guidance on how to code them up.

A few quick examples of why `lm()` is so great.
Using just a single continuous predictor gives you linear regression. Multiple continuous predictor variables: multiple regression. If you combine categorical and continuous predictor variables you have your classic ANCOVA analysis.
Use the `predict()` function to see what your trend lines look like.
You can get confidence intervals using the `confint()` function. Do away with p-values, they are dated 20th century statistics. Well ok, p-values have their uses, but confidence intervals let you check how strong your effect is, as well as its significance.
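The three functions fit together roughly as in the sketch below. The data are simulated and the variable names (`temperature`, `depth_zone`, `growth`) are invented for the example.

```r
# Hypothetical data: coral growth vs temperature in two depth zones
# (simulated, for illustration only)
set.seed(1)
corals <- data.frame(
  temperature = runif(50, min = 24, max = 31),
  depth_zone = factor(rep(c("shallow", "deep"), times = 25))
)
corals$growth <- 12 - 0.3 * corals$temperature +
  ifelse(corals$depth_zone == "shallow", 0.5, 0) +
  rnorm(50, sd = 0.4)

# Linear regression: a single continuous predictor
m1 <- lm(growth ~ temperature, data = corals)

# ANCOVA: a continuous and a categorical predictor together
m2 <- lm(growth ~ temperature + depth_zone, data = corals)

# Trend line with a confidence band, predicted over a grid of temperatures
new_temps <- data.frame(temperature = seq(24, 31, by = 0.5))
trend <- predict(m1, newdata = new_temps, interval = "confidence")

# 95% confidence intervals for the intercept and slope
ci <- confint(m1)
print(ci)
```

If the interval for `temperature` in `ci` excludes zero, the effect is significant at the 5% level, and the width of the interval tells you how precisely the slope is estimated.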

Once you have mastered `lm()`, `glm()` (generalized linear model) is a must for ecologists. We often deal with non-normal data (i.e. data that don’t fit a normal distribution), like presence/absence data and counts. `lm()` is inappropriate for this type of data. Way back in the 70s some clever people invented GLMs as a generalization of linear regression that also covers logistic regression. GLMs can handle many types of data.
Check out Zuur et al.’s book for a handy guide.
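As a rough sketch of how this looks in practice, counts can be modelled with a Poisson family and presence/absence with a binomial family. The `surveys` data frame below is simulated and its column names are assumptions made for the example.

```r
# Hypothetical survey data (simulated, for illustration only)
set.seed(2)
surveys <- data.frame(fishing_pressure = runif(80, min = 0, max = 10))
surveys$fish_count <- rpois(80, lambda = exp(3 - 0.2 * surveys$fishing_pressure))
surveys$present <- rbinom(80, size = 1,
                          prob = plogis(2 - 0.4 * surveys$fishing_pressure))

# Count data: Poisson GLM (log link by default)
count_model <- glm(fish_count ~ fishing_pressure,
                   family = poisson, data = surveys)

# Presence/absence data: binomial GLM, i.e. logistic regression
presence_model <- glm(present ~ fishing_pressure,
                      family = binomial, data = surveys)

# Predicted probability of presence at low and high fishing pressure
probs <- predict(presence_model,
                 newdata = data.frame(fishing_pressure = c(1, 9)),
                 type = "response")
print(probs)
```

Note `type = "response"`, which returns predictions on the probability scale rather than the link (logit) scale.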

Ecological data are never simple and our data are often grouped. For instance, we often have blocks in experiments or transects within sites from field surveys. The package `lme4` lets you implement random effects in linear models. Random effects are a flexible way to account for nuisance sources of variation, which may be important but for which we don’t have a specific hypothesis about which way the effect will go.
Two key functions in `lme4` are `lmer()`, which is the random effects version of `lm()`, and `glmer()`, the random effects version of `glm()`. Again, check out Zuur et al.’s book for a handy guide.
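A minimal sketch of transects nested within sites, with site as a random intercept, might look like this. The data are simulated, and the names `transects`, `site`, `biomass` and `fish_count` are made up for the example.

```r
library(lme4)

# Hypothetical design: 8 sites with 6 transects each
# (simulated, for illustration only)
set.seed(3)
n_sites <- 8
n_transects <- 6
transects <- data.frame(
  site = factor(rep(paste0("site_", 1:n_sites), each = n_transects)),
  fishing_pressure = runif(n_sites * n_transects, min = 0, max = 10)
)

# Each site gets its own random offset, shared by all its transects
site_effect <- rnorm(n_sites, sd = 1)[as.integer(transects$site)]
transects$biomass <- 20 - 0.8 * transects$fishing_pressure +
  site_effect + rnorm(nrow(transects), sd = 1)
transects$fish_count <- rpois(
  nrow(transects),
  lambda = exp(2 - 0.1 * transects$fishing_pressure + 0.3 * site_effect)
)

# Random-intercept version of lm(): (1 | site) groups transects by site
mm <- lmer(biomass ~ fishing_pressure + (1 | site), data = transects)

# Random-intercept version of glm() for the count response
gm <- glmer(fish_count ~ fishing_pressure + (1 | site),
            family = poisson, data = transects)

print(fixef(mm))    # fixed-effect estimates
print(VarCorr(mm))  # among-site and residual variation
```

The `(1 | site)` term is what distinguishes these calls from plain `lm()` and `glm()`; the rest of the formula is written the same way.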

Got a large data-set? Need to merge data-sets by a common ID (e.g. site names)? Need to generate new variables from existing variables? Need to create data summaries? The package `dplyr` (‘data pliers’) is a must for all these tasks. Check out the vignette.
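The tasks listed above (merge, create new variables, summarise) can be chained in one pipeline. Here is a small sketch with two made-up tables, `surveys` and `sites`, joined on a shared `site` column. All values are invented.

```r
library(dplyr)

# Hypothetical transect data and site metadata (invented values)
surveys <- data.frame(
  site = c("A", "A", "B", "B", "C", "C"),
  fish_count = c(12, 15, 7, 9, 20, 22),
  transect_area_m2 = c(50, 50, 100, 100, 50, 50)
)
sites <- data.frame(
  site = c("A", "B", "C"),
  protection = c("no-take", "fished", "no-take")
)

site_summary <- surveys %>%
  left_join(sites, by = "site") %>%                     # merge by common ID
  mutate(density = fish_count / transect_area_m2) %>%   # new variable
  group_by(site, protection) %>%
  summarise(mean_density = mean(density),               # summary per site
            n_transects = n(),
            .groups = "drop")
print(site_summary)
```

The result has one row per site, with the mean fish density and the number of transects that went into it.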

Ok, so you have entered your data (or got it from someone else), but wait a minute, the way you entered it doesn’t let you use `ggplot2`, `dplyr`, `lm()`... (if you followed my guide you shouldn’t have this problem). Do you need to reenter it?
Nope, `tidyr` (for tidy data) to the rescue. The package `tidyr` lets you reformat data (e.g. from wide format to long format) into the layout that most stats packages prefer. Check out this blog.
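As an illustration of the wide-to-long case, here is a sketch using `pivot_longer()`, which assumes tidyr 1.0.0 or later (older versions used `gather()` for the same job). The cover values and genus columns are invented.

```r
library(tidyr)

# Hypothetical wide-format data: one column of percent cover per coral genus
wide <- data.frame(
  site = c("A", "B", "C"),
  Acropora = c(30, 12, 0),
  Porites = c(10, 25, 40)
)

# Long format: one row per site-genus combination
long <- pivot_longer(wide,
                     cols = c(Acropora, Porites),
                     names_to = "genus",
                     values_to = "percent_cover")
print(long)
```

The long table has a single `percent_cover` column, so it can go straight into a model formula or a `ggplot2` aesthetic with `genus` as a grouping variable.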

The package `vegan` provides good old multivariate analysis, without the cost of Primer. Admittedly this package is slightly harder to use than Primer, but hey, if you are already using R, the jump to `vegan` is easy. Check out the vignettes to get started.
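For a taste of the workflow, the sketch below runs an nMDS ordination and a PERMANOVA. It uses the `dune` vegetation data that ships with `vegan` as a stand-in (it is not coral data), and it assumes a version of `vegan` recent enough to include `adonis2()`.

```r
library(vegan)

# Built-in example data: 20 sites by 30 plant species, plus site covariates.
# A stand-in for a site-by-species benthic cover matrix.
data(dune)
data(dune.env)

# Non-metric multidimensional scaling on Bray-Curtis dissimilarities
set.seed(4)
ord <- metaMDS(dune, distance = "bray", k = 2, trace = FALSE)
plot(ord, type = "t")

# PERMANOVA: does community composition differ among management types?
perm <- adonis2(dune ~ Management, data = dune.env, permutations = 999)
print(perm)
```

To use your own data, replace `dune` with a sites-by-species matrix and `dune.env` with a data frame of site-level predictors in the same row order.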

That’s my top 7. Good luck getting started.

Designed by Chris Brown. Source on Github