Popular books that will help you learn about statistics

You don’t have to read textbooks to learn about statistics

Statistics sits at the intersection of applied maths and philosophy. We often learn statistics through tutorials and textbooks. But reading analytical textbooks is not the only way to deepen your understanding of statistics. In fact, I would go further and say that students of science must read more widely about the philosophical side of statistics.

To that end, here’s my list of the most informative and engaging popular science books on statistics.

The Signal and the Noise - Nate Silver

“Why so many predictions fail - but some don’t”

Prediction lies at the heart of scientific progress. It is how we test new theories to find out whether they improve on older ones. Silver looks at both the analytical and psychological sides of prediction. He explains why political pundits are so bad at predicting election outcomes and how we have become so good at forecasting the weather, though only up to about 10 days ahead.

In brief, it’s easier to get good at predicting if you can make predictions frequently, test those predictions against independent data frequently, and update your models in light of the results.

Weather is a great example: we have lots of data and an ensemble of mathematical models with which to make predictions. We then get to test those predictions on new data every day.
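To make the "predict, test, update" loop concrete, here is a minimal Python sketch. It is not from the book: it assumes a made-up world where it rains on 30% of days, and scores two hypothetical forecasters with the Brier score (the mean squared gap between a forecast probability and what actually happened, lower is better). One forecaster never learns; the other checks its forecast against each day's weather and updates.

```python
import random

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

def simulate(n_days=1000, true_rain_prob=0.3, seed=1):
    # Assumed toy world: each day it rains independently with probability true_rain_prob.
    rng = random.Random(seed)
    outcomes = [1 if rng.random() < true_rain_prob else 0 for _ in range(n_days)]

    # Forecaster A never learns: it always predicts a 50% chance of rain.
    static = [0.5] * n_days

    # Forecaster B predicts the rain frequency seen so far (with a +1/+2
    # Laplace correction, so its first forecast is 0.5), then updates daily.
    updating = []
    rainy_days = 0
    for day, outcome in enumerate(outcomes):
        updating.append((rainy_days + 1) / (day + 2))
        rainy_days += outcome  # test today's forecast against the data, then update
    return brier_score(static, outcomes), brier_score(updating, outcomes)

static_score, updating_score = simulate()
print(f"static: {static_score:.3f}, updating: {updating_score:.3f}")
```

The static forecaster scores exactly 0.25 whatever happens, while the updating one settles close to the best achievable score for this toy world (0.3 × 0.7 = 0.21). Frequent, independent checks are what make that improvement possible, and measurable.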

Societal change is a leading example of where predictions often fail. We observe changes relatively slowly and there is limited data on which to test our models. Further, our models tend to be biased by our own agendas and cultural perceptions.

This book changed my view of my own field of ecology. I realized that many of the studies we read make implicit predictions, which are likely to fail. And it’s not likely to get any better, because we don’t get many opportunities to test our predictions with novel data.

Bernoulli’s Fallacy - Aubrey Clayton

“Statistical Illogic and the Crisis of Modern Science”

Bernoulli’s Fallacy looks at the history of statistics and describes how that history has led to many of the problems we face in modern science.

The book focuses on the rivalry between the frequentist and Bayesian statistical paradigms. Many of the most influential statisticians of the late 19th and 20th centuries were frequentists who also promoted the racist ideology of eugenics. Clayton shows how frequentist statistical philosophy was a good fit for these ideological goals, and thus became the dominant statistical paradigm of the time, a dominance that continues today.

The book delves into the two philosophies, and the author makes no apology for being an ardent supporter of the Bayesian one.
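A small Python sketch can show how the two philosophies ask different questions of the same data. The example is our own, not Clayton's: it assumes a coin that lands heads 7 times in 10 flips and, for the Bayesian side, a flat Beta(1, 1) prior. The frequentist p-value is a probability of the data given a hypothesis; the Bayesian posterior is a probability of a hypothesis given the data. They are different numbers, and treating one as the other is an easy mistake to make.

```python
from math import comb

heads, flips = 7, 10

# Frequentist question: how surprising are data at least this extreme if the coin is fair?
# One-sided p-value = P(7 or more heads | theta = 0.5).
p_value = sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips

# Bayesian question: how probable is "the coin favours heads" given the data?
# With a uniform Beta(1, 1) prior (an assumption), the posterior is Beta(heads + 1, tails + 1).
# For integer parameters, P(theta <= 0.5) under Beta(a, b) equals
# P(Binomial(a + b - 1, 0.5) >= a), so no special functions are needed.
a, b = heads + 1, flips - heads + 1
n = a + b - 1
prob_not_biased = sum(comb(n, k) for k in range(a, n + 1)) / 2**n
prob_biased = 1 - prob_not_biased

print(f"p-value, P(data this extreme | fair): {p_value:.3f}")
print(f"posterior, P(theta > 0.5 | data):     {prob_biased:.3f}")
```

Here the p-value is about 0.17 and the posterior probability that the coin favours heads is about 0.89. Neither number is "the probability the coin is fair"; only the Bayesian calculation makes a direct statement about the hypothesis, and it needs a prior to do so.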

There are many lessons that students of science can learn from this book. It is a historical tale, an accessible introduction to statistical theories, an overview of contemporary statistical problems and an argument for the application of Bayesian statistics to overcome these problems.

Clayton’s brilliance lies in connecting centuries-old political debates to modern science. He shows how the historical focus on frequentist statistics (and suppression of Bayesian ideas) has led us to the reproducibility crisis we face in modern science.

For me, there is a bigger lesson in this tale: toxic political ideas can infiltrate the development of scientific methods, and this has repercussions for the quality of science decades or even centuries later.

Aubrey Clayton also has a YouTube series on probability theory, found on this page.

The Book of Why - Judea Pearl

“The new science of cause and effect”

Statistical analysis is often about inferring causes from observations. But correlations are often misleading; as the saying goes, “correlation doesn’t imply causation”.

The Book of Why describes the theory of structural causal modelling, which allows us to infer causation from correlation. Of course, underpinning this theory are some strong assumptions.
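A simple simulation shows both why correlation misleads and why the assumptions matter. This Python sketch is our own illustration: it assumes a causal structure in which a hidden common cause Z drives both X and Y, with no effect of X on Y at all. X and Y still come out clearly correlated. Adjusting for Z (here by correlating the residuals after regressing each variable on Z, which for this linear model is the partial correlation) makes the association disappear. That adjustment is only justified because we assumed the causal diagram; with a different diagram, adjusting for Z could be the wrong move.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """Residuals of an ordinary least-squares fit of ys on zs (with intercept)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    return [y - (my + slope * (z - mz)) for y, z in zip(ys, zs)]

rng = random.Random(42)
n = 10_000
# Assumed causal structure: Z -> X and Z -> Y, with no arrow from X to Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw = pearson(x, y)                                   # about 0.5: X and Y look related
adjusted = pearson(residuals(x, z), residuals(y, z))  # about 0 once Z is accounted for
print(f"raw correlation: {raw:.2f}, adjusted for Z: {adjusted:.2f}")
```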

The Book of Why introduces the challenge of inferring causation through compelling policy and health examples. It describes the logic of structural causal modelling and how that logic can be applied to infer causes in observational studies.

Statistical orthodoxy holds that you need a randomized controlled experiment to infer a cause. However, many problems are not amenable to randomized experiments; climate change is one example.

Central to causal reasoning is the construction of ‘counterfactuals’ from models: what could have happened if we had done something else, or if the world were different. Counterfactuals come to the fore in policy analysis, for instance, where we want to know whether things would have been different if we had (or had not) introduced a new policy.
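In a structural causal model, a counterfactual is computed in three steps, often described as abduction, action and prediction. The Python sketch below uses a deliberately tiny, hypothetical model (Y := 2X + U, where X is whether a policy was in place, Y is an outcome and U stands for everything else that affects Y). The model and numbers are assumptions for illustration only.

```python
# Hypothetical structural causal model (assumed for illustration):
#   Y := 2 * X + U, where U captures everything else that affects Y.
def outcome(x, u):
    return 2 * x + u

def counterfactual_y(observed_x, observed_y, alternative_x):
    # 1. Abduction: infer the background factor U consistent with what we observed.
    u = observed_y - 2 * observed_x
    # 2. Action: set X to the alternative value, overriding whatever caused it.
    # 3. Prediction: recompute Y under the new X, keeping the inferred U fixed.
    return outcome(alternative_x, u)

# We observed a region with the policy (X = 1) and an outcome of Y = 3.
# What would Y have been there without the policy (X = 0)?
print(counterfactual_y(observed_x=1, observed_y=3, alternative_x=0))  # prints 1
```

The key design point is that U is held fixed: the counterfactual asks about this particular region, with all its other circumstances, rather than about an average region.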

The Book of Why builds an excellent foundational understanding of causal reasoning and counterfactuals, whether you intend to use structural causal modelling or not. Counterfactuals are essential in many other areas of statistical analysis where we are making predictions from models and comparing outcomes of different interventions.

Thinking Fast and Slow - Daniel Kahneman

“Thinking fast and slow” looks at how humans make decisions. It’s not a statistics book per se, but I’m including it here because all good scientists should understand their own biases.

Kahneman won the Nobel Prize for his work in behavioural economics, showing why people often make seemingly irrational choices. The book is built around the idea of two decision processes: fast, instinctive decision making that is often apparently irrational, and slow, rational decision making.

The book describes many of the cognitive biases humans have when they make decisions, and ways to combat those biases. Much statistical logic is unintuitive to human minds (take the famous Monty Hall problem, for instance). Kahneman explains why.
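When intuition and probability disagree, a quick simulation can settle the argument. In the Monty Hall problem a car is behind one of three doors; you pick a door, the host (who knows where the car is) opens a different door hiding a goat, and you may switch to the remaining closed door. Most people feel switching makes no difference. This Python sketch, assuming the standard rules just described, plays the game many times:

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Fraction of games won when always staying (switch=False) or always switching."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(f"stay:   {monty_hall(switch=False):.3f}")  # close to 1/3
print(f"switch: {monty_hall(switch=True):.3f}")   # close to 2/3
```

Switching wins about two thirds of the time, because your first pick is right only a third of the time and the host's choice of door carries information about where the car is.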

The case of the Apgar score is one leading example. The Apgar is a rapid assessment that neonatal nurses and doctors use to assess the health of newborn babies. Before the Apgar score, nurses and doctors made decisions about a newborn baby based on experience. This resulted in poor outcomes for babies, because of the cognitive load of making life-critical decisions under time pressure.

The Apgar score takes away some of that pressure and inconsistency in outcomes. Doctors and nurses work through the checklist and score the baby based on its current state (colour, noise, pulse, etc.). For babies that aren’t breathing strongly, the score determines whether they go on a resuscitator or not.
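The value of a checklist like this is that it turns a judgement call into a fixed recipe that gives the same answer whoever applies it. As a minimal Python sketch: the Apgar score rates five criteria (appearance, pulse, grimace, activity and respiration) as 0, 1 or 2 each and adds them up to a total out of 10. The function name and input format below are our own, and the clinical thresholds that decide what happens next are deliberately left out.

```python
# The five Apgar criteria; each is rated 0, 1 or 2.
APGAR_CRITERIA = ("appearance", "pulse", "grimace", "activity", "respiration")

def apgar_score(ratings):
    """Sum the five 0-2 ratings into a total between 0 and 10 (hypothetical helper)."""
    missing = set(APGAR_CRITERIA) - set(ratings)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in APGAR_CRITERIA:
        if ratings[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    return sum(ratings[name] for name in APGAR_CRITERIA)

print(apgar_score({"appearance": 1, "pulse": 2, "grimace": 2, "activity": 1, "respiration": 2}))  # prints 8
```

Refusing incomplete or out-of-range input mirrors the point of the checklist: every criterion must be assessed, in the same way, every time.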

The broader lesson I took from Thinking Fast and Slow is how we can use reasoning and quantitative methods to overcome our own human biases that result in poor decision making.

Statistical Rethinking - Richard McElreath

“A Bayesian Course with Examples in R and Stan”

OK, so this one is a textbook, and yes, it does have R code in it. But it makes my list of ‘popular science’ books because I’ve never read a textbook this engaging and, frankly, fun.

McElreath has a way with stories and analogies, and he weaves these in among the R code and data analysis.

The book introduces Bayesian statistical analysis with R and Stan programs, but along the way we learn about many other fundamentals of statistics, including cause and correlation, covariances, prediction and the Bayesian paradigm.

Bayesian statistics has traditionally been seen as a topic for more advanced students who have already mastered frequentist methods. However, McElreath shows that Bayesian statistics absolutely deserves pride of place in introductory courses. He starts simple and guides us through increasingly complex statistical models in a way that is incremental and intuitive.
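To give a flavour of how simple the starting point is, here is a Python sketch of grid approximation, one of the first ways the book computes a posterior. It is our own translation of the idea, not the book's R code, and it assumes data like the book's early globe-tossing example: 6 "water" results in 9 tosses, used to estimate the proportion of the globe covered by water, with a flat prior.

```python
from math import comb

def grid_posterior(successes, trials, grid_points=101):
    """Grid approximation of the posterior for a proportion, assuming a flat prior."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]  # candidate proportions 0.00 to 1.00
    prior = [1.0] * grid_points                                 # flat prior: every value equally plausible
    likelihood = [comb(trials, successes) * p**successes * (1 - p) ** (trials - successes) for p in grid]
    unnormalized = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]              # normalize so the posterior sums to 1

# Assumed data: 6 "water" observations in 9 tosses of a globe.
grid, posterior = grid_posterior(successes=6, trials=9)
best = grid[posterior.index(max(posterior))]
print(f"most plausible proportion: {best:.2f}")  # prints 0.67
```

Every step is just multiplication and normalization, which is why it makes such a good entry point before moving on to more powerful tools like Stan.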

This book is for students who want to learn practical skills in statistical programming. But it’s also a great read for anyone who wants to understand more deeply how stats works.

McElreath also has a popular YouTube channel where he posts the book as a course, if you prefer video format.

Conclusion

There’s a theme that connects all of the books I’ve listed here. Bernoulli’s Fallacy describes how the history of statistics was shaped by its creators’ political biases. These biases have contributed to many of the shortcomings science faces today, such as the slow uptake of climate change science and the reproducibility crisis.

The Signal and the Noise, The Book of Why and Statistical Rethinking are all slightly different answers to these problems. Thinking Fast and Slow is a warning to beware our own psychology and how it biases our decisions: we need to be especially careful when we think we are being objective.

I’d be interested to hear your favourite books. You can comment on my Substack or email me. My book list is dominated by white guy authors. A key lesson in Bernoulli’s Fallacy is the danger of monopolies in scientific ideas, so I’m keen to hear suggestions for books from a broader range of cultural and scientific backgrounds.