By Leon (NHBS Catalogue Editor), 25 Jul 2017
Written for the hardback edition
I ended my recent essay on surviving the misinformation age by mentioning articles that have drawn attention to the problem that a lot of published research cannot be replicated. The popular press has been quick to tarnish the reputation of science amidst claims of misconduct and fraud. Science obviously stands or falls by its credibility, so is there a crisis? This book brings together a cross-disciplinary team of authors to examine replication and recommend best practices. And yes, it shows there are many issues, mostly because doing research well is hard and can go wrong in many ways, even inadvertently; systemic fraud and misconduct, however, are not prevalent.
Stepping in the Same River Twice is divided into three parts. The introductory part is necessarily rather philosophical and conceptual, but it makes clear how the problem of replication transcends any particular biological subdiscipline, defines what we even mean by replication, and shows how all of this matters at every step of a research project.
The bulk of the book is a series of surprisingly pithy chapters (most under 20 pages) that look at replication in different disciplines, pairing a theoretical background chapter with one or several case-study chapters. If you are a bit of a data nerd, this is easily the most enjoyable section of the book, covering topics such as specimens in natural history collections, environmental monitoring programmes, the effects of time and space, meta-analyses, and metadata and data provenance. Many reasons why replication is so hard quickly emerge from these chapters. Natural history collections generally reflect the interests of whoever put them together, rather than aiming to represent the species diversity of a particular habitat. Results from monitoring programmes are difficult to compare because there is little consensus on what should be measured in the first place. The assumption in experiments that, all other things being equal, manipulating a cause will give a certain effect often does not hold, as these "other things" are rarely equal (e.g. batch effects in the chemicals used, or the fact that you cannot replicate the exact time and place at which experiments were done). The power of meta-analyses to answer specific questions is often hampered by the differences between the studies analysed, leaving only the possibility of drawing general conclusions. And, finally, there is no incentive for scientists to record data about their data, which would allow other people at a later date to understand the raw data sets and gauge how trustworthy they are. This last one was painfully recognisable: if you have ever gone over your own old data and been faced with the question "what exactly did I do here?" (I have), imagine how a complete stranger will struggle to understand your raw data. On top of that, there is still a dearth of sound statistical know-how amongst researchers, causing many people to repeat the same old mistakes when collecting and recording data.
The final section ties it all together and gives a list of best practices. Things have already been changing in recent years, with more and more journals requiring researchers to lodge their raw data as well (though good quality control and conventions on metadata still need work). We are moving from "open access" to "open science", and the need for improved transparency and accountability is becoming more widely understood.
In many cases, technical solutions and best practices already exist, but cultural and social barriers still stand in the way: researchers remain reluctant to share "their" data; many funding agencies (public and governmental) have no data-archiving policies tied to the funding they award; community-wide standards and consensus on what counts as sufficient methodological detail in published papers, and on what metadata should accompany datasets, are often lacking; there is insufficient funding to support the extra work scientists need to do to create public data repositories; and there are no sanctions for not archiving data.
This book, then, is required reading for all scientists, no matter their discipline. I know, you will hear this said of a great many books, but believe me, this one really is. And it is so well written that you can breeze through it in a day. The book will not necessarily hand you solutions on a platter: the chapters are simply too short for that, and the range of subjects covered too wide. Instead, this is a very necessary primer to get you thinking and talking to others about how we can improve our scientific practice: the hard, unglamorous work remains yours to do.