Opinion | Significance of double-blind drug trials
Updated: 09 Oct 2020, 08:18 AM IST
- A clinical trial should be doubly blind for results to have any statistical significance
Two weeks ago, I offered “a broad and not necessarily definitive outline” of the clinical trials of new vaccines against covid-19. Apart from the hope that we will produce a weapon to fight the virus, what really interests me about these trials is that they are essentially mathematical exercises. Since any vaccine that results from these trials will be important to us all, I will explore some of that mathematics here.
Much of what happens in such exercises involves searching for results that carry some weight, results we need to take note of. Start at the top. We want to find a medicine that is effective against covid, so how do we determine whether the one we are testing is effective?
Let’s say we have 100 people who have volunteered for the trials. We’ve divided them into two groups of 50 each. One will be administered the experimental drug, the other a placebo — i.e. something that looks identical, but has no medicinal value at all. There are rules for administering a placebo correctly, and I’ll come to those. For now, let’s assume they have been followed.
The trial runs its course. The placebo group reports that one person has recovered, whereas the group that got the actual drug reports that five have recovered. What, if anything, can we conclude? Is this just chance? Is there a real difference between the groups? Is this enough to conclude anything about the efficacy of the drug?
When quantitative methods like trials produce a result, statisticians look for what is called “statistical significance” in that result: evidence that it is unlikely to be due to chance alone. The usual measure for deciding this is the “p-value”. This is the probability of observing a difference at least as large as the one you found between the two groups if they were otherwise identical in every respect, including in what’s been administered to them.
There’s a certain calculation involved here (often called the “A/B test calculation”) whose intricacies I won’t get into. But there are A/B test calculators online, and I fed the numbers from our hypothetical two groups into one of them. It reported a p-value of 0.0938, or 9.38%.
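For readers curious about what such a calculator does, here is a minimal Python sketch of one common version of the calculation: a pooled two-proportion z-test using a normal approximation. It is an assumption that this is the formula the online calculator used, and the function name `two_proportion_p_value` is purely illustrative.

```python
import math


def two_proportion_p_value(recovered_a, n_a, recovered_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test.

    One common form of the 'A/B test calculation'; online calculators
    may differ in details such as pooling or continuity corrections.
    """
    rate_a = recovered_a / n_a
    rate_b = recovered_b / n_b
    # Pooled recovery rate: what we'd estimate if both groups were really the same
    pooled = (recovered_a + recovered_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Chance of a gap at least this large in either direction,
    # using the standard normal distribution: 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))


# Placebo group: 1 of 50 recovered; drug group: 5 of 50 recovered
p = two_proportion_p_value(1, 50, 5, 50)
print(f"p = {p:.4f}")
```

This version gives roughly 0.09, in the same neighbourhood as the 9.38% the calculator reported; small differences between calculators are expected because they do not all use the same formula.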
What this means is that if the drug made no real difference, we could expect to see, purely by chance, a gap this large between the two groups (2% recovering versus 10%, a difference of eight percentage points) or larger about 9.38% of the time. More simply: pure chance would produce a difference like the one we’ve detected about 94 times in every 1,000 times we ran this test. The other 906 times, chance alone would not produce a gap that large. With this knowledge, what we need to decide is whether this 9.38% level is acceptable. Is it low enough for us to take the measured difference between the two groups seriously, as statistically significant?
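One way to see what “by chance” means is to simulate it. The sketch below uses a different technique from the calculator’s formula, a permutation (shuffling) test. It assumes the drug does nothing, so the six recoveries would have happened whichever group each volunteer landed in, and it repeatedly deals the 100 volunteers at random into two groups of 50, counting how often one group ends up with at least four more recoveries than the other (a gap of eight percentage points or more). The number of shuffles, the seed and the function name are arbitrary, illustrative choices. With so few recoveries in total, this approach gives a noticeably larger figure, around 20%, than the 9.38% the calculator reported, which only reinforces that the gap is not statistically significant.

```python
import random


def permutation_p_value(recovered_a, n_a, recovered_b, n_b, shuffles=10_000, seed=1):
    """Estimate a two-sided p-value by randomly reshuffling group labels.

    Assumes the treatment has no effect, so each volunteer's outcome
    would have been the same in either group.
    """
    rng = random.Random(seed)
    total_recovered = recovered_a + recovered_b
    # 1 = recovered, 0 = did not recover, for all volunteers pooled together
    outcomes = [1] * total_recovered + [0] * (n_a + n_b - total_recovered)
    observed_gap = abs(recovered_b / n_b - recovered_a / n_a)
    at_least_as_large = 0
    for _ in range(shuffles):
        rng.shuffle(outcomes)  # a fresh random split into two groups
        group_a = sum(outcomes[:n_a])
        group_b = sum(outcomes[n_a:])
        gap = abs(group_b / n_b - group_a / n_a)
        # Tiny tolerance guards against floating-point noise in the comparison
        if gap >= observed_gap - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / shuffles


p_sim = permutation_p_value(1, 50, 5, 50)
print(f"Simulated p = {p_sim:.3f}")
```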
To put that in perspective, let’s say the second group, the one taking the drug, saw not five but just two recoveries. That is still different from the one recovery among the placebo recipients, but how different is it, really? The same A/B test calculator now gives us a p-value of 0.5597, or 55.97%. So of every 1,000 times we run this test, random chance alone will produce a difference at least this large about 560 times. The sensible conclusion, which you probably reached as I did: this difference between the two groups is likely just chance, and there is no evidence that this particular drug is any different from a placebo.
The five recoveries? Now that looks more like a real difference. But even so, a generally accepted threshold for statistical significance is a p-value below 0.05, or 5%. Put another way: we want to be 95% “confident” that the difference we’ve detected is real, and not due to chance. In this case, we’re only 90.6% “confident” (the 906 out of 1,000 above). So most statisticians would not read very much into even the five recoveries. In fact, even the calculator tells me this difference is “not significant”.
One reason for that, and a factor in any such calculation, is the sample size: the number of people we are running the test on. Intuitively, you will agree that the greater that number, the more confidence we can have that our results are real. Conversely, the smaller our sample size, the less confident we’ll feel. To make this clear, let’s say we run the test on two groups of 500 volunteers each instead of 50. We’ll keep the recovery rates the same: in the placebo group, 10 of the 500 recover; in the other, 50 do. Feed those figures into the A/B calculator, and we get a p-value below 0.001: not even 1 test in a thousand would produce a difference this large purely by chance.
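The effect of sample size can be seen by holding the recovery rates fixed (2% on the placebo, 10% on the drug) and letting the groups grow. This sketch again assumes a pooled two-proportion z-test as a stand-in for the online calculator; the intermediate group sizes of 100 and 200 are illustrative additions, not figures from the column.

```python
import math


def two_sided_p(recovered_a, n_a, recovered_b, n_b):
    # Pooled two-proportion z-test with a normal approximation
    pooled = (recovered_a + recovered_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (recovered_b / n_b - recovered_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


results = {}
for group_size in (50, 100, 200, 500):
    placebo_recoveries = group_size // 50  # 2% of the group
    drug_recoveries = group_size // 10     # 10% of the group
    results[group_size] = two_sided_p(
        placebo_recoveries, group_size, drug_recoveries, group_size
    )
    print(f"{group_size:>4} per group: p = {results[group_size]:.2g}")
```

The rates never change, yet the p-value falls steadily as the groups grow, crossing the 0.05 threshold well before 500 per group.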
That is, with this increased sample size, even with the same rate of recovery, we can be better than 99.9% confident that the detected difference between the two groups is real.
Of course, there’s another aspect to these hypothetical numbers that we should pay attention to. If just 50 of the 500 who were given the drug recover, that’s a success rate of only 10%. Certainly, this is significantly better than the placebo group; certainly, it’s better than not having a medicine at all. But if a covid vaccine works only 10% of the time, is that effective enough to release it for public use? In the absence of anything else, it may indeed be. Still, it is a question worth asking, both about drug trials and during them, especially in the middle of a pandemic.
Finally, a little bit about placebos and their use in a trial: It’s easy to simply pronounce that one group of volunteers will be given the experimental vaccine and the other a placebo. But it’s worth thinking this through, because the use of placebos raises some interesting questions.
Suppose one of the people administering the trial knows which volunteers have been infected with the coronavirus. Suppose he decides to do these infected people a favour and puts them all into the group that will get the vaccine. This means the placebo goes only to people who are not infected anyway. As you can imagine, this will skew the results: any difference between the groups could now reflect who was infected rather than what they were given. Besides, has he actually done them, and the rest of us, a favour?
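The distortion described above can be made concrete with a small simulation. It is a sketch built on invented assumptions: a drug that does nothing at all, 100 volunteers of whom 20 are infected, only infected volunteers able to “recover”, and a 30% chance that an infected person recovers on their own. All of these numbers, and the function name `average_gap`, are hypothetical. With random allocation, the two groups recover at about the same rate on average; when every infected volunteer is steered into the drug group, the placebo group cannot have any recoveries, and the useless drug appears to work.

```python
import random


def average_gap(biased, trials=2_000, volunteers=100, infected=20,
                natural_recovery=0.3, seed=7):
    """Average recovery-rate difference (drug minus placebo) for a drug
    that has no effect. All parameters are hypothetical."""
    rng = random.Random(seed)
    half = volunteers // 2
    total = 0.0
    for _ in range(trials):
        # True = infected; only infected volunteers can recover
        people = [True] * infected + [False] * (volunteers - infected)
        if biased:
            # The well-meaning administrator puts every infected person
            # into the drug group (True sorts ahead of False here)
            people.sort(reverse=True)
        else:
            rng.shuffle(people)  # honest random allocation
        drug_group, placebo_group = people[:half], people[half:]
        # The drug is useless: recovery depends only on natural recovery
        drug_rec = sum(1 for inf in drug_group if inf and rng.random() < natural_recovery)
        placebo_rec = sum(1 for inf in placebo_group if inf and rng.random() < natural_recovery)
        total += drug_rec / half - placebo_rec / half
    return total / trials


random_gap = average_gap(biased=False)
biased_gap = average_gap(biased=True)
print(f"Random allocation: average gap = {random_gap:+.3f}")
print(f"Biased allocation: average gap = {biased_gap:+.3f}")
```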
There’s another ethical dilemma. Among those who sign up for the trial are, no doubt, people who are infected and have no medicine to help them recover. Absent anything else, they are willing to try even an experimental medicine in the hope that it might work. How would such a person react to the knowledge that what she was given was just a dummy, the placebo?
For these reasons, a placebo is administered as part of what statisticians call a “double-blind” trial. The volunteers who sign up must be told that they will get either the medicine or a placebo, that they will not know which, and that they will be assigned at random to either group (though the trial will be designed so that half get one and half the other). They must consent to these conditions. The doctors running the trial are also “blind”: they don’t know the composition of the groups, nor whether a given volunteer is infected, nor whether a given dose they are administering is the vaccine or the placebo.
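The allocation step described above can be sketched in code. This is an illustration under assumptions not taken from the column: volunteers have hypothetical IDs, each dose comes in a kit labelled with a random code, and the key linking codes to “drug” or “placebo” is held by someone outside the team giving the doses. It is not how any particular trial does it. The shuffle gives random assignment with exactly half in each group, while the people administering doses see only kit codes.

```python
import random
import secrets


def blinded_allocation(volunteer_ids, seed=None):
    """Randomly split volunteers into equal drug and placebo groups,
    hiding each assignment behind an opaque kit code.

    Returns (kit_for_volunteer, sealed_key):
      kit_for_volunteer: what the doctors see, volunteer -> kit code
      sealed_key: kept outside the trial team, kit code -> group
    """
    if len(volunteer_ids) % 2:
        raise ValueError("need an even number of volunteers for equal groups")
    rng = random.Random(seed)  # seed controls only which group each person gets
    half = len(volunteer_ids) // 2
    groups = ["drug"] * half + ["placebo"] * half
    rng.shuffle(groups)  # random assignment, but exactly half in each group
    kit_for_volunteer = {}
    sealed_key = {}
    for volunteer, group in zip(volunteer_ids, groups):
        # Random label that reveals nothing; differs on every run
        code = secrets.token_hex(4)
        while code in sealed_key:  # avoid an (unlikely) duplicate code
            code = secrets.token_hex(4)
        kit_for_volunteer[volunteer] = code
        sealed_key[code] = group
    return kit_for_volunteer, sealed_key


volunteers = [f"V{i:03d}" for i in range(1, 101)]  # hypothetical volunteer IDs
kits, key = blinded_allocation(volunteers, seed=2020)
print(kits["V001"])  # a kit code only; nothing about the group
```

Keeping the key away from the people who administer doses and assess volunteers is what makes both sides “blind”; the code only models that separation as two different dictionaries.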
Only if it is truly double-blind will the drug trial have any value. Only then will we have made progress towards finding a vaccine that works.
Once a computer scientist, Dilip D’Souza now lives in Mumbai and writes for his dinners. His Twitter handle is @DeathEndsFun