Hi Noah,
I have long enjoyed your blog. This is a topic near to my heart -- I wrote a thesis on the subject back in 2015.
I think you correctly summarize the upshot of Deaton and Nancy Cartwright's position. I would just like to clarify some terminology that may make the critique easier to understand. No one, including Deaton and Cartwright as far as I'm aware, thinks that RCTs are a bad tool. In fact, almost everyone thinks they are very good tools for one specific task: making causal inferences.
The critique is really focused on what you do with causal inferences -- sometimes known in the literature as 'evidence-for-use.'
To see the problem, it's helpful to think about what an ideal RCT tells you: an ideal RCT gives you extremely strong evidence that the intervention (whatever form it takes) is the cause of the effect in the model population. What an RCT, however well designed, can never tell you is whether the same intervention will have the same effect in some other population. In order to jump from the inference in the model population to some other target population, you need to extrapolate. (I would emphasize in passing that a target population is *always* distinct from the model population -- the same intervention may have different effects based solely on the time at which the intervention is administered!)
The difficulty is that an RCT -- by design -- does not explain *why* an intervention worked in the model population. All it tells you is that it did work. RCTs, to the extent we want them to do anything more than generate a true causal inference, have to be accompanied by some theory of mechanisms -- a theory that may only need to be intuitive, as I understand Deaton to suggest. And this theory must explain why the *reason* the intervention had its effect in the model population can be expected to obtain in some other population. In other words, to turn a causal inference from an RCT into evidence for, e.g., a policy's efficacy in some other (later, more widespread, geographically distinct, whatever) setting, you have to discharge the burden of showing that the reason the effect occurred will hold in the target population.
To tie this back to reality, it is helpful to think about medicine -- aspirin, to use your example. Consider an RCT showing aspirin is effective in population A. We have reason to think that aspirin will be effective in population B (say, all mankind more or less) because we know that the mechanism by which aspirin has its effect in population A will be unaffected by any differences in population B. People are the same in the relevant respects, across space and time. The causal pathway is essentially invariant. (Causal pathway is a term in medicine that development economists should pay greater attention to, in my view.) This assumption is more or less a fair one for the majority of medical interventions and the associated RCTs.
The same cannot be said for many RCTs in development economics. Consider deworming children. It may be that a mass deworming program improves educational outcomes (and associated human capital development) in model population A. But how do we extrapolate that result to other settings? We need to assume that the causal pathway, i.e. the mechanism by which the intervention has its effect -- the drug works (okay), school children have fewer parasitic infections (okay), so they have more energy (maybe), are able to attend school earlier (are you sure?) and are more attentive in class (maybe), leading to better educational outcomes -- also obtains in the target population, often in some very different social context. In other words, the causal inference is only useful when it is accompanied by all kinds of other evidence as to whether it can survive extrapolation, as the toy simulation below illustrates. (Please don't take me to be saying deworming is bad. I'm in favour of it, but not because of its effects on human capital accumulation!)
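To make the extrapolation worry concrete, here is a toy simulation -- all numbers are invented for illustration and have nothing to do with the actual deworming literature. The drug's biological effect is identical in populations A and B; only the assumed social link from cured infection to school attendance differs. An RCT run in A correctly estimates a large effect in A, yet that estimate is a poor guide to B:

```python
# Toy illustration of the extrapolation problem -- all numbers are invented.
# The "drug works" link is identical in both populations; only the social link
# (fewer infections -> more attendance -> better scores) differs, so the same
# simulated RCT gives a big effect in A and a small one in B.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def simulated_rct(attendance_gain_if_cured):
    """Randomize deworming and return the estimated gain in test scores."""
    treated = rng.random(N) < 0.5                        # randomization
    infected = rng.random(N) < 0.5                       # baseline infection
    cured = treated & infected                           # the drug itself always works
    attendance = 0.6 + attendance_gain_if_cured * cured + rng.normal(0, 0.05, N)
    scores = 10 * attendance + rng.normal(0, 1, N)       # attendance drives scores
    return scores[treated].mean() - scores[~treated].mean()

# Population A: curing an infection strongly raises school attendance.
print(simulated_rct(attendance_gain_if_cured=0.20))      # roughly a 1.0-point gain
# Population B: children attend school anyway, so the social link is weak.
print(simulated_rct(attendance_gain_if_cured=0.02))      # roughly a 0.1-point gain
```

The point is not the particular numbers, but that nothing inside the RCT itself tells you which of the two attendance links you are facing in a new setting.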
So what's the problem? Well, in some sense there isn't one. RCTs are great! But they have to be accompanied by careful empirical research, as you say. Ideal (or close to ideal) RCTs are, however, extremely expensive and time consuming. Furthermore, from a policy design perspective, placing RCTs on a pedestal may come at the cost of the other types of research necessary for good, usable policies. Institutional demands for RCTs may also restrict funding for plausible evidence-based (but not RCT-based!) interventions. There is nothing wrong with experimenting.
In my view, extrapolation is the real challenge for RCTs in development economics. It is a problem that medicine doesn't need to grapple with to the same extent -- yet medicine grapples with it anyway. Economists should too.
Agreed!
I was an education blogger for a few years. I am pretty familiar with educational research, and with how poor it is. Even when it's good... people ignore the results.
I suggest looking at "Project Follow Through" https://en.wikipedia.org/wiki/Follow_Through_(project)
It was the largest and most rigorous education experiment ever conducted, and it clearly showed that Direct Instruction was the most effective teaching method for young children, yet here we are with project-based learning taking over our schools.
RCTs are only good if we actually design them well, and then actually pay attention to their results even if they give answers that we disagree with.
What's the difference between direct instruction and project-based learning?
They have nothing to do with each other. Direct Instruction is a comprehensive curriculum and program that emphasizes teacher scripts, explicit instruction of things that other curricula consider minutiae, and frequent assessment and regrouping of students based on their results. It has its intellectual roots in behaviorism. Project-based learning is a method with its roots in Dewey-ish educational progressivism. It is a method rather than a school program, so its definition is a bit fuzzier, but the point is that students should do a lot of projects and that this is somehow crucial for learning.
But that's not true?
Project-based learning is not true? Agreed, research doesn't really support it. (Then again, it's hard to know how to define it precisely enough to be studied. Check out "From The Ivory Tower to the Schoolhouse" by Jack Schneider for a cool take on PBL.)
Direct Instruction does have research supporting its use in various contexts, but that research is often misinterpreted. The primary way it's misinterpreted is as a PEDAGOGICAL METHOD rather than as a COMPREHENSIVE PROGRAM. Clearly its method is related to its proven efficacy, but it's hard to know whether the secret sauce is scripted curricula, frequent regrouping by ability, assessment, or the classroom method. You can find videos of teachers using the DI program online: https://www.youtube.com/watch?v=3cwODCQ9BnU
The recent enthusiasm (I almost wrote 'craze') over RCTs is a result of the loss of trust in economic theorizing after the great financial crisis. It has no doubt gone too far, ignoring a century of accumulated insights on how economies work. But I believe the fault lies with the theorists who saturated the journals with unrealistic, degenerate models, for academic merit or for political gaslighting purposes.
Ha. Too true, sadly.
Hey Noah. Great article! I will check out the podcast too! I also agree with you that RCTs are needed, because not all nations or local governments have all the resources they need. Sure, RCTs sometimes test obvious interventions, but how can we possibly know which intervention is the most effective at the least cost without an empirical study? Assuming the Oxford vaccine findings hold, discovering that 1.5 doses are more effective than 2 is the least obvious finding, at least to me. We couldn't have done it without an RCT (an accidental RCT, for sure), and it has a real impact in terms of cost and supply.
Yep
As a doctor, it is my opinion that RCTs are overrated in medicine. The number of patients needed in a trial is usually huge, and deciding which patients to include and which to exclude is problematic. In my field of high-risk obstetrics, there are specific guidelines for a stitch around the cervix to prevent pregnancy loss, called a cerclage. The guidelines proven by RCT are very specific, and I have put in cerclages in many patients who do not meet those criteria. I am not a rebellious physician (although maybe a touch arrogant, as all physicians are). I do have 15 years of experience behind me, and for a few of those years I saw the natural history of some of the patients who did not qualify for cerclage. When they lost their pregnancies because I declined to do something about it -- because the RCT said not to -- it made an impact. I understand that this too is problematic. But anytime I read something about RCTs being the gold standard, it triggers me. Yes, evidence is good and powerful, and no, Noah isn’t saying RCTs are the gold standard, but the title does, and as mentioned before, I am triggered because medicine relies on them far more than it should.
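To put a rough number on "huge": here is a back-of-the-envelope sample-size calculation using the standard normal-approximation formula for comparing two proportions. The event rates below are invented purely for illustration, not taken from any cerclage trial:

```python
# Back-of-the-envelope sample size per arm for a two-arm trial with a binary
# outcome (e.g. pregnancy loss), using the usual normal-approximation formula.
# The event rates below are made up purely for illustration.
from scipy.stats import norm

def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Patients needed per arm to detect a drop from p_control to p_treated."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2

print(round(n_per_arm(0.04, 0.02)))   # rare outcome, halved: ~1138 per arm
print(round(n_per_arm(0.20, 0.10)))   # common outcome, halved: ~196 per arm
```

Halving an already-rare event rate takes over a thousand patients per arm, which is part of why trials end up with narrow, highly selected inclusion criteria.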
RCTs are by no means perfect. And many are underpowered or poorly designed. The salient question is: what is a better alternative?
Of course, when we don’t yet have a good RCT that addresses our particular situation, we are forced to make the best guess we can given the limited information we have. But when we do want to put in the effort to discover the answer to a clinical question, what is the best design of a study to do that?
I don’t know that there is a “best design.” I will admit I do love a large retrospective observational study with no controlled variables. All evidence should be taken into account, including common sense. RCTs are just one piece of the evidence. In medicine, they seem to be treated as the absolute final say, which is problematic... My point is that RCTs should not be considered a “gold” standard. There was a joke article describing an RCT of parachute use versus no parachute that could not detect a difference -- because, buried in the variables, the published jump height was six feet... Again, RCTs are fine until they become the medical stick that is used to beat me with. If it’s the stick that I beat someone else with, then they’re absolutely fine.
If it is true that ‘Poverty rates across countries are almost perfectly correlated with the "typical" (median) income/consumption in that country...’ then it appears a growth policy is the anti-poverty policy with the best evidence base. But that raises two questions for which the evidence is far less obvious: What causes growth? And does growth result from, or cause, an effective regulatory and redistributive state? The causality in Wagner’s ‘law’ may be the wrong way round.
One issue seems to be the conflation of whether we think of something as 'good' as an end in itself with the question of whether that 'good' thing reduces poverty or has a positive effect on some outcome we're interested in. It's clear that most of Deaton's examples, like malaria pills, are desirable things to do whatever their effect on poverty. But at least in the extract you posted, he seems at risk of conflating this with knowing they will reduce poverty.
There are some areas where intuition is useful, but it gets confounded so often as to be untrustworthy. To start with, intuition is something one learns, so a lot depends on who has done the teaching. Once you get into the policy sphere, it is all politics, so intuition may be useful for political survival, but useless otherwise.
If you are actually doing something with serious consequences, intuition is often the enemy. A soldier's intuition is to either avoid danger or race towards it. A pilot's intuition is to navigate in two, not three dimensions. A chemist's intuition is to pour water on fire. A doctor's intuition is to prescribe opiates for pain. In all of those cases, following one's intuition can be a recipe for disaster.
There are a lot of tools for fighting one's intuition. RCTs are among the more effective ones.
Opponents of RCTs seem to have something in common: hubris. If they were proposing an alternative framework that doesn’t presuppose already knowing the right answer, I’d be all ears. But doing things in the real world requires humility.
I read his criticism as being about RCTs specifically, and the way they're used as justification for what he interprets as noble outsiders flying in and helping out the dumb locals; as evidenced by the following, and his joke about colonization:
"So I have no problem with altruism. It's the effectiveness that I have [issue with].
And when I listened to Peter talking about how easy it is to do these things, and the only thing that people have been doing wrong before was they weren't doing randomized controlled trials, then phase two randomized controlled trials, then we can find out what works… and to me that's just nonsense. And I don't think randomized controlled trials are capable of doing that."
I assumed he was speaking specifically about RCTs, but Julia makes it clear the rest of the discussion is explicitly about RCTs when she says:
"Well, maybe we should talk now about your critique of randomized controlled trials, or RCTs, because that type of evidence is one of the big things that effective altruists" which leads to several pages specifically about RCTs and his problems with them.
He also ends with something that, to me, makes it clear that when he talks about teachers and malaria drugs being things we know work, he is talking in general, not about specific education or healthcare interventions.
"But the point there is, the question is not whether those things can work. We're pretty sure that these things can work. The question is whether government civil servants or government employees working under all the usual constraints of employing workers and all the incentives that go with that, can actually do that.
And that comes to the crux of the matter, really. It's really whether the countries can do this for themselves. Because if we can develop general methods, of things that look like they're promising, then local people have to adapt them for themselves.
So this takes us back to where you started, which is this question, we've got to use local knowledge. We can send blueprints to places, they can look at it and say, "This is interesting, maybe this would work in our context if we adapted this." And that to me makes sense.
I'm just not persuaded by any number of randomized controlled trials, as they're usually run at least"
Sending broad blueprints for locals to implement in a way that is likely to work in their situation seems to be the type of experimentation you think would be a better alternative to RCTs.
RCTs are better for learning the cost-effectiveness of different interventions that work. It's obvious to me that governments should know which solutions are the most cost-effective.
However, governments should not always use the most cost-effective solutions. For example, the Nordic social democracies use universal programs so that the Nordic model isn't seen as charity. They also have high taxes.
Still, we should at least know which programs are the most cost-effective and what works. The real problem is that we usually don't even know what works.
Take any tankie and try to convince him of communism's inferiority to capitalism. Or for something less cold-war-y, take tax cuts for firms.
The real problem with RCTs is that they're too expensive, and it's not feasible to run them for the big things.
The ideal RCT is one performed on an intervention that's going to be scaled up, before it's scaled up.
You can study it but you're not going to find a relationship, because no one listens to economists anyway 😂
This is not a realistic caricature, but if you do believe this, why don't you believe Deaton is like this?
It's not about what's easier, it's about what's nobler.
Do you want to put up the political fight? You can be an activist, but the more effective route is to become a politician.
Do you imagine the kind of nerdy personality that usually becomes an economist getting into the political arena?
I find your depiction of economists to be very biased.
In fact, the kind of economist who is NOT partisan is the kind who, like Duflo and Banerjee, has given up on "The Big Questions" and is dedicated to studying scientifically what can be studied scientifically, refining our knowledge in the process.