It seems like every conversation about Iraq nowadays begins with, "Everybody knows we’re going to lose this war; it’s just a matter of when we choose to leave." Maybe they think the whole thing was a piece of silly, juvenile behavior. The reason for this defeatist attitude is that people have some absurd preconceptions about the nature and extent of the violence. They don’t understand what we are fighting. They don’t understand what we are accomplishing. They don’t know how bad things can really get, and they have an exaggerated picture of the current violence. Things are bad; I recognize this. We are not "winning" yet. That’s true. However, it is far from a hopeless situation, and it’s far from clear that things are worse than they were before the war. (Did you know, for instance, that oil production in Iraq has steadily increased?) Mostly the defeatist attitude is a matter of self-deception and lack of perspective. I know; I, myself, feel guilty when I have offended someone. That’s not an unusual attitude in this country. Americans want to be "nice" to everyone. Even if we invade your country, we have to see ourselves as merely, ah …, helping out. War does not abide that attitude. You can’t just re-sheathe your sword when the work becomes tiresome.
Once upon a time, there weren’t so many simple-minded people here. When America was looking toward half a million casualties, or more, to invade the main islands of Japan, there was little trepidation and shirking. The members of the invasion force saw themselves as marked for death. Only MacArthur looked forward eagerly, but everyone looked forward. Thank God for the Enola Gay and its life-saving cargo. How many of us owe our existence to Harry Truman?
One of the major contributing factors to our lack of perspective, however, has been the "academic" contribution of some researchers at Johns Hopkins. The absurd body counts that these people have touted have been swallowed whole by a good number of Americans. Let me just say this. The authors of this study would not flinch from the assertion that a million Iraqis have died as a result of this war. OK. That’s one thousand, on average, each and every day for a thousand days. Can we picture that?
Pearl Harbor – 2,400 (one day)
Valley Forge – 2,000 (six months, about 11 per day)
Gettysburg – 8,000 (3 days, about 2,666 per day)
Normandy – 245,000 (81 days, about 3,000 per day)
Battle of the Bulge – 35,000 (41 days, about 850 per day)
These were awesome events, storied in history. Let it be known that civilians died as well, but the fact is that it is hard to kill people. They take effective steps to prevent that outcome, especially civilians. And the dead tend to be counted pretty accurately. People come hunting for them.
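The per-day arithmetic above is easy to check. Here is a trivial Python sketch using the totals exactly as listed (the durations are the same rough figures given in parentheses):

```python
# Quick check of the per-day arithmetic above (totals as listed).
battles = {
    "Valley Forge":        (2_000, 180),   # roughly six months
    "Gettysburg":          (8_000, 3),
    "Normandy":            (245_000, 81),
    "Battle of the Bulge": (35_000, 41),
}
for name, (deaths, days) in battles.items():
    print(f"{name:>20}: {deaths / days:,.0f} per day")

# A million deaths over a thousand days would be 1,000 per day,
# sustained, for nearly three years.
print(f"{'Million-death claim':>20}: {1_000_000 / 1_000:,.0f} per day")
```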
I’m going to explain to you why the Hopkins/Lancet study is flawed. I realize this is old news, but I get tired of reading that nobody has challenged the Hopkins study on a statistical basis. There are a number of statisticians who have done so. I have seen some critiques that were telling and concise, but the anti-war people blew them off, just as they accuse others of doing. I suspect that many statisticians didn’t bother – after all, it is a lot of work – because they deemed the study too tentative and troubled to be worth the effort. Some people reject the whole field of statistical epidemiology as naturally prone to hyperbole. I think that goes too far. Good science is possible. Statistics can tell the truth, but such ideals have not been honored by this study.
I recognized at the time that a lot of people were impressed by the scientific trappings and daunted by the difficult reading. I felt I could contribute in this area, so I wrote up a detailed analysis. Since interest had faded, I didn’t anticipate that a follow-up study would be done based on the same flawed process. I have not looked at the new study, and don’t really plan to, but I think my complaints about the original study are still pertinent. I have rewritten them extensively for the sake of clarity.
Description of the Study
The Bloomberg School of Public Health of Johns Hopkins University sponsored research into the effects of the US invasion of Iraq in 2003, treating it as a public health issue for the people of Iraq. How many deaths resulted, directly or indirectly, from the invasion? The study was designed as a retrospective survey of households across Iraq to measure "excess" deaths of the post-invasion period over the pre-invasion period. Theoretically, this would mean every death, whether from malaria, whooping cough or self-immolation. In practice, this seemed to come down to measuring war fatalities. Respondents were asked to provide names and death certificates for everyone who died and, most importantly, the date of that death. In which period did they die, before or after the invasion?
The term "excess" implies a hypothesis that a specified event induced an increase in the number of deaths beyond a "normal" level. This is the epidemiologist’s frame of mind. How many extra people died after Chernobyl, when considering all conceivable effects? In actuality, though, the "event" is a lot more complicated than the authors let on. Many, many changes and events occurred at roughly the same time, and the conclusions are naturally muddied by that mix. The other obvious concern is that the time preceding the invasion was not normal either.
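To make the bookkeeping concrete, here is a minimal sketch of the excess-deaths arithmetic. All of the inputs are made-up round numbers for illustration, not the study’s actual figures:

```python
# Minimal sketch of the "excess deaths" calculation. All inputs are
# made-up round numbers, NOT the study's actual estimates.

population = 24_000_000   # assumed population of Iraq
pre_rate   = 5.0 / 1000   # assumed pre-invasion deaths per person-year
post_rate  = 8.0 / 1000   # assumed post-invasion deaths per person-year
years_post = 1.5          # assumed length of the post-invasion window

# Excess deaths = (post rate - pre rate) * person-years of exposure.
excess = (post_rate - pre_rate) * population * years_post
print(f"excess deaths: {excess:,.0f}")   # 108,000 with these inputs
```

Everything hangs on the difference between two estimated rates, which is why small biases in either rate get multiplied up into enormous national totals.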
Specific critiques follow in roughly the order they occur in the Lancet paper:
Sample Construction and Questionnaire
- Cluster Dilution – The apparent variance of the sample is artificially reduced by pairing some provinces to avoid sampling both. Even though there are the same number of clusters, some clusters, coming from the same province, are now more alike than they would have been, and the significance of the result is thus exaggerated (see the simulation sketch after this list).
- Non-Response – Interviewers skipped a lot of houses where no one answered. There is every reason to believe that the households skipped are different from the ones that answered. No attempt was made to characterize the non-responders.
- Subject Bias – Households that answered the door were almost uniformly responders. This implies that they knew what the study was about ahead of time and had already assented by the time they opened the door. Therefore we have to assume that this was essentially a self-selected sample. It’s no better than walking through a mall interviewing only the people who ask to be interviewed.
- Frame Definition – The demographics of the households changed between the two periods more than should be expected. Also, do the interviewers and subjects have the same definition of household? I was listed as living with my parents during the years that I was away at college. Where was my household? Iraq’s upheavals may have affected the inclusion of young males in particular as members of a household.
- Unbalanced Ignorance – No information on disappearances, only on known deaths. Do we know when the death occurred, or only when the body was found? If only a few deaths were shifted from the pre-invasion period to the post-invasion period, it would seriously distort the results.
- Sample Manipulation – The selection of Fallujah makes me suspicious that the sampling was not purely random. Surely there was no more notorious town in all of Iraq. The dropping of Fallujah makes me even more suspicious. I question their stated motives for dropping it. I believe that the effect of Fallujah on the analysis was probably to expand variance so greatly that the confidence interval included zero, rendering the result statistically insignificant. Prove me wrong. I have some experience with researchers. The most dangerous tendency in Academia is the drive to attain positive results in order to publish. Combine that with the natural human tendency to hide weaknesses in an argument, and you have a license to argue any point of view whatsoever.
- Omitted Controls – The interviewers wanted to avoid certain areas. That implies that there is a known variable not considered. Relevant distinctions should always be measured. This variable should have been quantified and used as a control variable in the study. Let’s call it "war zone status", which would serve as a useful indicator variable. Areas in war zones should have been oversampled to reduce variance, not undersampled. Maybe violence is only a big issue in violent places. Yes? Non-violent places cannot be expected to contribute significantly to excess deaths. The sampling frame is not uniform! Think about Kurdistan and the parts of southern Iraq that are relatively peaceful. Making the assumption that they will follow the same distribution inflates the projected total.
- Uneven Field – Means and variances can be expected to change from cluster to cluster dramatically, and the model does not control for that except by governorate.
- Source Clumping – The study conflates multiple sources of excess deaths that have different statistical properties. These sources, such as economic/public-health vs. military, should be evaluated separately. On one extreme we have a field-of-grain analogy; on the other, a minefield analogy.
- Dark Matter – Did Saddam kill anyone? How many? Why don’t we see a number for that in the results? Was such a question posed in the interview? Do the investigators think that would be a legitimate question to ask? Do they have any way of knowing the answer?
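To illustrate the cluster-dilution complaint above, here is a minimal simulation sketch. The variance components are made-up; the point is only the direction of the effect. When clusters share a province, the province-level component of variation does not average out as well, so the true sampling error of the survey mean is larger than the nominal cluster count suggests:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up design: 33 clusters of 30 households each, with assumed
# province-, cluster-, and household-level variance components.
n_clusters, hh_per_cluster = 33, 30
province_sd, cluster_sd, hh_sd = 2.0, 1.0, 3.0

def survey_mean(n_provinces):
    """One simulated survey whose clusters are spread over n_provinces
    provinces. Fewer provinces means more correlated clusters."""
    provinces = rng.integers(0, n_provinces, n_clusters)
    prov_effect = rng.normal(0, province_sd, n_provinces)[provinces]
    clus_effect = rng.normal(0, cluster_sd, n_clusters)
    hh = rng.normal(0, hh_sd, (n_clusters, hh_per_cluster))
    return (prov_effect[:, None] + clus_effect[:, None] + hh).mean()

for n_prov in (33, 12):   # 33 = one cluster per province; 12 = pairing
    means = [survey_mean(n_prov) for _ in range(2000)]
    print(f"{n_prov:2d} provinces: true SE of estimate = {np.std(means):.3f}")
```

An analysis that treats the 33 clusters as independent reports the same nominal standard error in both cases, so the confidence interval for the paired design comes out too narrow.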
Conduct of the Interview
- Tare – If I assert that the introduction by the interviewer was biasing the results, do the investigators have any way to refute my assertion? In other words, were there any efforts to disguise the perceived purpose or political import of the study? Was there any effort to measure the impact of alternative introduction scripts or question sequencing?
- Differential Time Bias – Interviewers would naturally be afraid to probe if the death were recent. Any such behavioral change that depends on the lapse of time since the death would cause bias.
- Differential Cause Bias – It is also possible that subjects would be more reluctant to report certain types of deaths than other types. Since the types of deaths may have changed between the two periods, the interviewers need to know that they are getting the whole truth. Were measures taken to do so? (A sketch of how differential reporting biases the ratio follows this list.)
- Intimidation Distortions – Were some clusters in areas controlled by AIF (anti-Iraqi forces) at the time of the interview? Would interviewers change the wording depending on who they thought was listening?
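Here is the sketch promised above: a made-up example of how a modest difference in reporting completeness between the two periods biases the estimated mortality ratio. The reporting probabilities are pure assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up truth: deaths actually occurring in the sample per period.
true_pre, true_post = 50, 80

# Assumed reporting probabilities: older deaths reported less
# completely than recent ones (or vice versa, under intimidation).
p_report_pre, p_report_post = 0.75, 0.95

obs_pre = rng.binomial(true_pre, p_report_pre)
obs_post = rng.binomial(true_post, p_report_post)

print("true ratio:    ", round(true_post / true_pre, 2))   # 1.60
print("observed ratio:", round(obs_post / obs_pre, 2))     # biased upward
```

In expectation the observed ratio is inflated by a factor of 0.95/0.75, about 1.27, and nothing in the published interview protocol lets you rule such a differential out.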
Statistical Methodology
- Hyper-Processing – Modern statistical software allows, even encourages, a great variety of data snooping and something analogous to venue shopping. I read nothing to indicate what the investigators were doing to minimize such dangers. It is not sufficient to wear a cloak of professionalism if you’re not going to share with the readers.
- Post Facto Definitions – Background deaths went up. The investigators clumped them in with the total. This increase should have been tested for statistical significance before doing so. If background deaths had gone down, would they still have combined them? I suspect that they wouldn’t have.
- Post Facto Scale Conversion – A log-linear scale conversion was imposed on the numbers. This wouldn’t work if any of the results had been decreases; you cannot take the logarithm of a non-positive value, so you would have to kluge the conversion to handle negative inputs. The conversion obviously wasn’t planned from the beginning.
- Demographic Weighting – Were the ethnicity, religion, age and sex of household members recorded? Was this information compared to national statistics? Projections to the national population should be weighted to match national demographics, and the conversion ratios should be reported.
- Emotional Class Reporting – According to the study, the majority of violent deaths were among women and children. There are several problems with this assertion. 1) The estimate is not separately tested for significance. 2) The number of women killed was very small. Why even mention them? It would make much more sense to combine children and men as classes, since the majority of the sample fell in those two categories. 3) The rate is wrong. The Iraqi Health Ministry estimated a much lower proportion of women and children, less than 10%. You can argue with the Ministry’s absolute numbers, but this peculiar anomaly is most likely due to a small sample size. 4) The definition of "children" does not necessarily preclude its members from combat operations. 5) Children are not categorized by sex. Why not? I suspect it makes a difference. 6) Combining women and children together seems calculated to allow emotional political statements. This misstep, in and of itself, points to motive and brings the whole study into question.
- Mathematical Mistakes – Although it may seem intuitively acceptable, it is nevertheless inappropriate to perform arithmetic on Confidence Intervals in the same fashion that you might use for other numbers. I am sure that the investigators derived some CIs based on the raw numbers, and then converted the result to a ratio of post-invasion to pre-invasion deaths. Even if a confidence statement about increased mortality is correct, the corresponding statement about the ratio increase could easily be incorrect.
- Division by Zero – In particular, many clusters had ZERO deaths by violence in the pre-invasion period. There was only one death by violence during that time for the WHOLE study. (I find it confusing that both Fallujah and the other clusters show this as a missing value in Table 2.) A ratio with zero on the bottom is hard to estimate; the increase would be interpreted as infinite. This study certainly does not measure the ratio for more than one cluster. I’m suspicious, forgive me, that the investigators imputed a single violent death for the earlier period in order to have a calculable ratio, which they may have seen as a conservative procedure. Also, a zero-centered normal variate divided by another theoretically produces a Cauchy distribution, and an interesting feature of the Cauchy distribution is that its variance is not even finite (see the sketch below). Since there was only one cluster, at best, for which a ratio was measurable beforehand, I am certain that the distribution of ratios was not directly determined for violent deaths. The numbers should therefore be reported in raw form and not represented as ratios.
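Here is the promised sketch. Under the textbook assumption of two independent, zero-centered normal variates, the ratio is Cauchy-distributed and its sample variance never settles down. This is exactly why ratio summaries, and arithmetic on their confidence intervals, are treacherous whenever the denominator can be near zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ratio of two independent zero-centered normals: a Cauchy variate.
num = rng.normal(0.0, 1.0, 1_000_000)
den = rng.normal(0.0, 1.0, 1_000_000)
ratio = num / den

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}: sample variance = {ratio[:n].var():.3g}")
# The variance estimate keeps jumping around and growing as n increases:
# there is no finite variance for it to converge to.
```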
Public Presentation
- Why did the authors feel it necessary to point out that the sponsors had no impact on the design of the study? Is this customary? The design of a study is a technical issue that can benefit from all viewpoints.
- Johns Hopkins is a highly respected institution around the world, and the Lancet a respected journal. May I point out, however, that JH has witnessed some anti-war activism supported by faculty, and that several of the authors have shown bias in public statements and blog-published correspondence. They commit themselves to claims that exceed even the conclusions of their report. They now argue essentially that the "real" numbers must be even higher because of all the violent places they "missed". Well? Did the study measure those numbers or not?
- There is considerable doublespeak about the meaning of the numbers. We are talking in the report about excess deaths, but that does not necessarily mean civilian deaths. Someone who takes up arms is not a civilian. The authors, in their public statements, seem to want to have it both ways.
- As discussed above, a log-normal conversion was used for some of the data, yet Richard Garfield now seems to be sure that the distribution is normal. The distinction is important because of the implied asymmetry. In a log-normal distribution the mean, or expected value, is closer to the left, or lower, end of a confidence interval (see the sketch after this list). Statements that the bulk of the probability will tend toward the center are therefore incorrect.
- The timing of these reports, both the original and updated versions, is indicative of political motivation.
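On the log-normal point above, a minimal sketch with made-up parameters shows how asymmetric such an interval really is. The interval below is symmetric on the log scale, as a log-normal fit would produce:

```python
import numpy as np

# Made-up log-scale parameters for a 95% interval symmetric in logs.
mu, sigma = np.log(100_000), 0.8

lo, mid, hi = np.exp([mu - 1.96 * sigma, mu, mu + 1.96 * sigma])
print(f"interval: [{lo:,.0f}, {hi:,.0f}], central estimate {mid:,.0f}")
print(f"distance to lower end: {mid - lo:,.0f}")
print(f"distance to upper end: {hi - mid:,.0f}")
# The central estimate sits far closer to the lower end; the probability
# mass is not piled up symmetrically around the middle of the interval.
```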
The war in Iraq is a very sad affair. People are dying. War is Hell. Though I disagree with their politics, I have respect for people like Les Roberts and Gilbert Burnham, who have a lot of heart and are trying desperately to do something about these problems. I envy their talents and energy, and I admire their compassion. I myself am doing very little.
That being said, I believe their efforts are counterproductive. For one thing, they do a disservice to Science by abusing it for political reasons. I think that I have made a good argument here that the results are tainted by all sorts of goal-directed biases. Scientists are supposed to bend over backwards to assure the validity of their work. If you abuse the tool, it won’t be as useful when you need it again. And I think, and this is my political angle, that they are also doing a disservice to public health in the long run by undermining the strategic goals of the war in Iraq. The world needs peace, but we are never going to get it until Saddam and his ilk are made impossible.
My original, more extensive discussion can be found here:
http://journals.aol.com/jjmollo/SoundoftheMushroom/entries/2005/04/30/iraq-mortality-study-100000/342

New study:
http://www.thelancet.com/webfiles/images/journals/lancet/s0140673606694919.pdf

Cluster sampling:
http://en.wikipedia.org/wiki/Cluster_sampling

Log-normal distribution:
http://en.wikipedia.org/wiki/Log-normal_distribution

Lancet/Hopkins justification:
http://www.stats.org/stories/the_science_ct_dead_oct17_06.htm

Original study:
http://www.thelancet.com/journals/lancet/article/PIIS0140673604174412/fulltext