Review for "Why Do Scientists Fabricate And Falsify Data? A Matched-Control Analysis Of Papers Containing Problematic Image Duplications"

Completed on 26 Apr 2017 by Brian Martinson. Sourced from http://biorxiv.org/content/early/2017/04/12/126805.



Comments to author

The author's responses follow the reviewer's comments below.

Thanks for the opportunity to review and comment on this very interesting research. I've got one suggestion and a question for you. (I offer these in addition to supporting Chris Mebane's suggestions, particularly those concerning further explication of the methodology and a more detailed description of the Lee & Schrank hypothesis.)

Suggestion: At various points in the paper you use the term "questionable," "questionable practices," or "questionable research practices" to refer to the image manipulation falling into your Category 2. However, from the paper's description of what you classified into that category, that type of manipulation clearly violates currently accepted standards for allowed types of image manipulation, whether due to ignorance, carelessness or malfeasance, meaning the behaviors themselves are not really questionable at all. So, in keeping with the recently introduced terminology of the NASEM report Fostering Integrity in Research (disclosure: I was a member of the authoring panel of the report), it would seem more correct to reference these as "detrimental," "detrimental practices," or "detrimental research practices."

Question: It's great to see efforts to empirically test hypotheses derived from a variety of theoretical perspectives, positing potential influences arising at multiple levels, from systemic to local to intra-individual. At the same time, this raises an interpretation question, in particular about the country-level associations observed, but perhaps also about the team-level associations.

It's not clear to me how the methodology you've employed helps to avoid committing the exception fallacy: the error of "exceptional cases" leading to conclusions about the larger groups from which the cases are drawn. In most multi-level analyses this concern is addressed through multi-level models in which the units of observation are distinguished from the units of analysis, and the latter are specified at two, three or sometimes four different "levels" in the context of some form of generalized linear modeling. It may be arguable whether, or to what extent, such methods help to avoid the exception fallacy, but they do represent an explicit recognition of the issue. Perhaps I missed it, but I don't think you've employed such multi-level modeling techniques here, nor do such techniques appear to have been employed in the 2017 Fanelli et al. PNAS publication.

To give just one example of how this might lead to an interpretation problem: if a case arises from, say, a country in which there are institution-level policies about misconduct, one doesn't know whether the individual who engaged in the image manipulation was employed at an institution with or without such a policy. And regardless of whether their institution had such policies in place, one doesn't really know the extent to which the individual was even "exposed" to, or subject to the influence of, the policy's presence or absence. If I'm right, then more caution is warranted in the interpretation of the associations beyond those at the individual level.



Hi Brian, many thanks for your comments.

Will see how to rephrase the passage about QRPs, although the bottom line is that we don't actually know what caused any of these duplications. Honest errors are also likely to account for part of them.

If I understand the point about "exceptional cases" correctly, it is what I would also call the risk of ecological fallacy: interpreting a correlation across countries as reflecting correlations across institutions or across individuals.
A crucial element to consider in this analysis is that it is a matched-control test. So, what the analyses tell us is literally the risk of an individual being in one category or the other.
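
To illustrate the matched-control logic, here is a minimal sketch of one standard way to analyse 1:1 matched case-control data, a conditional logistic regression. The file name, variable names (case, policy, pair_id) and the use of statsmodels are illustrative assumptions, not a description of the paper's actual analysis pipeline.

```python
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

# Hypothetical matched case-control data, one row per paper:
#   case     1 = paper flagged for a problematic duplication, 0 = matched control
#   policy   1 = a national misconduct policy applies to the author, 0 = otherwise
#   pair_id  identifier linking each flagged paper to its matched control
df = pd.read_csv("matched_pairs.csv")

# Conditioning on pair_id removes anything shared within a matched set
# (e.g. journal, year, field used for matching), so the estimate is the
# within-pair association between policy exposure and the odds of being a case.
model = ConditionalLogit(df["case"], df[["policy"]], groups=df["pair_id"])
result = model.fit()

print(result.summary())
print("Odds ratios:", np.exp(result.params))
```

Because estimation conditions on the matched set, the matching factors drop out of the model, which is why the resulting odds ratio can be read as an individual-level risk of being in one category or the other.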

There are no "exceptional individuals" (which I would understand as extreme values/outliers) in the sense that, particularly in the case of country-level analyses, we are testing simple, binary categories. One is either in or out, and all we measure is the likelihood that being in one category coincides with being in a given country.
Under the hypothesis that, say, misconduct policies reduce the risk, countries in which some institutions have policies and some do not should present a mixture of low-risk and high-risk individuals, whereas countries with national policies should yield uniformly lower-risk individuals. Hence, the risk should reasonably be expected to be lower for the latter than for the former (under this hypothesis).

The PNAS study about bias actually uses multilevel models, to adjust not for country directly but for field (i.e., meta-analysis). That effectively also generates a matched-control test, because we are comparing the effects (of, e.g., working or not working within a specific country) within a given meta-analysis.
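
To show what adjusting for field as a level can look like in practice, the sketch below fits a mixed-effects logistic model with a random intercept per meta-analysis. The variables (overestimate, in_country_x, meta_id), the input file and the choice of statsmodels' Bayesian mixed GLM are assumptions made for illustration, not necessarily what the PNAS study used.

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Hypothetical data: one row per primary study included in a meta-analysis.
#   overestimate   1 = study reports an effect larger than the meta-analytic summary
#   in_country_x   1 = at least one author works in the country of interest
#   meta_id        identifier of the meta-analysis (the "field" level)
df = pd.read_csv("meta_analyses.csv")

# A random intercept for meta_id means comparisons are made within each
# meta-analysis, which is what makes the design behave like a matched-control test.
vc_formulas = {"meta": "0 + C(meta_id)"}
model = BinomialBayesMixedGLM.from_formula(
    "overestimate ~ in_country_x", vc_formulas, df
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```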

Intriguingly, the ecological fallacy was a risk in an older study of mine, quite tellingly the only study I published that seemed to directly support the "pressures to publish" hypothesis. It is unfortunate that that study keeps being cited, ignoring multiple later pieces of evidence that, in my opinion, refute its original interpretation.

All that said, I do of course agree that there are confounding factors, and indeed we did our best to refrain from making causal inferences (and will see if we need to do so more). However, we ran multivariable analyses, as well as analyses limited to subgroups of one or more countries, as well as separate tests on individual data. Any conclusion stems from an assessment of these different results, from their agreement with predictions, and from the remarkable consistency between these results and those of completely different studies.

It would be great to be able to run a matched-control analysis comparing institutional effects, but it would be impossible with this sample and study design: there are too many institutions compared to the number of available cases.
This is perhaps an idea to pursue in future work!