Preprint reviews by Krzysztof Jacek Gorgolewski

Using experimental data as a voucher for study pre-registration

Matan Mazor, Noam Mazor, Roy Mukamel

Review posted on 16th December 2017

In their manuscript titled “Using experimental data as a voucher for study pre-registration”, Mazor and colleagues propose a solution that could prevent bad actors from pretending that they preregistered their analysis protocol prior to data analysis. This is an interesting approach to an increasingly important problem. However, some technical issues might prevent this solution from being practically applicable.


- The proposed solution only works if the authors share raw data (which would be great, but sadly is not common).
- The verification process requires reviewers to reanalyze the data, which seems like an unrealistic expectation.
- Differences between the processing pipelines used by the authors and by the reviewers could result in slightly different results (see Carp et al., 2012) and raise false concerns about changes to the preregistered protocol. This could be further exploited if the randomization scheme allows only a very limited set of orders that yield very similar results.
- “Bob then uses the Python script that he found in the protocol folder to generate a pseudorandom sequence of experimental events, based on the resulting protocol-sum.” Isn’t the code that translates the checksum into a random order provided by the authors themselves? What if it always gives the same answer? Am I missing some detail? (See the sketch after this list.)
- A more sophisticated attack would involve modifying already acquired data, temporarily rearranging it so that it complies with a protocol defined post hoc. This would, however, require a highly motivated bad actor.
- Registered Report approaches do not necessarily provide time locking. One could imagine a situation in which a bad actor collects data, picks an analysis protocol post hoc, and submits to the first stage of a Registered Report pretending they have not acquired any data yet. This way they could game the system, but only assuming that reviewers will not require changes to the acquisition protocol.
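For illustration, here is a minimal sketch of how a checksum of the preregistered protocol folder could deterministically seed the event order. This is not the authors' actual script; the folder layout and function names are hypothetical. The concern above is whether a dishonest author could substitute a script that ignores the checksum.

```python
# A minimal sketch (NOT the authors' actual script) of deriving an
# event order from a checksum of the preregistered protocol folder.
import hashlib
import random
from pathlib import Path

def protocol_sum(folder: str) -> str:
    """Hash every file in the protocol folder, in a fixed order."""
    h = hashlib.sha256()
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            h.update(path.read_bytes())
    return h.hexdigest()

def event_order(checksum: str, n_events: int) -> list:
    """Seed a PRNG with the checksum and shuffle the event indices."""
    rng = random.Random(int(checksum, 16))
    order = list(range(n_events))
    rng.shuffle(order)
    return order

# Any change to the protocol changes the checksum and hence the order,
# which is what ties the acquired data to the preregistration.
print(event_order(protocol_sum("protocol/"), n_events=10))
```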



AFQ-Browser: Supporting reproducible human neuroscience research through browser-based visualization tools

Jason D. Yeatman, Adam Richie-Halford, Josh K. Smith, Ariel Rokem

Review posted on 09th October 2017

The paper entitled “AFQ-Browser: Supporting reproducible human neuroscience research through browser-based visualization tools” is a beautifully written description of a software tool that takes outputs of a specific diffusion MRI analysis method (AFQ) and creates interactive visualizations that make data exploration easy. The tool implements some truly innovative ideas, such as piggybacking on GitHub as a service for hosting data and visualizations, and representing the data in a form that is appealing to data scientists with no prior MR experience. I hope that other tools will emulate these features. The manuscript also includes a thoughtful discussion of exploratory vs. hypothesis-driven methods.


- The abstract gives the reader the wrong impression that the AFQ-Browser tool is more generic than it really is. It should be clarified that the tool only allows users to visualize and share outputs of AFQ analyses.
- When describing BrainBrowser and its involvement in the MACACC dataset, surely you meant “visualization”, not “analysis”.
- It might be worth introducing the publication feature earlier in the paper. I was quite confused reading about reproducibility and data sharing without knowing that AFQ-Browser is not just a visualization tool.
- Please mention in the paper the license under which the tool is distributed and any pending or obtained patents that would limit its use or redistribution.
- If all AFQ users start uploading their results to GitHub using AFQ-Browser, it might be hard to find or aggregate those results. It might be worth considering (and discussing) a centralized index (also hosted on GitHub) of all publicly available AFQ-Browser-generated bundles. This index could be automatically updated during the “publish” procedure.
- GitHub is a great resource, but it provides few guarantees in terms of long-term storage. A solution would be to deposit the bundles into Zenodo, which can be done directly from GitHub. It would be worth implementing and/or discussing this in the manuscript.
- It’s a technical detail, but it took me a little while to figure out why the tool requires the user to spin up a local server (presumably to be able to access the CSV and JSON files). This might be worth elaborating on; see the first sketch after this list.
- Saving the visualization “view” (or “browser state”) seems cumbersome when done via a file. Could the view be encoded in the URL (via GET parameters)? Sharing such views would be much easier and more natural; see the second sketch after this list.
- Some example analyses include information about group membership or demographic information such as age. How is such information stored and conveyed to AFQ-Browser? Does it also come as an output of AFQ?
- In the manuscript you mention that AFQ-Browser allows users to compare their results with normative distributions. Where do these come from: a central repository (if so, please describe how it is populated), or do users need to provide such distributions themselves?
- It might be worth considering a crowdsourcing scheme such as the one employed by the MRIQC Web API (https://mriqc.nimh.nih.gov/) to generate normative distributions of AFQ outputs.
- Is the way you store data in CSV files, and their relation to the JSON files (beyond the “tidy” convention), described in detail somewhere? It would be useful for users.
- Please describe the software testing approach you employed in this project.
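To illustrate the two points above, here are two minimal Python sketches. All ports, URLs, and parameter names are hypothetical illustrations, not part of AFQ-Browser's actual interface.

```python
# Sketch 1: why a local server is needed. Browsers refuse fetch()/
# XMLHttpRequest calls against file:// URLs, so the CSV and JSON
# files have to be served over HTTP. Port and directory are arbitrary.
import http.server
import socketserver

PORT = 8000  # serve the current directory, where the browser bundle lives

with socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler) as httpd:
    print(f"Serving at http://localhost:{PORT}")
    httpd.serve_forever()
```

```python
# Sketch 2: encoding a hypothetical "view" in URL query parameters
# instead of a file, so views can be shared as plain links.
from urllib.parse import urlencode, parse_qs, urlparse

view = {"bundle": "CST_L", "metric": "FA", "subject": "sub-01"}  # hypothetical keys
url = "https://example.github.io/afq-browser/?" + urlencode(view)
print(url)  # shareable link that restores the view

# Restoring the view from a shared link:
restored = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
assert restored == view
```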



Porcupine: a visual pipeline tool for neuroimaging analysis

Tim van Mourik, Lukas Snoek, Tomas Knapen, David Norris

Review posted on 27th September 2017

Porcupine by van Mourik et al. is an extensible, cross-platform desktop application that allows users to quickly design neuroimaging data workflows via a graphical user interface. A graphical user interface has long been a sorely needed feature for Nipype, and Porcupine fills this gap.


Porcupine is designed in a very smart and flexible way, allowing it to be extended with new code-generation backends. Furthermore, since the output is the source code of the pipeline, the processing can be customized by editing that code (see the sketch below). Reproducibility of the produced pipelines is further increased by the generation of Dockerfiles.
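To make the “output is source code” point concrete, here is a minimal sketch of the kind of Nipype code such a tool could emit. The node names, parameters, and input file are hypothetical, not Porcupine's actual output.

```python
# A minimal, hypothetical example of generated Nipype pipeline code.
from nipype import Node, Workflow
from nipype.interfaces import fsl

skullstrip = Node(fsl.BET(frac=0.5), name="skullstrip")
skullstrip.inputs.in_file = "sub-01_T1w.nii.gz"  # hypothetical input
smooth = Node(fsl.IsotropicSmooth(fwhm=6), name="smooth")

wf = Workflow(name="example_pipeline", base_dir="work")
wf.connect(skullstrip, "out_file", smooth, "in_file")

# Because the pipeline is plain source code, a user can edit
# parameters (e.g. frac or fwhm) before running it.
wf.run()
```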

It is hard to overstate this contribution, since Porcupine will expose a large community of researchers who prefer graphical interfaces to reproducible neuroimaging pipelines.

Minor comments:
- The manuscript at some point mentions saving MATLAB code, but I don’t believe such a plugin exists yet.
- It might be worth mentioning NIAK as a potential output plugin.
- In the context of computational clusters, it might be worth clarifying that Docker images can be run via Singularity.
- “Nypipe” -> “Nipype”
- It’s unclear why the user is required to make modifications to the output Dockerfile; it seems that it should be possible to generate a complete Dockerfile without the need for any modifications.
- “It should be noted that Porcupine is not meant for low-level functionality, such as file handling and direct data operations.” What does that mean? Could you give an example?
- In the context of graphical workflow systems: did you mean JIST instead of CBS Tools?
- “providing a direct way of creating this is high on the feature list of Porcupine” –> “planned features list”?
- The license under which Porcupine is distributed is not listed in the manuscript.




Privacy and anonymity in public sharing of high-dimensional datasets: legal and ethical restrictions

Jacob Jolij, Els Van Maeckelberghe, Rosalie Koolhoven, and Monicque Lorist

Review posted on 15th September 2017

Jolij and colleagues argue in their paper that it is unethical, and soon will be illegal in the EU, to publicly share data describing human participants of academic research experiments. Their perspective is deliberately biased in order to “spark a debate”. The authors strongly urge researchers not to share data.


There are several issues with the paper:

• The title and the summary are misleadingly broad and suggest a thorough review of the legal status of data sharing around the world. However, the paper only analyzes data sharing under a new, not yet implemented, European Union regulation, with a strong emphasis on the legal system of the Netherlands.

• The authors purposefully take the strictest possible interpretation of ethical guidelines. I find this approach of very limited use. For example, the excerpt from the Declaration of Helsinki they quote, “Every precaution must be taken to protect the privacy of research subjects and the confidentiality of their personal information”, in its strictest interpretation would make doing any research impossible. If taken literally (which the authors seem to encourage), all human-derived data, whether anonymized or not, would have to be stored on encrypted, tamper-proof computers. Passwords would have to be entered in prescreened, empty rooms to ensure that eavesdropping is not possible. One could even say that displaying the data in a room with windows poses a danger of eavesdropping, so such situations should be eliminated as a precaution. This is obviously impractical, but it shows how the strictest possible interpretation can be pushed into absurdity, making any research unethical.

• Furthermore, some argue that there is another aspect of the ethics of data sharing: that researchers have an ethical obligation to maximize the contribution of their participants. See Brakewood B, Poldrack RA. The ethics of secondary data analysis: considering the application of Belmont principles to the sharing of neuroimaging data. Neuroimage [Internet]. 2013 Nov 15;82:671–6. Available from: http://dx.doi.org/10.1016/j.neuroimage.2013.02.040

• I am not enough of a legal scholar to judge whether the authors’ interpretation of the ‘General Data Protection Regulation’ is correct. It is, however, unclear whether it would also be illegal to share data with other researchers within the same institution, or with institutions outside of the EU. Such an analysis would be useful to the reader.

• I might be mistaken, but judging from the affiliations, none of the authors is experienced in practicing law. If that is the case, adding a collaborator with a law background would strengthen the paper.


• It is not even clear why the topic of anonymity needs to be discussed: under the “strictest possible interpretation” of the rules, one cannot control the purpose of data processing in the context of public data sharing, which would make such sharing illegal whether or not the data are properly anonymized.

• The section on anonymity is a mixed bag. The point that one can re-identify anyone if equipped with the right information is not very revealing. It is also not clear what purpose is served by the example of the author identifying himself in a public database using information available only to himself. The argument that EEG recordings or fMRI scans greatly increase the chance of re-identification because of their high dimensionality is moot, because acquiring matching data would be very hard for a third party. A date of birth or a zip code, even though it carries less information, is much more useful for re-identification.

• It is not clear if the rulings of the Dutch Council of State are legally binding in all of the EU (I suspect they are not).

• The section about the risk posed by potential re-identification is purely hypothetical and lacks any analysis or example of actual harm inflicted due to re-identification of research participants.

• The consent form section is also confusing. Why is the claim that participants don’t always read consent forms a problem only in the context of data sharing? Does the GDPR require researchers to perform mandatory consent form comprehension checks? Would the type of consent form used by The Harvard Personal Genome Project make public data sharing legal under the GDPR? Would it be ethical? Was Russ Poldrack’s MyConnectome study ethical?

• The reference cited in support of the “anecdotal (…) sharp drop in willingness to participate in experiments of which data may be published openly” is incorrect. There is no such journal as “Belief, Perception, and Cognition Lab”. I did find this piece on The Winnower (https://thewinnower.com/papers/the-open-data-pitfall-ii-now-with-data). A reader who is not careful enough might miss the fact that this piece (never peer reviewed) describes the same first author as the reviewed manuscript asking his students whether they would participate in a study whose data would be publicly shared. I have mixed feelings about using this reference. On one hand, I appreciate that the author acknowledged its ad hoc nature and lack of scientific merit; on the other, finding those acknowledgments required some effort, and they are not made clear in the currently reviewed manuscript.

• Finally, the authors failed to reference the following five analyses of the GDPR in the context of research data:
Chassang G. The impact of the EU general data protection regulation on scientific research. Ecancermedicalscience [Internet]. 2017 Jan 3;11:709. Available from: http://dx.doi.org/10.3332/ecancer.2017.709

Rumbold JMM, Pierscionek BK. A critique of the regulation of data science in healthcare research in the European Union. BMC Med Ethics [Internet]. 2017 Apr 8;18(1):27. Available from: http://dx.doi.org/10.1186/s12910-017-0184-y

Stevens L. The Proposed Data Protection Regulation and Its Potential Impact on Social Sciences Research in the UK. European Data Protection Law Review [Internet]. 2015;1(2):97–112. Available from: http://edpl.lexxion.eu/article/EDPL/2015/2/4

European Society of Radiology (ESR). The new EU General Data Protection Regulation: what the radiologist should know. Insights Imaging [Internet]. 2017 Jun;8(3):295–9. Available from: http://dx.doi.org/10.1007/s13244-017-0552-7

Rumbold JMM, Pierscionek B. The Effect of the General Data Protection Regulation on Medical Research. J Med Internet Res [Internet]. 2017 Feb 24;19(2):e47. Available from: http://dx.doi.org/10.2196/jmir.7108

• Big plus for sharing the analysis code (in the future, I recommend depositing it in Zenodo or a similar archive for long-term preservation).

Overall, the manuscript ends with a recommendation not to share data, together with a statement that this is coincidentally the best thing for one’s scientific career. This implicitly suggests that the ethical and legal reasons (and the strictest interpretation of the guidelines) are merely an excuse to avoid sharing data and to maintain a competitive edge. I am not sure if this was the intention of the authors, but this is how the manuscript reads now. Independent of the legal and ethical arguments, I am not convinced those are the values we want to foster in science.

I really wish this paper were more constructive in nature and explored how scientists who want to, or are required to, publicly share human data could use consent forms to inform their participants of the risks. In the past, we have recommended ready-to-use text that can be included in consent forms to ethically enable public data sharing: http://open-brain-consent.readthedocs.io/en/latest/ultimate.html. Considering that the new EU law will take effect in May 2018, now is the right time for researchers around the EU to start adding such clauses to their consent forms.




Structural and functional MRI from a cross-sectional Southwest University Adult lifespan Dataset (SALD)

Dongtao Wei, Kaixiang Zhuang, Qunlin Chen, Wenjing Yang, Wei Liu, Kangcheng Wang, Jiang-Zhou Sun, and Jiang Qiu

Review posted on 25th August 2017

Only a small percentage of neuroimaging data is shared openly. The number of datasets that extend beyond Caucasian white populations and span a wide range of ages is even smaller. This dataset is therefore a valuable contribution to the field and merits publication, conditional on certain improvements.


Major:
- I strongly recommend distributing the dataset in the Brain Imaging Data Structure (http://bids.neuroimaging.io) format instead of the current custom file organization. This will greatly increase the ease of reuse and validation of the dataset; see the sketch after this list.

- The dataset should be validated using the bids-validator (https://github.com/INCF/bids-validator) to check for missing scans and for consistency of scanning parameters across all subjects.

- It is not clear whether the anatomical and resting state scans were acquired in a single session or in two separate sessions.

- "Image acquisitions" section mentions task data, but no other details are provided and files are missing. Is task data suppose to be part of this release?

- The context of the resting state scan should be explained: was it performed before or after a particular task?

- Please share the code/scripts/config files used to perform the analyses.
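As a concrete illustration of the BIDS recommendation above, here is a minimal sketch of how one subject's scans would map onto the BIDS layout. The source file names are hypothetical, and a full conversion would also need the JSON sidecar files and a participants.tsv.

```python
# A minimal sketch (hypothetical source paths) of moving one subject's
# scans into the Brain Imaging Data Structure layout.
import shutil
from pathlib import Path

src = Path("raw/subject001")  # hypothetical current organization
dst = Path("bids/sub-001")
(dst / "anat").mkdir(parents=True, exist_ok=True)
(dst / "func").mkdir(parents=True, exist_ok=True)

shutil.copy(src / "t1.nii.gz", dst / "anat" / "sub-001_T1w.nii.gz")
shutil.copy(src / "rest.nii.gz", dst / "func" / "sub-001_task-rest_bold.nii.gz")
```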

Minor:
- No "known issues" are reported in the paper. Is it really try that in such a large sample there were no scans that caused your concern?
- DPARSF is misspelled as DPARF
- Please state which version of DPARSF was used
- The "Sex" column in the demographics Excel file appears twice
- I would advise against using the Jet colormap in Figure 5, since it is perceptually inaccurate (https://www.youtube.com/watch?v=xAoljeRJ3lU); see the example after this list.
- Labels on the axes of figures 2 and 3 are unreadable
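Regarding the colormap comment above, here is a minimal matplotlib example (with placeholder data) of switching to a perceptually uniform alternative:

```python
# Replotting with the perceptually uniform viridis colormap
# instead of jet; the data here is a random placeholder.
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(20, 20)
plt.imshow(data, cmap="viridis")  # instead of cmap="jet"
plt.colorbar()
plt.savefig("figure5_viridis.png")
```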

Looking forward to reviewing a revised version of this paper.
