Review for "Privacy and anonymity in public sharing of high-dimensional datasets: legal and ethical restrictions"

Completed on 15 Sep 2017 by Krzysztof Jacek Gorgolewski .

Login to endorse this review.


Jolij and colleagues argue in their paper that it is unethical and soon it will be illegal in the EU to publicly share data describing human participants of academic research experiments. Their perspective is deliberately biased to “spark a debate”. The authors strongly urge researchers not to share data.

Comments to author

There are several issues with the paper:

• The title and the summary are misleadingly broad and suggest a thorough review of the legal status of data sharing around the world. However, the paper only analyzes data sharing under a new not yet implemented European Union regulation with strong emphasis on the legal system in the Netherlands.

• The authors purposefully take the strictest possible interpretation of ethical guidelines. I find this approach of very limited use. For example, the excerpt from the Declaration of Helsinki they quote: “Every precaution must be taken to protect the privacy of research subjects and the confidentiality of their personal information” in its strictest interpretation would make doing any research impossible. If taken literally (which the authors seem to encourage) all human derived data – whether anonymized or not – would have to be stored on encrypted temper proof computers. Passwords would have to be entered in prescreened empty rooms to ensure eavesdropping would not be possible. One could even say that displaying the data in a room with windows is a danger of eavesdropping so such situations should be eliminated – as a precaution. This is obviously impractical, but it shows how strictest possible interpretation can be manipulated into absurdity making any research unethical.

• Furthermore, some argue that there is another aspect of the ethics of data sharing – that researchers have the ethical obligation to maximize the contribution of their participants. See Brakewood B, Poldrack RA. The ethics of secondary data analysis: considering the application of Belmont principles to the sharing of neuroimaging data. Neuroimage [Internet]. 2013 Nov 15;82:671–6. Available from:

• I am not a scholar of law to judge if the authors interpretation of ‘General Data Protection Regulation’ is correct. It is, however, unclear if it is also illegal to share data with other researchers within the same institution or institutions outside of the EU. Such analysis would be useful to the reader.

• I might be mistaken, but judging from the affiliations none of the authors is experienced in practicing law. If I am not mistaken, adding a collaborator with a law background would strengthen the paper.

• It’s not even clear why the topic of anonymity needs to be discussed since under “strictest possible interpretation” of the rules if one cannot control the purpose of data processing in context of public data sharing and thus making data sharing illegal whether they are properly anonymized.

• The section on anonymity is a mixed bag. The point of that one can re-identify anyone if equipped with the right information is not very revealing. It is also not clear what is the purpose of the example of the author identifying himself from a public database using information only available to himself. The argument that EEG recordings or fMRI scans greatly increase the chance of re-identification, because of their high dimensionality is mute, because acquiring matching data by a third party would be very hard. A date of birth or a zip code even though includes less information is much more useful for reidentification.

• It is not clear if the rulings of the Dutch Council of State are legally binding in all of the EU (I suspect they are not).

• The section about the risk posed by potential re-identification is purely hypothetical and lacks any analysis or example of actual harm that was inflicted due to reidentification of research participants.

• The consent form section is also confusing. Why is the claim that participants don’t always read consent forms a problem only in context of data sharing? Does GDPR enforce researchers to do mandatory consent form comprehension checks? Would the type of a consent form done by The Harvard Personal Genome Project make public data sharing legal under GDPR? Would it be ethical? Was Russ Poldrak’s MyConnectome study ethical?

• The reference cited in support of “anecdotal (…) sharp drop in willingness to participate in experiment of which data may be published openly” is incorrect. There is no such journal as “Belief, Perception, and Cognition Lab”. I did find this piece in Winnower - A reader that is not careful enough might miss the fact that this piece (never peer reviewed) describes the same first author as the reviewed manuscript asking his students if they would participate in a study which data is going to be publicly shared. I have a mixed feeling about using this reference. On one side, I appreciate that the author acknowledged the ad hoc nature of it and lack of scientific merit, but finding those comments required some effort and are not clear in the currently reviewed manuscript.

• Finally, authors failed to reference the following five analyses of GDPR in context of research data:

Chassang G. The impact of the EU general data protection regulation on scientific research. Ecancermedicalscience [Internet]. 2017 Jan 3;11:709. Available from:

Rumbold JMM, Pierscionek BK. A critique of the regulation of data science in healthcare research in the European Union. BMC Med Ethics [Internet]. 2017 Apr 8;18(1):27. Available from:

Stevens L. The Proposed Data Protection Regulation and Its Potential Impact on Social Sciences Research in the UK. European Data Protection Law Review [Internet]. 2015;1(2):97–112. Available from:

European Society of Radiology (ESR). The new EU General Data Protection Regulation: what the radiologist should know. Insights Imaging [Internet]. 2017 Jun;8(3):295–9. Available from:

Rumbold JMM, Pierscionek B. The Effect of the General Data Protection Regulation on Medical Research. J Med Internet Res [Internet]. 2017 Feb 24;19(2):e47. Available from:

• Big plus for sharing the analysis code (in the future I recommend putting it in Zenodo or similar archive for long term preservation).

Overall the manuscript ends on a recommendation not to share data and statement that it is coincidentally the best thing for one’s scientific career which implicitly suggest that the ethical and legal reasons (and strictest interpretation of guidelines) is merely an excuse not to share data and maintain competitive edge. I am not sure if this was the intention of the authors, but this is how the manuscript reads now. Independent of legal and ethical arguments I am not convinced those are the values we want to foster in science.

I really wish this paper was more constructive in its nature and explore how scientists who want to or are required to publicly share human data could use consents forms to inform their participants of the risks. In the past, we have recommended a ready to use text that could be included in consent forms to ethically enable public data sharing: Considering that the new EU law will take effect in May 2018 this is the right time for researchers around EU to start adding such clauses to their consent forms.