Review for "Privacy and anonymity in public sharing of high-dimensional datasets: legal and ethical restrictions"

Completed on 15 Sep 2017 by Krzysztof Jacek Gorgolewski.



Jolij and colleagues argue in their paper that it is unethical, and will soon be illegal in the EU, to publicly share data describing human participants in academic research experiments. Their perspective is deliberately biased to “spark a debate”. The authors strongly urge researchers not to share data.

Comments to author

Author response.

There are several issues with the paper:

• The title and the summary are misleadingly broad and suggest a thorough review of the legal status of data sharing around the world. However, the paper only analyzes data sharing under a new, not-yet-implemented European Union regulation, with a strong emphasis on the legal system in the Netherlands.

This assertion is incorrect. The GDPR has already entered into force, and all EU member states are required to be compliant as of May 2018. As such, our paper is relevant to all researchers in the EU.

• The authors purposefully take the strictest possible interpretation of ethical guidelines. I find this approach of very limited use. For example, the excerpt from the Declaration of Helsinki they quote, “Every precaution must be taken to protect the privacy of research subjects and the confidentiality of their personal information”, would in its strictest interpretation make doing any research impossible. If taken literally (which the authors seem to encourage), all human-derived data, whether anonymized or not, would have to be stored on encrypted, tamper-proof computers. Passwords would have to be entered in prescreened empty rooms to ensure that eavesdropping would not be possible. One could even say that displaying the data in a room with windows poses a risk of eavesdropping, so such situations should be eliminated as a precaution. This is obviously impractical, but it shows how the strictest possible interpretation can be pushed into absurdity, making any research unethical.

It is interesting to note that the caricature the reviewer sketches here is in fact what is required when working with, for example, medical and legal records of minors, or with microdata from the Dutch Statistical Bureau. Obviously, context matters with regard to the precautions one needs to take. We will elaborate on this in a revision.

• Furthermore, some argue that there is another aspect of the ethics of data sharing: researchers have an ethical obligation to maximize the contribution of their participants. See Brakewood B, Poldrack RA. The ethics of secondary data analysis: considering the application of Belmont principles to the sharing of neuroimaging data. Neuroimage [Internet]. 2013 Nov 15;82:671–6. Available from:

We agree, and will discuss in a revision.

• I am not a legal scholar and cannot judge whether the authors’ interpretation of the ‘General Data Protection Regulation’ is correct. It is, however, unclear whether it is also illegal to share data with other researchers within the same institution, or with institutions outside the EU. Such an analysis would be useful to the reader.

It is perfectly legal to share even non-anonymized data for research purposes under the GDPR, as long as data subjects have given their consent. However, when researchers wish to share data outside the EU, they need to ensure that data protection standards are comparable to EU standards.

• I might be mistaken, but judging from the affiliations, none of the authors has experience in practicing law. If I am not mistaken, adding a collaborator with a legal background would strengthen the paper.

The reviewer is indeed mistaken. Professor Koolhoven works at the Faculty of Law, and is an internationally recognized expert on ICT and privacy law.

• It is not even clear why the topic of anonymity needs to be discussed, since under the “strictest possible interpretation” of the rules, one cannot control the purpose of data processing in the context of public data sharing, which makes public data sharing illegal whether or not the data are properly anonymized.

As a matter of fact, the topic of anonymity is essential. Public sharing of anonymized datasets is perfectly permissible: under the GDPR, such data are no longer defined as 'personal data', so the GDPR no longer applies. However, our main argument is that for the majority of datasets collected by psychologists and cognitive neuroscientists, the level of anonymity required to meet this criterion cannot be achieved.

It is clear this point has not come across, and we will rephrase our paper accordingly.

• The section on anonymity is a mixed bag. The point that one can re-identify anyone if equipped with the right information is not very revealing. It is also not clear what the purpose is of the example of the author identifying himself from a public database using information only available to himself. The argument that EEG recordings or fMRI scans greatly increase the chance of re-identification because of their high dimensionality is moot, because acquiring matching data would be very hard for a third party. A date of birth or a zip code, even though it contains less information, is much more useful for re-identification.

"It is also not clear what the purpose is of the example of the author identifying himself from a public database using information only available to himself."
- This assertion is incorrect. The data used for re-identification are not known only to the author himself, but are available from public sources, as clearly indicated in the manuscript.

"The argument that EEG recordings or fMRI scans greatly increase the chance of re-identification because of their high dimensionality is moot, because acquiring matching data would be very hard for a third party."
- Whether acquiring EEG or fMRI data is hard depends on the technology. Consumer-grade EEG headsets are already on the market and are heavily used in certain online communities. It is conceivable that in the near future EEG data will be easier to obtain than they are now, and that data that are anonymous today will become identifiable personal data.

• It is not clear whether the rulings of the Dutch Council of State are legally binding throughout the EU (I suspect they are not).

Rulings of national courts are taken into account by the European Court of Justice.

• The section about the risk posed by potential re-identification is purely hypothetical and lacks any analysis or example of actual harm inflicted as a result of the re-identification of research participants.

Fortunately not. We do not know of a case where actual harm was inflicted, and we hope our paper contributes to keeping it that way.

• The consent form section is also confusing. Why is the claim that participants don’t always read consent forms a problem only in the context of data sharing? Does the GDPR require researchers to perform mandatory consent form comprehension checks? Would the type of consent form used by the Harvard Personal Genome Project make public data sharing legal under the GDPR? Would it be ethical? Was Russ Poldrack’s MyConnectome study ethical?

"Why is the claim that participants don’t always read consent forms a problem only in the context of data sharing?"
- This is a problem in general, of course. However, where data sharing is concerned, personal data can only be shared with the very explicit permission of the data subject, and the specific uses must be stated. A boilerplate informed consent form may not meet these requirements, depending on the context.

"Does the GDPR require researchers to perform mandatory consent form comprehension checks?"
- No, but we recommend that researchers do so.

"Would the type of consent form used by the Harvard Personal Genome Project make public data sharing legal under the GDPR?"
- Yes, we suppose so.

"Would it be ethical?"
- That is a more difficult question. If one values individual liberties over societal liberties, there is of course nothing wrong here. However, if one considers societal liberties more important, the matter is more challenging. Allowing people to openly share such data creates peer pressure, and other individuals may feel restricted in their freedom to choose not to share data.

"Was Russ Poldrack’s MyConnectome study ethical?"
- A similar argument applies here. If one argues that individual liberties prevail over societal liberties, Professor Poldrack is perfectly free to name himself as the data subject in said study. However, this does create peer pressure: an influential professor sharing his own data publicly is a strong signal. Given that we are presently debating the best way to achieve transparency in science, and that not everyone is convinced public sharing is the best way to do so, this strong signal may have been premature. It may create unreasonable peer pressure on other scientists and limit their academic freedom to decide not to share data publicly.

• The reference cited in support of “anecdotal (…) sharp drop in willingness to participate in experiment of which data may be published openly” is incorrect. There is no such journal as “Belief, Perception, and Cognition Lab”. I did find this piece in The Winnower. A reader who is not careful enough might miss the fact that this piece (never peer reviewed) describes the first author of the reviewed manuscript asking his students whether they would participate in a study whose data would be publicly shared. I have mixed feelings about the use of this reference. On the one hand, I appreciate that the author acknowledged its ad hoc nature and lack of scientific merit; on the other hand, finding those caveats required some effort, and they are not made clear in the manuscript under review.

We will clarify this in the revision.

• Finally, the authors failed to cite the following five analyses of the GDPR in the context of research data:

We thank the reviewer for these helpful suggestions.

Chassang G. The impact of the EU general data protection regulation on scientific research. Ecancermedicalscience [Internet]. 2017 Jan 3;11:709. Available from:

Rumbold JMM, Pierscionek BK. A critique of the regulation of data science in healthcare research in the European Union. BMC Med Ethics [Internet]. 2017 Apr 8;18(1):27. Available from:

Stevens L. The Proposed Data Protection Regulation and Its Potential Impact on Social Sciences Research in the UK. European Data Protection Law Review [Internet]. 2015;1(2):97–112. Available from:

European Society of Radiology (ESR). The new EU General Data Protection Regulation: what the radiologist should know. Insights Imaging [Internet]. 2017 Jun;8(3):295–9. Available from:

Rumbold JMM, Pierscionek B. The Effect of the General Data Protection Regulation on Medical Research. J Med Internet Res [Internet]. 2017 Feb 24;19(2):e47. Available from:

• A big plus for sharing the analysis code (in the future, I recommend depositing it in Zenodo or a similar archive for long-term preservation).

Overall, the manuscript ends with a recommendation not to share data and a statement that this is, coincidentally, the best thing for one’s scientific career. This implicitly suggests that the ethical and legal reasons (and the strictest interpretation of guidelines) are merely an excuse not to share data and to maintain a competitive edge. I am not sure whether this was the intention of the authors, but this is how the manuscript currently reads. Independent of the legal and ethical arguments, I am not convinced these are the values we want to foster in science.

We regret that our manuscript comes across as such. We are by no means advocating that researchers should not share their data; on the contrary. However, we do urge researchers to take care when sharing data publicly, and we wish to emphasize that data sharing is not black-and-white (public versus no sharing at all). There are many ways to share research data that circumvent the problems open data advocates warn about (e.g., unwillingness to share upon request) without making the data public.

I really wish this paper were more constructive in nature and explored how scientists who want to, or are required to, publicly share human data could use consent forms to inform their participants of the risks. In the past, we have recommended ready-to-use text that could be included in consent forms to ethically enable public data sharing: Considering that the new EU law will take effect in May 2018, this is the right time for researchers across the EU to start adding such clauses to their consent forms.

We will refer to this in the revision.