Completed on 28 Sep 2018 by Krzysztof Jacek Gorgolewski.
The manuscript “BOLD5000: A public fMRI dataset of 5000 images” describes one of the most exciting fMRI datasets published in recent years. It includes data from 4 participants, each viewing (and responding to) ~5,000 different images. The stimuli were selected very carefully from established image collections used in computer vision research, providing a bridge between neuroscience and machine learning. I am confident that we will see many fascinating reuses of this dataset in the coming years.
- It might be worth tweaking the title of the manuscript. The word “image” could refer either to an MR scan or to a photograph used as a stimulus. Perhaps “BOLD5000: a public dataset of human brain activation during viewing of 5000 images”.
- Figures 2, 3, and 4 would benefit from normalized histograms with unified bin sizes, displaying both compared distributions on the same set of axes (see https://cdn-images-1.medium.com/max/1200/1*NyGPyuSF9enQDCJGYGiI9A.png for an example).
- Plotting the estimated ROIs on top of the anatomy of each participant would also benefit the manuscript.
- Figure 5: adding a plot of the distribution of framewise displacement would help readers assess how much motion to expect in the data.
- Figure 5: it is unclear why a different number of sessions would justify plotting one participant's data on a separate set of axes.
- Figure 5: please make the outlier labels larger so that they are readable, and make them correspond to the participant, session, and run labels.
- Figure 6: as a sanity check, it would be good to add a plot from an ROI where you do not expect a response, for example, the motor cortex.
- Figure 7: bar plots should be replaced with a visualization that depicts the spread of each distribution (box-and-whisker plots or violin plots).
- Group-level reports are missing from the MRIQC results, which makes it hard to diagnose outliers in the QC metrics.
- The BIDS version of the dataset deposited on OpenNeuro.org is missing some data, which makes future automated processing harder. In particular:
* Missing participants.tsv file with demographic data (Section 8.9 of the BIDS Spec)
* Missing _sessions.tsv file with post-session questionnaire answers (Section 9.1 of the BIDS Spec)
* _events.json data dictionaries do not include descriptions of the columns (Section 4.2 of the BIDS Spec)
* Acquisition datetimes (“Begin” in _events.json) should be anonymized and moved to _scans.tsv files (Section 8.8 of the BIDS Spec)
* Missing stimulus files (the cropped images displayed to participants) and stim_file columns in the _events.tsv files (Section 8.5 of the BIDS Spec)
* Lack of data dictionaries (_events.json) for localizer events files (Section 4.2 of the BIDS Spec)
* Lack of physiological data (Section 8.6 of the BIDS Spec)
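To make the BIDS points concrete, below is a minimal Python sketch (standard library only) showing how the missing tabular files and data dictionaries could be generated. The helper names `write_tsv`, `write_data_dictionary`, and `shifted_acq_time` are hypothetical. The column names follow the BIDS conventions for participants.tsv (`participant_id`), _scans.tsv (`filename`, `acq_time`), and column descriptions (`Description`). Any date offset passed in is a placeholder that the curators would choose and keep private.

```python
"""Write BIDS tabular files and a JSON data dictionary (stdlib sketch)."""

import csv
import json
from datetime import datetime, timedelta
from pathlib import Path


def write_tsv(path, columns, rows):
    """Tab-separated file with a header row; missing values become 'n/a'."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        for row in rows:
            writer.writerow([row.get(col, "n/a") for col in columns])


def write_data_dictionary(path, descriptions):
    """JSON sidecar mapping each column name to a human-readable Description."""
    sidecar = {col: {"Description": text} for col, text in descriptions.items()}
    Path(path).write_text(json.dumps(sidecar, indent=2) + "\n")


def shifted_acq_time(acq_time, offset):
    """Shift an ISO 8601 acquisition time back by a fixed, undisclosed offset."""
    shifted = datetime.fromisoformat(acq_time) - offset
    return shifted.strftime("%Y-%m-%dT%H:%M:%S")
```

For example, `write_data_dictionary("task-example_events.json", {"stim_file": "Path to the presented image, relative to the stimuli directory"})` would document the proposed stim_file column (the task name is hypothetical). `Levels` and `Units` keys can be added to each column entry in the same way.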
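Regarding the suggestion for Figures 2, 3, and 4, the sketch below (plain Python, hypothetical function names) computes density-normalized histograms for two samples over one shared set of bin edges. This makes the two distributions directly comparable on a single set of axes.

```python
"""Normalized histograms of two distributions on shared bins (stdlib sketch)."""


def shared_bin_edges(a, b, n_bins):
    """Return n_bins + 1 equally spaced edges spanning both samples."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:  # degenerate case: every value is identical
        hi = lo + 1.0
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def density_histogram(values, edges):
    """Counts per bin divided by (N * bin width), so the bars integrate to 1."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The last bin is closed on the right so the maximum value is counted.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / (len(values) * width) for c in counts]
```

With matplotlib, the same effect comes from calling `Axes.hist` once per sample with identical `bins=edges`, `density=True`, and a partial `alpha`, so that the overlaid histograms remain visible.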
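For the framewise displacement suggestion in Figure 5, here is a minimal sketch of the widely used definition from Power et al. (2012). It sums the absolute volume-to-volume changes in the six rigid-body parameters, with rotations converted to millimetres on a 50 mm sphere. The column order and units are assumptions because they differ between motion-correction tools, and the function name is hypothetical.

```python
"""Framewise displacement (FD) from rigid-body motion parameters.

Assumes each row is (trans_x, trans_y, trans_z, rot_x, rot_y, rot_z) with
translations in mm and rotations in radians; check the order and units of
the motion-correction output before reusing this.
"""

HEAD_RADIUS_MM = 50.0  # rotations become arc length on a sphere of this radius


def framewise_displacement(motion):
    """Return FD per volume; the first volume has no predecessor, so FD = 0."""
    if not motion:
        return []
    fd = [0.0]
    for prev, curr in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rot = sum(abs(c - p) for p, c in zip(prev[3:], curr[3:]))
        fd.append(trans + HEAD_RADIUS_MM * rot)
    return fd
```

Pooling these per-volume values across runs and plotting one distribution per participant would show at a glance how much motion to expect.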
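For the outlier labels in Figure 5, a small sketch that derives a compact `sub-<label> ses-<label> run-<label>` label from a BIDS file name. The function name and the file name used in the usage note are hypothetical.

```python
"""Build compact outlier labels from BIDS file names."""

import re
from pathlib import PurePath

# Entities used in the label; in BIDS names each appears as "<key>-<label>",
# either at the start of the name or after an underscore.
LABEL_KEYS = ("sub", "ses", "run")
ENTITY = re.compile(r"(?:^|_)(sub|ses|run)-([a-zA-Z0-9]+)")


def outlier_label(path):
    """Return e.g. 'sub-01 ses-02 run-03' for a BIDS file name or path."""
    name = PurePath(path).name  # drop directories such as sub-01/ses-02/func/
    entities = dict(ENTITY.findall(name))
    return " ".join(f"{key}-{entities[key]}" for key in LABEL_KEYS if key in entities)
```

For example, `outlier_label("sub-01/ses-02/func/sub-01_ses-02_task-example_run-03_bold.nii.gz")` gives `"sub-01 ses-02 run-03"`. Entities that are absent, such as `ses` in a single-session dataset, are simply left out.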
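For Figure 7, the sketch below computes the numbers a box-and-whisker plot displays. These are the quartiles, whiskers at the most extreme points within 1.5 × IQR of the quartiles (Tukey's convention), and the points beyond them. The function name is hypothetical, and Python 3.8+ is assumed for `statistics.quantiles`.

```python
"""Quartiles and Tukey whiskers as drawn by a box-and-whisker plot."""

import statistics


def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers using the whisker x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }
```

matplotlib's `Axes.boxplot` applies the same 1.5 × IQR whisker rule by default (`whis=1.5`), so the plot itself does not need to be built by hand.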