This is great work and I'm eager to try this method after my holidays. It is essential that the community explores ways to improve on existing tools and pushes the field forward, and this is a nice contribution.
I think the preprint is well written and I have a few minor comments/suggestions:
- "Here, we report the first deep learning model - Chiron - that can directly translate the raw signal to DNA sequence, without the error-prone segmentation step."
If I'm not mistaken, Albacore is also moving away from event detection/segmentation, see also https://community.nanoporet... The same argument returns later in your manuscript. For now you are probably correct, but if you plan to submit this to a journal, Albacore might be using raw data by the time the manuscript has gone through peer review. Plans to move away from segmentation have been around for a while, but Chiron is still the first to implement it in practice.
- "The device then uses the signal to determine the nucleotide sequence of the DNA strand"
=> It is a minor detail, but basecalling is not performed on the device itself.
Comparison with existing basecallers:
I read on Twitter that you are also retraining on human data. It is apparent from Table 1 that all tools perform worse on human data, so I think this is definitely an application in which improvements are very relevant and will likely have a big impact. Perhaps you can comment on why the basecallers perform worse on the human data? The accuracy of Chiron is impressive given the fairly limited training dataset you employed for this analysis.
Areas left undiscussed are nucleotide modifications and basecalling of direct RNA. I think both would be worth exploring and could have an important impact.
- "Albacore is considered the ’gold standard’ in terms of accuracy, but as it is not open source, we cannot comment on **it’s** implementation."
=> its implementation