TY - JOUR
T1 - High resolution annotation of zebrafish transcriptome using long-read sequencing
AU - Nudelman, German
AU - Frasca, Antonio
AU - Kent, Brandon
AU - Sadler, Kirsten C.
AU - Sealfon, Stuart C.
AU - Walsh, Martin J.
AU - Zaslavsky, Elena
N1 - Funding Information:
We thank Dr. Robert Sebra for help and support with performing long-read sequencing using the Pacific Biosciences RS instrument. We thank Dr. Side Li for performing the PCR experiments and Christopher Smith for help with figure design. All sequencing services performed were supported through National Institutes of Health (NIH) grants 5R01CA154809 and 5R01HL103967. The development of the analysis pipeline was supported by NIH grant U19 AI117873.
Publisher Copyright:
© 2018 Nudelman et al.
PY - 2018/9
Y1 - 2018/9
N2 - With the emergence of zebrafish as an important model organism, a concerted effort has been made to study its transcriptome. This effort is limited, however, by gaps in zebrafish annotation, which are especially pronounced concerning transcripts dynamically expressed during zygotic genome activation (ZGA). To date, short-read sequencing has been the principal technology for zebrafish transcriptome annotation. In part because these sequence reads are too short for assembly methods to resolve the full complexity of the transcriptome, the current annotation is rudimentary. By providing direct observation of full-length transcripts, recently refined long-read sequencing platforms can dramatically improve annotation coverage and accuracy. Here, we leveraged the SMRT platform to study the transcriptome of zebrafish embryos before and after ZGA. Our analysis revealed additional novelty and complexity in the zebrafish transcriptome, identifying 2539 high-confidence novel transcripts that originated from previously unannotated loci and 1835 high-confidence new isoforms in previously annotated genes. We validated these findings using a suite of computational approaches including structural prediction, sequence homology, and functional conservation analyses, as well as by confirmatory transcript quantification with short-read sequencing data. Our analyses provided insight into new homologs and paralogs of functionally important proteins and noncoding RNAs, isoform switching occurrences, and different classes of novel splicing events. Several novel isoforms representing distinct splicing events were validated through PCR experiments, including the discovery and validation of a novel 8-kb transcript spanning multiple mir-430 elements, an important driver of early development. Our study provides a significantly improved zebrafish transcriptome annotation resource.
UR - http://www.scopus.com/inward/record.url?scp=85052756186&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052756186&partnerID=8YFLogxK
U2 - 10.1101/gr.223586.117
DO - 10.1101/gr.223586.117
M3 - Article
C2 - 30061115
AN - SCOPUS:85052756186
SN - 1088-9051
VL - 28
SP - 1415
EP - 1425
JO - Genome Research
JF - Genome Research
IS - 9
ER -