TY - JOUR
T1 - Correlated substitutions reveal SARS-like coronaviruses recombine frequently with a diverse set of structured gene pools
AU - Steinberg, Asher Preska
AU - Silander, Olin K.
AU - Kussell, Edo
N1 - Funding Information:
This work was supported by NIH grant R01-GM-097356 (to E.K.) and grant 20/1041 from the Health Research Council of New Zealand (to O.K.S.). Asher Preska Steinberg is a Simons Foundation Awardee of the Life Sciences Research Foundation. We gratefully acknowledge the New York University (NYU) high-performance computing cluster for resources, and its staff for technical support.
Publisher Copyright:
Copyright © 2023 the Author(s). Published by PNAS.
PY - 2023/1/31
Y1 - 2023/1/31
N2 - Quantifying SARS-like coronavirus (SL-CoV) evolution is critical to understanding the origins of SARS-CoV-2 and the molecular processes that could underlie future epidemic viruses. While genomic analyses suggest recombination was a factor in the emergence of SARS-CoV-2, few studies have quantified recombination rates among SL-CoVs. Here, we infer recombination rates of SL-CoVs from correlated substitutions in sequencing data using a coalescent model with recombination. Our computationally efficient, non-phylogenetic method infers recombination parameters of both sampled sequences and the unsampled gene pools with which they recombine. We apply this approach to infer recombination parameters for a range of positive-sense RNA viruses. We then analyze a set of 191 SL-CoV sequences (including SARS-CoV-2) and find that ORF1ab and S genes frequently undergo recombination. We identify which SL-CoV sequence clusters have recombined with shared gene pools, and show that these pools have distinct structures and high recombination rates, with multiple recombination events occurring per synonymous substitution. We find that individual genes have recombined with different viral reservoirs. By decoupling contributions from mutation and recombination, we recover the phylogeny of non-recombined portions for many of these SL-CoVs, including the position of SARS-CoV-2 in this clonal phylogeny. Lastly, by analyzing >400,000 SARS-CoV-2 whole-genome sequences, we show current diversity levels are insufficient to infer the within-population recombination rate of the virus since the pandemic began. Our work offers new methods for inferring recombination rates in RNA viruses with implications for understanding recombination in SARS-CoV-2 evolution and the structure of clonal relationships and gene pools shaping its origins.
AB - Quantifying SARS-like coronavirus (SL-CoV) evolution is critical to understanding the origins of SARS-CoV-2 and the molecular processes that could underlie future epidemic viruses. While genomic analyses suggest recombination was a factor in the emergence of SARS-CoV-2, few studies have quantified recombination rates among SL-CoVs. Here, we infer recombination rates of SL-CoVs from correlated substitutions in sequencing data using a coalescent model with recombination. Our computationally efficient, non-phylogenetic method infers recombination parameters of both sampled sequences and the unsampled gene pools with which they recombine. We apply this approach to infer recombination parameters for a range of positive-sense RNA viruses. We then analyze a set of 191 SL-CoV sequences (including SARS-CoV-2) and find that ORF1ab and S genes frequently undergo recombination. We identify which SL-CoV sequence clusters have recombined with shared gene pools, and show that these pools have distinct structures and high recombination rates, with multiple recombination events occurring per synonymous substitution. We find that individual genes have recombined with different viral reservoirs. By decoupling contributions from mutation and recombination, we recover the phylogeny of non-recombined portions for many of these SL-CoVs, including the position of SARS-CoV-2 in this clonal phylogeny. Lastly, by analyzing >400,000 SARS-CoV-2 whole-genome sequences, we show current diversity levels are insufficient to infer the within-population recombination rate of the virus since the pandemic began. Our work offers new methods for inferring recombination rates in RNA viruses with implications for understanding recombination in SARS-CoV-2 evolution and the structure of clonal relationships and gene pools shaping its origins.
KW - RNA viruses
KW - SARS-CoV-2
KW - coronavirus
KW - phylogeny
KW - recombination
UR - http://www.scopus.com/inward/record.url?scp=85147048098&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147048098&partnerID=8YFLogxK
U2 - 10.1073/pnas.2206945119
DO - 10.1073/pnas.2206945119
M3 - Article
C2 - 36693089
AN - SCOPUS:85147048098
SN - 0027-8424
VL - 120
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 5
M1 - e2206945119
ER -