Distribution of short paired duplications in mammalian genomes

Elizabeth E. Thomas, Nathan Srebro, Jonathan Sebat, Nicholas Navin, John Healy, Bud Mishra, Michael Wigler

Research output: Contribution to journal › Article › peer-review


Mammalian genomes are densely populated with long duplicated sequences. In this paper, we demonstrate the existence of doublets, short duplications between 25 and 100 bp, distinct from previously described repeats. Each doublet is a pair of exact matches, separated by some distance. The distribution of these intermatch distances is strikingly nonrandom. An unexpectedly high number of doublets have matches either within 100 bp (adjacent) or at distances tightly concentrated ≈1,000 bp apart (nearby). We focus our study on these proximate doublets. First, they tend to have both matches on the same strand. By comparing nearby doublets shared in human and chimpanzee, we can also see that these doublets seem to arise by an insertion event that produces a copy without markedly affecting the surrounding sequence. Most doublets in humans are shared with chimpanzee, but many new pairs arose after the divergence of the species. Doublets found in human but not chimpanzee are most often composed of almost tandem matches, whereas older doublets (found in both species) are more likely to have matches spaced by ≈1 kb, indicating that the nearly tandem doublets may be more dynamic. The spacing of doublets is highly conserved. So far, we have found clearly recognizable doublets in the following genomes: Homo sapiens, Mus musculus, Arabidopsis thaliana, and Caenorhabditis elegans, indicating that the mechanism generating these doublets is widespread. A mechanism that generates short local duplications while conserving polarity could have a profound impact on the evolution of regulatory and protein-coding sequences.
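The doublet definition above (a pair of exact matches separated by some distance) can be illustrated with a minimal sketch. This is not the authors' method. It assumes a fixed match length `k` rather than maximal matches of 25 to 100 bp, considers only the forward strand, and measures the intermatch distance from start to start. The function name `intermatch_distances` and the toy sequence are hypothetical.

```python
from collections import defaultdict


def intermatch_distances(seq, k=25):
    """Return (kmer, distance) for every k-mer occurring exactly twice in seq.

    Assumptions (not taken from the paper): fixed k, forward strand only,
    distance measured between the start positions of the two copies.
    """
    positions = defaultdict(list)
    # Index the start position of every k-mer in the sequence.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen exactly twice: these are the candidate pairs.
    return [(kmer, pos[1] - pos[0])
            for kmer, pos in positions.items()
            if len(pos) == 2]


# Toy example with k=4: "ACGT" starts at positions 0 and 7.
print(intermatch_distances("ACGTTTGACGT", k=4))  # [('ACGT', 7)]
```

A histogram of the returned distances over a whole chromosome would be one way to examine the kind of intermatch distance distribution the abstract describes; a real analysis would also need to mask known repeat families and handle both strands.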

Original language: English (US)
Pages (from-to): 10349-10354
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Issue number: 28
State: Published - Jul 13 2004

ASJC Scopus subject areas

  • General

