LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA

  1. Michael Brudno¹,
  2. Chuong B. Do¹,
  3. Gregory M. Cooper²,
  4. Michael F. Kim¹,
  5. Eugene Davydov¹,
  6. NISC Comparative Sequencing Program¹,
  7. Eric D. Green³,
  8. Arend Sidow², and
  9. Serafim Batzoglou¹,⁴

  ¹Department of Computer Science, Stanford University, Stanford, California 94305-9010, USA
  ²Department of Pathology and Department of Genetics, Stanford University, Stanford, California 94305-5324, USA
  ³Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

Abstract

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of more than 12 Mb of high-quality sequence from 12 vertebrate species. All of the sequence was derived from the genomic region orthologous to an ∼1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.

Footnotes

  • ⁴Corresponding author.

  • E-MAIL serafim{at}cs.stanford.edu; FAX (650) 725-1449.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.926603. Article published online before print in March 2003.

    • Received October 23, 2002.
    • Accepted December 11, 2002.