Genome Biol 9(7), R113 (2008)
BACKGROUND: Cichlid fish from East Africa are remarkable for phenotypic and behavioral diversity on a backdrop of genomic similarity. In 2006, the Joint Genome Institute completed low-coverage survey sequencing of the genomes of five phenotypically and ecologically diverse Lake Malawi species. We report a computational and comparative analysis of these data that provides insight into the mechanisms that make closely related species different from one another. RESULTS: We produced assemblies for the five species ranging in aggregate length from 68 to 79 megabase pairs, identified putative orthologs for more than 12,000 human genes, and predicted more than 32,000 cross-species single nucleotide polymorphisms (SNPs). Nucleotide diversity was lower than that found among laboratory strains of the zebrafish. We collected around 36,000 genotypes to validate a subset of SNPs within and among populations and across multiple individuals of about 75 Lake Malawi species. Notably, no fixed differences were observed between focal species or between major lineages. Roughly 3% to 5% of loci surveyed are statistical outliers for genetic differentiation (FST) within species, between species, and between major lineages. Outliers for FST are candidate genes that may have experienced a history of natural selection in the Malawi lineage. CONCLUSION: We present a novel genome sequencing strategy, which is useful when evolutionary diversity is the question of interest. Lake Malawi cichlids are phenotypically and behaviorally diverse, but they appear genetically like a subdivided population. The unique structure of Lake Malawi cichlid genomes should facilitate conceptually new experiments, employing SNPs to identify genotype-phenotype associations, using the entire species flock as a mapping panel.
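
To make the notion of an FST outlier concrete, the following is a minimal sketch and not the procedure used in the study. It assumes biallelic SNPs, equally weighted populations, and Wright's classical definition FST = (HT - HS) / HT; the abstract does not say which estimator or outlier test was applied. The function names, the locus labels, and the 5% cutoff are hypothetical choices made for illustration.

```python
from statistics import mean


def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus.

    freqs: allele frequency of one allele in each population.
    Assumption: populations are weighted equally (no sample-size correction).
    """
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is monomorphic everywhere, no differentiation
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def top_fraction_outliers(fst_by_locus, fraction=0.05):
    """Return the loci in the upper `fraction` of the empirical F_ST ranking.

    Assumption: a simple rank-based cutoff stands in for a formal outlier test.
    """
    ranked = sorted(fst_by_locus, key=fst_by_locus.get, reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations for three loci.
    loci = {
        "snp_a": [0.50, 0.50],  # identical frequencies
        "snp_b": [0.20, 0.80],  # moderately differentiated
        "snp_c": [0.00, 1.00],  # fixed difference
    }
    fst = {name: fst_biallelic(f) for name, f in loci.items()}
    print(fst)
    print(top_fraction_outliers(fst, fraction=0.05))
```

Under these assumptions a fixed difference between two populations gives FST = 1 and identical frequencies give FST = 0, so the reported absence of fixed differences means no surveyed locus would reach the upper bound. Loci in the upper tail of the ranking are the kind of candidates the abstract describes.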