Finally, the fragments are re-aligned based on overlaps in their sequences. What makes the job challenging is that only about 1 kb at the end of each fragment is actually sequenced. The computer has to figure out where each fragment fits, based on the sequences of these ends (known as “reads”) along with information about the fragment size.
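To make the idea of overlap-based alignment concrete, here is a minimal sketch in Python. It is an illustration only, not how Phrap or JAZZ actually work: it assumes error-free reads and looks for an exact match between the end of one read and the start of another. The function names `overlap_length` and `merge_reads` and the `min_overlap` parameter are made up for this example.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that exactly
    matches a prefix of `right`, or 0 if it is shorter than `min_overlap`.
    (Hypothetical helper; real assemblers tolerate sequencing errors.)"""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, then shrink.
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    # Keep all of `left`, then append the part of `right` past the overlap.
    return left + right[k:]


if __name__ == "__main__":
    print(merge_reads("ACGTTGCA", "TGCAGGTC"))
```

In this toy case the two reads share the four bases TGCA, so they can be joined into one longer sequence. Real assembly programs also have to cope with sequencing errors, repeated regions, and the fragment-size information mentioned above.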
JGI uses a program called Phrap, developed by Phil Green at the University of Washington, to find the best-fit alignment for small genomes. A similar program called JAZZ, developed here at JGI by Dan Rokhsar, is used for larger genomes.
Reads are aligned to form a “consensus read.” Phrap assigns a score to each base in the consensus using:
- Phred scores for bases at that position
- sequencing chemistry
- read orientation
- depth of read
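
As background on the first item in the list, a Phred score Q is related to the estimated probability P that a base call is wrong by Q = -10 × log10(P), so a score of 20 means a 1-in-100 chance of error and a score of 30 means 1 in 1,000. The sketch below shows that conversion in Python, along with a deliberately simplified way to pick a consensus base from the reads covering one position. The quality-weighted vote in `consensus_base` is an assumption made for illustration; it is not Phrap's actual scoring method, and it ignores chemistry and orientation entirely.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred score to the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred score."""
    return -10 * math.log10(p)


def consensus_base(column):
    """Pick a consensus base for one position.

    `column` is a list of (base, phred_score) pairs, one per read covering
    the position. Simplified illustration only: each base's scores are
    summed and the base with the highest total wins.
    """
    totals = {}
    for base, q in column:
        totals[base] = totals.get(base, 0) + q
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(phred_to_error_prob(20))
    print(consensus_base([("A", 30), ("A", 25), ("G", 40)]))
```

In the example column, two moderately confident reads calling A outweigh a single high-quality read calling G, which gives a rough sense of why read depth matters alongside the per-base Phred scores.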