Alignment methods (MUSCLE, MAFFT)
Learning objectives
By the end of today’s session, you will be able to:
- Use two alignment programs: MAFFT and MUSCLE
Pre-class work
Read the paper corresponding to your Canvas group: MUSCLE or MAFFT
Stop and check
First, let’s take a look at data from last year’s projects here.
Iterative refinement: MUSCLE and MAFFT

As before, both programs can be installed with Bioconda:
conda install -c bioconda muscle
conda install -c bioconda mafft
In-class activity: MUSCLE and MAFFT
Group time: 30 minutes
Google slides:
Instructions: Work individually or in teams (with people who read the same paper as you). In the slides:
- Write down one claim that the authors make about their method (e.g. accuracy, speed, robustness, scalability)
- Run the software on the primatesAA.fasta data and write down the command you ran in the slides
- Write a paragraph suitable for a paper’s Methods section that:
  a. Describes what you did
  b. Explicitly connects your choices to the claims described in the paper
  c. Makes clear what assumptions you are relying on
  d. Does not say ‘default parameters’ without explaining what that implies in terms of the method
Whole group activity: 20 minutes
One person per method will present their work to the class.
Comparing all three programs
Let’s open all three in AlignmentViewer:
primatesAA-aligned.fasta ## clustalw
primatesAA-aligned-muscle.fasta ## muscle
primatesAA-aligned-mafft.fasta ## mafft
Looking at all three alignments, identify:
- One region where all three agree
- One region where at least two disagree
- One region you would be uncomfortable using for phylogenetic inference
- Which method tends to produce more gaps, longer indels, or better conservation
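To put rough numbers on the "more gaps" question, a small helper like the one below can summarize each aligned file. This is only a sketch: the function name `gap_stats` is made up for illustration, and it assumes the alignments are FASTA files in which gaps are written as `-`. It complements, but does not replace, looking at the alignments in a viewer.

```shell
# gap_stats FILE
# Print, for one aligned FASTA file: number of sequences, alignment length
# (longest row), and total number of gap characters.
# Assumption: gaps are written as '-' (hypothetical helper, not part of any aligner).
gap_stats() {
    awk -v name="$1" '
        function finish_record() {
            n++
            if (length(seq) > columns) columns = length(seq)
            gaps += gsub(/-/, "-", seq)   # gsub returns the number of matches
            seq = ""
        }
        /^>/ { if (seq != "") finish_record(); next }   # header line closes the previous record
        { gsub(/[ \t\r]/, ""); seq = seq $0 }           # sequences may wrap over several lines
        END {
            if (seq != "") finish_record()
            printf "%s\tseqs=%d\tcolumns=%d\tgaps=%d\n", name, n, columns, gaps
        }
    ' "$1"
}
```

For example, calling `gap_stats primatesAA-aligned-mafft.fasta` prints one tab-separated line for that alignment. Running it on all three files gives a quick side-by-side view of alignment length and total gap count.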
Homework: Check out the alignment HW here.
Final MSA insights
- No perfect method
- No automatic method
- All methods require manually comparing results from different alignment parameters and different software
- Take notes on the choices you make and keep track of all comparisons to justify your final choice
- We probably don’t spend as much time as we should on the alignment step of the phylogenomic pipeline
- We want a blackbox that does not exist yet!
Other algorithms
Genetic algorithm
- SAGA optimizes the weighted sum-of-pairs (WSP) objective function with a genetic algorithm instead of dynamic programming (each individual in the population is an alignment)
- very accurate
- not scalable
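As a reminder, a weighted sum-of-pairs score for an alignment $A$ of $N$ sequences can be written as below. The notation is introduced here for illustration only: $A_{ij}$ is the pairwise alignment of sequences $i$ and $j$ induced by $A$, $w_{ij}$ is the weight given to that pair, and $\mathrm{cost}$ is a pairwise scoring function that includes substitution and gap penalties.

```latex
\mathrm{WSP}(A) \;=\; \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} w_{ij}\,\mathrm{cost}(A_{ij})
```

A genetic algorithm evaluates this score for every candidate alignment in its population, which is one way to see why the approach does not scale well.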
Hidden Markov model
Simultaneous estimation of tree and alignment
Which program to choose?
- Not a clear answer
- Scalability vs accuracy
- HW reading: Alignathon
- Strategy:
  - Run multiple programs and parameter choices
  - Filtering is more important than the specific program used (more on filtering later)
  - Read program papers and documentation carefully
  - Take good notes on your choices and keep track of all comparisons to justify your final choice
Learn more!
- Read Chapter 3 of the Phylogenetic Handbook (HB)
- Read Sections 9.1-9.5, 9.11, 9.12, 9.13 of Computational phylogenetics (Warnow)
- Read HAL 2.3 on a new alignment method: MACSE