Alignment methods (MUSCLE, MAFFT)

Learning objectives

At the end of today’s session, you

  • will be able to use two alignment programs: MAFFT and MUSCLE

Pre-class work

Read the paper assigned to your Canvas group: MUSCLE or MAFFT

Stop and check

First, let’s take a look at data from last year’s projects here.

Iterative refinement: MUSCLE and MAFFT

Edgar, 2004, MUSCLE

Katoh et al., 2002, MAFFT

As before, you can install both programs with Bioconda:

conda install -c bioconda muscle
conda install -c bioconda mafft

In-class activity: MUSCLE and MAFFT

Group time: 30 minutes

Google slides:

Instructions: Work individually or in teams with people who read the same paper as you. In the slides:

  1. Write down one claim that the authors make about their method (e.g. accuracy, speed, robustness, scalability)
  2. Run the software on the primatesAA.fasta data and write down the command you ran in the slides
  3. Write a paragraph suitable for a paper's Methods section that:
     a. Describes what you did
     b. Explicitly connects your choices to the claims described in the paper
     c. Makes clear what assumptions you are relying on
     d. Does not say ‘default parameters’ without explaining what the defaults imply in terms of the method

Whole group activity: 20 minutes

One person per method will present their work to the class.

Comparing all three programs

Let’s open all three in AlignmentViewer:

primatesAA-aligned.fasta        ## clustalw
primatesAA-aligned-muscle.fasta ## muscle
primatesAA-aligned-mafft.fasta  ## mafft

Looking at all three alignments, think of:

  • One region where all three agree
  • One region where at least two disagree
  • One region you would be uncomfortable using for phylogenetic inference
  • Which method tends to produce: More gaps? Longer indels? Better-conserved columns?
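
To put numbers next to the visual comparison, the gap questions above can be approximated with a short script. This is only a sketch: the helper names `read_fasta` and `gap_summary` are made up for illustration, it assumes the three alignment files listed above sit in the working directory, and it counts every run of `-` as one gap run, so leading and trailing gaps are counted together with internal indels.

```python
import re
from pathlib import Path


def read_fasta(path):
    """Return {header: sequence} from a FASTA file (sequences may span several lines)."""
    records, name = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None and line:
            records[name] += line
    return records


def gap_summary(records):
    """Summarize gaps in an alignment given as {header: aligned sequence}."""
    seqs = list(records.values())
    # Every maximal run of '-' characters in every row.
    runs = [m.group() for s in seqs for m in re.finditer(r"-+", s)]
    total_chars = sum(len(s) for s in seqs)
    return {
        "columns": max((len(s) for s in seqs), default=0),
        "gap_fraction": sum(len(r) for r in runs) / total_chars if total_chars else 0.0,
        "gap_runs": len(runs),
        "longest_run": max((len(r) for r in runs), default=0),
    }


if __name__ == "__main__":
    # File names taken from the list above; adjust the paths to your own setup.
    files = [
        "primatesAA-aligned.fasta",
        "primatesAA-aligned-muscle.fasta",
        "primatesAA-aligned-mafft.fasta",
    ]
    for f in files:
        if Path(f).exists():
            print(f, gap_summary(read_fasta(f)))
        else:
            print(f, "not found")
```

These summaries say nothing about which alignment is more correct; they only point to where the methods behave differently, which is where to look in AlignmentViewer.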

Homework: Check out the alignment HW here.

Final MSA insights

  • No perfect method
  • No automatic method
    • All methods require the manual work of comparing results from different alignment parameters and from different software
  • Take notes on the choices you make and keep track of all comparisons to justify your final choice
  • We probably don’t spend as much time as we should on the alignment step of the phylogenomic pipeline
    • We want a black box that does not exist yet!

Other algorithms

Genetic algorithm

  • SAGA optimizes the weighted sum-of-pairs (WSP) objective function with a genetic algorithm instead of dynamic programming (each individual in the population is an alignment)
  • very accurate
  • not scalable
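
To make the objective function concrete, here is a minimal sketch of a weighted sum-of-pairs score for one candidate alignment. It is not SAGA's implementation: the match, mismatch, and gap values are toy numbers chosen for illustration, the pair weights default to 1, and the function names are hypothetical. A real WSP score would use a substitution matrix, gap penalties, and weights derived from the sequences.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows column by column (toy scheme, not a real substitution matrix)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both rows contributes nothing
        elif x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def weighted_sum_of_pairs(rows, weights=None):
    """Sum the pairwise scores over all row pairs, each multiplied by its pair weight."""
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        # weights, if given, is a dict keyed by (i, j) with i < j; default weight is 1.
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * pair_score(rows[i], rows[j])
    return total


# A tiny candidate alignment: in a genetic algorithm, this would be one individual.
alignment = ["AC-GT", "ACAGT", "AC-CT"]
print(weighted_sum_of_pairs(alignment))
```

A genetic algorithm would evaluate a function like this on every alignment in its population at every generation, which gives some intuition for why the approach does not scale to many or long sequences.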

Hidden Markov model

Simultaneous estimation of tree and alignment

Which program to choose?

  • No clear answer
  • Scalability vs accuracy
  • HW reading: Alignathon
  • Strategy:
    • Run multiple programs and parameter choices
    • Filtering is more important than the specific program used (more on filtering later)
    • Read program papers and documentation carefully
    • Take careful notes on your choices and keep track of all comparisons to justify your final choice

Learn more!

  • Read Chapter 3 of the Phylogenetic Handbook (HB)
  • Read Sections 9.1-9.5, 9.11, 9.12, 9.13 of Computational phylogenetics (Warnow)
  • Read HAL 2.3 on a new alignment method, MACSE

Copyright Solis-Lemus lab © 2022. Distributed by an MIT license.
