Distance and Parsimony methods (Part 2)

Parsimony-based methods

  • Character-based data: 4 nucleotides (A, C, G, T) or 20 amino acids => matrix of aligned characters
  • It does not rely on models of evolution
  • One seeks the tree that minimizes the amount of evolutionary change required to explain the data
  • Justification:
    • Ockham’s razor: when two hypotheses provide equally valid explanations for a phenomenon, the simpler one should always be preferred
    • More character-state changes imply a more complex hypothesis because homoplasy (sharing identical character states that cannot be explained by inheritance from a common ancestor) is an ad hoc hypothesis
  • Parsimony is a useful fallback method when model-based methods cannot be used due to computational limitations

Assumptions

  • Parsimony methods are most effective when the rate of evolution is slow, but this is not a necessary assumption
  • Parsimony methods can perform well under high rates of evolution as long as there are no pathological inequalities in rates among lineages (these lead to long-branch attraction, the so-called Felsenstein zone)
  • The only real assumption of parsimony is independence among characters

Methodology

  1. Determine the amount of character change required to explain the data by a given tree
  2. Search over all possible tree topologies
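To get a feel for how large the search in step 2 is, the number of unrooted binary topologies on $n$ labelled taxa is $(2n-5)!! = 3 \cdot 5 \cdots (2n-5)$. The sketch below just evaluates that product; the function name `num_unrooted_trees` is made up for illustration.

```python
def num_unrooted_trees(n):
    """Number of unrooted binary tree topologies on n labelled taxa: (2n-5)!!"""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n <= 3)
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
# 4 3
# 5 15
# 10 2027025
```

Already at 10 taxa there are about two million topologies, which is why exhaustive search is only feasible for small datasets.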

We need to be able to calculate the length (parsimony score) of a proposed tree, which is defined as the amount of character change implied by a most parsimonious reconstruction of the internal nodes. Just as in MSA, we need to have costs for substitutions (equal costs or unequal costs).

Example: Evaluate the length of the ((W,Y),(X,Z)); tree given the site:

W:G
X:C
Y:A
Z:C
  • Note that this is only one site! We need to repeat this process for every site and add up the lengths
  • More on Newick (parenthetical) format here and Instagram photo of the restaurant where the Newick format was invented


Watch the full algorithm in this YouTube video

Just as in MSA, we cannot do this by hand and there are dynamic programming algorithms that help us (what was dynamic programming?):

  • Fitch algorithm (HB Box 8.2) for equal costs
  • Sankoff algorithm (HB Box 8.1) for unequal costs
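The Fitch algorithm is detailed in the next section. For the unequal-cost case, here is a minimal sketch of the Sankoff recursion, assuming a rooted binary tree written as nested tuples and a single site. For every node it stores, for each state, the minimum cost of the subtree below if the node is assigned that state. The function names and the transition/transversion costs (1 and 2) are illustrative assumptions, not taken from HB Box 8.1.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def equal_cost(a, b):
    """Every substitution costs 1."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Hypothetical unequal costs: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return 1 if same_class else 2


def sankoff_costs(node, leaf_states, cost):
    """Return {state: min cost of the subtree below node if node has that state}."""
    if isinstance(node, str):
        # Leaf: the observed state is free, any other state is impossible
        return {a: (0 if a == leaf_states[node] else math.inf) for a in STATES}
    child_costs = [sankoff_costs(child, leaf_states, cost) for child in node]
    # For each state a, add up the cheapest way to reach each child
    return {
        a: sum(min(cost(a, b) + c[b] for b in STATES) for c in child_costs)
        for a in STATES
    }


def sankoff_length(tree, leaf_states, cost):
    """Length of the tree for one site: best cost over all root states."""
    return min(sankoff_costs(tree, leaf_states, cost).values())


tree = (("W", "Y"), ("X", "Z"))                   # ((W,Y),(X,Z));
site = {"W": "G", "X": "C", "Y": "A", "Z": "C"}
print(sankoff_length(tree, site, equal_cost))     # 2
print(sankoff_length(tree, site, ts_tv_cost))     # 3
```

With equal costs the result matches what the Fitch algorithm gives for the same site; with the assumed unequal costs the length is one transition (A/G) plus one transversion.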

Fitch algorithm

1) Root the tree in an arbitrary place (the parsimony score is not affected by the position of the root)

2) Calculate the state-set $X_i$ for each internal node $i$, corresponding to the set of states that can be assigned to that node so that the minimum possible length of its subtree is achieved. For a leaf, the state-set is the observed state and the accumulated length is zero. Let $L(i)$ and $R(i)$ be the left and right child nodes of $i$ respectively.

2.1) Form the intersection of the two child state sets: $X_{L(i)} \cap X_{R(i)}$

2.2) If the intersection is non-empty, set $X_i$ equal to this intersection and set the accumulated length for this node to the sum of the accumulated lengths for the two child nodes: $s_i=s_{L(i)}+s_{R(i)}$

2.3) If the intersection is empty, let $X_i$ be equal to the union of the two child sets: $X_{L(i)} \cup X_{R(i)}$ and set the accumulated length for this node to the sum of the accumulated lengths for the two child nodes plus one: $s_i=s_{L(i)}+s_{R(i)}+1$
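Steps 2.1 to 2.3 can be sketched as a short recursive function. This is a minimal sketch for one site, assuming the rooted binary tree is written as nested tuples of taxon names and that leaves start with their observed state and length 0; the name `fitch` and the data layout are choices made here for illustration.

```python
def fitch(node, leaf_states):
    """Return (state_set, accumulated_length) for the subtree rooted at node."""
    if isinstance(node, str):
        # Leaf: state-set is the observed state, no changes accumulated yet
        return {leaf_states[node]}, 0
    left, right = node
    x_left, s_left = fitch(left, leaf_states)
    x_right, s_right = fitch(right, leaf_states)
    common = x_left & x_right                       # step 2.1
    if common:                                      # step 2.2
        return common, s_left + s_right
    return x_left | x_right, s_left + s_right + 1   # step 2.3


tree = (("W", "Y"), ("X", "Z"))                     # ((W,Y),(X,Z));
site = {"W": "G", "X": "C", "Y": "A", "Z": "C"}
state_set, length = fitch(tree, site)
print(sorted(state_set), length)                    # ['A', 'C', 'G'] 2
```

The accumulated length returned at the root is the length of the tree for that site.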

Example: Evaluate the length of the ((W,Y),(X,Z)); tree given the site:

W:G
X:C
Y:A
Z:C

using the Fitch algorithm.
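A hand trace of this example, assuming the tree is rooted on the internal branch as written in the Newick string and that each leaf starts with its observed state and accumulated length 0. The node labels $WY$, $XZ$ and root are introduced here only for the trace.

$$
\begin{aligned}
X_{W} \cap X_{Y} &= \{G\} \cap \{A\} = \emptyset &&\Rightarrow& X_{WY} &= \{A, G\}, & s_{WY} &= 0 + 0 + 1 = 1 \\
X_{X} \cap X_{Z} &= \{C\} \cap \{C\} = \{C\} &&\Rightarrow& X_{XZ} &= \{C\}, & s_{XZ} &= 0 + 0 = 0 \\
X_{WY} \cap X_{XZ} &= \{A, G\} \cap \{C\} = \emptyset &&\Rightarrow& X_{\text{root}} &= \{A, C, G\}, & s_{\text{root}} &= 1 + 0 + 1 = 2
\end{aligned}
$$

The length of the tree for this site is therefore 2.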


Watch the full algorithm in this YouTube video.

  • Homework: Redo the algorithm with different root positions to verify that you get the same length

Phylogenetic inference: Maximum Parsimony (MP) tree

  1. Evaluate the parsimony score (length) of a given tree with the Fitch algorithm
  2. Search the space of trees until you find the optimum
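For four taxa the search is small enough to do exhaustively, since there are only three unrooted topologies. The sketch below scores each one by summing single-site Fitch lengths over all sites and keeps the shortest. The alignment is a made-up toy example (its first column is the site used in the examples above), and the helper names are hypothetical.

```python
def fitch(node, leaf_states):
    """Return (state_set, accumulated_length) for one site on a nested-tuple tree."""
    if isinstance(node, str):
        return {leaf_states[node]}, 0
    (x_l, s_l), (x_r, s_r) = (fitch(child, leaf_states) for child in node)
    common = x_l & x_r
    if common:
        return common, s_l + s_r
    return x_l | x_r, s_l + s_r + 1


def tree_length(tree, alignment):
    """Step 1: add up the single-site lengths over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical toy alignment, for illustration only
alignment = {"W": "GAAT", "X": "CATG", "Y": "AAAT", "Z": "CATG"}

# Step 2: the three unrooted topologies for four taxa, each rooted arbitrarily
candidates = {
    "((W,Y),(X,Z));": (("W", "Y"), ("X", "Z")),
    "((W,X),(Y,Z));": (("W", "X"), ("Y", "Z")),
    "((W,Z),(X,Y));": (("W", "Z"), ("X", "Y")),
}

scores = {newick: tree_length(t, alignment) for newick, t in candidates.items()}
best = min(scores, key=scores.get)
print(scores)   # {'((W,Y),(X,Z));': 4, '((W,X),(Y,Z));': 6, '((W,Z),(X,Y));': 6}
print(best)     # ((W,Y),(X,Z));
```

For realistic numbers of taxa the candidate set cannot be enumerated like this, and heuristic searches over tree space are used instead.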

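Some downsides: parsimony methods can be statistically inconsistent. Under some conditions (for example, long-branch attraction), they converge to the wrong tree as more data are added. Read more in Felsenstein 1978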


Copyright Solis-Lemus lab © 2022. Distributed by an MIT license.
