Bayesian Inference (MrBayes)
Previous class check-up
- We studied Bayesian inference and focused on the main contributors to good or poor performance: priors, convergence and burnin.
Learning objectives
At the end of today’s session, you will be able to
- explain in details the MrBayes inference methods
- use MrBayes software
Pre-class work
In-class discussion
Individual work (15 minutes)
Instructions: Based on the MrBayes papers you read, write down (somewhere) the following questions:
- What limitations of existing phylogenetic methods motivated MrBayes?
- What problems with maximum parsimony and early ML methods do the authors emphasize?
- Which of these are biological concerns (e.g., uncertainty, model realism)?
- What does “Bayesian phylogenetic inference” promise to users?
- Direct probability statements about trees/clades
- Integration over uncertainty (trees, parameters)
- Use of prior information
- Which of these would actually matter to you as a biologist?
- What is the role of MCMC in this story?
- Why can’t we just “compute” the posterior exactly?
- What trade-off are we making by using MCMC?
- What choices does the user have to make when running MrBayes?
- Which of these choices are “biological” rather than statistical?
-
What could go wrong if the user treats MrBayes as a black box?
- If you were reviewing a paper that used MrBayes, what would you want the authors to report?
Whole class work (15 minutes)
Let’s discuss the questions together!
Take-home message: always read carefully the paper and the documentation of any phylogenetic method you use
Create your own cheatsheet with description, strengths, weaknesses, assumptions and user choices (and other things you find relevant).
Software
- Download MrBayes from here.
In Mac, you can:
brew install mrbayes
And you can check installation with:
$ which mb
/usr/local/bin/mb
People can also install with conda”
conda install -c bioconda mrbayes
-
We will use a new dataset (
algaemb.nex) in the data folder in the class repository. -
Now, we want to create a mrbayes block. I do this in a separate text file called
mbblock.txt. Note that we need to addmcmc;sumt;at the end so that the mb block is executed.mcmcis the command to run MCMC andsumtis the command to obtain a summary tree.
begin mrbayes;
set autoclose=yes;
prset brlenspr=unconstrained:exp(10.0);
prset shapepr=exp(1.0);
prset tratiopr=beta(1.0,1.0);
prset statefreqpr=dirichlet(1.0,1.0,1.0,1.0);
lset nst=2 rates=gamma ngammacat=4;
mcmcp ngen=10000 samplefreq=10 printfreq=100 nruns=1 nchains=3 savebrlens=yes;
outgroup Anacystis_nidulans;
mcmc;
sumt;
end;
Explanation of MrBayes commands
begin mrbayes;
Starts the MrBayes command block. All commands until end; are interpreted by MrBayes.
set autoclose=yes;
Automatically exits MrBayes after all commands in the block have finished executing (useful for non-interactive or scripted runs).
prset brlenspr=unconstrained:exp(10.0);
Sets the prior on branch lengths. Branch lengths are allowed to vary freely, with an exponential prior having mean 0.1 substitutions per site. This reflects the expectation that most branches are short but longer branches are possible.
prset shapepr=exp(1.0);
Sets the prior on the gamma shape parameter controlling rate variation among sites. An exponential prior with mean 1 assumes moderate rate heterogeneity while remaining weakly informative.
prset tratiopr=beta(1.0,1.0);
Sets the prior on the transition/transversion rate ratio. A Beta(1,1) distribution is uniform, indicating no prior preference for particular values.
prset statefreqpr=dirichlet(1.0,1.0,1.0,1.0);
Sets the prior on nucleotide base frequencies (A, C, G, T). A uniform Dirichlet prior assumes no prior bias toward any nucleotide composition.
lset nst=2 rates=gamma ngammacat=4;
Specifies the substitution model: the HKY model (separate transition and transversion rates) with gamma-distributed rate variation across sites, approximated using four rate categories.
mcmcp ngen=10000 samplefreq=10 printfreq=100 nruns=1 nchains=3 savebrlens=yes;
Sets MCMC run parameters: 10,000 generations, sampling every 10 generations, progress printed every 100 generations, one independent run with three chains (one cold, two heated), and branch lengths saved for sampled trees.
outgroup Anacystis_nidulans;
Specifies Anacystis_nidulans as the outgroup for rooting the tree, reflecting an external biological assumption about ancestry.
mcmc;
Executes the Markov chain Monte Carlo analysis using the specified model, priors, and run settings.
sumt;
Summarizes the sampled trees and produces a consensus tree with posterior probabilities for clades.
end;
Ends the MrBayes command block.
- Now, I append the MrBayes block to the end of the nexus file with the data
algaemb.nex:
cat algaemb.nex mbblock.txt > algaemb-mb.nex
- Now, we run MrBayes:
mb algaemb-mb.nex
MrBayes 3.2.7a x86_64
(Bayesian Analysis of Phylogeny)
(Parallel version)
(1 processors available)
Distributed under the GNU General Public License
Type "help" or "help <command>" for information
on the commands that are available.
Type "about" for authorship and general
information about the program.
Executing file "algaemb-mb.nex"
DOS line termination
Longest line length = 80
Parsing file
Expecting NEXUS formatted file
Reading taxa block
Allocated taxon set
Defining new set of 8 taxa
Exiting taxa block
Deleting previously defined taxa
Reading data block
Allocated taxon set
Allocated matrix
Defining new matrix with 8 taxa and 920 characters
Data is Rna
Missing data coded as ?
Gaps coded as -
Taxon 1 -> Tobacco
Taxon 2 -> Rice
Taxon 3 -> Marchantia
Taxon 4 -> Chlamydomonas
Taxon 5 -> Chlorella
Taxon 6 -> Euglena
Taxon 7 -> Anacystis_nidulans
Taxon 8 -> Olithodiscus
Successfully read matrix
Setting default partition (does not divide up characters)
Setting model defaults
Seed (for generating default start values) = 1617459021
Setting output file names to "algaemb-mb.nex.run<i>.<p|t>"
Exiting data block
Reading mrbayes block
Setting autoclose to yes
Setting Brlenspr to Unconstrained:Exponential(10.00)
Successfully set prior model parameters
Setting Shapepr to Exponential(1.00)
Successfully set prior model parameters
Setting Tratiopr to Beta(1.00,1.00)
Successfully set prior model parameters
Setting Statefreqpr to Dirichlet(1.00,1.00,1.00,1.00)
Successfully set prior model parameters
Setting Nst to 2
Setting Rates to Gamma
Setting Ngammacat to 4
Successfully set likelihood model parameters
Setting number of generations to 10000
Setting sample frequency to 10
Setting print frequency to 100
WARNING: Reallocation of zero size attempted. This is probably a bug. Problems may follow.
WARNING: Reallocation of zero size attempted. This is probably a bug. Problems may follow.
Setting number of runs to 1
WARNING: Allocation of zero size attempted. This is probably a bug; problems may follow.
Setting number of chains to 3
Setting chain output file names to "algaemb-mb.nex.<p/t>"
Successfully set chain parameters
Setting outgroup to taxon "Anacystis_nidulans"
Running Markov chain
MCMC stamp = 8063218354
Seed = 989486596
Swapseed = 1617459021
Model settings:
Data not partitioned --
Datatype = RNA
Nucmodel = 4by4
Nst = 2
Transition and transversion rates, expressed
as proportions of the rate sum, have a
Beta(1.00,1.00) prior
Covarion = No
# States = 4
State frequencies have a Dirichlet prior
(1.00,1.00,1.00,1.00)
Rates = Gamma
The distribution is approximated using 4 categories.
Shape parameter is exponentially
distributed with parameter (1.00).
Active parameters:
Parameters
---------------------
Tratio 1
Statefreq 2
Shape 3
Ratemultiplier 4
Topology 5
Brlens 6
---------------------
1 -- Parameter = Tratio
Type = Transition and transversion rates
Prior = Beta(1.00,1.00)
2 -- Parameter = Pi
Type = Stationary state frequencies
Prior = Dirichlet
3 -- Parameter = Alpha
Type = Shape of scaled gamma distribution of site rates
Prior = Exponential(1.00)
4 -- Parameter = Ratemultiplier
Type = Partition-specific rate multiplier
Prior = Fixed(1.0)
5 -- Parameter = Tau
Type = Topology
Prior = All topologies equally probable a priori
Subparam. = V
6 -- Parameter = V
Type = Branch lengths
Prior = Unconstrained:Exponential(10.0)
Number of chains per MPI processor = 3
The MCMC sampler will use the following moves:
With prob. Chain will use move
1.89 % Dirichlet(Tratio)
0.94 % Dirichlet(Pi)
0.94 % Slider(Pi)
1.89 % Multiplier(Alpha)
9.43 % ExtSPR(Tau,V)
9.43 % ExtTBR(Tau,V)
9.43 % NNI(Tau,V)
9.43 % ParsSPR(Tau,V)
37.74 % Multiplier(V)
13.21 % Nodeslider(V)
5.66 % TLMultiplier(V)
Division 1 has 178 unique site patterns
Initializing conditional likelihoods
Running benchmarks to automatically select fastest BEAGLE resource...
Using BEAGLE v3.1.2 resource 0 for division 1:
Rsrc Name : CPU
Impl Name : CPU-4State-Single
Flags: PROCESSOR_CPU PRECISION_SINGLE COMPUTATION_SYNCH EIGEN_REAL
SCALING_MANUAL SCALERS_RAW VECTOR_NONE THREADING_CPP MODEL STATES: 4
Initial log likelihoods and log prior probs:
Chain 1 -- -3759.825459 -- 18.876285
Chain 2 -- -3702.099422 -- 18.876285
Chain 3 -- -3739.136439 -- 18.876285
Chain results (10000 generations requested):
0 -- [-3759.825] (-3702.099) (-3739.136) (...0 remote chains...)
100 -- (-3353.395) (-3347.153) [-3280.734] (...0 remote chains...) -- 0:00:00
200 -- (-3306.317) (-3298.108) [-3241.949] (...0 remote chains...) -- 0:00:00
300 -- (-3283.194) [-3227.173] (-3225.352) (...0 remote chains...) -- 0:00:00
400 -- (-3281.870) (-3224.687) [-3224.098] (...0 remote chains...) -- 0:00:00
500 -- (-3266.355) (-3199.826) [-3182.320] (...0 remote chains...) -- 0:00:00
600 -- (-3241.760) (-3203.766) [-3183.467] (...0 remote chains...) -- 0:00:00
700 -- (-3236.628) (-3200.126) [-3185.338] (...0 remote chains...) -- 0:00:00
800 -- (-3231.015) (-3194.926) [-3180.164] (...0 remote chains...) -- 0:00:00
900 -- (-3217.827) (-3184.216) [-3178.889] (...0 remote chains...) -- 0:00:00
1000 -- (-3195.190) [-3184.355] (-3183.678) (...0 remote chains...) -- 0:00:00
1100 -- (-3192.223) [-3178.895] (-3179.926) (...0 remote chains...) -- 0:00:00
1200 -- (-3178.715) [-3179.211] (-3176.804) (...0 remote chains...) -- 0:00:00
1300 -- (-3176.553) (-3177.750) [-3178.823] (...0 remote chains...) -- 0:00:00
1400 -- (-3176.467) [-3179.437] (-3181.574) (...0 remote chains...) -- 0:00:00
1500 -- [-3183.282] (-3176.957) (-3180.743) (...0 remote chains...) -- 0:00:00
1600 -- (-3181.371) (-3178.273) [-3180.235] (...0 remote chains...) -- 0:00:00
1700 -- [-3179.620] (-3175.754) (-3184.869) (...0 remote chains...) -- 0:00:00
1800 -- [-3179.730] (-3174.753) (-3183.184) (...0 remote chains...) -- 0:00:00
1900 -- (-3180.505) [-3179.671] (-3186.008) (...0 remote chains...) -- 0:00:00
2000 -- [-3184.664] (-3182.241) (-3186.777) (...0 remote chains...) -- 0:00:00
2100 -- (-3180.064) [-3188.069] (-3188.436) (...0 remote chains...) -- 0:00:00
2200 -- [-3181.685] (-3180.572) (-3180.552) (...0 remote chains...) -- 0:00:00
2300 -- (-3179.195) (-3180.901) [-3176.199] (...0 remote chains...) -- 0:00:00
2400 -- (-3178.852) (-3184.149) [-3176.425] (...0 remote chains...) -- 0:00:00
2500 -- (-3184.212) (-3175.047) [-3179.531] (...0 remote chains...) -- 0:00:00
2600 -- (-3182.521) [-3176.479] (-3177.626) (...0 remote chains...) -- 0:00:00
2700 -- (-3182.899) (-3179.200) [-3183.085] (...0 remote chains...) -- 0:00:00
2800 -- (-3182.914) [-3177.533] (-3180.915) (...0 remote chains...) -- 0:00:00
2900 -- (-3180.871) (-3179.146) [-3185.498] (...0 remote chains...) -- 0:00:00
3000 -- [-3186.474] (-3180.030) (-3177.632) (...0 remote chains...) -- 0:00:00
3100 -- (-3183.345) (-3179.069) [-3176.737] (...0 remote chains...) -- 0:00:00
3200 -- (-3175.511) [-3181.196] (-3178.613) (...0 remote chains...) -- 0:00:00
3300 -- [-3176.828] (-3181.322) (-3182.329) (...0 remote chains...) -- 0:00:00
3400 -- (-3176.869) [-3180.169] (-3176.778) (...0 remote chains...) -- 0:00:00
3500 -- (-3174.747) (-3180.622) [-3177.945] (...0 remote chains...) -- 0:00:00
3600 -- (-3179.714) [-3178.250] (-3179.186) (...0 remote chains...) -- 0:00:00
3700 -- (-3179.598) (-3188.674) [-3184.173] (...0 remote chains...) -- 0:00:00
3800 -- [-3177.332] (-3188.028) (-3182.412) (...0 remote chains...) -- 0:00:00
3900 -- (-3178.186) [-3178.082] (-3186.137) (...0 remote chains...) -- 0:00:00
4000 -- (-3173.430) [-3178.177] (-3187.433) (...0 remote chains...) -- 0:00:00
4100 -- [-3175.895] (-3176.054) (-3178.803) (...0 remote chains...) -- 0:00:00
4200 -- (-3180.242) (-3181.884) [-3175.935] (...0 remote chains...) -- 0:00:00
4300 -- [-3182.494] (-3184.167) (-3175.051) (...0 remote chains...) -- 0:00:00
4400 -- [-3183.247] (-3187.316) (-3176.281) (...0 remote chains...) -- 0:00:00
4500 -- [-3179.956] (-3184.521) (-3184.126) (...0 remote chains...) -- 0:00:00
4600 -- (-3183.176) (-3183.810) [-3179.572] (...0 remote chains...) -- 0:00:00
4700 -- (-3185.077) [-3184.858] (-3181.104) (...0 remote chains...) -- 0:00:00
4800 -- [-3178.355] (-3185.939) (-3179.460) (...0 remote chains...) -- 0:00:00
4900 -- [-3174.345] (-3183.450) (-3180.649) (...0 remote chains...) -- 0:00:00
5000 -- [-3178.869] (-3182.610) (-3182.627) (...0 remote chains...) -- 0:00:00
5100 -- (-3186.838) [-3176.949] (-3185.346) (...0 remote chains...) -- 0:00:00
5200 -- (-3175.196) (-3179.961) [-3183.729] (...0 remote chains...) -- 0:00:00
5300 -- (-3185.150) (-3179.676) [-3181.571] (...0 remote chains...) -- 0:00:00
5400 -- [-3181.734] (-3178.326) (-3185.810) (...0 remote chains...) -- 0:00:00
5500 -- [-3182.337] (-3182.883) (-3184.292) (...0 remote chains...) -- 0:00:00
5600 -- (-3185.635) [-3177.900] (-3179.411) (...0 remote chains...) -- 0:00:00
5700 -- (-3181.867) (-3178.463) [-3179.230] (...0 remote chains...) -- 0:00:00
5800 -- [-3178.576] (-3182.170) (-3177.209) (...0 remote chains...) -- 0:00:00
5900 -- [-3180.475] (-3181.036) (-3180.487) (...0 remote chains...) -- 0:00:00
6000 -- (-3184.769) [-3180.163] (-3187.583) (...0 remote chains...) -- 0:00:00
6100 -- [-3178.123] (-3184.955) (-3185.543) (...0 remote chains...) -- 0:00:00
6200 -- [-3177.295] (-3178.714) (-3176.608) (...0 remote chains...) -- 0:00:00
6300 -- [-3186.024] (-3181.882) (-3183.340) (...0 remote chains...) -- 0:00:00
6400 -- (-3179.892) (-3190.756) [-3179.995] (...0 remote chains...) -- 0:00:00
6500 -- [-3179.452] (-3185.336) (-3183.170) (...0 remote chains...) -- 0:00:00
6600 -- (-3176.169) [-3186.239] (-3182.716) (...0 remote chains...) -- 0:00:00
6700 -- [-3179.675] (-3184.730) (-3181.033) (...0 remote chains...) -- 0:00:00
6800 -- [-3175.040] (-3177.573) (-3188.196) (...0 remote chains...) -- 0:00:00
6900 -- [-3175.664] (-3185.497) (-3184.356) (...0 remote chains...) -- 0:00:00
7000 -- [-3177.964] (-3183.668) (-3184.600) (...0 remote chains...) -- 0:00:00
7100 -- (-3180.991) (-3177.802) [-3179.543] (...0 remote chains...) -- 0:00:00
7200 -- (-3175.904) [-3179.617] (-3181.589) (...0 remote chains...) -- 0:00:00
7300 -- (-3184.867) (-3176.631) [-3187.276] (...0 remote chains...) -- 0:00:00
7400 -- [-3178.121] (-3181.399) (-3181.721) (...0 remote chains...) -- 0:00:00
7500 -- (-3178.015) [-3178.664] (-3177.271) (...0 remote chains...) -- 0:00:00
7600 -- [-3185.636] (-3183.791) (-3173.629) (...0 remote chains...) -- 0:00:00
7700 -- (-3185.497) (-3185.767) [-3174.464] (...0 remote chains...) -- 0:00:00
7800 -- (-3183.996) (-3188.534) [-3176.742] (...0 remote chains...) -- 0:00:00
7900 -- (-3184.967) [-3190.058] (-3177.931) (...0 remote chains...) -- 0:00:00
8000 -- (-3180.265) (-3193.064) [-3177.111] (...0 remote chains...) -- 0:00:00
8100 -- (-3186.522) [-3190.527] (-3178.535) (...0 remote chains...) -- 0:00:00
8200 -- (-3179.201) (-3183.305) [-3179.911] (...0 remote chains...) -- 0:00:00
8300 -- (-3177.099) [-3179.418] (-3180.000) (...0 remote chains...) -- 0:00:00
8400 -- (-3178.369) [-3180.151] (-3181.331) (...0 remote chains...) -- 0:00:00
8500 -- (-3179.846) (-3190.499) [-3179.051] (...0 remote chains...) -- 0:00:00
8600 -- (-3176.245) (-3187.241) [-3183.481] (...0 remote chains...) -- 0:00:00
8700 -- [-3177.287] (-3188.540) (-3184.361) (...0 remote chains...) -- 0:00:00
8800 -- (-3175.809) [-3179.570] (-3179.106) (...0 remote chains...) -- 0:00:00
8900 -- [-3175.401] (-3177.233) (-3178.733) (...0 remote chains...) -- 0:00:00
9000 -- (-3174.233) [-3176.472] (-3178.749) (...0 remote chains...) -- 0:00:00
9100 -- (-3174.608) [-3176.467] (-3181.590) (...0 remote chains...) -- 0:00:00
9200 -- (-3176.358) [-3181.941] (-3176.371) (...0 remote chains...) -- 0:00:00
9300 -- (-3185.855) [-3174.466] (-3171.889) (...0 remote chains...) -- 0:00:00
9400 -- [-3178.352] (-3175.461) (-3173.370) (...0 remote chains...) -- 0:00:00
9500 -- (-3173.858) (-3186.138) [-3175.535] (...0 remote chains...) -- 0:00:00
9600 -- [-3176.441] (-3186.226) (-3174.411) (...0 remote chains...) -- 0:00:00
9700 -- [-3177.655] (-3184.349) (-3175.636) (...0 remote chains...) -- 0:00:00
9800 -- (-3179.726) (-3175.420) [-3176.869] (...0 remote chains...) -- 0:00:00
9900 -- (-3180.197) (-3179.664) [-3180.942] (...0 remote chains...) -- 0:00:00
10000 -- (-3182.122) (-3182.789) [-3176.193] (...0 remote chains...) -- 0:00:00
Analysis completed in 1 second
Analysis used 0.64 seconds of CPU time on processor 0
Log likelihood of best state for "cold" chain was -3171.63
Acceptance rates for the moves in the "cold" chain:
With prob. (last 100) chain accepted proposals by move
33.3 % ( 37 %) Dirichlet(Tratio)
NA NA Dirichlet(Pi)
NA NA Slider(Pi)
41.9 % ( 43 %) Multiplier(Alpha)
6.2 % ( 8 %) ExtSPR(Tau,V)
6.8 % ( 11 %) ExtTBR(Tau,V)
9.0 % ( 10 %) NNI(Tau,V)
9.5 % ( 3 %) ParsSPR(Tau,V)
49.3 % ( 34 %) Multiplier(V)
32.7 % ( 38 %) Nodeslider(V)
17.7 % ( 18 %) TLMultiplier(V)
Chain swap information:
1 2 3
--------------------
1 | 0.76 0.57
2 | 3255 0.75
3 | 3371 3374
Upper diagonal: Proportion of successful state exchanges between chains
Lower diagonal: Number of attempted state exchanges between chains
Chain information:
ID -- Heat
-----------
1 -- 1.00 (cold chain)
2 -- 0.91
3 -- 0.83
Heat = 1 / (1 + T * (ID - 1))
(where T = 0.10 is the temperature and ID is the chain number)
Summarizing trees in file "algaemb-mb.nex.t"
Using relative burnin ('relburnin=yes'), discarding the first 25 % of sampled trees
Writing statistics to files algaemb-mb.nex.<parts|tstat|vstat|trprobs|con>
Examining file ...
Found one tree block in file "algaemb-mb.nex.t" with 1001 trees in last block
Tree reading status:
0 10 20 30 40 50 60 70 80 90 100
v-------v-------v-------v-------v-------v-------v-------v-------v-------v-------v
*********************************************************************************
Read 1001 trees from last tree block (sampling 751 of them)
General explanation:
In an unrooted tree, a taxon bipartition (split) is specified by removing a
branch, thereby dividing the species into those to the left and those to the
right of the branch. Here, taxa to one side of the removed branch are denoted
'.' and those to the other side are denoted '*'. Specifically, the '.' symbol
is used for the taxa on the same side as the outgroup.
In a rooted or clock tree, the tree is rooted using the model and not by
reference to an outgroup. Each bipartition therefore corresponds to a clade,
that is, a group that includes all the descendants of a particular branch in
the tree. Taxa that are included in each clade are denoted using '*', and
taxa that are not included are denoted using the '.' symbol.
The output first includes a key to all the bipartitions with frequency larger
or equual to (Minpartfreq) in at least one run. Minpartfreq is a parameter to
sumt command and currently it is set to 0.10. This is followed by a table
with statistics for the informative bipartitions (those including at least
two taxa), sorted from highest to lowest probability. For each bipartition,
the table gives the number of times the partition or split was observed in all
runs (#obs) and the posterior probability of the bipartition (Probab.), which
is the same as the split frequency. If several runs are summarized, this is
followed by the minimum split frequency (Min(s)), the maximum frequency
(Max(s)), and the standard deviation of frequencies (Stddev(s)) across runs.
The latter value should approach 0 for all bipartitions as MCMC runs converge.
This is followed by a table summarizing branch lengths, node heights (if a
clock model was used) and relaxed clock parameters (if a relaxed clock model
was used). The mean, variance, and 95 % credible interval are given for each
of these parameters. If several runs are summarized, the potential scale
reduction factor (PSRF) is also given; it should approach 1 as runs converge.
Node heights will take calibration points into account, if such points were
used in the analysis.
Note that Stddev may be unreliable if the partition is not present in all
runs (the last column indicates the number of runs that sampled the partition
if more than one run is summarized). The PSRF is not calculated at all if
the partition is not present in all runs.The PSRF is also sensitive to small
sample sizes and it should only be considered a rough guide to convergence
since some of the assumptions allowing one to interpret it as a true potential
scale reduction factor are violated in MrBayes.
List of taxa in bipartitions:
1 -- Tobacco
2 -- Rice
3 -- Marchantia
4 -- Chlamydomonas
5 -- Chlorella
6 -- Euglena
7 -- Anacystis_nidulans
8 -- Olithodiscus
Key to taxon bipartitions (saved to file "algaemb-mb.nex.parts"):
ID -- Partition
--------------
1 -- *.......
2 -- .*......
3 -- ..*.....
4 -- ...*....
5 -- ....*...
6 -- .....*..
7 -- ******.*
8 -- .......*
9 -- ***.....
10 -- **......
11 -- ******..
12 -- ****....
13 -- *****...
14 -- ....**..
15 -- .....*.*
--------------
Summary statistics for informative taxon bipartitions
(saved to file "algaemb-mb.nex.tstat"):
ID #obs Probab.
--------------------
9 750 0.998668
10 750 0.998668
11 669 0.890812
12 601 0.800266
13 442 0.588549
14 272 0.362184
15 81 0.107856
--------------------
Summary statistics for branch and node parameters
(saved to file "algaemb-mb.nex.vstat"):
95% HPD Interval
--------------------
Parameter Mean Variance Lower Upper Median
---------------------------------------------------------------------
length[1] 0.008161 0.000011 0.002427 0.014307 0.007719
length[2] 0.023876 0.000032 0.013570 0.034100 0.023029
length[3] 0.007419 0.000013 0.001953 0.014742 0.006779
length[4] 0.104244 0.000249 0.078615 0.140039 0.102775
length[5] 0.028415 0.000080 0.014044 0.048425 0.027800
length[6] 0.131972 0.000478 0.090502 0.171280 0.130754
length[7] 0.117047 0.000487 0.073892 0.158182 0.114494
length[8] 0.109015 0.000353 0.078083 0.148798 0.107979
length[9] 0.030881 0.000064 0.017233 0.048161 0.030458
length[10] 0.020222 0.000031 0.011416 0.032282 0.019576
length[11] 0.034847 0.000182 0.010263 0.062116 0.035234
length[12] 0.014920 0.000056 0.000411 0.029187 0.014737
length[13] 0.021550 0.000102 0.003188 0.039851 0.021096
length[14] 0.015335 0.000047 0.003886 0.026494 0.014376
length[15] 0.028468 0.000118 0.003274 0.044160 0.028945
---------------------------------------------------------------------
Clade credibility values:
/---------------------------------------------------------- Anacystis_nidul~ (7)
|
|---------------------------------------------------------- Olithodiscus (8)
|
| /---------- Tobacco (1)
| /---100--+
| | \---------- Rice (2)
+ /---100---+
| | \------------------- Marchantia (3)
| /----80---+
| | \----------------------------- Chlamydomonas (4)
| /---59---+
| | \--------------------------------------- Chlorella (5)
\----89---+
\------------------------------------------------ Euglena (6)
Phylogram (based on average branch lengths):
/---------------------------------------- Anacystis_nidul~ (7)
|
|-------------------------------------- Olithodiscus (8)
|
| /--- Tobacco (1)
| /-----+
| | \--------- Rice (2)
+ /----------+
| | \-- Marchantia (3)
| /----+
| | \------------------------------------ Chlamydomonas (4)
| /-------+
| | \---------- Chlorella (5)
\-----------+
\---------------------------------------------- Euglena (6)
|----------------| 0.050 expected changes per site
Calculating tree probabilities...
Credible sets of trees (16 trees sampled):
50 % credible set contains 2 trees
90 % credible set contains 6 trees
95 % credible set contains 8 trees
99 % credible set contains 11 trees
Exiting mrbayes block
Reached end of file
Tasks completed, exiting program because mode is noninteractive
To return control to the command line after completion of file processing,
set mode to interactive with 'mb -i <filename>' (i is for interactive)
or use 'set mode=interactive'
The command produces multiple output files:
(master) $ ls algaemb-mb.nex*
algaemb-mb.nex algaemb-mb.nex.con.tre algaemb-mb.nex.parts algaemb-mb.nex.tstat
algaemb-mb.nex.ckp algaemb-mb.nex.mcmc algaemb-mb.nex.t algaemb-mb.nex.vstat
algaemb-mb.nex.ckp~ algaemb-mb.nex.p algaemb-mb.nex.trprobs
- We now will use Tracer to assess if the chain converged and had good mixing.
We download Tracer here.
We need to File -> Import Trace File and select algaemb-mb.nex.p.
Look at the trace plot:
- Good behavior:
- The trace looks like a fuzzy horizontal band
- No obvious trends up or down after burn-in
- Bad behavior:
- Strong trends
- Long flat stretches
- Sudden jumps followed by no mixing
ESS Rules of thumb:
- ESS > 200 → acceptable
- ESS > 500 → good
- ESS < 100 → problematic
We can do a quick and dirty plot in R:
library(ape)
tre = read.nexus(file="algaemb-mb.nex.con.tre")
plot(tre)
After the computer lab, make sure to read the MrBayes tutorial carefully to determine the appropriate MrBayes block for your data.
Bonus points for paying attention to the importants aspects that affect the performance of the chosen method.
Double bonus points for comparing how do results differ if we use an alignment from a different method. That is, primatesAA-aligned-muscle.fasta are the aligned sequences with MUSCLE. I can run MrBayes on this file and on primatesAA-aligned.fasta which was aligned with ClustalW and compare with the estimated trees.
Homework
See the details of the Bayesian Inference HW in here.