Internal Documentation

Contents

Index

types

PhyloNetworks.ANodeType
ANode

Abstract node. An object of type EdgeT has a node attribute, which is an vector of 2 objects of some subtype of ANode. The concrete type Node is a subtype of ANode, and has an edge attribute, which is vector of Edge objects (where Edge is an alias for EdgeT{Node}).

source
PhyloNetworks.BMType
BM(λ)

Brownian Motion, subtype of ContinuousTraitEM, to model the population mean of a trait (or of the residuals from a linear model). Under the BM model, the population (or species) means have a multivariate normal distribution with covariance matrix = σ²λV, where σ² is the between-species variance-rate (to be estimated), and the matrix V is obtained from sharedPathMatrix(net)[:Tips].

λ is set to 1 by default, and is immutable. In future versions, λ may be used to control the scale for σ².

On a tree, V is the length of shared ancestry. On a network, the BM model assumes that the trait at a hybrid node is the weighted average of its immediate parents (plus possibly a fixed shift). The weights are the proportion of genes inherited from each parent: the γ parameters of hybrid edges.

source
PhyloNetworks.CacheLengthLiNCType
CacheLengthLiNC

Type to store intermediate values during optimization of one branch length, to limit time spent on garbage collection. Re-used across branches. See constructor below, from a StatisticalSubstitutionModel.

source
PhyloNetworks.CacheLengthLiNCMethod
CacheLengthLiNC(obj::SSM,
    ftolRel::Float64, ftolAbs::Float64, xtolRel::Float64, xtolAbs::Float64,
    maxeval::Int)

Constructor from a StatisticalSubstitutionModel object, with tolerance values and parameters controlling the search.

source
PhyloNetworks.ContinuousTraitEMType
ContinuousTraitEM

Abstract type for evolutionary models for continuous traits, using a continuous-time stochastic process on a phylogeny.

For subtypes, see BM, PagelLambda, ScalingHybrid.

Each of these subtypes/models has the field lambda, whose default value is 1.0. However, the interpretation of this field differs across models.

source
PhyloNetworks.EdgeTType
EdgeT{node type}
Edge = EdgeT{Node}
Edge(number, length=1.0)

Data structure for an edge and its various attributes. Most notably:

  • number (integer): serves as unique identifier; remains unchanged when the network is modified, with a nearest neighbor interchange for example
  • node: vector of Nodes, normally just 2 of them
  • isChild1 (boolean): true if node[1] is the child node of the edge, false if node[1] is the parent node of the edge
  • length: branch length
  • hybrid (boolean): whether the edge is a tree edge or a hybrid edge (in which case isChild1 is important, even if the network is semi-directed)
  • gamma: proportion of genetic material inherited by the child node via the edge; 1.0 for a tree edge
  • isMajor (boolean): whether the edge is the major path to the child node; true for tree edges, since a tree edge is the only path to its child node; normally true if gamma>0.5.

and other fields, used very internally

source
PhyloNetworks.MatrixTopologicalOrderType
MatrixTopologicalOrder

Matrix associated to an HybridNetwork in which rows/columns correspond to nodes in the network, sorted in topological order.

The following functions and extractors can be applied to it: tipLabels, obj[:Tips], obj[:InternalNodes], obj[:TipsNodes] (see documentation for function getindex(::MatrixTopologicalOrder, ::Symbol)).

Functions sharedPathMatrix and simulate return objects of this type.

The MatrixTopologicalOrder object has fields: V, nodeNumbersTopOrder, internalNodeNumbers, tipNumbers, tipNames, indexation. Type in "?MatrixTopologicalOrder.field" to get documentation on a specific field.

source
PhyloNetworks.NodeType
Node(number, leaf)
Node(number, leaf, hybrid)

Data structure for a node and its various attributes. Most notably:

  • number (integer): serves as unique identifier; remains unchanged when the network is modified, with a nearest neighbor interchange for example
  • leaf (boolean): whether the node is a leaf (with data typically) or an internal node (no data typically)
  • name (string): taxon name for leaves; internal node may or may not have a name
  • edge: vector of Edges that the node is attached to; 1 if the node is a leaf, 2 if the node is the root, 3 otherwise, and potentially more if the node has a polytomy
  • hybrid (boolean): whether the node is a hybrid node (with 2 or more parents) or a tree node (with a single parent)

Other more internal attributes include:

  • isBadDiamondI and isBadDiamondII (booleans): whether the node is a hybrid node where the reticulation forms a cycle of 4 nodes (diamond), and where both parents of the hybrid nodes are connected to a leaf. In a bad diamond of type I, the hybrid node itself is also connected to a leaf but the common neighbor of the 2 hybrid's parents is not connected to a leaf. In a bad diamond of type II, the hybrid node has an internal node as child, and the common neighbor of the 2 hybrid's parents is connected to a leaf.
  • isBadTriangle, isVeryBadTriangle and isExtBadTriangle (booleans): true if the reticulation forms a cycle of 3 nodes (triangle) and depending on the number of leaves attached these 3 nodes. The triangle means that the 2 parents of the hybrid node are directly related: one is the child of the other. isBadTriangle is true if the triangle is "good", as per Solís-Lemus & Ané (2016), that is, if all 3 nodes in the cycle are not connected to any leaves (the reticulation is detectable from quartet concordance factors, even though all branch lengths are not identifiable). isVeryBadTriangle is true if 2 (or all) of the 3 nodes are connected to a leaf, in which case the reticulation is undetectable from unrooted gene tree topologies (thus it's best to exclude these reticulations from a search). isBadTriangle is true if exactly 1 of the 3 nodes is connected to a leaf.

For details see Solís-Lemus & Ané (2016, doi:10.1371/journal.pgen.1005896)

source
PhyloNetworks.OptSummaryType
OptSummary{T<:AbstractFloat}

Summary of an NLopt optimization. Idea and code taken from MixedModels. T is the type of the function argument(s) and of the function value.

source
PhyloNetworks.PagelLambdaType
PagelLambda(λ)

Pagel's λ model, subtype of ContinuousTraitEM, with covariance matrix σ²V(λ). σ² is the between-species variance-rate (to be estimated), and V(λ) = λV + (1-λ)T, where V is the covariance under a Brownian motion BM and T is a diagonal matrix containing the total branch length elapsed from the root to each leaf (if the phylogeny is a tree, or more generally if the network is time consistent: the time from the root to a given node does not depend on the path).

λ ∈ [0,1] is mutable and may be optimized. It is a measure of phylogenetic signal, that is, how important the given network is for explaining variation in the response. When λ=1, the PagelLambda model reduces to the BM model.

source
PhyloNetworks.QuartetNetworkType
QuartetNetwork(net::HybridNetwork)

Subtype of Network abstract type. A QuartetNetwork object is an internal type used to calculate the expected CFs of quartets on a given network. Attributes of the QuartetNetwork objects need not be updated at a given time (see below).

The procedure to calculate expected CFs for a given network is as follows:

  1. A QuartetNetwork object is created for each Quartet using extractQuartet!(net,d) for net::HybridNetwork and d::DataCF
  2. The vector d.quartet has all the Quartet objects, each with a QuartetNetwork object (q.qnet). Attibutes in QuartetNetwork are not updated at this point
  3. Attributes in QuartetNetwork are partially updated when calculating the expected CF (calculateExpCFAll!). To calculate the expected CF for this quartet, we need to update the attributes: which, typeHyb, t1, split, formula, expCF. To do this, we need to modify the QuartetNetwork object (i.e. merge edges,...). But we do not want to modify it directly because it is connected to the original net via a map of the edges and nodes, so we use a deep copy: qnet=deepcopy(q.qnet) and then calculateExpCFAll!(qnet). Attributes that are updated on the original QuartetNetwork object q.qnet are:
    • q.qnet.hasEdge: array of booleans of length equal to net.edge that shows which identifiable edges and gammas of net (net.ht) are in qnet (and still identifiable). Note that the first elements of the vector correspond to the gammas.
    • q.qnet.index: length should match the number of trues in qnet.hasEdge. It has the indexes in qnet.edge from the edges in qnet.hasEdge. Note that the first elements of the vector correspond to the gammas.
    • q.qnet.edge: list of edges in QuartetNetwork. Note that external edges in net are collapsed when they appear in QuartetNetwork, so only internal edges map directly to edges in net
    • q.qnet.expCF: expected CF for this Quartet

Why not modify the original QuartetNetwork? We wanted to keep the original QuartetNetwork stored in DataCF with all the identifiable edges, to be able to determine if this object had been changed or not after a certain optimization.

The process is:

  1. Deep copy of full network to create q.qnet for Quartet q. This QuartetNetwork object has only 4 leaves now, but does not have merged edges (the identifiable ones) so that we can correspond to the edges in net. This QuartetNetwork does not have other attributes updated.
  2. For the current set of branch lengths and gammas, we can update the attributes in q.qnet to compute the expected CF. The functions that do this will "destroy" the QuartetNetwork object by merging edges, removing nodes, etc... So, we do this process in qnet=deepcopy(q.qnet), and at the end, only update q.qnet.expCF.
  3. After we optimize branch lengths in the full network, we want to update the branch lengths in q.qnet. The edges need to be there (which is why we do not want to modify this QuartetNetwork object by merging edges), and we do not do a deep-copy of the full network again. We only change the values of branch lengths and gammas in q.qnet, and we can re-calculate the expCF by creating a deep copy qnet=deepcopy(q.qnet) and run the other functions (which merge edges, etc) to get the expCF.

Future work: there are definitely more efficient ways to do this (without the deep copies). In addition, currently edges that are no longer identifiable in QuartetNetwork do not appear in hasEdge nor index. Need to study this.

julia> net0 = readTopology("(s17:13.76,(((s3:10.98,(s4:8.99,s5:8.99)I1:1.99)I2:0.47,(((s6:2.31,s7:2.31)I3:4.02,(s8:4.97,#H24:0.0::0.279)I4:1.36)I5:3.64,((s9:8.29,((s10:2.37,s11:2.37)I6:3.02,(s12:2.67,s13:2.67)I7:2.72)I8:2.89)I9:0.21,((s14:2.83,(s15:1.06,s16:1.06)I10:1.78)I11:2.14)#H24:3.52::0.72)I12:1.47)I13:1.48)I14:1.26,(((s18:5.46,s19:5.46)I15:0.59,(s20:4.72,(s21:2.40,s22:2.40)I16:2.32)I17:1.32)I18:2.68,(s23:8.56,(s1:4.64,s2:4.64)I19:3.92)I20:0.16)I21:3.98)I22:1.05);");

julia> net = readTopologyLevel1(writeTopology(net0)) ## need level1 attributes for functions below
HybridNetwork, Un-rooted Network
46 edges
46 nodes: 23 tips, 1 hybrid nodes, 22 internal tree nodes.
tip labels: s17, s3, s4, s5, ...
(s4:8.99,s5:8.99,(s3:10.0,((((s6:2.31,s7:2.31)I3:4.02,(s8:4.97,#H24:0.0::0.279)I4:1.36)I5:3.64,((s9:8.29,((s10:2.37,s11:2.37)I6:3.02,(s12:2.67,s13:2.67)I7:2.72)I8:2.89)I9:0.21,((s14:2.83,(s15:1.06,s16:1.06)I10:1.78)I11:2.14)#H24:3.52::0.721)I12:1.47)I13:1.48,((((s18:5.46,s19:5.46)I15:0.59,(s20:4.72,(s21:2.4,s22:2.4)I16:2.32)I17:1.32)I18:2.68,(s23:8.56,(s1:4.64,s2:4.64)I19:3.92)I20:0.16)I21:3.98,s17:10.0)I22:1.26)I14:0.47)I2:1.99)I1;

julia> q1 = Quartet(1,["s1", "s16", "s18", "s23"],[0.296,0.306,0.398])
number: 1
taxon names: ["s1", "s16", "s18", "s23"]
observed CF: [0.296, 0.306, 0.398]
pseudo-deviance under last used network: 0.0 (meaningless before estimation)
expected CF under last used network: Float64[] (meaningless before estimation)

julia> qnet = PhyloNetworks.extractQuartet!(net,q1)
taxa: ["s1", "s16", "s18", "s23"]
number of hybrid nodes: 1

julia> sum([e.istIdentifiable for e in net.edge]) ## 23 identifiable edges in net
23

julia> idedges = [ee.number for ee in net.edge[[e.istIdentifiable for e in net.edge]]];

julia> print(idedges)
[5, 6, 9, 11, 12, 13, 17, 20, 21, 22, 26, 27, 28, 29, 30, 31, 34, 38, 39, 40, 44, 45, 46]

julia> length(qnet.hasEdge) ## 24 = 1 gamma + 23 identifiable edges
24

julia> sum(qnet.hasEdge) ## 8 = 1 gamma + 7 identifiable edges in qnet
8

julia> print(idedges[qnet.hasEdge[2:end]]) ## 7 id. edges: [12, 13, 29, 30, 31, 45, 46]
[12, 13, 29, 30, 31, 45, 46]

julia> qnet.edge[qnet.index[1]].number ## 11 = minor hybrid edge
11
source
PhyloNetworks.QuartetTType

QuartetT{T}

Generic type for 4-taxon sets. Fields:

  • number: rank of the 4-taxon set
  • taxonnumber: static vector of 4 integers, assumed to be distinct and sorted
  • data: object of type T

For easier look-up, a unique mapping is used between the rank (number) of a 4-taxon set and its 4 taxa (see quartetrank and nchoose1234):

rank-1 = (t1-1) choose 1 + (t2-1) choose 2 + (t3-1) choose 3 + (t4-1) choose 4

examples

julia> nCk = PhyloNetworks.nchoose1234(5)
6×4 Matrix{Int64}:
 0   0   0  0
 1   0   0  0
 2   1   0  0
 3   3   1  0
 4   6   4  1
 5  10  10  5

julia> PhyloNetworks.QuartetT(1,3,4,6, [.92,.04,.04, 100], nCk)
4-taxon set number 8; taxon numbers: 1,3,4,6
data: [0.92, 0.04, 0.04, 100.0]
source
PhyloNetworks.ScalingHybridType
ScalingHybrid(λ)

Scaling Hybrid model, subtype of ContinuousTraitEM, with covariance matrix σ²V(N(λ)). σ² is the between-species variance-rate (to be estimated), V(N) is the Brownian motion BM covariance obtained from network N, and N(λ) is a obtained from the input network by rescaling the inheritance parameter γ of all minor edges by the same λ: a minor edge has its original γ changed to λγ, using the same λ at all reticulations. Note that for a major edge with original inheritance γ, the partner minor edge has inheritance γminor = 1-γ, so the major edge's inheritance is changed to 1-λγminor = λγ+1-λ.

For more information: see Bastide (2017) dissertation, section 4.3.2 p.175, available at https://tel.archives-ouvertes.fr/tel-01629648

λ ∈ [0,1] is mutable and may be optimized. It is a measure of how important the reticulations are for explaining variation in the response. When λ=1, the ScalingHybrid model reduces to the BM model.

source
PhyloNetworks.StatisticalSubstitutionModelType
StatisticalSubstitutionModel(model::SubstitutionModel,
        ratemodel::RateVariationAcrossSites,
        net::HybridNetwork, trait::AbstractVector,
        siteweight=nothing::Union{Nothing, Vector{Float64}},
        maxhybrid=length(net.hybrid)::Int)

Inner constructor. Makes a deep copy of the input model, rate model. Warning: does not make a deep copy of the network: modification of the object.net would modify the input net. Assumes that the network has valid gamma values (to extract displayed trees).

StatisticalSubstitutionModel(net::HybridNetwork, fastafile::String,
        modsymbol::Symbol, rvsymbol=:noRV::Symbol,
        ratecategories=4::Int;
        maxhybrid=length(net.hybrid)::Int)

Constructor from a network and a fasta file. The model symbol should be one of :JC69, :HKY85, :ERSM or :BTSM. The rvsymbol should be as required by RateVariationAcrossSites.

The network's gamma values are modified if they are missing. After that, a deep copy of the network is passed to the inner constructor.

source
PhyloNetworks.StatisticalSubstitutionModelType
StatisticalSubstitutionModel

Subtype of StatsBase.StatisticalModel, to fit discrete data to a model of trait substitution along a network. See fitdiscrete to fit a trait substitution model to discrete data. It returns an object of type StatisticalSubstitutionModel, to which standard functions can be applied, like loglikelihood(object), aic(object) etc.

source
PhyloNetworks.TopologyConstraintType

Type for various topological constraints, such as:

  1. a set of taxa forming a clade in the major tree
  2. a set of individuals belonging to the same species
  3. a set of taxa forming a clade in any one of the displayed trees
source
PhyloNetworks.TopologyConstraintMethod
TopologyConstraint(type::UInt8, taxonnames::Vector{String}, net::HybridNetwork)

Create a topology constraint from user-given type, taxon names, and network. There are 3 types of constraints:

  • type 1: A set of tips that forms a species.
  • type 2: A set of tips that forms a clade in the major tree. Note that the root matters. Constraining a set of species to be an outgroup is equivalent to constraining the ingroup to form a clade.
  • type 3: A set of tips that forms a clade in any one of the displayed trees.

Note: currently, with type-1 constraints, hybridizations are prevented from coming into or going out of a species group.

examples

julia> net = readTopology("(((2a,2b),(((((1a,1b,1c),4),(5)#H1),(#H1,(6,7))))#H2),(#H2,10));");

julia> c_species1 = PhyloNetworks.TopologyConstraint(0x01, ["1a","1b","1c"], net)
Species constraint, on tips: 1a, 1b, 1c
 stem edge number 7
 crown node number -9

julia> c_species2 = PhyloNetworks.TopologyConstraint(0x01, ["2a","2b"], net)
Species constraint, on tips: 2a, 2b
 stem edge number 3
 crown node number -4

julia> nni!(net , net.edge[3], true, true, [c_species2]) === nothing # we get nothing: would break species 2
true

julia> # the following gives an error, because 4,5 do not form a clade in "net"
       # PhyloNetworks.TopologyConstraint(0x02, ["4","5"], net)

julia> c_clade145 = PhyloNetworks.TopologyConstraint(0x02, ["1a","1b","1c","4","5"], net)
Clade constraint, on tips: 1a, 1b, 1c, 4, 5
 stem edge number 12
 crown node number -7

julia> nni!(net , net.edge[12], true, true, [c_species1, c_clade145]) # we get nothing (failed NNI): would break the clade
source
PhyloNetworks.WithinSpeciesCTMType
WithinSpeciesCTM

CTM stands for "continuous trait model". Contains the estimated variance components (between-species phylogenetic variance rate and within-species variance) and output from the NLopt optimization used in the estimation.

Fields

  • wsp_var: intra/within-species variance.
  • bsp_var: inter/between-species variance-rate.
  • wsp_ninv: vector of the inverse sample-sizes (e.g. [1/n₁, ..., 1/nₖ], where data from k species was used to fit the model and nᵢ is the no. of observations for the ith species).
  • rss: within-species sum of squares
  • optsum: an OptSummary object.
source

functions

Base.getindexFunction
getindex(obj, d,[ indTips, nonmissing])

Getting submatrices of an object of type MatrixTopologicalOrder.

Arguments

  • obj::MatrixTopologicalOrder: the matrix from which to extract.
  • d::Symbol: a symbol precising which sub-matrix to extract. Can be:
    • :Tips columns and/or rows corresponding to the tips
    • :InternalNodes columns and/or rows corresponding to the internal nodes Includes tips not listed in indTips or missing data according to nonmissing.
    • :TipsNodes columns corresponding to internal nodes, and row to tips (works only is indexation="b")
  • indTips::Vector{Int}: optional argument precising a specific order for the tips (internal use).
  • nonmissing::BitArray{1}: optional argument saying which tips have data (internal use). Tips with missing data are treated as internal nodes.
source
Base.getindexFunction
getindex(obj, d)

Getting submatrices of an object of type TraitSimulation.

Arguments

  • obj::TraitSimulation: the matrix from which to extract.
  • d::Symbol: a symbol precising which sub-matrix to extract. Can be:
    • :Tips columns and/or rows corresponding to the tips
    • :InternalNodes columns and/or rows corresponding to the internal nodes
source
PhyloNetworks.P!Method
P!(Pmat::AbstractMatrix, obj::SM, t::Float64)

Fill in the input matrix Pmat with the transition rates to go from each state to another in time t, according to rates in Q. see also: P.

julia> m1 = BinaryTraitSubstitutionModel([1.0,2.0], ["low","high"])
Binary Trait Substitution Model:
rate low→high α=1.0
rate high→low β=2.0

julia> PhyloNetworks.P!(Matrix{Float64}(undef,2,2), m1, 0.3) # fills an uninitialized 2x2 matrix of floats
2×2 Matrix{Float64}:
 0.80219  0.19781
 0.39562  0.60438

julia> m2 = JC69([1.]);

julia> PhyloNetworks.P!(Matrix{Float64}(undef,4,4), m2, 0.2)
4×4 Matrix{Float64}:
 0.824446   0.0585179  0.0585179  0.0585179
 0.0585179  0.824446   0.0585179  0.0585179
 0.0585179  0.0585179  0.824446   0.0585179
 0.0585179  0.0585179  0.0585179  0.824446 
source
PhyloNetworks.PMethod
P(obj, t)

Probability transition matrix for a TraitSubstitutionModel, of the form

P[1,1] ... P[1,k]
   .          .
   .          .
P[k,1] ... P[k,k]

where P[i,j] is the probability of ending in state j after time t, given that the process started in state i. see also: P!.

HKY example:

julia> m1 = HKY85([0.5], [0.20, 0.30, 0.30, 0.20])
HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
relative rate version with transition/tranversion ratio kappa = 0.5,
 scaled so that there is one substitution per unit time
rate matrix Q:
               A       C       G       T
       A       *  0.4839  0.2419  0.3226
       C  0.3226       *  0.4839  0.1613
       G  0.1613  0.4839       *  0.3226
       T  0.3226  0.2419  0.4839       *

julia> PhyloNetworks.P(m1, 0.2)
4×4 StaticArraysCore.MMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
 0.81592    0.0827167  0.0462192  0.0551445
 0.0551445  0.831326   0.0827167  0.0308128
 0.0308128  0.0827167  0.831326   0.0551445
 0.0551445  0.0462192  0.0827167  0.81592  

Juke-Cantor example:

julia> m1 = JC69([1.]);

julia> PhyloNetworks.P(m1, 0.2)
4×4 StaticArraysCore.MMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
 0.824446   0.0585179  0.0585179  0.0585179
 0.0585179  0.824446   0.0585179  0.0585179
 0.0585179  0.0585179  0.824446   0.0585179
 0.0585179  0.0585179  0.0585179  0.824446 
source
PhyloNetworks.addAlternativeHybridizations!Method
addAlternativeHybridizations!(net::HybridNetwork, BSe::DataFrame;
                              cutoff=10::Number, top=3::Int)

Modify the network net (the best estimated network) by adding some of the hybridizations present in the bootstrap networks. By default, it will only add hybrid edges with more than 10% bootstrap support (cutoff) and it will only include the top 3 hybridizations (top) sorted by bootstrap support.

The dataframe BSe is also modified. In the original BSe, supposedly obtained with hybridBootstrapSupport, hybrid edges that do not appear in the best network have a missing number. After hybrid edges from bootstrap networks are added, BSe is modified to include the edge numbers of the newly added hybrid edges. To distinguish hybrid edges present in the original network versus new edges, an extra column of true/false values is also added to BSe, named "alternative", with true for newly added edges absent from the original network.

The hybrid edges added to net are added as minor edges, to keep the underlying major tree topology.

example

julia> bootnet = readMultiTopology(joinpath(dirname(pathof(PhyloNetworks)), "..","examples", "bootsnaq.out")); # vector of 10 networks

julia> bestnet = readTopology("((O,(E,#H7:::0.196):0.314):0.332,(((A)#H7:::0.804,B):10.0,(C,D):10.0):0.332);");

julia> BSn, BSe, BSc, BSgam, BSedgenum = hybridBootstrapSupport(bootnet, bestnet);

julia> BSe[1:6,[:edge,:hybrid_clade,:sister_clade,:BS_hybrid_edge]]
6×4 DataFrame
 Row │ edge     hybrid_clade  sister_clade  BS_hybrid_edge 
     │ Int64?   String        String        Float64        
─────┼─────────────────────────────────────────────────────
   1 │       7  H7            B                       33.0
   2 │       3  H7            E                       32.0
   3 │ missing  c_minus3      c_minus8                44.0
   4 │ missing  c_minus3      H7                      44.0
   5 │ missing  E             O                       12.0
   6 │ missing  c_minus6      c_minus8                 9.0

julia> PhyloNetworks.addAlternativeHybridizations!(bestnet, BSe)

julia> BSe[1:6,[:edge,:hybrid_clade,:sister_clade,:BS_hybrid_edge,:alternative]]
6×5 DataFrame
 Row │ edge     hybrid_clade  sister_clade  BS_hybrid_edge  alternative 
     │ Int64?   String        String        Float64         Bool        
─────┼──────────────────────────────────────────────────────────────────
   1 │       7  H7            B                       33.0        false
   2 │       3  H7            E                       32.0        false
   3 │      16  c_minus3      c_minus8                44.0         true
   4 │      19  c_minus3      H7                      44.0         true
   5 │      22  E             O                       12.0         true
   6 │ missing  c_minus6      c_minus8                 9.0        false

julia> # using PhyloPlots; plot(bestnet, edgelabel=BSe[:,[:edge,:BS_hybrid_edge]]);
source
PhyloNetworks.addHybridBetweenClades!Method
addHybridBetweenClades!(net::HybridNetwork, hybnum::Number, sisnum::Number)

Modify net by adding a minor hybrid edge from "donor" to "recipient", where "donor" is the major parent edge e1 of node number hybnum and "recipient" is the major parent edge e2 of node number sisnum. The new nodes are currently inserted at the middle of these parent edges.

If a hybrid edge from e1 to e2 would create a directed cycle in the network, then this hybrid cannot be added. In that case, the donor edge e1 is moved up if its parent is a hybrid node, to ensure that the sister clade to the new hybrid would be a desired (the descendant taxa from e1) and a new attempt is made to create a hybrid edge.

Output: number of the new hybrid edge, or nothing if the desired hybridization is not possible.

See also: addhybridedge! (used by this method) and directionalconflict to check that net would still be a DAG.

source
PhyloNetworks.addhybridedge!Function
addhybridedge!(net::HybridNetwork, edge1::Edge, edge2::Edge, hybridpartnernew::Bool,
               edgelength=-1.0::Float64, gamma=-1.0::Float64)

Add hybridization to net coming from edge1 going into edge2. 2 new nodes and 3 new edges are created: edge1 are edge2 are both cut into 2 edges, and a new edge is created linking the 2 new "middle" nodes, pointing from edge1 to edge2. The new node in the middle of edge1 is a tree node. The new node in the middle of edge2 is a hybrid node. Its parent edges are the newly created hybrid edge (with γ = gamma, missing by default), and either the newly edge "above" edge2 if hybridpartnernew=true, or the old edge2 otherwise (which would reverse the direction of edge2 and others).

Should be called from the other method, which performs a bunch of checks. Updates containRoot attributes for edges below the new hybrid node.

Output: new hybrid node (middle of the old edge2) and new hybrid edge.

examples

julia> net = readTopology("((S8,(((S1,(S5)#H1),(#H1,S6)))#H2),(#H2,S10));");

julia> hybnode, hybedge = PhyloNetworks.addhybridedge!(net, net.edge[13], net.edge[8], true, 0.0, 0.2)
(PhyloNetworks.Node:
 number:9
 name:H3
 hybrid node
 attached to 3 edges, numbered: 8 16 17
, PhyloNetworks.EdgeT{PhyloNetworks.Node}:
 number:17
 length:0.0
 minor hybrid edge with gamma=0.2
 attached to 2 node(s) (parent first): 8 9
)


julia> writeTopology(net)
"((S8,(((S1,(S5)#H1),((#H1,S6))#H3:::0.8))#H2),(#H2,(S10,#H3:0.0::0.2)));"
source
PhyloNetworks.addhybridedge!Function
addhybridedge!(net::HybridNetwork, nohybridladder::Bool, no3cycle::Bool,
               constraints=TopologyConstraint[]::Vector{TopologyConstraint};
               maxattempts=10::Int, fixroot=false::Bool)

Randomly choose two edges in net then: add hybrid edge from edge 1 to edge 2 of length 0.01. The two halves of edge 1 (and of edge 2) have equal lengths. The hybrid partner edge (top half of edge 2, if fixroot is true) will point towards the newly-created node on the middle of the original edge 2.

If the resulting network is a DAG, satisfies the constraint(s), does not contain any 3-cycle (if no3cycle=true), and does not have a hybrid ladder (if nohybridladder=true) then the proposal is successful: net is modified, and the function returns the newly created hybrid node and newly created hybrid edge.

If the resulting network is not acceptable, then a new set of edges is proposed (using a blacklist) until one is found acceptable, or until a maximum number of attempts have been made (maxattempts). If none of the attempted proposals are successful, nothing is returned (without causing an error).

After a pair of edges is picked, the "top" half of edge2 is proposed as the partner hybrid edge with probability 0.8 if fixroot is false, (to avoid changing the direction of edge2 with more than 50% chance) and with probability 1.0 if fixroot is true. If this choice does not work and if fixroot is false, the other half of edge2 is proposed as the partner hybrid edge. Note that choosing the "bottom" half of edge2 as the partner edge requires to flip the direction of edge 2, and to move the root accordingly (to the original child of edge2).

examples

julia> net = readTopology("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");

julia> using Random

julia> Random.seed!(170);

julia> PhyloNetworks.addhybridedge!(net, true, true)
(PhyloNetworks.Node:
 number:9
 name:H3
 hybrid node
 attached to 3 edges, numbered: 5 16 17
, PhyloNetworks.EdgeT{PhyloNetworks.Node}:
 number:17
 length:0.01
 minor hybrid edge with gamma=0.32771460911632916
 attached to 2 node(s) (parent first): 8 9
)

julia> writeTopology(net, round=true, digits=2)
"((S1,(((#H1,S4),((S2,(S3)#H1))#H3:::0.67))#H2),((#H2,S5),#H3:0.01::0.33));"
source
PhyloNetworks.addhybridedgeLiNC!Method
addhybridedgeLiNC!(obj::SSM, currLik::Float64,
    no3cycle::Bool, nohybridladder::Bool,
    constraints::Vector{TopologyConstraint},
    ftolAbs::Float64,
    γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)

Completes checks, adds hybrid in a random location, updates SSM object, and optimizes branch lengths and gammas locally as part of PhyLiNC optimization.

Return true if accepted add hybrid move. If move not accepted, return false. If cannot add a hybrid, return nothing.

For arguments, see phyLiNC. Called by optimizestructure!.

source
PhyloNetworks.addindividuals!Method
addindividuals!(net::HybridNetwork, species::AbstractString, individuals::Vector)

Add individuals to their species leaf as a star, and return the corresponding species constraint. nothing is returned in 2 cases:

  • if individuals contains only 1 individual, in which case the name of the leaf for that species is replaced by the name of its one individual representative
  • if species is not found in the network

Spaces in individuals' names are eliminated, see cleantaxonname.

Called by mapindividuals.

examples

julia> net = readTopology("(S8,(((S1,S4),(S5)#H1),(#H1,S6)));");

julia> PhyloNetworks.addindividuals!(net, "S1", ["S1A", "S1B", "S1C"])
Species constraint, on tips: S1A, S1B, S1C
 stem edge number 2
 crown node number 2

julia> writeTopology(net) # 3 new nodes, S1 now internal: not a tip
"(S8,((((S1A,S1B,S1C)S1,S4),(S5)#H1),(#H1,S6)));"
source
PhyloNetworks.addleaf!Function
addleaf!(net::HybridNetwork, node::Node, leafname::String, edgelength::Float64=-1.0)
addleaf!(net::HybridNetwork, edge::Edge, leafname::String, edgelength::Float64=-1.0)

Add a new external edge between node or between the "middle" of edge and a newly-created leaf, of name leafname. By default, the new edge length is missing (-1).

examples

julia> net = readTopology("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");

julia> net.node[6].name # leaf S4
"S4"

julia> PhyloNetworks.addleaf!(net, net.node[6], "4a"); # adding leaf to a node

julia> writeTopology(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(4a)S4)))#H2),(#H2,S5));"

julia> PhyloNetworks.addleaf!(net, net.node[6], "4b");

julia> writeTopology(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(4a,4b)S4)))#H2),(#H2,S5));"
julia> net = readTopology("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");

julia> [n.name for n in net.edge[7].node] # external edge to S4
2-element Vector{String}:
 "S4"
 ""  

julia> PhyloNetworks.addleaf!(net, net.edge[7], "4a"); # adding leaf to an edge

julia> writeTopology(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(S4,4a))))#H2),(#H2,S5));"
source
PhyloNetworks.afterOptBL!Method

afterOptBL road map

Function that will check if there are h==0,1;t==0,hz==0,1 cases in a network after calling optBL!.

Arguments:

  • closeN=true will move origin/target, if false, add/delete N times before giving up (we have only tested closeN=true)
  • origin=true will move origin, false will move target. We added this to avoid going back and forth between the same networks
  • movesgamma vector of counts of number of times each move is proposed to fix a gamma zero problem: (add,mvorigin,mvtarget,chdir,delete,nni)

Procedure:

  • First we split the ht vector in nh,nt,nhz (gammas, lengths, gammaz)

  • If we find a h==0,1, we loop through nh to find a hybrid edge with h==0 or 1 and want to try to fix this by doing:

    • gammaZero!(currT,d,edge,closeN,origin,N,movesgamma) which returns true if there was a successful change, and we stop the loop
  • If we find a t==0, we loop through all nt to find such edge, and do NNI move on this edge; return true if change successful and we stop the loop

  • If we find a hz==0,1, we loop through nhz to find such hybrid edge and call gammaZero again

  • If we did a successful change, we run optBL again, and recheck if there are no more problems.

  • Returns successchange, flagh, flagt,flaghz (flag=true means no problems)

  • If it is the multiple alleles case, it will not try to fix h==0,1;hz==0,1 because it can reach a case that violates the multiple alleles condition. If we add a check here, things become horribly slow and inefficient, so we just delete a hybridization that has h==0,1;hz==0,1

** Important: ** afterOptBL is doing only one change, but we need to repeat multiple times to be sure that we fix all the gamma zero problems, which is why we call afterOptBLRepeat

source
PhyloNetworks.afterOptBLAll!Method

afterOptBLAll road map

After optBL, we want to call afterOptBLAll (or afterOptBLAllMultipleAlleles) to check if there are h==0,1; t==0; hz==0,1. This function will try to fix the gamma zero problem, but if it cannot, it will call moveDownLevel, to delete the hybridization from the network.

Procedure:

While startover=true and tries<N

  • While badliks < N2 (number of bad pseudolikelihoods are less than N2)
    • Run success = afterOptBLRepeat
    • If success = true (it changed something):
      • If worse pseudolik, then go back to original topology currT, set startover=true and badliks++
      • If better pseudolik, then check flags. If all good, then startover=false; otherwise startover = true
    • If success = false (nothing changed), then set badliks=N2+1 (to end the while on currT)
      • If all flags are ok, then startover = false
      • If bad h or hz, then call moveDownLevel (delete one hybridization), and set startover = true (maybe deleting that hybridization did not fix other gamma zero problems)
      • If bad t, then set startover = false
  • If left second while by back to original currT, and still bad h/hz, then move down one level, and startover=true; otherwise startover=false

If first while ends by tries>N, then it checks one last time the flags, if bad h/hz will move down one level, and exit

source
PhyloNetworks.afterOptBLRepeat!Method

afterOptBLRepeat road map

afterOptBL is doing only one change, but we need to repeat multiple times to be sure that we fix all the gamma zero problems, which is why we call afterOptBLRepeat. This function will repeat afterOptBL every time a successful change happened; this is done only if closeN=false, because we would delete/add hybridizations and need to stop after tried N times. If closeN=true (default), then afterOptBLRepeat only does one afterOptBL, because in this case, only the neighbor edges need to be tested, and this would have been done already in gammaZero.

source
PhyloNetworks.allowrootbelow!Method
allowrootbelow!(net::HybridNetwork)

Set containRoot to true for each edge below the root node, then traverses net in preorder to update containRoot of all edges (stopping at hybrid nodes): see the other methods. Assumes correct isChild1 edge field.

source
PhyloNetworks.allowrootbelow!Method
allowrootbelow!(e::Edge)
allowrootbelow!(n::Node, parent_edge_of_n::Edge)

Set containRoot to true for edge e and all edges below, recursively. The traversal stops whenever a hybrid node is encountered: if the child of e is a hybrid node (that is, if e is a hybrid edge) or if n is a hybrid node, then the edges below e or n are not traversed.

source
PhyloNetworks.anovaMethod
anova(objs::PhyloNetworkLinearModel...)

Takes several nested fits of the same data, and computes the F statistic for each pair of models.

The fits must be results of function phylolm called on the same data, for models that have more and more effects.

Returns a DataFrame object with the anova table.

source
PhyloNetworks.assignhybridnames!Method
assignhybridnames!(net)

Assign names to hybrid nodes in the network net. Hybrid nodes with an empty name field ("") are modified with a name that does not conflict with other hybrid names in the network. The preferred name is "H3" if the node number is 3 or -3, but an index other than 3 would be used if "H3" were the name of another node already.

If two hybrid nodes have non-empty and equal names, the name of one of them is changed and re-assigned as described above (with a warning).

source
PhyloNetworks.biconnectedcomponent_entrynodesFunction
biconnectedcomponent_entrynodes(net, bcc, preorder=true)

Array containing the entry node of the each biconnected component in bcc. bcc is supposed to contain the biconnected components as output by biconnectedComponents, that is, an array of array of edges.

These entry nodes depend on the rooting (whereas the BCC only depend on the unrooted graph). They are either the root of the network or cut node (articulation points).

source
PhyloNetworks.biconnectedcomponent_exitnodesFunction
biconnectedcomponent_exitnodes(net, bcc, preorder=true)

Array containing an array of the exit node(s) of the each biconnected component in bcc. bcc is supposed to contain the biconnected components as output by biconnectedComponents, that is, an array of array of edges.

These exit nodes depend on the rooting (whereas the BCC only depend on the unrooted graph). The degree of a blob is the number of exit nodes + 1 if the blob doesn't contain the root (its entry node is a cut node), or + 0 if the blob contains the root (which enters into the blob but isn't a cut node).

Warning (or positive side effect?): the edge .inCycle attribute is modified. It stores the index (in bcc) of the biconnected component that an edge belongs to. If an edge doesn't belong in any (e.g. if trivial blobs are ignored), then its .inCycle is set to -1.

source
PhyloNetworks.blobInfoFunction
blobInfo(network, ignoreTrivial=true)

Calculate the biconnected components (blobs) using function biconnectedComponents then:

  • set node field isExtBadTriangle to true at the root of each non-trivial blob (and at the network root), false otherwise. (a better name for the field would be something like "isBlobRoot".)
  • output:
    1. array of nodes that are the roots of each non-trivial blob, and the network root. If the root of the full network is not part of a non-trivial blob, a corresponding blob is added to the list.
    2. array of arrays: for each non-trivial blob, array of major hybrid edges in that blob.
    3. array of arrays: same as #2 but for minor hybrid edges, with hybrids listed in the same order, for each blob.

Blobs are ordered in reverse topological ordering (aka post order). If ignoreTrivial is true, trivial components are ignored.

keyword argument: checkPreorder, true by default. If false, the isChild1 edge field and the net.nodes_changed network field are supposed to be correct.

warning: see biconnectedComponents for node attributes modified during the algorithm.

source
PhyloNetworks.breakedge!Method
breakedge!(edge::Edge, net::HybridNetwork)

Break an edge into 2 edges, each of length half that of original edge, creating a new node of degree 2. Useful to root network along an edge. Return the new node and the new edge, which is the "top" half of the original starting edge. These new node & edge are pushed last in net.node and net.edge.

If the starting edge was:

n1  --edge-->  n2

then we get this:

n1  --newedge-->  newnode  --edge-->  n2

isChild1 and containRoot are updated, but not fields for level-1 networks like inCycle, partition, gammaz, etc.

examples

julia> net = readTopology("(((S8,S9),((((S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));");

julia> length(net.node)
19

julia> net.edge[4] # edge 4 goes from node -8 to 3
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
 number:4
 length:-1.0
 attached to 2 node(s) (parent first): -8 3


julia> newnode, newedge = PhyloNetworks.breakedge!(net.edge[4], net);

julia> length(net.node) # one more than before
20

julia> newedge # new edge 21 goes from node -8 and 11 (new)
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
 number:21
 length:-1.0
 attached to 2 node(s) (parent first): -8 11


julia> net.edge[4] # original edge 4 now goes from node 11 (new) to 3
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
 number:4
 length:-1.0
 attached to 2 node(s) (parent first): 11 3


julia> writeTopology(net) # note extra pair of parentheses around S1
"(((S8,S9),((((S4,(S1)),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"

See also: fuseedgesat!

source
PhyloNetworks.calculateObsCFAll!Method
calculateObsCFAll!(DataCF, taxa::Union{Vector{<:AbstractString}, Vector{Int}})

Calculate observed concordance factors: update the .quartet[i].obsCF values of the DataCF object based on its .tree vector.

calculateObsCFAll!(vector of quartets, vector of trees, taxa)

Calculate observed concordance factors: update the .obsCF values of the quartets, based on the trees, and returns a new DataCF object with these updated quartets and trees.

calculateObsCFAll_noDataCF!(vector of quartets, vector of trees, taxa)

update the .obsCF values of the quartets based on the trees, but returns nothing.

Warning: all these functions need input trees (without any reticulations: h=0).

See also: countquartetsintrees, which uses a faster algorithm, processing each input tree only once. calculateObsCFAll_noDataCF! processes each input tree # quartet times.

source
PhyloNetworks.checkMapDFMethod
checkMapDF(mapping_allele2species::DataFrame)

Check that the data frame has one column named "allele" or "individual", and one column named "species". Output: indices of these column.

source
PhyloNetworks.checkNumHybEdges!Method
checkNumHybEdges!(net)

Check for consistency between hybrid-related attributes in the network:

  • for each hybrid node: 2 or more hybrid edges
  • exception: allows for a leaf to be attached to a single hybrid edge
  • exactly 2 incoming parent hybrid edges

Run after storeHybrids!. See also check2HybEdges.

source
PhyloNetworks.check_matchtaxonnames!Method
check_matchtaxonnames!(species, data, net)

Modify species and dat by removing the species (rows) absent from the network. Return a new network (net is not modified) with tips matching those in species: if some species in net have no data, these species are pruned from the network. The network also has its node names reset, such that leaves have nodes have consecutive numbers starting at 1, with leaves first. Used by fitdiscrete to build a new StatisticalSubstitutionModel.

source
PhyloNetworks.check_nonmissing_nonnegative_edgelengthsFunction
check_nonmissing_nonnegative_edgelengths(net, str="")

Throw an Exception if net has undefined edge lengths (coded as -1.0) or negative edge lengths. The error message indicates the number of the offending edge(s), followed by str.

source
PhyloNetworks.checknetwork_LiNC!Function
checknetwork_LiNC!(net::HybridNetwork, maxhybrid::Int, no3cycle::Bool,
    nohybridladder::Bool,
    constraints=TopologyConstraint[]::Vector{TopologyConstraint},
    verbose::Bool=false)

Check that net is an adequate starting network before phyLiNC!: remove nodes of degree 2 (possibly including the root); check that net meets the topological constraints, has no polytomies (except at species constraints), and maxhybrid of fewer reticulations. According to user-given options, also check for the absence of 3-cycles and/or hybrid ladders.

julia> maxhybrid = 3;

julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> preorder!(net) # for correct unzipping in checknetwork_LiNC!

julia> PhyloNetworks.checknetwork_LiNC!(net, maxhybrid, true, true)
HybridNetwork, Rooted Network
8 edges
8 nodes: 4 tips, 1 hybrid nodes, 3 internal tree nodes.
tip labels: A, B, C, D
((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0,D:2.5);
source
PhyloNetworks.checkspeciesnetwork!Method
checkspeciesnetwork!(network::HybridNetwork,
                     constraints::Vector{TopologyConstraint})

Check that the network satisfies a number of requirements:

  • no polytomies, other than at species constraints: throws an error otherwise
  • no unnecessary nodes: fuse edges at nodes with degree two, including at the root (hence the bang: net may be modified)
  • topology constraints are met.

Output: true if all is good, false if one or more clades are violated.

source
PhyloNetworks.cleantaxonnameMethod
cleantaxonname(taxonname::AbstractString)

Return a String with leading and trailing spaces removed, and interior spaces replaced by underscores: good for using as tip names in a network without causing future error when reading the newick description of the network.

source
PhyloNetworks.constraintviolatedMethod
constraintviolated(network::HybridNetwork,
                   constraints::Vector{TopologyConstraint})

True if network violates one (or more) of the constraints of type 1 (individuals in a species group) or type 2 (must be clades in the major tree). Warning: constraints of type 3 are not implemented.

source
PhyloNetworks.defaultsubstitutionmodelFunction
defaultsubstitutionmodel(network, modsymbol::Symbol, data::DataFrame,
              siteweights::Vector)

Return a statistical substitution model (SSM) with appropriate state labels and a rate appropriate for the branch lengths in net (see startingrate). The data frame must have the actual trait/site data in columns 2 and up, as when the species names are in column 1. For DNA data, the relative rate model is returned, with a stationary distribution equal to the empirical frequencies.

source
PhyloNetworks.deleteEdge!Method
deleteEdge!(net::HybridNetwork,  e::Edge; part=true)
deleteEdge!(net::QuartetNetwork, e::Edge)

Delete edge e from net.edge and update net.numEdges. If part is true, update the network's partition field.

source
PhyloNetworks.deleteLeaf!Method

deleteLeaf!(net::HybridNetwork, leaf::AbstractString) deleteLeaf!(net::Network, leaf::Node)

Deletes the leaf taxon from the network. The leaf argument is the name of the taxon to delete.

Warnings:

  • requires a level-1 network with up-to-date attributes for snaq! (e.g. non-missing branch lengths, gammaz, etc.)
  • does not care where the root is and does not update it to a sensible location if the root is affected by the leaf removal.
  • does not merge edges, i.e. does not remove all nodes of degree 2. Within snaq!, this is used to extract quartets and to keep track of which edge lengths in the original network map to the quartet network.
source
PhyloNetworks.deleteNode!Method
deleteNode!(net::HybridNetwork, n::Node)
deleteNode!(net::QuartetNetwork, n::Node)

Delete node n from a network, i.e. removes it from net.node, and from net.hybrid or net.leaf as appropriate. Update attributes numNodes, numTaxa, numHybrids. Warning: net.names is not updated, and this is a feature (not a bug) for networks of type QuartetNetwork.

Warning: if the root is deleted, the new root is arbitrarily set to the first node in the list. This is intentional to save time because this function is used frequently in snaq!, which handles semi-directed (unrooted) networks.

source
PhyloNetworks.deletehybridedge!Function
deletehybridedge!(net::HybridNetwork, edge::Edge,
                  nofuse=false, unroot=false,
                  multgammas=false, simplify=true, keeporiginalroot=false)

Delete a hybrid edge from net and return the network. The network does not have to be of level 1 and may contain polytomies, although each hybrid node must have exactly 2 parents. Branch lengths are updated, allowing for missing values.

If nofuse is false, when edge is removed, its child (hybrid) node is removed and its partner hybrid edge is removed. Its child edge is retained (below the hybrid node), fused with the former partner, with new length: old length + length of edge's old partner. Any 2-cycle is simplified into a single edge, unless simplify is false.

If nofuse is true, edges with descendant leaves are kept as is, and are not fused. Nodes are retained during edge removal, provided that they have at least one descendant leaf. The hybrid edge that is partner to edge becomes a tree edge, but has its γ value unchanged (it is not set to 1), since it is not merged with its child edge after removal of the reticulation. Also, 2-cycles are not simplified if nofuse is true. That is, if we get 2 hybrid edges both from the same parent to the same child, these hybrid edges are retained without being fused into a single tree edge.

If unroot is false and if the root is up for deletion during the process, it will be kept if it's of degree 2 or more. A root node of degree 1 will be deleted unless keeporiginalroot is true.

If multgammas is true: inheritance weights are kept by multiplying together the inheritance γ's of edges that are merged. For example, if there is a hybrid ladder, the partner hybrid edge remains a hybrid edge (with a new partner), and its γ is the product of the two hybrid edges that have been fused. So it won't add up to 1 with its new partner's γ.

If keeporiginalroot is true, a root of degree one will not be deleted.

Warnings:

  • containRoot is updated, but this requires correct isChild1 fields
  • if the parent of edge is the root and if nofuse is false, the root is moved to keep the network unrooted with a root of degree two.
  • does not update attributes needed for snaq! (like inCycle, edge.z, edge.y etc.)
source
PhyloNetworks.deletehybridedgeLiNC!Method
deletehybridedgeLiNC!(obj::SSM, currLik::Float64,
    no3cycle::Bool,
    constraints::Vector{TopologyConstraint},
    γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)

Deletes a random hybrid edge and updates SSM object as part of PhyLiNC optimization. Return true if the move is accepted, false if not.

Note: if net is tree-child and a hybrid edge is deleted, then the resulting network is still tree-child. But: if net has no hybrid ladders, deleting an existing reticulation may create a hybrid ladder. It happens if there is a reticulation above a W structure (tree node whose children are both hybrid nodes). This creates a problem if the user asked for nohybridladder: this request may not be met. fixit: In future, we could check for this case and prevent it.

For a description of arguments, see phyLiNC. Called by optimizestructure!, which does some checks.

source
PhyloNetworks.descendantsFunction
descendants(edge::Edge, internal::Bool=false)

Return the node numbers of the descendants of a given edge: all descendant nodes if internal is true (internal nodes and tips), or descendant tips only otherwise (defaults).

edge should belong in a rooted network for which isChild1 is up-to-date. Run directEdges! beforehand. This is very important, otherwise one might enter an infinite loop, and the function does not test for this.

Examples

julia> net5 = "(A,((B,#H1),(((C,(E)#H2),(#H2,F)),(D)#H1)));" |> readTopology |> directEdges! ;

julia> PhyloNetworks.descendants(net5.edge[12], true) # descendants of 12th edge: all of them
7-element Vector{Int64}:
 -6
 -7
  4
  6
  5
 -9
  7

julia> PhyloNetworks.descendants(net5.edge[12]) # descendant leaves only
3-element Vector{Int64}:
 4
 5
 7
source
PhyloNetworks.directionalconflictMethod
directionalconflict(parent::Node, edge::Edge, hybridpartnernew::Bool)

Check if creating a hybrid edge down of parent node into the middle of edge would create a directed cycle in net, i.e. not a DAG. The proposed hybrid would go in the direction of edge down its child node if hybridpartnernew is true. Otherwise, both halves of edge would have their direction reversed, for the hybrid to go towards the original parent node of edge. Does not modify the network.

Output: true if a conflict would arise (non-DAG), false if no conflict.

source
PhyloNetworks.discrete_backwardlikelihood_trait!Method
discrete_backwardlikelihood_trait!(obj::SSM, tree::Integer, ri::Integer)

Update and return the backward likelihood (last argument backwardlik) assuming rate category ri and tree index tree, using current forward and backwards likelihoods in obj: these depend on the trait (or site) given to the last call to discrete_corelikelihood_trait!. Used by ancestralStateReconstruction.

warning: assume correct transition probabilities.

source
PhyloNetworks.discrete_corelikelihood!Method
discrete_corelikelihood!(obj::StatisticalSubstitutionModel;
                         whichtrait::AbstractVector{Int} = 1:obj.nsites)

Calculate the likelihood and update obj.loglik for discrete characters on a network, calling discrete_corelikelihood_trait!. The algorithm extracts all displayed trees and weighs the likelihood under all these trees. The object's partial likelihoods are updated:

  • forward and direct partial likelihoods are re-used, one trait at a time,
  • overall likelihoods on each displayed tree, given each rate category and for each given site/trait: are cached in _loglikcache.
source
PhyloNetworks.discrete_corelikelihood_trait!Method
discrete_corelikelihood_trait!(obj::SSM, t::Integer, ci::Integer, ri::Integer)

Return the likelihood for tree t, trait (character/site) index ci and rate category ri. Update & modify the forward & directional log-likelihoods obj.forwardlik and obj.directlik, which are indexed by [state, nodenumber or edgenumber]. Used by discrete_corelikelihood!.

Preconditions: obj.logtrans updated, edges directed, nodes/edges preordered

source
PhyloNetworks.displayedNetworks!Function
displayedNetworks!(net::HybridNetwork, node::Node, keepNode=false,
                   unroot=false, multgammas=false, keeporiginalroot=false)

Extracts the two networks that simplify a given network at a given hybrid node: deleting either one or the other parent hybrid edge. If nofuse is true, the original edges (and nodes) are kept in both networks, provided that they have one or more descendant leaves. If unroot is true, the root will be deleted if it becomes of degree 2. If keeporiginalroot is true, the root is retained even if it is of degree 1.

  • the original network is modified: the minor edge removed.
  • returns one HybridNetwork object: the network with the major edge removed
source
PhyloNetworks.edgerelationMethod
edgerelation(e::Edge, node::Node, origin::Edge)

Return a symbol:

  • :origin if e is equal to origin, and otherwise:
  • :parent if e is a parent of node,
  • :child if e is a child of node

using the isChild1 attribute of edges. Useful when e iterates over all edges adjacent to node and when origin is one of the edges adjacent to node, to known the order in which these edges come.

example:

labs = [edgerelation(e, u, uv) for e in u.edge] # assuming u is a node of edge uv
parentindex = findfirst(isequal(:parent), labs) # could be 'nothing' if no parent
childindices = findall( isequal(:child), labs)  # vector. could be empty
source
PhyloNetworks.fliphybrid!Function
fliphybrid!(net::HybridNetwork, minor=true::Bool, nohybridladder=false::Bool,
            constraints=TopologyConstraint[]::Vector{TopologyConstraint})

Cycle through hybrid nodes in random order until an admissible flip is found. At this hybrid node, flip the indicated hybrid parent edge (minor or major).

If an admissible flip is found, return the tuple: newhybridnode, flippededge, oldchildedge. Otherwise, return nothing.

The flip can be undone with fliphybrid!(net, newhybridnode, minor, constraints), or fliphybrid!(net, newhybridnode, !flippededge.isMajor, constraints) more generally, such as if the flipped edge had its γ modified after the original flip.

Warnings

  • if the root needed to be reset and if the original root was of degree 2, then a node of degree 2 remains in the modified network.
  • undoing the flip may not recover the original root in case the root position was modified during the original flip.
source
PhyloNetworks.fliphybrid!Function
fliphybrid!(net::HybridNetwork, hybridnode::Node, minor=true::Bool,
            nohybridladder=false::Bool,
            constraints=TopologyConstraint[]::Vector{TopologyConstraint})

Flip the direction of a single hybrid edge: the minor parent edge of hybridnode by default, or the major parent edge if minor is false. The parent node of the hybrid edge becomes the new hybrid node. The former hybrid edge partner is converted to a tree edge (with γ=1), and hybridnode becomes a tree node.

For the flip to be admissible, the new network must be a semi-directed phylogenetic network: with a root such that the rooted version is a DAG. If nohybridladder is false (default), the flip may create a hybrid ladder If nohybridladder is true and if the flip would create a hybrid ladder, then the flip is not admissible. A hybrid ladder is when a hybrid child of another hybrid.

The new hybrid partner is an edge adjacent to the new hybrid node, such that the flip is admissible (so it must be a tree edge). The flipped edge retains its original γ. The new hybrid edge is assigned inheritance 1-γ.

Output: (newhybridnode, flippededge, oldchildedge) if the flip is admissible, nothing otherwise.

The network is unchanged if the flip is not admissible. If the flip is admissible, the root position may be modified, and the direction of tree edges (via isChild1) is modified accordingly. If the root needs to be modified, then the new root is set to the old hybrid node.

The index of the new hybrid node in net.hybrid is equal to that of the old hybridnode.

Warning: Undoing this move may not recover the original root if the root position was modified.

source
PhyloNetworks.fliphybridedgeLiNC!Method
fliphybridedgeLiNC!(obj::SSM, currLik::Float64, nohybridladder::Bool,
                constraints::Vector{TopologyConstraint}, ftolAbs::Float64,
                γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)

Randomly chooses a minor hybrid edge and tries to flip its direction (that is, reverse the direction of gene flow) using fliphybrid!. If the flip fails, it looks for the next minor hybrid edge. If all minor edges fail, tries to flip major edges in random order.

After a successful flip, optimize branch lengths and gammas, then compare the likelihood of the previous network with the new one.

Return:

  • true if a flip hybrid move was completed and improved the likelihood
  • false if a move was completed but did not improve the likelihoood
  • nothing if no hybrid flip move was admissible in this network.

Warning: Undoing this move may not recover the original root if the root position was modified.

For arguments, see phyLiNC. Called by optimizestructure!.

source
PhyloNetworks.fuseedgesat!Function
fuseedgesat!(i::Integer,net::HybridNetwork, multgammas=false::Bool)

Removes ith node in net.node, if it is of degree 2. The parent and child edges of this node are fused. If either of the edges is hybrid, the hybrid edge is retained. Otherwise, the edge with the lower edge number is retained.

Reverts the action of breakedge!.

returns the fused edge.

source
PhyloNetworks.gammaZero!Method

gammaZero road map

Function that tries to fix a gamma zero problem (h==0,1; t==0; hz==0,1)

  1. First tries to do changeDirection
  2. If not successful from start, we call moveHybrid
  3. If successful move (change direction), we call optBL and check if we fixed the problem
  4. If problem fixed and we do not have worse pseudolik, we return success=true
  5. If still problem or worse pseudolik, we call moveHybrid

** Important: ** Any function (afterOptBL) calling gammaZero is assuming that it only made a change, so if the returned value is true, then a change was made, and the other function needs to run optBL and check that all parameters are 'valid'. If the returned value is false, then no change was possible and we need to remove a hybridization if the problem is h==0,1; hz==0,1. If the problem is t==0, we ignore this problem.

source
PhyloNetworks.getDataValue!Method
getdataValue!(s::IO, int, numLeft::Array{Int,1})

Helper function for parseEdgeData!. Read a single floating point edge data value in a tree topology. Ignore (and skip) nexus-style comments before & after the value (see readnexuscomment).

Return -1.0 if no value exists before the next colon, return the value as a float otherwise. Modifies s by advancing past the next colon character. Only call this function to read a value when you know a numerical value exists!

source
PhyloNetworks.getGammasMethod
getGammas(net)

Vector of inheritance γ's of all major edges (tree edges and major hybrid edges), ordered according to the pre-order index of their child node, assuming this pre-order is already calculated (with up-to-date field nodes_changed). Here, a "major" edge is an edge with field isMajor set to true, regardless of its actual γ (below, at or above 0.5).

See setGammas!

source
PhyloNetworks.getHeightsFunction
getHeights(net, checkpreorder::Bool=true)

Return the height (distance to the root) of all nodes, assuming a time-consistent network (where all paths from the root to a given hybrid node have the same length) but not necessarily ultrametric: tips need not all be at the same distance from the root. If checkpreorder=false, assumes the network has already been preordered with preorder!, because it uses getGammas and setGammas!).

Output: vector of node heights, one per node, in the same order as in net.nodes_changed. Examples:

julia> net = readTopology("(((C:1,(A:1)#H1:1.5::0.7):1,(#H1:0.3::0.3,E:2.0):2.2):1.0,O:5.2);");

julia> # using PhyloPlots; plot(net, useedgelength=true, showedgelength=true, shownodenumber=true); # to see

julia> nodeheight = PhyloNetworks.getHeights(net)
9-element Vector{Float64}:
 0.0
 5.2
 1.0
 3.2
 5.2
 2.0
 3.5
 4.5
 3.0

julia> [node.number => (nodeheight[i], node.name) for (i,node) in enumerate(net.nodes_changed)]
9-element Vector{Pair{Int64, Tuple{Float64, String}}}:
 -2 => (0.0, "")
  5 => (5.2, "O")
 -3 => (1.0, "")
 -6 => (3.2, "")
  4 => (5.2, "E")
 -4 => (2.0, "")
  3 => (3.5, "H1")
  2 => (4.5, "A")
  1 => (3.0, "C")
source
PhyloNetworks.getTipSubmatrixMethod
getTipSubmatrix(M, net; indexation=:both)

Extract submatrix of M, with rows and/or columns corresponding to tips in the network, ordered like in net.leaf. In M, rows and/or columns are assumed ordered as in net.nodes_changed.

indexation: one of :rows, :cols or :both: are nodes numbers indexed in the matrix by rows, by columns, or both? Subsetting is taken accordingly.

source
PhyloNetworks.getparametersMethod
getparameters(obj::RateVariationAcrossSites)

Return a copy of the alpha and/or pinv parameters of model obj, in a single vector.

source
PhyloNetworks.hardwiredClusterDistance_unrootedMethod
hardwiredClusterDistance_unrooted(net1::HybridNetwork, net2::HybridNetwork)

Miminum hardwired cluster dissimilarity between the two networks, considered as unrooted (or semi-directed). This dissimilarity is defined as the minimum rooted distance, over all root positions that are compatible with the direction of hybrid edges. Called by hardwiredClusterDistance.

To avoid repeating identical clusters, all degree-2 nodes are deleted before starting the comparison. Since rooting the network at a leaf creates a root node of degree 2 and an extra cluster, leaves are excluded from possible rooting positions.

source
PhyloNetworks.hashybridladderMethod
hashybridladder(net::HybridNetwork)

Return true if net contains a hybrid ladder: where a hybrid node's child is itself a hybrid node. This makes the network not treechild, assuming it is fully resolved. (One of the nodes does not have any tree-node child).

source
PhyloNetworks.hybrid3cycleMethod
hybrid3cycle(edge1::Edge, edge2::Edge)

Check if proposed hybrid edge from edge1 into edge2 would create a 3 cycle, that is, if edge1 and edge2 have a node in common. (This move cannot create a 2-cycles because new nodes would be created in the middle of edges 1 and 2.)

source
PhyloNetworks.hybridEdgesMethod
hybridEdges(node::Node, e::Edge)

Return the 2 edges connected to node other than e, in the same order as node.edge, except that e absent from the list.

Despite what the name suggest, node need not be a hybrid node! node is assumed to have 3 edges, though.

source
PhyloNetworks.hybridEdgesMethod
hybridEdges(node::Node)

Return the 3 edges attached to node in a specific order [e1,e2,e3]. Warning: assume a level-1 network with node field hasHybEdge and edge field inCycle up-to-date.

If node is a hybrid node:

  • e1 is the major hybrid parent edge of node
  • e2 is the minor hybrid parent edge
  • e3 is the tree edge, child of node.

If node is a tree node parent of one child edge:

  • e1 is the hybrid edge, child of node
  • e2 is the tree edge that belongs to the cycle created by e1
  • e3 is the other tree edge attached to node (not in a cycle)

Otherwise:

  • e3 is an external edge from node to a leaf, if one exists.
source
PhyloNetworks.hybridatnodeMethod
hybridatnode(net::HybridNetwork, nodeNumber::Integer)

Move the hybrid node in a cycle to make node number nodeNumber a hybrid node Compared to [hybridatnode!], this method checks that net is of level 1 (required) and does not modify it.

source
PhyloNetworks.inheritanceWeightMethod
inheritanceWeight(tree::HybridNetwork)

Return the log inheritance weight of a network or tree (as provided by displayedTrees with nofuse = true for instance). For a tree displayed in a network, its inheritance weight is the log of the product of γ's of all edges retained in the tree. To avoid underflow, the log is calculated: i.e. sum of log(γ) across retained edges.

If any edge has a negative γ, it is assumed to mean that its γ is missing, and the function returns missing.

Example

julia> net = readTopology("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");

julia> trees = displayedTrees(net,0.0; nofuse=true);

julia> PhyloNetworks.inheritanceWeight.(trees)
2-element Vector{Float64}:
 -0.105361
 -2.30259 
source
PhyloNetworks.initializeWeightsFromLeaves!Method
initializeWeightsFromLeaves!(w, net, tips, stateset, criterion)

Modify weight in w: to Inf for w[n, i] if the "tips" data has a state different from the lineage state of index i at node number n. Assumes that w was initialized to 0 for the leaves.

criterion: should be one of :softwired, :parental or :hardwired.

  • softwired parsimony: lineage states are in this order: ∅,{1},{2},{3},...,{nstates}
source
PhyloNetworks.isconnectedMethod
isconnected(node1::Node, node2::Node)

Check if two nodes are connected by an edge. Return true if connected, false if not connected.

source
PhyloNetworks.isdescendant_undirectedMethod
isdescendant_undirected(des:Node, ancestor::Node, parentedge)

Return true if des is a strict descendant of ancestor when starting from edge parentedge and going towards ancestor onward, regardless of the field isChild1 of tree edges; false otherwise.

This is useful to know how descendant relationships would change as a result of reverting the direction of a tree edge, without actually modifying the direction (isChild1) of any edge.

parentedge should be connected to ancestor (not checked). The direction of hybrid edges is respected (via isChild1), that is, the traversal does not go from the child to the parent of a hybrid edge.

source
PhyloNetworks.ladderpartitionMethod
ladderpartition(tree::HybridNetwork)

For each node in tree, calculate the clade below each child edge of the node, and each clade moving up the "ladder" from the node to the root. The output is a tuple of 2 vectors (node) of vector (clade) of vectors (taxon in clade): below,above. More specifically, for node number n, below[n] is generally of vector of 2 clades: one for the left child and one for the right child of the node (unless the node is of degree 2 or is a polytomy). above[n] contains the grade of clades above node number n.

WARNING: assumes that

  1. node numbers and edge numbers can be used as indices, that is, be all distinct, positive, covering exactly 1:#nodes and 1:#edges.
  2. edges are corrected directed (isChild1 is up-to-date) and nodes have been pre-ordered already (field nodes_changed up-to-date).

examples

julia> tree = readTopology("(O,A,((B1,B2),(E,(C,D))));");

julia> PhyloNetworks.resetNodeNumbers!(tree; checkPreorder=true, type=:postorder)

julia> printNodes(tree)
node leaf  hybrid hasHybEdge name inCycle edges'numbers
1    true  false  false      O    -1      1   
2    true  false  false      A    -1      2   
3    true  false  false      B1   -1      3   
4    true  false  false      B2   -1      4   
8    false false  false           -1      3    4    5   
5    true  false  false      E    -1      6   
6    true  false  false      C    -1      7   
7    true  false  false      D    -1      8   
9    false false  false           -1      7    8    9   
10   false false  false           -1      6    9    10  
11   false false  false           -1      5    10   11  
12   false false  false           -1      1    2    11  

julia> below, above = PhyloNetworks.ladderpartition(tree);

julia> below
12-element Vector{Vector{Vector{Int64}}}:
 [[1]]                      
 [[2]]                      
 [[3]]                      
 [[4]]                      
 [[5]]                      
 [[6]]                      
 [[7]]                      
 [[3], [4]]                 
 [[6], [7]]                 
 [[5], [6, 7]]              
 [[3, 4], [5, 6, 7]]        
 [[1], [2], [3, 4, 5, 6, 7]]

julia> for n in 8:12
         println("clades below node ", n, ": ", join(below[n], " "))
       end
clades below node 8: [3] [4]
clades below node 9: [6] [7]
clades below node 10: [5] [6, 7]
clades below node 11: [3, 4] [5, 6, 7]
clades below node 12: [1] [2] [3, 4, 5, 6, 7]

julia> above[8:12] # clades sister to and above nodes 8 through 12:
5-element Vector{Vector{Vector{Int64}}}:
 [[5, 6, 7], [1], [2]]
 [[5], [3, 4], [1], [2]]
 [[3, 4], [1], [2]]     
 [[1], [2]]             
 []                     
source
PhyloNetworks.lambda!Method
lambda!(m::PhyloNetworkLinearModel, newlambda)
lambda!(m::ContinuousTraitEM, newlambda)

Assign a new value to the lambda parameter.

source
PhyloNetworks.lambdaMethod
lambda(m::PhyloNetworkLinearModel)
lambda(m::ContinuousTraitEM)

Value assigned to the lambda parameter, if appropriate.

source
PhyloNetworks.learnlabelsMethod
learnlabels(model::Symbol, dat::DataFrame)

Return unique non-missing values in dat, and check that these labels can be used to construct of substitution model of type model.

examples:

julia> using DataFrames

julia> dat = DataFrame(trait1 = ["A", "C", "A", missing]); # 4×1 DataFrame

julia> PhyloNetworks.learnlabels(:BTSM, dat)
2-element Vector{String}:
 "A"
 "C"

julia> PhyloNetworks.learnlabels(:JC69, dat)
2-element Vector{String}:
 "A"
 "C"
source
PhyloNetworks.leaststableancestorFunction
leaststableancestor(net, preorder=true::Bool)

Return (lsa, lsa_index) where lsa is the least stable ancestor node (LSA) in net, and lsa_index is the index of lsa in net.nodes_changed. The LSA the lowest node n with the following property: any path between any leaf and the root must go through n. All such nodes with this property are ancestral to the LSA (and therefore must have an index that is lower or equal to lsa_index).

Exception: if the network has a single leaf, the output lsa is the leaf's parent node, to maintain one external edge between the root and the leaf.

Warning: uses biconnectedComponents and biconnectedcomponent_exitnodes, therefore share the same caveats regarding the use of fields .inCycle (for edges and nodes), .k (for nodes) etc. As a positivie side effect, the biconnected components can be recovered via the edges' .inCycle field –including the trivial blobs (cut edges).

See also: deleteaboveLSA!

source
PhyloNetworks.majoredgelengthMethod
majoredgelength(net::HybridNetwork)

Generate vector of edge lengths of major net edges organized in the same order as the edge matrix created via majoredgematrix. Considers values of -1.0 as missing values, recognized as NA in R. Output: vector allowing for missing values.

Assume nodes_changed was updated, to list nodes in pre-order.

Examples

julia> net = readTopology("(((A:3.1,(B:0.2)#H1:0.3::0.9),(C,#H1:0.3::0.1):1.1),D:0.7);");

julia> directEdges!(net); preorder!(net);

julia> PhyloNetworks.majoredgelength(net)
8-element Vector{Union{Missing, Float64}}:
  missing
 0.7     
  missing
 1.1     
  missing
 3.1     
 0.3     
 0.2     
source
PhyloNetworks.majoredgematrixMethod
majoredgematrix(net::HybridNetwork)

Matrix of major edges from net where edge[i,1] is the number of the parent node of edge i and edge[i,2] is the number of the child node of edge i. Assume nodes_changed was updated, to list nodes in pre-order.

Examples

julia> net = readTopology("(A,(B,(C,D)));");

julia> PhyloNetworks.resetNodeNumbers!(net);

julia> PhyloNetworks.majoredgematrix(net)
6×2 Matrix{Int64}:
 5  1
 5  6
 6  2
 6  7
 7  3
 7  4
source
PhyloNetworks.makemissing!Method
makemissing!(x::AbstractVector)

Turn to missing any element of x exactly equal to -1.0. Used for branch lengths and γs. x needs to accept missing values. If not, this can be done with allowmissing(x).

source
PhyloNetworks.mapAllelesCFtable!Method
mapAllelesCFtable!(quartet CF DataFrame, mapping DataFrame, columns, write?, filename)

Modify (and return) the quartet concordance factor (CF) DataFrame: replace each allele name by the species name that the allele maps onto based on the mapping data frame. This mapping data frame should have columns named "allele" and "species" (see rename! to change column names if need be).

If write? is true, the modified data frame is written to a file named "filename".

Warning: mapAllelesCFtable takes the quartet data file as its second argument, while mapAllelesCFtable! takes the quartet data (which it modifies) as its first argument.

source
PhyloNetworks.mapindividualsMethod
mapindividuals(net::HybridNetwork, mappingFile::String)

Return a network expanded from net, where species listed in the mapping file are replaced by individuals mapped to that species. If a species has only 1 individual, the name of the leaf for that species is replaced by the name of its one individual representative. If a species has 2 or more individuals, the leaf for that species is expanded into a "star" (polytomy if 3 or more individuals) with a tip for each individual. If a species is in the network but not listed in the mapping file, the tip for that species is left as is. Species listed in the mapping file but not present in the network are ignored.

The mapping file should be readable by CSV.File and contain two columns: one for the species names and one for the individual (or allele) names. fixit: make this function more flexible by accepting column names

Output: individual-level network and vector of species constraint(s).

examples

julia> species_net = readTopology("(((S8,S9),((((S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));");

julia> filename = joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "mappingIndividuals.csv");

julia> filename |> read |> String |> print # to see what the mapping file contains
species,individual
S1,S1A
S1,S1B
S1,S1C

julia> individual_net, species_constraints = PhyloNetworks.mapindividuals(species_net, filename);

julia> writeTopology(individual_net, internallabel=true)
"(((S8,S9),(((((S1A,S1B,S1C)S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"

julia> species_constraints
1-element Vector{PhyloNetworks.TopologyConstraint}:
 Species constraint, on tips: S1A, S1B, S1C
 stem edge number 4
 crown node number 3
source
PhyloNetworks.maxParsimonyNetRun1Function

Road map for various functions behind maxParsimonyNet

maxParsimonyNet
maxParsimonyNetRun1
maxParsimonyNetRun1!

All return their optimized network. Only maxParsimonyNet returns a rooted network (though all functions guarantee that the returned networks agree with the outgroup).

  • maxParsimonyNet calls maxParsimonyNetRun1 per run, after a read(write(.)) of the starting network (to ensure level-1 and semi-directedness).
  • maxParsimonyNetRun1 will make a copy of the topology, and will call findStartingTopology! to modify the topology according to random NNI/move origin/move target moves. It then calls maxParsimonyNetRun1! on the modified network
  • maxParsimonyNetRun1! proposes new network with various moves (same moves as snaq), and stops when it finds the most parsimonious network, using parsimonyGF.

None of these functions allow for multiple alleles yet.

Note that the search algorithm keeps two HybridNetworks at a time: currT (current topology) and newT (proposed topology). Both are kept unrooted (semi-directed), otherwise the moves in proposedTop! function fail. We only root the topologies to calculate the parsimony, so we create a rooted copy (currTr, newTr) to compute parsimony score in this copied topology. We do not root and calculate parsimony score in the original HybridNetworks objects (currT,newT) because the computation of the parsimony score overwrites the inCycle attribute of the Nodes, which messes with the search moves.

Extensions:

  • other criteria: hardwired, parental (only softwired implemented now)
  • remove level-1 restriction: this will involve changing the proposedTop! function to use rSPR or rNNI moves (instead of the level-1 moves coded for snaq!). We need:
    • functions for rSPR and rNNI moves
    • create new proposedTop! function (proposedRootedTop?) to choose from the rSPR/rNNI/other moves
    • have the search in maxParsimonuNetRun1! to have rooted currT and rooted newT, instead of keeping semi-directed objects (currT, newT), only to root for the parsimony score (currTr, newTr)
  • outgroup is currently String or Node number, but it would be good if it allowed Edge number as an option too. Not sure best way to distinguish between Node number and Edge number, which is why left as Node number for now.
source
PhyloNetworks.maxParsimonyNetRun1!Function

Road map for various functions behind maxParsimonyNet

maxParsimonyNet
maxParsimonyNetRun1
maxParsimonyNetRun1!

All return their optimized network. Only maxParsimonyNet returns a rooted network (though all functions guarantee that the returned networks agree with the outgroup).

  • maxParsimonyNet calls maxParsimonyNetRun1 per run, after a read(write(.)) of the starting network (to ensure level-1 and semi-directedness).
  • maxParsimonyNetRun1 will make a copy of the topology, and will call findStartingTopology! to modify the topology according to random NNI/move origin/move target moves. It then calls maxParsimonyNetRun1! on the modified network
  • maxParsimonyNetRun1! proposes new network with various moves (same moves as snaq), and stops when it finds the most parsimonious network, using parsimonyGF.

None of these functions allow for multiple alleles yet.

Note that the search algorithm keeps two HybridNetworks at a time: currT (current topology) and newT (proposed topology). Both are kept unrooted (semi-directed), otherwise the moves in proposedTop! function fail. We only root the topologies to calculate the parsimony, so we create a rooted copy (currTr, newTr) to compute parsimony score in this copied topology. We do not root and calculate parsimony score in the original HybridNetworks objects (currT,newT) because the computation of the parsimony score overwrites the inCycle attribute of the Nodes, which messes with the search moves.

Extensions:

  • other criteria: hardwired, parental (only softwired implemented now)
  • remove level-1 restriction: this will involve changing the proposedTop! function to use rSPR or rNNI moves (instead of the level-1 moves coded for snaq!). We need:
    • functions for rSPR and rNNI moves
    • create new proposedTop! function (proposedRootedTop?) to choose from the rSPR/rNNI/other moves
    • have the search in maxParsimonuNetRun1! to have rooted currT and rooted newT, instead of keeping semi-directed objects (currT, newT), only to root for the parsimony score (currTr, newTr)
  • outgroup is currently String or Node number, but it would be good if it allowed Edge number as an option too. Not sure best way to distinguish between Node number and Edge number, which is why left as Node number for now.
source
PhyloNetworks.minorreticulationgammaMethod
minorreticulationgamma(net::HybridNetwork)

Vector of minor edge gammas (inheritance probabilities) organized in the same order as in the matrix created via minorreticulationmatrix. Considers values of -1.0 as missing values, recognized as NA in R. Output: vector allowing for missing values.

Examples

julia> net = readTopology("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");

julia> PhyloNetworks.minorreticulationgamma(net)
1-element Vector{Union{Float64, Missings.Missing}}:
 0.1
source
PhyloNetworks.minorreticulationlengthMethod
minorreticulationlength(net::HybridNetwork)

Vector of lengths for the minor hybrid edges, organized in the same order as in the matrix created via minorreticulationmatrix. Replace values of -1.0 with missing values recognized by R. Output: vector allowing for missing values.

Examples

julia> net = readTopology("(((A:3.1,(B:0.2)#H1:0.4::0.9),(C,#H1:0.3::0.1):1.1),D:0.7);");

julia> PhyloNetworks.minorreticulationlength(net)
1-element Vector{Union{Missing, Float64}}:
 0.3
source
PhyloNetworks.minorreticulationmatrixMethod
minorreticulationmatrix(net::HybridNetwork)

Matrix of integers, representing the minor hybrid edges in net. edge[i,1] is the number of the parent node of the ith minor hybrid edge, and edge[i,2] is the number of its child node. Node numbers may be negative, unless they were modified by resetNodeNumbers!. Assumes correct isChild1 fields.

Examples

julia> net = readTopology("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> PhyloNetworks.minorreticulationmatrix(net)
1×2 Matrix{Int64}:
 -6  3
source
PhyloNetworks.moveHybrid!Method

moveHybrid road map

Function that tries to fix a gamma zero problem (h==0,1; t==0; hz==0,1) after changing direction of hybrid edge failed. This function is called in gammaZero.

Arguments:

  • closeN=true will try move origin/target on all neighbors (first choose minor/major edge at random, then make list of all neighbor edges and tries to put the hybrid node in all the neighbors until successful move); if false, will delete and add hybrid until successful move up to N times (this is never tested)

Returns true if change was successful (not testing optBL again), and false if we could not move anything

source
PhyloNetworks.moveroot!Function
moveroot!(net::HybridNetwork, constraints=TopologyConstraint[]::Vector{TopologyConstraint})

Move the root to a randomly chosen non-leaf node that is different from the current root, and not within a constraint clade or species. Output: true if successul, nothing otherwise.

source
PhyloNetworks.nameinternalnodes!Method
nameinternalnodes!(net::HybridNetwork, prefix)

Add names to nodes in net that don't already have a name. Leaves already have names; but if not, they will be given names as well. New node names will be of the form "prefixI" where I is an integer.

examples

julia> net = readTopology("((a:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);");

julia> PhyloNetworks.nameinternalnodes!(net, "I") # by default, shown without internal node names
HybridNetwork, Rooted Network
7 edges
7 nodes: 3 tips, 1 hybrid nodes, 3 internal tree nodes.
tip labels: a, b, c
((a:1.0,(b:1.0)#H1:1.0::0.8)I1:5.0,(#H1:0.0::0.2,c:1.0)I2:1.0)I3;

julia> writeTopology(net; internallabel=false) # by default, writeTopology shows internal names if they exist
"((a:1.0,(b:1.0)#H1:1.0::0.8):5.0,(#H1:0.0::0.2,c:1.0):1.0);"

julia> net = readTopology("((int5:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);"); # one taxon name starts with "int"

julia> PhyloNetworks.nameinternalnodes!(net, "int");

julia> writeTopology(net)
"((int5:1.0,(b:1.0)#H1:1.0::0.8)int6:5.0,(#H1:0.0::0.2,c:1.0)int7:1.0)int8;"
source
PhyloNetworks.nchoose1234Method
nchoose1234(nmax)

nmax+1 x 4 matrix containing the binomial coefficient "n choose k" in row n+1 and column k. In other words, M[i,k] gives "i-1 choose k". It is useful to store these values and look them up to rank (a large number of) 4-taxon sets: see quartetrank.

source
PhyloNetworks.nj!Function
nj!(D::Matrix{Float64}, names::AbstractVector{<:AbstractString}=String[];
    force_nonnegative_edges::Bool=false)

Construct a phylogenetic tree from the input distance matrix and vector of names (as strings), using the Neighbour-Joinging algorithm (Satou & Nei 1987). The order of the names argument should match that of the row (and column) of the matrix D. With force_nonnegative_edges being true, any negative edge length is changed to 0.0 (with a message).

Warning: D is modified.

source
PhyloNetworks.nni_LiNC!Method
nni_LiNC!(obj::SSM, no3cycle::Bool, nohybridladder::Bool,
          constraints::Vector{TopologyConstraint},
          ftolAbs::Float64,
          γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)

Loop over possible edges for a nearest-neighbor interchange move until one is found. Performs move and compares the original and modified likelihoods. If the modified likelihood is greater than the original by likAbs, the move is accepted.

Return true if move accepted, false if move rejected. Return nothing if there are no nni moves possible in the network.

For arguments, see phyLiNC.

Called by optimizestructure!, which is called by phyLiNC!.

Note: an RR move does not change the best likelihood. RR means that there's a hybrid ladder, so it looks like a hard polytomy at the reticulation after unzipping. Theoretically, we could avoid the re-optimizing the likelihood accept the move: just change inheritance values to get same likelihood, and update the tree priors. Not done.

source
PhyloNetworks.nnimaxMethod
nnimax(e::Edge)

Return the number of NNI moves around edge e, assuming that the network is semi-directed. Return 0 if e is not internal, and more generally if either node attached to e does not have 3 edges.

Output: UInt8 (e.g. 0x02)

source
PhyloNetworks.norootbelow!Method
norootbelow!(e::Edge)

Set containRoot to false for edge e and all edges below, recursively. The traversal stops if e.containRoot is already false, assuming that containRoot is already false all the way below down that edge.

source
PhyloNetworks.optBL!Method

optBL road map

Function that optimizes the numerical parameters (branch lengths and inheritance probabilities) for a given network. This function is called multiple times inside optTopLevel!.

  • Input: network net, data d
  • Numerical tolerances: ftolAbs, ftolRel, xtolAbs, xtolRel
  • Function based on MixedModels fit function
  • The function assumes net has all the right attributes, and cannot check this inside because it would be inefficient

Procedure:

  • ht = parameters!(net) extracts the vector of parameters to estimate (h,t,gammaz), and sets as net.ht; identifies a bad diamond I, sets net.numht (vector of hybrid node numbers for h, edge numbers for t, hybrid node numbers for gammaz), and net.index to keep track of the vector of parameters to estimate
  • extractQuartet!(net,d) does the following for all quartets in d.quartet:
    • Extract quartet by deleting all leaves not in q -> create QuartetNetwork object saved in q.qnet
    • This network is ugly and does not have edges collapsed. This is done to keep a one-to-one correspondence between the edges in q.qnet and the edges in net (if we remove nodes with only two edges, we will lose this correspondence)
    • Calculate expected CF with calculateExpCFAll for a copy of q.qnet. We do this copy because we want to keep q.qnet as it is (without collapsed edges into one). The function will then save the expCF in q.qnet.expCF
  • calculateExpCFAll!(qnet) will
    • identify the type of quartet as type 1 (equivalent to a tree) or type 2 (minor CF different). Here the code will first clean up any hybrid node by removing nodes with only two edges before identifying the qnet (because identification depends on neighbor nodes to hybrid node); later, set qnet.which (1 or 2), node.prev (neighbor node to hybrid node), updates node.k (number of nodes in hybridization cycle, this can change after deleting the nodes with only two edges), node.typeHyb (1,2,3,4,5 depending on the number of nodes in the hybridization cycle and the origin/target of the minor hybrid edge; this attribute is never used).
    • eliminate hybridization: this will remove type 1 hybridizations first. If qnet.which=1, then the qnet is similar to a tree quartet, so it will calculate the internal length of the tree quartet: qnet.t1.
    • update split for qnet.which=1, to determine which taxa are together. For example, for the quartet 12|34, the split is [1,1,2,2] or [2,2,1,1], that is, taxon 1 and 2 are on the same side of the split. This will update qnet.split
    • update formula for qnet.which=1 to know the order of minorCF and majorCF in the vector qnet.expCF. That is, if the quartet is 1342 (order in qnet.quartet.taxon), then the expected CF should match the observed CF in 13|42, 14|32, 12|34 and the qnet is 12|34 (given by qnet.split), qnet.formula will be [2,2,1] minor, minor, major
    • calculateExpCF!(qnet) for qnet.which=1, it will do 1-2/3exp(-qnet.t1) if qnet.formula[i]==1, and 1/3exp(qnet.t1) if qnet.formula[i]==2. For qnet.which=2, we need to make sure that there is only one hybrid node, and compute the major, minor1,minor2 expected CF in the order 12|34, 13|24, 14|23 of the taxa in qnet.quartet.taxon

Then we create a NLopt object with algorithm BOBYQA and k parameters (length of ht). We define upper and lower bounds and define the objective function that should only depend on x=(h,t,gz) and g (gradient, which we do not have, but still need to put as argument).

The objective function obj(x,g) calls

  • calculateExpCFAll!(d,x,net) needs to be run after extractQuartet(net,d) that will update q.qnet for all quartet. Assumes that qnet.indexht is updated already: we only need to do this at the beginning of optBL! because the topology is fixed at this point)
    • First it will update the edge lengths according to x
    • If the q.qnet.changed=true (that is, any of qnet branches changed value), we need to call calculateExpCFAll!(qnet) on a copy of q.qnet (again because we want to leave q.qnet with the edge correspondence to net)
  • update!(net,x) simply saves the new x in net.ht

Finally, we call NLopt.optimize, and we update the net.loglik and net.ht at the end. After optBL, we want to call afterOptBLAll (or afterOptBLAllMultipleAlleles) to check if there are h==0,1; t==0; hz==0,1.

source
PhyloNetworks.optTopLevel!Method

optTopLevel road map

Function that does most of the heavy-lifting of snaq. It optimizes the pseudolikelihood for a given starting topology, and returns the best network. Assumes that the starting topology is level-1 network, and has all the attributes correctly updated.

Input parameters:

  • Starting topology currT, input data DataCF d, maximum number of hybridizations hmax
  • Numerical optimization parameters: liktolAbs, Nfail, ftolRel, ftolAbs, xtolRel, xtolAbs
  • Print parameters: verbose, logfile, writelog
  • Parameters to tune the search in space of networks: closeN=true only propose move origin/target to neighbor edges (coded, but not tested with closeN=false), Nmov0 vector with maximum number of trials allowed per type of move (add, mvorigin, mvtarget, chdir, delete, nni), by default computed inside with coupon’s collector formulas

The optimization procedure keeps track of

  • movescount: count of proposed moves,
  • movesgamma: count of proposed moves to fix a gamma zero situation (see below for definition of this situation),
  • movesfail: count of failed moves by violation of level-1 network (inCycle attribute) or worse pseudolikelihood than current,
  • failures: number of failed proposals that had a worse pseudolikelihood

Optimization procedure:

While the difference between current loglik and proposed loglik is greater than liktolAbs, or failures<Nfail, or stillmoves=true:

  • Nmov is updated based on newT. The type of move proposed will depend on newT (which is the same as currT at this point). For example, if currT is a tree, we cannot propose move origin/target.

  • move = whichMove selects randomly a type of move, depending on Nmov,movesfail,hmax,newT with weights 1/5 by default for all, and 0 for delete. These weights are adjusted depending on newT.numHybrids and hmax. If newT.numHybrids is far from hmax, we give higher probability to adding a new hybrid (we want to reach the hmax sooner, maybe not the best strategy, easy to change). Later, we adjust the weights by movesfail (first, give weight of 0 if movesfail[i]>Nmov[i], that is, if we reached the maximum possible number of moves allowed for a certain type) and then increase the probability of the other moves. So, unless one move has w=0, nothing changes. This could be improved by using the outlier quartets to guide the proposal of moves.

  • whichMove will choose a move randomly from the weights, it will return none if no more moves allowed, in which case, the optimization ends

  • flag=proposedTop!(move, newT) will modify newT based on move. The function proposedTop will return flag=true if the move was successful (the move succeeded by inCycle, containRoot, available edge to make the move (more details in proposedTop)). If flag=false, then newT is cleaned, except for the case of multiple alleles. The function proposedTop keeps count of movescount (successful move), movesfail (unsuccessful move),

    Options:

    random=true: moves major/minor hybrid edge with prob h,1-h, respectively

    N=10: number of trials for NNI edge.

  • if(flag) Optimize branch lengths with optBL

    If newT.loglik is better than currT.loglik by liktolAbs, jump to newT (accepted=true) and fix gamma=0, t=0 problems (more info on afterOptBL)

    If(accepted) failures=0, movesfail=zeros, movescount for successful move +1

end while

After choosing the best network newT, we do one last more thorough optimization of branch lengths with optBL, we change non identifiable branch lengths to -1 (only in debug mode) and return newT

source
PhyloNetworks.optTopRun1!Method
optTopRun1!(net, liktolAbs, Nfail, d::DataCF, hmax, etc.)

The function will run 1 run by modifying the starting topology and calling optTopLevel. See optTopRuns! for a roadmap.

probST (default in snaq is 0.3) is the probability of starting one run at the same input tree. So, with probability 1-probST, we will change the topology by a NNI move on a tree edge without neighbor hybrid. If the starting topology is a network, then with probability 1-probST it will also modify one randomly chosen hybrid edge: with prob 0.5, the function will move origin, with prob 0.5 will do move target.

If there are multiple alleles (d.repSpecies not empty), then the function has to check that the starting topology does not violate the multiple alleles condition.

After modifying the starting topology with NNI and/or move origin/target, optTopLevel is called.

source
PhyloNetworks.optTopRuns!Method

Road map for various functions behind snaq!

snaq!
optTopRuns!
optTopRun1!
optTopLevel!
optBL!

All return their optimized network.

  • snaq! calls optTopRuns! once, after a deep copy of the starting network. If the data contain multiple alleles from a given species, snaq! first expands the leaf for that species into 2 separate leaves, and merges them back into a single leaf after calling optTopRuns!.
  • optTopRuns! calls optTopRun1! several (nrun) times. assumes level-1 network with >0 branch lengths. assumes same tips in network as in data: i.e. 2 separate tips per species that has multiple alleles. each call to optTopRun1! gets the same starting network.
  • optTopRun1! calls optTopLevel! once, after deep copying + changing the starting network slightly.
  • optTopLevel! calls optBL! various times and proposes new network with various moves.
source
PhyloNetworks.optimizeallgammas_LiNC!Method
optimizeallgammas_LiNC!(obj::SSM, ftolAbs::Float64,
                        γcache::CacheGammaLiNC, maxeval=1000::Int)

Optimize all γ's in a network, one by one using [optimizegamma_LiNC!] until maxeval γ's have been optimized or until the difference in log-likelihood falls below ftolAbs.

At the end: hybrid edges with γ=0 are deleted (if any).

Output: true if reticulations have been deleted, false otherwise. If true, updateSSM! needs to be called afterwards, with constraints if any. (Constraints are not known here). Before updating the displayed trees in the SSM, shrink2cycles! or shrink3cycles! could be called, if desired, despite the (slight?) change in likelihood that this shrinking would cause. 2/3-cycles are not shrunk here.

source
PhyloNetworks.optimizegamma_LiNC!Function
optimizegamma_LiNC!(obj::SSM, focusedge::Edge,
                ftolAbs::Float64, cache::CacheGammaLiNC, maxNR=10::Int)

Optimize γ on a single hybrid edge using the Newton-Raphson method (the log-likelihood is concave). The new log-likelihood is returned after updating obj and its fields with the new γ.

The search stops if the absolute difference in log-likelihood between two consecutive iterations is below ftolAbs, or after maxNR Newton-Raphson iterations.

Warnings:

  • no check that edge is hybrid
  • obj._loglikcache and displayed trees (etc.) are assumed to be up-to-date, and is updated alongside the new γ.

Used by optimizelocalgammas_LiNC! and optimizeallgammas_LiNC!.

source
PhyloNetworks.optimizelength_LiNC!Method
optimizelength_LiNC!(obj::SSM, edge::Edge, cache::CacheLengthLiNC, Qmatrix)

Optimize the length of a single edge using a gradient method, if edge is not below a reticulation (the unzipped canonical version should have a length of 0 below reticulations). Output: nothing.

Warning: displayed trees are assumed up-to-date, with nodes preordered

source
PhyloNetworks.optimizelocalBL_LiNC!Method
optimizelocalBL_LiNC!(obj::SSM, edge::Edge, lcache::CacheLengthLiNC)

Optimize branch lengths in net locally around edge. Update all edges that share a node with edge (including itself). Constrains branch lengths to zero below hybrid nodes. Return vector of updated edges (excluding constrained edges). The tolerance values and parameters controlling the search are set in lcache.

Used after nni! or addhybridedge! moves to update local branch lengths. See phyLiNC!.

Assumptions:

  • displayed trees in obj are up-to-date with nodes preordered, and obj.priorltw are up-to-date.
  • branch lengths are at or above the lower bound, and definitely non-negative.
julia> net = readTopology("(((A:2.0,(B:0.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> fastafile = abspath(joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "simple.aln"));

julia> obj = PhyloNetworks.StatisticalSubstitutionModel(net, fastafile, :JC69);

julia> obj.loglik = -Inf64; # not calculated yet

julia> e = obj.net.edge[4];

julia> e.length
1.5

julia> using Random; Random.seed!(1234);

julia> PhyloNetworks.optimizelocalBL_LiNC!(obj, e, PhyloNetworks.CacheLengthLiNC(obj, 1e-6,1e-6,1e-2,1e-2, 10));

julia> round(e.length, sigdigits=6) # we get the lower bound from PhyLiNC in this case
1.0e-8

julia> writeTopology(obj.net; round=true)
"(((A:0.338,(B:0.0)#H1:0.04::0.9):0.0,(C:0.6,#H1:1.0::0.1):0.0):0.0,D:2.0);"
source
PhyloNetworks.optimizelocalgammas_LiNC!Function
optimizelocalgammas_LiNC!(obj::SSM, edge::Edge,
                          ftolAbs::Float64, γcache::CacheGammaLiNC,
                          maxeval=1000::Int)

Optimize γ's in net locally around edge. Update all edges adjacent to edge (including itself), one by one using [optimizegamma_LiNC!] until maxeval γ's have been optimized or until the difference in log-likelihood falls below ftolAbs. nothing is returned.

Used after nni! or addhybridedge! moves to update local gammas.

Assumptions:

  • correct isChild1 field for edge and for hybrid edges
  • no in-coming polytomy: a node has 0, 1 or 2 parents, no more

```jldoctest julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> fastafile = abspath(joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "simple.aln"));

julia> obj = PhyloNetworks.StatisticalSubstitutionModel(net, fastafile, :JC69);

julia> obj.net.edge[3].gamma 0.9

julia> using Random; Random.seed!(1234);

julia> PhyloNetworks.optimizelocalgammas_LiNC!(obj, obj.net.edge[3], 1e-3, PhyloNetworks.CacheGammaLiNC(obj));

julia> obj.net.edge[3].gamma 0.0 ````

source
PhyloNetworks.optimizestructure!Method
optimizestructure!(obj::SSM, maxmoves::Integer, maxhybrid::Integer,
                   no3cycle::Bool, nohybridladder::Bool,
                   nreject::Integer, nrejectmax::Integer,
                   constraints::Vector{TopologyConstraint},
                   ftolAbs::Float64,
                   γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)

Alternate NNI moves, hybrid additions / deletions, and root changes based on their respective weights in moveweights_LiNC ([0.4, 0.2, 0.2, 0.2]). Branch lengths and hybrid γs around the NNI focal edge or around the added / deleted hybrid edge are optimized (roughly) on the proposed network. Each proposed network is accepted –or not– if the likelihood improves (or doesn't decrease much for hybrid deletion). After adding or removing a hybrid, obj is updated, to have correct displayed trees and node/edge numberings: done by nni_LiNC!, addhybridedgeLiNC! and deletehybridedgeLiNC!.

Output: number of consecutive rejections so far.

The percent of nni moves, hybrid additions, hybrid deletions, and root changes to be performed is in PhyloNetworks.moveweights_LiNC.

  • maxmoves: maximum number of moves to be performed, including root changes, which don't actually change the semi-directed topology and likelihood.
  • nreject: number of consecutive rejections (ignoring root changes), prior to starting the search (from a prior function call)
  • nrejectmax: the search stops when there has been this number of moves that have been rejected in a row (ignoring root changes)

For a description of other arguments, see phyLiNC.

Assumptions:

  • checknetworkbeforeLiNC and discrete_corelikelihood! have been called on obj.net.
  • starting with a network without 2- and 3- cycles (checked by checknetworkbeforeLiNC)

Note: When removing a hybrid edge, always removes the minor edge.

source
PhyloNetworks.pairwiseTaxonDistanceGradMethod
pairwiseTaxonDistanceGrad(net; checkEdgeNumber=true, nodeAges=[])

3-dim array: gradient of pairwise distances between all nodes. (internal and leaves); gradient with respect to edge lengths if nodeAges is empty; with respect to node ages otherwise. Assume correct net.nodes_changed (preorder). This gradient depends on the network's topology and γ's only, not on branch lengths or node ages (distances are linear in either).

WARNING: edge numbers need to range between 1 and #edges.

source
PhyloNetworks.parseEdgeData!Method
parseEdgeData!(s::IO, edge, numberOfLeftParentheses::Array{Int,1})

Helper function for readSubtree!. Modifies e according to the specified edge length and gamma values in the tree topology. Advances the stream s past any existing edge data. Edges in a topology may optionally be followed by ":edgeLen:bootstrap:gamma" where edgeLen, bootstrap, and gamma are decimal values. Nexus-style comments [&...], if any, are ignored.

source
PhyloNetworks.parseHybridNode!Method
parseHybridNode!(node, parentNode, hybridName, net, hybrids)

Helper function for readSubtree!. Create the parent edge for node. Return this edge, and the hybrid node retained (node or its clone in the newick string). Insert new edge and appropriate node into net and hybrids accordingly. Handles any type of given hybrid node. Called after a # has been found in a tree topology.

source
PhyloNetworks.parseRemainingSubtree!Method
parseRemainingSubtree!(s::IO, numLeft, net, hybrids)

Create internal node. Helper for readSubtree!, which creates the parent edge of the node created by parseRemainingSubtree!: readSubtree! calls parseRemainingSubtree!, and vice versa. Called once a ( has been read in a tree topology and reads until the corresponding ) has been found. This function performs the recursive step for readSubtree!. Advances s past the subtree, adds discovered nodes and edges to net, and hybrids.

Does not read the node name and the edge information of the subtree root: this is done by readSubtree!

source
PhyloNetworks.parseTreeNode!Method
parseTreeNode!(node, parentNode, net)

Helper function for readSubtree!. Insert the input tree node and associated edge (created here) into net.

source
PhyloNetworks.parsimonyBottomUpFitch!Method
parsimonyBottomUpFitch!(node, states, score)

Bottom-up phase (from tips to root) of the Fitch algorithm: assign sets of character states to internal nodes based on character states at tips. Polytomies okay. Assumes a tree (no reticulation) and correct isChild1 attribute.

output: dictionary with state sets and most parsimonious score

source
PhyloNetworks.parsimonyBottomUpGF!Method
parsimonyBottomUpGF!(node, blobroot, nchar, w, scores,
                     costmatrix1, costmatrix2)

Compute the MP scores (one for each assignment of the blob root state) given the descendants of a blob, conditional on the states at predefined parents of hybrids in the blobs (one parent per hybrid) as described in

Leo Van Iersel, Mark Jones, Celine Scornavacca (2017). Improved Maximum Parsimony Models for Phylogenetic Networks, Systematic Biology, (https://doi.org/10.1093/sysbio/syx094).

Assumes a set of state guesses, ie correct initialization of w for predefined hybrid parents, and correct fromBadDiamondI field for the children edges of these predefined parents. fromBadDiamondI is true for edges that are cut.

The field isExtBadTriangle is used to know which nodes are at the root of a blob. The field isChild1 is used (and assumed correct). Field inCycle is assumed to store the # of detached parents (with guessed states)

  • nchar: number of characters considered at internal lineages. For softwired parsimony, this is # states + 1, because characters at internal nodes are ∅, {1}, {2}, etc. For parental parsimony, this is 2^#states -1, because characters are all sets on {1,2,...} except for the empty set ∅.
  • costmatrix1[i,j] and costmatrix2[k][i,j]: 2d array and vector of 2d arrays containing the cost of going to character j starting from character i when the end node has a single parent, or the cost of a child node having character j when its parents have characters k and i. These cost matrices are pre-computed depending on the parsimony criterion (softwired, hardwired, parental etc.)

used by parsimonyGF.

source
PhyloNetworks.parsimonyBottomUpSoftwired!Method
parsimonyBottomUpSoftwired!(node, blobroot, states, w, scores)

Computing the MP scores (one for each assignment of the root state) of a swicthing as described in Algorithm 1 in the following paper:

Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C. (2015). On computing the Maximum Parsimony score of a phylogenetic network. SIAM J. Discrete Math., 29(1):559-585.

Assumes a switching (ie correct fromBadDiamondI field) and correct isChild1 field. The field isExtBadTriangle is used to know which nodes are at the root of a blob.

source
PhyloNetworks.parsimonyDiscreteFitchMethod
parsimonyDiscreteFitch(net, tipdata)

Calculate the most parsimonious (MP) score of a network given a discrete character at the tips. The softwired parsimony concept is used: where the number of state transitions is minimized over all trees displayed in the network. Tip data can be given in a data frame, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait values are to appear in column 2 or in a column named "trait". Alternatively, tip data can be given as a dictionary taxon => trait.

also return the union of all optimized character states at each internal node as obtained by Fitch algorithm, where the union is taken over displayed trees with the MP score.

source
PhyloNetworks.parsimonyTopDownFitch!Method
parsimonyTopDownFitch!(node, states)

Top-down phase (root to tips) of the Fitch algorithm: constrains character states at internal nodes based on the state of the root. Assumes a tree: no reticulation.

output: dictionary with state sets

source
PhyloNetworks.phyLiNCFunction
phyLiNC(net::HybridNetwork, fastafile::String, substitutionModel::Symbol)

Estimate a phylogenetic network from concatenated DNA data using maximum likelihood, ignoring incomplete lineage sorting (phyLiNC: phylogenetic Likelihood Network from Concatenated data). The network is constrained to have maxhybrid reticulations at most, but can be of any level. The search starts at (or near) the network net, using a local hill-climbing search to optimize the topology (nearest-neighbor interchange moves, add hybridizations, and remove hybridizations). Also optimized are evolutionary rates, amount of rate variation across sites, branch lengths and inheritance γs. This search strategy is run nruns times, and the best of the nruns networks is returned.

Return a StatisticalSubstitutionModel object, say obj, which contains the estimated network in obj.net.

Required arguments:

  • net: a network or tree of type HybridNetwork, to serve as a starting point in the search for the best network. Newick strings can be converted to this format with readTopology.
  • fastafile: file with the sequence data in FASTA format. Ambiguous states are treated as missing.
  • substitutionModel: A symbol indicating which substitution model is used. Choose :JC69 JC69 for the Jukes-Cantor model or :HKY85 for the Hasegawa, Kishino, Yano model HKY85.

The length of the edge below a reticulation is not identifiable. Therefore, phyLiNC estimates the canonical version of the network: with reticulations unzipped: edges below reticulations are set to 0, and hybrid edges (parental lineages) have estimated lengths that are increased accordingly.

If any branch lengths are missing in the input network, phyLiNC estimates the starting branch lengths using pairwise distances. Otherwise, it uses the input branch lengths as starting branch lengths, only unzipping all reticulations, as described above.

Optional arguments (default value in parenthesis):

  • symbol for the model of rate variation across sites (:noRV for no rate variation): use :G or :Gamma for Gamma-distributed rates, :I or :Inv for a proportion of invariable sites, and :GI or :GammaInv for a combination (not recommended).
  • integer (4) for the number of categories to use in estimating evolutionary rates using a discretized gamma model. When allowing for rate variation, four categories is standard. With 1 category, no rate variation is assumed. See RateVariationAcrossSites.

Main optional keyword arguments (default value in parenthesis):

  • speciesfile (""): path to a csv file with samples in rows and two columns: species (column 1), individual (column 2) Include this file to group individuals by species.
  • cladefile (""): path to a csv file containing two columns: clades and individuals used to create one or more clade topology constraints to meet during the search. (NOTE: clade contraints not yet implemented.)
  • filename ("phyLiNC"): root name for the output files (.out, .err). If empty (""), files are not created, progress log goes to the screen only (standard out).
  • maxhybrid (1): maximum number of hybridizations allowed. net (starting network) must have maxhybrid or fewer reticulations.
  • nruns (10): number of independent starting points for the search
  • no3cycle (true): prevents 3-cycles, which are (almost) not identifiable
  • nohybridladder (true): prevents hybrid ladder in network. If true, the input network must not have hybrid ladders. Warning: Setting this to true will avoid most hybrid ladders, but some can still occur in some cases when deleting hybrid edges. This might be replaced with an option to avoid all non-tree-child networks in the future.

Optional arguments controlling the search:

  • seed (default 0 to get it from the clock): seed to replicate a given search
  • nreject (75): maximum number of times that new topologies are proposed and rejected in a row. Lower values of nreject result in a less thorough but faster search. Controls when to stop proposing new network topologies.
  • probST (0.5): probability to use net as the starting topology for each given run. If probST < 1, the starting topology is k NNI moves away from net, where k is drawn from a geometric distribution: p (1-p)ᵏ, with success probability p = probST.
  • maxmoves (100): maximum number of topology moves before branch lengths, hybrid γ values, evolutionary rates, and rate variation parameters are reestimated.
  • verbose (true): set to false to turn off screen output
  • alphamin (0.02) and alphamax (50.0): minimum and maximum values for the shape parameter alpha for Gamma-distributed rates across sites.
  • pinvmin (1e-8) and pinvmax (0.99): minimum and maximum values for the proportion of invariable sites, if included in the model.

The following optional arguments control when to stop the optimization of branch lengths and gamma values on each individual candidate network. Defaults in parentheses.

  • ftolRel (1e-6) and ftolAbs (1e-6): relative and absolute differences of the network score between the current and proposed parameters
  • xtolRel (1e-5) and xtolAbs (1e-5): relative and absolute differences between the current and proposed parameters.

Greater values will result in a less thorough but faster search. These parameters are used when evaluating candidate networks only. Regardless of these arguments, once a final topology is chosen, branch lenghts are optimized using stricter tolerances (1e-10, 1e-12, 1e-10, 1e-10) for better estimates.

source
PhyloNetworks.phyLiNC!Method
phyLiNC!(obj::SSM; kwargs...)

Called by phyLiNC after obj is created (containing both the data and the model) and after checks are made to start from a network that satisfies all the constraints. Different runs are distributed to different processors, if more than one are available.

source
PhyloNetworks.phyLiNCone!Method
phyLiNCone!(obj::SSM, maxhybrid::Int, no3cycle::Bool,
            nohybridladder::Bool, maxmoves::Int, nrejectmax::Int,
            verbose::Bool, writelog_1proc::Bool, logfile::IO,
            seed::Int, probST::Float64,
            constraints::Vector{TopologyConstraint},
            ftolRel::Float64, ftolAbs::Float64,
            xtolRel::Float64, xtolAbs::Float64,
            alphamin::Float64, alphamax::Float64,
            pinvmin::Float64, pinvmax::Float64,
            γcache::CacheGammaLiNC)

Estimate one phylogenetic network (or tree) from concatenated DNA data, like phyLiNC!, but doing one run only, and taking as input an StatisticalSubstitutionModel object obj. The starting network is obj.net and is assumed to meet all the requirements.

writelog_1proc is passed by phyLiNC! an indicates if a log should be written. If the number of processors is > 1, this will be false because workers can't write on streams opened by master. logfile will be stdout if writelog_1proc is false. Otherwise, it will be the log file created by phyLiNC!.

See phyLiNC for other arguments.

source
PhyloNetworks.posterior_loghybridweightFunction
posterior_loghybridweight(obj::SSM, hybrid_name, trait = 1)
posterior_loghybridweight(obj::SSM, edge_number, trait = 1)

Log-posterior probability for all trees displaying the minor parent edge of hybrid node named hybrid_name, or displaying the edge number edge_number. That is: log of P(hybrid minor parent | trait) if a single trait is requested, or A[i]= log of P(hybrid minor parent | trait i) if trait is a vector or range (e.g. trait = 1:obj.nsites). These probabilities are conditional on the model parameters in obj.

Precondition: _loglikcache updated by discrete_corelikelihood!

examples

julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]); # arbitrary rates

julia> using DataFrames

julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);

julia> fit = fitdiscrete(net, m1, dat); # optimized rates: α=0.27 and β=0.35

julia> plhw = PhyloNetworks.posterior_loghybridweight(fit, "H1");

julia> round(exp(plhw), digits=5) # posterior probability of going through minor hybrid edge
0.08017

julia> hn = net.node[3]; getparentedgeminor(hn).gamma # prior probability
0.1
source
PhyloNetworks.posterior_logtreeweightFunction
posterior_logtreeweight(obj::SSM, trait = 1)

Array A of log-posterior probabilities for each tree displayed in the network: such that A[t] = log of P(tree t | trait trait) if a single trait is requested, or A[t,i]= log of P(tree t | trait i) if trait is a vector or range (e.g. trait = 1:obj.nsites). These probabilities are conditional on the model parameters in obj.

Displayed trees are listed in the order in which they are stored in the fitted model object obj.

Precondition: _loglikcache updated by discrete_corelikelihood!

examples

julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]); # arbitrary rates

julia> using DataFrames

julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);

julia> fit = fitdiscrete(net, m1, dat); # optimized rates: α=0.27 and β=0.35

julia> pltw = PhyloNetworks.posterior_logtreeweight(fit);

julia> round.(exp.(pltw), digits=5) # posterior trees probabilities (sum up to 1)
2-element Vector{Float64}:
 0.91983
 0.08017

julia> round.(exp.(fit.priorltw), digits=4) # the prior tree probabilities are similar here (tiny data set!)
2-element Vector{Float64}:
 0.9
 0.1
source
PhyloNetworks.problem4cycleMethod
problem4cycle(β::Node, δ::Node, α::Node, γ::Node)

Check if the focus edge uv has a 4 cycle that could lead to a 3 cycle after an chosen NNI. Return true if there is a problem 4 cycle, false if none.

source
PhyloNetworks.proposedTop!Method

proposedTop!(move,newT,random,count,N,movescount,movesfail,multall) road map

Function to change the current network newT by a given move, and checks that the move was successful (correct attributes). If not successful, newT is changed back to its original state, except for the case of multiple alleles.

Note that the update of attributes by each move is not done in all the network, but only in the local edges that were changed by the move. This is efficient (and makes a move easy to undo), but makes the code of each move function very clunky.

Arguments:

  • move chosen from whichMove as described in optTopLevel
  • newT is the topology that will be modified inside with the move
  • random=true: chooses minor hybrid edge with prob 1-h, and major edge with prob h, if false, always chooses minor hybrid edge
  • count: simply which likelihood step we are in in the optimization at optTopLevel
  • movescount and movesfail: vector of counts of number of moves proposed
  • multall=true if multiple alleles case: we need to check if the move did not violate the multiple alleles condition (sister alleles together and no gene flow into the alleles). This is inefficient because we are proposing moves that we can reject later, instead of being smart about the moves we propose: for example, move origin/target could rule out some neighbors that move gene flow into the alleles, the same for add hybridization; nni move can check if it is trying to separate the alleles)

Moves:

  • addHybridizationUpdate(newT,N):

will choose a partition first (to avoid choosing edges that will create a non level-1 network) will choose two edges from this partition randomly, will not allow two edges in a cherry (non-identifiable), or sister edges that are not identifiable (the blacklist was a way to keep track of "bad edges" were we should not waste time trying to put hybridizations, it has never been used nor tested). Also choose gamma from U(0,0.5). The "Update" in the function name means that it creates the new hybrid, and also updates all the attributes of newT

  • node = chooseHybrid(newT) choose a hybrid randomly for the next moves:
  • moveOriginUpdateRepeat!(newT,node,random)

will choose randomly the minor/major hybrid edge to move (if random=true); will get the list of all neighbor edges where to move the origin, will move the origin and update all the attributes and check if the move was successful (not conflicting attributes); if not, will undo the move, and try with a different neighbor until it runs out of neighbors. Return true if the move was successful.

  • moveTargetUpdateRepeat!(newT,node,random)

same as move origin but moving the target

  • changeDirectionUpdate!(newT,node,random)

chooses minor/major hybrid edge at random (if `random=true), and changes the direction, and updates all the attributes. Checks if the move was successful (returns true), or undoes the change and returns false.

  • deleteHybridizationUpdate!(newT,node)

removes the hybrid node, updates the attributes, no need to check any attributes, always successful move

  • NNIRepeat!(newT,N)

choose an edge for nni that does not have a neighbor hybrid. It will try to find such an edge N times, and if it fails, it will return false (unsuccessful move). N=10 by default. If N=1, it rarely finds such an edge if the network is small or complex. The function cannot choose an external edge. it will update locally the attributes.

** Important: ** All the moves undo what they did if the move was not successful, so at the end you either have a newT with a new move and with all good attributes, or the same newT that started. This is important to avoid having to do deepcopy of the network before doing the move. Also, after each move, when we update the attributes, we do not update the attributes of the whole network, we only update the attributes of the edges that were affected by the move. This saves time, but makes the code quite clunky. Only the case of multiple alleles the moves does not undo what it did, because it finds out that it failed after the function is over, so just need to treat this case special.

source
PhyloNetworks.quartetdata_columnnamesMethod
quartetdata_columnnames(T) where T <: StaticArray

Vector of column names to hold the quartet data of type T in a data frame. If T is a length-3 vector type, they are "CF1234","CF1324","CF1423". If T is a length-4 vector type, the 4th name is "ngenes". If T is a 3×n matrix type, the output vector contains 3×n names, 3 for each of "CF", "V2", "V3", ... "Vn".

Used by writeTableCF to build a data frame from a vector of QuartetT objects.

source
PhyloNetworks.quartetrankMethod
quartetrank(t1,t2,t3,t4, nCk::Matrix)
quartetrank([t1,t2,t3,t4], nCk)

Return the rank of a four-taxon set with taxon numbers t1,t2,t3,t4, assuming that tis are positive integers such that t1<t2, t2<t3 and t3<t4 (assumptions not checked!). nCk should be a matrix of "n choose k" binomial coefficients: see nchoose1234.

examples

julia> nCk = PhyloNetworks.nchoose1234(5)
6×4 Matrix{Int64}:
 0   0   0  0
 1   0   0  0
 2   1   0  0
 3   3   1  0
 4   6   4  1
 5  10  10  5

julia> PhyloNetworks.quartetrank([1,2,3,4], nCk)
1

julia> PhyloNetworks.quartetrank([3,4,5,6], nCk)
15
source
PhyloNetworks.readCSVtoArrayMethod
readCSVtoArray(dat::DataFrame)
readCSVtoArray(filename::String)

Read a CSV table containing both species names and data, create two separate arrays: one for the species names, a second for the data, in a format that parsimonyGF needs.

Warning:

  • it will try to find a column 'taxon' or 'species' for the taxon names. If none found, it will assume the taxon names are in column 1.
  • will use all other columns as characters
source
PhyloNetworks.readFastaToArrayFunction
readFastaToArray(filename::AbstractString, sequencetype=BioSequences.LongDNA{4})

Read a fasta-formatted file. Return a tuple species, sequences where species is a vector of Strings with identifier names, and sequences is a vector of BioSequences, each of type sequencetype (DNA by default).

Warnings:

  • assumes a semi-sequential format, not interleaved. More specifically, each taxon name should appear only once. For this one time, the corresponding sequence may be broken across several lines though.
  • fails if all sequences aren't of the same length
source
PhyloNetworks.readInputDataMethod
readInputData(trees, quartetfile, whichQuartets, numQuartets, writetable, tablename, writeQfile, writesummary)
readInputData(trees, whichQuartets, numQuartets, taxonlist,   writetable, tablename, writeQfile, writesummary)

Read gene trees and calculate the observed quartet concordance factors (CF), that is, the proportion of genes (and the number of genes) that display each quartet for a given list of four-taxon sets.

Input:

  • trees: name of a file containing a list of input gene trees, or vector of trees (HybridNetwork objects)

Optional arguments (defaults):

  • quartetfile: name of a file containing a list of quartets, or more precisely, a list of four-taxon sets
  • whichQuartets (:all): which quartets to sample. :all for all of them, :rand for a random sample.
  • numQuartets: number of quartets in the sample. default: total number of quartets if whichQuartets=:all and 10% of total if whichQuartets=:rand
  • taxonlist (all in the input gene trees): If taxonlist is used, whichQuartets will consist of all sets of 4 taxa in the taxonlist.
  • writetable (true): write the table of observed CF?
  • tablename ("tableCF.txt"): if writetable is true, the table of observed CFs is write to file tablename
  • writeQfile (false): write intermediate file with sampled quartets?
  • writesummary (true): write a summary file? if so, the summary will go in file "summaryTreesQuartets.txt".

Uses calculateObsCFAll!, which implements a slow algorithm.

See also: countquartetsintrees, which uses a much faster algorithm; readTrees2CF, which is basically a re-naming of readInputData, and readTableCF to read a table of quartet CFs directly.

source
PhyloNetworks.readSubtree!Method
readSubtree!(s::IO, parentNode, numLeft, net, hybrids)

Recursive helper method for readTopology: read a subtree from an extended Newick topology. input s: IOStream/IOBuffer.

Reads additional info formatted as: :length:bootstrap:gamma. Allows for name of internal nodes without # after closing parenthesis: (1,2)A. Warning if hybrid edge without γ, or if γ (ignored) without hybrid edge

source
PhyloNetworks.readnexus_extractgammaMethod
readnexus_extractgamma(nexus_string)

Extract γ from comments and return a dictionary hybrid number ID => γ, from one single phylogeny given as a string. The output from BEAST2 uses this format for reticulations at minor edges, as output by bacter (Vaughan et al. 2017):

#11[&conv=0, relSize=0.08, ...

or as output by SpeciesNetwork (Zhang et al. 2018):

#H11[&gamma=0.08]

The function below assumes that the "H" was already added back if not present already (from bacter), like this:

#H11[&conv=0, relSize=0.19, ...

The bacter format is tried first. If this format doesn't give any match, then the SpeciesNetwork format is tried next. See readnexus_assigngammas!.

source
PhyloNetworks.readnexus_translatetableMethod
readnexus_translatetable(io)

Read translate table from IO object io, whose first non-empty line should contain "translate". Then each line should have "number name" and the end of the table is indicated by a ;. Output tuple:

  • line that was last read, and is not part of the translate table, taken from io
  • translate: boolean, whether a table was successfully read
  • id2name: dictionary mapping number to name.
source
PhyloNetworks.readnexuscommentMethod
readnexuscomment(s::IO, c::Char)

Read (and do nothing with) nexus-style comments: [& ... ] Assumption: 'c' is the next character to be read from s. Output: nothing.

Comments can appear after (or instead of) a node or leaf name, before or after an edge length, and after another comment.

source
PhyloNetworks.readnodenameMethod
readnodename(s::IO, c::Char, net, numLeft)

Auxiliary function to read a taxon name during newick parsing. output: tuple (number, name, pound_boolean)

Names may have numbers: numbers are treated as strings. Accepts # as part of the name (but excludes it from the name), in which case pound_boolean is true. # is used in extended newick to flag hybrid nodes.

Nexus-style comments following the node name, if any, are read and ignored.

source
PhyloNetworks.recursionPostOrderMethod
recursionPostOrder(net::HybridNetwork, checkPreorder::Bool,
                   init_function, tip_function, node_function,
                   indexation="b", parameters...)
recursionPostOrder(nodes, init_function, tip_function, node_function,
                   parameters)
updatePostOrder!(index, nodes, updated_matrix, tip_function, node_function,
                parameters)

Generic tool to apply a post-order (or topological ordering) algorithm, acting on a matrix where rows & columns correspond to nodes. Used by descendenceMatrix.

source
PhyloNetworks.recursionPreOrder!Function
recursionPreOrder(nodes, init_function, root_function, tree_node_function,
                  hybrid_node_function, parameters)
recursionPreOrder!(nodes, AbstractArray, root_function, tree_node_function,
                   hybrid_node_function, parameters)
updatePreOrder(index, nodes, updated_matrix, root_function, tree_node_function,
               hybrid_node_function, parameters)

Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedPathMatrix and by pairwiseTaxonDistanceMatrix.

source
PhyloNetworks.recursionPreOrderMethod
recursionPreOrder(nodes, init_function, root_function, tree_node_function,
                  hybrid_node_function, parameters)
recursionPreOrder!(nodes, AbstractArray, root_function, tree_node_function,
                   hybrid_node_function, parameters)
updatePreOrder(index, nodes, updated_matrix, root_function, tree_node_function,
               hybrid_node_function, parameters)

Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedPathMatrix and by pairwiseTaxonDistanceMatrix.

source
PhyloNetworks.removeHybrid!Method
removeHybrid!(net::Network, n::Node)

Delete a hybrid node n from net.hybrid, and update net.numHybrid. The actual node n is not deleted. It is kept in the full list net.node.

source
PhyloNetworks.resetEdgeNumbers!Function
resetEdgeNumbers!(net::HybridNetwork, verbose=true)

Check that edge numbers of net are consecutive numbers from 1 to the total number of edges. If not, reset the edge numbers to be so.

source
PhyloNetworks.resetNodeNumbers!Method
resetNodeNumbers!(net::HybridNetwork; checkPreorder=true, type=:ape)

Change internal node numbers of net to consecutive numbers from 1 to the total number of nodes.

keyword arguments:

  • type: default is :ape, to get numbers that satisfy the conditions assumed by the ape R package: leaves are 1 to n, the root is n+1, and internal nodes are higher consecutive integers. If :postorder, nodes are numbered in post-order, with leaves from 1 to n (and the root last). If :internalonly, leaves are unchanged. Only internal nodes are modified, to take consecutive numbers from (max leaf number)+1 and up. With this last option, the post-ordering of nodes is by-passed.
  • checkPreorder: if false, the isChild1 edge field and the net.nodes_changed network field are supposed to be correct (to get nodes in preorder). This is not needed when type=:internalonly.

Examples

julia> net = readTopology("(A,(B,(C,D)));");

julia> PhyloNetworks.resetNodeNumbers!(net)

julia> printNodes(net) # first column "node": root is 5
node leaf  hybrid hasHybEdge name inCycle edges'numbers
1    true  false  false      A    -1      1   
2    true  false  false      B    -1      2   
3    true  false  false      C    -1      3   
4    true  false  false      D    -1      4   
7    false false  false           -1      3    4    5   
6    false false  false           -1      2    5    6   
5    false false  false           -1      1    6   

julia> net = readTopology("(A,(B,(C,D)));");

julia> PhyloNetworks.resetNodeNumbers!(net; type=:postorder)

julia> printNodes(net) # first column "node": root is 7
node leaf  hybrid hasHybEdge name inCycle edges'numbers
1    true  false  false      A    -1      1   
2    true  false  false      B    -1      2   
3    true  false  false      C    -1      3   
4    true  false  false      D    -1      4   
5    false false  false           -1      3    4    5   
6    false false  false           -1      2    5    6   
7    false false  false           -1      1    6   
source
PhyloNetworks.sameTaxaMethod
sameTaxa(Quartet, HybridNetwork)

Return true if all taxa in the quartet are represented in the network, false if one or more taxa in the quartet does not appear in the network.

warning: the name can cause confusion. A more appropriate name might be "in", or "taxain", or "taxonsubset", or etc.

source
PhyloNetworks.sampleBootstrapTreesMethod
sampleBootstrapTrees(vector of tree lists; seed=0::Integer, generesampling=false, row=0)
sampleBootstrapTrees!(tree list, vector of tree lists; seed=0::Integer, generesampling=false, row=0)

Sample bootstrap gene trees, 1 tree per gene. Set the seed with keyword argument seed, which is 0 by default. When seed=0, the actual seed is set using the clock. Assumes a vector of vectors of networks (see readBootstrapTrees), each one of length 1 or more (error if one vector is empty, tested in bootsnaq).

  • site resampling: always, from sampling one bootstrap tree from each given list. This tree is sampled at random unless row>0 (see below).
  • gene resampling: if generesampling=true (default is false), genes (i.e. lists) are sampled with replacement.
  • row=i: samples the ith bootstrap tree for each gene. row is turned back to 0 if gene resampling is true.

output: one vector of trees. the modifying function (!) modifies the input tree list and returns it.

source
PhyloNetworks.sampleCFfromCIFunction
sampleCFfromCI(data frame, seed=0)
sampleCFfromCI!(data frame, seed=0)

Read a data frame containing CFs and their credibility intervals, and sample new obsCF uniformly within the CIs. These CFs are then rescaled to sum up to 1 for each 4-taxon sets. Return a data frame with taxon labels in first 4 columns, sampled obs CFs in columns 5-7 and credibility intervals in columns 8-13.

  • The non-modifying function creates a new data frame (with re-ordered columns) and returns it. If seed=-1, the new df is a deep copy of the input df, with no call to the random number generator. Otherwise, seed is passed to the modifying function.
  • The modifying function overwrites the input data frame with the sampled CFs and returns it. If seed=0, the random generator is seeded from the clock. Otherwise the random generator is seeded using seed.

Warning: the modifying version does not check the data frame: assumes correct columns.

optional argument: delim=',' by default: how columns are delimited.

source
PhyloNetworks.setBLGammaParsimony!Method

setBLGammaParsimony!(net::HybridNetwork)

Maximum parsimony function does not provide estimates for branch lengths, or gamma. But since the maxParsimonyNet function is using snaq move functions, branch lengths and gamma values are set randomly (to initialize optimization). We need to remove these random values before returning the maximum parsimony network.

source
PhyloNetworks.setBranchLength!Method
setBranchLength!(Edge, newlength)

Set the length of an Edge object. The new length needs to be non-negative, or -1.0 to be interpreted as missing. edge.y and edge.z are updated accordingly.

source
PhyloNetworks.setGammaBLfromGammaz!Method
setGammaBLfromGammaz!(node, network)

Update the γ values of the two sister hybrid edges in a bad diamond I, given the gammaz values of their parent nodes, and update the branch lengths t1 and t2 of their parent edges (those across from the hybrid nodes), in such a way that t1=t2 and that these branch lengths and γ values are consistent with the gammaz values in the network.

Similar to the first section of undoGammaz!, but does not update anything else than γ and t's. Unlike undoGammaz!, no error if non-hybrid node or not at bad diamond I.

source
PhyloNetworks.setGammas!Method
setGammas!(net, γ vector)

Set inheritance γ's of hybrid edges, using input vector for major edges. Assume pre-order calculated already, with up-to-date field nodes_changed. See getGammas.

Warning: very different from setGamma!, which focuses on a single hybrid event, updates the field isMajor according to the new γ, and is not used here.

Assumption: each hybrid node has only 2 parents, a major and a minor parent (according to the edges' field isMajor).

source
PhyloNetworks.setNonIdBL!Method

setNonIdBL!(net)

Set non-identifiable edge branch lengths to -1.0 (i.e. missing) for a level-1 network net, except for edges in

  • a good triangle: the edge below the hybrid is constrained to 0.
  • a bad diamond II: the edge below the hybrid is constrained to 0
  • a bad diamond I: the edges across from the hybrid node have non identifiable lengths but are kept, because the two γ*(1-exp(-t)) values are identifiable.

will break if inCycle attributes are not initialized (at -1) or giving a correct node number.

see Node for the meaning of boolean attributes isBadTriangle (which corresponds to a "good" triangle above), isBadDiamondI and isBadDiamondII.

source
PhyloNetworks.setalpha!Method
setalpha!(obj, alpha)

Set the shape parameter alpha in a RateVariationAcrossSites model obj, and update the rate multipliers accordingly. Return the modified object.

source
PhyloNetworks.seteigeninfo!Method
seteigeninfo!(obj)

Calculate eigenvalue & eigenfector information for a substitution model (SM) object (as needed to calculate transition rate matrices) and store this info within the object.

source
PhyloNetworks.setpinv!Method
setpinv!(obj, pinv)

Set the proportion of invariable sites pinv in a RateVariationAcrossSites model obj, and update the rate multipliers & weights accordingly. For RVASInvGamma objects, the original rate multipliers are assumed correct, according to the original pinv value. Return the modified object.

source
PhyloNetworks.setpinvalpha!Method
setpinvalpha!(obj, pinv, alpha)

Set the proportion of invariable sites pinv and the alpha parameter for the discretized gamma distribution in a model obj of type RVASGammaInv{S}. Update the rate multipliers & weights accordingly. The mean of the distribution is constrained to 1.

Return the modified object.

source
PhyloNetworks.showQMethod
showQ(IO, model)

Print the Q matrix to the screen, with trait states as labels on rows and columns. adapted from prettyprint function by mcreel, found 2017/10 at https://discourse.julialang.org/t/display-of-arrays-with-row-and-column-names/1961/6

source
PhyloNetworks.showdataFunction
showdata(io::IO, obj::SSM, fullsiteinfo=false::Bool)

Return information about the data in an SSM object: number of species, number or traits or sites, number of distinct patterns, and more information if fullsiteinfo is true: number sites with missing data only, number of invariant sites, number of sites with 2 distinct states, number of parsimony-informative sites (with 2+ states being observed in 2+ tips), number of sites with some missing data, and overall proportion of entries with missing data.

Note: Missing is not considered an additional state. For example, if a site contains some missing data, but all non-missing values take the same state, then this site is counted in the category "invariant".

source
PhyloNetworks.shrink2cycleat!Method
shrink2cycleat!(net::HybridNetwork, minor::Edge, major::Edge, unroot::Bool)

Remove minor edge then update the branch length of the remaining major edge. Called by shrink2cycles!

Assumption: minor and major do form a 2-cycle. That is, they start and end at the same node.

source
PhyloNetworks.shrink3cycleat!Method
shrink3cycleat!(net::HybridNetwork, hybrid::Node, edge1::Edge, edge2::Edge,
                node1::Node, node2::Node, unroot::Bool)

Replace a 3-cycle at a given hybrid node by a single node, if any. Assumption: edge1 (node1) and edge2 (node2) are the parent edges (nodes) of hybrid. Return true if a 3-cycle is found and removed, false otherwise. There is a 3-cycle if nodes 1 & 2 are connected, by an edge called e3 below.

There are two cases, with differing effects on the γ inheritance values and branch lengths.

Hybrid case: the 3-cycle is shrunk to a hybrid node, which occurs if either node 1 or 2 is a hybrid node (that is, e3 is hybrid). If e3 goes from node 1 to node 2, the 3-cycle (left) is shrunk as on the right:

\eA      /eB           \eA  /eB
 1--e3->2       γ1+γ2γ3 \  / γ2(1-γ3)
  \    /               hybrid
 γ1\  /γ2
  hybrid

with new branch lengths: new tA = tA + (γ1.t1 + γ2γ3.(t2+t3))/(γ1+γ2γ3), new tB = tB + t2, provided that γ1, γ2=1-γ1, and γ3 are not missing. If one of them is missing then γ1 and γ2 remain as is, and e3 is deleted naively, such that new tA = tA + t1 and new tB = tB + t2. If γ's are not missing but one of t1,t2,t3 is missing, then the γ's are updated to γ1+γ2γ3 and γ2(1-γ3), but t's are update naively.

Tree case: the 3-cycle is shrunk to a tree node, which occurs if node 1 & 2 are both tree nodes (that is, e3 is a tree edge). If eC is the child edge of hybrid, the 3-cycle (left) is shrunk as on the right:

\eA                  \eA
 1--e3--2--eB--       \
  \    /               n--eB--
 γ1\  /γ2              |
  hybrid               |eC
    |
    |eC

with new branch lengths: new tA = tA + γ2.t3, new tB = tB + γ1.t3, new tC = tC + γ1.t1 + γ2.t2, provided that γ1, γ2=1-γ1, t1, t2 and t3 are not missing. If one is missing, then e1 is deleted naively such that tB is unchanged, new tC = tC + t2 and new tA = tA + t3.

source
PhyloNetworks.shrinkedge!Method
shrinkedge!(net::HybridNetwork, edge::Edge)

Delete edge from net, provided that it is a non-external tree edge. Specifically: delete its child node (as determined by isChild1) and connect all edges formerly incident to this child node to the parent node of edge, thus creating a new polytomy, unless the child was of degree 2.

Warning: it's best for isChild1 to be in sync with the root for this. If not, the shrinking may fail (if edge is a tree edge but its "child" is a hybrid) or the root may change arbitrarily (if the child of edge is the root).

Output: true if the remaining node (parent of edge) becomes a hybrid node with more than 1 child after the shrinking; false otherwise (e.g. no polytomy was created, or the new polytomy is below a tree node)

source
PhyloNetworks.startingBL!Function
startingBL!(net::HybridNetwork,
            trait::AbstractVector{Vector{Union{Missings.Missing,Int}}},
            siteweight=ones(length(trait[1]))::AbstractVector{Float64})

Calibrate branch lengths in net by minimizing the mean squared error between the JC-adjusted pairwise distance between taxa, and network-predicted pairwise distances, using calibrateFromPairwiseDistances!. siteweight[k] gives the weight of site (or site pattern) k (default: all 1s). Note: the network is not "unzipped". PhyLiNC unzips reticulations later.

Assumptions:

  • all species have the same number of traits (sites): length(trait[i]) constant
  • trait[i] is for leaf with node.number = i in net, and trait[i][j] = k means that leaf number i has state index k for trait j. These indices are those used in a substitution model: kth value of getlabels(model).
  • Hamming distances are < 0.75 with four states, or < (n-1)/n for n states. If not, all pairwise hamming distances are scaled by .75/(m*1.01) where m is the maximum observed hamming distance, to make them all < 0.75.
source
PhyloNetworks.startingrateMethod
startingrate(net)

Estimate an evolutionary rate appropriate for the branch lengths in the network, which should be a good starting value before optimization in fitdiscrete, assuming approximately 1 change across the entire tree. If all edge lengths are missing, set starting rate to 1/(number of taxa).

source
PhyloNetworks.startree_newickFunction
startree_newick(n, l)

String for the Newick parenthetical description of the star tree with n tips, and all branch lengths equal to l.

source
PhyloNetworks.symmetricnet_newickFunction
symmetricnet_newick(n::Int, h::Int, γ::Real)

Create a string for a symmetric network with 2^n tips, numbered from 1 to 2^n, with a symmetric major tree, whose branch lengths are all equal. 2^(n-h) hybrids are added from level h to h-1 "symmetrically". The network is time-consistent and ultrametric, with a total height of 1.

source
PhyloNetworks.symmetricnet_newickMethod
symmetricnet_newick(n::Int, i::Int, j::Int, γ::Real, l::Real)

Newick string for a network with a symmetric major tree with 2^n tips, numbered from 1 to 2^n. All the branch lengths of the major tree are set to l. One hybrid branch, going from level i to level j is added, cutting in half each initial edge in the tree. The new edge has length l and inheritance γ.

source
PhyloNetworks.symmetrictree_newickFunction
symmetrictree_newick(n::Int, ell::Real, i=1)

String for the Newick parenthetical description of a symmetric tree with 2^n tips, numbered from i to i+2^n-1. All branch lengths are set equal to ell. The tree can be created later by reading the string with readTopology.

source
PhyloNetworks.synchronizePartnersData!Method
synchronizePartnersData!(e::Edge, n::Node)

Synchronize γ and isMajor for edges e and its partner, both hybrid edges with the same child n:

  • if one γ is missing and the other is not: set the missing γ to 1 - the other
  • γ's should sum up to 1.0
  • update isMajor to match the γ information: the major edge is the one with γ > 0.5.

Warnings: does not check that e is a hybrid edge, nor that n is the child of e.

source
PhyloNetworks.taxadiffMethod
taxadiff(Vector{Quartet}, network; multiplealleles=true)
taxadiff(DataCF, network; multiplealleles=true)

Return 2 vectors:

  • taxa in at least 1 of the quartets but not in the network, and
  • taxa in the network but in none of the quartets.

When multiplealleles is true, the taxon names that end with "__2" are ignored in the quartets: they are not expected to appear in the networks that users give as input, or get as output.

source
PhyloNetworks.traitlabels2indicesMethod
traitlabels2indices(data, model::SubstitutionModel)

Check that the character states in data are compatible with (i.e. subset of) the trait labels in model. All columns are used. data can be a DataFrame or a Matrix (multiple traits), or a Vector (one trait). Return a vector of vectors (one per species) with integer entries, where each state (label) is replaced by its index in model. For DNA data, any ambiguous site is treated as missing.

source
PhyloNetworks.traverseContainRoot!Method
updateContainRoot!(HybridNetwork, Node)
traverseContainRoot!(Node, Edge, edges_changed::Array{Edge,1}, rightDir::Vector{Bool})

The input node to updateContainRoot! must be a hybrid node (can come from searchHybridNode). updateContainRoot! starts at the input node and calls traverseContainRoot!, which traverses the network recursively. By default, containRoot attributes of edges are true. Changes containRoot to false for all the visited edges: those below the input node, but not beyond any other hybrid node.

updateContainRoot! Returns a flag and an array of edges whose containRoot has been changed from true to false. flag is false if the set of edges to place the root is empty

In traverseContainRoot!, rightDir turns false if hybridizations have incompatible directions (vector of length 1, to be modified).

Warning:

  • does not update containRoot of minor hybrid edges.
  • assumes correct isMajor attributes: to stop the recursion at minor hybrid edges.
  • assumes correct hybrid attributes of both nodes & edges: to check if various hybridizations have compatible directions. For each hybrid node that is encountered, checks if it was reached via a hybrid edge (ok) or tree edge (not ok).

rightDir: vector of length 1 boolean, to be mutable and modified by the function

source
PhyloNetworks.undoGammaz!Method
undoGammaz!(node, network)

Undo updateGammaz! for the 2 cases: bad diamond I,II. node should be a hybrid node. Set length to edges that were not identifiable and change edges' gammaz attribute to -1.0. Recalculate branch lengths in terms of gammaz. warning: needs to know incycle attributes

source
PhyloNetworks.unzip_canonical!Method
unzip_canonical!(net::HybridNetwork)

Unzip all reticulations: set the length of child edge to 0, and increase the length of both parent edges by the original child edge's length, to obtain the canonical version of the network according to Pardi & Scornavacca (2015).

Output: vector of hybrid node in postorder, vector of child edges whose length is constrained to be 0, and vector of their original branch lengths to re-zip if needed using rezip_canonical!.

Assumption: net.hybrid is correct, but a preordering of all nodes is not assumed.

Note: This unzipping is not as straightforward as it might seem, because of "nested" zippers: when the child of a hybrid node is itself a hybrid node. The unzipping is propagated all the way through.

source
PhyloNetworks.updateBL!Method
updateBL!(net::HybridNetwork, d::DataCF)

Update internal branch lengths of net based on the average quartet concordance factor (CF) across all quartets that exactly correspond to a given branch: new branch length = -log(3/2(1-mean(CF observed in d))). net is assumed to be a tree, such that the above equation holds.

source
PhyloNetworks.updateContainRoot!Function
updateContainRoot!(HybridNetwork, Node)
traverseContainRoot!(Node, Edge, edges_changed::Array{Edge,1}, rightDir::Vector{Bool})

The input node to updateContainRoot! must be a hybrid node (can come from searchHybridNode). updateContainRoot! starts at the input node and calls traverseContainRoot!, which traverses the network recursively. By default, containRoot attributes of edges are true. Changes containRoot to false for all the visited edges: those below the input node, but not beyond any other hybrid node.

updateContainRoot! Returns a flag and an array of edges whose containRoot has been changed from true to false. flag is false if the set of edges to place the root is empty

In traverseContainRoot!, rightDir turns false if hybridizations have incompatible directions (vector of length 1, to be modified).

Warning:

  • does not update containRoot of minor hybrid edges.
  • assumes correct isMajor attributes: to stop the recursion at minor hybrid edges.
  • assumes correct hybrid attributes of both nodes & edges: to check if various hybridizations have compatible directions. For each hybrid node that is encountered, checks if it was reached via a hybrid edge (ok) or tree edge (not ok).

rightDir: vector of length 1 boolean, to be mutable and modified by the function

source
PhyloNetworks.updatePostOrder!Function
recursionPostOrder(net::HybridNetwork, checkPreorder::Bool,
                   init_function, tip_function, node_function,
                   indexation="b", parameters...)
recursionPostOrder(nodes, init_function, tip_function, node_function,
                   parameters)
updatePostOrder!(index, nodes, updated_matrix, tip_function, node_function,
                parameters)

Generic tool to apply a post-order (or topological ordering) algorithm, acting on a matrix where rows & columns correspond to nodes. Used by descendenceMatrix.

source
PhyloNetworks.updatePreOrder!Function
recursionPreOrder(nodes, init_function, root_function, tree_node_function,
                  hybrid_node_function, parameters)
recursionPreOrder!(nodes, AbstractArray, root_function, tree_node_function,
                   hybrid_node_function, parameters)
updatePreOrder(index, nodes, updated_matrix, root_function, tree_node_function,
               hybrid_node_function, parameters)

Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedPathMatrix and by pairwiseTaxonDistanceMatrix.

source
PhyloNetworks.updateSSM!Function
updateSSM!(obj::SSM, renumber=false::Bool;
           constraints=TopologyConstraint[]::Vector{TopologyConstraint})

After adding or removing a hybrid, displayed trees will change. Updates the displayed tree list. Return SSM object.

if renumber, reorder edge and internal node numbers. Only need to renumber after deleting a hybrid (which could remove edges and nodes from the middle of the edge and node lists).

Assumptions:

  • The SSM object has cache arrays of size large enough, that is, the constructor StatisticalSubstitutionModel was previously called with maxhybrid equal or greater than in obj.net. obj.priorltw is not part of the "cache" arrays.

Warning: Does not update the likelihood.

julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> fastafile = abspath(joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "simple.aln"));

julia> obj = PhyloNetworks.StatisticalSubstitutionModel(net, fastafile, :JC69; maxhybrid=3);

julia> PhyloNetworks.checknetwork_LiNC!(obj.net, 3, true, true);

julia> using Random; Random.seed!(432);

julia> PhyloNetworks.addhybridedge!(obj.net, obj.net.edge[8], obj.net.edge[1], true, 0.0, 0.4);

julia> writeTopology(obj.net)
"(((B:1.0)#H1:0.1::0.9,(A:1.0)#H2:1.0::0.6):1.5,(C:0.6,#H1:1.0::0.1):1.0,(D:1.25,#H2:0.0::0.4):1.25);"

julia> length(obj.displayedtree) # still as if 1 single reticulation
2

julia> PhyloNetworks.updateSSM!(obj);

julia> length(obj.displayedtree) # now correct 4 displayed treess: 2 reticulations
4
source
PhyloNetworks.update_logtransMethod
update_logtrans(obj::SSM)

Initialize and update obj.logtrans, the log transition probabilities along each edge in the full network. They are re-used for each displayed tree, which is why edges are not fused around degree-2 nodes when extracting displayed trees.

source
PhyloNetworks.updatecache_edge!Method
updatecache_edge!(lcache::CacheLengthLiNC, obj::SSM, focusedge)

Update fields in lcache, to correspond to the forward likelihood at the child node of the focus edge, backwards (at parent node) x directional (at sister edges) likelihoods, and keep track of which displayed trees do have the focus edge. These don't change if the length of the focus edge is modified. These quantities are then rescaled on the log scale (to get a max of 0) and exponentiated to get back to the probability scale.

Assumptions: the following fields of obj are up-to-date:

  • displayed trees, with nodes preordered
  • prior log tree weights in .priorltw
  • log transition probabilities in .logtrans

Output:

  • missing, if the edge length does not affect the likelihood,
  • constant used to rescale each site on the log scale, otherwise.
source
PhyloNetworks.updateconstraintfields!Method
updateconstraintfields!(constraints::Vector{TopologyConstraint}, net::HybridNetwork)

Update fields stem edge and crown node to match the given net.

Assumes that the constraints are still met in net, and that nodes & edges are numbered identically in net as in the network used to create all constraints.

fixit: remove the assumption that constraints are still met, since an NNI near the crown of a constrained clade might change the crown node and / or the stem edge (u and v exchange).

source
PhyloNetworks.updateconstraints!Method
updateconstraints!(constraints::Vector{TopologyConstraint}, net::HybridNetwork)

Update the set taxonnum in each constraint, assuming that the stem edge and the crown node are still correct, and that their descendants are still correct. May be needed if the node and edge numbers were modified by resetNodeNumbers! or resetEdgeNumbers!.

Warning: does not check that the names of leaves with numbers in taxonnum are taxonnames.

julia> net = readTopology("(((2a,2b),(((((1a,1b,1c),4),(5)#H1),(#H1,(6,7))))#H2),(#H2,10));");

julia> c_species1 = PhyloNetworks.TopologyConstraint(0x01, ["1a","1b","1c"], net)
Species constraint, on tips: 1a, 1b, 1c
 stem edge number 7
 crown node number -9

julia> c_species1.taxonnums
Set{Int64} with 3 elements:
  5
  4
  3

julia> c_clade145 = PhyloNetworks.TopologyConstraint(0x02, ["1a","1b","1c","4","5"], net)
Clade constraint, on tips: 1a, 1b, 1c, 4, 5
 stem edge number 12
 crown node number -7

julia> PhyloNetworks.resetNodeNumbers!(net)

julia> net.node[4].number = 111;

julia> PhyloNetworks.updateconstraints!([c_species1, c_clade145], net)

julia> c_species1
Species constraint, on tips: 1a, 1b, 1c
 stem edge number 7
 crown node number 21

julia> c_species1.taxonnums
Set{Int64} with 3 elements:
  5
  4
  111
source
PhyloNetworks.writeTopologyLevel1Method

writeTopologyLevel1(net::HybridNetwork)

Write the extended Newick parenthetical format of a level-1 network object with many optional arguments (see below). Makes a deep copy of net: does not modify net.

  • di=true: write in format for Dendroscope (default false)
  • namelabel=true: If namelabel is true, taxa are labelled by their names;

otherwise taxa are labelled by their numbers (unique identifiers).

  • outgroup (string): name of outgroup to root the tree/network. if "none" is given, the root is placed wherever possible.
  • printID=true, only print branch lengths for identifiable egdes according to the snaq estimation procedure (default false) (true inside of snaq!.)
  • round: rounds branch lengths and heritabilities γ (default: true)
  • digits: digits after the decimal place for rounding (defult: 3)
  • string: if true (default), returns a string, otherwise returns an IOBuffer object.
  • multall: (default false). set to true when there are multiple alleles per population.

The topology may be written using a root different than net.root, if net.root is incompatible with one of more hybrid node. Missing hybrid names are written as "#Hi" where "i" is the hybrid node number if possible.

source
StatsAPI.coeftableMethod
coeftable(m::PhyloNetworkLinearModel; level::Real=0.95)

Return coefficient estimates, standard errors, t-values, p-values, and t-intervals as a StatsBase.CoefTable.

source
StatsAPI.confintMethod
confint(m::PhyloNetworkLinearModel; level::Real=0.95)

Return confidence intervals for coefficients, with confidence level level, based on the t-distribution whose degree of freedom is determined by the number of species (as returned by dof_residual)

source
StatsAPI.devianceFunction
StatsBase.deviance(m::PhyloNetworkLinearModel)

-2 loglikelihood of the fitted model. See also loglikelihood.

Note: this is not the residual-sum-of-squares deviance as output by GLM, such as one would get with deviance(m.model).

source
StatsAPI.devianceMethod
StatsBase.deviance(m::PhyloNetworkLinearModel, Val(true))

Residual sum of squares with metric V, the estimated phylogenetic covariance, if the model is appropriate.

source
StatsAPI.fitMethod
fit(StatisticalSubstitutionModel, net, model, traits; kwargs...)
fit!(StatisticalSubstitutionModel; kwargs...)

Internal function called by fitdiscrete: with same key word arguments kwargs. But dangerous: traits should be a vector of vectors as for fitdiscrete but here traits need to contain the indices of trait values corresponding to the indices in getlabels(model), and species should appear in traits in the order corresponding to the node numbers in net. See traitlabels2indices to convert trait labels to trait indices.

Warning: does not perform checks. fitdiscrete calls this function after doing checks, preordering nodes in the network, making sure nodes have consecutive numbers, species are matched between data and network etc.

source
StatsAPI.loglikelihoodMethod
loglikelihood(m::PhyloNetworkLinearModel)

Log likelihood, or log restricted likelihood (REML), depending on m.reml.

For models with no within-species variation, the likelihood (or REML) is calculated based on the joint density for species-level mean responses.

For within-species variation models, the likelihood is calculated based on the joint density for individual-level responses. This can be calculated from individual-level data, but also by providing species-level means and standard deviations which is accepted by phylolm.

Warning: many summaries are based on the species-level model, like "dof_residual", "residuals", "predict" or "deviance". So deviance is innapropriate to compare models with within-species variation. Use loglikelihood to compare models based on data at the individual level.

Reminder: do not compare ML or REML values across models fit on different data. Do not compare REML values across models that do not have the same predictors (fixed effects): use ML instead, for that purpose.

source
StatsAPI.nobsMethod
StatsBase.nobs(m::PhyloNetworkLinearModel)

Number of observations: number of species with data, if the model assumes known species means, and number of individuals with data, if the model accounts for within-species variation.

source
StatsAPI.nulldevianceMethod
StatsBase.nulldeviance(m::PhyloNetworkLinearModel)

For appropriate phylogenetic linear models, the deviance of the null model is the total sum of square with respect to the metric V, the estimated phylogenetic covariance matrix.

source
StatsAPI.stderrorMethod
stderror(m::PhyloNetworkLinearModel)

Return the standard errors of the coefficient estimates. See vcov for related information on how these are computed.

source
StatsAPI.vcovMethod
vcov(m::PhyloNetworkLinearModel)

Return the variance-covariance matrix of the coefficient estimates.

For the continuous trait evolutionary models currently implemented, species-level mean response (conditional on the predictors), Y|X is modeled as:

  1. Y|X ∼ 𝒩(Xβ, σ²ₛV) for models assuming known species mean values (no within-species variation)
  2. Y|X ∼ 𝒩(Xβ, σ²ₛV + σ²ₑD⁻¹) for models with information from multiple individuals and assuming within-species variation

The matrix V is inferred from the phylogeny, but may also depend on additional parameters to be estimated (e.g. lambda for Pagel's Lambda model). See ContinuousTraitEM, PhyloNetworkLinearModel for more details.

If (1), then return σ²ₛ(X'V⁻¹X)⁻¹, where σ²ₛ is estimated with REML, even if the model was fitted with reml=false. This follows the conventions of nlme::gls and stats::glm in R.

If (2), then return σ²ₛ(X'W⁻¹X)⁻¹, where W = V+(σ²ₑ/σ²ₛ)D⁻¹ is estimated, and σ²ₛ & σₑ are the estimates obtained with ML or REML, depending on the reml option used to fit the model m. This follows the convention of MixedModels.fit in Julia.

source
StatsModels.isnestedMethod
isnested(m1::PhyloNetworkLinearModel, m2::PhyloNetworkLinearModel)
isnested(m1::ContinuousTraitEM, m2::ContinuousTraitEM)

True if m1 is nested in m2, false otherwise. Models fitted with different criteria (ML and REML) are not nested. Models with different predictors (fixed effects) must be fitted with ML to be considered nested.

source
PhyloNetworks.fAbsConstant

default values for tolerance parameters, used in the optimization of branch lengths (fAbs, fRel, xAbs, xRel) and in the acceptance of topologies (likAbs, numFails).

if changes are made here, make the same in the docstring for snaq! below

versionfAbsfRelxAbsxRelnumFailslikAbsmultiplier
v0.5.11e-61e-61e-31e-2751e-6
v0.3.01e-61e-51e-41e-31000.01
v0.0.11e-61e-51e-41e-310010000
older1e-101e-121e-101e-10

v0.5.1: based on Nan Ji's work. same xAbs and xRel as in phylonet (as of 2015). earlier: multiplier was used; later: likAbs = multiplier*fAbs) "older": values from GLM.jl, Prof Bates

default values used on a single topology, to optimize branch lengths and gammas, at the very end of snaq!, and by topologyMaxQPseudolik! since v0.5.1.

versionfAbsBLfRelBLxAbsBLxRelBL
v0.0.11e-101e-121e-101e-10
source