Maximum likehood estimation for (SecSSE) with parameter as complex functions. Cladogenetic version
Source:R/secsse_ml_func_def_pars.R
cla_secsse_ml_func_def_pars.Rd
Maximum likehood estimation under cla Several examined and concealed States-dependent Speciation and Extinction (SecSSE) where some paramaters are functions of other parameters and/or factors. Offers the option of cladogenesis
Usage
cla_secsse_ml_func_def_pars(
phy,
traits,
num_concealed_states,
idparslist,
idparsopt,
initparsopt,
idfactorsopt,
initfactors,
idparsfix,
parsfix,
idparsfuncdefpar,
functions_defining_params,
cond = "proper_cond",
root_state_weight = "proper_weights",
sampling_fraction,
tol = c(1e-04, 1e-05, 1e-07),
maxiter = 1000 * round((1.25)^length(idparsopt)),
optimmethod = "simplex",
num_cycles = 1,
loglik_penalty = 0,
is_complete_tree = FALSE,
take_into_account_root_edge = FALSE,
verbose = TRUE,
num_threads = 1,
atol = 1e-12,
rtol = 1e-12,
method = "odeint::runge_kutta_cash_karp54",
use_normalization = TRUE,
return_root_state = FALSE
)
Arguments
- phy
phylogenetic tree of class
phylo
, rooted and with branch lengths. Alternatively, multiple phylogenetic trees can be provided as themultiPhylo
class.- traits
vector with trait states for each tip in the phylogeny. The order of the states must be the same as the tree tips. For help, see
vignette("starting_secsse", package = "secsse")
. When providing amultiPhylo
set of multiple phylognies, traits should be a list where each entry in the list corresponds to the matching phylogeny on that position.- num_concealed_states
number of concealed states, generally equivalent to the number of examined states in the dataset.
- idparslist
overview of parameters and their values.
- idparsopt
a numeric vector with the ID of parameters to be estimated.
- initparsopt
a numeric vector with the initial guess of the parameters to be estimated.
- idfactorsopt
id of the factors that will be optimized. There are not fixed factors, so use a constant within
functions_defining_params
.- initfactors
the initial guess for a factor (it should be set to
NULL
when no factors).- idparsfix
a numeric vector with the ID of the fixed parameters.
- parsfix
a numeric vector with the value of the fixed parameters.
- idparsfuncdefpar
id of the parameters which will be a function of optimized and/or fixed parameters. The order of id should match
functions_defining_params
.- functions_defining_params
a list of functions. Each element will be a function which defines a parameter e.g.
id_3 <- (id_1 + id_2) / 2
. See example.- cond
condition on the existence of a node root:
"maddison_cond"
,"proper_cond"
(default). For details, see vignette.- root_state_weight
the method to weigh the states:
"maddison_weights"
,"proper_weights"
(default) or"equal_weights"
. It can also be specified for the root state: the vectorc(1, 0, 0)
indicates state 1 was the root state. When using amultiPhylo
object, root_state_weight should be list where each entry in the list corresponds to the root_state_weight for each tree.- sampling_fraction
vector that states the sampling proportion per trait state. It must have as many elements as there are trait states. When using a
multiPhylo
object, sampling fraction should be list where each entry in the list corresponds to the sampling proportion for each tree.- tol
A numeric vector with the maximum tolerance of the optimization algorithm. Default is
c(1e-04, 1e-05, 1e-05)
.- maxiter
max number of iterations. Default is
1000 * round((1.25) ^ length(idparsopt))
.- optimmethod
A string with method used for optimization. Default is
"simplex"
. Alternative is"subplex"
. Both are called fromDDD::optimizer()
, simplex is implemented natively in DDD, while subplex is ultimately called fromsubplex::subplex()
.- num_cycles
Number of cycles of the optimization. When set to
Inf
, the optimization will be repeated until the result is, within the tolerance, equal to the starting values, with a maximum of 10 cycles.- loglik_penalty
the size of the penalty for all parameters; default is 0 (no penalty).
- is_complete_tree
logical specifying whether or not a tree with all its extinct species is provided. If set to
TRUE
, it also assumes that all all extinct lineages are present on the tree. Defaults toFALSE
.- take_into_account_root_edge
if TRUE, the LL integration is continued along the root edge. This also affects conditioning (as now, conditioning no longer needs to assume a speciation event at the start of the tree)
- verbose
sets verbose output; default is
TRUE
whenoptimmethod
is"simplex"
. Ifoptimmethod
is set to"simplex"
, then even if set toFALSE
, optimizer output will be shown.- num_threads
number of threads to be used. Default is one thread.
- atol
A numeric specifying the absolute tolerance of integration.
- rtol
A numeric specifying the relative tolerance of integration.
- method
integration method used, available are:
"odeint::runge_kutta_cash_karp54"
,"odeint::runge_kutta_fehlberg78"
,"odeint::runge_kutta_dopri5"
,"odeint::bulirsch_stoer"
and"odeint::runge_kutta4"
. Default method is:"odeint::runge_kutta_cash_karp54"
.- use_normalization
normalize the density vector during integration, more accurate but slower (default = TRUE)
- return_root_state
if TRUE, returns the state of the system at the root, this can be useful to use as the starting point of a simulation. When used in ML, after finishing the ML optimization, the found optimum is evaluated one more time to retrieve the root state (to avoid having to store the root state every ML evaluation).
Examples
# Example of how to set the arguments for an ML search. The ML search is
# stopped after 10 iterations to keep the run time short.
library(secsse)
library(DDD)
set.seed(16)
phylotree <- ape::rbdtree(0.07,0.001,Tmax=50)
startingpoint <- bd_ML(brts = ape::branching.times(phylotree))
#> You are optimizing lambda0 mu0
#> You are fixing lambda1 mu1
#> Optimizing the likelihood - this may take a while.
#> The loglikelihood for the initial parameter values is -106.425581129189.
#>
#> Maximum likelihood parameter estimates: lambda0: 0.056537, mu0: 0.000000, lambda1: 0.000000, mu1: 0.000000:
#> Maximum loglikelihood: -104.592844
intGuessLamba <- startingpoint$lambda0
intGuessMu <- startingpoint$mu0
traits <- sample(c(0,1,2),
ape::Ntip(phylotree), replace = TRUE) # get some traits
num_concealed_states <- 3
idparslist <- cla_id_paramPos(traits, num_concealed_states)
idparslist$lambdas[1,] <- c(1,2,3,1,2,3,1,2,3)
idparslist[[2]][] <- 4
masterBlock <- matrix(c(5,6,5,6,5,6,5,6,5),ncol = 3, nrow=3, byrow = TRUE)
diag(masterBlock) <- NA
diff.conceal <- FALSE
idparslist[[3]] <- q_doubletrans(traits,masterBlock,diff.conceal)
idparsfuncdefpar <- c(3,5,6)
idparsopt <- c(1,2)
idparsfix <- c(0,4)
initparsopt <- c(rep(intGuessLamba,2))
parsfix <- c(0,0)
idfactorsopt <- 1
initfactors <- 4
# functions_defining_params is a list of functions. Each function has no
# arguments and to refer
# to parameters ids should be indicated as 'par_' i.e. par_3 refers to
# parameter 3. When a
# function is defined, be sure that all the parameters involved are either
# estimated, fixed or
# defined by previous functions (i.e, a function that defines parameter in
# 'functions_defining_params'). The user is responsible for this. In this
# example, par_3
# (i.e., parameter 3) is needed to calculate par_6. This is correct because
# par_3 is defined
# in the first function of 'functions_defining_params'. Notice that factor_1
# indicates a value
# that will be estimated to satisfy the equation. The same factor can be
# shared to define several parameters.
functions_defining_params <- list()
functions_defining_params[[1]] <- function() {
par_3 <- par_1 + par_2
}
functions_defining_params[[2]] <- function() {
par_5 <- par_1 * factor_1
}
functions_defining_params[[3]] <- function() {
par_6 <- par_3 * factor_1
}
tol = c(1e-02, 1e-03, 1e-04)
optimmethod = 'subplex'
cond <- 'proper_cond'
root_state_weight <- 'proper_weights'
sampling_fraction <- c(1,1,1)
maxiter <- 10
model <- cla_secsse_ml_func_def_pars(phylotree,
traits,
num_concealed_states,
idparslist,
idparsopt,
initparsopt,
idfactorsopt,
initfactors,
idparsfix,
parsfix,
idparsfuncdefpar,
functions_defining_params,
cond,
root_state_weight,
sampling_fraction,
tol,
maxiter,
optimmethod,
num_cycles = 1)
#> Note: you set some transitions as impossible to happen.
#> 0.0565371031208431 0.0565371031208431 4 -138.29893433203
#> Calculating the likelihood for the initial parameters.
#> The loglikelihood for the initial parameter values is -138.29893433203
#> Optimizing the likelihood - this may take a while.
#> 0.0565371031208431 0.0565371031208431 4 -138.29893433203
#> -19.6874997974797 0.0565371031208431 4 -Inf
#> 0.0565371031208431 -19.6874997974797 4 -Inf
#> 0.0565371031208431 0.0565371031208431 -2.25 -Inf
#> 2.57370518668921 2.57370518668921 -0.166666666666667 -Inf
#> 1.23970037744085 1.23970037744085 0.428571428571429 -756.458030948603
#> -19.6874997974797 -0.380096751677633 0.875 -Inf
#> 4.08936171714845 -0.164220824193095 1.22222222222222 -Inf
#> 1.23970037744085 0.0565371031208431 4 -440.773246837265
#> 0.0565371031208431 1.23970037744085 4 -429.554227390291
#> 0.435774310919359 0.435774310919359 1.22222222222222 -318.572024861172
#> Warning: Optimization has not converged. Try again with different initial values or increase the number of iterations.
# ML -136.5796 if run till completion