Maximum likehood estimation for (SecSSE) with parameter as complex functions. Cladogenetic version

Maximum likehood estimation under cla Several examined and concealed States-dependent Speciation and Extinction (SecSSE) where some paramaters are functions of other parameters and/or factors. Offers the option of cladogenesis

Usage

cla_secsse_ml_func_def_pars(
  phy,
  traits,
  num_concealed_states,
  idparslist,
  idparsopt,
  initparsopt,
  idfactorsopt,
  initfactors,
  idparsfix,
  parsfix,
  idparsfuncdefpar,
  functions_defining_params,
  cond = "proper_cond",
  root_state_weight = "proper_weights",
  sampling_fraction,
  tol = c(1e-04, 1e-05, 1e-07),
  maxiter = 1000 * round((1.25)^length(idparsopt)),
  optimmethod = "simplex",
  num_cycles = 1,
  loglik_penalty = 0,
  is_complete_tree = FALSE,
  take_into_account_root_edge = FALSE,
  verbose = TRUE,
  num_threads = 1,
  atol = 1e-12,
  rtol = 1e-12,
  method = "odeint::bulirsch_stoer",
  use_normalization = TRUE
)

Arguments

phy: phylogenetic tree of class phylo, rooted and with branch lengths. Alternatively, multiple phylogenetic trees can be provided as the multiPhylo class.
traits: vector with trait states for each tip in the phylogeny. The order of the states must be the same as the tree tips. For help, see vignette("starting_secsse", package = "secsse"). When providing a multiPhylo set of multiple phylognies, traits should be a list where each entry in the list corresponds to the matching phylogeny on that position.
num_concealed_states: number of concealed states, generally equivalent to the number of examined states in the dataset.
idparslist: overview of parameters and their values.
idparsopt: a numeric vector with the ID of parameters to be estimated.
initparsopt: a numeric vector with the initial guess of the parameters to be estimated.
idfactorsopt: id of the factors that will be optimized. There are not fixed factors, so use a constant within functions_defining_params.
initfactors: the initial guess for a factor (it should be set to NULL when no factors).
idparsfix: a numeric vector with the ID of the fixed parameters.
parsfix: a numeric vector with the value of the fixed parameters.
idparsfuncdefpar: id of the parameters which will be a function of optimized and/or fixed parameters. The order of id should match functions_defining_params.
functions_defining_params: a list of functions. Each element will be a function which defines a parameter e.g. id_3 <- (id_1 + id_2) / 2. See example.
cond: condition on the existence of a node root: "maddison_cond", "proper_cond" (default). For details, see vignette.
root_state_weight: the method to weigh the states: "maddison_weights", "proper_weights" (default) or "equal_weights". It can also be specified for the root state: the vector c(1, 0, 0) indicates state 1 was the root state. When using a multiPhylo object, root_state_weight should be list where each entry in the list corresponds to the root_state_weight for each tree.
sampling_fraction: vector that states the sampling proportion per trait state. It must have as many elements as there are trait states. When using a multiPhylo object, sampling fraction should be list where each entry in the list corresponds to the sampling proportion for each tree.
tol: A numeric vector with the maximum tolerance of the optimization algorithm. Default is c(1e-04, 1e-05, 1e-05).
maxiter: max number of iterations. Default is 1000 * round((1.25) ^ length(idparsopt)).
optimmethod: A string with method used for optimization. Default is "simplex". Alternative is "subplex". Both are called from DDD::optimizer(), simplex is implemented natively in DDD, while subplex is ultimately called from subplex::subplex().
num_cycles: Number of cycles of the optimization. When set to Inf, the optimization will be repeated until the result is, within the tolerance, equal to the starting values, with a maximum of 10 cycles.
loglik_penalty: the size of the penalty for all parameters; default is 0 (no penalty).
is_complete_tree: logical specifying whether or not a tree with all its extinct species is provided. If set to TRUE, it also assumes that all all extinct lineages are present on the tree. Defaults to FALSE.
take_into_account_root_edge: if TRUE, the LL integration is continued along the root edge. This also affects conditioning (as now, conditioning no longer needs to assume a speciation event at the start of the tree)
verbose: sets verbose output; default is TRUE when optimmethod is "simplex". If optimmethod is set to "simplex", then even if set to FALSE, optimizer output will be shown.
num_threads: number of threads to be used. Default is one thread.
atol: A numeric specifying the absolute tolerance of integration.
rtol: A numeric specifying the relative tolerance of integration.
method: integration method used, available are: "odeint::runge_kutta_cash_karp54", "odeint::runge_kutta_fehlberg78", "odeint::runge_kutta_dopri5", "odeint::bulirsch_stoer" and "odeint::runge_kutta4". Default method is: "odeint::bulirsch_stoer".
use_normalization: normalize the density vector during integration, more accurate but slower (default = TRUE)

Value

Parameter estimated and maximum likelihood

Examples

# Example of how to set the arguments for an ML search. The ML search is
# stopped after 10 iterations to keep the run time short.
library(secsse)
library(DDD)
set.seed(16)
phylotree <- ape::rbdtree(0.07,0.001,Tmax=50)
startingpoint <- bd_ML(brts = ape::branching.times(phylotree))
#> You are optimizing lambda0 mu0 
#> You are fixing lambda1 mu1 
#> Optimizing the likelihood - this may take a while. 
#> The loglikelihood for the initial parameter values is -106.425581129189.
#> 
#> Maximum likelihood parameter estimates: lambda0: 0.056537, mu0: 0.000000, lambda1: 0.000000, mu1: 0.000000: 
#> Maximum loglikelihood: -104.592844
intGuessLamba <- startingpoint$lambda0
intGuessMu <- startingpoint$mu0
traits <-  sample(c(0,1,2),
                 ape::Ntip(phylotree), replace = TRUE) # get some traits
num_concealed_states <- 3
idparslist <- cla_id_paramPos(traits, num_concealed_states)
idparslist$lambdas[1,] <- c(1,2,3,1,2,3,1,2,3)
idparslist[[2]][] <- 4
masterBlock <- matrix(c(5,6,5,6,5,6,5,6,5),ncol = 3, nrow=3, byrow = TRUE)
diag(masterBlock) <- NA
diff.conceal <- FALSE
idparslist[[3]] <- q_doubletrans(traits,masterBlock,diff.conceal)
idparsfuncdefpar <- c(3,5,6)
idparsopt <- c(1,2)
idparsfix <- c(0,4)
initparsopt <- c(rep(intGuessLamba,2))
parsfix <- c(0,0)
idfactorsopt <- 1
initfactors <- 4
# functions_defining_params is a list of functions. Each function has no
# arguments and to refer
# to parameters ids should be indicated as 'par_' i.e. par_3 refers to
# parameter 3. When a
# function is defined, be sure that all the parameters involved are either
# estimated, fixed or
# defined by previous functions (i.e, a function that defines parameter in
# 'functions_defining_params'). The user is responsible for this. In this
# example, par_3
# (i.e., parameter 3) is needed to calculate par_6. This is correct because
# par_3 is defined
# in the first function of 'functions_defining_params'. Notice that factor_1
# indicates a value
# that will be estimated to satisfy the equation. The same factor can be
# shared to define several parameters.
functions_defining_params <- list()
functions_defining_params[[1]] <- function() {
 par_3 <- par_1 + par_2
}
functions_defining_params[[2]] <- function() {
 par_5 <- par_1 * factor_1
}
functions_defining_params[[3]] <- function() {
 par_6 <- par_3 * factor_1
}

tol = c(1e-02, 1e-03, 1e-04)
optimmethod = 'subplex'
cond <- 'proper_cond'
root_state_weight <- 'proper_weights'
sampling_fraction <- c(1,1,1)
maxiter <- 10
model <- cla_secsse_ml_func_def_pars(phylotree,
traits,
num_concealed_states,
idparslist,
idparsopt,
initparsopt,
idfactorsopt,
initfactors,
idparsfix,
parsfix,
idparsfuncdefpar,
functions_defining_params,
cond,
root_state_weight,
sampling_fraction,
tol,
maxiter,
optimmethod,
num_cycles = 1)
#> Note: you set some transitions as impossible to happen.
#> 0.0565371031208431 0.0565371031208431 4 -138.298934331988
#> Calculating the likelihood for the initial parameters.
#> The loglikelihood for the initial parameter values is -138.298934331988
#> Optimizing the likelihood - this may take a while.
#> 0.0565371031208431 0.0565371031208431 4 -138.298934331988
#> -19.6874997974797 0.0565371031208431 4 -Inf
#> 0.0565371031208431 -19.6874997974797 4 -Inf
#> 0.0565371031208431 0.0565371031208431 -2.25 -Inf
#> 2.57370518668921 2.57370518668921 -0.166666666666667 -Inf
#> 1.23970037744085 1.23970037744085 0.428571428571429 -756.458030945448
#> -19.6874997974797 -0.380096751677633 0.875 -Inf
#> 4.08936171714845 -0.164220824193095 1.22222222222222 -Inf
#> 1.23970037744085 0.0565371031208431 4 -440.773246835226
#> 0.0565371031208431 1.23970037744085 4 -429.554227389694
#> 0.435774310919359 0.435774310919359 1.22222222222222 -318.572024860541
#> Warning: Optimization has not converged. Try again with different initial values or increase the number of iterations.
# ML -136.5796 if run till completion