Summarizing selected compositional features over multiple cross-validations
Source: `R/FLORAL.R` (`mcv.FLORAL.Rd`)
Summarizing `FLORAL` outputs from multiple random k-fold cross-validations.
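A minimal base-R sketch of what "multiple random k-fold cross-validations" means: the rows are randomly re-partitioned into `ncv` folds `mcv` times, and when no seed is given the seed defaults to the numeric form of `Sys.Date()`. The helper `make_mcv_folds()` is hypothetical and is not part of FLORAL; the package's own fold assignment may differ.

```r
# Hypothetical helper (not part of FLORAL): draw `mcv` independent random
# partitions of `n` rows into `ncv` folds of (nearly) equal size.
make_mcv_folds <- function(n, mcv = 10, ncv = 5, seed = NULL) {
  # Mirror the documented default seed: the numeric form of Sys.Date()
  if (is.null(seed)) seed <- as.numeric(Sys.Date())
  set.seed(seed)
  # Each list element holds one fold label (1..ncv) per row
  lapply(seq_len(mcv), function(r) sample(rep_len(seq_len(ncv), n)))
}

folds <- make_mcv_folds(n = 23, mcv = 3, ncv = 5, seed = 1)
# Within each repetition, fold sizes differ by at most one row
sapply(folds, function(f) max(table(f)) - min(table(f)))
```

Because the seed is set inside the helper, calling it again with the same `seed` reproduces the same partitions.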
Usage
```r
mcv.FLORAL(
  mcv = 10,
  ncore = 1,
  seed = NULL,
  x,
  y,
  ncov = 0,
  family = "gaussian",
  longitudinal = FALSE,
  id = NULL,
  tobs = NULL,
  failcode = NULL,
  corstr = "exchangeable",
  scalefix = FALSE,
  scalevalue = 1,
  pseudo = 1,
  length.lambda = 100,
  lambda.min.ratio = NULL,
  ncov.lambda.weight = 0,
  a = 1,
  mu = 1,
  maxiter = 100,
  ncv = 5,
  intercept = FALSE,
  step2 = TRUE,
  progress = TRUE,
  plot = TRUE
)
```
Arguments
- mcv
Number of random `ncv`-fold cross-validations to perform.
- ncore
Number of cores used for parallel computation. Default is to use only 1 core.
- seed
A random seed for reproducibility of the results. By default, the seed is the numeric form of `Sys.Date()`.
- x
Feature matrix, where rows are subjects and columns are features. The first `ncov` columns should be patient characteristics, and the remaining columns are microbiome absolute counts corresponding to various taxa. If `x` contains longitudinal data, the rows must be sorted in the same order as the subject IDs used in `y`.
- y
Outcome. For a continuous or binary outcome, `y` is a vector. For a survival outcome, `y` is a `Surv` object.
- ncov
An integer giving the number of leading columns in `x` that are not subject to the zero-sum constraint.
- family
Model family. Available options are `"gaussian"`, `"binomial"`, `"cox"`, and `"finegray"`.
- longitudinal
`TRUE` or `FALSE`, indicating whether a longitudinal data matrix is specified for input `x`. (`longitudinal = TRUE` with `family = "cox"` or `"finegray"` fits a time-dependent covariate model; `longitudinal = TRUE` with `family = "gaussian"` or `"binomial"` fits a GEE model.)
- id
If `longitudinal` is `TRUE`, `id` specifies the subject IDs corresponding to the rows of input `x`.
- tobs
If `longitudinal` is `TRUE`, `tobs` specifies the time points corresponding to the rows of input `x`.
- failcode
If `family = "finegray"`, `failcode` specifies the failure type of interest. This must be a positive integer.
- corstr
If a GEE model is specified, `corstr` is the working correlation structure. Options are `"independence"`, `"exchangeable"`, `"AR-1"`, and `"unstructured"`.
- scalefix
`TRUE` or `FALSE`, indicating whether the scale parameter is fixed (`TRUE`) or estimated (`FALSE`) when a GEE model is specified.
- scalevalue
Value of the scale parameter when `scalefix = TRUE`.
- pseudo
Pseudo count added to `x` before the log transformation.
- length.lambda
Number of values of the penalty parameter lambda in the path.
- lambda.min.ratio
Ratio between the minimum and maximum values of lambda. The default, `NULL`, sets the ratio to 1e-2.
- ncov.lambda.weight
Weight of the penalty lambda applied to the first `ncov` covariates. The default is 0, so the first `ncov` covariates are not penalized.
- a
A scalar between 0 and 1: `a` is the weight for the lasso penalty, while `1 - a` is the weight for the ridge penalty.
- mu
Penalty parameter of the augmented Lagrangian.
- maxiter
Maximum number of iterations of the outer loop of the augmented Lagrangian algorithm.
- ncv
Number of cross-validation folds. Use `NULL` if cross-validation is not wanted.
- intercept
`TRUE` or `FALSE`, indicating whether an intercept should be estimated.
- step2
`TRUE` or `FALSE`, indicating whether a second-stage selection of specific ratios should be performed on the features selected by the main lasso algorithm. This step is only performed if cross-validation is enabled.
- progress
`TRUE` or `FALSE`, indicating whether to print a progress bar as the algorithm runs.
- plot
`TRUE` or `FALSE`, indicating whether to return summary plots of the selection probabilities of taxa features.
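The preprocessing and penalty arguments (`pseudo`, `ncov`, `a`, `ncov.lambda.weight`, `length.lambda`, `lambda.min.ratio`) can be illustrated with the base-R sketch below. It assumes a glmnet-style elastic-net penalty with a 1/2 factor on the ridge term and a log-spaced lambda path; these are common conventions, not confirmed details of FLORAL's implementation. The names `log_counts()`, `lambda_path()`, `penalty_value()` and `lambda_max` are hypothetical.

```r
# Hypothetical helpers illustrating the documented arguments; not FLORAL code.

# Add the pseudo count and log-transform the count columns (all columns after
# the first `ncov`); the leading covariate columns are left unchanged.
log_counts <- function(x, ncov = 0, pseudo = 1) {
  count_cols <- setdiff(seq_len(ncol(x)), seq_len(ncov))
  x[, count_cols] <- log(x[, count_cols] + pseudo)
  x
}

# Assumed log-spaced path of `length.lambda` values from `lambda_max` down to
# `lambda_max * lambda.min.ratio`; NULL falls back to the documented 1e-2.
lambda_path <- function(lambda_max, length.lambda = 100, lambda.min.ratio = NULL) {
  if (is.null(lambda.min.ratio)) lambda.min.ratio <- 1e-2
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = length.lambda))
}

# Assumed elastic-net penalty: `a` weights the lasso term, `1 - a` the ridge
# term, and the first `ncov` coefficients are scaled by `ncov.lambda.weight`.
penalty_value <- function(beta, lambda, a = 1, ncov = 0, ncov.lambda.weight = 0) {
  w <- rep(1, length(beta))
  w[seq_len(ncov)] <- ncov.lambda.weight
  lambda * sum(w * (a * abs(beta) + (1 - a) / 2 * beta^2))
}

# One covariate (age) followed by two taxa count columns
x <- cbind(age = c(50, 61), matrix(c(0, 3, 7, 0), nrow = 2))
log_counts(x, ncov = 1)                          # age kept, counts -> log(count + 1)
lambda_path(lambda_max = 2, length.lambda = 5)   # from 2 down to 0.02
penalty_value(c(0.5, -1, 2), lambda = 1, ncov = 1)  # covariate unpenalized: 3
```

With the default `ncov.lambda.weight = 0`, the first coefficient contributes nothing to the penalty, which matches the documented behavior that the first `ncov` covariates are not penalized.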
Value
A list with the relative frequencies with which each feature is selected over `mcv` repetitions of `ncv`-fold cross-validation.
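As a sketch of how such relative frequencies can be computed, suppose each of the `mcv` repetitions yields a character vector of selected feature names. The helper `selection_frequency()` and this input format are hypothetical; the exact structure of the list returned by `mcv.FLORAL()` is not reproduced here.

```r
# Hypothetical helper (not part of FLORAL): proportion of repetitions in which
# each candidate feature was selected.
selection_frequency <- function(selected, features) {
  # selected: list with one character vector of selected names per repetition
  # features: all candidate feature names
  hits <- vapply(features,
                 function(f) sum(vapply(selected, function(s) f %in% s, logical(1))),
                 numeric(1))
  hits / length(selected)
}

runs <- list(c("taxon1", "taxon3"), "taxon1", c("taxon1", "taxon2"), character(0))
selection_frequency(runs, paste0("taxon", 1:4))
# taxon1 = 0.75, taxon2 = 0.25, taxon3 = 0.25, taxon4 = 0
```

Features with frequencies close to 1 are selected consistently across random fold assignments, which is the stability that repeating the cross-validation is meant to reveal.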