
Using FLORAL for survival models with longitudinal microbiome data
In this vignette, we illustrate how to apply FLORAL to
fit a Cox model with longitudinal microbiome data. Due to limited
availability of public data sets with survival information, we use
simulated data for illustrative purposes.
Data simulation
We will use the built-in simulation function simu() to
generate longitudinal compositional features and the corresponding
time-to-event. The underlying methodology used for the simulation is
based on a piece-wise exponential distribution as described by Hendry 2014.
By default, the first 10 features out of the 500 features simulated below are associated with the time-to-event.
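For intuition only: a piecewise exponential event time has a constant hazard within each time interval, and one common way to draw such times is to invert the cumulative hazard. The sketch below is a generic sampler written for illustration, not the code inside simu(); the function name r_piecewise_exp, the rates, and the breakpoints are all hypothetical.

```r
# Hypothetical helper (not part of FLORAL): draw event times from a
# piecewise exponential distribution by inverting the cumulative hazard.
# rates[k] is the constant hazard on the k-th interval; breaks are the
# interior cut points, so length(rates) must equal length(breaks) + 1.
r_piecewise_exp <- function(n, rates, breaks = numeric(0)) {
  stopifnot(length(rates) == length(breaks) + 1,
            all(rates > 0), all(breaks > 0), !is.unsorted(breaks))
  starts <- c(0, breaks)  # left end point of each interval
  # cumulative hazard accumulated at the start of each interval
  cumhaz <- c(0, cumsum(rates[seq_along(breaks)] * diff(starts)))
  e <- rexp(n)                  # unit-exponential draws
  j <- findInterval(e, cumhaz)  # interval in which the hazard reaches e
  starts[j] + (e - cumhaz[j]) / rates[j]
}

set.seed(1)
# Illustrative values: hazard 0.5 on [0, 1), 1 on [1, 2), 2 afterwards
times <- r_piecewise_exp(1000, rates = c(0.5, 1, 2), breaks = c(1, 2))
summary(times)
```

With a single rate and no breaks, the helper reduces to an ordinary exponential draw scaled by that rate.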
library(FLORAL)
library(survival) # provides Surv(), used when fitting the model

simdat <- simu(n=200, # sample size
               p=500, # number of features
               model="timedep",
               pct.sparsity = 0.8, # proportion of zeros
               rho=0, # feature-wise correlation
               longitudinal_stability = TRUE # choose to simulate longitudinal features with stable trajectories
)
With the simulated data, the log-ratio lasso Cox model with time-dependent features can be fitted by running the following function. Here we provide a detailed description of each argument:
- First, set longitudinal = TRUE so that the algorithm uses the appropriate method for longitudinal data.
- The feature matrix input x should be the count matrix, where rows correspond to samples and columns correspond to features.
- The vector of subject/patient IDs corresponding to the rows of
x should be input as id.
- The vector of sample collection times corresponding to the rows of
x should be input as tobs.
- The Surv object (Surv(time, status)) of unique patients should be input as y. Note that the survival data must be sorted with respect to the IDs specified in id.
fit <- FLORAL(x=simdat$xcount,
              y=Surv(simdat$data_unique$t,simdat$data_unique$d),
              family="cox",
              longitudinal = TRUE,
              id = simdat$data$id,
              tobs = simdat$data$t0,
              progress=FALSE,
              plot=TRUE)
fit$selected
#> $min
#> [1] "taxa1"   "taxa2"   "taxa27"  "taxa366" "taxa38"  "taxa5"   "taxa6"  
#> [8] "taxa8"   "taxa9"  
#> 
#> $`1se`
#> [1] "taxa1" "taxa5" "taxa6" "taxa8" "taxa9"
#> 
#> $min.2stage
#> [1] "taxa2"   "taxa366" "taxa38"  "taxa5"   "taxa6"   "taxa8"   "taxa9"  
#> 
#> $`1se.2stage`
#> [1] "taxa1" "taxa5" "taxa6" "taxa8" "taxa9"
The list of selected features is saved in fit$selected,
as shown above.
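Because the data are simulated, the selections can be compared against the features known to drive the time-to-event. The sketch below copies the taxa names from the output above and assumes, based on the naming pattern in that output, that the first 10 features are named taxa1 through taxa10; the variable names are illustrative.

```r
# Selections copied from the fit$selected output shown above
selected_min <- c("taxa1", "taxa2", "taxa27", "taxa366", "taxa38",
                  "taxa5", "taxa6", "taxa8", "taxa9")
selected_1se <- c("taxa1", "taxa5", "taxa6", "taxa8", "taxa9")

# Assumption: the 10 truly associated features are named taxa1..taxa10
truth <- paste0("taxa", 1:10)

true_pos_min  <- intersect(selected_min, truth)  # correctly selected
false_pos_min <- setdiff(selected_min, truth)    # selected but not associated
missed_min    <- setdiff(truth, selected_min)    # associated but not selected
false_pos_1se <- setdiff(selected_1se, truth)

print(list(true_pos_min = true_pos_min,
           false_pos_min = false_pos_min,
           missed_min = missed_min,
           false_pos_1se = false_pos_1se))
```

Under that naming assumption, the min selection in this run recovers six of the ten associated features along with three unassociated ones, while the 1se selection recovers five with no unassociated features.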
To appropriately prepare the data in practice, we have the following recommendations:
- Start with the patient metadata, which includes the survival data (time and
status), and sort it by patient ID. Extract the time and status
variables to build the Surv object for input as y.
- Curate the microbiome feature data matrix, sorted by patient ID and
time of sample collection. Save the patient ID and sample
collection time vectors for id and tobs. Save the feature table for input as x.
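The two recommendations above can be sketched in base R as follows. The data frames meta and samples, their column names, and all values are hypothetical toy inputs; real data will need its own curation steps.

```r
library(survival)  # provides Surv()

# Hypothetical patient metadata: one row per patient (toy values)
meta <- data.frame(
  id     = c("P3", "P1", "P2"),
  time   = c(12.0, 30.5, 8.2),  # follow-up time
  status = c(1, 0, 1),          # 1 = event, 0 = censored
  stringsAsFactors = FALSE
)

# Hypothetical sample table: one row per sample, with taxa counts
samples <- data.frame(
  id    = c("P2", "P1", "P3", "P1", "P2"),
  tobs  = c(3, 0, 1, 7, 0),     # sample collection time
  taxa1 = c(10, 0, 4, 2, 0),
  taxa2 = c(0, 5, 1, 0, 8),
  taxa3 = c(7, 3, 0, 9, 1),
  stringsAsFactors = FALSE
)

# 1. Sort the metadata by patient ID and build the Surv object for y
meta <- meta[order(meta$id), ]
y <- Surv(meta$time, meta$status)

# 2. Sort the samples by patient ID, then by collection time
samples <- samples[order(samples$id, samples$tobs), ]
id   <- samples$id
tobs <- samples$tobs
x    <- as.matrix(samples[, setdiff(names(samples), c("id", "tobs"))])

# Sanity checks: one id and one tobs per row of x, and the patients
# appear in the same order in y as in id
stopifnot(
  nrow(x) == length(id),
  length(id) == length(tobs),
  identical(unique(id), meta$id),
  nrow(y) == length(unique(id))
)
```

The resulting x, y, id, and tobs objects would then be passed to FLORAL() with family="cox" and longitudinal = TRUE, as in the call shown earlier.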