Joint analysis of longitudinal and survival data measured on nested time-scales using shared parameter models: an application to fecundity data

Documentation

Description

The following programs and data were utilized in the manuscript "Joint analysis of longitudinal and survival data measured on nested time-scales using shared parameter models: an application to fecundity data" by Alexander C. McLain, Rajeshwari Sundaram, and Germaine M. Buck Louis.  The programs were written by A McLain.  If you have any questions regarding the programs please contact A McLain (mclaina@mailbox.sc.edu).  The analysis was originally completed in May of 2012 with R v2.15.0.

The data in this file is:

  • Oxford_train_day_level.csv and Oxford_test_day_level.csv: Day level data used to estimate parameters and prediction error, respectively.  Each row of this file corresponds to one day for one women.  The variables included are:
    • Y = Intercourse indicator.
    • ind = Woman ID number.        
    • Intercept = 1 for all rows.         
    • FemaleAge = the woman's age minus 31.         
    • Cigarettes/ Alcohol = indicator that the woman consumed at least one cigarette/alcoholic beverage on a given day.          
    • Period = indicator that menstrual bleeding occurred on a given day.
    • Weekend = indicator that the given day was a Fri/Sat/Sun.      
    • Parity = Indicator of one previous live birth.    
    • Parity*Age = interaction between parity and age.        
    • Lag = Lag 1 intercourse value.
    • Cyc-1 = cycle number minus 1.              
    • Day = day relative to ovulation (if known)        
    • day_miss = indicator that the day relative to ovulation was known.
    • RE_ind = indicator of which rows will be used for the empirical bayes estimates (for prediction only).
  • Oxford_train_women_level.csv and Oxford_test_women_level.csv: Woman level data used to estimate parameters and prediction error, respectively.  Each row of this file corresponds to one woman.  The variables included are:
    • Age = standardized value for female age.
    • Parity = Indicator of one previous live birth.
    • TTP = minimum of the woman’s time-to-pregnancy and censoring.
    • CEN = indicator of censoring.
    • Pre_ovJ = length of the follicular phase for cycle J. Set to ‘0’ if unknown and ‘-1’ if unobserved.
    • ind_po = indicator that the woman had a missing follicular phase length.

This file contains 3 programs:

  • R_programs.R = contains all of the necessary R functions.
  • Oxford.Anal.R = used to estimate the parameters (will take ~ 3 weeks to run). 
  • Oxford_prediction.R = used to implement the prediction procedure.

functions in R_code 

R_programs.R
mjm.like.aq {},  mjm.EB {}

Usage

mjm.like.aq(par,Dat,BK)  in conjunction with an optimization program, such as optim or nls.
mjm.EB(par,Dat,nodes,knots,BK): used to get empirical Bayes estimates of the random effects.

Arguments

par: The values of the following parameters
beta: a vector of length q (= dim(X)[2]) of regression coefficients to be used in the intercourse model.
sigma: log standard deviation of the baseline random effect in the intercourse model. 
sigK: log standard deviation of the peakedness random effect in the intercourse model.
sigY: log standard deviation of the baseline random effect in the pre-ovular model.
xi: vector of 2 parameters for the effect the intercourse random effects have on the TTP model.
lambda: vector of max(T) baseline intercepts (i.e., an intercept for each cycle)
betaT: a vector of length pT (i.e., dim(X_T)[2]) to be used for regression coefficients in the TTP model
phi: coefficients for the cubic B-splines function.
rho: a vector coefficients of length pk for the peakedness function.
cov12: correlation between the baseline and peakedness random effects.
mu_po,sigma_po: coefficients for the location and scale of the lognormal preovular model.
Dat: This is a list of all data.  Which includes:

$Y: a vector of intercourse indicators (length L= total number of days observed),
$ind: a vector of indicators of person number, should be consecutively numbered 1:n (length L),
$ind_po: a vector of indicators of all cycles being observed (length n=number of subjects),
$X: intercourse covariate matrix (dimension L x q),
$X_P: intercourse peakedness covariate matrix (dimension L x pk),
$day: vector of day relative to ovulation (length L, set to the day of cycle if unknown),
$day_miss: vector of indicators that the day relative to ovulation known (length L),
$T: a vector of TTP or Censoring values (length n),
$CEN: a vector of censoring indicators (length n),
$X_T: a matrix of covariates for the TTP model (dimension n x pT),
$nij: a matrix of pre-ovular lengths (dimension n x max(T), set to -1 if unobserved).
nodes: number of quadrature nodes.
knots: the values of the knots for the cubic b-spline function.
BK: boundary knots for the cubic b-spline function.

Output

The output of function mjm.like.aq(par,Dat,BK)  is  a sum(log(likelihood))

The output of function mjm.EB(par,Dat,nodes,knots,BK) is  a   n x 3 matrix of empirical Bayes Random Effects.

Details

The function mjm.like.aq(par,Dat,BK)  evaluates a joint model of intercourse and TTP.  Pre-ovular length is also modeled, since the intercourse model is a function of pre-ovular length and pre-ovular length presents informative missingness.  The likelihood is evaluated using adaptive Gaussian quadrature.  First the people with full pre-ovular information are evaluated.  Then those with missing pre-ovular information are evaluated.

The function mjm.EB(par,Dat,nodes,knots,BK) produces empirical Bayes estimates of the Random Effects.

Authors:

Alexander C. McLain, Rajeshwari Sundaram, and Germaine M. Buck Louis

mclaina@mailbox.sc.edu

Examples:

See following R-programs:

  • Oxford.Anal.R :  used to estimate the parameters (will take ~ 3 weeks to run). 
  • Oxford_prediction.R :  used to implement the prediction procedure.

Back to Joint analysis of longitudinal and survival data measured on nested time-scales using shared parameter models: an application to fecundity data