Inference • gamlss.longitudinal

This article gives an overview of the outputs that can be generated from a gamlss.longitudinal fit for inference of various types.

The quick reference is:

(ideally) use check_model() first to make sure there aren’t any major errors flagged in the fitted model that may effect inference
use summary() and confint() for coefficient estimates, standard errors and intervals
use wald_test() for coefficient hypotheses
use likelihood_compare() for likelihoor ratio comparisons for models (ideally nested)
use bootstrap_inference() for bootstrap standard errors and confidence intervals for coefficients
use predict(..., se.fit = TRUE, interval = "confidence") for response-mean intervals, and predict() more broadly for any row-level outputs like prediction itervals
use marginal_effects(..., se.fit = TRUE) for averaged response against covariates
use simulate() for simulating replicates from the fitted model

Variance-covariance and standard errors

summary(fit) reports coefficient estimates with standard errors. The variance covariance can be requested directly using the standard call, vcov(fit). By default these methods use the Hessian of the fitted model, calculated using primarily analytical methods (i.e. second derivatives of the known margin and copula distributions) with small exceptions where analytical methods are not available and numerical methods are required.

s <- summary(fit, include_vcov = TRUE)
s$coefficients
s$fit$vcov_method

vc <- vcov(fit, method = "analytical")
vc$method
vc$method_requested

If the analycal Hessian calculation fails, or the user selects vcov(fit, method = "numderiv") or summary(fit, vcov_method="numderiv"), the package will use a fully numerically estimated Hessian and records the method used. The check_model(fit) function will also provide a warning and flag status of the model as ‘review’ rather than passed or failed if the Hessian failed to calculate analytically.

vcov(fit, method = "numderiv") requests numerical-Hessian inference directly if preferred, or of interest to compare to the analytical method. Our standard package tests verify the two methods are within a small tolerance of one another at build time for a selection of fits, so generally the results should closely agree.

vc_num <- vcov(fit, method = "numderiv")
vc_num$method

Since we are using the inverted Hessian directly, these intervals are large-sample approximations. A comparison of the intervals to the boostrap intervals should give a good test for the sensitivity of this assumption.

Confidence intervals for coefficients

Use confint() for coefficient-scale intervals.

confint(fit)
confint(fit, parm = c("mu.treatment", "sigma.treatment"))

If the fitted object reports convergence, Hessian, identifiability, or fallback warnings, these should ideally be rectified or carefully reported alongside model fit.

Wald tests

The wald_test() can be used to test coefficient estimates against an alternative hypothesis, and defaults to an individual z-test for single parameters.

wald_test(
  fit,
  terms = c("mu.treatment", "sigma.treatment"),
  method = "analytical"
)

We also provide the option to test a group of coefficients together against a null hypothesis using joint = TRUE which results in a chi-square test.

wald_test(
  fit,
  terms = c("mu.treatment", "sigma.treatment"),
  joint = TRUE,
  method = "analytical"
)

Note that we are assumping asymptotic normality for the estimator of the parameters and their covariance (for multiple parameters) which may not necessarily be true for these joint distributions.

Likelihood comparisons

Compare alternative gamlss.longitudinal fits with likelihood_compare(fit,fit2,fit3...).

likelihood_compare(
  reduced = fit_reduced,
  full = fit
)

For the likelihood ratio to follow the theoretical Chi-squared distribution, the models should be nested, i.e. the parameters of one model are a subset of the compared model. If the models are not nested the result should be considered as approximate.

The comparison reports likelihood and likelihood ratio test results alongside other helpful details such as the change in degrees of freedom and other likelihood criteria that take into account effective degrees of freedom: AIC and BIC.

Parametric bootstrap

We provide a builtin helper for bootstrap simulations from fitted gamlss.longitudinal objects which can give an idea of sensitivity of the Hessian based intervals that are reported by default. The method essentially simulates from the fitted model then refits the same model to the simulated response. Summaries are then provided for the intervals of the fitted coefficients across replications for fixed coeffients.

boot <- bootstrap_inference(
  fit,
  R = 200,
  terms = c("mu.treatment", "sigma.treatment"),
  seed = 1,
  fit_args = list(max_outer_iter = 20, max_inner_iter = 20)
)

boot
boot$summary

These fits are computationally expensive due to the refitting of the gamlss.longitudinal model so use with care.

Prediction of response or coefficients

The predict function provides a large range of functionality for row-level predictions of various quantities from the model. It’s possible to provide response scale estimates with predict(type = "mean") for fitted response means or predict(type = "mu") for the pre-transform coefficient with confidence or prediction intervals. Review the full functionality with help(predict.gamlss.longitudinal) in R.

p <- predict(
  fit,
  newdata = dat,
  type = "mean",
  se.fit = TRUE,
  interval = "confidence",
  level = 0.95
)
head(p)

Note that by default it is not possible for the prediction, if it’s on new datasets, to preserve the structure of the correlation. Essentially predict() on a new dataset is providing a marginal fit and should not be relied on for correlation analysis of the predicted object, though the marginal confidence interval predictions will be taking into acccount the impact of the fitted correlation structure on the confidence intervals.

If the quantity of interest is the dependence structure then simulation from the fitted object via simulate(fit) may provide the required functionality.

Prediction of quantiles or tail probabilities

More flexible distributional models such as gamlss provide the advantage of better overall fit to a marginal distribution, resulting in more accurate quantiles and probabilities beyond just the mean. predict(type = "quantile"), predict(type = "cdf"), predict(type = "density"), and predict(type = "probability") can give a view of these quantities for analysis and reporting on a row level.

predict(fit, newdata = dat, type = "quantile", probs = c(0.1, 0.5, 0.9))
predict(fit, newdata = dat, type = "cdf", q = 10)
predict(fit, newdata = dat, type = "density", y = 10)
predict(fit, newdata = dat, type = "probability", q = 10, direction = "above")

Averaged predictions (marginal effects)

marginal_effects() is commonly used for reporting averaged response values for mixed models across covariates since the coefficient estimates are not immediately marginally interpretable from those models. While the gamlss.longitudinal model does not suffer from the same drawback, i.e. coefficients are directly interpretable for the margin once the link function is considered, we provide the same functionality as is used for mixed models for gamlss.longitudinal with marginal_effects(). This allows users to apply the standard analysis used in existing workflows to a gamlss.longitudinal model.

eff <- marginal_effects(
  fit,
  newdata = dat,
  variable = "treatment",
  parameter = "mu",
  se.fit = TRUE
)
eff

Dependence predictions

As noted previously, the predict() function essentially provides a marginal fit estimate with confidence intervals adjusted to account for the dependence structure, but does not preserve dependence structure for predicted new datasets.

If the subject of interest is behaviour of dependence under new datasets or forecast trajectories into later timepoints, then a simulation-based approach may be more appropriate.

We provide tooling to simulate directly from a fitted gamlss.longitudinal fit with simulate(fit) or simulate(fit,type="copula").

sim <- simulate(
  fit,
  nsim = 100,
  seed = 1
)

More details on simulation is available in the reference guide for Simulator usage.