This article gives an overview of the outputs that can be generated
from a gamlss.longitudinal fit for inference of various
types.
The quick reference is:
- (ideally) use
check_model()first to make sure there aren’t any major errors flagged in the fitted model that may effect inference - use
summary()andconfint()for coefficient estimates, standard errors and intervals - use
wald_test()for coefficient hypotheses - use
likelihood_compare()for likelihoor ratio comparisons for models (ideally nested) - use
bootstrap_inference()for bootstrap standard errors and confidence intervals for coefficients - use
predict(..., se.fit = TRUE, interval = "confidence")for response-mean intervals, andpredict()more broadly for any row-level outputs like prediction itervals - use
marginal_effects(..., se.fit = TRUE)for averaged response against covariates - use
simulate()for simulating replicates from the fitted model
Variance-covariance and standard errors
summary(fit) reports coefficient estimates with standard
errors. The variance covariance can be requested directly using the
standard call, vcov(fit). By default these methods use the
Hessian of the fitted model, calculated using primarily analytical
methods (i.e. second derivatives of the known margin and copula
distributions) with small exceptions where analytical methods are not
available and numerical methods are required.
s <- summary(fit, include_vcov = TRUE)
s$coefficients
s$fit$vcov_method
vc <- vcov(fit, method = "analytical")
vc$method
vc$method_requestedIf the analycal Hessian calculation fails, or the user selects
vcov(fit, method = "numderiv") or
summary(fit, vcov_method="numderiv"), the package will use
a fully numerically estimated Hessian and records the method used. The
check_model(fit) function will also provide a warning and
flag status of the model as ‘review’ rather than passed or failed if the
Hessian failed to calculate analytically.
vcov(fit, method = "numderiv") requests
numerical-Hessian inference directly if preferred, or of interest to
compare to the analytical method. Our standard package tests verify the
two methods are within a small tolerance of one another at build time
for a selection of fits, so generally the results should closely
agree.
vc_num <- vcov(fit, method = "numderiv")
vc_num$methodSince we are using the inverted Hessian directly, these intervals are large-sample approximations. A comparison of the intervals to the boostrap intervals should give a good test for the sensitivity of this assumption.
Confidence intervals for coefficients
Use confint() for coefficient-scale intervals.
If the fitted object reports convergence, Hessian, identifiability, or fallback warnings, these should ideally be rectified or carefully reported alongside model fit.
Wald tests
The wald_test() can be used to test coefficient
estimates against an alternative hypothesis, and defaults to an
individual z-test for single parameters.
We also provide the option to test a group of coefficients together
against a null hypothesis using joint = TRUE which results
in a chi-square test.
Note that we are assumping asymptotic normality for the estimator of the parameters and their covariance (for multiple parameters) which may not necessarily be true for these joint distributions.
Likelihood comparisons
Compare alternative gamlss.longitudinal fits with
likelihood_compare(fit,fit2,fit3...).
likelihood_compare(
reduced = fit_reduced,
full = fit
)For the likelihood ratio to follow the theoretical Chi-squared distribution, the models should be nested, i.e. the parameters of one model are a subset of the compared model. If the models are not nested the result should be considered as approximate.
The comparison reports likelihood and likelihood ratio test results alongside other helpful details such as the change in degrees of freedom and other likelihood criteria that take into account effective degrees of freedom: AIC and BIC.
Parametric bootstrap
We provide a builtin helper for bootstrap simulations from fitted
gamlss.longitudinal objects which can give an idea of
sensitivity of the Hessian based intervals that are reported by default.
The method essentially simulates from the fitted model then refits the
same model to the simulated response. Summaries are then provided for
the intervals of the fitted coefficients across replications for fixed
coeffients.
boot <- bootstrap_inference(
fit,
R = 200,
terms = c("mu.treatment", "sigma.treatment"),
seed = 1,
fit_args = list(max_outer_iter = 20, max_inner_iter = 20)
)
boot
boot$summaryThese fits are computationally expensive due to the refitting of the
gamlss.longitudinal model so use with care.
Prediction of response or coefficients
The predict function provides a large range of
functionality for row-level predictions of various quantities from the
model. It’s possible to provide response scale estimates with
predict(type = "mean") for fitted response means or
predict(type = "mu") for the pre-transform coefficient with
confidence or prediction intervals. Review the full functionality with
help(predict.gamlss.longitudinal) in R.
p <- predict(
fit,
newdata = dat,
type = "mean",
se.fit = TRUE,
interval = "confidence",
level = 0.95
)
head(p)Note that by default it is not possible for the prediction, if it’s
on new datasets, to preserve the structure of the correlation.
Essentially predict() on a new dataset is providing a
marginal fit and should not be relied on for correlation analysis of the
predicted object, though the marginal confidence interval predictions
will be taking into acccount the impact of the fitted correlation
structure on the confidence intervals.
If the quantity of interest is the dependence structure then
simulation from the fitted object via simulate(fit) may
provide the required functionality.
Prediction of quantiles or tail probabilities
More flexible distributional models such as gamlss provide the
advantage of better overall fit to a marginal distribution, resulting in
more accurate quantiles and probabilities beyond just the mean.
predict(type = "quantile"),
predict(type = "cdf"),
predict(type = "density"), and
predict(type = "probability") can give a view of these
quantities for analysis and reporting on a row level.
Averaged predictions (marginal effects)
marginal_effects() is commonly used for reporting
averaged response values for mixed models across covariates since the
coefficient estimates are not immediately marginally interpretable from
those models. While the gamlss.longitudinal model does not
suffer from the same drawback, i.e. coefficients are directly
interpretable for the margin once the link function is considered, we
provide the same functionality as is used for mixed models for
gamlss.longitudinal with marginal_effects().
This allows users to apply the standard analysis used in existing
workflows to a gamlss.longitudinal model.
eff <- marginal_effects(
fit,
newdata = dat,
variable = "treatment",
parameter = "mu",
se.fit = TRUE
)
effDependence predictions
As noted previously, the predict() function essentially
provides a marginal fit estimate with confidence intervals adjusted to
account for the dependence structure, but does not preserve dependence
structure for predicted new datasets.
If the subject of interest is behaviour of dependence under new datasets or forecast trajectories into later timepoints, then a simulation-based approach may be more appropriate.
We provide tooling to simulate directly from a fitted
gamlss.longitudinal fit with simulate(fit) or
simulate(fit,type="copula").
sim <- simulate(
fit,
nsim = 100,
seed = 1
)More details on simulation is available in the reference guide for Simulator usage.