RMS Ordinal Regression for Continuous Y

In such situations I usually fit two models: one for the signed measurement and one for the absolute value. In your particular setting you’d expect the model for the signed measurement to be weak.

Dear Professor Harrell,
The link for the R code, http://biostat.mc.vanderbilt.edu/ArchivedAnalyses, for the simulations and analyses in the article “Modeling continuous response variables using ordinal regression,” is not working. Please help.

Thanks for alerting me. The correct URL is the ArchivedAnalyses page on the Vanderbilt Biostatistics Wiki.

Please let me know in which page you found the out-of-date link.


Thanks prof
The link works.

Can you give the URL of the page that referenced the obsolete address so that I can correct it?

I think @EpiLearneR is referring to the link printed in the article itself (at the very end of the Introduction). Which, I imagine, you won’t be able to correct.


Dear Professor @f2harrell,

Cross-sectional data;
N = 147;
7 categorical & 6 continuous Xs;
Y is continuous (from 5 to 66, integer values).

Goal is to estimate associations between Xs and Y. Using Ordinal Regression (orm()). First, I fitted a full model and used bootstrap to validate it (validate()). Then I also fitted smaller models using different data reduction strategies (e.g., redun(), varclus(), and fastbw()), always validating such models/strategies with bootstrap.

Question: Which of these models should I interpret for making my inferences on the association between the Xs and Y (e.g., through anova(), summary(), contrast(), Predict())? The one with the best predictive performance? If so, which index from the validation table (validate()) should I use?

E.g., ρ, Mean |Pr(Y ≥ Y0.5) − 0.5|, etc.

From Section 4.12.2 Developing Models for Effect Estimation, I take that for the above case there is not much need for data reduction and model validation. If that holds, then should I simply use the full model for making the inferences?

Thank you.

There are many model performance criteria. But one of the most important ones for inference is confidence interval coverage. For that a full, pre-specified model is hard to beat. There are a few cases where choosing between two competing models and using the “winner” is OK for inference. Example: Both models contain 3 thought-to-be-especially-important covariates, one model contains 4 principal components computed on the remaining 7 variables and the other model includes all 7 variables separately.
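A hedged sketch of that example in R (the variable names `k1`–`k3` and `z1`–`z7`, the data frame `d`, and the use of AIC to pick the winner are all illustrative assumptions, not from the post):

```r
library(rms)

## Summarize the remaining 7 variables with their first 4 principal components
pc <- prcomp(d[, c('z1','z2','z3','z4','z5','z6','z7')], scale. = TRUE)
d$pc1 <- pc$x[, 1]; d$pc2 <- pc$x[, 2]
d$pc3 <- pc$x[, 3]; d$pc4 <- pc$x[, 4]

## Model A: 3 key covariates + 4 PCs
fA <- orm(y ~ k1 + k2 + k3 + pc1 + pc2 + pc3 + pc4, data = d)
## Model B: 3 key covariates + all 7 variables entered separately
fB <- orm(y ~ k1 + k2 + k3 + z1 + z2 + z3 + z4 + z5 + z6 + z7, data = d)

AIC(fA); AIC(fB)   # choose the "winner", then base inference on that model
```

Both candidate models are fully pre-specified here; the only data-driven choice is which of the two to interpret, which is why this kind of selection does little damage to confidence interval coverage.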


Two questions:

  1. In Modeling continuous response variables using ordinal regression, you note that the asymptotic properties of NPMLE of these models have not been formally developed. Are you aware of any updates since this paper was published?
  2. Suppose the outcome of interest (Y) is ordinal (almost continuous, or at least many distinct values), measured at baseline and one follow-up timepoint, and roughly 15% of the outcome data are missing at follow-up (probably not MCAR, but maybe MAR). Would it be reasonable to use aregImpute and fit.mult.impute, similar to the example in RMS 15.5 to impute the missing Y at follow-up, in part based on Y at baseline and other baseline predictors? The example shows imputation of missing predictors only.

On 1. I don’t know of further work, but the referenced Stat in Med paper establishes everything I need. Some indirect evidence includes (1) what is established for the similar Cox model, even though it uses partial likelihood and not full likelihood; (2) a model without covariates is just the ECDF and properties of the ECDF are well established; (3) a model with a single binary covariate is the Wilcoxon test which is well established.
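Point (3) can be illustrated with a small simulation; this sketch assumes the `rms` package and is purely illustrative:

```r
library(rms)
set.seed(1)
y <- round(rnorm(200, 50, 10))     # many-level ordinal outcome
g <- factor(rbinom(200, 1, 0.5))   # single binary covariate

f <- orm(y ~ g)    # proportional-odds model with one binary covariate
f                  # the model's association test should closely agree with:
wilcox.test(y ~ g)
```

With no covariates at all, the fitted intercepts trace out the logits of the empirical CDF of `y`, which is the sense in which the covariate-less model "is" the ECDF.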

On 2. there are related references in Chapter 3 of RMS. Assuming only one follow-up time and assuming non-existence of a surrogate outcome variable, the 15% are basically non-recoverable and imputation won’t help very much. You would have to make a big MAR assumption conditional on baseline X (including the baseline version of Y).
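If one proceeds anyway under that strong MAR-given-baseline assumption, a hedged sketch of the `aregImpute`/`fit.mult.impute` workflow (all variable names are assumptions, following the pattern of RMS 15.5):

```r
library(rms)

## Impute missing follow-up Y using baseline Y and baseline predictors
a <- aregImpute(~ y_followup + y_baseline + treat + age + sex,
                data = d, n.impute = 20)

## Fit the ordinal model across the completed datasets, combining
## estimates and variances over imputations
f <- fit.mult.impute(y_followup ~ y_baseline + treat + rcs(age, 4) + sex,
                     orm, a, data = d)
```

This mechanically works when `Y` is the imputed variable, but per the caveat above it cannot recover information that was never observed.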

Thank you so much for the prompt response! Regarding #2, can you recommend a better alternative? Surely an ordinal outcome in a clinical trial with some dropout is a common problem…?

In a clinical trial the efficacy analysis dataset (unlike the safety dataset) requires at least one post-randomization visit. You might call this “modified intent-to-treat,” but I’m not sure. This is sometimes a reason to do longitudinal studies; the loss of the sole follow-up measurement is pretty fatal. Especially fatal is having such dropouts when double blinding is not in effect.

Thank you Professor.

Please, how do I check the PO assumption for the following model?

orm(Y ~ X1 + X2 + X3 + ...)

Where I have multiple binary and continuous predictors, with the latter modeled as rcs().

I’ve tried different strategies (e.g., Figs. 13.1 and 13.2 from the RMS book) but don’t know which one to use.

My response variable has 37 distinct integer values ranging from 5 to 66.

If you had fewer variables I’d look at parallelism of logits of stratified ECDFs. For your situation I’d consider grouping Y into 5 intervals and running an analysis like the one in the Statistical Thinking post “Assessing the Proportional Odds Assumption and Its Impact”.
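A sketch of both checks (the predictor names and data frame `d` are assumptions; `Ecdf` is from Hmisc, which loads with `rms`):

```r
library(rms)

## (a) Logit-scale ECDFs stratified by one key binary predictor:
##     under PO the curves should be roughly parallel
Ecdf(~ Y, groups = X1, data = d, fun = qlogis)

## (b) Fit separate binary logistic models at several cutoffs of Y and
##     compare each predictor's coefficient across cutoffs; under PO
##     the coefficients should be roughly constant
for (cutoff in quantile(d$Y, c(.2, .4, .6, .8)))
  print(coef(lrm(Y >= cutoff ~ X1 + rcs(X2, 4), data = d)))
```

Check (b) is essentially the grouped-Y analysis suggested above, with the grouping done implicitly through the chosen cutoffs.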

I am trying to model a continuous outcome which is defined as “Total of correct: sum of right answers”. This simply means that the outcome can vary between 0 (0 correct answers on the test) and 36 (the maximum score you can get on the test).

As of now, I am transforming it to a score between 0 and 1, and model it using logistic regression. In previous studies, people have used linear regression with the raw score. Would this be a good case for using ordinal regression? Thanks.

This is ideal for ordinal regression, with no transformation and no collapsing of Y-levels. This will handle floor and ceiling effects, bimodality, non-normality, etc.
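A minimal sketch of such a fit (the variable names `score`, `age`, `group`, and the data frame `d` are assumptions):

```r
library(rms)
dd <- datadist(d); options(datadist = 'dd')

f <- orm(score ~ rcs(age, 4) + group, data = d)
f                            # treats every observed 0-36 level as an ordered category
M <- Mean(f)                 # creates a function computing the predicted mean score
Predict(f, group, fun = M)   # covariate effects expressed on the mean-score scale
```

`Mean(f)` lets you report effects on the familiar mean-score scale even though nothing about the score's distribution was assumed.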


Thank you for the quick response. I was wondering whether a probit link function would be appropriate for this case.

Probit and logit links are similar but effects are easier to interpret with logit.
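For reference, switching the link in `orm` is a one-argument change (a sketch; variable names are assumptions):

```r
library(rms)
f_logit  <- orm(score ~ rcs(age, 4) + group, data = d)  # logistic link (default)
f_probit <- orm(score ~ rcs(age, 4) + group, family = "probit", data = d)

## Logit coefficients are log odds ratios; probit coefficients are shifts
## in standard-normal units, which are harder to communicate.
AIC(f_logit); AIC(f_probit)
```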

Does anyone know why I’m getting this error message?

f <- orm(Y ~ rcs(Age, 3) + Sex + Time + X + X %ia% Age + X %ia% Sex + X %ia% Time, data = d, x = T, y = T)

b <- bootcov(f, B = 1000, coef.reps = T)

c <- contrast(b, list(X = 'Positive'), list(X = 'Negative'))

Error in bcoef[, w, drop = FALSE] : subscript out of bounds

See if changing to X %ia% rcs(Age, 3) works.
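That is, the `%ia%` interaction should use the same restricted cubic spline expansion as the main effect. A sketch of the corrected calls, using the data and variable names from the original post:

```r
f <- orm(Y ~ rcs(Age, 3) + Sex + Time + X +
           X %ia% rcs(Age, 3) + X %ia% Sex + X %ia% Time,
         data = d, x = TRUE, y = TRUE)
b <- bootcov(f, B = 1000, coef.reps = TRUE)
contrast(b, list(X = 'Positive'), list(X = 'Negative'))
```

The "subscript out of bounds" error can arise when the design-matrix columns `contrast` looks for don't match the ones stored by `bootcov`, which mismatched interaction bases can cause.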