RMS Ordinal Regression for Continuous Y

In such situations I usually fit two models: one for the signed measurement and one for the absolute value. In your particular setting you’d expect the model for the signed measurement to be weak.

Dear Professor Harrell,
The link for the R code, http://biostat.mc.vanderbilt.edu/ArchivedAnalyses, for the simulations and analyses in the article “Modeling continuous response variables using ordinal regression,” is not working. Please help.

Thanks for alerting me. The correct URL is the ArchivedAnalyses page on the Vanderbilt Biostatistics Wiki.

Please let me know in which page you found the out-of-date link.


Thanks prof
The link works.

Can you give the URL of the page that referenced the obsolete address so that I can correct it?

I think @EpiLearneR is referring to the link printed in the article itself (at the very end of the Introduction). Which, I imagine, you won’t be able to correct.


Dear Professor @f2harrell,

Cross-sectional data;
N = 147;
7 categorical & 6 continuous Xs;
Y is continuous (from 5 to 66, integer values).

Goal is to estimate associations between Xs and Y. Using Ordinal Regression (orm()). First, I fitted a full model and used bootstrap to validate it (validate()). Then I also fitted smaller models using different data reduction strategies (e.g., redun(), varclus(), and fastbw()), always validating such models/strategies with bootstrap.

Question: Which of these models should I interpret for making my inferences on the association between the Xs and Y (e.g., through anova(), summary(), contrast(), Predict())? The one with the best predictive performance? If so, which index from the validation table (validate()) should I use?

E.g., ρ, Mean |Pr(Y ≥ Y0.5) − 0.5|, etc.

From Section 4.12.2 Developing Models for Effect Estimation, I take that for the above case there is not much need for data reduction and model validation. If that holds, then should I simply use the full model for making the inferences?

Thank you.

There are many model performance criteria. But one of the most important ones for inference is confidence interval coverage. For that a full, pre-specified model is hard to beat. There are a few cases where choosing between two competing models and using the “winner” is OK for inference. Example: Both models contain 3 thought-to-be-especially-important covariates, one model contains 4 principal components computed on the remaining 7 variables and the other model includes all 7 variables separately.
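A hedged sketch of that example in R (the variable names `k1`–`k3` and `z1`–`z7`, the data frame `d`, and the use of AIC to pick the winner are all illustrative assumptions, not from the post):

```r
library(rms)

## Summarize the remaining 7 variables with their first 4 principal components
pc <- prcomp(d[, c('z1','z2','z3','z4','z5','z6','z7')], scale. = TRUE)
d$pc1 <- pc$x[, 1]; d$pc2 <- pc$x[, 2]
d$pc3 <- pc$x[, 3]; d$pc4 <- pc$x[, 4]

## Model A: 3 key covariates + 4 PCs
fA <- orm(y ~ k1 + k2 + k3 + pc1 + pc2 + pc3 + pc4, data = d)
## Model B: 3 key covariates + all 7 variables entered separately
fB <- orm(y ~ k1 + k2 + k3 + z1 + z2 + z3 + z4 + z5 + z6 + z7, data = d)

AIC(fA); AIC(fB)   # choose the "winner", then base inference on that model
```

Both candidate models are fully pre-specified here; the only data-driven choice is which of the two to interpret, which is why this kind of selection does little damage to confidence interval coverage.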


Two questions:

  1. In Modeling continuous response variables using ordinal regression, you note that the asymptotic properties of NPMLE of these models have not been formally developed. Are you aware of any updates since this paper was published?
  2. Suppose the outcome of interest (Y) is ordinal (almost continuous, or at least many distinct values), measured at baseline and one follow-up timepoint, and roughly 15% of the outcome data are missing at follow-up (probably not MCAR, but maybe MAR). Would it be reasonable to use aregImpute and fit.mult.impute, similar to the example in RMS 15.5 to impute the missing Y at follow-up, in part based on Y at baseline and other baseline predictors? The example shows imputation of missing predictors only.

On 1. I don’t know of further work, but the referenced Stat in Med paper establishes everything I need. Some indirect evidence includes (1) what is established for the similar Cox model, even though it uses partial likelihood and not full likelihood; (2) a model without covariates is just the ECDF and properties of the ECDF are well established; (3) a model with a single binary covariate is the Wilcoxon test which is well established.
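Point (3) can be illustrated with a small simulation; this sketch assumes the `rms` package and is purely illustrative:

```r
library(rms)
set.seed(1)
y <- round(rnorm(200, 50, 10))     # many-level ordinal outcome
g <- factor(rbinom(200, 1, 0.5))   # single binary covariate

f <- orm(y ~ g)    # proportional-odds model with one binary covariate
f                  # the model's association test should closely agree with:
wilcox.test(y ~ g)
```

With no covariates at all, the fitted intercepts trace out the logits of the empirical CDF of `y`, which is the sense in which the covariate-less model "is" the ECDF.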

On 2. there are related references in Chapter 3 of RMS. Assuming only one follow-up time and assuming non-existence of a surrogate outcome variable, the 15% are basically non-recoverable and imputation won’t help very much. You would have to make a big MAR assumption conditional on baseline X (including the baseline version of Y).
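If one proceeds anyway under that strong MAR-given-baseline assumption, a hedged sketch of the `aregImpute`/`fit.mult.impute` workflow (all variable names are assumptions, following the pattern of RMS 15.5):

```r
library(rms)

## Impute missing follow-up Y using baseline Y and baseline predictors
a <- aregImpute(~ y_followup + y_baseline + treat + age + sex,
                data = d, n.impute = 20)

## Fit the ordinal model across the completed datasets, combining
## estimates and variances over imputations
f <- fit.mult.impute(y_followup ~ y_baseline + treat + rcs(age, 4) + sex,
                     orm, a, data = d)
```

This mechanically works when `Y` is the imputed variable, but per the caveat above it cannot recover information that was never observed.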

Thank you so much for the prompt response! Regarding #2, can you recommend a better alternative? Surely an ordinal outcome in a clinical trial with some dropout is a common problem…?

In a clinical trial the efficacy analysis dataset (unlike the safety dataset) requires at least one post-randomization visit. You might call this “modified intent-to-treat,” but I’m not sure. This is sometimes a reason to do longitudinal studies; the loss of the sole follow-up measurement is pretty fatal. Especially fatal is having such dropouts when double blinding is not in effect.

Thank you Professor.

Please, how do I check the PO assumption for the following model?

orm(Y ~ X1 + X2 + X3 + ...)

Where I have multiple binary and continuous predictors, with the latter modeled as rcs().

I’ve tried different strategies (e.g., Figs. 13.1 and 13.2 from the RMS book) but don’t know which one to use.

My response variable has 37 distinct integer values ranging from 5 to 66.

If you had fewer variables I’d look at parallelism of logits of stratified ECDFs. For your situation I’d consider grouping Y into 5 intervals and running an analysis like the one in the Statistical Thinking post “Assessing the Proportional Odds Assumption and Its Impact”.
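A sketch of both checks (the predictor names and data frame `d` are assumptions; `Ecdf` is from Hmisc, which loads with `rms`):

```r
library(rms)

## (a) Logit-scale ECDFs stratified by one key binary predictor:
##     under PO the curves should be roughly parallel
Ecdf(~ Y, groups = X1, data = d, fun = qlogis)

## (b) Fit separate binary logistic models at several cutoffs of Y and
##     compare each predictor's coefficient across cutoffs; under PO
##     the coefficients should be roughly constant
for (cutoff in quantile(d$Y, c(.2, .4, .6, .8)))
  print(coef(lrm(Y >= cutoff ~ X1 + rcs(X2, 4), data = d)))
```

Check (b) is essentially the grouped-Y analysis suggested above, with the grouping done implicitly through the chosen cutoffs.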

I am trying to model a continuous outcome which is defined as “Total of correct: sum of right answers”. This simply means that the outcome can vary between 0 (0 correct answers on the test) and 36 (the maximum score you can get on the test).

As of now, I am transforming it to a score between 0 and 1, and model it using logistic regression. In previous studies, people have used linear regression with the raw score. Would this be a good case for using ordinal regression? Thanks.

This is ideal for ordinal regression, with no transformation and no collapsing of Y-levels. This will handle floor and ceiling effects, bimodality, non-normality, etc.
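A minimal sketch of such a fit (the variable names `score`, `age`, `group`, and the data frame `d` are assumptions):

```r
library(rms)
dd <- datadist(d); options(datadist = 'dd')

f <- orm(score ~ rcs(age, 4) + group, data = d)
f                            # treats every observed 0-36 level as an ordered category
M <- Mean(f)                 # creates a function computing the predicted mean score
Predict(f, group, fun = M)   # covariate effects expressed on the mean-score scale
```

`Mean(f)` lets you report effects on the familiar mean-score scale even though nothing about the score's distribution was assumed.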


Thank you for the quick response. I was wondering whether a probit link function would be appropriate for this case.

Probit and logit links are similar but effects are easier to interpret with logit.
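For reference, switching the link in `orm` is a one-argument change (a sketch; variable names are assumptions):

```r
library(rms)
f_logit  <- orm(score ~ rcs(age, 4) + group, data = d)  # logistic link (default)
f_probit <- orm(score ~ rcs(age, 4) + group, family = "probit", data = d)

## Logit coefficients are log odds ratios; probit coefficients are shifts
## in standard-normal units, which are harder to communicate.
AIC(f_logit); AIC(f_probit)
```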

Does anyone know why I’m getting this error message?

f <- orm(Y ~ rcs(Age, 3) + Sex + Time + X + X %ia% Age + X %ia% Sex + X %ia% Time, data = d, x = T, y = T)

b <- bootcov(f, B = 1000, coef.reps = T)

c <- contrast(b, list(X = 'Positive'), list(X = 'Negative'))

Error in bcoef[, w, drop = FALSE] : subscript out of bounds

See if changing to X %ia% rcs(Age, 3) works.
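That is, the `%ia%` interaction should use the same restricted cubic spline expansion as the main effect. A sketch of the corrected calls, using the data and variable names from the original post:

```r
f <- orm(Y ~ rcs(Age, 3) + Sex + Time + X +
           X %ia% rcs(Age, 3) + X %ia% Sex + X %ia% Time,
         data = d, x = TRUE, y = TRUE)
b <- bootcov(f, B = 1000, coef.reps = TRUE)
contrast(b, list(X = 'Positive'), list(X = 'Negative'))
```

The "subscript out of bounds" error can arise when the design-matrix columns `contrast` looks for don't match the ones stored by `bootcov`, which mismatched interaction bases can cause.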