Too crude to be true?

Harvesting the magical update() function. The image is CC by Waldemar Horwat.

The key to programming is being lazy; some have even called it a virtue. When I discovered the update() function it blew me away. Before long I had created a monster based upon this tiny function, allowing quick and easy output of regression tables that contain crude and adjusted estimates. In this post I’ll show you how to tame the printCrudeAndAdjusted() function in my Greg package and give a little look behind the scenes.

Although not always included, I think a regression table that shows both the crude and the adjusted estimates conveys important information. The crude estimate is also referred to as the unadjusted estimate; if it drops sharply in the adjusted model, we can suspect a strong confounder that steals some of the apparent effect of that variable.

A simple walk-through

We will start with basic usage of the function. For this I’m using the colon data set from the survival package. So first we prepare our data by generating a few factors:
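A minimal sketch of what that might look like (the 0/1 codings for sex and perfor are taken from the survival documentation; rx already comes as a factor):

library(survival)

# Recode the 0/1 columns of the colon data set into labelled factors
data(colon)
colon$sex    <- factor(colon$sex, levels = 0:1, labels = c("Female", "Male"))
colon$perfor <- factor(colon$perfor, levels = 0:1, labels = c("No", "Yes"))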

If the names are a little too shorthand and we want prettier ones, it is easy to set a label for each variable using the Hmisc label() function:
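Along these lines:

library(Hmisc)

# Descriptive labels that replace the raw column names in the output
label(colon$rx)     <- "Treatment"
label(colon$sex)    <- "Sex"
label(colon$perfor) <- "Colon perforation"
label(colon$age)    <- "Age (years)"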

The regression model is set up in the regular way. I use the rms package for a lot of my work, but regular lm(), glm() and coxph() should work just as well. If you’re not familiar with the rms package: it likes to have a summary of the data set stored in a datadist object.
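Something like this (the formula here is a sketch, picked to match the variables shown in the tables below):

library(rms)

# rms functions want a datadist summary of the data
dd <- datadist(colon)
options(datadist = "dd")

# Cox proportional hazards model; cph() is the rms counterpart of coxph()
fit <- cph(Surv(time, status) ~ rx + sex + perfor + age, data = colon)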

And now it is time for my monster function. I like having the reference category in the table, so I always add the add_references option:
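Roughly like so (double-check the argument name against your Greg version with ?printCrudeAndAdjusted):

library(Greg)

# Crude and adjusted estimates side by side;
# add_references inserts the reference-level rows
printCrudeAndAdjusted(fit, add_references = TRUE)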

Variable             Crude                    Adjusted
                     HR     2.5 % to 97.5 %   HR     2.5 % to 97.5 %
Treatment
  Obs                1      ref               1      ref
  Lev                0.98   0.84 to 1.14      0.98   0.84 to 1.14
  Lev+5FU            0.64   0.55 to 0.76      0.64   0.55 to 0.76
Sex
  Female             1      ref               1      ref
  Male               0.97   0.85 to 1.10      0.95   0.83 to 1.08
Colon perforation
  No                 1      ref               1      ref
  Yes                1.30   0.92 to 1.85      1.30   0.91 to 1.85
Age (years)          1.00   0.99 to 1.00      1.00   0.99 to 1.00

With event data like this it is also nice to have some information on the variables’ raw distribution, and we can easily combine the regression model with some descriptive data by adding the desc_column option:
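Something like:

# The same call, now with descriptive Total/Event columns added
printCrudeAndAdjusted(fit, add_references = TRUE, desc_column = TRUE)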

Variable             Total            Event              Crude                    Adjusted
                                                         HR     2.5 % to 97.5 %   HR     2.5 % to 97.5 %
Treatment
  Obs                630              345 (54.76 %)      1      ref               1      ref
  Lev                620              333 (53.71 %)      0.98   0.84 to 1.14      0.98   0.84 to 1.14
  Lev+5FU            608              242 (39.80 %)      0.64   0.55 to 0.76      0.64   0.55 to 0.76
Sex
  Female             890              444 (49.89 %)      1      ref               1      ref
  Male               968              476 (49.17 %)      0.97   0.85 to 1.10      0.95   0.83 to 1.08
Colon perforation
  No                 1804             888 (49.22 %)      1      ref               1      ref
  Yes                54               32 (59.26 %)       1.30   0.92 to 1.85      1.30   0.91 to 1.85
Age (years)          59.75 (± 11.95)  59.46 (± 12.35)    1.00   0.99 to 1.00      1.00   0.99 to 1.00

The function does not yet handle variables wrapped in transformation functions such as splines. Currently it skips these by removing any variable from the formula that matches the regular expression “^(ns|rcs|bs|pspline)[(]”. In the future I imagine it could be possible to add something that uses the rms:::contrast() function, although that will probably wait until the need arises.
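Just to illustrate what gets skipped:

# Spline-wrapped terms match the skip pattern, plain ones do not
grepl("^(ns|rcs|bs|pspline)[(]", c("rcs(age, 3)", "age"))
# [1]  TRUE FALSE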

What about update()?

As I mentioned in the introduction, the update() function is at the heart of all this. It can, on the fly, add and remove variables, swap data sets, and change other options of your model.

When I generate the crude estimates I simply remove all the other variables:
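For the model above that boils down to something like this (keeping, say, only the treatment variable):

# ". ~ ." keeps the outcome and current terms;
# subtracting the rest leaves rx alone in the model
crude_fit <- update(fit, . ~ . - sex - perfor - age)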

Nice, isn’t it? I find it really useful when I’m comparing AIC/BIC values, testing interactions, and so on.
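For instance, a quick (hypothetical) interaction check with the same trick:

# Add a treatment-by-sex interaction and compare the fits
fit_int <- update(fit, . ~ . + rx:sex)
AIC(fit, fit_int)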

Where is the monster?

While the update() function is appealing in its simplicity, much more needed to be in place for this to work. The basic function needs to figure out the type of regression, loop through the variables, and check for the intercept. It still needs some work; this was one of the first functions I created in my package, so the coding is a little clumsy, and you should feel free to suggest improvements. You can find the raw code under my GitHub account.

I hope you’ll find this as useful as I have.



8 Responses to Too crude to be true?

  1. Shige says:

    Excellent work. Very cool.

  2. Harold Baize says:

    Nice use of a photo of “Crude Awakening” from Burning Man 2007. Are you a burner?

  3. Tomas Karpati says:

    Wonderful work. The only thing I miss in the tables is p-values or stars on the side… (or possibly colouring those rows which are statistically significant)

    • Max Gordon says:

      Thanks. The main problem with p-values is that there is no pval() function that would correspond to confint(). Every time I need raw p-values I have to dig through the print/summary functions, and it is difficult to generalize. I guess vcov() could be used, but I’m uncomfortable inventing my own p-value calculator – I have, for instance, no idea how to calculate p-values for quantile regression.

      • Richard says:

        One possible way is to take the full model and then use the update() command on each of the variables in the formula to remove them one at a time, and then run an ANOVA to compare the full model against the full model minus one variable. That is what I’ve been doing, and it works really nicely because it switches seamlessly between categorical and single variables.

        Here is the (bad) code I’ve been using:

        fit <- lm(blah)  # the full model

        # Drop each term in turn and test it against the full model
        variables <- attr(terms(fit), "term.labels")
        pvalues <- c()
        for (i in variables) {
          smallFit <- eval(parse(text = paste("update(fit, . ~ . - ", i, ")", sep = "")))
          test <- anova(fit, smallFit)
          # the p-value sits in the second row of the Pr(>F) column
          pvalues <- c(pvalues, test$"Pr(>F)"[2])
        }

  4. Michael G says:

    How can I get the Greg package? It would enable me to get the regression tables I need for my data, but following your example as-is I am not able to.
