Forestplot ❤️ dplyr

The forestplot package now loves the tidyverse! The image is CC by Julie Jablonski.

Since initial publishing my forestplot package, dplyr and tidyverse have become evermore dominant in how we think about data and data management. I have therefore just published a 2.0 version that uses many of the awesome select & group features that make tidyverse such a compelling tool.

Basic change

Below is how you can work with some standard meta-analysis data. I assume that you have a base dataset with all your studies and then one additional with the summary data. We can easily append rows for header or empty rows without all the manual fuzz the previous solution required:

# Cochrane data
base_data <- tibble(mean  = c(0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017), 
                    lower = c(0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365),
                    upper = c(0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831),
                    study = c("Auckland", "Block", "Doran", "Gamsu", "Morrison", "Papageorgiou", "Tauesch"),
                    deaths_steroid = c("36", "1", "4", "14", "3", "1", "8"),
                    deaths_placebo = c("60", "5", "11", "20", "7", "7", "10"),
                    OR = c("0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02"))
 
summary <- tibble(mean  = 0.531, 
                  lower = 0.386,
                  upper = 0.731,
                  study = "Summary",
                  OR = "0.53",
                  summary = TRUE)
 
header <- tibble(study = c("", "Study"),
                 deaths_steroid = c("Deaths", "(steroid)"),
                 deaths_placebo = c("Deaths", "(placebo)"),
                 OR = c("", "OR"),
                 summary = TRUE)
 
empty_row <- tibble(mean = NA_real_)
 
cochrane_output_df <- bind_rows(header,
                                base_data,
                                empty_row,
                                summary)
 
cochrane_output_df %>% 
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR), 
             is.summary = summary,
             clip = c(0.1, 2.5), 
             hrzl_lines = list("3" = gpar(lty = 2), 
                               "11" = gpar(lwd = 1, columns = 1:4, col = "#000044")),
             xlog = TRUE,
             col = fpColors(box = "royalblue",
                            line = "darkblue", 
                            summary = "royalblue",
                            hrz_lines = "#444444"))

Just as a comparison you can see the old code for achieving the same thing:

cochrane_from_rmeta <- structure(list(mean  = c(NA, NA, 0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017, NA, 0.531), 
                                      lower = c(NA, NA, 0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365, NA, 0.386),
                                      upper = c(NA, NA, 0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831, NA, 0.731)),
                                 .Names = c("mean", "lower", "upper"), 
                                 row.names = c(NA, -11L), 
                                 class = "data.frame")
 
tabletext <- cbind(c("", "Study", "Auckland", "Block", "Doran", "Gamsu", "Morrison", "Papageorgiou", "Tauesch", NA, "Summary"),
                   c("Deaths", "(steroid)", "36", "1", "4", "14", "3", "1", "8", NA, NA),
                   c("Deaths", "(placebo)", "60", "5", "11", "20", "7", "7", "10", NA, NA),
                   c("", "OR", "0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02", NA, "0.53"))
 
forestplot(tabletext, 
           hrzl_lines = list("3" = gpar(lty = 2), 
                             "11" = gpar(lwd = 1, columns = 1:4, col = "#000044")),
           cochrane_from_rmeta,
           is.summary = c(TRUE,TRUE,rep(FALSE,8),TRUE),
           clip = c(0.1,2.5), 
           xlog = TRUE,
           col = fpColors(box = "royalblue",
                          line = "darkblue", 
                          summary = "royalblue",
                          hrz_lines = "#444444"))

The difference perhaps doesn’t seem huge here but the ability to work directly with data.frame objects instead of matrices or lists makes life much easier.

Grouped coefficients

One of the features that was rather unique when I started working on this package was the ability to have multiple bands per row. This allows us to package data more densely and thus makes it easier to compare results.

data(dfHRQoL)
dfHRQoL %>%
  group_by(group) %>%
  forestplot(legend_args = fpLegend(pos = list(x = .25, y = 0.95), 
                                    gp = gpar(col = "#CCCCCC", fill = "#F9F9F9")),
             fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .2, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             col = fpColors(box = c("blue", "darkred")),
             xlab = "EQ-5D index")

Do I have to change everything?

The major version number change is due to the API change, although most of you should not have to change anything with this update. The core things to look for are:

  • In loops/functions, you have to manually call print (or plot) on the returned object. This as the 2.0 returns an object that is only plotted once print is called on it – the same way ggplot works. This should be automatic unless you’ve encapsulated the forestplot function.
  • If you have provided a data.frame as your first input argument then there may be issues as this will result that the tidyselect syntax is used.