Forestplot ❤️ dplyr

The forestplot package now loves the tidyverse! The image is CC by Julie Jablonski.

Since initial publishing my forestplot package, dplyr and tidyverse have become evermore dominant in how we think about data and data management. I have therefore just published a 2.0 version that uses many of the awesome select & group features that make tidyverse such a compelling tool.

Basic change

Below is how you can work with some standard meta-analysis data. I assume that you have a base dataset with all your studies and then one additional with the summary data. We can easily append rows for header or empty rows without all the manual fuzz the previous solution required:

# Cochrane data
base_data <- tibble(mean  = c(0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017), 
                    lower = c(0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365),
                    upper = c(0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831),
                    study = c("Auckland", "Block", "Doran", "Gamsu", "Morrison", "Papageorgiou", "Tauesch"),
                    deaths_steroid = c("36", "1", "4", "14", "3", "1", "8"),
                    deaths_placebo = c("60", "5", "11", "20", "7", "7", "10"),
                    OR = c("0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02"))
summary <- tibble(mean  = 0.531, 
                  lower = 0.386,
                  upper = 0.731,
                  study = "Summary",
                  OR = "0.53",
                  summary = TRUE)
header <- tibble(study = c("", "Study"),
                 deaths_steroid = c("Deaths", "(steroid)"),
                 deaths_placebo = c("Deaths", "(placebo)"),
                 OR = c("", "OR"),
                 summary = TRUE)
empty_row <- tibble(mean = NA_real_)
cochrane_output_df <- bind_rows(header,
cochrane_output_df %>% 
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR), 
             is.summary = summary,
             clip = c(0.1, 2.5), 
             hrzl_lines = list("3" = gpar(lty = 2), 
                               "11" = gpar(lwd = 1, columns = 1:4, col = "#000044")),
             xlog = TRUE,
             col = fpColors(box = "royalblue",
                            line = "darkblue", 
                            summary = "royalblue",
                            hrz_lines = "#444444"))

Just as a comparison you can see the old code for achieving the same thing:

cochrane_from_rmeta <- structure(list(mean  = c(NA, NA, 0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017, NA, 0.531), 
                                      lower = c(NA, NA, 0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365, NA, 0.386),
                                      upper = c(NA, NA, 0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831, NA, 0.731)),
                                 .Names = c("mean", "lower", "upper"), 
                                 row.names = c(NA, -11L), 
                                 class = "data.frame")
tabletext <- cbind(c("", "Study", "Auckland", "Block", "Doran", "Gamsu", "Morrison", "Papageorgiou", "Tauesch", NA, "Summary"),
                   c("Deaths", "(steroid)", "36", "1", "4", "14", "3", "1", "8", NA, NA),
                   c("Deaths", "(placebo)", "60", "5", "11", "20", "7", "7", "10", NA, NA),
                   c("", "OR", "0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02", NA, "0.53"))
           hrzl_lines = list("3" = gpar(lty = 2), 
                             "11" = gpar(lwd = 1, columns = 1:4, col = "#000044")),
           is.summary = c(TRUE,TRUE,rep(FALSE,8),TRUE),
           clip = c(0.1,2.5), 
           xlog = TRUE,
           col = fpColors(box = "royalblue",
                          line = "darkblue", 
                          summary = "royalblue",
                          hrz_lines = "#444444"))

The difference perhaps doesn’t seem huge here but the ability to work directly with data.frame objects instead of matrices or lists makes life much easier.

Grouped coefficients

One of the features that was rather unique when I started working on this package was the ability to have multiple bands per row. This allows us to package data more densely and thus makes it easier to compare results.

dfHRQoL %>%
  group_by(group) %>%
  forestplot(legend_args = fpLegend(pos = list(x = .25, y = 0.95), 
                                    gp = gpar(col = "#CCCCCC", fill = "#F9F9F9")),
             fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .2, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             col = fpColors(box = c("blue", "darkred")),
             xlab = "EQ-5D index")

Do I have to change everything?

The major version number change is due to the API change, although most of you should not have to change anything with this update. The core things to look for are:

  • In loops/functions, you have to manually call print (or plot) on the returned object. This as the 2.0 returns an object that is only plotted once print is called on it – the same way ggplot works. This should be automatic unless you’ve encapsulated the forestplot function.
  • If you have provided a data.frame as your first input argument then there may be issues as this will result that the tidyselect syntax is used.

A few tips for beginners on how to engage with the open source community

Engaging the open source community can be challenging at start. The image is CC by Ryan Roberts.

I have been writing code in multiple languages since 1994 and for almost a decade now I have been active within the open source community, primarily in R but I’ve also published some JavaSript/TypeScript packages. Publishing something that other people enjoy is a pure joy and the fact that some of my packages have download counts in the thousands is something that truly warms my heart.

Throughout the years I have though noticed that there is newcomers to the open source community often struggle with how to get help, which is why I decided to write this post. I’ll start out with some basics and then a little more advanced topics such as how to write an issue or a pull request. Continue reading

Easiest flowcharts eveR?

A guide to flowcharts using my Gmisc package. The image is CC by Michael Staats.

A flowchart is a type of diagram that represents a workflow or process. The building blocks are boxes and the arrows that connect them. If you have submitted any research paper the last 10 years you have almost inevitably been asked to produce a flowchart on how you generated your data. While there are excellent click-and-draw tools I have always found it to be much nicer to code my charts. In this post I will go through some of the abilities in my Gmisc package that makes this process smoother. Continue reading

News in htmlTable 2.0

A short intro to the new features in htmlTable 2.0. The image is a blend based on a CC image by Ken Xu.

The htmlTable 2.0 package was just released on CRAN! It is my most downloaded package with 160 000+ downloads/month and this update is something that I have been wanting to do for a long time. For those of you that never encountered htmlTable it is a package that takes a `matrix`/`data.frame` and outputs a nicely formatted HTML table. When I created the package there weren’t that many alternatives and knitr was this new thing that everyone was excited about, magrittr with its ubiquitous `%>%` pipe had not even entered the scene. The current update should make it easier to streamline table look, separate layout from content and use tidyverse functionality. Continue reading

Long-awaited updates to htmlTable

Lets celebrate 2019 with some updates to my most popular package ever, the htmlTable. The image is CC by Thomas Hawk

One of the most pleasant surprises has been the popularity of my htmlTable-package with more than 100k downloads per month. This is all thanks to more popular packages relying on it and the web expanding beyond its original boundaries. As a thank you I have taken the time to update and fix some features (the 1.13 release) – enjoy! Continue reading

Dealing with non-proportional hazards in R

As things change over time so should our statistical models. The image is CC by Prad Prathivi(

As things change over time so should our statistical models. The image is CC by Prad Prathivi

Since I’m frequently working with large datasets and survival data I often find that the proportional hazards assumption for the Cox regressions doesn’t hold. In my most recent study on cardiovascular deaths after total hip arthroplasty the coefficient was close to zero when looking at the period between 5 and 21 years after surgery. Grambsch and Thernau’s test for non-proportionality hinted though of a problem and as I explored it there was a clear correlation between mortality and hip arthroplasty surgery. The effect increased over time, just as we had originally thought, see below figure. In this post I’ll try to show how I handle with non-proportional hazards in R. Continue reading

R trends in 2015 (based on cranlogs)

What are the current tRends? The image is CC from coco + kelly.

What are the current tRends? The image is CC from coco + kelly.

It is always fun to look back and reflect on the past year. Inspired by Christoph Safferling’s post on top packages from published in 2015, I decided to have my own go at the top R trends of 2015. Contrary to Safferling’s post I’ll try to also (1) look at packages from previous years that hit the big league, (2) what top R coders we have in the community, and then (2) round-up with my own 2015-R-experience. Continue reading

Introducing the htmlTable-package

How should we convey complex data? The image is is CC by Sacha Fernandez.

How should we convey complex data? The image is is CC by Sacha Fernandez.

My htmlTable-function has perhaps been one of my most successful projects. I developed it in order to get tables matching those available in top medical journals. As the function has grown I’ve decided to separate it from my Gmisc-package into a separate package, and at the time of writing this I’ve just released the 1.3 version. While htmlTable allows for creating plain tables without any fancy formatting (see usage vignette) it is primarily aimed at complex tables. In this post I’ll try to show you what you can do and how to tame some of the more advanced features. Continue reading

How-to go parallel in R – basics + tips

Don’t waist another second, start parallelizing your computations today! The image is CC by  Smudge 9000

Don’t waist another second, start parallelizing your computations today! The image is CC by Smudge 9000

Today is a good day to start parallelizing your code. I’ve been using the parallel package since its integration with R (v. 2.14.0) and its much easier than it at first seems. In this post I’ll go through the basics for implementing parallel computations in R, cover a few common pitfalls, and give tips on how to avoid them. Continue reading

An exercise in non-linear modeling

Finding the right curve can be tricky. The image is CC by Martin Gommel.

Finding the right curve can be tricky. The image is CC by Martin Gommel.

In my previous post I wrote about the importance of age and why it is a good idea to try avoiding modeling it as a linear variable. In this post I will go through multiple options for (1) modeling non-linear effects in a linear regression setting, (2) benchmark the methods on a real dataset, and (3) look at how the non-linearities actually look. The post is based on the supplement in my article on age and health-related quality of life (HRQoL). Continue reading