Fast-track publishing using knitr: exporting images for sharing and press (part III)

Images can be a powerful medium if used right. The photo is CC by alemdag.

Fast-track publishing using knitr is a short series on how I use knitr to speed up publishing in my research. This is the third article in the series, and it is devoted to plots. After reading this post you should know enough to (1) add auto-numbering to your figures, (2) decide on image formats, (3) choose an image resolution, and (4) get anti-aliasing working.

The series consists of five posts:

First post – an intro motivating knitr in writing your manuscript and a comparison of knitr to Word options.
Second post – setting up a .Rprofile and using a custom.css file.
Third post – getting your plots the way you want (current post).
Fourth post – generating tables.
Fifth post – summary and example.

Auto-numbering of figures

In knitr you use the chunk header to declare figure size, type, caption and more. Unfortunately, fig.cap does not work by default in markdown. There is a simple remedy for this using knitr’s “hooks”:

library(knitr)

# Notify that you want to use the counter,
# if you set the counter to 3 then it will use 
# that as starting number. You can also use strings
# if you for instance have a split figure with 
# a "1a" and "1b" setup
options(figure_counter = TRUE)

# If you want roman numerals then set: 
# options(figure_counter_roman = TRUE)

# Evaluate the figure caption after the chunk, 
# sometimes you want to calculate stuff inside the
# chunk that you want to include in the caption and
# it is therefore useful to evaluate it afterwards.
opts_knit$set(eval.after='fig.cap')

# The actual hook
knit_hooks$set(plot = function(x, options) {
  fig_fn = paste0(opts_knit$get("base.url"), 
                  paste(x, collapse = "."))

  # Some stuff from the default definition
  fig.cap <- knitr:::.img.cap(options)
  
  # Style and additional options that should be included in the img tag
  style=c("display: block",
          sprintf("margin: %s;",
                   switch(options$fig.align, 
                          left = 'auto auto auto 0', 
                          center = 'auto',
                          right = 'auto 0 auto auto')))
  # Certain arguments may not belong in style: width and height
  # are usually set as img attributes if they do not have a unit
  addon_args = ""
  
  # The loop is perhaps a little overcomplicated, but it makes it
  # easy to support more out.* parameters if necessary
  on <- grep("^out\\.(height|width)$", names(options), value = TRUE)
  for (out_name in on){
      out_value <- options[[out_name]]
      # Skip options that have not been set
      if (length(out_value) == 0)
          next
      dimName <- substr(out_name, 5, nchar(out_name))
      # Test the value (not the option name) for a CSS unit
      if (grepl("[0-9]+(em|px|%|pt|pc|in|cm|mm)", out_value))
          style=append(style, paste0(dimName, ": ", out_value))
      else
          addon_args = paste0(addon_args, " ", dimName, "='", out_value, "'")
  }
  
  # Add counter if wanted
  fig_number_txt <- ""
  cntr <- getOption("figure_counter", FALSE)
  if (cntr != FALSE){
    if (is.logical(cntr))
      cntr <- 1
    # The figure_counter_str allows for custom 
    # figure text, you may for instance want it in
    # bold: Figure %s:
    # The %s is so that you have the option of setting the
    # counter manually to 1a, 1b, etc if needed
    fig_number_txt <- 
      sprintf(getOption("figure_counter_str", "Figure %s: "), 
              ifelse(getOption("figure_counter_roman", FALSE), 
                     as.character(as.roman(cntr)), as.character(cntr)))
    
    if (is.numeric(cntr))
      options(figure_counter = cntr + 1)
  }
      
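  # Put it all together: the image inside a <figure> element with
  # the numbered caption in a <figcaption>
  paste0("<figure><img src='", fig_fn, "'",
         " ", addon_args,
         " style='", paste(style, collapse = "; "), "'>",
         "<figcaption>", fig_number_txt, fig.cap, "</figcaption></figure>")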
})

That’s it. Put this in your first knitr chunk and all your images with a caption will have a figure counter. If you want to reference a figure in your text, you can call getOption("figure_counter") to get the number of the next image. If you want roman numerals, just set options(figure_counter_roman=TRUE).
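To see what the counter options do without knitting anything, the numbering part of the hook can be pulled out into a small stand-alone function. This is only a sketch: figure_label() is a hypothetical helper name, not part of knitr, and it merely mirrors the counter logic of the hook above.

```r
# Hypothetical helper that mirrors the counter logic of the plot hook.
# It reads the same options: figure_counter, figure_counter_str and
# figure_counter_roman.
figure_label <- function() {
  cntr <- getOption("figure_counter", FALSE)
  if (identical(cntr, FALSE))
    return("")
  if (is.logical(cntr))
    cntr <- 1

  use_roman <- isTRUE(getOption("figure_counter_roman", FALSE))
  number <- if (use_roman && is.numeric(cntr)) {
    as.character(utils::as.roman(cntr))
  } else {
    as.character(cntr)
  }

  # Only numeric counters can be advanced; strings such as "1a" are
  # left for you to update by hand
  if (is.numeric(cntr))
    options(figure_counter = cntr + 1)

  sprintf(getOption("figure_counter_str", "Figure %s: "), number)
}

options(figure_counter = TRUE)
first  <- figure_label()   # uses 1, then stores 2 for the next call
second <- figure_label()   # uses 2

options(figure_counter = 3, figure_counter_roman = TRUE)
third <- figure_label()    # roman numeral for 3

options(figure_counter = "1a", figure_counter_roman = FALSE)
split_fig <- figure_label()  # string counters are used as-is

# Reset the options so the sketch leaves no trace
options(figure_counter = NULL, figure_counter_roman = NULL)
```

Because the counter lives in options(), it is shared by every chunk in the document, which is what makes the running numbering possible.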

Image formats

When preparing your manuscript you will need images for two different purposes: small, portable images for sharing, and images suited for press. Knitr allows you to quickly switch from one to the other by adjusting the dev and dpi chunk options. As a general rule of thumb you want PNG for including images in your Word document and EPS for press. Below I go through these formats and a few more.

Basics

There are two major categories of image formats that you need to be aware of:

Vector formats: A vector image is a set of connections between points. These connections can generate lines or fills (polygons, shapes etc.), and are therefore well suited for plots. The major advantage of vector graphics is that you can scale them losslessly to any desired size.
Common vector file formats: SVG (Scalable Vector Graphics), PDF (Portable Document Format), PS (PostScript), and EPS (Encapsulated PostScript). Of these, EPS is the one most commonly supported by journals; unfortunately I’ve had trouble sharing (my favorite) SVG files.
Raster formats: This is the dominant image type, useful for photos and similar applications but less suited for plots. Here you have a grid where each cell is a pixel with a set color, and the size of the grid is the resolution. The major downside of raster images is that if you enlarge them the square pixels become visible, i.e. you get rough edges like in old video games. This group can be further divided into lossy formats, such as JPEG, and lossless formats, such as PNG. This only indicates whether the image compression loses information or retains every detail; it is not the same as the lossless resizing of vector formats.
Common raster file formats: PNG (Portable Network Graphics), JPEG/JPG (Joint Photographic Experts Group), and TIFF (Tagged Image File Format).

Sharing images

Although SVG is my favorite format, you can’t insert SVG files into a Word document (at least not in my 2010 version). I therefore rely on the PNG format at 96 DPI for sharing. Telling knitr to use this for all the images in your document is really easy: just add this code before any plots (add it only once in your document):

library(knitr)
opts_chunk$set(dev="png", 
               dev.args=list(type="cairo"),
               dpi=96)

If you browse your figure folder (located in the same folder as your Rmd document) you will find all the PNG images after knitting. An important detail is that you want to stop these images from being embedded in the HTML document that knitr generates, as LibreOffice/Word can’t handle them; see my previous post on setting up a .Rprofile.

Note: you can also set the dev option for each chunk, but since you usually want all images to be of the same type I prefer to use opts_chunk. For this to work smoothly even when I don’t knit the document, I make sure to load the knitr package to avoid any

Error: object 'opts_chunk' not found
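Another way to keep the setup chunk harmless outside a knit is to call the function through its namespace instead of relying on library(knitr). A minimal sketch, assuming knitr is installed; the requireNamespace() guard is an extra precaution and not part of the setup described above.

```r
# knitr::opts_chunk works even if library(knitr) has not been called,
# so the chunk can be run interactively without the error above.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "png",
                        dev.args = list(type = "cairo"),
                        dpi = 96)
  current_dev <- knitr::opts_chunk$get("dev")
} else {
  # knitr is not installed: skip the setup instead of failing
  current_dev <- NA_character_
}
```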

Press

Vector graphics are excellent for publication and are my preferred way of exporting. Unfortunately few journals accept SVG and you are often stuck with the somewhat limited EPS format. Setting up EPS output is really easy, just change the previous code into:

library(knitr)
opts_chunk$set(dev='postscript')

If you browse your figure folder you should now see all the EPS images after knitting. The main problem I’ve had with EPS is that the format does not handle transparency. For instance, you may have generated a beautiful X, and using the PNG format you get this:

But when you open your EPS image in Inkscape the transparent polygon has suddenly been removed:

If you remove the image transparency you can get a nice image but with less finesse:

If you have transparencies and want to retain them, I recommend that you try the TIFF format when submitting to journals. Journals usually support it, but make sure you compress the images using the compression="lzw" argument; otherwise your images may become huge and can actually surpass the journal’s maximum image size.

library(knitr)
opts_chunk$set(dev="tiff", 
               dev.args=list(compression="lzw"),
               dpi=300)

# The code for the x-mark
library(ggplot2)
polygon_df1 <- data.frame(x=c(0,0.75,1,.25), y=c(0,1,1,0))
polygon_df2 <- data.frame(x=c(0,0.75,1,.25), y=c(1,0,0,1))
ggplot(polygon_df1, aes(x=x, y=y)) +
  geom_polygon(fill="steelblue", col="steelblue") +
  geom_polygon(data=polygon_df2, fill="#55558899", col="#55558899") +
  scale_x_continuous(expand = c(0,0)) + 
  scale_y_continuous(expand = c(0,0)) +
  xlab("") + ylab("") + 
  theme(line = element_blank(),
        text = element_blank(),
        title = element_blank())
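The three setups used so far (PNG for sharing, EPS for press, and TIFF when transparency matters) can be gathered in one place so that switching between them is a one-word change. The helper name figure_settings() and its target labels are hypothetical and only illustrate the idea; the option values are the ones from this post.

```r
# Hypothetical helper: returns the chunk options used in this post for
# a given purpose. "share" = PNG, "press" = EPS, and
# "press_transparent" = LZW-compressed TIFF.
figure_settings <- function(target = c("share", "press", "press_transparent")) {
  target <- match.arg(target)
  switch(target,
         share = list(dev = "png",
                      dev.args = list(type = "cairo"),
                      dpi = 96),
         press = list(dev = "postscript"),
         press_transparent = list(dev = "tiff",
                                  dev.args = list(compression = "lzw"),
                                  dpi = 300))
}

settings <- figure_settings("press_transparent")

# Apply the settings when knitr is available
if (requireNamespace("knitr", quietly = TRUE))
  do.call(knitr::opts_chunk$set, settings)
```

Calling figure_settings() without an argument falls back to the first target, i.e. the PNG setup for sharing.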

Resolution (DPI)

For screen output use 96 or 120 DPI, while for print you use either 300 or 600 DPI. DPI stands for Dots Per Inch and applies only to rasterized images. R combines the image width (in inches) with the DPI to decide how many pixels the graphic gets. A given width therefore results in an image with a certain number of pixels: with a low DPI the image will appear small on screen since there are few pixels, while a high DPI results in a large image.
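The arithmetic is simply inches multiplied by DPI. A small sketch, assuming a 7 x 7 inch figure (knitr's default fig.width and fig.height) and 4 bytes per pixel for the rough size estimate; pixel_size() is a made-up helper name.

```r
# Pixel dimensions of a raster image: inches multiplied by DPI
pixel_size <- function(width_in, height_in, dpi) {
  c(width = width_in * dpi, height = height_in * dpi)
}

screen   <- pixel_size(7, 7, dpi = 96)    # 672 x 672 pixels
print_px <- pixel_size(7, 7, dpi = 300)   # 2100 x 2100 pixels

# Rough uncompressed size in megabytes, assuming 4 bytes per pixel (RGBA)
raw_mb <- prod(print_px) * 4 / 1e6        # 17.64
```

The last number hints at why uncompressed raster images for print can get large.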

DPI comes with a long history, and it is important to remember that there is a difference between print and screen. Originally Macintosh (Apple) used 72 DPI; this was later increased to 96 on Microsoft computers.

I use 96 DPI for screen resolution as, in my opinion, it gives images of roughly the size that I want. Paper/print, on the other hand, is always high-resolution, and anything below 300 DPI will appear to be of poor quality.

Anti-aliasing

Anti-aliasing is probably the simplest change you can make to your plots for a professional look. While all vector images are automatically anti-aliased, you need to add this to rasterized images using the option type="cairo". I have previously dedicated a whole post to dealing with the Cairo and cairoDevice packages, just to find out that these are obsolete in more recent R versions. To get this into knitr, all you need to add is a dev.args list that contains type="cairo":

opts_chunk$set(dev="png", 
               dev.args=list(type="cairo"),
               dpi=96)

Note that the antialias argument seems to do nothing for the actual graphics; compare the three alternatives below (left to right):

dev.args = list(type="windows") | dev.args = list(type="windows", antialias="cleartype") | dev.args = list(type="cairo")

It is a subtle difference, but without it the plot looks unrefined, especially if you have a poor screen. Another thing that is good to know is that fills are not anti-aliased. You therefore need to add a thin line in the same color around your fills to get the desired anti-aliasing. Base (plain) and lattice plots both have the thin line by default, while for ggplot2 you need to explicitly declare that you want the line; see how I use the col= and fill= arguments to generate the plots above.

library(ggplot2)
line <- data.frame(x=c(0.25,1), y=c(1,.45))
polygon <- data.frame(x=c(0,0.75,1,0), y=c(.75,.20,.20,1))
aa <- ggplot(line, aes(x=x, y=y)) +
  geom_line(col="steelblue", lwd=2) +
  geom_polygon(data=polygon, fill="#555588", col="#555588") +
  scale_x_continuous(expand = c(0,0)) + 
  scale_y_continuous(expand = c(0,0)) +
  theme_bw() + 
  xlab("") + ylab("") + 
  theme(line = element_blank(),
        text = element_blank())

aa + annotate("text", label="Not\nanti-\naliased", 
           size=6, y=.93, x=.7)

Or compare these two plots:

Two basic plots, without and with line colors. Note that the one to the right is properly anti-aliased.
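A comparison of this kind can be sketched in base graphics with polygon(), where border = NA drops the outline and setting border to the fill color adds it. The function name draw_fills(), the temporary file and the coordinates are assumptions for illustration and not the code behind the figure; the sketch only draws when R has cairo support.

```r
# Hypothetical sketch: the same filled polygon without and with a
# border in the fill color, written to a cairo PNG at 96 DPI
draw_fills <- function(file, type = "cairo") {
  png(file, width = 7, height = 3.5, units = "in", res = 96, type = type)
  on.exit(dev.off())
  par(mfrow = c(1, 2), mar = c(1, 1, 2, 1))
  x <- c(0, 0.75, 1, 0)
  y <- c(0.75, 0.2, 0.2, 1)

  # Left panel: fill only, so the edge of the fill is not anti-aliased
  plot.new()
  title("border = NA")
  polygon(x, y, col = "#555588", border = NA)

  # Right panel: a thin line in the fill color smooths the edge
  plot.new()
  title("border = fill color")
  polygon(x, y, col = "#555588", border = "#555588")

  invisible(file)
}

if (capabilities("cairo"))
  out_file <- draw_fills(tempfile(fileext = ".png"))
```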

Previous posts in this series

First post - an intro motivating knitr in writing your manuscript and a comparison of knitr to Word options.
Second post - setting up a .Rprofile and using a custom.css file.

7 thoughts on “Fast-track publishing using knitr: exporting images for sharing and press (part III)”

tylerrinker on January 8, 2014 at 16:04 said:
I may have messed something up but line 68 in the first code chunk throws an error: Error in ifelse(getOption("figure_counter_roman", FALSE), as.character(as.roman(tc)), : object 'tc' not found
- Max Gordon on January 8, 2014 at 20:58 said:
  Sorry, forgot to update the roman section. I’ve updated the code so that it should work.
RoonyJ on January 9, 2014 at 22:30 said:
Thanks a lot for this great series! I was (almost) able to follow you through the tutorials I to III.
But I got stuck with the figure captions. I can’t figure out two things:
1) What would be the best way to deal with long captions which need to be spread over multiple lines? I can’t directly add these to the chunk options (or can I??).
2) I was wondering how to deal with chunks which create multiple (different) plots within a loop. It would be great if I could set the fig.cap just before calling each separate plot. I tried to use opts_chunk$set(fig.cap='my caption goes here') but that does not seem to work.
- Max Gordon on January 10, 2014 at 16:12 said:
  Thanks, I’m planning on having a summary example at the end of the series that may help with some of the details.
  1) I find that long captions are conveniently prepared in the chunk using paste0():
```{r awesome_figure, fig.cap=fig.txt}
fig.txt = paste0("My amazingly witty and smart figure caption",
                 " text goes here. The function paste0 is very",
                 " convenient as it doesn't add the space character",
                 " between each concatenating string.")
# Add some plot code...
```
If you want line breaks within your caption you just add <br /> (= break).
2) I don’t think this is possible, I’m also not sure it is that useful in the context of fast-track publishing. I rarely see an article with more than 5 images, these can easily be set up in separate chunks. If the images are similar I usually generate a plot function that I call with the varying element in each chunk.
Yihui on January 10, 2014 at 05:17 said:
R’s png() device does have the capability of anti-aliasing using the cairo option (which is the default so you do not actually need to set up the ‘dev.args’ option), but there is a special case in which the Cairo package is still better: http://r.789695.n4.nabble.com/png-type-cairo-point-symbols-without-boarders-are-not-anti-aliased-td4678745.html Anyway, it is very easy to switch to the Cairo package in knitr — just set the ‘dev’ option to ‘CairoPNG’.
- Max Gordon on January 10, 2014 at 16:14 said:
  Thank you Yihui! I was not aware of that detail, I’m really flattered that you read my posts 🙂
RoonyJ on January 11, 2014 at 10:08 said:
Thanks a lot! I am looking forward to your summary example. I am using your setup to perform the reporting in a (genetics) analysis pipeline. The paste0 approach will be very convenient. Your answer gave me the idea that using child chunks could do the trick to handle multiple figures within a loop. I am going to investigate its possibilities, thanks for the idea :)!

G-Forge

A blog about orthopaedic surgery, R, research and more