It is always fun to look back and reflect on the past year. Inspired by Christoph Safferling’s post on the top packages published in 2015, I decided to have my own go at the top R trends of 2015. Unlike Safferling’s post, I’ll also try to (1) look at packages from previous years that hit the big league, (2) see which top R coders we have in the community, and then (3) round up with my own 2015 R experience.
Everything in this post is based on the CRANberries reports. To harvest the information I’ve borrowed shamelessly from Safferling’s post, with some modifications. He used the number of downloads as a proxy for the package release date, while I decided to use the actual release date; if that wasn’t available, I scraped it off the CRAN servers. The script also retrieves the package author(s) and description (see the code below for details).
library(rvest)
library(dplyr)
# devtools::install_github("hadley/multidplyr")
library(multidplyr)
library(magrittr)
library(lubridate)
getCranberriesElmnt <- function(txt, elmnt_name){
desc <- grep(sprintf("^%s:", elmnt_name), txt)
if (length(desc) == 1){
txt <- txt[desc:length(txt)]
end <- grep("^[A-Za-z/@]{2,}:", txt[-1])
if (length(end) == 0)
end <- length(txt)
else
end <- end[1]
desc <-
txt[1:end] %>%
gsub(sprintf("^%s: (.+)", elmnt_name),
"\\1", .) %>%
paste(collapse = " ") %>%
gsub("[ ]{2,}", " ", .) %>%
gsub(" , ", ", ", .)
}else if (length(desc) == 0){
desc <- paste("No", tolower(elmnt_name))
}else{
stop("Could not find ", elmnt_name, " in text: \n",
paste(txt, collapse = "\n"))
}
return(desc)
}
convertCharset <- function(txt){
if (grepl("Windows", Sys.info()["sysname"]))
txt <- iconv(txt, from = "UTF-8", to = "cp1252")
return(txt)
}
getAuthor <- function(txt, package){
author <- getCranberriesElmnt(txt, "Author")
if (grepl("No author|See AUTHORS file", author)){
author <- getCranberriesElmnt(txt, "Maintainer")
}
if (grepl("(No m|M)aintainer|(No a|A)uthor|^See AUTHORS file", author) ||
is.null(author) ||
nchar(author) <= 2){
cran_txt <- read_html(sprintf("http://cran.r-project.org/web/packages/%s/index.html",
package))
author <- cran_txt %>%
html_nodes("tr") %>%
html_text %>%
convertCharset %>%
gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>%
.[grep("^Author", .)] %>%
gsub(".*\n", "", .)
# If not found then the package has probably been
# removed from the repository
if (length(author) == 1)
author <- author
else
author <- "No author"
}
# Remove stuff such as:
# [cre, auth]
# (worked on the...)
#
# "John Doe"
author %<>%
gsub("^Author: (.+)",
"\\1", .) %>%
gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", .) %>%
gsub("\\([^)]+\\)", " ", .) %>%
gsub("([ ]*<[^>]+>)", " ", .) %>%
gsub("[ ]*\\[[^]]{3,}\\][ ]*", " ", .) %>%
gsub("[ ]{2,}", " ", .) %>%
gsub("(^[ '\"]+|[ '\"]+$)", "", .) %>%
gsub(" , ", ", ", .)
return(author)
}
getDate <- function(txt, package){
date <-
grep("^Date/Publication", txt)
if (length(date) == 1){
date <- txt[date] %>%
gsub("Date/Publication: ([0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2}).*",
"\\1", .)
}else{
cran_txt <- read_html(sprintf("http://cran.r-project.org/web/packages/%s/index.html",
package))
date <-
cran_txt %>%
html_nodes("tr") %>%
html_text %>%
convertCharset %>%
gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>%
.[grep("^Published", .)] %>%
gsub(".*\n", "", .)
# The main page doesn't contain the original date if
# new packages have been submitted, we therefore need
# to check first entry in the archives
if(cran_txt %>%
html_nodes("tr") %>%
html_text %>%
gsub("(^[ \t\n]+|[ \t\n]+$)", "", .) %>%
grepl("^Old.{1,4}sources", .) %>%
any){
archive_txt <- read_html(sprintf("http://cran.r-project.org/src/contrib/Archive/%s/",
package))
pkg_date <-
archive_txt %>%
html_nodes("tr") %>%
lapply(function(x) {
nodes <- html_nodes(x, "td")
if (length(nodes) == 5){
return(nodes[3] %>%
html_text %>%
as.Date(format = "%d-%b-%Y"))
}
}) %>%
.[sapply(., length) > 0] %>%
.[!sapply(., is.na)] %>%
head(1)
if (length(pkg_date) == 1)
date <- pkg_date[[1]]
}
}
date <- tryCatch({
as.Date(date)
}, error = function(e){
"Date missing"
})
return(date)
}
getNewPkgStats <- function(published_in){
# The parallel is only for making cranlogs requests
# we can therefore have more cores than actual cores
# as this isn't processor intensive while there is
# considerable wait for each http-request
cl <- create_cluster(parallel::detectCores() * 4)
parallel::clusterEvalQ(cl, {
library(cranlogs)
})
set_default_cluster(cl)
on.exit(stop_cluster())
berries <- read_html(paste0("http://dirk.eddelbuettel.com/cranberries/", published_in, "/"))
pkgs <-
# Select the divs of the package class
html_nodes(berries, ".package") %>%
# Extract the text
html_text %>%
# Split the lines
strsplit("[\n]+") %>%
# Now clean the lines
lapply(.,
function(pkg_txt) {
pkg_txt[sapply(pkg_txt, function(x) { nchar(gsub("^[ \t]+", "", x)) > 0},
USE.NAMES = FALSE)] %>%
gsub("^[ \t]+", "", .)
})
# Now we select the new packages
new_packages <-
pkgs %>%
# The first line is key as it contains the text "New package"
sapply(., function(x) x[1], USE.NAMES = FALSE) %>%
grep("^New package", .) %>%
pkgs[.] %>%
# Now we extract the package name and the date that it was published
# and merge everything into one table
lapply(function(txt){
txt <- convertCharset(txt)
ret <- data.frame(
name = gsub("^New package ([^ ]+) with initial .*",
"\\1", txt[1]),
stringsAsFactors = FALSE
)
ret$desc <- getCranberriesElmnt(txt, "Description")
ret$author <- getAuthor(txt, ret$name)
ret$date <- getDate(txt, ret$name)
return(ret)
}) %>%
rbind_all %>%
# Get the download data in parallel
partition(name) %>%
do({
down <- cran_downloads(.$name[1],
from = max(as.Date("2015-01-01"), .$date[1]),
to = "2015-12-31")$count
cbind(.[1,],
data.frame(sum = sum(down),
avg = mean(down))
)
}) %>%
collect %>%
ungroup %>%
arrange(desc(avg))
return(new_packages)
}
pkg_list <-
lapply(2010:2015,
getNewPkgStats)
pkgs <-
rbind_all(pkg_list) %>%
mutate(time = as.numeric(as.Date("2016-01-01") - date),
year = format(date, "%Y"))
Downloads and time on CRAN
The longer a package has been on CRAN, the more it gets downloaded. We can illustrate this using simple linear regression; somewhat surprisingly, the relationship behaves mostly linearly:
pkgs %<>%
mutate(time_yrs = time/365.25)
fit <- lm(avg ~ time_yrs, data = pkgs)
# Test for non-linearity
library(splines)
anova(fit,
update(fit, .~.-time_yrs+ns(time_yrs, 2)))
Analysis of Variance Table

Model 1: avg ~ time
Model 2: avg ~ ns(time, 2)
  Res.Df       RSS Df Sum of Sq      F Pr(>F)
1   7348 189661922
2   7347 189656567  1    5355.1 0.2075 0.6488
The average number of downloads increases by about 5 downloads per year. It can easily be argued that the average number of downloads isn’t that interesting since the data is skewed; we can therefore also look at the upper quantiles using quantile regression:
library(quantreg)
library(htmlTable)
lapply(c(.5, .75, .95, .99),
function(tau){
rq_fit <- rq(avg ~ time_yrs, data = pkgs, tau = tau)
rq_sum <- summary(rq_fit)
c(Estimate = txtRound(rq_sum$coefficients[2, 1], 1),
`95 % CI` = txtRound(rq_sum$coefficients[2, 1] +
c(1,-1) * rq_sum$coefficients[2, 2], 1) %>%
paste(collapse = " to "))
}) %>%
do.call(rbind, .) %>%
htmlTable(rnames = c("Median",
"Upper quartile",
"Top 5%",
"Top 1%"))
Quantile | Estimate | 95 % CI
---|---|---
Median | 0.6 | 0.6 to 0.6
Upper quartile | 1.2 | 1.2 to 1.1
Top 5% | 9.7 | 11.9 to 7.6
Top 1% | 182.5 | 228.2 to 136.9
The above table conveys a slightly more interesting picture. Most packages don't get that much attention, while the top 1% truly reach the masses.
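To see just how skewed the distribution is, one can also look directly at the empirical quantiles of the average daily downloads (a quick check on the pkgs data from above, not part of the original analysis):
# Most packages see only a handful of downloads per day,
# while the top percentile is orders of magnitude higher
quantile(pkgs$avg, probs = c(.5, .75, .95, .99))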
Top downloaded packages
In order to investigate which packages R users have been using during 2015, I've looked at all new packages since the turn of the decade. Since each year of CRAN presence increases the download rates, I've split the table by package release date. The results are available for browsing below (yes, it is the brand new interactive htmlTable that allows you to collapse cells; note that the interactivity may not work if you are reading this on R-bloggers, as the links are lost under certain circumstances).
Name | Author | Total downloads | Average/day | Description
---|---|---|---|---
**Top 10 packages published in 2015** | | | |
xml2 | Hadley Wickham, Jeroen Ooms, RStudio, R Foundation | 348,222 | 1635 | Work with XML files ... | ||
rversions | Gabor Csardi | 386,996 | 1524 | Query the main R SVN... | ||
git2r | Stefan Widgren | 411,709 | 1303 | Interface to the lib... | ||
praise | Gabor Csardi, Sindre Sorhus | 96,187 | 673 | Build friendly R pac... | ||
readxl | David Hoerl | 99,386 | 379 | Import excel files i... | ||
readr | Hadley Wickham, Romain Francois, R Core Team, RStudio | 90,022 | 337 | Read flat/tabular te... | ||
DiagrammeR | Richard Iannone | 84,259 | 236 | Create diagrams and ... | ||
visNetwork | Almende B.V. (vis.js library in htmlwidgets/lib, | 41,185 | 233 | Provides an R interf... | ||
plotly | Carson Sievert, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, Pedro Despouy | 9,745 | 217 | Easily translate ggp... | ||
DT | Yihui Xie, Joe Cheng, jQuery contributors, SpryMedia Limited, Brian Reavis, Leon Gersen, Bartek Szopka, RStudio Inc | 24,806 | 120 | Data objects in R ca... | ||
**Top 10 packages published in 2014** | | | |
stringi | Marek Gagolewski and Bartek Tartanus ; IBM and other contributors ; Unicode, Inc. | 1,316,900 | 3608 | stringi allows for v... | ||
magrittr | Stefan Milton Bache and Hadley Wickham | 1,245,662 | 3413 | Provides a mechanism... | ||
mime | Yihui Xie | 1,038,591 | 2845 | This package guesses... | ||
R6 | Winston Chang | 920,147 | 2521 | The R6 package allow... | ||
dplyr | Hadley Wickham, Romain Francois | 778,311 | 2132 | A fast, consistent t... | ||
manipulate | JJ Allaire, RStudio | 626,191 | 1716 | Interactive plotting... | ||
htmltools | RStudio, Inc. | 619,171 | 1696 | Tools for HTML gener... | ||
curl | Jeroen Ooms | 599,704 | 1643 | The curl() function ... | ||
lazyeval | Hadley Wickham, RStudio | 572,546 | 1569 | A disciplined approa... | ||
rstudioapi | RStudio | 515,665 | 1413 | This package provide... | ||
**Top 10 packages published in 2013** | | | |
jsonlite | Jeroen Ooms, Duncan Temple Lang | 906,421 | 2483 | This package is a fo... | ||
BH | John W. Emerson, Michael J. Kane, Dirk Eddelbuettel, JJ Allaire, and Romain Francois | 691,280 | 1894 | Boost provides free ... | ||
highr | Yihui Xie and Yixuan Qiu | 641,052 | 1756 | This package provide... | ||
assertthat | Hadley Wickham | 527,961 | 1446 | assertthat is an ext... | ||
httpuv | RStudio, Inc. | 310,699 | 851 | httpuv provides low-... | ||
NLP | Kurt Hornik | 270,682 | 742 | Basic classes and me... | ||
TH.data | Torsten Hothorn | 242,060 | 663 | Contains data sets u... | ||
NMF | Renaud Gaujoux, Cathal Seoighe | 228,807 | 627 | This package provide... | ||
stringdist | Mark van der Loo | 123,138 | 337 | Implements the Hammi... | ||
SnowballC | Milan Bouchet-Valat | 104,411 | 286 | An R interface to th... | ||
**Top 10 packages published in 2012** | | | |
gtable | Hadley Wickham | 1,091,440 | 2990 | Tools to make it eas... | ||
knitr | Yihui Xie | 792,876 | 2172 | This package provide... | ||
httr | Hadley Wickham | 785,568 | 2152 | Provides useful tool... | ||
markdown | JJ Allaire, Jeffrey Horner, Vicent Marti, and Natacha Porte | 636,888 | 1745 | Markdown is a plain-... | ||
Matrix | Douglas Bates and Martin Maechler | 470,468 | 1289 | Classes and methods ... | ||
shiny | RStudio, Inc. | 427,995 | 1173 | Shiny makes it incre... | ||
lattice | Deepayan Sarkar | 414,716 | 1136 | Lattice is a powerfu... | ||
pkgmaker | Renaud Gaujoux | 225,796 | 619 | This package provide... | ||
rngtools | Renaud Gaujoux | 225,125 | 617 | This package contain... | ||
base64enc | Simon Urbanek | 223,120 | 611 | This package provide... | ||
**Top 10 packages published in 2011** | | | |
scales | Hadley Wickham | 1,305,000 | 3575 | Scales map data to a... | ||
devtools | Hadley Wickham | 738,724 | 2024 | Collection of packag... | ||
RcppEigen | Douglas Bates, Romain Francois and Dirk Eddelbuettel | 634,224 | 1738 | R and Eigen integrat... | ||
fpp | Rob J Hyndman | 583,505 | 1599 | All data sets requir... | ||
nloptr | Jelmer Ypma | 583,230 | 1598 | nloptr is an R inter... | ||
pbkrtest | Ulrich Halekoh Søren Højsgaard | 536,409 | 1470 | Test in linear mixed... | ||
roxygen2 | Hadley Wickham, Peter Danenberg, Manuel Eugster | 478,765 | 1312 | A Doxygen-like in-so... | ||
whisker | Edwin de Jonge | 413,068 | 1132 | logicless templating... | ||
doParallel | Revolution Analytics | 299,717 | 821 | Provides a parallel ... | ||
abind | Tony Plate and Richard Heiberger | 255,151 | 699 | Combine multi-dimens... | ||
**Top 10 packages published in 2010** | | | |
reshape2 | Hadley Wickham | 1,395,099 | 3822 | Reshape lets you fle... | ||
labeling | Justin Talbot | 1,104,986 | 3027 | Provides a range of ... | ||
evaluate | Hadley Wickham | 862,082 | 2362 | Parsing and evaluati... | ||
formatR | Yihui Xie | 640,386 | 1754 | This package provide... | ||
minqa | Katharine M. Mullen, John C. Nash, Ravi Varadhan | 600,527 | 1645 | Derivative-free opti... | ||
gridExtra | Baptiste Auguie | 581,140 | 1592 | misc. functions | ||
memoise | Hadley Wickham | 552,383 | 1513 | Cache the results of... | ||
RJSONIO | Duncan Temple Lang | 414,373 | 1135 | This is a package th... | ||
RcppArmadillo | Romain Francois and Dirk Eddelbuettel | 410,368 | 1124 | R and Armadillo inte... | ||
xlsx | Adrian A. Dragulescu | 401,991 | 1101 | Provide R functions ... |
Just as Safferling et al. noted, there is a dominance of technical packages. This is hardly surprising, since the majority of R work involves data munging. Among these technical packages there are quite a few that are used for developing other packages, e.g. roxygen2, pkgmaker, devtools, and more.
R-star authors
Just for fun, I decided to look at who has the most downloads. By splitting multi-author packages into their individual authors, and splitting the downloads between them, we find that the top R coders of 2015 were:
top_coders <- list(
"2015" =
pkgs %>%
filter(format(date, "%Y") == 2015) %>%
partition(author) %>%
do({
authors <- strsplit(.$author, "[ ]*([,;]| and )[ ]*")[[1]]
authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
if (length(authors) >= 1){
# If there are multiple authors the statistic is split among
# them, but with an added 20% for the extra collaboration
# effort that a multi-author environment calls for
.$sum <- round(.$sum/length(authors)*1.2)
.$avg <- .$avg/length(authors)*1.2
ret <- .
ret$author <- authors[1]
for (m in authors[-1]){
tmp <- .
tmp$author <- m
ret <- rbind(ret, tmp)
}
return(ret)
}else{
return(.)
}
}) %>%
collect() %>%
group_by(author) %>%
summarise(download_ave = round(sum(avg)),
no_packages = n(),
packages = paste(name, collapse = ", ")) %>%
select(author, download_ave, no_packages, packages) %>%
collect() %>%
arrange(desc(download_ave)) %>%
head(10),
"all" =
pkgs %>%
partition(author) %>%
do({
if (grepl("Jeroen Ooms", .$author))
browser()
authors <- strsplit(.$author, "[ ]*([,;]| and )[ ]*")[[1]]
authors <- authors[!grepl("^[ ]*(Inc|PhD|Dr|Lab).*[ ]*$", authors)]
if (length(authors) >= 1){
# If there are multiple authors the statistic is split among
# them, but with an added 20% for the extra collaboration
# effort that a multi-author environment calls for
.$sum <- round(.$sum/length(authors)*1.2)
.$avg <- .$avg/length(authors)*1.2
ret <- .
ret$author <- authors[1]
for (m in authors[-1]){
tmp <- .
tmp$author <- m
ret <- rbind(ret, tmp)
}
return(ret)
}else{
return(.)
}
}) %>%
collect() %>%
group_by(author) %>%
summarise(download_ave = round(sum(avg)),
no_packages = n(),
packages = paste(name, collapse = ", ")) %>%
select(author, download_ave, no_packages, packages) %>%
collect() %>%
arrange(desc(download_ave)) %>%
head(30))
interactiveTable(
do.call(rbind, top_coders) %>%
mutate(download_ave = txtInt(download_ave)),
align = "lrr",
header = c("Coder", "Total ave. downloads per day", "No. of packages", "Packages"),
tspanner = c("Top coders 2015",
"Top coders 2010-2015"),
n.tspanner = sapply(top_coders, nrow),
minimized.columns = 4,
rnames = FALSE,
col.rgroup = c("white", "#F0F0FF"))
Coder | Total ave. downloads per day | No. of packages | Packages |
---|---|---|---|
**Top coders 2015** | | |
Gabor Csardi | 2,312 | 11 | sankey, franc, rvers... |
Stefan Widgren | 1,563 | 1 | git2r |
RStudio | 781 | 16 | shinydashboard, with... |
Hadley Wickham | 695 | 12 | withr, cellranger, c... |
Jeroen Ooms | 541 | 10 | rjade, js, sodium, w... |
Richard Cotton | 501 | 22 | assertive.base, asse... |
R Foundation | 490 | 1 | xml2 |
David Hoerl | 455 | 1 | readxl |
Sindre Sorhus | 409 | 2 | praise, clisymbols |
Richard Iannone | 294 | 2 | DiagrammeR, stationa... |
**Top coders 2010-2015** | | |
Hadley Wickham | 32,115 | 55 | swirl, lazyeval, ggp... |
Yihui Xie | 9,739 | 18 | DT, Rd2roxygen, high... |
RStudio | 9,123 | 25 | shinydashboard, lazy... |
Jeroen Ooms | 4,221 | 25 | JJcorr, gdtools, bro... |
Justin Talbot | 3,633 | 1 | labeling |
Winston Chang | 3,531 | 17 | shinydashboard, font... |
Gabor Csardi | 3,437 | 26 | praise, clisymbols, ... |
Romain Francois | 2,934 | 20 | int64, LSD, RcppExam... |
Duncan Temple Lang | 2,854 | 6 | RMendeley, jsonlite,... |
Adrian A. Dragulescu | 2,456 | 2 | xlsx, xlsxjars |
JJ Allaire | 2,453 | 7 | manipulate, htmlwidg... |
Simon Urbanek | 2,369 | 15 | png, fastmatch, jpeg... |
Dirk Eddelbuettel | 2,094 | 33 | Rblpapi, RcppSMC, RA... |
Stefan Milton Bache | 2,069 | 3 | import, blatr, magri... |
Douglas Bates | 1,966 | 5 | PKPDmodels, RcppEige... |
Renaud Gaujoux | 1,962 | 6 | NMF, doRNG, pkgmaker... |
Jelmer Ypma | 1,933 | 2 | nloptr, SparseGrid |
Rob J Hyndman | 1,933 | 3 | hts, fpp, demography |
Baptiste Auguie | 1,924 | 2 | gridExtra, dielectri... |
Ulrich Halekoh Søren Højsgaard | 1,764 | 1 | pbkrtest |
Martin Maechler | 1,682 | 11 | DescTools, stabledis... |
Mirai Solutions GmbH | 1,603 | 3 | XLConnect, XLConnect... |
Stefan Widgren | 1,563 | 1 | git2r |
Edwin de Jonge | 1,513 | 10 | tabplot, tabplotGTK,... |
Kurt Hornik | 1,476 | 12 | movMF, ROI, qrmtools... |
Deepayan Sarkar | 1,369 | 4 | qtbase, qtpaint, lat... |
Tyler Rinker | 1,203 | 9 | cowsay, wakefield, q... |
Yixuan Qiu | 1,131 | 12 | gdtools, svglite, hi... |
Revolution Analytics | 1,011 | 4 | doParallel, doSMP, r... |
Torsten Hothorn | 948 | 7 | MVA, HSAUR3, TH.data... |
It is worth mentioning that two of the top coders are companies: RStudio and Revolution Analytics. While I like the fact that R is free and open source, I doubt that the community would have grown as quickly as it has without these companies. It is also symptomatic of 2015 that companies are taking R into account; it will be interesting to see what the R Consortium will bring to the community. I think r-hub is incredibly interesting and will hopefully make my life as an R package developer easier.
My own 2015 R experience
My own personal R experience has been dominated by magrittr and dplyr, as seen in the code above. Like most, I find that magrittr makes things a little easier to read, and unless I have some really large dataset the overhead is small. It does have some downsides related to debugging, but these are negligible.
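As a small illustration of the readability gain (a toy example of my own, not from the post):
library(magrittr)
# Base R: the computation reads inside-out
round(mean(rnorm(100)), 2)
# The same thing as a pipe: reads left to right
rnorm(100) %>%
  mean %>%
  round(2)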
When I originally tried dplyr I came from the plyr environment and was disappointed by the lack of parallelization; I also found the concepts a little odd when thinking the plyr way. I had been using sqldf a lot for my data munging and merging, but when I found left_join, inner_join, and the brilliant anti_join I was completely sold. Combined with RStudio, I find the dplyr workflow both intuitive and more productive than my previous one.
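For example, anti_join returns the rows of one table that lack a match in the other, something that previously took me a subquery in sqldf (a toy sketch with made-up data frames):
library(dplyr)
orders  <- data.frame(id = 1:5, customer = c("a", "b", "a", "c", "d"))
shipped <- data.frame(id = c(1, 3, 4))
inner_join(orders, shipped, by = "id")  # orders that have shipped
anti_join(orders, shipped, by = "id")   # orders still waiting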
When looking at those packages (including more than just the top 10 here) I did find some additional gems that I intend to look into when I have the time:
- DiagrammeR An interesting new way of producing diagrams. I've used it for Gantt charts, but it allows for much more.
- checkmate A neat package for checking function arguments (see the sketch after this list).
- covr An excellent package for testing how much of a package's code is tested.
- rex A package for making regular expressions easier.
- openxlsx I wish I didn't have to but I still get a lot of things in Excel-format - perhaps this package solves the Excel-import inferno...
- R6 The successor to reference classes - after working with the Gmisc::Transition-class I appreciate the need for a better system (see the sketch after this list).
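To give a feel for two of these, here are a couple of toy sketches of my own (not taken from the packages' documentation). First, checkmate turns argument checking into a one-liner per argument:
library(checkmate)
weighted_total <- function(x, weight = 1) {
  # Fail early with informative messages if the arguments are off
  assert_numeric(x, any.missing = FALSE, min.len = 1)
  assert_number(weight, lower = 0)
  sum(x) * weight
}
weighted_total(c(1, 2, 3), weight = 2)  # 12
# weighted_total("a")  # would error: Must be of type 'numeric'
And R6 classes look roughly like this:
library(R6)
# A minimal counter class, just to show the R6 style
Counter <- R6Class("Counter",
  public = list(
    count = 0,
    add = function(x = 1) {
      self$count <- self$count + x
      invisible(self)  # returning self invisibly allows chaining
    }
  )
)
cnt <- Counter$new()
cnt$add()$add(2)
cnt$count  # 3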
Looking at the CRAN download totals can be highly misleading due to dependencies, which are automatically downloaded when installing the “desired package”. As you have the precise date and time of events, you can detect which packages were installed automatically as dependencies and which were the “desired packages”. While total downloads is still a measure, it is usually a less valuable measure for detecting trends, which are driven by “desired package” downloads.
I agree completely, although detecting desired installs is challenging at best. E.g. how do you detect that I want both httr and devtools in
install.packages(c("httr", "devtools"))?
Odd, I was certain that I had already replied to your questions.
I completely agree with the limitations of the current metric. The post took a little longer than anticipated to compile, and adding fancier adjustments was not within the time frame. If I revisit the subject I’ll consider adding some fancier statistics.
One thought that I’ve had is to add the dependencies to each package. One can then look at how popular the dependencies are and reduce the downloads/day based on that regression estimate. This could be a partial reduction, as the packages can very well be useful on their own. A problem with this is that dependencies change over time, making it even trickier. I’m also not sure that CRAN would agree with me scraping their entire site…
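The dependency information itself is available without scraping, e.g. via tools::package_dependencies. A rough sketch of how one could count reverse dependencies (I haven't wired this into the download adjustment):
# Count how many CRAN packages pull in a given package when installed
db <- available.packages()
rev_deps <- tools::package_dependencies("magrittr", db = db,
                                        which = c("Depends", "Imports"),
                                        reverse = TRUE)
length(rev_deps$magrittr)  # number of CRAN packages that depend on magrittr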
The large proportion of packages used in package development indicates that the dependency issue is huge. I would, though, argue that a package like checkmate will make other packages more useful and should therefore get merits in some way. This is probably also symptomatic of the package explosion and arguably part of the R trend. Another thing to remember is that RStudio is currently dominating the IDE market. While I am an RStudio fan, the metrics do get a little impacted by that IDE dominance. Their packages most likely get a boost from being part of the RStudio concept, and perhaps less because of their excellence (although they do produce very high quality packages in my mind). Still, their appearing in the lists indicates that they continue to dominate the IDE market and are arguably also part of the current trend.
One thing that is completely lacking from the current analysis is GitHub – arguably one of the biggest open-source trends in recent years. I guess we won’t reach the truth anytime soon, and this is partly why I added the “(based on cranlogs)” to the title.