In previous posts we’ve looked into the basic structure of the torch-dataframe package. In this post we’ll go through the [mnist example][mnist ex] that shows how to best integrate the dataframe with [torchnet](https://github.com/torchnet/torchnet). Continue reading
In my previous two posts I covered the most basic data manipulation that you may need. In this post I’ll try to give a quick introduction to some of the sampling methods that we can use in our machine learning projects. Continue reading
In my [previous post][intro post] we took a look at some of the basic functionality. In this post I’ll try to show how to manipulate your dataframe. Note, though, that [torch-dataframe][tdf github] is not about data munging; there are far more powerful tools in other languages for this. The aim of the modifications is to do simple tasks without being forced to switch to a different language. Continue reading
Handling [tabular data](https://en.wikipedia.org/wiki/Table_(information)) is generally at the heart of most research projects. As I started exploring [Torch](http://torch.ch/), which uses the [Lua](https://www.lua.org/) language for [deep learning](https://en.wikipedia.org/wiki/Deep_learning), I was surprised that there was no package that would correspond to the functionality available in R’s [data.frame](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html). After some searching I found Alex Mili’s [torch-dataframe](https://github.com/AlexMili/torch-dataframe) package that I decided to update to my needs. Over the past few months we have been developing the package and it has now made it onto the Torch [cheat sheet](https://github.com/torch/torch7/wiki/Cheatsheet#data-formats) (partly the reason for the posting scarcity lately). This series of posts provides a short introduction to the package (version 1.5) and examples of how to implement basic networks in Torch. Continue reading
Since I’m frequently working with large datasets and survival data I often find that the proportional hazards assumption for the Cox regressions doesn’t hold. In my most recent study on cardiovascular deaths after total hip arthroplasty the coefficient was close to zero when looking at the period between 5 and 21 years after surgery. Grambsch and Therneau’s test for non-proportionality hinted at a problem, though, and as I explored it there was a clear correlation between mortality and hip arthroplasty surgery. The effect increased over time, just as we had originally thought, see the figure below. In this post I’ll try to show how I handle non-proportional hazards in R. Continue reading
One of the successful insights into training neural networks has been the rectified linear unit, or ReLU for short, as a fast alternative to the traditional activation functions such as the sigmoid or the tanh. One of the major advantages of the simple ReLU is that it does not saturate at the upper end, thus the network is able to distinguish a poor answer from a really poor answer and correct accordingly.
A modification to the ReLU, the Leaky ReLU, that would not saturate in the opposite direction has been tested but did not help. Interestingly, in a recent paper by the Microsoft© deep learning team, He et al. revisited the subject and introduced a Parametric ReLU, the PReLU, achieving superhuman performance on ImageNet. The PReLU learns the parameter α (alpha) and adjusts it through basic gradient descent.
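To make the difference between the three activations concrete, here is a minimal pure-Python sketch. It is for illustration only and is not the Theano code benchmarked in the tutorial; the function names, the 0.01 leaky slope, the starting α and the learning rate are all assumptions, and the upstream gradients are simply taken as given.

```python
def relu(x):
    """Standard ReLU: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a small, fixed slope for negative inputs (0.01 is an assumed default)."""
    return x if x > 0 else slope * x


def prelu(x, alpha):
    """PReLU: same shape as the Leaky ReLU, but alpha is a learned parameter."""
    return x if x > 0 else alpha * x


def prelu_grad_alpha(x):
    """Derivative of prelu(x, alpha) with respect to alpha."""
    return 0.0 if x > 0 else x


def sgd_step_alpha(alpha, inputs, upstream_grads, lr=0.1):
    """One plain gradient descent update of alpha.

    upstream_grads holds dLoss/dOutput for each input; here they are
    assumed to be supplied by the rest of the network.
    """
    grad = sum(g * prelu_grad_alpha(x) for x, g in zip(inputs, upstream_grads))
    return alpha - lr * grad


# Hypothetical toy values: only the negative input contributes to the alpha gradient.
alpha = 0.25
alpha = sgd_step_alpha(alpha, inputs=[-2.0, 1.0], upstream_grads=[1.0, 1.0])
```

Only negative inputs contribute to the gradient for α, since the positive branch of the PReLU does not depend on it.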
In this tutorial I will benchmark a few different implementations of the ReLU and PReLU together with Theano. The benchmark test will be on the MNIST database, mostly for convenience. Continue reading
The htmlTable function has perhaps been one of my most successful projects. I developed it in order to get tables matching those available in top medical journals. As the function has grown I’ve decided to separate it from my Gmisc-package into a separate package, and at the time of writing this I’ve just released version 1.3. While htmlTable allows for creating plain tables without any fancy formatting (see usage vignette), it is primarily aimed at complex tables. In this post I’ll try to show you what you can do and how to tame some of the more advanced features. Continue reading
Today is a good day to start parallelizing your code. I’ve been using the parallel package since its integration with R (v. 2.14.0) and it’s much easier than it first seems. In this post I’ll go through the basics for implementing parallel computations in R, cover a few common pitfalls, and give tips on how to avoid them. Continue reading
The new R Markdown (rmarkdown-package) introduced in RStudio 0.98.978 provides some neat features by combining the awesome knitr-package and the pandoc-system. The system allows for some neat simplifications of the fast-track-publishing (ftp) idea using so-called formats. I’ve created a new package, the Grmd-package, with an extension to the html_document format, called the docx_document. The formatter allows almost pain-free preparation of MS Word compatible web-pages.
In this post I’ll (1) give a tutorial on how to use the docx_document, (2) go behind the scenes of the new rmarkdown-package and RStudio ≥ 0.98.978, and (3) show what problems currently exist when skipping some of the steps outlined in the tutorial. Continue reading
In order to celebrate my Gmisc-package being on CRAN I decided to pimp up the forestplot2 function. I had a post on this subject and one of the suggestions I got from the comments was the ability to change the default box marker to something else. This idea had been in my mind for a while and I therefore put it into practice. Continue reading