I have been writing code in multiple languages since 1994, and for almost a decade now I have been active within the open source community, primarily in R, but I’ve also published some JavaScript/TypeScript packages. Publishing something that other people enjoy is a pure joy, and the fact that some of my packages have download counts in the thousands is something that truly warms my heart.
Over the years, though, I have noticed that newcomers to the open source community often struggle with how to get help, which is why I decided to write this post. I’ll start out with some basics and then move on to slightly more advanced topics, such as how to write an issue or a pull request.
- 1 “Help me, I’m stuck!”
- 2 Who are these open source folks?
- 3 Writing a good Stack Overflow question
- 4 Interacting with authors
- 5 Contribute yourself!
“Help me, I’m stuck!”
This happens to us all and the steps for solving your issue follow a simple path:
- Google the problem:
  - Find key words that are unique to your issue
  - Copy-paste error codes/messages into the search field (if you get too few hits, remove sections of the error code, especially anything that is specific to your setting, e.g. a filename in the message)
- Make sure you understand what is happening. The easiest way is usually to print variables and inspect their content (this is usually frowned upon, but we humans are lazy by nature). A slightly more advanced approach is to use a debugger, which can be quite useful in R.
- OK, so none of those worked. Then we read the manual or, if you are in R, check whether there is a vignette that covers what you are trying to do. Usually we did look at the manual in the beginning, but most likely we skimmed it quickly, jumped to conclusions and started writing code. Quite often the information is there in the manual but perhaps less accessible. Writing a good manual is difficult, and as a package maintainer I always welcome ideas for making my manuals more accessible.
- Now is the time to start looking for human help. Ideally you start by writing up a question on Stack Overflow (see section below)
- Get a GitHub account and write an issue (see section below)
Usually before I start engaging other people I try to have a look at the source code and see if I can understand it myself. This is often useful as I sometimes pick up neat tricks while solving the original problem, although it requires some skill and is perhaps not something that you should do as a beginner.
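To make the "print and inspect" and debugger steps from the list above a bit more concrete, here is a minimal sketch. The function `scale_to_percent` is made up for illustration, and `debugonce()` is just one of the debugging tools that ship with base R, not necessarily the best one for your situation.

```r
# Hypothetical function, only here to illustrate the two approaches
scale_to_percent <- function(x) {
  total <- sum(x)
  # Approach 1: print and inspect the intermediate value
  message("total = ", total)
  x / total * 100
}

# With a missing value in the input the message reveals that total is NA,
# which explains why every element of the result is NA as well
scale_to_percent(c(1, 2, NA))

# Approach 2: step through the function line by line in the debugger.
# This only makes sense in an interactive session, hence the guard.
if (interactive()) {
  debugonce(scale_to_percent)
  scale_to_percent(c(1, 2, NA))
}
```

Printing is quick when you already suspect a particular variable, while stepping through the code tends to pay off when you have no idea where things go wrong.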
Who are these open source folks?
Understanding who is behind the open source community is key if you want to get help. First and foremost, most of us are doing this for fun in our spare time and we don’t get paid. Some examples that I recently encountered suggest that this isn’t obvious:
- I recently helped out with a JavaScript package, graphql-passport. In a follow-up issue there was a related bug that I was asked to help out with. I wrote that I would try to have a look at it, which someone downvoted without even taking the time to comment.
- In an email from a researcher I got a question regarding my forestplot package. I recommended they post on Stack Overflow first so that others could find the solution, and got this back “It’s 6 hours I posted my problem in StackOverflow, and I have not got any response yet. It’s very urgent for me to get the solution.”
Clearly neither of them bothered to understand that I’m primarily an orthopedic surgeon and that I offer my code so that others may use it, enjoy it and hopefully get something useful out of it. If you are in a hurry, you can always find people who know how to code and whom you can pay to help you out.
Writing a good Stack Overflow question
Writing a question should take time. A good question can get you an answer within minutes, while a poor one may never be answered. I have usually had success with the following structure:
- Make the title as precise as you can
- Start with the question, possibly with an intro, e.g. “I’m trying to do a forestplot with multiple bands per row, how should I structure my input data?”
- Short background on language, packages you’re using and other relevant details.
- Add a reproducible code example. This is probably one of the most time-consuming details, but it is vital, as the answer can build upon this code and you can make sure that the answer is the one that you are looking for
- Add a longer background if needed with more details. Don’t be shy to use headings that separate sections. It is sometimes difficult to understand what detail you have missed, and here you can add all those things that could possibly be contributing to your problem. At this point the reader will be familiar with the problem and can easily skip anything that he/she deems irrelevant (don’t make this too large though, as a huge question may be intimidating).
- Repeat your question. At the end, feel free to restate it, possibly with some more details.
- Choose the relevant tags, e.g. R, forestplot if you have an R question about forest plots.
While you write your question, Stack Overflow suggests similar questions. It has happened to me a few times that I start writing a question and then find an existing one that I missed during the “Googling step”. Similarly, while writing my reproducible code example I have sometimes figured out the solution on my own, as I had removed all the noise and made everything much clearer. Sometimes it can be a good idea to then answer your own question. You won’t get points for this, but it is an easy way to become an open source contributor, as others will find your answer and hopefully save precious hours of debugging.
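As an illustration of what a reproducible code example can look like, here is a minimal sketch in base R. The dataset and the problem are invented for this post; the point is only the shape: data created in code, the smallest possible call that shows the behavior, and a note on what was expected.

```r
# A tiny made-up dataset created in code, so nobody needs access to my files
example_data <- data.frame(
  group = rep(c("A", "B"), each = 3),
  value = c(1.2, 3.4, NA, 2.2, 2.8, 3.1)
)

# The smallest piece of code that shows the behavior I am asking about
result <- aggregate(value ~ group, data = example_data, FUN = mean)
print(result)

# What I expected: an NA for group A, since one of its values is missing.
# What I got: a number, because the formula interface of aggregate()
# drops rows with missing values before calling mean().

# Version information helps others reproduce the exact setting
sessionInfo()
```

If the problem only appears with your real data, `dput()` on a small subset gives you code that recreates that subset, which you can paste into the question.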
Interacting with authors
The more popular a package is, the more issues and questions arise for that package. Keep this in mind when you consider contacting authors, as maintaining a popular package can be a lot of work. Make sure to be kind and solution-oriented (despite being frustrated and annoyed that your code is failing). Also, once you post an issue/pull request, make sure that you quickly answer any questions that the community has. If I ask a question and don’t get an answer, my willingness to spend time solving the issue quickly diminishes unless it affects my own work.
Writing an issue
An issue can be of two types: (1) a bug in the software that needs correction, or (2) a feature that you want to be implemented. When you write your issue, usually on GitHub, you should structure it similarly to the section on “Writing a good Stack Overflow question”; it should be clear what the issue is about and there should be some code example (ideally you should look at the already existing examples within the package and use that code). If you want to speed up the implementation you can suggest a solution and/or a test case. Issues usually take longer to get resolved than Stack Overflow questions, and sometimes I post my problem on Stack Overflow first so that I am certain that I haven’t missed something in the manual.
Before you write an issue it can be a good idea to check whether the problem already has an existing issue. Note that an issue may exist but have been closed (make sure to remove “is:open” for GitHub searches), as the author may have decided that the problem isn’t fixable or that there already is a working solution.
Make a pull request
If you have found the source of the issue and you’re fairly certain that this is the cause of the problem, then do a pull request (also known as a PR). In addition to fixing the problem you should add a test case (unless you are changing the manual), as this makes it much easier for the author to understand the problem and trust the solution. Make sure to follow each author’s coding style, e.g. if someone uses CamelCase then you should also be using CamelCase. Linting is not yet that common in languages such as R (e.g. lintr), but judging from developments in more popular languages, we can expect linting to become commonplace in the coming decade. By making sure that you follow the style that the author has chosen, it usually becomes easier to read and understand code, as a homogeneous style allows you to focus more on the code’s intention than on argument spacing, brackets etc.
On GitHub, creating a pull request is fairly easy: fork the repository with the package, make your changes and then click on pull request once you have uploaded the changes. A pull request can be done alone or in combination with writing an issue. I often add an issue if it is a somewhat more complex problem that I am addressing.
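A test case accompanying a fix could look roughly like the sketch below. It assumes that the package uses testthat for its tests, which is common for R packages but not universal, so check the repository first. The function `safe_mean` is a made-up stand-in for whatever the pull request actually fixes.

```r
library(testthat)

# Hypothetical package function after the fix: before the pull request
# it returned NA as soon as the input contained a missing value
safe_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# The test documents the bug and guards against it coming back
test_that("safe_mean ignores missing values", {
  expect_equal(safe_mean(c(1, 2, NA)), 1.5)
  expect_equal(safe_mean(c(4, 6)), 5)
})
```

A test that fails before the change and passes after it is often the quickest way for a maintainer to see what the fix is about.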
I sometimes receive e-mails from users asking questions. Usually I try to refer everyone to Stack Overflow, as I believe that e-mailed questions scale poorly; htmlTable has 160 000+ downloads/month, and it is therefore reasonable that all support should be available through a Google search. I often encourage people to e-mail me a link to a Stack Overflow question that has gone unanswered for a week, and then I can have a look at it.
Contribute yourself!
At the core of the open source community is that everyone can contribute. Contributing can be simple, and seeing something you wrote being used is surprisingly satisfying.
Try to take the time to look for questions on Stack Overflow that you can answer or, if you know a package well, look into the issues that people have with that package and see if your knowledge can help them. Many of the posted issues concern functionality that already exists but that the users haven’t found.
Improve a package
Writing and maintaining a package is a lot of work. I would therefore suggest that you look at already existing solutions and see what is missing. The easiest starting point is to ask yourself what in the manual doesn’t make sense and suggest changes. There are always things that can be improved upon, and as an author you often become myopic and fail to see everything from spelling mistakes to errors/ambiguities.
Remember that the sky is the limit. Some of the best features in my packages come from suggestions that I have gotten from others. Sometimes getting the core functionality in place takes so much effort that you forget about the finishing touches that can do miracles for your users.
Publishing your own R-packages
Publishing a package is a lot of work, especially if you are writing an R-package and are aiming for CRAN. I think CRAN is amazing but it has the most ambitious code quality routines that I have ever encountered. Even a published package will have to be updated once in a while as the requirements change. Despite this, it may be worthwhile; the checks are good for guaranteeing that your code has a certain quality and is well documented.
What should I publish?
As R is a programming language for statisticians, I thought in the beginning that packages should address hardcore statistics, e.g. regressions. In truth, statistics is a tiny part of the research process, and everything from preparing datasets to cool new plots is good enough for publishing. My packages with the most impact have been tools that I was working on during my thesis. These were simple things like tables, plots etc. that I found useful.
What to think of when writing a package?
We all have our regrets, and some of mine are due to bad argument naming in my packages. Once you’ve published a package and it becomes a success, changing the argument names, function structure etc. causes pain for others and should be avoided if possible. Planning is therefore crucial. Here is a simple list of things to think of when writing your first real package:
- Look at other popular packages and their names for functions and arguments and use that as a source of inspiration.
- Add as many tests as you can from the start. These will make maintenance much easier, as you don’t have to worry so much about breaking stuff when you add new features.
- Use a linter for code consistency.
- Make good use of the ... argument and the S3 system. One of my regrets is that I have not used summary and other generic functions for my own classes as much as I should have. These could have simplified things a great deal.
- Documentation is nowadays written with roxygen2. Take your time to look through all the features.
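To tie the last few points of the list together, here is a minimal sketch of a made-up S3 class with `print` and `summary` methods, the `...` argument and roxygen2 comments. The class name `my_result` and its content are hypothetical, and in a real package the trailing example calls would live in the documentation or the tests rather than in the source file.

```r
#' Create a simple result object
#'
#' @param values A numeric vector.
#' @return An object of class "my_result" (a made-up class for this sketch).
#' @export
my_result <- function(values) {
  structure(list(values = values), class = "my_result")
}

#' Print a my_result object
#'
#' @param x A "my_result" object.
#' @param ... Not used here, kept so the signature matches the print generic.
#' @export
print.my_result <- function(x, ...) {
  cat("<my_result> with", length(x$values), "values\n")
  invisible(x)
}

#' Summarise a my_result object
#'
#' @param object A "my_result" object.
#' @param ... Further arguments passed on to the default summary method.
#' @export
summary.my_result <- function(object, ...) {
  summary(object$values, ...)
}

# Users can now rely on the functions they already know
res <- my_result(c(1, 5, 9))
print(res)
summary(res, digits = 3)
```

Because the methods hook into existing generics, users do not have to learn new function names, and the `...` argument leaves room for passing options through without changing the signature later.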