Chapter 7 Packaging Code

A package captures, in one place, all the business knowledge used to create a body of work, including the code and its relevant documentation.

One difficulty with the more manual methods of statistics production is that there are many different files relating to many different stages of the process, each of which needs to be documented and kept up to date. Version control, as described in Chapter 6, does part of the heavy lifting here, but we can go a step further and create a package of code. As Hadley Wickham (author of a number of essential packages for package development) puts it for R:

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. - Hadley Wickham
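To make the pieces named in the quote concrete, here is a minimal sketch of a package skeleton. It assumes base R's utils::package.skeleton() rather than the tooling described elsewhere in this chapter, and the package name examplepkg and the function clean_codes() are hypothetical, chosen only for illustration.

```r
# Hypothetical function that we want to ship in a package.
clean_codes <- function(x) {
  toupper(trimws(x))
}

# Build the skeleton in a temporary directory so nothing is left behind.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

utils::package.skeleton(
  name = "examplepkg",          # hypothetical package name
  list = "clean_codes",         # objects to include in the package
  environment = environment(),  # where to find those objects
  path = pkg_parent
)

# Inspect what was created: a DESCRIPTION file, an R/ directory for code
# and a man/ directory for help files, among others.
pkg_files <- list.files(file.path(pkg_parent, "examplepkg"))
print(pkg_files)
```

The generated help files are only templates; they still need to be filled in before the package documents anything useful.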

Since it is a matter of statute that we produce our statistical publications, it is essential that our publications are as reproducible as possible. Packaging up the code can also help with institutional knowledge transfer. This was exemplified in Chapter 4, where we explored the help files associated with code using the R ? operator:

# Load the package, then open the help file for one of its functions
library(eesectors)
?clean_sic

Linking the documentation to the code makes everything much easier to understand, and can help to minimise the time taken to bring new team members up to speed. This meets the requirements of the Aqua Book, in that all assumptions and constraints can be described in the package documentation tied to the relevant code.
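One common way to tie documentation to code is to write it as structured comments directly above each function. The sketch below assumes the roxygen2 convention of #' comments, which devtools::document() converts into help files. The function standardise_codes() is hypothetical and is not taken from eesectors.

```r
#' Standardise free-text industry codes
#'
#' Hypothetical helper used for illustration only. Assumptions and
#' constraints are recorded here, next to the code they apply to:
#' codes are assumed to be case-insensitive, and surrounding white
#' space is assumed to carry no meaning.
#'
#' @param x A character vector of codes.
#' @return A character vector of trimmed, upper-case codes.
#' @examples
#' standardise_codes(c(" a1 ", "b2"))
#' @export
standardise_codes <- function(x) {
  stopifnot(is.character(x))
  toupper(trimws(x))
}

standardise_codes(c(" a1 ", "b2"))
```

Once the package is built and loaded, the same text would be what a new team member sees when they run ?standardise_codes.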

7.1 Essential reading

Hadley Wickham’s R Packages book is an excellent and comprehensive introduction to developing your own package in R. It encourages you to start with the basics and improve over time; good advice.

7.2 Development best practices for your package

7.2.1 Licensing your code

Developing your code as an R package requires you to specify a licence in the License field of the DESCRIPTION file (for example, the eesectors package uses the GPL-3 licence). We echo the GDS Service Manual in encouraging the use of an Open Source Initiative compatible licence. For example, GDS uses the MIT licence.
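As an illustration of where the licence is declared, the sketch below writes a minimal DESCRIPTION file with base R's write.dcf() and reads the License field back. The package name, title and version are hypothetical. In practice the field is often written for you, for instance by helpers such as use_gpl3_license() or use_mit_license() if you use the usethis package.

```r
# Fields for a minimal DESCRIPTION file. The package name, title and
# version are hypothetical; GPL-3 matches the eesectors example above.
description <- data.frame(
  Package = "examplepkg",
  Title = "Hypothetical Example Package",
  Version = "0.1.0",
  License = "GPL-3",
  stringsAsFactors = FALSE
)

desc_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(description, file = desc_path)

# Print the file as it would appear in the package root.
cat(readLines(desc_path), sep = "\n")

# Read the licence back to confirm what has been declared.
declared_licence <- read.dcf(desc_path, fields = "License")[1, "License"]
print(declared_licence)
```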

Note also that all code produced by civil servants is automatically covered by Crown Copyright.

7.2.2 Acting as the custodian for your code

When you make your code open, you should:

  • use Semantic Versioning to make it clear when you release an update to your code
  • be clear about how you’ll communicate with users of your code, for example on support channels and email lists
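Under Semantic Versioning a version number has the form MAJOR.MINOR.PATCH: the major number increases for incompatible changes, the minor number for backwards-compatible new features, and the patch number for backwards-compatible bug fixes. The sketch below illustrates the rule with a hypothetical helper, bump_version(), written in base R; it is not part of any package mentioned in this chapter.

```r
# Hypothetical helper: return the next version string under
# Semantic Versioning, given which part of the version to increase.
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  v <- unlist(package_version(version))
  stopifnot(length(v) == 3)  # expect MAJOR.MINOR.PATCH
  v <- switch(part,
    major = c(v[1] + 1, 0, 0),       # incompatible change
    minor = c(v[1], v[2] + 1, 0),    # new, compatible feature
    patch = c(v[1], v[2], v[3] + 1)  # compatible bug fix
  )
  paste(v, collapse = ".")
}

bump_version("1.4.2", "patch")  # "1.4.3"
bump_version("1.4.2", "minor")  # "1.5.0"
bump_version("1.4.2", "major")  # "2.0.0"

# Versions compare component by component, not as text.
package_version("1.10.0") > package_version("1.9.0")  # TRUE
```

Resetting the lower components to zero on a minor or major bump is what lets users read the size of a change from the version number alone.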

Encouraging contributions from people who use your code can help make it more robust, as people will spot bugs and suggest new features. If you would like to encourage contributions, you can create a CONTRIBUTING.md file on GitHub, as we do for this book.