Chapter 14 Final Thoughts
It’s called Data Science for a reason: record all the data handling, experimentation and analysis in your lab notebook.
Version control is your digital lab notebook.
14.1 A Structured RAP Course
You’re welcome to dip into the previous chapters as and when you need, but you may prefer a more comprehensive grounding in the principles of reproducibility. We provide a sequenced list of lessons here to help you on your journey to becoming a RAP champion, and we suggest you work through it in order. The links designated as HELP are comprehensive resources to consult if you get stuck; otherwise they can be skipped. The estimated completion time for each resource is given in parentheses.
- The Unix Shell or terminal
- Version Control with git
- Quick overview for those without a computing background (30 mins)
- Interactive lesson (3 hours)
- HELP: Comprehensive git book
- git and Github
- The difference between git and Github (5 mins)
- ADVANCED: Github workflow written (1 hour)
- Github workflow visual
- RStudio, R projects and R fundamentals
- Interactive lesson (12 hours)
- Interactive lesson (some overlap with above lesson) (7 hours)
- HELP:
- Rmarkdown
- Passive lesson (3 hours)
- MoJ tutorial
- HELP: Write a book in Rmarkdown using bookdown
- R packages
- Prevent dependency hell with packrat
- packrat setup (1 hour)
- Data concerns
- Tidy data (1 hour)
- The minimal tidy data set (1 hour)
- HELP: thinking about data from spreadsheets
- HELP: data management with SQL
- HELP: organising data
- Unit tests
- Hadley Wickham’s chapter (2 hours)
- Automated testing: detects problems you might miss.
- Further reading
It’s also worthwhile looking at other departments’ RAP efforts. For a good open example, see DCMS’s eesectors package.
14.2 RAP MOOC
To complement this book, one of our RAPpers has developed a Massive Open Online Course (MOOC) to share an approach to learning this technical skill-set. The course is an informal introduction that describes best practices through screencasts and assignments. It is currently available on Udemy and takes you through the RAP journey using a simple RAP example.
14.3 User feedback
The RAP companion is intended to point data scientists in the Civil Service towards the Data Ops toolkit that should be used when attempting to automate some of the boring stuff for their colleagues. Your feedback is welcome and important. It will help us improve the RAP companion through further iterations.
You can give feedback by completing this Google form, which allows us to measure your satisfaction, or by raising an issue on the RAP companion repo page.
14.4 Just the beginning…
We’ve introduced you to the basics of reproducibility and Data Ops. For further development ideas and inspiration consider the Data Ops manifesto.