Reproducible Research in R



Instructor: Tad Dallas
Location: Coker 202
Times: T & TH 4:25 - 5:40pm
Office Hours: T 3:00 - 4:20pm

Overview

This course is designed for undergraduate students and early career graduate researchers regardless of prior experience. We aim to be accessible to those new to programming, but those who have been using R for years will find new material, best practices, and tools to enhance reproducibility in scientific research. The course is project-focused and centered around modules aimed at teaching programming skills while also exploring scientific data and questions. A series of short tutorials will introduce relevant technology, but most concepts will be first introduced in reading outside of class, leaving class time to focus on the more complex examples encountered in the modules.

Approach

This course will use a flipped classroom model, with new material introduced in reading assignments prior to class while class time will focus on applying these skills to explore interesting data sets. If you do not do the reading, you will quickly find yourself struggling to keep up. Students are expected to come to class with the conceptual background in the topic of the lecture, as the lectures will focus on skill-building and the analysis of biological data. Students will be expected to work collaboratively in and out of class, and course content and grading will emphasize communication and reproducibility of an analysis as much as scientific or technical completeness. That being said, there are numerous ways to programmatically solve the same problem, and I do not expect to see the same code from multiple people. The Course Syllabus provides an overview of the modules and topics covered as well as links to weekly reading, assignments, and any lecture material. This syllabus is preliminary and always subject to change.

Texts

There is no required text, but we will use some material from Grolemund and Wickham’s R for Data Science and Wickham’s Advanced R. Additional reading material will be linked from the syllabus. Please be sure to review the relevant reading prior to each class session.

Course design

This website and the modular structure of the course were inspired by Carl Boettiger’s ESPM 157 course at Berkeley (https://espm-157.carlboettiger.info/). I not only used his website code, but also borrowed some of the readings and topics for tutorials. Without his willingness to keep his course materials open access, and without the open source tools to build the website, this course would have had to be created from scratch, and the content would surely have suffered. The focus of this class is on reproducible science, but reproducibility and access to tools are inextricably linked. The most reproducible MATLAB code is still only reproducible on machines that have access to MATLAB. This means that reproducible science and aspects of open science (e.g., development and use of open source tools) are closely related.