Why do we care about best practices and reproducibility?
- Your closest collaborator is you six months ago, and you do not reply to emails – Karl Broman
- Everything via code -> avoid embarrassment, save time, avoid mistakes
- The most important tool is the mindset, when starting, that the end product will be reproducible – Keith Baggerly
- Assume that everything that you are doing right now will need to be redone at some point in the future: be prepared
The five stages of reproducibility
- Denial: I do not need to be reproducible. I have not kept track of code/scripts in years and I have been just fine. People exaggerate. We do not have to be that paranoid
- Anger: Why do I have to write these stupid notes!? It takes twice the time to write notes and do the work. I could simply do the work! This is stupid and ridiculous! I am just wasting my time with notes and comments that nobody cares about!
- Bargaining: Well, perhaps it is ok if I only keep notes in the very final script or the very final function. That makes sense. No one needs to know or would even care to read my other code. Yes, maybe it is ok if I only comment at the end of the project
- Depression: I do not understand my notes. The comments that I made a year ago do not mean anything to me anymore. This has totally failed. I am a reproducibility failure. If I am not able to understand my own notes, no one will
- Acceptance: I understand that being reproducible is a process. No one does this right the first time. No one does it right, period. We are all learning, and all I can do is try my best to make notes/comments and be honest and open about my research process
Take home message: Relax!
- Enjoy the process, make mistakes (Git will catch you), learn as you go
- Code with purpose, be present, be mindful
- Don’t rush! You have spent years collecting data. Honor that process through mindful and reproducible data analyses