Healthy data science is about improving the way healthcare systems use data to serve their patients. It’s a place to share insights and understanding about how to effectively analyze healthcare delivery and population health data using R. Articles are organized by the type of data set used for the analysis, including:
- Electronic health record data
- Medicare claims data (coming soon)
- Public data sets (coming soon)


While access to healthcare data must be highly regulated, insights and tools that enable effective analysis of that data should be publicly available. Health systems generate more digital data than ever before, and for patients to benefit from it, meaningful analysis must keep pace. There is plenty of buzz and discussion about healthcare analytics, but healthy data science is a place for analysts in the thick of it to get in the weeds.


Hi, my name is Mark and I’m an MD/MPP student at Duke University. My bachelor’s degree is in math and I work as an analyst for the Duke Institute for Health Innovation. I love the open source environment of R, learned just about everything I know about R through Coursera, and very frequently benefit from insights shared on Stack Overflow. If you have any questions, you can shoot me an email at mpdakkak@gmail.com.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. The R code is freely available for use without any restrictions. In other words: you may reuse the R code for any purpose (and under any license), but if you want to reuse the other content of this website, you must adhere to the CC license.

This website is hosted by GitHub Pages and was created using R Markdown.