- Clone the repository from GitHub to your local machine
- Navigate to the project directory in a terminal
There are two ways to generate the report: locally on your computer or using a Docker image.
- Locally:
  - Make sure you have `make` and `R` installed on your system
  - Make sure the `renv` R package is installed
  - Open a terminal in the project directory
  - Run `make install` to restore the R package environment using `renv`
  - Run `make report` to compile the final report
- Using Docker:
  - Pull the image from the DockerHub repository
  - Run `make mount-report` in the terminal to generate the report (this step works on Windows, macOS, and Linux)
  - The compiled report should be in your local `report/` folder
  - (Optional) If you prefer to build the image yourself instead of downloading it from DockerHub, run `make build_image`. An image called `wwwivy111/data550_final_project` will be built.
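The two routes above can be sketched as a pair of small shell functions. This is only an illustration built from the commands named in this README: the `run` helper, the `DRY_RUN` switch, and the function names are hypothetical and not part of the project, and the sketch assumes `docker pull` is how the image is fetched.

```shell
IMAGE="wwwivy111/data550_final_project"  # image name given in this README

# Echo the command when DRY_RUN=1 (hypothetical switch), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Return non-zero and name each command that is not found on PATH.
require_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'Missing required tool: %s\n' "$tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Local route: restore the renv environment, then compile the report.
report_locally() {
  require_tools make R || return 1
  run make install && run make report
}

# Docker route: pull the prebuilt image, then run it through the Makefile.
report_with_docker() {
  require_tools make docker || return 1
  run docker pull "$IMAGE" && run make mount-report
}
```

After sourcing the snippet from the project directory, calling `report_locally` or `report_with_docker` runs the corresponding route. Setting `DRY_RUN=1` first prints the commands instead of executing them, which can help confirm what will run before a long build.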
The raw dataset Thyroid_Diff.csv was saved in the data/ folder.
The code was saved in the code/ folder.
The final report was saved in the report/ folder.
- README.md
- Makefile
- Dockerfile
- renv.lock
- renv/
- data/
- code/
- output/
- report/
Key sections include:
- Introduction: Introduces the study
- Method and Analysis: Describes the methods and results for data preparation, exploratory data analysis, modeling, and model evaluation
- Discussion: Discusses the implications for clinical management and future research
code/01_split_data.R
- cleans the data format
- splits the data into training and test sets
- saves the new datasets as separate `.rds` objects in the `data/` folder (`clean_data.rds`, `train.rds`, `test.rds`)
code/02_EDA.R
- conducts Exploratory Data Analysis (EDA)
- generates table1 and saves it as a `table1.rds` object in the `output/` folder
- generates descriptive plots for the outcome, continuous, and categorical variables and saves them as `.png` objects in the `output/` folder (`descriptive_age_plots.png`, `descriptive_bar_outcome.png`, `descriptive_pie_charts.png`)
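As a quick sanity check after running the first two scripts, the files named above can be verified from the shell. The following is a minimal sketch: the function name `missing_outputs` is hypothetical, and the file list is copied from the descriptions of `code/01_split_data.R` and `code/02_EDA.R` rather than from the scripts themselves.

```shell
# Print every expected artifact that does not exist under the given project root.
# File names are taken from the script descriptions in this README.
missing_outputs() {
  root=${1:-.}
  for file in \
    data/clean_data.rds data/train.rds data/test.rds \
    output/table1.rds \
    output/descriptive_age_plots.png \
    output/descriptive_bar_outcome.png \
    output/descriptive_pie_charts.png
  do
    [ -f "$root/$file" ] || printf '%s\n' "$file"
  done
}
```

Running `missing_outputs` from the project directory prints nothing when all seven files are present; any path it prints points at a step that has not produced its output yet.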
code/03_modeling.R
- generates new train and test data `train_1.rds` and `test_1.rds` in the `data/` folder
- fits univariate models, a multivariable model, a stepwise selection model, and the final model
- saves the models and corresponding tables as separate `.rds` objects in the `output/` folder
- conducts model evaluation for the final model
- saves the evaluation matrix and ROC plot as `.rds` and `.png` objects in the `output/` folder
code/04_render_report.R
- renders `report.Rmd`
report.Rmd
- reads outputs from `code/01_split_data.R`, `code/02_EDA.R`, `code/03_modeling.R`
- makes the final report
Makefile
- contains rules for building the final report and other targets
  - `make report` compiles the report into an `.html` object
  - `make split_data` generates the outputs of `code/01_split_data.R`
  - `make EDA` generates the outputs of `code/02_EDA.R`
  - `make modeling` generates the outputs of `code/03_modeling.R`
  - `make clean` removes all outputs
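To rebuild one stage at a time rather than the whole report, the targets above can be run in sequence. The sketch below assumes the stages are meant to run in the order suggested by the script numbering (`split_data`, `EDA`, `modeling`, then `report`); the helper `run_targets` and the `MAKE_CMD` override are hypothetical and not part of the Makefile.

```shell
# Run the given make targets in order, stopping at the first failure.
# MAKE_CMD is a hypothetical override for the make binary (defaults to make).
run_targets() {
  for target in "$@"; do
    ${MAKE_CMD:-make} "$target" || return 1
  done
}
```

For example, `run_targets split_data EDA modeling report` rebuilds each stage in turn from the project directory, and `make clean` can be run beforehand to start from an empty `output/` state.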