This project predicts whether a patient has diabetes, using the Pima Indians diabetes data from Kaggle (https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database).
Three models were applied to this binary classification problem: decision tree, random forest, and K-nearest neighbors. Each model has its own .py file. Several data cleaning and feature engineering techniques (normalization, outlier removal/replacement, oversampling, etc.) were also tried while testing these models; they can be found in the code.
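As a rough illustration of the preprocessing techniques named above, here is a minimal sketch in plain Python. It is not the code from the model files: the function names, the 1.5 x IQR outlier rule, replacement by the median, and the sample values are all assumptions made for this example.

```python
import random
import statistics


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside the IQR fences with the column median (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(values)
    return [v if low <= v <= high else median for v in values]


def min_max_normalize(values):
    """Rescale values to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up feature column with one extreme value, for illustration only.
column = [10, 12, 11, 13, 12, 11, 10, 12, 300]
cleaned = replace_outliers_with_median(column)
scaled = min_max_normalize(cleaned)

# Made-up imbalanced data: three negative rows, one positive row.
rows = [[0.1], [0.2], [0.3], [0.9]]
labels = [0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.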
When run, the code produces a confusion matrix and evaluation scores (accuracy, precision, recall, F1-score) for the model. The other README file details the specifics of running the code files.
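For reference, the confusion matrix and the four scores mentioned above can be computed as in the sketch below. This is a plain-Python illustration, not the code from the model files; the function names and the example labels are hypothetical, and label 1 is assumed to mark the positive (diabetic) class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_scores(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1; undefined ratios fall back to 0.0."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
scores = classification_scores(y_true, y_pred)

print("Confusion matrix (rows = actual, columns = predicted):")
print([[tn, fp], [fn, tp]])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Recall is often the score to watch in a screening setting like this one, since a false negative means a diabetic patient is missed.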