This is the second project for the Machine Learning class.
The goal of this project is to explore Porto Seguro's Safe Driver Prediction Challenge, released on Kaggle in November of 2017. We aim to experiment with several models and to find their best hyperparameter configurations and feature combinations through k-fold cross-validation, in order to reach a good final solution.
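To make the k-fold idea concrete, here is a minimal sketch of a stratified split in plain Python. It is not this project's code: the function name `stratified_kfold_indices` and the toy labels are assumptions for illustration, and in practice scikit-learn's `StratifiedKFold` does this job. Stratification keeps the share of positive examples roughly equal across folds, which helps when one class is rare.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Return k (train_indices, validation_indices) pairs.

    Hypothetical helper: each class is shuffled and dealt round-robin
    into the k folds, so every fold gets a similar class balance.
    """
    rng = random.Random(seed)

    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal the indices of each class into the folds, one at a time.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold is the validation set once; the other folds form the training set.
    splits = []
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, validation))
    return splits


# Toy imbalanced labels (made up for this example): 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
splits = stratified_kfold_indices(labels, k=5)
for train_idx, val_idx in splits:
    positives = sum(labels[i] for i in val_idx)
    print(len(train_idx), len(val_idx), positives)
```

A hyperparameter configuration would then be scored by training on each `train_idx`, evaluating on the matching `val_idx`, and averaging the k scores.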
- Python >= 2.7.15
- NumPy >= 1.15.4
- matplotlib >= 2.2.3
- pandas >= 0.23.4
- scikit-learn >= 0.20.1
- imbalanced-learn >= 0.4.3
- Clone this repository into your machine
- Download and install all the requirements listed above in the given order
- Download the training and test datasets from the challenge's data page
- Place the CSV files in the data/ folder
- Set the analysis_dataset attribute of ConfigHelper to "train" or "test"
- Generate analysis of the chosen dataset
python generate_analysis.py
- Edit the remaining ConfigHelper attributes and the get_training_models function according to the current sample
- Generate training results
python generate_training_results.py
- Edit the get_submission_models function in the ConfigHelper class according to the current sample
- Generate the submission files for all models
python generate_test_submission.py
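The three commands above can also be chained from a small driver script. The sketch below is not part of this repository: the names `PIPELINE_SCRIPTS`, `build_commands` and `run_pipeline` are hypothetical, and it assumes the scripts are run from the code/ folder and that ConfigHelper has already been edited for every step. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names come from the steps above, in the order they are run.
PIPELINE_SCRIPTS = [
    "generate_analysis.py",
    "generate_training_results.py",
    "generate_test_submission.py",
]


def build_commands(python_exe=sys.executable):
    """Build one command (as an argument list) per pipeline script."""
    return [[python_exe, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(code_dir="code", dry_run=True):
    """Run the scripts in order, or only print them when dry_run is True.

    code_dir is an assumption: the folder that holds the scripts.
    """
    commands = build_commands()
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check_call raises CalledProcessError if a script fails,
            # so later steps do not run on top of a broken earlier step.
            subprocess.check_call(command, cwd=code_dir)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Since each step depends on different ConfigHelper settings, running the commands one at a time, as listed above, remains the safer default.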
.
├── analysis # Feature analysis files
├── code # Code files
│   ├── generate_analysis.py
│   ├── generate_test_submission.py
│   ├── generate_training_results.py
│   ├── config_helper.py
│   ├── data_helper.py
│   ├── io_helper.py
│   ├── metrics_helper.py
│   └── statistics_helper.py
├── data # Dataset files
├── results # Training results
├── submissions # Test submission files
├── LICENSE.md
└── README.md
- jpedrocm
- Flávio Filho
This project is licensed under the MIT License - see the LICENSE.md file for details.