[Kaggle - Reduce manufacturing failures](https://www.kaggle.com/c/bosch-production-line-performance)
The goal of this competition was to predict internal failures based on thousands of measurements and tests made for each component along the assembly line, using one of the largest datasets hosted on Kaggle to date.
Each part was labelled as either passing quality control (Response = 0) or failing quality control (Response = 1), and model predictions were evaluated using the [Matthews Correlation Coefficient](https://www.kaggle.com/c/bosch-production-line-performance/details/evaluation).
The team, Arrested Development, consisted of [Tyrone Cragg](https://github.com/tyronecragg) and [Liam Culligan](https://github.com/liamculligan).
The solution obtained a rank of [38th out of 1373 teams](https://www.kaggle.com/c/bosch-production-line-performance/leaderboard/private) with a private leaderboard score of 0.48726.
The 5-fold cross-validation Matthews Correlation Coefficient was 0.47767, with a standard deviation of 0.00698.
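For readers unfamiliar with the metric, the sketch below computes the Matthews Correlation Coefficient for binary labels from the four confusion-matrix counts. It is an illustration only, not code from this repository: the function name `mcc` and the toy labels are made up, and returning 0 when the denominator is zero is a common convention assumed here.

```python
import math

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = failed QC, 0 = passed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # passes correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # passes wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: score 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels, made up for illustration: tp=1, tn=3, fp=1, fn=1.
score = mcc([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(score)  # 0.25
```

Because the metric is defined on class labels, predicted probabilities have to be converted to 0/1 with a threshold before scoring.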
1. Create a working directory for the project
2. [Download the data from Kaggle](https://www.kaggle.com/c/bosch-production-line-performance/data) and place in the working directory
3. Run `PreProcess.R`
4. Run feature engineering scripts:
   - 4.1 `SortFeatures1.py`
   - 4.2 `SortFeatures2.py`
   - 4.3 `SortFeatures3.py`
   - 4.4 `SortFeatures4.py`
   - 4.5 `StationPath.R`
   - 4.6 `StationTime.R`
   - 4.7 `DateRolling.R`
5. Run the Stage 0 model scripts for the stacked generalisation:
   - 5.1 `XGB1 Train.R` and `XGB1 Test.R`
   - 5.2 `XGB2 Train.R` and `XGB2 Test.R`
   - 5.3 `XGB3 Train.R` and `XGB3 Test.R`
   - 5.4 `XGB4 Train.R` and `XGB4 Test.R`
   - 5.5 `XGB5 Train.R` and `XGB5 Test.R`
   - 5.6 `XGB6 Train.R` and `XGB6 Test.R`
6. Run the Stage 1 model script, `XGB Stage 1.R`
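In stacked generalisation, the Stage 1 model is usually trained on out-of-fold predictions from the Stage 0 models, so that no training row is scored by a model that saw its label. The sketch below shows that general idea in plain Python. It is not taken from the scripts in this repository: the helper names, the toy data, and the mean-predicting stand-in model are all hypothetical, and the actual Stage 0 models here are XGBoost models written in R.

```python
from statistics import mean

def kfold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds contiguous, near-equal folds."""
    base, extra = divmod(n_rows, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def out_of_fold_predictions(X, y, fit, predict, n_folds=5):
    """Return one prediction per row, each from a model that never saw that row."""
    oof = [None] * len(X)
    for holdout in kfold_indices(len(X), n_folds):
        held = set(holdout)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in holdout])
        for i, p in zip(holdout, preds):
            oof[i] = p
    return oof

# Hypothetical stand-in for a Stage 0 model: it ignores the features and
# predicts the failure rate of the rows it was trained on.
def fit_mean(X_train, y_train):
    return mean(y_train)

def predict_mean(model, X_rows):
    return [model for _ in X_rows]

# Toy data, made up for illustration: 10 parts, the last two fail.
X = [[float(i)] for i in range(10)]
y = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

oof = out_of_fold_predictions(X, y, fit_mean, predict_mean, n_folds=5)

# Each Stage 0 model would contribute one such column of predictions;
# the Stage 1 model is then trained on those columns.
stage1_features = [[p] for p in oof]
```

With several Stage 0 models, each row of `stage1_features` would hold one out-of-fold prediction per model. The corresponding test-set features would come from the Stage 0 models' predictions on the test data.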
- R 3+
- Python 3+