This work was developed for the curricular unit Intelligent Systems for Bioinformatics of the Master's in Bioinformatics by:
- Beatriz Santos (pg46723)
- Duarte Velho (pg53481)
- Ricardo Oliveira (pg53501)
- Rita Nobrega (pg46733)
- Rodrigo Esperança (pg50923)
For this work we selected the GDSC1 dataset. It contains wet-lab IC50 values for 208 drugs across 1000 cancer cell lines and can be used to build models that predict drug response, since the same compound can produce different levels of response in different patients. Our aim is to design a model that, given a drug and the genomic profile of a cell line, predicts the drug response and helps identify the best drug to treat a given patient. In this dataset, RMD normalized gene expression is used to represent the cancer cell lines and SMILES strings to represent the drugs. Y is the log normalized IC50.
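As a minimal sketch of the "find the best drug" goal: once a model can predict the log IC50 for a (drug, cell line) pair, candidate drugs can be ranked by that prediction, with the lowest value first, since a lower IC50 means less compound is needed for 50% inhibition. The names `rank_drugs`, `toy_predict` and `toy_drugs` are hypothetical and only illustrate the idea; `toy_predict` is a placeholder, not a trained model.

```python
from typing import Callable, Dict, List, Tuple


def rank_drugs(
    predict: Callable[[str, List[float]], float],
    drugs: Dict[str, str],
    cell_line_profile: List[float],
) -> List[Tuple[str, float]]:
    """Return (drug_id, predicted log IC50) pairs, most potent first."""
    scored = [
        (drug_id, predict(smiles, cell_line_profile))
        for drug_id, smiles in drugs.items()
    ]
    # A lower IC50 means a stronger response, so sort in ascending order.
    return sorted(scored, key=lambda pair: pair[1])


def toy_predict(smiles: str, profile: List[float]) -> float:
    """Hypothetical stand-in for a trained model (assumption, not a real predictor)."""
    return len(smiles) * 0.1 - sum(profile)


# Hypothetical candidate drugs: identifier -> SMILES string.
toy_drugs = {"drug_A": "CCO", "drug_B": "CC(=O)OC1=CC=CC=C1C(=O)O"}
ranking = rank_drugs(toy_predict, toy_drugs, [0.2, 0.5])
best_drug, best_score = ranking[0]
```

In a real pipeline, `predict` would wrap the fitted model together with whatever featurization of SMILES and expression profiles was chosen.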
To access the dataset, use the following code:
from tdc.multi_pred import DrugRes
data = DrugRes(name = 'GDSC1')
split = data.get_split()
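A quick way to inspect what `get_split()` returns is sketched below. It assumes the split is a dictionary of pandas DataFrames keyed by `train`, `valid` and `test`, with the columns `Drug_ID`, `Drug`, `Cell Line_ID`, `Cell Line` and `Y`; these column names are an assumption to be checked against the loaded data. The helpers `summarize_split` and `expression_matrix` are hypothetical, and a toy split is used so the example runs without downloading anything.

```python
import numpy as np
import pandas as pd


def summarize_split(split: dict) -> pd.DataFrame:
    """Count pairs, unique drugs and unique cell lines in each partition."""
    rows = []
    for name, frame in split.items():
        rows.append(
            {
                "partition": name,
                "n_pairs": len(frame),
                "n_drugs": frame["Drug_ID"].nunique(),
                "n_cell_lines": frame["Cell Line_ID"].nunique(),
                "mean_Y": frame["Y"].mean(),
            }
        )
    return pd.DataFrame(rows).set_index("partition")


def expression_matrix(frame: pd.DataFrame) -> np.ndarray:
    """Stack the per-row expression vectors into a (pairs x genes) array."""
    return np.vstack(frame["Cell Line"].tolist())


# Toy data with the assumed column layout (not real GDSC1 values).
toy = pd.DataFrame(
    {
        "Drug_ID": ["d1", "d2", "d1"],
        "Drug": ["CCO", "CCN", "CCO"],
        "Cell Line_ID": ["c1", "c1", "c2"],
        "Cell Line": [np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.array([3.0, 4.0])],
        "Y": [0.5, 1.5, -1.0],
    }
)
toy_split = {"train": toy, "valid": toy.iloc[:1], "test": toy.iloc[1:]}
summary = summarize_split(toy_split)
X_train = expression_matrix(toy_split["train"])
```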
- Review of all documentation available about the dataset
- Load the dataset and perform an exploratory analysis
- Prepare the dataset: generate and select features and handle missing values
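The preparation step above could look roughly like the following NumPy sketch: drop features with too many missing values, impute the remaining gaps with the feature median, and remove near-constant features. The function name `preprocess` and the thresholds `max_missing` and `min_variance` are illustrative assumptions, not choices taken from the project.

```python
import numpy as np


def preprocess(X, max_missing=0.5, min_variance=1e-8):
    """Drop sparse columns, impute with column medians, drop constant columns."""
    X = np.asarray(X, dtype=float)
    # 1) Remove features whose fraction of missing values exceeds the threshold.
    keep = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep]
    # 2) Impute the remaining gaps with the per-feature median.
    medians = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), medians, X)
    # 3) Remove near-constant features, which carry no signal.
    informative = X.var(axis=0) > min_variance
    kept_indices = np.flatnonzero(keep)[informative]
    return X[:, informative], kept_indices


# Toy matrix: column 1 is mostly missing, column 2 is constant.
toy = np.array(
    [
        [1.0, np.nan, 5.0, 2.0],
        [2.0, np.nan, 5.0, np.nan],
        [3.0, np.nan, 5.0, 4.0],
        [4.0, 7.0, 5.0, 6.0],
    ]
)
X_clean, kept = preprocess(toy)
```

When a train/test split is used, the medians and the feature mask would normally be computed on the training partition only and then applied to the others, to avoid leaking information.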
This stage corresponds to section 1 of the notebook, where:
- The dataset must be described according to the documentation
- The characteristics of the data are summarized through an exploratory analysis
- The preprocessing steps are described and the choices justified
- Graphics representing the main characteristics of the dataset are included
- Use of adequate visualization and dimensionality reduction techniques
- Application of clustering methods
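A minimal sketch of dimensionality reduction followed by clustering, assuming scikit-learn is available. PCA and k-means are used here only as common examples of each family, not necessarily the methods chosen in the notebook, and the data is a synthetic stand-in for an expression matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 3 groups of 40 "cell lines" described by 50 "genes".
centers = rng.normal(scale=5.0, size=(3, 50))
X = np.vstack([center + rng.normal(size=(40, 50)) for center in centers])

# Standardize features so no single gene dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components for visualization.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)

# Cluster in the reduced space and score the partition.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)
score = silhouette_score(X_2d, labels)
```

`X_2d` can be passed to a scatter plot colored by `labels`, and `pca.explained_variance_ratio_` indicates how much variance the two components retain.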
This stage corresponds to section 2 of the notebook, where:
- The results must be analyzed and the procedures explained
- Compare the behavior of different machine learning models/methods by computing performance metrics
- Present the best model for the dataset
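One way to compare models on equal terms is cross-validation with shared folds and shared metrics, as in the sketch below. It assumes scikit-learn; the three regressors, the synthetic data and the choice of RMSE and R² are illustrative assumptions rather than the project's actual selection.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(0)
# Synthetic regression problem standing in for (features, log IC50).
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

models = {
    "baseline_mean": DummyRegressor(strategy="mean"),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
# The same folds are reused for every model so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scoring = {"rmse": "neg_root_mean_squared_error", "r2": "r2"}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "rmse": -scores["test_rmse"].mean(),  # scorer returns negated RMSE
        "r2": scores["test_r2"].mean(),
    }

# Select the model with the lowest mean cross-validated RMSE.
best_model = min(results, key=lambda name: results[name]["rmse"])
```

Including a trivial baseline such as the mean predictor helps show whether the other models learn anything beyond the average response.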
This stage corresponds to section 3 of the notebook; all results must be reported and critically analyzed.
- Use of deep learning methods, similarly to stage 3
This stage corresponds to section 4 of the notebook, which must report the results and include a critical analysis.
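As a minimal sketch of this stage, a small feed-forward network can be trained and evaluated on held-out data. scikit-learn's `MLPRegressor` is used here as a lightweight stand-in for a dedicated deep learning framework such as PyTorch or Keras; the architecture, the hyperparameters and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic nonlinear target standing in for log IC50.
X = rng.normal(size=(400, 8))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling inside the pipeline keeps test data out of the fitted statistics.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(
        hidden_layer_sizes=(64, 32),
        early_stopping=True,  # holds out part of the training data for validation
        max_iter=500,
        random_state=0,
    ),
)
net.fit(X_train, y_train)
y_pred = net.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
```

Reporting the same metrics as in the machine learning stage makes the deep learning results directly comparable with the earlier models.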