Intelligent Systems for Bioinformatics - Group 1

This work was developed for the Intelligent Systems for Bioinformatics curricular unit of the Master's in Bioinformatics by:

Beatriz Santos (pg46723)
Duarte Velho (pg53481)
Ricardo Oliveira (pg53501)
Rita Nobrega (pg46733)
Rodrigo Esperança (pg50923)

This work analyses a dataset using machine learning algorithms implemented in Python. The entire analysis is contained in a Jupyter Notebook, organized into the sections described below, each with succinct explanations of the procedures followed and the decisions made.

For this work we selected the GDSC1 dataset. It contains wet-lab IC50 values for 208 drugs in 1000 cancer cell lines and can be used to build models that predict drug response, since the same compound can elicit different levels of response in different patients. Our aim is to design a model that, given a drug and the genomic profile of a cell line, predicts the drug response and thereby helps identify the best drug for a given patient. In this dataset, cancer cell lines are represented by RMD normalized gene expression and drugs by SMILES strings. The target Y is the log normalized IC50.
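As a minimal sketch of the "best drug" idea, assume a trained model has already produced one predicted log IC50 per drug for a single cell line. Because a lower IC50 means less compound is needed to inhibit the cells, drugs can be ranked in ascending order of the predicted value. The function name `rank_drugs`, the drug names, and the numbers are hypothetical and only illustrate the ranking step.

```python
from typing import Dict, List


def rank_drugs(predicted_log_ic50: Dict[str, float]) -> List[str]:
    """Sort drug names from lowest to highest predicted log IC50."""
    return sorted(predicted_log_ic50, key=predicted_log_ic50.get)


# Hypothetical predictions for one cell line (made-up values, not GDSC1 data).
predictions = {"drug_A": 1.8, "drug_B": -0.7, "drug_C": 0.4}

ranking = rank_drugs(predictions)
best_drug = ranking[0]  # lowest predicted log IC50
print(best_drug)
```

In practice, the dictionary would be filled by applying the model to every drug paired with the same cell line profile.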

To access the dataset, use the following code:

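from tdc.multi_pred import DrugRes  # requires the PyTDC package
data = DrugRes(name='GDSC1')  # downloads the dataset on first use
split = data.get_split()  # train, validation and test subsets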
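A quick way to check what a split contains is to count the rows in each subset. The sketch below assumes `split` is a dictionary that maps subset names to tables, as we understand `get_split()` to return. The helper `summarize_split` and the toy stand-in are hypothetical, so the snippet runs without downloading GDSC1.

```python
def summarize_split(split):
    """Return {subset name: (number of rows, fraction of all rows)}."""
    sizes = {name: len(part) for name, part in split.items()}
    total = sum(sizes.values())
    return {name: (n, n / total) for name, n in sizes.items()}


# Toy stand-in for the real split; with TDC, pass the object returned
# by data.get_split() instead (len() of a DataFrame is its row count).
toy_split = {
    "train": list(range(70)),
    "valid": list(range(10)),
    "test": list(range(20)),
}

summary = summarize_split(toy_split)
for name, (n_rows, fraction) in summary.items():
    print(f"{name}: {n_rows} rows ({fraction:.0%})")
```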

Notebook sections

1. Preprocessing and data exploration

Review all available documentation about the dataset
Load the dataset and perform an exploratory analysis
Prepare the dataset by generating and selecting features and handling missing values

This stage corresponds to section 1 of the Notebook, where:

The dataset is described according to the documentation
The characteristics of the data are summarized through an exploratory analysis
The preprocessing steps are described and the choices justified
Graphics representing the main characteristics of the dataset are included
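The missing-value handling and feature selection steps could look like the sketch below. It assumes a numeric feature matrix with NaN marking missing entries, fills each gap with the column median, and drops features whose variance does not exceed a threshold. Both choices are illustrative assumptions, not necessarily the ones made in the Notebook, and `impute_and_filter` is a hypothetical helper.

```python
import numpy as np


def impute_and_filter(X, min_variance=0.0):
    """Fill NaNs with the column median, then keep columns with variance > min_variance."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)  # per-column median, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    keep = X.var(axis=0) > min_variance  # boolean mask of retained columns
    return X[:, keep], keep


# Toy matrix: column 1 is constant, columns 0 and 2 each have one gap.
toy = np.array([
    [1.0, 5.0, np.nan],
    [2.0, 5.0, 3.0],
    [np.nan, 5.0, 7.0],
])

X_clean, kept = impute_and_filter(toy)
print(X_clean)
print(kept)
```

A column made up entirely of NaN has no median and would need to be dropped before this step.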

2. Unsupervised learning

Use appropriate visualization and dimensionality reduction techniques
Apply clustering methods

This stage corresponds to section 2 of the Notebook, where:

The results must be analyzed and the procedures explained
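This README does not name the specific techniques used, so the following is only a sketch under assumptions: principal component analysis computed through a singular value decomposition, followed by a basic k-means loop on the projected coordinates. The data are synthetic and the helpers `pca_project` and `kmeans` are hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # explained variance ratio
    return scores, explained[:n_components]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means: assign points to the nearest center, then update centers."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for an expression matrix: 60 samples x 20 features,
# with two groups shifted apart along every feature.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 20)),
    rng.normal(4.0, 1.0, size=(30, 20)),
])

scores, explained = pca_project(X, n_components=2)
labels, centers = kmeans(scores, k=2)
print(explained)
```

The two columns of `scores` can be drawn as a scatter plot coloured by `labels` to inspect the cluster structure visually.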

3. Machine Learning

Compare the behavior of different machine learning models/methods by calculating performance metrics
Present the best model for the dataset

This stage corresponds to section 3 of the Notebook, where all results must be reported and critically analyzed.
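Since Y is a continuous value, the comparison can be sketched with regression metrics. The example below assumes RMSE, MAE and R² as the metrics and selects the model with the lowest RMSE. The metric choice, the model names, and all numbers are hypothetical and do not come from the Notebook.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE and R^2 for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up test targets and predictions from two hypothetical models.
y_test = [0.5, 1.2, -0.3, 2.0]
predictions_by_model = {
    "mean_baseline": [0.85, 0.85, 0.85, 0.85],  # always predicts the mean of y_test
    "hypothetical_model": [0.6, 1.0, -0.1, 1.8],
}

results = {
    name: regression_metrics(y_test, y_pred)
    for name, y_pred in predictions_by_model.items()
}
best_model = min(results, key=lambda name: results[name]["rmse"])

for name, metrics in results.items():
    print(name, metrics)
print("best:", best_model)
```

Including a trivial baseline like the one above makes it easier to judge whether a model has learned anything beyond the average response.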

4. Deep Learning

Use deep learning methods, following the same approach as in stage 3

This stage corresponds to section 4 of the Notebook, where the results must be reported and critically analyzed.
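Deep learning models need numeric inputs, so the SMILES strings that describe the drugs have to be encoded first. This README does not state which encoding or architecture the Notebook uses. The sketch below assumes a character-level one-hot encoding, which is one common option, and the helpers `build_vocabulary` and `one_hot_encode` are hypothetical. Splitting by character is a simplification, since two-letter atoms such as Cl or Br end up as two separate tokens.

```python
def build_vocabulary(smiles_list):
    """Map every character seen in the SMILES strings to an integer index."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index for index, ch in enumerate(chars)}


def one_hot_encode(smiles, vocab, max_len):
    """Return a max_len x len(vocab) matrix; rows past the string end stay all zero."""
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for position, ch in enumerate(smiles[:max_len]):
        matrix[position][vocab[ch]] = 1
    return matrix


# Small illustrative molecules: ethanol, benzene, acetic acid.
smiles_examples = ["CCO", "c1ccccc1", "CC(=O)O"]

vocab = build_vocabulary(smiles_examples)
max_len = max(len(s) for s in smiles_examples)
encoded = [one_hot_encode(s, vocab, max_len) for s in smiles_examples]

print(len(vocab), max_len)
```

Each encoded drug has the same shape, so the matrices can be stacked into a single input tensor and combined with the gene expression vector of the cell line.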
