Welcome to the Federated Learning for Rare Genetic Disorder Classification project! This repository demonstrates the application of Federated Learning (FL) techniques to classify rare genetic disorders using Electronic Health Record (EHR) data while addressing two critical challenges: privacy and limited data availability.
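The primary aim of this project is to use Federated Learning to classify rare genetic disorders from EHR data held by several hospitals. Each hospital trains on its own records and shares only model weights with a central server. This preserves patient privacy while still combining what each hospital learns from its small number of cases. Here are some key aspects of the project: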
- Data Source: The project uses EHR data spanning personal history, medical history, and family history. The dataset is available on Kaggle: Link to Dataset.
The project is organized as follows:
- clients/: The client scripts, each representing a different hospital or data source. Each client trains a neural network on its own private EHR data.
- data/: The data used by the clients. Each client has its own separate data and shares only model weights, never records, with the server.
- server.py: The central server that coordinates Federated Learning. It combines the weights sent by the clients into a global model.
- model.h5: The trained classification model, which can be used to classify rare genetic disorders.
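The README does not say which rule server.py uses to combine client weights. A common choice is Federated Averaging (FedAvg): each layer of the global model is the average of the clients' corresponding layers, weighted by how many training examples each client holds. The sketch below assumes that rule and assumes each client's weights arrive as a list of NumPy arrays, one per layer (the layout a Keras model's get_weights method returns). The function name federated_average and the example hospitals are hypothetical.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Combine per-client model weights into global weights (FedAvg-style sketch).

    client_weights: one entry per client; each entry is a list of NumPy arrays,
        one per layer, in the same order for every client (assumed layout).
    client_sizes: number of local training examples at each client, used as the
        averaging weight so clients with more data count proportionally more.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight list and one sample count per client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("total number of training examples must be positive")
    coeffs = np.asarray(client_sizes, dtype=np.float64) / total

    global_weights = []
    # zip(*client_weights) groups the same layer across all clients.
    for layer_arrays in zip(*client_weights):
        stacked = np.stack([np.asarray(a, dtype=np.float64) for a in layer_arrays])
        # Weighted sum over the client axis (axis 0 of `stacked`).
        global_weights.append(np.tensordot(coeffs, stacked, axes=1))
    return global_weights


# Two hypothetical hospitals with a one-layer model (weight matrix + bias).
hospital_a = [np.array([[1.0, 2.0]]), np.array([0.0])]
hospital_b = [np.array([[3.0, 4.0]]), np.array([1.0])]
avg = federated_average([hospital_a, hospital_b], client_sizes=[100, 300])
print(avg[0])  # [[2.5 3.5]]
print(avg[1])  # [0.75]
```

Passing the returned list to a Keras model's set_weights method would give the global model that is sent back to the clients for the next round.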
To replicate or use this project, follow these steps:
- Clone the repository to your local machine:
  git clone https://github.com/MaitreeVaria/Federated-Learning-Rare-Genetic-Disorder-Classification.git
- Install the required dependencies:
  pip install -r requirements.txt
- Obtain the EHR data from the Kaggle dataset above or from your own data sources.
- Customize the client files in the clients/ folder to suit your data and privacy requirements.
- Run the Federated Learning process by executing server.py. This coordinates training across the clients.
- Once training is complete, use the model.h5 file to classify rare genetic disorders.
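As a hedged illustration of the last step, the helper below turns a model's predicted class probabilities into labels. It assumes model.h5 is a Keras classifier whose output is one probability per disorder class, and that new inputs are preprocessed exactly as the clients' training data was. The function classify and the label names are hypothetical. The model is passed in as an argument so the helper can be tried without TensorFlow.

```python
import numpy as np


def classify(model, features, class_names):
    """Map each row of `features` to its most probable disorder label.

    model: any object with a Keras-style `predict` method that returns one row
        of class probabilities per input row (e.g. the network in model.h5).
    features: 2-D array of patient features, preprocessed like the training data.
    class_names: hypothetical label strings, in the model's output order.
    """
    probs = np.asarray(model.predict(np.asarray(features, dtype=np.float32)))
    if probs.ndim != 2 or probs.shape[1] != len(class_names):
        raise ValueError("model output does not match the number of class names")
    # Pick the highest-probability class for each patient.
    return [class_names[i] for i in probs.argmax(axis=1)]


# Assumed usage with TensorFlow installed (not executed here):
#   import tensorflow as tf
#   model = tf.keras.models.load_model("model.h5")
#   labels = classify(model, patient_features, ["label_0", "label_1", "label_2"])
```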
Contributions to this project are welcome! If you have any improvements, bug fixes, or new features to propose, please open an issue or submit a pull request. Make sure to follow the project's coding standards and guidelines.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details. The Apache License 2.0 is an open-source license that allows you to use, modify, and distribute the code, subject to certain conditions and limitations specified in the license.
Thank you for your interest in the Federated Learning for Rare Genetic Disorder Classification project. If you have any questions or need assistance, please don't hesitate to reach out.
Happy Federated Learning!