Fed-EKit

Federated Easy-to-Use Large Language Model Kit with Efficient Fine-Tuning

Fed-EKit Overview

Fed-EKit aims to provide researchers with a comprehensive solution for fine-tuning Large Language Models (LLMs) in a Federated Learning environment. This project focuses on combining ease of use with flexibility, supporting a wide range of datasets, models, performance enhancement methods, and evaluation approaches.

Core Features

Federated Learning Support

  • Fed-EKit is specifically designed for federated learning environments, allowing users to collaboratively train and fine-tune large language models while protecting data privacy.

Ease of Use

  • The project offers a set of intuitive tools and interfaces, making it accessible for both beginners and experienced developers. With simplified installation and configuration processes, users can quickly start their projects.

Large Language Model (LLM) Integration

  • Integrates a variety of the latest large language models, offering users a broad selection to suit different application scenarios and requirements.

Parameter Efficient Fine-Tuning

  • Utilizes advanced fine-tuning techniques to optimize model performance while reducing the need for computational resources, making the models more efficient.

Flexibility and Customization

  • Supports various datasets, model structures, and evaluation methods, allowing users to customize and adjust according to their specific needs.

Community-Driven Open Source

  • As an open-source project, Fed-EKit encourages community participation, thereby continuously improving and expanding its functionalities.

Data Preparation

Before starting federated fine-tuning, make sure to create a data file for each individual client.

num_client=10 # The number of clients
diff_quantity=0 # Whether clients have different amounts of data
python client_data_allocation.py $num_client $diff_quantity

Running this command saves the data files under ./data/str(num_client), i.e. a folder named after the number of clients (for example, ./data/10). The data file new-databricks-dolly-15k.json used to generate each client's local dataset is the first version of databricks-dolly-15k, a corpus of more than 15,000 records across 8 categories generated by thousands of Databricks employees. Please refer to the official dolly repository for the latest version of the data.

Use your own data

You can simply modify client_data_allocation.py to load your own dataset for federated training.
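
For instance, here is a minimal sketch of such a modification, assuming your data is a JSON list of instruction-tuning records. The input file name, the output file naming, and the IID split below are illustrative, not the exact logic of client_data_allocation.py:

import json
import os
import random

# Illustrative only: partition a custom instruction-tuning dataset
# (a JSON list of records) evenly across clients, similar in spirit to
# what client_data_allocation.py does with new-databricks-dolly-15k.json.
num_clients = 10
with open("./my_dataset.json") as f:   # hypothetical path to your own data
    records = json.load(f)

random.seed(42)
random.shuffle(records)

output_dir = f"./data/{num_clients}"
os.makedirs(output_dir, exist_ok=True)

# Equal-sized IID shards; change the slicing to create non-IID partitions.
shard_size = len(records) // num_clients
for client_id in range(num_clients):
    shard = records[client_id * shard_size:(client_id + 1) * shard_size]
    with open(os.path.join(output_dir, f"local_training_{client_id}.json"), "w") as f:
        json.dump(shard, f, indent=2)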

Federated Fine-Tuning

To fully leverage the computational resources of each participating client, our lightweight federated learning framework employs the well-established parameter-efficient method LoRA for local training. The local training process builds upon Hugging Face's PEFT, Tim Dettmers' bitsandbytes, and Alpaca-LoRA, enabling training to be completed within hours on a single NVIDIA TITAN RTX.
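
As a rough illustration of what this local PEFT setup looks like (a sketch, not the exact code in main.py): the lora_alpha and lora_dropout values below are assumptions, the other values mirror the example commands that follow, and newer peft releases replace prepare_model_for_int8_training with prepare_model_for_kbit_training.

from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the global (base) model in 8-bit via bitsandbytes so it fits on a single GPU.
base_model = "chavinlo/alpaca-native"
model = LlamaForCausalLM.from_pretrained(base_model, load_in_8bit=True, device_map="auto")
tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = prepare_model_for_int8_training(model)

# Attach LoRA adapters to the attention projections; only these small
# low-rank matrices are trained locally and exchanged between server and clients.
lora_config = LoraConfig(
    r=8,                                                      # --lora_r
    lora_alpha=16,                                            # assumed value
    lora_dropout=0.05,                                        # assumed value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # --lora_target_modules
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model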

Example usage:

python main.py --global_model 'chavinlo/alpaca-native' \
      --data_path "./data" \
      --output_dir './lora-shepherd-7b/' \
      --num_communication_rounds 10 \
      --num_clients 10 \
      --train_on_inputs \
      --group_by_length

Within the main.py file, GeneralClient is a Python class that represents a local client and encompasses five methods that facilitate local training: "prepare_local_dataset," "build_local_trainer," "initiate_local_training," "train," and "terminate_local_training." Each of these methods is easy to understand and can be customized by adding your own functions to meet specific requirements.
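
A highly simplified skeleton showing how these five methods fit together (method bodies and exact signatures are omitted and differ from the real class):

class GeneralClient:
    """Sketch of the local-client abstraction used inside main.py."""

    def __init__(self, client_id, model, data_path, output_dir):
        self.client_id = client_id
        self.model = model            # carries the current global LoRA weights
        self.data_path = data_path    # this client's local data folder
        self.output_dir = output_dir

    def prepare_local_dataset(self, *args, **kwargs):
        """Load and tokenize this client's local data file."""
        ...

    def build_local_trainer(self, *args, **kwargs):
        """Wrap the model and local dataset in a Hugging Face Trainer."""
        ...

    def initiate_local_training(self):
        """Snapshot the incoming global LoRA weights before local updates."""
        ...

    def train(self):
        """Run local fine-tuning for the configured epochs and batch sizes."""
        ...

    def terminate_local_training(self, *args, **kwargs):
        """Save the updated LoRA weights and return bookkeeping for aggregation."""
        ...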

We can also tweak the hyperparameters:

python main.py --global_model 'chavinlo/alpaca-native' \
      --data_path "./data" \
      --output_dir './lora-shepherd-7b/' \
      --num_communication_rounds 10 \
      --num_clients 10 \
      --client_selection_frac 0.1 \
      --local_num_epochs 2 \
      --local_batch_size 64 \
      --local_micro_batch_size 32 \
      --local_learning_rate 0.0003 \
      --lora_r 8 \
      --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
      --train_on_inputs \
      --group_by_length
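
To make the federated hyperparameters concrete: in every communication round, a client_selection_frac fraction of the num_clients clients performs local training, and the server averages their LoRA updates. The following is a hedged sketch of such a server loop, with FedAvg-style aggregation and a hypothetical client.local_train helper; it is an illustration, not the exact aggregation code in main.py:

import random

def run_federated_rounds(global_lora_state, clients,
                         num_communication_rounds, client_selection_frac):
    """Illustrative server loop: sample clients, train locally, average LoRA weights."""
    for round_idx in range(num_communication_rounds):
        # Randomly select a fraction of clients for this round.
        num_selected = max(1, int(client_selection_frac * len(clients)))
        selected = random.sample(clients, num_selected)

        updates, sizes = [], []
        for client in selected:
            # Hypothetical helper: each client starts from the current global LoRA
            # state and returns its updated state dict plus its local dataset size.
            local_state, num_samples = client.local_train(global_lora_state)
            updates.append(local_state)
            sizes.append(num_samples)

        # FedAvg: dataset-size-weighted average over the LoRA parameters only.
        total = float(sum(sizes))
        global_lora_state = {
            name: sum((n / total) * state[name].float() for n, state in zip(sizes, updates))
            for name in updates[0]
        }
    return global_lora_state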

Our framework supports numerous popular LLMs, such as LLaMA, Alpaca, Vicuna, Baize, and others. We welcome any pull requests that adapt our code to support additional models or datasets.

Inference

The GlobalModel_generate.py file streamlines the inference process for the global model by utilizing a Gradio interface. This file loads the foundation model from the Hugging Face Model Hub and obtains the LoRA weights and configurations from the output directory.

python GlobalModel_generate.py \
      --load_8bit \
      --base_model 'chavinlo/alpaca-native' \
      --lora_weights_path /output/path/to/lora_weights  \
      --lora_config_path /output/path/to/lora_config   
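
Under the hood, the script roughly does the following. This is a sketch assuming an 8-bit LLaMA-family base model, the standard Alpaca prompt template, and the peft API; the exact argument handling and weight-loading in GlobalModel_generate.py may differ:

import torch
import gradio as gr
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, PeftModel, set_peft_model_state_dict

base_model = "chavinlo/alpaca-native"
lora_weights_path = "/output/path/to/lora_weights"   # aggregated adapter from training
lora_config_path = "/output/path/to/lora_config"     # saved LoRA configuration

# Load the foundation model from the Hugging Face Model Hub in 8-bit ...
tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(base_model, load_in_8bit=True, device_map="auto")

# ... then rebuild the LoRA module from the saved config and load the adapter weights.
lora_config = LoraConfig.from_pretrained(lora_config_path)
model = PeftModel(model, lora_config)
set_peft_model_state_dict(model, torch.load(lora_weights_path))
model.eval()

def generate(instruction):
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs,
                                    generation_config=GenerationConfig(max_new_tokens=128))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Minimal Gradio interface for interacting with the global model.
gr.Interface(fn=generate, inputs="text", outputs="text", title="Fed-EKit global model").launch()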
      

Evaluation

Use the evaluate.py file:

python evaluate.py \
      --load_8bit \
      --dataset rte \
      --be_trained True \
      --base_model 'chavinlo/alpaca-native' \
      --lora_weights_path /output/path/to/lora_weights \
      --lora_config_path /output/path/to/lora_config
