mindspore-lab/mindformers

MindSpore Transformers (MindFormers)

1. Introduction

The goal of the MindFormers suite is to build a full-process development suite for foundation model training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based pre-trained models and state-of-the-art (SOTA) downstream task applications, and it covers a range of parallel features. It aims to help users easily implement foundation model training and innovative R&D.

Based on MindSpore's built-in parallel technology and component-based design, the MindFormers suite has the following features:

  • Seamless switch from single-device to large-scale cluster training with just one line of code
  • Flexible and easy-to-use personalized parallel configuration
  • Automatic topology awareness, efficiently combining data parallelism and model parallelism strategies
  • One-click launch for single-device/multi-device training, fine-tuning, evaluation, and inference for any task
  • Support for users to configure any module in a modular way, such as optimizers, learning strategies, and network assembly
  • High-level, easy-to-use APIs such as Trainer, pipeline, and AutoClass
  • Built-in SOTA weight auto-download and loading functionality
  • Seamless migration and deployment support for AI computing centers

For details about MindFormers tutorials and API documents, see MindFormers Documentation.

If you have any suggestions for MindFormers, please open an issue and we will address it promptly.

Supported Models

The following table lists models supported by MindFormers.

| Model | Specifications | Model Type |
|---|---|---|
| Llama2 | 7B/13B/70B | Dense LLM |
| Llama3 | 8B/70B | Dense LLM |
| Llama3.1 | 8B/70B | Dense LLM |
| Qwen | 7B/14B | Dense LLM |
| Qwen1.5 | 7B/14B/72B | Dense LLM |
| Qwen2 | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse MoE LLM |
| Qwen-VL | 9.6B | Multimodal |
| GLM2 | 6B | Dense LLM |
| GLM3 | 6B | Dense LLM |
| GLM3-32K | 6B | Dense LLM |
| GLM4 | 9B | Dense LLM |
| CogVLM2-Video | 13B | Multimodal |
| CogVLM2-Image | 19B | Multimodal |
| InternLM | 7B/20B | Dense LLM |
| InternLM2 | 7B/20B | Dense LLM |
| DeepSeek-Coder | 33B | Dense LLM |
| DeepSeek-Coder-V1.5 | 7B | Dense LLM |
| DeepSeek-V2 | 236B | Sparse MoE LLM |
| CodeLlama | 34B | Dense LLM |
| Mixtral | 8x7B | Sparse MoE LLM |
| Baichuan2 | 7B/13B | Dense LLM |
| Yi | 6B/34B | Dense LLM |
| GPT2 | 13B | Dense LLM |
| Whisper | 1.5B | Multimodal |

2. Installation

Version Mapping

Currently, the Atlas 800T A2 training server is supported.

Python 3.10 is recommended for the current suite.
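Since Python 3.10 is the recommended interpreter, a quick pre-flight check can save a failed build later. The sketch below is a hypothetical helper (`check_python_version` is not part of MindFormers); it assumes a `python3` executable on `PATH` when no version string is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MindFormers): succeed only if the given
# Python version, or the version of python3 on PATH, is 3.10.x.
check_python_version() {
  # Use the explicit argument (e.g. "3.10.14") if given; otherwise ask python3.
  local version=${1:-$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)}
  local major minor
  # Split "3.10.14" into major=3, minor=10 (the patch level is discarded).
  IFS=. read -r major minor _ <<< "$version"
  if [ "$major.$minor" = "3.10" ]; then
    echo "Python $version: OK"
  else
    echo "Python ${version:-not found}: 3.10 is recommended" >&2
    return 1
  fi
}

# Usage: run check_python_version with no argument to check python3 on PATH.
check_python_version 3.10.14
```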

| MindFormers | MindSpore | CANN | Driver/Firmware | Image Link |
|---|---|---|---|---|
| In-development version | In-development version | In-development version | In-development version | Not involved |

Historical version compatibility:

| MindFormers | MindSpore | CANN | Driver/Firmware | Image Link |
|---|---|---|---|---|
| r1.3.0 | 2.4.0 | 8.0.RC3.beta1 | 24.1.RC3 | Link |
| r1.2.0 | 2.3.0 | 8.0.RC2.beta1 | 24.1.RC2 | Link |
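When provisioning machines from a script, the historical compatibility table can be encoded as a simple lookup. The function `compat_for_mindformers` below is a hypothetical helper, not a MindFormers tool; it only knows the two releases listed in the table and fails for anything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MindSpore, CANN, and driver/firmware versions
# listed in the historical compatibility table for a MindFormers release.
compat_for_mindformers() {
  case $1 in
    r1.3.0) echo "MindSpore=2.4.0 CANN=8.0.RC3.beta1 Driver/Firmware=24.1.RC3" ;;
    r1.2.0) echo "MindSpore=2.3.0 CANN=8.0.RC2.beta1 Driver/Firmware=24.1.RC2" ;;
    *) echo "no compatibility entry for MindFormers $1" >&2; return 1 ;;
  esac
}

compat_for_mindformers r1.3.0
```

Keeping the mapping in one `case` statement makes it easy to extend when a new release row is added to the table.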

Installation Using the Source Code

Currently, MindFormers can be compiled and installed using the source code. You can run the following commands to install MindFormers:

git clone -b dev https://gitee.com/mindspore/mindformers.git
cd mindformers
bash build.sh

3. User Guide

MindFormers supports model pre-training, fine-tuning, inference, and evaluation. To view a model's documentation and complete these tasks, see the model's entry in Supported Models. The following describes the distributed launch mode and provides examples.

We recommend launching MindFormers model training and inference in distributed mode. Currently, the scripts/msrun_launcher.sh distributed launch script is the main way to launch models. For details about the msrun feature, see msrun Launching. The script's input parameters are described as follows.

| Parameter | Required on Single-Node | Required on Multi-Node | Default Value | Description |
|---|---|---|---|---|
| WORKER_NUM | Yes | Yes | 8 | Total number of compute devices used across all nodes |
| LOCAL_WORKER | - | Yes | 8 | Number of compute devices used on the current node |
| MASTER_ADDR | - | Yes | 127.0.0.1 | IP address of the primary node in distributed mode |
| MASTER_PORT | - | Yes | 8118 | Port number bound for distributed startup |
| NODE_RANK | - | Yes | 0 | Rank ID of the current node |
| LOG_DIR | - | Yes | output/msrun_log | Log output path. If the path does not exist, it is created recursively. |
| JOIN | - | Yes | False | Whether to wait for all distributed processes to exit |
| CLUSTER_TIME_OUT | - | Yes | 7200 | Timeout for distributed startup, in seconds |
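Because the launcher takes these parameters positionally, it is easy to get the order wrong. The sketch below is a hypothetical helper (`msrun_positional_args` is not part of MindFormers) that prints the arguments in the order used by the single-node custom and multi-node launch commands in this guide, filling any unset variable with the default from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MindFormers): print the positional
# arguments that follow the quoted run_mindformer.py command, using the
# table defaults for any variable that is not set.
msrun_positional_args() {
  case $1 in
    single)
      # Single-node custom mode: WORKER_NUM MASTER_PORT LOG_DIR JOIN CLUSTER_TIME_OUT
      echo "${WORKER_NUM:-8} ${MASTER_PORT:-8118} ${LOG_DIR:-output/msrun_log} ${JOIN:-False} ${CLUSTER_TIME_OUT:-7200}" ;;
    multi)
      # Multi-node mode: WORKER_NUM LOCAL_WORKER MASTER_ADDR MASTER_PORT NODE_RANK LOG_DIR JOIN CLUSTER_TIME_OUT
      echo "${WORKER_NUM:-8} ${LOCAL_WORKER:-8} ${MASTER_ADDR:-127.0.0.1} ${MASTER_PORT:-8118} ${NODE_RANK:-0} ${LOG_DIR:-output/msrun_log} ${JOIN:-False} ${CLUSTER_TIME_OUT:-7200}" ;;
    *)
      echo "usage: msrun_positional_args single|multi" >&2
      return 1 ;;
  esac
}

# Override only what differs from the defaults, e.g. four devices on one node.
WORKER_NUM=4 msrun_positional_args single
```

A launch could then be written as `bash scripts/msrun_launcher.sh "run_mindformer.py --config path/to/xxx.yaml --run_mode finetune" $(msrun_positional_args single)`. The unquoted expansion relies on word splitting, which is safe here only because none of the values contain spaces.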

Note: To launch on specific devices, set the environment variable ASCEND_RT_VISIBLE_DEVICES. For example, to use devices 2 and 3, run export ASCEND_RT_VISIBLE_DEVICES=2,3.
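When ASCEND_RT_VISIBLE_DEVICES restricts the job to a subset of devices, the device count passed to the launcher has to agree with it. The sketch below assumes that WORKER_NUM on a single node should equal the number of visible devices and derives it from the comma-separated list rather than hard-coding it; the launch line is left commented out so the snippet runs anywhere.

```shell
#!/usr/bin/env bash
# Restrict the job to devices 2 and 3, as in the note above.
export ASCEND_RT_VISIBLE_DEVICES=2,3

# Assumption: on a single node, WORKER_NUM should equal the number of
# visible devices, so count the entries in the comma-separated list.
IFS=, read -r -a visible_devices <<< "$ASCEND_RT_VISIBLE_DEVICES"
WORKER_NUM=${#visible_devices[@]}
echo "Launching on devices ${ASCEND_RT_VISIBLE_DEVICES} with WORKER_NUM=${WORKER_NUM}"

# Launch command (commented out; run it from the repository root):
# bash scripts/msrun_launcher.sh "run_mindformer.py \
#   --config path/to/xxx.yaml \
#   --run_mode finetune" "$WORKER_NUM"
```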

Single-Node Multi-Device

# 1. Single-node multi-device quick launch mode. Eight devices are launched by default.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}"

# 2. Single-node multi-device quick launch mode. You only need to set the number of devices to be used.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" WORKER_NUM

# 3. Single-node multi-device custom launch mode.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" \
  WORKER_NUM MASTER_PORT LOG_DIR JOIN CLUSTER_TIME_OUT
  • Examples

    # Single-node multi-device quick launch mode. Eight devices are launched by default.
    bash scripts/msrun_launcher.sh "run_mindformer.py \
      --config path/to/xxx.yaml \
      --run_mode finetune"
    
    # Single-node multi-device quick launch mode.
    bash scripts/msrun_launcher.sh "run_mindformer.py \
      --config path/to/xxx.yaml \
      --run_mode finetune" 8
    
    # Single-node multi-device custom launch mode.
    bash scripts/msrun_launcher.sh "run_mindformer.py \
      --config path/to/xxx.yaml \
      --run_mode finetune" \
      8 8118 output/msrun_log False 300

Multi-Node Multi-Device

To run distributed training on multiple nodes, run the script on each node and set MASTER_ADDR to the IP address of the primary node. MASTER_ADDR must be the same on all nodes; only the NODE_RANK parameter differs from node to node.

# Multi-node multi-device custom launch mode.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" \
  WORKER_NUM LOCAL_WORKER MASTER_ADDR MASTER_PORT NODE_RANK LOG_DIR JOIN CLUSTER_TIME_OUT
  • Examples

    # Node 0, with IP address 192.168.1.1, serves as the primary node. There are a total of 8 devices, with 4 devices allocated per node.
    bash scripts/msrun_launcher.sh "run_mindformer.py \
      --config {CONFIG_PATH} \
      --run_mode {train/finetune/eval/predict}" \
      8 4 192.168.1.1 8118 0 output/msrun_log False 300
    
    # Node 1, with IP address 192.168.1.2, has the same launch command as node 0, with the only difference being the NODE_RANK parameter.
    bash scripts/msrun_launcher.sh "run_mindformer.py \
      --config {CONFIG_PATH} \
      --run_mode {train/finetune/eval/predict}" \
      8 4 192.168.1.1 8118 1 output/msrun_log False 300
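With more than two nodes, writing each command by hand invites mistakes. The sketch below is a hypothetical helper (`print_node_commands` is not part of MindFormers) that prints one launch command per node, varying only NODE_RANK. It assumes every node uses the same number of devices and reuses the port, log directory, JOIN, and timeout values from the two-node example above; it only prints the commands and does not run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the launch command for each node so that only
# NODE_RANK differs. Assumes the same number of devices on every node.
print_node_commands() {
  local nodes=$1 devices_per_node=$2 master_addr=$3 config=$4 run_mode=$5
  # Values taken from the two-node example; adjust as needed.
  local master_port=8118 log_dir=output/msrun_log join=False timeout=300
  local worker_num=$(( nodes * devices_per_node ))
  local rank
  for (( rank = 0; rank < nodes; rank++ )); do
    echo "# node ${rank}"
    echo "bash scripts/msrun_launcher.sh \"run_mindformer.py --config ${config} --run_mode ${run_mode}\" ${worker_num} ${devices_per_node} ${master_addr} ${master_port} ${rank} ${log_dir} ${join} ${timeout}"
  done
}

# Reproduces the two-node example: 8 devices in total, 4 per node.
print_node_commands 2 4 192.168.1.1 path/to/xxx.yaml finetune
```

Computing WORKER_NUM as nodes times devices per node keeps the total consistent with LOCAL_WORKER on every node.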

Single-Device Launch

MindFormers provides the run_mindformer.py script as the single-device launch method. This script can be used to complete the single-device training, fine-tuning, evaluation, and inference of a model based on the model configuration file.

# The input parameters for running run_mindformer.py will override the parameters in the model configuration file.
python run_mindformer.py --config {CONFIG_PATH} --run_mode {train/finetune/eval/predict}
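The `{train/finetune/eval/predict}` placeholder means exactly one of four run modes. The sketch below is a hypothetical wrapper (`run_single_device` and the `DRY_RUN` variable are illustrative, not part of MindFormers) that rejects any other value before calling run_mindformer.py, and prints the command instead of running it when `DRY_RUN=1`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the single-device command. It validates the run
# mode and, with DRY_RUN=1, prints the command instead of executing it.
run_single_device() {
  local config=$1 run_mode=$2
  case $run_mode in
    train|finetune|eval|predict) ;;
    *)
      echo "invalid run_mode '${run_mode}': expected train, finetune, eval, or predict" >&2
      return 1 ;;
  esac
  # Build the command as an array so paths with spaces stay intact.
  local cmd=(python run_mindformer.py --config "$config" --run_mode "$run_mode")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 run_single_device path/to/xxx.yaml predict
```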

4. Life Cycle and Version Matching Strategy

Each MindFormers version goes through the following five maintenance phases:

| Status | Duration | Description |
|---|---|---|
| Plan | 1-3 months | Plan features. |
| Develop | 3 months | Build features. |
| Preserve | 6-12 months | Incorporate all resolved issues and release new versions. Preservation differs by version type: the preservation period is 6 months for a general version and 12 months for a long-term support version. |
| No Preserve | 0-3 months | Incorporate all resolved issues. There is no full-time maintenance team and no plan to release new versions. |
| End of Life (EOL) | N/A | The branch is closed and no longer accepts any modifications. |

Preservation policy for released MindFormers versions:

| MindFormers Version | Corresponding Label | Preservation Policy | Current Status | Release Time | Subsequent Status | EOL Date |
|---|---|---|---|---|---|---|
| 1.3.2 | v1.3.2 | General Version | No Preserve | 2024/12/20 | No preserve expected from 2025/06/20 | |
| 1.2.0 | v1.2.0 | General Version | No Preserve | 2024/07/12 | No preserve expected from 2025/01/12 | |
| 1.1.0 | v1.1.0 | General Version | No Preserve | 2024/04/15 | End of life is expected from 2025/01/15 | 2025/01/15 |
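The dates in this table follow from the phase durations: a general version leaves the Preserve phase 6 months after release, and No Preserve lasts at most 3 more months. The sketch below uses hypothetical helpers (`add_months`, `no_preserve_from`) to reproduce that arithmetic. It keeps the day of month unchanged and does not clamp dates such as the 31st, which is enough for the release dates listed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add N months to a YYYY/MM/DD date.
# The day of month is kept as-is (no clamping for days like the 31st).
add_months() {
  local start=$1 months=$2
  local year month day
  IFS=/ read -r year month day <<< "$start"
  # 10# forces base 10 so months such as "08" are not parsed as octal.
  local total=$(( 10#$year * 12 + 10#$month - 1 + months ))
  printf '%04d/%02d/%s\n' $(( total / 12 )) $(( total % 12 + 1 )) "$day"
}

# Start of the No Preserve phase: 6 months after release for a general
# version, 12 months for a long-term support (lts) version.
no_preserve_from() {
  local release=$1 kind=$2
  case $kind in
    general) add_months "$release" 6 ;;
    lts) add_months "$release" 12 ;;
    *) echo "unknown version kind: $kind (expected general or lts)" >&2; return 1 ;;
  esac
}

no_preserve_from 2024/12/20 general   # 1.3.2: prints 2025/06/20
add_months 2024/04/15 9               # 1.1.0 EOL (6 + 3 months): prints 2025/01/15
```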

5. Disclaimer

The scripts in the scripts/examples directory are provided as reference examples and are not part of the commercially released products. They are for users' reference only. Users who need to use them are responsible for transforming them into products suitable for commercial use and for ensuring their security protection. MindSpore assumes no responsibility for any resulting security problems.

6. Contribution

We welcome contributions to the community. For details, see MindFormers Contribution Guidelines.

7. License

Apache 2.0 License
