A basic training example using GGML #707
bssrdf started this conversation in Show and tell
-
Nice job! Thank you for sharing. If you have feedback on what could be improved, please let us know. The training capabilities in
-
Hi, I just want to share what I have been working on recently. This is an example of training an MNIST VAE. The goal is to use only the ggml pipeline and its implementation of the Adam optimizer. There aren't many training examples using ggml; the only one I found is baby-llama, but I think its way of doing optimization is not quite right. I found another training example in llama.cpp which shows a proper way of using Adam.
which shows a proper way of using Adam.Some of the mods I have to add
Below are some samples from the VAE trained on MNIST after each epoch (total 10 epochs).
[sample images: generated digits after each of the 10 epochs]