Fine tuning & catastrophic forgetting from GNNFF frameworks? #539
-
I assume we all know what catastrophic forgetting in neural networks is in general. I haven't had a chance to test this with MACE, but I was wondering: if I fine-tune the MP pre-trained model on specific structures and surfaces of specific alloys or oxides, would the resulting model "forget" the original description of the same or similar materials that the MP pre-trained model had? For example, if I fine-tune a model on amorphous silica and amorphous silica surfaces, how much of the accuracy on SiO2 crystals and pure silicon crystal properties would remain intact from the original model, whether I fine-tune from the MP pre-trained model or from other custom models for Si-O crystals? And what would happen to the original behaviour after multiple rounds of fine-tuning?

As far as I know, GNNs cannot escape the forgetting problem, so I assume GNNFF MD frameworks like MACE (and even non-GNN frameworks like DeePMD) can run into this issue. I don't know whether MACE has ever been tested for this, or how to prevent or minimize the forgetting. Any ideas or suggestions?

I have been trying something like rehearsal, including some data from the previous training set in the new training set when I fine-tune, just in case (sketched below). But this makes the training set bigger and bigger as I go through more rounds of fine-tuning, so I was wondering whether there are better ways to prevent or minimize forgetting.
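For reference, the rehearsal/replay approach I mean is just mixing a random subset of the old training frames into the new fine-tuning set before retraining. A minimal sketch using ASE, assuming extxyz files (file names and the replay fraction are placeholders):

```python
# Rehearsal sketch: mix a random subset of old training frames into the new
# fine-tuning set. File names and the replay fraction are placeholders.
import random
from ase.io import read, write

old_frames = read("previous_train.xyz", index=":")    # full previous training set
new_frames = read("amorphous_silica.xyz", index=":")  # new fine-tuning data

replay_fraction = 0.2  # keep only part of the old data so the set stays manageable
random.seed(0)
replay = random.sample(old_frames, k=int(replay_fraction * len(old_frames)))

write("finetune_train.xyz", new_frames + replay)
```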
-
We address this with multi-head fine-tuning, where some of the old data is retained during fine-tuning on a separate head from your new data. It seems to work well. We have a multi-head-interface branch where this is implemented automatically; you just have to provide the new data. Don't forget to provide new isolated atom energies, and to use spin-polarised data (as the foundation model does).
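For what it's worth, one common way to supply the new isolated atom energies is to prepend single-atom reference frames to the fine-tuning file (alternatively they can be passed via the training script's E0s option). A minimal sketch with ASE; the energies, file names, and the exact `config_type`/energy-key conventions shown here are assumptions, so check the multi-head-interface branch documentation for what it actually expects:

```python
# Sketch: append isolated-atom reference energies to the fine-tuning file.
# Energies below are placeholders; use your own spin-polarised DFT values.
from ase import Atoms
from ase.io import read, write

e0s = {"Si": -0.123, "O": -0.456}  # placeholder isolated-atom energies (eV)

frames = read("finetune_train.xyz", index=":")
for symbol, e0 in e0s.items():
    atom = Atoms(symbol)                       # single atom at the origin
    atom.info["config_type"] = "IsolatedAtom"  # assumed tag for isolated-atom frames
    atom.info["REF_energy"] = e0               # use whatever energy key your run expects
    frames.insert(0, atom)

write("finetune_train_with_e0s.xyz", frames)
```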