clip #116
Could you please try training Sana together with CLIP, similar to how it's done in SDXL? I experimented with fine-tuning Sana on CLIP embeddings (I modified the caption channels), and the model trained significantly better compared to using pure Gemma.

Comments

Nice. Any comparison to learn about the improvement?

https://wandb.ai/muinez/mysana The 1st run is Gemma. Don't focus on the end of the 2nd run because I broke something there; look at the 3rd run and the beginning to mid-point of the 2nd run.

No idea what the improvement is. Can you explain more?

The model seems to generate more aesthetically pleasing art overall, with improvements in features like eyes and textures. Prompt following has gotten worse, though, because prompts don't fit within the 64-token limit of the CLIP version I used for training. The art also seems more varied and possibly more lively. That could be just my impression, but I'm not the only one who noticed: I shared the results with others, and they also think the CLIP version performs better. It isn't a huge resource hog either; I managed to do this on a modest A6000, and the model adapts quickly. I think it's worth experimenting with if you haven't tried it yet. If you decide to train, maybe try using the CLIP from the SDXL Animagine finetune, then further fine-tune it on longer prompts to improve its understanding of them.
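A minimal sketch of what "fine-tuning on CLIP embeddings with modified caption channels" might look like, assuming a HuggingFace `transformers` CLIP text encoder and an invented projection width; the model name, the 2240 width, and the projection layer are illustrative assumptions, not the exact setup described above:

```python
import torch
from transformers import CLIPTextConfig, CLIPTextModel

# Randomly initialized CLIP text encoder (no pretrained weights needed for
# the shape sketch); dims mirror a ViT-L/14-style text tower. All names and
# sizes here are assumptions for illustration.
cfg = CLIPTextConfig(hidden_size=768, max_position_embeddings=77)
text_encoder = CLIPTextModel(cfg).eval()

# A batch of tokenized prompts; prompts longer than the context window
# (77 here, 64 in the variant mentioned above) would be truncated, which is
# the prompt-following trade-off noted in the thread.
input_ids = torch.randint(0, cfg.vocab_size, (1, 77))
with torch.no_grad():
    hidden = text_encoder(input_ids=input_ids).last_hidden_state  # (1, 77, 768)

# "Modifying the caption channels": Gemma hidden states are wider than
# CLIP's 768, so the diffusion model's caption projection is re-initialized
# to accept the CLIP width instead. 2240 is an assumed transformer width.
caption_proj = torch.nn.Linear(768, 2240)
cond = caption_proj(hidden)  # (1, 77, 2240), fed to the DiT as conditioning
```

The key change is only at the conditioning boundary: the text tower and the projection layer are swapped, while the rest of the diffusion transformer consumes the projected sequence as before.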