Compel to text #93
hmm, not sure exactly what you're trying to achieve, but I don't think what you're doing will help - the raw input_embedding matrix isn't useful as-is; it needs to be selectively pushed through the whole CLIP encoder (which is what the token_ids do - they index into the input_embedding matrix). You might find this interesting though: https://github.com/YuxinWenRick/hard-prompts-made-easy . It's a system for simplifying/adjusting prompts by learning more efficient ways of prompting the same thing - e.g. you can convert a 75-token prompt to a 20-token prompt that produces a similar CLIP embedding. Maybe you can use that to optimize your 150-token prompts down to 75.
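to make that concrete, a rough sketch of the pipeline with transformers' CLIP classes (the model id is just an example):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

model_id = "openai/clip-vit-large-patch14"   # example; any CLIP text encoder works the same way
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

token_ids = tokenizer("a photo of a cat", return_tensors="pt").input_ids      # [1, seq]

with torch.no_grad():
    # the token_ids just index into the input_embedding matrix...
    input_embeds = text_encoder.get_input_embeddings()(token_ids)             # [1, seq, 768]
    # ...but those raw vectors only become useful conditioning after the whole
    # encoder (self-attention over the full sequence) has been run on them
    conditioning = text_encoder(input_ids=token_ids).last_hidden_state        # [1, seq, 768]
```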
It was like stumbling in the dark. The results were lackluster - just a vague semblance of the original prompt. Which is still kind of amazing, tbh. I thought investing more time might give me some type of path forward. Your suggestion intuitively seems like it would get better results, although my brain keeps itching with ideas about sentence structure and weighting words like in Compel - anything to get better results than the garbled mess I was working with. Tokens are fun.
right, yeah. part of the problem is that the CLIP text encoder is basically a black box, and the other part is that the >75 token hack is, well, a hack. in my experience you can get just as good "quality" by tweaking your short prompt (e.g. with a thesaurus website, just try swapping out words for other similar words) as by writing a 150-token prompt.
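(for context, the >75 token hack amounts to roughly this - split the ids into 75-token chunks, encode each chunk separately, and concatenate the hidden states. a sketch, not compel's exact implementation:)

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

model_id = "openai/clip-vit-large-patch14"   # example model id
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

def encode_long_prompt(prompt: str) -> torch.Tensor:
    # tokenize without the 77-token cap and drop the auto-added bos/eos
    ids = tokenizer(prompt, truncation=False).input_ids[1:-1]
    chunks = [ids[i:i + 75] for i in range(0, len(ids), 75)]
    hidden = []
    with torch.no_grad():
        for chunk in chunks:
            # re-wrap each chunk with bos/eos so it looks like a normal short prompt
            chunk_ids = torch.tensor(
                [[tokenizer.bos_token_id] + chunk + [tokenizer.eos_token_id]])
            hidden.append(text_encoder(input_ids=chunk_ids).last_hidden_state)
    # concatenate along the sequence dim; the unet's cross-attention doesn't care
    # that the result is longer than 77 positions
    return torch.cat(hidden, dim=1)
```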
I'm playing around a bit with compel and the HF Inference API for long prompts (150+ tokens). One thing is that the API expects text as input, so I'm trying to map the embeddings back to text using cosine similarities between the token embeddings and the text embeddings. Am I headed in the right direction, or is this a waste of time? Code:
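A minimal sketch of the idea (the model id and the exact compel calls are assumptions on my part):

```python
import torch
from compel import Compel
from transformers import CLIPTokenizer, CLIPTextModel

model_id = "openai/clip-vit-large-patch14"           # assumed; swap in whatever the pipeline uses
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

compel = Compel(tokenizer=tokenizer, text_encoder=text_encoder, truncate_long_prompts=False)

long_prompt = "..."                                  # the 150+ token prompt goes here
with torch.no_grad():
    conditioning = compel.build_conditioning_tensor(long_prompt)    # [1, seq, 768]

# the input_embedding matrix: one 768-d vector per vocab entry
vocab_embeds = text_encoder.get_input_embeddings().weight           # [49408, 768]

# cosine similarity of each output position against every vocab embedding,
# then take the closest token id per position and decode back to text
cond = torch.nn.functional.normalize(conditioning[0], dim=-1)       # [seq, 768]
vocab = torch.nn.functional.normalize(vocab_embeds, dim=-1)         # [49408, 768]
nearest_ids = (cond @ vocab.T).argmax(dim=-1)                       # [seq]

print(tokenizer.decode(nearest_ids))
```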