Compel to text #93 (Open)
HatmanStack opened this issue Jun 15, 2024 · 3 comments

Comments

HatmanStack commented Jun 15, 2024

I'm playing around a bit with compel and the HF Inference API for long prompts (150+ tokens). One thing the API expects is text as input, so I'm trying to convert cosine similarities between the token and text embeddings back into text. Am I headed in the right direction or is this a waste of time? Code:

import torch
from torch.nn.functional import normalize
from compel import Compel
from transformers import AutoTokenizer, CLIPTextModel

tokenizer = AutoTokenizer.from_pretrained(item.modelID, subfolder="tokenizer")
clip = CLIPTextModel.from_pretrained(item.modelID, subfolder="text_encoder")

compel = Compel(tokenizer=tokenizer, text_encoder=clip)
conditioning = compel.build_conditioning_tensor(prompt)

# CLIP's input (token) embedding matrix: [vocab_size, hidden_dim]
token_embeddings = clip.get_input_embeddings().weight
normalized_token_embeddings = normalize(token_embeddings, dim=1)

# Flatten the conditioning tensor to [num_positions, hidden_dim] so each position
# can be compared against every row of the token embedding matrix
normalized_conditioning = normalize(conditioning.view(-1, normalized_token_embeddings.shape[1]), dim=1)
cosine_similarities = torch.mm(normalized_conditioning, normalized_token_embeddings.t())

# For each position, take the token whose input embedding is most similar
max_similarity_indices = torch.argmax(cosine_similarities, dim=1)

# Convert the token indices back into text
text = tokenizer.batch_decode(max_similarity_indices.tolist(), skip_special_tokens=True)
promptString = " ".join(text)
damian0815 (Owner) commented Jun 20, 2024

Hmm, not sure exactly what you're trying to achieve, but I don't think what you're doing will help - the raw input_embedding matrix isn't useful as-is; it needs to be selectively pushed through the whole CLIP encoder (which is what the token_ids do - they index into the input_embedding matrix).
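
A minimal sketch of that distinction, assuming a stock Hugging Face CLIPTextModel (the checkpoint name and prompt below are placeholders, not from the thread): the conditioning tensor compel returns is the output of the full encoder stack, not rows of the input embedding matrix, so nearest-neighbour lookups against that matrix compare vectors from two different spaces.

import torch
from transformers import AutoTokenizer, CLIPTextModel

model_id = "openai/clip-vit-large-patch14"  # example checkpoint, swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_id)
clip = CLIPTextModel.from_pretrained(model_id).eval()

token_ids = tokenizer("a photograph of an astronaut", return_tensors="pt").input_ids

with torch.no_grad():
    # 1) what the token_ids index into: rows of the raw input embedding matrix
    input_embeds = clip.get_input_embeddings()(token_ids)   # [1, seq_len, 768]
    # 2) what actually conditions generation: those rows pushed through every
    #    transformer layer of the CLIP text encoder
    encoder_out = clip(token_ids).last_hidden_state         # [1, seq_len, 768]

# Same shape, very different vectors, so matching (2) against the matrix
# behind (1) is comparing across spaces.
print(torch.nn.functional.cosine_similarity(
    input_embeds.flatten(1), encoder_out.flatten(1)).item())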

You might find this interesting though: https://github.com/YuxinWenRick/hard-prompts-made-easy . It's a system for simplifying/adjusting prompts by learning more efficient ways of prompting the same thing - e.g. you can convert a 75-token prompt into a 20-token prompt that produces a similar CLIP embedding. Maybe you can use that to optimize your 150-token prompts down to 75.
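
Not the linked repo's API - just an assumed, quick way to sanity-check whether a compressed prompt lands near the original in CLIP space, using the pooled/projected text embedding (model name and the two prompts are placeholders):

import torch
from transformers import AutoTokenizer, CLIPTextModelWithProjection

model_id = "openai/clip-vit-large-patch14"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
clip = CLIPTextModelWithProjection.from_pretrained(model_id).eval()

def clip_embed(prompt: str) -> torch.Tensor:
    ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        return clip(ids).text_embeds  # pooled, projected text embedding

long_prompt = "..."   # the original 150-token prompt
short_prompt = "..."  # a candidate compressed prompt
print(torch.nn.functional.cosine_similarity(
    clip_embed(long_prompt), clip_embed(short_prompt)).item())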

HatmanStack (Author) commented:

I was stumbling in the dark. The results were lackluster, just a vague semblance of the original prompt (which is still kind of amazing, tbh). I thought investing more time might give me some type of path forward. Your suggestion intuitively seems like it would get better results, although my brain keeps itching with ideas about sentence structure and weighting words like in Compel - anything to get better results than the garbled mess I was working with. Tokens are fun.
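
For reference, compel's weighting syntax (as I understand it from the README; the prompt here is arbitrary) drops straight into the call already used above:

from compel import Compel

# Reuses the tokenizer / text_encoder objects loaded in the first snippet.
compel = Compel(tokenizer=tokenizer, text_encoder=clip)

# Each trailing "+" upweights the preceding term and each "-" downweights it;
# they can be stacked, e.g. "++".
conditioning = compel.build_conditioning_tensor("a misty++ forest landscape, distant-- mountains")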

damian0815 (Owner) commented:

Right, yeah. Part of the problem is that the CLIP text encoder is basically a black box, and the other part is that the >75-token hack is, well, a hack. In my experience you can get just as good "quality" by tweaking your short prompt (e.g. with a thesaurus website, just try swapping out words for other similar words) as by writing a 150-token prompt.
