Replies: 1 comment
-
I have exactly the same problem with local models, with all Llama models and via different approaches: llama.cpp, Ollama, transformers, LangChain. All are much worse than using the Hugging Face API to run the model online. Please let me know if you found any solution :)
-
Hi all, I hope this is the right location to post this!
I am prompting Llama 3 8B Instruct Q5_K_M - running on llama.cpp - via the /completion API endpoint. My confusion is that the results are fundamentally different (worse) than when I use the Llama 3 online demo. The difference does not just come down to natural variation in responses; the outputs are completely different, often syntactically invalid, despite the same parameter settings. Yes, the online demo is the base Llama 3 model while I use the fine-tuned Instruct version (I could not get the base model running because I do not know the correct chat template to apply, and the tokenizer from Hugging Face does not include one). Still, in my opinion that alone should not cause such a difference.
Here is an example prompt that I use with Llama 3 8B Instruct. It tasks the LLM with performing code segmentation by inserting special characters into the code:
The response from Llama 3 8B Instruct on llama.cpp:
(That's it, it cut off here. The token limit is set to 1400.)
Sometimes the responses differ and the LLM briefly explains something vaguely related to segmenting code, but nothing at all similar to the expected (and well within its capabilities) output.
In comparison, when I put the same prompt (without the special <|eot|> tokens, etc.) into the online demo, I get this response (which is exactly the expected result):
Quite obviously there is something wrong with my llama.cpp prompt. To me it looks like the result of an incorrect chat template; however, I believe the prompt template is correct and uses the correct syntax. It matches the syntax described by Meta themselves: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3#meta-llama-3
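One check I can think of (not sure it is conclusive) is asking the server's /tokenize endpoint whether the special tokens in my prompt come back as single token IDs or get split into many plain-text pieces. I don't know whether /tokenize handles special tokens exactly the same way /completion does, so treat this only as a sketch:

```python
import requests

# Sanity-check sketch: if the template is handled correctly, each special token
# should map to a single token ID rather than being split into literal text.
# Assumes the same llama.cpp server exposes its /tokenize endpoint.
LLAMA_SERVER = "http://localhost:8080"

for text in ["<|begin_of_text|>", "<|start_header_id|>", "<|eot_id|>"]:
    tokens = requests.post(
        f"{LLAMA_SERVER}/tokenize",
        json={"content": text},
    ).json()["tokens"]
    # One ID -> recognized as a special token; many IDs -> tokenized as plain text.
    print(text, "->", tokens)
```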
I observed similar behavior with other models (Mistral, Llama 2) before, and also with prompts that do not include code, though I haven't been able to reproduce those examples recently.
Any clues on how to continue debugging here? Thanks!