some questions #83

Open
dpcdsg opened this issue Feb 17, 2025 · 4 comments

Comments

@dpcdsg

dpcdsg commented Feb 17, 2025

I'd like to ask: is https://bigcode-bigcodebench-evaluator.hf.space/ the only way to compute scores once the results have been generated?

@terryyz
Collaborator

terryyz commented Feb 17, 2025

Yes, unless you only want to check the ground-truth pass rate. To evaluate any model, you need to run the generation step first.
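If your generations come from another tool, they have to be packaged into a samples file before any backend can score them. Below is a minimal sketch, assuming a JSONL file with one record per task shaped like {"task_id": "BigCodeBench/0", "solution": "<code>"}. Those field names are an assumption here (check ADVANCED_USAGE for the exact format), and to_samples_jsonl is a hypothetical helper, not part of BigCodeBench.

```python
import json
from typing import Dict


def to_samples_jsonl(solutions: Dict[str, str]) -> str:
    """Serialize {task_id: solution_code} into JSON Lines, one sample per line.

    ASSUMPTION: the evaluator expects records of the form
    {"task_id": "BigCodeBench/<n>", "solution": "<code>"}.
    json.dumps escapes newlines inside the code, so each record stays on one line.
    """
    lines = [
        json.dumps({"task_id": task_id, "solution": code})
        for task_id, code in solutions.items()
    ]
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical usage: write another model's outputs to samples.jsonl.
# outputs = {"BigCodeBench/0": "def task_func():\n    ..."}
# with open("samples.jsonl", "w", encoding="utf-8") as f:
#     f.write(to_samples_jsonl(outputs))
```

Keeping serialization separate from file I/O makes the record format easy to inspect before uploading it to the evaluator.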

@dpcdsg
Author

dpcdsg commented Feb 17, 2025

I already have results generated by other models, and now I want to score them. Is the https://bigcode-bigcodebench-evaluator.hf.space/ endpoint you provided the only way to get the scores?

@terryyz
Collaborator

terryyz commented Feb 17, 2025

Regarding the gradio (HF space) endpoint, please refer to the following note:

The gradio backend is hosted on the Hugging Face space by default. The default space can sometimes be slow, so we recommend using the gradio backend with your own clone of the bigcodebench-evaluator endpoint for faster evaluation. Alternatively, you can use the e2b sandbox for evaluation, although it is also fairly slow on the default machine.

For other execution methods, please refer to ADVANCED_USAGE, where you can choose the execution backend from e2b, gradio, and local. e2b is typically slower than the others, and local requires either the provided Docker image or a manually configured local environment.
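Whichever execution backend runs the tests, the reported score comes down to a pass rate over tasks. As a sketch, here is the standard unbiased pass@k estimator from the HumanEval/Codex work, pass@k = 1 - C(n - c, k) / C(n, k) for n samples of which c pass. We assume the evaluator reports the same metric. With one sample per task, pass@1 is simply the fraction of tasks whose solution passes. pass_at_k and mean_pass_at_k are illustrative names, not BigCodeBench functions.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task with n samples, c of which pass."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_task: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given one (n, c) pair per task."""
    if not per_task:
        raise ValueError("need at least one task")
    return sum(pass_at_k(n, c, k) for n, c in per_task) / len(per_task)
```

For example, four tasks with one sample each, two of which pass, give mean_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)], 1) == 0.5.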

@dpcdsg
Author

dpcdsg commented Feb 17, 2025

OK, thanks
