How was our dataset curated? How are benchmark instances created?
To obtain testable real-world repositories from GitHub, we propose a fully automated curation pipeline that utilizes GitHub Actions CI and LLM assistance, eliminating the need for human involvement in benchmark construction.
python -m dibench.curate.crawling --help
- Searches GitHub for repositories within `star_range` for `language`, in 10-star batches.
- Checks each repo for CI workflows; if any are found, dumps the repo instance into a JSONL file (see the sketch below).
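The crawling logic can be pictured with a short sketch against the GitHub REST API using `requests`. The endpoints are real, but the batching loop, the qualification check, and the JSONL fields are illustrative assumptions, not the exact implementation behind `dibench.curate.crawling`.

```python
import json
import requests

GITHUB_API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real crawls

def crawl(language: str, star_lo: int, star_hi: int, out_path: str) -> None:
    """Search repos with stars in [star_lo, star_hi] in 10-star batches; keep those with CI workflows."""
    with open(out_path, "a", encoding="utf-8") as out:
        for lo in range(star_lo, star_hi + 1, 10):
            query = f"language:{language} stars:{lo}..{min(lo + 9, star_hi)}"
            resp = requests.get(
                f"{GITHUB_API}/search/repositories",
                params={"q": query, "per_page": 100},
                headers=HEADERS,
                timeout=30,
            )
            resp.raise_for_status()
            for repo in resp.json().get("items", []):
                # A repo qualifies only if it ships GitHub Actions workflow files.
                wf = requests.get(
                    f"{GITHUB_API}/repos/{repo['full_name']}/contents/.github/workflows",
                    headers=HEADERS,
                    timeout=30,
                )
                if wf.status_code == 200:
                    out.write(json.dumps({"repo": repo["full_name"],
                                          "stars": repo["stargazers_count"]}) + "\n")

if __name__ == "__main__":
    crawl("python", 100, 200, "instances.jsonl")  # hypothetical parameters
```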
python -m dibench.curate.curate --help
- Locate the CI file that runs tests
- Locate the test job within that CI file
- Derive the `act` command for that job
- Sanitize & mask the dependency declarations
- Get the gold patch (a sketch of these steps follows)
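A rough sketch of how the first three steps might look, assuming PyYAML for parsing workflow files; the test-keyword heuristic and the helper names (`find_test_job`, `act_command`) are hypothetical, and the masking and gold-patch steps are omitted.

```python
from pathlib import Path
import yaml  # PyYAML

# Heuristic markers for a "test" step; the real pipeline may rely on LLM assistance instead.
TEST_KEYWORDS = ("pytest", "npm test", "cargo test", "go test", "dotnet test")

def find_test_job(repo_root: str):
    """Return (workflow_path, job_id) for the first CI job whose steps run tests."""
    for wf_path in Path(repo_root, ".github", "workflows").glob("*.y*ml"):
        workflow = yaml.safe_load(wf_path.read_text(encoding="utf-8"))
        for job_id, job in (workflow.get("jobs") or {}).items():
            for step in job.get("steps", []):
                if any(kw in (step.get("run") or "") for kw in TEST_KEYWORDS):
                    return wf_path, job_id
    return None, None

def act_command(wf_path: Path, job_id: str) -> list[str]:
    """Build the local `act` invocation that replays only the selected test job."""
    return ["act", "-W", str(wf_path), "-j", job_id]

wf, job = find_test_job(".")
if job:
    print(" ".join(act_command(wf, job)))
```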
python -m dibench.curate.verify --help
Expected:
- Tests pass when dependencies are unmasked
- Tests fail when dependencies are masked (see the verification sketch below)
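A minimal sketch of this two-sided check, assuming a per-instance `act` command and hypothetical `apply_gold` / `apply_mask` callables that restore or strip the dependency declarations.

```python
import subprocess

def run_ci(act_cmd: list[str], repo_dir: str) -> bool:
    """Run the CI test job locally via act; True iff it exits successfully."""
    return subprocess.run(act_cmd, cwd=repo_dir, capture_output=True).returncode == 0

def verify(act_cmd: list[str], repo_dir: str, apply_gold, apply_mask) -> bool:
    """An instance is kept only if tests pass unmasked and fail once masked."""
    apply_gold(repo_dir)    # restore the original dependency declarations
    passes_unmasked = run_ci(act_cmd, repo_dir)
    apply_mask(repo_dir)    # strip the dependency declarations
    fails_masked = not run_ci(act_cmd, repo_dir)
    return passes_unmasked and fails_masked
```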