Commit
Fixed json schema to allow for html elements
1 parent 4d13582, commit f7b6c6f
Showing 11 changed files with 707 additions and 206 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
# DOM Task Format | ||
|
||
This document describes the format for DOM interaction tasks in our benchmark. | ||
|
||
## Schema | ||
|
||
Tasks are stored in JSONL format: each line is one complete JSON object that conforms to the schema in `task_schema.json`. | ||
|
||
## Example Task | ||
|
||
The example is pretty-printed for readability; in a JSONL task file it occupies a single line. | ||
|
||
```json | ||
{ | ||
  "web_name": "Cambridge Dictionary", | ||
  "id": "cambridge_lookup_1", | ||
  "task": "Click the search box and type 'hello'", | ||
  "web": "https://dictionary.cambridge.org/", | ||
  "element_type": "input", | ||
  "interaction": "type", | ||
  "target_element": { | ||
    "type": "id", | ||
    "value": "searchword" | ||
  }, | ||
  "input_text": "hello", | ||
  "target_html": "<input type=\"text\" id=\"searchword\" class=\"search-input\" ...>", | ||
  "ground_truth": { | ||
    "screenshot": "cambridge_lookup_1_gt.png", | ||
    "description": "The word 'hello' has been entered in the search box", | ||
    "visual_changes": [ | ||
      "Text 'hello' appears in search box", | ||
      "Text cursor visible at end of input", | ||
      "Search suggestions may appear" | ||
    ], | ||
    "success_criteria": [ | ||
      "Input text matches 'hello' exactly", | ||
      "Text is visible in search box", | ||
      "Search box maintains focus" | ||
    ] | ||
  } | ||
} | ||
``` | ||
|
||
## Field Descriptions | ||
|
||
### Basic Information | ||
- `web_name`: Name of the website | ||
- `id`: Unique identifier for the task | ||
- `task`: Human-readable task description | ||
- `web`: Website URL | ||
|
||
### Element and Interaction | ||
- `element_type`: Type of HTML element; the schema allows `input`, `button`, `link`, `div`, and `span` | ||
- `interaction`: Type of interaction (`click`, `type`, or `hover`) | ||
- `target_element`: How to find the element | ||
  - `type`: Selector type (`id`, `class`, or `text`) | ||
  - `value`: Selector value | ||
- `input_text`: Text to type; needed only when `interaction` is `type`, and not listed among the schema's required fields | ||
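
As a rough illustration of how a `target_element` might be resolved to a concrete locator, the sketch below maps each selector type to a CSS selector or an XPath expression. The function names `to_locator` and `xpath_literal`, and the choice of CSS for `id`/`class` and XPath for `text`, are assumptions made for this example; the benchmark's actual lookup logic is not shown in this commit.

```python
from typing import Tuple


def xpath_literal(text: str) -> str:
    """Quote a string for use inside an XPath 1.0 expression."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: stitch the pieces together with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def to_locator(target_element: dict) -> Tuple[str, str]:
    """Return a hypothetical (strategy, expression) pair for a target_element."""
    selector_type = target_element["type"]
    value = target_element["value"]
    if selector_type == "id":
        # Assumes the id needs no CSS escaping (letters, digits, '-', '_').
        return ("css", f"#{value}")
    if selector_type == "class":
        # "a b" becomes ".a.b", i.e. an element carrying both classes.
        return ("css", "." + ".".join(value.split()))
    if selector_type == "text":
        # Assumption: match elements whose own text equals the value.
        return ("xpath", f"//*[normalize-space(text())={xpath_literal(value)}]")
    raise ValueError(f"unsupported selector type: {selector_type!r}")


print(to_locator({"type": "id", "value": "searchword"}))
```

Returning a strategy name alongside the expression keeps the sketch independent of any particular browser automation library.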
|
||
### Validation | ||
- `target_html`: The actual HTML element for structural validation | ||
- `ground_truth`: Validation data | ||
  - `screenshot`: Reference screenshot filename | ||
  - `description`: What should happen | ||
  - `visual_changes`: List of expected visual changes | ||
  - `success_criteria`: Specific conditions for success | ||
|
||
## Validation Process | ||
|
||
Tasks are validated using two methods: | ||
1. **Visual Validation** (60% of score) | ||
   - Compares screenshots before/after interaction | ||
   - Verifies visual changes match ground truth | ||
|
||
2. **HTML Validation** (40% of score) | ||
   - Matches the HTML element the model interacted with | ||
   - Checks structure, attributes, and content | ||
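
The 60/40 weighting can be sketched as a simple weighted sum. This is a minimal illustration that assumes each validator yields a score between 0 and 1 and that the two are combined linearly; the function name `combined_score` and both assumptions are hypothetical, since the commit does not show the scoring code.

```python
VISUAL_WEIGHT = 0.6  # visual validation share, from the list above
HTML_WEIGHT = 0.4    # HTML validation share, from the list above


def combined_score(visual_score: float, html_score: float) -> float:
    """Combine two component scores in [0, 1] into one weighted score."""
    for name, score in (("visual_score", visual_score), ("html_score", html_score)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    return VISUAL_WEIGHT * visual_score + HTML_WEIGHT * html_score
```

Under these assumptions, a task with a visual score of 0.5 and a perfect HTML match would score 0.6 × 0.5 + 0.4 × 1.0 = 0.7.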
|
||
## Adding New Tasks | ||
|
||
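1. Follow the schema in `task_schema.json` | ||
2. Give each task a unique `id` made of lowercase letters, digits, and underscores | ||
3. Provide clear success criteria (at least one entry in `success_criteria`) | ||
4. Include a reference screenshot whose filename uses only lowercase letters, digits, and underscores before the `.png` extension | ||
5. Fill in the `target_html` field with the actual HTML element |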
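
Some of the steps above can be checked mechanically before a task file is committed. The sketch below uses only the Python standard library: it parses JSONL text line by line, reports missing required fields, checks each `id` against the schema's pattern, and flags duplicate IDs. The function name `check_tasks` and the message wording are assumptions for this example, and it does not replace full validation against `task_schema.json`.

```python
import json
import re
from collections import Counter
from typing import List

ID_PATTERN = re.compile(r"^[a-z0-9_]+$")  # same pattern as the schema's "id"
REQUIRED_FIELDS = (
    "web_name", "id", "task", "web", "element_type",
    "interaction", "target_element", "target_html", "ground_truth",
)


def check_tasks(jsonl_text: str) -> List[str]:
    """Return a list of human-readable problems found in JSONL task text."""
    problems: List[str] = []
    id_counts: Counter = Counter()
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(task, dict):
            problems.append(f"line {line_no}: expected a JSON object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in task]
        if missing:
            problems.append(f"line {line_no}: missing fields {missing}")
        task_id = task.get("id")
        if isinstance(task_id, str):
            if not ID_PATTERN.fullmatch(task_id):
                problems.append(f"line {line_no}: id {task_id!r} does not match the pattern")
            id_counts[task_id] += 1
    for task_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id {task_id!r} appears {count} times")
    return problems
```

An empty list means none of these particular checks failed; screenshots and the quality of the success criteria still need manual review.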
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
{ | ||
  "$schema": "http://json-schema.org/draft-07/schema#", | ||
  "title": "DOM Task Schema", | ||
  "description": "Schema for DOM interaction tasks in the benchmark", | ||
  "type": "object", | ||
  "required": [ | ||
    "web_name", | ||
    "id", | ||
    "task", | ||
    "web", | ||
    "element_type", | ||
    "interaction", | ||
    "target_element", | ||
    "target_html", | ||
    "ground_truth" | ||
  ], | ||
  "properties": { | ||
    "web_name": { | ||
      "type": "string", | ||
      "description": "Name of the website" | ||
    }, | ||
    "id": { | ||
      "type": "string", | ||
      "description": "Unique identifier for the task", | ||
      "pattern": "^[a-z0-9_]+$" | ||
    }, | ||
    "task": { | ||
      "type": "string", | ||
      "description": "Human-readable task description" | ||
    }, | ||
    "web": { | ||
      "type": "string", | ||
      "description": "Website URL", | ||
      "format": "uri" | ||
    }, | ||
    "element_type": { | ||
      "type": "string", | ||
      "description": "Type of HTML element to interact with", | ||
      "enum": ["input", "button", "link", "div", "span"] | ||
    }, | ||
    "interaction": { | ||
      "type": "string", | ||
      "description": "Type of interaction to perform", | ||
      "enum": ["click", "type", "hover"] | ||
    }, | ||
    "target_element": { | ||
      "type": "object", | ||
      "description": "How to find the element", | ||
      "required": ["type", "value"], | ||
      "properties": { | ||
        "type": { | ||
          "type": "string", | ||
          "description": "Type of selector to use", | ||
          "enum": ["id", "class", "text"] | ||
        }, | ||
        "value": { | ||
          "type": "string", | ||
          "description": "Value of the selector" | ||
        } | ||
      } | ||
    }, | ||
    "input_text": { | ||
      "type": "string", | ||
      "description": "Text to type (only required for type interactions)" | ||
    }, | ||
    "target_html": { | ||
      "type": "string", | ||
      "description": "The actual HTML element to match against for validation" | ||
    }, | ||
    "ground_truth": { | ||
      "type": "object", | ||
      "description": "Validation data", | ||
      "required": [ | ||
        "screenshot", | ||
        "description", | ||
        "visual_changes", | ||
        "success_criteria" | ||
      ], | ||
      "properties": { | ||
        "screenshot": { | ||
          "type": "string", | ||
          "description": "Filename of the ground truth screenshot", | ||
          "pattern": "^[a-z0-9_]+\\.png$" | ||
        }, | ||
        "description": { | ||
          "type": "string", | ||
          "description": "Description of the expected outcome" | ||
        }, | ||
        "visual_changes": { | ||
          "type": "array", | ||
          "description": "List of expected visual changes", | ||
          "items": { | ||
            "type": "string" | ||
          }, | ||
          "minItems": 1 | ||
        }, | ||
        "success_criteria": { | ||
          "type": "array", | ||
          "description": "List of specific conditions that must be met for success", | ||
          "items": { | ||
            "type": "string" | ||
          }, | ||
          "minItems": 1 | ||
        } | ||
      } | ||
    } | ||
  } | ||
} |
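
The schema above describes `input_text` as required for type interactions but does not list it under `required`, so a schema validator alone would accept a `type` task that has no text. A minimal sketch of a supplementary check follows. The function name `input_text_problem` is hypothetical, and treating `input_text` on a `click` or `hover` task as a problem is an assumption drawn from the "only for type interactions" wording, not something the schema states.

```python
from typing import Optional


def input_text_problem(task: dict) -> Optional[str]:
    """Return a message if input_text is inconsistent with interaction, else None."""
    interaction = task.get("interaction")
    has_text = isinstance(task.get("input_text"), str)
    if interaction == "type" and not has_text:
        return "interaction 'type' requires a string input_text"
    # Assumption: input_text on a non-typing task is a likely authoring mistake.
    if interaction in ("click", "hover") and "input_text" in task:
        return f"input_text is unused for interaction {interaction!r}"
    return None
```

Running a check like this next to schema validation leaves the schema file untouched while still catching the mismatch early.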