diff --git a/index.html b/index.html
index 205ee21..45f485e 100644
--- a/index.html
+++ b/index.html
@@ -740,11 +740,11 @@

Results

- algebraic reasoning
+ algebraic reasoning
Whole task success rate across task difficulty levels. Easy: 2-4, Medium: 5-7, and Hard: 8-12.
- your second image description
+ your second image description
Whole task success rate (%) under both offline and online evaluation. Offline0 and Offline1 refer to no tolerance for error at any step and allowing for error at one step, respectively.
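
The Offline0 and Offline1 settings in the caption above can be read as a tolerance on the number of incorrect steps per task. The following is a minimal sketch under that reading, not the authors' evaluation code: the function name `whole_task_success_rate`, the boolean per-step representation, and the example data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def whole_task_success_rate(
    tasks: Sequence[Sequence[bool]], tolerance: int = 0
) -> float:
    """Return the percentage of tasks with at most `tolerance` incorrect steps.

    Each task is a sequence of booleans, one per step (True = step correct).
    tolerance=0 corresponds to the Offline0 reading (no error at any step);
    tolerance=1 corresponds to the Offline1 reading (error allowed at one step).
    This mapping is an assumption based on the caption text only.
    """
    if not tasks:
        return 0.0
    passed = sum(
        1 for steps in tasks if sum(not ok for ok in steps) <= tolerance
    )
    return 100.0 * passed / len(tasks)


# Made-up per-step outcomes for four tasks (illustrative, not real results).
example_tasks = [
    [True, True, True],    # no errors
    [True, False, True],   # one error
    [False, False, True],  # two errors
    [True, True],          # no errors
]

offline0 = whole_task_success_rate(example_tasks, tolerance=0)
offline1 = whole_task_success_rate(example_tasks, tolerance=1)
```

With this made-up input, two of the four tasks pass under zero tolerance and three pass when one erroneous step is allowed, so the tolerant score can never be lower than the strict one.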