diff --git a/index.html b/index.html
index 9dc2cb7..f10d505 100644
--- a/index.html
+++ b/index.html
@@ -58,12 +58,12 @@

How's GPT-4 with Vision Doing?

Response Time

-

Today, the average response time to receive results from our tests was 5.71 seconds per request.

+

Today, the average response time to receive results from our tests was 5.82 seconds per request.

This number only accounts for requests made by this application.

-

5.71 s

+

5.82 s
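The response-time figure is an average taken over this application's own requests. Below is a minimal Python sketch of that arithmetic. It assumes each request is timed on the client with `time.perf_counter`. `timed_call` and `average_response_time` are hypothetical helpers, and the `time.sleep` call stands in for a real GPT-4V request.

```python
import statistics
import time
from typing import Callable, List


def timed_call(fn: Callable[[], object]) -> float:
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def average_response_time(durations: List[float]) -> float:
    """Arithmetic mean of per-request durations, in seconds."""
    if not durations:
        raise ValueError("no requests were timed")
    return statistics.mean(durations)


if __name__ == "__main__":
    # Hypothetical stand-in for a GPT-4V request: sleep for 10 ms.
    durations = [timed_call(lambda: time.sleep(0.01)) for _ in range(3)]
    print(f"{average_response_time(durations):.2f} s")
```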

@@ -73,6 +73,7 @@

Response Time

Today's Failing Tests

+
@@ -86,26 +87,28 @@

Counting

-
- Last 7-Day Performance -
-
+ Last 7-Day Performance +

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

+
+
+
+

Today's request cost $0.008

-

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.008

@@ -119,64 +122,12 @@

Prompt

Image

Image of the input into GPT-4

Result

-
9
-

Test submitted by Roboflow

-
-
-
- -
-
-
-

Math OCR

-

Can GPT-4V recognize math equations?

-
-
-
-

Fail

-
-
-
-
-
- Last 7-Day Performance -
-
-
-
-

Of the last 7 tests, conducted daily, this test has passed 86.0% of the time.

-

Today's request cost $0.015

-
-
- -
-

Method

-
We provide an image of a math equation and ask GPT-4V to return a LaTeX string of the equation. The output is scored using the Levenshtein ratio between it and the correct answer. The ratio is based on the number of single-character edits needed to turn the output into the correct answer.
-
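The Math OCR method scores the model's LaTeX string with a Levenshtein ratio. Below is a minimal sketch of that kind of score. It uses classic dynamic-programming edit distance, normalized by the length of the longer string, which is one common convention. The page does not say which normalization or pass threshold the checkup uses. Libraries such as python-Levenshtein normalize differently, so exact values may not match.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(output: str, expected: str) -> float:
    """Similarity in [0, 1], where 1.0 means the strings are identical.
    Assumption: normalized by the longer string's length."""
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(output, expected) / longest


print(levenshtein_ratio("3x^{2}-6x+2", "3x^{2}-6x+2"))
```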

Prompt

-
-                                            Produce a JSON array with a LaTeX string of each equation in the image.
-                                        
-

Image

- Image of the input into GPT-4 -

Result

-
3x^{2}-6x+2
+
8

Test submitted by Roboflow

- +
@@ -190,26 +141,28 @@

Object Detection

-
- Last 7-Day Performance -
-
+ Last 7-Day Performance +

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

+
+
+
+

Today's request cost $0.009

-

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.01
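Most costs on this page are shown to three decimal places, but one value reads `$0.01`. The template renders `current_results[test_id].price|round(3)`. Jinja's `round` filter returns a number rather than a formatted string, so trailing zeros are dropped when the value is printed. The sketch below contrasts that behavior with fixed-width formatting. It assumes the filter's default `common` method behaves like Python's built-in `round`. The price 0.0099 is a hypothetical value, not one taken from the checkup.

```python
def jinja_style_round(value: float, precision: int = 3) -> str:
    # Assumed equivalent of {{ value|round(3) }}: round to a float, then render with str().
    return str(round(value, precision))


def fixed_decimals(value: float, precision: int = 3) -> str:
    # Always pads the result to `precision` decimal places.
    return f"{value:.{precision}f}"


for price in (0.0099, 0.008, 0.015):
    print(jinja_style_round(price), fixed_decimals(price))
```

Formatting the price as a string in the template, rather than rounding it, would keep every cost at three decimal places.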

@@ -223,12 +176,12 @@

Prompt

Image

Image of the input into GPT-4

Result

-
{'x': 0.365, 'y': 0.34, 'width': 0.23, 'height': 0.28}
+
{'x': 0.375, 'y': 0.272, 'width': 0.25, 'height': 0.4}
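The diff shows the previous and current Object Detection boxes for the same image. The page does not say how this test is scored. Intersection over union (IoU) is one common way to quantify how far two boxes disagree. The sketch below assumes `x` and `y` are the top-left corner and that all values are normalized to the image size. The output format does not specify this. If `x` and `y` are the box center, the overlap changes.

```python
from typing import Dict

Box = Dict[str, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes.
    Assumes x, y are the top-left corner (an assumption, see above)."""
    # Overlap along each axis, clamped to zero when the boxes do not meet.
    ix = max(0.0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    iy = max(0.0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = ix * iy
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


previous = {'x': 0.365, 'y': 0.34, 'width': 0.23, 'height': 0.28}
today = {'x': 0.375, 'y': 0.272, 'width': 0.25, 'height': 0.4}
print(f"{iou(previous, today):.2f}")
```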

Test submitted by Roboflow

- +
@@ -242,26 +195,28 @@

Graph Understanding

-
- Last 7-Day Performance -
-
+ Last 7-Day Performance +

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

+
+
+
+

Today's request cost $0.011

-

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.011

@@ -275,21 +230,31 @@

Prompt

Image

Image of the input into GPT-4

Result

-
-```json
+                                        
```json
 {
-  "A": {"quantity": 5, "price": 15},
-  "B": {"quantity": 20, "price": 25},
-  "C": {"quantity": 30, "price": 35},
-  "D": {"quantity": 40, "price": 42}
+  "A": {
+    "quantity": 15,
+    "price": 15
+  },
+  "B": {
+    "quantity": 27,
+    "price": 23
+  },
+  "C": {
+    "quantity": 35,
+    "price": 32
+  },
+  "D": {
+    "quantity": 42,
+    "price": 45
+  }
 }
-```
+```

Test submitted by Roboflow

- +
@@ -303,22 +268,24 @@

Color Recognition

-
- Last 5-Day Performance -
-
+ Last 5-Day Performance +

Of the last 5 tests, conducted daily, this test has passed 0% of the time.

+
+
+
+

Today's request cost $0.009

-

Of the last 5 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.009

@@ -332,12 +299,18 @@

Prompt

Image

Image of the input into GPT-4

Result

-
Failed to produce a valid JSON output: {"R": 128, "G": 0, "B": 128}
+
```json
+{
+  "R": 128,
+  "G": 0,
+  "B": 128
+}
+```

Test submitted by Roboflow
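The results in this diff show the model returning JSON in more than one shape. Some outputs are bare, like the old Color Recognition output. Some are wrapped in a `json` code fence, like today's Graph Understanding and Color Recognition outputs. Sometimes there is no JSON at all, as in the old Measurement refusal. Below is a minimal sketch of a tolerant parser. `parse_model_json` is a hypothetical helper, not this application's code, and this diff does not show how the checkup actually parses responses.

```python
import json
import re
from typing import Any

# A Markdown code fence with an optional "json" tag; DOTALL lets "." span newlines.
FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def parse_model_json(response: str) -> Any:
    """Parse JSON that is either bare or wrapped in a Markdown code fence.

    Raises json.JSONDecodeError (a subclass of ValueError) when the
    response contains no valid JSON, e.g. a refusal.
    """
    match = FENCE.search(response)
    payload = match.group(1) if match else response.strip()
    return json.loads(payload)


fenced = '```json\n{\n  "R": 128,\n  "G": 0,\n  "B": 128\n}\n```'
bare = '{"R": 128, "G": 0, "B": 128}'
print(parse_model_json(fenced) == parse_model_json(bare))
```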

- +
@@ -351,23 +324,24 @@

Annotation Quality Assurance

-
- Last 5-Day Performance -
-
+ Last 5-Day Performance +

Of the last 5 tests, conducted daily, this test has passed 0% of the time.

+
+
+

Today's request cost $0.015

-

Of the last 5 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.015

@@ -381,18 +355,16 @@

Prompt

Image

Image of the input into GPT-4

Result

-
-```json
+                                        
```json
 {
   "missing": 2
 }
-```
+```

Test submitted by Roboflow

- +
@@ -406,21 +378,22 @@

Measurement Test

-
- Last 4-Day Performance -
-
+ Last 4-Day Performance +

Of the last 4 tests, conducted daily, this test has passed 0% of the time.

+
+
+
-

Today's request cost $0.008

+

Today's request cost $0.009

-

Of the last 4 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.009

@@ -434,11 +407,15 @@

Prompt

Image

Image of the input into GPT-4

Result

-
Failed to produce a valid JSON output: I'm sorry, but I can't assist with that request.
+
Failed to produce a valid JSON output: {
+  "length": 2.6,
+  "width": 2.6
+}

Test submitted by Roboflow

+
@@ -447,6 +424,7 @@

Hide
+
@@ -464,23 +442,24 @@

Zero Shot Classification

Of the last 7 tests, conducted daily, this test has passed 100% of the time.

-
+
+

Today's request cost $0.005

-

Of the last 7 tests, this test has passed 100% of the time.

-

Today's request cost $0.005

@@ -499,7 +478,7 @@

Result

- +
@@ -517,23 +496,24 @@

Document OCR

Of the last 7 tests, conducted daily, this test has passed 100% of the time.

-
+
+

Today's request cost $0.009

-

Of the last 7 tests, this test has passed 100% of the time.

-

Today's request cost $0.009

@@ -552,7 +532,7 @@

Result

- +
@@ -570,23 +550,24 @@

Handwriting OCR

Of the last 7 tests, conducted daily, this test has passed 100% of the time.

-
+
+

Today's request cost $0.009

-

Of the last 7 tests, this test has passed 100% of the time.

-

Today's request cost $0.009

@@ -605,7 +586,7 @@

Result

- +
@@ -623,23 +604,24 @@

Structured Data OCR

Of the last 7 tests, conducted daily, this test has passed 100% of the time.

-
+
+

Today's request cost $0.007

-

Of the last 7 tests, this test has passed 100% of the time.

-

Today's request cost $0.007

@@ -653,12 +635,12 @@

Prompt

Image

Image of the input into GPT-4

Result

-
[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]
+
[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]

Test submitted by Roboflow
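The previous and current Structured Data OCR outputs differ only in letter case, and the test passed on both days. The outputs are printed as Python literals with single-quoted keys, not JSON. Below is a sketch of a case-insensitive field comparison. It illustrates one way to match such outputs and is not a description of this test's scorer. `parse_records`, `normalize`, and `records_match` are hypothetical helpers.

```python
import ast
from typing import Any, Dict, List


def parse_records(text: str) -> List[Dict[str, Any]]:
    # The output is a Python literal (single quotes), so json.loads would reject it.
    return ast.literal_eval(text)


def normalize(value: Any) -> Any:
    # Compare strings case-insensitively, ignoring surrounding whitespace.
    return value.strip().casefold() if isinstance(value, str) else value


def records_match(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    # Same keys, and every field equal after normalization.
    return a.keys() == b.keys() and all(normalize(a[k]) == normalize(b[k]) for k in a)


old = parse_records("[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]")
new = parse_records("[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]")
print(records_match(old[0], new[0]))
```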

- +
@@ -676,21 +658,23 @@

Math OCR

Of the last 7 tests, conducted daily, this test has passed 100% of the time.

+
+
-

Today's request cost $0.017

+

Today's request cost $0.015

@@ -710,6 +694,7 @@

Result

+

@@ -757,4 +742,4 @@

How CLIP and GPT-4V Compare for Classification

- +
\ No newline at end of file
diff --git a/template.html b/template.html
index 8e9dea7..b4dac51 100644
--- a/template.html
+++ b/template.html
@@ -6,7 +6,7 @@

GPT-4V Checkup

- +

@@ -87,16 +87,16 @@

{{ test_data.name }}

-
- Last {{test_data["seven_day"]["score"]|length}}-Day Performance -
+ Last {{test_data["seven_day"]["score"]|length}}-Day Performance +

Of the last {{test_data["seven_day"]["score"]|length}} tests, conducted daily, this test has passed {{ test_data["seven_day"]["success_percent"] }}% of the time.

+
+
{% for item in test_data["seven_day"]["success"] %} -
+
{% endfor %}
+

Today's request cost ${{current_results[test_id].price|round(3)}}

-

Of the last {{test_data["seven_day"]["score"]|length}} tests, conducted daily, this test has passed {{ test_data["seven_day"]["success_percent"] }}% of the time.

-

Today's request cost ${{current_results[test_id].price|round(3)}}
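The template's summary line comes from `test_data["seven_day"]`. `score|length` gives the number of days reported, which is why some tests show "Last 5-Day" or "Last 4-Day" performance. The loop over `success` draws one marker per day. Below is a minimal sketch of building such a summary. It assumes each day contributes one pass/fail result. `seven_day_summary`, the 0/1 `score` encoding, and rounding to one decimal place are all assumptions, since the code that builds this structure is not part of the diff.

```python
from typing import Any, Dict, List


def seven_day_summary(daily_passes: List[bool], window: int = 7) -> Dict[str, Any]:
    """Summarize the most recent `window` daily pass/fail results (oldest first).

    Hypothetical helper: key names mirror the template's test_data["seven_day"]
    fields, but their real contents are not shown in this diff.
    """
    recent = daily_passes[-window:]  # shorter than `window` for newer tests
    if not recent:
        raise ValueError("no daily results yet")
    percent = 100 * sum(recent) / len(recent)
    return {
        "score": [int(p) for p in recent],  # |length gives the "Last N-Day" count
        "success": list(recent),            # looped over to draw one marker per day
        "success_percent": round(percent, 1),
    }


print(seven_day_summary([True, False, True, True, False]))
```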

@@ -138,16 +138,16 @@

{{ test_data.name }}

-
- Last {{test_data["seven_day"]["score"]|length}}-Day Performance -
+ Last {{test_data["seven_day"]["score"]|length}}-Day Performance +

Of the last {{test_data["seven_day"]["score"]|length}} tests, conducted daily, this test has passed {{ test_data["seven_day"]["success_percent"] }}% of the time.

+
+
{% for item in test_data["seven_day"]["success"] %} -
+
{% endfor %}
+

Today's request cost ${{current_results[test_id].price|round(3)}}

-

Of the last {{test_data["seven_day"]["score"]|length}} tests, conducted daily, this test has passed {{ test_data["seven_day"]["success_percent"] }}% of the time.

-

Today's request cost ${{current_results[test_id].price|round(3)}}