Gemini_Prototype_Report.md

Gemini Pro Testing

Data Privacy: https://arc.net/l/quote/mmlqlsos

API: https://repository.library.northeastern.edu/api/v1/files/neu:4f199k58v

Prompts:

Prompt 1 (Basic Prompt): 16 tokens

Please Generate a title for the given image that describes what is going on in the picture

Prompt 2 (Title Guidelines Included): 396 tokens

Adhering to these general rules for a Title....

Title should be ~20 words or less. For images of a documentary nature, give the objective factual content clearly and concisely. Details of the content, nature, scope, context, purpose, etc., of the item may be fully described in the abstract. Include the following kinds of information in the devised title, as appropriate and feasible:

Succinct descriptive phrasing of the form or genre: portrait, map, candid group shot (for a limited list of genres, consult AAT, or go directly to information about vocabulary downloads: https://www.getty.edu/research/tools/vocabularies/obtain/download.html )

an identification of the main subject(s) depicted (i.e., persons, events, activities, and objects)

geographical and/or type of location depicted, if known and significant

Dates will be included when possible, but will be added as their own sub-element within titleInfo, but not as a part of the main title, so no need to include a chronological descriptor

Capitalization and formatting in titles

First letter of first word of the title is capitalized

Proper nouns and locations (person’s name, city, name of a building, etc.) are capitalized

References to race and/or ethnicity such as Black, White, Asian American, Native American are capitalized

Acronyms are capitalized, but do not include periods or spaces between letters

Everything else should be lowercase in the titles

Do not include colons or semi-colons in descriptive titles

Avoid using contractions whenever possible (can’t, it’s, they’d, you’ll)

Avoid extra spacing between words as well as spaces at the beginning and end of titles

Please generate a title for the given image that describes what is going on in the picture

Prompt 2.5 (Drew’s remodeled prompt): 588 tokens

Please generate a title for the given image based on these rules:

General rules

  • Title should be ~20 words or less. For images of a documentary nature, give the objective factual content clearly and concisely. Details of the content, nature, scope, context, purpose, etc., of the item may be fully described in the abstract. Include the following kinds of information in the devised title, as appropriate and feasible:
    • Succinct descriptive phrasing of the form or genre of the image from the following list:
      • portrait
      • map
      • candid photograph
      • landscape
      • aerial photograph
      • documentary photograph
      • fashion photograph
      • panoramic photograph
      • sports action photograph
      • street photograph
      • illustration
      • architectural photograph
    • an identification of the main subject(s) depicted (i.e., persons, events, activities, and objects)
    • geographical and/or type of location depicted, if known and significant
    • Do NOT include: dates or a chronological descriptor

Formatting rules:

  • First letter of first word of the title is capitalized
  • Proper nouns and locations (person’s name, city, name of a building, etc.) are capitalized
  • References to race and/or ethnicity such as Black, White, Asian American, Native American are capitalized
  • Acronyms are capitalized, but do not include periods or spaces between letters
  • Everything else should be lowercase in the titles
  • Do not include colons or semi-colons in descriptive titles
  • Avoid using contractions whenever possible (can’t, it’s, they’d, you’ll)
  • Avoid extra spacing between words as well as spaces at the beginning and end of titles
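Several of the formatting rules above are mechanical enough to check in code before a generated title is accepted. The following is a minimal sketch, assuming a hypothetical helper named `check_title` that is not part of the original pipeline. It covers only the rules that need no knowledge of proper nouns: length, colons and semi-colons, contractions, spacing, and the capitalized first letter.

```python
import re

# Hypothetical helper, not part of the original pipeline. The pattern also
# matches possessives such as "Drew's", so treat a hit as a warning to review.
CONTRACTION = re.compile(r"\b\w+[’'](?:t|s|d|ll|re|ve|m)\b", re.IGNORECASE)


def check_title(title: str, max_words: int = 20) -> list[str]:
    """Return a list of rule violations; an empty list means the title passed."""
    problems = []
    if len(title.split()) > max_words:
        problems.append(f"more than {max_words} words")
    if ":" in title or ";" in title:
        problems.append("contains a colon or semi-colon")
    if CONTRACTION.search(title):
        problems.append("contains a contraction")
    if title != title.strip() or "  " in title:
        problems.append("extra spacing")
    if title and not title.lstrip()[:1].isupper():
        problems.append("first letter is not capitalized")
    return problems


print(check_title("Coastal Drill: Troops Assembled"))
```

Capitalization of proper nouns, acronyms, and references to race or ethnicity is left out on purpose, since checking it would require a name list or a human reviewer.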

Prompt 3 (Generate both title + abstract): 629 tokens

Same as the previous prompt, plus:

Also Adhering to these general rules for an abstract:

A brief objective summary (2-5 sentences) of the content, meaning, or subject matter or composition of the image, if not apparent from the title (there is additional information to provide).

Include, as appropriate: date or date span depicted; description of the composition of the image; the most significant topic(s), event(s), person(s), place(s), etc.

Transcriptions of front and back of photographs can also be included or information written can inform abstracts without full textual transcription.

Can include quoted transcription followed by an indication of source

If at the end of an abstract: “[Transcription]”-- written on back of photograph.

If within or at the beginning of an abstract: “[Transcription]” (hand and type written on back of photograph).

Transcription rules

In general, use modern spacing conventions when transcribing from the source. Make no attempt to preserve additional or irregular spacing between words. If spacing between words in the source is ambiguous, or lacking, include spaces in the transcription to separate the words as needed.

Indicate missing letters in words with a “-”

Indicate single or multi-word omissions in the transcription by using “…”. When using the mark of omission, generally use it with a space on either side

Transcribe initials, initialisms, monograms, and acronyms without internal spaces (acronyms without periods), regardless of how they are presented in the source of information.

Transcribe an unintentional inaccuracy, misspelling, or misprint as it appears on the material. Follow such an inaccuracy either by “[sic]” or by the abbreviation “i.e.” and the correction within square brackets.

Please generate an abstract based on the given general rules/transcription rules

Prompt 4 (Transcribe the Back of the Photo): 33 tokens

Transcribe what's on this photo

If you can, provide this information

- Name of Photographer

- Date(s) (Separately if there are multiple)

- Any other text on the image.

Prompt 4.5 (Transcribe the Back of the Photo + Rules): 325 tokens

Transcribe what's on this photo

Transcription rules

In general, use modern spacing conventions when transcribing from the source. Make no attempt to preserve additional or irregular spacing between words. If spacing between words in the source is ambiguous, or lacking, include spaces in the transcription to separate the words as needed.

Indicate missing letters in words with a “-”

Indicate single or multi-word omissions in the transcription by using “…”. When using the mark of omission, generally use it with a space on either side

Transcribe initials, initialisms, monograms, and acronyms without internal spaces (acronyms without periods), regardless of how they are presented in the source of information.

Transcribe an unintentional inaccuracy, misspelling, or misprint as it appears on the material. Follow such an inaccuracy either by “[sic]” or by the abbreviation “i.e.” and the correction within square brackets.

If you can, provide this information

- Name of Photographer

- Date(s) (Separately if there are multiple)

- Any other text on the image.

Format the output like JSON text:

Name:**Name of Photographer**

Date(s):**[Date1, Date2,...] (formatted as a list)**

Raw:**Full raw transcription given the transcription rules above**

Prompt 5 (Transcription Context + Prompt 2.5)

Please generate a title for the given image based on these rules:

General rules

  • Title should be ~20 words or less. For images of a documentary nature, give the objective factual content clearly and concisely. Details of the content, nature, scope, context, purpose, etc., of the item may be fully described in the abstract. Include the following kinds of information in the devised title, as appropriate and feasible:
    • Succinct descriptive phrasing of the form or genre of the image from the following list:
      • portrait
      • map
      • candid photograph
      • landscape
      • aerial photograph
      • documentary photograph
      • fashion photograph
      • panoramic photograph
      • sports action photograph
      • street photograph
      • illustration
      • architectural photograph
    • an identification of the main subject(s) depicted (i.e., persons, events, activities, and objects)
    • geographical and/or type of location depicted, if known and significant
    • Do NOT include: dates or a chronological descriptor

Formatting rules:

  • First letter of first word of the title is capitalized
  • Proper nouns and locations (person’s name, city, name of a building, etc.) are capitalized
  • References to race and/or ethnicity such as Black, White, Asian American, Native American are capitalized
  • Acronyms are capitalized, but do not include periods or spaces between letters
  • Everything else should be lowercase in the titles
  • Do not include colons or semi-colons in descriptive titles
  • Avoid using contractions whenever possible (can’t, it’s, they’d, you’ll)
  • Avoid extra spacing between words as well as spaces at the beginning and end of titles

Also provided will be context from the back of the given photo, which could be used to formulate a more descriptive/accurate title:

Prompt 6 (Emphasized Title Requirements)

You are an expert in digital library collections. Follow these guidelines strictly

Please generate a title for the given image based on these rules:

Title MUST BE 20 WORDS OR LESS.

For images of a documentary nature, give the objective factual content clearly and concisely.

MAKE SURE TO INCLUDE Succinct descriptive phrasing of the form or genre of the image from the following list:

- portrait

- map

- candid photograph

- landscape

- aerial photograph

- documentary photograph

- fashion photograph

- panoramic photograph

- sports action photograph

- street photograph

- illustration

- architectural photograph

INCLUDE an identification of the main subject(s) depicted (i.e., persons, events, activities, and objects)

INCLUDE the geographical and/or type of location depicted, if known and significant

ONLY INCLUDE race/ethnicity if it is absolutely necessary to the context of the image

Do NOT include: dates or a chronological descriptor

Formatting rules:

FIRST LETTER OF THE FIRST WORD IS CAPS

PROPER NOUNS AND LOCATIONS ARE CAPITALIZED

REFERENCES TO RACE AND/OR ETHNICITY ARE CAPITALIZED

ACRONYMS ARE CAPITALIZED (No spaces/periods between letters)

EVERYTHING ELSE - should be lowercase in the title

NO semi-colons/colons

Avoid contractions

Avoid extra spacing between words as well as spaces at the beginning and end of titles

Output Formatting:

Title: generated title

Also provided could potentially (if after the : it is not blank) be context from the back of the given photo, WHICH SHOULD BE USED to formulate a more descriptive/accurate title:

Prompt 7 (Emphasized Abstract Requirements)

You are an expert in digital library collections. Follow these guidelines strictly

Adhering to these general rules for an abstract:

A brief objective summary (2-5 sentences) of the content, meaning, or subject matter or composition of the image, if not apparent from the title (there is additional information to provide).

Details of the content, nature, scope, context, purpose, etc., of the item may be fully described in the abstract.

Include, if context is provided: date or date span depicted;

Include a description of the composition of the image; the most significant topic(s), event(s), person(s), place(s), etc.

Transcriptions of front and back of photographs can also be included or information written can inform abstracts without full textual transcription.

Can include quoted transcription followed by an indication of source

If at the end of an abstract: “[Transcription]”-- written on back of photograph.

If within or at the beginning of an abstract: “[Transcription]” (hand and type written on back of photograph).

If no transcription provided:

Output Formatting:

Abstract: Descriptive abstract

Please generate an abstract based on the given general rules

Also provided could potentially (if after the : it is not blank) be context from the back of the given photo, which should be used to formulate a more descriptive/accurate abstract:

Test Case 1

Image had a backside? : YES

Prompt Response 1 (Basic Prompt):

Output:

Here are some title options for the image:

**Descriptive:**

* Military Formation on the Pier

* Soldiers in Formation by the Water

* Coastal Drill: Troops Assembled

**Evocative:**

* Standing Ready: A Moment of Order

* Lines in the Sand: A Coastal Muster

* Facing the Tide: Military Readiness

**Specific (If you have more context about the image):**

* [Unit Name] Muster at [Location]

* Pre-Deployment Formation, [Year]

* Coastal Training Exercise, [Date]

When prompted to give only one response: Military Formation on the Docks

Tokens: 43 tokens

Total Tokens: 59 tokens

Conclusions:

  • Evocative titles are unnecessary.
  • Descriptive titles were good.
  • Specific titles could be useful given context from the back of the photo.

Prompt Response 2 (Title Guidelines Included):

Output:

Group portrait of soldiers in formation on pier

Tokens: 8

Total Tokens: 406 tokens

Conclusions:

  • The title makes a slight error: it calls the image a group portrait, which it does not really appear to be. Given the title instructions, something like "candid group shot" would be more apt.

Prompt Response 2.5 (Drew’s modified title guidelines):

Output:

Documentary photograph of sailors in formation on pier

Tokens: 8

Total Tokens: 596

Conclusions:

  • Better this time: the title describes the image as a documentary photograph rather than a group portrait. Other than that, the title is about the same.

Prompt Response 3 (Generate both a title and abstract):

Output:

**Group portrait of soldiers in formation on pier**

Black and white photographic group portrait of over fifty men wearing matching uniforms, standing at attention on a pier. The men are arranged in three staggered rows with two men standing to the right, separate from the larger group. A body of water is visible in the background.

Tokens: 48

Total Tokens: 677 tokens

Conclusions:

  • The title again has the same issue as in Response 2.
  • The abstract is good overall, but it is again slightly off in describing the image as a group portrait.

Prompt Response 4 (Transcribe the Back of a photo):

Output:

Mass Maritime Acad

JUN 8 1964

Dept. Educa.

Mass. Scout

FUTURE OFFICERS IN REVUE -- Massachusetts Maritime Academy midshipmen stand at attention on dock near training ship.

MAY 19 1964

PHOTO BY

PAUL J. CONNELL

**Photographer:** Paul J. Connell

**Dates:** June 8, 1964 and May 19, 1964

Tokens: 59

Total Tokens: 92 tokens

Conclusions:

  • Transcribed everything well and accurately.

Prompt Response 4.5 (Transcribe the Back of a photo with formatting guidelines):

Output:

Name: PAUL J. CONNELL

Date: [MAY 19 1964, JUN 8 1964]

Raw:Mass Maritime Acad

Dept-Educate

Mass-Scout

FUTURE OFFICERS IN REVUE -- Massachusetts Maritime Academy midshipmen stand at attention on dock near training ship.

PHOTO BY

PAUL J. CONNELL

MAY 19 1964

JUN 8 1964

SCANNED

Tokens: 98

Total Tokens: 623

Conclusions:

  • Transcribed everything perfectly again and formatted it in a way that I could programmatically extract the name of photographer, date, and raw text.
  • Sometimes it returned only one date (the earlier one both times, which is at least consistent), so the next step is to experiment with lowering the temperature.
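The programmatic extraction mentioned above could look roughly like the following. This is a minimal sketch, not the script used for these tests: the function name `parse_transcription` is hypothetical, and it assumes the labels `Name:`, `Date:` (or `Date(s):`), and `Raw:` appear in that order, each at the start of a line, as in the output shown for this response.

```python
import re


def parse_transcription(text: str) -> dict:
    """Split a 'Name: / Date: / Raw:' response into its three fields.

    Assumes the three labels appear in that order at the start of a line.
    Dates are split on commas, so a date written as 'June 8, 1964' would
    need extra handling.
    """
    match = re.search(
        r"^Name:\s*(?P<name>.*?)\s*^Date(?:\(s\))?:\s*(?P<dates>.*?)\s*^Raw:\s*(?P<raw>.*)",
        text,
        flags=re.MULTILINE | re.DOTALL,
    )
    if match is None:
        # Unexpected layout: return empty fields rather than guessing.
        return {"name": "", "dates": [], "raw": ""}
    dates = match.group("dates").strip().strip("[]")
    return {
        "name": match.group("name").strip(),
        "dates": [d.strip() for d in dates.split(",") if d.strip()],
        "raw": match.group("raw").strip(),
    }


sample = "Name: PAUL J. CONNELL\nDate: [MAY 19 1964, JUN 8 1964]\nRaw:Mass Maritime Acad\nSCANNED"
print(parse_transcription(sample)["dates"])
```

Returning empty fields on a mismatch makes it easy to flag responses where the model drifted from the requested format, such as the runs that returned only one date.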

Prompt Response 5 (Transcription Context + Title):

Output: Massachusetts Maritime Academy midshipmen at attention

Tokens: 8

Total Tokens: 691

Conclusions:

Prompt Response 6 (Transcription Context + Title + Abstract):

Output:

Title: Massachusetts Maritime Academy midshipmen stand at attention on dock

Abstract: Black and white documentary photograph of Massachusetts Maritime Academy midshipmen in uniform, standing at attention in formation on a dock, near a body of water. May 19, 1964. Photo by Paul J. Connell.

Tokens: 52

Total Tokens: 691

Conclusions:

  • Overall a really good title and abstract; it just needs to be formatted a bit better (i.e., remove the date and the photographer credit at the end).

Prompt Response 7 (Title Emphasized + Transcription):

Output:

Title: Documentary photograph massachusetts maritime academy midshipmen on dock

Abstract:

Tokens: 52

Total Tokens: 691

Conclusions:

  • Overall a really good title and abstract; it just needs to be formatted a bit better (i.e., remove the date and the photographer credit at the end).

Final Prompts

Transcription Prompt (530 tokens):

Transcribe what's on this photo

Transcription rules

In general, use modern spacing conventions when transcribing from the source. Make no attempt to preserve additional or irregular spacing between words. If spacing between words in the source is ambiguous, or lacking, include spaces in the transcription to separate the words as needed.

Indicate missing letters in words with a “-”

Indicate single or multi-word omissions in the transcription by using “…”. When using the mark of omission, generally use it with a space on either side

Transcribe initials, initialisms, monograms, and acronyms without internal spaces (acronyms without periods), regardless of how they are presented in the source of information.

Transcribe an unintentional inaccuracy, misspelling, or misprint as it appears on the material. Follow such an inaccuracy either by “[sic]” or by the abbreviation “i.e.” and the correction within square brackets.

If you can, provide this information

- Name of Photographer

- Date(s) (Separately if there are multiple)

- Any other text on the image.

Format the output like this text:

Name:Name of Photographer

Date:[Date1, Date2,...] (formatted as a list from earliest to latest date)

Raw:(Full raw transcription given the transcription rules above)
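Because the prompt asks for dates ordered from earliest to latest, the returned list can be re-sorted in code instead of relying on the model. The sketch below is an illustration only: `parse_stamp` and `sort_stamps` are hypothetical names, and it assumes stamps shaped like `JUN 8 1964` or `June 8, 1964`, which are the two forms that appear in the test outputs.

```python
from datetime import date

# Month abbreviations mapped to month numbers (JAN -> 1 ... DEC -> 12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def parse_stamp(stamp: str) -> date:
    """Parse a stamp such as 'JUN 8 1964' or 'June 8, 1964' (assumed formats)."""
    month, day, year = stamp.replace(",", " ").split()
    return date(int(year), MONTHS[month[:3].upper()], int(day))


def sort_stamps(stamps: list[str]) -> list[str]:
    """Return the original stamp strings ordered from earliest to latest."""
    return sorted(stamps, key=parse_stamp)


print(sort_stamps(["JUN 8 1964", "MAY 19 1964"]))
```

Any stamp in another shape raises an error, which is a reasonable signal to send that record for manual review.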

Title Prompt (537 tokens):

You are an expert in digital library collections. Follow these guidelines strictly

Please generate a title for the given image based on these rules:

Title MUST BE 20 WORDS OR LESS.

For images of a documentary nature, give the objective factual content clearly and concisely.

MAKE SURE TO INCLUDE Succinct descriptive phrasing of the form or genre of the image from the following list:

- portrait

- map

- candid photograph

- landscape

- aerial photograph

- documentary photograph

- fashion photograph

- panoramic photograph

- sports action photograph

- street photograph

- illustration

- architectural photograph

INCLUDE an identification of the main subject(s) depicted (i.e., persons, events, activities, and objects)

ONLY INCLUDE race/ethnicity if it is absolutely necessary to the context of the image

DON'T Identify individuals by name, UNLESS NAMES ARE SPECIFIED IN GIVEN CONTEXT

INCLUDE the geographical and/or type of location depicted, if known and significant

Do NOT include: dates or a chronological descriptor

Output Formatting:

Title: generated title

Also potentially provided IN THIS PROMPT (if after the : it is not blank), is context from the back of the given photo,

WHICH SHOULD BE USED to formulate a more descriptive/accurate title:

Abstract Prompt (439 tokens):

You are an expert in digital library collections. Follow these guidelines strictly

ADHERE to these general rules for an abstract:

A brief objective summary (2-5 SENTENCES) of the content, meaning, subject matter or composition of the image, if not apparent from the title (there is additional information to provide).

INCLUDE, if context is provided: date or date span depicted;

INCLUDE a description of the composition of the image; the most significant topic(s), event(s), person(s), place(s), etc.

Output Formatting:

Abstract: abstract

Please generate an abstract based on the given general rules

Also provided could potentially (if after the : it is not blank) be context from the back of the given photo, which SHOULD BE USED to formulate a more descriptive/accurate abstract:
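The title and abstract prompts both end with a colon so that the transcription from the back of the photo can be appended after it. The sketch below shows one way the three final prompts might be chained. It rests on stated assumptions: `with_context` and `describe_photo` are hypothetical names, `generate(prompt, image)` stands in for whatever function actually calls the model, and passing the transcription output as the context is a guess at how the pieces fit together.

```python
from typing import Callable, Optional


def with_context(prompt: str, transcription: str = "") -> str:
    """Append back-of-photo context after the prompt's trailing colon.

    With no transcription, nothing follows the colon, which the prompt
    text tells the model to treat as "no context provided".
    """
    prompt = prompt.rstrip()
    if not prompt.endswith(":"):
        raise ValueError("prompt is expected to end with a colon")
    context = " ".join(transcription.split())  # collapse line breaks and extra spaces
    return f"{prompt} {context}".rstrip()


def describe_photo(front, back: Optional[object],
                   generate: Callable[[str, object], str],
                   prompts: dict) -> dict:
    """Transcribe the back (if any), then title and abstract the front."""
    raw = generate(prompts["transcription"], back) if back is not None else ""
    title = generate(with_context(prompts["title"], raw), front)
    abstract = generate(with_context(prompts["abstract"], raw), front)
    return {"transcription": raw, "title": title, "abstract": abstract}
```

Keeping the model call behind a plain callable means the chaining logic can be exercised with a stub before any paid requests are made.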

Final Token Analysis:

Transcription Request: 530 input tokens + 40 output tokens (can vary from as few as 2 to as many as 100): 570

Title Request: 537 input tokens + 9 output tokens (the maximum is 20 output tokens, but in practice it rarely gets that high): 546

Abstract Request: 439 input tokens + 69 output tokens: 508

Total Input Tokens: 530 + 537 + 439 = 1506

Total Output Tokens: 40 + 20 + 69 = 129 (using the 20-token maximum for the title rather than the typical 9)

Based on that:

  • For roughly every 663 images processed (about 1M input tokens), we will incur a cost of $3.50 for inputs.
  • For roughly every 7,751 images processed (about 1M output tokens), we will incur a cost of $10.50 for outputs.
  • For every batch of 100k images (50k front, 50k back): $527 for the inputs + $135 for the outputs = $662.
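The cost figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the rates of $3.50 and $10.50 per 1M tokens are inferred from the bullets above rather than taken from a price sheet, `batch_cost` is a hypothetical name, and only the text tokens counted in this report are included.

```python
INPUT_TOKENS = 530 + 537 + 439   # transcription + title + abstract prompts
OUTPUT_TOKENS = 40 + 20 + 69     # uses the 20-token maximum for titles

# Assumed rates in USD per 1M tokens, inferred from the figures above.
INPUT_RATE = 3.50
OUTPUT_RATE = 10.50


def batch_cost(images: int) -> tuple[float, float]:
    """Return (input cost, output cost) in USD for `images` sets of three requests."""
    input_cost = images * INPUT_TOKENS / 1_000_000 * INPUT_RATE
    output_cost = images * OUTPUT_TOKENS / 1_000_000 * OUTPUT_RATE
    return input_cost, output_cost


input_cost, output_cost = batch_cost(100_000)
print(f"inputs ${input_cost:.2f}, outputs ${output_cost:.2f}, "
      f"total ${input_cost + output_cost:.2f}")
```

For 100,000 images this comes to about $527 for inputs and $135 for outputs, matching the rounded totals in the list above.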

Rough Time Complexity:

30 seconds per script run

50,000 × 30 = 1,500,000 seconds ≈ 416.7 hours of processing time
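The processing-time estimate is a straight multiplication of the number of script runs by the seconds per run, converted to hours. A minimal sketch follows, assuming runs execute strictly one after another with no parallelism, retries, or rate-limit pauses (an assumption, since the report does not say how the script is scheduled). The name `processing_hours` is hypothetical.

```python
def processing_hours(runs: int, seconds_per_run: float) -> float:
    """Total sequential processing time in hours."""
    return runs * seconds_per_run / 3600


hours = processing_hours(50_000, 30)
print(f"{hours:.1f} hours (about {hours / 24:.1f} days)")
```

Changing either argument makes it easy to see how a faster script run or a smaller batch affects the total.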