This repository contains the source code for an MTurk interface designed to collect human-elicited Tip of the Tongue (TOT) queries. This version of the interface is intended for internal pilot testing; the actual data collection was conducted at NIST by trained contractors. The human-elicited TOT queries will be used as part of the test queries in the TREC 2025 TOT track.
Implementation details (frontend and backend) and design can be found in the Interface Design.
This codebase is implemented in Python 3.11.5. The necessary libraries are listed in `requirements.txt`.
We collected images from TMDB (for the Movie domain) and Wikipedia (for the Landmark and Person domains). We then deselected some images based on the Image Deselection Criteria.
The final set of images was sorted by popularity (measured by Wikipedia page views) and binned into 20 groups. The most popular entities (those in the first bin) were saved in fallback files (`<DOMAIN_NAME>_fallbacks.csv`), while entities in the other 19 bins were saved in candidate files (`<DOMAIN_NAME>_candidates.csv`).
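As a rough illustration of this binning step, the sketch below splits entities into 20 popularity bins and writes the fallback and candidate files. The input file name (`<DOMAIN_NAME>_images.csv`) and its columns (`entity`, `pageviews`) are hypothetical; only the output file naming follows the convention above.

```python
import pandas as pd

DOMAIN = "movie"  # one of the three domains; exact casing is an assumption

# Hypothetical input: one row per entity with its Wikipedia page-view count.
df = pd.read_csv(f"{DOMAIN}_images.csv")  # assumed columns: entity, pageviews

# Sort by popularity (most viewed first) and split into 20 equally sized bins.
df = df.sort_values("pageviews", ascending=False).reset_index(drop=True)
df["bin"] = pd.qcut(df.index, q=20, labels=False)  # bin 0 = most popular

# First bin -> fallback file; remaining 19 bins -> candidate file.
df[df["bin"] == 0].to_csv(f"{DOMAIN}_fallbacks.csv", index=False)
df[df["bin"] > 0].to_csv(f"{DOMAIN}_candidates.csv", index=False)
```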
- Use `aws configure` to set up your credentials: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html
- Set your permissions to ensure you have access to S3 and DynamoDB.
- In our implementation, there are multiple S3 buckets, each dedicated to a specific domain (Movie, Landmark, Person). Bucket names are formatted as `tot-mturk-images-<DOMAIN_NAME>`.
- There is also a bucket that stores the images used in the instructions, named `tot-mturk-instruction-images` (see the access-check sketch after this list).
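A minimal sketch for verifying that your credentials can reach these buckets is shown below, assuming `boto3` is installed and configured via `aws configure`. The lowercase domain names are an assumption about how `<DOMAIN_NAME>` is filled in; the bucket name patterns themselves come from the description above.

```python
import boto3

s3 = boto3.client("s3")

# Domain-specific image buckets plus the instruction-image bucket.
# Lowercase domain names are an assumption about the <DOMAIN_NAME> convention.
buckets = [f"tot-mturk-images-{d}" for d in ("movie", "landmark", "person")]
buckets.append("tot-mturk-instruction-images")

for name in buckets:
    # head_bucket raises an exception if the bucket is missing or access is denied.
    s3.head_bucket(Bucket=name)
    print(f"OK: s3://{name}")
```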
- There are two tables per stage (`sandbox` and `live`): HIT-Table and Assignment-Table (a minimal read example follows this list).
- HIT-Table has the following keys: `HITID` (PK), `HITTypeID`, `Domain`, `AllAssignmentsDone`, `AssignmentID`.
- Assignment-Table has the following keys: `AssignmentID` (PK), `HITID` (SK), `WorkerID`, `Image`, `Phase1`, `Phase2`, `Phase3`, `Phase4`, `TimeStamps`.
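As an illustration of how a record might be read from the Assignment-Table with `boto3` (a sketch, not the project's actual data-access code), the example below fetches one item by its composite key. The table name `Assignment-Table-live` and the placeholder IDs are assumptions; the key schema matches the description above.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Table name is an assumption; the key schema (AssignmentID PK, HITID SK) is from the docs above.
table = dynamodb.Table("Assignment-Table-live")

def get_assignment(assignment_id: str, hit_id: str) -> dict | None:
    """Fetch a single assignment record by its composite key (AssignmentID, HITID)."""
    response = table.get_item(Key={"AssignmentID": assignment_id, "HITID": hit_id})
    return response.get("Item")

# Example usage (IDs are placeholders):
item = get_assignment("3XAMPLEASSIGNMENTID", "3XAMPLEHITID")
if item:
    print(item.get("Phase1"), item.get("TimeStamps"))
```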