Tellina is a natural language -> command translation tool. Tellina accepts a natural language description of file system operations and displays a ranked list of bash one-liner suggestions made by the model. The user can scroll down the web page to explore more suggestions. http://tellina.rocks/
This repository contains the infrastructure for formally conducting user experiments for Tellina.
- static: contains the static content hosted on a public server. This is where the homepage and consent page of the experiment are located.
- backend: contains the server-side code for the experiment, including the post handler.
- client_side: contains the files to be distributed to the users of the experiment.
- data_set: contains the scripts used to produce the data set for the user experiment (referred to as the taskset).
- infrastructure.md: describes the technical infrastructure and implementation of the user experiment.
- Clone this repository locally.
- Update the Makefile in the local repo with the intended values for `HOST` and `HOST_DIR`.
- Update `SERVER_HOST` in `client_side/.infrastructure/setup.sh` with the new host.
- Create directory `HOST_DIR/WEBSITE_NAME` on `HOST`.
   a. (Optional) Create directory `HOST_DIR/staging/WEBSITE_NAME` if you would like to have a testing website.
- Run `make all publish` to build and upload the study website on the new host.
- Update the permission of `$HOST/$HOST_DIR/backend/log.csv` with `chmod 666 log.csv`.
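
The host-side steps above can be previewed as a dry run before touching a real server. The following is a minimal sketch, not part of this repository: the values of `HOST`, `HOST_DIR`, and `WEBSITE_NAME` are placeholders, the function name `print_setup_commands` is hypothetical, and the use of `ssh` to reach the host is an assumption. It only prints the commands; it does not execute them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the host-side setup steps.
# All three values are placeholders (assumptions), not the real study configuration.
HOST="user@example.org"        # hypothetical SSH target
HOST_DIR="/var/www"            # hypothetical directory on HOST
WEBSITE_NAME="tellina_study"   # hypothetical website directory name

# Hypothetical helper: print each command instead of running it,
# so the plan can be reviewed first.
print_setup_commands() {
  # Create the website directory on the host.
  echo "ssh $HOST mkdir -p $HOST_DIR/$WEBSITE_NAME"
  # Optional: create a staging directory for a testing website.
  echo "ssh $HOST mkdir -p $HOST_DIR/staging/$WEBSITE_NAME"
  # Build and upload the study website (run from the local repo).
  echo "make all publish"
  # Make the log file readable and writable by all users.
  echo "ssh $HOST chmod 666 $HOST_DIR/backend/log.csv"
}

print_setup_commands
```

Once the printed commands look right for your host, they can be run one at a time. Editing the Makefile and `setup.sh` remains a manual step.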
In a past experiment, people were given descriptions of file system operations and asked to write bash commands to perform them. The experimental group had access to Tellina, web search, and man pages; the control group had access only to web search and man pages. The measurements were whether subjects completed each task successfully and how long they took to complete it. A post-task questionnaire collected qualitative feedback.
We need to redo the experiment, for a few reasons.
- Tellina has changed since the user study was performed. Tellina has better accuracy and handles more commands. It would not be compelling to report an experiment on an implementation that has since been superseded.
- The user study was relatively small (around 30 subjects), so the experimental results were not always statistically significant. With a larger pool of subjects, the results will be more compelling.