Tutorial

Niema Moshiri edited this page Jun 22, 2023 · 20 revisions

Step 0: Format the Exam Responses

Before using MESS, format the student responses to your exam as a tab-delimited (TSV) file with the following layout, which is the format MESS supports:

Student            Correct  Q1   Q2    Q3   ...
[email protected]  Q1,Q3    A    blue  42   ...
...                ...      ...  ...   ...  ...

  • The top row is a header row, and each subsequent row corresponds to a single student
  • The leftmost column (labeled "Student" in the example above) should contain a unique student identifier (e.g. email address, student ID, etc.)
  • The second leftmost column (labeled "Correct" in the example above) should contain a comma-separated list of questions the student answered correctly
  • Each remaining column (labeled "Q1", "Q2", etc. in the example above) should correspond to a single exam question: cell (i,j) of the spreadsheet (excluding the two leftmost columns and the top row) should be the response student i submitted for question j
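
As a rough illustration of the layout described above, the following Python sketch writes a small file in this format using only the standard library. The function name write_mess_tsv, the student identifiers, and the responses are hypothetical, and writing an empty cell for an unanswered question is an assumption rather than documented MESS behavior.

```python
import csv
import io


def write_mess_tsv(handle, questions, records):
    """Write a header row, then one row per student, separated by tabs.

    records is a list of (student_id, correct_questions, responses) tuples,
    where responses maps a question label to the submitted answer.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Student", "Correct"] + list(questions))
    seen = set()
    for student_id, correct, responses in records:
        # The leftmost column must hold a unique student identifier.
        if student_id in seen:
            raise ValueError(f"duplicate student identifier: {student_id}")
        seen.add(student_id)
        # The second column is a comma-separated list of correct questions.
        correct_cell = ",".join(q for q in questions if q in correct)
        # Assumption: an unanswered question is written as an empty cell.
        answers = [responses.get(q, "") for q in questions]
        writer.writerow([student_id, correct_cell] + answers)


questions = ["Q1", "Q2", "Q3"]
records = [
    ("student_a", {"Q1", "Q3"}, {"Q1": "A", "Q2": "blue", "Q3": "42"}),
    ("student_b", {"Q2"}, {"Q1": "B", "Q2": "green", "Q3": "41"}),
]

buffer = io.StringIO()
write_mess_tsv(buffer, questions, records)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") as the handle.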

We will provide helper scripts to convert from the export formats supported by popular platforms. These scripts can be found in the helper_scripts folder of the MESS GitHub repository. If you are using a platform that exports exam responses in a format that is not easily converted to the format supported by MESS, please feel free to submit a GitHub Issue that includes the following information:

  1. The name of the platform you are using, along with any relevant details about it
  2. Example anonymized file(s) in the format exported by the platform you are using
  3. Description of the file format

We will try to add a conversion script accordingly. If you have written your own conversion script and would like to contribute, please feel free to share it with us via Pull Request or GitHub Issue (whichever is easiest for you).

Step 1: Run MESS with Default Settings

To get an initial idea of how the similarity score distribution looks, first run MESS on your dataset using the default settings, specifying the input TSV file (-i/--input), the output TSV file (-ot/--output_tsv), and the output PDF file (-op/--output_pdf). For example, using the provided example_input.tsv (assuming MESS.py is in your PATH):

MESS.py -i example_input.tsv -ot output.tsv -op output.pdf

The output PDF should look something like the following:

  • The dashed curve is the Kernel Density Estimate (KDE) of the similarity score distribution from your exam responses
  • The solid line is the Probability Density Function (PDF) of the best-fit Exponential distribution

Step 2: Adjust Best-Fit Exponential

With its default settings, MESS fits the Exponential distribution PDF on the entirety of the similarity score KDE. However, as can be seen in the example above, outliers skew the best-fit Exponential distribution PDF, whereas we want to fit this theoretical distribution on a close-to-linear segment of the empirical similarity score KDE (on a log-scale y-axis, an Exponential PDF appears as a straight line). Try to find a window in which the similarity score KDE is roughly linear, such as between 0.025 and 0.12 (shown in red) in the example dataset:

Rerun MESS, but this time, specify the minimum and maximum score for the regression (-rm/--reg_min and -rM/--reg_max, respectively). For example, using the range [0.025, 0.12]:

MESS.py -i example_input.tsv -ot output_2.tsv -op output_2.pdf -rm 0.025 -rM 0.12

As can be seen, the best-fit Exponential distribution's PDF now aligns much more closely with the near-linear segment of the empirical similarity score distribution's KDE:

You may need to keep iterating on this step until you are satisfied with the fit. Note that the regression only impacts the p-values (and thus corrected q-values): it does not impact the similarity scores themselves whatsoever.
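
To make the effect of the regression window more concrete, here is a minimal Python sketch. It is not MESS's actual implementation. It assumes the fit is an ordinary least-squares line through log(density) over the points inside [reg_min, reg_max], and it assumes the p-value of a score is the upper-tail probability of the fitted Exponential. The function names and the synthetic density values (a clean Exponential curve plus an artificial outlier bump) are hypothetical.

```python
import math


def fit_exponential_rate(xs, densities, reg_min, reg_max):
    """Least-squares fit of log(density) = intercept + slope * x inside the window.

    Returns (rate, intercept), where rate = -slope.
    """
    points = [
        (x, math.log(d))
        for x, d in zip(xs, densities)
        if reg_min <= x <= reg_max and d > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two points inside the regression window")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


def exponential_p_value(score, rate):
    """Assumed p-value: P(X >= score) for X ~ Exponential(rate)."""
    return math.exp(-rate * score)


# Synthetic density curve: Exponential with rate 30, plus an outlier bump
# at the right end that stands in for a few very similar pairs.
true_rate = 30.0
xs = [i * 0.005 for i in range(41)]
densities = [
    true_rate * math.exp(-true_rate * x) + (5.0 if x >= 0.15 else 0.0)
    for x in xs
]

full_rate, _ = fit_exponential_rate(xs, densities, min(xs), max(xs))
window_rate, _ = fit_exponential_rate(xs, densities, 0.025, 0.12)
print("rate fitted on the full range:", full_rate)
print("rate fitted on [0.025, 0.12]: ", window_rate)
print("p-value of score 0.2 (windowed fit):", exponential_p_value(0.2, window_rate))
```

In this toy data, the full-range fit is pulled toward the outlier bump, while the windowed fit recovers the rate of the clean segment. The scores themselves are never modified: only the fitted rate, and therefore the p-values, change.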

Step 3: Optional Adjustments

Once you have found an appropriate range for the Exponential distribution regression, there are some optional adjustments you may want to perform.

Number of Significance Tests

While a MESS score and p-value are calculated for every pair of students, performing multiple significance tests requires Multiple Hypothesis Test Correction, and performing too many significance tests reduces statistical power. The MESS scores at the left end of the distribution are not interesting, so including them in the significance tests reduces statistical power without any benefit. As such, MESS will only perform significance tests on the top k MESS scores.

By default, MESS will automatically detect a reasonable value of k. Specifically, it will search for the smallest MESS score x > median such that the MESS score histogram has a count of 0 at x, and it will perform statistical tests on all k MESS scores that are greater than x. Alternatively, you can provide a value of k via -nt/--num_tests, which you will likely want to do if you find that the histogram has gaps earlier than you want to test (which is expected due to the long tail of the Exponential distribution).
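
The automatic selection of k described above can be sketched as follows. This is an illustration under stated assumptions, not the code MESS runs: the function name auto_num_tests, the fixed histogram bin width of 0.01, and the behavior when no empty bin exists (returning 0 tests) are all hypothetical choices.

```python
import statistics


def auto_num_tests(scores, bin_width=0.01):
    """Find the first empty histogram bin above the median's bin.

    Returns (k, threshold): k is the number of scores in bins above the
    empty bin, and threshold is the left edge of that empty bin.
    """
    def bin_of(score):
        return int(score // bin_width)

    counts = {}
    for s in scores:
        counts[bin_of(s)] = counts.get(bin_of(s), 0) + 1

    # Start searching in the first bin above the one containing the median.
    current = bin_of(statistics.median(scores)) + 1
    last = max(counts)
    while current <= last and counts.get(current, 0) > 0:
        current += 1
    if current > last:
        # Assumption: with no gap in the histogram, nothing is tested.
        return 0, None

    k = sum(1 for s in scores if bin_of(s) > current)
    return k, current * bin_width


scores = [0.005, 0.012, 0.015, 0.018, 0.022, 0.025,
          0.031, 0.038, 0.044, 0.095, 0.121]
k, threshold = auto_num_tests(scores)
print("k =", k, "threshold =", threshold)
```

In this toy example the median is 0.025, the first empty bin above it starts at 0.05, and the two scores beyond that gap (0.095 and 0.121) are the ones that would be tested. If the first gap appears earlier than you want, that is the situation in which supplying -nt/--num_tests yourself makes sense.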

Method of Multiple Hypothesis Test Correction

By default, to compute corrected q-values from p-values, MESS will use the Benjamini-Hochberg Procedure. You can select a different method via -c/--correction.

Note that the selection of multiple hypothesis test correction technique will only impact the q-values in the output TSV file: the figure in the output PDF file will not be impacted.
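
For readers unfamiliar with the default correction, the following is a minimal standard-library sketch of the Benjamini-Hochberg Procedure. It shows the textbook computation of adjusted values from a list of p-values. The function name and the example p-values are hypothetical, and MESS's own implementation may differ in its details.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f}  q = {q:.3f}")
```

Here m is the number of tests actually performed, which for MESS corresponds to the k scores selected for significance testing.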

Figure Aesthetics

The aesthetics of the figure in the output PDF can be configured via the following command-line arguments.

Empirical Similarity Score Distribution Kernel Density Estimate (KDE)

  • -kc/--kde_color: KDE Color
  • -kl/--kde_linestyle: KDE Linestyle
  • -kw/--kde_linewidth: KDE Line Width

Best-Fit Exponential Distribution Probability Density Function (PDF)

  • -rc/--reg_color: Regression Color
  • -rl/--reg_linestyle: Regression Linestyle
  • -rw/--reg_linewidth: Regression Line Width

Figure Labels

  • -t/--title: Figure Title
  • -xl/--xlabel: X-Axis Label
  • -yl/--ylabel: Y-Axis Label

Figure Axes

  • -xm/--xmin: Minimum X
  • -xM/--xmax: Maximum X
  • -ym/--ymin: Minimum Y
  • -yM/--ymax: Maximum Y
  • --no_ylog: Don't Plot Y-Axis in Log-Scale
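
If you script your MESS runs, a small helper can assemble these options into a command line. The sketch below only builds and prints the command: it does not run MESS. The helper name build_mess_command, the output file names, and the title text are hypothetical, while the flags themselves are the ones listed in this tutorial.

```python
import shlex


def build_mess_command(input_tsv, output_tsv, output_pdf, reg_min=None,
                       reg_max=None, title=None, xlabel=None, ylabel=None,
                       xmin=None, xmax=None, ylog=True):
    """Build an argument list for MESS.py from the given settings."""
    cmd = ["MESS.py", "--input", input_tsv,
           "--output_tsv", output_tsv, "--output_pdf", output_pdf]
    optional = [
        ("--reg_min", reg_min), ("--reg_max", reg_max),
        ("--title", title), ("--xlabel", xlabel), ("--ylabel", ylabel),
        ("--xmin", xmin), ("--xmax", xmax),
    ]
    # Only pass the options that were explicitly set.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if not ylog:
        cmd.append("--no_ylog")
    return cmd


cmd = build_mess_command(
    "example_input.tsv", "output_3.tsv", "output_3.pdf",
    reg_min=0.025, reg_max=0.12,
    title="Exam 1 Similarity", xmin=0, xmax=0.3, ylog=False,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), assuming MESS.py is in your PATH.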