Releases: EducationalTestingService/rsmtool
RSMTool 8.1.1
This is a bugfix release with some minor improvements.
- The continuous integration build for RSMTool migrated from Travis CI to GitLab CI.
- Fixed a minor bug in `parse_json_with_comments` so that it handles URLs correctly.
- Minor updates to warnings and documentation.
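A common pitfall with JSON-with-comments parsers is that a naive comment stripper treats the `//` in `https://` as the start of a comment. The following is a minimal sketch of comment removal that only recognizes `//` and `/* */` outside string literals. `strip_json_comments` is a hypothetical helper for illustration, not RSMTool's `parse_json_with_comments` implementation.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals.

    Hypothetical helper for illustration; not RSMTool's implementation.
    """
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to (but not including) the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


config_text = """{
    // the URL below must survive comment stripping
    "url": "https://example.com/data.csv", /* inline note */
    "experiment_id": "demo"
}"""
print(json.loads(strip_json_comments(config_text)))
```

Tracking whether the scanner is inside a string is what keeps the `//` in the URL intact.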
RSMTool 8.1.0
This is a minor but backwards-incompatible release which includes changes necessary to make RSMTool compatible with SKLL v2.5.
What's new
- RSMTool is now compatible with SKLL 2.5!
💥 Breaking Changes 💥
- Python 3.6 is no longer officially supported since the latest versions of `pandas` and `numpy` have dropped support for it. RSMTool officially supports Python 3.7, 3.8, and 3.9.
- RSMTool no longer supports `.xls` files. For users who use Excel to prepare their data, we continue supporting `.xlsx` files.
- Models trained with earlier versions of RSMTool can no longer be used to generate predictions. If you use `rsmpredict` or `compute_and_save_predictions` to generate predictions based on existing models, you will need to re-train the models.
RSMTool 8.0.2
This is a bugfix release with some minor improvements.
- The version of `nbconvert` used by RSMTool is now pinned to `<6.0` due to a change in v6.0 and above that broke RSMTool report generation. We will remove the pin in a future release once the upstream issue is fixed.
- RSMTool reports no longer display a pie chart for the model coefficients if any of the coefficients are negative.
- Minor updates for compatibility with external packages.
- Minor updates to warnings and documentation.
RSMTool 8.0.1
This is a bugfix release with some minor improvements.
- Updated the code for compatibility with `pandas` 1.1.0.
- `prmse_true` no longer raises an error if there are no double-scored responses. Instead, the function displays a warning and returns `None`.
- The command-line tools `rsmtool`, `rsmeval`, `rsmpredict`, `rsmcompare`, and `rsmsummarize` no longer raise an error if a user does not provide any command-line arguments. Instead, the tools display the help message.
- Minor updates to documentation.
- Improvements to the testing and coverage measurement process.
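The no-arguments behavior described above is commonly implemented with `argparse` by checking for an empty argument list before parsing. Below is a minimal sketch of that pattern; `main`, the `config_file` argument, and the `rsmtool-sketch` program name are hypothetical, and this is not RSMTool's actual command-line code.

```python
import argparse
import sys


def main(argv=None):
    """Hypothetical CLI entry point that prints help instead of erroring on empty input."""
    parser = argparse.ArgumentParser(prog="rsmtool-sketch")
    parser.add_argument("config_file", help="path to the experiment configuration file")
    argv = sys.argv[1:] if argv is None else argv
    if not argv:
        # No arguments at all: show usage instead of letting argparse exit with an error.
        parser.print_help()
        return None
    args = parser.parse_args(argv)
    return args.config_file
```

Without the early check, `parse_args([])` would report a missing positional argument and exit with an error.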
RSMTool 8.0
This is a major new release. It includes a lot of new functionality and multiple changes to the API.
⚡️ RSMTool 8.0 is backwards incompatible with previous versions ⚡️
💡 New features 💡
Dependencies
- RSMTool is now compatible with SKLL v2.1.
- All dependencies other than `skll` are now unpinned.
- RSMTool now supports Python versions 3.6, 3.7, and 3.8.
Interactive generation of configuration files
- Configuration files for `rsmtool`, `rsmeval`, `rsmpredict`, `rsmcompare`, and `rsmsummarize` can now be generated automatically, either interactively or non-interactively. This exciting new functionality makes it easier to keep track of the many configuration options available in RSMTool and greatly simplifies the process of setting up an experiment. Watch the video demonstrating the new interactive generation or read the documentation.
Passing hyperparameters to SKLL models
- It is now possible to pass custom hyperparameter values to `skll` learners used through RSMTool. This is done using a new configuration field, `skll_fixed_parameters`. The parameters are also displayed in the report.
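A minimal sketch of what such a configuration might look like, built as a Python dictionary and serialized to JSON. The field names `experiment_id`, `model`, `train_file`, `test_file`, and `skll_fixed_parameters` come from RSMTool's configuration format, but the file names, the learner, and the hyperparameter values are illustrative assumptions; check the SKLL and scikit-learn documentation for the parameters each learner accepts.

```python
import json

# Illustrative rsmtool configuration; file names and values are assumptions.
config = {
    "experiment_id": "rf_fixed_params",
    "model": "RandomForestRegressor",
    "train_file": "train.csv",
    "test_file": "test.csv",
    # Custom hyperparameter values intended for the underlying SKLL learner.
    "skll_fixed_parameters": {"n_estimators": 200, "max_depth": 5},
}

config_json = json.dumps(config, indent=4)
print(config_json)
```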
Generalized version of PRMSE
- The formula for PRMSE has been updated to a more general version derived by Matthew S. Johnson that allows computing PRMSE for any number of raters. For two raters, the formula returns the same result as the formula used in previous versions of the tool.
- The API now provides a new function, `prmse_true()`, which accepts scikit-learn style parameters and returns the PRMSE value.
- It is now possible to supply the error variance of human raters needed to compute PRMSE. This can be useful when an experiment requires computing this parameter on data other than the evaluation set. This can be done via the `rater_error_variance` field in the configuration file or by passing the variance as a parameter to `prmse_true()`.
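The arithmetic behind a multi-rater PRMSE can be sketched in pure Python. This is a minimal sketch under the assumption that the generalized formula uses these estimators: (1) rater error variance pooled over responses with two or more ratings, i.e. the sum of squared deviations from each response's mean rating divided by the sum of (number of ratings - 1); (2) true-score variance from the spread of per-response mean ratings, weighted by the number of ratings, minus the rater-error contribution; and (3) the system's mean squared error against true scores, again with the rater-error contribution subtracted. `rater_error_variance` and `prmse_sketch` are hypothetical helpers, not RSMTool's `prmse_true()`; consult the RSMTool documentation for the authoritative formula.

```python
import warnings
from statistics import mean


def rater_error_variance(human_scores):
    """Pooled within-response variance of human ratings.

    Returns None when no response has two or more ratings.
    """
    squared_dev, dof = 0.0, 0
    for ratings in human_scores:
        if len(ratings) > 1:
            m = mean(ratings)
            squared_dev += sum((h - m) ** 2 for h in ratings)
            dof += len(ratings) - 1
    return squared_dev / dof if dof else None


def prmse_sketch(system, human_scores, error_variance=None):
    """PRMSE of system scores against (unobserved) true scores.

    human_scores[i] lists the ratings available for response i, so the
    number of raters may differ across responses.
    """
    if error_variance is None:
        error_variance = rater_error_variance(human_scores)
    if error_variance is None:
        warnings.warn("No double-scored responses: cannot estimate rater error variance.")
        return None
    n_responses = len(human_scores)
    counts = [len(ratings) for ratings in human_scores]
    n_ratings = sum(counts)
    response_means = [mean(ratings) for ratings in human_scores]
    grand_mean = sum(sum(ratings) for ratings in human_scores) / n_ratings

    # True-score variance: weighted spread of response means minus rater error.
    between = sum(c * (m - grand_mean) ** 2 for c, m in zip(counts, response_means))
    true_var = (between - (n_responses - 1) * error_variance) / (
        n_ratings - sum(c * c for c in counts) / n_ratings
    )

    # MSE against true scores: weighted squared error minus rater error.
    squared_error = sum(c * (s - m) ** 2 for c, s, m in zip(counts, system, response_means))
    mse_true = (squared_error - n_responses * error_variance) / n_ratings
    return 1 - mse_true / true_var
```

Passing `error_variance` explicitly mirrors the idea behind the `rater_error_variance` option: the variance can be estimated on a separate double-scored sample and reused. Like `prmse_true()` since 8.0.1, the sketch warns and returns `None` when no double-scored responses are available.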
Changes to RSMTool reports
- The report now always displays the headers for the "Consistency" and "True score evaluations" sections. If no second score is available, the report indicates this. If you do not want these section headers to appear in your report, use the `general_sections` field to exclude these sections. TIP: If you use automatic configuration generation, your configuration file will contain the full list of available sections, which you can edit to exclude unnecessary sections.
💥 Incompatible Changes 💥
File formats
- `rsmcompare` and `rsmsummarize` no longer support experiments that were generated with earlier versions of RSMTool. You will need to re-run the experiments that you want to compare or summarize.
- `rsmtool` no longer supports old-style configuration files (not used since v5.5 or earlier).
- `rsmtool` no longer supports feature files in `.json` format (not used since v5.5 or earlier).
- The intermediate file containing true score evaluations (`true_score_eval`) no longer contains the variance of human scores. This information can still be obtained from the `consistency` files.
API Changes
- The `Configuration` and `ConfigurationParser` objects in the `configuration_parser` module have been fully refactored. A new `Configuration` object can now be instantiated using a dictionary whose keys have the same names as the fields in the configuration file. Validation and normalization are now done as part of initialization. See this PR for more detail.
- `Configuration` objects no longer have a `filepath` attribute. Use the `configdir` attribute to indicate the directory that any relative paths in the dictionary are relative to.
- Functions in the erstwhile `rsmtool.utils` module have been moved to new locations. This includes several functions for computing evaluation metrics (`agreement`, `difference_of_standardized_means`, `partial_correlations`, `quadratic_weighted_kappa`, and `standardized_mean_difference`). See the API documentation for the new locations of these functions.
- The API for computing PRMSE has changed. See the API documentation for the new functions.
🛠 Bugfixes & Improvements 🛠
- v7.1.0 did not allow `run_*` functions to accept `pathlib.Path` objects as paths to configuration files. This is now allowed.
- Error messages and warnings produced by RSMTool are now more meaningful and consistent.
- Multiple changes to improve code readability and consistency.
RSMTool 7.1
This is a minor release which includes changes necessary to make RSMTool compatible with SKLL 2.0.
What's new
- RSMTool is now compatible with SKLL 2.0.
- The implementation of `scipy.stats.pearsonr` used in RSMTool to compute Pearson's correlation coefficient has changed. The new implementation is equivalent to the old one in the majority of cases but tends to produce slightly different values for very small N. See #343 for further detail.
- If you use the Dash app on macOS, you can now download the complete RSMTool documentation for offline use. Go to Dash preferences, click on "Downloads", then "User Contributed", and search for "RSMTool".
- The conda package for RSMTool is now available from the official ETS conda channel.
API changes
- The `run_experiment`, `run_evaluation`, `run_comparison`, `run_summary`, and `compute_and_save_predictions` functions now accept Python dictionaries as input.
- The `.filepath` attribute of the `Configuration` object will be deprecated in a future version and replaced with two new attributes: `configdir` and `filename`. Use `join(configdir, filename)` if you need the full path to the configuration file.
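The sketch below illustrates, under stated assumptions, how a function can accept a configuration as a dictionary, a string path, or a `pathlib.Path`, and how the full path is recovered from separate `configdir` and `filename` values with `os.path.join`. `normalize_config_input` is hypothetical; RSMTool's own handling lives in its `configuration_parser` module and may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def normalize_config_input(config):
    """Return (settings, configdir, filename) for a dict, str, or pathlib.Path input.

    Hypothetical helper. For dictionaries there is no file, so filename is None
    and configdir defaults to the current working directory (an assumption).
    """
    if isinstance(config, dict):
        return dict(config), os.getcwd(), None
    path = Path(config)  # works for both str and pathlib.Path
    with path.open() as config_file:
        settings = json.load(config_file)
    return settings, str(path.resolve().parent), path.name


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "experiment.json"
    config_path.write_text(json.dumps({"experiment_id": "demo"}))
    for source in (config_path, str(config_path)):
        settings, configdir, filename = normalize_config_input(source)
        # Rebuild the full path from the two parts, as with join(configdir, filename).
        print(settings, os.path.join(configdir, filename))
```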
Other
- Minor changes to the documentation.
- Many functions used for tests have been refactored for efficiency.
RSMTool 7.0
This is a major release which includes changes to several key evaluation metrics computed by RSMTool.
What's new
- The exact definitions of all evaluation metrics and their methods of computation are now available in the RSMTool documentation under evaluation metrics.
Changes to evaluation metrics
- Quadratic weighted kappa (QWK) for `raw`, `raw_trim`, `scale`, and `scale_trim` scores is now computed on continuous score values using the formula suggested by Haberman (2019). In previous versions of RSMTool, such continuous score values were rounded to compute QWK.
- Subgroup differences are now evaluated using a new metric, "difference in standardized means". This metric was designed to be more robust to differences in scale between human and machine scores.
- SMD for human-human agreement is now computed using the pooled standard deviation of H1 and H2 for the double-scored sample in the denominator.
- The default `tolerance` for score postprocessing is now set to 0.4998 (instead of 0.49998). This may result in small changes to the values of all evaluation metrics for `raw_trim` and `scale_trim` scores. See below for the new configuration setting if you need to define a custom tolerance.
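A minimal sketch of two of these formulas as we understand them: QWK on continuous scores computed as 2 * cov(h, s) / (var(h) + var(s) + (mean(h) - mean(s))^2) using population (ddof=0) moments, and human-human SMD with the pooled standard deviation sqrt((var(H1) + var(H2)) / 2) in the denominator. The function names, the ddof choices, and the sign convention (H2 minus H1) are assumptions; see the RSMTool documentation on evaluation metrics for the authoritative definitions.

```python
from statistics import mean, pvariance, variance


def qwk_continuous(human, system):
    """QWK on unrounded scores via population moments (no rounding step)."""
    mh, ms = mean(human), mean(system)
    cov = sum((h - mh) * (s - ms) for h, s in zip(human, system)) / len(human)
    return 2 * cov / (pvariance(human) + pvariance(system) + (mh - ms) ** 2)


def smd_pooled(h1, h2):
    """Human-human SMD using the pooled SD of H1 and H2 (sample variances, ddof=1)."""
    pooled_sd = ((variance(h1) + variance(h2)) / 2) ** 0.5
    return (mean(h2) - mean(h1)) / pooled_sd
```

Because the QWK formula works directly on moments, continuous predictions such as 3.7 are used as-is instead of being rounded first. A constant offset between the two score sets lowers QWK through the squared mean-difference term even when the scores are perfectly correlated.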
New evaluation metrics
- Test-theory based evaluations: RSMTool and RSMEval now compute the proportional reduction in mean squared error (PRMSE) when using system scores to predict true scores.
- RSMTool and RSMEval now compute various additional metrics of model fairness suggested in Loukina et al. (2019).
New configuration settings
- A new configuration setting, `experiment_names`, for RSMSummarize allows specifying custom names for each experiment. These names are used to refer to the experiments in intermediate output files and in the report.
- A new configuration setting, `trim_tolerance`, allows specifying a custom tolerance when trimming scores to ceiling and floor values in RSMTool and RSMEval.
- A new configuration setting, `min_n_per_group`, allows defining a threshold so that only groups with more than a certain number of members are included in the report. All groups are still included in the intermediate output files.
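A minimal sketch of trimming with a tolerance, assuming that scores are clipped to the interval [floor - tolerance, ceiling + tolerance]. With the default 0.4998, a trimmed score presumably still rounds to a valid score point. `trim_scores` is a hypothetical helper; in RSMTool the floor and ceiling come from the `trim_min` and `trim_max` settings and the tolerance from `trim_tolerance`.

```python
def trim_scores(scores, floor, ceiling, tolerance=0.4998):
    """Clip scores to [floor - tolerance, ceiling + tolerance] (hypothetical helper)."""
    low, high = floor - tolerance, ceiling + tolerance
    return [min(max(score, low), high) for score in scores]


# On a 1-6 scale, out-of-range predictions land just inside the rounding boundaries.
trimmed = trim_scores([-3.2, 3.7, 9.1], floor=1, ceiling=6)
```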
Other new functionality
- The `.jsonlines` format is now one of the supported input file formats.
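In the `.jsonlines` format, each line of the file is a separate JSON object, typically one response per line. A minimal standard-library sketch of reading such data into a list of records follows. RSMTool itself reads input files into `pandas` data frames. `read_jsonlines` is a hypothetical helper, and the column names shown are illustrative.

```python
import io
import json


def read_jsonlines(handle):
    """Parse one JSON object per non-blank line (hypothetical helper)."""
    return [json.loads(line) for line in handle if line.strip()]


sample = io.StringIO('{"spkitemid": "R1", "sc1": 3}\n{"spkitemid": "R2", "sc1": 4}\n')
records = read_jsonlines(sample)
print(records)
```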
API changes
- Several additional methods for computing standardized mean difference (SMD) are now available via `rsmtool.utils.standardized_mean_difference`.
- The new routine for computing QWK is available via `rsmtool.utils.quadratic_weighted_kappa`.
- The new metric, difference in standardized means (DSM), is available via `rsmtool.utils.difference_of_standardized_means`.
- Functions for computing fairness analyses are now available via `rsmtool.fairness_utils.get_fairness_analyses`.
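A minimal sketch of DSM for one subgroup, assuming that human and system scores are each standardized using the mean and standard deviation of the whole evaluation set, and that DSM is the group's mean standardized system score minus its mean standardized human score. `dsm_for_group` and the use of the sample SD (ddof=1) are assumptions, not RSMTool's `difference_of_standardized_means`.

```python
from statistics import mean, stdev


def dsm_for_group(human, system, in_group):
    """Difference in standardized means for the responses where in_group is True."""
    mh, sh = mean(human), stdev(human)
    ms, ss = mean(system), stdev(system)
    # Each score set is standardized against its own overall distribution.
    z_human = [(h - mh) / sh for h, g in zip(human, in_group) if g]
    z_system = [(s - ms) / ss for s, g in zip(system, in_group) if g]
    return mean(z_system) - mean(z_human)
```

Because each score set is standardized against its own distribution, a system whose scores are a positive linear rescaling of the human scores gets a DSM of zero for every group. This illustrates the robustness to scale differences mentioned above.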
Bugfixes
- The `partial_correlations()` function has been updated to return a correctly formatted matrix when the covariance matrix is very close to zero.
- The reports have been updated to correctly display plots for features with very long names.
v6.1.0
This is a major release which includes a number of improvements primarily aimed at increasing the flexibility of the RSMTool API.
What's New
New functionality
- RSMTool now supports input files in SAS `SAS7BDAT` format.
- New learner `NNLRIterative`. This is a new built-in linear regression model that learns empirical OLS regression weights with feature selection using an iterative implementation of non-negative least squares regression.
- Custom truncation thresholds. Users can now remove outliers using pre-existing truncation thresholds specified in the `features` file by using the field `use_truncation_thresholds`.
- Users can now run the `.ipynb` notebook generated from the experiment interactively, without having to set any environment variables. Each experiment now generates a (hidden) environment JSON file, which the notebook reads automatically.
API changes
- There is now a separate function, `utils.standardized_mean_difference()`, that can be used to compute SMD.
- A new function, `reader.try_to_load_file()`, allows API users to specify what should happen if a file cannot be loaded. The function can be set to return `None`, to raise a warning, or to raise an error.
- The `DataContainer` class now includes additional helper methods. These methods allow users to `drop()` and `rename()` data frames in the `DataContainer`, and to select data frames using a specified prefix or suffix with the `get_frames()` method.
- The `Configuration` class now includes the additional helper methods `pop()` and `copy()`.
- `utils.get_thumbnail_as_html()` now accepts an optional argument, `path_to_thumbnail`, which allows using two different paths for thumbnails and full-size images.
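The three behaviors described for `reader.try_to_load_file()` (return `None`, warn, or raise) can be sketched with the standard library as follows. `try_to_load_text` and its `raise_error`/`raise_warning` flags are hypothetical names chosen for illustration; consult the RSMTool API documentation for the real signature.

```python
import warnings
from pathlib import Path


def try_to_load_text(path, raise_error=False, raise_warning=False):
    """Return the file's text, or handle a missing file as the caller requests."""
    path = Path(path)
    if path.exists():
        return path.read_text()
    message = f"The file '{path}' could not be loaded."
    if raise_error:
        raise FileNotFoundError(message)
    if raise_warning:
        warnings.warn(message)
    # Default: silently signal "nothing loaded" to the caller.
    return None
```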
Other
- Support for `seaborn` 0.9.0 and `statsmodels` 0.9.0.
- Support for `numpy` 1.14.0, `scipy` 1.1.0, and `pandas` 0.23.0+.
- Support for `ipython` 6.5.0 and `notebook` 5.7.2.
- Corrected the documentation, which misstated the order of operations in the processing pipeline: the change of feature sign (if applicable) happens after standardization.
- If the user specifies a list of features and one of those features has zero variance, the tool now displays the correct error message.
- The logging messages displayed by `check_flag_column` now indicate the partition if different flag columns were used for training and evaluating the model.
- Miscellaneous minor bug fixes in the notebooks.
Version 6.0.1
This is a bugfix release.
- The "System Information" section of the reports now uses `pkg_resources` instead of `pip` to get the list of installed packages, since `pip` disallows the use of its internal API starting with v10.
- Fixed incorrect formatting in the documentation.
- Updated the `ipython` and `notebook` package versions to address an incompatibility with the latest version of the `tornado` web server that affects interactive use of `ipython notebook` but not the report generation itself.
- Updated the description of the marginal/partial correlation plot in the report.
Version 6.0
What's new?
This is a major release. The entire code base has been fully refactored to use a much more object-oriented design. This should make it much easier to make improvements and to add extensions. As a result, there have been significant changes to the RSMTool API (see the documentation link below for more details).
New features
New learners
- New regressors from the latest SKLL release (v1.5.1) have been added to `rsmtool`.
- `rsmtool` can now be used with both regressors and classifiers from SKLL, including classifiers that produce probabilistic output, which can be used to produce expected values as predictions. See the SKLL documentation for the full list of learners.
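Producing an expected value from a probabilistic classifier amounts to weighting each score point by its predicted probability. A minimal sketch, assuming the classifier's labels are numeric score points and its probabilities for each response sum to one; `expected_score` is a hypothetical helper.

```python
def expected_score(score_points, probabilities):
    """Sum of score point x predicted probability for one response."""
    if len(score_points) != len(probabilities):
        raise ValueError("need one probability per score point")
    return sum(point * prob for point, prob in zip(score_points, probabilities))


# A response the classifier considers most likely a 3, with some mass on 2 and 4.
prediction = expected_score([1, 2, 3, 4], [0.1, 0.2, 0.5, 0.2])
```

Unlike taking the most probable label, the expected value is continuous, so it can be evaluated with the same correlation-based metrics as regressor output.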
Enhanced outputs
- Users can now specify the `file_format` configuration option to save intermediate files in `tsv`, `csv`, or `xlsx` format.
- Users can specify a `use_thumbnails` configuration option that embeds clickable thumbnails in the HTML report rather than full-sized images. Clicking a thumbnail displays the full-sized image in a new window. This is particularly useful for larger reports with many images, improving both the readability and the loading speed of such reports.
- Reports for `rsmtool`, `rsmeval`, and `rsmsummarize` now contain a new section with links to intermediate files (`intermediate_file_paths.ipynb`) so that users can easily inspect these files from the report itself.
New configuration options
- Users can now specify `features` in the configuration file as a `list`. When providing a list of features, signs or transformations cannot be specified. This makes creating configuration files for simple experiments much easier and faster.
- Users can now specify a `skll_objective` for tuning the SKLL learners used in their experiments.
- Users can now specify a `flag_column_test` configuration option to use different flags for the test file and the training file.
- Users can now set the boolean option `standardize_features` to false if they do not want the feature values standardized (standardization is the default).
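A sketch of a configuration that combines the options above, written as a Python dictionary and serialized to JSON. The option names come from these notes, plus the standard `experiment_id`, `model`, `train_file`, and `test_file` fields. The feature names, file names, learner, objective, and flag column names and values are illustrative assumptions, as is the mapping format assumed for the flag columns.

```python
import json

config = {
    "experiment_id": "simple_svr",
    "model": "SVR",
    "train_file": "train.csv",
    "test_file": "test.csv",
    # Plain list of feature column names: no signs or transformations possible here.
    "features": ["grammar", "vocabulary", "fluency"],
    "skll_objective": "pearson",
    # Hypothetical flag columns: train and test data are filtered on different columns.
    "flag_column": {"train_flag": [1]},
    "flag_column_test": {"test_flag": [1]},
    # Keep raw feature values instead of the default standardization.
    "standardize_features": False,
}
config_json = json.dumps(config, indent=4)
print(config_json)
```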
New evaluations
- `rsmtool` and `rsmeval` now compute disattenuated correlations if the data includes two human scores.
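Disattenuation corrects an observed correlation for the unreliability of the human scores. A minimal sketch, assuming the standard correction in which the human-machine correlation is divided by the square root of the human-human correlation, with the system score treated as error-free. `disattenuated_corr` is a hypothetical helper, and the input values are illustrative.

```python
import math


def disattenuated_corr(r_human_system, r_human_human):
    """Correct a human-system correlation for human-score unreliability."""
    if not 0 < r_human_human <= 1:
        raise ValueError("human-human correlation must be in (0, 1]")
    return r_human_system / math.sqrt(r_human_human)


# An observed correlation of 0.6 with inter-rater agreement of 0.81.
corrected = disattenuated_corr(0.6, 0.81)
```

Because the human-human correlation is at most 1, the corrected value is never smaller than the observed one.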
Code changes
- New helper classes have been added to `rsmtool` that allow easy reading, writing, and manipulation of multiple `pandas` data frames:
  - `container.DataContainer()`: a class to encapsulate multiple data frames.
  - `reader.DataReader()`: a class to read multiple tabular files into a `DataContainer()` object.
  - `writer.DataWriter()`: a class to write all data frames contained in a `DataContainer()` object to separate files, with a specified file extension.
- The `rsmtool` module is now installable via `pip`, in addition to being installable with `conda`.
- `preprocessor.trim()` can now take both numpy arrays and lists as inputs.
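To illustrate the idea behind `container.DataContainer()` and the helper methods listed in the v6.1.0 notes above, here is a toy, dictionary-backed container with `drop()`, `rename()`, and prefix/suffix selection. `TinyContainer` is hypothetical and deliberately much simpler than RSMTool's class, which stores `pandas` data frames. Here, plain lists stand in for frames, and returning new containers instead of mutating in place is a design choice of this sketch only.

```python
class TinyContainer:
    """Name -> frame mapping with a few DataContainer-style helpers (hypothetical)."""

    def __init__(self, frames=None):
        self._frames = dict(frames or {})

    def __getitem__(self, name):
        return self._frames[name]

    def names(self):
        return list(self._frames)

    def drop(self, name):
        """Return a new container without the named frame."""
        return TinyContainer({k: v for k, v in self._frames.items() if k != name})

    def rename(self, old, new):
        """Return a new container with one frame renamed."""
        return TinyContainer({(new if k == old else k): v for k, v in self._frames.items()})

    def get_frames(self, prefix=None, suffix=None):
        """Select frames whose names start with prefix and/or end with suffix."""
        return {
            k: v
            for k, v in self._frames.items()
            if (prefix is None or k.startswith(prefix)) and (suffix is None or k.endswith(suffix))
        }


container = TinyContainer(
    {"train_features": [[1, 2]], "test_features": [[3, 4]], "train_metadata": [["R1"]]}
)
train_frames = container.get_frames(prefix="train_")
```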
Bugfixes
- Fixed a warning in `rsmcompare` when computing summary evaluations.
- Previously, confusion matrices forced human scores to integers, while score distributions used the values "as is". Now both analyses use rounded human scores.
- Length columns are now forced to numeric if they are non-numeric.
Documentation
- Added documentation for refactored API.
- Added detailed documentation about how to write RSMTool tests.