R Markdown script that produces all tables and figures presented in the paper. To (re-)compile the HTML report containing the paper results, run:

```bash
R -e "rmarkdown::render('ECCB2018.Rmd', output_file = '../ECCB2018.html')"
```
Set of functions for loading the results of the experiments into R `data.table`s.
Load the results of the experiment evaluating the pairwise prediction (`"accuracy"`) performance of RankSVM (`"ranksvm_slacktype=on_pairs"`) when trained on a single system and applied to a single target system:
```r
# Assumes the functions from helper.R have been loaded.
sdir_results <- "results/raw/PredRet/v2/final/"  # example when dataset PredRet/v2 is used
res <- load_baseline_single_results(
    measure      = c("accuracy", "accuracy_std"),
    base_dir     = paste0(sdir_results, "ranksvm_slacktype=on_pairs/"),
    predictor    = "maccs",
    kernel       = "tanimoto",
    pair_params  = list(allow_overlap = "True", d_lower = 0, d_upper = 16,
                        ireverse = "False", type = "order_graph"),
    feature_type = "difference",
    flavor       = list(allpairsfortest = "True", featurescaler = "noscaling",
                        sysset = 10))
```
Parameters:

- `measure`: Which evaluation measure(s) to load, e.g., accuracy, correlation, ... (see also: `evaluation_scenarios_cls.py`).
- `base_dir`: Directory containing the results for a certain dataset, e.g., `PredRet/v2`.
  - For RankSVM this parameter is set to `paste0(sdir_results, "ranksvm_slacktype=on_pairs/")`.
  - For SVR this parameter is set to `paste0(sdir_results, "svr/")`.
  - If the evaluation script was run in debug mode, replace `final` by `debug`.
- `predictor`: Which features were used to represent the molecules, e.g., MACCS fingerprints.
- `kernel`: Which kernel was used on top of the molecular features, e.g., the Tanimoto kernel.
- `pair_params`: Parameters for generating the training pairs from the retention times for the RankSVM (see, for example, the function `get_pairs_from_order_graph` for details).
  - In the paper, all results are calculated using the parameters shown in the example.
  - For SVR this parameter can be set to `NULL`.
- `feature_type`: Feature type used in the RankSVM.
  - Only `"difference"` is supported, and it is the one used in the paper.
  - For SVR this parameter can be set to `NULL`.
- `flavor`: List of parameters identifying some settings used during the evaluation:
  - `allpairsfortest`: See the parameter documentation of `find_hparan_ranksvm`.
  - `featurescaler`: Feature scaler used for the molecular features (see also: `evaluate_on_target_systems`).
  - `sysset`: Which (sub)set of systems from the specified dataset should be used, e.g., the set used in the paper (`10`).
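The `base_dir` conventions listed above (method subdirectory, `final` vs. `debug`) can be wrapped in a small helper. This is only a sketch: `get_base_dir` is a hypothetical name that does not exist in `helper.R`, and it assumes that every dataset follows the `results/raw/<dataset>/<final|debug>/` layout seen in the example path.

```r
# Hypothetical helper, NOT part of helper.R: builds the base_dir argument
# from the conventions described above. Assumes the directory layout
# results/raw/<dataset>/<final|debug>/<method subdirectory>/.
get_base_dir <- function(dataset = "PredRet/v2",
                         method = c("ranksvm", "svr"),
                         debug = FALSE) {
  method <- match.arg(method)
  mode   <- if (debug) "debug" else "final"  # debug runs replace "final" by "debug"
  sdir_results <- paste0("results/raw/", dataset, "/", mode, "/")
  subdir <- switch(method,
                   ranksvm = "ranksvm_slacktype=on_pairs/",
                   svr     = "svr/")
  paste0(sdir_results, subdir)
}

get_base_dir(method = "ranksvm")
# returns "results/raw/PredRet/v2/final/ranksvm_slacktype=on_pairs/"
get_base_dir(method = "svr", debug = TRUE)
# returns "results/raw/PredRet/v2/debug/svr/"
```

The returned string can be passed directly as the `base_dir` argument of the `load_*` functions.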
The different experiments evaluated in the paper require different `load_*` functions, all of which are provided in the `helper.R` script. Further examples of how to load the results can be found in the report / summary R Markdown script.
```
> res[d_lower == 0 & d_upper == Inf]
   accuracy accuracy_std           target           source d_lower d_upper
1:   0.8439       0.0053 Eawag_XBridgeC18 Eawag_XBridgeC18       0     Inf
2:   0.9048       0.0075         FEM_long         FEM_long       0     Inf
3:   0.8623       0.0173         LIFE_old         LIFE_old       0     Inf
4:   0.8484       0.0083            RIKEN            RIKEN       0     Inf
5:   0.8019       0.0086   UFZ_Phenomenex   UFZ_Phenomenex       0     Inf
```
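The loaded table can be summarised further, e.g., by averaging over the target systems; on the `data.table` itself this is `res[d_lower == 0 & d_upper == Inf, mean(accuracy)]`. So that the sketch runs without the result files, it re-types the printed rows as a plain `data.frame` (values copied from the output above, so the mean is taken over the rounded numbers); the name `res_all_pairs` is made up for illustration.

```r
# Rows of the output shown above, re-typed as a base R data.frame.
res_all_pairs <- data.frame(
  accuracy     = c(0.8439, 0.9048, 0.8623, 0.8484, 0.8019),
  accuracy_std = c(0.0053, 0.0075, 0.0173, 0.0083, 0.0086),
  target       = c("Eawag_XBridgeC18", "FEM_long", "LIFE_old",
                   "RIKEN", "UFZ_Phenomenex"),
  stringsAsFactors = FALSE
)
res_all_pairs$source  <- res_all_pairs$target  # trained and tested on the same system
res_all_pairs$d_lower <- 0
res_all_pairs$d_upper <- Inf

# Same row selection as the data.table call above, in base R
all_pairs <- subset(res_all_pairs, d_lower == 0 & d_upper == Inf)

mean(all_pairs$accuracy)                           # 0.85226
all_pairs$target[which.max(all_pairs$accuracy)]    # "FEM_long"
```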
In the result files, `source` refers to the system(s) used for training. `d_lower` and `d_upper` refer here to the parameters used to generate the test pairs for the evaluation:

- `d_lower = 0` and `d_upper = Inf` means that all possible pairs are used for testing (as the paper defines the pairwise accuracy in Section 3.1.2).
- The result files also contain other `d_lower` and `d_upper` combinations.
  - Those can be used, e.g., to evaluate the pairwise prediction accuracy for nearby eluting molecules, i.e., molecules with a small retention time difference: `d_lower = 0` and `d_upper = 4`.
  - For details, please refer to the source code that computes the pairwise accuracies.
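For orientation, here is a minimal sketch of a pairwise (retention order) accuracy computed over all pairs, i.e., the `d_lower = 0`, `d_upper = Inf` setting. It is not the repository's implementation: the function name `pairwise_accuracy` is hypothetical, and it assumes that a higher predicted score means later elution, that pairs with equal retention times are skipped, and that ties in the predicted scores count as incorrect. Section 3.1.2 of the paper and the source code give the exact definition.

```r
# Hypothetical sketch, not the repository's code.
# Fraction of molecule pairs whose predicted scores are ordered like their
# retention times. Assumptions: higher score = later elution; pairs with
# equal retention times are skipped; tied scores count as incorrect.
pairwise_accuracy <- function(rt, score) {
  stopifnot(length(rt) == length(score), length(rt) >= 2)
  idx <- combn(length(rt), 2)      # all index pairs (i, j) with i < j
  i <- idx[1, ]
  j <- idx[2, ]
  keep <- rt[i] != rt[j]           # only pairs with a defined retention order
  if (!any(keep)) return(NA_real_)
  i <- i[keep]
  j <- j[keep]
  # product of signs > 0 iff the predicted order matches the true order
  correct <- sign(rt[i] - rt[j]) * sign(score[i] - score[j]) > 0
  mean(correct)
}

rt    <- c(1.2, 3.4, 2.0, 5.1)
score <- c(0.1, 0.9, 0.5, 0.7)
pairwise_accuracy(rt, score)  # 5 of 6 pairs correct: 0.8333333
```

Only the pair with retention times 3.4 and 5.1 is predicted in the wrong order, hence 5/6.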