---
output: github_document
references:
- id: lanera18
  title: Extending PubMed Searches to ClinicalTrials.gov Through a Machine Learning Approach for Systematic Reviews
  author:
  - family: Lanera
    given: Corrado
  - family: Minto
    given: Clara
  - family: Sharma
    given: Abhinav
  - family: Gregori
    given: Dario
  - family: Berchialla
    given: Paola
  - family: Baldi
    given: Ileana
  container-title: Journal of Clinical Epidemiology
  page: 22-30
  issue: 103
  URL: 'http://www.sciencedirect.com/science/article/pii/S0895435618300854'
  DOI: 10.1016/j.jclinepi.2018.06.015
  type: article-journal
  issued:
    year: 2018
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
[![Travis build status](https://travis-ci.org/UBESP-DCTV/costumer.svg?branch=master)](https://travis-ci.org/UBESP-DCTV/costumer)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/UBESP-DCTV/costumer?branch=master&svg=true)](https://ci.appveyor.com/project/UBESP-DCTV/costumer)
[![Coverage status](https://codecov.io/gh/UBESP-DCTV/costumer/branch/master/graph/badge.svg)](https://codecov.io/github/UBESP-DCTV/costumer?branch=master)
# costumer
The goal of costumer is to provide the data, the functions, the analysis
scripts, and the documentation (the report, with its templates) for the
paper *Building Comprehensive Searches including PubMed and
ClinicalTrials.gov Through a Machine Learning Approach for Systematic
Reviews* [@lanera18].
## Installation
<!-- You can install the released version of costumer from -->
<!-- [CRAN](https://CRAN.R-project.org) with: -->
<!-- ``` r -->
<!-- install.packages("costumer") -->
<!-- ``` -->
You can install the development version from [GitHub](https://github.com/)
with the following procedure:
```{r, eval = FALSE}
## If you do not have the `devtools` package installed, please install it
# install.packages("devtools")
devtools::install_github("UBESP-DCTV/costumer")
```
## Folder organization
* `R/` contains all the functions provided to implement the analyses.
* `tests/` contains all the automated tests run for CI.
* `man/` contains the documentation for each function or data set provided
  (accessible in R by `?<name_of_the_object>`).
* `data-raw/` contains all the scripts used to import and manage the data
  used in the analyses and in the (automated) tests of the package.
* `data/` contains the data provided by the package. In particular, it
  contains:
    - `*_cvAble.rda`: the customized `caret` models (used to handle the
      cross-validation process correctly with textual data, especially
      for IDF reweighting);
    - `R[OU]S(3565|5050)_new.rda`: the customized functions for imbalance
      management;
    - `liu_*.rda`: sample data used in the automated tests.
* `inst/` contains one folder, `doc/`, which contains:
    - `hutch_analyses_p1_v2.0.R`, the script used to perform all the
      analyses reported in @lanera18;
    - `AACT201603_comprehensive_data_dictionary.xlsx`, the data dictionary
      for the original ClinicalTrials.gov data.
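
The file-name patterns listed for `data/` can be read as regular
expressions. As a small illustration, base R can group `.rda` files by kind.
This is only a sketch: the file names below are made up for the example and
are not necessarily the ones shipped with the package. To try it on a local
copy of the repository, `files` could be replaced by `list.files("data")`.

```r
# Hypothetical file names, used only to illustrate the patterns above
files <- c(
  "svm_cvAble.rda", "ROS3565_new.rda", "RUS5050_new.rda", "liu_sample.rda"
)

# The patterns from the list above, written as regular expressions
patterns <- c(
  caret_models = "_cvAble\\.rda$",
  imbalance    = "^R[OU]S(3565|5050)_new\\.rda$",
  test_samples = "^liu_.*\\.rda$"
)

# For each kind of file, keep the names that match its pattern
by_kind <- lapply(patterns, function(p) grep(p, files, value = TRUE))
by_kind
```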
> **Note**: the main data used are too large to be included in an R package
> or in a GitHub repository.
> [Here](https://1drv.ms/f/s!AtlSkmthbrG4i8lA1fk5LPhmsOt0pg) you can find a
> folder named `non_git_nor_build_derived_data/` (2.86 GB), which contains:
>
> - `171106-all_svm_3565/`: folder with all the outputs of the last
>   analyses:
>     + `CV-Plots/`: folder which contains all the cross-validation plots
>       representing the decision levels for the tuning parameter used in
>       each model;
>     + `models/`: folder which contains all the trained models;
>     + `hutch3.rda`: file which contains the `hutch3` data frame, holding
>       every data step of the analyses, i.e., starting data, preprocessed
>       data, DTM, testing data, the model used, the plots provided, and
>       so on;
>     + `*.txt`: log files.
> - `raw_pubmed/`: folder with the data used to train the models, which is
>   needed to run the script `data-raw/import_pubmed.R`. To run that script
>   yourself, put this folder as it is into the `data-raw/` folder.
> - `raw_ctgov.zip`: zip file with the data used to test the models, i.e.,
>   the ClinicalTrials.gov snapshot used, which (once unzipped) is needed
>   to run the script `data-raw/import_ctgov.R`. To run that script
>   yourself, unzip this file and put the output folder as it is (~841 MB)
>   into the `data-raw/` folder.
> - `random4h28.xlsx`: file with the sample data used in the (automated)
>   tests of the functions provided with the package, which is needed to
>   run the script `data-raw/import_liu.R`. To run that script yourself,
>   put this file as it is into the `data-raw/` folder.
> - `summaries_*.rda`: the ready-to-use outputs of the `import_*.R`
>   scripts, which are needed to run the script of the analyses. To run
>   that script yourself, put these files as they are into the `data/`
>   folder.
> - `test_*.rda`: the ready-to-use outputs of the script
>   `data-raw/ct_corpus_and_dtm.R`, which are also needed to run the script
>   of the analyses. To run that script yourself, put these files as they
>   are into the `data/` folder.
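
The file placement described in the note can also be scripted. The following
is a minimal sketch, not something provided by the package: the helper
`place_costumer_data()` and its arguments are hypothetical names, and it
assumes that the shared folder has already been downloaded locally with the
layout listed above, and that `raw_ctgov.zip` unpacks to a single top-level
folder.

```r
# Hypothetical helper (not part of costumer): copy the downloaded files
# into the folders expected by the scripts, as described in the note.
place_costumer_data <- function(download_dir, project_dir = ".") {
  data_raw <- file.path(project_dir, "data-raw")
  data_dir <- file.path(project_dir, "data")
  dir.create(data_raw, recursive = TRUE, showWarnings = FALSE)
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

  # `raw_pubmed/` goes as it is into `data-raw/`
  pubmed <- file.path(download_dir, "raw_pubmed")
  if (dir.exists(pubmed)) file.copy(pubmed, data_raw, recursive = TRUE)

  # `raw_ctgov.zip` is unzipped into `data-raw/`
  # (assumption: the archive contains a single top-level folder)
  ctgov <- file.path(download_dir, "raw_ctgov.zip")
  if (file.exists(ctgov)) utils::unzip(ctgov, exdir = data_raw)

  # `random4h28.xlsx` goes as it is into `data-raw/`
  liu <- file.path(download_dir, "random4h28.xlsx")
  if (file.exists(liu)) file.copy(liu, data_raw)

  # `summaries_*.rda` and `test_*.rda` go as they are into `data/`
  rda <- list.files(
    download_dir,
    pattern = "^(summaries|test)_.*\\.rda$",
    full.names = TRUE
  )
  file.copy(rda, data_dir)

  # Return (invisibly) what each target folder now contains
  invisible(list(
    data_raw = list.files(data_raw),
    data = list.files(data_dir)
  ))
}
```

For example, `place_costumer_data("~/Downloads/non_git_nor_build_derived_data")`
run from the root of a local clone would populate `data-raw/` and `data/`;
the download path here is only an illustration.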
## Bug reports
If you encounter a bug, please file an issue with a
[reprex](https://github.com/tidyverse/reprex) (minimal reproducible example)
at <https://github.com/UBESP-DCTV/imthcm/issues>.
## Reference