
Releases: X-DataInitiative/SCALPEL-Extraction

Featuring 2.1 pre-release

25 Jul 11:34
6e2fa10
Pre-release

This release contains:

N-level exposures and Prescription

Featuring 2.0 pre-release

06 May 14:33
151d597
Pre-release

This release includes:

  1. A refactoring of all Event Extractors.
  2. A bulk main where all Events are extracted from the Source.

It is backward compatible with previous studies.

A significant documentation effort is still needed before the beta release.

Featuring 1.1

30 Aug 11:32
8699448

This release adds the following:

  1. The Fall study now covers three years of data.
  2. Pio and Rosi are now metadata compatible.
  3. Fixes the Molecule extractor problem for Pio and Rosi.

PureConfig-compatible release

05 Jul 15:35
2dcae42

This release uses PureConfig for configuration throughout the package.
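
For illustration, here is a minimal sketch of loading one of the config sections shown in the releases below with PureConfig. It assumes a recent PureConfig (0.12+); the case class and object names are hypothetical, not the package's actual classes.

import pureconfig._
import pureconfig.generic.auto._
import pureconfig.generic.ProductHint

// Hypothetical case classes mirroring part of the mlpp_parameters section.
case class Exposures(minPurchases: Int, startDelay: Int)
case class MlppParameters(bucketSize: List[Int], lagCount: List[Int], exposures: Exposures)

object LoadConfig extends App {
  // Map camelCase field names to the snake_case keys used in the .conf files.
  implicit def hint[A]: ProductHint[A] = ProductHint[A](ConfigFieldMapping(CamelCase, SnakeCase))

  // Reads application.conf from the classpath and decodes the section.
  ConfigSource.default.at("mlpp_parameters").load[MlppParameters] match {
    case Right(params)  => println(params)
    case Left(failures) => sys.error(failures.toString)
  }
}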

Cumulative Cox 1.0.0

10 Feb 09:08

Important: to run this release, it is necessary to use a version of DCIR that contains the column ER_PHA_F_PHA_ACT_QSN (with "_" instead of "."), and line 36 of the file filtering/implicits/package.scala must be changed from:

.extract(path).persist().where(col("`ER_PHA_F.PHA_ACT_QSN`") <= upperBoundIrphaQuantity && col("`ER_PHA_F.PHA_ACT_QSN`")>0) 

to:

.extract(path).where(col("ER_PHA_F_PHA_ACT_QSN") <= upperBoundIrphaQuantity && col("ER_PHA_F_PHA_ACT_QSN")>0)

Note: The uploaded jar already has this change, but the source code doesn't.

MLPP Featuring 1.4.0

06 Dec 12:22

Same as the previous release, with a new filter that removes patients who did not have a target cancer within the study period (a sketch of this filter follows the config excerpt below).

A new entry was added to the config file:

mlpp_parameters = {
  ...
  exposures = {
    ...
    filter_never_sick_patients = false
  }
}
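
As a hedged illustration of what filter_never_sick_patients = true could do, here is a sketch in Spark, assuming an events DataFrame with patientID, category, and start columns; the function and column names are assumptions, not the package's actual code.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Keep only the events of patients who have at least one target disease
// event within the study period.
def filterNeverSickPatients(
    events: DataFrame,
    minTimestamp: java.sql.Timestamp,
    maxTimestamp: java.sql.Timestamp): DataFrame = {
  val sickPatients = events
    .where(col("category") === "disease")
    .where(col("start").between(minTimestamp, maxTimestamp))
    .select("patientID")
    .distinct()
  events.join(sickPatients, Seq("patientID"), "inner")
}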

MLPP Featuring 1.3.0

06 Dec 12:23

Same as the previous release, except for two main changes:

  • Allows lists for the bucket_size and lag_count parameters
  • Adds a new parameter, include_death_bucket, which determines whether the bucket in which a patient died should be filled with zeroes in the final matrix (if false) or not (see the sketch after the config below)

The final default mlpp_parameters config object is:

mlpp_parameters = {
  bucket_size = [30]  # in days
  lag_count = [10]
  min_timestamp = [2006, 1, 1]
  max_timestamp = [2009, 12, 31, 23, 59, 59]
  include_death_bucket = false

  exposures = {
    min_purchases = 1
    start_delay = 0
    purchases_window = 0
    only_first = false
    filter_lost_patients = false
    filter_diagnosed_patients = true
    diagnosed_patients_threshold = 0
    filter_delayed_entries = true
    delayed_entry_threshold = 12
  }
}
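
For intuition, a minimal sketch of the include_death_bucket semantics described above, assuming each patient row is an array of per-bucket values and the death bucket index is known; all names here are hypothetical.

// Fill the bucket in which the patient died with a zero when
// includeDeathBucket is false; otherwise leave the row untouched.
def applyDeathBucket(
    row: Array[Double],
    deathBucket: Option[Int],
    includeDeathBucket: Boolean): Array[Double] =
  deathBucket match {
    case Some(b) if !includeDeathBucket => row.updated(b, 0.0)
    case _ => row
  }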

MLPP Featuring 1.2.0

06 Dec 12:23

Code run at the CNAM to get MLPP features.

Steps to run the featuring:

1) Run the jar with spark-submit. Example:

spark-submit \
--executor-memory 110G \
--class fr.polytechnique.cmap.cnam.filtering.mlpp.MLPPMain \
./SNIIRAM-flattening-assembly.jar conf=./mlpp_config.conf env=cnam

Where mlpp_config.conf is the custom configuration file.
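
For reference, a hypothetical minimal mlpp_config.conf, reusing the mlpp_features key mentioned in step 2 and the mlpp_parameters keys shown in the 1.3.0 and 1.4.0 releases above; the actual set of required keys may differ.

mlpp_features = "/shared/mlpp_features"
mlpp_parameters = {
  bucket_size = [30]  # in days
  lag_count = [10]
}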

2) The csv features will be written to the path found in mlpp_config.conf under the key mlpp_features, so a call to hdfs dfs -get is needed. Example:

mkdir mlpp && cd mlpp
hdfs dfs -get /shared/mlpp_features/csv/*

3) Copy the MLPP_featuring.py script to the same directory as the local features and run it. Example:

cp MLPP_featuring.py mlpp && cd mlpp
python MLPP_featuring.py

Cox Experiment

01 Dec 12:16

The attached results.zip file contains the results of the Cox model (5 .txt files + 1 .R file) run at the CNAM using the fr.polytechnique.cmap.cnam.filtering.cox.CoxMain class and the R script cox_pio.R, with 4 different configuration changes relative to the src/main/resources/config/filtering-default.conf file:

  1. cox_parameters.exposures.Start delay = 0
  2. cox_parameters.exposures.Min Purchases = 1
  3. cox_parameters.Follow-up delay = 4 months
  4. cox_parameters.Follow-up delay = 2 months

These runs yielded the corresponding result files:

  1. startDelay0Result.txt
  2. minPurchase1Result.txt
  3. followupDelay4Result.txt
  4. followupDelay2Result.txt
  5. coxDefaultResult.txt (Result without any changes in the config file)

MLPP Featuring 1.1.0

16 Nov 16:55

Code run at the CNAM to get MLPP features.

Steps to run the featuring:

1) Run the jar with spark-submit. Example:

spark-submit \
--executor-memory 110G \
--class fr.polytechnique.cmap.cnam.filtering.mlpp.MLPPProvisoryMain \
./SNIIRAM-flattening-assembly-1.0.jar cnam 10 30

Note: the expected arguments are, respectively, the environment, lagCount, and bucketSize (in days).

2) The csv features will be written to /shared/mlpp_features/<broad|narrow>/csv/, so a call to hdfs dfs -get is needed. Example:

mkdir mlpp_broad && cd mlpp_broad
hdfs dfs -get /shared/mlpp_features/broad/csv/*

3) Copy the MLPP_featuring.py script to the same directory as the local features and run it. Example:

cp MLPP_featuring.py mlpp_broad && cd mlpp_broad
python MLPP_featuring.py

Note: results.tar contains the results of the longitudinal multinomial model implemented in MLPP-147. The archive contains an HTML export of the notebook used to produce the results, and the coefficients obtained for several parameters. The coefficients were saved to text files using numpy.savetxt.