Releases: X-DataInitiative/SCALPEL-Extraction
Featuring 2.1 pre-release
This release contains:
- N-level exposures and Prescription
Featuring 2.0 pre-release
This includes:
- Refactoring of all Event Extractors.
- Bulk main where all Events are extracted from Source.
This release is backward compatible with previous studies.
An important documentation effort is needed before the beta release.
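For illustration only, the refactored extraction can be pictured roughly as below; the names Event, Sources, EventExtractor and BulkExtraction are hypothetical and do not reflect the actual API of this package.

import org.apache.spark.sql.{DataFrame, Dataset}

// Hypothetical sketch: each Event type gets its own extractor reading from the
// shared Sources, so a bulk main only has to run every extractor and union the results.
case class Event(patientID: String, category: String, value: String, start: java.sql.Timestamp)
case class Sources(dcir: DataFrame, pmsiMco: DataFrame)

trait EventExtractor {
  def extract(sources: Sources): Dataset[Event]
}

object BulkExtraction {
  // "Bulk main" idea: extract all Events from the Sources in one pass over the extractors.
  def extractAll(extractors: Seq[EventExtractor], sources: Sources): Dataset[Event] =
    extractors.map(_.extract(sources)).reduce(_ union _)
}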
Featuring 1.1
PureConfig-compatible release.
This release uses PureConfig for configuration throughout the package.
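As a rough illustration of what a PureConfig-based load looks like (the configuration case classes and keys below are hypothetical, not the actual classes of this package):

import pureconfig._
import pureconfig.generic.auto._

// Hypothetical configuration classes, for illustration only.
case class ExposuresConfig(minPurchases: Int, startDelay: Int)
case class StudyConfig(exposures: ExposuresConfig)

object ConfigLoading {
  // PureConfig derives readers for the case classes and maps the HOCON
  // configuration onto them, failing loudly if a key is missing or mistyped.
  def load(): StudyConfig = ConfigSource.default.loadOrThrow[StudyConfig]
}

(Older PureConfig versions expose pureconfig.loadConfig instead of ConfigSource; the exact entry point depends on the version used.)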
Cumulative Cox 1.0.0
Important: to run this release, you need a version of the DCIR that contains the column ER_PHA_F_PHA_ACT_QSN (with "_" instead of "."), and line 36 of filtering/implicits/package.scala should be changed from:
.extract(path).persist().where(col("`ER_PHA_F.PHA_ACT_QSN`") <= upperBoundIrphaQuantity && col("`ER_PHA_F.PHA_ACT_QSN`")>0)
to:
.extract(path).where(col("ER_PHA_F_PHA_ACT_QSN") <= upperBoundIrphaQuantity && col("ER_PHA_F_PHA_ACT_QSN")>0)
Note: The uploaded jar already has this change, but the source code doesn't.
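For context, the backticks are only needed because the original column name contains a dot, which Spark would otherwise parse as a struct-field access; with the underscore-named column a plain reference is enough. A minimal sketch (keepPositiveQuantities is a hypothetical helper, not part of the package):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Dotted column names must be wrapped in backticks:
//   col("`ER_PHA_F.PHA_ACT_QSN`")
// whereas underscore-named columns can be referenced directly:
def keepPositiveQuantities(df: DataFrame): DataFrame =
  df.where(col("ER_PHA_F_PHA_ACT_QSN") > 0)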
MLPP Featuring 1.4.0
Same as the previous release, with a new filter for removing patients who did not have a target cancer within the study period.
A new entry was added to the config file:
mlpp_parameters = {
  ...
  exposures = {
    ...
    filter_never_sick_patients = false
  }
}
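For illustration, the kind of filtering enabled by filter_never_sick_patients could look like the sketch below; the dataset and column names (targetCancerEvents, patientID, start) are assumptions, not the actual implementation.

import org.apache.spark.sql.DataFrame

// Hypothetical sketch: keep only patients with at least one target cancer event
// inside the study period, then restrict the feature events to those patients.
def filterNeverSickPatients(
    events: DataFrame,
    targetCancerEvents: DataFrame,
    studyStart: java.sql.Timestamp,
    studyEnd: java.sql.Timestamp): DataFrame = {

  val sickPatients = targetCancerEvents
    .where(targetCancerEvents("start").between(studyStart, studyEnd))
    .select("patientID")
    .distinct()

  // the inner join drops every patient who never had a target cancer in the window
  events.join(sickPatients, Seq("patientID"), "inner")
}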
MLPP Featuring 1.3.0
Same as previous except for two main changes:
- Allows lists for the bucket_size and lag_count parameters.
- Added a new parameter, include_death_bucket, which determines whether the bucket in which a patient died should be filled with zeroes in the final matrix (if false) or not.
The final default mlpp_parameters config object is:
mlpp_parameters = {
  bucket_size = [30] # in days
  lag_count = [10]
  min_timestamp = [2006, 1, 1]
  max_timestamp = [2009, 12, 31, 23, 59, 59]
  include_death_bucket = false
  exposures = {
    min_purchases = 1
    start_delay = 0
    purchases_window = 0
    only_first = false
    filter_lost_patients = false
    filter_diagnosed_patients = true
    diagnosed_patients_threshold = 0
    filter_delayed_entries = true
    delayed_entry_threshold = 12
  }
}
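To make the bucket-related parameters concrete, here is a hedged, plain-Scala sketch of how exposures could be laid out in fixed-size buckets and how include_death_bucket might act on the bucket of death; the types and field names are illustrative only, not the actual featuring code.

// Illustrative only: one row of the final matrix, with one exposure value per time bucket.
case class PatientRow(patientID: String, exposuresByBucket: Vector[Double], deathBucket: Option[Int])

// Bucket index of an event happening `days` days after min_timestamp,
// for a given bucket_size (30 days in the default config above).
def bucketIndex(days: Int, bucketSize: Int = 30): Int = days / bucketSize

// include_death_bucket = false: the bucket in which the patient died is filled with zeroes.
def applyDeathBucket(row: PatientRow, includeDeathBucket: Boolean): Vector[Double] =
  row.deathBucket match {
    case Some(b) if !includeDeathBucket => row.exposuresByBucket.updated(b, 0.0)
    case _ => row.exposuresByBucket
  }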
MLPP Featuring 1.2.0
Code run at the CNAM to get MLPP features.
Steps to run the featuring:
1) Run the jar with spark-submit. Example:
spark-submit \
--executor-memory 110G \
--class fr.polytechnique.cmap.cnam.filtering.mlpp.MLPPMain \
./SNIIRAM-flattening-assembly.jar conf=./mlpp_config.conf env=cnam
Where mlpp_config.conf is the custom configuration file.
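As a rough illustration of the conf=/env= argument style (not the actual MLPPMain implementation), such key=value arguments can be parsed as follows:

// Hypothetical sketch of parsing arguments like "conf=./mlpp_config.conf env=cnam".
def parseArgs(args: Array[String]): Map[String, String] =
  args.filter(_.contains("=")).map { arg =>
    val Array(key, value) = arg.split("=", 2)
    key -> value
  }.toMap

// Example:
//   val params = parseArgs(Array("conf=./mlpp_config.conf", "env=cnam"))
//   val confPath = params.getOrElse("conf", "./default.conf")
//   val env = params.getOrElse("env", "test")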
2) The CSV features will be written to the path found in mlpp_config.conf under the key mlpp_features, so a call to hdfs dfs -get is needed. Example:
mkdir mlpp && cd mlpp
hdfs dfs -get /shared/mlpp_features/csv/*
3) Copy the MLPP_featuring.py script to the same directory as the local features and run it. Example:
cp MLPP_featuring.py mlpp && cd mlpp
python MLPP_featuring.py
Cox Experiment
The attached results.zip file contains the results of the Cox model (5 .txt files + 1 .R file) run at the CNAM using the fr.polytechnique.cmap.cnam.filtering.cox.CoxMain class, with 4 different configuration changes relative to the src/main/resources/config/filtering-default.conf file, as follows:
- cox_parameters.exposures.Start delay = 0
- cox_parameters.exposures.Min Purchases = 1
- cox_parameters.Follow-up delay = 4 months
- cox_parameters.Follow-up delay = 2 months
These runs used the R script cox_pio.R and yielded the corresponding result files:
- startDelay0Result.txt
- minPurchase1Result.txt
- followupDelay4Result.txt
- followupDelay2Result.txt
- coxDefaultResult.txt (Result without any changes in the config file)
MLPP Featuring 1.1.0
Code run at the CNAM to get MLPP features.
Steps to run the featuring:
1) Run the jar with spark-submit. Example:
spark-submit \
--executor-memory 110G \
--class fr.polytechnique.cmap.cnam.filtering.mlpp.MLPPProvisoryMain \
./SNIIRAM-flattening-assembly-1.0.jar cnam 10 30
Note: the expected arguments are, respectively, environment, lagCount and bucketSize (in days).
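As a small illustration (not the actual MLPPProvisoryMain code), the three positional arguments could be read like this:

// Hypothetical sketch: read <environment> <lagCount> <bucketSize in days> from args.
def readArgs(args: Array[String]): (String, Int, Int) = {
  require(args.length >= 3, "expected: <environment> <lagCount> <bucketSize>")
  (args(0), args(1).toInt, args(2).toInt)
}

// readArgs(Array("cnam", "10", "30")) == ("cnam", 10, 30)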
2) The CSV features will be written to /shared/mlpp_features/<broad|narrow>/csv/, so a call to hdfs dfs -get is needed. Example:
mkdir mlpp_broad && cd mlpp_broad
hdfs dfs -get /shared/mlpp_features/broad/csv/*
3) Copy the MLPP_featuring.py script to the same directory as the local features and run it. Example:
cp MLPP_featuring.py mlpp_broad && cd mlpp_broad
python MLPP_featuring.py
Note: results.tar contains the results of the longitudinal multinomial model implemented in MLPP-147. The archive contains an HTML export of the notebook used to produce the results, and the coefficients obtained for several parameters. The coefficients were saved to text files using numpy.savetxt.