Finish updating Sankar et al.
Guillaume Lemaitre committed Apr 18, 2016
1 parent 271f68a commit 23c2541
Showing 28 changed files with 182 additions and 4,118 deletions.
113 changes: 112 additions & 1 deletion README.md
@@ -1,4 +1,115 @@
Classification of SD-OCT volumes for DME detection: an anomaly detection approach
=================================================================================

S. Sankar, D. Sidibé, C. Y. Cheung, T. Y. Wong, E. Lamoureux, D. Milea, F. Meriaudeau, “Classification of SD-OCT volumes for DME detection: an anomaly detection approach”, SPIE Medical Imaging 2016, San Diego, USA.
```
@proceeding{sankar2016classification,
author = {Sankar, S. and Sidib\'{e}, D. and Cheung, C. Y. and Wong, T. Y. and Lamoureux, E. and Milea, D. and Meriaudeau, F.},
title = {Classification of SD-OCT volumes for DME detection: an anomaly detection approach},
journal = {Proc. SPIE},
volume = {9785},
pages = {97852O-97852O-6},
year = {2016}
}
```

How to use the pipeline?
-------

### Pre-processing pipeline

The following pre-processing routines were applied (an illustrative sketch is given at the end of this section):

- Flattening,
- Cropping.

#### Data variables

In the file `pipeline/feature-preprocessing/pipeline_preprocessing.m`, you need to set the following variables:

- `data_directory`: this directory contains the original SD-OCT volumes. The format used was `.img`.
- `store_directory`: this directory corresponds to the place where the resulting data will be stored. The format used was `.mat`.

#### Algorithm variables

The variables that are not specified in the initial publication and that can be changed are:

- `x_size`, `y_size`, `z_size`: the original size of the SD-OCT volume, needed to read the `.img` files.
- `kernelratio`, `windowratio`, `filterstrength`: the NLM denoising parameters.
- `h_over_rpe`, `h_under_rpe`, `width_crop`: the different variables driving the cropping.
- `thres_method`, `thres_val`: the thresholding method and its associated value used to binarize the image.
- `gpu_enable`: option to enable GPU computation.
- `median_sz`: size of the kernel used by the median filter.
- `se_op`, `se_cl`: sizes of the structuring elements used for the opening and closing operations.

#### Run the pipeline

From the root directory, launch MATLAB and run:

```
>> run pipeline/feature-preprocessing/pipeline_preprocessing.m
```
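
For reference, here is a minimal sketch of what the flattening and cropping steps could look like on a single B-scan. This is not the code of `pipeline/feature-preprocessing/pipeline_preprocessing.m`: the RPE estimate (brightest pixel of each A-scan smoothed by a low-order polynomial fit) and all numerical values are assumptions made for illustration.

```
% Illustrative sketch only -- all values are placeholders, not the published settings.
bscan = rand(512, 1024);   % one denoised B-scan (rows = depth, columns = A-scans)
h_over_rpe  = 160;         % rows kept above the estimated RPE
h_under_rpe = 50;          % rows kept below the estimated RPE
width_crop  = 400;         % half-width of the lateral crop

% Rough RPE estimate: brightest pixel of each A-scan, smoothed by a 2nd-order fit
[~, rpe_row] = max(bscan, [], 1);
cols = 1:size(bscan, 2);
p = polyfit(cols - mean(cols), double(rpe_row), 2);
rpe_fit = round(polyval(p, cols - mean(cols)));

% Flattening: shift every column so that the fitted RPE lies on a common row
ref_row = round(mean(rpe_fit));
flat = zeros(size(bscan));
for c = cols
    flat(:, c) = circshift(bscan(:, c), ref_row - rpe_fit(c));
end

% Cropping: keep a band around the flattened RPE and a central lateral window
rows_kept = max(ref_row - h_over_rpe, 1) : min(ref_row + h_under_rpe, size(bscan, 1));
centre = round(size(bscan, 2) / 2);
cols_kept = max(centre - width_crop, 1) : min(centre + width_crop, size(bscan, 2));
cropped = flat(rows_kept, cols_kept);
```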

### Extraction pipeline

For this pipeline, the following features were extracted:

- PCA on vectorized B-scans.

#### Data variables

In the file `pipeline/feature-extraction/pipeline_extraction.m`, you need to set the following variables:

- `data_directory`: this directory contains the pre-processed SD-OCT volumes. The format used was `.mat`.
- `store_directory`: this directory corresponds to the place where the resulting data will be stored. The format used was `.mat`.
- `pca_components`: this is the number of components to keep when reducing the dimensionality with PCA.

#### Run the pipeline

From the root directory, launch MATLAB and run:

```
>> run pipeline/feature-extraction/pipeline_extraction.m
```
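
As a rough illustration of this step, and not the actual `pipeline/feature-extraction/pipeline_extraction.m`, PCA on the vectorized B-scans of one volume could look like the sketch below; the dimensions and the use of MATLAB's `pca` function are assumptions.

```
% Illustrative sketch only: placeholder dimensions, much smaller than real B-scans.
n_bscans = 128;                % B-scans per SD-OCT volume
vol = rand(n_bscans, 2000);    % each row is one vectorized (pre-processed) B-scan
pca_components = 50;           % placeholder; the pipeline uses a larger value

% pca() centres the data and returns the principal component scores;
% each B-scan is then represented by its first pca_components scores.
[~, scores] = pca(vol);
features = scores(:, 1:min(pca_components, size(scores, 2)));
```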

### Classification pipeline

The classification method used was:

- GMM modelling.

#### Data variables

In the file `pipeline/feature-classification/pipeline_classifier.m`, you need to set the following variables:

- `data_directory`: this directory contains the features extracted from the SD-OCT volumes. The format used was `.mat`.
- `store_directory`: this directory corresponds to the place where the resulting data will be stored. The format used was `.mat`.
- `gt_file`: this is the file containing the label of each volume. You will have to define your own strategy to generate this file.
- `gmm_k`: this is the number of mixture components of the GMM.
- `pca_components`: this is the number of PCA components used during the extraction step (it must match the value used in the extraction pipeline).
- `mahal_thresh`: the threshold above which a B-scan is considered abnormal.
- `n_slices_thres`: the minimum number of abnormal B-scans required to classify the volume as DME.

#### Run the pipeline

From the root directory, launch MATLAB and run:

```
>> run pipeline/feature-classification/pipeline_classifier.m
```
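
The rationale of the anomaly-detection rule in `pipeline_classifier.m` is that, for a d-dimensional Gaussian component, the squared Mahalanobis distance of a sample follows (approximately) a chi-square distribution with d degrees of freedom, so `chi2inv(0.95, pca_components)` gives a 95% cut-off. The toy sketch below illustrates this rule; the data, the number of mixture components, and all numerical values are placeholders, not the pipeline code itself.

```
% Toy sketch of the decision rule -- placeholder data and parameters.
d = 10;                           % feature dimension (pca_components in the pipeline)
normal_train = randn(2000, d);    % placeholder "normal" training features
test_volume = randn(128, d);      % placeholder volume: one feature vector per B-scan

% Fit a GMM on normal data only
gmm_model = fitgmdist(normal_train, 2, 'RegularizationValue', 1e-3);

% Squared Mahalanobis distance of each B-scan to its closest component
mahal_dist = mahal(gmm_model, test_volume);
mahal_dist_near = min(mahal_dist, [], 2);

% A B-scan is abnormal if it falls outside the 95% chi-square envelope
mahal_thresh = chi2inv(0.95, d);
n_abnormal_slices = nnz(mahal_dist_near > mahal_thresh);

% The volume is labelled DME when enough B-scans are abnormal
n_slices_thres = 15;
is_dme = n_abnormal_slices > n_slices_thres;
```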

### Validation pipeline

#### Data variables

In the file `pipeline/feature-validation/pipeline_validation.m`, you need to set the following variables:

- `data_directory`: this directory contains the classification results. The format used was `.mat`.
- `gt_file`: this is the file containing the label of each volume. You will have to define your own strategy to generate this file.

#### Run the pipeline

From the root directory, launch MATLAB and run:

```
>> run pipeline/feature-validation/pipeline_validation.m
```
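
The function `metric_confusion_matrix` comes from the third-party `protoclass_matlab` toolbox and its implementation is not shown here. As a reminder of the usual definitions, the sketch below derives a few of the reported metrics directly from a binary confusion matrix; it is a hypothetical stand-in with placeholder labels, not the toolbox function.

```
% Hypothetical sketch -- standard definitions, NOT the protoclass_matlab code.
pred_label = [ 1 -1  1  1 -1 -1 ]';   % placeholder predictions
gt_label   = [ 1 -1 -1  1 -1  1 ]';   % placeholder ground truth

tp = nnz(pred_label ==  1 & gt_label ==  1);
tn = nnz(pred_label == -1 & gt_label == -1);
fp = nnz(pred_label ==  1 & gt_label == -1);
fn = nnz(pred_label == -1 & gt_label ==  1);

sens = tp / (tp + fn);                % sensitivity (recall)
spec = tn / (tn + fp);                % specificity
prec = tp / (tp + fp);                % precision
acc  = (tp + tn) / (tp + tn + fp + fn);
f1s  = 2 * prec * sens / (prec + sens);

disp(['Sensitivity: ', num2str(sens)]);
disp(['Specificity: ', num2str(spec)]);
```
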
43 changes: 0 additions & 43 deletions pipeline/check_outlier.m

This file was deleted.

40 changes: 0 additions & 40 deletions pipeline/crop_vols.m

This file was deleted.

67 changes: 0 additions & 67 deletions pipeline/do_pca.m

This file was deleted.

24 changes: 17 additions & 7 deletions pipeline/feature-classification/pipeline_classifier.m
@@ -23,15 +23,17 @@
 idx_class_pos = find( data_label == 1 );
 idx_class_neg = find( data_label == -1 );
 
-% Number of mixture components
-gmm_k = 8;
+% Parameters for the GMM
+rng(1);
+gmm_k = 15;
+options = statset('MaxIter', 1000);
 
 % Mahalanobis threshold
-pca_components = 300;
+pca_components = 500;
 mahal_thresh = chi2inv(0.95, pca_components);
 
 % Number of abnormal slices tolerated
-n_slices_thres = 32;
+n_slices_thres = 15;
 
 % Number of slices per volume
 x_size = 128;
@@ -50,13 +52,17 @@
 load(strcat(data_directory, filename_cv));
 
 % Apply a GMM learning on the training set
-gmm_model = fitgmdist(training_data, gmm_k);
+gmm_model = fitgmdist(training_data, gmm_k, ...
+    'Options', options, ...
+    'CovarianceType', 'diagonal', ...
+    'RegularizationValue', 0.001, ...
+    'Replicates', 10);
 
 test_vol = 1;
 % Test the gmm_model and count the number of outliers
 for test_id = 1 : x_size : size(testing_data,1)
     % Extract the data to use in the gmm model
-    t_data = testing_data(test_id : test_id + x_size - 1,:));
+    t_data = testing_data(test_id : test_id + x_size - 1,:);
 
     % Compute the Mahalanobis distance for all the slices
     mahal_dist = mahal(gmm_model, t_data);
@@ -65,7 +71,11 @@
     mahal_dist_near = min(mahal_dist, [], 2);
 
     % Check how many slices are abnormal
-    n_abnormal_slices = nnz(mahal_dist_near > mahal_thresh);
+    % Apply a median filter so that isolated detections are discarded and
+    % only consecutive abnormal slices are counted
+    n_abnormal_slices = nnz(medfilt1(single(mahal_dist_near > mahal_thresh)));
+
+    disp(['Number of estimated outliers: ', num2str(n_abnormal_slices)]);
 
     % Assign the predicted label
     if n_abnormal_slices > n_slices_thres

2 changes: 1 addition & 1 deletion pipeline/feature-extraction/pipeline_extraction.m
@@ -24,7 +24,7 @@
 idx_class_neg = find( data_label == -1 );
 
 % Number of components for the PCA
-pca_components = 300;
+pca_components = 500;
 
 % poolobj = parpool('local', 48);

52 changes: 52 additions & 0 deletions pipeline/feature-validation/pipeline_validation.m
@@ -0,0 +1,52 @@
clear all;
close all;
clc;

% Execute the setup for protoclass matlab
run('../../../../third-party/protoclass_matlab/setup.m');

% Refer to the classification pipeline to know how the testing set
% was created
% Location of the ground-truth
gt_file = '/data/retinopathy/OCT/SERI/data.xls';

% Load the spreadsheet data
[~, ~, raw_data] = xlsread(gt_file);
% Extract the information from the raw data
% Store the filename inside a cell
filename = { raw_data{ 2:end, 1} };
% Store the label information into a vector
data_label = [ raw_data{ 2:end, 2 } ];
% Get the index of positive and negative class
idx_class_pos = find( data_label == 1 );
idx_class_neg = find( data_label == -1 );

gt_label = [];
% We can now create the GT labels
for idx_cv_lpo = 1:length(idx_class_pos)
% Concatenate the value as in the classification pipeline
gt_label = [ gt_label 1 -1 ];
end

% Load the results data
results_filename = ['/data/retinopathy/OCT/SERI/results/' ...
'sankar_2016/predicition.mat'];
load(results_filename);

% Linearize the vector loaded
pred_label = pred_label_cv';
pred_label = pred_label(:);

% Get the statistic
[ sens, spec, prec, npv, acc, f1s, mcc, gmean, cm ] = metric_confusion_matrix( ...
pred_label, gt_label );

% Display the information
disp( ['Sensitivity: ', num2str(sens)] );
disp( ['Specificity: ', num2str(spec)] );
disp( ['Precision: ', num2str(prec)] );
disp( ['Negative Predictive Value: ', num2str(npv)] );
disp( ['Accuracy: ', num2str(acc)] );
disp( ['F1-score: ', num2str(f1s)] );
disp( ['Matthews Correlation Coefficient: ', num2str(mcc)] );
disp( ['Geometric Mean: ', num2str(gmean)] );
