Regarding the machine learning pipelines: in the published paper, we performed Cox regression using the whole dataset. To further alleviate the potential issue of double dipping, we have now uploaded code and detailed results in which both the Cox regression and the protein selection are performed under strict 10-fold cross-validation. Within each cross-validation partition, proteins were first screened through Cox regression, and those that survived multiple-comparison correction were passed to sequential forward selection. The proteins selected under each partition were then used for model development. The results were largely consistent with those reported in the published paper. The top selected proteins (NEFL, GFAP, GDF15, BCAN, LTBP2, NPTXR and EDA2R) were selected under every cross-validation partition. The protein panel and the protein panel + demographic information also achieved similar AUCs. These analyses and the corresponding results are in the folder 10-FoldCV-Analysis-ACD.
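The fold-wise screening described above can be sketched as follows. This is a minimal illustration, not the code in 10-FoldCV-Analysis-ACD: the function names, the `screen` callable (standing in for a per-protein Cox fit on the training rows) and the use of a Bonferroni threshold are all assumptions, since the text does not state which multiple-comparison correction was applied.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_splits(n_samples: int, n_folds: int = 10,
                 seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs; each sample is a test sample exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


def bonferroni_survivors(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Keep features whose p-value is below alpha / n_tests (assumed correction)."""
    if not pvalues:
        return []
    threshold = alpha / len(pvalues)
    return [name for name, p in pvalues.items() if p < threshold]


def select_per_fold(n_samples: int,
                    screen: Callable[[Sequence[int]], Dict[str, float]],
                    n_folds: int = 10, seed: int = 0) -> Dict[int, List[str]]:
    """Screen features on the training rows of each fold only.

    `screen` is a hypothetical callable that takes training row indices and
    returns {protein: p_value}, e.g. from one Cox model per protein.
    The held-out rows never reach `screen`, which is what prevents
    double dipping.
    """
    selected = {}
    for k, (train, _test) in enumerate(kfold_splits(n_samples, n_folds, seed)):
        selected[k] = bonferroni_survivors(screen(train))
    return selected
```

The key design point is that the test indices are discarded inside `select_per_fold`, so any selection step plugged in through `screen` only ever sees training data.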

Code:

Association analysis of Cox regression:

s0_Cox_M1.py

s0_Cox_M2.py

Importance ranking of proteins:

s1_ACD_ProImp.py

Sequential forward protein selection:

s2_SFS.py

Machine learning model development:

s3_ML.py

Model evaluation:

S4_Eval.py
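For readers unfamiliar with the sequential forward selection step listed above, here is a generic greedy sketch. It is not the implementation in s2_SFS.py: the function name, the `score` callable (for example, a cross-validated AUC computed on the training rows) and the `min_gain` stopping rule are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], List[float]]:
    """Greedily add the feature that most improves `score`.

    Stops when the best available addition improves the score by no more
    than `min_gain`. Returns the selected features in order of addition
    and the score reached after each addition.
    """
    selected: List[str] = []
    history: List[float] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        # Score every one-feature extension of the current set.
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        # The first feature is always accepted; later ones must add enough gain.
        if selected and top_score - best <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
        history.append(top_score)
    return selected, history
```

The length of the returned list is one way to read off an "optimal number of proteins"; other stopping rules (for example, a fixed maximum panel size) would be equally reasonable.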

Results:

CV_Fold_SelectedProteins.csv: Selected proteins under each cross-validation partition

CV_Fold_Eval_ProPanel.csv: Cross-validation model performance of protein panel

CV_Fold_Eval_ProDemo.csv: Cross-validation model performance of protein panel + demographic information

CV_Fold_Eval_Top3ProDemo.csv: Cross-validation model performance of top-3 proteins + demographic information; notably, the top-3 proteins were NEFL, GFAP and GDF15

/Results/TestFold*/: results under each cross-validation partition

ACD_Cox_M1.csv & ACD_Cox_M2.csv: Cox regression results used to identify associated proteins

ProImportance_cv.csv: protein importance scores and the resulting protein ranking

SFS_cv.csv: results of the sequential forward selection procedure used to determine the optimal number of proteins

pred_probs.csv: predicted probabilities from the selected proteins, and from the selected proteins + demographic information

pred_probs_Top3ProDemo.csv: predicted probabilities of top-3 selected proteins + demographic information

Eval_ProPanel.csv & Eval_ProDemo.csv & Eval_Top3ProDemo.csv: cross-validation evaluation of the selected proteins, the selected proteins + demographic information, and the top-3 selected proteins + demographic information
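To check which proteins recur across partitions, as summarised by CV_Fold_SelectedProteins.csv, a tally like the one below may help. It is a sketch under assumptions: the column layout of that file is not documented here, so the functions take an in-memory mapping from fold number to selected protein names, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_frequency(selected_by_fold: Dict[int, Iterable[str]]) -> Counter:
    """Count in how many folds each protein was selected (once per fold at most)."""
    counts: Counter = Counter()
    for proteins in selected_by_fold.values():
        counts.update(set(proteins))
    return counts


def selected_in_every_fold(selected_by_fold: Dict[int, Iterable[str]]) -> List[str]:
    """Return the proteins that appear in the selection of every fold, sorted by name."""
    n_folds = len(selected_by_fold)
    counts = selection_frequency(selected_by_fold)
    return sorted(p for p, c in counts.items() if c == n_folds)


# Toy input with placeholder names; these are not results from the repository.
toy = {0: ["P1", "P2"], 1: ["P2", "P1", "P3"], 2: ["P1", "P2"]}
stable = selected_in_every_fold(toy)
```

With real data, the mapping would be built by reading the per-fold selections from the CSV once its layout is known.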

Repository folders and files:

10-FoldCV-Analysis-ACD

Plots

Utility

s1_Cox

s3_ML_Models

s4_ML_Models_ProRS

README.md

jasonHKU0907/DementiaProteomicPrediction