Graph Neural Network for Anomaly Detection and Classification in Scientific Workflows

Repo Content

adjacency_list_dags: json files with dependencies between nodes in each workflow
data: raw data - characterizations of jobs in each workflow during multiple executions
data_new: raw data - new workflows with individual job anomalies
deepHyp_scripts: scripts connected to finding best hyperparameters for GNN
psd_gnn: module folder
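
As a rough illustration of how a dependency file from adjacency_list_dags might be consumed, the sketch below turns an adjacency list into a node list and an edge list. The JSON layout (each job name mapped to a list of its child jobs) and the job names are assumptions made for this example and are not confirmed by the repository, so check an actual file before relying on it.

```python
import json

# Hypothetical adjacency list: each job maps to the jobs that depend on it.
# The real files in adjacency_list_dags may use a different layout.
EXAMPLE_JSON = """
{
  "job_a": ["job_b", "job_c"],
  "job_b": ["job_d"],
  "job_c": ["job_d"],
  "job_d": []
}
"""


def adjacency_to_edges(adjacency):
    """Return (sorted node names, list of (parent_index, child_index) pairs)."""
    # Collect every job that appears either as a parent or as a child.
    names = set(adjacency)
    for children in adjacency.values():
        names.update(children)
    nodes = sorted(names)
    index = {name: i for i, name in enumerate(nodes)}
    edges = [
        (index[parent], index[child])
        for parent, children in adjacency.items()
        for child in children
    ]
    return nodes, edges


adjacency = json.loads(EXAMPLE_JSON)
nodes, edges = adjacency_to_edges(adjacency)
print(len(nodes), "nodes,", len(edges), "edges")
print(edges)
```

Index pairs of this kind are a common input form for graph libraries; how psd_gnn builds its graphs internally is not shown here.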

setup env

Install with CUDA available (default)

sh setup.sh gpu

Install with CPU only

sh setup.sh cpu

Step 0: EDA

visualize the graphs
feature analysis

Data Description

Graph statistics

workflow	cpu samples	hdd samples	loss samples	normal samples	# of nodes	# of edges
1000genome	250	575	250	200	57	129
nowcast-clustering-8	300	360	300	270	13	20
nowcast-clustering-16	300	360	300	270	9	12
wind-clustering-casa	300	360	300	270	7	8
wind-noclustering-casa	300	360	300	270	26	44
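
The class counts in the table are not uniform (for example, hdd samples outnumber normal samples in every workflow). The sketch below computes inverse-frequency class weights from those counts, which is one common way to weight a loss function for imbalanced classes. Whether the scripts' --balance option uses this particular scheme is not stated in this README, so treat the formula as an assumption.

```python
# Sample counts per class, copied from the table above (two workflows shown).
COUNTS = {
    "1000genome": {"cpu": 250, "hdd": 575, "loss": 250, "normal": 200},
    "nowcast-clustering-8": {"cpu": 300, "hdd": 360, "loss": 300, "normal": 270},
}


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (number_of_classes * class_count).

    Assumed scheme for illustration; rarer classes receive larger weights.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


for workflow, class_counts in COUNTS.items():
    weights = inverse_frequency_weights(class_counts)
    rounded = {label: round(w, 3) for label, w in weights.items()}
    print(workflow, "total:", sum(class_counts.values()), "weights:", rounded)
```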

Running scripts

The two main scripts for running the experiments are in the examples folder:

graph-level anomaly detection

$ python demo_graph_classification.py --help
usage: demo_graph_classification.py [-h]
                                    [--workflow {1000genome,nowcast-clustering-8,nowcast-clustering-16,wind-clustering-casa,wind-noclustering-casa,1000genome_new_2022,montage,predict_future_sales,casa-wind-full,all}]
                                    [--binary] [--gpu GPU] [--epoch EPOCH] [--hidden_size HIDDEN_SIZE] [--batch_size BATCH_SIZE] [--conv_blocks CONV_BLOCKS] [--train_size TRAIN_SIZE]
                                    [--lr LR] [--weight_decay WEIGHT_DECAY] [--dropout DROPOUT] [--feature_option FEATURE_OPTION] [--seed SEED] [--path PATH] [--log] [--logdir LOGDIR]
                                    [--force] [--balance] [--verbose] [--output] [--anomaly_cat ANOMALY_CAT] [--anomaly_level [ANOMALY_LEVEL ...]] [--anomaly_num ANOMALY_NUM]

options:
  -h, --help            show this help message and exit
  --workflow {1000genome,nowcast-clustering-8,nowcast-clustering-16,wind-clustering-casa,wind-noclustering-casa,1000genome_new_2022,montage,predict_future_sales,casa-wind-full,all}, -w {1000genome,nowcast-clustering-8,nowcast-clustering-16,wind-clustering-casa,wind-noclustering-casa,1000genome_new_2022,montage,predict_future_sales,casa-wind-full,all}
                        Name of workflow.
  --binary              Toggle binary classification.
  --gpu GPU             GPU id. `-1` for CPU only.
  --epoch EPOCH         Number of epoch in training.
  --hidden_size HIDDEN_SIZE
                        Hidden channel size.
  --batch_size BATCH_SIZE
                        Batch size.
  --conv_blocks CONV_BLOCKS
                        Number of convolutional blocks
  --train_size TRAIN_SIZE
                        Train size [0.5, 1). And equal split on validation and testing.
  --lr LR               Learning rate.
  --weight_decay WEIGHT_DECAY
                        Weight decay for Adam.
  --dropout DROPOUT     Dropout in neural networks.
  --feature_option FEATURE_OPTION
                        Feature option.
  --seed SEED           Fix the random seed. `-1` for no random seed.
  --path PATH, -p PATH  Specify the root path of file.
  --log                 Toggle to log the training
  --logdir LOGDIR       Specify the log directory.
  --force               To force reprocess datasets.
  --balance             Enforce the weighted loss function.
  --verbose, -v         Toggle for verbose output.
  --output, -o          Toggle for pickle output file.
  --anomaly_cat ANOMALY_CAT
                        Specify the anomaly set.
  --anomaly_level [ANOMALY_LEVEL ...]
                        Specify the anomaly levels. Multiple inputs.
  --anomaly_num ANOMALY_NUM
                        Specify the anomaly num from nodes.
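
The --train_size help text says the value lies in [0.5, 1) and that the remainder is split equally between validation and testing. A minimal sketch of that arithmetic follows; the function name and the rounding behaviour are assumptions for illustration, and the scripts may round differently.

```python
def split_sizes(n_samples, train_size):
    """Return (n_train, n_val, n_test) for a train fraction in [0.5, 1).

    Hypothetical helper: the remainder is halved between validation and test,
    with any odd leftover sample going to the test set.
    """
    if not 0.5 <= train_size < 1.0:
        raise ValueError("train_size must be in [0.5, 1)")
    n_train = round(n_samples * train_size)
    remainder = n_samples - n_train
    n_val = remainder // 2
    n_test = remainder - n_val
    return n_train, n_val, n_test


# 1275 is the total sample count for 1000genome (250 + 575 + 250 + 200).
print(split_sizes(1275, 0.6))
```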

node-level anomaly detection

$ python demo_node_classification.py --help 
usage: demo_node_classification.py [-h]
                                   [--workflow {1000genome,nowcast-clustering-8,nowcast-clustering-16,wind-clustering-casa,wind-noclustering-casa,1000genome_new_2022,montage,predict_future_sales,casa-wind-full,all}]
                                   [--binary] [--gpu GPU] [--epoch EPOCH] [--hidden_size HIDDEN_SIZE] [--batch_size BATCH_SIZE] [--conv_blocks CONV_BLOCKS] [--train_size TRAIN_SIZE]
                                   [--lr LR] [--weight_decay WEIGHT_DECAY] [--dropout DROPOUT] [--feature_option FEATURE_OPTION] [--seed SEED] [--path PATH] [--log] [--logdir LOGDIR]
                                   [--force] [--balance] [--verbose] [--output] [--anomaly_cat ANOMALY_CAT] [--anomaly_level [ANOMALY_LEVEL ...]] [--anomaly_num ANOMALY_NUM]

options:
  -h, --help            show this help message and exit
  --workflow {1000genome,nowcast-clustering-8,nowcast-clustering-16,wind-clustering-casa,wind-noclustering-casa,1000genome_new_2022,montage,predict_future_sales,casa-wind-full,all}, -w {1000genome,nowcast-clustering-8,nowcast-clustering-16,wind-clustering-casa,wind-noclustering-casa,1000genome_new_2022,montage,predict_future_sales,casa-wind-full,all}
                        Name of workflow.
  --binary              Toggle binary classification.
  --gpu GPU             GPU id. `-1` for CPU only.
  --epoch EPOCH         Number of epoch in training.
  --hidden_size HIDDEN_SIZE
                        Hidden channel size.
  --batch_size BATCH_SIZE
                        Batch size.
  --conv_blocks CONV_BLOCKS
                        Number of convolutional blocks
  --train_size TRAIN_SIZE
                        Train size [0.5, 1). And equal split on validation and testing.
  --lr LR               Learning rate.
  --weight_decay WEIGHT_DECAY
                        Weight decay for Adam.
  --dropout DROPOUT     Dropout in neural networks.
  --feature_option FEATURE_OPTION
                        Feature option.
  --seed SEED           Fix the random seed. `-1` for no random seed.
  --path PATH, -p PATH  Specify the root path of file.
  --log                 Toggle to log the training
  --logdir LOGDIR       Specify the log directory.
  --force               To force reprocess datasets.
  --balance             Enforce the weighted loss function.
  --verbose, -v         Toggle for verbose output.
  --output, -o          Toggle for pickle output file.
  --anomaly_cat ANOMALY_CAT
                        Specify the anomaly set.
  --anomaly_level [ANOMALY_LEVEL ...]
                        Specify the anomaly levels. Multiple inputs.
  --anomaly_num ANOMALY_NUM
                        Specify the anomaly num from nodes.
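
To show how the options combine, the sketch below assembles one possible command line for the node-level script. It uses only flags listed in the help output above; the chosen values (100 epochs, seed 0, CPU only) are illustrative assumptions, not recommended settings, and build_command is a hypothetical helper.

```python
import shlex


def build_command(script, workflow, gpu=-1, epoch=None, seed=None, binary=False):
    """Assemble an argument list for one of the demo scripts."""
    cmd = ["python", script, "--workflow", workflow, "--gpu", str(gpu)]
    if epoch is not None:
        cmd += ["--epoch", str(epoch)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if binary:
        # --binary is a toggle, so it takes no value.
        cmd.append("--binary")
    return cmd


cmd = build_command(
    "demo_node_classification.py",
    workflow="1000genome",
    gpu=-1,       # -1 selects CPU only, per the help text
    epoch=100,    # illustrative value
    seed=0,       # illustrative value
    binary=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory that contains the script.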

Reference

@inproceedings{jin2022workflow,
  title={Workflow anomaly detection with graph neural networks},
  author={Jin, Hongwei and Raghavan, Krishnan and Papadimitriou, George and Wang, Cong and Mandal, Anirban and Krawczuk, Patrycja and Pottier, Lo{\"\i}c and Kiran, Mariam and Deelman, Ewa and Balaprakash, Prasanna},
  booktitle={2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS)},
  pages={35--42},
  year={2022},
  organization={IEEE}
}

@article{jin2023graph,
  title={Graph neural networks for detecting anomalies in scientific workflows},
  author={Jin, Hongwei and Raghavan, Krishnan and Papadimitriou, George and Wang, Cong and Mandal, Anirban and Kiran, Mariam and Deelman, Ewa and Balaprakash, Prasanna},
  journal={The International Journal of High Performance Computing Applications},
  volume={37},
  number={3-4},
  pages={394--411},
  year={2023},
  publisher={SAGE Publications Sage UK: London, England}
}
