##########################
## Secondary Vertex GNN ##
##########################

-----------------------------------------------------------------------------

Prerequisites:

Make sure you have the following packages installed before attempting to run the GNN scripts:

-> Python (min version 3.6)
-> ROOT (min version 6.18.0)
-> PyTorch (min version 1.4.0)
-> DGL (min version 0.5.3) - https://docs.dgl.ai/en/latest/
-> h5py (min version 2.10.0)

Running the GNN might work with older versions of some of these, but this has not been tested (except for the fact that Python 3.6 is required by DGL). ROOT and h5py are used for data processing and plotting, while DGL (using PyTorch as a backend) is the core package with which the network is built. All of these are fairly standard, except for DGL, for which installation instructions can be found on the listed website.

-----------------------------------------------------------------------------

Running the GNN:

In order to generate input files, a modified version of the "FlavourTagPerformanceFramework" is used (which can be found at https://gitlab.cern.ch/jmwagner/FlavourTagPerformanceFramework). This produces ROOT ntuples from DAODs that are directly compatible with the GNN. Further instructions on how to generate these ntuples can be found in the README of that repository.

Once all desired ntuples are generated, running the GNN is relatively straightforward. The whole pipeline consists of two steps: data processing and training/evaluation. These are run successively with the "run_processing.sh" and "run_GNN.sh" scripts. All variable arguments are set either within these two bash scripts or within the "scripts/options.py" file. The bash scripts contain the paths to input/output data as well as some higher-level options that affect the pipeline in general (such as whether or not the input data should be normalized, etc.). The Python script, on the other hand, contains a large number of smaller options that modify the behavior of the GNN; this is where hyperparameters and cuts are set, as well as some plotting options. Beyond these three places, there should be no need to edit any other scripts except to add features or to modify GNN behavior more significantly. All scripts are intended to be as modular and flexible as possible in order to ensure ease of use.

-----------------------------------------------------------------------------

Description of individual scripts:

PROCESSING

o) process_ntuple.py: Processes a single ROOT ntuple, extracts the information relevant to the GNN and saves it in the form of HDF5 files. Track information is saved directly, while truth particle information is used within the script to derive further quantities (such as track labels). Also performs cuts on the data (cut jets are removed entirely, cut tracks are marked but kept for plotting purposes).

o) create_graphs.py: Takes a single HDF5 file created with process_ntuple.py and turns it into a DGL-compatible graph file that can be used for training directly. Also saves a text file containing some values derived from the data that are used further down the line.

o) combine_graphs.py: Imports and shuffles graphs from multiple DGL files and splits them into training, testing and validation sets, which are all saved in separate files. Note: because of this, the script is required even if only a single ntuple is used to generate a dataset.

o) norm_graphs.py: This script is OPTIONAL. It normalizes all input features for the GNN in the training, testing and validation files by subtracting the mean of each feature and dividing by its standard deviation, with both quantities calculated from the training data only (a short sketch of this is given after the PROCESSING list). Some features are normalized slightly differently (details are in the script).

o) prune_graphs.py: Final script to be run before GNN training. Fully removes all tracks marked as cut from the graphs in the training, testing and validation files and writes a text file containing some parameters of the training data relevant for GNN training (such as the proportion of true to false edges).

o) truth_functions.py: Contains functions that help with processing truth information (such as generating dictionaries of truth particles/tracks). Used extensively in process_ntuple.py.
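
The following is a minimal sketch of the standardization described for norm_graphs.py above, assuming the features are handled as simple per-track arrays. The function names, array shapes and placeholder data are purely illustrative; the actual feature handling is defined in the script itself.

    import numpy as np

    def fit_standardizer(train_features):
        # per-feature mean and standard deviation from the training set only
        mean = train_features.mean(axis=0)
        std = train_features.std(axis=0)
        std[std == 0.0] = 1.0  # guard against constant features
        return mean, std

    def apply_standardizer(features, mean, std):
        # the same transformation is applied to training, testing and validation data
        return (features - mean) / std

    # toy usage with random placeholder data (rows = tracks, columns = features)
    train = np.random.rand(1000, 8)
    test = np.random.rand(200, 8)
    mean, std = fit_standardizer(train)
    train_norm = apply_standardizer(train, mean, std)
    test_norm = apply_standardizer(test, mean, std)
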
GNN TRAINING

o) GNN_main.py: Contains the core training and testing loops of the GNN. Uses DGL graph files as input and generates modified DGL graph files containing the training results, which are used in the other evaluation scripts. Also calculates simple GNN performance metrics and generates some rudimentary performance plots.

o) GNN_model.py: Defines the actual GNN model used in GNN_main.py.

o) compare_performance.py: Turns GNN output labels into actual secondary vertices by associating tracks based on their edge scores via a deterministic algorithm. Based on these secondary vertices, the script then performs comparisons with SV1 (a different SV reconstruction algorithm) and generates a large number of plots. Note: this script is only written to work with the GNN in binary classification mode.

o) GNN_eval.py: Contains functions used to evaluate GNN performance and reconstruct secondary vertices (used in compare_performance.py).

PLOTTING

o) plot_data.py: Run automatically after data processing. Generates plots showing the distribution of input features in the generated dataset, categorized by the truth label of each track. Also makes plots showing the effects of cuts on the data.

o) plot_results.py: Run automatically after GNN training. Generates plots comparing the feature distributions of jets on which the GNN performed poorly with those of the rest of the dataset, again categorized by the truth label of each track.

o) plot_functions.py: Contains functions to generate consistent plots of several types. Used in all other plotting scripts.

OTHER SCRIPTS

o) options.py: Defines global variables used across all scripts.

-----------------------------------------------------------------------------

written by Johannes Wagner ([email protected])
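
As a reference for working with the graph files described above, here is a minimal sketch for inspecting one of them, assuming the files are written with DGL's standard save_graphs/load_graphs utilities. The file path is a placeholder; the actual file names and feature keys are set by the scripts in this repository.

    import dgl

    # load a processed graph file (placeholder path)
    graphs, _ = dgl.load_graphs("path/to/train_graphs.bin")

    g = graphs[0]
    print("graphs (jets) in file:", len(graphs))
    print("nodes (tracks):", g.num_nodes(), " edges:", g.num_edges())
    print("node feature keys:", list(g.ndata.keys()))
    print("edge feature keys:", list(g.edata.keys()))
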