Scripts to ease the reduction of MC data on the LST cluster at La Palma. With this package, the analysis/creation of R1/DL0/DL1/DL2/IRFs can be orchestrated.
Contact: Thomas Vuillaume, thomas.vuillaume [at]; Enrique Garcia, garcia [at]; Lukas Nickel, lukas.nickel [at]
If lstMCpipe was used for your analysis, please cite:
@misc{garcia2022lstmcpipe,
  title={The lstMCpipe library},
  author={Enrique Garcia and Thomas Vuillaume and Lukas Nickel},
  year={2022},
  eprint={2212.00120},
  archivePrefix={arXiv},
  primaryClass={astro-ph.IM}
}
in addition to the exact lstMCpipe version used from
You may also want to include the config file with your published code for reproducibility.
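To record the exact version for a citation or a reproducibility note, one option is to query the metadata of the installed package. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of lstMCpipe.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the active environment.
        return None


# Report the lstmcpipe version found in the active environment, if any.
print("lstmcpipe version:", installed_version("lstmcpipe"))
```

Running this inside the environment used for the analysis gives the version string to quote next to the citation.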
As a user:
For lstmcpipe >= 0.10.3, the preferred installation method is conda:
conda install lstmcpipe
Former versions:
VERSION=0.10.1  # change as desired
wget$VERSION/environment.yml
conda env create -f environment.yml
conda activate lstmcpipe
pip install lstmcpipe==$VERSION
This will set up a new environment with lstchain and other needed tools available in supported versions. If you already have your lstchain conda environment, you may simply activate it and install lstmcpipe there using pip install lstmcpipe.
HIPERTA (referred to as rta in the following) support is built in, but no installation instructions can be provided as of now.
Alternatively, you can install lstmcpipe in your own environment to use different versions of the analysis pipelines. WARNING: Due to changing APIs and data models, we cannot support versions other than the ones specified in the environment.
As a developer:
git clone
cd lstmcpipe
conda env create -n lstmcpipe_dev -f environment.yml
conda activate lstmcpipe_dev
pip install -e .
pre-commit install
This will set up a pre-commit hook: provided that you are in the right environment, it will run black and format the files you are about to commit. (You need to stage the changes again after that.) This ensures that the code formatting follows our guidelines and reduces the work of dealing with the code checker in the CI.
You may find the list of already run productions in the documentation. Please check in this list that the request you are about to make does not exist already!
As an LST member, you may require an MC analysis with a specific configuration, for example to later analyse a specific source with tuned MC parameters.
To do so, please:
- Make sure to be part of the GitHub cta-observatory/lst-dev team. If not, ask one of the admins.
  Note that you can also fork the repository and open the pull request from your fork, but the tests will fail because they need the private LST test data.
- Clone the repository in the cluster at La Palma.
- Create a new branch named with you
- Make a new directory named
in the production_configs dir (have a look at the production_configs/template_prod as an example)
- Generate your config (see below)
- Open a pull request into lstMCpipe with a clear description (probably the same as in the readme of your config dir)
- The requested config must contain:
- a lstchain config file (please provide an exhaustive config; it helps others and gives more explicit provenance information)
- a lstmcpipe config file (to generate it, please refer to the documentation)
- a readme with a short description of why you require this analysis to be run. Do not add information that should not appear publicly (such as source names) here. If you are requesting a production for a specific new source, please edit this table on the LST wiki. Also add the command line used to generate the lstmcpipe config, as this helps with debugging.
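Before opening the pull request, it can help to check locally that the config directory holds the three required items. The following is a minimal sketch only: the helper `missing_items` is hypothetical, and the file patterns (`*.json` for the lstchain config, `*.yml` or `*.yaml` for the lstmcpipe config, `readme*` for the readme) are assumptions, not naming rules enforced by lstMCpipe.

```python
from pathlib import Path
from typing import List

# Assumed glob patterns for each required item (not an official naming rule).
REQUIRED_PATTERNS = {
    "lstchain config": ("*.json",),
    "lstmcpipe config": ("*.yml", "*.yaml"),
    "readme": ("readme*", "README*"),
}


def missing_items(config_dir: Path) -> List[str]:
    """Return the names of required items with no matching file in config_dir."""
    missing = []
    for name, patterns in REQUIRED_PATTERNS.items():
        # An item counts as present if at least one pattern matches a file.
        found = any(any(config_dir.glob(pattern)) for pattern in patterns)
        if not found:
            missing.append(name)
    return missing
```

An empty list means that all three items were found; the continuous integration tests remain the actual validity check.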
The proposed configuration will be tested for validity by continuous integration tests and we will interact with you to run the analysis on the cluster at La Palma.
Depending on the number of requests, we may give priorities.
Need help?
Join the CTA North slack and ask for help in the
To generate your lstmcpipe configuration file, use the lstmcpipe_generate_config command. If the type of production you want is not listed among the existing ones, you may create your own PathConfig class from an existing one, or generate a config from an existing prod type and edit the file manually.
Once you have your configuration file, you may launch the pipeline with the stages described in the config using:
lstmcpipe -c config_MC_prod.yml -conf_lst lstchain_*.json [-conf_cta CONFIG_FILE_CTA] [-conf_rta CONFIG_FILE_RTA] [--debug] [--log-file LOG_FILE]
This script is the orchestrator of the pipeline: it schedules the stages specified in the configuration file passed with -c.
All the configuration related to the MC pipe must be declared in this file (stages, particles to be analysed, zenith, pointing, type of MC production...).
Pipeline-specific configuration options (such as cleaning or model parameters) are declared in a different configuration file, which is passed via the options -conf_lst/-conf_cta/-conf_rta.
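When the pipeline is launched from another script, the command line can be assembled programmatically. The sketch below builds the argument list from the options in the usage line; the function name `build_lstmcpipe_command` and its parameter names are hypothetical, and only the flags themselves come from this README.

```python
import shlex
from typing import List, Optional


def build_lstmcpipe_command(
    mc_config: str,
    lstchain_config: str,
    ctapipe_config: Optional[str] = None,
    rta_config: Optional[str] = None,
    debug: bool = False,
    log_file: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for an lstmcpipe call."""
    cmd = ["lstmcpipe", "-c", mc_config, "-conf_lst", lstchain_config]
    # Optional pipeline-specific configs are only added when provided.
    if ctapipe_config is not None:
        cmd += ["-conf_cta", ctapipe_config]
    if rta_config is not None:
        cmd += ["-conf_rta", rta_config]
    if debug:
        cmd.append("--debug")
    if log_file is not None:
        cmd += ["--log-file", log_file]
    return cmd


# Show the command as a shell-quoted string without running it.
print(shlex.join(build_lstmcpipe_command("config_MC_prod.yml", "lstchain_config.json")))
```

The resulting list can be passed to `subprocess.run` as is; keeping the arguments as a list avoids shell quoting problems with unusual file names.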
Note: You can always launch this command without fear; there is an intermediate step that verifies and shows the configuration that you are passing to the pipeline.
The use of Slurm job arrays in the r0_to_dl1 stage, combined with a limit on the number of jobs running at the same time, reduces the load on the cluster compared to previous versions. Please note, however, that processing a full MC production still requires a lot of resources. Think about other LP-IT cluster users.
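Slurm expresses a limit on simultaneously running array tasks with the `%` separator of the `--array` option: `--array=0-99%10` defines 100 tasks, of which at most 10 run at once. The sketch below only builds that option string; the function name is hypothetical, and how lstmcpipe sets the limit internally is not shown here.

```python
def array_option(n_tasks: int, max_running: int) -> str:
    """Build a Slurm --array option for n_tasks tasks, max_running at a time."""
    if n_tasks < 1 or max_running < 1:
        raise ValueError("n_tasks and max_running must both be at least 1")
    # Task indices run from 0 to n_tasks - 1; '%' caps the concurrent tasks.
    return f"--array=0-{n_tasks - 1}%{max_running}"
```

A lower cap leaves more room for other cluster users at the cost of a longer total processing time.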
After the pipeline is launched, all selected tasks are performed in order.
These are referred to as stages and are collected in lstmcpipe/stages.
The following is a short overview of each stage that can be specified in the configuration.
In this stage simtel files are processed up to data level 1 and separated into files for training and for testing.
For efficiency reasons, files are processed in batches: N files (depending on particle type, as that influences the average duration of the processing) are submitted as one job in a job array.
To group the files together, the paths are saved in files that are passed to Python scripts in lstmcpipe/scripts, which then call the selected pipeline's processing tool. These are:
- lstchain: lstchain_mc_r0_to_dl1
- ctapipe: ctapipe-stage1
- rta: lstmcpipe_hiperta_r0_to_dl1lstchain
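The batching described above, where the paths of N input files are written to one list file per job, can be sketched as follows. This is an illustration under stated assumptions, not the lstmcpipe implementation: the function name `write_file_lists` and the list-file naming scheme are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def write_file_lists(paths: Sequence[str], batch_size: int, out_dir: Path) -> List[Path]:
    """Write one text file per batch, each listing up to batch_size input paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    list_files = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # Hypothetical naming scheme: one numbered list file per array task.
        list_file = out_dir / f"file_list_{start // batch_size:04d}.txt"
        list_file.write_text("\n".join(batch) + "\n")
        list_files.append(list_file)
    return list_files
```

Each list file then corresponds to one task of the job array, so the number of list files equals the number of array tasks.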
As an alternative to the processing of simtel r0 files, existing dl1 files can be reprocessed. This can be useful to apply different cleanings or alter the images by adding noise etc.
For this to work, the old files have to contain images, i.e. they need to have been processed using the no_image: False flag in the config.
The config key dl1_reference_id is used to determine the input files. Its value needs to be the full prod_id including software versions (i.e. the name of the directories directly above the dl1 files).
For lstchain the dl1ab script is used; ctapipe can use the same script as for simtel processing. There is no support for hiperta!
In this stage the previously created dl1 files are merged so that you end up with train and test datasets for the next stages.
Split the dataset into training and testing datasets, performing a random selection of files with the specified ratio (default=0.5).
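A random file-level split with a given ratio can be sketched as below. The function name `split_train_test` is hypothetical, and two details are assumptions rather than documented behaviour: that the ratio is the fraction of files assigned to training, and that a seed is used for reproducibility.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_train_test(
    files: Sequence[str], train_ratio: float = 0.5, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Randomly split files into (train, test) lists using train_ratio."""
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(files)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]
```

Splitting by file rather than by event keeps the two datasets disjoint, since every file ends up in exactly one of the lists.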
IMPORTANT: From here on out only lstchain
tools are available. More about that at the end.
In this stage the models to reconstruct the primary particle's properties are trained on the gamma-diffuse and proton train data.
At present this means that random forests are created using lstchains
Models will be stored in the models
The previously trained models are evaluated on the merged dl1 files using lstchain_dl1_to_dl2 from the lstchain package.
DL2 data can be found in DL2
Point-like IRFs are produced for each set of offset gammas.
The processing is performed by calling lstchain_create_irf_files
A sensitivity curve is estimated using a script based on pyirf which performs a cut optimisation
similar to EventDisplay.
The script can be found in lstmcpipe/scripts/
This does not use the IRFs and cuts computed in dl2_to_irfs, so the result cannot be compared to observed data.
It is a mere benchmark for the pipeline.
NOTE: lstmcpipe expects the data to be located in a specific structure on the cluster. Output will be written in a standardized way next to the input data to make sure everyone can access it. Analysing a custom dataset requires replicating parts of the directory structure and is not the intended use case for this package.
All the r0_to_dl1 stage job logs are stored in /fefs/aswg/data/mc/running_analysis/.../job_logs and later moved to /fefs/aswg/data/mc/analysis_logs/.../
Every time a full MC production is launched, two files with logging information are created:
The first one contains a reduced summary of all the scheduled job ids (and the particle each job corresponds to), while the second one contains the same information plus all the commands passed to slurm.
The directory structure and the stages to run are determined by the config stages. After that, the job dependency between stages is done automatically.
- If the full workflow is launched, directories will not be verified as containing data. Overwriting will only happen when an MC prod sharing the same
and analysed the same day is run.
- If each step is launched independently (advanced users), no directory will be overwritten without prior confirmation from the user.
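The automatic job dependency between stages mentioned above maps naturally onto Slurm's `--dependency` option, which takes a condition followed by colon-separated job ids. The sketch below only formats that option; the function name is hypothetical, and the choice of the `afterok` condition is an assumption, not a statement of what lstmcpipe passes to Slurm.

```python
from typing import Iterable


def dependency_option(job_ids: Iterable[int], condition: str = "afterok") -> str:
    """Build a Slurm --dependency option from the job ids of the previous stage."""
    ids = [str(job_id) for job_id in job_ids]
    if not ids:
        # The first stage has nothing to wait for.
        return ""
    return f"--dependency={condition}:" + ":".join(ids)
```

With `afterok`, a job starts only after all the listed jobs have finished successfully, which is what makes one stage wait for the previous one.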
Example of default directory structure for a prod5 MC prod:
/fefs/aswg/data/
 ├── mc/
 |   ├── DL0/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
 |   |   └── simtel files
 |   |
 |   ├── running_analysis/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
 |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
 |   |       └── temporary dir for r0_to_dl1 + merging stages
 |   |
 |   ├── analysis_logs/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
 |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
 |   |       ├── file_lists_training/
 |   |       ├── file_lists_testing/
 |   |       └── job_logs/
 |   |
 |   ├── DL1/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
 |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
 |   |       ├── dl1 files
 |   |       ├── training/
 |   |       └── testing/
 |   |
 |   ├── DL2/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
 |   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
 |   |       └── dl2 files
 |   |
 |   └── IRF/20200629_prod5_trans_80/zenith_20deg/south_pointing/
 |       └── YYYYMMDD_v{lstchain}_{prod_id}/
 |           ├── off0.0deg/
 |           ├── off0.4deg/
 |           └── diffuse/
 |
 └── models/
     └── 20200629_prod5_trans_80/zenith_20deg/south_pointing/
         └── YYYYMMDD_v{lstchain}_{prod_id}/
             ├── reg_energy.sav
             ├── reg_disp_vector.sav
             └── cls_gh.sav
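The per-particle directories in this layout follow one pattern, which the sketch below reproduces for illustration. The function `particle_dir` and its parameter names are hypothetical, and the date, lstchain version and prod_id in the usage lines are placeholders, not values of a real production.

```python
from pathlib import PurePosixPath

# Base directory of the MC data on the cluster, as in the layout above.
MC_BASE = PurePosixPath("/fefs/aswg/data/mc")


def particle_dir(
    data_level: str,
    prod: str,
    particle: str,
    zenith: str,
    pointing: str,
    date: str,
    lstchain_version: str,
    prod_id: str,
) -> PurePosixPath:
    """Build a per-particle directory such as DL1/<prod>/<particle>/.../<date>_v<version>_<prod_id>."""
    analysis_dir = f"{date}_v{lstchain_version}_{prod_id}"
    return MC_BASE / data_level / prod / particle / zenith / pointing / analysis_dir


# Placeholder values, chosen only to show the resulting shape of the path.
example = particle_dir(
    "DL1", "20200629_prod5_trans_80", "proton",
    "zenith_20deg", "south_pointing", "20210101", "0.7.0", "my_prod",
)
print(example)
```

`PurePosixPath` is used so that the path can be built on any machine without touching the file system. The IRF and models directories omit the particle level and would need a separate helper.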
Real data analysis is not supposed to be supported by these scripts. Use at your own risk.
So far the reference pipeline is lstchain, and a full analysis is only possible with it. There is, however, support for ctapipe and hiperta as well. The processing up to dl1 is relatively agnostic of the pipeline; working implementations exist for all of them.
In the case of hiperta, a custom script converts the dl1 output to lstchain-compatible files, and the later stages run using lstchain.
In the case of ctapipe, dl1 files can be produced using ctapipe-stage1. Once the dependency issues are solved and ctapipe 0.12 is released, this will most likely switch to using ctapipe-process. We currently have no plans to support older versions longer than necessary. Because the files are not compatible with lstchain and there is no support for higher data levels in ctapipe yet, it is not possible to use any of the following stages. This might change in the future.