How to execute workflows with BPS at CC-IN2P3
The pipetask command is limited to multiprocessing on a single node, and therefore does not allow for sufficient parallelization when a large number of tasks needs to be executed. BPS (Batch Production Service) is the LSST framework dedicated to multi-node processing. It can be used with several Workflow Management Systems, such as Parsl, which provides parallelism of Python code on distributed computing resources. Using BPS together with its Parsl plugin, LSST pipelines can easily be executed at large scale on the CC-IN2P3 Slurm farm.
Submitting a BPS run
General setup
First, create the following directories that will be used to store all files related to the BPS run:
export BPS_RUNDIR=/sps/lsst/users/${USER}/bps
mkdir -p ${BPS_RUNDIR}/bps_parsl_logs
mkdir -p ${BPS_RUNDIR}/submit
Place your BPS configuration file in ${BPS_RUNDIR}/bps_config.yaml. See How to configure your BPS run.
Then cd into your BPS directory:
cd ${BPS_RUNDIR}
Source the environment:
source /pbs/throng/lsst/software/bps-parsl/prod/setup.sh
And submit your run:
bps_submit.sh --interactive bps_config.yaml
Hint
The --interactive option runs BPS interactively; without this option, the run is executed inside a Slurm job, which can be preferable for long runs, as in the example below.
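For instance, to run the same submission from within a Slurm job, simply omit the flag:
bps_submit.sh bps_config.yaml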
More details on using the submission script can be obtained with bps_submit.sh --help.
How to configure your BPS run
The processing campaign to be launched and the configuration to use must be described in a YAML file. An example of such a file with explanatory comments is provided here: /pbs/throng/lsst/software/bps-parsl/prod/examples/ci_hsc_isr.yaml. In particular, the most important items to be modified are listed below (a minimal sketch follows the list):
- release: version of the stack to be used for the processing jobs, for instance w_2024_09 (see How to use the LSST Science Pipelines)
- pipelineYaml: pipeline to be executed
- project, campaign and payloadName: description of the processing, used for the creation of directories and collections
- butlerConfig: the butler repository
- inCollection: collection where input data can be found
- dataQuery: input data selection
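As an illustration, a minimal configuration could look like the following sketch. All values are placeholders, and the exact layout (e.g., whether the payload items are nested under a payload section) should be taken from the ci_hsc_isr.yaml example:
release: w_2024_09
pipelineYaml: ${DRP_PIPE_DIR}/pipelines/HSC/DRP-ci_hsc.yaml  # placeholder pipeline
project: dev
campaign: test
payload:
  payloadName: my_test_run
  butlerConfig: /sps/lsst/path/to/butler.yaml  # placeholder repository
  inCollection: HSC/defaults                   # placeholder input collection
  dataQuery: instrument='HSC' AND exposure=903342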
More details can be found in the BPS documentation.
Accessing information about the run
Once the run has been submitted, the BPS and Parsl logs are written in the ${BPS_RUNDIR}/bps_parsl_logs directory. The ${BPS_RUNDIR}/submit directory contains one subdirectory for each run, containing the task logs.
Usual Slurm commands can also be used to obtain information on the jobs. And if needed, the ${BPS_RUNDIR}/parsl_runinfo directory contains more detailed information provided by Parsl.
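For example, standard Slurm commands can be used to monitor queued jobs or inspect finished ones:
squeue -u $USER                   # list your currently queued and running jobs
sacct -u $USER --starttime today  # accounting information for today's jobs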
Advanced configuration
Task resources specification
CPU cores and memory requirements of each task can be specified in the BPS YAML file, either directly or by including a dedicated file:
includeConfigs:
- requestMemory.yaml
The memory needed to run each task can be specified using the following syntax:
pipetask:
  task1:
    requestMemory: 4096  # memory in MB
  task2:
    requestMemory: 8192
The Parsl plugin for BPS will then ensure that each task runs in a job that provides enough memory. Note that in the case of Clustering, the maximum memory requirement over all tasks in the cluster is used.
Hint
The CC-IN2P3 configuration for the BPS Parsl plugin currently does not support multicore tasks. It is generally advised to run multiple parallel single-core tasks.
Clustering
The BPS clustering feature allows gathering several tasks into a single pipetask command execution. This is particularly useful to reduce the number of separately executed tasks and improve scalability, especially when their execution time is short.
Clustering can be enabled in the BPS YAML file, either directly or by including a dedicated file:
includeConfigs:
- clustering.yaml
The general syntax is the following:
clusterAlgorithm: lsst.ctrl.bps.quantum_clustering_funcs.dimension_clustering
cluster:
  clusterLabel1:
    pipetasks: label1, label2    # comma-separated list of labels
    dimensions: dim1, dim2       # comma-separated list of dimensions
    equalDimensions: dim1:dim1a  # e.g., visit:exposure
Multiple clusters can be defined in a single file. Similar tasks running on multiple datasets can be clustered together. For example, one can gather the isr tasks on all the detectors for each visit:
clusterAlgorithm: lsst.ctrl.bps.quantum_clustering_funcs.dimension_clustering
cluster:
  visit_isr:
    pipetasks: isr
    dimensions: visit
    equalDimensions: visit:exposure
Different tasks can also be clustered together, especially when they depend on each other and are run sequentially in the workflow. For example, one can gather the isr task with its downstream tasks, for each visit and detector:
clusterAlgorithm: lsst.ctrl.bps.quantum_clustering_funcs.dimension_clustering
cluster:
  visit_step1:
    pipetasks: isr,characterizeImage,calibrate,writePreSourceTable,transformPreSourceTable
    dimensions: visit,detector
    equalDimensions: visit:exposure
Hint
It is recommended to execute BPS runs that do not exceed ~100,000 tasks. If needed, in addition to clustering, runs can be split into multiple smaller runs using the dataQuery, as in the sketch below.
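For instance, a large campaign could be split by visit ranges; the range values below are hypothetical, and the exact query depends on your data:
dataQuery: instrument='HSC' AND visit IN (1000..1999)  # first run
dataQuery: instrument='HSC' AND visit IN (2000..2999)  # second run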
More details on clustering can be found in the dedicated section of the BPS documentation.
How to configure your Parsl jobs
By default, Parsl will launch 4 types of jobs: “small”, “medium”, “large” and “xlarge”. Each of these job types has a different amount of memory. Each task will run on the job type that is closest to its memory requirement (see Task resources specification). If needed, these jobs can be configured in the site.ccin2p3 section of the BPS YAML file; see the ccin2p3 class documentation for more details.
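As a purely illustrative sketch, such a section might look like the following; both the class path and the option name shown here are assumptions, so the actual parameters should be taken from the ccin2p3 class documentation:
site:
  ccin2p3:
    class: lsst.ctrl.bps.parsl.sites.Ccin2p3  # assumed class path, check the plugin docs
    walltime: "02:00:00"                      # assumed option name, check the class docs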
Hint
In order to optimize the usage of computing resources, it is important to submit jobs with memory allocations as close as possible to the tasks' requirements.
Example: ci_hsc processing
In this tutorial we will use the ci_hsc scripts to process HSC test data with the isr task using BPS. The required scripts are in the ci_hsc_gen3 repository, and the corresponding input set of data is provided by the testdata_ci_hsc repository.
Create a directory dedicated to this tutorial, and clone the required repositories:
export BPS_RUNDIR=/sps/lsst/users/${USER}/test_ci_hsc_bps
mkdir -p ${BPS_RUNDIR}/bps_parsl_logs
mkdir -p ${BPS_RUNDIR}/submit
cd ${BPS_RUNDIR}
git clone git@github.com:lsst/ci_hsc_gen3.git
git clone git@github.com:lsst/testdata_ci_hsc.git
After a successful clone, the testdata_ci_hsc directory should contain 2200 files and 1400 directories, for a total volume of 8 GB.
Set up the lsst_distrib version v27.0.0 and the EUPS packages testdata_ci_hsc and ci_hsc_gen3 we cloned above:
source /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/v27.0.0/loadLSST.bash
setup lsst_distrib
setup --root testdata_ci_hsc
setup --keep --root ci_hsc_gen3
Prepare a PostgreSQL database schema for this butler repo:
psql --host=ccpglsstdev.in2p3.fr --port=6553 --username=$(whoami) --dbname=$(whoami)
-- then, at the psql prompt:
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE SCHEMA ci_hsc_bps;
Create a butler seed butler-seed.yaml with the following contents (replace <login> with your username):
registry:
  db: postgresql://ccpglsstdev.in2p3.fr:6553/<login>
  namespace: ci_hsc_bps
datastore:
  name: ci_hsc_bps
Create the butler repository and ingest data into it:
cd ${BPS_RUNDIR}/ci_hsc_gen3
scons --butler-config=${BPS_RUNDIR}/butler-seed.yaml ingest
The butler repository is created in ${BPS_RUNDIR}/ci_hsc_gen3/DATA and ingestion should take a few minutes.
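To check that the repository was correctly created, you can for instance list its collections (a quick sanity check; the collections you will see depend on the ingestion):
butler query-collections ${BPS_RUNDIR}/ci_hsc_gen3/DATA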
Copy the BPS configuration file from /pbs/throng/lsst/software/bps-parsl/prod/examples/ci_hsc_isr.yaml, then set up the environment and launch bps_submit.sh:
cd ${BPS_RUNDIR}
cp /pbs/throng/lsst/software/bps-parsl/prod/examples/ci_hsc_isr.yaml .
source /pbs/throng/lsst/software/bps-parsl/prod/setup.sh
bps_submit.sh --interactive ci_hsc_isr.yaml
bps_submit.sh will launch BPS, which will generate a Quantum Graph and execute the corresponding tasks on the Slurm farm using Parsl. You may see Slurm jobs submitted by bps_submit.sh on your behalf via squeue -u $USER. Execution should complete after a few minutes or more, depending on the availability of computing resources. Messages produced by bps_submit.sh are of the form:
2024-10-11T11:46:16+02:00 bps_runner.sh: execution of workflow "/sps/lsst/users/login/test_ci_hsc_bps/ci_hsc_isr.yaml" took 00:02:22
2024-10-11T11:46:16+02:00 bps_runner.sh: exit code: 0
The log file of the bps_submit.sh execution is under the ${BPS_RUNDIR}/bps_parsl_logs directory. Details of the execution of pipetasks, including task logs, can be found under ${BPS_RUNDIR}/submit/run/ci_hsc, as specified in the workflow configuration file (entry submitPath).
After running this tutorial, the butler repository can be deleted, as well as the database registry.
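A minimal cleanup sketch, assuming the paths and schema name used above (double-check before deleting anything):
rm -rf ${BPS_RUNDIR}/ci_hsc_gen3/DATA
psql --host=ccpglsstdev.in2p3.fr --port=6553 --username=$(whoami) --dbname=$(whoami) -c "DROP SCHEMA ci_hsc_bps CASCADE;"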