Remote Machine Configuration
This document briefly details how user/developer can set up a remote machine on FabSim3 for job submission.
The aim of FabSim3 is to automate and simplify the creation, management, execution and modification of complex application workflows using functionalities, such as ensemble runs, remote executions and code couplings. This helps researchers to become more productive in general, and more effective with handling complex applications in particular. For instance, researchers may want to run and rerun static configurations, run a range of slightly different workflows, or to define new types of complex applications altogether.
FabSim3 is built on top of the Python Fabric library, and therefore shares automation and remote access features with Fabric (http://www.fabfile.org) and other system-level automation tools, such as Ansible (http://www.ansible.com) and Homebrew (http://brew.sh).
FabSim3 Remote Job Management Command¶
To submit a job on the remote machine, type
fab <remote machine name> <fabsim task>:<input parameters>
- create required files and folder on your local machine
- transfer all these files and folder to remote machine
- submit a job to the remote machine
stat: returns a report for all submitted job(s).By default it reports the status of jobs. You can also request a report status of specific job by
fab <remote machine name> job_stat
fab <remote machine name> job_stat:<jobID>
You can fetch the remote data using
fab <remote machine name> fetch_results
How to add new remote a machine configuration¶
By default, a FabSim3 install will contain 3 machine-specific YML files:
FabSim3/fabsim/deploy/machines.yml, which shows default values for machine-specific variables for each machine.
FabSim3/fabsim/deploy/machines_user.yml, which shows user-specific values for machine-specific variables for each machine. This may include the name of the user account. This file is not committed to the repository.
FabSim3/fabsim/deploy/machines_user_example.yml, a basic example of
Adding a new machine to be used by the community¶
To add a new machine definition, to be used by the community, one can add a corresponding entry to the
machines.yml file. For example, for the Prometheus supercomputer in Cracow, Poland, this could be done as follow:
prometheus: max_job_name_chars: 15 job_dispatch: "sbatch" stat: "squeue" run_command: "mpiexec -n $cores" batch_header: slurm-prometheus remote: "prometheus.cyfronet.pl" home_path_template: "/net/people/$username" runtime_path_template: "/net/scratch/people/$username" modules: ["load apps/lammps"] temp_path_template: "$work_path/tmp" queue: "plgrid" python_build: "python2.7" corespernode: 24
We recommend commiting these type of definitions to the GitHub repository of FabSim3, unless the machine is non-public.
Adding user information for an existing machine¶
User information for an existing machine can be added in
machines_user.yml. Any variable set here will supersede the value in
machines.yml. For example, for the ARCHER supercomputer, one can enter user information as follows:
archer: username: "ucljames" project: "e283" budget: "e283-suter" lammps_exec: "/home/e283/e283/ucljames/lmp_xc30" # custom variable overwrite
Changing connectivity settings for specific machines¶
Please note that some connectivity settings are not explicitly exposed as FabSim3 environment variables, but are present in the
env through the original fabric environment variables. An example of such a variable is port, which indicates the port that any SSH connection would rely on.
A full list of fabric
env variables can be found on
Adding shortened commands for specific machines¶
In FabSim3, it is possible to introduce a shortened alias. For instance, you can define a
fab command to use in place of
fab eagle_vecma. Such aliases can help speed up the typing of interactive commands.
To define an alias, simply type
fabsim <remote machine name> bash_machine_alias:name=<name_of_alias>
fabsim eagle_vecma bash_machine_alias:name=feh
FabSim3/fabsim/binand cannot be named
fabsim, as that would break the main
This has to be done for every user, as different people have different existing shell commands, and we want to avoid accidental conflicts.
QCG Pilot Job Manager¶
A Pilot Job is a container for many subjobs that can be started and managed without having to wait individually for resources to become available.
The QCG PilotJob mechanism provides two interfaces that may be used interchangeably. The first one allows to specify a file with the description of sub-jobs and execute the scenario in a batch-like mode, conveniently supporting static scenarios. The second interface is offered with the REST API and it can be accessed remotely in a more dynamic way. It will be used to support scenarios where a number of replicas and their complexity dynamically changes at application runtime.
Within FabSim3, you can install this python library or any other python library required for your application as a python virtual environment in your account.
QCG-PJ installation on your remote machine¶
To install QCG-PJ on your remote machine, simply type:
fabsim <remote machine name> install_app:QCG-PilotJob,venv=True
venv=Trueensures that you are using the VirtualEnv virtual environment. If you choose to omit the flag, you may have to face various errors and complications that accompany direct deployment in the default remote machine environment.
All required packages are downloaded and transferred to the remote machine. Therefore, even if the compute nodes do not have an access to Internet, the installation part will be done in off-line mode
Python packages installation on your remote machine¶
If you application requires a python package, which is not available on pre-installed packages on your remote machine, you can install it as your local virtual environment. To do that, please specify the package name in
packages: - matplotlib
fabsim <remote machine name> install_packages:venv=True