PSC Usage
Step-by-step guide to a successful ssh login
- See the step-by-step guide with screenshots (linked in the Misc. resources table).
- Create an account for ACCESS.
    - This account is used for both PSC and Delta.
    - When you create the account, choose "Register with an existing identity". Do not choose "Register without an existing identity".
- Send the username to the allocation managers (e.g., Xuankai) to add you to our group.
    - After this step is done, you should be able to see the list of resources at https://allocations.access-ci.org. To see the list of resources, log in with the identity provider "Carnegie Mellon University".
- Initialise your PSC password (used for ssh login).
    - Go to https://www.psc.edu/resources/bridges-2/user-guide-2-2/ and click "PSC Password Change Utility".
    - It may take a few hours for your username and email to be recognised, even if they are correct.
- Access via ssh.
    - `ssh [username]@[resource_dir]`, e.g., `jjung1@bridges2.psc.edu`
    - Use the password you initialised in the previous step.
Important
- The home directory has limited space. Please do most of your work in ocean storage (`cd ${PROJECT}`).
- When you publish a paper, please acknowledge the PSC and ACCESS. This will benefit us when we apply for PSC credits next time.
    - Example: Experiments of this work used the Bridges2 system at PSC through allocations CIS210014 and IRI120008P from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
    - Add the following references:

```
@ARTICLE{xsede,
  author   = {J. Towns and T. Cockerill and M. Dahan and I. Foster and K. Gaither and A. Grimshaw and V. Hazlewood and S. Lathrop and D. Lifka and G. D. Peterson and R. Roskies and J. R. Scott and N. Wilkins-Diehr},
  journal  = {Computing in Science \& Engineering},
  title    = {XSEDE: Accelerating Scientific Discovery},
  year     = {2014},
  volume   = {16},
  number   = {5},
  pages    = {62-74},
  keywords = {Knowledge discovery;Scientific computing;Digital systems;Materials engineering;Supercomputers},
  doi      = {10.1109/MCSE.2014.80},
  url      = {doi.ieeecomputersociety.org/10.1109/MCSE.2014.80},
  ISSN     = {1521-9615},
  month    = {Sept.-Oct.}
}

@inproceedings{nystrom2015bridges,
  title     = {Bridges: a uniquely flexible HPC resource for new communities and data analytics},
  author    = {Nystrom, Nicholas A and Levine, Michael J and Roskies, Ralph Z and Scott, J Ray},
  booktitle = {Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure},
  pages     = {1--8},
  year      = {2015}
}
```
Summary of PSC usage and the partitions
- Both PSC allocations (GPU and Regular Memory) have a limited number of service units (SUs).
- `sinfo` lists all the available partitions in PSC and their status (see the example after the table below).
| PSC (Bridges-2) | |
|---|---|
| Partitions | GPU, GPU-shared, RM-small, RM, RM-512, RM-shared |
| GPU resources | GPU, GPU-shared (recommended) |
| CPU resources | RM-small, RM, RM-512, RM-shared (recommended) |
| Default | `RM`, which requests a whole 128-CPU node (**be careful of this case**) |
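For a quick look at partition availability from a login node, the standard `sinfo` invocations below can be used; the format fields in the last command are standard Slurm specifiers, not PSC-specific options.

```
# List all partitions and their current state
sinfo

# Only the GPU-shared partition
sinfo -p GPU-shared

# Custom columns: partition, availability, time limit, node count, node state
sinfo -o "%P %a %l %D %t"
```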
Misc. resources
| PSC (Bridges-2) | |
|---|---|
| User Guide | https://www.psc.edu/resources/bridges-2/user-guide-2/ |
| Connect from browser | https://ondemand.bridges2.psc.edu/ |
| ESPnet installation guide | https://espnet.github.io/espnet/installation.html |
| Step-by-step guide with pictures | https://granite-echidna-ff2.notion.site/Access-for-PSC-07c3d4c05b54426895e3ddc87276e4b5 |
GPU Partitions
- In the GPU / GPU-shared partitions, each node consists of 8 V100 GPU devices.
- There are two types of GPU nodes: `v100-16` and `v100-32`, with 16GB and 32GB of GPU memory respectively.
- Submit jobs to the GPU-shared partition. (Recommended)
    - Use `-p GPU-shared --gpus=type:n` in `sbatch` or `srun`. Here `type` can be `v100-16` or `v100-32`, and `n` can range from 1 to 4 (1 is recommended).
- Submit jobs to the GPU partition.
    - Please use it only when necessary.
    - Use `-p GPU` in `sbatch` or `srun`. It requests a whole GPU node (8 GPUs) for each job.
    - In this case, 8 SUs are deducted from our team's GPU allocation for every hour the job runs.
- Good practice: users are strongly recommended to use the GPU-shared partition and allocate only 1 GPU per job (see the example sbatch script after this list).
    - When allocating multiple GPUs, you usually need to wait much longer.
    - Single-GPU jobs are more efficient than multi-GPU ones; the latter incur communication overhead.
    - Multi-GPU jobs are allowed only when (1) the user is already familiar with the cluster and (2) the project really needs that resource. Beginners are strongly discouraged from trying this option.
    - Please avoid using the GPU partition by mistake: it leaves the allocated GPUs idle and wastes a lot of our resources.
    - We will periodically check each user's usage and send a reminder when necessary.
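As a reference for the recommended pattern, here is a minimal single-GPU batch-script sketch; the job name, time limit, GPU type, and training command are placeholders to adapt.

```
#!/usr/bin/env bash
#SBATCH -p GPU-shared          # shared GPU partition (recommended)
#SBATCH --gpus=v100-16:1       # 1 GPU; switch to v100-32 if you need 32GB of GPU memory
#SBATCH -t 24:00:00            # walltime (PSC caps each job at 2 days)
#SBATCH -J my_gpu_job          # placeholder job name

# Placeholder payload: replace with your actual training command.
python train.py
```

Save it as, e.g., `my_gpu_job.sh` (hypothetical name) and submit with `sbatch my_gpu_job.sh`; with 1 GPU this draws roughly 1 SU from the GPU allocation per hour.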
RM Partitions (For CPU Jobs)
- In the RM and RM-512 partitions, each node consists of 128 cores.
- Nodes in the RM and RM-shared partitions have 128GB of memory, while nodes in the RM-512 partition have 512GB.
- Using an entire node for an hour deducts 128 SUs from our team's Regular Memory allocation.
- Submit jobs to the RM-shared partition. (Suggested)
    - Use `-p RM-shared --ntasks-per-node=n --mem=2000M` in `sbatch` or `srun`. Here `n` can range from 1 to 64.
    - Usually, jobs only require a few CPU cores.
- Submit jobs to the RM partition.
    - Please use it only when necessary.
    - It requests all 128 CPU cores of a node.
    - Use `-p RM --ntasks-per-node=n` in `sbatch` or `srun`. Here `n` can range from 1 to 64.
- Good practice: users are strongly recommended to use the RM-shared partition (see the example sbatch script after this list).
    - You can adjust the memory allocation of each CPU job, but bear in mind that CPU jobs are also charged per 2000M of memory allocated. E.g., a CPU core with 4000M of memory running for 1 hour is charged 2 SUs.
    - Usage of the RM or RM-512 partitions is allowed only when (1) the user is already familiar with the cluster and (2) the project really needs that resource. Beginners are strongly discouraged from trying this option.
    - Please avoid using the RM or RM-512 partitions by mistake: it leaves the allocated CPUs idle and wastes a lot of our resources.
    - We will periodically check each user's usage and send a reminder when necessary.
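Similarly, a minimal CPU-only batch-script sketch for the RM-shared partition; the core count, memory, time limit, and command are placeholders.

```
#!/usr/bin/env bash
#SBATCH -p RM-shared           # shared CPU partition (suggested)
#SBATCH --ntasks-per-node=4    # 4 CPU cores (1-64 allowed on RM-shared)
#SBATCH --mem=8000M            # 2000M per core keeps the charge at 1 SU per core-hour
#SBATCH -t 12:00:00            # walltime (placeholder)

# Placeholder payload: replace with your actual CPU job.
python prepare_data.py
```

At 2000M per core this corresponds to 1 SU per core-hour, i.e. 4 SUs per hour for this sketch.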
Other usage
- Data copying / file transfer
    - We suggest using `data.bridges2.psc.edu` as the machine name and doing file transfers with `rsync`, `sftp`, `scp`, etc. (see the PSC "Transferring Files" documentation). Below is an example with `scp`; an `rsync` sketch follows after it.

```
# This requires the enrollment of Two-Factor Authentication (TFA)
scp -P 2222 myfile XSEDE-username@data.bridges2.psc.edu:/path/to/file
```
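For large or repeated transfers, an `rsync` variant of the same transfer can be more convenient, since interrupted copies can resume. This is a sketch mirroring the `scp` example above; the local and destination paths are placeholders.

```
# Same TFA requirement and port as the scp example; -avP = archive, verbose, keep partial files and show progress
rsync -avP -e "ssh -p 2222" ./local_dir/ XSEDE-username@data.bridges2.psc.edu:/path/to/dest/
```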
- Submitting jobs with a dependency
    - This can be used to submit a job that should start only after specific jobs finish. In many cases, training a model can take a few days, but PSC restricts each job to at most 2 days of runtime. In this case, we can chain long jobs using dependencies (a sketch for chaining more segments follows below). For example, if you have already started a job with ID 000001 and want a follow-up job to run right after it, submit:

```
sbatch --time 2-0:00:00 --dependency=afterany:000001 run.sh
```
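If a run needs more than two 2-day segments, a small loop can chain the submissions. `--parsable` is a standard `sbatch` flag that prints only the job ID; `run.sh` and the segment count are placeholders.

```
# Submit the first 2-day segment and capture its job ID
jid=$(sbatch --parsable --time 2-0:00:00 run.sh)

# Queue three more segments, each starting after the previous one ends (for any reason)
for i in 1 2 3; do
    jid=$(sbatch --parsable --time 2-0:00:00 --dependency=afterany:${jid} run.sh)
done
echo "Last queued job: ${jid}"
```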
- Common arguments in sbatch / srun
```
-p, --partition=partition               partition requested
-J, --job-name=jobname                  name of job
-t, --time=time                         time limit
--gres=rsrc_name[:rsrc_type]:rsrc_num   required generic resources
-c, --cpus-per-task=ncpus               number of cpus required per task
-d, --dependency=type:jobid[:time]      defer job until condition on jobid is satisfied
-e, --error=err                         file for batch script's standard error
-o, --output=out                        file for batch script's standard output
--mem-per-cpu=MB                        maximum amount of real memory per allocated cpu
--ntasks-per-node=n                     number of tasks to invoke on each node
--reservation=name                      allocate resources from named reservation, e.g. GPUcis210027
```
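To illustrate how these options combine, here is a hedged one-line example; the job name, script name, log paths, and time limit are placeholders (and the `logs/` directory must exist before submission).

```
# 1 GPU on GPU-shared, 24h limit, custom job name, log files named after the job ID
sbatch -J an4_train -t 24:00:00 -p GPU-shared --gpus=v100-16:1 \
    -o logs/%j.out -e logs/%j.err run.sh
```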
- View tools
- slurm commands
```
# view jobs in the queue
squeue -u ${username}

# view detailed job info
scontrol show jobid -d ${jobid}

# view job history and billing info, e.g. since time 04/22/2022 12am.
sacct -u ${username} -S 2022-04-22T00:00:00 --format=JobID,jobname,user,elapsed,nnodes,alloccpus,state,partition,nodelist,AllocTRES%50,CPUTime
```
- PSC provides `slurm-tool`

```
# Show or watch job queue:
slurm-tool [watch] queue      show own jobs
slurm-tool [watch] q <user>   show user's jobs
slurm-tool [watch] quick      show quick overview of own jobs
slurm-tool [watch] shorter    sort and compact entire queue by job size
slurm-tool [watch] short      sort and compact entire queue by priority
slurm-tool [watch] full       show everything
slurm-tool [w] [q|qq|ss|s|f]  shorthands for above!
slurm-tool qos                show job service classes
slurm-tool top [queue|all]    show summary of active users

# Show detailed information about jobs:
slurm-tool prio [all|short]   show priority components
slurm-tool j|job <jobid>      show everything else
slurm-tool steps <jobid>      show memory usage of running srun job steps

# Show usage and fair-share values from accounting database:
slurm-tool h|history <time>   show jobs finished since, e.g. "1day" (default)
slurm-tool shares

# Show nodes and resources in the cluster:
slurm-tool p|partitions       all partitions
slurm-tool n|nodes            all cluster nodes
slurm-tool c|cpus             total cpu cores in use
slurm-tool cpus <partition>   cores available to partition, allocated and free
slurm-tool cpus jobs          cores/memory reserved by running jobs
slurm-tool cpus queue         cores/memory required by pending jobs
slurm-tool features           List features and GRES

# Other:
slurm-tool v|version          Print versions of slurm-tool tool and slurm itself
```
- Simple helper script for basic usage
```
# ~/.sbatch_helper.py
# You can easily use it by adding the below line in .bashrc:
#   alias my_sbatch="python3 ~/.sbatch_helper.py"

machine = int(input("CPU (0) GPU-16GB (1) GPU-32GB (2): "))
if machine == 0:
    cpus = int(input("# of CPUs (Max 128): "))
    cpus = f"--ntasks-per-node={cpus}"
    gpus = ""
else:
    gpus = int(input("# of GPUs (Max 8): "))
    gpus = "-N 1 --gpus=v100-" + ["", "16", "32"][machine] + f":{gpus}"
    cpus = "--cpus-per-gpu 5"
hours = int(input("# of hours (Max 48): "))
machine_name = ["RM", "GPU", "GPU"][machine]
print(f"sbatch -p {machine_name}-shared {gpus} {cpus} -t {hours}:00:00")
```
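For reference, a sample interaction with the helper (using the `my_sbatch` alias from the comment above); the printed `sbatch` line follows directly from the script for these inputs.

```
$ my_sbatch
CPU (0) GPU-16GB (1) GPU-32GB (2): 1
# of GPUs (Max 8): 1
# of hours (Max 48): 24
sbatch -p GPU-shared -N 1 --gpus=v100-16:1 --cpus-per-gpu 5 -t 24:00:00
```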
ESPnet installation steps
- Miniconda installation
```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
- Load modules and set environment (deprecated; you no longer need to do this)
- Kaldi download/installation:
```
# If you only use ESPnet2, simply cloning kaldi is sufficient:
cd /ocean/projects/cis210027p/<user>
git clone https://github.com/kaldi-asr/kaldi

# If you use ESPnet version 1 (not very common), you need to compile kaldi:
cd /ocean/projects/cis210027p/<user>
git clone https://github.com/kaldi-asr/kaldi
cd kaldi/tools
make -j 8
./extras/install_irstlm.sh
cd ../src/
./configure --use-cuda=no
make -j clean depend; make -j 8
```
- ESPnet installation:
```
cd /ocean/projects/cis210027p/<user>
git clone https://github.com/espnet/espnet
cd espnet/tools/
ln -s /ocean/projects/cis210027p/<user>/kaldi .
. ./setup_anaconda.sh <MINICONDA_ROOT> espnet 3.9
# The CUDA and Torch versions below are recommended, but you can change them based on your needs.
make -j 8 CUDA_VERSION=11.7 TH_VERSION=1.13.1
```
- Verify torch installation:
```
srun --pty -p GPU-shared -N 1 --gpus=v100-16:1 /bin/bash -l
nvidia-smi
cd /ocean/projects/cis210027p/<user>/espnet/tools
. ./activate_python.sh
python -c "import torch; print(torch.cuda.is_available())"
exit
```
ESPnet usage tutorial
- Full documentation: https://espnet.github.io/espnet/tutorial.html
Running an example `an4` recipe with `run.sh` using the slurm backend
- Go to the directory:

```
cd egs2/an4/asr1
```

- Modify `cmd.sh` and `conf/slurm.conf` to be ready for slurm job management:

```
# cmd.sh, around line 31
cmd_backend='slurm'
# cmd_backend='local'
```

```
# conf/slurm.conf
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
default time=48:00:00
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p RM-shared
option gpu=* -p GPU-shared --gres=gpu:$0 -c $0  # Recommend allocating more CPUs than, or equal to, the number of GPUs
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
```
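As an optional sanity check after editing (assuming you are still in `egs2/an4/asr1`), you can confirm the backend switch and the partition mapping with plain `grep`:

```
grep "cmd_backend=" cmd.sh           # should show cmd_backend='slurm'
grep "option gpu=" conf/slurm.conf   # should show the RM-shared / GPU-shared mapping
```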
- Run the experiment:

```
./run.sh --stage 1 --stop-stage 13
```
Running an example `an4` recipe step by step
- Execute the data preparation stages: `-1` and `0`
    - Stage `-1` downloads and un-tars the `an4` dataset, with 948 training and 130 test utterances.
    - Stage `0` prepares the dataset by creating the `data/train` and `data/test` directories (the names of these directories can vary for other datasets). Each of these directories contains 4 files: `wav.scp`, `text`, `utt2spk` and `spk2utt`. The mappings from a unique utterance-ID to the utterance's audio filepath, transcript and speaker-ID are stored in `wav.scp`, `text` and `utt2spk` respectively. The inverse mapping from a speaker-ID to that speaker's utterance-IDs is stored in `spk2utt`. (See the illustration after this block.)

```
cd /ocean/projects/cis210027p/<user>/espnet
cd egs/an4/asr1
./run.sh --stage -1 --stop_stage 0
```
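To make the four files concrete, here is a hedged illustration of their per-line formats; the IDs in the comments are schematic placeholders, not actual `an4` entries.

```
head -1 data/train/wav.scp   # <utt-id> /path/to/audio.wav
head -1 data/train/text      # <utt-id> TRANSCRIPT OF THE UTTERANCE
head -1 data/train/utt2spk   # <utt-id> <spk-id>
head -1 data/train/spk2utt   # <spk-id> <utt-id-1> <utt-id-2> ...
```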
- Execute the feature extraction stage: `1`
    - Stage `1` extracts 80-dimensional log-mel features and 3-dimensional pitch features from 25ms frames shifted every 10ms for each audio sample.
    - The parameter `nj` in this stage's code is the number of CPUs used to extract the features in parallel.
    - Since this dataset doesn't provide a validation split, the 948 training utterances are split into 848 training and 100 dev samples.
    - A mapping from utterance-IDs to feature filepaths is written to the `data/*/feats.scp` files, and the extracted features are stored in the `dump/*/deltafalse/feats.{1-$nj}.ark` files.

```
./run.sh --stage 1 --stop_stage 1
```
- Prepare the dictionary and data.json in stage `2`
    - A mapping from each character and the special tokens to a unique token-ID is stored as a dictionary at `data/lang_1char/train_nodev_units.txt`.
    - For each utterance-ID, the pair of extracted features and mapped tokens (using the above dictionary) is stored as a JSON file in the respective `dump` directories.

```
./run.sh --stage 2 --stop_stage 2
```
- Train the RNN-LM and ASR models in stages `3` and `4`
    - Trained LM models are stored by default in the `exp/train_rnnlm_pytorch_lm_word100/` directory.
    - Trained ASR models are stored by default in the `exp/train_nodev_pytorch_train_mtlalpha1.0/results/` directory.
    - Request GPU resources for training:

```
sbatch -t 2-00:00:00 -p GPU-shared -N 1 --gpus=v100-16:1 --mem=16G train_model.sh
```

    - Preview of the `train_model.sh` file to train the RNN-based language model (RNN-LM) in stage `3`:

```
#!/usr/bin/env bash
cd /ocean/projects/cis210027p/<user>/espnet/egs/an4/asr1
./run.sh --stage 3 --stop_stage 3
```

    - Preview of the `train_model.sh` file to train the ASR model in stage `4`:

```
#!/usr/bin/env bash
cd /ocean/projects/cis210027p/<user>/espnet/egs/an4/asr1
./run.sh --stage 4 --stop_stage 4
```
- Request computational resources and perform decoding in stage `5`
    - The parameter `nj` in this stage's code is the number of CPUs used to decode each recognition set (the validation and test splits) in parallel, so we must request 2 x `nj` CPUs in total.
    - Note that for each CPU, we can request a maximum memory of 2000M.
    - Request RM-shared node resources for decoding:

```
sbatch -t 12:00:00 -p RM-shared -N 1 --cpus-per-task=16 --mem=32000M decode_model.sh
```

    - Preview of the `decode_model.sh` file to decode the ASR model in stage `5`:

```
#!/usr/bin/env bash
cd /ocean/projects/cis210027p/<user>/espnet/egs/an4/asr1
./run.sh --stage 5 --stop_stage 5
```
- Misc.
    - An interrupted training run can be resumed by passing the `--resume` parameter with the path to the most recent snapshot in the `exp/train_nodev_pytorch_train_mtlalpha1.0/results/` directory (see the example after this list).
    - A unique tag for managing your experiments can be set with the `--lmtag` and `--tag` parameters.
    - Multi-GPU training can be done by specifying `--ngpu` in the training stages.
    - For interactive debugging purposes, the `srun` command can be used to request GPU-shared or RM-shared nodes as below. Note that PSC has limited service units (SUs), so use `srun`-based debugging only for the required duration.

```
srun --pty -p GPU-shared -N 1 --gpus=v100-16:1 /bin/bash -l
srun --pty -p RM-shared -N 1 --cpus-per-task=16 --mem=32000M /bin/bash -l
```
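Below is a hedged sketch of resuming ASR training (stage 4) with a custom tag. The snapshot filename and the tag are illustrative; pick the most recent snapshot actually present in your `results/` directory.

```
cd /ocean/projects/cis210027p/<user>/espnet/egs/an4/asr1
./run.sh --stage 4 --stop_stage 4 \
    --tag my_resumed_run \
    --resume exp/train_nodev_pytorch_train_mtlalpha1.0/results/snapshot.ep.10
```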