SLURM
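- Probably because `srun` is interactive in real time, a job submitted with it is interrupted if the connection drops. `sbatch` is not interactive, so a submitted job keeps running even if the connection is interrupted.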
Sample configuration

```bash
#!/bin/bash
#SBATCH --job-name=fileTest
#SBATCH --partition=audace2018
#SBATCH --mem=1024
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --error=../batchLog/error.log
#SBATCH --output=../batchLog/output.log
python ../code/file.py
```

During configuration, if any one of the directives beginning with `#SBATCH` fails to parse, the remaining directives are skipped and the job goes straight to execution.
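One way to keep the directive block tidy is to generate the script from a table of options. The sketch below is only an illustration: `render_sbatch_script` is a hypothetical helper, not part of SLURM, and the option values are simply the ones from the sample above. It relies on the documented `sbatch` behaviour that `#SBATCH` lines are only read before the first executable command, so every directive is written ahead of the command.

```python
def render_sbatch_script(options, command):
    """Return batch script text with every #SBATCH line placed before the command.

    `options` maps long option names (without the leading dashes) to values.
    This is a hypothetical helper for illustration, not a SLURM API.
    """
    lines = ["#!/bin/bash"]
    for name, value in options.items():
        lines.append(f"#SBATCH --{name}={value}")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Option values copied from the sample configuration.
options = {
    "job-name": "fileTest",
    "partition": "audace2018",
    "mem": 1024,
    "cpus-per-task": 1,
    "gres": "gpu:1",
    "error": "../batchLog/error.log",
    "output": "../batchLog/output.log",
}
script = render_sbatch_script(options, "python ../code/file.py")
print(script)
```

The printed text can be saved to a `.sh` file and submitted with `sbatch`.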
There is no need to use `--get-user-env`.

5. `#SBATCH --gres=gpu:1` means "allocate one GPU"; it does not mean taking a GPU from a partition named gpu.
6. AssertionError: The NVIDIA driver on your system is too old (found version 10000
   * Download the driver that matches your GPU model from [this page](http://www.nvidia.com/Download/index.aspx).
   * Direct download link: `http://us.download.nvidia.com/tesla/440.33.01/NVIDIA-Linux-x86_64-440.33.01.run`

- It is possible to specify several partitions in the options of your script or srun. In this case SLURM launches your job on the first available partition.
- You can also define several steps in a job (and therefore launch several programs in ==parallel or sequentially==) via the srun command
- Job arrays provide a very simple way to submit a large number of independent jobs. They can typically be used to apply the same program to different input data.
- Strangely, on both HPC and HPC2, CUDA runs fine when called from C, but it does not work from Python.
- I checked on opale (the only node which has CUDA installed)
- To see which machines are in which partition, use the `sinfo -N` command.
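The job-array idea mentioned above (same program, different input data) can be sketched on the Python side as follows. This is a minimal illustration, not the author's setup: the file names and the `pick_input` helper are hypothetical. The only SLURM-specific part is the `SLURM_ARRAY_TASK_ID` environment variable, which SLURM sets for each task of an array job.

```python
import os


def pick_input(inputs, environ=os.environ):
    """Return the input assigned to this array task.

    SLURM sets SLURM_ARRAY_TASK_ID for each task of a job array.
    Outside an array job the variable is absent, so index 0 is used.
    """
    task_id = int(environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return inputs[task_id]


# Hypothetical input files, one per array task (indices 0..2).
inputs = ["data_0.csv", "data_1.csv", "data_2.csv"]

# Simulate the environment of array task 1 so the sketch runs without SLURM.
print(pick_input(inputs, {"SLURM_ARRAY_TASK_ID": "1"}))
```

In a real array job the script would call `pick_input(inputs)` with the default environment, and the batch script would be submitted with something like `sbatch --array=0-2` followed by the script name.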
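With `sbatch`, all the run instructions live in the `.sh` file. With `srun`, the command to execute is simply placed right after `srun`. So the makefile and the runtime environment settings come into play.

Writing logs in Python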
```python
import logging
import time

print("Hello World")

# Log file named after the current time; the ../codeLog/ folder must already exist.
fileName = '../codeLog/' + time.strftime("%Y:%m:%d_%I-%M-%S_%p") + '.log'
logFormat = '%(levelname)s: %(message)s'
# level=logging.DEBUG so that debug and info messages are recorded as well.
logging.basicConfig(filename=fileName, filemode='w', format=logFormat, level=logging.DEBUG)

logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')
```

- The default logging level is WARNING, so unless it is reset, debug and info messages are not shown. Set the level to the lowest one, DEBUG, to show all messages.
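- The timestamp cannot use the format `strftime("%Y/%m/%d_%I-%M-%S_%p")`: the slashes are treated as path separators, so the log file cannot be created because a date folder such as `2020/02/10` does not exist.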
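The two notes above (set the level to DEBUG, keep `/` out of the timestamp) can be combined in one small sketch. It is an illustration under stated assumptions: `make_log_path` is a hypothetical helper, a temporary directory stands in for `../codeLog/`, and the 24-hour `%H` format is used here instead of `%I`/`%p` purely as a matter of taste.

```python
import logging
import os
import tempfile
import time


def make_log_path(log_dir, when=None):
    """Build a log file path whose timestamp contains no path separators."""
    stamp = time.strftime("%Y-%m-%d_%H-%M-%S", when or time.localtime())
    return os.path.join(log_dir, stamp + ".log")


# A temporary directory stands in for ../codeLog/ so the sketch runs anywhere.
log_dir = tempfile.mkdtemp()
os.makedirs(log_dir, exist_ok=True)  # logging does not create missing folders
log_path = make_log_path(log_dir)

# force=True (Python 3.8+) replaces any handlers that were configured earlier.
logging.basicConfig(filename=log_path, filemode="w",
                    format="%(levelname)s: %(message)s",
                    level=logging.DEBUG, force=True)
logging.debug("visible because level=DEBUG")
logging.shutdown()  # flush and close the log file

with open(log_path) as handle:
    print(handle.read())
```

Calling `os.makedirs(..., exist_ok=True)` before `logging.basicConfig` is what guards against the missing-folder error when the log directory has not been created yet.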