Commit c4539d58 authored by Pfister, Martin

Add LUMI results for mistral7b-gptq

parent 355ad215
@@ -35,7 +35,7 @@ Finetune and evaluate [Mistral 7B Instruct v0.3](https://huggingface.co/mistrala

| VSC5 (Nvidia A40) | 5.1 samples/s | 11.0 samples/s | 10.3 GB | 18.4 GB |
| VSC5 (Nvidia A100) | 8.8 samples/s | 18.8 samples/s | 9.7 GB | 18.5 GB |
| Leonardo (Nvidia A100) | 10.0 samples/s | 21.0 samples/s | 9.8 GB | 18.5 GB |
| LUMI (AMD MI250X) | 5.2 samples/s | 5.5 samples/s | 9.0 GB | 17.0 GB |

### [mistral7b-bnb](mistral7b-bnb) multi GPU, multi node training
@@ -70,17 +70,23 @@ Finetune and evaluate [Mistral 7B Instruct v0.3](https://huggingface.co/mistrala

Finetune and evaluate [Mistral 7B Instruct v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) with 4-bit [GPTQ quantisation](https://arxiv.org/abs/2210.17323) on the [MedMCQA](https://medmcqa.github.io) dataset on multiple GPUs on a single node using the [distributed data parallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) approach.

| System | GPUs | Nodes | Training speed | Training speed per GPU | Training GPU memory (max.) |
| - | - | - | - | - | - |
| VSC5 (Nvidia A40) | 1 | 1 | 5.1 samples/s | 5.1 samples/s | 10.3 GB |
| | 2 | 1 | 8.1 samples/s | 4.5 samples/s | 11.6 GB |
| VSC5 (Nvidia A100) | 1 | 1 | 8.8 samples/s | 8.8 samples/s | 9.7 GB |
| | 2 | 1 | 12.7 samples/s | 6.4 samples/s | 11.0 GB |
| Leonardo (Nvidia A100) | 1 | 1 | 10.0 samples/s | 10.0 samples/s | 9.8 GB |
| | 2 | 1 | 16.2 samples/s | 8.1 samples/s | 11.9 GB |
| | 4 | 1 | 30.6 samples/s | 7.6 samples/s | 12.8 GB |
| LUMI (AMD MI250X) | 1 | 1 | 5.2 samples/s | 5.2 samples/s | 9.0 GB |
| | 2 | 1 | 9.2 samples/s | 4.6 samples/s (88%) | 9.8 GB |
| | 4 | 1 | 17.4 samples/s | 4.3 samples/s (83%) | 11.3 GB |
| | 8 | 1 | 32.7 samples/s | 4.1 samples/s (79%) | 12.1 GB |
| | 8 | 8 | 14.4 samples/s | 1.8 samples/s (35%) | 11.8 GB |
| | 16 | 2 | 45.5 samples/s | 2.8 samples/s (54%) | 11.6 GB |
| | 32 | 4 | 83.6 samples/s | 2.6 samples/s (50%) | 11.4 GB |
| | 64 | 8 | 149.3 samples/s | 2.3 samples/s (44%) | 12.4 GB |
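
The percentages in the LUMI rows are consistent with scaling efficiency relative to the single-GPU run (for example, 4.6 / 5.2 ≈ 88% on 2 GPUs). The numbers themselves come from the repository's `mistral7b_train.py`, which is not shown in this commit. Purely as an illustration of the approach described above (4-bit GPTQ base model, LoRA adapters, Hugging Face `Trainer` launched under `torchrun`, which supplies the DistributedDataParallel wrapping), a minimal sketch could look as follows; the checkpoint path, dataset id, LoRA hyperparameters and prompt format are assumptions, not the project's actual settings.

```python
# Minimal sketch, NOT the repository's mistral7b_train.py: LoRA finetuning of a
# 4-bit GPTQ-quantised Mistral 7B Instruct v0.3 on MedMCQA with the Hugging Face
# Trainer. Assumes transformers, peft, datasets, accelerate and the GPTQ backend
# (optimum + auto-gptq) are installed. When started via `torchrun --nproc_per_node N`,
# the Trainer wraps the model in DistributedDataParallel automatically.
import os
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, GPTQConfig,
                          Trainer, TrainingArguments)

BASE_MODEL = "path/to/Mistral-7B-Instruct-v0.3-GPTQ"  # placeholder for the 4-bit GPTQ checkpoint
local_rank = int(os.environ.get("LOCAL_RANK", 0))     # set by torchrun

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    # exllama kernels do not support backpropagation, so disable them for training
    quantization_config=GPTQConfig(bits=4, use_exllama=False),
    device_map={"": local_rank},  # one full model replica per process (data parallel)
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))
model.print_trainable_parameters()

def to_features(example):
    # Turn a MedMCQA record into one instruction-style training string
    # (assumed prompt format; the real template may differ).
    text = (f"Question: {example['question']}\n"
            f"A) {example['opa']} B) {example['opb']} "
            f"C) {example['opc']} D) {example['opd']}\n"
            f"Answer: {'ABCD'[example['cop']]}")
    return tokenizer(text, truncation=True, max_length=512)

train_set = load_dataset("openlifescienceai/medmcqa", split="train").map(to_features)

trainer = Trainer(
    model=model,
    train_dataset=train_set,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8,
                           max_steps=100, learning_rate=1e-4, logging_steps=50),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("out")  # save the LoRA adapter for later evaluation
```

Under these assumptions such a script would be launched the same way as in the logs below, e.g. `srun singularity exec $CONTAINER torchrun --nproc_per_node 8 mistral7b_train.py` for the eight-GPU single-node run.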
## Usage
+ date
Fri Aug 9 16:55:08 EEST 2024
+ hostname
nid005037
++ git rev-parse --show-toplevel
+ CONTAINER=/pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif
+ export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
@@ -12,7 +12,7 @@ nid005034
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 46.0c 127.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
+ srun singularity exec /pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif python mistral7b_train.py
@@ -28,37 +28,36 @@ trainable params: 41,943,040 || all params: 310,644,736 || trainable%: 13.5019
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|██████████| 100/100 [02:33<00:00, 1.54s/it]
{'loss': 1.8183, 'grad_norm': 1.8688207864761353, 'learning_rate': 4.7e-05, 'epoch': 0.0}
{'loss': 0.9452, 'grad_norm': 1.3900845050811768, 'learning_rate': 9.7e-05, 'epoch': 0.0}
{'train_runtime': 153.8564, 'train_samples_per_second': 5.2, 'train_steps_per_second': 0.65, 'train_loss': 1.3817491149902343, 'epoch': 0.0}
Run time: 153.86 seconds
1 GPUs used.
Training speed: 5.2 samples/s (=5.2 samples/s/GPU)
Memory occupied on GPUs: 9.0 GB.
real 3m46.519s
user 0m0.018s
sys 0m0.015s
+ srun singularity exec /pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif python mistral7b_test.py
srun: Job 7878123 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: Step created for job 7878123
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
CUDA extension not installed.
CUDA extension not installed.
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
Map: 100%|██████████| 4183/4183 [00:00<00:00, 4287.53 examples/s]
0%| | 0/66 [00:00<?, ?it/s]/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
100%|██████████| 66/66 [12:45<00:00, 11.59s/it]
44.42% (1858 out of 4183) answers correct.
Run time: 765.11 seconds
Samples/second: 5.47
Memory occupied on GPUs: 17.0 GB.
real 13m12.596s
user 0m0.019s
sys 0m0.015s
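
The second half of the log above comes from `mistral7b_test.py`, which scores the finetuned model on the 4,183-question MedMCQA validation split (44.42% correct in this run). Again only as a hedged sketch rather than the repository's actual script, and reusing the assumptions from the training sketch above (placeholder checkpoint path, adapter saved to `out`, assumed prompt format), such an accuracy check could look like this:

```python
# Minimal sketch, NOT the repository's mistral7b_test.py: greedy one-token answers
# on the MedMCQA validation split, scored against the ground-truth option.
import torch
from datasets import load_dataset
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

BASE_MODEL = "path/to/Mistral-7B-Instruct-v0.3-GPTQ"  # placeholder, as in the training sketch
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=GPTQConfig(bits=4), device_map="auto")
model = PeftModel.from_pretrained(model, "out")  # LoRA adapter from the training run
model.eval()

val_set = load_dataset("openlifescienceai/medmcqa", split="validation")

correct = 0
for ex in val_set:
    prompt = (f"Question: {ex['question']}\n"
              f"A) {ex['opa']} B) {ex['opb']} C) {ex['opc']} D) {ex['opd']}\n"
              "Answer:")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=1, do_sample=False)
    # decode only the newly generated token and compare it to the correct option letter
    prediction = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:]).strip()
    correct += prediction.startswith("ABCD"[ex["cop"]])

print(f"{100 * correct / len(val_set):.2f}% ({correct} out of {len(val_set)}) answers correct.")
```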
+ date
Fri Aug 9 16:53:30 EEST 2024
+ hostname
nid005050
++ git rev-parse --show-toplevel
+ CONTAINER=/pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif
+ export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ rocm-smi
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 47.0c 90.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
1 49.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
+ srun singularity exec /pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif torchrun --nproc_per_node 2 mistral7b_train.py
[2024-08-09 16:53:46,267] torch.distributed.run: [WARNING]
[2024-08-09 16:53:46,267] torch.distributed.run: [WARNING] *****************************************
[2024-08-09 16:53:46,267] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-09 16:53:46,267] torch.distributed.run: [WARNING] *****************************************
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
trainable params: 41,943,040 || all params: 310,644,736 || trainable%: 13.5019
max_steps is given, it will override any value given in num_train_epochs
0%| | 0/100 [00:00<?, ?it/s]/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
39%|███▉ {'loss': 1.8514, 'grad_norm': 1.9098231792449951, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.0}
86%|████████▌ | 86/100 [02:29<00:24, 1.76s{'loss': 0.909, 'grad_norm': 0.9930923581123352, 'learning_rate': 9.6e-05, 'epoch': 0.01}
100%|██{'train_runtime': 173.0809, 'train_samples_per_second': 9.244, 'train_steps_per_second': 0.578, 'train_loss': 1.3802322006225587, 'epoch': 0.01}
100%|██████████| 100/100 [02:53<00:00, 1.73s/it]
Run time: 173.08 seconds
2 GPUs used.
Training speed: 9.2 samples/s (=4.6 samples/s/GPU)
Memory occupied on GPUs: 9.8 + 6.0 GB.
real 5m10.680s
user 0m0.017s
sys 0m0.017s
+ date
Tue Aug 13 10:27:28 EEST 2024
+ hostname
nid005032
++ git rev-parse --show-toplevel
+ CONTAINER=/pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif
+ export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ rocm-smi
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 50.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
1 41.0c 86.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
2 43.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
3 43.0c 90.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
+ srun singularity exec /pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif torchrun --nproc_per_node 4 mistral7b_train.py
[2024-08-13 10:27:41,396] torch.distributed.run: [WARNING]
[2024-08-13 10:27:41,396] torch.distributed.run: [WARNING] *****************************************
[2024-08-13 10:27:41,396] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-13 10:27:41,396] torch.distributed.run: [WARNING] *****************************************
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
trainable params: 41,943,040 || all params: 310,644,736 || trainable%: 13.5019
max_steps is given, it will override any value given in num_train_epochs
0%| | 0/100 [00:00<?, ?it/s]/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
39%|███▉ {'loss': 1.8326, 'grad_norm': 1.8880447149276733, 'learning_rate': 4.7e-05, 'epoch': 0.01}
86%|████████▌ | 86/100 [02:39<00:26, 1.90s{'loss': 0.9009, 'grad_norm': 0.8143380880355835, 'learning_rate': 9.7e-05, 'epoch': 0.02}
100%|██{'train_runtime': 184.0957, 'train_samples_per_second': 17.382, 'train_steps_per_second': 0.543, 'train_loss': 1.3667191314697265, 'epoch': 0.02}
100%|██████████| 100/100 [03:03<00:00, 1.84s/it]
Run time: 184.10 seconds
4 GPUs used.
Training speed: 17.4 samples/s (=4.3 samples/s/GPU)
Memory occupied on GPUs: 11.3 + 10.2 + 10.8 + 9.8 GB.
real 4m48.227s
user 0m0.017s
sys 0m0.016s
+ date
Fri Aug 9 16:50:00 EEST 2024
+ hostname
nid005145
++ git rev-parse --show-toplevel
+ CONTAINER=/pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif
+ export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ export SINGULARITY_BIND=/var/spool/slurmd,/opt/cray,/usr/lib64/libcxi.so.1,/usr/lib64/libjansson.so.4,/pfs,/scratch,/projappl,/project,/flash,/appl,
+ SINGULARITY_BIND=/var/spool/slurmd,/opt/cray,/usr/lib64/libcxi.so.1,/usr/lib64/libjansson.so.4,/pfs,/scratch,/projappl,/project,/flash,/appl,
+ rocm-smi
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 47.0c 90.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
1 48.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
2 37.0c 88.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
3 50.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
4 44.0c 87.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
5 45.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
6 41.0c 84.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
7 43.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
+ export MASTER_PORT=24998
+ MASTER_PORT=24998
++ head -n 1
++ scontrol show hostnames 'nid[005145-005152]'
+ export MASTER_ADDR=nid005145
+ MASTER_ADDR=nid005145
+ export NCCL_SOCKET_IFNAME=hsn0,hsn1,hsn2,hsn3
+ NCCL_SOCKET_IFNAME=hsn0,hsn1,hsn2,hsn3
+ export NCCL_NET_GDR_LEVEL=PHB
+ NCCL_NET_GDR_LEVEL=PHB
+ srun singularity exec /pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif torchrun --nnodes=8 --nproc_per_node=1 --rdzv_id=7934548 --rdzv_endpoint=nid005145:24998 --rdzv_backend=c10d mistral7b_train.py
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
Map: 100%|██████████| 182822/182822 [00:27<00:00, 6563.36 examples/s]
Map: 100%|██████████| 182822/182822 [00:27<00:00, 6562.03 examples/s]
Map: 100%|██████████| 182822/182822 [00:27<00:00, 6545.65 examples/s]
Map: 100%|██████████| 182822/182822 [00:27<00:00, 6590.39 examples/s]
Map: 100%|██████████| 182822/182822 [00:27<00:00, 6636.14 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6212.45 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6211.78 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6240.70 examples/s]
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
trainable params: 41,943,040 || all params: 310,644,736 || trainable%: 13.5019
Map: 100%|██████████| 182822/182822 [00:09<00:00, 19730.54 examples/s]
Map: 100%|██████████| 182822/182822 [00:09<00:00, 19766.39 examples/s]
Map: 100%|██████████| 182822/182822 [00:09<00:00, 19725.38 examples/s]
Map: 100%|██████████| 182822/182822 [00:09<00:00, 19787.15 examples/s]
Map: 100%|██████████| 182822/182822 [00:15<00:00, 11562.67 examples/s]
Map: 100%|██████████| 182822/182822 [00:16<00:00, 11322.34 examples/s]
Map: 100%|██████████| 182822/182822 [00:16<00:00, 11249.85 examples/s]
Map: 100%|██████████| 182822/182822 [00:15<00:00, 11608.60 examples/s]
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
max_steps is given, it will override any value given in num_train_epochs
0%| | 0/100 [00:00<?, ?it/s]/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
39%|███▉ {'loss': 1.8314, 'grad_norm': 2.0096566677093506, 'learning_rate': 4.7e-05, 'epoch': 0.02}
86%|████████▌ | 86/100 [06:20<01:00, 4.29s{'loss': 0.8769, 'grad_norm': 0.5495427250862122, 'learning_rate': 9.7e-05, 'epoch': 0.04}
100%|██{'train_runtime': 444.2417, 'train_samples_per_second': 14.407, 'train_steps_per_second': 0.225, 'train_loss': 1.354150924682617, 'epoch': 0.04}
100%|██████████| 100/100 [07:21<00:00, 4.42s/it]
Run time: 444.24 seconds
8 GPUs used.
Training speed: 14.4 samples/s (=1.8 samples/s/GPU)
Memory occupied on GPUs: 11.3 GB.
Memory occupied on GPUs: 10.0 GB.
Memory occupied on GPUs: 11.0 GB.
Memory occupied on GPUs: 11.3 GB.
Memory occupied on GPUs: 10.9 GB.
Memory occupied on GPUs: 11.4 GB.
Memory occupied on GPUs: 11.8 GB.
Memory occupied on GPUs: 9.2 GB.
real 9m21.027s
user 0m0.017s
sys 0m0.068s
+ date
Fri Aug 9 16:49:57 EEST 2024
+ hostname
nid005630
++ git rev-parse --show-toplevel
+ CONTAINER=/pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif
+ export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,
+ rocm-smi
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 48.0c 94.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
1 49.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
2 42.0c 87.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
3 41.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
4 40.0c 86.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
5 48.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
6 37.0c 87.0W 800Mhz 1600Mhz 0% manual 500.0W 0% 0%
7 46.0c N/A 800Mhz 1600Mhz 0% manual 0.0W 0% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
+ srun singularity exec /pfs/lustrep1/scratch/project_465001276/mpfister/llm-finetuning/lumi_container.sif torchrun --nproc_per_node 8 mistral7b_train.py
[2024-08-09 16:50:11,407] torch.distributed.run: [WARNING]
[2024-08-09 16:50:11,407] torch.distributed.run: [WARNING] *****************************************
[2024-08-09 16:50:11,407] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-09 16:50:11,407] torch.distributed.run: [WARNING] *****************************************
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
CUDA extension not installed.
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6292.57 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6294.48 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6292.52 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6292.91 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6292.55 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6292.78 examples/s]
Map: 100%|██████████| 182822/182822 [00:29<00:00, 6292.48 examples/s]
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/quantizers/auto.py:167: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
warnings.warn(warning_msg)
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
trainable params: 41,943,040 || all params: 310,644,736 || trainable%: 13.5019
max_steps is given, it will override any value given in num_train_epochs
  0%|          | 0/100 [00:00<?, ?it/s]
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:264.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
/opt/conda/envs/conda_container_env/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:647: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at ../aten/src/ATen/native/transformers/hip/sdp_utils.cpp:320.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
(both warnings are repeated by the other worker processes; duplicates omitted)
{'loss': 1.8351, 'grad_norm': 1.872937560081482, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.02}
{'loss': 0.8854, 'grad_norm': 0.573049783706665, 'learning_rate': 9.6e-05, 'epoch': 0.04}
{'train_runtime': 195.5189, 'train_samples_per_second': 32.733, 'train_steps_per_second': 0.511, 'train_loss': 1.3602521514892578, 'epoch': 0.04}
100%|██████████| 100/100 [03:15<00:00, 1.95s/it]
Run time: 195.52 seconds
8 GPUs used.
Training speed: 32.7 samples/s (=4.1 samples/s/GPU)
Memory occupied on GPUs: 11.6 + 10.2 + 11.2 + 9.3 + 10.4 + 11.7 + 12.1 + 11.3 GB.
real 6m19.382s
user 0m0.018s
sys 0m0.044s
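For context on how figures like these come about: the aggregate speed is just samples processed divided by runtime (32.733 samples/s × 195.5 s ≈ 6,400 samples over 100 optimizer steps, i.e. a global batch of 64 spread over 8 GPUs), and per-GPU peak memory can be queried from PyTorch. The helper below is a minimal, hypothetical sketch, not code taken from mistral7b_train.py; the repository may instead read the occupied memory via rocm-smi/nvidia-smi, so treat the memory figure as an approximation.

```python
import os
import torch
import torch.distributed as dist

def report_throughput_and_memory(num_samples: int, runtime_s: float) -> None:
    """Summarise training speed and peak GPU memory after a run.

    Hypothetical helper (not part of mistral7b_train.py). Assumes the job was
    launched with torchrun, which sets WORLD_SIZE, and that the ROCm build of
    PyTorch exposes the usual torch.cuda API (it does on LUMI).
    """
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    samples_per_s = num_samples / runtime_s
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3  # peak of the local device

    if dist.is_initialized():
        peaks = [None] * world_size
        dist.all_gather_object(peaks, peak_gb)  # collect every rank's peak on all ranks
        if dist.get_rank() == 0:
            print(f"Training speed: {samples_per_s:.1f} samples/s "
                  f"(={samples_per_s / world_size:.1f} samples/s/GPU)")
            print("Memory occupied on GPUs: "
                  + " + ".join(f"{p:.1f}" for p in peaks) + " GB.")
    else:
        print(f"Training speed: {samples_per_s:.1f} samples/s")
        print(f"Memory occupied on GPU: {peak_gb:.1f} GB.")
```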
Updated single-GPU job script (the diff adds `#SBATCH --nodes=1` and reorders the resource directives; the remainder of the file is collapsed in the diff view):

#!/bin/bash
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=7 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
...
#!/bin/bash
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=2 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=14 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
time srun singularity exec $CONTAINER torchrun --nproc_per_node 2 mistral7b_train.py
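The `torchrun --nproc_per_node 2` launch above assumes that mistral7b_train.py joins the process group torchrun sets up. The snippet below is a minimal sketch of that pattern under the assumption that plain PyTorch DDP is used; when the Hugging Face Trainer drives the training loop, it detects the torchrun environment variables and performs the equivalent wrapping itself, so these explicit calls may not appear in the actual script.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun starts one process per GPU and exports RANK, LOCAL_RANK and
# WORLD_SIZE for each of them; the script only has to join the process group.
dist.init_process_group(backend="nccl")  # RCCL on ROCm is addressed via the "nccl" backend
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model so the sketch stays self-contained; in the real script this
# would be the GPTQ-quantised Mistral model with its LoRA adapters.
model = torch.nn.Linear(16, 16).to(local_rank)
model = DDP(model, device_ids=[local_rank])

# From here on, every rank trains on its own shard of the data and DDP
# all-reduces the gradients during the backward pass.
```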
#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=56 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
export SINGULARITY_BIND=/var/spool/slurmd,/opt/cray,/usr/lib64/libcxi.so.1,/usr/lib64/libjansson.so.4,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Set environment variables for communication between nodes:
export MASTER_PORT=24998
export MASTER_ADDR=$(scontrol show hostnames ${SLURM_JOB_NODELIST} | head -n 1)
# Tell RCCL to use only Slingshot interfaces and GPU RDMA
export NCCL_SOCKET_IFNAME=hsn0,hsn1,hsn2,hsn3
export NCCL_NET_GDR_LEVEL=PHB
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
# time srun singularity exec $CONTAINER torchrun --nproc_per_node 8 mistral7b_train.py
time srun singularity exec $CONTAINER torchrun \
--nnodes=$SLURM_JOB_NUM_NODES \
--nproc_per_node=8 \
--rdzv_id=$SLURM_JOB_ID \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
--rdzv_backend=c10d \
mistral7b_train.py
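With several nodes, each per-node torchrun instance contacts the c10d rendezvous at MASTER_ADDR:MASTER_PORT and together they form a single job (2 nodes × 8 GPUs = 16 ranks here), while NCCL_SOCKET_IFNAME and NCCL_NET_GDR_LEVEL steer RCCL onto the Slingshot interfaces with GPU RDMA. A quick, hypothetical way (not part of mistral7b_train.py) to verify that the rendezvous assembled all ranks before spending GPU hours:

```python
import os
import socket
import torch.distributed as dist

# Run early in the training script: every rank reports its placement, then all
# ranks synchronise so a missing worker shows up as a hang instead of a crash later.
dist.init_process_group(backend="nccl")
print(f"rank {dist.get_rank()}/{dist.get_world_size()} "
      f"(local_rank {os.environ.get('LOCAL_RANK')}) on {socket.gethostname()}",
      flush=True)
dist.barrier()  # every rank must reach this point before anyone continues
if dist.get_rank() == 0:
    print("All ranks connected.", flush=True)
```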
#!/bin/bash
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=4 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=28 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
time srun singularity exec $CONTAINER torchrun --nproc_per_node 4 mistral7b_train.py
#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=56 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
export SINGULARITY_BIND=/var/spool/slurmd,/opt/cray,/usr/lib64/libcxi.so.1,/usr/lib64/libjansson.so.4,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Set environment variables for communication between nodes:
export MASTER_PORT=24998
export MASTER_ADDR=$(scontrol show hostnames ${SLURM_JOB_NODELIST} | head -n 1)
# Tell RCCL to use only Slingshot interfaces and GPU RDMA
export NCCL_SOCKET_IFNAME=hsn0,hsn1,hsn2,hsn3
export NCCL_NET_GDR_LEVEL=PHB
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
# time srun singularity exec $CONTAINER torchrun --nproc_per_node 8 mistral7b_train.py
time srun singularity exec $CONTAINER torchrun \
--nnodes=$SLURM_JOB_NUM_NODES \
--nproc_per_node=8 \
--rdzv_id=$SLURM_JOB_ID \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
--rdzv_backend=c10d \
mistral7b_train.py
#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=56 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
time srun singularity exec $CONTAINER torchrun --nproc_per_node 8 mistral7b_train.py
#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=7 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
export SINGULARITY_BIND=/var/spool/slurmd,/opt/cray,/usr/lib64/libcxi.so.1,/usr/lib64/libjansson.so.4,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Set environment variables for communication between nodes:
export MASTER_PORT=24998
export MASTER_ADDR=$(scontrol show hostnames ${SLURM_JOB_NODELIST} | head -n 1)
# Tell RCCL to use only Slingshot interfaces and GPU RDMA
export NCCL_SOCKET_IFNAME=hsn0,hsn1,hsn2,hsn3
export NCCL_NET_GDR_LEVEL=PHB
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
# time srun singularity exec $CONTAINER torchrun --nproc_per_node 8 mistral7b_train.py
time srun singularity exec $CONTAINER torchrun \
--nnodes=$SLURM_JOB_NUM_NODES \
--nproc_per_node=1 \
--rdzv_id=$SLURM_JOB_ID \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
--rdzv_backend=c10d \
mistral7b_train.py
#!/bin/bash
#SBATCH --partition=standard-g
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8 # 1-8, but recommended to use multiples of 2, as each MI250X contains 2 compute dies
#SBATCH --mem-per-gpu=60G
#SBATCH --cpus-per-task=56 # 7 * number of GPUs
#SBATCH --time=1:00:00
# Include commands in output:
set -x
# Print current time and date:
date
# Print host name:
hostname
# Find container in top level directory of git repository:
CONTAINER=$(git rev-parse --show-toplevel)/lumi_container.sif
# Tell singularity to bind all relevant paths to container:
export SINGULARITY_BIND=/pfs,/scratch,/projappl,/project,/flash,/appl,$SINGULARITY_BIND
export SINGULARITY_BIND=/var/spool/slurmd,/opt/cray,/usr/lib64/libcxi.so.1,/usr/lib64/libjansson.so.4,$SINGULARITY_BIND
# List available GPUs:
rocm-smi
# Set environment variables for communication between nodes:
export MASTER_PORT=24998
export MASTER_ADDR=$(scontrol show hostnames ${SLURM_JOB_NODELIST} | head -n 1)
# Tell RCCL to use only Slingshot interfaces and GPU RDMA
export NCCL_SOCKET_IFNAME=hsn0,hsn1,hsn2,hsn3
export NCCL_NET_GDR_LEVEL=PHB
# Run AI scripts:
# time srun singularity exec $CONTAINER python mistral7b_train.py
# time srun singularity exec $CONTAINER python mistral7b_test.py
# time srun singularity exec $CONTAINER torchrun --nproc_per_node 8 mistral7b_train.py
time srun singularity exec $CONTAINER torchrun \
--nnodes=$SLURM_JOB_NUM_NODES \
--nproc_per_node=8 \
--rdzv_id=$SLURM_JOB_ID \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
--rdzv_backend=c10d \
mistral7b_train.py