+ conda run -n finetuning --no-capture-output python mistral7b_train.py
Unsloth: Will load unsloth/mistral-7b-instruct-v0.3-bnb-4bit as a legacy tokenizer.
Using the latest cached version of the dataset since medmcqa couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /leonardo/home/userexternal/mpfister/.cache/huggingface/datasets/medmcqa/default/0.0.0/91c6572c454088bf71b679ad90aa8dffcd0d5868 (last modified on Thu Aug 29 19:38:14 2024).
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))== Unsloth 2024.8: Fast Mistral patching. Transformers = 4.43.4.
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
"-____-" Number of trainable parameters = 41,943,040
trainable params: 41,943,040 || all params: 7,289,966,592 || trainable%: 0.5754
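The trainable-parameter count is worth a quick cross-check: 41,943,040 is exactly what LoRA adapters of rank 16 on all seven projection matrices of Mistral-7B add. The rank is not printed in the log, so r=16 here is an inference from the number itself; the layer dimensions are from the Mistral-7B-v0.3 config (hidden size 4096, KV dimension 1024, FFN size 14336, 32 layers):

```python
# Cross-check of the logged parameter counts. r=16 is an inference from
# the printed total; the dimensions are from the Mistral-7B-v0.3 config.
hidden, kv, ffn, layers, r = 4096, 1024, 14336, 32, 16

# A LoRA adapter on a (d_out x d_in) linear layer adds r * (d_in + d_out) weights.
per_layer = r * sum([
    hidden + hidden,  # q_proj
    hidden + kv,      # k_proj
    hidden + kv,      # v_proj
    hidden + hidden,  # o_proj
    hidden + ffn,     # gate_proj
    hidden + ffn,     # up_proj
    ffn + hidden,     # down_proj
])

trainable = per_layer * layers
print(trainable)                                 # 41943040
print(f"{100 * trainable / 7_289_966_592:.4f}")  # 0.5754, as logged
```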
100%|██████████| 100/100 [00:53<00:00, 1.86it/s]
/leonardo/home/userexternal/mpfister/.conda/envs/finetuning/lib/python3.11/site-packages/peft/utils/other.py:619: UserWarning: Unable to fetch remote file due to the following error (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /unsloth/mistral-7b-instruct-v0.3-bnb-4bit/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x150ea3ee3d10>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: f7b42e8c-3e90-4492-a1cc-1fbf976543d7)') - silently ignoring the lookup for the file config.json in unsloth/mistral-7b-instruct-v0.3-bnb-4bit.
warnings.warn(
/leonardo/home/userexternal/mpfister/.conda/envs/finetuning/lib/python3.11/site-packages/peft/utils/save_and_load.py:218: UserWarning: Could not find a config file in unsloth/mistral-7b-instruct-v0.3-bnb-4bit - will assume that the vocabulary was not modified.
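The two UserWarnings above are harmless: the compute node has no outbound internet access ([Errno 101] Network is unreachable), so PEFT cannot re-fetch config.json from the Hub and skips the check, falling back to the local cache. The lookups can be suppressed entirely by putting the Hugging Face libraries into offline mode before they are imported; HF_HUB_OFFLINE and HF_DATASETS_OFFLINE are the standard switches, shown here as a hypothetical addition to the top of the scripts:

```python
# Hypothetical addition to the top of mistral7b_train.py / mistral7b_test.py:
# force the Hugging Face libraries into offline mode so they read the local
# cache instead of attempting (and failing) to reach huggingface.co.
# These must be set before transformers/datasets/peft are imported.
import os

os.environ["HF_HUB_OFFLINE"] = "1"       # Hub lookups (models, configs)
os.environ["HF_DATASETS_OFFLINE"] = "1"  # dataset lookups (medmcqa)
```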
Training speed: 14.9 samples/s (=14.9 samples/s/GPU)
Memory occupied on GPUs: 7.0 GB.
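Both of these lines are printed by the script itself. The throughput is consistent with the progress bar: 1.86 it/s times an effective batch size of 8 gives about 14.9 samples/s (the batch size is not shown in the log, so 8 is an inference). The memory figure could be collected with NVIDIA's NVML bindings; the following is a sketch, not necessarily the helper the script actually uses:

```python
# Sketch of a GPU-memory report like the line above; the actual helper in
# mistral7b_train.py may differ. Uses NVIDIA's NVML bindings
# (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
used = sum(
    pynvml.nvmlDeviceGetMemoryInfo(
        pynvml.nvmlDeviceGetHandleByIndex(i)
    ).used
    for i in range(pynvml.nvmlDeviceGetCount())
)
print(f"Memory occupied on GPUs: {used / 1024**3:.1f} GB.")
pynvml.nvmlShutdown()
```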
real 6m9.210s
user 3m22.330s
sys 0m20.955s
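The training script itself is not reproduced in this log. For orientation, here is a minimal sketch consistent with everything printed above (Unsloth 4-bit Mistral, LoRA on all seven projections, medmcqa, max_steps=100); the LoRA rank, batch size, prompt format, and output paths are assumptions chosen to match the logged numbers, not the actual contents of mistral7b_train.py:

```python
# Minimal sketch consistent with the log above; not the actual
# mistral7b_train.py. Rank, batch size, prompt format, and output
# paths are assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",  # 4-bit model from the log
    max_seq_length=2048,
)

# r=16 on all seven projections reproduces the 41,943,040 trainable parameters.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("medmcqa", split="train")

def to_text(example):
    # Assumed prompt format: question, lettered options, correct letter last.
    letters = ["A", "B", "C", "D"]
    options = [example["opa"], example["opb"], example["opc"], example["opd"]]
    lines = [example["question"]]
    lines += [f"{letter}. {option}" for letter, option in zip(letters, options)]
    lines += [f"Answer: {letters[example['cop']]}"]
    return {"text": "\n".join(lines)}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=8,  # would give 1.86 it/s = 14.9 samples/s
        max_steps=100,                  # triggers the num_train_epochs warning
        logging_steps=10,
    ),
)
trainer.train()
model.save_pretrained("mistral7b_finetuned")  # hypothetical adapter path
```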
+ conda run -n finetuning --no-capture-output python mistral7b_test.py
Unsloth: Will load unsloth/mistral-7b-instruct-v0.3-bnb-4bit as a legacy tokenizer.
Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Using the latest cached version of the dataset since medmcqa couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /leonardo/home/userexternal/mpfister/.cache/huggingface/datasets/medmcqa/default/0.0.0/91c6572c454088bf71b679ad90aa8dffcd0d5868 (last modified on Tue Sep 10 17:39:08 2024).
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))== Unsloth 2024.8: Fast Mistral patching. Transformers = 4.43.4.