Commit e636022a authored by Pfister, Martin

Add Gradio example for VSC with GPU attached to jupyterlab

parent 9619f9e5
%% Cell type:markdown id:fc68bcd6-bca8-4376-b231-0c8c84c532e4 tags:
## Gradio
%% Cell type:markdown id:c96197c4-9bdf-4895-a209-f97a30660b66 tags:
[Gradio](https://www.gradio.app) lets you build simple web interfaces for your software. In this example, we use Gradio to provide a simple chat interface to a large language model.
%% Cell type:code id:18b34bbb-3946-4e31-8c51-c50f66d327bd tags:
``` python
# Import necessary libraries
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
import gradio as gr
import os
import random
```
%% Cell type:code id:744de243-d842-4fd8-a2d9-e0ebfb85f91e tags:
``` python
# Use a random TCP port:
port = random.randint(10000, 50000)
# Get username
username = os.environ['USER']
# Construct URL:
relative_url = f'/user/{username}/proxy/absolute/{port}/' # Needs to start with '/'
absolute_url = f'https://jupyterhub.vsc.ac.at{relative_url}'
```
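A randomly chosen port may already be in use by another process on the same node. The following is a minimal sketch, assuming only the Python standard library, that tries candidate ports before one is handed to Gradio. The helper name `pick_free_port` and its parameters are hypothetical and not part of the notebook. The check is best-effort: another process could still claim the port between this test and the server start.

```python
import random
import socket


def pick_free_port(low=10000, high=50000, attempts=20):
    """Return a port in [low, high] that could be bound on 127.0.0.1 just now."""
    for _ in range(attempts):
        candidate = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(('127.0.0.1', candidate))
            except OSError:
                continue  # Port is in use (or not permitted); try another one
            return candidate
    raise RuntimeError(f'No free port found after {attempts} attempts')


# Hypothetical drop-in replacement for random.randint(10000, 50000)
port = pick_free_port()
```

The range matches the one used in the notebook, so the proxy URL is built the same way as before.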
%% Cell type:code id:114aa1d0-c7c2-49f5-878b-de7f76a6eb9e tags:
``` python
# Load tokenizer and model and create a pipeline that can be used for inference:
model_name = '/gpfs/data/fs70824/LLMs_models_datasets/models/microsoft--phi-3.5-mini-instruct'
# model_name = 'microsoft/Phi-3.5-mini-instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)
```
%% Output
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Device set to use cpu
%% Cell type:code id:96fefd3b-f679-4b3a-b0de-f519c24249bd tags:
``` python
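# Prepare a function that takes a chatbot question and returns the answer from the LLM.
# Gradio passes the previous turns as `history`. Copy it instead of appending in place,
# and avoid a mutable default argument, which would be shared across calls.
def get_answer(question, history=None):
    messages = list(history or []) + [{'role': 'user', 'content': question}]
    result = pipe(messages, max_new_tokens=500, return_full_text=False)
    return result[0]['generated_text'].strip()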
```
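The history handling can be tried out without loading a model. This is a minimal sketch, assuming the history is a list of `{'role': ..., 'content': ...}` dictionaries as in the messages format used above. The helper name `build_messages` is hypothetical. It shows that the conversation passed to the pipeline is a fresh list, so the caller's history is left unchanged.

```python
def build_messages(question, history=None):
    """Return a new message list: prior turns followed by the new user turn."""
    messages = list(history or [])  # Copy so the caller's list is not modified
    messages.append({'role': 'user', 'content': question})
    return messages


history = [
    {'role': 'user', 'content': 'Hi'},
    {'role': 'assistant', 'content': 'Hello!'},
]
messages = build_messages('What is VSC?', history)
print(len(history), len(messages))  # prints "2 3"
```

Calling the helper with no history yields a single-turn conversation, which is what the first chat message looks like.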
%% Cell type:code id:53ae1640-5515-45d1-a8ff-494d72662c62 tags:
``` python
# Create a Gradio ChatInterface and launch it:
chat_interface = gr.ChatInterface(get_answer, type='messages')
chat_interface.launch(
    share=False,
    inline=False,
    server_name='127.0.0.1',
    server_port=port,
    root_path=f'/user/{username}/proxy/absolute/{port}',
)
print(f'\nOpen the following URL in your webbrowser:\n{absolute_url}')
```
%% Output
which: no node in (/opt/sw/jupyterhub/envs/conda/vsc5/jupyterhub-llm-training-v4/bin:/opt/sw/conda/miniconda3-24.1.2/condabin:/opt/sw/cuda-zen/spack-0.19.0/bin:/home/fs71550/mpfister/.local/bin:/home/fs71550/mpfister/bin:/usr/share/Modules/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/lpp/mmfs/bin:/opt/sw/slurm/x86_64/alma8.8/22-05-2-1/bin:/opt/sw/slurm/x86_64/alma8.8/22-05-2-1/sbin:/opt/sw/vsc_modules/modules-4.2.2/bin:/opt/sw/vsc4/VSC/x86_64/generic/bin:/opt/sw/tools:/opt/sw/conda/miniconda3/condabin:/opt/sw/conda/miniconda3/bin)
* Running on local URL: http://127.0.0.1:39953
To create a public link, set `share=True` in `launch()`.
Open the following URL in your webbrowser:
https://jupyterhub.vsc.ac.at/user/mpfister/proxy/absolute/39953/
%% Cell type:code id:2b049395-b496-4412-abb7-1c5a7f592cce tags:
``` python
```