Commit 93db53cf authored by Pfister, Martin

Remove duplicate D3_Gradio... file

parent b653ea3a
%% Cell type:markdown id:fc68bcd6-bca8-4376-b231-0c8c84c532e4 tags:
## Gradio
%% Cell type:markdown id:c96197c4-9bdf-4895-a209-f97a30660b66 tags:
[Gradio](https://www.gradio.app) lets you build simple web interfaces for your software. In this example, we use Gradio to provide a simple chat interface to a large language model.
%% Cell type:code id:18b34bbb-3946-4e31-8c51-c50f66d327bd tags:
``` python
# Import necessary libraries
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
import gradio as gr
import os
import random
```
%% Cell type:code id:744de243-d842-4fd8-a2d9-e0ebfb85f91e tags:
``` python
# Use a random TCP port:
port = random.randint(10000, 50000)
# Get username
username = os.environ['USER']
# Construct URL:
relative_url = f'/user/{username}/proxy/absolute/{port}/' # Needs to start with '/'
absolute_url = f'https://jupyterhub.vsc.ac.at{relative_url}'
```
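%% Cell type:markdown tags:
The cell above picks a port at random and builds the proxy URL from it. As a minimal sketch, the two steps can be wrapped in helper functions. The names `pick_free_port` and `build_proxy_urls` are hypothetical and not part of the original notebook. The port check only tests whether the port can be bound at that moment, so another process could still claim it before Gradio starts. The example username and port are taken from the output shown further down.
%% Cell type:code tags:
```python
import random
import socket


def pick_free_port(low=10000, high=50000, attempts=20):
    """Return a random port in [low, high] that could be bound on 127.0.0.1."""
    for _ in range(attempts):
        candidate = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(('127.0.0.1', candidate))
            except OSError:
                continue  # Port is in use (or not permitted); try another one
            return candidate
    raise RuntimeError('No free port found')


def build_proxy_urls(username, port, hub='https://jupyterhub.vsc.ac.at'):
    """Build the relative and absolute proxy URLs in the same way as the cell above."""
    relative = f'/user/{username}/proxy/absolute/{port}/'  # Needs to start with '/'
    return relative, f'{hub}{relative}'


# Example values only; they do not overwrite the variables defined above
example_relative, example_absolute = build_proxy_urls('mpfister', 39953)
print(example_absolute)
```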
%% Cell type:code id:114aa1d0-c7c2-49f5-878b-de7f76a6eb9e tags:
``` python
# Load tokenizer and model and create a pipeline that can be used for inference:
model_name = '/gpfs/data/fs70824/LLMs_models_datasets/models/microsoft--phi-3.5-mini-instruct'
# model_name = 'microsoft/Phi-3.5-mini-instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)
```
%% Output
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Device set to use cpu
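%% Cell type:markdown tags:
To get a feeling for what 4-bit loading saves, the following back-of-the-envelope estimate compares the memory needed for the weights alone. It is a rough sketch: the parameter count of about 3.8 billion for Phi-3.5-mini is used as an assumption, and the estimate ignores quantization constants, any layers that may stay unquantized, activations and the key-value cache. The helper `weight_memory_gb` is hypothetical.
%% Cell type:code tags:
```python
# Approximate parameter count of Phi-3.5-mini (assumption for this estimate)
n_params = 3.8e9


def weight_memory_gb(n_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


print(f'bfloat16: {weight_memory_gb(n_params, 16):.1f} GB')
print(f'4-bit:    {weight_memory_gb(n_params, 4):.1f} GB')
```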
%% Cell type:code id:96fefd3b-f679-4b3a-b0de-f519c24249bd tags:
``` python
# Prepare a function that takes chatbot questions and returns the answer from the LLM:
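def get_answer(question, history=None):
    # Copy the history instead of appending to it in place; a mutable default
    # argument (history=[]) would be shared and grow across calls:
    messages = list(history or [])
    messages.append(
        {'role': 'user', 'content': question}
    )
    result = pipe(messages, max_new_tokens=500, return_full_text=False)
    return result[0]['generated_text'].strip()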
```
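%% Cell type:markdown tags:
To try the message handling without loading a model, the sketch below swaps the pipeline for a hypothetical stand-in. `fake_pipe` and `get_answer_sketch` are illustrative names and not part of transformers, Gradio or the original notebook. The sketch assumes the history arrives as a list of dictionaries with `role` and `content` keys, keeps only those two keys, and leaves the caller's list untouched.
%% Cell type:code tags:
```python
def fake_pipe(messages, **kwargs):
    """Hypothetical stand-in for the text-generation pipeline (not a transformers API)."""
    return [{'generated_text': f' You sent {len(messages)} message(s). '}]


def get_answer_sketch(question, history=None, pipe_fn=fake_pipe):
    # Build a new list with only the keys a chat template needs,
    # so the caller's history is never modified
    messages = [{'role': m['role'], 'content': m['content']} for m in (history or [])]
    messages.append({'role': 'user', 'content': question})
    result = pipe_fn(messages, max_new_tokens=500, return_full_text=False)
    return result[0]['generated_text'].strip()


example_history = [
    {'role': 'user', 'content': 'Hi'},
    {'role': 'assistant', 'content': 'Hello!'},
]
print(get_answer_sketch('How are you?', example_history))
print(len(example_history))  # The original history still has 2 entries
```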
%% Cell type:code id:53ae1640-5515-45d1-a8ff-494d72662c62 tags:
``` python
# Create a Gradio ChatInterface and launch it:
chat_interface = gr.ChatInterface(get_answer, type='messages')
chat_interface.launch(
    share=False,
    inline=False,
    server_name='127.0.0.1',
    server_port=port,
    root_path=f'/user/{username}/proxy/absolute/{port}'
)
print(f'\nOpen the following URL in your web browser:\n{absolute_url}')
```
%% Output
which: no node in (/opt/sw/jupyterhub/envs/conda/vsc5/jupyterhub-llm-training-v4/bin:/opt/sw/conda/miniconda3-24.1.2/condabin:/opt/sw/cuda-zen/spack-0.19.0/bin:/home/fs71550/mpfister/.local/bin:/home/fs71550/mpfister/bin:/usr/share/Modules/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/lpp/mmfs/bin:/opt/sw/slurm/x86_64/alma8.8/22-05-2-1/bin:/opt/sw/slurm/x86_64/alma8.8/22-05-2-1/sbin:/opt/sw/vsc_modules/modules-4.2.2/bin:/opt/sw/vsc4/VSC/x86_64/generic/bin:/opt/sw/tools:/opt/sw/conda/miniconda3/condabin:/opt/sw/conda/miniconda3/bin)
* Running on local URL: http://127.0.0.1:39953
To create a public link, set `share=True` in `launch()`.
Open the following URL in your web browser:
https://jupyterhub.vsc.ac.at/user/mpfister/proxy/absolute/39953/
%% Cell type:code id:2b049395-b496-4412-abb7-1c5a7f592cce tags:
``` python
```