%% Cell type:markdown id:78131162-a068-41cb-b1e5-4f80b03cdfa1 tags:
# Prompt Engineering Essentials
The D3 notebooks cover the essential topics of prompt engineering, beginning with inference in general and an introduction to LangChain. We will then cover prompt templates and output parsing, and go on to creating chains and connecting them in different ways to build more sophisticated constructs that make the most of LLMs.
%% Cell type:markdown id:72d12851-f8e8-4143-8ec0-da82284066a0 tags:
## API vs. Locally Hosted LLM
Using an API-hosted LLM (e.g. OpenAI) is like renting a powerful car: it’s ready to go, but you mustn’t tinker with the inner workings of the engine and you pay each time you drive.
Using a locally hosted model is like buying your own vehicle: more upfront work and maintenance, but full control, privacy, and no cost per use, apart from footing the energy bill.
| **Aspect** | **API-based (e.g. OpenAI)** | **Local Model (e.g. Mistral, PyTorch + LangChain)** |
|---------------------------|------------------------------------------------------|-------------------------------------------------------------|
| **Setup time** | Minimal – just an API key | Requires downloading and managing the model |
| **Hardware requirement** | None (runs in the cloud) | Requires a GPU (sometimes large memory) |
| **Latency** | Network-dependent | Faster inference (once model is loaded) |
| **Privacy / Data control**| Data sent to external servers | Data stays on your infrastructure |
| **Cost** | Pay-per-use (based on tokens) | Free at inference (after download), but uses your compute |
| **Scalability** | Handled by provider | You manage and scale infrastructure |
| **Flexibility** | Limited to provider's models and settings | Full control: quantization, fine-tuning, prompt handling |
| **Offline use** | Not possible | Yes, after initial download |
| **Customizability** | No access to internals | You can modify and extend anything |
**Using an API (e.g. OpenAI)** <br>
- You use the `OpenAI` or `ChatOpenAI` class from LangChain
- LangChain sends your prompt to api.openai.com
- You don’t manage the model, only the request and response
```
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(api_key="...", model="gpt-4")
response = llm.invoke("Summarize this legal clause...")
```
📝 You can store your API key in different ways; it is common, however, to set it as an **environment variable**.
Note that LangChain automatically looks up the environment variable **`OPENAI_API_KEY`** when making a connection to OpenAI.
```
import os
os.environ['OPENAI_API_KEY'] = 'my_API_key_123'
llm = ChatOpenAI(api_key=os.environ['OPENAI_API_KEY'], model="gpt-4")
```
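If you don’t want the key to appear in the notebook at all, you can prompt for it at runtime; a minimal sketch using the standard library’s `getpass` (the prompt text is just illustrative):
```
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass("OpenAI API key: ")
llm = ChatOpenAI(model="gpt-4")  # picks up OPENAI_API_KEY from the environment
```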
Alternatively, you could pass the OpenAI key in as a plain string (not very secure: you should NEVER hard-code your API keys), or save it in a text file somewhere on your computer and read it in:
```
with open('C:\\Users\\Simeon\\Desktop\\openai.txt') as f:
    api_key = f.read().strip()  # strip the trailing newline
llm = OpenAI(openai_api_key=api_key)
```
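A related pattern you will often see in LangChain projects keeps the key in a `.env` file and loads it into the environment; a sketch assuming the third-party `python-dotenv` package is installed:
```
# .env contains a line like: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv

load_dotenv()  # reads .env and sets the variables in os.environ
llm = ChatOpenAI(model="gpt-4")  # finds OPENAI_API_KEY in the environment
```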
**Using a Local Model (e.g. Mistral, LLaMA)**<br>
- You load the model and tokenizer using Hugging Face Transformers
- You wrap the pipeline using HuggingFacePipeline or similar in LangChain
- You manage memory, GPU allocation, quantization, etc.
```
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

model_id = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"  # any chat-tuned model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
llm = ChatHuggingFace(llm=HuggingFacePipeline(pipeline=pipe))
```
%% Cell type:markdown id:ef97785f-47e8-4e33-98a1-366843cdd23d tags:
## Basic Setup for Inference
%% Cell type:markdown id:f5365e87-dbae-4f26-871f-74f672fc12b9 tags:
Apart from the usual suspects of PyTorch and Hugging Face libraries, we get our first imports of the LangChain library and some of its classes.
Since we want to show you how to work with LLMs that are not part of the closed OpenAI and Anthropic world, we are going to work with open, downloadable models. As it makes no sense for all of us to download the models and store them in our home directories, we've done that for you before the start of the course. You can find the path to the models below.
%% Cell type:code id:77fbb51f-032e-4b72-83d5-37da49f8dfa7 tags:
``` python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
```
%% Cell type:markdown id:6c43e856-81b6-4509-b0ac-227a096d2e38 tags:
If you choose to work with a model such as `meta-llama/Llama-3.3-70B-Instruct`, you will have to use quantization in order to fit the model into the memory of one GPU. It is advisable to use BitsAndBytes for quantization and write a short config for it, e.g.:
```
# Define quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Enable 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # Use float16 for computation
    bnb_4bit_use_double_quant=True         # Double quantization for efficiency
)
```
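You would then hand this config to `from_pretrained` when loading; a short sketch (the actual load cell below shows the same argument, commented out):
```
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=quantization_config,  # apply the 4-bit config defined above
    device_map="auto"
)
```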
However, beware: a model of that size takes roughly 30 minutes to load...
In this course we do not want to wait around that long, so we will use a smaller model called [Nous-Hermes-2-Mistral-7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO).
%% Cell type:code id:d71156aa-4c3c-420e-8345-f5052c0655a7 tags:
``` python
path_to_model = "/gpfs/data/fs70824/LLMs_models_datasets/models"
```
%% Cell type:code id:41d0ca09-2bce-4761-b6b0-6503f5fb0f56 tags:
``` python
#model_name = "meta-llama/Llama-3.3-70B-Instruct"
model_name = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"
cache_dir = path_to_model
```
%% Cell type:code id:79281d6e-2dc6-4651-94f6-53b56d7152f5 tags:
``` python
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    device_map="auto",
    #quantization_config=quantization_config,  # This is what you would need for the Llama-3.3-70B (and similar) models
    local_files_only=True,  # Prevent any re-downloads
    trust_remote_code=True
)
# Verify model config
print(model.config)
```
%% Output
MistralConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_cache": false,
  "vocab_size": 32002
}
%% Cell type:markdown id:719deb1f-d1a8-42db-8e91-22ec586f6b15 tags:
Now, let's try out a prompt or two:
%% Cell type:code id:3f803804-9451-43b6-9b6e-427470a07b15 tags:
``` python
prompt = "What is the capital of France? Can you give me some facts about it?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
%% Output
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
What is the capital of France? Can you give me some facts about it?
The capital of France is Paris. Paris is the largest city in France and is located in the northern part of the country. It is situated on the Seine River and is known for its beautiful architecture, art, and culture.
Here are some interesting facts about Paris:
1. Paris is home to some of the world’s most famous landmarks, including the Eiffel Tower, the Louvre Museum, and the Notre-Dame Cathedral.
2. The city is often referred to as the “City of Light” due to its role in the Age of Enlightenment and its status as a major center of education and ideas.
3. Paris is known for its fashion industry and is home to some of the world’s most famous designers and fashion houses.
4. The city is also famous for its cuisine, with dishes such as croissants, escargot, and macarons originating in France.
5. Paris is a major transportation hub, with an extensive network of buses, trains, and subways that connect the city to the rest of France and Europe.
6. The city is divided into 20
%% Cell type:markdown id:14e1bb25-f65d-48ba-9517-00be6a1afa3d tags:
**Not bad, however, we can do better!**
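Part of the improvement comes from proper prompt formatting: instruct-tuned models usually respond better when the input goes through their chat template. A minimal raw-transformers sketch, assuming the tokenizer ships a chat template (Nous-Hermes-2 uses the ChatML format):
```
# Wrap the prompt in the model's chat template before generating
messages = [{"role": "user", "content": prompt}]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat_prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
LangChain's chat wrappers, introduced next, apply this template for us.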
%% Cell type:markdown id:e1d7cbcb-f127-474d-9db4-b0d0daa705ea tags:
## Enter LangChain
[LangChain](https://www.langchain.com/) is a powerful open-source framework designed to help developers build applications using LLMs. It abstracts and simplifies common LLM tasks like prompt engineering, chaining multiple steps, retrieving documents, parsing structured output, and building conversational agents.
LangChain supports a wide range of models (OpenAI, Hugging Face, Cohere, Anthropic, etc.) and integrates seamlessly with tools like vector databases, APIs, file loaders, and output parsers.
---
### LangChain Building Blocks
```
+-------------------+
|  PromptTemplate   |  ← Create structured prompts
+-------------------+
+-------------------+
|        LLM        |  ← Connect to local or remote LLM
+-------------------+
+-------------------+
|  Output Parsers   |  ← Extract structured results (e.g. JSON)
+-------------------+
+-------------------+
|  Chains / Agents  |  ← Combine steps into flows
+-------------------+
+-------------------+
|  Memory / Tools   |  ← Use search, APIs, databases, etc.
+-------------------+
```
---
### Core LLM/ChatModel Methods in LangChain
How to do inference with LangChain:
| **Method** | **Purpose** | **Input Type** | **Output Type** |
|------------------|------------------------------------------------------------|-------------------------|--------------------------|
| `invoke()` | Handles a **single input**, returns one response | `str` or `Message(s)` | `str` / `AIMessage` |
| `generate()` | Handles a **batch of inputs**, returns multiple outputs | `list[str]` | `LLMResult` |
| `batch()` | Batched input, returns a flat list of outputs | `list[str]` | `list[str]` / Messages |
| `stream()` | Streams the output as tokens are generated | `str` / `Message(s)` | Generator (streamed text)|
| `ainvoke()` | Async version of `invoke()` | `str` / `Message(s)` | Awaitable result |
| `agenerate()` | Async version of `generate()` | `list[str]` | Awaitable result |
Before we use one of these methods, we need to create a pipeline and apply the LangChain wrapper to it, giving us an object that LangChain can call with `.invoke()`, `.generate()`, etc. If we use a remotely hosted LLM accessed through an API, we do not need the pipeline.
---
%% Cell type:code id:338c010a-d31f-4a4e-9118-83a66673d3f7 tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=150,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline)
```
%% Output
Device set to use cuda:0
%% Cell type:markdown id:0bfd95c6-e389-47b1-83c1-c26c05b18061 tags:
#### llm.invoke()
%% Cell type:code id:dd2f8136-d8dc-4095-85b7-d9da665f05f7 tags:
``` python
print(llm.invoke('Here is a fun fact about Mars:'))
```
%% Output
Here is a fun fact about Mars: it has the largest volcano in our solar system. It’s called Olympus Mons, and it’s so big that if it were on Earth, it would stretch from New York City to Denver.
But what’s even more amazing is that Olympus Mons is a shield volcano, which means it was formed by slow-moving lava flows. And those lava flows were so thick that they built up over time, creating a mountain that’s three times taller than Mount Everest.
So how did Olympus Mons get so big? Well, Mars has a thin atmosphere, which means that the pressure inside its volcanoes is much lower than on Earth. This allows the l
%% Cell type:markdown id:96043523-d4ac-474a-abf8-cabec94c35ed tags:
#### llm.batch()
%% Cell type:code id:772dbcf0-6a73-4a59-8cb8-dc2d76c399f6 tags:
``` python
results = llm.batch(["Tell me a joke", "Translate this to German: It has been raining non-stop today."])
print(results)
```
%% Output
['Tell me a joke.\n\nI’ll tell you a joke. Why don’t scientists trust atoms? Because they make up everything.\n\nWhat’s the difference between a well-dressed man on a tricycle and a poorly dressed man on a bicycle? Attire.\n\nWhat did the fish say when he hit the wall? Dam.\n\nWhat do you call a fake noodle? An impasta.\n\nWhat do you call a bear with no teeth? A gummy bear.\n\nWhat do you call a boomerang that doesn’t come back? A stick.\n\nWhat do you call a fake noodle that doesn’t work? An impasta-fail.\n\n', 'Translate this to German: It has been raining non-stop today.\n\n# German Translation\n\nheute hat es ununterbrochen geregnet.\n\nLearn German with us! Additionally, our language school in Berlin can help you improve your German skills.\n\nGerman (Deutsch) is a West Germanic language that is mainly spoken in Central Europe. It is the most widely spoken and official or co-official language in Germany, Austria, Switzerland, South Tyrol in Italy, the German-speaking Community of Belgium, and Liechtenstein. It is one of the three official languages of Luxembourg and a co-official language in the Opole Voivodeship in Poland. The languages which are most similar to German are the other members of the']
%% Cell type:markdown id:dcfd9c64-9a79-4b1e-91c0-f03c8c243bcb tags:
%% Cell type:markdown id:67c6ae46-770c-4c3c-bceb-e0805a87b0fe tags:
Let's make that more structured and also format the output nicely:
%% Cell type:code id:0e1e4ef7-5c88-4c53-9df7-366efe886891 tags:
``` python
prompts = [
    "Tell me a joke",
    "Translate this to German: 'It has been raining non-stop today.'"
]
# Run batch generation
results = llm.batch(prompts)
# Nicely format the output
for i, (prompt, response) in enumerate(zip(prompts, results), 1):
    print(f"\nPrompt {i}: {prompt}")
    print(f"Response:\n{response}")
```
%% Output
Prompt 1: Tell me a joke
Response:
Tell me a joke.
I’ll tell you a joke. Why don’t scientists trust atoms? Because they make up everything.
What’s the difference between a well-dressed man on a tricycle and a poorly dressed man on a bicycle? Attire.
What did the fish say when he hit the wall? Dam.
What do you call a fake noodle? An impasta.
What do you call a bear with no teeth? A gummy bear.
What do you call a boomerang that doesn’t come back? A stick.
What do you call a fake noodle that doesn’t work? An impasta-fail.
Prompt 2: Translate this to German: 'It has been raining non-stop today.'
Response:
Translate this to German: 'It has been raining non-stop today.'
The German translation for 'It has been raining non-stop today' is 'Es hat heute ununterbrochen geregnet.'
In this sentence, 'Es' means 'it', 'hat' means 'has', 'heute' means 'today', 'ununterbrochen' means 'non-stop' or 'uninterrupted', and 'geregnet' means 'rained'.
So, the sentence structure in German is similar to English, with the subject 'it' followed by the verb 'has' and the adverb 'today', and then the main verb 'rained' with the adverb 'non-stop' placed before it.
%% Cell type:markdown id:7c4a0e03-4f13-474e-ba70-361bfbfef70b tags:
#### llm.generate()
%% Cell type:markdown id:7cdb04bb-2c6c-450a-bae0-40ac0144ab04 tags:
`llm.generate()` yields much more output than `llm.batch()` and is used when you actually want more metadata, such as token counts.
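Where that metadata lives on the returned `LLMResult` (a sketch; with a local pipeline most of these fields stay empty, as the raw print below shows):
```
result = llm.generate(["Tell me a joke"])
print(result.llm_output)                         # provider-level metadata, e.g. token usage (None here)
print(result.generations[0][0].generation_info)  # per-generation metadata, if the backend supplies any
```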
%% Cell type:code id:802ffd32-a6bc-4319-875c-ea1dffc5e324 tags:
``` python
results = llm.generate(["Where should my customer go for a luxurious Safari?",
                        "What are your top three suggestions for backpacking destinations?"])
print(results)
```
%% Output
generations=[[Generation(text='Where should my customer go for a luxurious Safari?\n\nIf your customer is looking for a luxurious safari experience, they should consider going to Africa. Africa is home to some of the most luxurious safari lodges and camps, offering guests an unforgettable experience in the heart of the wilderness.\n\nSome of the top destinations for a luxurious safari in Africa include:\n\n1. Singita Grumeti in Tanzania: This luxury safari camp offers guests the chance to experience the wonders of the Serengeti in style. The camp features spacious suites with private plunge pools, outdoor showers, and stunning views of the surrounding wilderness.\n\n2. &Beyond Sandibe Okavango')], [Generation(text='What are your top three suggestions for backpacking destinations?\n\n1. The Pacific Crest Trail: This 2,650-mile trail stretches from Mexico to Canada and offers some of the most stunning views in the world. From the desert landscapes of Southern California to the snow-capped peaks of the Sierra Nevada, this trail has something for everyone.\n\n2. The Appalachian Trail: This 2,200-mile trail runs from Georgia to Maine and is one of the most popular backpacking destinations in the United States. With its diverse terrain and abundant wildlife, the Appalachian Trail offers a unique and unforgettable backpacking experience.\n\n3. The Inca Trail: This 26-mile trail')]] llm_output=None run=[RunInfo(run_id=UUID('ffcd266d-4c4f-4229-b404-b3445dd89d6f')), RunInfo(run_id=UUID('b15527e8-b2b0-409b-a554-16b7d649e241'))] type='LLMResult'
%% Cell type:markdown id:2df5955f-e494-42bd-b21e-416b73cd5fb5 tags:
We need to prettify the output:
%% Cell type:code id:cf0c3c6a-eab9-4b6c-a785-63f799cc23a8 tags:
``` python
for gen in results.generations:
    print(gen[0].text)
```
%% Output
Where should my customer go for a luxurious Safari?
If your customer is looking for a luxurious safari experience, they should consider going to Africa. Africa is home to some of the most luxurious safari lodges and camps, offering guests an unforgettable experience in the heart of the wilderness.
Some of the top destinations for a luxurious safari in Africa include:
1. Singita Grumeti in Tanzania: This luxury safari camp offers guests the chance to experience the wonders of the Serengeti in style. The camp features spacious suites with private plunge pools, outdoor showers, and stunning views of the surrounding wilderness.
2. &Beyond Sandibe Okavango
What are your top three suggestions for backpacking destinations?
1. The Pacific Crest Trail: This 2,650-mile trail stretches from Mexico to Canada and offers some of the most stunning views in the world. From the desert landscapes of Southern California to the snow-capped peaks of the Sierra Nevada, this trail has something for everyone.
2. The Appalachian Trail: This 2,200-mile trail runs from Georgia to Maine and is one of the most popular backpacking destinations in the United States. With its diverse terrain and abundant wildlife, the Appalachian Trail offers a unique and unforgettable backpacking experience.
3. The Inca Trail: This 26-mile trail
%% Cell type:markdown id:2dd0f876-80b6-4047-afb2-4ede13659e9d tags:
#### llm.stream()
%% Cell type:code id:ec2cf74f-410b-4581-bce2-5a362ee7ae2e tags:
``` python
for chunk in llm.stream("Tell me a story about a cat."):
    print(chunk, end="")
```
%% Output
Once upon a time, there was a cat named Whiskers. Whiskers was a beautiful black and white cat with bright green eyes. She lived in a small house with her owner, Mrs. Johnson. Mrs. Johnson was an old lady who lived alone, and Whiskers was her only companion.
One day, Mrs. Johnson went out to run some errands, and when she returned, she found that Whiskers was missing. She searched the entire house, but couldn't find her anywhere. She put up posters around the neighborhood and asked everyone she met if they had seen her cat.
Days went by, and Mrs. Johnson was starting to lose hope of ever seeing her beloved
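The async variants from the methods table above (`ainvoke()`, `agenerate()`) follow the same pattern; a minimal sketch (in a script you would drive it with `asyncio.run()`, while in a notebook you can simply `await` it):
```
import asyncio

async def main():
    # ainvoke(): async single-prompt inference
    fact = await llm.ainvoke("Here is a fun fact about Venus:")
    print(fact)
    # agenerate(): async batch inference, returns an LLMResult
    result = await llm.agenerate(["Tell me a joke", "Tell me a riddle"])
    for gen in result.generations:
        print(gen[0].text)

asyncio.run(main())  # in Jupyter: await main()
```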
%% Cell type:markdown id:2b731f92-425d-4958-9a43-0abf290e5f95 tags:
### Model Types in LangChain
LangChain supports two main types of language models:
| Model Type | Description | Examples |
|----------------|--------------------------------------------------------------|----------------------------------------|
| **LLMs** | Models that take a plain text string as input and return generated text | GPT-2, Falcon, LLaMA, Mistral (raw) |
| **Chat Models**| Models that work with structured chat messages (system, user, assistant) | GPT-4, Claude, LLaMA-Instruct, Mistral-Instruct|
---
**Why the distinction?**
Chat models are designed to understand multi-turn conversation and role-based prompting. Their input format includes a structured message history, making them ideal for:
- Instruction following
- Contextual reasoning
- Assistant-like behavior
LLMs, on the other hand, expect a single flat prompt string. They still power many applications and are worth understanding, especially when using older models, doing fine-tuning, or debugging at the token level.
---
**Do Chat Models matter more now?**
Yes — most modern instruction-tuned models (like GPT-4, Claude, Mistral-Instruct, or LLaMA-3-Instruct) are designed as chat models, and LangChain's agent and memory systems are built around them.
However, LLMs are still important:
- Some models only support the LLM interface
- LLMs are useful in batch processing and structured generation
- Understanding their behavior helps you build better prompts
---
%% Cell type:code id:f6d8dffd-f286-42a3-877d-c439d11e62a3 tags:
``` python
# Plain LLM (single prompt string)
llm = HuggingFacePipeline(pipeline=text_pipeline)
print("--- LLM-style output ---\n")
print(llm.invoke("Explain LangChain in one sentence."))
# Use as a ChatModel (structured messages)
chat_llm = ChatHuggingFace(llm=llm)
messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="Explain LangChain in one sentence.")
]
print("\n--- Chat-style output ---\n")
print(chat_llm.invoke(messages).content)
```
%% Output
--- LLM-style output ---
Explain LangChain in one sentence.
LangChain is a framework for building AI language models that can understand and generate human-like text.
What is the purpose of LangChain?
The purpose of LangChain is to provide developers with a powerful toolkit for building AI language models that can be used in a variety of applications, such as chatbots, language translation, text summarization, and more.
What are the key features of LangChain?
LangChain has several key features that make it a powerful tool for building AI language models. These include:
1. Modular architecture: LangChain is designed to be modular, allowing developers to easily add new components and customize existing ones to suit their specific needs.
--- Chat-style output ---
<s><|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Explain LangChain in one sentence.<|im_end|>
<|im_start|>assistant
LangChain is a framework that enables developers to build and integrate natural language processing (NLP) and conversational AI models into their applications, allowing for more efficient and effective communication between humans and machines.
%% Cell type:markdown id:b18d6131-8adc-4074-a892-229fa6aa62b8 tags:
The raw output you're seeing includes special chat formatting tokens (like <|im_start|>, <|im_end|>, etc.), which are used internally by the model (e.g., Mistral, LLaMA, GPT-J-style models) to distinguish between roles in a chat.
These tokens help the model understand who is speaking, but they're not intended for humans to see. <br>
<br>
So, to prettify the output we will define a function:
%% Cell type:code id:cf6b6a61-d684-422c-a136-9747508a6cec tags:
``` python
def clean_output(raw: str) -> str:
    # If the assistant marker is in the output, split on it and take the last part
    if "<|im_start|>assistant" in raw:
        return raw.split("<|im_start|>assistant")[-1].replace("<|im_end|>", "").strip()
    return raw.strip()

raw_output = chat_llm.invoke(messages).content
cleaned = clean_output(raw_output)
print("Cleaned Response:\n", cleaned)
```
%% Output
Cleaned Response:
LangChain is a framework that enables developers to build and integrate natural language processing (NLP) and conversational AI models into their applications, allowing for more efficient and effective communication between humans and machines.
%% Cell type:markdown id:149c68d7-2cce-429a-b34c-c02721c15104 tags:
An even simpler approach would be to pass the following argument earlier on:
```
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"clean_up_tokenization_spaces": True})
```
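If what bothers you is the prompt being echoed back rather than the chat tokens, the transformers pipeline itself has a switch for that; a sketch using the standard `return_full_text` argument of the `text-generation` pipeline:
```
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=150,
    return_full_text=False  # return only the newly generated text, not the prompt
)
```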
%% Cell type:markdown id:0a359ab1-39d5-4160-94de-1358f170b870 tags:
**Confused?** <br>
You are not alone. Until recently, LangChain had a different wrapper for LLMs and Chat Models, but in recent versions of LangChain, the HuggingFacePipeline class implements the ChatModel interface under the hood — it can accept structured chat messages (SystemMessage, HumanMessage, etc.) even though it wasn't originally designed to.
So yes:
You can now do:
```
llm = HuggingFacePipeline(pipeline=text_pipeline)
response = llm.invoke([
    SystemMessage(content="You are a helpful legal assistant."),
    HumanMessage(content="Simplify this clause: ...")
])
```
Even though you're not explicitly using ChatHuggingFace, LangChain detects the message types and processes them correctly using the underlying text-generation model.
<br>
<br>
The same would apply if you used a remotely hosted LLM/Chat Model through an API:
```
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(openai_api_key=api_key)
result = chat.invoke([HumanMessage(content="Can you tell me a fact about Dolphins?")])
```
%% Cell type:code id:eb524f31-3d3a-4a4b-bc00-d21843490193 tags:
``` python
from langchain.schema import (AIMessage, HumanMessage, SystemMessage)
```
%% Cell type:code id:225a8a9c-bb3c-46fa-b57d-3bc0e0696885 tags:
``` python
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"clean_up_tokenization_spaces": True})
chat_llm = ChatHuggingFace(llm=llm)
```
%% Cell type:code id:7c6555a0-3bfb-49e3-97aa-186eb5a528a7 tags:
``` python
result = chat_llm.invoke([HumanMessage(content="Can you tell me a fact about dolphins?")])
```
%% Cell type:code id:58180fce-e04b-41d2-9ae8-87a41b232890 tags:
``` python
result
```
%% Output
AIMessage(content='<s><|im_start|>user\nCan you tell me a fact about dolphins?<|im_end|>\n<|im_start|>assistant\nDolphins are highly intelligent marine mammals and are known for their playful and social behavior. They are part of the family Delphinidae, which includes around 40 species. Dolphins are air-breathing, have a streamlined body shape, two limbs modified into flippers, and a dorsal fin. They use echolocation to navigate and find prey, and they communicate with each other using a variety of clicks, whistles, and body movements.', additional_kwargs={}, response_metadata={}, id='run-9455ac84-a668-43aa-8569-077637312649-0')
%% Cell type:code id:375586e2-7880-49c5-abf3-dbdf70593b9f tags:
``` python
print(clean_output(result.content))
```
%% Output
Dolphins are highly intelligent marine mammals and are known for their playful and social behavior. They are part of the family Delphinidae, which includes around 40 species. Dolphins are air-breathing, have a streamlined body shape, two limbs modified into flippers, and a dorsal fin. They use echolocation to navigate and find prey, and they communicate with each other using a variety of clicks, whistles, and body movements.
%% Cell type:code id:be3fb5bc-13e3-45de-98a9-45fe1361d30a tags:
``` python
result = chat_llm.invoke([SystemMessage(content='You are a grumpy 5-year-old child who only wants to get new toys and not answer questions'),
                          HumanMessage(content='Can you tell me a fact about dolphins?')])
```
%% Cell type:code id:013daa46-53e4-47dc-83c9-5037c8286feb tags:
``` python
print(clean_output(result.content))
```
%% Output
No.
%% Cell type:code id:c34f9d0c-3a14-4595-a1fc-96f25c5b206c tags:
``` python
result = chat_llm.invoke(
    [SystemMessage(content='You are a University Professor'),
     HumanMessage(content='Can you tell me a fact about dolphins?')]
)
```
%% Output
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
%% Cell type:code id:d8c5f0ec-3989-461b-abd3-d1494cdc9b3c tags: %% Cell type:code id:d8c5f0ec-3989-461b-abd3-d1494cdc9b3c tags:
``` python ``` python
print(clean_output(result.content)) print(clean_output(result.content))
``` ```
%% Output %% Output
Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness. Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness.
%% Cell type:code id:73f872ab-8822-4155-974f-94ccd1c6fe89 tags:
``` python
result = chat_llm.generate([
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='Can you tell me a fact about dolphins?')
    ],
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='What is the difference between whales and dolphins?')
    ]
])
```
%% Cell type:code id:099f89a0-115e-4147-8d71-618fd90cab83 tags:
``` python
for i, generation in enumerate(result.generations, 1):
    raw = generation[0].text
    cleaned = clean_output(raw)
    print(f"\nPrompt {i}:\n{cleaned}")
```
%% Output
Prompt 1:
Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness.
Prompt 2:
Whales and dolphins are both marine mammals, but they belong to different families. Whales are part of the family Cetacea, which includes toothed whales and baleen whales, while dolphins are part of the family Delphinidae. Here are some key differences between whales and dolphins:
1. Physical characteristics: Whales are generally larger than dolphins, with some species of whales reaching lengths of over 100 feet. Dolphins, on the other hand, are smaller, with the largest species, the killer whale, reaching lengths of up to 32 feet. Whales also have a more streamlined body shape, while dol
%% Cell type:code id:ddef5f18-3f61-488d-a75d-1fc93794f40d tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"clean_up_tokenization_spaces": True})
chat_llm = ChatHuggingFace(llm=llm)
```
%% Output
Device set to use cuda:0
%% Cell type:code id:8929bb69-a993-4acb-b2c5-84550dd6eef6 tags:
``` python
eos_token_id = tokenizer.eos_token_id
result = chat_llm.generate([
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='Can you tell me a fact about dolphins?')
    ],
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='What is the difference between whales and dolphins?')
    ]
], eos_token_id=eos_token_id)
```
%% Cell type:code id:01734b2c-22e6-4fab-ac9e-43e83bf85d5a tags:
``` python
for i, generation in enumerate(result.generations, 1):
    raw = generation[0].text
    cleaned = clean_output(raw)
    print(f"\nPrompt {i}:\n{cleaned}")
```
%% Output
Prompt 1:
Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness.
Prompt 2:
Whales and dolphins are both marine mammals, but they belong to different families. Whales are part of the family Cetacea, which includes toothed whales and baleen whales, while dolphins are part of the family Delphinidae. Here are some key differences between whales and dolphins:
1. Physical characteristics: Whales are generally larger than dolphins, with some species of whales reaching lengths of over 100 feet. Dolphins, on the other hand, are smaller, with the largest species, the killer whale, reaching lengths of up to 32 feet. Whales also have a more streamlined body shape, while dolphins have a more robust body shape.
2. Feeding habits: Whales have teeth and are carnivorous, feeding on fish, squid, and other marine animals. Dolphins also have teeth, but they are smaller and more pointed, adapted for catching fish. Baleen whales, on the other hand, have baleen plates instead of teeth, which they use to filter-feed on plankton and small crustaceans.
3. Echolocation: Dolphins use echolocation to navigate and find prey, emitting high-frequency sounds and listening for the echoes to determine the location and distance of objects. Whales do not use echolocation in the same way as dolphins, but some species, such as the toothed whales, use low-frequency sounds to communicate and locate prey.
4. Reproduction: Whales and dolphins have different reproductive strategies. Whales typically give birth to one calf at a time, and the mother nurses the calf for an extended period. Dolphins, on the other hand, can have multiple calves in a lifetime and do not nurse their young for as long.
5. Behavior: Whales are generally solitary or live in small groups, while dolphins are known for their social behavior, often living in large pods of up to several hundred individuals.
Overall, while whales and dolphins share many similarities as marine mammals, they have distinct differences in physical characteristics, feeding habits, and behavior.
%% Cell type:markdown id:aa51ce40-782d-4112-9af0-6c160fe221c9 tags:
<br>
Feel free to experiment with different system and human prompts!
%% Cell type:code id:efdf0dda-f69c-487d-bdc2-a58b4acfa76f tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline)
# Define the system and user messages
system_message_1 = SystemMessagePromptTemplate.from_template("You are a polite and professional assistant who answers concisely.")
system_message_2 = SystemMessagePromptTemplate.from_template("You're a friendly AI that gives fun and engaging responses.")
system_message_3 = SystemMessagePromptTemplate.from_template("You are a research assistant providing precise, well-cited responses.")
user_message = HumanMessagePromptTemplate.from_template("{question}")
# Create a prompt template
chat_prompt = ChatPromptTemplate.from_messages([system_message_3, user_message])
# Format the prompt
formatted_prompt = chat_prompt.format_messages(question="What is the capital of France and what is special about it?")
# Run inference
response = llm.invoke(formatted_prompt)
print(response)
```
%% Output
Device set to use cuda:0
System: You are a research assistant providing precise, well-cited responses.
Human: What is the capital of France and what is special about it?
The capital of France is Paris. Paris is known for its rich history, art, culture, and architecture. It is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also known for its fashion industry and is considered a global center for art, fashion, gastronomy, and culture. Additionally, Paris is one of the most visited cities in the world, attracting millions of tourists each year.
Sources:
1. "Paris." Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 10 Mar. 2016.
2. "Paris." Lonely Planet. N.p., n.d. Web. 10 Mar. 2016.
%% Cell type:markdown id:e5026430-caaa-4f9c-938e-328b2f383c5b tags:
### Extra Parameters and Args
Here we pass in some extra parameters and arguments to control how the model responds.
<br>
Some of the most important parameters are:
| **Parameter** | **Purpose** | **Range / Default** | **Analogy / Effect** |
|----------------------|------------------------------------------------------------------------------|----------------------------|---------------------------------------------|
| `do_sample` | Enables random sampling instead of greedy or beam-based decoding | `True` / `False` | 🎲 Adds randomness to output |
| `temperature` | Controls randomness of token selection | `> 0`, typically `0.7–1.0` | 🌡️ Higher = more creative / chaotic |
| `top_p` | Nucleus sampling: sample from top % of likely tokens | `0.0–1.0`, default `1.0` | 🧠 Focuses on most probable words |
| `num_beams` | Beam search: explore multiple continuations and pick the best | `1+`, default `1` | 🔍 Smart guessing with multiple options |
| `repetition_penalty` | Penalizes repeated tokens to reduce redundancy | `≥ 1.0`, e.g. `1.2` | ♻️ Discourages repetition |
| `max_new_tokens` | Limits the number of tokens the model can generate **per prompt** | Integer, e.g. `300` | ✂️ Controls response length |
| `eos_token_id` | Token ID that forces the model to stop when encountered | Integer | 🛑 Defines end of output (if supported) |
#### Detailed Explanation of Generation Parameters
##### `do_sample=True`
- If `False`: the model always picks the **most likely next token** (deterministic, greedy decoding).
- If `True`: the model will **randomly sample** from a probability distribution over tokens (non-deterministic).
- Required if you want `temperature` or `top_p` to have any effect.
✅ Enables creativity and variation
❌ Disables reproducibility (unless random seed is fixed)
---
##### `temperature=1.0`
- Controls the **randomness** or "creativity" of the output.
- Lower values → more predictable (safe), higher values → more diverse (risky).
- Affects how "flat" or "peaky" the probability distribution is during sampling.
**Typical values:**
- `0.0` → deterministic (most likely token only)
- `0.7–1.0` → balanced
- `>1.5` → chaotic, often incoherent
---
##### `top_p=0.9` *(a.k.a. nucleus sampling)*
- The model samples only from the **top tokens whose cumulative probability ≥ `p`**.
- Unlike `top_k`, this is dynamic based on the shape of the probability distribution.
- Often used in combination with `temperature`.
✅ Focuses output on high-probability words
❌ Too low → model may miss useful words
---
##### `num_beams=4` *(beam search)*
- Explores **multiple candidate completions** and picks the best one based on likelihood.
- Slower, but often more optimal (when `do_sample=False`).
- Does not work with sampling (`do_sample=True`).
**Typical values:**
- `1` = greedy decoding
- `3–5` = moderate beam search
- `>10` = can become very slow
---
##### `repetition_penalty=1.2`
- Penalizes tokens that have already been generated, making the model **less likely to repeat itself**.
- Higher values reduce repetition but may hurt fluency.
✅ Helps avoid "looping" or redundant outputs
📝 Use with long-form or factual responses
---
##### `max_new_tokens=300`
- Sets the **maximum number of tokens** the model is allowed to generate in the response.
- Does not include input prompt tokens.
✅ Controls output length
✅ Prevents runaway generation or memory issues
✅ Set high enough, it prevents responses from being cut off mid-sentence
---
##### `eos_token_id`
- Tells the model to **stop generation** once it emits this token ID.
- Useful for enforcing custom stopping conditions.
🔸 Optional — most models use their own `<eos>` or `</s>` tokens by default.
---
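%% Cell type:markdown tags:
The next code cell deliberately pushes `temperature` to an extreme value to show its effect. For contrast, a fully deterministic setup disables sampling and can use beam search instead — a sketch with illustrative, untuned values:
%% Cell type:code tags:
``` python
# Hypothetical deterministic configuration using beam search
deterministic_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=False,         # greedy/beam decoding; temperature and top_p are ignored
    num_beams=4,             # keep the best of four candidate continuations
    repetition_penalty=1.2,  # discourage the loops beam search is prone to
    max_new_tokens=300,
    device_map="auto"
)
```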
%% Cell type:markdown id:e3314edc-1689-49ce-94bb-b198a8ca1059 tags:
Feel free to experiment with these parameters!
%% Cell type:code id:a01b796b-364e-42ae-b866-e2454ad9c679 tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=True,
    temperature=5.0,  # deliberately extreme, to demonstrate chaotic sampling
    top_p=0.9,
    #presence_penalty=1, # Only if the model supports it
    max_new_tokens=300,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline)
chat_llm = ChatHuggingFace(llm=llm)
```
%% Output
Device set to use cuda:0
%% Cell type:code id:fe01c99b-b14f-4358-a532-765a19bb5666 tags:
``` python
result = chat_llm.invoke([HumanMessage(content='Can you tell me a fact about Earth?')])
```
%% Cell type:code id:91a67f85-9c18-4ac3-9b2e-ff021c121a9f tags:
``` python
print(clean_output(result.content))
```
%% Output
One fascinating but somewhat bouncy fun actual to know about Dangle up is what it actually takes taking portion it move along in deep amelioration (it moves up a stagger you would amaze me a huge to tell that in more everyday terms but, you go on read more more information). Not one side the ameljoros' movements, you need one piece fact to now is Dangle its really cool but there just be even out its not out in this question yet to that side its even though that'sn really hot enough there just about D
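%% Cell type:markdown tags:
As the garbled output above shows, `temperature=5.0` flattens the token distribution so much that the model becomes incoherent. Values around `0.7–1.0` keep the output varied but fluent — a sketch with illustrative values:
%% Cell type:code tags:
``` python
# Hypothetical re-run with a balanced temperature
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=True,
    temperature=0.7,  # varied but still coherent
    top_p=0.9,
    max_new_tokens=300,
    device_map="auto"
)
chat_llm = ChatHuggingFace(llm=HuggingFacePipeline(pipeline=text_pipeline))
```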
%% Cell type:markdown id:5964b698-ba1b-4c2f-a23a-5e757dd84e2a tags:
### Caching
Making the exact same request often? You can use a cache to store results. **Note: only do this if the prompt is exactly the same and it is acceptable to return historical replies.**
%% Cell type:code id:1639f253-5b37-4ffc-b028-c22b3df2b877 tags:
``` python
import langchain
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()
# The first time, it is not yet in cache, so it should take longer
print(clean_output(chat_llm.invoke("Tell me a fact about Mars").content))
```
%% Output
Here's a neat yet mind boggilking thing abojjtu tbt t btyu rzszjr Mars
The Valley Of
Whales: Just recently has man
Found evidence That ancient L u g ug al waters created deep water lakes where now in present d avrs nz day sittthe dried and crackeds seav on b of tte h h t rar Mars - they'be have nd c all it Mars a Val,..ey of lWh aies
%% Cell type:code id:0d706fd5-6067-4d7e-80ab-cd96ccd4a912 tags:
``` python
# You will notice this reply is instant!
print(clean_output(chat_llm.invoke("Tell me a fact about Mars").content))
```
%% Output
Here's a neat yet mind boggilking thing abojjtu tbt t btyu rzszjr Mars
The Valley Of
Whales: Just recently has man
Found evidence That ancient L u g ug al waters created deep water lakes where now in present d avrs nz day sittthe dried and crackeds seav on b of tte h h t rar Mars - they'be have nd c all it Mars a Val,..ey of lWh aies
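%% Cell type:markdown tags:
`InMemoryCache` is lost when the kernel stops. For a cache that persists across sessions, LangChain also provides a SQLite-backed cache — a minimal sketch (the database path is arbitrary):
%% Cell type:code tags:
``` python
from langchain.cache import SQLiteCache

# Store cached prompt/response pairs on disk instead of in memory
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```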
%% Cell type:code id:6dbf6d5d-6ddd-4741-8e9e-cf9a14c653e0 tags:
``` python
```
%% Cell type:markdown id:6cbfc6e8-dc82-434d-ba0e-a68e52bf3cd9 tags:
## LangChain Chaining Techniques
### Introduction
This notebook demonstrates key chaining functionalities in LangChain:
- SimpleSequentialChain
- SequentialChain
- LLMRouterChain
- TransformChain
Each chaining method is designed for different levels of complexity and control. Use simple chains for straightforward tasks, sequential chains for workflows, router chains for conditional branching, and transform chains when integrating custom logic.
%% Cell type:code id:d40d4b00-deba-4bc0-a3eb-280c8179d02d tags:
``` python
# Imports
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
from langchain.chains import SimpleSequentialChain, SequentialChain, TransformChain, LLMChain
from langchain.chains.router import LLMRouterChain
from langchain.prompts import PromptTemplate
from langchain.prompts import ChatPromptTemplate
```
%% Cell type:code id:ca5ea3ee-40e4-47a1-8708-b72b15cb89da tags:
``` python
cache_dir = "/gpfs/data/fs70824/LLMs_models_datasets/models"
model_name = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"
```
%% Cell type:code id:2fbc3ccb-bebc-4e2c-8ad4-2626bcaa167b tags:
``` python
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
# Load model
model = AutoModelForCausalLM.from_pretrained(
model_name,
cache_dir=cache_dir,
device_map="auto",
#quantization_config=quantization_config, # This is what you would need for the LLama3-70B (and similar) models
local_files_only=True, # Prevent any re-downloads
#trust_remote_code=True # Necessary when downloading
)
# Verify model config
print(model.config)
```
%% Output
MistralConfig {
"_attn_implementation_autoset": true,
"_name_or_path": "NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 32000,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.49.0",
"use_cache": false,
"vocab_size": 32002
}
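%% Cell type:markdown tags:
Note the `"torch_dtype": "float32"` entry above: at full precision, a 7B-parameter model needs roughly 28 GB of GPU memory for the weights alone. If your GPU supports it, loading in half precision roughly halves that — a sketch (assuming bf16-capable hardware):
%% Cell type:code tags:
``` python
import torch

# Hypothetical alternative: load the weights in bfloat16 to cut memory use
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
```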
%% Cell type:code id:91d434e4-911e-44fb-b21d-10bc00bef193 tags:
``` python
# Pipeline setup
pipe = pipeline("text-generation",
model=model,
tokenizer=tokenizer,
return_full_text=False,
max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)
```
%% Output
Device set to use cuda:0
%% Cell type:code id:f670afae-b70b-4782-9824-954540dd3ce0 tags:
``` python
chat_llm = ChatHuggingFace(llm=llm)
```
%% Cell type:markdown id:11e6bdc7-dddc-4115-8736-81c4182c1bcf tags:
### SimpleSequentialChain
The `SimpleSequentialChain` is the most basic form of a chain. It takes a single input, passes it to a prompt, and the output of one step is directly passed as input to the next. It does not track intermediate steps or provide access to named outputs, making it suitable for linear, single-purpose chains.
Use case: quick linear pipelines like "generate → explain" or "summarize → expand".
%% Cell type:code id:d948c96f-2f90-4b4e-aeb4-9aff615b985d tags:
``` python
template1 = "Give me a simple bullet point outline for a blog post on {topic}"
prompt1 = ChatPromptTemplate.from_template(template1)
chain1 = prompt1|chat_llm
template2 = "Write a blog post using this outline: {outline}"
prompt2 = ChatPromptTemplate.from_template(template2)
chain2 = prompt2|chat_llm
```
%% Cell type:code id:5c0151ca-c79d-41fa-8e6e-776bff075def tags:
``` python
full_chain = chain1|chain2
```
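%% Cell type:markdown tags:
The pipe (`|`) syntax above is the modern LCEL way of expressing what `SimpleSequentialChain` does. For reference, the classic construction would look like this sketch (wrapping each prompt in an `LLMChain`):
%% Cell type:code tags:
``` python
# Legacy equivalent of chain1 | chain2 using SimpleSequentialChain
outline_chain = LLMChain(llm=chat_llm, prompt=prompt1)
post_chain = LLMChain(llm=chat_llm, prompt=prompt2)
legacy_chain = SimpleSequentialChain(chains=[outline_chain, post_chain], verbose=True)
# result = legacy_chain.run("Artificial Intelligence")
```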
%% Cell type:code id:a27103ae-0602-416c-95af-476231cf160d tags:
``` python
result = full_chain.invoke("Artificial Intelligence")
print(result.content)
```
%% Output
I. Introduction
A. Definition of Artificial Intelligence (AI)
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. It involves the development of algorithms and computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.
B. Brief history of AI development
The concept of AI has been around since the 1950s, but it was not until the 21st century that significant advancements were made in the field. Early AI research focused on symbolic AI, which involved using symbolic rules and logic to solve problems. However, this approach had limitations, and researchers soon turned to statistical methods and machine learning techniques to improve AI capabilities.
C. Importance and potential impact of AI on society
AI has the potential to revolutionize various industries and aspects of our daily lives. It can help automate repetitive tasks, improve decision-making, and enhance customer experiences. AI can also help solve complex problems in areas such as healthcare, education, and environmental conservation. However, there are also concerns about the impact of AI on employment and privacy,
%% Cell type:markdown id:4f634a90-c37d-44eb-8b1b-a875115093fc tags:
### SequentialChain
`SequentialChain` is more flexible than `SimpleSequentialChain`. It supports multiple input and output variables and keeps track of intermediate outputs. Each step can depend on one or more outputs from earlier steps.
Use case: more complex workflows that need to reuse or transform earlier outputs in later steps.
%% Cell type:code id:99c23725-9cf9-4927-9f8d-df65bdee4d72 tags:
``` python
# Create the prompts
prompt1 = PromptTemplate(input_variables=["topic"], template="Generate a question about {topic}.")
prompt2 = PromptTemplate(input_variables=["question"], template="Provide a short answer to: {question}")
```
%% Cell type:code id:80a7a165-25fb-4a94-bb53-4f4ee433f896 tags:
``` python
# Create the chains: each step is an LLMChain whose output feeds the next
chain1 = LLMChain(llm=llm, prompt=prompt1, output_key="question")
chain2 = LLMChain(llm=llm, prompt=prompt2, output_key="output")
chain = SequentialChain(
    chains=[chain1, chain2],
    input_variables=["topic"],
    output_variables=["output"]
)
```
%% Cell type:code id:deaa7130-7715-4e79-9270-c307e71fd791 tags:
``` python
result = chain.run("artificial intelligence")
print("SequentialChain result:", result)
```
%% Cell type:markdown id:c070e46d-fcae-4a7f-8901-b192f1fc3017 tags:
### LLMRouterChain
`LLMRouterChain` is used when you want to route a prompt to different chains or prompts depending on the input. It allows conditional execution paths, where an LLM can decide which destination (e.g., math, history, writing) to route a given input to based on predefined criteria or patterns.
Use case: topic routing, multi-skill assistants, task-specific logic dispatching.
%% Cell type:code id:90b398ad-bcb8-4b7e-8371-8d05b5e3f7cc tags:
``` python
# Build a router prompt so the LLM can choose a destination for each input
from langchain.chains.router.llm_router import RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE

destinations = "math: good for solving math problems\nhistory: good for questions about historical events"
router_prompt = PromptTemplate(
    template=MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations),
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
router_chain = LLMRouterChain.from_llm(llm=llm, prompt=router_prompt)
```
%% Cell type:code id:c86fb57a-9346-480f-92d1-773ada8042dc tags:
``` python
# Route a query
output = router_chain.invoke({"input": "Battle of Hastings"})
print("Router Output:", output)
```
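%% Cell type:markdown tags:
On its own, the router only returns the chosen destination and the (possibly rewritten) input. To dispatch to actual destination prompts, combine it with destination chains and a default chain via `MultiPromptChain` — a sketch reusing `llm` from above:
%% Cell type:code tags:
``` python
from langchain.chains.router import MultiPromptChain
from langchain.chains import ConversationChain

# One LLMChain per destination; keys must match the names given to the router
destination_chains = {
    "math": LLMChain(llm=llm, prompt=PromptTemplate.from_template("Solve this math problem: {input}")),
    "history": LLMChain(llm=llm, prompt=PromptTemplate.from_template("What happened during this event: {input}")),
}
full_router = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=destination_chains,
    default_chain=ConversationChain(llm=llm, output_key="text"),  # fallback when nothing matches
)
# print(full_router.run("Battle of Hastings"))
```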
%% Cell type:markdown id:5a5c50df-480c-40ca-86b2-e0658ea23c2d tags:
### TransformChain
`TransformChain` allows you to insert arbitrary Python logic into a LangChain pipeline. It lets you define a transformation function that takes in inputs and returns a modified dictionary of outputs. This is useful for pre- or post-processing data before or after it passes through a model or another chain.
Use case: text normalization, formatting, filtering, or enrichment between model steps.
%% Cell type:code id:cc9def90-4a6c-4d2e-9089-b8ae58d60894 tags:
``` python
# Define a simple transformation function
def uppercase_fn(inputs: dict) -> dict:
return {"output": inputs["text"].upper()}
transform_chain = TransformChain(input_variables=["text"], output_variables=["output"], transform=uppercase_fn)
```
%% Cell type:code id:cea099d7-a867-410a-880c-1636c1b2feb1 tags:
``` python
# Run it
output = transform_chain.run({"text": "this should be uppercase"})
print("TransformChain output:", output)
```
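%% Cell type:markdown tags:
Transform chains become most useful when composed with model steps, for example normalizing text before it reaches a prompt. A sketch (the summarization prompt is made up for illustration):
%% Cell type:code tags:
``` python
# Hypothetical composition: transform the input, then pass it to an LLM step
summarize_prompt = PromptTemplate.from_template("Summarize in one sentence: {output}")
summarize_chain = LLMChain(llm=llm, prompt=summarize_prompt, output_key="summary")

pipeline_chain = SequentialChain(
    chains=[transform_chain, summarize_chain],  # "text" -> "output" -> "summary"
    input_variables=["text"],
    output_variables=["summary"],
)
# print(pipeline_chain.run({"text": "Dolphins use echolocation to navigate and hunt."}))
```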