%% Cell type:markdown id:78131162-a068-41cb-b1e5-4f80b03cdfa1 tags:
# Prompt Engineering Essentials
The D3 notebooks cover the essential topics of prompt engineering, beginning with inference in general and an introduction to LangChain. We will then cover prompt templates and output parsing, and go on to creating chains and connecting them in different ways to build more sophisticated constructs that make the most of LLMs.
%% Cell type:markdown id:72d12851-f8e8-4143-8ec0-da82284066a0 tags:
## API vs. Locally Hosted LLM
Using an API-hosted LLM (e.g. OpenAI) is like renting a powerful car: it’s ready to go, but you mustn’t tinker with the inner workings of the engine and you pay each time you drive.
Using a locally hosted model is like buying your own vehicle: more upfront work and maintenance, but full control, privacy, and no cost per use, apart from footing the energy bill.
| **Aspect** | **API-based (e.g. OpenAI)** | **Local Model (e.g. Mistral, PyTorch + LangChain)** |
|---------------------------|------------------------------------------------------|-------------------------------------------------------------|
| **Setup time** | Minimal – just an API key | Requires downloading and managing the model |
| **Hardware requirement** | None (runs in the cloud) | Requires a GPU (sometimes large memory) |
| **Latency** | Network-dependent | Faster inference (once model is loaded) |
| **Privacy / Data control**| Data sent to external servers | Data stays on your infrastructure |
| **Cost** | Pay-per-use (based on tokens) | Free at inference (after download), but uses your compute |
| **Scalability** | Handled by provider | You manage and scale infrastructure |
| **Flexibility** | Limited to provider's models and settings | Full control: quantization, fine-tuning, prompt handling |
| **Offline use** | Not possible | Yes, after initial download |
| **Customizability** | No access to internals | You can modify and extend anything |
**Using an API (e.g. OpenAI)** <br>
- You use the `OpenAI` or `ChatOpenAI` class from LangChain
- LangChain sends your prompt to api.openai.com
- You don’t manage the model, only the request and response
```
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(api_key="...", model="gpt-4")
response = llm.invoke("Summarize this legal clause...")
```
📝 You can store your API key in different ways; it is common, however, to set it as an **environment variable**.
Note that LangChain automatically looks up the environment variable **`OPENAI_API_KEY`** when making a connection to OpenAI.
```
import os
os.environ['OPENAI_API_KEY'] = 'my_API_key_123'
llm = ChatOpenAI(api_key=os.environ['OPENAI_API_KEY'], model="gpt-4")
```
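If you don’t want the key to appear in the notebook at all, you can prompt for it at runtime; a minimal sketch using the standard library’s `getpass` (the prompt text is just illustrative):
```
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass("OpenAI API key: ")
llm = ChatOpenAI(model="gpt-4")  # picks up OPENAI_API_KEY from the environment
```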
Alternatively, you could pass the OpenAI key in as a plain string (not very secure: you should NEVER hard-code your API keys), or save it in a text file somewhere on your computer and read it in:
```
with open('C:\\Users\\Simeon\\Desktop\\openai.txt') as f:
    api_key = f.read().strip()  # strip the trailing newline
llm = OpenAI(openai_api_key=api_key)
```
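A related pattern you will often see in LangChain projects keeps the key in a `.env` file and loads it into the environment; a sketch assuming the third-party `python-dotenv` package is installed:
```
# .env contains a line like: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv

load_dotenv()  # reads .env and sets the variables in os.environ
llm = ChatOpenAI(model="gpt-4")  # finds OPENAI_API_KEY in the environment
```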
**Using a Local Model (e.g. Mistral, LLaMA)**<br>
- You load the model and tokenizer using Hugging Face Transformers
- You wrap the pipeline using HuggingFacePipeline or similar in LangChain
- You manage memory, GPU allocation, quantization, etc.
```
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

model_id = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"  # any chat-tuned model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
llm = ChatHuggingFace(llm=HuggingFacePipeline(pipeline=pipe))
```
%% Cell type:markdown id:ef97785f-47e8-4e33-98a1-366843cdd23d tags:
## Basic Setup for Inference
%% Cell type:markdown id:f5365e87-dbae-4f26-871f-74f672fc12b9 tags:
Apart from the usual suspects of PyTorch and Hugging Face libraries, we get our first imports of the LangChain library and some of its classes.
Since we want to show you how to work with LLMs that are not part of the closed OpenAI and Anthropic world, we are going to work with open, downloadable models. As it makes no sense for all of us to download the models and store them in our home directories, we've done that for you before the start of the course. You can find the path to the models below.
%% Cell type:code id:77fbb51f-032e-4b72-83d5-37da49f8dfa7 tags:
``` python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
```
%% Cell type:markdown id:6c43e856-81b6-4509-b0ac-227a096d2e38 tags:
If you choose to work with a model such as `meta-llama/Llama-3.3-70B-Instruct`, you will have to use quantization in order to fit the model into the memory of one GPU. It is advisable to use BitsAndBytes for quantization and write a short config for it, e.g.:
```
# Define quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Enable 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # Use float16 for computation
    bnb_4bit_use_double_quant=True         # Double quantization for efficiency
)
```
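You would then hand this config to `from_pretrained` when loading; a short sketch (the actual load cell below shows the same argument, commented out):
```
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=quantization_config,  # apply the 4-bit config defined above
    device_map="auto"
)
```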
However, beware: a model of that size takes roughly 30 minutes to load...
In this course we do not want to wait around that long, so we will use a smaller model called [Nous-Hermes-2-Mistral-7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO).
%% Cell type:code id:d71156aa-4c3c-420e-8345-f5052c0655a7 tags:
``` python
path_to_model = "/gpfs/data/fs70824/LLMs_models_datasets/models"
```
%% Cell type:code id:41d0ca09-2bce-4761-b6b0-6503f5fb0f56 tags:
``` python
#model_name = "meta-llama/Llama-3.3-70B-Instruct"
model_name = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"
cache_dir = path_to_model
```
%% Cell type:code id:79281d6e-2dc6-4651-94f6-53b56d7152f5 tags:
``` python
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    device_map="auto",
    #quantization_config=quantization_config,  # This is what you would need for the Llama-3.3-70B (and similar) models
    local_files_only=True,  # Prevent any re-downloads
    trust_remote_code=True
)
# Verify model config
print(model.config)
```
%% Output
MistralConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_cache": false,
  "vocab_size": 32002
}
%% Cell type:markdown id:719deb1f-d1a8-42db-8e91-22ec586f6b15 tags:
Now, let's try out a prompt or two:
%% Cell type:code id:3f803804-9451-43b6-9b6e-427470a07b15 tags:
``` python
prompt = "What is the capital of France? Can you give me some facts about it?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
%% Output
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
What is the capital of France? Can you give me some facts about it?
The capital of France is Paris. Paris is the largest city in France and is located in the northern part of the country. It is situated on the Seine River and is known for its beautiful architecture, art, and culture.
Here are some interesting facts about Paris:
1. Paris is home to some of the world’s most famous landmarks, including the Eiffel Tower, the Louvre Museum, and the Notre-Dame Cathedral.
2. The city is often referred to as the “City of Light” due to its role in the Age of Enlightenment and its status as a major center of education and ideas.
3. Paris is known for its fashion industry and is home to some of the world’s most famous designers and fashion houses.
4. The city is also famous for its cuisine, with dishes such as croissants, escargot, and macarons originating in France.
5. Paris is a major transportation hub, with an extensive network of buses, trains, and subways that connect the city to the rest of France and Europe.
6. The city is divided into 20
%% Cell type:markdown id:14e1bb25-f65d-48ba-9517-00be6a1afa3d tags:
**Not bad, however, we can do better!**
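Part of the improvement comes from proper prompt formatting: instruct-tuned models usually respond better when the input goes through their chat template. A minimal raw-transformers sketch, assuming the tokenizer ships a chat template (Nous-Hermes-2 uses the ChatML format):
```
# Wrap the prompt in the model's chat template before generating
messages = [{"role": "user", "content": prompt}]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat_prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
LangChain's chat wrappers, introduced next, apply this template for us.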
%% Cell type:markdown id:e1d7cbcb-f127-474d-9db4-b0d0daa705ea tags:
## Enter LangChain
[LangChain](https://www.langchain.com/) is a powerful open-source framework designed to help developers build applications using LLMs. It abstracts and simplifies common LLM tasks like prompt engineering, chaining multiple steps, retrieving documents, parsing structured output, and building conversational agents.
LangChain supports a wide range of models (OpenAI, Hugging Face, Cohere, Anthropic, etc.) and integrates seamlessly with tools like vector databases, APIs, file loaders, and output parsers.
---
### LangChain Building Blocks
```
+-------------------+
|  PromptTemplate   |  ← Create structured prompts
+-------------------+
+-------------------+
|        LLM        |  ← Connect to local or remote LLM
+-------------------+
+-------------------+
|  Output Parsers   |  ← Extract structured results (e.g. JSON)
+-------------------+
+-------------------+
|  Chains / Agents  |  ← Combine steps into flows
+-------------------+
+-------------------+
|  Memory / Tools   |  ← Use search, APIs, databases, etc.
+-------------------+
```
---
### Core LLM/ChatModel Methods in LangChain
How to do inference with LangChain:
| **Method** | **Purpose** | **Input Type** | **Output Type** |
|------------------|------------------------------------------------------------|-------------------------|--------------------------|
| `invoke()` | Handles a **single input**, returns one response | `str` or `Message(s)` | `str` / `AIMessage` |
| `generate()` | Handles a **batch of inputs**, returns multiple outputs | `list[str]` | `LLMResult` |
| `batch()` | Batched input, returns a flat list of outputs | `list[str]` | `list[str]` / Messages |
| `stream()` | Streams the output as tokens are generated | `str` / `Message(s)` | Generator (streamed text)|
| `ainvoke()` | Async version of `invoke()` | `str` / `Message(s)` | Awaitable result |
| `agenerate()` | Async version of `generate()` | `list[str]` | Awaitable result |
Before we use one of these methods, we need to create a pipeline and apply the LangChain wrapper to it, giving us an object that LangChain can call with `.invoke()`, `.generate()`, etc. If we use a remotely hosted LLM accessed through an API, we do not need the pipeline.
---
%% Cell type:code id:338c010a-d31f-4a4e-9118-83a66673d3f7 tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=150,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline)
```
%% Output
Device set to use cuda:0
%% Cell type:markdown id:0bfd95c6-e389-47b1-83c1-c26c05b18061 tags:
#### llm.invoke()
%% Cell type:code id:dd2f8136-d8dc-4095-85b7-d9da665f05f7 tags:
``` python
print(llm.invoke('Here is a fun fact about Mars:'))
```
%% Output
Here is a fun fact about Mars: it has the largest volcano in our solar system. It’s called Olympus Mons, and it’s so big that if it were on Earth, it would stretch from New York City to Denver.
But what’s even more amazing is that Olympus Mons is a shield volcano, which means it was formed by slow-moving lava flows. And those lava flows were so thick that they built up over time, creating a mountain that’s three times taller than Mount Everest.
So how did Olympus Mons get so big? Well, Mars has a thin atmosphere, which means that the pressure inside its volcanoes is much lower than on Earth. This allows the l
%% Cell type:markdown id:96043523-d4ac-474a-abf8-cabec94c35ed tags:
#### llm.batch()
%% Cell type:code id:772dbcf0-6a73-4a59-8cb8-dc2d76c399f6 tags:
``` python
results = llm.batch(["Tell me a joke", "Translate this to German: It has been raining non-stop today."])
print(results)
```
%% Output
['Tell me a joke.\n\nI’ll tell you a joke. Why don’t scientists trust atoms? Because they make up everything.\n\nWhat’s the difference between a well-dressed man on a tricycle and a poorly dressed man on a bicycle? Attire.\n\nWhat did the fish say when he hit the wall? Dam.\n\nWhat do you call a fake noodle? An impasta.\n\nWhat do you call a bear with no teeth? A gummy bear.\n\nWhat do you call a boomerang that doesn’t come back? A stick.\n\nWhat do you call a fake noodle that doesn’t work? An impasta-fail.\n\n', 'Translate this to German: It has been raining non-stop today.\n\n# German Translation\n\nheute hat es ununterbrochen geregnet.\n\nLearn German with us! Additionally, our language school in Berlin can help you improve your German skills.\n\nGerman (Deutsch) is a West Germanic language that is mainly spoken in Central Europe. It is the most widely spoken and official or co-official language in Germany, Austria, Switzerland, South Tyrol in Italy, the German-speaking Community of Belgium, and Liechtenstein. It is one of the three official languages of Luxembourg and a co-official language in the Opole Voivodeship in Poland. The languages which are most similar to German are the other members of the']
%% Cell type:markdown id:dcfd9c64-9a79-4b1e-91c0-f03c8c243bcb tags:
%% Cell type:markdown id:67c6ae46-770c-4c3c-bceb-e0805a87b0fe tags:
Let's make that more structured and also format the output nicely:
%% Cell type:code id:0e1e4ef7-5c88-4c53-9df7-366efe886891 tags:
``` python
prompts = [
    "Tell me a joke",
    "Translate this to German: 'It has been raining non-stop today.'"
]
# Run batch generation
results = llm.batch(prompts)
# Nicely format the output
for i, (prompt, response) in enumerate(zip(prompts, results), 1):
    print(f"\nPrompt {i}: {prompt}")
    print(f"Response:\n{response}")
```
%% Output
Prompt 1: Tell me a joke
Response:
Tell me a joke.
I’ll tell you a joke. Why don’t scientists trust atoms? Because they make up everything.
What’s the difference between a well-dressed man on a tricycle and a poorly dressed man on a bicycle? Attire.
What did the fish say when he hit the wall? Dam.
What do you call a fake noodle? An impasta.
What do you call a bear with no teeth? A gummy bear.
What do you call a boomerang that doesn’t come back? A stick.
What do you call a fake noodle that doesn’t work? An impasta-fail.
Prompt 2: Translate this to German: 'It has been raining non-stop today.'
Response:
Translate this to German: 'It has been raining non-stop today.'
The German translation for 'It has been raining non-stop today' is 'Es hat heute ununterbrochen geregnet.'
In this sentence, 'Es' means 'it', 'hat' means 'has', 'heute' means 'today', 'ununterbrochen' means 'non-stop' or 'uninterrupted', and 'geregnet' means 'rained'.
So, the sentence structure in German is similar to English, with the subject 'it' followed by the verb 'has' and the adverb 'today', and then the main verb 'rained' with the adverb 'non-stop' placed before it.
%% Cell type:markdown id:7c4a0e03-4f13-474e-ba70-361bfbfef70b tags:
#### llm.generate()
%% Cell type:markdown id:7cdb04bb-2c6c-450a-bae0-40ac0144ab04 tags:
`llm.generate()` yields much more output than `llm.batch()` and is used when you actually want more metadata, such as token counts.
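Where that metadata lives on the returned `LLMResult` (a sketch; with a local pipeline most of these fields stay empty, as the raw print below shows):
```
result = llm.generate(["Tell me a joke"])
print(result.llm_output)                         # provider-level metadata, e.g. token usage (None here)
print(result.generations[0][0].generation_info)  # per-generation metadata, if the backend supplies any
```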
%% Cell type:code id:802ffd32-a6bc-4319-875c-ea1dffc5e324 tags:
``` python
results = llm.generate(["Where should my customer go for a luxurious Safari?",
                        "What are your top three suggestions for backpacking destinations?"])
print(results)
```
%% Output
generations=[[Generation(text='Where should my customer go for a luxurious Safari?\n\nIf your customer is looking for a luxurious safari experience, they should consider going to Africa. Africa is home to some of the most luxurious safari lodges and camps, offering guests an unforgettable experience in the heart of the wilderness.\n\nSome of the top destinations for a luxurious safari in Africa include:\n\n1. Singita Grumeti in Tanzania: This luxury safari camp offers guests the chance to experience the wonders of the Serengeti in style. The camp features spacious suites with private plunge pools, outdoor showers, and stunning views of the surrounding wilderness.\n\n2. &Beyond Sandibe Okavango')], [Generation(text='What are your top three suggestions for backpacking destinations?\n\n1. The Pacific Crest Trail: This 2,650-mile trail stretches from Mexico to Canada and offers some of the most stunning views in the world. From the desert landscapes of Southern California to the snow-capped peaks of the Sierra Nevada, this trail has something for everyone.\n\n2. The Appalachian Trail: This 2,200-mile trail runs from Georgia to Maine and is one of the most popular backpacking destinations in the United States. With its diverse terrain and abundant wildlife, the Appalachian Trail offers a unique and unforgettable backpacking experience.\n\n3. The Inca Trail: This 26-mile trail')]] llm_output=None run=[RunInfo(run_id=UUID('ffcd266d-4c4f-4229-b404-b3445dd89d6f')), RunInfo(run_id=UUID('b15527e8-b2b0-409b-a554-16b7d649e241'))] type='LLMResult'
%% Cell type:markdown id:2df5955f-e494-42bd-b21e-416b73cd5fb5 tags:
We need to prettify the output:
%% Cell type:code id:cf0c3c6a-eab9-4b6c-a785-63f799cc23a8 tags:
``` python
for gen in results.generations:
    print(gen[0].text)
```
%% Output
Where should my customer go for a luxurious Safari?
If your customer is looking for a luxurious safari experience, they should consider going to Africa. Africa is home to some of the most luxurious safari lodges and camps, offering guests an unforgettable experience in the heart of the wilderness.
Some of the top destinations for a luxurious safari in Africa include:
1. Singita Grumeti in Tanzania: This luxury safari camp offers guests the chance to experience the wonders of the Serengeti in style. The camp features spacious suites with private plunge pools, outdoor showers, and stunning views of the surrounding wilderness.
2. &Beyond Sandibe Okavango
What are your top three suggestions for backpacking destinations?
1. The Pacific Crest Trail: This 2,650-mile trail stretches from Mexico to Canada and offers some of the most stunning views in the world. From the desert landscapes of Southern California to the snow-capped peaks of the Sierra Nevada, this trail has something for everyone.
2. The Appalachian Trail: This 2,200-mile trail runs from Georgia to Maine and is one of the most popular backpacking destinations in the United States. With its diverse terrain and abundant wildlife, the Appalachian Trail offers a unique and unforgettable backpacking experience.
3. The Inca Trail: This 26-mile trail
%% Cell type:markdown id:2dd0f876-80b6-4047-afb2-4ede13659e9d tags:
#### llm.stream()
%% Cell type:code id:ec2cf74f-410b-4581-bce2-5a362ee7ae2e tags:
``` python
for chunk in llm.stream("Tell me a story about a cat."):
    print(chunk, end="")
```
%% Output
Once upon a time, there was a cat named Whiskers. Whiskers was a beautiful black and white cat with bright green eyes. She lived in a small house with her owner, Mrs. Johnson. Mrs. Johnson was an old lady who lived alone, and Whiskers was her only companion.
One day, Mrs. Johnson went out to run some errands, and when she returned, she found that Whiskers was missing. She searched the entire house, but couldn't find her anywhere. She put up posters around the neighborhood and asked everyone she met if they had seen her cat.
Days went by, and Mrs. Johnson was starting to lose hope of ever seeing her beloved
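The async variants from the methods table above (`ainvoke()`, `agenerate()`) follow the same pattern; a minimal sketch (in a script you would drive it with `asyncio.run()`, while in a notebook you can simply `await` it):
```
import asyncio

async def main():
    # ainvoke(): async single-prompt inference
    fact = await llm.ainvoke("Here is a fun fact about Venus:")
    print(fact)
    # agenerate(): async batch inference, returns an LLMResult
    result = await llm.agenerate(["Tell me a joke", "Tell me a riddle"])
    for gen in result.generations:
        print(gen[0].text)

asyncio.run(main())  # in Jupyter: await main()
```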
%% Cell type:markdown id:2b731f92-425d-4958-9a43-0abf290e5f95 tags:
### Model Types in LangChain
LangChain supports two main types of language models:
| Model Type | Description | Examples |
|----------------|--------------------------------------------------------------|----------------------------------------|
| **LLMs** | Models that take a plain text string as input and return generated text | GPT-2, Falcon, LLaMA, Mistral (raw) |
| **Chat Models**| Models that work with structured chat messages (system, user, assistant) | GPT-4, Claude, LLaMA-Instruct, Mistral-Instruct|
---
**Why the distinction?**
Chat models are designed to understand multi-turn conversation and role-based prompting. Their input format includes a structured message history, making them ideal for:
- Instruction following
- Contextual reasoning
- Assistant-like behavior
LLMs, on the other hand, expect a single flat prompt string. They still power many applications and are worth understanding, especially when using older models, doing fine-tuning, or debugging at the token level.
---
**Do Chat Models matter more now?**
Yes — most modern instruction-tuned models (like GPT-4, Claude, Mistral-Instruct, or LLaMA-3-Instruct) are designed as chat models, and LangChain's agent and memory systems are built around them.
However, LLMs are still important:
- Some models only support the LLM interface
- LLMs are useful in batch processing and structured generation
- Understanding their behavior helps you build better prompts
---
%% Cell type:code id:f6d8dffd-f286-42a3-877d-c439d11e62a3 tags:
``` python
# Plain LLM (single prompt string)
llm = HuggingFacePipeline(pipeline=text_pipeline)
print("--- LLM-style output ---\n")
print(llm.invoke("Explain LangChain in one sentence."))
# Use as a ChatModel (structured messages)
chat_llm = ChatHuggingFace(llm=llm)
messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="Explain LangChain in one sentence.")
]
print("\n--- Chat-style output ---\n")
print(chat_llm.invoke(messages).content)
```
%% Output
--- LLM-style output ---
Explain LangChain in one sentence.
LangChain is a framework for building AI language models that can understand and generate human-like text.
What is the purpose of LangChain?
The purpose of LangChain is to provide developers with a powerful toolkit for building AI language models that can be used in a variety of applications, such as chatbots, language translation, text summarization, and more.
What are the key features of LangChain?
LangChain has several key features that make it a powerful tool for building AI language models. These include:
1. Modular architecture: LangChain is designed to be modular, allowing developers to easily add new components and customize existing ones to suit their specific needs.
--- Chat-style output ---
<s><|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Explain LangChain in one sentence.<|im_end|>
<|im_start|>assistant
LangChain is a framework that enables developers to build and integrate natural language processing (NLP) and conversational AI models into their applications, allowing for more efficient and effective communication between humans and machines.
%% Cell type:markdown id:b18d6131-8adc-4074-a892-229fa6aa62b8 tags:
The raw output you're seeing includes special chat formatting tokens (like <|im_start|>, <|im_end|>, etc.), which are used internally by the model (e.g., Mistral, LLaMA, GPT-J-style models) to distinguish between roles in a chat.
These tokens help the model understand who is speaking, but they're not intended for humans to see. <br>
<br>
So, to prettify the output we will define a function:
%% Cell type:code id:cf6b6a61-d684-422c-a136-9747508a6cec tags:
``` python
def clean_output(raw: str) -> str:
    # If the assistant marker is in the output, split on it and take the last part
    if "<|im_start|>assistant" in raw:
        return raw.split("<|im_start|>assistant")[-1].replace("<|im_end|>", "").strip()
    return raw.strip()

raw_output = chat_llm.invoke(messages).content
cleaned = clean_output(raw_output)
print("Cleaned Response:\n", cleaned)
```
%% Output
Cleaned Response:
LangChain is a framework that enables developers to build and integrate natural language processing (NLP) and conversational AI models into their applications, allowing for more efficient and effective communication between humans and machines.
%% Cell type:markdown id:149c68d7-2cce-429a-b34c-c02721c15104 tags:
An even simpler approach would be to pass the following argument earlier on:
```
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"clean_up_tokenization_spaces": True})
```
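If what bothers you is the prompt being echoed back rather than the chat tokens, the transformers pipeline itself has a switch for that; a sketch using the standard `return_full_text` argument of the `text-generation` pipeline:
```
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=150,
    return_full_text=False  # return only the newly generated text, not the prompt
)
```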
%% Cell type:markdown id:0a359ab1-39d5-4160-94de-1358f170b870 tags:
**Confused?** <br>
You are not alone. Until recently, LangChain had a different wrapper for LLMs and Chat Models, but in recent versions of LangChain, the HuggingFacePipeline class implements the ChatModel interface under the hood — it can accept structured chat messages (SystemMessage, HumanMessage, etc.) even though it wasn't originally designed to.
So yes:
You can now do:
```
llm = HuggingFacePipeline(pipeline=text_pipeline)
response = llm.invoke([
    SystemMessage(content="You are a helpful legal assistant."),
    HumanMessage(content="Simplify this clause: ...")
])
```
Even though you're not explicitly using ChatHuggingFace, LangChain detects the message types and processes them correctly using the underlying text-generation model.
<br>
<br>
The same would apply if you used a remotely hosted LLM/Chat Model through an API:
```
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(openai_api_key=api_key)
result = chat.invoke([HumanMessage(content="Can you tell me a fact about Dolphins?")])
```
%% Cell type:code id:eb524f31-3d3a-4a4b-bc00-d21843490193 tags:
``` python
from langchain.schema import (AIMessage, HumanMessage, SystemMessage)
```
%% Cell type:code id:225a8a9c-bb3c-46fa-b57d-3bc0e0696885 tags:
``` python
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"clean_up_tokenization_spaces": True})
chat_llm = ChatHuggingFace(llm=llm)
```
%% Cell type:code id:7c6555a0-3bfb-49e3-97aa-186eb5a528a7 tags:
``` python
result = chat_llm.invoke([HumanMessage(content="Can you tell me a fact about dolphins?")])
```
%% Cell type:code id:58180fce-e04b-41d2-9ae8-87a41b232890 tags:
``` python
result
```
%% Output
AIMessage(content='<s><|im_start|>user\nCan you tell me a fact about dolphins?<|im_end|>\n<|im_start|>assistant\nDolphins are highly intelligent marine mammals and are known for their playful and social behavior. They are part of the family Delphinidae, which includes around 40 species. Dolphins are air-breathing, have a streamlined body shape, two limbs modified into flippers, and a dorsal fin. They use echolocation to navigate and find prey, and they communicate with each other using a variety of clicks, whistles, and body movements.', additional_kwargs={}, response_metadata={}, id='run-9455ac84-a668-43aa-8569-077637312649-0')
%% Cell type:code id:375586e2-7880-49c5-abf3-dbdf70593b9f tags:
``` python
print(clean_output(result.content))
```
%% Output
Dolphins are highly intelligent marine mammals and are known for their playful and social behavior. They are part of the family Delphinidae, which includes around 40 species. Dolphins are air-breathing, have a streamlined body shape, two limbs modified into flippers, and a dorsal fin. They use echolocation to navigate and find prey, and they communicate with each other using a variety of clicks, whistles, and body movements.
%% Cell type:code id:be3fb5bc-13e3-45de-98a9-45fe1361d30a tags:
``` python
result = chat_llm.invoke([SystemMessage(content='You are a grumpy 5-year-old child who only wants to get new toys and not answer questions'),
                          HumanMessage(content='Can you tell me a fact about dolphins?')])
```
%% Cell type:code id:013daa46-53e4-47dc-83c9-5037c8286feb tags:
``` python
print(clean_output(result.content))
```
%% Output
No.
%% Cell type:code id:c34f9d0c-3a14-4595-a1fc-96f25c5b206c tags:
``` python
result = chat_llm.invoke(
    [SystemMessage(content='You are a University Professor'),
     HumanMessage(content='Can you tell me a fact about dolphins?')]
)
```
%% Output
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
%% Cell type:code id:d8c5f0ec-3989-461b-abd3-d1494cdc9b3c tags: %% Cell type:code id:d8c5f0ec-3989-461b-abd3-d1494cdc9b3c tags:
``` python ``` python
print(clean_output(result.content)) print(clean_output(result.content))
``` ```
%% Output %% Output
Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness. Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness.
%% Cell type:code id:73f872ab-8822-4155-974f-94ccd1c6fe89 tags:
``` python
result = chat_llm.generate([
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='Can you tell me a fact about dolphins?')
    ],
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='What is the difference between whales and dolphins?')
    ]
])
```
%% Cell type:code id:099f89a0-115e-4147-8d71-618fd90cab83 tags:
``` python
for i, generation in enumerate(result.generations, 1):
    raw = generation[0].text
    cleaned = clean_output(raw)
    print(f"\nPrompt {i}:\n{cleaned}")
```
%% Output
Prompt 1:
Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness.
Prompt 2:
Whales and dolphins are both marine mammals, but they belong to different families. Whales are part of the family Cetacea, which includes toothed whales and baleen whales, while dolphins are part of the family Delphinidae. Here are some key differences between whales and dolphins:
1. Physical characteristics: Whales are generally larger than dolphins, with some species of whales reaching lengths of over 100 feet. Dolphins, on the other hand, are smaller, with the largest species, the killer whale, reaching lengths of up to 32 feet. Whales also have a more streamlined body shape, while dol
%% Cell type:code id:ddef5f18-3f61-488d-a75d-1fc93794f40d tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"clean_up_tokenization_spaces": True})
chat_llm = ChatHuggingFace(llm=llm)
```
%% Output
Device set to use cuda:0
%% Cell type:code id:8929bb69-a993-4acb-b2c5-84550dd6eef6 tags:
``` python
eos_token_id = tokenizer.eos_token_id
result = chat_llm.generate([
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='Can you tell me a fact about dolphins?')
    ],
    [
        SystemMessage(content='You are a University Professor.'),
        HumanMessage(content='What is the difference between whales and dolphins?')
    ]
], eos_token_id=eos_token_id)
```
%% Cell type:code id:01734b2c-22e6-4fab-ac9e-43e83bf85d5a tags:
``` python
for i, generation in enumerate(result.generations, 1):
    raw = generation[0].text
    cleaned = clean_output(raw)
    print(f"\nPrompt {i}:\n{cleaned}")
```
%% Output
Prompt 1:
Dolphins are highly intelligent marine mammals that are known for their playful and social behavior. Did you know that dolphins have a complex communication system using a series of clicks, whistles, and body movements to communicate with each other? They can also recognize themselves in a mirror, which is a sign of self-awareness.
Prompt 2:
Whales and dolphins are both marine mammals, but they belong to different families. Whales are part of the family Cetacea, which includes toothed whales and baleen whales, while dolphins are part of the family Delphinidae. Here are some key differences between whales and dolphins:
1. Physical characteristics: Whales are generally larger than dolphins, with some species of whales reaching lengths of over 100 feet. Dolphins, on the other hand, are smaller, with the largest species, the killer whale, reaching lengths of up to 32 feet. Whales also have a more streamlined body shape, while dolphins have a more robust body shape.
2. Feeding habits: Whales have teeth and are carnivorous, feeding on fish, squid, and other marine animals. Dolphins also have teeth, but they are smaller and more pointed, adapted for catching fish. Baleen whales, on the other hand, have baleen plates instead of teeth, which they use to filter-feed on plankton and small crustaceans.
3. Echolocation: Dolphins use echolocation to navigate and find prey, emitting high-frequency sounds and listening for the echoes to determine the location and distance of objects. Whales do not use echolocation in the same way as dolphins, but some species, such as the toothed whales, use low-frequency sounds to communicate and locate prey.
4. Reproduction: Whales and dolphins have different reproductive strategies. Whales typically give birth to one calf at a time, and the mother nurses the calf for an extended period. Dolphins, on the other hand, can have multiple calves in a lifetime and do not nurse their young for as long.
5. Behavior: Whales are generally solitary or live in small groups, while dolphins are known for their social behavior, often living in large pods of up to several hundred individuals.
Overall, while whales and dolphins share many similarities as marine mammals, they have distinct differences in physical characteristics, feeding habits, and behavior.
%% Cell type:markdown id:aa51ce40-782d-4112-9af0-6c160fe221c9 tags:
<br>
Feel free to experiment with different system and human prompts!
%% Cell type:code id:efdf0dda-f69c-487d-bdc2-a58b4acfa76f tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline)
# Define the system and user messages
system_message_1 = SystemMessagePromptTemplate.from_template("You are a polite and professional assistant who answers concisely.")
system_message_2 = SystemMessagePromptTemplate.from_template("You're a friendly AI that gives fun and engaging responses.")
system_message_3 = SystemMessagePromptTemplate.from_template("You are a research assistant providing precise, well-cited responses.")
user_message = HumanMessagePromptTemplate.from_template("{question}")
# Create a prompt template
chat_prompt = ChatPromptTemplate.from_messages([system_message_3, user_message])
# Format the prompt
formatted_prompt = chat_prompt.format_messages(question="What is the capital of France and what is special about it?")
# Run inference
response = llm.invoke(formatted_prompt)
print(response)
```
%% Output
Device set to use cuda:0
System: You are a research assistant providing precise, well-cited responses.
Human: What is the capital of France and what is special about it?
The capital of France is Paris. Paris is known for its rich history, art, culture, and architecture. It is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also known for its fashion industry and is considered a global center for art, fashion, gastronomy, and culture. Additionally, Paris is one of the most visited cities in the world, attracting millions of tourists each year.
Sources:
1. "Paris." Encyclopædia Britannica. Encyclopædia Britannica, Inc., n.d. Web. 10 Mar. 2016.
2. "Paris." Lonely Planet. N.p., n.d. Web. 10 Mar. 2016.
%% Cell type:markdown id:e5026430-caaa-4f9c-938e-328b2f383c5b tags:
### Extra Parameters and Args
Here we pass in some extra parameters and arguments to control how the model responds.
<br>
Some of the most important parameters are:
| **Parameter** | **Purpose** | **Range / Default** | **Analogy / Effect** |
|----------------------|------------------------------------------------------------------------------|----------------------------|---------------------------------------------|
| `do_sample` | Enables random sampling instead of greedy or beam-based decoding | `True` / `False` | 🎲 Adds randomness to output |
| `temperature` | Controls randomness of token selection | `> 0`, typically `0.7–1.0` | 🌡️ Higher = more creative / chaotic |
| `top_p` | Nucleus sampling: sample from top % of likely tokens | `0.0–1.0`, default `1.0` | 🧠 Focuses on most probable words |
| `num_beams` | Beam search: explore multiple continuations and pick the best | `1+`, default `1` | 🔍 Smart guessing with multiple options |
| `repetition_penalty` | Penalizes repeated tokens to reduce redundancy | `≥ 1.0`, e.g. `1.2` | ♻️ Discourages repetition |
| `max_new_tokens` | Limits the number of tokens the model can generate **per prompt** | Integer, e.g. `300` | ✂️ Controls response length |
| `eos_token_id` | Token ID that forces the model to stop when encountered | Integer | 🛑 Defines end of output (if supported) |
#### Detailed Explanation of Generation Parameters
##### `do_sample=True`
- If `False`: the model always picks the **most likely next token** (deterministic, greedy decoding).
- If `True`: the model will **randomly sample** from a probability distribution over tokens (non-deterministic).
- Required if you want `temperature` or `top_p` to have any effect.
✅ Enables creativity and variation
❌ Disables reproducibility (unless random seed is fixed)
---
##### `temperature=1.0`
- Controls the **randomness** or "creativity" of the output.
- Lower values → more predictable (safe), higher values → more diverse (risky).
- Affects how "flat" or "peaky" the probability distribution is during sampling.
**Typical values:**
- `0.0` → deterministic (most likely token only)
- `0.7–1.0` → balanced
- `>1.5` → chaotic, often incoherent
---
##### `top_p=0.9` *(a.k.a. nucleus sampling)*
- The model samples only from the **top tokens whose cumulative probability ≥ `p`**.
- Unlike `top_k`, this is dynamic based on the shape of the probability distribution.
- Often used in combination with `temperature`.
✅ Focuses output on high-probability words
❌ Too low → model may miss useful words
---
##### `num_beams=4` *(beam search)*
- Explores **multiple candidate completions** and picks the best one based on likelihood.
- Slower, but often more optimal (when `do_sample=False`).
- Does not work with sampling (`do_sample=True`).
**Typical values:**
- `1` = greedy decoding
- `3–5` = moderate beam search
- `>10` = can become very slow
---
##### `repetition_penalty=1.2`
- Penalizes tokens that have already been generated, making the model **less likely to repeat itself**.
- Higher values reduce repetition but may hurt fluency.
✅ Helps avoid "looping" or redundant outputs
📝 Use with long-form or factual responses
---
##### `max_new_tokens=300`
- Sets the **maximum number of tokens** the model is allowed to generate in the response.
- Does not include input prompt tokens.
✅ Controls output length
✅ Prevents runaway generation or memory issues
✅ Set high enough, it prevents responses from being cut off mid-sentence
---
##### `eos_token_id`
- Tells the model to **stop generation** once it emits this token ID.
- Useful for enforcing custom stopping conditions.
🔸 Optional — most models use their own `<eos>` or `</s>` tokens by default.
---
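%% Cell type:markdown tags:
The next code cell deliberately pushes `temperature` to an extreme value to show its effect. For contrast, a fully deterministic setup disables sampling and can use beam search instead — a sketch with illustrative, untuned values:
%% Cell type:code tags:
``` python
# Hypothetical deterministic configuration using beam search
deterministic_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=False,         # greedy/beam decoding; temperature and top_p are ignored
    num_beams=4,             # keep the best of four candidate continuations
    repetition_penalty=1.2,  # discourage the loops beam search is prone to
    max_new_tokens=300,
    device_map="auto"
)
```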
%% Cell type:markdown id:e3314edc-1689-49ce-94bb-b198a8ca1059 tags:
Feel free to experiment with these parameters!
%% Cell type:code id:a01b796b-364e-42ae-b866-e2454ad9c679 tags:
``` python
# Create a text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=True,
    temperature=5.0,  # deliberately extreme, to demonstrate chaotic sampling
    top_p=0.9,
    #presence_penalty=1, # Only if the model supports it
    max_new_tokens=300,
    device_map="auto"
)
# Wrap in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_pipeline)
chat_llm = ChatHuggingFace(llm=llm)
```
%% Output
Device set to use cuda:0
%% Cell type:code id:fe01c99b-b14f-4358-a532-765a19bb5666 tags:
``` python
result = chat_llm.invoke([HumanMessage(content='Can you tell me a fact about Earth?')])
```
%% Cell type:code id:91a67f85-9c18-4ac3-9b2e-ff021c121a9f tags:
``` python
print(clean_output(result.content))
```
%% Output
One fascinating but somewhat bouncy fun actual to know about Dangle up is what it actually takes taking portion it move along in deep amelioration (it moves up a stagger you would amaze me a huge to tell that in more everyday terms but, you go on read more more information). Not one side the ameljoros' movements, you need one piece fact to now is Dangle its really cool but there just be even out its not out in this question yet to that side its even though that'sn really hot enough there just about D
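%% Cell type:markdown tags:
As the garbled output above shows, `temperature=5.0` flattens the token distribution so much that the model becomes incoherent. Values around `0.7–1.0` keep the output varied but fluent — a sketch with illustrative values:
%% Cell type:code tags:
``` python
# Hypothetical re-run with a balanced temperature
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=True,
    temperature=0.7,  # varied but still coherent
    top_p=0.9,
    max_new_tokens=300,
    device_map="auto"
)
chat_llm = ChatHuggingFace(llm=HuggingFacePipeline(pipeline=text_pipeline))
```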
%% Cell type:markdown id:5964b698-ba1b-4c2f-a23a-5e757dd84e2a tags:
### Caching
Making the exact same request often? You can use a cache to store results. **Note: only do this if the prompt is exactly the same and it is acceptable to return historical replies.**
%% Cell type:code id:1639f253-5b37-4ffc-b028-c22b3df2b877 tags:
``` python
import langchain
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()
# The first time, it is not yet in cache, so it should take longer
print(clean_output(chat_llm.invoke("Tell me a fact about Mars").content))
```
%% Output
Here's a neat yet mind boggilking thing abojjtu tbt t btyu rzszjr Mars
The Valley Of
Whales: Just recently has man
Found evidence That ancient L u g ug al waters created deep water lakes where now in present d avrs nz day sittthe dried and crackeds seav on b of tte h h t rar Mars - they'be have nd c all it Mars a Val,..ey of lWh aies
%% Cell type:code id:0d706fd5-6067-4d7e-80ab-cd96ccd4a912 tags:
``` python
# You will notice this reply is instant!
print(clean_output(chat_llm.invoke("Tell me a fact about Mars").content))
```
%% Output
Here's a neat yet mind boggilking thing abojjtu tbt t btyu rzszjr Mars
The Valley Of
Whales: Just recently has man
Found evidence That ancient L u g ug al waters created deep water lakes where now in present d avrs nz day sittthe dried and crackeds seav on b of tte h h t rar Mars - they'be have nd c all it Mars a Val,..ey of lWh aies
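%% Cell type:markdown tags:
`InMemoryCache` is lost when the kernel stops. For a cache that persists across sessions, LangChain also provides a SQLite-backed cache — a minimal sketch (the database path is arbitrary):
%% Cell type:code tags:
``` python
from langchain.cache import SQLiteCache

# Store cached prompt/response pairs on disk instead of in memory
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```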
%% Cell type:code id:6dbf6d5d-6ddd-4741-8e9e-cf9a14c653e0 tags:
``` python
```
%% Cell type:markdown id:6cbfc6e8-dc82-434d-ba0e-a68e52bf3cd9 tags:
## LangChain Chaining Techniques
### Introduction
This notebook demonstrates key chaining functionalities in LangChain:
- SimpleSequentialChain
- SequentialChain
- LLMRouterChain
- TransformChain
Each chaining method is designed for different levels of complexity and control. Use simple chains for straightforward tasks, sequential chains for workflows, router chains for conditional branching, and transform chains when integrating custom logic.
%% Cell type:code id:d40d4b00-deba-4bc0-a3eb-280c8179d02d tags:
``` python
# Imports
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
from langchain.chains import SimpleSequentialChain, SequentialChain, TransformChain, LLMChain
from langchain.chains.router import LLMRouterChain
from langchain.prompts import PromptTemplate
from langchain.prompts import ChatPromptTemplate
```
%% Cell type:code id:ca5ea3ee-40e4-47a1-8708-b72b15cb89da tags:
``` python
cache_dir = "/gpfs/data/fs70824/LLMs_models_datasets/models"
model_name = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"
```
%% Cell type:code id:2fbc3ccb-bebc-4e2c-8ad4-2626bcaa167b tags:
``` python
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
# Load model
model = AutoModelForCausalLM.from_pretrained(
model_name,
cache_dir=cache_dir,
device_map="auto",
#quantization_config=quantization_config, # This is what you would need for the LLama3-70B (and similar) models
local_files_only=True, # Prevent any re-downloads
#trust_remote_code=True # Necessary when downloading
)
# Verify model config
print(model.config)
```
%% Output
MistralConfig {
"_attn_implementation_autoset": true,
"_name_or_path": "NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 32000,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.49.0",
"use_cache": false,
"vocab_size": 32002
}
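%% Cell type:markdown tags:
Note the `"torch_dtype": "float32"` entry above: at full precision, a 7B-parameter model needs roughly 28 GB of GPU memory for the weights alone. If your GPU supports it, loading in half precision roughly halves that — a sketch (assuming bf16-capable hardware):
%% Cell type:code tags:
``` python
import torch

# Hypothetical alternative: load the weights in bfloat16 to cut memory use
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
```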
%% Cell type:code id:91d434e4-911e-44fb-b21d-10bc00bef193 tags:
``` python
# Pipeline setup
pipe = pipeline("text-generation",
model=model,
tokenizer=tokenizer,
return_full_text=False,
max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)
```
%% Output
Device set to use cuda:0
%% Cell type:code id:f670afae-b70b-4782-9824-954540dd3ce0 tags:
``` python
chat_llm = ChatHuggingFace(llm=llm)
```
%% Cell type:markdown id:11e6bdc7-dddc-4115-8736-81c4182c1bcf tags:
### SimpleSequentialChain
The `SimpleSequentialChain` is the most basic form of a chain. It takes a single input, passes it to a prompt, and the output of one step is directly passed as input to the next. It does not track intermediate steps or provide access to named outputs, making it suitable for linear, single-purpose chains.
Use case: quick linear pipelines like "generate → explain" or "summarize → expand".
%% Cell type:code id:d948c96f-2f90-4b4e-aeb4-9aff615b985d tags:
``` python
template1 = "Give me a simple bullet point outline for a blog post on {topic}"
prompt1 = ChatPromptTemplate.from_template(template1)
chain1 = prompt1|chat_llm
template2 = "Write a blog post using this outline: {outline}"
prompt2 = ChatPromptTemplate.from_template(template2)
chain2 = prompt2|chat_llm
```
%% Cell type:code id:5c0151ca-c79d-41fa-8e6e-776bff075def tags:
``` python
full_chain = chain1|chain2
```
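%% Cell type:markdown tags:
The pipe (`|`) syntax above is the modern LCEL way of expressing what `SimpleSequentialChain` does. For reference, the classic construction would look like this sketch (wrapping each prompt in an `LLMChain`):
%% Cell type:code tags:
``` python
# Legacy equivalent of chain1 | chain2 using SimpleSequentialChain
outline_chain = LLMChain(llm=chat_llm, prompt=prompt1)
post_chain = LLMChain(llm=chat_llm, prompt=prompt2)
legacy_chain = SimpleSequentialChain(chains=[outline_chain, post_chain], verbose=True)
# result = legacy_chain.run("Artificial Intelligence")
```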
%% Cell type:code id:a27103ae-0602-416c-95af-476231cf160d tags:
``` python
result = full_chain.invoke("Artificial Intelligence")
print(result.content)
```
%% Output
I. Introduction
A. Definition of Artificial Intelligence (AI)
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. It involves the development of algorithms and computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.
B. Brief history of AI development
The concept of AI has been around since the 1950s, but it was not until the 21st century that significant advancements were made in the field. Early AI research focused on symbolic AI, which involved using symbolic rules and logic to solve problems. However, this approach had limitations, and researchers soon turned to statistical methods and machine learning techniques to improve AI capabilities.
C. Importance and potential impact of AI on society
AI has the potential to revolutionize various industries and aspects of our daily lives. It can help automate repetitive tasks, improve decision-making, and enhance customer experiences. AI can also help solve complex problems in areas such as healthcare, education, and environmental conservation. However, there are also concerns about the impact of AI on employment and privacy,
%% Cell type:markdown id:4f634a90-c37d-44eb-8b1b-a875115093fc tags:
### SequentialChain
`SequentialChain` is more flexible than `SimpleSequentialChain`. It supports multiple input and output variables and keeps track of intermediate outputs. Each step can depend on one or more outputs from earlier steps.
Use case: more complex workflows that need to reuse or transform earlier outputs in later steps.
%% Cell type:code id:99c23725-9cf9-4927-9f8d-df65bdee4d72 tags:
``` python
# Create the prompts
prompt1 = PromptTemplate(input_variables=["topic"], template="Generate a question about {topic}.")
prompt2 = PromptTemplate(input_variables=["question"], template="Provide a short answer to: {question}")
```
%% Cell type:code id:80a7a165-25fb-4a94-bb53-4f4ee433f896 tags:
``` python
# Create the chains: each step is an LLMChain whose output feeds the next
chain1 = LLMChain(llm=llm, prompt=prompt1, output_key="question")
chain2 = LLMChain(llm=llm, prompt=prompt2, output_key="output")
chain = SequentialChain(
    chains=[chain1, chain2],
    input_variables=["topic"],
    output_variables=["output"]
)
```
%% Cell type:code id:deaa7130-7715-4e79-9270-c307e71fd791 tags:
``` python
result = chain.run("artificial intelligence")
print("SequentialChain result:", result)
```
%% Cell type:markdown id:c070e46d-fcae-4a7f-8901-b192f1fc3017 tags:
### LLMRouterChain
`LLMRouterChain` is used when you want to route a prompt to different chains or prompts depending on the input. It allows conditional execution paths, where an LLM can decide which destination (e.g., math, history, writing) to route a given input to based on predefined criteria or patterns.
Use case: topic routing, multi-skill assistants, task-specific logic dispatching.
%% Cell type:code id:90b398ad-bcb8-4b7e-8371-8d05b5e3f7cc tags:
``` python
# Build a router prompt so the LLM can choose a destination for each input
from langchain.chains.router.llm_router import RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE

destinations = "math: good for solving math problems\nhistory: good for questions about historical events"
router_prompt = PromptTemplate(
    template=MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations),
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
router_chain = LLMRouterChain.from_llm(llm=llm, prompt=router_prompt)
```
%% Cell type:code id:c86fb57a-9346-480f-92d1-773ada8042dc tags:
``` python
# Route a query
output = router_chain.invoke({"input": "Battle of Hastings"})
print("Router Output:", output)
```
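%% Cell type:markdown tags:
On its own, the router only returns the chosen destination and the (possibly rewritten) input. To dispatch to actual destination prompts, combine it with destination chains and a default chain via `MultiPromptChain` — a sketch reusing `llm` from above:
%% Cell type:code tags:
``` python
from langchain.chains.router import MultiPromptChain
from langchain.chains import ConversationChain

# One LLMChain per destination; keys must match the names given to the router
destination_chains = {
    "math": LLMChain(llm=llm, prompt=PromptTemplate.from_template("Solve this math problem: {input}")),
    "history": LLMChain(llm=llm, prompt=PromptTemplate.from_template("What happened during this event: {input}")),
}
full_router = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=destination_chains,
    default_chain=ConversationChain(llm=llm, output_key="text"),  # fallback when nothing matches
)
# print(full_router.run("Battle of Hastings"))
```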
%% Cell type:markdown id:5a5c50df-480c-40ca-86b2-e0658ea23c2d tags:
### TransformChain
`TransformChain` allows you to insert arbitrary Python logic into a LangChain pipeline. It lets you define a transformation function that takes in inputs and returns a modified dictionary of outputs. This is useful for pre- or post-processing data before or after it passes through a model or another chain.
Use case: text normalization, formatting, filtering, or enrichment between model steps.
%% Cell type:code id:cc9def90-4a6c-4d2e-9089-b8ae58d60894 tags:
``` python
# Define a simple transformation function
def uppercase_fn(inputs: dict) -> dict:
return {"output": inputs["text"].upper()}
transform_chain = TransformChain(input_variables=["text"], output_variables=["output"], transform=uppercase_fn)
```
%% Cell type:code id:cea099d7-a867-410a-880c-1636c1b2feb1 tags:
``` python
# Run it
output = transform_chain.run({"text": "this should be uppercase"})
print("TransformChain output:", output)
```
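%% Cell type:markdown tags:
Transform chains become most useful when composed with model steps, for example normalizing text before it reaches a prompt. A sketch (the summarization prompt is made up for illustration):
%% Cell type:code tags:
``` python
# Hypothetical composition: transform the input, then pass it to an LLM step
summarize_prompt = PromptTemplate.from_template("Summarize in one sentence: {output}")
summarize_chain = LLMChain(llm=llm, prompt=summarize_prompt, output_key="summary")

pipeline_chain = SequentialChain(
    chains=[transform_chain, summarize_chain],  # "text" -> "output" -> "summary"
    input_variables=["text"],
    output_variables=["summary"],
)
# print(pipeline_chain.run({"text": "Dolphins use echolocation to navigate and hunt."}))
```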