Module 4: Large Language Models

Module Overview

This updated module takes you beyond interacting with existing LLM interfaces to building and customizing your own LLM-powered applications. Building on the foundation from Module 3, you'll learn to work directly with LLM APIs and local models to create sophisticated, context-aware conversational agents.

You'll explore how to design and implement local LLM bots with customizable prompts and parameters, experiment with different model configurations, and tackle advanced challenges like implementing memory systems for more coherent conversations. This hands-on approach will give you practical experience in building production-ready LLM applications while understanding the technical considerations involved in deploying these powerful models.

Learning Objectives

  • Develop and customize local LLM bots with parameterized prompts and configurations
  • Implement memory systems and context management for enhanced conversational experiences

Objective 01 - OpenAI API - The API Behind ChatGPT

API Basics

Software applications often provide functionality that would be useful inside other applications. To share features securely, developers expose a dedicated interface that handles external requests from other programs. This interface, known as the Application Programming Interface (API), serves as a secure and streamlined conduit for sharing capabilities and data between different software solutions. By utilizing an API, developers can enrich their own applications with the functionality of another, amplifying the utility and reach of both platforms.

What is OpenAI's API and SDK?

An API serves as a bridge that allows two different applications to communicate with each other. In the context of OpenAI's GPT models, the API provides a way to programmatically interact with those models for tasks like text generation, summarization, and more. Working with an API from scratch can be involved, but tools like Software Development Kits speed up the process.

Software Development Kits (SDKs) are collections of software tools and libraries that simplify complex actions, making it easier to interact with an API. OpenAI provides an SDK for Python that wraps the raw API calls, offering a more Pythonic way to make requests to the GPT models.

Setting Up OpenAI Python SDK

To work with the OpenAI Python SDK, you'll need two things: the openai package installed in your environment and an API key from your OpenAI account.

Installation

Installing the OpenAI Python SDK is simple. Use pip to install the package:

pip install openai

API Key Configuration

After obtaining your API key from OpenAI, you can set it up in one of two ways: export it as the OPENAI_API_KEY environment variable (for example, via a .env file), or pass it directly to the client when you create it.
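
For example, here is a minimal sketch of both approaches (the client reads OPENAI_API_KEY from the environment automatically when no key is passed):

import os
from openai import OpenAI

# Option 1: rely on the OPENAI_API_KEY environment variable;
# the client picks it up automatically when no key is passed.
client = OpenAI()

# Option 2: pass the key explicitly, for example after loading it from a .env file.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))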

Making Your First SDK Call

Initializing the SDK

To use the OpenAI SDK in your Python code, import it and initialize it with your API key.

import os

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  # read environment variables (including OPENAI_API_KEY) from a .env file
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

Simple Example: Text Generation

Here's a quick example that generates text using the gpt-3.5-turbo-instruct model via the OpenAI SDK:

text = "Welcome to Data Science"
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=f"Translate the following English text to French: '{text}'",
    max_tokens=60
)
generated_text = response.choices[0].text.strip()
print(generated_text)

In this example, the text generated will be the French translation of the English text specified in the text variable.

Objective 02 - OpenAI API SDK

OpenAI API SDK

Introduction

The OpenAI API SDK serves as a powerful tool for leveraging machine learning capabilities within your applications. By manipulating parameters such as prompt, max_tokens, temperature, and top_p, users can fine-tune the output to meet their specific needs. Whether you are aiming for more deterministic results or embracing randomness, understanding these options enables you to exploit the full potential of the API.

prompt

The prompt parameter is the initial string that guides the model in generating a completion. The more specific and contextual the prompt, the more accurate the generated text will be.

max_tokens

This parameter limits the number of tokens in the output. If you set max_tokens to 50, the model will generate text up to 50 tokens long.

temperature

The temperature parameter controls the randomness of the output. A higher value like 0.8 yields more random outputs, while a lower value like 0.2 makes the output more deterministic.

top_p

This parameter controls nucleus sampling, which restricts generation to the smallest set of tokens whose cumulative probability exceeds top_p. Values range from 0 to 1; lower values make the text more focused and deterministic.
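
To see how these options fit together, here is a small sketch combining them in one completions call. It assumes the client object configured in the setup above; the specific values are just starting points to experiment with:

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a one-sentence tagline for a data science bootcamp.",
    max_tokens=50,      # cap the completion at 50 tokens
    temperature=0.2,    # low temperature -> more deterministic output
    top_p=1.0,          # sample from the full token distribution
)
print(response.choices[0].text.strip())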

Handling SDK Responses

Response Object

When you make an API call using the SDK, you receive a response object. This object contains various pieces of information, including the generated text.

To extract the generated text from the response object, you can use the following code snippet:

import os

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

text = "Welcome to Data Science"
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=f"Translate the following English text to French: '{text}'",
    max_tokens=60
)

generated_text = response.choices[0].text.strip()
print(generated_text)

The same extraction can be wrapped in a helper function. Note that the chat interface used later in this module returns messages rather than raw text, so its helper reads message.content instead of text:

def extract_reply(response):
    """Return the assistant's reply from a chat completions response."""
    return response.choices[0].message.content.strip()

Error Handling

To handle errors gracefully, you can use Python's try-except blocks. Here's an example:

import os

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

try:
    text = "Welcome to Data Science"
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=f"Translate the following English text to French: '{text}'",
        max_tokens=60
    )
    
    generated_text = response.choices[0].text.strip()
    print(generated_text)
except Exception as e:
    print(f"An error occurred: {e}")

Beyond a blanket try-except, you may want to handle common API errors explicitly; these are the kinds of errors you will run into with almost any API: authentication failures, rate limits, request timeouts, connection problems, and invalid requests.
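
As one possible sketch, the openai package exposes specific exception classes you can catch individually. This again assumes the client configured earlier; adjust the handling to your application's needs:

import openai

try:
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Say hello in French.",
        max_tokens=20,
    )
    print(response.choices[0].text.strip())
except openai.AuthenticationError as e:
    print(f"Check your API key: {e}")
except openai.RateLimitError as e:
    print(f"Rate limit reached; consider retrying with backoff: {e}")
except openai.APIConnectionError as e:
    print(f"Could not reach the API: {e}")
except openai.APIError as e:
    print(f"The API returned an error: {e}")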

Advanced SDK Usage

Batching Requests

To make multiple requests at once, you can send several prompts in a single call; the completions endpoint accepts a list of prompts, which is more efficient than making individual requests.
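
For example, a sketch of one batched request using the completions endpoint (assuming the client from earlier; each returned choice carries an index that maps it back to its prompt):

prompts = [
    "Translate 'Good morning' to French:",
    "Translate 'Good night' to French:",
]

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompts,    # a list of prompts handled in a single request
    max_tokens=20,
)

# Sort by index so the outputs line up with the input prompts.
for choice in sorted(response.choices, key=lambda c: c.index):
    print(choice.text.strip())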

Pagination

When you need more text than a single response can hold, you can generate it in chunks: check whether a completion stopped because it hit the max_tokens limit and, if so, issue a follow-up request that continues from the text generated so far. This keeps each request within its token budget.
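
A minimal sketch of this continuation pattern, assuming the client from earlier (the prompt and loop structure are illustrative):

prompt = "Write a short story about a robot learning to paint.\n"
story = ""

# Keep asking for more text while the model stops only because it hit max_tokens.
while True:
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt + story,
        max_tokens=200,
    )
    choice = response.choices[0]
    story += choice.text
    if choice.finish_reason != "length":
        break  # the model finished naturally or hit a stop sequence

print(story.strip())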

Deep Dive: System Prompts in OpenAI's GPT API with Python SDK

System prompts are special instructions given to the model to guide its behavior throughout an interactive session or for a specific task. These are often used in conversational agents, content filters, and other scenarios where you need to condition the model's responses according to specific guidelines or goals.

Types of System Prompts

Format of System Prompts

The system prompt is generally set up at the beginning of an interaction and stays consistent throughout. It's often placed at the top of the prompt string, separate from user or task-specific prompts, to provide a general context or instruction set for the model.

Implementing System Prompts with SDK

Here's how you can include a system prompt while generating text using the chat interface:

import os

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))


def extract_reply(response):
    return response.choices[0].message.content.strip()


messages = [
    {"role": "system", "content": "You are an assistant that speaks like Shakespeare."},
    {"role": "user", "content": "How is the weather today?"}
]

result = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)

reply = extract_reply(result)
print(reply)

Considerations for Using System Prompts

By mastering the use of system prompts, you can make the most out of OpenAI's GPT API and Python SDK for a wide array of specialized and interactive tasks.

Conclusion

In this module, we delved into the OpenAI Python SDK as a powerful tool for interacting with GPT models. Starting from the basic setup requirements and installation, we progressed through the various parameters like prompt, max_tokens, temperature, and top_p that help fine-tune the behavior and output of the GPT model.

We also took a deep dive into the concept of system prompts, a versatile feature that allows you to guide the model's behavior for specialized tasks, enforce ethical guidelines, or add a conversational context. Whether you're building a conversational agent, a content filter, or a specialized text generator, understanding how to effectively utilize system prompts can be a game-changer.

As you move forward, remember that the key to effectively using the API and SDK lies in your understanding of these parameters and features. Each project may require a different combination of them, so it's important to experiment and find what works best for your specific needs.

With this foundation, you are well-prepared to explore more advanced topics and applications in future modules.

Additional Resources

Objective 03 - Local LLM Setup

Introduction

This module outlines the setup and usage of a locally hosted Large Language Model (LLM) to create a chatbot named Marv, who is programmed to provide sarcastic responses. The LLM runs via the llama_cpp Python package and answers queries according to the persona set by the system prompt.

Topics Covered

Installing Dependencies

To get started, install the llama-cpp-python package using pip.

pip install llama-cpp-python

Download the LLM

The examples below use the openorca-platypus2-13b.Q4_K_M.gguf model file; download a quantized GGUF model (for example, from Hugging Face) and place it where the model_path argument points.

Initializing the LLM

Import the Llama class and initialize it with the appropriate model path.

from llama_cpp import Llama
llm = Llama(model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf")
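
Depending on your hardware, you may also want to tune the constructor; here is a hedged sketch using a few common llama-cpp-python options:

llm = Llama(
    model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf",
    n_ctx=2048,         # size of the context window in tokens
    n_threads=8,        # CPU threads to use for inference
    n_gpu_layers=0,     # layers to offload to the GPU, if one is available
)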

Crafting System and User Prompts

Set up the system and user prompts. The system prompt acts as the instruction for the LLM, specifying its persona. The user prompt serves as the query or statement from the user.

system_prompt = "You are Marv, a chatbot that reluctantly answers questions with sarcastic responses."
user_prompt = "Hi Marv, what's up?"

Running the LLM and Obtaining a Response

Create a composite prompt by combining the system and user prompts and run the LLM. Extract and print the response.

prompt = f"### Instruction: {system_prompt}\n\n{user_prompt}\n\n### Response:\n"
raw_output = llm(prompt, stop=["###"], max_tokens=-1, temperature=1)
reply = raw_output.get("choices")[0].get("text").strip()
print(reply)

Understanding the Parameters

In the call above, stop=["###"] tells the model to stop as soon as it emits the "###" marker (which would otherwise begin a new instruction block), max_tokens=-1 removes the cap on output length so generation continues until a stop condition or the context limit is reached, and temperature=1 keeps the output relatively varied.

This section equips you with the know-how to set up and run a sarcastic chatbot using a locally hosted LLM. Feel free to modify the prompts and parameters as needed.

Putting It All Together: Marv the Sarcastic Bot

from llama_cpp import Llama

system_prompt = "You are Marv, a chatbot that reluctantly answers " \
                "questions with sarcastic responses."

user_prompt = "Hi Marv, what's up?"

prompt = f"### Instruction: {system_prompt}\n\n" \
         f"{user_prompt}\n\n" \
         f"### Response:\n"

llm = Llama(model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf")

raw_output = llm(
    prompt,
    stop=["###"],
    max_tokens=-1,
    temperature=1,
)

reply = raw_output.get("choices")[0].get("text").strip()
print(reply)

Additional Resources

Guided Project

This guided project focuses on hands-on LLM implementation and does not have traditional repository materials. For students interested in exploring additional technical background, you can review the legacy Time Series Forecasting material as supplementary content, though the current guided project and assignment are the primary focus.

Building a Chatbot with Persistent Memory

Module Assignment

This module features a hands-on implementation assignment that differs from our typical structured exercises.

Building an Advanced Local LLM Bot

Objective:

The main goal of this assignment is to develop a local LLM bot with customizable prompts and parameters. As a stretch goal, you will implement a short-term memory model for the bot, allowing for more coherent and context-aware interactions.

The instructions for this project are intentionally a bit vague. The purpose is for you to build something of your own design, which can present many challenges and, more importantly, result in a portfolio-worthy project.

Prerequisites:

  • Python programming experience
  • Basic understanding of machine learning, NLP, and LLMs
  • Access to an LLM API or local LLM setup

Steps:

  1. Initial Setup
    • Set up a basic bot using a local LLM or an API service.
  2. Experimentation
    • Experiment with various prompts and parameters to understand their impact on the bot's responses.
  3. Refactoring
    • Refactor your bot into a function or class, making sure to parameterize the user_prompt.
  4. Memory Module (Stretch Goal)
    • Implement a memory system for your bot. This can range from simply feeding back previous interactions to a more complex approach like a vector database for automatic relevant recall (see the sketch after this list for the simple feed-back approach).
  5. Evaluation
    • Evaluate the performance in terms of coherence, relevance, and context-awareness.
  6. Documentation
    • Document your design choices, implementation details, and observations.
  7. Peer Review (Stretch Goal)
    • Share your project for peer review, focusing on the bot's design, performance, and memory model.
  8. Final Submission
    • Submit your code and documentation for evaluation.
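
For the simple feed-back approach in step 4, here is a minimal sketch of a short-term memory loop built on the OpenAI chat interface; the class name, model choice, and history-trimming strategy are illustrative rather than prescribed:

class ChatBot:
    """A minimal chatbot that remembers recent turns by resending them."""

    def __init__(self, client, system_prompt, max_turns=10):
        self.client = client
        self.system_prompt = system_prompt
        self.max_turns = max_turns  # how many past user/assistant exchanges to keep
        self.history = []           # list of {"role": ..., "content": ...} messages

    def ask(self, user_prompt):
        self.history.append({"role": "user", "content": user_prompt})
        messages = [{"role": "system", "content": self.system_prompt}] + self.history
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )
        reply = response.choices[0].message.content.strip()
        self.history.append({"role": "assistant", "content": reply})
        # Trim old messages so the prompt stays within the context window.
        self.history = self.history[-2 * self.max_turns:]
        return reply

Usage would look like bot = ChatBot(client, "You are a helpful assistant.") followed by repeated bot.ask(...) calls; the more advanced option in step 4 could replace the simple history list with a vector database for relevance-based recall.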

Evaluation Criteria:

  • Quality of the design and implementation of the bot
  • Effectiveness of the parameterization and customization
  • Implementation and performance of the memory model (if attempted)
  • Peer review feedback (optional)

Resources:

Assignment Solution Video