Module 4: Large Language Models
Module Overview
This updated module takes you beyond interacting with existing LLM interfaces to building and customizing your own LLM-powered applications. Building on the foundation from Module 3, you'll learn to work directly with LLM APIs and local models to create sophisticated, context-aware conversational agents.
You'll explore how to design and implement local LLM bots with customizable prompts and parameters, experiment with different model configurations, and tackle advanced challenges like implementing memory systems for more coherent conversations. This hands-on approach will give you practical experience in building production-ready LLM applications while understanding the technical considerations involved in deploying these powerful models.
Learning Objectives
- Develop and customize local LLM bots with parameterized prompts and configurations
- Implement memory systems and context management for enhanced conversational experiences
Objective 01 - OpenAI API - The API Behind ChatGPT
API Basics
In the digital landscape, software applications often possess unique functionalities that can be beneficial when integrated into other applications. A special interface is designed to handle external requests from other applications to facilitate such feature-sharing. This interface, known as the Application Programming Interface (API), serves as a secure and streamlined conduit for sharing capabilities and data between different software solutions. By utilizing an API, developers can enrich their own applications with the functionalities of another, amplifying the utility and reach of both platforms.
What is OpenAI's API and SDK?
An API serves as a bridge that allows two different applications to communicate with each other. In the context of OpenAI's GPT models, the API provides a way to programmatically interact with OpenAI's models for various tasks like text generation, summarization, and more. Calling the raw API from scratch can be involved, but tools like Software Development Kits speed up the process.
Software Development Kits (SDKs) are collections of software tools and libraries that simplify complex actions, making it easier to interact with an API. OpenAI provides an SDK for Python that wraps the raw API calls, offering a more Pythonic way to make requests to the GPT models.
Setting Up OpenAI Python SDK
To work with the OpenAI Python SDK, you'll need:
- Python 3.8 or higher (recommended for current versions of the openai package)
- OpenAI account
- API key from OpenAI
Installation
Installing the OpenAI Python SDK is simple. Use pip to install the package:
pip install openai
API Key Configuration
After obtaining your API key from OpenAI, you can set it up in one of two ways:
- Environment Variable: Set an environment variable called OPENAI_API_KEY with the key as its value.
- Directly in Code: Pass the API key as an argument while initializing the SDK (see the sketch below).
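For the second option, a minimal sketch looks like this; the key shown is a placeholder, and in practice you should avoid hard-coding real keys in source control:
from openai import OpenAI

# Pass the key directly when constructing the client (placeholder value shown).
client = OpenAI(api_key="sk-your-api-key-here")
The environment-variable route is used in the examples that follow.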
Making First SDK Call
Initializing SDK
To use the OpenAI SDK in your Python code, you need to import it and initialize it with your API key.
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
Simple Example: Text Generation
Here's a quick example to generate text with the gpt-3.5-turbo-instruct model via the OpenAI SDK:
text = "Welcome to Data Science"
response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=f"Translate the following English text to French: '{text}'",
max_tokens=60
)
generated_text = response.choices[0].text.strip()
print(generated_text)
In this example, the generated text will be the French translation of the English text specified in the text variable.
Objective 02 - OpenAI API SDK
Introduction
The OpenAI API SDK serves as a powerful tool for leveraging machine learning capabilities within your applications. By manipulating parameters such as prompt, max_tokens, temperature, and top_p, users can fine-tune the output to meet their specific needs. Whether you are aiming for more deterministic results or embracing randomness, understanding these options enables you to exploit the full potential of the API.
prompt
The prompt parameter is the initial string that guides the model in generating a completion. The more specific and contextual the prompt, the more accurate the generated text will be.
max_tokens
This parameter limits the number of tokens in the output. If you set max_tokens to 50, the model will generate text up to 50 tokens long.
temperature
The temperature parameter controls the randomness of the output. A higher value like 0.8 yields more random outputs, while a lower value like 0.2 makes the output more deterministic.
top_p
This parameter controls the nucleus sampling, which filters the token pool before choosing the next token. Values are between 0 and 1; lower values make the text more focused and deterministic.
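To see how these parameters work together, here is a small sketch using the same gpt-3.5-turbo-instruct completions endpoint as earlier; the values are illustrative rather than prescriptive, and it assumes the client has been initialized as shown above:
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="List three common uses of the pandas library.",  # guides the completion
    max_tokens=100,    # caps the length of the generated text
    temperature=0.2,   # lower temperature -> more deterministic wording
    top_p=0.9,         # nucleus sampling narrows the token pool
)
print(response.choices[0].text.strip())
Raising temperature toward 0.8 or 1.0 will produce noticeably more varied completions for the same prompt.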
Handling SDK Responses
Response Object
When you make an API call using the SDK, you receive a response object. This object contains various pieces of information, including the generated text.
To extract the generated text from the response object, you can use the following code snippet:
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
text = "Welcome to Data Science"
response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=f"Translate the following English text to French: '{text}'",
max_tokens=60
)
generated_text = response.choices[0].text.strip()
print(generated_text)
As a function (note that this helper targets the chat completions endpoint used later in this module, whose choices carry a message object; for the completions endpoint above you would return response.choices[0].text.strip() instead):
def extract_reply(response):
    return response.choices[0].message.content.strip()
Error Handling
To handle errors gracefully, you can use Python's try-except blocks. Here's an example:
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
try:
text = "Welcome to Data Science"
response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=f"Translate the following English text to French: '{text}'",
max_tokens=60
)
generated_text = response.choices[0].text.strip()
print(generated_text)
except Exception as e:
print(f"An error occurred: {e}")
Beyond this generic handling, you should account for a few common errors of the kind you might find in any API. In the current openai SDK (1.x), they surface as exception classes:
- openai.RateLimitError: You've exceeded the number of requests permitted in a given time frame.
- openai.NotFoundError: The model or resource specified does not exist.
- openai.BadRequestError: The API request was malformed.
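A sketch of catching these exceptions individually, assuming the class names above (check your installed openai version, since older releases used different names) and the client from the earlier examples:
import openai

try:
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Say hello in French.",
        max_tokens=20,
    )
    print(response.choices[0].text.strip())
except openai.RateLimitError:
    print("Rate limit exceeded; wait a moment and retry.")
except openai.NotFoundError:
    print("The requested model or resource does not exist.")
except openai.BadRequestError as e:
    print(f"The request was malformed: {e}")
except openai.APIError as e:
    # openai.APIError is the base class and catches other API-side errors.
    print(f"The API returned an error: {e}")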
Advanced SDK Usage
Batching Requests
Rather than issuing one call per prompt, the completions endpoint lets you send several prompts in a single request, which is more efficient than making individual calls, as sketched below.
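A minimal sketch, assuming the client from the earlier examples; the legacy completions endpoint accepts a list of prompts, and each choice in the response carries an index identifying which prompt it answers:
prompts = [
    "Translate 'Good morning' to French:",
    "Translate 'Good night' to French:",
]
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompts,   # a list of prompts is sent as one batched request
    max_tokens=30,
)
# Sort by index so each completion lines up with its original prompt.
for choice in sorted(response.choices, key=lambda c: c.index):
    print(f"Prompt {choice.index}: {choice.text.strip()}")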
Pagination
When a completion is cut off because it hit the token limit, you can page through the output: check the finish_reason of each choice and issue a follow-up request that continues from the text generated so far. This helps you manage tokens effectively, as in the sketch below.
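A minimal continuation loop, assuming the client from the earlier examples; the cap of three requests and the simple prompt-plus-text concatenation are choices made for this sketch:
prompt = "Write a short story about a data scientist and a robot."
full_text = ""
for _ in range(3):  # cap the number of continuation requests
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt + full_text,  # feed back what has been generated so far
        max_tokens=200,
    )
    choice = response.choices[0]
    full_text += choice.text
    if choice.finish_reason != "length":  # "length" means the output was truncated
        break
print(full_text.strip())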
Deep Dive: System Prompts in OpenAI's GPT API with Python SDK
System prompts are special instructions given to the model to guide its behavior throughout an interactive session or for a specific task. These are often used in conversational agents, content filters, and other scenarios where you need to condition the model's responses according to specific guidelines or goals.
Types of System Prompts
- Conversational Directives: You can use system prompts to instruct the model to behave like a specific character or to adopt a particular tone, style, or point of view. For example, instructing the model to speak like Shakespeare or to adopt a formal tone.
- Content Filtering: System prompts can also be used to enforce ethical guidelines, like avoiding generating harmful or inappropriate content.
- Task-Specific Instructions: For specialized tasks like code generation, data analysis, or text summarization, system prompts can provide high-level directives that guide the model's behavior throughout the session.
Format of System Prompts
The system prompt is generally set up at the beginning of an interaction and stays consistent throughout. It's often placed at the top of the prompt string, separate from user or task-specific prompts, to provide a general context or instruction set for the model.
Implementing System Prompts with SDK
Here's how you can include a system prompt while generating text using the chat interface:
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
def extract_reply(response):
return response.choices[0].message.content.strip()
messages = [
{"role": "system", "content": "You are an assistant that speaks like Shakespeare."},
{"role": "user", "content": "How is the weather today?"}
]
result = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
reply = extract_reply(result)
print(reply)
Considerations for Using System Prompts
- Token Limit: Remember that system prompts consume tokens, so be mindful of the max_tokens parameter to ensure the output is not truncated (a token-counting sketch follows this list).
- Prompt Clarity: The clearer and more specific your system prompt, the better the model will be at following the guidelines or rules you've set.
- Testing: It's crucial to test the effectiveness of a system prompt rigorously to ensure it guides the model's behavior as intended.
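If you want to check how many tokens a system prompt consumes before sending it, the tiktoken package (a separate install: pip install tiktoken) can count them. A small sketch:
import tiktoken

# Look up the tokenizer used by the target model.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

system_prompt = "You are an assistant that speaks like Shakespeare."
num_tokens = len(encoding.encode(system_prompt))
print(f"The system prompt uses {num_tokens} tokens.")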
By mastering the use of system prompts, you can make the most out of OpenAI's GPT API and Python SDK for a wide array of specialized and interactive tasks.
Conclusion
In this module, we delved into the OpenAI Python SDK as a powerful tool for interacting with GPT models. Starting from the basic setup requirements and installation, we progressed through the various parameters like prompt, max_tokens, temperature, and top_p that help fine-tune the behavior and output of the GPT model.
We also took a deep dive into the concept of system prompts, a versatile feature that allows you to guide the model's behavior for specialized tasks, enforce ethical guidelines, or add a conversational context. Whether you're building a conversational agent, a content filter, or a specialized text generator, understanding how to effectively utilize system prompts can be a game-changer.
As you move forward, remember that the key to effectively using the API and SDK lies in your understanding of these parameters and features. Each project may require a different combination of them, so it's important to experiment and find what works best for your specific needs.
With this foundation, you are well-prepared to explore more advanced topics and applications in future modules.
Additional Resources
Objective 03 - Local LLM Setup
Introduction
This module outlines the setup and usage of a local large language model (LLM) to create a chatbot named Marv, who is programmed to provide sarcastic responses. The LLM is powered by the llama_cpp Python package and answers queries in the persona set by the system prompt.
Topics Covered
- Installing Dependencies
- Initializing the LLM
- Crafting System and User Prompts
- Running the LLM and Obtaining a Response
- Understanding the Parameters
Installing Dependencies
To get started, install the llama-cpp-python package using pip:
pip install llama-cpp-python
Download the LLM
You will also need a quantized model file in GGUF format. The examples below use openorca-platypus2-13b.Q4_K_M.gguf (a quantized build of the OpenOrca-Platypus2 13B model, available from community GGUF repositories on Hugging Face) stored under ./app/models/.
Initializing the LLM
Import the Llama class and initialize it with the appropriate model path.
from llama_cpp import Llama
llm = Llama(model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf")
Crafting System and User Prompts
Set up the system and user prompts. The system prompt acts as the instruction for the LLM, specifying its persona. The user prompt serves as the query or statement from the user.
- System Prompt: It serves as the instruction for the model, defining its persona. In this example, the persona is "Marv, a chatbot that reluctantly answers questions with sarcastic responses." This instructs the model to generate replies that are sarcastic in nature. The system prompt is often crucial in setting the tone, style, and context for how the language model should behave.
- User Prompt: This is the query or statement from the user, which the language model treats as the actual question or issue to respond to. In this example, the user prompt is "Hi Marv, what's up?", which the model answers in Marv's persona.
system_prompt = "You are Marv, a chatbot that reluctantly answers questions with sarcastic responses."
user_prompt = "Hi Marv, what's up?"
Running the LLM and Obtaining a Response
Create a composite prompt by combining the system and user prompts and run the LLM. Extract and print the response.
prompt = f"### Instruction: {system_prompt}\n\n{user_prompt}\n\n### Response:\n"
raw_output = llm(prompt, stop=["###"], max_tokens=-1, temperature=1)
reply = raw_output.get("choices")[0].get("text").strip()
print(reply)
Understanding the Parameters
stop=["###"]
: Stops token generation at "###".max_tokens
: Sets a limit on the number of tokens. -1 for no limit.temperature
: Controls the randomness of output, ranging from 0 to 1.
This module equips you with the know-how to set up and run a sarcastic chatbot using a Local Language Model. Feel free to modify the prompts and parameters as needed.
Putting It All Together: Marv the Sarcastic Bot
from llama_cpp import Llama
system_prompt = "You are Marv, a chatbot that reluctantly answers " \
"questions with sarcastic responses."
user_prompt = "Hi Marv, what's up?"
prompt = f"### Instruction: {system_prompt}\n\n" \
f"{user_prompt}\n\n" \
f"### Response:\n"
llm = Llama(model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf")
raw_output = llm(
prompt,
stop=["###"],
max_tokens=-1,
temperature=1,
)
reply = raw_output.get("choices")[0].get("text").strip()
print(reply)
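If you want to experiment with different personas and settings, one option (a sketch of one possible design; the ask_bot name and its signature are introduced here for illustration) is to wrap the call in a function that parameterizes the prompts and sampling options:
from llama_cpp import Llama

llm = Llama(model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf")

def ask_bot(user_prompt, system_prompt, temperature=1.0, max_tokens=-1):
    """Run a single prompt through the local model and return the reply text."""
    prompt = f"### Instruction: {system_prompt}\n\n{user_prompt}\n\n### Response:\n"
    raw_output = llm(prompt, stop=["###"], max_tokens=max_tokens, temperature=temperature)
    return raw_output["choices"][0]["text"].strip()

print(ask_bot(
    "Hi Marv, what's up?",
    "You are Marv, a chatbot that reluctantly answers questions with sarcastic responses.",
))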
Additional Resources
Guided Project
This guided project focuses on hands-on LLM implementation and does not have traditional repository materials. If you are interested in additional technical background, you can review the legacy Time Series Forecasting material as supplementary content, though the current guided project and assignment are the primary focus.
Building a Chatbot with Persistent Memory
Module Assignment
This module features a hands-on implementation assignment that differs from our typical structured exercises.
Building an Advanced Local LLM Bot
Objective:
The main goal of this assignment is to develop a local LLM bot with customizable prompts and parameters. As a stretch goal, you will implement a short-term memory model for the bot, allowing for more coherent and context-aware interactions.
The instructions for this project are intentionally a bit vague. The purpose is for you to build something of your own design, which can present many challenges and, more importantly, result in a portfolio-worthy project.
Prerequisites:
- Python programming experience
- Basic understanding of machine learning, NLP, and LLMs
- Access to an LLM API or local LLM setup
Steps:
- Initial Setup
- Set up a basic bot using a local LLM or an API service.
- Experimentation
- Experiment with various prompts and parameters to understand their impact on the bot's responses.
- Refactoring
- Refactor your bot into a function or class, making sure to parameterize the user_prompt.
- Memory Module (Stretch Goal)
- Implement a memory system for your bot. This can range from simply feeding back previous interactions to a more complex approach like a vector database for automatic relevant recall (a minimal sketch of the feed-back approach appears after this list).
- Evaluation
- Evaluate the performance in terms of coherence, relevance, and context-awareness.
- Documentation
- Document your design choices, implementation details, and observations.
- Peer Review (Stretch Goal)
- Share your project for peer review, focusing on the bot's design, performance, and memory model.
- Final Submission
- Submit your code and documentation for evaluation.
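As a starting point for the memory stretch goal, here is a minimal sketch of the simplest approach, feeding recent turns back into the prompt. It assumes the local llama_cpp setup from Objective 03 and keeps only the last few exchanges to stay within the context window; the chat helper and history structure are this sketch's own choices, not a required design:
from llama_cpp import Llama

llm = Llama(model_path="./app/models/openorca-platypus2-13b.Q4_K_M.gguf")
system_prompt = ("You are Marv, a chatbot that reluctantly answers "
                 "questions with sarcastic responses.")
history = []  # list of (user_message, bot_reply) tuples

def chat(user_message, max_turns=5):
    """Answer one message, including recent history in the prompt as short-term memory."""
    transcript = "".join(f"User: {u}\nMarv: {b}\n" for u, b in history[-max_turns:])
    prompt = (f"### Instruction: {system_prompt}\n\n"
              f"{transcript}User: {user_message}\n\n### Response:\n")
    raw_output = llm(prompt, stop=["###"], max_tokens=-1, temperature=1)
    reply = raw_output["choices"][0]["text"].strip()
    history.append((user_message, reply))
    return reply

print(chat("Hi Marv, what's up?"))
print(chat("What did I just ask you?"))  # the reply can now reference the first turn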
Evaluation Criteria:
- Quality of the design and implementation of the bot
- Effectiveness of the parameterization and customization
- Implementation and performance of the memory model (if attempted)
- Peer review feedback (optional)
Resources:
Assignment Solution Video
Additional Resources
LLM APIs and Platforms
- OpenAI API Reference
- Hugging Face Transformers Pipelines
- Anthropic Claude API Documentation
- Google AI Platform Documentation
Local LLM Implementation
- Ollama: Run LLMs Locally
- llama.cpp: Efficient LLM Inference
- LlamaIndex: Data Framework for LLMs
- LangChain: Building Applications with LLMs