Large Language Model Notes

Published 2024-01-17.
Time to read: 4 minutes.

This page is part of the llm collection.

LLMs may seem like chatbots, but they are actually text prediction engines. In other words, LLMs are just very advanced autocomplete. A model ingests a string of text and, using statistical patterns learned from the texts it was trained on, guesses which word (more precisely, which token) is likely to come next. For example, if you give an LLM the phrase “The quick brown fox jumps over the lazy”, the LLM predicts that the next word is likely to be “dog”.

LLMs acquire the ability to anticipate the next word in a sentence, based on the context of the previous words, through training on a massive amount of text data. They are frequently referred to as autoregressive models because each predicted word is appended to the input and used to predict the word after it.
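To make the "advanced autocomplete" idea concrete, here is a minimal sketch of next-word prediction using bigram counts. It is a toy stand-in, not how a real LLM works internally: real models use neural networks over tokens, and the corpus and the function names (train_bigrams, predict_next, generate) are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LLM trains on vastly more text.
TOY_CORPUS = [
    "the quick brown fox jumps over the lazy dog",
    "the lazy dog sleeps all day",
    "a lazy dog is a happy dog",
]


def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_words=5):
    """Autoregressive loop: each prediction becomes the next input."""
    words = [start]
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


COUNTS = train_bigrams(TOY_CORPUS)
print(predict_next(COUNTS, "lazy"))  # dog
print(generate(COUNTS, "quick", max_words=2))  # quick brown fox
```

The generate loop is the autoregressive part: the output of one step is fed back in as the context for the next step.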

Prompts

If you just give an LLM a bare prompt like “How exactly is a prompt fed to the LLMs?”, you may or may not get a useful response, because the model simply continues the text. Instead, give the model a prompt like the following, which steers it to reply the way it predicts a helpful assistant would.

This is a conversation between a user and a helpful assistant.
USER: How is a prompt fed to an LLM?
ASSISTANT:
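As a small sketch, the template above can be produced by a helper that wraps any question. The function name build_completion_prompt and its preamble parameter are hypothetical, chosen only for this example.

```python
DEFAULT_PREAMBLE = "This is a conversation between a user and a helpful assistant."


def build_completion_prompt(question, preamble=DEFAULT_PREAMBLE):
    """Wrap a question so the model's most likely continuation is an answer."""
    # The prompt ends with "ASSISTANT:" so the next predicted text is the reply.
    return f"{preamble}\nUSER: {question}\nASSISTANT:"


print(build_completion_prompt("How is a prompt fed to an LLM?"))
```

Because the text ends right after ASSISTANT:, the most probable continuation is the assistant's answer rather than more questions.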

A request to an OpenAI chat model (GPT-3.5, GPT-4), and to many other models, has 3 components.

System Prompt: defines the AI's persona, its objective and behavior, and any specific tasks or rules. This definition can vary between users and be changed between prompts. System designers should establish a system message to define the desired context or behavior from the model. This will shape subsequent interactions. For example:
System: You are a renowned poet. You write poems in the tone and style of Dr. Seuss. Only include topics appropriate for children.
If there are too many System instructions the output may degrade. You can provide additional instructions in the User prompt!
User Prompt: provides the model with inputs or conversation to get the desired output. For example:
User: Please write an exciting poem targeted toward kindergarten students with the following:
Poem Title: The First Day of School
Topics: Recess, Art Class, Snack Time
Assistant Prompt: shows the model an example reply so that it learns the desired output format. This is especially helpful for generating arrays, JSON, or HTML without extra chat text. For example:
System: You output an array of 5 strings with nouns that are kitchen objects.
User: Give me an output of 5 utensils
Assistant: ["spoon", "fork", "knife", "spatula", "chopsticks"]
User: Now, give me an array of 5 {{my_input}}
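The three components map naturally onto a list of role/content messages, which is the shape OpenAI-style chat APIs accept. The sketch below only builds that list and parses a reply. It makes no network call, and the helper names build_messages and parse_array_reply are assumptions made for this example.

```python
import json


def build_messages(my_input):
    """Build the kitchen-objects conversation, filling in {{my_input}}."""
    example_reply = json.dumps(["spoon", "fork", "knife", "spatula", "chopsticks"])
    return [
        {"role": "system",
         "content": "You output an array of 5 strings with nouns that are kitchen objects."},
        {"role": "user", "content": "Give me an output of 5 utensils"},
        # The assistant message is a worked example of the desired format.
        {"role": "assistant", "content": example_reply},
        {"role": "user", "content": f"Now, give me an array of 5 {my_input}"},
    ]


def parse_array_reply(reply_text):
    """Parse a reply that should be a JSON array of strings."""
    data = json.loads(reply_text)
    if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
        raise ValueError("expected a JSON array of strings")
    return data


messages = build_messages("appliances")
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Because the example assistant message contains nothing but a JSON array, a reply in the same format can be passed straight to parse_array_reply. A model is not guaranteed to comply, which is why the parser validates its input.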

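The temperature parameter controls how random the response is, from nearly deterministic (0) to very creative (1.0 and above). The default value depends on the API or tool: 0.8 is the default in some tools, such as llama.cpp, while OpenAI's chat API defaults to 1.0 and accepts values from 0 to 2.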
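One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with softmax. The sketch below assumes that approach. The token list and logit values are invented, and treating a temperature of 0 as "always pick the top token" is a convention of this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy (argmax) decoding.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


TOKENS = ["dog", "cat", "fence"]   # hypothetical candidate next tokens
LOGITS = [2.0, 1.0, 0.1]           # hypothetical raw model scores

for temp in (0.2, 0.8, 1.5):
    rounded = [round(p, 3) for p in softmax_with_temperature(LOGITS, temp)]
    print(temp, rounded)
```

Lower temperatures concentrate probability on the top-scoring token, and higher temperatures spread it across the alternatives, which is why high values read as more creative and less predictable.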

LLM Glossary

You might want to review the terms defined in my Quick Review of Probability Theory before reading this section. Those terms are commonly used in technical discussions of LLMs, as are the following.

Embedding: a vector of numbers that represents the meaning of a token (a word or word fragment) or a portion of an image, arranged so that similar items have nearby vectors.
Context: the finite window of tokens (the prompt plus any text generated so far) that an LLM can take into account at once. Models can only process a limited number of tokens, so text that does not fit in the window is ignored.
Model: A diffusion probabilistic model, also called simply a diffusion model, is a parameterized Markov chain (see Quick Review of Probability Theory) trained using variational inference to produce samples matching the data after finite time.
Transition function
Ground truth: The actual, known-correct value that a model's predicted value is compared against.
PyTorch model (.pt and .pth filetypes): the common format for models trained using the PyTorch framework. It represents the state_dict (or the “state dictionary”), which is a Python dictionary object that maps each layer in the model to its trainable parameters (weights and biases).
Gaussian noise
Diffusion Models: During training, diffusion models take an input image and gradually add Gaussian noise to it over many timesteps; the model learns to reverse this process by removing the noise step by step, so that it can generate new images starting from pure noise. This idea was inspired by the diffusion process known to the field of thermodynamics. Diffusion is the net movement of a substance down its concentration gradient until the gradient no longer exists. This movement represents an increase in the entropy or disorder in the system as different substances mix together.
DDPM: The seminal paper entitled Denoising Diffusion Probabilistic Models.
Tensor: See also Quick Review of Probability Theory.

In the field of computer science, a tensor is a specialized data structure that is similar to arrays and matrices, generalized to any number of dimensions. Tensors are often used to encode the inputs and outputs of a model, as well as model parameters.

PyTorch tensors are similar to NumPy’s ndarrays, except that they can run on GPUs and other hardware accelerators. PyTorch tensors and NumPy arrays can often share the same underlying memory.
TensorFlow: TensorFlow is a popular framework for training machine learning models. TensorFlow models are generally quite large, and storing the full model in memory can be expensive.
Checkpoint file (.ckpt filetype): A TensorFlow checkpoint file just contains the weights of a trained TensorFlow model, and does not contain a description of the computation that the model performs. Often the model weights are all that is needed for a computation.
Safetensor: The .safetensors file format is a high-performance means of safely storing and retrieving large tensors. It is more secure than traditional formats based on Python’s pickle, because loading a pickle file can execute arbitrary code, whereas a .safetensors file contains only data. Safetensor is also faster than pickle, making it a good choice for production deployment. .ckpt-format models can be converted to .safetensors.
Chinchilla: Chinchilla is a 70-billion-parameter compute-optimal model from DeepMind, trained on 1.4 trillion tokens, that outperforms GPT-3. Read the paper and the code.
PaLM: a 540-billion parameter LLM from Google Research that can generalize across domains and tasks while being highly efficient.
Llama: Llama is a large language model (LLM) released by Meta AI in February 2023. A variety of model sizes were trained, ranging from 7 billion to 65 billion parameters. LLaMA's developers reported that the 13 billion parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175 billion parameters) and that the largest model was competitive with state-of-the-art models, such as PaLM and Chinchilla.
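
The forward (noising) half of the Diffusion Models entry above can be sketched in a few lines. This is a minimal illustration, assuming the linear noise schedule described in the DDPM paper (1000 steps, with per-step noise variance rising from 0.0001 to 0.02) and treating an image as a flat list of pixel values. The function names are made up for this example, and the learned denoising network is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the fraction of the original signal variance left at step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng=random):
    """Jump straight to step t (0-indexed) of the forward process.

    Samples x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
image = [0.5, -0.25, 1.0, 0.0]  # hypothetical pixel values scaled to [-1, 1]

for t in (0, 499, 999):
    noisy = add_noise(image, t, alpha_bars, random.Random(0))
    print(t, round(alpha_bars[t], 5), [round(v, 3) for v in noisy])
```

As t grows, alpha_bar shrinks toward zero, so the output is dominated by Gaussian noise and almost nothing of the original image remains. Generation runs the other way: a trained network removes a little noise at each step, starting from pure noise.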

Mainframe image; Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License by PekoeBlaze

© Copyright 1994-2025 Michael Slinn. All rights reserved.
For requests to use this copyright-protected work in any manner, email mslinn@mslinn.com.
