Chapter 3 LLM prompting fundamentals

Start of practical material

We’ll cover LLM prompting theory through practical examples. This section will give you a deep understanding of how chatbots work and teach you some advanced prompting skills. By understanding these technical foundations, you’ll be better equipped to leverage LLMs effectively for your R programming and data analysis tasks, and to troubleshoot when you’re not getting the results you expect.

Software requirements: VS Code with R or RStudio, the ellmer package, and an API key for an LLM provider (we use OpenRouter).

3.1 Setting up authorisation

First, you need to get an API key from the provider. Log in to the provider’s website and follow the instructions.

Then, you need to add the key to your .Renviron file:

usethis::edit_r_environ()

Then type in your key like this:

OPENROUTER_API_KEY="xxxxxx"

Then restart R. ellmer will automatically find your key so long as you use the recommended environment variable names. See ?ellmer::chat_openrouter (or chat_xxx, where xxx is whichever provider you are using).
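You can quickly check that the key is visible to R before continuing. This uses only base R; nzchar() returns TRUE if the environment variable is non-empty, without printing the key itself:

nzchar(Sys.getenv("OPENROUTER_API_KEY"))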

3.2 Understanding how LLMs work

Large Language Models (LLMs) operate by predicting the next token in a sequence, one token at a time. To understand how this works in practice, we’ll use the ellmer package to demonstrate some fundamental concepts.

By using ellmer to access an LLM through the API we get as close to the raw LLM as we can. Later on we will use ‘coding assistants’ (e.g. Copilot), which put another layer of software between you and the LLM.

First, let’s set up our environment and create a connection to an LLM.

library(ellmer)

# Initialize a chat with Claude
chat <- chat_openrouter(
  system_prompt = "",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50)
)
chat$chat("Ecologists like to eat ")

Notice that the model doesn’t do what we intend, which is to complete the sentence. LLMs have a built-in tendency to respond as conversational assistants rather than simply continue the text. Let’s use the ‘system prompt’ to provide it with strong directions.

Tip: The system prompt sets the overall context for a chat. It is meant to be a stronger directive than the user prompt. In most chat interfaces (e.g. Copilot) you are interacting with the user prompt; the system prompt is supplied by the provider. Anthropic, for example, publishes the system prompt used by the chat interface version of Claude.

chat <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
#   model = "anthropic/claude-3.7-sonnet",
  api_args = list(max_tokens = 50)
)
chat$chat("Ecologists like to eat ")

Tip: It is generally more effective to tell the LLM what to do rather than what not to do (just like people!).

3.2.1 Token-by-token prediction

LLMs don’t understand text as complete sentences or concepts; they predict one token at a time based on the patterns they’ve learned during training. A token is roughly equivalent to a word part, a word, or a common phrase.

Let’s see this in action by generating text token by token with a very limited number of tokens:

prompt <- "Ecologists like to eat"

for (i in 1:5) {
  single_token_chat <- chat_openrouter(
    system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
    model = "anthropic/claude-3.5-haiku",
    api_args = list(max_tokens = 1)
  )
  response <- single_token_chat$chat(prompt)
  prompt <- paste(prompt, response, sep = "")
  cat("Token", i, ":", response, "\n")
}

# Display the complete sequence
cat("Full sequence:", prompt, "\n")

This shows how the model builds its response incrementally (it’s not quite how streaming predictions are actually implemented, but it gives you the right idea). Each token is predicted from the probability distribution over possible next tokens given the context.

3.2.2 Temperature effects

The “temperature” parameter controls the randomness of token predictions. Lower temperatures (closer to 0) make the model more deterministic, while higher temperatures (closer to 2) make it more creative and unpredictable.

Let’s compare responses with different temperatures:

# Create chats with different temperature settings
chat_temp <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50, temperature = 0)
)

chat_temp$chat("Marine ecologists like to eat ")

chat_temp <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50, temperature = 2)
)

chat_temp$chat("Marine ecologists like to eat ")

At low temperatures, you’ll notice the model consistently produces similar “safe” completions that focus on the most probable next tokens. As temperature increases, the responses become more varied and potentially more creative, but possibly less coherent.
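If you want to see this more clearly, you can repeat the same prompt a few times at each temperature and compare how much the completions vary. Here is a minimal sketch that re-uses the same system prompt and model as above (each repeat uses a fresh chat so earlier answers don’t influence later ones):

# System prompt reused from the examples above
completion_prompt <- "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"

for (temp in c(0, 2)) {
  cat("--- temperature =", temp, "---\n")
  for (i in 1:3) {
    chat_temp <- chat_openrouter(
      system_prompt = completion_prompt,
      model = "anthropic/claude-3.5-haiku",
      api_args = list(max_tokens = 50, temperature = temp)
    )
    response <- chat_temp$chat("Marine ecologists like to eat ")
    cat(response, "\n")
  }
}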

3.2.3 Comparing model complexity

Different models have different capabilities based on their size, training data, and architecture.

For example, anthropic/claude-3.5-haiku has many fewer parameters than anthropic/claude-3.7-sonnet. This means Sonnet is more complex and can handle more nuanced tasks. However, Haiku is significantly cheaper to run: US$0.80 per million input tokens versus $3 for Sonnet, and $4 versus $15 per million output tokens.
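To put those prices in perspective, you can work out the approximate cost of a single request from its token counts. The request size below is hypothetical; the per-million-token prices are the ones quoted above:

# Hypothetical request: 1,000 input tokens and 500 output tokens
input_tokens <- 1000
output_tokens <- 500

# Prices in USD per million tokens (Haiku: $0.80 in / $4 out; Sonnet: $3 in / $15 out)
haiku_cost <- input_tokens / 1e6 * 0.80 + output_tokens / 1e6 * 4
sonnet_cost <- input_tokens / 1e6 * 3 + output_tokens / 1e6 * 15
c(haiku = haiku_cost, sonnet = sonnet_cost)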

For the kind of simple tasks we are doing here, both give similar results. We will compare models later in the workshop when we use github copilot.

3.2.4 Understanding context windows

LLMs have a limited “context window” - the amount of text they can consider when generating a response. This affects their ability to maintain coherence over long conversations. For most LLMs the context window is about 100-200K tokens, which includes both input and output. Google’s models, however, have context windows of up to 1 million tokens.

We’ll come back to the context window when we explore more advanced tools with longer prompts. These simple prompts don’t come close to using up the context window.
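If you want a rough sense of how much of the window a prompt uses, a common rule of thumb is that one token is about four characters of English text (this is only an approximation; each provider’s tokeniser differs):

prompt <- "Ecologists like to eat "
approx_tokens <- ceiling(nchar(prompt) / 4)
approx_tokens

# Fraction of a 200,000-token context window used by this prompt
approx_tokens / 200000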

3.3 DIY stats bot

Let’s put together what we’ve learnt so far and build our own chatbot. I’ve provided you with a detailed system prompt that implements a chatbot specialised in helping with statistics. First, we read the bot’s markdown file from GitHub, then we can use it in our chat session.

stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/R-llm-workshop/refs/heads/main/resources/DIY-stats-bot-system.md"))

chat_stats <- chat_openrouter(
  system_prompt = stats_bot,
  model = "anthropic/claude-3.7-sonnet",
  api_args = list(max_tokens = 5000)
)

ellmer has a few different options for interacting with chatbots. We’ve seen the ‘chat’ method. We can also have a live_console() or live_browser() chat (the latter requires installing shinychat). Let’s use one of those options. With live_browser() you’ll also see that the browser automatically formats any markdown in the chat.

live_browser(chat_stats)
# live_console(chat_stats)

Here are some suggested questions to start, but feel free to try your own. “Who are you?” “Use stats mode to provide me with some suggestions for how I could make a predictive model of a variable y, where I have a large number of potential explanatory variables.”

Tip: How many of you started using “DIY-stats-bot-system.md” without first reading it? Did you find the easter egg in my prompt? For security you should ALWAYS read prompts before you start running them through LLM chats. We’ll see later that LLMs can be given ‘tools’ which allow them to run code on your computer. It’s easy to see how a malicious prompt could misuse these tools. We’ll cover security later.

3.3.1 Improving the stats bot

Make a local copy of the stats bot system prompt and try editing it. Try different commands within it and see how your chatbot responds (you’ll have to open a new chat object each time; see the sketch after the list below).

Here are some ideas:

  • Try making a chatbot that is a vehement Bayesian that abhors frequentist statistics.
  • You could provide it with more mode-specific instructions. For instance, try to get the chatbot to suggest appropriate figures for verifying statistical models.
  • Try different temperatures.
  • Add your own easter egg.
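As a sketch of the workflow for testing your edits, assuming you saved your edited copy as "resources/my-stats-bot.md" (a filename I’ve made up; use whatever you called yours), each new version can be loaded into a fresh chat like this:

my_bot <- readr::read_file("resources/my-stats-bot.md")

chat_mybot <- chat_openrouter(
  system_prompt = my_bot,
  model = "anthropic/claude-3.7-sonnet",
  api_args = list(max_tokens = 5000)
)

live_browser(chat_mybot)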

Tip: Adjectives, CAPITALS and *markdown* formatting can all help create emphasis so that your model more closely follows your commands. I used ‘abhors’ and ‘vehement’ above on purpose.

3.4 Advanced prompting

LLMs can be given access to tools (GitHub Copilot is an example). Tools give the LLM the ability to perform tasks on your computer. For example, Copilot can edit files.

The next step beyond tool use is to create agents. Agents write commands then also run those commands and feed the results back to themselves. Agents are therefore autonomous coders.

Tip: LLM “tools” give the LLM assistant the ability to take actions on your computer. There is a huge range of tools out there, including tools that write files, edit files, perform web searches and download data. “Agents” are LLMs that return the results of tool use back to themselves. Therefore they can write code and improve that code without intervention from humans (though intervention is often needed).

We can turn our chatbot into a simple agent; this will help you understand how agents work. What we will do is give the LLM a command to format all R code in a specific way. Then we can write an R function that checks every new response for R code and, if it detects any, writes it to a file.

3.4.1 Zero-shot

First make a new system message as a .md file, e.g. here: “resources/stats-bot-simple.md” in your project directory.

Add to this message:

You are an expert in statistical analysis with the R program. Write your response with R code in <code></code> tags.

(You can also use our chat bot from before, but it might be simpler to make this shorter one)
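If you’d rather create the file from R than in your editor, something like this writes the same text to the path used in the next code block:

# Write the short system prompt to a file (same text as above)
writeLines(
  "You are an expert in statistical analysis with the R program. Write your response with R code in <code></code> tags.",
  "resources/stats-bot-simple.md"
)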

Now give the system message a test.

stats_bot_simple <- readr::read_file("resources/stats-bot-simple.md")

chat_stats <- chat_openrouter(
  system_prompt = stats_bot_simple,
  model = "anthropic/claude-3.7-sonnet",
  api_args = list(max_tokens = 400)
)

response <- chat_stats$chat("What code can I use to regress temperature against fish_size, fish_size and temperature are both continuous real numbers. Include in your response code to simulate some example data, code to show how to do the regression and summary plots. ")
response

If it works the response should look something like this:

<code>
dat <- data.frame(temperature = rnorm(100, mean = 20))
dat$fish_size <- 10 + dat$temperature * 2 + rnorm(100)
model <- lm(fish_size ~ temperature, data = dat)
</code>

It didn’t work for me when I tried! The model kept returning the R code in markdown formatting in inconsistent ways. We want a consistent response so we can parse it easily.

Anyway, repeat that a few times to see if it consistently formats the code as we requested. You can also try using “anthropic/claude-3.5-haiku” as your model (it’s cheaper) to see if it is smart enough to consistently format the code.
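One way to check consistency without eyeballing every reply is to wrap the request in a short loop and count how many responses actually contain the <code> tags. A minimal sketch (only three repeats, to keep API costs down):

n_ok <- 0
for (i in 1:3) {
  chat_test <- chat_openrouter(
    system_prompt = stats_bot_simple,
    model = "anthropic/claude-3.5-haiku",
    api_args = list(max_tokens = 400)
  )
  response <- chat_test$chat("What code can I use to regress temperature against fish_size? Include code to simulate example data and fit the regression.")
  # grepl() returns TRUE/FALSE, which R counts as 1/0
  n_ok <- n_ok + grepl("<code>", response, fixed = TRUE)
}
cat(n_ok, "of 3 responses used the <code> tags\n")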

Tip: What we just tried is called ‘zero-shot prompting’. Zero-shot means we didn’t give the LLM any examples to follow, we assume it knows what to do.

3.4.2 One-shot and multi-shot prompts

Giving an example (one-shot) or multiple examples (multi-shot) can improve the precision of the LLM’s responses. Let’s update the system prompt to see if we get more reliable answers with some examples:

You are an expert in statistical analysis with the R program. Write your response with R code in <code></code> tags.

### Examples of formatting R code

Single line example

<code> dat <- read.csv("myfile.csv") </code> 

Multiple line example

<code> 
dat <- read.csv("myfile.csv") 
ggplot(dat) + 
  aes(x = x, y = y) + 
  geom_point() 
</code> 

I saved this updated system prompt to a file: “resources/stats-bot-simple-multiple-shot.md”. Now test it out.

stats_bot_simple <- readr::read_file("resources/stats-bot-simple-multiple-shot.md")

chat_stats <- chat_openrouter(
  system_prompt = stats_bot_simple,
  model = "anthropic/claude-3.7-sonnet",
  api_args = list(max_tokens = 1000)
)

response <- chat_stats$chat("What code can I use to regress temperature against fish_size, fish_size and temperature are both continuous real numbers. Include in your response code to simulate some example data and code to show how to do the regression. ")
response

That worked for me. Hopefully it works for you too?

Test it a few times to see if it gives more reliable responses. We have to be a bit careful with choosing examples, as these are for formatting, not for modelling. For instance, we don’t want to inadvertently suggest the LLM always recommend a particular statistical model type.

3.4.3 Making a simple tool

Tools are algorithms your assistant can use to perform tasks on your computer or a server. They are what turns a chatbot that can only respond in text into an assistant that can write code and take actions.

Tip: You might also see the term ‘MCP’ (Model Context Protocol) used in association with LLM assistants. MCP is an interoperable standard for developing tools, so a tool I write is consistent with other people’s tools. This helps developers and LLMs read and interpret tools, and it is also important for allowing an LLM to use multiple tools at the same time.

If the request to use tags works, we can then parse the responses and save them as a file, ready to run. First create an R function that can parse responses and search for tags (you can cut and paste my function or try using Copilot to make one for you; it’s good at this type of tedious task):

write_response <- function(response, filename = "rscript.R"){
  # Extract the text between <code> and </code> tags using regex
  # Using dotall mode (?s) to match across multiple lines
  code_pattern <- "(?s)<code>(.*?)</code>"
  code_matches <- regmatches(response, gregexpr(code_pattern, response, perl = TRUE))
  
  # Flatten the list and remove the tags
  if(length(code_matches) > 0 && length(code_matches[[1]]) > 0) {
    code_texts <- gsub("<code>|</code>", "", code_matches[[1]])
    
    # Write all extracted code to the file
    writeLines(code_texts, filename)
    cat("Code written to", filename, "\n")
  } else {
    warning("No code tags found in the response")
  }
}

Now we can set up our R code to write any suggestions to file:

chat_stats <- chat_openrouter(
  system_prompt = stats_bot_simple,
  model = "anthropic/claude-3.7-sonnet",
  api_args = list(max_tokens = 400)
)

response <- chat_stats$chat("What code can I use to regress temperature against fish_size, fish_size and temperature are both continuous real numbers. Include in your response code to simulate some example data and code to show how to do the regression. ")
write_response(response, filename = "attempt1.R")

Now check the file we just created. If it looks ok then try running it:

#CHECK THE FILE before running!!!!
source("attempt1.R")

The source() command will attempt to run the R code that was created. If it works you should see some plots pop up.
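When checking the file, you can also print the generated script straight to the console rather than opening it in your editor:

# Print the generated script so you can review it before sourcing
cat(readLines("attempt1.R"), sep = "\n")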

3.4.4 Summary of agents and tools

We’ve implemented a rudimentary tool and the beginnings of an LLM agent. Going forwards we’ll use specialised software for these tasks, so you won’t need to code them yourself. But coding these examples is helpful for understanding the fundamentals of how they work. To recap, the basic workflow an agent follows is:

  1. Set up a system prompt with detailed instructions for how the LLM should format responses
  2. User asks a question that is sent to the LLM
  3. LLM responds and sends response back to user
  4. Software on user’s computer attempts to parse and act on the response according to pre-determined rules
  5. User’s computer enacts the commands in the response and provides the results to the user

Tools like Copilot Agent mode then go a step further and send the results of step 5 back to the LLM, which then interprets the results and the loop continues (sometimes with and sometimes without direct user approval).
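To make that extra step concrete, here is a minimal sketch of one feedback cycle using only the pieces built above (chat_stats and write_response()). It captures the console output of source() and sends it back to the LLM; treat it as an illustration rather than a robust implementation:

# 1. Ask for code, parse it out of the response, and write it to a file
response <- chat_stats$chat("Simulate some data and fit a linear regression of fish_size on temperature. Put all R code in <code></code> tags.")
write_response(response, filename = "agent_step.R")

# 2. CHECK THE FILE, then run it and capture the console output
run_output <- capture.output(source("agent_step.R", echo = TRUE))

# 3. Feed the results back to the LLM so it can interpret them or fix errors
chat_stats$chat(paste("Here is the output from running your code:",
                      paste(run_output, collapse = "\n"), sep = "\n"))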

If you want to go further with making your own tools, I suggest you check out the ellmer package, which supports tool creation in a structured way. For instance, I made a tool that allows an LLM to download and save ocean data to your computer.

3.5 Reflection on prompting fundamentals

The key things I hoped you learnt from this lesson are:

  • Basic LLM jargon, including tokens, temperature, API access and different LLM models.
  • Some different prompt strategies, including role prompting, emphasis, chain of thought and one-shot.
  • The fundamentals of tool use and agents.

Now you understand the basics, let’s get into GitHub Copilot.