library(ellmer)

# Initialize a chat with Claude
chat <- chat_openrouter(
  system_prompt = "You are a helpful assistant.",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50)
)
chat$chat("Ecologists like to eat ")
4 Calling LLMs via an Application Programming Interface
Calling LLMs programmatically via an API is a powerful programming tool. First of all, it gives us more control over the LLM's responses than chat interface software (like ChatGPT or Copilot). Chat software manages our interactions, including adding extra context (potentially from our chat history), so it is harder to be sure exactly how the LLM is being prompted. With access via an API we control exactly what goes to the LLM.
Accessing LLMs with R or Python code also gives us the opportunity to automate LLM calls, which can be used to create agents or automate document reading, as we will see later. Finally, it can give us access to special LLM features that may not be available in some chat interfaces.
This section will give you a deep understanding of how chatbots work and teach you some advanced prompting skills. By understanding these technical foundations, you’ll be better equipped to leverage LLMs effectively for your R programming and data analysis tasks, and to troubleshoot when you’re not getting the results you expect.
Software requirements: VS Code with R, or RStudio; an API key, ideally for OpenRouter; and R (best with the ellmer package) or Python.
4.1 What is an Application Programming Interface?
An Application Programming Interface (API) lets different software programs communicate with each other.
Large Language Models often require high-powered computers to run. It is possible to host one on your own computer (called local hosting), but here we will look at using LLMs hosted by 'providers': servers remote to our computer.
An API allows you to send text prompts to an LLM provider (like OpenAI’s GPT models or Anthropic’s Claude) over the internet and receive the model’s response back to your code. This is different from using a chat interface like ChatGPT’s website, where you’re interacting through a user interface designed for humans.
In fact, locally hosted LLMs also use an API to manage requests from your computer to the LLM software.
How API calls work:
1. Your code sends a request to the API endpoint (a web address)
2. The request includes your prompt, model choice, and parameters
3. The LLM service processes your request
4. The service sends back a structured response containing the model's output
5. Your code can then use this response in further analysis or processing
Most LLM APIs use a similar structure - you provide messages (like system prompts and user questions), specify which model to use, and set parameters to control the response. The API then returns the model’s generated text along with metadata about the request.
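To make this concrete, here is a minimal sketch of the request payload this structure implies, written as an R list. The field names follow the OpenAI-style schema that OpenRouter uses; the exact metadata returned varies by provider.

# A sketch of a chat-completion request body (OpenAI-style schema, as used by OpenRouter)
request_body <- list(
  model = "anthropic/claude-3.5-haiku",   # which model should answer
  messages = list(
    list(role = "system", content = "You are a helpful assistant."),  # overall instructions
    list(role = "user", content = "Ecologists like to eat ")          # the user's prompt
  ),
  max_tokens = 50                         # a parameter controlling response length
)
# The JSON response contains the generated text plus metadata: the model's reply sits
# under choices -> message -> content, and token counts are reported under usage.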
4.3 Understanding how LLMs work
Large Language Models (LLMs) operate by predicting the next token in a sequence, one token at a time. To understand how this works in practice, we’ll call an API directly through computer code.
By accessing the API in this way we get as close to the raw LLM as we can. Later on we will use 'coding assistants' (e.g. Copilot), which put another layer of software between you and the LLM.
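One way to see the one-token-at-a-time behaviour directly is to cap the response length at a single token. A minimal sketch with ellmer (the reply is simply cut off after the model's first predicted token; the exact token will depend on the model):

library(ellmer)

# Ask for at most one token, so we only see the model's first next-token prediction
chat_one <- chat_openrouter(
  system_prompt = "You are a helpful assistant.",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 1)
)
chat_one$chat("Ecologists like to eat ")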
4.3.1 Calling LLMs via an API
Try to get it to complete a sentence:
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-3.5-haiku",
    messages = list(
      list(
        role = "system",
        content = "You are a helpful assistant."
      ),
      list(
        role = "user",
        content = "Ecologists like to eat "
      )
    )
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-3.5-haiku"
message = "Ecologists like to eat "

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": message}
        ]
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Notice that the model doesn't do what we intend, which is to complete the sentence. LLMs are trained to behave as assistants by default. Let's use the 'system prompt' to give it stronger directions.
Tip: The system prompt sets the overall context for a chat. It is meant to be a stronger directive than the user prompt. In most chat interfaces (e.g. Copilot) you are interacting through the user prompt and the provider supplies the system prompt; Anthropic, for example, publishes the system prompts used by the chat interface version of Claude.
chat <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50)
)
chat$chat("Ecologists like to eat ")
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-3.5-haiku",
    messages = list(
      list(
        role = "system",
        content = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"
      ),
      list(
        role = "user",
        content = "Ecologists like to eat "
      )
    ),
    max_tokens = 50
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-3.5-haiku"
message = "Ecologists like to eat "
system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message}
        ],
        "max_tokens": 50
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Tip: It is generally more effective to tell the LLM what to do rather than what not to do (just like people!).
4.3.2 Controlling randomness of the responses
The “temperature” parameter controls the randomness of token predictions. Lower temperatures (closer to 0) make the model more deterministic, while higher temperatures (closer to 2) make it more creative and unpredictable.
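For readers who want the mechanics: temperature $T$ rescales the model's raw token scores (logits $z_i$) before they are converted to probabilities with a softmax. This is the standard definition, not something specific to OpenRouter:

$$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

As $T \to 0$ the probability mass concentrates on the single highest-scoring token, while larger $T$ flattens the distribution so less likely tokens get sampled more often.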
Even at 0 there is still a small amount of randomness. We can also try to make a response more deterministic with the 'top_k' parameter. Top K forces the model to choose among a smaller set of candidate tokens; setting it to 1 gives the most predictable results.
Let’s compare responses with different temperatures and top K settings:
# Create chats with different temperature settings
chat_temp <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50, temperature = 0, top_k = 1)
)
chat_temp$chat("Marine ecologists like to eat ")
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-3.5-haiku",
    messages = list(
      list(
        role = "system",
        content = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"
      ),
      list(
        role = "user",
        content = "Marine ecologists like to eat "
      )
    ),
    max_tokens = 50,
    temperature = 0,
    top_k = 1
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-3.5-haiku"
message = "Marine ecologists like to eat "
system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message}
        ],
        "max_tokens": 50,
        "temperature": 0,
        "top_k": 1
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Now try again, but set the temperature parameter higher (say to 2), and remove the top_k line or set it to a high number like 1000.
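For example, a sketch of the high-temperature version of the ellmer call (the output will vary from run to run, which is the point; chat_wild is just a name for this sketch):

# High temperature and no top_k restriction: expect more varied completions
chat_wild <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50, temperature = 2)
)
chat_wild$chat("Marine ecologists like to eat ")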
At low temperatures and top K values, you’ll notice the model consistently produces similar “safe” completions that focus on the most probable next tokens. As temperature increases, the responses become more varied and potentially more creative, but possibly less coherent.
There are quite a few other parameters you can use to control how an LLM responds. Their names and meanings are all documented in the OpenRouter docs.
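As an illustration, here is a sketch that passes several commonly documented sampling parameters through api_args. The parameter names follow OpenRouter's (OpenAI-style) conventions; not every model supports every parameter, so check the docs.

# A sketch combining several common sampling parameters
chat_params <- chat_openrouter(
  system_prompt = "You are a helpful assistant.",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(
    max_tokens = 100,       # cap on output length
    temperature = 0.7,      # randomness of sampling
    top_p = 0.9,            # nucleus sampling: only sample from the most probable tokens
    frequency_penalty = 0,  # > 0 penalises tokens that have already appeared often
    presence_penalty = 0    # > 0 penalises any token that has already appeared
  )
)
chat_params$chat("Suggest a name for a reef fish survey project.")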
4.3.4 Comparing model complexity
Different models have different capabilities based on their size, training data, and architecture.
For example, anthropic/claude-3.5-haiku has many fewer parameters than anthropic/claude-sonnet-4. This means the latter model is more complex and can handle more nuanced tasks. However, Haiku is significantly cheaper to run: Haiku costs US$0.80 per million input tokens vs US$3 for Sonnet, and output tokens are US$4 vs US$15 per million.
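One way to feel the difference is to send the same prompt to both models and compare the answers; a sketch (the gap in quality will be more obvious on harder tasks):

# Same question to a smaller and a larger model
prompt <- "Explain, in two sentences, why overfitting is a problem in species distribution models."

chat_small <- chat_openrouter(model = "anthropic/claude-3.5-haiku",
                              api_args = list(max_tokens = 200))
chat_large <- chat_openrouter(model = "anthropic/claude-sonnet-4",
                              api_args = list(max_tokens = 200))

chat_small$chat(prompt)
chat_large$chat(prompt)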
Task for you: use the OpenRouter models page to find a different model to try. Hint: try some of the Chinese models, like Kimi, instead of the American models.
4.3.5 Understanding context windows
LLMs have a limited "context window": the amount of text they can consider when generating a response. This affects their ability to maintain coherence over long conversations. For most LLMs the context window is about 100-200K tokens, which includes both input and output. However, Google's models have context windows of 1 million tokens or more.
We’ll come back to the context window when we explore more advanced tools with longer prompts. These simple prompts don’t come close to using up the context window.
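A rough rule of thumb (an approximation for English text, not an exact count) is that one token is about four characters, which is enough to sanity-check whether a prompt is anywhere near a context limit:

# Very rough token estimate: ~4 characters per token for English text (approximation only)
estimate_tokens <- function(text) ceiling(nchar(text) / 4)

short_prompt <- "Ecologists like to eat "
estimate_tokens(short_prompt)                                      # a handful of tokens
estimate_tokens(paste(rep(short_prompt, 10000), collapse = ""))    # a much longer prompt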
4.4 DIY stats bot
Let's put together what we've learnt so far and build our own chatbot. I've provided you with a detailed system prompt that implements a chatbot specialising in helping with statistics. First, we read the bot's markdown file from GitHub, then we can use it in our chat session.
stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md"))

chat_stats <- chat_openrouter(
  system_prompt = stats_bot,
  model = "anthropic/claude-sonnet-4",
  api_args = list(max_tokens = 5000)
)
chat_stats$chat("Who are you?")
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

stats_bot <- readLines("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-sonnet-4",
    messages = list(
      list(
        role = "system",
        content = paste(stats_bot, collapse = "\n")
      ),
      list(
        role = "user",
        content = "Who are you?"
      )
    ),
    max_tokens = 5000
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-sonnet-4"

stats_bot = requests.get("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md").text

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": stats_bot},
            {"role": "user", "content": "Who are you?"}
        ],
        "max_tokens": 5000
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Try asking it some different statistical questions and note how it responds.
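For example (any statistics question will do; this one is just an illustration):

# An example question for the stats bot
chat_stats$chat("What model should I use for overdispersed counts of fish at survey sites?")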
Tip: How many of you started using "DIY-stats-bot-system.md" without first reading it? Did you find the easter egg in my prompt? For security you should ALWAYS read prompts before you start running them through LLM chats. We'll see later that LLMs can be given 'tools' which allow them to run code on your computer. It's easy to see how a malicious prompt could misuse these tools. We'll cover security later.
4.4.1 Multi-turn conversation
You might want to have a conversation with your chatbot to help you problem-solve. This is easy with ellmer and a bit fiddly with Python or base R, so I'm only going to cover ellmer here.
stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md"))

chat_stats <- chat_openrouter(
  system_prompt = stats_bot,
  model = "anthropic/claude-sonnet-4",
  api_args = list(max_tokens = 5000)
)

live_console(chat_stats)
# live_browser(chat_stats)
live_console() lets you chat in the terminal. live_browser() opens a browser window that is hosted locally on your computer; it feels the most like ChatGPT.
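You can also hold a multi-turn conversation without the interactive helpers, because an ellmer chat object keeps its conversation history: each successive $chat() call adds another turn to the same conversation. A minimal sketch using the chat_stats object from above:

# Successive calls to the same chat object continue the same conversation
chat_stats$chat("I have counts of fish at 20 sites, half inside a marine reserve. What analysis do you suggest?")
chat_stats$chat("How would your advice change if the counts are overdispersed?")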
I suggest using ellmer for multi-turn conversations. If ellmer doesn't work, don't worry; we'll do more on this with GitHub Copilot in a moment.
There are also some packages available to help with multi-turn conversations, for example chat-cli. These take some time to set up, so we won't go over them here; read the docs for more information.
Tip: LLM performance is better if you put everything into a single initial question rather than a multi-turn conversation. Responses to a problem split across multiple questions are less accurate than responses to a problem stated up front in the initial prompt. This rule applies across many domains, from logic to coding. I like to think of it as mansplaining: the LLMs have (sub-consciously?) been designed to prefer being mansplained to over having a conversation. We'll look more at this later.
4.4.2 Improving the stats bot
Make a local copy of the stats bot system prompt and try editing it. You can use your coding skills to write the stats bot to file, or just go here and copy the text: https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md.
Try different commands within it and see how your chat bot responds (you’ll have to open a new chat object each time).
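Here is a minimal sketch of that edit-and-reload loop (the local file name is just an example):

# Save a local copy of the system prompt that you can edit
stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md"))
readr::write_file(stats_bot, "DIY-stats-bot-system.md")

# ...edit DIY-stats-bot-system.md in your editor, then reload it into a fresh chat...
my_bot <- readr::read_file("DIY-stats-bot-system.md")
chat_mine <- chat_openrouter(
  system_prompt = my_bot,
  model = "anthropic/claude-sonnet-4",
  api_args = list(max_tokens = 5000)
)
chat_mine$chat("Who are you?")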
Here are some ideas.
- Try making a chatbot that is a vehement Bayesian that abhors frequentist statistics.
- You could provide it with more model-specific instructions. For instance, try to get the chatbot to suggest appropriate figures for verifying statistical models.
- Try different temperatures.
- Add your own easter egg.
Tip: Adjectives, CAPITALS, and *markdown* formatting can all help create emphasis so that your model more closely follows your commands. I used 'abhors' and 'vehement' above on purpose.
4.4.3 Tools
Tools like Copilot Agent mode go a step further and send the results of step 5 (see the workflow recap in the next section) back to the LLM, which then interprets the results and the loop continues (sometimes with, and sometimes without, direct user approval).
If you want to go further with making your own tools, then I suggest you check out the ellmer package. It supports tool creation in a structured way. For instance, I made a tool that allows an LLM to download and save ocean data to your computer.
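For a flavour of what this looks like, here is a rough sketch based on ellmer's tool registration interface. The function, its description, and the argument type are all hypothetical, and the exact tool() signature has changed between ellmer versions, so check the package documentation before relying on this.

# Hypothetical example: expose a simple R function to the LLM as a tool
get_mean_sst <- function(site) {
  # A real tool might download ocean data here; this sketch just returns a placeholder
  paste("Mean sea surface temperature lookup for", site, "is not implemented in this sketch")
}

chat_tools <- chat_openrouter(model = "anthropic/claude-sonnet-4")
chat_tools$register_tool(tool(
  get_mean_sst,
  "Gets the mean sea surface temperature for a named site.",
  site = type_string("Name of the site to look up.")
))
chat_tools$chat("What is the mean sea surface temperature at Heron Island?")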
4.5 Reflection on prompting fundamentals
To recap, the basic workflow an agent follows is:
- Set up a system prompt with detailed instructions for how the LLM should format responses
- User asks a question that is sent to the LLM
- LLM responds and sends response back to user
- Software on user’s computer attempts to parse and act on the response according to pre-determined rules
- User's computer enacts the commands in the response and provides the results to the user
The key things I hope you learnt from this lesson are the basics of LLM jargon, including tokens, temperature, API access, and the differences between LLM models.
Now that you understand the basics, let's get into GitHub Copilot.