library(ellmer)

# Initialize a chat with Claude
chat <- chat_openrouter(
  system_prompt = "You are a helpful assistant.",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50)
)
chat$chat("Ecologists like to eat ")
4 Calling LLMs via an Application Programming Interface
Calling LLMs programmatically via an API is a powerful programming tool. First of all, it gives us more control over the LLM's responses than chat interface software (like ChatGPT or Copilot). Chat software manages our interactions, including adding extra context (potentially from our chat history), so it is harder to be sure exactly how the LLM is being prompted. With access via an API we control exactly what goes to the LLM.
Accessing LLMs with R or Python code also gives us the opportunity to automate LLM calls, which can be used to create agents or automate document reading, as we will see later. Finally, it can give us access to special LLM features that may not be available in some chat interfaces.
This section will give you a deep understanding of how chatbots work and teach you some advanced prompting skills. By understanding these technical foundations, you’ll be better equipped to leverage LLMs effectively for your R programming and data analysis tasks, and to troubleshoot when you’re not getting the results you expect.
Software requirements: VS Code with R, or RStudio; an API key, ideally for OpenRouter; and R (best with the ellmer package) or Python.
4.1 What is an Application Programming Interface?
An Application Programming Interface (API) lets different software programs communicate with each other.
Large Language Models often require high-powered computers to run. It is possible to host one on your own computer (called local hosting), but here we will look at using LLMs hosted by 'providers': servers remote to our computer.
An API allows you to send text prompts to an LLM provider (like OpenAI’s GPT models or Anthropic’s Claude) over the internet and receive the model’s response back to your code. This is different from using a chat interface like ChatGPT’s website, where you’re interacting through a user interface designed for humans.
In fact, locally hosted LLMs also use an API to manage requests from your computer to the LLM software.
How API calls work:
1. Your code sends a request to the API endpoint (a web address)
2. The request includes your prompt, model choice, and parameters
3. The LLM service processes your request
4. The service sends back a structured response containing the model's output
5. Your code can then use this response in further analysis or processing
Most LLM APIs use a similar structure - you provide messages (like system prompts and user questions), specify which model to use, and set parameters to control the response. The API then returns the model’s generated text along with metadata about the request.
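To make this concrete, here is a minimal sketch of the request payload this structure implies, written as an R list. The field names follow the OpenAI-style schema that OpenRouter uses; the exact metadata returned varies by provider.

# A sketch of a chat-completion request body (OpenAI-style schema, as used by OpenRouter)
request_body <- list(
  model = "anthropic/claude-3.5-haiku",   # which model should answer
  messages = list(
    list(role = "system", content = "You are a helpful assistant."),  # overall instructions
    list(role = "user", content = "Ecologists like to eat ")          # the user's prompt
  ),
  max_tokens = 50                         # a parameter controlling response length
)
# The JSON response contains the generated text plus metadata: the model's reply sits
# under choices -> message -> content, and token counts are reported under usage.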
4.3 Understanding how LLMs work
Large Language Models (LLMs) operate by predicting the next token in a sequence, one token at a time. To understand how this works in practice, we’ll call an API directly through computer code.
By accessing the API in this way we get as close to the raw LLM as we can. Later on we will use 'coding assistants' (e.g. Copilot), which put another layer of software between you and the LLM.
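One way to see the one-token-at-a-time behaviour directly is to cap the response length at a single token. A minimal sketch with ellmer (the reply is simply cut off after the model's first predicted token; the exact token will depend on the model):

library(ellmer)

# Ask for at most one token, so we only see the model's first next-token prediction
chat_one <- chat_openrouter(
  system_prompt = "You are a helpful assistant.",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 1)
)
chat_one$chat("Ecologists like to eat ")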
4.3.1 Calling LLMs via an API
Try to get it to complete a sentence:
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-3.5-haiku",
    messages = list(
      list(
        role = "system",
        content = "You are a helpful assistant."
      ),
      list(
        role = "user",
        content = "Ecologists like to eat "
      )
    )
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-3.5-haiku"
message = "Ecologists like to eat "

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": message}
        ]
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Notice that the model doesn't do what we intend, which is to complete the sentence. LLMs are trained to behave as assistants by default. Let's use the 'system prompt' to give it stronger directions.
Tip: The system prompt sets the overall context for a chat. It is meant to be a stronger directive than the user prompt. In most chat interfaces (e.g. Copilot) you are interacting through the user prompt and the provider supplies the system prompt; Anthropic, for example, publishes the system prompts used by the chat interface version of Claude.
chat <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50)
)
chat$chat("Ecologists like to eat ")
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-3.5-haiku",
    messages = list(
      list(
        role = "system",
        content = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"
      ),
      list(
        role = "user",
        content = "Ecologists like to eat "
      )
    ),
    max_tokens = 50
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-3.5-haiku"
message = "Ecologists like to eat "
system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message}
        ],
        "max_tokens": 50
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Tip: It is generally more effective to tell the LLM what to do rather than what not to do (just like people!).
4.3.2 Controlling randomness of the responses
The “temperature” parameter controls the randomness of token predictions. Lower temperatures (closer to 0) make the model more deterministic, while higher temperatures (closer to 2) make it more creative and unpredictable.
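For readers who want the mechanics: temperature $T$ rescales the model's raw token scores (logits $z_i$) before they are converted to probabilities with a softmax. This is the standard definition, not something specific to OpenRouter:

$$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

As $T \to 0$ the probability mass concentrates on the single highest-scoring token, while larger $T$ flattens the distribution so less likely tokens get sampled more often.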
Even at 0 there is still a small amount of randomness. We can also try to make a response more deterministic with the 'top_k' parameter. Top K forces the model to choose among a smaller set of candidate tokens; setting it to 1 gives the most predictable results.
Let’s compare responses with different temperatures and top K settings:
# Create chats with different temperature settings
chat_temp <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50, temperature = 0, top_k = 1)
)
chat_temp$chat("Marine ecologists like to eat ")
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-3.5-haiku",
    messages = list(
      list(
        role = "system",
        content = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"
      ),
      list(
        role = "user",
        content = "Marine ecologists like to eat "
      )
    ),
    max_tokens = 50,
    temperature = 0,
    top_k = 1
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-3.5-haiku"
message = "Marine ecologists like to eat "
system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides"

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message}
        ],
        "max_tokens": 50,
        "temperature": 0,
        "top_k": 1
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Now try again, but set the temperature parameter higher (say to 2), and remove the top_k line or set it to a high number like 1000.
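For example, a sketch of the high-temperature version of the ellmer call (the output will vary from run to run, which is the point; chat_wild is just a name for this sketch):

# High temperature and no top_k restriction: expect more varied completions
chat_wild <- chat_openrouter(
  system_prompt = "Complete the sentences the user provides you. Continue from where the user left off. Provide one answer only. Don't provide any explanation, don't reiterate the text the user provides",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(max_tokens = 50, temperature = 2)
)
chat_wild$chat("Marine ecologists like to eat ")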
At low temperatures and top K values, you’ll notice the model consistently produces similar “safe” completions that focus on the most probable next tokens. As temperature increases, the responses become more varied and potentially more creative, but possibly less coherent.
There are quite a few other parameters you can use to control how an LLM responds. Their names and meanings are all documented in the OpenRouter docs.
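As an illustration, here is a sketch that passes several commonly documented sampling parameters through api_args. The parameter names follow OpenRouter's (OpenAI-style) conventions; not every model supports every parameter, so check the docs.

# A sketch combining several common sampling parameters
chat_params <- chat_openrouter(
  system_prompt = "You are a helpful assistant.",
  model = "anthropic/claude-3.5-haiku",
  api_args = list(
    max_tokens = 100,       # cap on output length
    temperature = 0.7,      # randomness of sampling
    top_p = 0.9,            # nucleus sampling: only sample from the most probable tokens
    frequency_penalty = 0,  # > 0 penalises tokens that have already appeared often
    presence_penalty = 0    # > 0 penalises any token that has already appeared
  )
)
chat_params$chat("Suggest a name for a reef fish survey project.")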
4.3.4 Comparing model complexity
Different models have different capabilities based on their size, training data, and architecture.
For example, anthropic/claude-3.5-haiku has many fewer parameters than anthropic/claude-sonnet-4. This means the latter model is more complex and can handle more nuanced tasks. However, Haiku is significantly cheaper to run: Haiku costs US$0.80 per million input tokens vs US$3 for Sonnet, and output tokens are US$4 vs US$15 per million.
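One way to feel the difference is to send the same prompt to both models and compare the answers; a sketch (the gap in quality will be more obvious on harder tasks):

# Same question to a smaller and a larger model
prompt <- "Explain, in two sentences, why overfitting is a problem in species distribution models."

chat_small <- chat_openrouter(model = "anthropic/claude-3.5-haiku",
                              api_args = list(max_tokens = 200))
chat_large <- chat_openrouter(model = "anthropic/claude-sonnet-4",
                              api_args = list(max_tokens = 200))

chat_small$chat(prompt)
chat_large$chat(prompt)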
Task for you: use the OpenRouter models page to find a different model to try. Hint: try some of the Chinese models, like Kimi, instead of the American models.
4.3.5 Understanding context windows
LLMs have a limited "context window": the amount of text they can consider when generating a response. This affects their ability to maintain coherence over long conversations. For most LLMs the context window is about 100-200K tokens, which includes both input and output. However, Google's models have context windows of 1 million tokens or more.
We’ll come back to the context window when we explore more advanced tools with longer prompts. These simple prompts don’t come close to using up the context window.
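A rough rule of thumb (an approximation for English text, not an exact count) is that one token is about four characters, which is enough to sanity-check whether a prompt is anywhere near a context limit:

# Very rough token estimate: ~4 characters per token for English text (approximation only)
estimate_tokens <- function(text) ceiling(nchar(text) / 4)

short_prompt <- "Ecologists like to eat "
estimate_tokens(short_prompt)                                      # a handful of tokens
estimate_tokens(paste(rep(short_prompt, 10000), collapse = ""))    # a much longer prompt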
4.4 DIY stats bot
Let's put together what we've learnt so far and build our own chatbot. I've provided you with a detailed system prompt that implements a chatbot specialising in helping with statistics. First, we read the bot's markdown file from GitHub, then we can use it in our chat session.
stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md"))

chat_stats <- chat_openrouter(
  system_prompt = stats_bot,
  model = "anthropic/claude-sonnet-4",
  api_args = list(max_tokens = 5000)
)
chat_stats$chat("Who are you?")
library(jsonlite)
library(httr)

openrouter_api_key <- Sys.getenv("OPENROUTER_API_KEY")

stats_bot <- readLines("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md")

response <- POST(
  url = "https://openrouter.ai/api/v1/chat/completions",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", openrouter_api_key)
  ),
  body = toJSON(list(
    model = "anthropic/claude-sonnet-4",
    messages = list(
      list(
        role = "system",
        content = paste(stats_bot, collapse = "\n")
      ),
      list(
        role = "user",
        content = "Who are you?"
      )
    ),
    max_tokens = 5000
  ), auto_unbox = TRUE),
  encode = "raw"
)

r3 <- fromJSON(content(response, "text"))
r3$choices$message$content
import requests
import json
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
model = "anthropic/claude-sonnet-4"

stats_bot = requests.get("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md").text

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    data=json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": stats_bot},
            {"role": "user", "content": "Who are you?"}
        ],
        "max_tokens": 5000
    })
)
data = response.json()
content = data['choices'][0]['message']['content']
content
Try asking it some different statistical questions and note how it responds.
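For example (any statistics question will do; this one is just an illustration):

# An example question for the stats bot
chat_stats$chat("What model should I use for overdispersed counts of fish at survey sites?")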
Tip: How many of you started using "DIY-stats-bot-system.md" without first reading it? Did you find the easter egg in my prompt? For security you should ALWAYS read prompts before you start running them through LLM chats. We'll see later that LLMs can be given 'tools' which allow them to run code on your computer. It's easy to see how a malicious prompt could misuse these tools. We'll cover security later.
4.4.1 Multi-turn conversation
You might want to have a conversation with your chatbot to help you problem-solve. This is easy with ellmer and a bit fiddly with Python or base R, so I'm only going to cover ellmer here.
stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md"))

chat_stats <- chat_openrouter(
  system_prompt = stats_bot,
  model = "anthropic/claude-sonnet-4",
  api_args = list(max_tokens = 5000)
)

live_console(chat_stats)
# live_browser(chat_stats)
live_console() lets you chat in the terminal. live_browser() opens a browser window that is hosted locally on your computer; it feels the most like ChatGPT.
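You can also hold a multi-turn conversation without the interactive helpers, because an ellmer chat object keeps its conversation history: each successive $chat() call adds another turn to the same conversation. A minimal sketch using the chat_stats object from above:

# Successive calls to the same chat object continue the same conversation
chat_stats$chat("I have counts of fish at 20 sites, half inside a marine reserve. What analysis do you suggest?")
chat_stats$chat("How would your advice change if the counts are overdispersed?")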
I suggest using ellmer for multi-turn conversations. If ellmer doesn't work, don't worry; we'll do more on this with GitHub Copilot in a moment.
There are also some packages available to help with multi-turn conversations, for example chat-cli. These take some time to set up, so we won't go over them here; read the docs for more information.
Tip: LLM performance is better if you put everything into a single initial question rather than a multi-turn conversation. Responses to a problem split across multiple questions are less accurate than responses to a problem stated up front in the initial prompt. This rule applies across many domains, from logic to coding. I like to think of it as mansplaining: the LLMs have (sub-consciously?) been designed to prefer being mansplained to over having a conversation. We'll look more at this later.
4.4.2 Improving the stats bot
Make a local copy of the stats bot system prompt and try editing it. You can use your coding skills to write the stats bot to file, or just go here and copy the text: https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md.
Try different commands within it and see how your chat bot responds (you’ll have to open a new chat object each time).
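Here is a minimal sketch of that edit-and-reload loop (the local file name is just an example):

# Save a local copy of the system prompt that you can edit
stats_bot <- readr::read_file(url("https://raw.githubusercontent.com/cbrown5/AI-assistants-for-scientific-coding/refs/heads/main/resources/DIY-stats-bot-system.md"))
readr::write_file(stats_bot, "DIY-stats-bot-system.md")

# ...edit DIY-stats-bot-system.md in your editor, then reload it into a fresh chat...
my_bot <- readr::read_file("DIY-stats-bot-system.md")
chat_mine <- chat_openrouter(
  system_prompt = my_bot,
  model = "anthropic/claude-sonnet-4",
  api_args = list(max_tokens = 5000)
)
chat_mine$chat("Who are you?")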
Here are some ideas.
- Try making a chatbot that is a vehement Bayesian that abhors frequentist statistics.
- You could provide it with more model-specific instructions. For instance, try to get the chatbot to suggest appropriate figures for verifying statistical models.
- Try different temperatures.
- Add your own easter egg.
Tip: Adjectives, CAPITALS, and *markdown* formatting can all help create emphasis so that your model more closely follows your commands. I used 'abhors' and 'vehement' above on purpose.
4.4.3 Tools
Tools like Copilot Agent mode go a step further and send the results of step 5 (see the workflow recap in the next section) back to the LLM, which then interprets the results and the loop continues (sometimes with, and sometimes without, direct user approval).
If you want to go further with making your own tools, then I suggest you check out the ellmer package. It supports tool creation in a structured way. For instance, I made a tool that allows an LLM to download and save ocean data to your computer.
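For a flavour of what this looks like, here is a rough sketch based on ellmer's tool registration interface. The function, its description, and the argument type are all hypothetical, and the exact tool() signature has changed between ellmer versions, so check the package documentation before relying on this.

# Hypothetical example: expose a simple R function to the LLM as a tool
get_mean_sst <- function(site) {
  # A real tool might download ocean data here; this sketch just returns a placeholder
  paste("Mean sea surface temperature lookup for", site, "is not implemented in this sketch")
}

chat_tools <- chat_openrouter(model = "anthropic/claude-sonnet-4")
chat_tools$register_tool(tool(
  get_mean_sst,
  "Gets the mean sea surface temperature for a named site.",
  site = type_string("Name of the site to look up.")
))
chat_tools$chat("What is the mean sea surface temperature at Heron Island?")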
4.5 Reflection on prompting fundamentals
To recap, the basic workflow an agent follows is:
- Set up a system prompt with detailed instructions for how the LLM should format responses
- User asks a question that is sent to the LLM
- LLM responds and sends response back to user
- Software on user’s computer attempts to parse and act on the response according to pre-determined rules
- User's computer enacts the commands in the response and provides the results to the user
The key things I hope you learnt from this lesson are the basics of LLM jargon, including tokens, temperature, API access, and the differences between LLM models.
Now that you understand the basics, let's get into GitHub Copilot.