Writing R code with the help of LLMs

Hex wall








Simon Couch - Posit, PBC

Open Source Group, R / LLMs

Today, we’re talking about the LHS⬅️

Part 1: Have a chat

Part 1: Have a chat (ellmer website)

Meet ellmer!🐘


install.packages("ellmer")

Part 1: Have a chat

These are the same, since chat_github() defaults to GPT-4o:


library(ellmer)

ch <- chat_github(
  model = "gpt-4o"
)

ch$chat("hey!")
#> "Hey there!😊 What can I 
#>  help you with today?"

Part 1: Have a chat

Your turn! Create a chat object and say “hey!”

  • chat_github() might “just work”
  • If not, set up chat_anthropic() using instructions linked below

Part 2: The system prompt

Part 2: The system prompt

  • An “invisible message” at the start of your chat
  • Use it to influence behavior, give knowledge, define output format, etc.

ch <- chat_anthropic(
  system_prompt = 
    "Try and tie any response back to the Openscapes organization."
)

ch$chat("What's 2+2?")

Part 2: The system prompt

Your turn: adjust the system prompt so that, when asked a question like “What’s 2+2?”, the model returns only the answer as a word (rather than a digit), with no punctuation or exposition.


ch$chat("What's 2+2?")
#> four
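One way to approach the exercise is sketched below. The prompt wording is my own attempt, not the slides’ official solution, and actually running the chat requires an Anthropic API key.

```r
# A candidate system prompt for the exercise; the wording here is my
# own guess at something that works, not the slides' solution.
prompt <- paste(
  "Answer every question with only the answer, spelled out as a",
  "lowercase English word. No digits, no punctuation, no explanation."
)

# With ellmer (requires an ANTHROPIC_API_KEY):
# library(ellmer)
# ch <- chat_anthropic(system_prompt = prompt)
# ch$chat("What's 2+2?")
# #> four
```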

Part 2: The system prompt

You can get a lot of mileage out of the system prompt:

ch <- chat_anthropic(
  system_prompt = paste0(readLines(
    "https://raw.githubusercontent.com/simonpcouch/chores/refs/heads/main/inst/prompts/roxygen-prefix.md"
  ), collapse = "\n")
)

Part 2: The system prompt

What if prompts like that weren’t so cumbersome to interface with?

Part 2: The system prompt (chores website)


install.packages("chores")
  • Attach system prompts to a key command in your IDE
  • Select some code, press the command, and watch code stream in
  • Supports common R package development actions by default

Intermission: tokens

Intermission: tokens

OpenAI and Anthropic have two main ways they make money from their models:

  1. Subscription plans (like chatgpt.com)
  2. API usage (like from ellmer)

Intermission: tokens

API usage is “pay-as-you-go” by token:

  • Words, parts of words, or individual characters
    • “hello” → 1 token
    • “unconventional” → 3 tokens: un|con|ventional
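A common rule of thumb is that one token is roughly four characters of English text. Here’s a quick base-R estimate of that heuristic; it is not the tokenizer any provider actually uses, as the examples above show.

```r
# Rough token estimate: ~4 characters per token for English text.
# A heuristic only -- real tokenizers split text differently.
approx_tokens <- function(text) ceiling(nchar(text) / 4)

approx_tokens("hello")           # estimate: 2 (actual: 1 token)
approx_tokens("unconventional")  # estimate: 4 (actual: 3 tokens)
```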

Intermission: tokens

Here’s the pricing per million tokens for some common models:


Name              Input   Output
GPT 4o            $3.75   $15.00
GPT 4o-mini       $0.15   $0.60
Claude 4 Sonnet   $3.00   $15.00

Intermission: tokens

To put that into context, the source code for these slides so far is 650 tokens.

If I input them to GPT 4o:

\[ 650 \text{ tokens} \times \frac{\$3.75}{1{,}000{,}000~\text{tokens}} \approx \$0.00244 \]
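The same arithmetic as a small helper function (the function name is mine; the price is the GPT 4o per-million-token input rate from the table above):

```r
# Cost in USD for a given token count at a per-million-token price.
cost_usd <- function(tokens, price_per_million_usd) {
  tokens * price_per_million_usd / 1e6
}

cost_usd(650, 3.75)  # 650 input tokens at the GPT 4o rate
#> [1] 0.0024375
```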

Part 3: Hallucination and code assistance

Part 3: Hallucination and code assistance

library(ggplot2)
library(modeldata)
stackoverflow
#> # A tibble: 5,594 × 21
#>    Country        Salary YearsCodedJob OpenSource Hobby CompanySizeNumber Remote
#>    <fct>           <dbl>         <int>      <dbl> <dbl>             <dbl> <fct> 
#>  1 United Kingdom 1   e5            20          0     1              5000 Remote
#>  2 United States  1.3 e5            20          1     1              1000 Remote
#>  3 United States  1.75e5            16          0     1             10000 Not r…
#> # ℹ 5,590 more rows
#> # ℹ 14 more variables: CareerSatisfaction <int>, Data_scientist <dbl>, …

If I type “plot salary vs experience”, what information does the model need access to in order to complete that request?

Part 3: Hallucination and code assistance

If I type “plot salary vs experience”, what information does the model need access to?

  • The name of the relevant data frame
  • My preferred plotting library
  • The names of the relevant columns

The first two can be inferred from the source code, but the third requires access to your R session.
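A minimal sketch of the kind of session context an assistant would need to collect. The function name and structure below are hypothetical, not gander’s actual API, and mtcars stands in for the stackoverflow data so the example is self-contained.

```r
# Hypothetical sketch of gathering context from the live R session;
# gander's real implementation differs -- this just shows the idea
# that column names live in the session, not the source file.
session_context <- function(df) {
  list(
    data_frame = deparse(substitute(df)),
    columns    = names(df)
  )
}

ctx <- session_context(mtcars)
ctx$data_frame
#> [1] "mtcars"
```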

Part 3: Hallucination and code assistance (gander website)


install.packages("gander")
  • An R coding assistant written in R
  • Automatically incorporates relevant context from your R session

Appendix A: Providers

Appendix A: Providers

A “provider” is a service that hosts models on an API.

In ellmer, each provider has its own chat_*() function, like chat_github() or chat_anthropic().

  • chat_github() serves some popular models, like GPT-4o, for “free”
    • “free” in the sense of “we’re going to use all of the data you send us”
    • heavily rate-limited; you’ll need to pay for even modest usage

Appendix A: Providers

  • chat_openai()
    • traditionally, more consumer-focused
    • weaker privacy guarantees

Appendix A: Providers

  • chat_anthropic() serves Claude Sonnet
    • traditionally more developer/enterprise-focused
    • stronger privacy guarantees
    • subsidizes credits via Claude for Education

Appendix A: Providers

You can be your own “provider”, too:

  • chat_ollama() uses a model that runs on your laptop
    • much less powerful than the Big Ones
    • “free” in the usual sense

Appendix A: Providers

Many organizations have private deployments of models set up for internal, secure use. ellmer supports the common ones.

Ask around to see if this is the case at NOAA/NASA!

Learn more


github.com/simonpcouch/openscapes-25

