Writing R code with the help of LLMs

Hex wall








Simon Couch - Posit, PBC

Open Source Group, R / LLMs

Today, we’re talking about the LHS⬅️

Part 1: Have a chat

Part 1: Have a chat (ellmer website)

Meet ellmer!🐘


install.packages("ellmer")

Part 1: Have a chat

These are the same, since chat_github() defaults to GPT-4o:


library(ellmer)

ch <- chat_github(
  model = "gpt-4o"
)

ch$chat("hey!")
#> "Hey there!😊 What can I 
#>  help you with today?"

Part 1: Have a chat

Your turn! Create a chat object and say “hey!”

  • chat_github() might “just work”
  • If not, set up chat_anthropic() using instructions linked below

Part 2: The system prompt

Part 2: The system prompt

  • An “invisible message” at the start of your chat
  • Use it to influence behavior, give knowledge, define output format, etc.

ch <- chat_anthropic(
  system_prompt = 
    "Try and tie any response back to the Openscapes organization."
)

ch$chat("What's 2+2?")

Part 2: The system prompt

Your turn: adjust the system prompt so that, when asked a question like “What’s 2+2?”, the model returns only the answer as a word (rather than a digit), with no punctuation or exposition.


ch$chat("What's 2+2?")
#> four
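One way to approach the exercise is sketched below. The prompt wording is my own attempt, not the slides’ official solution, and actually running the chat requires an Anthropic API key.

```r
# A candidate system prompt for the exercise; the wording here is my
# own guess at something that works, not the slides' solution.
prompt <- paste(
  "Answer every question with only the answer, spelled out as a",
  "lowercase English word. No digits, no punctuation, no explanation."
)

# With ellmer (requires an ANTHROPIC_API_KEY):
# library(ellmer)
# ch <- chat_anthropic(system_prompt = prompt)
# ch$chat("What's 2+2?")
# #> four
```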

Part 2: The system prompt

You can get a lot of mileage out of the system prompt:

ch <- chat_anthropic(
  system_prompt = paste0(readLines(
    "https://raw.githubusercontent.com/simonpcouch/chores/refs/heads/main/inst/prompts/roxygen-prefix.md"
  ), collapse = "\n")
)

Part 2: The system prompt

What if prompts like that weren’t so cumbersome to interface with?

Part 2: The system prompt (chores website)


install.packages("chores")
  • Attach system prompts to a key command in your IDE
  • Select some code, press the command, and watch code stream in
  • Supports common R package development actions by default

Intermission: tokens

Intermission: tokens

OpenAI and Anthropic have two main ways they make money from their models:

  1. Subscription plans (like chatgpt.com)
  2. API usage (like from ellmer)

Intermission: tokens

API usage is “pay-as-you-go” by token:

  • Words, parts of words, or individual characters
    • “hello” → 1 token
    • “unconventional” → 3 tokens: un|con|ventional
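A common rule of thumb is that one token is roughly four characters of English text. Here’s a quick base-R estimate of that heuristic; it is not the tokenizer any provider actually uses, as the examples above show.

```r
# Rough token estimate: ~4 characters per token for English text.
# A heuristic only -- real tokenizers split text differently.
approx_tokens <- function(text) ceiling(nchar(text) / 4)

approx_tokens("hello")           # estimate: 2 (actual: 1 token)
approx_tokens("unconventional")  # estimate: 4 (actual: 3 tokens)
```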

Intermission: tokens

Here’s the pricing per million tokens for some common models:


Name              Input   Output
GPT 4o            $3.75   $15.00
GPT 4o-mini       $0.15   $0.60
Claude 4 Sonnet   $3.00   $15.00

Intermission: tokens

To put that into context, the source code for these slides so far is 650 tokens.

If I input them to GPT 4o:

\[ 650 \text{ tokens} \times \frac{\$3.75}{1{,}000{,}000~\text{tokens}} \approx \$0.00244 \]
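The same arithmetic as a small helper function (the function name is mine; the price is the GPT 4o per-million-token input rate from the table above):

```r
# Cost in USD for a given token count at a per-million-token price.
cost_usd <- function(tokens, price_per_million_usd) {
  tokens * price_per_million_usd / 1e6
}

cost_usd(650, 3.75)  # 650 input tokens at the GPT 4o rate
#> [1] 0.0024375
```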

Part 3: Hallucination and code assistance

Part 3: Hallucination and code assistance

library(ggplot2)
library(modeldata)
stackoverflow
#> # A tibble: 5,594 × 21
#>    Country        Salary YearsCodedJob OpenSource Hobby CompanySizeNumber Remote
#>    <fct>           <dbl>         <int>      <dbl> <dbl>             <dbl> <fct> 
#>  1 United Kingdom 1   e5            20          0     1              5000 Remote
#>  2 United States  1.3 e5            20          1     1              1000 Remote
#>  3 United States  1.75e5            16          0     1             10000 Not r…
#> # ℹ 5,590 more rows
#> # ℹ 14 more variables: CareerSatisfaction <int>, Data_scientist <dbl>, …

If I type “plot salary vs experience”, what information does the model need access to in order to complete that request?

Part 3: Hallucination and code assistance

If I type “plot salary vs experience”, what information does the model need access to?

  • The name of the relevant data frame
  • My preferred plotting library
  • The names of the relevant columns

The first two can be inferred from the source code, but the third requires access to your R session.
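A minimal sketch of the kind of session context an assistant would need to collect. The function name and structure below are hypothetical, not gander’s actual API, and mtcars stands in for the stackoverflow data so the example is self-contained.

```r
# Hypothetical sketch of gathering context from the live R session;
# gander's real implementation differs -- this just shows the idea
# that column names live in the session, not the source file.
session_context <- function(df) {
  list(
    data_frame = deparse(substitute(df)),
    columns    = names(df)
  )
}

ctx <- session_context(mtcars)
ctx$data_frame
#> [1] "mtcars"
```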

Part 3: Hallucination and code assistance (gander website)


install.packages("gander")
  • An R coding assistant written in R
  • Automatically incorporates relevant context from your R session

Appendix A: Providers

Appendix A: Providers

A “provider” is a service that hosts models on an API.

In ellmer, each provider has its own chat_*() function, like chat_github() or chat_anthropic().

  • chat_github() serves some popular models, like GPT-4o, for “free”
    • “free” in the sense of “we’re going to use all of the data you send us”
    • heavily rate-limited; you’ll need to pay for even modest usage

Appendix A: Providers

  • chat_openai()
    • traditionally, more consumer-focused
    • weaker privacy guarantees

Appendix A: Providers

  • chat_anthropic() serves Claude Sonnet
    • traditionally more developer/enterprise-focused
    • stronger privacy guarantees
    • subsidizes credits via Claude for Education

Appendix A: Providers

You can be your own “provider”, too:

  • chat_ollama() uses a model that runs on your laptop
    • much less powerful than the Big Ones
    • “free” in the usual sense

Appendix A: Providers

Many organizations have private deployments of models set up for internal, secure use. ellmer supports the common ones.

Ask around to see if this is the case at NOAA/NASA!

Learn more


github.com/simonpcouch/openscapes-25

