Using LLMs in R

Simon Couch - Posit, PBC

Open Source Group, R / LLMs

Part 1: Have a chat

Part 1: Have a chat

Meet ellmer! ellmer website


install.packages("ellmer")

Part 1: Have a chat

For example:


library(ellmer)

ch <- chat_github(
  model = "gpt-4o"
)

ch$chat("hey!")
#> "Hey there!😊 What can I 
#>  help you with today?"

Part 1: Have a chat

Your turn! Create a chat object and say ā€œhey!ā€

  • chat_github() might ā€œjust workā€
  • If not, set up chat_anthropic() using instructions linked below
03:00

Part 2: The system prompt

Part 2: The system prompt

  • An ā€œinvisible messageā€ at the start of your chat
  • Use it to influence behavior, give knowledge, define output format, etc.
ch <- chat_anthropic(
  system_prompt = 
    "You are the literal embodiment of the town of Lewisburg, PA."
)

Part 2: The system prompt

Your turn: adjust the system prompt so that, when asked a question like ā€œWhat’s 2+2?ā€, the model returns only the answer as a digit, with no punctuation or exposition.


ch$chat("What's 2+2?")
03:00
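One possible solution sketch (the exact prompt wording here is an assumption; many phrasings work):

```r
library(ellmer)

# Constrain the output format via the system prompt
ch <- chat_anthropic(
  system_prompt = paste(
    "Answer arithmetic questions with only the digits of the answer.",
    "No punctuation, no explanation."
  )
)

ch$chat("What's 2+2?")
```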

Intermission: tokens

Intermission: tokens

OpenAI and Anthropic have two main ways they make money from their models:

  1. Subscription plans (like chatgpt.com)
  2. API usage (like from ellmer)

Intermission: tokens

API usage is ā€œpay-as-you-goā€ by token:

  • Words, parts of words, or individual characters
    • ā€œhelloā€ → 1 token
    • ā€œunconventionalā€ → 3 tokens: un|con|ventional

Intermission: tokens

Here’s the pricing per million tokens for some common models:


Name              Input   Output
GPT 4o            $3.75   $15.00
GPT 4o-mini       $0.15   $0.60
Claude 4 Sonnet   $3.00   $15.00

Intermission: tokens

To put that into context, the source code for these slides so far is 650 tokens.

If I input them to GPT 4o:

\[ 650 \text{ tokens} \times \frac{\$3.75 }{1,000,000~\text{tokens}} = \$0.00244 \]
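The same arithmetic in R, if you want to estimate costs for your own usage:

```r
# Cost = tokens * price per token (GPT 4o input pricing)
tokens <- 650
input_price_per_million <- 3.75

tokens * input_price_per_million / 1e6
#> [1] 0.0024375
```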

Part 3: Images

Part 3: Images

A picture is worth a thousand words

This is roughly correct! Depending on the model, pictures are ~600 tokens.

Part 3: Images

For example:


library(ellmer)

ch <- chat_github(
  model = "gpt-4o"
)

ch$chat(
  content_image_file(
    "figures/plots/boop.JPEG"
  ),
  "Is this area on fire?"
)

Part 3: Images

Your turn: provide a model with an image and ask it a question about it.

03:00

Part 4: Putting it all together

Part 4: Putting it all together

Your turn: Write a function that takes a path to an image file and returns a Yes/No answer to the question ā€œAre there non-forested areas here?ā€

has_non_forested <- function(path) {
  # make a chat
  #   - you might want to set the system prompt

  # call `$chat()` with the image at `path` included
}
06:00
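One possible solution sketch (the provider choice and system prompt wording are assumptions; any vision-capable chat_*() provider works):

```r
library(ellmer)

has_non_forested <- function(path) {
  # A vision-capable model, with a system prompt that pins the output format
  ch <- chat_github(
    model = "gpt-4o",
    system_prompt = "Answer with only 'Yes' or 'No'."
  )

  ch$chat(
    content_image_file(path),
    "Are there non-forested areas here?"
  )
}
```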

Appendix A: Providers

Appendix A: Providers

A ā€œproviderā€ is a service that hosts models on an API.

In ellmer, each provider has its own chat_*() function, like chat_github() or chat_anthropic().

  • chat_github() serves some popular models, like GPT-4o, for ā€œfreeā€
    • ā€œfreeā€ in the sense of ā€œwe’re going to use all of the data you send usā€
    • heavily rate-limited; you’ll need to pay for even modest usage

Appendix A: Providers

  • chat_openai()
    • traditionally, more consumer-focused
    • weaker privacy guarantees

Appendix A: Providers

  • chat_anthropic() serves Claude Sonnet
    • traditionally more developer/enterprise-focused
    • stronger privacy guarantees
    • subsidizes credits via Claude for Education

Appendix A: Providers

You can be your own ā€œproviderā€, too:

  • chat_ollama() uses a model that runs on your laptop
    • much less powerful than the Big Ones
    • ā€œfreeā€ in the usual sense

Appendix A: Providers

Many organizations have private deployments of models set up for internal, secure use. ellmer supports the common ones.

Make sure you test with models that the USFS can actually use!

Appendix B: Evaluation

Appendix B: Evaluation

Does your stuff even work? vitals website

ellmer has a companion package, vitals, for evaluation of products built with ellmer.

Appendix B: Evaluation

At the end of the summer, you should probably be able to answer the questions:

  • How reliably does our tool classify stuff correctly?
  • How much worse does our tool perform with a cheaper model?
  • Is this tool any better than what the USFS already uses?

vitals can help you answer these sorts of questions.
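A minimal sketch of what an eval might look like (the interface details here are assumptions drawn from the vitals README; check the vitals website for the current API):

```r
library(vitals)
library(ellmer)
library(tibble)

# An eval needs a dataset with input/target columns, a solver, and a scorer
dataset <- tibble(
  input = c("What's 2+2?", "What's 3+3?"),
  target = c("4", "6")
)

tsk <- Task$new(
  dataset = dataset,
  solver = generate(chat_github(model = "gpt-4o")),
  scorer = model_graded_qa()
)

tsk$eval()
```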

Learn more


github.com/simonpcouch/ufds-25

