Workshops for Ukraine

Agentic Coding for R

Reliable habits for AI-assisted analysis: planning, writing,
reviewing, and documenting R workflows with AI agents

Charles Crabtree
Senior Lecturer, School of Social Sciences, Monash University
K-Club Professor, University College, Korea University

github.com/lobsterbush/workshops-for-ukraine

About Me

Charles Crabtree

  • Senior Lecturer, School of Social Sciences, Monash University
  • Previously: Dartmouth (Government), Stanford (APARC), Tokyo Foundation for Policy Research
  • Ph.D., Political Science, University of Michigan
  • Writing papers about LLMs since January 2021 — ~20 months before ChatGPT
  • Taught AI workshops at Dartmouth, Essex, UNM, IPSA-NUS, Instats, Statistical Horizons

I was a skeptic. I spent years studying these tools critically. AI agents converted me — not chatbots, not autocomplete, but tools that act on your computer, see results, and iterate.

Punch Cards (1950s-1970s)
Command Line (1980s)
Graphical Interface (1995)
Modern Desktop (2024)
AI Agent (Now)

The Shift

The new operating system is language

Every era required learning a different interface — punch cards, command lines, graphical desktops.

Now you describe what you want in plain language. The machine writes the R code, runs it, and fixes the errors.

The Tool

Warp — the agentic development environment

It's a terminal with an AI agent built in. The agent can read your files, run your code, see the output, and fix errors — all in one place.

  • Free to start — 75 credits/month, no credit card
  • Multi-model — Claude, GPT, Gemini. Switch with one click.
  • macOS, Windows, Linux · warp.dev

We'll use Warp today, but the patterns work with any agent that can run code on your machine.

Two Ways to Use AI

Vibe coding vs agentic coding

Vibe coding

  • "Just make it work"
  • Accept whatever the AI produces
  • Don't read the code
  • No conventions, no checks
  • Hope for the best

Fine for a throwaway prototype. Dangerous for a paper.

Agentic coding

  • Constrain the agent with your rules
  • Verify every output
  • Your conventions, enforced consistently
  • A second agent audits the first
  • Paper trail writes itself

Reproducible, verifiable, publishable.

This workshop is about the right column. Not prompt tricks — reliable habits.

Today

What we'll cover

  1. How agents work — from tasks to numbered .R scripts
  2. Inside the code — walkthrough of what the agent produces
  3. Where agents fail — silent errors, wrong assumptions
  4. Adversarial agentic coding — builder + reviewer agents on different models
  5. Hands-on practice — build, break, and fix
  6. Implications

You will leave with downloadable skill templates for a builder agent and a reviewer agent that you can use immediately.

Part 1

How Agents Work

Key Distinction

This isn't the ChatGPT you know

Chatbot

  • Tells you what to do
  • One response at a time
  • You copy-paste everything
  • No memory between turns

Agent

  • Does it for you
  • Multi-step workflows
  • .R files appear on disk
  • Sees errors, fixes them

How It Works

The agent loop

Agents observe each result and decide what to do next.

  1. Observe — read your files, check what exists
  2. Plan — decide which script to write next
  3. Execute — write the code and run it
  4. Check — did it work? Parse errors, read output
  5. Repeat — fix what broke, move to the next step

This is why they recover from errors. They see what went wrong and try again — just like you would.

Your Job

Decompose -> Constrain -> Verify

The agent handles execution. You handle direction.

  • Decompose — break a vague research task into numbered, concrete steps
  • Constrain — specify your preferred packages, SE type, file format, plot style
  • Verify — check every output before moving on

The prompt is a contract. The more specific you are, the fewer surprises. Let me show you what that looks like.

R Example — The Prompt

One prompt -> full analysis pipeline

$ "I have a populism survey dataset at
~/data/survey-data/processed.csv
that I haven't touched in years. ~82K rows, multiple
countries, pop_1 through pop_6 items, globalization
attitudes, demographics, loss aversion, immigration
conjoint.

Explore it, model what predicts populist attitudes,
visualize the results, and put everything in
numbered .R scripts."
  

R Example — What the Agent Produces

From one prompt: a full project directory

  • Cleaning script — reads raw data, recodes, handles missingness, writes analysis-ready file
  • Exploration script — distributions, missingness, sample sizes by country
  • Analysis script — models, extracts coefficients, exports statistics
  • Figures script — publication-ready plots, exported to PDF and PNG
  • Statistics file — every number computed, not typed — your paper references it directly
  • README — replication instructions

Let me open the project directory and show you what's inside each file.

Part 2

Inside the Code

First: Teach the Agent Your Rules

WARP.md — persistent project context

Create this file in your project root. The agent reads it at the start of every session, so you only set conventions once.

What to include

  • Project description and status
  • Data sources and locations
  • R version and key packages
  • Known issues or limitations

Your rules — whatever you want

  • Your preferred SE type
  • Your preferred plot style
  • Output format (PDF, PNG, both)
  • Sequential script numbering
  • How to export statistics

These rules become the conventions file that the builder skill reads. Different users, different conventions, same process.
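A minimal WARP.md sketch is below. The section names and package choices are illustrative, not a required schema — the point is that the file states your conventions once, in plain language:

```markdown
# WARP.md

## Project
Populism survey analysis. Status: exploratory.

## Data
Raw file: data/raw/processed.csv (~82K rows). Never edit raw data in place.

## Environment
R 4.4. Key packages: tidyverse, fixest, modelsummary.

## Conventions
- Numbered scripts: 01_clean.R, 02_explore.R, 03_analyze.R, 04_figures.R
- Robust standard errors in all models
- Relative paths only
- Every reported statistic written to output/statistics.tex, never typed
```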

What to Look For

When I open these files, check for:

  • Relative paths only — /Users/you/... breaks on every other machine; relative paths work everywhere
  • Dropped rows are logged — the script prints "Raw rows: 82,034 | Clean rows: 71,847 | Dropped: 10,187" so you know exactly what happened
  • Statistics are computed, not typed — every number flows from code to paper
  • Scripts have clear headers — purpose, inputs, outputs
  • Figures match the regression table — same model, same sample, same coefficients

These are the things a reviewer agent checks automatically. Let me show you the code.
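The first two checks can be sketched as the opening of a cleaning script. The paths and column names here are illustrative:

```r
# ------------------------------------------------------------
# 01_clean.R
# Purpose: recode raw survey data into an analysis-ready file
# Input:   data/raw/processed.csv
# Output:  data/clean/analysis.csv
# ------------------------------------------------------------

raw <- read.csv("data/raw/processed.csv")

# Keep only rows complete on the variables the models need
clean <- raw[!is.na(raw$pop_1) & !is.na(raw$age), ]

# Log exactly what was dropped -- never drop rows silently
cat(sprintf("Raw rows: %s | Clean rows: %s | Dropped: %s\n",
            format(nrow(raw),   big.mark = ","),
            format(nrow(clean), big.mark = ","),
            format(nrow(raw) - nrow(clean), big.mark = ",")))

write.csv(clean, "data/clean/analysis.csv", row.names = FALSE)
```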

The Key Pattern — Never Hardcode a Number

Model object → file on disk → paper

1. Your model lives in R's memory:

m1 <- lm(pop_index ~ age + education
         + globalization, data = df)

# R knows everything about this model:
nobs(m1)                   # 72814
coef(m1)["globalization"]  # 0.34182
summary(m1)$coefficients["globalization", "Std. Error"]  # 0.01193

2. R extracts numbers and writes a file:

cat(
  sprintf("\\newcommand{\\nObs}{%s}\n",
    format(nobs(m1), big.mark = ",")),
  sprintf("\\newcommand{\\mainCoef}{%.3f}\n",
    coef(m1)["globalization"]),
  file = "output/statistics.tex",
  sep = ""
)
# sprintf formats each number
# cat writes the lines to the file
# (sep = "" avoids a stray space between them)

3. Your paper reads the file:

% In your .tex preamble:
\input{output/statistics.tex}

% In your text:
Our sample includes
\nObs{} respondents.
The main effect is
$\beta = \mainCoef{}$.

LaTeX replaces \nObs with 72,814 and \mainCoef with 0.342 when you compile.

Change the data, re-run the R script, recompile the paper — every number updates. No manual transcription, ever. Not using LaTeX? Same pattern works with CSV or JSON.
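A sketch of the CSV variant, with illustrative statistic names — extract every reported number into a long key-value table that a Quarto or R Markdown document reads back:

```r
# Assumes m1 is the fitted lm() model from above
set.seed(1)
df <- data.frame(pop_index = rnorm(100), age = rnorm(100),
                 education = rnorm(100), globalization = rnorm(100))
m1 <- lm(pop_index ~ age + education + globalization, data = df)

stats <- data.frame(
  name  = c("n_obs", "coef_globalization", "se_globalization"),
  value = c(nobs(m1),
            coef(m1)["globalization"],
            summary(m1)$coefficients["globalization", "Std. Error"])
)
write.csv(stats, "statistics.csv", row.names = FALSE)

# Later, in a Quarto/R Markdown chunk:
# stats <- read.csv("statistics.csv")
# stats$value[stats$name == "coef_globalization"]
```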

The Paper Trail

Documentation happens automatically

  • Session logs — every prompt you typed, every command the agent ran, every output it produced. In Warp: click the session menu (top right) to export as markdown or share a link.
  • Model tracked — each interaction records which LLM produced it (Sonnet, Opus, GPT-4o). Visible in the session log and in Warp's conversation history.
  • Git integration — the agent can commit after each step. Commits + session logs = complete provenance: who wrote what, when, and which model helped.

Numbered scripts + computed statistics + session logs + git = a replication package that writes itself.

Part 3

Where Agents Fail

Real Failures

Things I've seen agents do — with full confidence

🔴 Fabricated a citation

Cited "Smith & Jones (2019)" in the literature review. Paper doesn't exist. DOI leads nowhere. Sounded perfectly plausible.

🔴 N doesn't match

Abstract says N = 1,247. Results table says N = 935. Cleaning script silently dropped 312 rows. No mention anywhere.

🔴 Figure doesn't match table

Coefficient plot shows β = 0.34. Regression table says 0.21. Agent re-estimated the model with a different sample for the figure.

🟡 Variables theory never mentioned

Model has 8 predictors. Theory section discusses 4. The other 4 were added by the agent because they "seemed relevant."

Every one of these looks right at a glance. You need a systematic way to catch them.
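The N mismatch, at least, is mechanically checkable: compare the rows the model actually used against the rows in the data, rather than trusting any reported number. A minimal sketch, with illustrative variable names:

```r
df <- data.frame(pop_index     = c(1, 2, 3, NA),
                 age           = c(20, 30, NA, 40),
                 education     = 1:4,
                 globalization = 4:1)

m1 <- lm(pop_index ~ age + education + globalization, data = df)

# lm() silently drops any row with NA in a model variable
n_data  <- nrow(df)
n_model <- nobs(m1)

if (n_model < n_data) {
  cat(sprintf("WARNING: model used %d of %d rows (%d dropped by NA)\n",
              n_model, n_data, n_data - n_model))
}
```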

Part 4 — The Core Idea

Two agents are better than one

One agent builds. A second agent tries to break what the first built.

The builder is confident. The reviewer is adversarial.
Together, they catch what either would miss alone.

The Pattern

Build -> Review -> Fix -> Review

1. Build

Invoke /builder-agent
Produces .R scripts, figures, stats.tex

2. Review

Switch model, invoke /reviewer-agent
Runs checks, verifies numbers

3. Fix

Builder addresses critical issues
Re-runs pipeline

4. Re-review

Reviewer confirms fixes
Produces final report

Strategy

Different models, different blind spots

/builder and /reviewer can run on different models from different providers.

Why cross-provider?

  • Each model has its own failure modes
  • Claude may miss what GPT catches (and vice versa)
  • Disagreement between models is a signal to investigate
  • Avoids correlated errors from the same training data

Example combinations

Builder: Claude Sonnet 4.5
Reviewer: GPT-4o

Builder: GPT-4o
Reviewer: Claude Opus

Builder: Claude Sonnet 4.5
Reviewer: Gemini 2.5 Pro

In Warp, switch models by clicking the model name in the input bar. One click between builds and reviews.

The Reviewer Agent

What /reviewer-agent does

You switch to a different model, invoke the reviewer, and it:

  • Runs every script from a clean session — do they all succeed?
  • Checks that N matches — counts rows in the actual data, compares to reported N
  • Checks figures against tables — are the coefficients the same?
  • Searches for hardcoded numbers — are statistics computed or typed?
  • Writes a review report — Critical / Warning / Note, with specific fixes

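The absolute-path search, for instance, can be sketched as plain R over the script files themselves — this is an illustrative pattern, not the skill's actual implementation:

```r
# Scan every .R script in a directory for machine-specific paths
scan_for_abs_paths <- function(dir) {
  scripts <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  for (f in scripts) {
    lines <- readLines(f)
    # Absolute paths break replication on any other machine
    hits <- grep("(/Users/|/home/|C:\\\\)", lines)
    if (length(hits) > 0)
      cat(sprintf("%s: absolute path on line(s) %s\n",
                  f, paste(hits, collapse = ", ")))
  }
}
```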
Let me show you what this looks like live.

Live Demo

Builder produces analysis, reviewer breaks it

/builder-agent
"I have a YouGov survey dataset in this folder
with an SPSS file, a codebook PDF, and CSV exports.

Explore the data, propose a research question,
model it, visualize it, and put everything in
numbered .R scripts."

Part 5

Hands-On Practice

Exercises

Four exercises — pick your level

1. Just Talk to It

No skills, no setup. Just type.

"Tell me about the files in this folder."

"I have survey data at
~/data/survey-data/processed.csv.
What's in it? How many rows? What countries?"

"Put together a plan for analyzing whether
economic insecurity predicts populist attitudes."

2. Use the Builder

Same question, structured workflow.

/builder-agent
"Does economic insecurity predict populist
attitudes? Use ~/data/survey-data/processed.csv
(~82K rows).
• DV: populism index (pop_1.n through pop_6.n)
• Key IVs: loss aversion (loss_1.n-loss_6.n)
• Controls: age, education (ed_postsec), country
• Explore first, then model, then visualize"

3. Break It

Switch model, audit the builder's work.

Switch to a different model, then:

/reviewer-agent
"Audit everything the builder just produced."

Read review_report.md. What did it find?

4. Build Something You Can Use

Don't read the code. Use it.

"Build a Twitter/X feed interface for a survey
experiment on [YOUR TOPIC].
• 8 posts with treatment variation
• Track likes, retweets, timestamps
• Embed in Qualtrics via iframe
• Single HTML file, no dependencies"

Open it. Like a post. Scroll. Does it work?

Have your own data? Try any of these with your own project.

Share what you built — and what broke

Failures are the most useful thing you can share.

Summary

Three things

  1. Constrain the agent. Your conventions in WARP.md. Numbered scripts. No absolute paths. Statistics computed, not typed.
  2. Review adversarially. A second agent — on a different model — catches what the first misses.
  3. Document by default. Session logs, git commits, computed statistics. The paper trail writes itself.

Part 6

Implications

Equilibrium

The new balance

Costs collapse ↓

  • Data cleaning — parsing, merging, recoding
  • Analysis — models, robustness checks, figures
  • Documentation — READMEs, session logs, LaTeX
  • Verification — adversarial review, stress tests

Value rises ↑

  • Ideas — original questions, creative design
  • Taste — knowing what's worth doing
  • Fieldwork — being there, in person
  • Judgment — knowing when the agent is wrong

Discussion

Questions for the room

  • What R tasks would benefit most from a builder+reviewer pattern?
  • Where should human judgment remain non-negotiable?
  • How do we disclose and document AI assistance in papers?
  • What happens to methods training when agents can run the models?


How to Install and Invoke in Warp

Three steps

1. Download the skill files from the workshop repo into your project:

my-project/
|-- .warp/skills/
|   |-- builder-agent/
|   |   `-- SKILL.md
|   `-- reviewer-agent/
|       `-- SKILL.md
|-- data/
`-- WARP.md

Also works: .agents/skills/, .claude/skills/, .cursor/skills/
Global (all projects): ~/.warp/skills/

2. Type / in Warp's input bar to invoke:

/builder-agent Analyze the populism
  data at data/raw/processed.csv...

Warp auto-discovers skills and shows them in the / menu. You can also just describe your task — the agent finds the right skill.

3. Click the model selector, switch model, invoke reviewer:

# Model selector -> GPT-4o
/reviewer-agent Audit everything
  the builder just produced.

Warp reads the SKILL.md automatically when you invoke it. The agent follows the full procedure and checklist — you never repeat instructions.

Downloadable Skills

Two skills, ready to use

Builder Skill

  • Plans numbered .R script pipeline
  • Reads your conventions from WARP.md
  • Explores data before modeling
  • Exports computed statistics (never hardcoded)
  • Self-checks before finishing

Reviewer Skill

  • Runs scripts from a clean session
  • Checks N matches, figures match tables
  • Searches for hardcoded numbers and absolute paths
  • Can re-estimate independently in another language
  • Produces severity-rated review report

Also includes a conventions-example.md you can customize with your own package and style preferences.

Thank you

Charles Crabtree
Senior Lecturer, School of Social Sciences, Monash University
K-Club Professor, University College, Korea University

charles.crabtree@monash.edu · charlescrabtree.org

Resources