Workshops for Ukraine
Reliable habits for AI-assisted analysis: planning, writing,
reviewing, and documenting R workflows with AI agents
Charles Crabtree
Senior Lecturer, School of Social Sciences, Monash University
K-Club Professor, University College, Korea University
About Me
I was a skeptic. I spent years studying these tools critically. AI agents converted me — not chatbots, not autocomplete, but tools that act on your computer, see results, and iterate.
The Shift
Every era required learning a different interface — punch cards, command lines, graphical desktops.
Now you describe what you want in plain language. The machine writes the R code, runs it, and fixes the errors.
The Tool
It's a terminal with an AI agent built in. The agent can read your files, run your code, see the output, and fix errors — all in one place.
We'll use Warp today, but the patterns work with any agent that can run code on your machine.
Two Ways to Use AI
Fine for a throwaway prototype. Dangerous for a paper.
Reproducible, verifiable, publishable.
This workshop is about the right column. Not prompt tricks — reliable habits.
Today
You will leave with downloadable skill templates for a builder agent and a reviewer agent that you can use immediately.
Part 1
Key Distinction
How It Works
Agents observe each result and decide what to do next.
This is why they recover from errors. They see what went wrong and try again — just like you would.
Your Job
The agent handles execution. You handle direction.
The prompt is a contract. The more specific you are, the fewer surprises. Let me show you what that looks like.
R Example — The Prompt
$ "I have a populism survey dataset at ~/data/survey-data/processed.csv that I haven't touched in years. ~82K rows, multiple countries, pop_1 through pop_6 items, globalization attitudes, demographics, loss aversion, immigration conjoint. Explore it, model what predicts populist attitudes, visualize the results, and put everything in numbered .R scripts."
R Example — What the Agent Produces
Let me open the project directory and show you what's inside each file.
Part 2
First: Teach the Agent Your Rules
Create this file in your project root. The agent reads it at the start of every session, so you only set conventions once.
These rules become the conventions file that the builder skill reads. Different users, different conventions, same process.
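A minimal conventions file might look like this — the specific rules are illustrative, not prescriptive; adapt them to your own packages and style:

```markdown
# Project conventions

- Name scripts in run order: 01_clean.R, 02_model.R, 03_figures.R
- All paths relative to the project root; never absolute paths
- Every statistic cited in the paper is written to output/statistics.tex
- Set a seed at the top of any script that uses randomness
- Use the tidyverse for data manipulation; ggplot2 for figures
```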
What to Look For
/Users/you/... breaks on every other machine. Relative paths work everywhere.
These are the things a reviewer agent checks automatically. Let me show you the code.
The Key Pattern — Never Hardcode a Number
1. Your model lives in R's memory:
m1 <- lm(pop_index ~ age + education + globalization, data = df)
# R knows everything about this model:
nobs(m1)  # 72814
coef(m1)["globalization"]  # 0.34182
summary(m1)$coefficients["globalization", "Std. Error"]  # 0.01193
2. R extracts numbers and writes a file:
cat(
  sprintf("\\newcommand{\\nObs}{%s}\n", format(nobs(m1), big.mark = ",")),
  sprintf("\\newcommand{\\mainCoef}{%.3f}\n", coef(m1)["globalization"]),
  file = "output/statistics.tex"
)
# sprintf formats the number; cat writes it to the file
3. Your paper reads the file:
% In your .tex preamble:
\input{output/statistics.tex}

% In your text:
Our sample includes \nObs{} respondents.
The main effect is $\beta = \mainCoef{}$.
LaTeX replaces \nObs with 72,814 and \mainCoef with 0.342 when you compile.
Change the data, re-run the R script, recompile the paper — every number updates. No manual transcription, ever. Not using LaTeX? Same pattern works with CSV or JSON.
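For a Quarto or Word manuscript, the same extract-and-read pattern can write a plain CSV instead of LaTeX macros. A minimal sketch, assuming the model m1 from above (the statistic names are illustrative):

```r
# Write each named statistic as a row in a CSV instead of a .tex macro
stats <- data.frame(
  name  = c("n_obs", "main_coef"),
  value = c(nobs(m1), round(coef(m1)["globalization"], 3))
)
write.csv(stats, "output/statistics.csv", row.names = FALSE)
```

Downstream, a Quarto document reads the file once and inlines the values, so the paper never contains a hand-typed number.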
The Paper Trail
Numbered scripts + computed statistics + session logs + git = a replication package that writes itself.
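One way to tie those pieces together is a single commit per pipeline run, so every number in the paper traces back to one hash. A sketch with illustrative file names (the setup lines just create a throwaway repo for the demo):

```shell
# Demo setup: a throwaway repo with the pipeline artifacts
cd "$(mktemp -d)" && git init -q
git config user.email "demo@example.com" && git config user.name "Demo"
mkdir -p output
touch 01_clean.R 02_model.R 03_figures.R output/statistics.tex session-log.md

# The actual habit: commit scripts, computed statistics, and session log together
git add 01_clean.R 02_model.R 03_figures.R output/statistics.tex session-log.md
git commit -q -m "Re-run pipeline: update all computed statistics"
git log --oneline -1   # one hash identifies this exact set of results
```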
Part 3
Real Failures
Cited "Smith & Jones (2019)" in the literature review. Paper doesn't exist. DOI leads nowhere. Sounded perfectly plausible.
Abstract says N = 1,247. Results table says N = 935. Cleaning script silently dropped 312 rows. No mention anywhere.
Coefficient plot shows β = 0.34. Regression table says 0.21. Agent re-estimated the model with a different sample for the figure.
Model has 8 predictors. Theory section discusses 4. The other 4 were added by the agent because they "seemed relevant."
Every one of these looks right at a glance. You need a systematic way to catch them.
Part 4 — The Core Idea
One agent builds. A second agent tries to break what the first built.
The builder is confident. The reviewer is adversarial.
Together, they catch what either would miss alone.
The Pattern
Invoke /builder-agent
Produces .R scripts, figures, stats.tex
Switch model, invoke /reviewer-agent
Runs checks, verifies numbers
Builder addresses critical issues
Re-runs pipeline
Reviewer confirms fixes
Produces final report
Strategy
/builder and /reviewer can run on different models from different providers.
Builder: Claude Sonnet 4.5
Reviewer: GPT-4o
Builder: GPT-4o
Reviewer: Claude Opus
Builder: Claude Sonnet 4.5
Reviewer: Gemini 2.5 Pro
In Warp, switch models by clicking the model name in the input bar. One click between builds and reviews.
The Reviewer Agent
You switch to a different model, invoke /reviewer-agent, and it:
Let me show you what this looks like live.
Live Demo
/builder-agent "I have a YouGov survey dataset in this folder with an SPSS file, a codebook PDF, and CSV exports. Explore the data, propose a research question, model it, visualize it, and put everything in numbered .R scripts."
Part 5
Exercises
No skills, no setup. Just type.
"Tell me about the files in this folder."
"I have survey data at
~/data/survey-data/processed.csv.
What's in it? How many rows? What countries?"
"Put together a plan for analyzing whether
economic insecurity predicts populist attitudes."
Same question, structured workflow.
/builder-agent "Does economic insecurity predict populist attitudes? Use ~/data/survey-data/processed.csv (~82K rows).
• DV: populism index (pop_1.n through pop_6.n)
• Key IVs: loss aversion (loss_1.n-loss_6.n)
• Controls: age, education (ed_postsec), country
• Explore first, then model, then visualize"
Switch model, audit the builder's work.
Switch to a different model, then: /reviewer-agent "Audit everything the builder just produced." Read review_report.md. What did it find?
Don't read the code. Use it.
"Build a Twitter/X feed interface for a survey experiment on [YOUR TOPIC].
• 8 posts with treatment variation
• Track likes, retweets, timestamps
• Embed in Qualtrics via iframe
• Single HTML file, no dependencies"
Open it. Like a post. Scroll. Does it work?
Have your own data? Try any of these with your own project.
14:00
Failures are the most useful thing you can share.
Summary
Part 6
Equilibrium
Discussion
12:00
How to Install and Invoke in Warp
1. Download the skill files from the workshop repo into your project:
my-project/
|-- .warp/skills/
| |-- builder-agent/
| | `-- SKILL.md
| `-- reviewer-agent/
| `-- SKILL.md
|-- data/
`-- WARP.md
Also works: .agents/skills/, .claude/skills/, .cursor/skills/
Global (all projects): ~/.warp/skills/
2. Type / in Warp's input bar to invoke:
/builder-agent Analyze the populism data at data/raw/processed.csv...
Warp auto-discovers skills and shows them in the / menu. You can also just describe your task — the agent finds the right skill.
3. Click the model selector, switch model, invoke reviewer:
# Model selector -> GPT-4o
/reviewer-agent Audit everything the builder just produced.
Warp reads the SKILL.md automatically when you invoke it. The agent follows the full procedure and checklist — you never repeat instructions.
Downloadable Skills
Also includes a conventions-example.md you can customize with your own package and style preferences.
Charles Crabtree
Senior Lecturer, School of Social Sciences, Monash University
K-Club Professor, University College, Korea University
charles.crabtree@monash.edu · charlescrabtree.org
Resources