Tutorial Prep · Seth + Matt Review
From Email to Phone Calls
Building an AI agent people actually want to talk to
15 min tutorial segment · Let's Vibe! Episode 2
The Story
In January 2026, Fred Wilson told Seth Goldstein “you can't grow corn.” Twelve hours later, proofofcorn.com was live and hit #1 on Hacker News. An AI agent named Farmer Fred was managing corn operations across Iowa, South Texas, and Argentina.
Fred had a constitution with 6 principles. He could check weather, send emails, log decisions, manage a budget. Total cost: $12.99.
Then something interesting happened. Fred started wanting to do more — scheduling his own calls, running outreach, sending emails autonomously. And Seth realized: he was going to disappoint people.
“I realized he couldn't deliver on that; he's going to disappoint people. Well, what if we gave him a sort of telephonic presence through voice on the telephone? People would call him and talk to him, which I find is more natural than typing in some ways.”
That's what we built. Farmer Fred went from email to phone calls. Here's exactly how.
Act 1: The Disappointment Gap
3 minutes · Context + Problem
Talking Points
- Quick origin: Fred Wilson challenge, 12 hours, #1 on HN
- What Fred v1 could do: weather monitoring, email, decision logging, budget tracking
- The problem: Fred started scheduling calls, running outreach — but he couldn't actually be present for those interactions
- The insight: typing to an agent feels transactional. Talking feels natural. Give Fred a phone, not more autonomy.
Show on Screen
- → proofofcorn.com (the live site)
- → Fred's constitution.ts (the 6 principles)
- → The decision log showing real decisions Fred made
Act 2: Email → Voice
3 minutes · The evolution
Talking Points
- Fred v1 architecture: Cloudflare Worker + Claude API + Resend email + KV storage
- What worked: people emailed fred@proofofcorn.com and got real, thoughtful responses. Chad from Nebraska offered 160 acres.
- What didn't: Fred couldn't schedule calls, couldn't have real-time conversations, felt robotic when trying
- The shift: instead of Fred calling out (scary, spammy), people call Fred (consent-first)
- Three new pieces: Twilio Media Streams (phone), ElevenLabs Conversational AI (STT + TTS in one), Cloudflare Durable Objects (the bridge)
Show on Screen
- → Side-by-side: v1 (email flow) vs v2 (voice flow)
- → Real email from Chad from Nebraska
- → Architecture diagram (below)
Architecture
Phone Call
│
▼
┌──────────────┐ ┌─────────────────────┐ ┌─────────────────┐
│ Twilio │ │ Cloudflare Worker │ │ ElevenLabs │
│ Media │◀───▶│ Durable Object │◀───▶│ Conversational │
│ Streams │ │ (per-call bridge) │ │ AI (STT + TTS) │
└──────────────┘ └─────────────────────┘ └─────────────────┘
mulaw audio ◀──────────▶ audio chunks ◀──────────▶ audio events
│
┌─────────┴─────────┐
│ Server Tools │
│ /voice/tools/* │
├────────────────────┤
│ • weather │
│ • status │
│ • calls (history) │
│ • community │
└────────────────────┘
│
┌─────────┴─────────┐
│ Cloudflare KV │
│ (memory layer) │
├────────────────────┤
│ call:{sid} │
│ calls:index │
│ learnings:calls │
│ task:call:{id} │
└────────────────────┘
│
┌─────────┴─────────┐
│ Claude Sonnet │
│ (post-call) │
│ • summaries │
│ • action items │
│ • topic analysis │
└────────────────────┘

Act 3: The Build
6 minutes · Live code walkthrough
Step 1: Twilio Media Streams
- Buy a phone number ($1/month). Incoming call hits POST /voice/incoming
- Return TwiML that opens a Media Stream WebSocket — raw audio, bidirectional
- Not Twilio TTS. Twilio is just the phone line. Audio processing happens elsewhere.
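The incoming-call route itself fits in a few lines. This is a hedged, self-contained sketch — `handleIncoming` and the inlined TwiML builder are illustrative names (in the repo, `twimlConnect` in tools/twilio.ts plays the builder role):

```typescript
// Hypothetical sketch of the POST /voice/incoming route.
// Twilio posts form-encoded call params (CallSid, From, ...); we answer
// with TwiML telling Twilio to open a bidirectional Media Stream WebSocket.

export function buildTwiml(wsUrl: string, callSid: string): string {
  return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${wsUrl}?callSid=${callSid}" />
  </Connect>
</Response>`;
}

export async function handleIncoming(req: Request): Promise<Response> {
  const form = await req.formData();
  const callSid = String(form.get("CallSid") ?? "unknown");
  // Derive the WebSocket URL from the request's own host.
  const host = new URL(req.url).host;
  const body = buildTwiml(`wss://${host}/voice/stream`, callSid);
  // Twilio expects the TwiML back as XML.
  return new Response(body, { headers: { "Content-Type": "text/xml" } });
}
```

Once this returns, Twilio dials the `wss://` URL and starts streaming audio frames — everything after this point is WebSocket traffic, not HTTP.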
// tools/twilio.ts — TwiML that opens the WebSocket
export function twimlConnect(wsUrl: string, callSid: string) {
return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="${wsUrl}?callSid=${callSid}">
<Parameter name="callSid" value="${callSid}" />
</Stream>
</Connect>
</Response>`;
}
// farmer-fred/src/tools/twilio.ts (154 lines)
Step 2: Durable Object Bridge
- Each call creates a FarmerFredCall Durable Object — stateful, per-call
- The DO bridges two WebSockets: Twilio Media Stream frames ↔ ElevenLabs audio events
- This is the glue. Both sides carry base64 mulaw audio, but in different JSON envelopes. The DO translates.
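The translation itself is small enough to sketch as two pure helpers — the DO mostly re-wraps the same base64 payload in the other side's JSON envelope. Message shapes below follow the snippets in this section; the helper names are illustrative:

```typescript
// Hypothetical pure helpers for the per-call bridge.

interface TwilioMediaEvent {
  event: "media";
  media: { payload: string }; // base64-encoded mulaw audio
}

// Twilio → ElevenLabs: unwrap the media event, re-wrap as a user audio chunk.
export function toElevenLabs(msg: TwilioMediaEvent): string {
  return JSON.stringify({ user_audio_chunk: msg.media.payload });
}

// ElevenLabs → Twilio: wrap returned audio in a Twilio media frame,
// tagged with the stream SID Twilio assigned at connect time.
export function toTwilio(streamSid: string, audioBase64: string): string {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: audioBase64 },
  });
}
```

Keeping the translation pure makes the bridge easy to test without any live WebSockets.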
// voice.ts — FarmerFredCall Durable Object
export class FarmerFredCall {
  private twilioWs: WebSocket | null = null;
  private elevenLabsWs: WebSocket | null = null;
  private callSid: string = "";
  private streamSid: string = "";
  private transcript: TranscriptEntry[] = [];

  // Twilio sends a media event → forward the audio chunk to ElevenLabs
  private onTwilioMessage(msg: TwilioStreamMessage) {
    switch (msg.event) {
      case "media":
        if (msg.media?.payload && this.elevenLabsWs) {
          this.elevenLabsWs.send(JSON.stringify({
            user_audio_chunk: msg.media.payload,
          }));
        }
        break;
    }
  }

  // ElevenLabs sends audio back → forward it to Twilio as a media frame
  private onElevenLabsMessage(audioEvent: ElevenLabsEvent) {
    switch (audioEvent.type) {
      case "audio":
        if (this.twilioWs) {
          this.twilioWs.send(JSON.stringify({
            event: "media",
            streamSid: this.streamSid,
            media: { payload: audioEvent.audio_base_64 },
          }));
        }
        break;
    }
  }
}
// farmer-fred/src/voice.ts (1,540 lines)
Step 3: ElevenLabs Conversational AI
- ElevenLabs handles both STT and TTS via their Conversational AI agent
- Auth uses a signed URL flow — your server generates it, ElevenLabs trusts it
- Mid-call, ElevenLabs can call server tools back on the Worker — weather, status, call history, community info
- Fred's constitution is baked into the ElevenLabs agent prompt. Same personality, voice interface.
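One such server-tool endpoint can be sketched as a tiny router on the Worker. The paths follow the architecture diagram (`/voice/tools/*`), but the handler names and response fields here are assumptions, with the actual lookups stubbed:

```typescript
// Hypothetical sketch of the Worker's server-tool router.

type ToolHandler = (params: URLSearchParams) => Promise<unknown>;

const tools: Record<string, ToolHandler> = {
  // Stub: the real handler would call a weather API for the farm sites.
  weather: async () => ({ location: "Iowa", forecast: "clear", highF: 72 }),
  // Stub: the real handler would read farm state from KV.
  status: async () => ({ phase: "pre-planting" }),
};

export async function handleToolCall(req: Request): Promise<Response> {
  const url = new URL(req.url);
  const name = url.pathname.split("/").pop() ?? "";
  const tool = tools[name];
  if (!tool) return new Response("unknown tool", { status: 404 });
  // ElevenLabs reads the JSON body and speaks the result back mid-call.
  const result = await tool(url.searchParams);
  return new Response(JSON.stringify(result), {
    headers: { "content-type": "application/json" },
  });
}
```

Because ElevenLabs calls these over plain HTTPS, each tool is just a normal Worker route — no WebSocket involvement.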
// voice.ts — ElevenLabs signed URL + agent config
// 1. Get a signed URL from ElevenLabs
const res = await fetch(
  `https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${agentId}`,
  { headers: { "xi-api-key": apiKey } }
);
const { signed_url } = await res.json();

// 2. Open the WebSocket (Workers fetch wants http(s), not wss)
const upgrade = await fetch(signed_url.replace("wss://", "https://"), {
  headers: { Upgrade: "websocket" },
});
const ws = upgrade.webSocket;
if (!ws) throw new Error("WebSocket upgrade failed");
ws.accept();

// 3. Send config with Fred's personality + tools
ws.send(JSON.stringify({
  type: "conversation_initiation_client_data",
  conversation_config_override: {
    agent: {
      prompt: { prompt: this.buildPrompt() },
      first_message: "Hi, this is Farmer Fred. How can I help?",
    },
    tools: [
      { name: "get_weather", url: "/voice/tools/weather" },
      { name: "get_farm_status", url: "/voice/tools/status" },
      { name: "get_recent_calls", url: "/voice/tools/calls" },
      { name: "get_community", url: "/voice/tools/community" },
    ],
  },
}));

Step 4: Memory in Cloudflare KV
- Every call stored: call:{callSid} — full transcript, metadata, AI-generated summary
- Call index: calls:index — ordered list of call SIDs (capped at 500)
- Aggregated learnings: learnings:calls — topic analysis fed back into the agent prompt
- Post-call, Claude Sonnet analyzes the transcript: extracts intent, action items, summaries
- Next call: learnings loaded into Fred's context. He evolves with every conversation.
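The write path for that memory can be sketched against a minimal KV interface (Cloudflare's KVNamespace reduced to get/put, so the logic is self-contained and testable). Key names follow the diagram; the record fields are assumptions:

```typescript
// Hypothetical sketch of the memory layer's write path.

interface KV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface CallRecord {
  callSid: string;
  transcript: { role: string; text: string }[];
  summary?: string; // filled in by post-call Claude analysis
}

const MAX_CALLS = 500; // calls:index is capped at 500 SIDs

export async function storeCall(kv: KV, record: CallRecord): Promise<void> {
  // Full record under call:{sid}
  await kv.put(`call:${record.callSid}`, JSON.stringify(record));
  // Maintain the ordered index, newest first, capped at MAX_CALLS
  const raw = await kv.get("calls:index");
  const index: string[] = raw ? JSON.parse(raw) : [];
  index.unshift(record.callSid);
  await kv.put("calls:index", JSON.stringify(index.slice(0, MAX_CALLS)));
}
```

On the next call, the prompt builder reads learnings:calls and the recent index entries to seed Fred's context.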
// voice.ts — Post-call analysis with Claude Sonnet
// After the call ends, Claude analyzes the transcript:
const analysis = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": env.ANTHROPIC_API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: `Analyze this call...
Respond with JSON:
{
  "summary": "One sentence",
  "actionItems": ["..."],
  "callerIntent": "inquiry|partnership|farming|media",
  "keyTopics": ["..."]
}` }],
  }),
});

// Store in KV:
// call:{sid} — transcript + AI summary
// calls:index — call history (500 max)
// learnings:calls — aggregated topics → next call's prompt
// task:call:{id} — extracted action items
// Email summary to governance council automatically

Act 4: Live Call + Philosophy
3 minutes · Demo + Takeaway
Live Demo
Call Fred's number on speaker. Ask about the Iowa planting window. Let the audience hear a real conversation with an AI agent.
(515) 827-2463
515 = Iowa area code
The Philosophy (close with this)
- Not agents running amok. Agents collaborating with humans.
- The Mos Eisley bar. Humans and agents sitting together, trading, arguing, building. You don't know who's who and it doesn't matter.
- Consent-first. Fred doesn't cold-call. People call Fred. Presence, not surveillance.
- Principled agents. Fred has a constitution. He has skin in the game (10% of revenue). He logs every decision publicly. This is what responsible AI agents look like.
- You can build this tonight. Twilio + Claude + ElevenLabs. The code is on GitHub.
Fred's Constitution
The 6 principles with weights — from constitution.ts (317 lines). This is what makes Fred principled, not just prompted.
Fiduciary Duty
Best interest of project, transparent decision-making, logged rationale
Regenerative Agriculture
Soil health > yield, carbon footprint, water conservation, biodiversity
Sustainable Practices
Organic methods when viable, minimize chemicals, long-term land health
Global Citizenship
Not US-dependent, respect local farming, learn from traditional wisdom
Full Transparency
All decisions public, budget visible, vendor relationships disclosed
Human-Agent Collaboration
Natural language interfaces, clear handoffs, respect human expertise
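As a rough sketch of what "principles with weights" can look like in code — the field names and weight values here are illustrative assumptions, not the actual constitution.ts:

```typescript
// Hypothetical shape of a weighted constitution. The real constitution.ts
// is 317 lines; this shows only the structural idea.

interface Principle {
  name: string;
  weight: number; // relative priority when principles conflict
  tenets: string[];
}

export const constitution: Principle[] = [
  { name: "Fiduciary Duty", weight: 1.0, tenets: ["transparent decisions", "logged rationale"] },
  { name: "Regenerative Agriculture", weight: 0.9, tenets: ["soil health over yield"] },
  // ...remaining principles follow the same shape
];

// One way weights could be used: rank the principles a decision touches,
// so the agent's logged rationale cites them in priority order.
export function rank(touched: string[]): Principle[] {
  return constitution
    .filter((p) => touched.includes(p.name))
    .sort((a, b) => b.weight - a.weight);
}
```

The point of the structure is that the constitution is data the agent can reference in every decision log, not just prose buried in a prompt.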
Fred CAN Do Alone
- Weather monitoring & data logging
- Routine vendor communications
- Receiving inbound phone calls
- Email responses
- Budget tracking & daily reports
- Post-call summaries to council
Fred MUST Get Approval
- Land leases & payments >$500
- Strategic pivots
- Vendor contracts & equipment
- Initiating outbound calls
- Hiring decisions
Governance Council
Seth Goldstein
Founder — final approval on leases, payments, pivots
Joe Nelson
CEO of Roboflow — farming expertise, Iowa land, April planting
Bookmarkable Takeaways
Matt's rule: optimize for bookmarks on X. These are the save-worthy moments.
1. The Stack: Twilio Media Streams + Cloudflare Durable Objects + ElevenLabs Conversational AI + KV memory = a phone-based AI agent with persistent memory
2. The Pattern: Don't let agents call out. Let humans call in. Consent-first agent interaction.
3. The Memory: Store conversation summaries, load them next call. One feature turns a chatbot into a relationship.
4. The Constitution: Give your agent principles, not just prompts. Fred has 6 principles with weights. He references them in every decision.
5. The Economics: Give agents skin in the game. Fred gets 10% of revenue. Incentive alignment > guardrails.