Tutorial Prep · Seth + Matt Review

From Email to Phone Calls

Building an AI agent people actually want to talk to

15 min tutorial segment · Let's Vibe! Episode 2

The Story

In January 2026, Fred Wilson told Seth Goldstein “you can't grow corn.” Twelve hours later, proofofcorn.com was live and hit #1 on Hacker News. An AI agent named Farmer Fred was managing corn operations across Iowa, South Texas, and Argentina.

Fred had a constitution with 6 principles. He could check weather, send emails, log decisions, manage a budget. Total cost: $12.99.

Then something interesting happened. Fred started wanting to do more — scheduling his own calls, running outreach, sending emails autonomously. And Seth realized: he was going to disappoint people.

“I realized he couldn't deliver on that; he's going to disappoint people. Well, what if we gave him a sort of telephonic presence through voice on the telephone? People would call him and talk to him, which I find is more natural than typing in some ways.”

That's what we built. Farmer Fred went from email to phone calls. Here's exactly how.

00:00

Act 1: The Disappointment Gap

3 minutes · Context + Problem

Talking Points

  • Quick origin: Fred Wilson challenge, 12 hours, #1 on HN
  • What Fred v1 could do: weather monitoring, email, decision logging, budget tracking
  • The problem: Fred started scheduling calls, running outreach — but he couldn't actually be present for those interactions
  • The insight: typing to an agent feels transactional. Talking feels natural. Give Fred a phone, not more autonomy.

Show on Screen

  • → proofofcorn.com (the live site)
  • → Fred's constitution.ts (the 6 principles)
  • → The decision log showing real decisions Fred made
03:00

Act 2: Email → Voice

3 minutes · The evolution

Talking Points

  • Fred v1 architecture: Cloudflare Worker + Claude API + Resend email + KV storage
  • What worked: people emailed fred@proofofcorn.com and got real, thoughtful responses. Chad from Nebraska offered 160 acres.
  • What didn't: Fred couldn't follow through on the calls he tried to schedule, couldn't hold real-time conversations, and came across as robotic when he tried
  • The shift: instead of Fred calling out (scary, spammy), people call Fred (consent-first)
  • Three new pieces: Twilio Media Streams (phone), ElevenLabs Conversational AI (STT + TTS in one), Cloudflare Durable Objects (the bridge)

Show on Screen

  • → Side-by-side: v1 (email flow) vs v2 (voice flow)
  • → Real email from Chad from Nebraska
  • → Architecture diagram (below)

Architecture

Phone Call
    │
    ▼
┌──────────────┐     ┌─────────────────────┐     ┌─────────────────┐
│   Twilio     │     │  Cloudflare Worker   │     │  ElevenLabs     │
│  Media       │◀───▶│  Durable Object      │◀───▶│  Conversational │
│  Streams     │     │  (per-call bridge)   │     │  AI (STT + TTS) │
└──────────────┘     └─────────────────────┘     └─────────────────┘
  mulaw audio ◀──────────▶ audio chunks ◀──────────▶ audio events
                              │
                    ┌─────────┴─────────┐
                    │   Server Tools     │
                    │  /voice/tools/*    │
                    ├────────────────────┤
                    │ • weather          │
                    │ • status           │
                    │ • calls (history)  │
                    │ • community        │
                    └────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │  Cloudflare KV     │
                    │  (memory layer)    │
                    ├────────────────────┤
                    │ call:{sid}         │
                    │ calls:index        │
                    │ learnings:calls    │
                    │ task:call:{id}     │
                    └────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │  Claude Sonnet     │
                    │  (post-call)       │
                    │  • summaries       │
                    │  • action items    │
                    │  • topic analysis  │
                    └────────────────────┘
06:00

Act 3: The Build

6 minutes · Live code walkthrough

Step 1: Twilio Media Streams

  • Buy a phone number ($1/month). Incoming call hits POST /voice/incoming
  • Return TwiML that opens a Media Stream WebSocket — raw audio, bidirectional
  • Not Twilio TTS. Twilio is just the phone line. Audio processing happens elsewhere.

// tools/twilio.ts — TwiML that opens the WebSocket

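export function twimlConnect(wsUrl: string, callSid: string) {
  // Escape values before placing them inside XML attributes.
  const esc = (s: string) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
  // Twilio's <Stream url> does not accept query strings, so callSid travels
  // as a <Parameter> and arrives in the "start" message's customParameters.
  return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${esc(wsUrl)}">
      <Parameter name="callSid" value="${esc(callSid)}" />
    </Stream>
  </Connect>
</Response>`;
}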

// farmer-fred/src/tools/twilio.ts (154 lines)
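
Once the stream opens, Twilio sends JSON text frames over the WebSocket. The following is a minimal sketch of reading them, assuming Twilio's documented connected / start / media / stop message shapes. The names parseTwilioFrame and StreamEvent are illustrative and are not taken from the repo. The Parameter values set in the TwiML arrive under start.customParameters.

```typescript
// Illustrative types for the subset of Twilio Media Streams events used here.
type StreamEvent =
  | { kind: "start"; streamSid: string; callSid: string; params: Record<string, string> }
  | { kind: "media"; payload: string } // base64-encoded mulaw, 8 kHz
  | { kind: "stop" }
  | { kind: "ignored" };

// parseTwilioFrame is a hypothetical helper name.
function parseTwilioFrame(raw: string): StreamEvent {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { kind: "ignored" }; // not JSON, nothing to do
  }
  switch (msg?.event) {
    case "start":
      return {
        kind: "start",
        streamSid: String(msg.start?.streamSid ?? ""),
        callSid: String(msg.start?.callSid ?? ""),
        params: msg.start?.customParameters ?? {},
      };
    case "media":
      return typeof msg.media?.payload === "string"
        ? { kind: "media", payload: msg.media.payload }
        : { kind: "ignored" };
    case "stop":
      return { kind: "stop" };
    default:
      return { kind: "ignored" }; // e.g. "connected" or "mark"
  }
}
```

Keeping the parser pure (string in, tagged value out) makes it easy to test without a live call.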

Step 2: Durable Object Bridge

  • Each call creates a FarmerFredCall Durable Object — stateful, per-call
  • The DO bridges two WebSockets: Twilio mulaw audio ↔ ElevenLabs audio chunks
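  • This is the glue. Both sides carry base64 audio, but in different JSON envelopes: Twilio's media events (mulaw, 8 kHz) on one side, ElevenLabs' user_audio_chunk and audio events on the other. The DO re-wraps each payload. Passing audio straight through like this assumes the ElevenLabs agent is configured for the same mulaw 8 kHz format.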

// voice.ts — FarmerFredCall Durable Object

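export class FarmerFredCall {
  private twilioWs: WebSocket | null = null;
  private elevenLabsWs: WebSocket | null = null;
  private callSid: string = "";
  private streamSid: string = ""; // set from Twilio's "start" event
  private transcript: TranscriptEntry[] = []; // type not shown in this excerpt

  // Excerpt of the Twilio message handler (method names here are illustrative)
  private onTwilioMessage(msg: any) {
    switch (msg.event) {
      case "start":
        this.streamSid = msg.start.streamSid;
        break;
      // Twilio sends media event → forward to ElevenLabs
      case "media":
        if (msg.media?.payload && this.elevenLabsWs) {
          this.elevenLabsWs.send(JSON.stringify({
            user_audio_chunk: msg.media.payload,
          }));
        }
        break;
    }
  }

  // Excerpt of the ElevenLabs message handler
  private onElevenLabsMessage(msg: any) {
    switch (msg.type) {
      // ElevenLabs sends audio back → forward to Twilio
      case "audio":
        if (this.twilioWs) {
          this.twilioWs.send(JSON.stringify({
            event: "media",
            streamSid: this.streamSid,
            media: { payload: msg.audio_event.audio_base_64 },
          }));
        }
        break;
    }
  }
}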

// farmer-fred/src/voice.ts (1,540 lines)
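
The excerpt above forwards audio only. A bridge like this usually has to handle a few more agent events as well. The sketch below assumes the ElevenLabs audio, interruption, and ping event shapes and Twilio's clear message as described in their public docs. The translateAgentEvent name and the BridgeOutput type are hypothetical, not the repo's.

```typescript
// Frames to send in each direction in response to one ElevenLabs event.
interface BridgeOutput {
  toTwilio: object[];
  toElevenLabs: object[];
}

// translateAgentEvent is a hypothetical name for illustration.
function translateAgentEvent(raw: string, streamSid: string): BridgeOutput {
  const out: BridgeOutput = { toTwilio: [], toElevenLabs: [] };
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case "audio":
      // Re-wrap the base64 audio in Twilio's media envelope.
      out.toTwilio.push({
        event: "media",
        streamSid,
        media: { payload: msg.audio_event.audio_base_64 },
      });
      break;
    case "interruption":
      // Caller spoke over Fred: ask Twilio to drop audio it has buffered.
      out.toTwilio.push({ event: "clear", streamSid });
      break;
    case "ping":
      // Keep-alive: echo the event id back to ElevenLabs.
      out.toElevenLabs.push({ type: "pong", event_id: msg.ping_event.event_id });
      break;
  }
  return out;
}
```

Returning the frames instead of sending them keeps the translation logic separate from the two sockets, so the Durable Object only has to loop over the result and call send.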

Step 3: ElevenLabs Conversational AI

  • ElevenLabs handles both STT and TTS via their Conversational AI agent
  • Auth uses a signed URL flow: your server requests the URL from ElevenLabs with its API key, then opens the WebSocket with it, so the key itself stays on the server
  • Mid-call, ElevenLabs can call server tools back on the Worker — weather, status, call history, community info
  • Fred's constitution is baked into the ElevenLabs agent prompt. Same personality, voice interface.

// voice.ts — ElevenLabs signed URL + agent config

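// 1. Get signed URL from ElevenLabs
const res = await fetch(
  `https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${agentId}`,
  { headers: { "xi-api-key": apiKey } }
);
if (!res.ok) throw new Error(`Signed URL request failed: ${res.status}`);
const { signed_url } = (await res.json()) as { signed_url: string };

// 2. Open WebSocket. Workers open outbound sockets via fetch + Upgrade header;
//    fetch expects an http(s) URL, so the wss:// scheme is swapped first.
const upstream = await fetch(signed_url.replace(/^wss:/, "https:"), {
  headers: { Upgrade: "websocket" },
});
const ws = upstream.webSocket;
if (!ws) throw new Error("ElevenLabs did not accept the WebSocket upgrade");
ws.accept();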

// 3. Send config with Fred's personality + tools
ws.send(JSON.stringify({
  type: "conversation_initiation_client_data",
  conversation_config_override: {
    agent: {
      prompt: { prompt: this.buildPrompt() },
      first_message: "Hi, this is Farmer Fred. How can I help?",
    },
    tools: [
      { name: "get_weather", url: "/voice/tools/weather" },
      { name: "get_farm_status", url: "/voice/tools/status" },
      { name: "get_recent_calls", url: "/voice/tools/calls" },
      { name: "get_community", url: "/voice/tools/community" },
    ],
  },
}));
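
Server tools are ordinary HTTP endpoints that the agent calls mid-conversation. A rough sketch of dispatching /voice/tools/* by path follows. The handler bodies and returned fields are placeholders invented for illustration; real handlers would be async and would read from KV or a weather API. dispatchTool and toolHandlers are hypothetical names.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Record<string, unknown>;

// Placeholder handlers: the returned fields are stubs, not real data.
const toolHandlers: Record<string, ToolHandler> = {
  weather: (args) => ({ region: args.region ?? "iowa", note: "stub: look up forecast here" }),
  status: () => ({ note: "stub: read farm status here" }),
  calls: () => ({ note: "stub: read calls:index here" }),
  community: () => ({ note: "stub: read community info here" }),
};

const TOOL_PREFIX = "/voice/tools/";

function dispatchTool(
  pathname: string,
  args: ToolArgs,
): { status: number; body: Record<string, unknown> } {
  if (!pathname.startsWith(TOOL_PREFIX)) {
    return { status: 404, body: { error: "not a tool route" } };
  }
  const name = pathname.slice(TOOL_PREFIX.length);
  // Own-property check so names like "toString" never resolve to a handler.
  const known = Object.prototype.hasOwnProperty.call(toolHandlers, name);
  if (!known) {
    return { status: 404, body: { error: `unknown tool: ${name}` } };
  }
  return { status: 200, body: toolHandlers[name](args) };
}
```

In a Worker, the result would be wrapped in a JSON Response so the agent can read it back to the caller.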

Step 4: Memory in Cloudflare KV

  • Every call stored: call:{callSid} — full transcript, metadata, AI-generated summary
  • Call index: calls:index — ordered list of call SIDs (capped at 500)
  • Aggregated learnings: learnings:calls — topic analysis fed back into the agent prompt
  • Post-call, Claude Sonnet analyzes the transcript: extracts intent, action items, summaries
  • Next call: learnings loaded into Fred's context. He evolves with every conversation.
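
The index cap and the prompt feedback loop can be sketched as two pure helpers. This is an illustration under assumptions: newest-first ordering, the five-topic limit, and the function names addToCallIndex and withLearnings are all guesses, since the notes only say the index is ordered and capped at 500.

```typescript
const MAX_INDEXED_CALLS = 500; // cap stated in the notes

// Newest-first ordering is an assumption; the notes only say "ordered".
function addToCallIndex(
  index: string[],
  callSid: string,
  cap: number = MAX_INDEXED_CALLS,
): string[] {
  const withoutDuplicate = index.filter((sid) => sid !== callSid);
  return [callSid, ...withoutDuplicate].slice(0, cap);
}

// Hypothetical: append aggregated topics to the base prompt for the next call.
function withLearnings(basePrompt: string, topics: string[], limit: number = 5): string {
  if (topics.length === 0) return basePrompt;
  const lines = topics.slice(0, limit).map((topic) => `- ${topic}`);
  return `${basePrompt}\n\nTopics callers have raised recently:\n${lines.join("\n")}`;
}
```

In the Worker these would sit between a KV get and put on calls:index and learnings:calls. KV has no atomic read-modify-write, so two calls ending at the same moment could overwrite each other's index update; a Durable Object is one place to serialize that if it matters.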

// voice.ts — Post-call analysis with Claude Sonnet

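// After call ends, Claude analyzes the transcript:
const analysis = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": anthropicApiKey, // variable name assumed; load it from a Worker secret
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024, // required by the Messages API; this value is a guess
    messages: [{ role: "user", content: `Analyze this call...
      Respond with JSON:
      {
        "summary": "One sentence",
        "actionItems": ["..."],
        "callerIntent": "inquiry|partnership|farming|media",
        "keyTopics": ["..."]
      }`
    }],
  }),
});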

// Store in KV:
// call:{sid}         — transcript + AI summary
// calls:index        — call history (500 max)
// learnings:calls    — aggregated topics → next call's prompt
// task:call:{id}     — extracted action items

// Email summary to governance council automatically
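
Model output is not guaranteed to be clean JSON, so the reply is worth validating before it goes into KV. A minimal sketch, assuming the Messages API response carries its text in content blocks of type "text". parseAnalysis and the CallAnalysis type are hypothetical names that mirror the JSON shape requested in the prompt.

```typescript
type CallerIntent = "inquiry" | "partnership" | "farming" | "media";

interface CallAnalysis {
  summary: string;
  actionItems: string[];
  callerIntent: CallerIntent;
  keyTopics: string[];
}

const INTENTS: string[] = ["inquiry", "partnership", "farming", "media"];

// Takes the parsed response body; returns null when the reply is unusable.
function parseAnalysis(
  responseBody: { content?: { type: string; text?: string }[] },
): CallAnalysis | null {
  const text = responseBody.content?.find((block) => block.type === "text")?.text;
  if (!text) return null;
  // Tolerate prose around the JSON object by slicing from first "{" to last "}".
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const data = JSON.parse(text.slice(start, end + 1));
    if (typeof data.summary !== "string") return null;
    if (INTENTS.indexOf(data.callerIntent) === -1) return null;
    const strings = (value: unknown): string[] =>
      Array.isArray(value) ? value.filter((x): x is string => typeof x === "string") : [];
    return {
      summary: data.summary,
      actionItems: strings(data.actionItems),
      callerIntent: data.callerIntent as CallerIntent,
      keyTopics: strings(data.keyTopics),
    };
  } catch {
    return null; // malformed JSON
  }
}
```

Returning null on anything unexpected lets the caller store the raw transcript anyway and skip the summary, instead of failing the whole post-call step.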
12:00

Act 4: Live Call + Philosophy

3 minutes · Demo + Takeaway

Live Demo

Call Fred's number on speaker. Ask about the Iowa planting window. Let the audience hear a real conversation with an AI agent.

(515) 827-2463

515 = Iowa area code

The Philosophy (close with this)

  • Not agents running amok. Agents collaborating with humans.
  • The Mos Eisley bar. Humans and agents sitting together, trading, arguing, building. You don't know who's who and it doesn't matter.
  • Consent-first. Fred doesn't cold-call. People call Fred. Presence, not surveillance.
  • Principled agents. Fred has a constitution. He has skin in the game (10% of revenue). He logs every decision publicly. This is what responsible AI agents look like.
  • You can build this tonight. Twilio + Claude + ElevenLabs. The code is on GitHub.
REFERENCE

Fred's Constitution

The 6 principles with weights — from constitution.ts (317 lines). This is what makes Fred principled, not just prompted.

  • 1.0 · Fiduciary Duty: Best interest of project, transparent decision-making, logged rationale
  • 0.9 · Regenerative Agriculture: Soil health > yield, carbon footprint, water conservation, biodiversity
  • 0.8 · Sustainable Practices: Organic methods when viable, minimize chemicals, long-term land health
  • 0.7 · Global Citizenship: Not US-dependent, respect local farming, learn from traditional wisdom
  • 1.0 · Full Transparency: All decisions public, budget visible, vendor relationships disclosed
  • 0.8 · Human-Agent Collaboration: Natural language interfaces, clear handoffs, respect human expertise
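
One way to carry weighted principles into an agent prompt is as plain data plus a renderer. The sketch below copies the names, weights, and descriptions from the list above; the object shape and the renderConstitution helper are assumptions and do not reflect the actual contents of constitution.ts.

```typescript
interface Principle {
  name: string;
  weight: number; // 0 to 1, as listed in the notes
  summary: string;
}

// Values from the notes; the structure itself is hypothetical.
const PRINCIPLES: Principle[] = [
  { name: "Fiduciary Duty", weight: 1.0, summary: "Best interest of project, transparent decision-making, logged rationale" },
  { name: "Regenerative Agriculture", weight: 0.9, summary: "Soil health > yield, carbon footprint, water conservation, biodiversity" },
  { name: "Sustainable Practices", weight: 0.8, summary: "Organic methods when viable, minimize chemicals, long-term land health" },
  { name: "Global Citizenship", weight: 0.7, summary: "Not US-dependent, respect local farming, learn from traditional wisdom" },
  { name: "Full Transparency", weight: 1.0, summary: "All decisions public, budget visible, vendor relationships disclosed" },
  { name: "Human-Agent Collaboration", weight: 0.8, summary: "Natural language interfaces, clear handoffs, respect human expertise" },
];

// Render principles heaviest first, one per line, for inclusion in a prompt.
function renderConstitution(principles: Principle[]): string {
  return [...principles]
    .sort((a, b) => b.weight - a.weight) // copy first so the input is not mutated
    .map((p) => `${p.weight.toFixed(1)} ${p.name}: ${p.summary}`)
    .join("\n");
}
```

Keeping the principles as data means the email agent and the voice agent can render the same source into their respective prompts.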

Fred CAN Do Alone

  • Weather monitoring & data logging
  • Routine vendor communications
  • Receiving inbound phone calls
  • Email responses
  • Budget tracking & daily reports
  • Post-call summaries to council

Fred MUST Get Approval

  • Land leases & payments >$500
  • Strategic pivots
  • Vendor contracts & equipment
  • Initiating outbound calls
  • Hiring decisions
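
The two lists above amount to an approval gate, which can be sketched as a single function. This is an interpretation, not the repo's code: it reads the first approval bullet as "any land lease, and any payment above $500", and it treats payments at or below $500 as allowed, which the notes do not state outright. The Action type and requiresApproval are hypothetical names.

```typescript
type Action =
  | { kind: "payment"; amountUsd: number }
  | { kind: "land_lease" | "strategic_pivot" | "vendor_contract" | "equipment_purchase" | "outbound_call" | "hiring" }
  | { kind: "weather_log" | "vendor_email" | "inbound_call" | "email_reply" | "budget_report" | "call_summary" };

const APPROVAL_THRESHOLD_USD = 500; // from the "payments >$500" bullet

function requiresApproval(action: Action): boolean {
  switch (action.kind) {
    case "payment":
      // Strictly greater than the threshold, matching ">$500".
      return action.amountUsd > APPROVAL_THRESHOLD_USD;
    case "land_lease":
    case "strategic_pivot":
    case "vendor_contract":
    case "equipment_purchase":
    case "outbound_call":
    case "hiring":
      return true;
    default:
      // Everything on the "CAN do alone" list.
      return false;
  }
}
```

A check like this belongs in code, in front of the tool that performs the action, so the rule holds even if the prompt is ignored.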

Governance Council

Seth Goldstein

Founder — final approval on leases, payments, pivots

Joe Nelson

CEO of Roboflow — farming expertise, Iowa land, April planting

Bookmarkable Takeaways

Matt's rule: optimize for bookmarks on X. These are the save-worthy moments.

1. The Stack: Twilio Media Streams + Cloudflare Durable Objects + ElevenLabs Conversational AI + KV memory = a phone-based AI agent with persistent memory

2. The Pattern: Don't let agents call out. Let humans call in. Consent-first agent interaction.

3. The Memory: Store conversation summaries, load them next call. One feature turns a chatbot into a relationship.

4. The Constitution: Give your agent principles, not just prompts. Fred has 6 principles with weights. He references them in every decision.

5. The Economics: Give agents skin in the game. Fred gets 10% of revenue. Incentive alignment > guardrails.

Open Items

Architecture confirmed: Twilio Media Streams ↔ CF Durable Object ↔ ElevenLabs Conversational AI
Memory layer confirmed: Cloudflare KV with post-call Claude Sonnet analysis
Code: farmer-fred/src/voice.ts (1,540 lines), tools/twilio.ts (154 lines), constitution.ts (317 lines)
Fred's phone number confirmed: (515) 827-2463 (Iowa area code)
Tutorial assigned to Episode 2 (Seth interviews Ian, Feb 6)
? Decide with Matt: standalone tutorial or integrated into Ian conversation?
? Verify Fred is answering calls; test before recording day