Agents, MCP, and what a 19-hour build changed about how I think


Last week I shipped an agentic scheduler into Calendrz. Nineteen hours of wall-clock time between the first commit on the agent loop and the feature going live in production, default-enabled for paying users. I want to unpack how the architecture is put together, and why that timeline stopped surprising me around the fifth time this year it happened.

The shape of what shipped

The Smart Scheduler is a natural-language calendar agent. Users type something like “find a 30-minute slot with Alice tomorrow afternoon” and the agent replies with a concrete proposal grounded in the user’s actual calendars. One-word confirmation writes the event.

The architecture is deliberately split into two pieces that share a tool vocabulary:

  • A first-party agent living inside the Calendrz Spring Boot service, exposed at /api/v1/smart-scheduler/invoke and a chat SPA at /scheduler. Backed by the Anthropic Java SDK (2.17.0) talking to Haiku 4.5.
  • An MCP server that exposes 30 tools across three service classes, reachable from Claude.ai, Claude Desktop, or any MCP-compatible client. The MCP server is powered by Spring AI’s @Tool annotations and MethodToolCallbackProvider auto-discovery, not a hand-rolled implementation.

The scheduler agent itself has access to 13 tools: list_calendars, find_free_slots, create_event, update_event, propose_new_time, find_calendrz_user_by_name, and the rest. Critically, those 13 tools are byte-identical in their response shapes to their MCP counterparts. That’s not accidental; more on that below.
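One way to make "byte-identical response shapes" structural rather than aspirational is to route both surfaces through a single canonical response type. This is an illustrative sketch, not Calendrz code: the record names and the hand-rolled serialization are hypothetical stand-ins for whatever DTO and JSON layer the real service uses.

```java
import java.util.List;

// Hypothetical sketch: one response record shared by the in-app agent's
// find_free_slots tool and its MCP counterpart, so both surfaces serialize
// through the same code path and a fix in one is a fix in the other.
public class SharedToolShapes {

    public record FreeSlot(String startIso, String endIso, String calendarId) {}

    public record FreeSlotsResponse(List<FreeSlot> slots, String timezone) {}

    // Single rendering method used by both the agent loop and the MCP layer.
    public static String render(FreeSlotsResponse r) {
        StringBuilder sb = new StringBuilder("{\"timezone\":\"" + r.timezone() + "\",\"slots\":[");
        for (int i = 0; i < r.slots().size(); i++) {
            FreeSlot s = r.slots().get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"start\":\"").append(s.startIso())
              .append("\",\"end\":\"").append(s.endIso())
              .append("\",\"calendar\":\"").append(s.calendarId()).append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```

The design point is the single `render` path, not the serialization details: if the two surfaces each own their own mapper, the shapes drift apart silently.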

The agent loop: simpler than you think

There’s an industry habit of reaching for “agent frameworks” the moment tool use enters the picture. I deliberately didn’t. The scheduler agent is a manual for loop (capped) around Messages.create(), inspecting StopReason, executing any returned tool_use blocks, appending the tool_result blocks, and looping until END_TURN.

It fits in a single file. It’s trivial to reason about. It’s trivial to instrument. When something behaves weirdly — and agents always eventually behave weirdly — I can read the loop top-to-bottom and tell you exactly where the weirdness entered. A framework abstraction would have bought me nothing except a debugging surface.
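The control flow is small enough to sketch in full. The real loop sits on the Anthropic Java SDK's `Messages.create()` and its stop-reason and tool-use types; the sketch below substitutes stand-in types (`ModelClient`, `Turn`, `ToolCall` are illustrative, not SDK names) so the shape of the loop is visible without the SDK on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of a capped agent loop. The real code calls
// Messages.create() via the Anthropic Java SDK; ModelClient, Turn, and
// ToolCall here are hypothetical stand-ins for the SDK's types.
public class AgentLoopSketch {

    enum StopReason { TOOL_USE, END_TURN }

    record ToolCall(String name, String argsJson) {}
    record Turn(StopReason stopReason, List<ToolCall> toolCalls, String text) {}

    interface ModelClient { Turn create(List<String> messages); }
    interface ToolExecutor { String execute(ToolCall call); }

    static final int MAX_ITERATIONS = 10; // hard cap so a confused model can't loop forever

    public static String run(ModelClient client, ToolExecutor tools, String userPrompt) {
        List<String> messages = new ArrayList<>();
        messages.add("user: " + userPrompt);
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            Turn turn = client.create(messages);      // one model call per iteration
            if (turn.stopReason() == StopReason.END_TURN) {
                return turn.text();                   // model produced a final answer
            }
            // TOOL_USE: execute every requested tool, append the results,
            // and loop so the model sees them on the next iteration.
            for (ToolCall call : turn.toolCalls()) {
                messages.add("tool_result(" + call.name() + "): " + tools.execute(call));
            }
        }
        throw new IllegalStateException("agent exceeded iteration cap");
    }
}
```

Everything worth instrumenting sits on one of three lines: the model call, the tool execution, and the cap-exceeded throw.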

The one thing I did invest in properly was prompt caching. Two cache breakpoints: one on the system prompt block, one on the last tool definition. For an agent that runs multiple iterations per invocation (tool call → tool result → next call), cache read tokens dominate the bill. With caching the system prompt and tool list are re-read at a fraction of the price from iteration two onward. For a feature that shipped on Haiku with a deliberately generous system prompt, this is the difference between an economically viable product and a non-starter.
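The arithmetic is worth making concrete. This back-of-envelope sketch assumes Anthropic's published multipliers at the time of writing (cache writes at roughly 1.25x the base input price, cache reads at roughly 0.1x; check current pricing before relying on these), and the token counts are hypothetical.

```java
// Back-of-envelope cost model for prompt caching across agent-loop iterations.
// Assumed multipliers (verify against current Anthropic pricing): cache
// writes ~1.25x base input price, cache reads ~0.1x.
public class CacheCostSketch {

    public static double uncachedTokenCost(int prefixTokens, int iterations) {
        // Without caching, the system prompt + tool definitions are re-billed
        // at full price on every iteration of the loop.
        return (double) prefixTokens * iterations;
    }

    public static double cachedTokenCost(int prefixTokens, int iterations) {
        // Iteration 1 writes the cache (1.25x); iterations 2..n read it (0.1x).
        return prefixTokens * 1.25 + prefixTokens * 0.10 * (iterations - 1);
    }

    public static void main(String[] args) {
        int prefix = 8_000; // hypothetical: generous system prompt + 13 tool definitions
        int iters = 6;      // tool call -> tool result -> next call, a few times over
        System.out.printf("uncached: %.0f, cached: %.0f token-equivalents%n",
            uncachedTokenCost(prefix, iters), cachedTokenCost(prefix, iters));
    }
}
```

Under these assumptions a six-iteration invocation bills roughly 14,000 token-equivalents for the prefix instead of 48,000, and the gap widens with every additional iteration.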

Agent and MCP, not agent or MCP

The decision I’d most defend is building both. Here’s the framing that helped me think about it clearly:

  • MCP is a toolbox. It exposes fine-grained, composable primitives. Someone else’s AI assistant (yours, if you’re using Claude Desktop for example) gets to decide which tools to combine and in what order. It’s assistant-agnostic and it’s honest about being a low-level interface.
  • The in-app agent is a finished tool. Purpose-built, narrowly scoped to scheduling, with its own system prompt, its own conversation memory, and a UI that knows what “propose a time” means.

These are different products with different users. The power user with Claude Desktop or ChatGPT wants the toolbox. The user who opens Calendrz in the morning wants the finished tool. Giving them the same tool surface but two front doors is why the scheduler’s 13 internal tools mirror the MCP’s shapes so carefully: a bug fix in one is a bug fix in the other, and the mental model transfers.

Worth saying explicitly: the MCP server also exposes smart_scheduler_invoke as a single tool. An external assistant can call the finished tool from the toolbox. Most of my power users use both daily.

Observability matters more than it usually does

Agents fail in weirder ways than CRUD APIs fail. The model hallucinates a tool name; a tool returns a response the model can’t parse; the rate limiter trips mid-loop; a timeout cancels iteration six of ten. If you can’t tell, after the fact, which one happened, you can’t improve anything.

Every Smart Scheduler invocation (success, failure, rate-limit, circuit-open, timeout) writes a row to smart_scheduler_audit_log with the prompt, the outcome enum, token usage, wall-clock latency, the source (REST or MCP), and the error message if any. That table is boring. It is also the reason I can make informed decisions about which model to default to, how to tune the circuit breaker, and where the system prompt is under-specified. Boring observability on top of non-boring behaviour is the only way I’ve found to keep agentic features healthy in production.
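The shape of that row can be sketched as a plain record. Field and enum names here are hypothetical illustrations of the columns the post describes, not the actual Calendrz schema, which lives behind JPA/SQL.

```java
import java.time.Instant;

// Illustrative shape of a smart_scheduler_audit_log row: outcome enum, token
// usage, latency, source, optional error. Names are hypothetical, not the
// real Calendrz schema.
public class AuditLogSketch {

    enum Outcome { SUCCESS, FAILURE, RATE_LIMITED, CIRCUIT_OPEN, TIMEOUT }
    enum Source { REST, MCP }

    record AuditEntry(
        Instant at,
        String prompt,
        Outcome outcome,
        long inputTokens,
        long outputTokens,
        long latencyMillis,
        Source source,
        String errorMessage  // null unless the outcome is a failure mode
    ) {}

    // The question this table exists to answer cheaply: what failed, and how?
    public static boolean isFailure(AuditEntry e) {
        return e.outcome() != Outcome.SUCCESS;
    }
}
```

Writing the row on every path, including the boring successes, is what makes the failure rows interpretable: without the denominator, an error count tells you nothing.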

The meta-story: why 19 hours

The Smart Scheduler’s Phase 1 foundation commit landed at 18:45 UTC on a Saturday. The “switch default model to Haiku” commit that closed out the UX polish landed at 14:09 UTC the following day. Somewhere in between: the full agent loop, the 13-tool registration, the audit logging, the circuit breaker, the rate limiter, the chat history threading, the MCP tool wiring, the system prompt with date/timezone/attendee grounding, the Spock tests.

I didn’t write that alone. I drove it alongside Claude Code with a set of specialised subagents: a backend agent tuned to this codebase’s conventions, a frontend agent for the Angular side, an exploration agent for read-only codebase research. My role shifted from “writer of every line” to “architect of the briefs, reviewer of the diffs, owner of the tradeoffs.” The work that used to be a sprint compresses into a day because the things I’m good at (architectural judgment, naming, knowing what not to build) are now the bottleneck instead of typing speed.

This is the part I want to be careful about. It isn’t magic. The compression only happens when:

  1. The codebase conventions are written down somewhere the agent can read them. Mine are in CLAUDE.md and under .ai/rules/.
  2. The task briefs are specific and honest about what’s in scope. A vague brief to an agent produces a vague PR. A brief that names file paths, line numbers, edge cases, and test expectations produces something you’d merge.
  3. You still read the diffs. I cannot overstate this. Agent-assisted coding is not agent-autonomous coding, and pretending otherwise is how regressions ship.

The other thing worth naming: the compression isn’t really about code speed. It’s about willingness to try. When a non-trivial backend feature costs you two days of typing instead of two weeks, you end up trying ideas you’d previously have dismissed as “not worth the time.” The Smart Scheduler is one of those ideas.

Closing thought

The most interesting technology moves of the next few years won’t be the agents themselves — they’ll be the product decisions the compressed build cycle makes possible. Shipping an end-user agent and an MCP toolbox around the same core tool vocabulary, in the same week, from the same two-person codebase, used to be a roadmap entry. Now it’s a Tuesday.

If you’re building in this space, I’d love to compare notes. The public write-up of the Calendrz Smart Scheduler is here. Tech notes, objections, things I got wrong; my inbox is open.