Wiring an Idol Agency for Data-Driven Management
I want the agency's management decisions to lean on numbers rather than gut. Sales, live attendance, studio utilization, per-member personal revenue, staff hours, the volume and direction of Discord exchanges — slowly widening "what we can see as numbers" is roughly where we are. Calling it "data-driven" would be a stretch, but that's what we're aiming at. The data itself is consolidating into MySQL. The collection layer is what I wrote about in "Dashboarding an Idol Agency with Metabase and Cloudflare Tunnel". Visualization is left to Metabase.
But opening a dashboard to look at numbers wasn't fast enough for the questions that pop up. "How did last week's per-venue live attendance correlate with same-day merch revenue?" "Who did this member exchange the most with on Discord last month?" — questions like these tend to drift into "never mind" while you're rewriting SQL. The rhythm of management decisions is set by the cost of asking.
So I wanted to query the agency DB in natural language from Claude Code. Keep Metabase as the window for deep dives, throw daily questions into chat — a two-handed setup. Standing up MCP (Model Context Protocol) in production was the answer.
Standing it up safely and usefully was more work than I expected. To shrink the "reachable surface", I assembled three-tier authentication plus a two-stage role-and-visibility filter; to fatten the "returned values", I designed how much context each response should carry; and at the end, JSON broke on an emoji boundary. Writing it down so I don't repeat the same mistakes.
Why stdio Isn't Enough
MCP transports
Model Context Protocol is the protocol that wires LLM clients to servers. Two transports exist: stdio (calling a local process via standard I/O) and HTTP (bidirectional via StreamableHTTPServerTransport). stdio stays on your machine; HTTP works across the network.
If you're only ever calling it locally, stdio MCP is enough. Write a command in Claude Code's .mcp.json and it spawns as a child process. I'd been doing exactly that for a while — a stdio MCP in a separate, local-only repo, throwing data-driven questions at Claude against the agency's business data.
Two problems with that. One: every time a data source got added, I had to mysqldump from production to local, and that overhead piled up. Freshness suffered too — refreshing every time to reflect the morning sales import, midday schedule sync, or Discord message ingestion isn't realistic. Two: stdio MCP is confined to my local process, so I couldn't throw the same questions from a laptop on the road.
Standing the MCP server itself in production and calling it over HTTP from Claude Code is the right shape. Once you commit to that, authentication is the first question.
Why I Keep a Copy of Discord Messages
Before getting into authentication, let me close the data-source question. I wanted Claude to search messages in the agency Discord. The naive first plan was "MCP tool calls Discord API and returns search results". I hit two walls almost immediately.
The first was rate limits. Discord's API enforces both per-route and global limits; hit GET /channels/{id}/messages for real and you'll see 429s within a few requests per second. Even with exponential backoff that respects retry_after, the wait stacks up when Claude is firing tool calls back-to-back. The moment a single tool call no longer feels like a few seconds, the rhythm of the conversation breaks.
The second was search flexibility. Discord's GET /guilds/{id}/messages/search exists, but it's an API designed for UI-driven use; it's too thin to build the kind of search the agency actually needs — across channels, with Japanese ngram matching, combining time range, channel, and author filters. The response shape is optimized for the search UI, awkward to process structurally downstream.
So I decided to keep a copy of Discord messages in my own MySQL. A batch periodically ingests messages and stores what we need: body, author, channel ID, timestamp. The full-text searchability of the body comes from a WITH PARSER ngram FULLTEXT index. Now the MCP tool just runs MATCH ... AGAINST(? IN BOOLEAN MODE).
The MySQL ngram parser
The default FULLTEXT index tokenizes on whitespace, which barely works for languages without word boundaries like Japanese. WITH PARSER ngram indexes the string as N-grams (default 2 characters), so MATCH ... AGAINST(? IN BOOLEAN MODE) supports Japanese partial matching and boolean operators (+term1 -excluded "exact phrase").
As a side effect, the flexibility of search conditions jumped:
- Any combination of body + author + channel + time range
- ngram + boolean mode supports
+term1 -excluded "exact phrase" - Returns the data in the shape downstream structured processing needs
Switching from "thinly wrap Discord's search inside an MCP tool" to "keep a copy of the facts in my own DB and grow a search tool on top of it" frees you from both the rate limits and the feature limits of the API. The trade is that you have to write a message sync batch — but the Discord Bot side already had a running process, so I just added a sync loop to it.
Whether to keep a copy of messages is a decision that deserves careful thought on privacy grounds, but if the premise is "our own Discord guild, in our own self-operated DB", you're just putting the same data that's already visible on Discord in a different place.
Tools Return More Than Just Matches
With a copy of the data in hand, the next thing to think about is "what does the tool return?". This is where MCP design paid off the most, in my view.
Naively, search_messages should just return the matching messages — the result set of a MATCH ... AGAINST .... But that doesn't give Claude enough to make the next decision. "What conversation thread did this message come out of?" "Who is it replying to?" "Has the same topic spilled into other channels?" — none of that is visible. In LLM tool calling, the tool's return value goes straight into the next reasoning step. If there isn't enough, Claude tries the tool again from a different angle, and the conversational rally piles up.
So I designed the response to ship three things alongside each hit:
- The N messages before and after: same channel, surrounding the hit's timestamp, fetched with
UNION ALLand bundled in. The flow of the conversation becomes visible. - The parent message it's replying to: if the hit is a reply, the parent comes along. When "an answer to a question" hits, you can see what it's an answer to.
- Channel name: included as a human-readable name, not just an ID. Even when hits from different channels mix together, the topics stay separated.
With that, one call to search_messages becomes "a unit Claude can use to figure out what's happening". Asked "show me the exchange last month where Member A replied to Member B", Claude gets back not just the hits but the surrounding messages and the parents — a self-contained chunk. In the next turn, Claude can summarize the conversation without invoking the tool again.
The same idea runs through other tools. reply_network returns the per-person aggregate and the per-target breakdown in one call; get_channel_summary returns total message count, top authors, and the daily series for the last 30 days in one call; member_sales lays out personal sales, group sales, event sales, and back amount in a single row. The principle: pack each return value at the granularity Claude can decide from in one shot.
Frame it as "one tool call = one conversational round-trip", and it falls into place. The more you can return per round-trip, the fewer rounds in the rally. This isn't about rejecting pagination or streaming; it's about how much context the initial response carries.
An MCP tool isn't a thin wrapper over the DB. It's a function that returns the chunk of information the LLM needs for its next decision. That's what I felt most strongly while building this.
Three Doors
Now, finally, authentication. The agency's MCP server has three entrances.
Physically, it's a single process; but each entrance runs through a different auth method.
| Door | Route | Auth |
|---|---|---|
| Front door | Public Internet → Cloudflare Tunnel | Google OIDC + bearer token |
| Admin VPN | WireGuard VPN → LAN | Trusted-IP bypass |
| Inside the office | Same LAN (direct TCP) | Trusted-IP bypass |
Only the front door says "come in with the right Google account, then bring me an MCP bearer". The other two follow "if you can reach me over the network, I trust you". The LAN is effectively treated as root, so it's deliberately weak.
Front Door: Wrapping MCP's OAuth Provider Around Google OIDC
The MCP OAuth flow
The MCP SDK has you implement an OAuth server with authorization_code + PKCE via the OAuthServerProvider interface. When a client lands on the authorize endpoint, you redirect to the external IdP (Google), pull member info on the callback, and issue your own auth code → access token.
What the MCP SDK assumes is the "stand up your own OAuth server" shape. I had no intention of running my own user DB with passwords, so I wrapped Google OIDC as the IdP underneath. Google Workspace gives every office staff member an account in our own domain, so authentication (who is this user?) can be delegated to Google. But Google OIDC only provides authentication — domain filtering and the "is this person an agency member, and in what role?" decision are the responsibility of the RP, this server.
Why Cloudflare Access doesn't cut it
In front of Metabase, Cloudflare Access wrapped Google OIDC and handled authentication. CF Access stands in front of the IdP, takes over the entire auth flow, and on top of that handles authorization too — policies like "allow only this email domain" or "allow only these specific addresses". If the same trick worked for MCP, this would be easy. It doesn't. CF Access carries auth state in browser session cookies; that doesn't fit a non-browser MCP client like Claude Code. CF Access does have non-browser service tokens, but those are per-app — one token per app — and don't pass per-user identity through to MCP. So I went the other way: stand up my own OAuth server, wrap Google OIDC as the IdP — at the cost of taking on, on the RP side, the authorization (domain filtering + member determination) that CF Access used to handle.
authorize() redirects to Google's authorization endpoint; /oauth/google/callback verifies the id_token against the JWKS and pulls the email claim. From there it queries our own member table and only lets through active accounts mapped to director or manager. Retired members get rejected.
Written out it sounds tangled; in a diagram, it's just the OAuth 2.0 authorization code + PKCE flow.
Steps 1 through 9 are authentication; 10 is post-authorization use. By holding the PKCE code_verifier on both the issuing side (Claude Code) and the redemption side (/token), the authorization code is useless for token exchange even if intercepted. The reason we throw away the id_token on our side and issue our own access token is a Google account existing ≠ being an agency member — retired members and Google accounts that were never members at all fail the member lookup at step 5 and get rejected.
That establishes "authenticated against a Google account, and is a director or manager of the agency". Authorization code TTL is 10 minutes; access token is 24 hours.
Admin VPN and Inside the Office: Bypass by Trusted IP
For LAN or WireGuard traffic, going through OIDC every time is overkill. Trust by prefix match.
const DEFAULT_TRUSTED_PREFIXES = [
'10.x.y.', // office LAN
'127.0.0.1',
'::1',
'::ffff:127.0.0.1',
'::ffff:10.x.y.',
]The LAN is sliced as a /23, so two prefixes are listed to cover it. isLocalTrusted() checks the IP prefix; if it matches, the bearer auth is skipped and the request moves on to the next middleware.
Treating the LAN as root is deliberate. By the time a packet physically lands on this LAN, the sender is either someone inside the office or an admin coming in over VPN (and I'm the only one with the key). You could say the network boundary is doing the work of authentication.
JSON That Dies on an Emoji
Two days into using the Discord search_messages tool, this error came back from the Claude API.
400 Bad Request: The request body is not valid JSON:
no low surrogate in stringI hadn't done anything wrong (or so I thought). The tool just searches Discord messages with ngram FULLTEXT and returns long bodies as snippets with 60 characters of context on each side.
UTF-16 surrogate pairs
JavaScript strings are internally UTF-16. Characters outside the BMP (U+0000–U+FFFF) — most emoji — are encoded as two 16-bit units (a high surrogate U+D800–U+DBFF and a low surrogate U+DC00–U+DFFF). String.prototype.slice() operates without awareness of these units, so cutting in the middle of a surrogate pair leaves a lone surrogate.
The snippet extraction was this naive:
const RADIUS = 60
return normalized.slice(start, end) // start/end may land inside a surrogateIn a message containing 🌸 (U+1F338 = D83C DF38), if start happens to fall between D83C and DF38, you end up with only one surrogate.
Inside JS, String.length works fine. console.log just renders it as a replacement character (U+FFFD) on screen. The problem only surfaces when you push it out through JSON.stringify, and a strict JSON parser on the receiving side (the Claude API) rejects it with "no low surrogate".
The fix is one short function.
export function safeSlice(s: string, start: number, end?: number): string {
let out = s.slice(start, end)
if (out.length > 0) {
const first = out.charCodeAt(0)
if (first >= 0xdc00 && first <= 0xdfff) out = out.slice(1)
}
if (out.length > 0) {
const last = out.charCodeAt(out.length - 1)
if (last >= 0xd800 && last <= 0xdbff) out = out.slice(0, -1)
}
return out
}If the start of the sliced string is a low surrogate, drop one character; if the end is a high surrogate, drop one. Applied this to MCP tool snippet generation and to the 60-character truncation of channel topics in list_channels.
A few days later the same thing hit on the Discord Bot side. Discord caps messages at 2000 characters, and when the Bot sends a long response, splitMessage chops it into pieces. As a last resort when there's no newline to cut on, it cuts at exactly 2000 characters — and again, when that boundary lands inside an emoji, a lone surrogate remains. The Discord API accepts the lone surrogate inside the JSON as a \uD83C-style escape and substitutes U+FFFD at display time. So Discord just shows a garbled character — but if you POST the same string back to Discord, or feed the log into Claude, a stricter JSON parser on the other end throws the same 400.
Ultimately I extracted safeSlice into src/lib/string.ts and shared it between MCP and Bot.
Two lessons. First, if you're sending a JS string out to the world, suspect lone surrogates. slice(), substring(), substr() — none of them are surrogate-aware. Second, just because it doesn't crash on screen with a glyph doesn't mean it's safe through JSON. Discord forgives with a U+FFFD; the Claude API doesn't.
A Two-Stage Filter: Role and Visibility
Authenticating doesn't mean tools can return anything. The agency's data has tiers.
- Editing member info and data-platform admin operations: director only
- Reading sales, schedules, reservations, Discord history: director + manager
- Talents themselves don't go through MCP for now (the DB isn't touched directly; the Discord Bot returns only their own numbers)
Role-based checks call requireMcpRole at the top of each tool.
server.tool('admin_only_action', '...', schema, async (args, extra) => {
const auth = requireMcpRole(extra, ['director'])
// ...
})Read-only tools are open to both director and manager; tools that write or edit are director only — that's how I've split them.
Roles alone aren't enough. There can be multiple managers, each responsible for a different group. On Discord, channels are configured separately as "visible to all managers" or "visible to only this one manager". If the MCP Discord-related tools (message fetch, search, activity stats, etc.) returned the same result to all managers, exchanges that shouldn't be visible would leak.
So I added a mechanism that reflects Discord-side channel visibility in MCP queries.
Discord permission overwrites
A Discord channel's visibility is determined in three layers: (1) the default permissions of the @everyone role, (2) overwrites for each role, (3) per-user overwrites. Later layers override earlier ones. Only users for whom the VIEW_CHANNEL bit ultimately falls on ALLOW can see the channel. Permissions are stored as bigint bit flags.
For example, set @everyone to DENY and the director role to ALLOW, and the channel is visible only to people with the director role. To go a step further with "show this only to one specific manager", use a per-user overwrite to set ALLOW for that manager.
Here's how it works. A daily batch syncs the agency's Discord guild roles and member info. Each channel's permission overwrites are also kept as JSON in our own DB.
The MCP tool starts from the caller's email, looks up the Discord user ID, and pulls the set of role IDs that user holds. Folding ALLOW/DENY in the order @everyone → each role → per-user overwrite, it collects the channels where the VIEW_CHANNEL bit ends up set. That set is appended to the Discord-related queries as AND channel_id IN (?, ?, ...).
Directors skip this calculation and see everything. A manager whose linkage is broken (Discord user ID not connected to a member row, or guild member record missing) falls into fail closed — empty set. The judgment is that "something visible returns nothing" is a smaller incident than "something that shouldn't be visible returns".
Admin tools that only directors can call (editing member info, etc.), and tools open to managers but narrowed by channel (Discord-related). Role decides the breadth of permission; visibility decides its reach. Implementing both in separate layers means changing one doesn't break the other.
Wrapping up
Looking back after writing this, the thinking on this MCP server went in two directions: how to shrink "what's reachable", and what to pack into "what's returned".
On the shrinking side:
- Three-tier authentication pins down "who's calling"
- Role narrows "which tool can be called"
- Visibility narrows "which data can be returned"
- Writes only go to directors (member-info editing and the like)
- And, strip lone surrogates before sending JSON out
On the packing side:
- Each search hit ships with N messages of context and the reply parent
- Aggregations include the breakdown in the same response
- "A granularity Claude can decide on in a single shot" is the design unit for return values
Shrinking is subtraction; packing is addition. Decide what won't be returned before opening the front door. Decide what should be decidable from a single response before adding a tool. I think these principles apply not only to MCP but to any production HTTP agent an LLM calls.
Once this server was up, the cost of "throw a question, see the result" dropped a notch in feel. Questions like "I get the sense merch unit price went up last month — how does it look broken down by venue?" or "why are this member's studio reservations skewed last week?" can be thrown without waiting for someone to design a dashboard. Claude Code assembles the SQL, calls it through the tool, and summarizes the result. A layer in front of "go deep in Metabase" — checking "is this hypothesis even worth pursuing?" — has come up.
Saying we run management data-driven is still a long way off. How to form hypotheses, which metrics to watch — these are still being shaped. Even so, when the cost of asking drops, the rhythm of decisions changes by that much. The MCP server widened that drop by one step. Holes you only notice in production — like the lone surrogate — will keep showing up; but the patches sit in a shared place. When the next tool gets built, the patch can be folded into the same place.