Skip to content

Dashboarding an Idol Agency with Metabase and Cloudflare Tunnel

Staff in a talent agency don't look at much besides Discord. They're busy managing talent — scheduling rehearsals, coordinating with venues, handling crises. Opening a separate tool is a luxury they don't have.

But the data lives in four places. Live event sales are in Square. Schedules are in TimeTree. Staff work logs and event reports are in Discord. Task management is in Asana. To answer "how did last month's revenue correlate with attendance," someone has to copy numbers into a spreadsheet by hand.

I believe sales and business metrics should be visible to everyone on the team. Work hours and attendance are a different matter, but you can't ask people to make management decisions without showing them the numbers. So we push metrics into Discord — where the staff already are. For deeper analysis, there's a Metabase dashboard, published securely through Cloudflare Tunnel.

About Metabase and BI tools

Metabase is an open-source dashboard and analytics tool. You connect it to a data source and build charts or reports — SQL optional. Tools in this category are collectively called BI (Business Intelligence) tools. Throughout this post, "the BI layer" refers to this kind of system.

About Cloudflare Tunnel

A Cloudflare service that publishes a server sitting inside a private network — without a global IP, without opening firewall ports. It works by having the server itself initiate an outbound connection to Cloudflare's edge. More on this further down.

Four Data Silos

Square — The One Reliable Number

Merchandise at live venues, cheki photo sessions, tickets. Every transaction at the venue goes through a Square terminal. Even cash payments get recorded in Square — that's the rule. Not just for accounting. If all sales data lives in Square, that's your business metric.

Square's strength is a solid public API. OAuth2 authentication, Orders / Payments / Catalog endpoints. Run a batch job periodically and sales data flows into the database automatically. The fact that you can process payments on a phone with the Square app is a big deal on the ground. Terminal dies? One phone keeps the sales record flowing.

Why record cash in Square too

It's tempting to skip Square for cash transactions, but that leaves holes in the sales data. The accounting ledger and Square numbers stop matching, and KPI reliability collapses. The rule "everything goes through Square" is what keeps the data foundation accurate.

TimeTree — The Industry Standard with No API

In Japan's idol industry, TimeTree has become the de facto standard for schedule management. Both talent and fans use it.

The problem: TimeTree doesn't offer a public API. One existed in the past, but it's been shut down.

And TimeTree's design prioritizes flexibility over structure. Event names, locations, times — all free-form input. Fine for humans to read, but try to process it programmatically and you hit a normalization wall. "live," "Live," "LIVE" — same word, three spellings, all coexisting.

With no API, we authenticate via Playwright and hit internal API endpoints to fetch event data. Session management, pagination, rate limiting — all the overhead a proper SDK would have handled.

Logging in through a browser on every run is slow and brittle. So once we've logged in, we save the cookies, the CSRF token, and the vendor-specific headers the internal API expects into a session file. Subsequent runs reload the saved session and hit only the API; if it returns an auth error, we automatically fall back to a fresh browser login. Pagination follows the internal API's conventions too — combine a cursor with an end-of-stream flag to know if there's more to fetch.

About Playwright

A browser automation framework. Drive Chrome / Firefox / WebKit from a script. It was originally built for E2E testing, but it's also widely used to scrape data from web services that don't expose a public API.

The history of TimeTree's API

TimeTree once offered an OAuth2-based public API with full calendar and event CRUD. Today, new application registrations are no longer accepted — it's effectively closed. The internal API still exists after login, but it's not a published interface and can break without warning when they change things.

Discord — Free Text Structuring Hell

We use Discord for staff work tracking and live event records. It's where staff spend their time, so it happened naturally.

The problem is the shape of the data. One thread per live event. Staff write in details — date, venue, performers, setlist, ticket price, attendance. All free text.

Date formats alone: "4/15," "4月15日," "2026.4.15" — all coexisting. Venue name variations, price formats, setlist numbering styles. Humans read it fine. Regex doesn't keep up. Too many variations. Rule-based parsing was a dead end.

We switched to structured extraction via Claude API's tool use. Feed the thread text in, get date, venue, performers, setlist, and attendance back as JSON.

For example, a staff member writes:

4/15 Shinjuku BLAZE
Performers: Unit A / Unit B
Setlist: 1. Hoshizora Dive 2. Catharsis 3. MC 4. Summer Rain 5. Mirai Chizu
Tickets ¥3500
Attendance 82
Merch: cheki 34 / T-shirts 8

With a tool use schema defined, this text returns:

json
{
  "date": "2026-04-15",
  "venue": "Shinjuku BLAZE",
  "artists": ["Unit A", "Unit B"],
  "setlist": ["Hoshizora Dive", "Catharsis", "MC", "Summer Rain", "Mirai Chizu"],
  "ticket_price": 3500,
  "attendance": 82,
  "merch": { "cheki": 34, "tshirt": 8 }
}

"4/15" becomes "2026-04-15." "Tickets ¥3500" becomes ticket_price: 3500. The LLM handles variation from context, and new patterns just need a prompt adjustment. Night and day compared to chasing edge cases with regex.

The usual worry with LLM-based extraction is "what if the response isn't valid JSON?" Force a specific tool call via tool_choice and the output always conforms to the declared schema — no drift to handle. A Haiku-class model is enough, too. This isn't reasoning; it's converting free text into pre-defined fields, so the smallest model runs it reliably.

Structured extraction via tool use

Claude API's tool use (function calling) constrains LLM output to a JSON schema. LLMs are naturally good at "understanding" free text and structuring it, but free-text output makes downstream processing fragile. Fixing the output format via tool use turns the LLM into a stable normalization engine.

Asana — The Unintegrated Task Silo

The fourth silo is Asana. Projects, tasks, subtasks, comments — management's progress tracking lives here. The OAuth2 public API covers CRUD on Tasks / Projects / Comments, so technically it could flow into MySQL the same way as the others.

Even so, it's deprioritized for now. License cost is why. Starter runs $10.99/month per user, Advanced $24.99/month (both annual billing). Spread that across the team and even a small group adds up to tens of thousands of yen per month.

Unlike sales or attendance, which directly drive management decisions, the value of pushing task data into the BI layer isn't clear yet — at least not enough to justify the cost. So Asana stays an independent silo for the time being.

A Crawler and a Bot in One Repository

One repository, two roles.

The crawler is a batch process. It hits the Square API for sales data, logs into TimeTree via Playwright to scrape schedules, and reads Discord threads for structured extraction. Everything writes to MySQL.

The bot is a long-running process. It maintains a WebSocket connection to Discord, pulls data from MySQL, and posts it back to channels. Staff open Discord and the latest sales and event results are right there.

Two outlets for the data. Daily metrics flow from the bot into Discord. For trend analysis by weekday, venue, or monthly trajectory, there's the Metabase dashboard. No ETL pipeline between the bot and Metabase. MySQL is the only interface.

Publishing Metabase with Cloudflare Tunnel + Access

Metabase runs on an on-prem VM with no global IP. It sits inside a private network. The goal: let staff outside the office access it from a browser.

A VPN is one option. But then you're installing VPN clients on everyone's machines, writing setup guides, troubleshooting connections. Talent agency staff aren't engineers. Anything beyond "open a URL and log in with Google" is too much to ask.

Cloudflare Tunnel — Publishing Without Opening Ports

Cloudflare Tunnel establishes an outbound connection from the server to Cloudflare's edge. No global IP needed. No firewall ports to open.

Just run a daemon called cloudflared on the VM. It maintains a persistent connection to Cloudflare's edge over QUIC (a UDP-based encrypted protocol), relaying external requests to the local Metabase instance at localhost:3000.

Cloudflare Tunnel's connection direction

Traditional reverse proxies accept inbound connections on the server. Cloudflare Tunnel is the opposite — cloudflared initiates outbound connections to Cloudflare's edge. The server-side firewall can deny all inbound traffic and it still works. Behind NAT, deep in a VLAN, on a VM with no global IP — doesn't matter.

The configuration is simple.

yaml
tunnel: <tunnel-id>
ingress:
  - hostname: metabase.example.com
    service: http://localhost:3000
  - service: http_status:404

No global IP procurement, no router port forwarding, no Let's Encrypt certificate management. TLS terminates at Cloudflare's edge.

Cloudflare Access — A Zero Trust Auth Layer

What "Zero Trust" means

A security model that drops the assumption "the internal network is safe" — every request gets authenticated and authorized, regardless of where it comes from. The opposite of the classic VPN model, where being "inside" implied trust. The boundary is identity, not location.

Tunnel alone means anyone with the URL can reach the app. Cloudflare Access inserts an authentication layer in front.

Set an Access policy: "only Google accounts from this email domain." Even if the URL leaks, no one without an authorized Google account can reach Metabase.

Prerequisite: Google Workspace

For email domain-based policies to work in Cloudflare Access, staff need Google accounts on a custom domain. If everyone has @example.com via Google Workspace, allowing that domain is all it takes. With personal Gmail accounts, you'd have to register each email individually.

Eliminating Metabase Free's Double Login

This is the key trick.

Metabase Free doesn't support SAML/OIDC SSO. That's a Pro (paid) feature. So even after passing Cloudflare Access authentication, staff face a second login at Metabase. Double login.

The fix: set the same Google OAuth Client ID in both Cloudflare Access and Metabase.

The flow:

  1. Staff navigates to the URL
  2. Cloudflare Access redirects to Google auth
  3. Logs in with Google (browser retains the Google session)
  4. Metabase login screen appears
  5. Clicks "Sign in with Google"
  6. Google session is still active — instant login, no re-entry

Not true single sign-on. One click on the Metabase login screen is still required. But no password entry, ever. One Google account handles everything.

This works because Cloudflare Access's IdP (Identity Provider) configuration and Metabase's Google Sign-In feature can share the same OAuth Client ID. It's different from Pro's SSO (SAML/OIDC with auto-provisioning and group mapping), but for a small team, it's a seamless enough auth experience.

Why the same OAuth Client ID works

A GCP OAuth Client ID can have multiple redirect URIs registered. Register both the Cloudflare Access callback URL and the Metabase callback URL under the same Client ID, and the browser's Google session covers both. From Metabase's perspective, it's a legitimate user authenticated via Google Sign-In — whether Cloudflare Access was involved is irrelevant.

Zero Trust Free Plan Limits

Cloudflare Zero Trust's Free plan supports up to 50 users. Managers, accountants, producers — if the talent agency's stakeholders fit within 50, the running cost is zero.

Session duration is set to 24 hours. Google auth runs on the first access each day, but after that it's seamless.

Two Visualization Layers

How you present data depends on who's looking, when, and what they need.

Discord BotMetabase
AudienceAll field staffManagers, leadership
FrequencyDaily, passively visibleOn demand, actively opened
ContentSales flashes, recent event resultsCross-analysis, trends, KPIs
InteractionView onlyFilter, drill down

What flows into Discord isn't just text. The bot generates PNG images server-side with Chart.js and chartjs-node-canvas, posting them to Discord channels. Sales trend line charts, per-member bar charts, venue ranking horizontal bars — the appropriate chart type is auto-selected based on the query result.

Staff are too busy managing talent to open a dashboard. Metrics with charts flowing into the Discord they already use — that alone keeps everyone aligned on where the business stands.

Metabase comes in when the question is "why isn't this venue growing attendance" or "how has the cheki sales ratio shifted over six months." With three data sources consolidated in MySQL, cross-source analysis is a single SQL query away.

  • Per-event revenue: Cross-reference Square sales with Discord event records for revenue, attendance, and spend per head by event
  • Weekday / venue analysis: Combine TimeTree schedule data with Square sales to visualize which weekdays and venues are most profitable
  • Staff hours: Aggregate work hours from Discord records as the basis for compensation
  • Monthly trends: Track revenue, attendance, and event count over time

A Configuration You Can Start Small

Every component in this setup is free or low-cost.

ComponentCost
Metabase FreeFree
Cloudflare TunnelFree
Cloudflare Access (up to 50 users)Free
MySQLFree
Square APIFree
Discord BotFree
VM (on-prem or VPS)Existing infra

Before investing in enterprise BI tools or VPN products, it's worth testing whether this combination is enough. Metabase Free's limitations (no SSO, no row-level access control) can be covered by Cloudflare Access's auth layer and operational rules.

What you need is a mechanism to funnel data into one database, and a path to publish it securely. The former is the bot. The latter is Cloudflare Tunnel. Neither requires heavy infrastructure.

The real bottleneck isn't the tooling — it's enforcing the rules that centralize data. "Everything goes through Square." "Event records go in the thread." Technically it's just ETL, but getting people on the ground to follow the rules consistently is far harder. The dashboard only means something once that habit sticks.