⚠ DRAFT — not published. Ben to edit into his voice, add the demo clip, verify the "hard part" story, and scrub before going live.

No login, because the network is the auth

Build-in-public · a teardown of my private AI stack

[ DEMO CLIP HERE — 20–40s: phone in hand, I speak, my AI on my home server does it, the answer comes back. Rough is fine. This is the hook. ]

I can talk to my AI coding agent from my phone. From the couch, from the kitchen, from a hotel — I open a web page, say what I want, and an agent running on a server in my house does the thing and talks back. No app store, no cloud account for this part, and — the part that surprises people — no login screen.

That last one isn't laziness. It's the whole design. Let me walk through it.

The itch

I run a lot of my work through AI coding agents. The problem is they keep me tethered to a keyboard. If I want to kick something off, check on a long-running task, or just steer — "no, do it the other way" — I'm back at my desk typing. That's fine at 2pm. It's annoying at 9pm when I'm on the couch and the thing I want to say is one sentence.

I wanted a front door to my AI that I could reach from anywhere, hands-free, without turning my whole setup into a public web app I'd have to secure like one.

What it feels like now

I pull up a page on my phone. I talk. My home server transcribes the audio and routes what I said to the right agent session, the agent does the work, and the response comes back on the screen as text, plus files if there are any. I can be holding groceries.

The shape of it

Nothing exotic. Five pieces:

[ Browser (phone/laptop) ]  — captures my voice
        |   (over my private network only)
        v
[ Whisper (self-hosted) ]   — speech -> text
        v
[ Session router ]          — sends it to the right agent session
        v
[ My AI agent sessions ]    — do the work
        v
[ Browser UI + outbox ]     — reply comes back, with files

The interesting decisions aren't in the boxes. They're in the seams.
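
One way to picture the seams: each box is a function with a narrow signature, and the pipeline is just their composition. A minimal Python sketch, not the production code; every name here (Reply, handle_utterance, and the three callables) is made up for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reply:
    """What goes back to the browser: text plus any file paths (hypothetical shape)."""
    text: str
    files: list[str] = field(default_factory=list)


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],      # seam 1: speech -> text
    pick_session: Callable[[str], str],      # seam 2: text -> session name
    run_agent: Callable[[str, str], Reply],  # seam 3: (session, text) -> reply
) -> Reply:
    """Run one spoken request through the stages, in diagram order."""
    text = transcribe(audio)
    session = pick_session(text)
    return run_agent(session, text)
```

Passing the stages in as arguments keeps each seam explicit, so any one of them can be swapped out or faked in a test without touching the others.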

The clever part: the network is the auth

Here's what people expect: a web interface to something this powerful should be locked down behind a login, MFA, the works. Mine has none of that. You can't get to it at all — not a login page, nothing — unless your device is already on my private network.

I put the whole thing on a Tailscale tailnet, a private mesh network between my devices. The interface only listens on that network. So the "authentication" isn't a password the server checks; it's the fact that there is no route to the server at all unless you're already a trusted device on my network. Tailscale still authenticates each device when it joins; the web app just never has to. The perimeter is the auth.
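
A minimal sketch of a guard for "only listens there", not the production code. It assumes Tailscale's default address ranges (100.64.0.0/10 for IPv4 and fd7a:115c:a1e0::/48 for IPv6); the function names are hypothetical. Note that 100.64.0.0/10 is the shared carrier-grade NAT range, so this is a sanity check against accidentally binding to 0.0.0.0 or a LAN address, not proof that an address belongs to a tailnet:

```python
import ipaddress

# Tailscale's default address ranges (assumed; a tailnet can be configured differently).
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")
TAILNET_V6 = ipaddress.ip_network("fd7a:115c:a1e0::/48")


def is_tailnet_address(addr: str) -> bool:
    """True if addr falls inside the assumed Tailscale ranges."""
    ip = ipaddress.ip_address(addr)
    net = TAILNET_V4 if ip.version == 4 else TAILNET_V6
    return ip in net


def checked_bind_host(addr: str) -> str:
    """Return addr unchanged if it looks like a tailnet address, else raise."""
    if not is_tailnet_address(addr):
        raise ValueError(f"refusing to bind to non-tailnet address {addr}")
    return addr
```

The returned host would then be handed to whatever server you run, for example `http.server.ThreadingHTTPServer((checked_bind_host(host), port), handler)`, where `host` is the address printed by `tailscale ip -4` on the server.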

The honest tradeoff: this is fantastic for personal and small-team infrastructure. It's simpler than a login on a public URL and, for this use, arguably more secure, because there's no public attack surface to defend. The flip side is that every device allowed onto the network becomes a key, so a lost or compromised phone matters. It is not how you'd ship this to strangers; the moment you want arbitrary people to use it, you need real auth, because you can't put them all on your private network. For me, running my own stack, "you have to be on my network" is exactly the security boundary I want, and it costs me zero passwords.

Two more decisions worth stealing

I run the speech-to-text myself. The transcription happens on my own hardware (self-hosted Whisper), not a cloud API. Privacy, no per-call metering on my own voice, and it fits the thing I actually care about: owning my AI stack instead of renting it. Your voice is data too.

Voice is for steering, not chatting. I route what I say into persistent, named agent sessions — not a stateless chatbox. That matters more than it sounds. The whole point of talking to an agent hands-free is to nudge long-running work — "check on that," "no, the other approach," "ship it" — and steering needs durable context to steer within. Sessions give me that.
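
A minimal sketch of routing speech into named sessions, not the production router. The class names and the convention that a leading session name switches sessions are assumptions for illustration, and the history here lives only in memory:

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A named agent session with the context that steering relies on."""
    name: str
    history: list[str] = field(default_factory=list)


class SessionRouter:
    def __init__(self, default: str) -> None:
        self.sessions: dict[str, Session] = {default: Session(default)}
        self.current = default

    def open(self, name: str) -> Session:
        """Create the session if needed and return it."""
        return self.sessions.setdefault(name, Session(name))

    def route(self, utterance: str) -> Session:
        """Append the utterance to a session and return that session.

        If the first word names a known session, switch to it first;
        otherwise the utterance goes to whichever session is current.
        """
        words = utterance.strip().split(maxsplit=1)
        if len(words) == 2:
            first = words[0].strip(",.:").lower()
            if first in self.sessions:
                self.current = first
                utterance = words[1]
        session = self.sessions[self.current]
        session.history.append(utterance.strip())
        return session
```

With this shape, "deploy, check on that" switches to the deploy session, and a follow-up like "no, the other approach" lands in the same session without naming it again.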

The part that was harder than I expected

[ BEN: verify / replace with the real story. Candidate: keeping the browser view and the backend session state in sync — reconnects, a phone that sleeps mid-task, a session that moved on while the UI was stale. The one honest scar is what makes the whole post credible. ]

The takeaway

You can build a private front door to your own AI without turning it into a public app. Put it on a private network, self-host the pieces you care about owning, and treat the network as your auth boundary. The pattern generalizes to anything you run for yourself and want to reach from your phone.

This is one piece of a private AI stack I've been building for my own day-to-day — and I've started packaging pieces of it as small, self-hostable tools you can own outright (open on purchase), rather than one more thing to rent. More teardowns coming: next up is the network layer that makes this safe — how the tailnet and DNS actually fit together.

Want the architecture template — or the next teardown?

Drop your email and I'll send the sanitized template and the next post when it lands.

Email me to follow along