What this project demonstrates
- Real voice assistant on Apple Watch with streamed audio over WatchConnectivity
- Phone-side OpenAI WebSocket bridge that works around watchOS networking limits
- Fast Mode on gpt-realtime with semantic VAD for true streaming speech-to-speech
- Think Mode pipeline on gpt-5.5 with configurable reasoning and OpenAI web_search
- HKWorkoutSession used purely as a process-lifetime workaround, no health data touched
- AOD-aware watch UI with phase-colored glow, audio-reactive halo, and reachability state
WatchGPT is proof that a hand-built native pipeline can put a credible voice assistant on the wrist where Siri still cannot.
WatchGPT puts ChatGPT on your wrist as a real voice assistant. Tap the main button on the Apple Watch, talk, and the answer streams back through the watch speaker. No phone screen required, no waking up to read a transcript, no pretending Siri is enough.
The architecture exists because watchOS will not let you do the obvious thing. The watch cannot reliably open arbitrary outbound WebSockets, even though HTTPS data tasks succeed. URLSessionWebSocketTask, NWConnection with NWProtocolWebSocket, and waitsForConnectivity all fail in their own ways. The fix is a paired iPhone companion that holds the OpenAI WebSocket and relays 24 kHz PCM audio over WatchConnectivity, while the watch handles the mic, speaker, and UI. Both apps must be running and reachable for a session.
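The watch-side half of that relay can be sketched in a few lines. This is an illustrative sketch, not the project's actual code: the `AudioRelay` type and its method names are hypothetical, but the `WCSession` calls shown are the real WatchConnectivity API the description implies.

```swift
import WatchConnectivity

// Hypothetical sketch of the watch-side relay: mic audio captured as
// 24 kHz mono PCM16 is shipped to the paired iPhone in small chunks.
// The phone, not the watch, owns the OpenAI WebSocket.
final class AudioRelay: NSObject, WCSessionDelegate {
    private let session = WCSession.default

    func activate() {
        guard WCSession.isSupported() else { return }
        session.delegate = self
        session.activate()
    }

    /// Forward one PCM16 chunk to the phone for the WebSocket bridge.
    func send(pcmChunk: Data) {
        guard session.isReachable else { return }  // both apps must be live
        session.sendMessageData(pcmChunk, replyHandler: nil) { error in
            print("relay failed: \(error)")
        }
    }

    // MARK: WCSessionDelegate
    func session(_ session: WCSession,
                 activationDidCompleteWith activationState: WCSessionActivationState,
                 error: Error?) {}
}
```

The `isReachable` guard is the code-level face of the constraint above: if either app is backgrounded or the pair is out of range, chunks are dropped rather than queued, because realtime audio that arrives late is useless.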
There are two assistant modes. Fast Mode runs OpenAI gpt-realtime with semantic VAD and true streaming speech-to-speech, delivering conversational latency that feels like talking to a person. Think Mode runs a slower turn-based pipeline against gpt-5.5 with configurable reasoning effort, for the cases where a smarter, more deliberate model matters more than feel. Both support web search: Think Mode through OpenAI's built-in web_search tool server-side, and Fast Mode through a Brave Search function-tool callout if you provide a free Brave key.
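For Fast Mode, the semantic VAD and the Brave callout both live in the Realtime session configuration. The fragment below is a plausible shape of the `session.update` event, assuming the documented Realtime API field names; the `brave_search` tool name and its schema are this project's own callout and are illustrative, not an OpenAI built-in.

```json
{
  "type": "session.update",
  "session": {
    "turn_detection": { "type": "semantic_vad" },
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "tools": [
      {
        "type": "function",
        "name": "brave_search",
        "description": "Search the web via Brave Search",
        "parameters": {
          "type": "object",
          "properties": { "query": { "type": "string" } },
          "required": ["query"]
        }
      }
    ]
  }
}
```

With semantic VAD the model, not a fixed silence timer, decides when the user has finished a thought, which is what makes the half-duplex turn-taking feel natural; when the model emits a `brave_search` function call, the phone runs the HTTP search and returns the results as the tool output.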
The watch UI is built around a rounded-square voice button with a phase-colored glow and an audio-reactive halo, paired with an iPhone reachability pill and polished transcript bubbles. The app is AOD-aware: when the screen dims on a wrist drop, the button switches to a calm grayscale look so stale phase colors do not mislead. Hands-free is the default, push-to-talk is one toggle away, voice barge-in is off by default for cleaner half-duplex with tap-to-interrupt, and 38 languages can be locked in or left on Auto. Mic sensitivity has High, Standard, and Low presets for noisy rooms.
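The AOD grayscale fallback maps directly onto a SwiftUI environment value. A minimal sketch, assuming SwiftUI's `isLuminanceReduced` flag on watchOS; `VoiceButton` and `phaseColor` are illustrative names, not the project's actual view:

```swift
import SwiftUI

// Sketch of the AOD-aware behavior: when the watch face dims on a wrist
// drop, SwiftUI flips \.isLuminanceReduced, and the button swaps its
// phase color for a calm grayscale so stale state cannot mislead.
struct VoiceButton: View {
    @Environment(\.isLuminanceReduced) private var isDimmed
    var phaseColor: Color  // e.g. listening / thinking / speaking

    var body: some View {
        RoundedRectangle(cornerRadius: 24)
            .fill(isDimmed ? Color.gray.opacity(0.3) : phaseColor)
            .overlay(
                Image(systemName: "mic.fill")
                    .foregroundStyle(isDimmed ? Color.gray : Color.white)
            )
            .frame(width: 120, height: 120)
    }
}
```

Driving the dimmed state from the environment rather than from app phase means the grayscale look appears the instant the system dims the display, with no timer or notification plumbing.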
Process lifetime is the other quiet engineering battle. watchOS aggressively reaps third-party apps within tens of seconds of a wrist drop, even with the audio background mode. WatchGPT starts an HKWorkoutSession of activity type mindAndBody purely as a lifetime crutch, with no health data read, written, or shared.

The iPhone companion holds full transcript history with auto-generated titles and summaries, an iMessage-style chat-bubble detail view, copy and share actions, swipe-to-delete, and a collapsible diagnostics panel exposing mode, turn-taking, last OpenAI event, reconnect counts, mic peak, and watch chunk telemetry. The OpenAI API key never touches the watch.
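The HKWorkoutSession keep-alive described above amounts to very little code. A hedged sketch with illustrative names (`KeepAlive` is not the project's actual type); the HealthKit calls are the real watchOS API, and note that no workout builder is attached, so nothing is collected or saved:

```swift
import HealthKit

// Sketch of the lifetime crutch: a .mindAndBody workout session is
// started purely so watchOS keeps the process alive across wrist drops.
// No HKWorkoutBuilder, no samples: no health data is read or written.
final class KeepAlive {
    private let store = HKHealthStore()
    private var session: HKWorkoutSession?

    func start() {
        let config = HKWorkoutConfiguration()
        config.activityType = .mindAndBody
        config.locationType = .indoor
        do {
            let session = try HKWorkoutSession(healthStore: store,
                                               configuration: config)
            session.startActivity(with: Date())
            self.session = session
        } catch {
            print("workout keep-alive failed: \(error)")
        }
    }

    func stop() {
        session?.end()
        session = nil
    }
}
```

Ending the session when the assistant session ends matters: leaving a phantom workout running would burn battery and leave the green workout indicator on the watch face.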
This is a personal sideload project on purpose. Bring-your-own-API-key is not end-user friendly, the workout keep-alive workaround would not survive App Review, and the project treads on Siri territory. So the answer is the GitHub repo, your own Apple ID, and a free personal team. Hacker-friendly by design.
What it does
Fast Mode + Think Mode
Fast Mode runs gpt-realtime with semantic VAD for streaming speech-to-speech. Think Mode runs gpt-5.5 turn-based with configurable reasoning effort.
Web Search In Both Modes
Think Mode uses OpenAI's built-in web_search tool server-side. Fast Mode does a function-tool callout to Brave Search when you provide a free key.
Phone-Bridged Architecture
iPhone companion holds the OpenAI WebSocket and relays 24 kHz PCM audio over WatchConnectivity, working around watchOS networking limits.
Premium Watch UI
Rounded-square voice button, phase-colored glow, audio-reactive halo, reachability pill, AOD-aware grayscale fallback, polished transcript bubbles.
38 Languages, Mic Presets
Lock the assistant and transcription model to one language or leave on Auto. High, Standard, and Low mic sensitivity for quiet, default, and noisy rooms.
Transcript History + Diagnostics
Every session is saved on the iPhone with auto-generated title and summary, swipe-to-delete, copy and share, and a live diagnostics panel.
Built with
- Swift
- SwiftUI
- watchOS
- WatchConnectivity
- OpenAI Realtime API
- OpenAI Codex
More from the lab
A few related projects that share the same design sensibility and build approach.
