Voice & Realtime Engineer

Engineering · Remote · Worldwide · Full-time · $180k – $250k + equity

About the role

The core interaction in Gray is the interrupt loop: talk, watch it work, redirect mid-task. That loop lives or dies on latency. You will own the realtime pipeline — on-device transcription, streaming TTS, barge-in detection, and the websocket layer that carries it all.

What you will own

  • Own end-to-end voice latency, from mic open to first spoken syllable
  • Run transcription on-device and keep it accurate for operator vocabulary — hostnames, flags, paths
  • Build barge-in that works: detect intent to interrupt without false triggers
  • Design the streaming protocol between app, self-hosted box, and model providers
  • Instrument everything; latency regressions should page you before users notice

What you bring

  • Deep experience with realtime audio — WebRTC, audio codecs, VAD, or speech pipelines
  • Strong systems instincts; you think in milliseconds and buffers
  • Production experience with streaming LLM or speech APIs
  • Comfort working across mobile, server, and protocol layers

Nice to have

  • On-device ML experience (whisper.cpp, Core ML)
  • You've built a voice assistant, even a toy one
  • Contributions to realtime open source

What we offer

  • Meaningful equity. Every role carries a real stake in Layer Gray, with a 10-year exercise window.
  • Remote, worldwide. Work from anywhere. We hire for the role, not the time zone.
  • Hardware budget. $4,000 for your machines, plus a home server allowance — you should run Gray on your own box.
  • Flexible time off. 25 days minimum, and we mean minimum. We track outcomes, not hours.
  • Health covered. Full medical, dental, and vision for you and your dependents, wherever you are.
  • Two offsites a year. The whole crew, one room, twice a year. The rest of the time, async.
  • Model subscriptions. Claude, GPT, and friends — every frontier model subscription, paid.
  • Learning budget. $2,000 a year for books, courses, and conferences. No approval theatre.

About Gray

Gray is voice-first AI you operate like a terminal. You speak; it runs real work across your machines — SSH sessions, multi-agent jobs, files, scheduled tasks — then speaks back. It is self-hosted, private by architecture, and built for the people who run the internet’s plumbing. Gray is made by Layer Gray, Inc.