Why we run AR and AI on the device, not the cloud
Round trips kill real-time experiences and pile up cloud bills. Here is how Nosmai keeps frames on the phone, and why that is better for latency, privacy and your margins.
Every camera-first feature lives or dies by latency. The moment you send a frame to a server, classify it, and wait for a response, you have already lost the real-time feel, and you have signed up for a bill that scales with every user you add.
The round-trip tax
A cloud call from a mobile device rarely completes in under 200 ms once you account for the network, queueing and inference. At 60 fps, the budget is about 16.7 ms per frame (1000 ms / 60), so a single round trip spans roughly 12 frames. There is simply no room for a server in the loop.
- Latency that breaks the live feel of filters and effects
- Per-call costs that grow linearly with usage
- A pipeline that can fall over the moment the network does
- User content leaving the device, a compliance surface you now own
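To make the frame-budget arithmetic concrete, here is a minimal sketch in Python. The function names are illustrative only and are not part of any Nosmai API; the 200 ms figure is the round-trip estimate quoted above, not a measurement.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def frames_elapsed(round_trip_ms: float, fps: float) -> int:
    """Whole frame intervals that pass while waiting on one round trip."""
    # Multiply before dividing so exact inputs (200 ms, 60 fps) stay exact.
    return int(round_trip_ms * fps // 1000)


budget = frame_budget_ms(60)       # about 16.7 ms per frame
missed = frames_elapsed(200, 60)   # frames that pass during a 200 ms round trip
print(f"{budget:.1f} ms per frame; {missed} frames pass during a 200 ms round trip")
```

Even an optimistic 100 ms round trip would still span several frames at 60 fps, which is why the server cannot sit in the per-frame path.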
On-device by default

Nosmai Effects and Moderation run entirely on the device GPU. Frames never leave the phone, so there is nothing to upload, nothing to store, and nothing to bill per call. You get sub-8ms processing and a feature that works on a plane.
Once you stop shipping pixels to a server, latency and privacy stop being trade-offs. They become defaults. — Maya Chen
When the edge makes sense
Some workloads genuinely need a server. For those, keep the footprint thin: a secure proxy that holds your keys and handles routing, caching and failover, so your secrets never ship in the app binary. The rule of thumb: keep perception on-device, and leave the device only when you truly have to.
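As a rough illustration of the thin-proxy idea, the sketch below keeps the API key on the server, answers repeat requests from a cache, and falls back to the next upstream when one fails. Everything here is hypothetical: `ThinProxy`, the upstream callables and the payload strings are invented for the example and do not describe Nosmai's implementation. A production proxy would also need authentication, cache expiry and timeouts.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical upstream signature: (api_key, payload) -> response text.
Upstream = Callable[[str, str], str]


class ThinProxy:
    def __init__(self, api_key: str, upstreams: List[Upstream]) -> None:
        self._api_key = api_key        # held server-side, never in the app binary
        self._upstreams = upstreams    # tried in order; later entries are fallbacks
        self._cache: Dict[str, str] = {}  # unbounded, no expiry: a simplification

    def handle(self, payload: str) -> str:
        # Caching: identical requests are answered without calling upstream.
        if payload in self._cache:
            return self._cache[payload]
        last_error: Optional[Exception] = None
        # Failover: move to the next upstream when one raises.
        for upstream in self._upstreams:
            try:
                result = upstream(self._api_key, payload)
            except Exception as exc:
                last_error = exc
                continue
            self._cache[payload] = result
            return result
        raise RuntimeError("all upstreams failed") from last_error


# Usage with two stand-in upstreams: the primary is down, the backup works.
backup_calls: List[str] = []


def primary(api_key: str, payload: str) -> str:
    raise ConnectionError("primary unavailable")


def backup(api_key: str, payload: str) -> str:
    backup_calls.append(payload)
    return f"ok:{payload}"


proxy = ThinProxy(api_key="server-side-secret", upstreams=[primary, backup])
first = proxy.handle("caption-request")    # served by the backup upstream
second = proxy.handle("caption-request")   # served from the cache
```

The client only ever talks to the proxy, so rotating a key or swapping an upstream does not require shipping a new app build.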
That single decision (on-device first, edge only when required) is what lets a small team ship camera AI that feels instant and stays affordable as it grows.