The standard pitch for cloud-first IoT goes like this: sensors push data to the cloud, the cloud runs your analytics, the cloud sends you an alert. It sounds clean. The problem is the word "sends." Between when a sensor detects something wrong and when a technician receives a notification, a chain of events has to complete — and in facility monitoring, the length of that chain is not a performance optimization opportunity. It is a safety consideration.

I work on Meshkindle's systems architecture, and latency is the question I think about more than almost anything else. Here is a breakdown of the actual numbers, where they come from, and why moving inference to the edge changes the calculus for high-stakes monitoring scenarios.

The Cloud-First Latency Chain

A cloud-first IoT architecture passes data through several stages between sensor and alert. Each stage adds latency. The numbers I am citing here are not worst-case estimates — they are the realistic ranges we observe in production deployments across different cellular, Wi-Fi, and LPWAN uplink configurations.

| Stage | Typical latency | Notes |
|---|---|---|
| Sensor to edge gateway (RF) | 50–300 ms | 802.15.4 mesh hop; varies with hop count and traffic |
| Edge gateway to cloud (WAN) | 100 ms – 8 min | Varies dramatically with uplink type; cellular in remote areas, rate-limited LPWAN |
| Cloud ingestion and normalization | 1–30 s | Batching, queue depth, autoscale lag |
| Anomaly detection model inference | 500 ms – 5 min | Batch inference schedules vs. streaming pipelines |
| Alert generation and notification delivery | 10 s – 3 min | Push notification delivery, email queues, SMS gateway |

Add those stages and the table's ranges sum to roughly 12 seconds if every stage hits its lower bound and about 16.5 minutes if every stage hits its upper bound. The 8–25 minute figure cited in our product documentation reflects realistic field conditions, where individual stages can run past those typical ranges: a cloud-first deployment relying on a cellular uplink in an industrial building with RF interference, with batch inference running on a 5-minute schedule, delivering alerts via email.
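As a sanity check on the table, a straight sum of the per-stage bounds can be computed as below. This is a minimal sketch: the stage keys are labels invented for illustration, and the values are transcribed from the table. A straight sum assumes every stage hits the same bound at once, so it brackets the typical ranges rather than reproducing any particular field measurement.

```python
# Stage bounds in seconds, transcribed from the latency table.
# The dictionary keys are illustrative labels, not product identifiers.
STAGES = {
    "sensor_to_gateway_rf": (0.05, 0.3),
    "gateway_to_cloud_wan": (0.1, 8 * 60),
    "cloud_ingestion": (1.0, 30.0),
    "model_inference": (0.5, 5 * 60),
    "alert_delivery": (10.0, 3 * 60),
}


def chain_bounds(stages):
    """Return (best_case, worst_case) end-to-end latency in seconds."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = chain_bounds(STAGES)
print(f"all lower bounds: {best:.2f} s")
print(f"all upper bounds: {worst / 60:.1f} min")
```

In the upper-bound case, the WAN uplink and the batch inference schedule together account for 780 of the roughly 990 seconds, which is why those two stages are the ones edge inference targets.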

For the vast majority of facility monitoring use cases — trend analysis, energy reporting, maintenance scheduling — that latency is completely acceptable. For a server room water leak, a cold-storage temperature excursion, or a smoke pre-condition, it is not.

What Edge Inference Actually Means

Edge inference means running the anomaly detection model on the gateway node itself, before any data leaves the facility. The sensor reading arrives at the node, the model evaluates it against the learned baseline, and if the reading falls outside the normal envelope, the alert is generated locally. The cloud gets notified, but the notification does not wait for the cloud to make the decision.
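The ordering described above (decide locally, then inform the cloud) can be sketched as follows. This is an illustration under stated assumptions, not our gateway firmware: `Envelope`, `handle_reading`, and the two callbacks are hypothetical names, and a simple low/high range stands in for the real anomaly model.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Learned normal range for one asset (hypothetical stand-in for the model)."""
    low: float
    high: float


def handle_reading(value, envelope, raise_local_alert, notify_cloud):
    """Decide on the gateway; the cloud is informed but never consulted."""
    anomalous = not (envelope.low <= value <= envelope.high)
    if anomalous:
        raise_local_alert(value)       # fires immediately, on-site
    notify_cloud(value, anomalous)     # best-effort, after the decision is made
    return anomalous


# Illustrative run: a cold-storage range of 2-8 degrees C (made-up values).
alerts, cloud_log = [], []
envelope = Envelope(low=2.0, high=8.0)
for reading in (5.1, 11.4):
    handle_reading(reading, envelope, alerts.append,
                   lambda v, a: cloud_log.append((v, a)))
```

The point of the structure is that `notify_cloud` sits after the alert call, so a slow or failing uplink cannot delay the local decision.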

In practice, this collapses the latency chain to two stages: sensor to edge gateway (50–300 ms) and local inference (under 50 ms for the quantized models we run on our gateway hardware). Alert generation follows immediately. Push notification delivery to a technician's mobile app typically adds another 5–15 seconds. Total wall-clock time from threshold breach to technician notification: under 90 seconds in standard deployment conditions.

That is not a minor improvement over cloud-first. It is a 5–15x reduction in detection-to-notification latency in typical conditions, and a much larger gap in scenarios where WAN connectivity is degraded.
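The arithmetic behind that comparison can be reproduced with a few lines. A minimal sketch, assuming the edge-path figures quoted above and the 8–25 minute field figure for cloud-first; the stage labels are illustrative, and the zero lower bound for local inference is an assumption, since the text only gives "under 50 ms".

```python
# Edge-path stage bounds in seconds, from the text above.
EDGE_STAGES = {
    "sensor_to_gateway_rf": (0.05, 0.3),
    "local_inference": (0.0, 0.05),    # "under 50 ms"; the 0.0 floor is an assumption
    "push_notification": (5.0, 15.0),
}
EDGE_CEILING_S = 90.0                  # the "under 90 seconds" total
CLOUD_FIELD_S = (8 * 60.0, 25 * 60.0)  # the 8-25 minute field figure

edge_worst = sum(high for _, high in EDGE_STAGES.values())
ratio_low, ratio_high = (c / EDGE_CEILING_S for c in CLOUD_FIELD_S)

print(f"edge stages, upper bounds summed: {edge_worst:.2f} s")
print(f"cloud-first vs. 90 s ceiling: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

Measured against the 90-second ceiling, the ratio lands close to the 5–15x quoted above. Measured against the summed edge stages, which come to roughly 15 seconds, the gap would be larger, so the 90-second figure is the conservative basis for the comparison.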

The WAN-Independent Operation Case

There is a failure scenario in cloud-first deployments that does not get enough attention: a WAN outage during an active facility incident. Internet connectivity in industrial facilities is not uniformly reliable. Power events, router failures, ISP outages, and cellular coverage gaps all happen. In a cloud-first architecture, a WAN outage does not just degrade your monitoring capability; it eliminates it entirely. Worse, it does so during exactly the type of event most likely to take the WAN down: a power failure, which is also the event most likely to accompany a fire suppression discharge or an HVAC cascade fault.

Edge inference decouples anomaly detection from WAN availability. The gateway node continues running the detection model against incoming sensor data whether or not it has a cloud connection. Local alerts fire. The on-site notification system activates. The facility team gets notified through the local mesh network — using on-site LoRaWAN radio or direct Ethernet to an on-premises alert panel — independent of whether the internet is up.
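One common way to get this behavior is store-and-forward: the local alert fires unconditionally, while the cloud copy is queued until the uplink returns. The sketch below illustrates that pattern under assumptions; `CloudUplink`, `on_anomaly`, and the simulated `send_to_cloud` are hypothetical names, not our gateway's actual interfaces.

```python
from collections import deque


class CloudUplink:
    """Buffers cloud notifications while the WAN is down (hypothetical helper)."""

    def __init__(self, send, max_buffered=1000):
        self._send = send  # callable that raises ConnectionError when the WAN is down
        self._pending = deque(maxlen=max_buffered)  # when full, oldest events drop first

    def notify(self, event):
        self._pending.append(event)
        self.flush()

    def flush(self):
        """Try to drain the backlog in order; return True if it is now empty."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # WAN still down; keep events queued
            self._pending.popleft()
        return True


def on_anomaly(event, local_alert, uplink):
    local_alert(event)    # on-site panel or radio: never depends on the WAN
    uplink.notify(event)  # cloud copy: delivered now or after reconnect


# Simulated outage followed by reconnect.
wan = {"up": False}
delivered, panel = [], []


def send_to_cloud(event):
    if not wan["up"]:
        raise ConnectionError("WAN down")
    delivered.append(event)


uplink = CloudUplink(send_to_cloud)
on_anomaly("leak:server-room", panel.append, uplink)  # local alert fires despite outage
delivered_during_outage = list(delivered)
wan["up"] = True
uplink.flush()  # backlog drains once the link is back
```

The bounded queue is a deliberate choice for a memory-constrained gateway: during a long outage it sacrifices the oldest cloud records rather than the node's stability, and the local alert path is unaffected either way.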

We designed this architecture specifically because Felix had watched a $2.1M HVAC failure unfold over 72 hours at a life sciences campus, where a sensor had been offline for three weeks and nobody knew because it reported to a cloud dashboard nobody checked consistently. The cloud-dependent monitoring model has a systemic vulnerability: when the connection to the cloud is unreliable, the monitoring system becomes unreliable. Edge inference eliminates that dependency for time-critical detection.

Model Accuracy at the Edge: The Trade-Off

Running inference on a gateway node means running a constrained model. Our gateway hardware uses an ARM Cortex-M7 processor with 1 MB of on-chip SRAM — enough to run a quantized anomaly detection model, but not enough to run the same model you would deploy on a cloud inference cluster with 16 GB of RAM. That trade-off is real and worth being direct about.

The models we deploy at the edge are purpose-built for their detection task: temperature anomaly, vibration signature deviation, humidity spike, pressure differential. Each model is trained on baseline data from the specific asset it monitors, then quantized to 8-bit integer weights for edge deployment. The quantization introduces a small accuracy reduction — roughly 1–2% in F1 score compared to the full-precision cloud model in our testing. In exchange, inference time drops from 500+ milliseconds in a cloud pipeline to under 50 milliseconds on the node.
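To make the quantization step concrete, here is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python. The scheme is an assumption chosen for illustration; the text above only says weights are quantized to 8-bit integers, not which method is used. The weight values are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: float weights -> (int8 values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.823, -0.411, 0.057, -1.27, 0.334]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"scale={scale:.4f}, max round-trip error={max_error:.4f}")
```

Each weight shrinks from four bytes (float32) to one, and the round-trip error per weight is bounded by half the scale. That bounded rounding error is the source of the small accuracy loss; how much it costs in F1 depends on the model and has to be measured, as in the testing described above.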

For the detection tasks that matter most in facility monitoring, the edge model accuracy is sufficient. We are not running a general-purpose anomaly detector across heterogeneous data streams. We are running a narrow, calibrated model against a single asset's telemetry stream, comparing it to a baseline learned over the first 14 days of deployment. That narrow task suits edge computation well.

Cloud Inference Still Matters — Just Not for Alerts

Edge inference does not mean the cloud becomes irrelevant. The right architecture uses both: edge for time-critical detection, cloud for everything that benefits from longer time horizons and richer data.

Trend analysis — understanding whether a compressor's vibration pattern is drifting gradually over 30 days — requires the full historical record and a more sophisticated model than an edge node can run. Root cause correlation across multiple assets — identifying that three HVAC units in the same zone are all showing abnormal behavior simultaneously — requires a view of the whole facility's data, not a single node's stream. Energy consumption optimization requires integrating occupancy patterns, weather data, and tariff schedules over weeks-long windows.

All of that belongs in the cloud. The division of labor in a well-designed facility IoT architecture is: edge inference handles the sub-90-second detection window where latency is a safety consideration; cloud inference handles the hours-to-weeks analysis window where breadth and historical depth matter.

Deployment Implications

Choosing an edge-inference architecture has hardware implications. Your gateway nodes need enough compute to run quantized models — that is not a given for every off-the-shelf IoT gateway. Many gateways are designed purely for data aggregation and forwarding; they collect, format, and upload, but they do not have the processor headroom or the firmware stack to run local inference.

It also has a baseline calibration requirement. An edge anomaly model needs a learned baseline for each asset it monitors. In our deployment process, that calibration period is 14 days: the node collects baseline readings under normal operating conditions, the model initializes against that distribution, and edge detection activates. That means a new deployment has a 14-day window before edge detection is fully active. Cloud inference can supplement during that window using a broader baseline from similar assets.
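The calibrate-then-activate sequence can be sketched as below. This is an illustration, not our deployed model: a running mean and standard deviation (Welford's algorithm) with a z-score test stands in for the anomaly model, `BaselineCalibrator` is a hypothetical name, and the `z_threshold` default is an assumed value. Only the 14-day window comes from the process described above.

```python
import math


class BaselineCalibrator:
    """Learns mean/std during calibration, then flags outliers (illustrative)."""

    def __init__(self, calibration_days=14, z_threshold=4.0):
        self.calibration_s = calibration_days * 86400
        self.z_threshold = z_threshold  # assumed value, not from the source
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0      # running sum of squared deviations (Welford)
        self._start = None

    def observe(self, timestamp_s, value):
        """Return None while calibrating, else True (anomalous) or False."""
        if self._start is None:
            self._start = timestamp_s
        if timestamp_s - self._start < self.calibration_s:
            # Welford's online update: one pass, no stored history.
            self.n += 1
            delta = value - self.mean
            self.mean += delta / self.n
            self._m2 += delta * (value - self.mean)
            return None
        if self.n < 2:
            return None  # not enough data to estimate spread
        std = math.sqrt(self._m2 / (self.n - 1))
        if std == 0:
            return value != self.mean
        return abs(value - self.mean) / std > self.z_threshold


# Hourly readings alternating 20.0 and 22.0 for 14 days, then two live readings.
cal = BaselineCalibrator()
during = [cal.observe(h * 3600, 20.0 if h % 2 == 0 else 22.0) for h in range(14 * 24)]
after_normal = cal.observe(14 * 86400, 21.5)
after_spike = cal.observe(14 * 86400 + 3600, 30.0)
```

Returning `None` during the window makes the "not yet active" state explicit to the caller, which is where a cloud-side baseline from similar assets could be consulted instead. The online update also matters on a gateway with limited SRAM, since no raw history needs to be stored.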

The latency difference between edge inference and cloud-first IoT is not a benchmark to optimize toward. It is a decision point: if a threshold breach in your facility could cause harm in under 25 minutes, cloud-first latency is a structural risk. For those environments, edge inference is not an optional enhancement. It is a requirement.