Azure IoT data pipeline: devices → IoT Hub → Event Hubs → Functions → Time Series Insights → React dashboard
Connecting 10,000 devices directly to a database is not a pipeline — it is a bottleneck. A production IoT data pipeline must handle millions of messages per day, survive outages without data loss, and serve real-time and historical queries simultaneously.
Stage 1: Ingest → Azure IoT Hub (MQTT/HTTPS)
Stage 2: Route → Message routing to endpoints by type
Stage 3: Process → Azure Functions (Event Hubs trigger)
Stage 4: Serve → Time Series Insights + Cosmos DB
const { app } = require("@azure/functions");

// `tsi` and `cosmos` are placeholder client wrappers for Time Series
// Insights ingestion and Cosmos DB — wire them to your own SDK clients.
app.eventHub("processTelemetry", {
    connection: "EventHubsConn",
    eventHubName: "telemetry",
    cardinality: "many", // receive events in batches for throughput
    handler: async (events, context) => {
        for (const e of events) {
            await tsi.write(e.deviceId, e.timestamp, e.readings);
            await cosmos.upsert({ id: e.deviceId, lastSeen: e.timestamp });
        }
    },
});
Event Hubs uses partitions to achieve throughput: each partition is an independent, ordered stream, and the consumers within a consumer group read partitions in parallel, so more partitions means more parallel processing. A common mistake is under-partitioning: starting with 4 partitions and discovering that your Azure Functions cannot scale to process 100,000 messages per second when a firmware bug causes every device to send error events simultaneously.
Partition count cannot be changed after creation. Provision for 2–5x your expected peak throughput. Use the device ID as the partition key — this ensures all messages from a single device are ordered within a partition, which simplifies time-series reconstruction and state management in your Functions.
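The ordering property can be illustrated with a small sketch. Event Hubs hashes partition keys internally with its own algorithm; the FNV-1a hash below is purely illustrative, but it demonstrates the guarantee that matters: the same device ID always maps to the same partition, so that device's messages stay in order.

```javascript
// Illustrative only — Event Hubs uses its own internal hash. This sketch
// shows why a stable partition key gives per-device ordering: hashing is
// deterministic, so one device ID always lands on one partition.
function partitionFor(deviceId, partitionCount) {
  let hash = 2166136261; // FNV-1a offset basis
  for (const ch of deviceId) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 16777619) >>> 0; // FNV prime, kept unsigned
  }
  return hash % partitionCount;
}

const p1 = partitionFor("sensor-0042", 32);
const p2 = partitionFor("sensor-0042", 32);
console.log(p1 === p2); // always true: same device, same partition
```

Because the mapping is deterministic, scaling out consumers never splits one device's stream across readers.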
When a cloud outage ends, every device tries to reconnect simultaneously. At 10,000 devices, this creates a thundering herd: millions of buffered messages hit Event Hubs in seconds. Your processing Functions must handle this gracefully — implement exponential backoff with jitter in device firmware, and configure Event Hubs with sufficient throughput units and retention to absorb the burst.
// Reconnect with exponential backoff plus random jitter, so thousands
// of devices do not all retry at the same instant after an outage.
int backoff_ms = BASE_BACKOFF_MS;
while (!connected) {
    int jitter = random(0, backoff_ms / 2);         // desynchronise retries
    vTaskDelay(pdMS_TO_TICKS(backoff_ms + jitter)); // wait before retrying
    backoff_ms = MIN(backoff_ms * 2, MAX_BACKOFF_MS); // double, up to a cap
    connected = mqtt_connect();
}
IoT data pipelines typically have two paths: the hot path for real-time processing (dashboards, alerts, anomaly detection) and the cold path for historical storage and batch analytics. The hot path prioritises latency — Azure Functions processing Event Hub events and pushing to SignalR for live dashboards. The cold path prioritises completeness — all raw messages written to Azure Blob Storage in Parquet format for later analysis with Azure Synapse or Databricks.
Designing both paths from day one avoids the common problem of discarding raw data that turns out to be valuable later. Storage is cheap; reconstructing historical sensor data that was never stored is impossible. Our standard pipeline writes all raw device messages to cold storage and a processed subset to the warm tier — you can always re-process the raw data, but you cannot un-lose it.
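The fan-out logic itself can be sketched in a few lines. In this sketch, every event goes to the cold path unfiltered, while only events worth pushing to a live dashboard take the hot path; the `readings.temperature` field and the threshold are illustrative, not part of any real schema.

```javascript
// Sketch of hot/cold fan-out: the cold path keeps every raw event,
// the hot path gets only events that warrant real-time attention.
// The `readings.temperature` field and threshold are illustrative.
function routeEvents(events, alertThreshold) {
  const cold = events; // cold path: all raw messages, nothing discarded
  const hot = events.filter((e) => e.readings.temperature > alertThreshold);
  return { hot, cold };
}

const batch = [
  { deviceId: "d1", readings: { temperature: 21 } },
  { deviceId: "d2", readings: { temperature: 95 } },
];
const { hot, cold } = routeEvents(batch, 80);
// cold retains both events; only d2 crosses the threshold and goes hot
```

In a real Function the cold array would be appended to Blob Storage and the hot array pushed to SignalR, but the principle is the same: filter for the hot path, never for the cold one.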
FSS Technology is a full-stack IoT engineering team with hardware, firmware, cloud, and mobile under one roof, designing and building IoT products from silicon to cloud: embedded firmware, custom hardware, and Azure backends.
Talk to our team →