Azure IoT data pipeline: devices → IoT Hub → Event Hubs → Functions → Time Series Insights → React dashboard
Connecting 10,000 devices directly to a database is not a pipeline — it is a bottleneck. A production IoT data pipeline must handle millions of messages per day, survive outages without data loss, and serve real-time and historical queries simultaneously.
Stage 1: Ingest → Azure IoT Hub (MQTT/HTTPS)
Stage 2: Route → Message routing to endpoints by type
Stage 3: Process → Azure Functions (Event Hubs trigger)
Stage 4: Serve → Time Series Insights + Cosmos DB
const { app } = require("@azure/functions");

// `tsi` and `cosmos` are placeholder client wrappers for Time Series
// Insights ingestion and Cosmos DB — wire them to your own SDK clients.
app.eventHub("processTelemetry", {
    connection: "EventHubsConn",
    eventHubName: "telemetry",
    cardinality: "many", // receive events in batches for throughput
    handler: async (events, context) => {
        for (const e of events) {
            await tsi.write(e.deviceId, e.timestamp, e.readings);
            await cosmos.upsert({ id: e.deviceId, lastSeen: e.timestamp });
        }
    },
});
Event Hubs uses partitions to achieve throughput: each partition is an independent, ordered stream, and the consumers within a consumer group read partitions in parallel, so more partitions means more parallel processing. A common mistake is under-partitioning: starting with 4 partitions and discovering that your Azure Functions cannot scale to process 100,000 messages per second when a firmware bug causes every device to send error events simultaneously.
Partition count cannot be changed after creation. Provision for 2–5x your expected peak throughput. Use the device ID as the partition key — this ensures all messages from a single device are ordered within a partition, which simplifies time-series reconstruction and state management in your Functions.
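The ordering property can be illustrated with a small sketch. Event Hubs hashes partition keys internally with its own algorithm; the FNV-1a hash below is purely illustrative, but it demonstrates the guarantee that matters: the same device ID always maps to the same partition, so that device's messages stay in order.

```javascript
// Illustrative only — Event Hubs uses its own internal hash. This sketch
// shows why a stable partition key gives per-device ordering: hashing is
// deterministic, so one device ID always lands on one partition.
function partitionFor(deviceId, partitionCount) {
  let hash = 2166136261; // FNV-1a offset basis
  for (const ch of deviceId) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 16777619) >>> 0; // FNV prime, kept unsigned
  }
  return hash % partitionCount;
}

const p1 = partitionFor("sensor-0042", 32);
const p2 = partitionFor("sensor-0042", 32);
console.log(p1 === p2); // always true: same device, same partition
```

Because the mapping is deterministic, scaling out consumers never splits one device's stream across readers.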
When a cloud outage ends, every device tries to reconnect simultaneously. At 10,000 devices, this creates a thundering herd: millions of buffered messages hit Event Hubs in seconds. Your processing Functions must handle this gracefully — implement exponential backoff with jitter in device firmware, and configure Event Hubs with sufficient throughput units and retention to absorb the burst.
// Reconnect with exponential backoff plus random jitter, so thousands
// of devices do not all retry at the same instant after an outage.
int backoff_ms = BASE_BACKOFF_MS;
while (!connected) {
    int jitter = random(0, backoff_ms / 2);         // desynchronise retries
    vTaskDelay(pdMS_TO_TICKS(backoff_ms + jitter)); // wait before retrying
    backoff_ms = MIN(backoff_ms * 2, MAX_BACKOFF_MS); // double, up to a cap
    connected = mqtt_connect();
}
IoT data pipelines typically have two paths: the hot path for real-time processing (dashboards, alerts, anomaly detection) and the cold path for historical storage and batch analytics. The hot path prioritises latency — Azure Functions processing Event Hub events and pushing to SignalR for live dashboards. The cold path prioritises completeness — all raw messages written to Azure Blob Storage in Parquet format for later analysis with Azure Synapse or Databricks.
Designing both paths from day one avoids the common problem of discarding raw data that turns out to be valuable later. Storage is cheap; reconstructing historical sensor data that was never stored is impossible. Our standard pipeline writes all raw device messages to cold storage and a processed subset to the warm tier — you can always re-process the raw data, but you cannot un-lose it.
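The fan-out logic itself can be sketched in a few lines. In this sketch, every event goes to the cold path unfiltered, while only events worth pushing to a live dashboard take the hot path; the `readings.temperature` field and the threshold are illustrative, not part of any real schema.

```javascript
// Sketch of hot/cold fan-out: the cold path keeps every raw event,
// the hot path gets only events that warrant real-time attention.
// The `readings.temperature` field and threshold are illustrative.
function routeEvents(events, alertThreshold) {
  const cold = events; // cold path: all raw messages, nothing discarded
  const hot = events.filter((e) => e.readings.temperature > alertThreshold);
  return { hot, cold };
}

const batch = [
  { deviceId: "d1", readings: { temperature: 21 } },
  { deviceId: "d2", readings: { temperature: 95 } },
];
const { hot, cold } = routeEvents(batch, 80);
// cold retains both events; only d2 crosses the threshold and goes hot
```

In a real Function the cold array would be appended to Blob Storage and the hot array pushed to SignalR, but the principle is the same: filter for the hot path, never for the cold one.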
FSS Technology is a full-stack IoT engineering team with hardware, firmware, cloud, and mobile under one roof, designing and building IoT products from silicon to cloud: embedded firmware, custom hardware, and Azure backends.
Talk to our team →