Most IoT backends start on Azure App Service. It is the right choice for the first 10,000 devices: managed, cheap, fast to ship. Then a customer onboards 250,000 sensors over a weekend, telemetry goes from steady to spiky, the team adds a stream processor, three more APIs, a long-running export job and a reporting service. Suddenly App Service is fighting you.
This article is the architecture we deploy at FSS Technology when an IoT product crosses the threshold where Kubernetes pays for itself. It covers when to migrate from App Service, the AKS reference topology for IoT, the multi-tenancy decisions that matter, KEDA-driven autoscaling on Service Bus, the observability stack we standardize on, GitOps with Flux, secret management with workload identity, time-series storage choices and the cost patterns that keep the platform sustainable at millions-of-devices scale. It complements the lighter-weight pipeline we describe in our IoT data pipeline article; this is what comes next.
App Service is excellent until any of the following becomes true:

- Load no longer tracks CPU: telemetry arrives in bursts, and the thing you need to scale on is queue depth rather than utilization.
- The backend has grown from one app into a fleet of services: stream processors, several APIs, long-running export jobs, reporting.
- Tenants need enforceable isolation: network boundaries, resource quotas and blast-radius limits that an App Service plan cannot express.
- Deployment and configuration have outgrown slot swaps, and you want declarative, auditable rollout across many services and environments.
If you are nodding at three or more of those, AKS is the next stop. If only one, fix the symptom and stay where you are. Kubernetes is not free, and the operational tax is real. Plan for at least one engineer who owns cluster operations end to end before you commit; “someone in DevOps will figure it out” is not a plan.
The cluster we deploy for production IoT workloads has four logical tiers, each in its own node pool with appropriate VM SKUs.
System pods (CoreDNS, kube-proxy, ingress, observability) live in a dedicated system node pool with at least three nodes for HA. Never run system workloads on the same pool as application workloads; a noisy tenant should never be able to evict ingress.
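A minimal sketch of how that separation is enforced at scheduling time, assuming the AKS system pool carries the CriticalAddonsOnly=true:NoSchedule taint and the application pool is labelled processing (pool and image names here are illustrative, not from our charts): application Deployments pin themselves to a user pool with a nodeSelector, so nothing tenant-owned can land next to ingress or CoreDNS.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: telemetry-processor
  namespace: tenant-acme
spec:
  replicas: 2
  selector:
    matchLabels: { app: telemetry-processor }
  template:
    metadata:
      labels: { app: telemetry-processor }
    spec:
      nodeSelector:
        kubernetes.azure.com/agentpool: processing   # user node pool; the system pool stays tainted
      containers:
        - name: processor
          image: fssprod.azurecr.io/fss/iot-processor:2.14.0   # illustrative image
          resources:
            requests: { cpu: 250m, memory: 512Mi }
            limits: { cpu: "1", memory: 1Gi }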
IoT Hub remains the right device gateway even when the rest of the backend is on AKS. It owns device identity, MQTT/AMQP protocol handling, device twin state and direct method invocation. AKS owns the business logic.
The integration pattern:

- Devices authenticate against and connect to IoT Hub over MQTT or AMQP; identity, twin state and direct methods stay there.
- Telemetry leaves IoT Hub through its Event Hubs-compatible endpoint, where an ingest deployment on AKS consumes it on a dedicated consumer group.
- The ingest tier validates and enriches messages, then fans them out to Service Bus queues (telemetry-ingest in the examples below), which decouple processing and give KEDA something to scale on.
- Commands flow the other way: AKS services call the IoT Hub service API for direct method invocation and desired-property updates on the device twin.
This pattern is the natural extension of the simpler architecture we describe in the IoT data pipeline article linked above. The processing model is the same; the runtime substrate is more powerful, and the per-tenant boundaries become enforceable rather than aspirational.
This is the most consequential architecture decision you will make. Both models work: a shared cluster with namespace-per-tenant, or a dedicated cluster per tenant. Choosing the wrong one for your tenant profile is expensive to unwind.
Namespace-per-tenant on a shared cluster. Pros: cheap, simple to operate, a single control plane to upgrade, easy cross-tenant analytics. Cons: shared blast radius, noisy-neighbor risk on the data plane, a harder story for compliance regimes that demand compute isolation, and the kubelet and API server become shared scaling bottlenecks.
Use this when tenants are similar in size, trust each other (or trust you to enforce isolation), and the value of operational simplicity outweighs strict isolation. Apply NetworkPolicy, ResourceQuota, LimitRange and Pod Security Admission to every namespace by default.
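As a concrete baseline for those defaults, a sketch of the per-namespace guardrails (quota numbers are illustrative and should be sized per tenant tier): a Pod Security Admission label on the namespace, a default-deny NetworkPolicy, and a ResourceQuota.

apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme
  labels:
    pod-security.kubernetes.io/enforce: restricted   # Pod Security Admission
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-acme
spec:
  podSelector: {}            # every pod in the namespace
  policyTypes: [Ingress]     # explicit allow rules for the ingress controller come on top
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "200"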
Cluster-per-tenant. Pros: hard isolation, independent upgrade cadence, per-tenant SLOs, a simple compliance story. Cons: 5x to 10x the operational cost, harder cross-tenant features, and fleet management complexity (Azure Arc, Cluster API or Fleet Manager becomes mandatory).
Use this when tenants are large enough to justify the overhead (typically 50,000+ devices each), or when regulatory requirements demand it (defense, regulated pharma, certain government scenarios).
The pragmatic answer is usually a hybrid: a shared cluster with namespace-per-tenant for the long tail, plus dedicated clusters for the few large customers who need or want hard isolation. This works because the platform code is identical; only the deployment topology differs. GitOps makes the duplication manageable.
For internal east-west and tenant subdomains under a shared apex, NGINX Ingress Controller is the pragmatic default: well-understood, fast, infinitely tunable. For Azure-native ingress with WAF and integration with Azure Front Door, Application Gateway Ingress Controller (AGIC) is the right call.
The pattern we run for IoT backends is AGIC at the public edge for WAF and DDoS coverage, NGINX inside the cluster for tenant routing and TLS termination per subdomain. Cert-manager with Let’s Encrypt for automated certificates, ExternalDNS for automatic Azure DNS records.
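To make the in-cluster half concrete, a sketch of one tenant's Ingress (hostname, ClusterIssuer and backend service names are assumptions): the NGINX class handles routing and TLS for the subdomain, cert-manager requests the certificate, and ExternalDNS publishes the host to Azure DNS.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  namespace: tenant-acme
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # illustrative ClusterIssuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.acme.fss.cc]
      secretName: api-tls
  rules:
    - host: api.acme.fss.cc          # picked up by ExternalDNS for the Azure DNS record
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api            # illustrative Service name
                port:
                  number: 8080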
Horizontal Pod Autoscaler on CPU is wrong for IoT. Backpressure does not show up as CPU saturation; it shows up as a growing queue. KEDA (Kubernetes Event-driven Autoscaling) reads the queue and scales accordingly.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: telemetry-processor-scaler
  namespace: tenant-acme
spec:
  scaleTargetRef:
    name: telemetry-processor
  pollingInterval: 15
  cooldownPeriod: 120
  minReplicaCount: 2
  maxReplicaCount: 60
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: telemetry-ingest
        namespace: fss-prod-sb
        messageCount: "500"
      authenticationRef:
        name: keda-azure-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-azure-auth
  namespace: tenant-acme
spec:
  podIdentity:
    provider: azure-workload
    identityId: a1b2c3d4-...
The two numbers that matter: messageCount is the per-replica target, not a total threshold. Set it to your single-replica steady-state throughput so KEDA scales to maintain that ratio. Be precise about cooldownPeriod: it only governs scaling back to zero replicas, so with a non-zero minReplicaCount it does not prevent flapping; for that, configure the HPA scale-down stabilization window under spec.advanced.horizontalPodAutoscalerConfig. Around 120 seconds is a sane default for IoT, where bursts are common.
For Event Hubs (telemetry ingest), KEDA has a dedicated trigger that scales on partition lag rather than queue depth. Use it; CPU-based scaling on Event Hubs consumers is always wrong because the consumer is IO-bound on the broker.
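A hedged sketch of that trigger, reusing the TriggerAuthentication from above (parameter names follow current KEDA documentation; the Event Hubs namespace, checkpoint storage account and thresholds are assumptions):

triggers:
  - type: azure-eventhub
    metadata:
      eventHubNamespace: fss-prod-ehns          # illustrative Event Hubs namespace
      eventHubName: telemetry
      consumerGroup: ingest
      unprocessedEventThreshold: "1000"         # per-replica lag target, analogous to messageCount
      storageAccountName: fssprodcheckpoints    # where the consumer's checkpoints live
      blobContainer: eh-checkpoints
      checkpointStrategy: blobMetadata          # matches the Event Processor SDKs
    authenticationRef:
      name: keda-azure-auth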
The stack we standardize on splits responsibility cleanly:
The non-negotiable: every log, metric and trace carries the tenant ID as a label or attribute. Without that, you cannot answer “how is tenant X doing right now,” which is the question that matters most when something is on fire.
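One way to enforce that tenant label without touching application code, sketched as an OpenTelemetry Collector fragment (assumes the namespace-per-tenant convention and the collector-contrib distribution with the k8sattributes and resource processors): derive the tenant from the Kubernetes namespace and stamp it onto every exported resource.

receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  k8sattributes: {}                 # adds k8s.namespace.name as a resource attribute (needs RBAC)
  resource/tenant:
    attributes:
      - key: tenant.id
        from_attribute: k8s.namespace.name   # tenant-acme namespaces map 1:1 to tenants
        action: insert
  batch: {}
exporters:
  otlphttp:
    endpoint: https://otel.fss.cc   # matches the otlpEndpoint in the Helm values below
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, resource/tenant, batch]
      exporters: [otlphttp]
    # metrics and logs pipelines get the same processor chain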
Manual kubectl apply in production is malpractice. GitOps closes the loop: Git is the source of truth, the cluster continuously reconciles itself toward Git, and drift does not survive because the controller corrects it within minutes.
Flux v2 is the lighter-weight choice and integrates natively with AKS via the GitOps extension. ArgoCD has a richer UI and better multi-cluster ergonomics. Both work; pick based on team preference and stick with it.
The repo layout we use:
fleet/
  clusters/
    prod-weu/
      flux-system/
      infrastructure/        # ingress, cert-manager, KEDA, observability
      tenants/
        acme/
        contoso/
    prod-eus/
      ...
  apps/
    telemetry-processor/
      base/                  # Helm chart or kustomize base
      overlays/
        prod/
        staging/
  charts/
    fss-iot-stack/           # umbrella Helm chart, see below
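A sketch of how Flux reconciles that layout (object names, intervals and the repository URL are illustrative): one GitRepository for the fleet repo, then Kustomizations that apply infrastructure before the tenant overlays.

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet
  namespace: flux-system
spec:
  interval: 1m
  url: https://example.com/fss/fleet.git   # illustrative URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  sourceRef: { kind: GitRepository, name: fleet }
  path: ./clusters/prod-weu/infrastructure
  prune: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tenants
  namespace: flux-system
spec:
  interval: 10m
  dependsOn:
    - name: infrastructure       # tenants reconcile only after ingress, KEDA and observability are in place
  sourceRef: { kind: GitRepository, name: fleet }
  path: ./clusters/prod-weu/tenants
  prune: true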
An umbrella Helm chart for the platform with a values structure that lets per-tenant overrides stay small.
global:
  region: westeurope
  env: prod
  imageRegistry: fssprod.azurecr.io
  workloadIdentity:
    enabled: true
    clientId: ""
  observability:
    otlpEndpoint: https://otel.fss.cc

ingest:
  replicas: 3
  image:
    repository: fss/iot-ingest
    tag: 2.14.0
  resources:
    requests: { cpu: 250m, memory: 512Mi }
    limits: { cpu: 1, memory: 1Gi }
  iotHub:
    eventHubEndpoint: ""
    consumerGroup: ingest

processing:
  replicas: 4
  keda:
    enabled: true
    minReplicas: 4
    maxReplicas: 80
    queueName: telemetry-ingest
    targetMessageCount: 500

api:
  replicas: 2
  ingress:
    host: api.tenant.fss.cc
    tlsSecret: api-tls

storage:
  adx:
    cluster: fss-adx-weu
    database: telemetry
  cosmos:
    account: fss-cosmos-weu
    database: device-state

tenants:
  - name: acme
    deviceQuota: 250000
    overrides:
      processing:
        targetMessageCount: 1000
The shape of these values matters as much as the contents: every parameter that varies per tenant must live under tenants[].overrides, never sprinkled across the chart. This is what keeps the platform maintainable as the tenant count grows. Pair the chart with a Helmfile or Flux Kustomization that renders per-tenant releases from a single source of truth.
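With Flux, the per-tenant release that consumes the chart can then stay as small as this sketch (names and the apiVersion are assumptions; older Flux releases use helm.toolkit.fluxcd.io/v2beta2): base values ship with the umbrella chart, and only the tenant's overrides live next to it in Git.

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: iot-stack-acme
  namespace: tenant-acme
spec:
  interval: 10m
  chart:
    spec:
      chart: ./charts/fss-iot-stack          # the umbrella chart from the repo layout above
      sourceRef:
        kind: GitRepository
        name: fleet
        namespace: flux-system
  values:
    processing:
      keda:
        targetMessageCount: 1000             # the acme override from the values example above
    api:
      ingress:
        host: api.acme.fss.cc                # illustrative tenant host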
Mounting connection strings as Kubernetes secrets via plain Helm values is the historic anti-pattern. The modern approach uses Azure Workload Identity plus the Secrets Store CSI Driver.
Combined with private endpoints on Key Vault and IP-restricted access, this satisfies even strict compliance reviews without adding operational complexity for application teams.
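A sketch of the wiring (Key Vault name, secret object names and the client ID are assumptions; the client ID would be the tenant's federated workload identity): a SecretProviderClass that the Deployment mounts as a CSI volume, pulling secrets from Key Vault at pod start.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: iot-backend-secrets
  namespace: tenant-acme
spec:
  provider: azure
  parameters:
    clientID: a1b2c3d4-...            # federated workload identity, as in the KEDA auth above
    keyvaultName: fss-prod-kv         # illustrative vault name
    tenantId: <entra-tenant-id>
    objects: |
      array:
        - |
          objectName: adx-ingest-connection
          objectType: secret
        - |
          objectName: cosmos-connection
          objectType: secret

# In the consuming Deployment, the pod carries the azure.workload.identity/use: "true" label
# and mounts the class as a CSI volume:
#   volumes:
#     - name: secrets
#       csi:
#         driver: secrets-store.csi.k8s.io
#         readOnly: true
#         volumeAttributes:
#           secretProviderClass: iot-backend-secrets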
Azure Data Explorer (ADX) is our default for IoT telemetry above 1 billion events per month. It ingests at extreme rates (we have measured sustained 1.2 million events per second on a modest cluster), KQL is excellent for time-series analytics, and it integrates natively with Event Hubs as a streaming source. Cost scales with cluster size and retention, not with query volume. Pair ADX with materialized views for the common dashboard queries; first-byte latency on a 30-day rollup drops from seconds to tens of milliseconds.
TimescaleDB is the right choice when telemetry needs to live in a relational store with foreign keys to operational data, or when SQL is a hard requirement for downstream consumers. Run it on Azure Database for PostgreSQL Flexible Server or self-hosted on AKS for full control. Hypertables and continuous aggregates handle most cold-storage and downsampling needs.
InfluxDB is fine for smaller deployments under 100 million events per month. ClickHouse is competitive with ADX on throughput but requires more operational ownership. Cosmos DB with the time-series pattern works for low-cardinality scenarios but gets expensive fast at IoT scale.
Five levers move the bill more than anything else.
For a typical multi-tenant IoT platform serving 1 to 2 million devices, these patterns combined cut the run-rate by roughly half compared to a naive deployment. We see steady-state monthly cost in the range of 12,000 to 22,000 EUR for that scale, dominated by ADX and egress, not compute.
None of these is optional at the scale this article assumes. The patterns we apply across our broader DevOps practice and the cloud foundations we offer to customers are built around them. The same control plane runs the backends behind YIS and OMNIYON; the patterns are field-tested under real fleet load.
AKS is not a silver bullet; it is a substrate that, when used with discipline, lets an IoT platform scale from one tenant to hundreds and from thousands of devices to millions without rewriting the backend. The patterns in this article (ingest tier separation, KEDA on queue depth, GitOps, workload identity, ADX for telemetry, tiered storage and Spot for processing) are what make the difference between a Kubernetes cluster that works and an operational nightmare.
If you are running an IoT product that has outgrown App Service, or if you are designing a new platform that needs to scale from day one, the team at FSS Cloud Infrastructure can architect, deploy and operate AKS-based IoT backends end-to-end. We have shipped this stack for hospitality groups, marine fleets and industrial operators, and we offer it as a managed platform or as a one-off engagement to bring your team to production. Talk to us about a one-week architecture review before you commit to a path.
FSS Technology designs and builds IoT products from silicon to cloud — embedded firmware, custom hardware, and Azure backends.
Talk to our team →