Home › Blog › Kubernetes (AKS) for IoT Backends: Multi-Tenant Architecture That Scales to Millions of Devices

Cloud AKS Architecture Azure Cloud DevOps Docker IoT Kubernetes

Kubernetes (AKS) for IoT Backends: Multi-Tenant Architecture That Scales to Millions of Devices

📅 April 2026 ⏳ 11 min read FSS Engineering Team

Most IoT backends start on Azure App Service. It is the right choice for the first 10,000 devices: managed, cheap, fast to ship. Then a customer onboards 250,000 sensors over a weekend, telemetry goes from steady to spiky, the team adds a stream processor, three more APIs, a long-running export job and a reporting service. Suddenly App Service is fighting you.

This article is the architecture we deploy at FSS Technology when an IoT product crosses the threshold where Kubernetes pays for itself. It covers when to migrate from App Service, the AKS reference topology for IoT, the multi-tenancy decisions that matter, KEDA-driven autoscaling on Service Bus, the observability stack we standardize on, GitOps with Flux, secret management with workload identity, time-series storage choices and the cost patterns that keep the platform sustainable at millions-of-devices scale. It complements the lighter-weight pipeline we describe in our IoT data pipeline article; this is what comes next.

When to outgrow App Service for IoT

App Service is excellent until any of the following becomes true:

You need more than 30 plans across regions to isolate workloads
Cold starts on consumption Functions are blocking real-time scenarios
You are running long-lived stream processors that do not fit the request/response model
You need fine-grained CPU and memory ratios that App Service tiers do not offer
Network policies, sidecars or service mesh are on the roadmap
Multi-tenant isolation requires per-tenant compute boundaries, not just per-tenant data
You want consistent local-dev to prod parity (Docker Compose to Kubernetes is closer than Docker Compose to App Service)

If you are nodding at three or more of those, AKS is the next stop. If only one, fix the symptom and stay where you are. Kubernetes is not free, and the operational tax is real. Plan for at least one engineer who owns cluster operations end to end before you commit; “someone in DevOps will figure it out” is not a plan.

Reference AKS topology for IoT

The cluster we deploy for production IoT workloads has four logical tiers, each in its own node pool with appropriate VM SKUs.

Ingest tier: stateless services that accept telemetry from Azure IoT Hub, MQTT brokers and HTTP webhooks. Burst-prone, latency-sensitive. Run on D-series VMs with cluster autoscaler enabled.
Processing tier: stream processors, enrichment, rule engines, downsampling jobs. Throughput-bound. Run on F-series compute-optimized VMs.
API tier: REST and GraphQL endpoints for tenant applications, mobile apps, partner integrations. Memory-balanced. Run on D-series with HPA on request rate.
Telemetry storage tier: not the database itself, but the gateway services that talk to Event Hubs, Azure Data Explorer, blob and Cosmos DB. Run on D-series with high IOPS attached storage if local buffering is needed.

System pods (CoreDNS, kube-proxy, ingress, observability) live in a dedicated system node pool with at least three nodes for HA. Never run system workloads on the same pool as application workloads; a noisy tenant should never be able to evict ingress.

Azure IoT Hub plus AKS pattern

IoT Hub remains the right device gateway even when the rest of the backend is on AKS. It owns device identity, MQTT/AMQP protocol handling, device twin state and direct method invocation. AKS owns the business logic.

The integration pattern:

Devices connect to IoT Hub over MQTT 3.1.1 or MQTT 5
IoT Hub routes telemetry to Event Hubs (default endpoint) or Service Bus topics (filtered routes)
AKS workloads consume from Event Hubs using the EventProcessorClient with checkpointing in blob storage
Cloud-to-device commands flow back through IoT Hub via direct methods or twin desired properties
AKS publishes to Service Bus for fan-out, with subscribers per tenant or per feature

This pattern is the natural extension of the simpler architecture we describe in the IoT data pipeline article linked above. The processing model is the same; the runtime substrate is more powerful, and the per-tenant boundaries become enforceable rather than aspirational.

Multi-tenancy: namespace per tenant vs cluster per tenant

This is the most consequential architecture decision you will make. Both models work; the wrong choice will cost you painfully.

Namespace per tenant

Pros: cheap, simple to operate, single control plane to upgrade, easy cross-tenant analytics. Cons: shared cluster blast radius, noisy-neighbor risk on the data plane, harder to satisfy strict compliance regimes that demand compute isolation, kubelet and apiserver become shared scaling bottlenecks.

Use this when tenants are similar in size, trust each other (or trust you to enforce isolation), and the value of operational simplicity outweighs strict isolation. Apply NetworkPolicy, ResourceQuota, LimitRange and Pod Security Admission to every namespace by default.

Cluster per tenant

Pros: hard isolation, independent upgrade cadence, per-tenant SLOs, simple compliance story. Cons: 5x to 10x operational cost, harder cross-tenant features, fleet management complexity (Azure Arc, Cluster API or Fleet Manager become mandatory).

Use this when tenants are large enough to justify the overhead (typically 50,000+ devices each), or when regulatory requirements demand it (defense, regulated pharma, certain government scenarios).

The hybrid we usually deploy

Shared cluster with namespace-per-tenant for the long tail, plus dedicated clusters for the few large customers who need or want hard isolation. This works because the platform code is identical; only the deployment topology differs. GitOps makes the duplication manageable.

Ingress with NGINX and AGIC

For internal east-west and tenant subdomains under a shared apex, NGINX Ingress Controller is the pragmatic default: well-understood, fast, infinitely tunable. For Azure-native ingress with WAF and integration with Azure Front Door, Application Gateway Ingress Controller (AGIC) is the right call.

The pattern we run for IoT backends is AGIC at the public edge for WAF and DDoS coverage, NGINX inside the cluster for tenant routing and TLS termination per subdomain. Cert-manager with Let’s Encrypt for automated certificates, ExternalDNS for automatic Azure DNS records.

KEDA scaling on Service Bus queue depth

Horizontal Pod Autoscaler on CPU is wrong for IoT. Backpressure does not show up as CPU saturation; it shows up as a growing queue. KEDA (Kubernetes Event-driven Autoscaling) reads the queue and scales accordingly.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: telemetry-processor-scaler
  namespace: tenant-acme
spec:
  scaleTargetRef:
    name: telemetry-processor
  pollingInterval: 15
  cooldownPeriod: 120
  minReplicaCount: 2
  maxReplicaCount: 60
  triggers:
  - type: azure-servicebus
    metadata:
      queueName: telemetry-ingest
      namespace: fss-prod-sb
      messageCount: "500"
    authenticationRef:
      name: keda-azure-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-azure-auth
  namespace: tenant-acme
spec:
  podIdentity:
    provider: azure-workload
    identityId: a1b2c3d4-...

The two numbers that matter: messageCount is the per-replica target, not the total threshold. Set it to your single-replica steady-state throughput so KEDA scales to maintain that ratio. cooldownPeriod prevents flapping; 120 seconds is a sane default for IoT, where bursts are common.

For Event Hubs (telemetry ingest), KEDA has a dedicated trigger that scales on partition lag rather than queue depth. Use it; CPU-based scaling on Event Hubs consumers is always wrong because the consumer is IO-bound on the broker.

Observability: Prometheus, Grafana, OpenTelemetry, Application Insights

The stack we standardize on splits responsibility cleanly:

Metrics: Prometheus scraping pods, kube-state-metrics and node-exporter. Long-term storage in Azure Monitor managed Prometheus. Visualization in Azure Managed Grafana.
Logs: Container logs to Azure Monitor Container Insights, with retention policies per namespace. Application logs structured as JSON, never free text.
Traces: OpenTelemetry SDK in every service, OTLP exporter to Azure Monitor Application Insights. Sample at 100 percent for errors, 1 percent for success.
Synthetic monitoring: Application Insights availability tests against tenant subdomains, with multi-region probes.

The non-negotiable: every log, metric and trace carries the tenant ID as a label or attribute. Without that, you cannot answer “how is tenant X doing right now,” which is the question that matters most when something is on fire.

GitOps with Flux or ArgoCD

Manual kubectl apply in production is malpractice. GitOps closes the loop: Git is the source of truth, the cluster reconciles itself toward Git, drift is impossible because it gets corrected within minutes.

Flux v2 is the lighter-weight choice and integrates natively with AKS via the GitOps extension. ArgoCD has a richer UI and better multi-cluster ergonomics. Both work; pick based on team preference and stick with it.

The repo layout we use:

fleet/
  clusters/
    prod-weu/
      flux-system/
      infrastructure/    # ingress, cert-manager, KEDA, observability
      tenants/
        acme/
        contoso/
    prod-eus/
      ...
  apps/
    telemetry-processor/
      base/              # Helm chart or kustomize base
      overlays/
        prod/
        staging/
  charts/
    fss-iot-stack/       # umbrella Helm chart, see below

Helm values structure

An umbrella Helm chart for the platform with a values structure that lets per-tenant overrides stay small.

global:
  region: westeurope
  env: prod
  imageRegistry: fssprod.azurecr.io
  workloadIdentity:
    enabled: true
    clientId: ""
  observability:
    otlpEndpoint: https://otel.fss.cc

ingest:
  replicas: 3
  image:
    repository: fss/iot-ingest
    tag: 2.14.0
  resources:
    requests: { cpu: 250m, memory: 512Mi }
    limits:   { cpu: 1,    memory: 1Gi }
  iotHub:
    eventHubEndpoint: ""
    consumerGroup: ingest

processing:
  replicas: 4
  keda:
    enabled: true
    minReplicas: 4
    maxReplicas: 80
    queueName: telemetry-ingest
    targetMessageCount: 500

api:
  replicas: 2
  ingress:
    host: api.tenant.fss.cc
    tlsSecret: api-tls

storage:
  adx:
    cluster: fss-adx-weu
    database: telemetry
  cosmos:
    account: fss-cosmos-weu
    database: device-state

tenants:
  - name: acme
    deviceQuota: 250000
    overrides:
      processing:
        targetMessageCount: 1000

The shape of these values matters as much as the contents: every parameter that varies per tenant must live under tenants[].overrides, never sprinkled across the chart. This is what keeps the platform maintainable as the tenant count grows. Pair the chart with a Helmfile or Flux Kustomization that renders per-tenant releases from a single source of truth.

Secret management with Azure Key Vault and workload identity

Mounting connection strings as Kubernetes secrets via plain Helm values is the historic anti-pattern. The modern approach uses Azure Workload Identity plus the Secrets Store CSI Driver.

Each workload runs as a Kubernetes ServiceAccount linked to an Azure Managed Identity
The Managed Identity has scoped permissions on a Key Vault
Secrets are mounted as files (or synced to Kubernetes secrets) by the CSI driver, with automatic rotation
No long-lived credentials live in the cluster, in Git or in pipelines

Combined with private endpoints on Key Vault and IP-restricted access, this satisfies even strict compliance reviews without adding operational complexity for application teams.

Database choices for time-series

Azure Data Explorer (ADX)

Our default for IoT telemetry above 1 billion events per month. ADX ingests at extreme rates (we have measured sustained 1.2 million events per second on a modest cluster), KQL is excellent for time-series analytics, and it integrates natively with Event Hubs as a streaming source. Cost scales with cluster size and retention, not with query volume. Pair ADX with materialized views for the common dashboard queries; first-byte latency on a 30-day rollup drops from seconds to tens of milliseconds.

TimescaleDB

The right choice when telemetry needs to live in a relational store with foreign keys to operational data, or when SQL is a hard requirement for downstream consumers. Run it on Azure Database for PostgreSQL Flexible Server or self-hosted on AKS for full control. Hypertables and continuous aggregates handle most cold-storage and downsampling needs.

InfluxDB and others

InfluxDB is fine for smaller deployments under 100 million events per month. ClickHouse is competitive with ADX on throughput but requires more operational ownership. Cosmos DB with the time-series pattern works for low-cardinality scenarios but gets expensive fast at IoT scale.

Cost optimization patterns

Five levers move the bill more than anything else.

Right-sized node pools: F-series for compute, D-series for balanced, E-series for memory-heavy. Mixing SKUs across pools reduces over-provisioning.
Spot node pools for stateless processing: KEDA-driven processing tiers run beautifully on Spot with 60 to 80 percent cost reduction. Always have a non-Spot baseline for resilience.
Reserved instances and savings plans: 1-year and 3-year commitments for the baseline node pool. Typical 30 to 40 percent savings.
Tiered telemetry storage: hot in ADX for 30 days, warm in Parquet on blob for 1 year, cold in archive blob beyond. ADX external tables make the cold tier queryable on demand.
Egress discipline: keep traffic in-region, use private endpoints to avoid public egress charges, batch downstream writes to reduce per-operation costs.

For a typical multi-tenant IoT platform serving 1 to 2 million devices, these patterns combined cut the run-rate by roughly half compared to a naive deployment. We see steady-state monthly cost in the range of 12,000 to 22,000 EUR for that scale, dominated by ADX and egress, not compute.

Operational excellence checklist

Cluster autoscaler enabled with sane min and max per pool
Pod Disruption Budgets on every workload
Liveness, readiness and startup probes correctly distinguished
NetworkPolicy default-deny with explicit allow rules
Image scanning in CI, image signing with Notation, admission control with Ratify or Kyverno
Backup of stateful workloads with Velero, tested restores quarterly
Disaster recovery runbook with documented RTO and RPO per tenant tier

None of these is optional at the scale this article assumes. The patterns we apply across our broader DevOps practice and the cloud foundations we offer to customers are built around them. The same control plane runs the backends behind YIS and OMNIYON; the patterns are field-tested under real fleet load.

Bringing it together

AKS is not a silver bullet; it is a substrate that, when used with discipline, lets an IoT platform scale from one tenant to hundreds and from thousands of devices to millions without rewriting the backend. The patterns in this article (ingest tier separation, KEDA on queue depth, GitOps, workload identity, ADX for telemetry, tiered storage and Spot for processing) are what make the difference between a Kubernetes cluster that works and an operational nightmare.

If you are running an IoT product that has outgrown App Service, or if you are designing a new platform that needs to scale from day one, the team at FSS Cloud Infrastructure can architect, deploy and operate AKS-based IoT backends end-to-end. We have shipped this stack for hospitality groups, marine fleets and industrial operators, and we offer it as a managed platform or as a one-off engagement to bring your team to production. Talk to us about a one-week architecture review before you commit to a path.

Building something connected?

FSS Technology designs and builds IoT products from silicon to cloud — embedded firmware, custom hardware, and Azure backends.

Talk to our team →