Nov 10, 2025 • 7 MIN READ

Self-Serve without Tickets: A Developer-First CI Control Plane

How a Series-B product team cut a 3-day ticket queue to minutes

TL;DR

  • Tickets slow teams. Self-service gives developers environments and pipelines without human gatekeepers—with guardrails, not chaos. Team Topologies
  • Goal: eliminate toil by turning routine runner work into a productized control plane. Google SRE
  • Principle: use golden paths—an opinionated, supported way to create, update, and use immutable pools across OSes. VMware · Red Hat
  • Outcome: fewer handoffs, hours to first pipeline (not days), consistent results across Linux/Windows/macOS, in cloud, on-prem, or air-gapped setups.

Platform as a product → paved roads (golden paths) reduce cognitive load and variance. Team Topologies

The problem: tickets, handoffs, and toil

When a team needs CI capacity (or a change), the default in many orgs is a ticket to another team. That creates toil—manual, repetitive work that scales linearly with service growth—exactly the kind of work SRE guidance says to eliminate or cap. SRE book · SRE workbook

Meanwhile, each team invents its own way to "get a runner," ballooning cognitive load. Platform engineering's advice: treat the platform as a product and reduce cognitive load with paved/golden paths. Team Topologies

The real case: ticket ping-pong that blocked delivery

Context. A Series-B product team (Linux/Windows/macOS builds) needed new CI capacity every sprint. The flow looked like this:

  1. Developers → DevOps: "Need 150 runners for load tests."
  2. DevOps → IT: VMs, network, images, policies.
  3. Back-and-forth clarifications; DevOps installs toolchains; agents register.
  4. Validation fails on a subset → more tickets.

Baseline pain (3-month avg).

  • 3–7 days from request to first green build for a new team.
  • 150 VMs with full stacks: ~3 days of human effort end-to-end.
  • Runner tickets: 90–120/month; ~42% bounced DevOps↔IT at least once.

Decision. Replace tickets/DIY scripts with a developer-first control plane: quotas/RBAC, Pool Definitions (OS, size, role packs, CI connector), immutable updates (spec change → safe rotation with audit log), and optional approvals. This mirrors platform-as-a-product and golden path guidance. Team Topologies · Red Hat

Design goals (and non-goals)

  • Self-serve with guardrails: developers provision CI without tickets; RBAC/quotas keep it sane. Team Topologies
  • One way that works: a golden path for runner pools (Linux/Windows/macOS), so teams don't reinvent scripts. VMware
  • Toil reduction: routine runner work becomes platform features. Google SRE
  • Anywhere: same DX in cloud, on-prem, and air-gapped.

What we avoid

  • Ticket-based provisioning as a primary path.
  • In-place edits to runners (updates flow via definitions — see Post #1).

Architecture at a glance: product, not playbooks

  1. Create project (scoped quotas, RBAC).
  2. Define pool (OS + size + role packs/toolchains + CI connector).
  3. Attach CI (Jenkins agent or GitHub runner).
  4. Change spec → rotate safely (immutable replace; audit log).

This flow respects how GitHub self-hosted runners, Runner Scale Sets, and Actions Runner Controller (ARC) work. They are excellent primitives, and the control plane productizes them into a uniform flow.
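
To make step 4 more concrete, here is a minimal Python sketch of an immutable spec change followed by a batched rotation with an audit log. It is an illustration under assumptions, not the product's implementation: the `PoolSpec` fields, the `spec.change` and `rotate.batch` event names, the batch size, and the `example-toolchain` role pack are all hypothetical.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class PoolSpec:
    """Immutable pool definition: a change always yields a new spec object."""
    name: str
    os: str
    size: int
    roles: Tuple[str, ...]
    ci: str


def spec_diff(old: PoolSpec, new: PoolSpec) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed field to its (old, new) values."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


def plan_batches(nodes: List[str], batch_size: int) -> List[List[str]]:
    """Split nodes into batches so only batch_size nodes are replaced at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def rotate(old: PoolSpec, new: PoolSpec, nodes: List[str],
           batch_size: int = 2) -> List[dict]:
    """Return the audit entries for replacing every node under the new spec."""
    diff = spec_diff(old, new)
    if not diff:
        return []  # nothing changed, so nothing to rotate
    audit = [{"event": "spec.change", "pool": new.name, "diff": diff}]
    for batch in plan_batches(nodes, batch_size):
        # A real system would create replacements, wait for health checks,
        # and only then retire the old nodes; this sketch only records the plan.
        audit.append({"event": "rotate.batch", "pool": new.name, "replaced": batch})
    return audit


# "docker" stands in for a pinned role pack; "example-toolchain" is hypothetical.
current = PoolSpec("backend-ci", "linux", 10, ("docker", "build-essential"), "github-runner")
updated = replace(current, roles=current.roles + ("example-toolchain",))
nodes = [f"backend-{i:02d}" for i in range(1, 11)]
for entry in rotate(current, updated, nodes, batch_size=4):
    print(entry)
```

Because the spec is frozen, the only way to change a pool is to build a new spec, which is what forces every update through the diff and the audit log instead of an in-place edit.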

Walkthrough (selective, not a manual)

Scenario: A backend team needs 10 Linux runners with Docker + build tools, connected to GitHub.

project create --name backend-app --quota pools=2,nodes=20
pool define  --name backend-ci --os=linux --size=10 \
  --roles=docker-<pinned>,build-essential --ci=github-runner
pool apply   backend-ci

Resulting event log:

apply.spec   pool=backend-ci os=linux size=10 roles=[docker...,build-essential]
create.node  node=backend-01 ... backend-10
register.ci  node=backend-01..10 ci=github agent_id=GHA-...
pool.ready   pool=backend-ci nodes=10
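
The quota in this scenario (pools=2, nodes=20) and the pool size (10) suggest the kind of accounting the control plane has to do before it creates any node. The sketch below is one possible way to model that check in Python. The `Project` class, the `admit_pool` method, and the second pool named `backend-load` are assumptions for illustration, not part of the tool shown above.

```python
from dataclasses import dataclass, field
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when a pool request does not fit the project's quota."""


@dataclass
class Project:
    name: str
    max_pools: int
    max_nodes: int
    pools: Dict[str, int] = field(default_factory=dict)  # pool name -> node count

    def nodes_in_use(self) -> int:
        return sum(self.pools.values())

    def admit_pool(self, pool_name: str, size: int) -> None:
        """Reserve capacity for a pool, or raise without changing any state."""
        is_new = pool_name not in self.pools
        if is_new and len(self.pools) + 1 > self.max_pools:
            raise QuotaExceeded(f"{self.name}: pool limit {self.max_pools} reached")
        # Resizing an existing pool only counts the difference against the quota.
        projected = self.nodes_in_use() - self.pools.get(pool_name, 0) + size
        if projected > self.max_nodes:
            raise QuotaExceeded(
                f"{self.name}: {projected} nodes requested, limit is {self.max_nodes}"
            )
        self.pools[pool_name] = size


project = Project("backend-app", max_pools=2, max_nodes=20)
project.admit_pool("backend-ci", 10)        # fits: 10 of 20 nodes, 1 of 2 pools
try:
    project.admit_pool("backend-load", 15)  # hypothetical pool: 25 nodes > 20
except QuotaExceeded as err:
    print("rejected:", err)
```

Checking the quota before mutating state means a rejected request leaves the project exactly as it was, which keeps the failure cheap and easy to explain to the requester.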

"Isn't this just ARC/scale sets?"

Others vs. Control Plane (quick read)

  • ARC / Runner Scale Sets: autoscale self-hosted runners on K8s; you still own images, versions, and updates. Great primitives; not a golden path by themselves. ARC docs · Scale sets
  • DIY scripts / tickets: flexible, but high toil and inconsistent results. SRE
  • Control plane: opinionated flows + immutable pools + RBAC/quotas + audited updates → consistent, low-toil CI across OSes.

What changed (numbers that matter)

KPI | Before | After
Time-to-first-pipeline (new team) | 3–7 days | 2–6 hours
Provisioning 150 VMs w/ stacks | ~3 days (human effort) | Spec apply ~2 min; healthy ~12–20 min
Runner tickets / month | 90–120 | 20–30 (≈ −78%)
Ticket reassign rate (DevOps ↔ IT) | ~42% | <10%

This is the expected result: eliminating toil and providing a golden path reduce handoffs and cognitive load. SRE · Team Topologies
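
For readers who want to check the ticket figure, the short calculation below derives the percentage change from the before and after ranges in the table. How the ranges are paired (low with low, high with high, midpoint with midpoint) is an assumption; the source states only the rounded result.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


low_end = pct_change(90, 20)     # low end of both ranges
high_end = pct_change(120, 30)   # high end of both ranges
midpoint = pct_change((90 + 120) / 2, (20 + 30) / 2)

print(round(low_end, 1))    # -77.8
print(round(high_end, 1))   # -75.0
print(round(midpoint, 1))   # -76.2
```

Under these pairings the reduction lands between roughly 75% and 78%, so the table's figure corresponds to the low-end pairing.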

Guardrails: self-serve ≠ "anything goes"

  • RBAC + quotas: org/project roles; caps on pools/nodes/burst size. Team Topologies
  • Approval hooks (optional): e.g., pools over a threshold require an approver.
  • Cataloged role packs: pinned toolchains; no random images. Red Hat
  • Audit trail: spec diffs + rotation logs for every change.
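
As a rough illustration of how these guardrails could combine into a single admission decision, here is a minimal Python sketch. Everything in it is assumed for the example: the role names and permissions, the catalog contents, the threshold of 50 nodes, and the `evaluate` function. Quota accounting is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All names and values below are illustrative assumptions.
ROLE_PERMISSIONS = {
    "viewer": set(),
    "developer": {"pool.apply"},
    "admin": {"pool.apply", "pool.approve"},
}
CATALOG = {"docker", "build-essential"}  # cataloged role packs
APPROVAL_THRESHOLD = 50                  # pools larger than this need an approver


@dataclass(frozen=True)
class PoolRequest:
    user_role: str
    size: int
    role_packs: Tuple[str, ...]


def evaluate(req: PoolRequest) -> Tuple[str, List[str]]:
    """Return (decision, reasons); decision is 'deny', 'needs_approval', or 'allow'."""
    reasons: List[str] = []
    if "pool.apply" not in ROLE_PERMISSIONS.get(req.user_role, set()):
        reasons.append(f"role '{req.user_role}' may not apply pools")
    unknown = sorted(set(req.role_packs) - CATALOG)
    if unknown:
        reasons.append(f"role packs not in catalog: {', '.join(unknown)}")
    if reasons:
        return "deny", reasons
    if req.size > APPROVAL_THRESHOLD:
        return "needs_approval", [
            f"size {req.size} exceeds threshold {APPROVAL_THRESHOLD}"
        ]
    return "allow", []


print(evaluate(PoolRequest("developer", 10, ("docker", "build-essential"))))
print(evaluate(PoolRequest("developer", 150, ("docker",))))
print(evaluate(PoolRequest("viewer", 10, ("random-image",))))
```

Returning the reasons alongside the decision gives the developer an actionable answer on a denied request, and the same tuple can be written to the audit trail as-is.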

Cloud, on-prem, air-gapped: same experience

  • Cloud / connected on-prem: SSO, webhooks, registries via proxy.
  • Air-gapped: offline bundles and internal mirrors keep the workflow identical with zero egress. Air-gapped CD

References