CI/CD for Micro-App Fleets: Saving ~1,500 Hours/Year Without Slowing Devs Down

Tags: cicd, platform, devops, micro-frontends, frontend

When you have one frontend, a slow pipeline is annoying.

When you have a fleet (dozens of micro-apps, shared libraries, and a shell), slow pipelines become a tax on the entire org.

We did a platform pass on CI/CD for a micro-app fleet and clawed back roughly 1,500 engineer-hours/year of “waiting for green checks” time, without reducing quality gates or making local development worse.

This post is the playbook: what we standardized, what we cached, where we parallelized, how we did previews, and which metrics mattered.

Start with measurement (or you’ll optimize the wrong thing)

Before changing anything, we measured:

  • PR cycle time: first push → merge
  • CI wait time: first push → all required checks green
  • pipeline duration per job (install, lint, types, tests, build, deploy)
  • cache hit rate
  • failure reasons (lint, types, flaky tests, infra timeouts)
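
To make the first two metrics concrete, here is a minimal Python sketch of how CI wait time and its median/p95 could be computed from timestamps. The function names and the sample shape are assumptions for illustration; they are not taken from any particular CI provider's API.

```python
from datetime import datetime
from statistics import median, quantiles


def ci_wait_minutes(first_push: datetime, checks_green: datetime) -> float:
    """CI wait time: first push until all required checks are green."""
    return (checks_green - first_push).total_seconds() / 60


def summarize(waits: list[float]) -> dict[str, float]:
    """Median and p95 of a list of wait times (needs at least two samples)."""
    # n=20 yields 19 cut points at 5% steps; the last one is the 95th percentile.
    p95 = quantiles(waits, n=20, method="inclusive")[18]
    return {"median": median(waits), "p95": p95}
```

Tracking p95 alongside the median matters because a handful of very slow runs can dominate how blocked people feel, even when the median looks healthy.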

The goal wasn’t “make CI faster” in the abstract. It was:

Reduce the amount of time engineers are blocked by automation.

If you make pipelines faster by making them less reliable, you lose.

The baseline problems we saw in micro-app fleets

  • every repo had a slightly different pipeline
  • caching was inconsistent or ineffective
  • quality gates varied by team (some apps had no typecheck)
  • preview environments were ad hoc or nonexistent
  • shared libraries caused cascading failures when versions drifted

We needed consistency without centralizing ownership of every deployment.

The solution: a “paved road” CI template with escape hatches

The biggest win was standardization:

  • one pipeline template used across repos
  • one set of required checks (format/lint/types/test/build)
  • one caching strategy
  • one preview environment story

Teams could extend it, but:

  • required jobs stayed required
  • extensions had defined hooks (pre/post steps)
  • bypasses were explicit (temporary, visible)
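
One way to enforce “required jobs stay required, bypasses are explicit and temporary” is a small audit that runs against each repo's pipeline definition. The sketch below is hypothetical: the `RepoPipeline` and `Bypass` shapes and the expiry-date rule are assumptions, not a description of a specific CI system.

```python
from dataclasses import dataclass, field
from datetime import date

# The required checks named in the template.
REQUIRED_JOBS = ("format", "lint", "types", "test", "build")


@dataclass
class Bypass:
    job: str
    reason: str    # visible: every bypass must say why
    expires: date  # temporary: every bypass must end


@dataclass
class RepoPipeline:
    jobs: set[str]  # required jobs plus any team extensions
    bypasses: list[Bypass] = field(default_factory=list)


def missing_required_jobs(pipeline: RepoPipeline, today: date) -> list[str]:
    """Required jobs that are neither present nor covered by an unexpired bypass."""
    active = {b.job for b in pipeline.bypasses if b.expires >= today}
    return [j for j in REQUIRED_JOBS
            if j not in pipeline.jobs and j not in active]
```

Because a bypass carries an expiry date, it stops counting on its own and the audit starts failing again without anyone having to remember to remove it.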

Caching: what actually worked

Caching is easy to do badly. The goal is not “cache everything.” The goal is:

  • cache what’s expensive
  • cache what’s correct
  • keep cache keys stable enough to hit, but specific enough not to poison

1) Cache the package manager store, not node_modules

A cached node_modules directory is huge and often brittle.

Caching the Yarn/PNPM store tends to be:

  • smaller
  • more stable
  • more effective

2) Cache build outputs at the task level (not the repo level)

For fleets, the killer is rebuilding the same things repeatedly:

  • shared UI packages
  • generated types
  • identical bundles across PRs that don’t touch those packages

We used task-level caching (Turborepo, Nx, or your build system's equivalent) so jobs like build and test became incremental.
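
The idea behind task-level caching can be sketched in a few lines of Python: fingerprint a task's inputs, and skip the work when that fingerprint has been seen before. This is a simplified illustration, not how Turborepo or Nx are implemented; the in-memory `cache` dict stands in for a shared remote cache, and all names are hypothetical.

```python
import hashlib
from collections.abc import Callable


def task_fingerprint(task: str, inputs: dict[str, bytes]) -> str:
    """Hash the task name plus the path and content of every input file."""
    h = hashlib.sha256(task.encode())
    for path in sorted(inputs):  # sorted so ordering never changes the hash
        h.update(path.encode())
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


def run_cached(task: str, inputs: dict[str, bytes],
               cache: dict[str, str], run: Callable[[], str]) -> tuple[str, bool]:
    """Return (output, cache_hit); only call run() when the inputs changed."""
    key = task_fingerprint(task, inputs)
    if key in cache:
        return cache[key], True
    output = run()
    cache[key] = output
    return output, False
```

A PR that doesn't touch a shared UI package produces the same fingerprint for that package's build, so the build is restored rather than repeated.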

3) Make cache keys reflect reality

Common mistake:

  • cache key changes on every commit → no hits

Better:

  • key on lockfile + tool versions + relevant config files

When we fixed keys, hit rate jumped immediately.
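
As a rough illustration of “key on lockfile + tool versions + relevant config files,” the Python sketch below derives an install-cache key from exactly those inputs. The function name, the prefix, and the 16-character truncation are arbitrary choices for the example.

```python
import hashlib


def cache_key(prefix: str, lockfile: bytes,
              tool_versions: dict[str, str],
              config_files: dict[str, bytes]) -> str:
    """Build a cache key that changes only when dependency inputs change."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(lockfile).digest())
    for name in sorted(tool_versions):
        h.update(f"{name}={tool_versions[name]}\n".encode())
    for path in sorted(config_files):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(config_files[path]).digest())
    # The commit SHA is deliberately not an input, so the key survives
    # commits that leave dependencies untouched.
    return f"{prefix}-{h.hexdigest()[:16]}"
```

Two commits with the same lockfile, tool versions, and config produce the same key and therefore a hit; bumping a dependency or the Node version produces a new key instead of restoring a stale cache.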

Parallelization: speed without chaos

We structured pipelines so the long poles ran concurrently:

  • lint
  • typecheck
  • unit tests
  • build

Then we gated deploy on “everything passed.”
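
In most CI systems this is expressed as independent jobs plus a deploy job that depends on all of them. The shape of that fan-out and gate, sketched in Python with plain callables standing in for the real lint/typecheck/test/build commands (everything here is hypothetical):

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def run_checks(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run every check concurrently and collect pass/fail per check."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        return {name: future.result() for name, future in futures.items()}


def can_deploy(results: dict[str, bool]) -> bool:
    """Deploy is gated on every check passing; no results means no deploy."""
    return bool(results) and all(results.values())
```

With this structure, wall-clock time is roughly the slowest check rather than the sum of all of them.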

Split tests by intent

We separated:

  • fast unit tests (run on every PR)
  • slower integration tests (run on PR + main)
  • e2e smoke tests (run on preview deploy or main deploy)

We didn’t delete coverage; we moved it to the right stage.

Shard the slow parts

If a test suite is legitimately large, sharding can turn 15 minutes into 5.

The key is keeping flakes under control:

  • deterministic seeds
  • stable test ordering
  • retry policy only for known flaky categories

Preview environments: the feature that changed behavior

Preview environments weren’t just “nice.” They changed how people worked:

  • product could review earlier
  • QA shifted left
  • integration issues surfaced before merge

For micro-frontends, the real win is integration previews:

  • the shell loads the PR version of a remote
  • everything else stays on the current mainline

That requires a remote registry that can resolve:

  • “stable” remote URL (main)
  • “preview” remote URL (PR)

Even a simple convention (e.g., remote@pr-123) is enough to unlock this.
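
Here is a minimal sketch of what that resolution could look like, using the remote@pr-123 convention. The base URLs, the path layout, and the entry.js filename are all made up for the example; substitute whatever your registry and bundler actually produce.

```python
import re

# Hypothetical hosts; replace with your own CDN and preview domain.
STABLE_BASE = "https://cdn.example.com/remotes"
PREVIEW_BASE = "https://preview.example.com/remotes"

# "checkout" resolves to stable; "checkout@pr-123" resolves to the PR build.
_SPEC = re.compile(r"(?P<name>[a-z0-9-]+)(?:@pr-(?P<pr>\d+))?")


def resolve_remote(spec: str) -> str:
    """Resolve a remote spec to the URL the shell should load."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized remote spec: {spec!r}")
    name, pr = match["name"], match["pr"]
    if pr is None:
        return f"{STABLE_BASE}/{name}/main/entry.js"
    return f"{PREVIEW_BASE}/{name}/pr-{pr}/entry.js"
```

An integration preview is then just the shell's normal remote list with one entry swapped for its @pr-N form; every other remote keeps resolving to main.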

Release safety: don’t trade speed for incident rate

We paired faster CI with safer release mechanics:

  • canary releases (small % of traffic / limited tenant set)
  • kill switches (per remote, per feature)
  • health checks for remote loading and mount errors
  • automatic rollback on error-rate spikes

If you ship micro-apps independently, you must be able to stop one app without stopping the whole platform.
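
The decision logic behind “canary, kill switch, automatic rollback” can be small. The Python sketch below compares a canary's error rate against the stable baseline for a single remote. The thresholds (`min_requests`, `max_ratio`, `min_delta`) are placeholder values, not recommendations, and the action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts for one remote over one time window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def release_action(baseline: Window, canary: Window, kill_switch_on: bool,
                   min_requests: int = 200, max_ratio: float = 2.0,
                   min_delta: float = 0.01) -> str:
    """Decide what to do with one remote's canary release."""
    if kill_switch_on:
        return "disable"   # a human flipped the switch; it always wins
    if canary.requests < min_requests:
        return "hold"      # not enough traffic yet to judge
    # Require both a relative and an absolute jump, so a tiny baseline
    # error rate does not trigger rollbacks on noise.
    spiked = (canary.error_rate > baseline.error_rate * max_ratio
              and canary.error_rate - baseline.error_rate > min_delta)
    return "rollback" if spiked else "promote"
```

Running the decision per remote is what lets one misbehaving app be rolled back or disabled while the rest of the platform keeps serving.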

Where the ~1,500 hours/year came from

We estimated savings conservatively using:

  • baseline median CI wait time
  • improved median CI wait time
  • number of PRs merged per week
  • number of engineers affected

Even small reductions add up across a fleet:

  • saving 6 minutes/PR
  • at 500 PRs/week
  • is 3,000 minutes/week (~50 hours/week)
  • multiplied across the year is real time returned to humans
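
The arithmetic is simple enough to keep in a script and rerun with your own inputs. A small Python sketch, where the number of weeks per year is a parameter you choose rather than something taken from our data:

```python
def hours_saved_per_week(minutes_saved_per_pr: float, prs_per_week: float) -> float:
    """Engineer-hours of CI waiting removed per week."""
    return minutes_saved_per_pr * prs_per_week / 60


def hours_saved_per_year(minutes_saved_per_pr: float, prs_per_week: float,
                         weeks_per_year: float) -> float:
    """Annualized version of the same estimate."""
    return hours_saved_per_week(minutes_saved_per_pr, prs_per_week) * weeks_per_year


weekly = hours_saved_per_week(6, 500)      # 50.0 hours/week
yearly = hours_saved_per_year(6, 500, 52)  # 2600.0 hours/year
```

These inputs are the illustrative ones from the list above; annualized over 52 weeks they come to 2,600 hours. The ~1,500 figure in the title came from measured medians and PR counts, which are not broken down in this post.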

The exact numbers will differ in your org, but the pattern is consistent:

In a fleet, you don’t win by shaving 30 seconds off one pipeline. You win by making the default experience fast and reliable everywhere.

The metrics we kept on dashboards

  • median and p95 pipeline duration (per job and end-to-end)
  • cache hit rate
  • flaky test rate
  • deployment frequency
  • change failure rate (rollbacks/incidents)
  • time-to-restore (MTTR) for deploy-related incidents
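
Of these, flaky test rate is the one teams most often define differently. One workable definition, used here as an assumption, is: a test is flaky if it both passed and failed on the same commit. A Python sketch over hypothetical (test_id, commit_sha, passed) records:

```python
from collections import defaultdict

Run = tuple[str, str, bool]  # (test_id, commit_sha, passed)


def flaky_tests(runs: list[Run]) -> list[str]:
    """Tests that both passed and failed on at least one commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test_id, sha, passed in runs:
        outcomes[(test_id, sha)].add(passed)
    return sorted({test_id for (test_id, _), seen in outcomes.items()
                   if seen == {True, False}})


def flaky_rate(runs: list[Run]) -> float:
    """Share of distinct tests that were flaky in this set of runs."""
    all_tests = {test_id for test_id, _, _ in runs}
    return len(flaky_tests(runs)) / len(all_tests) if all_tests else 0.0
```

A test that fails on one commit and passes on the next is not counted, since the code changed in between; only conflicting results for identical code are treated as flakiness.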

Those are hard to argue with in stakeholder conversations.

What I’d do differently next time

  • invest earlier in integration previews (they surface “it works locally” fallacies fast)
  • treat pipeline templates like products (versioning + changelog + migrations)
  • build better visibility into “why did this pipeline take 3× longer?” (it’s usually cache misses + dependency churn)

If you’re doing this next week, start here

  • standardize required checks across repos
  • fix caching for package installs
  • introduce task-level caching for builds/tests
  • parallelize lint/types/tests/build
  • add preview environments and surface links in PRs
  • add release guardrails (canary + kill switches)
  • measure and publish the results

CI/CD for a micro-app fleet isn’t about YAML. It’s about operating the fleet like a platform: consistent defaults, visible metrics, and safety mechanisms that scale with the number of teams shipping.