WebSocket Clients Under Load: Keeping the UI Stable at 1,000+ Messages/Second

Real-time UIs are fun until they’re not.

At low volume, WebSockets feel magical: messages arrive, state updates, UI reacts.

At high volume (hundreds to thousands of messages per second), the same code path becomes a UI denial-of-service you wrote yourself:

the main thread is busy parsing JSON and running reducers
React re-renders too often
memory grows due to unbounded queues
the UI falls behind reality (and never catches up)

This post describes the pattern that kept our UI stable at sustained high throughput: backpressure, buffering, rAF scheduling, and a transport/state split that makes “drop or coalesce” a first-class choice.

The #1 mistake: calling `setState` per message

If you do this:

socket.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  setItems((prev) => apply(prev, msg));
};

You will eventually hit a wall.

Even if each update is “fast,” you’re forcing React (and your app) to do work at the message rate, not at the user-perceived frame rate.

Users don’t care if you processed 1,000 updates per second. They care if the UI stays responsive at 60 FPS.

Principle: process at the rate the UI can display

We aligned work to the browser’s paint loop:

accept messages as fast as they arrive
buffer them cheaply
process them in batches on requestAnimationFrame
keep UI updates bounded

This is the mental flip:

“Messages are input; frames are output.”

Architecture: transport → buffer → reducer → view

We separated concerns into four layers:

Transport: WebSocket lifecycle, reconnect, resume, ping/pong.
Buffer: fast append-only queue with bounded memory + coalescing.
State reducer: apply batches to an external store (not React state per message).
View: React subscribes to the store and renders at a controlled cadence.

This separation made backpressure and drop strategies possible without rewriting the UI.
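One way to picture the four layers is as small interfaces wired together. The sketch below is only an illustration: `Envelope`, `Transport`, `MessageBuffer`, `Store`, and `wirePipeline` are hypothetical names, and the scheduler is passed in as a parameter so the wiring does not depend on a browser.

```typescript
// Hypothetical interfaces for the layers; names are illustrative only.
type Envelope = { type: string; seq: number; payload: unknown };

interface Transport {
  // Calls the handler for every decoded message.
  onMessage(handler: (msg: Envelope) => void): void;
}

interface MessageBuffer {
  push(msg: Envelope): void;
  // Returns everything buffered so far and empties the buffer.
  drain(): Envelope[];
}

interface Store<S> {
  getState(): S;
  // Applies a whole batch; the view layer subscribes to the result.
  applyBatch(batch: Envelope[]): void;
  subscribe(listener: () => void): () => void;
}

// Wires transport -> buffer -> store. `schedule` decides when a flush runs.
function wirePipeline<S>(
  transport: Transport,
  buffer: MessageBuffer,
  store: Store<S>,
  schedule: (flush: () => void) => void,
): void {
  let scheduled = false;
  transport.onMessage((msg) => {
    buffer.push(msg);
    if (scheduled) return; // a flush is already pending for this tick
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = buffer.drain();
      if (batch.length > 0) store.applyBatch(batch);
    });
  });
}
```

In a browser, `schedule` would typically be `(flush) => requestAnimationFrame(flush)`. Because the buffer sits behind an interface, a plain queue can later be swapped for a coalescing or dropping one without touching the transport or the view.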

Buffering and rAF batching

A minimal batching loop looks like this:

type Msg = { type: string; ts: number; payload: unknown };

let queue: Msg[] = [];
let scheduled = false;

function onMessage(msg: Msg) {
  queue.push(msg);
  if (!scheduled) {
    scheduled = true;
    requestAnimationFrame(flush);
  }
}

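function flush() {
  scheduled = false;
  // Swap in a fresh array so anything enqueued while the batch is applied
  // is kept for the next frame.
  const batch = queue;
  queue = [];
  applyBatch(batch);
  // If work was enqueued without going through onMessage, schedule exactly
  // one follow-up frame; the guard avoids a duplicate rAF callback.
  if (queue.length > 0 && !scheduled) {
    scheduled = true;
    requestAnimationFrame(flush);
  }
}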

This alone often turns “UI locks up” into “UI stays usable.”

But at sustained load, you need real backpressure.

Backpressure: you must decide what to do when you fall behind

If message throughput exceeds processing throughput, you have exactly three options:

Scale up processing (optimize parsing, use a Worker, reduce work)
Reduce rendering work (coalesce updates, render less often, virtualize)
Drop data (intentionally, with semantics)

If you don’t choose, the browser chooses for you (by freezing the UI).

Coalesce by key (“latest wins”)

Most real-time UIs don’t need every intermediate value. They need the latest value per entity.

We introduced a coalescing buffer for “state snapshots”:

type EntityMsg = { entityId: string; payload: unknown };

const latestById = new Map<string, EntityMsg>();

function onEntityUpdate(msg: EntityMsg) {
  latestById.set(msg.entityId, msg);
  scheduleFlush();
}

function flushEntityUpdates() {
  const batch = Array.from(latestById.values());
  latestById.clear();
  applyEntityBatch(batch);
}

This bounds memory by the number of distinct entities rather than by the message rate, and turns a flood of updates into “one update per entity per frame.”
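To make the effect concrete, here is a self-contained variant of the same idea with the scheduling included. It is a sketch, not our production code: `createCoalescer` is a hypothetical helper, and the scheduler is injected so the behavior can be checked outside a browser.

```typescript
type EntityUpdate = { entityId: string; payload: unknown };

// Hypothetical helper: keeps only the latest update per entity and
// flushes at most once per scheduled tick.
function createCoalescer(
  schedule: (flush: () => void) => void, // e.g. requestAnimationFrame in a browser
  apply: (batch: EntityUpdate[]) => void,
) {
  const latest = new Map<string, EntityUpdate>();
  let scheduled = false;

  function flush(): void {
    scheduled = false;
    if (latest.size === 0) return;
    const batch = Array.from(latest.values());
    latest.clear();
    apply(batch);
  }

  return {
    push(msg: EntityUpdate): void {
      latest.set(msg.entityId, msg); // overwrite: latest wins
      if (!scheduled) {
        scheduled = true;
        schedule(flush);
      }
    },
    size: (): number => latest.size,
  };
}
```

Pushing three updates for two entities within one tick produces a single batch of two, and the entity that was updated twice carries its most recent payload.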

Bound the queue (and make it observable)

We set two thresholds on queue length:

above N messages queued → start coalescing more aggressively
above M messages queued → drop “non-critical” message types

Dropping sounds scary until you realize:

a frozen UI is worse than a stale UI
many message types are “best effort” (typing indicators, transient metrics)

The key is making drop behavior explicit and measurable.
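A minimal sketch of the drop threshold, assuming a single limit and a fixed set of best-effort types. `createBoundedQueue`, `BEST_EFFORT_TYPES`, and the reason strings are hypothetical; the lower "coalesce more aggressively" threshold is not shown here.

```typescript
type QueuedMsg = { type: string; payload: unknown };

// Assumption: which types count as best effort is a product decision;
// these two names are placeholders.
const BEST_EFFORT_TYPES = new Set<string>(["typing", "metric"]);

function createBoundedQueue(dropThreshold: number) {
  let items: QueuedMsg[] = [];
  const dropsByReason = new Map<string, number>();

  return {
    // Returns false when the message was dropped.
    push(msg: QueuedMsg): boolean {
      if (items.length >= dropThreshold && BEST_EFFORT_TYPES.has(msg.type)) {
        const reason = `over_threshold:${msg.type}`;
        dropsByReason.set(reason, (dropsByReason.get(reason) ?? 0) + 1);
        return false;
      }
      items.push(msg); // critical types are always kept in this sketch
      return true;
    },
    drain(): QueuedMsg[] {
      const batch = items;
      items = [];
      return batch;
    },
    length: (): number => items.length,
    // Drop counts keyed by reason, ready to be reported as metrics.
    drops: (): ReadonlyMap<string, number> => dropsByReason,
  };
}
```

Because critical types are never dropped here, the queue is bounded only for best-effort traffic. A real system would still need a fallback for a flood of critical messages, for example resyncing from a snapshot.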

Parse and heavy work off the main thread

JSON parsing is often the silent killer at high volume.

When we hit a ceiling, we moved parsing and normalization into a Web Worker:

main thread: receives message strings, posts to worker
worker: parses, validates, normalizes, and posts structured messages back

Even a partial offload can drastically reduce main thread contention.
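The worker side of that split can be sketched roughly as follows. `parseFrame`, `attachParser`, and `PortLike` are hypothetical names, and the validation rules are an assumption based on the `Msg` shape used earlier; `PortLike` is a minimal stand-in for the worker global so the sketch compiles without worker typings.

```typescript
type NormalizedMsg = { type: string; ts: number; payload: unknown };

// Minimal structural type standing in for the worker global scope.
interface PortLike {
  postMessage(data: unknown): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Parses and validates one raw frame; returns null for anything malformed.
function parseFrame(raw: string): NormalizedMsg | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const rec = value as Record<string, unknown>;
    const type = rec["type"];
    const ts = rec["ts"];
    if (typeof type !== "string" || typeof ts !== "number") return null;
    return { type, ts, payload: rec["payload"] };
  } catch {
    return null; // invalid JSON
  }
}

// Worker side: accepts a single string or an array of strings, so one
// postMessage from the main thread can carry a whole batch of frames.
function attachParser(port: PortLike): void {
  port.onmessage = (event) => {
    const frames: unknown[] = Array.isArray(event.data) ? event.data : [event.data];
    const parsed: NormalizedMsg[] = [];
    for (const frame of frames) {
      if (typeof frame !== "string") continue;
      const msg = parseFrame(frame);
      if (msg !== null) parsed.push(msg);
    }
    port.postMessage(parsed);
  };
}
```

On the main thread, the socket handler would forward raw strings with `worker.postMessage(...)` and feed the parsed array it receives back into the buffer. Sending frames in batches rather than one at a time keeps the message-passing overhead from replacing the parsing overhead.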

State architecture: external store + selective subscriptions

We avoided “global React state updates” by using an external store and subscribing selectively.

The idea:

apply batches to a store that can update multiple pieces of state without forcing a full tree re-render
subscribe components to just the slices they need

You can do this with:

useSyncExternalStore
Zustand/Jotai/Recoil (carefully)
a tiny custom store

The critical constraint is: batch updates must be cheap and predictable.
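For the "tiny custom store" option, a sketch along these lines is enough to get batch application with a single notification. `createStore` and its method names are hypothetical, not taken from any library.

```typescript
type Listener = () => void;

// Hypothetical minimal store: holds one immutable state value and
// notifies subscribers at most once per applied batch.
function createStore<S>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener>();

  return {
    getSnapshot: (): S => state,
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    // Folds the whole batch into the state, then notifies once.
    applyBatch<M>(batch: M[], reduce: (s: S, m: M) => S): void {
      const next = batch.reduce(reduce, state);
      if (next === state) return; // nothing changed, skip notification
      state = next;
      listeners.forEach((listener) => listener());
    },
  };
}
```

In React 18 and later, a component could read from such a store with `useSyncExternalStore(store.subscribe, store.getSnapshot)`, or with a selector in place of `getSnapshot` as long as the selector returns a referentially stable value when its slice has not changed.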

Rendering strategies that matter under load

Virtualize long lists (don’t render 5,000 rows because you received 5,000 updates)
Avoid expensive derived computations in render (precompute in reducer/batch)
Use memoization intentionally (but don’t turn it into a cargo cult)
Throttle visual-only effects (sparklines, animations, counters)

If you have charts, consider:

decimating data
drawing on canvas
updating at 10–20 FPS while keeping interactions at 60 FPS
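Running a chart at a lower cadence than the rest of the UI can be done with a small time gate inside the frame loop. This is a sketch; `createCadenceGate` is a hypothetical helper and 15 FPS is only an example value.

```typescript
// Hypothetical helper: returns a gate that opens at most `fps` times per second.
function createCadenceGate(fps: number) {
  const minIntervalMs = 1000 / fps;
  let lastRunMs = -Infinity;

  // `nowMs` is a timestamp in milliseconds, such as the one passed to a
  // requestAnimationFrame callback.
  return function shouldRun(nowMs: number): boolean {
    if (nowMs - lastRunMs < minIntervalMs) return false;
    lastRunMs = nowMs;
    return true;
  };
}
```

Inside a `requestAnimationFrame` callback, the chart redraw would be wrapped as `if (chartGate(timestamp)) redrawChart();` while everything else in the frame runs as usual, so pointer and scroll handling are unaffected by the chart's slower cadence.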

Reconnect and resume: the difference between “real-time” and “correct”

High-throughput systems expose reconnect bugs fast.

Our reconnect strategy:

exponential backoff with jitter
heartbeat detection (app-level ping/pong in the browser, since page scripts cannot send or observe protocol-level ping frames)
resume from last seen sequence when supported
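The backoff item can be sketched as a pure function. This uses the "full jitter" variant (a random delay between zero and an exponentially growing ceiling), which is one common choice and not necessarily the one we shipped; `reconnectDelayMs` and its default values are illustrative assumptions.

```typescript
// Full-jitter backoff: a random delay in [0, ceiling), where the ceiling
// doubles with each attempt up to maxMs.
function reconnectDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 500, // illustrative default
  maxMs = 30000, // illustrative default
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

A reconnect loop would call something like `setTimeout(connect, reconnectDelayMs(attempt))` and reset `attempt` to zero once a connection opens. The jitter matters most after a server restart, when many clients would otherwise retry in lockstep.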

The resume piece matters because it converts “disconnect” from data loss into “temporary lag.”

At minimum:

include a monotonically increasing seq on messages
client persists lastSeq
on reconnect, client sends lastSeq and server replays from there (or sends a snapshot)

If you can’t resume, you need a snapshot strategy:

reconnect → fetch current state snapshot via HTTP → continue streaming deltas
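On the client, the sequence bookkeeping behind both strategies can be quite small. The sketch below assumes `seq` increases by exactly 1 per message on a stream; `createSeqTracker` and the verdict names are hypothetical.

```typescript
type SeqVerdict = "apply" | "duplicate" | "gap";

// Hypothetical tracker for a monotonically increasing `seq`.
// Assumption: consecutive messages differ by exactly 1.
function createSeqTracker(initialLastSeq = 0) {
  let lastSeq = initialLastSeq;

  return {
    lastSeq: (): number => lastSeq,
    // Classifies an incoming seq; only "apply" advances lastSeq.
    check(seq: number): SeqVerdict {
      if (seq <= lastSeq) return "duplicate"; // already seen, e.g. replay overlap
      if (seq > lastSeq + 1) return "gap"; // missed messages: resume or snapshot
      lastSeq = seq;
      return "apply";
    },
    // Call after loading a snapshot that is current as of `seq`.
    resetTo(seq: number): void {
      lastSeq = seq;
    },
  };
}
```

On reconnect the client would send `tracker.lastSeq()` to the server. A `"gap"` verdict is the signal to request a replay or fall back to the snapshot path, followed by `resetTo` with the sequence number the snapshot reflects.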

Instrumentation: if you can’t see lag, you can’t fix it

We tracked a few simple signals in production:

incoming message rate (per type)
queue length / coalescing map size
dropped message counts (by reason)
batch apply duration
frame time / long tasks
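Most of these signals are plain counters and timers. A minimal in-memory sketch, with hypothetical names and no reporting backend, might look like this; the clock is injectable so durations can be checked deterministically.

```typescript
// Hypothetical in-memory counters; shipping them to a metrics backend
// is out of scope for this sketch.
function createPipelineMetrics(now: () => number = Date.now) {
  const received = new Map<string, number>();
  const dropped = new Map<string, number>();
  let maxQueueLength = 0;
  let lastBatchMs = 0;

  const bump = (counts: Map<string, number>, key: string): void => {
    counts.set(key, (counts.get(key) ?? 0) + 1);
  };

  return {
    onReceived: (type: string): void => bump(received, type),
    onDropped: (reason: string): void => bump(dropped, reason),
    onQueueLength(length: number): void {
      maxQueueLength = Math.max(maxQueueLength, length);
    },
    // Runs `apply` and records how long it took.
    timeBatch(apply: () => void): void {
      const start = now();
      apply();
      lastBatchMs = now() - start;
    },
    // Returns copies so callers cannot mutate the internal counters.
    snapshot: () => ({
      received: new Map(received),
      dropped: new Map(dropped),
      maxQueueLength,
      lastBatchMs,
    }),
  };
}
```

In a browser, passing `() => performance.now()` as the clock gives sub-millisecond resolution for batch durations. Long tasks are a separate signal; where the Long Tasks API is available they can be observed with a `PerformanceObserver` on the `longtask` entry type.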

When the UI degrades, you want to answer:

Are we CPU-bound on parsing?
Are we reducer bound?
Are we rendering bound?
Are we network bound?

Without instrumentation, everything looks like “WebSockets are slow.”

What I’d do differently next time

Design coalescing semantics early (“latest wins” vs “every event matters”).
Standardize message envelopes with type, seq, and timestamps from day one.
Build lag visibility into the UI (even a tiny “realtime delay” indicator).
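The delay indicator in the last item can be derived from message timestamps. This sketch assumes each message carries a server timestamp in epoch milliseconds and that client and server clocks are roughly in sync; `createLagEstimator` and the smoothing factor are hypothetical.

```typescript
// Assumes server timestamps are epoch milliseconds and clocks are roughly
// in sync; clock skew shows up directly as error in the estimate.
function createLagEstimator(smoothing = 0.2) {
  let lagMs: number | null = null;

  return {
    // Feeds one sample: the message's server timestamp and the local time
    // at which it was applied.
    observe(serverTs: number, nowMs: number): void {
      const sample = Math.max(0, nowMs - serverTs);
      // Exponential moving average so the indicator does not flicker.
      lagMs = lagMs === null ? sample : lagMs + smoothing * (sample - lagMs);
    },
    current: (): number | null => lagMs,
  };
}
```

Sampling at apply time rather than at receive time is deliberate: it includes the time a message spent in the buffer, which is the part of the delay this pipeline controls.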

A summary you can apply

Buffer messages cheaply.
Process on rAF in batches.
Coalesce aggressively by key where semantics allow.
Bound memory and drop intentionally (with metrics).
Offload parsing/normalization to a Worker if needed.
Use an external store and selective subscriptions.
Make reconnect/resume correct, not just “it reconnects.”
Instrument queue length, drops, and batch durations.

The goal is not “process every message as fast as possible.” The goal is to keep the UI responsive while staying as correct as the product requires.