Resumable Upload State Machines

A long-running upload has more states than a boolean can hold, and the moment two implicit flags disagree (paused and uploading both set), you get a frozen progress bar or a double-sent chunk. Modeling the transfer as an explicit finite state machine with a single source of truth for the committed offset eliminates those impossible states and makes resume-after-failure a first-class operation rather than an afterthought. This topic sits under Frontend UX, Chunking & Progress Tracking and pairs directly with upload error recovery patterns, which drive the transitions into and out of the retrying state.

Prerequisites

  • Node 20+ and a bundler that supports modern ESM
  • TypeScript 5.x with strict enabled
  • A chunked or tus-compatible upload endpoint that accepts Range/Upload-Offset
  • Browser support for IndexedDB (every evergreen browser qualifies)
  • A stable file fingerprint (size + last-modified + name, or a content hash)

How a resumable upload machine works

The machine has six states. idle is the resting state before a file is chosen. uploading is the active transfer; each confirmed chunk advances a durable offset. paused is a user-initiated halt that keeps the offset intact. retrying is entered automatically on a recoverable error and exits back to uploading after a backoff delay. completed and failed are terminal, except that failed permits a manual restart.

The offset is the keystone. It is not the number of bytes you have sent; it is the number of bytes the server has committed, learned from acknowledgements and confirmed by the resume handshake. Persisting { state, offset, uploadId, fingerprint, chunkSize } after every confirmed chunk means a crashed or reloaded tab can reconstruct exactly where it stood.

[Figure: Upload finite state machine. States idle, uploading, paused, retrying, completed and failed, with labelled transitions for start, pause, resume, chunk acknowledgement (offset += chunk), recoverable error with backoff, fatal error, retry and completion.]
The six-state upload machine: confirmed chunks advance the offset in uploading; recoverable errors detour through retrying; fatal errors and completion are terminal.

Step 1: Define states, events, and the transition table

Encode the machine as data so it can be tested, rendered, and serialized. The transition table is the contract; the reducer never invents an edge that is not in the table.

export type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";

export type UploadEvent =
  | { type: "START" }
  | { type: "PAUSE" }
  | { type: "RESUME" }
  | { type: "CHUNK_OK"; bytes: number }
  | { type: "ERROR"; fatal: boolean }
  | { type: "RETRY_NOW" }
  | { type: "ALL_DONE" };

export interface UploadContext {
  state: UploadState;
  offset: number;     // bytes the server has committed
  total: number;      // file size in bytes
  uploadId: string;
  fingerprint: string;
  chunkSize: number;
}

export const table: Record<UploadState, Partial<Record<UploadEvent["type"], UploadState>>> = {
  idle:      { START: "uploading" },
  uploading: { PAUSE: "paused", ERROR: "retrying", ALL_DONE: "completed", CHUNK_OK: "uploading" },
  paused:    { RESUME: "uploading" },
  retrying:  { RETRY_NOW: "uploading", PAUSE: "paused", ERROR: "failed" },
  completed: {},
  failed:    { START: "uploading" },
};

Expected: importing table and calling table.uploading.PAUSE returns "paused"; table.completed.START is undefined, which the reducer treats as a no-op.
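Because the table is plain data, a UI can derive which controls to enable from it instead of keeping separate flags that could drift. The sketch below repeats the Step 1 types and table so it runs on its own; the split between user-triggered events and network-driven ones, and the enabledControls name, are assumptions for illustration.

```typescript
// Repeats the Step 1 state union and transition table so the sketch is standalone.
type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";
type EventType =
  | "START" | "PAUSE" | "RESUME" | "CHUNK_OK" | "ERROR" | "RETRY_NOW" | "ALL_DONE";

const table: Record<UploadState, Partial<Record<EventType, UploadState>>> = {
  idle:      { START: "uploading" },
  uploading: { PAUSE: "paused", ERROR: "retrying", ALL_DONE: "completed", CHUNK_OK: "uploading" },
  paused:    { RESUME: "uploading" },
  retrying:  { RETRY_NOW: "uploading", PAUSE: "paused", ERROR: "failed" },
  completed: {},
  failed:    { START: "uploading" },
};

// Assumed split: these events come from buttons; CHUNK_OK, ERROR and ALL_DONE
// come from the network layer and are never exposed as controls.
const USER_EVENTS: readonly EventType[] = ["START", "PAUSE", "RESUME", "RETRY_NOW"];

// Hypothetical helper: the user events that have an outgoing edge from `state`.
export function enabledControls(state: UploadState): EventType[] {
  return USER_EVENTS.filter((event) => table[state][event] !== undefined);
}
```

For example, enabledControls("retrying") yields ["PAUSE", "RETRY_NOW"], so a "Retry now" button can appear only while backing off, and completed yields an empty list.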

Step 2: Write a pure reducer that updates the offset

The reducer is the only place state changes. It applies the table, special-cases fatal errors, and advances the committed offset only on CHUNK_OK, never merely because a chunk was sent.

import { table, type UploadContext, type UploadEvent } from "./machine.js";

export function reduce(ctx: UploadContext, event: UploadEvent): UploadContext {
  // A fatal error ends any upload that has not already completed.
  if (event.type === "ERROR" && event.fatal && ctx.state !== "completed") {
    return { ...ctx, state: "failed" };
  }
  const target = table[ctx.state][event.type];

  if (!target) {
    console.warn(`No transition for ${event.type} in ${ctx.state}`);
    return ctx; // illegal edge is ignored, not thrown
  }

  if (event.type === "CHUNK_OK") {
    // Advance only by acknowledged bytes, never past the end of the file.
    const offset = Math.min(ctx.total, ctx.offset + event.bytes);
    return { ...ctx, state: target, offset };
  }
  return { ...ctx, state: target };
}

Expected log when dispatching RESUME while idle: No transition for RESUME in idle, and ctx is returned unchanged.
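Step 5 expects a dispatch function that returns the updated context. One minimal way to provide it, assuming no state library, is a tiny store that owns the current value and funnels every change through a reducer such as reduce. The createStore name and its subscribe API are illustrative, not part of the machine defined above.

```typescript
// Minimal store: holds the current state and funnels every change through a reducer.
export interface Store<S, E> {
  getState(): S;
  dispatch(event: E): S;
  subscribe(listener: (state: S) => void): () => void;
}

export function createStore<S, E>(
  initial: S,
  reducer: (state: S, event: E) => S,
): Store<S, E> {
  let state = initial;
  const listeners = new Set<(state: S) => void>();
  return {
    getState: () => state,
    // Returns the post-event state, matching the dispatch signature runLoop expects.
    dispatch(event) {
      const next = reducer(state, event);
      if (next !== state) {
        state = next;
        listeners.forEach((listener) => listener(state));
      }
      return state;
    },
    // Returns an unsubscribe function.
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```

Passing store.dispatch straight to runLoop is safe because it closes over state instead of using this. Since reduce returns the same object for an ignored edge, the identity check also keeps listeners (and the progress bar) from re-rendering on stray events.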

Step 3: Persist the context to IndexedDB after every confirmed chunk

Durable state is what makes the machine resumable. Write the whole context after each CHUNK_OK so a reload reconstructs the exact offset. A thin promise wrapper over IndexedDB keeps the call sites readable. (For the full schema and migration strategy, see the companion guide on persisting upload state in IndexedDB, linked under related pages.)

import type { UploadContext } from "./machine.js";

const DB = "uploads", STORE = "sessions", VERSION = 1;

// Open the database, creating the object store (keyed by fingerprint) on first use.
function open(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB, VERSION);
    req.onupgradeneeded = () => {
      req.result.createObjectStore(STORE, { keyPath: "fingerprint" });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Upsert the whole context; put() replaces any record with the same fingerprint.
export async function saveSession(ctx: UploadContext): Promise<void> {
  const db = await open();
  try {
    await new Promise<void>((resolve, reject) => {
      const tx = db.transaction(STORE, "readwrite");
      tx.objectStore(STORE).put(ctx);
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
      tx.onabort = () => reject(tx.error); // e.g. QuotaExceededError
    });
  } finally {
    db.close(); // close even when the write fails
  }
}

export async function loadSession(fingerprint: string): Promise<UploadContext | undefined> {
  const db = await open();
  try {
    return await new Promise<UploadContext | undefined>((resolve, reject) => {
      const req = db.transaction(STORE, "readonly").objectStore(STORE).get(fingerprint);
      req.onsuccess = () => resolve(req.result as UploadContext | undefined);
      req.onerror = () => reject(req.error);
    });
  } finally {
    db.close();
  }
}

Expected: after one confirmed 5 MB chunk, loadSession(fingerprint) returns a context with offset === 5_242_880 and state === "uploading".
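loadSession casts whatever IndexedDB returns to UploadContext, so a record written by an older schema, or a partial write from a buggy build, passes unchecked. A small runtime guard is one way to treat unreadable records as "no resumable session". parseSession is a hypothetical name, and the sketch repeats the Step 1 types so it stands alone.

```typescript
// Repeats the Step 1 shape so the sketch is standalone.
type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";

interface UploadContext {
  state: UploadState;
  offset: number;
  total: number;
  uploadId: string;
  fingerprint: string;
  chunkSize: number;
}

const STATES: readonly string[] =
  ["idle", "uploading", "paused", "retrying", "completed", "failed"];

// A non-negative whole byte count.
function isCount(v: unknown): v is number {
  return typeof v === "number" && Number.isSafeInteger(v) && v >= 0;
}

// Hypothetical guard: returns a clean UploadContext, or undefined when the stored
// record is missing, malformed, or internally inconsistent.
export function parseSession(raw: unknown): UploadContext | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const { state, offset, total, uploadId, fingerprint, chunkSize } =
    raw as Record<string, unknown>;
  if (typeof state !== "string" || !STATES.includes(state)) return undefined;
  if (!isCount(offset) || !isCount(total) || !isCount(chunkSize)) return undefined;
  if (typeof uploadId !== "string" || typeof fingerprint !== "string") return undefined;
  if (offset > total || chunkSize === 0) return undefined;
  return { state: state as UploadState, offset, total, uploadId, fingerprint, chunkSize };
}
```

Resolving loadSession with parseSession(req.result) instead of the cast would route corrupted records to a fresh start rather than a resume at a nonsensical offset.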

Step 4: Perform the resume handshake before sending bytes

On reload you must not trust the persisted offset blindly: the server may have garbage-collected an incomplete upload, or committed more than the client recorded. Ask the server. For a tus endpoint, a HEAD request returns the authoritative Upload-Offset; for a generic multipart endpoint, query the committed range.

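export async function resumeHandshake(
  endpoint: string,
  uploadId: string,
): Promise<number> {
  const res = await fetch(`${endpoint}/${uploadId}`, {
    method: "HEAD",
    headers: { "Tus-Resumable": "1.0.0" }, // tus requires this on every non-OPTIONS request
    cache: "no-store",                     // never trust a cached offset
  });

  if (res.status === 404 || res.status === 410) {
    throw new Error("UPLOAD_EXPIRED"); // server dropped it: must restart
  }
  if (!res.ok) {
    throw new Error(`Handshake failed: HTTP ${res.status}`);
  }

  // tus reports the committed byte count via Upload-Offset. A missing header
  // (often a CORS Access-Control-Expose-Headers gap) must not silently become 0.
  const raw = res.headers.get("Upload-Offset");
  const offset = raw === null ? NaN : Number(raw);
  if (!Number.isSafeInteger(offset) || offset < 0) {
    throw new Error("Invalid Upload-Offset from server");
  }
  return offset;
}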

Expected: a partially uploaded session returns the server’s committed byte count (for example 15_728_640); a server-side 404 raises UPLOAD_EXPIRED, which the caller should treat as a cue to create a new upload and START again from offset zero.
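The UPLOAD_EXPIRED error still has to become a fresh session somewhere. A minimal sketch of that caller-side step follows. handshake and createUpload are injected, hypothetical functions: handshake could wrap resumeHandshake for a fixed endpoint, and createUpload might issue the tus creation extension's POST with an Upload-Length header and read the new upload's URL from the Location response header. Other failures are re-thrown for the retry path rather than treated as expiry.

```typescript
type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";

interface UploadContext {
  state: UploadState;
  offset: number;
  total: number;
  uploadId: string;
  fingerprint: string;
  chunkSize: number;
}

// Hypothetical caller-side reconciliation. `handshake` resolves with the server's
// committed offset or rejects with Error("UPLOAD_EXPIRED"); `createUpload`
// registers a new upload of `total` bytes and resolves with its id.
export async function reconcileOrRestart(
  saved: UploadContext,
  handshake: (uploadId: string) => Promise<number>,
  createUpload: (total: number) => Promise<string>,
): Promise<UploadContext> {
  try {
    // The server is authoritative: its offset replaces the persisted hint.
    return { ...saved, offset: await handshake(saved.uploadId) };
  } catch (err) {
    if (err instanceof Error && err.message === "UPLOAD_EXPIRED") {
      // The partial upload is gone: start again from byte 0 under a new id,
      // back in idle so the normal START edge applies.
      const uploadId = await createUpload(saved.total);
      return { ...saved, uploadId, offset: 0, state: "idle" };
    }
    throw err; // transient or unknown failures belong to the retry path
  }
}
```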

Step 5: Drive the chunk loop from the reconciled offset

With the authoritative offset in hand, slice from there and feed each chunk through the loop, dispatching CHUNK_OK on success and ERROR on failure so the machine and the persisted offset stay in lockstep.

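import { type UploadContext, type UploadEvent } from "./machine.js";
// Module paths for the Step 3 and Step 4 helpers are illustrative.
import { saveSession } from "./persistence.js";
import { resumeHandshake } from "./handshake.js";

export async function runLoop(
  file: File,
  ctx: UploadContext,
  endpoint: string,
  send: (blob: Blob, offset: number) => Promise<void>,
  dispatch: (e: UploadEvent) => UploadContext,
): Promise<void> {
  // Replace the persisted hint with the server's committed offset. This writes to
  // ctx directly, so it assumes dispatch's store holds this same object; if your
  // store copies state, seed it with the reconciled offset before calling runLoop.
  ctx.offset = await resumeHandshake(endpoint, ctx.uploadId);

  // START is only defined from idle and failed; restored sessions re-enter
  // uploading through whichever edge their persisted state allows.
  if (ctx.state === "paused") ctx = dispatch({ type: "RESUME" });
  else if (ctx.state === "retrying") ctx = dispatch({ type: "RETRY_NOW" });
  else if (ctx.state !== "uploading") ctx = dispatch({ type: "START" });

  while (ctx.offset < file.size && ctx.state === "uploading") {
    const end = Math.min(file.size, ctx.offset + ctx.chunkSize);
    const blob = file.slice(ctx.offset, end);
    try {
      await send(blob, ctx.offset);
      ctx = dispatch({ type: "CHUNK_OK", bytes: blob.size });
      await saveSession(ctx);
    } catch {
      ctx = dispatch({ type: "ERROR", fatal: false });
      return; // hand control to the retry loop
    }
  }
  // ALL_DONE is only defined from uploading; persist the terminal state too.
  if (ctx.offset >= file.size && ctx.state === "uploading") {
    ctx = dispatch({ type: "ALL_DONE" });
    await saveSession(ctx);
  }
}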

Expected: on a resumed upload, the first file.slice starts at the reconciled offset, so already-committed bytes are never re-sent.
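Step 5 hands control to "the retry loop" without showing it. One way to drive the retrying state is sketched below under stated assumptions: the delay uses exponential backoff with full jitter (a random wait below min(cap, base * 2^attempt)), the 500 ms base and 30 s cap are arbitrary, and maxAttempts defaults to 6 to match the configuration reference. getState, dispatch and runOnce are injected so the driver stays testable; runOnce stands in for a closure over runLoop.

```typescript
type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";
type RetryEvent = { type: "RETRY_NOW" } | { type: "ERROR"; fatal: boolean };

// Full jitter: a random wait in [0, min(capMs, baseMs * 2^attempt)).
export function backoffDelay(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical driver: sleeps, dispatches RETRY_NOW, re-runs the chunk loop, and
// gives up after maxAttempts. A user PAUSE during the sleep ends the loop without
// spending further attempts.
export async function driveRetries(
  getState: () => UploadState,
  dispatch: (event: RetryEvent) => unknown,
  runOnce: () => Promise<void>,
  maxAttempts = 6,
  sleep: (ms: number) => Promise<void> = wait,
): Promise<UploadState> {
  for (let attempt = 0; getState() === "retrying"; attempt++) {
    if (attempt >= maxAttempts) {
      dispatch({ type: "ERROR", fatal: true }); // budget spent: retrying -> failed
      break;
    }
    await sleep(backoffDelay(attempt));
    if (getState() !== "retrying") break; // paused (or otherwise moved) while waiting
    dispatch({ type: "RETRY_NOW" });
    await runOnce(); // lands back in "retrying" on another recoverable error
  }
  return getState();
}
```

This counts every attempt within one call, so a transfer that makes progress between failures still spends budget; resetting the counter when the committed offset advances is a reasonable refinement.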

Configuration reference

Option           Type                   Default     Effect
chunkSize        number (bytes)         5_242_880   Slice size; must be ≥ 5 MB for S3 multipart non-final parts
fingerprint      string                 derived     IndexedDB key identifying the file across sessions
maxAttempts      number                 6           Retries before retrying transitions to failed
handshakeMethod  "HEAD" | "GET"         "HEAD"      Verb used to read the committed offset
persistOn        "chunk" | "interval"   "chunk"     When the context is written to IndexedDB
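These options map naturally onto a typed object with defaults. The sketch below is one way to merge overrides and enforce the S3 part-size floor only when the backend is S3; the UploadOptions and resolveOptions names and the s3Backed flag are hypothetical, and fingerprint is left out because it is derived per file rather than configured.

```typescript
// Hypothetical options object mirroring the configuration reference.
export interface UploadOptions {
  chunkSize: number;               // bytes
  maxAttempts: number;             // retries before retrying -> failed
  handshakeMethod: "HEAD" | "GET"; // verb for reading the committed offset
  persistOn: "chunk" | "interval"; // when the context is written to IndexedDB
}

export const DEFAULTS: UploadOptions = {
  chunkSize: 5_242_880,
  maxAttempts: 6,
  handshakeMethod: "HEAD",
  persistOn: "chunk",
};

// S3 rejects multipart parts smaller than 5 MiB, except the final part.
const S3_MIN_PART = 5 * 1024 * 1024;

export function resolveOptions(
  overrides: Partial<UploadOptions> = {},
  s3Backed = false, // assumption: the caller knows which backend it targets
): UploadOptions {
  const opts: UploadOptions = { ...DEFAULTS, ...overrides };
  if (!Number.isSafeInteger(opts.chunkSize) || opts.chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive whole number of bytes");
  }
  if (s3Backed && opts.chunkSize < S3_MIN_PART) {
    throw new RangeError(`chunkSize must be at least ${S3_MIN_PART} bytes for S3 multipart`);
  }
  if (!Number.isSafeInteger(opts.maxAttempts) || opts.maxAttempts < 0) {
    throw new RangeError("maxAttempts must be a non-negative integer");
  }
  return opts;
}
```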

Edge cases & gotchas

Stale offset after server-side garbage collection

If the server expires incomplete uploads after, say, 24 hours, the persisted offset becomes a lie. Always run the handshake before resuming; map a 404 to a clean restart rather than letting Range requests fail one by one.

Fingerprint collisions

Using only file.name as the key collides when a user re-uploads a different file with the same name. Combine name, size, and lastModified; for high-stakes uploads compute a content hash so a changed file gets a new session instead of resuming onto stale bytes.
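Both fingerprint strategies fit in a few lines of standard Web APIs; the function names are illustrative. crypto.subtle.digest is one-shot, so contentFingerprint reads the whole file into memory. For multi-gigabyte files, hashing incrementally with a streaming hash library is a common compromise.

```typescript
// Cheap fingerprint: name + size + lastModified. JSON encoding keeps the
// fields unambiguous even if the name contains separators.
export function quickFingerprint(file: File): string {
  return JSON.stringify([file.name, file.size, file.lastModified]);
}

// Strong fingerprint: hex-encoded SHA-256 over the file bytes via Web Crypto.
// Reads the entire blob into memory before hashing.
export async function contentFingerprint(file: Blob): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", await file.arrayBuffer());
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}
```

A changed file with the same name, size and timestamp keeps its quick fingerprint, so the content hash is the safer key when resuming onto stale bytes would be costly.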

IndexedDB write latency under load

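Writing the full context on every chunk can lag on slow storage. If you see contention, throttle persistence to at most once per second rather than debouncing it (a debounce keeps postponing the write while chunks keep arriving), and accept that a crash can leave the stored offset up to a second behind the server; the handshake corrects that drift on resume.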
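Limiting persistence to once per second is easiest with a throttle, which writes immediately when the interval has elapsed and otherwise schedules one trailing write for the newest value; a debounce would keep postponing that write during a steady upload. A minimal sketch, assuming save is something like saveSession; throttleSaves is a hypothetical helper, and the injectable clock exists only to make it testable.

```typescript
// Hypothetical throttle around a save function such as saveSession.
// Writes at most once per intervalMs, always persisting the newest value,
// and chains writes so they land in order.
export function throttleSaves<T>(
  save: (value: T) => Promise<void>,
  intervalMs = 1000,
  now: () => number = Date.now,
) {
  let latest: { value: T } | undefined; // newest unsaved value, boxed
  let lastWrite = -Infinity;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let chain: Promise<void> = Promise.resolve();

  function writeLatest(): Promise<void> {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (latest === undefined) return chain;
    const { value } = latest;
    latest = undefined;
    lastWrite = now();
    // A failed write is logged, not rethrown, so later writes still run.
    chain = chain
      .then(() => save(value))
      .catch((err) => console.error("throttled save failed", err));
    return chain;
  }

  return {
    // Record the newest context: write now if the interval has elapsed,
    // otherwise schedule a single trailing write.
    push(value: T): void {
      latest = { value };
      const waitMs = lastWrite + intervalMs - now();
      if (waitMs <= 0) void writeLatest();
      else if (timer === undefined) timer = setTimeout(() => void writeLatest(), waitMs);
    },
    // Write any pending value immediately, e.g. on pause, completion or pagehide.
    flush: writeLatest,
  };
}
```

Calling flush() whenever the machine leaves uploading means the throttle only delays writes during steady transfer, never at the moments a user is likely to close the tab.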

Illegal transitions from race conditions

A late CHUNK_OK can arrive after the user hit pause. Because the reducer ignores undefined edges, the stray event is dropped instead of corrupting state β€” but make sure your UI reads state from the reducer, not from a separate flag that could disagree.

Verification

Confirm the committed offset directly against a tus-style endpoint:

# Ask the server for the authoritative committed offset (-I already sends HEAD).
curl -sI -H 'Tus-Resumable: 1.0.0' https://api.example.com/uploads/abc123 \
  | grep -i 'upload-offset'
# => Upload-Offset: 15728640

Then check the reducer's offset arithmetic and terminal failure in isolation:

import { reduce } from "./reducer.js";
import type { UploadContext } from "./machine.js";

const start: UploadContext = { state: "uploading", offset: 0, total: 10, uploadId: "x",
  fingerprint: "f", chunkSize: 5 };
const after = reduce(start, { type: "CHUNK_OK", bytes: 5 });
// console.assert logs a message on failure; it does not throw.
console.assert(after.offset === 5, "offset must advance on CHUNK_OK");
console.assert(reduce(after, { type: "ERROR", fatal: true }).state === "failed",
  "fatal error must be terminal");

FAQ

Why an explicit state machine instead of a few boolean flags?

Boolean flags allow combinations that should be impossible, such as isPaused && isUploading, and every such combination is a latent bug. A finite state machine enumerates only the legal states and edges, so illegal transitions become no-ops you can log instead of crashes you have to debug.

Where exactly should the offset come from β€” the client or the server?

The server. The client tracks what it sent, but only the server knows what it committed. Treat the persisted client offset as a hint and reconcile it with a handshake (HEAD for tus Upload-Offset, or a committed-range query) before sending any bytes.

How does pause differ from a failure-driven retry?

paused is user-intent and stays put indefinitely with the offset frozen; retrying is automatic, time-boxed by backoff, and managed alongside upload error recovery patterns. Keeping them as distinct states means a paused upload never silently consumes the retry budget.

Can I show progress while in the retrying state?

Yes. Read the same committed offset the machine persists and surface it through real-time upload progress events. The bar should hold steady at the last committed percentage during backoff rather than resetting, which reassures users that progress is preserved.
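A minimal sketch of that steady-bar behavior: the percentage is computed only from the committed offset, so entering retrying changes the label but not the number. The progressView name and label strings are hypothetical; capping at 99% until completed is a design choice so the bar never claims success before ALL_DONE.

```typescript
type UploadState =
  | "idle" | "uploading" | "paused" | "retrying" | "completed" | "failed";

// Hypothetical view helper: progress comes only from the committed offset.
export function progressView(
  state: UploadState,
  offset: number,
  total: number,
): { percent: number; label: string } {
  // Floor, and cap at 99 until the machine reaches completed.
  const raw = total > 0 ? Math.floor((offset / total) * 100) : 0;
  const percent = state === "completed" ? 100 : Math.min(99, raw);
  switch (state) {
    case "idle":      return { percent, label: "Ready" };
    case "uploading": return { percent, label: `${percent}%` };
    case "paused":    return { percent, label: `Paused at ${percent}%` };
    case "retrying":  return { percent, label: `Reconnecting (${percent}% saved)` };
    case "completed": return { percent, label: "Done" };
    case "failed":    return { percent, label: `Failed at ${percent}%` };
  }
}
```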