Laravel Queues & Horizon at Scale (2026): Idempotency, Retries, and Enterprise Job Pipelines

In enterprise Laravel systems, the queue is where the business actually runs: imports, billing calculations, notifications, webhooks, reports, AI ingestion, and scheduled workflows. The fastest way to create outages is to treat queues like “background tasks” instead of a production-critical system.

This guide shows how to run Laravel queues + Horizon safely at scale: queue separation, correct retry configuration, idempotency, long-running jobs, batch pipelines, and monitoring—without duplicate processing or random worker meltdowns.

For the full enterprise build guide, start here: Laravel Development (2026): The Complete Guide to Building & Scaling Enterprise Applications. If you need help hardening your production queues under SLA: Laravel Maintenance.


1) Why queues break in production

  • Mixed workloads: imports + notifications + CPU-heavy rating all share the same workers.
  • Wrong timeouts: workers kill jobs early; jobs restart; duplicates happen.
  • No idempotency: retries create double charges, double emails, double exports.
  • Unbounded jobs: one job tries to do “everything” and eats memory.
  • No queue monitoring: you discover problems only after customers complain.

Enterprise rule: A job must be safe to retry. If it isn’t, it’s not production-ready.


2) Enterprise queue design (separation + priorities)

Start by separating workloads by behavior. A clean enterprise baseline looks like:

  • critical (payments, billing actions, contract state changes)
  • notifications (emails/SMS/webhooks)
  • imports (file ingestion + normalization)
  • compute (rating engines, analytics, heavy transforms)
  • exports (CSV/XLS/PDF generation)

Why this matters: if exports spike, your payments must still run. If imports spike, your notifications must still send. Queue separation prevents “cross-contamination.”

Horizon example: dedicated supervisors per workload

Here’s a clean starting point (adapt to your infra). The key is: separate queues + different timeouts + right worker counts.

// config/horizon.php (example)
'environments' => [
  'production' => [
    'critical' => [
      'connection' => 'redis',
      'queue' => ['critical'],
      'balance' => 'auto',
      'minProcesses' => 2,
      'maxProcesses' => 10,
      'tries' => 3,
      'timeout' => 120,
    ],

    'notifications' => [
      'connection' => 'redis',
      'queue' => ['notifications'],
      'balance' => 'auto',
      'minProcesses' => 1,
      'maxProcesses' => 8,
      'tries' => 5,
      'timeout' => 120,
    ],

    'imports' => [
      'connection' => 'redis',
      'queue' => ['imports'],
      'balance' => 'auto',
      'minProcesses' => 1,
      'maxProcesses' => 4,
      'tries' => 1,
      'timeout' => 3600,
    ],

    'compute' => [
      'connection' => 'redis',
      'queue' => ['compute'],
      'balance' => 'auto',
      'minProcesses' => 2,
      'maxProcesses' => 20,
      'tries' => 2,
      'timeout' => 1800,
    ],

    'exports' => [
      'connection' => 'redis',
      'queue' => ['exports'],
      'balance' => 'auto',
      'minProcesses' => 1,
      'maxProcesses' => 6,
      'tries' => 2,
      'timeout' => 900,
    ],
  ],
],

3) timeout vs retry_after (the #1 duplicate-processing cause)

This mistake causes “phantom duplicates” in billing systems: the worker kills a job at timeout, but the queue still holds the payload as a reserved job. Once retry_after elapses, Redis releases it back onto the queue and the job runs again.

Rule of thumb: retry_after must be greater than your longest real job runtime (plus buffer), and worker timeout must align to that reality.

// config/queue.php (Redis)
'redis' => [
  'driver' => 'redis',
  'connection' => 'default',
  'queue' => env('REDIS_QUEUE', 'default'),
  // Must be > longest job runtime + buffer
  'retry_after' => 5400,
  'block_for' => null,
  'after_commit' => true,
],

Enterprise reality: If you don’t measure job runtime distribution (p95/p99), you’ll guess timeouts—and guessing creates duplicates.
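Supervisor-level timeouts set the ceiling, but it also helps to pin limits on the job class itself so a slow job type can’t silently inherit a generous default. A minimal sketch (the job name and numbers are illustrative, assuming the compute supervisor and retry_after = 5400 shown above):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

// Hypothetical compute-queue job; align these numbers with your own config.
class RateUsageJob implements ShouldQueue
{
    use Queueable, InteractsWithQueue;

    public int $tries = 2;       // total attempts, matching the compute supervisor
    public int $timeout = 1800;  // well under retry_after (5400) to avoid duplicates

    // Wait before retrying, so transient upstream failures can clear.
    public function backoff(): array
    {
        return [60, 300];
    }

    public function handle(): void
    {
        // heavy rating work, chunked into bounded units
    }
}
```

Job-level values win over worker defaults, which makes the timeout/retry_after relationship auditable per job type instead of per deployment.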


4) Idempotency patterns (do this or pay later)

Idempotency means: if a job runs twice (retry, redeploy, crash), the business outcome remains correct. Here are enterprise-grade patterns that work.

A) Idempotency key + DB unique constraint

Best for payments, invoices, webhooks, and “create once” operations.

// Example: write once using a unique key
// Table has UNIQUE(idempotency_key)

use Illuminate\Database\UniqueConstraintViolationException;
use Illuminate\Support\Facades\DB;

try {
  DB::transaction(function () use ($key, $payload) {
    // Claim the key first: a concurrent retry hits the
    // unique index instead of running the action twice.
    DB::table('job_outcomes')->insert([
      'idempotency_key' => $key,
      'status' => 'done',
      'created_at' => now(),
      'updated_at' => now(),
    ]);

    // Do the business action (if it throws, the transaction
    // rolls back and the key is released for the next attempt)
    // ...
  });
} catch (UniqueConstraintViolationException $e) {
  return; // safe retry: this key was already processed
}

B) Distributed lock (Redis) for “one-at-a-time” processing

Best for imports, per-customer processing, per-file pipelines, and allowance tracking (bundles).

use Illuminate\Support\Facades\Cache;

// Lock TTL (7200s) must exceed worst-case processing time,
// or the lock expires while the job is still running.
$lock = Cache::lock("import:{$supplierId}:{$period}", 7200);

if (! $lock->get()) {
  return; // another worker is processing this already
}

try {
  // do work safely
} finally {
  $lock->release();
}

C) “Exactly-once” isn’t real—design for safe “at-least-once”

Queues are designed for at-least-once delivery. Your architecture must assume duplicates are possible and handle them safely.
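Laravel also ships building blocks that reduce (but don’t eliminate) duplicates: ShouldBeUnique deduplicates while a job sits on the queue, and the WithoutOverlapping middleware stops concurrent runs for the same key. A sketch with an illustrative job name and identifier:

```php
<?php

namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\Middleware\WithoutOverlapping;

// Hypothetical job; $invoiceId stands in for your own natural key.
class SendInvoiceJob implements ShouldQueue, ShouldBeUnique
{
    public function __construct(public int $invoiceId) {}

    // Only one instance per invoice may be queued at a time.
    public function uniqueId(): string
    {
        return (string) $this->invoiceId;
    }

    // And only one may run at a time; overlapping attempts are released.
    public function middleware(): array
    {
        return [(new WithoutOverlapping((string) $this->invoiceId))->releaseAfter(60)];
    }

    public function handle(): void
    {
        // still make the write idempotent: these guards cover dispatch
        // and concurrency, not a worker crash mid-handle
    }
}
```

Treat these as a first line of defense layered on top of idempotency keys, not a replacement for them.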


5) Long-running jobs: imports, exports, heavy rating

Enterprise strategy is simple: no single job should do unbounded work. Use chunking + batches.

  • Parse files using streams (don’t load entire CSV into memory).
  • Process in chunks (1k–5k rows) based on your workload.
  • Commit progress frequently (and log progress events).
  • Use “replace vs patch” write modes depending on the pipeline.

CTO signal: Chunking + idempotency is how you make “reprocessing” safe—critical in billing and finance systems.
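The stream-then-chunk approach above can be sketched with a LazyCollection, so only one chunk of rows is ever in memory. The file path and chunk size are illustrative:

```php
<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\LazyCollection;

// Stream the CSV row by row instead of loading it all into memory.
LazyCollection::make(function () {
    $handle = fopen(storage_path('app/imports/usage.csv'), 'r');

    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
})
->chunk(1000) // tune chunk size to your workload (1k–5k rows)
->each(function ($chunk) {
    DB::transaction(function () use ($chunk) {
        // normalize + upsert this chunk, then record progress
        // so a rerun can resume (or safely re-apply) from here
    });
});
```

Committing per chunk is what makes reprocessing cheap: a crash loses at most one chunk of work, and idempotent writes make the re-run harmless.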


6) Batches & pipelines: structure multi-stage processing

Most enterprise workflows are pipelines, not single jobs. Example:

Import (chunked) →
Normalize + dedupe →
Rate (compute queue) →
Apply bundles →
Update summary →
Export / notify

Use batch IDs to tie all stages together so ops can answer: “What happened for period X?”

  • Store job_id / batch_id on records created/updated
  • Write processing history per stage (start/end + counts + errors)
  • Make each stage rerunnable independently (with idempotency)
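The pipeline above maps naturally onto Bus::batch for the fan-out stage and Bus::chain for the ordered stages after it. The job class names here are illustrative; each stage must be idempotent in its own right:

```php
<?php

use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;

// $importChunkJobs is one job per file chunk; $period identifies the run.
$batch = Bus::batch($importChunkJobs)
    ->then(function (Batch $batch) {
        // Fan back in: run the remaining stages in order.
        Bus::chain([
            new NormalizeAndDedupeJob($batch->id),
            (new RatePeriodJob($batch->id))->onQueue('compute'),
            new ApplyBundlesJob($batch->id),
            new UpdateSummaryJob($batch->id),
        ])->dispatch();
    })
    ->catch(function (Batch $batch, \Throwable $e) {
        // record the failing stage against the batch id for ops
    })
    ->name("import-{$period}")
    ->onQueue('imports')
    ->dispatch();

// Persist $batch->id on every record the pipeline touches, so
// “what happened for period X?” is a single indexed query.
```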

7) Monitoring & alerting (what to track)

Enterprise queue monitoring should trigger alerts before customer impact.

  • Backlog depth per queue (critical/imports/compute)
  • Job failure rate (spikes = release regression or upstream data shift)
  • Runtime p95/p99 for heavy jobs (timeout tuning and capacity planning)
  • Deadlocks/lock wait timeouts (DB-level contention)
  • Worker restarts/OOM (memory leaks or oversized jobs)
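The backlog-depth check above can start as a scheduled closure before you reach for a full observability stack. Thresholds and the alert channel are illustrative:

```php
<?php

use Illuminate\Support\Facades\Queue;

// Per-queue backlog thresholds: tune to your normal traffic.
$thresholds = [
    'critical'      => 50,
    'notifications' => 500,
    'imports'       => 2000,
];

foreach ($thresholds as $queue => $max) {
    $depth = Queue::size($queue);

    if ($depth > $max) {
        // swap this for your paging/alerting integration
        logger()->warning("Queue backlog alert: {$queue} depth {$depth} > {$max}");
    }
}
```

Run it every minute from the scheduler; the point is that the alert fires on backlog depth, before latency becomes customer-visible.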

This is exactly what an enterprise maintenance plan should include: monitoring + tuning + incident response. See: Laravel Maintenance.


8) Incident playbook: backlog spikes, deadlocks, failures

Scenario A: backlog spike on imports

  • Confirm upstream change (new file format/volume).
  • Scale imports supervisor temporarily.
  • Reduce chunk size if memory is spiking.
  • Enable strict validation + issue logging so pipeline doesn’t stall.

Scenario B: deadlocks/lock wait timeouts

  • Identify hot queries and add the right composite indexes.
  • Reduce transaction scope (shorter transactions).
  • Batch updates (e.g., 200–500 rows) instead of huge updates.
  • Implement deadlock retry around critical updates.
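For the deadlock retry, Laravel’s DB::transaction already re-runs the closure when the driver reports a deadlock, via its attempts argument. Table and variable names here are illustrative:

```php
<?php

use Illuminate\Support\Facades\DB;

// Retried up to 3 times on deadlock; keep the body small and idempotent
// so a re-run produces the same result.
DB::transaction(function () use ($rows) {
    foreach ($rows as $row) {
        DB::table('usage_summaries')
            ->where('id', $row->id)
            ->update(['rated_total' => $row->total]);
    }
}, attempts: 3);
```

Combined with batching updates into small groups, this turns deadlocks from incidents into retried noise.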

Scenario C: job failures after release

  • Pause only the affected queue (not all queues).
  • Rollback the release or hotfix the specific job.
  • Replay failed jobs safely only if idempotency is confirmed.

9) Copy/paste enterprise checklist

  1. Separate queues by workload (critical/notifications/imports/compute/exports).
  2. Align worker timeout and Redis retry_after to real p95/p99 runtimes.
  3. Make every job safe to retry (idempotency keys + locks + unique constraints).
  4. Chunk long tasks (stream files, small batches, frequent commits).
  5. Use batch IDs + processing histories per stage.
  6. Monitor backlog depth, failure rates, runtime distribution, and worker memory.
  7. Create an incident playbook for spikes, deadlocks, and regressions.

Next steps (internal links)

Need enterprise Laravel architecture + scaling?

We design queue-first systems, safe pipelines, and high-throughput job processing (Horizon + Redis + MySQL tuning).

Want monitoring + SLA support?

We keep production stable: queue monitoring, failure triage, performance tuning, patching, and incident response.

Upgrading to Laravel 12 safely? Laravel Upgrade Service. Building AI ingestion pipelines (RAG indexing/agents) on queues? Laravel AI Development.

FAQ

What is the most common cause of duplicate processing in Laravel queues?

Misconfigured retry_after vs worker timeout, combined with jobs that aren’t idempotent. If a job is killed and re-queued, it will run twice unless your design prevents duplicates.

How do enterprises scale Horizon safely?

By separating workloads into dedicated queues and supervisors, measuring job runtimes (p95/p99), and applying idempotency patterns so retries are safe.

Should we use one queue or multiple queues?

Multiple queues. Mixing imports, CPU-heavy compute, and critical billing actions causes unpredictable latency and incidents. Queue separation makes performance predictable and safer under load.
