In enterprise Laravel systems, the queue is where the business actually runs: imports, billing calculations, notifications, webhooks, reports, AI ingestion, and scheduled workflows. The fastest way to create outages is to treat queues like “background tasks” instead of a production-critical system.
This guide shows how to run Laravel queues + Horizon safely at scale: queue separation, correct retry configuration, idempotency, long-running jobs, batch pipelines, and monitoring—without duplicate processing or random worker meltdowns.
For the full enterprise build guide, start here: Laravel Development (2026): The Complete Guide to Building & Scaling Enterprise Applications. If you need help hardening your production queues under SLA: Laravel Maintenance.
Quick navigation
- 1) Why queues break in production
- 2) Enterprise queue design (separation + priorities)
- 3) timeout vs retry_after (the #1 duplicate-processing cause)
- 4) Idempotency patterns (do this or pay later)
- 5) Long-running jobs: imports, exports, heavy rating
- 6) Batches & pipelines: how to structure multi-stage processing
- 7) Monitoring & alerting (what to track)
- 8) Incident playbook: backlog spikes, deadlocks, failures
- 9) Copy/paste checklist
- Next steps
1) Why queues break in production
- Mixed workloads: imports + notifications + CPU-heavy rating all share the same workers.
- Wrong timeouts: workers kill jobs early; jobs restart; duplicates happen.
- No idempotency: retries create double charges, double emails, double exports.
- Unbounded jobs: one job tries to do “everything” and eats memory.
- No queue monitoring: you discover problems only after customers complain.
Enterprise rule: A job must be safe to retry. If it isn’t, it’s not production-ready.
2) Enterprise queue design (separation + priorities)
Start by separating workloads by behavior. A clean enterprise baseline looks like:
- critical (payments, billing actions, contract state changes)
- notifications (emails/SMS/webhooks)
- imports (file ingestion + normalization)
- compute (rating engines, analytics, heavy transforms)
- exports (CSV/XLS/PDF generation)
Why this matters: if exports spike, your payments must still run. If imports spike, your notifications must still send. Queue separation prevents “cross-contamination.”
Horizon example: dedicated supervisors per workload
Here’s a clean starting point (adapt to your infra). The key is: separate queues + different timeouts + right worker counts.
// config/horizon.php (example)
'environments' => [
    'production' => [
        'critical' => [
            'connection' => 'redis',
            'queue' => ['critical'],
            'balance' => 'auto',
            'minProcesses' => 2,
            'maxProcesses' => 10,
            'tries' => 3,
            'timeout' => 120,
        ],
        'notifications' => [
            'connection' => 'redis',
            'queue' => ['notifications'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 8,
            'tries' => 5,
            'timeout' => 120,
        ],
        'imports' => [
            'connection' => 'redis',
            'queue' => ['imports'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 4,
            'tries' => 1,
            'timeout' => 3600,
        ],
        'compute' => [
            'connection' => 'redis',
            'queue' => ['compute'],
            'balance' => 'auto',
            'minProcesses' => 2,
            'maxProcesses' => 20,
            'tries' => 2,
            'timeout' => 1800,
        ],
        'exports' => [
            'connection' => 'redis',
            'queue' => ['exports'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 6,
            'tries' => 2,
            'timeout' => 900,
        ],
    ],
],
3) timeout vs retry_after (the #1 duplicate-processing cause)
This mistake causes “phantom duplicates” in billing systems: if retry_after is shorter than a job’s real runtime, Redis assumes the worker died and releases the job back onto the queue while the original worker is still processing it—so the same job runs twice in parallel.
Rule of thumb: retry_after must be greater than your longest real job runtime (plus buffer), and worker timeout must align to that reality.
// config/queue.php (Redis connection)
'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    // Must be greater than your longest job runtime plus a buffer
    'retry_after' => 5400,
    'block_for' => null,
    'after_commit' => true,
],
Enterprise reality: If you don’t measure job runtime distribution (p95/p99), you’ll guess timeouts—and guessing creates duplicates.
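Connection-level retry_after and per-job settings have to agree. As a sketch (the job class name is illustrative, but $tries, $timeout, and $failOnTimeout are standard queued-job properties), a long-running import aligned with the retry_after of 5400 above might look like:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

// Hypothetical import job: per-job settings kept consistent
// with the connection-level retry_after (5400s) shown above.
class ProcessSupplierImport implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public $tries = 1;            // imports are re-run deliberately, not auto-retried
    public $timeout = 3600;       // must stay BELOW retry_after (5400)
    public $failOnTimeout = true; // mark as failed instead of silently releasing

    public function handle(): void
    {
        // chunked import work...
    }
}
```

Keeping $timeout below retry_after means the worker always gives up before Redis assumes the worker is dead, which is the ordering that prevents parallel duplicates.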
4) Idempotency patterns (do this or pay later)
Idempotency means: if a job runs twice (retry, redeploy, crash), the business outcome remains correct. Here are enterprise-grade patterns that work.
A) Idempotency key + DB unique constraint
Best for payments, invoices, webhooks, and “create once” operations.
use Illuminate\Support\Facades\DB;

// Example: write once using a unique key
// Table has UNIQUE(idempotency_key)
DB::transaction(function () use ($key, $payload) {
    $already = DB::table('job_outcomes')
        ->where('idempotency_key', $key)
        ->lockForUpdate()
        ->first();

    if ($already) {
        return; // safe retry: outcome already recorded
    }

    // Do the business action
    // ...

    DB::table('job_outcomes')->insert([
        'idempotency_key' => $key,
        'status' => 'done',
        'created_at' => now(),
        'updated_at' => now(),
    ]);
});
B) Distributed lock (Redis) for “one-at-a-time” processing
Best for imports, per-customer processing, per-file pipelines, and allowance tracking (bundles).
use Illuminate\Support\Facades\Cache;

$lock = Cache::lock("import:{$supplierId}:{$period}", 7200);

if (! $lock->get()) {
    return; // another worker is processing this already
}

try {
    // do work safely
} finally {
    $lock->release();
}
C) “Exactly-once” isn’t real—design for safe “at-least-once”
Queues are designed for at-least-once delivery. Your architecture must assume duplicates are possible and handle them safely.
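Laravel also ships framework-level guards for this. A sketch using the built-in WithoutOverlapping job middleware (the job class and lock key are illustrative; the middleware itself is part of Laravel):

```php
<?php

namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\Middleware\WithoutOverlapping;

// Hypothetical job: prevents two workers processing the same
// customer at once. This reduces duplicates but does NOT replace
// idempotency—a retry can still re-run the job later.
class SyncCustomerBilling implements ShouldQueue
{
    public function __construct(public int $customerId) {}

    public function middleware(): array
    {
        return [
            (new WithoutOverlapping($this->customerId))
                ->releaseAfter(60)   // re-queue if the lock is held
                ->expireAfter(3600), // lock safety net if a worker crashes
        ];
    }

    public function handle(): void
    {
        // billing sync work...
    }
}
```

Treat middleware like this as a concurrency guard layered on top of idempotency, not a substitute for it.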
5) Long-running jobs: imports, exports, heavy rating
Enterprise strategy is simple: no single job should do unbounded work. Use chunking + batches.
- Parse files using streams (don’t load entire CSV into memory).
- Process in chunks (1k–5k rows) based on your workload.
- Commit progress frequently (and log progress events).
- Use “replace vs patch” write modes depending on the pipeline.
CTO signal: Chunking + idempotency is how you make “reprocessing” safe—critical in billing and finance systems.
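The streaming + chunking approach can be sketched like this (ProcessImportChunk and $path are hypothetical names; fgetcsv streaming and chunked dispatch are the pattern being illustrated):

```php
<?php

use App\Jobs\ProcessImportChunk;

// Stream the CSV row by row and dispatch one job per chunk,
// instead of one unbounded job that loads the whole file.
$handle = fopen($path, 'r');
$chunk = [];
$chunkSize = 2000; // tune to your workload (1k–5k rows)

while (($row = fgetcsv($handle)) !== false) {
    $chunk[] = $row;

    if (count($chunk) >= $chunkSize) {
        ProcessImportChunk::dispatch($chunk)->onQueue('imports');
        $chunk = [];
    }
}

if ($chunk !== []) {
    ProcessImportChunk::dispatch($chunk)->onQueue('imports'); // remainder
}

fclose($handle);
```

Each chunk job should carry its own idempotency key (e.g. file hash + chunk offset) so a reprocessing run overwrites rather than duplicates.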
6) Batches & pipelines: structure multi-stage processing
Most enterprise workflows are pipelines, not single jobs. Example:
Import (chunked) →
Normalize + dedupe →
Rate (compute queue) →
Apply bundles →
Update summary →
Export / notify
Use batch IDs to tie all stages together so ops can answer: “What happened for period X?”
- Store job_id / batch_id on records created/updated
- Write processing history per stage (start/end + counts + errors)
- Make each stage rerunnable independently (with idempotency)
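A pipeline like this maps naturally onto Laravel's batch API. As a sketch (the job names and $period variable are illustrative; Bus::batch, then, catch, name, and onQueue are standard batch methods):

```php
<?php

use App\Jobs\NormalizeRows;
use App\Jobs\RateUsage;
use Illuminate\Support\Facades\Bus;

// Tie pipeline stages to one named batch so ops can answer
// "what happened for period X?". Job names are hypothetical.
$batch = Bus::batch([
        new NormalizeRows($period),
    ])
    ->then(fn ($batch) => RateUsage::dispatch($period)->onQueue('compute'))
    ->catch(fn ($batch, $e) => logger()->error(
        "Pipeline failed for {$period}",
        ['batch_id' => $batch->id, 'error' => $e->getMessage()],
    ))
    ->name("billing-pipeline:{$period}")
    ->onQueue('imports')
    ->dispatch();

// Persist $batch->id on the records each stage touches
// so every row can be traced back to its pipeline run.
```

Storing $batch->id on affected records is what makes the "rerunnable stage" property auditable: a rerun gets a new batch ID, and processing history shows both.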
7) Monitoring & alerting (what to track)
Enterprise queue monitoring should trigger alerts before customer impact.
- Backlog depth per queue (critical/imports/compute)
- Job failure rate (spikes = release regression or upstream data shift)
- Runtime p95/p99 for heavy jobs (timeout tuning and capacity planning)
- Deadlocks/lock wait timeouts (DB-level contention)
- Worker restarts/OOM (memory leaks or oversized jobs)
This is exactly what an enterprise maintenance plan should include: monitoring + tuning + incident response. See: Laravel Maintenance.
8) Incident playbook: backlog spikes, deadlocks, failures
Scenario A: backlog spike on imports
- Confirm upstream change (new file format/volume).
- Scale imports supervisor temporarily.
- Reduce chunk size if memory is spiking.
- Enable strict validation + issue logging so pipeline doesn’t stall.
Scenario B: deadlocks/lock wait timeouts
- Identify hot queries and add the right composite indexes.
- Reduce transaction scope (shorter transactions).
- Batch updates (e.g., 200–500 rows) instead of huge updates.
- Implement deadlock retry around critical updates.
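Laravel's transaction helper already supports deadlock retry via its attempts argument. A sketch (table and column names are illustrative):

```php
<?php

use Illuminate\Support\Facades\DB;

// The closure is retried automatically when the database reports
// a deadlock. Keep the transaction body small and fast.
DB::transaction(function () use ($rows) {
    foreach ($rows as $row) {
        DB::table('usage_summaries')
            ->where('id', $row->id)
            ->update(['status' => 'rated']);
    }
}, 3); // retry the whole transaction up to 3 times on deadlock
```

Because the whole closure re-runs on retry, everything inside it must itself be safe to repeat—which is the same idempotency requirement as for the jobs that call it.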
Scenario C: job failures after release
- Pause only the affected queue (not all queues).
- Rollback the release or hotfix the specific job.
- Replay failed jobs safely only if idempotency is confirmed.
9) Copy/paste enterprise checklist
- Separate queues by workload (critical/notifications/imports/compute/exports).
- Align worker timeout and Redis retry_after to real p95/p99 runtimes.
- Make every job safe to retry (idempotency keys + locks + unique constraints).
- Chunk long tasks (stream files, small batches, frequent commits).
- Use batch IDs + processing histories per stage.
- Monitor backlog depth, failure rates, runtime distribution, and worker memory.
- Create an incident playbook for spikes, deadlocks, and regressions.
Next steps
Need enterprise Laravel architecture + scaling?
We design queue-first systems, safe pipelines, and high-throughput job processing (Horizon + Redis + MySQL tuning).
Want monitoring + SLA support?
We keep production stable: queue monitoring, failure triage, performance tuning, patching, and incident response.
Upgrading to Laravel 12 safely? Laravel Upgrade Service. Building AI ingestion pipelines (RAG indexing/agents) on queues? Laravel AI Development.
FAQ
What is the most common cause of duplicate processing in Laravel queues?
Misconfigured retry_after vs worker timeout, combined with jobs that aren’t idempotent. If a job is killed and re-queued, it will run twice unless your design prevents duplicates.
How do enterprises scale Horizon safely?
By separating workloads into dedicated queues and supervisors, measuring job runtimes (p95/p99), and applying idempotency patterns so retries are safe.
Should we use one queue or multiple queues?
Multiple queues. Mixing imports, CPU-heavy compute, and critical billing actions causes unpredictable latency and incidents. Queue separation makes performance predictable and safer under load.