Here is the trap I see most often: teams scale vertically instead of architecturally. A workflow that handled ten sign-ups a day starts handling a thousand, and the team's first move is a bigger VPS. More RAM, more cores, maybe an optimistic prayer.
The wall is not hardware; it is geometry. I have watched teams burn a 32-core box to the ground and still drop webhooks because one long-running PDF extraction blocked the main thread for ninety seconds.
The fix is not a bigger box. It is a small set of configuration changes and a shift in how you think about execution flow. I use two mental models to guide every tuning engagement: the Queue-Mode Threshold, the point where main mode becomes a liability; and the Long-Runner Trap, the hidden memory cost of workflows that stay alive too long.
Switch to queue mode once an instance crosses roughly 1,000 production executions per day, or when any single workflow regularly sees more than five concurrent runs. Below that, main mode is simpler. Above it, staying in main mode is a deliberate decision to accept dropped events and UI lockups.
In main mode, the process that serves the workflow editor also executes your code. When a burst of webhooks arrives (say, five hundred events in ten seconds), n8n tries to execute them all. Without a concurrency limit, memory spikes and the container OOMs. With a concurrency limit, the excess executions queue up in memory instead, and a long enough burst OOMs the container anyway. The limit is a band-aid on a design mismatch.
Queue mode separates concerns. The main process handles the UI and webhook ingestion. Redis holds the job queue. Dedicated worker processes pick up executions and run them to completion. A blocked worker does not block your webhook endpoint, and you can scale workers horizontally instead of buying a bigger monolith.
The configuration is straightforward:
```bash
# Main process environment
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=redis
QUEUE_BULL_REDIS_PORT=6379
QUEUE_BULL_REDIS_PASSWORD=your-redis-password
QUEUE_HEALTH_CHECK_ACTIVE=true
```
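```yaml
# docker-compose.yml excerpt
# Every n8n service also needs the same N8N_ENCRYPTION_KEY and DB_* settings
# so main and workers share credentials and execution data (omitted here).
services:
  redis:
    image: redis:7-alpine
    command: redis-server --requirepass your-redis-password --maxmemory 512mb

  n8n-main:
    image: n8nio/n8n:latest
    command: start # the image entrypoint already prefixes "n8n"
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_PASSWORD=your-redis-password
    ports:
      - "5678:5678"

  n8n-worker-1:
    image: n8nio/n8n:latest
    command: worker --concurrency=10 # up to ten executions at once on this worker
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_PASSWORD=your-redis-password

  n8n-worker-2:
    image: n8nio/n8n:latest
    command: worker --concurrency=10
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_PASSWORD=your-redis-password
```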
Each worker can run multiple executions concurrently based on its limit. I usually start with two workers at concurrency ten each and add workers when Redis queue depth stays above fifty for more than a few minutes.
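The scale-out rule above can be written as a small decision function. This is a sketch under stated assumptions: you already sample the Redis queue depth on some schedule (how you collect it is out of scope), `shouldAddWorker` and the `{ atMs, depth }` sample shape are hypothetical, and three minutes stands in for "a few minutes".

```javascript
// Hypothetical scale-out rule: add a worker when queue depth has stayed above
// `threshold` for at least `sustainMs`. samples: [{ atMs, depth }] in any order.
function shouldAddWorker(samples, { threshold = 50, sustainMs = 3 * 60 * 1000, nowMs = Date.now() } = {}) {
  const sorted = samples
    .filter((s) => s.atMs <= nowMs)
    .sort((a, b) => a.atMs - b.atMs);

  let streakStartMs = null; // when the current above-threshold run began
  for (const s of sorted) {
    if (s.depth > threshold) {
      if (streakStartMs === null) streakStartMs = s.atMs;
    } else {
      streakStartMs = null; // any dip to or below the threshold resets the streak
    }
  }
  return streakStartMs !== null && nowMs - streakStartMs >= sustainMs;
}

// Depth crosses 50 at the one-minute mark and stays there.
const readings = [
  { atMs: 0, depth: 40 },
  { atMs: 60000, depth: 80 },
  { atMs: 120000, depth: 95 },
  { atMs: 240000, depth: 120 },
];
console.log(shouldAddWorker(readings, { nowMs: 200000 })); // false (only 140 s above 50)
console.log(shouldAddWorker(readings, { nowMs: 240000 })); // true (180 s above 50)
```

Requiring an unbroken streak, rather than an average, keeps one short spike from triggering a scale-out.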
If you are not ready for queue mode, at least set a production concurrency limit to prevent burst-induced crashes:
```bash
N8N_CONCURRENCY_PRODUCTION_LIMIT=20
```
But treat that as a temporary guardrail, not a strategy. The threshold arrives faster than most teams predict.
The Split In Batches node is the standard tool for processing large datasets, but most teams set a batch size once — usually fifty — and forget it.
The right batch size is a function of the data, not the tutorial you copied. Aim for no more than 5 MB of active JSON in flight per batch: if one item is 100 KB, keep the batch at fifty or below; if one item is 2 MB, drop it to two.
If you are importing contacts into HubSpot and the API allows one hundred calls per ten seconds, a batch size of fifty feels safe. But if each contact record carries a nested company profile, custom fields, and a base64 avatar, fifty items might be twenty megabytes of JSON sitting in memory. Suddenly your "safe" batch is the reason the worker OOMs.
You can check this by inspecting the output panel of the node that feeds your Split In Batches node: note the size of a typical item, then divide the 5 MB budget by it. For multi-megabyte items, that means a batch of two, or even one.
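The sizing rule reduces to dividing the 5 MB budget by the size of the largest item you see, then capping the result at what the API allows per rate-limit window. A minimal sketch, assuming decimal megabytes and that serialized JSON length is a fair proxy for memory (the real heap cost is higher). `pickBatchSize` and its options are hypothetical; you would pass it the `json` of a few representative items.

```javascript
// Hypothetical helper: pick a Split In Batches size from a sample of input items.
// Assumes serialized JSON size approximates per-item cost; the real heap footprint
// is larger, so treat the result as an upper bound.
function pickBatchSize(sampleItems, { budgetBytes = 5000000, maxPerWindow = Infinity } = {}) {
  // Size by the largest sampled item so one oversized record cannot blow the budget.
  const largestBytes = sampleItems.reduce(
    (max, item) => Math.max(max, Buffer.byteLength(JSON.stringify(item), "utf8")),
    0
  );
  if (largestBytes === 0) return 1;
  const byMemory = Math.floor(budgetBytes / largestBytes);
  // At least one item per batch, and never more than the API allows per window.
  return Math.max(1, Math.min(byMemory, maxPerWindow));
}

// Items padded so each serializes to exactly 100,000 or 2,000,000 bytes.
const item100kb = { payload: "x".repeat(100000 - 14) };
const item2mb = { payload: "x".repeat(2000000 - 14) };
console.log(pickBatchSize([item100kb]));                       // 50
console.log(pickBatchSize([item2mb]));                         // 2
console.log(pickBatchSize([item100kb], { maxPerWindow: 20 })); // 20
```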
Here is the loop structure I use:
```text
[Read CSV / API / DB]
  -> [Split In Batches (size: N)]
    -> [Processing Node]
      -> [Loop back to Split In Batches]
```
Configuration:
| Setting | Value | Reason |
|---|---|---|
| Batch Size | N | Sized by memory footprint and API limit |
| Reset | false | Continues through the remaining items on each loop pass; it does not persist progress across a failed execution |
For APIs with strict rate limits, I add a throttling Code node inside the loop:
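```javascript
// Code node: "Throttle Batches"
// $runIndex is how many times this node has already run in the current execution
// (zero-based); inside the loop it equals the index of the current batch.
const batchIndex = $runIndex;

// Wait one second every five batches
if (batchIndex > 0 && batchIndex % 5 === 0) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
}

return $input.all();
```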
I also track progress in static workflow data so I can resume a failed import without starting over:
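```javascript
// Code node: "Track Progress" (runs after the processing node)
// $getWorkflowStaticData('global') returns an object n8n saves with the workflow.
// n8n only persists static data for production runs of an active workflow, not manual test runs.
const staticData = $getWorkflowStaticData('global');
staticData.lastBatch = $runIndex; // zero-based index of the batch that just finished
return $input.all();
```
If the workflow fails on batch forty-seven, I read lastBatch from the workflow's static data and adjust my source query to skip already-processed records.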
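Turning the stored index into a resume point is one multiplication. A sketch, assuming the progress node runs after the processing node (so the stored value is the last batch that finished), a fixed batch size, and a source that returns records in a stable order. `resumeOffset` is a hypothetical helper.

```javascript
// Hypothetical helper: how many source records to skip when re-running a failed import.
// lastCompletedBatch is the zero-based index stored by the progress node
// (undefined or null if no batch finished). Assumes a fixed batch size and a
// source that returns records in the same order on every run.
function resumeOffset(lastCompletedBatch, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  if (lastCompletedBatch === undefined || lastCompletedBatch === null) return 0;
  return (lastCompletedBatch + 1) * batchSize;
}

console.log(resumeOffset(45, 50));        // 2300: batches 0 to 45 are done
console.log(resumeOffset(undefined, 50)); // 0: start from the top
```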
If the API you are calling has built-in pagination (a next URL or an offset parameter), do not use Split In Batches at all. The HTTP Request node's pagination feature handles this with less inter-node overhead and a configured completion condition in place of a hand-built loop.
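For contrast, a hand-built cursor loop has to get two things right: stop when the API reports no next page, and cap the page count in case it never does. This is a sketch only. `fetchPage` is a hypothetical callback returning `{ items, next }`, and `maxPages` is a hypothetical cap. In n8n you would configure pagination on the HTTP Request node instead of writing this.

```javascript
// Hypothetical cursor-pagination loop with an explicit stop condition.
// fetchPage(cursor) must resolve to { items: [...], next: cursorOrNull }.
async function fetchAllPages(fetchPage, { maxPages = 1000 } = {}) {
  const all = [];
  let cursor = null;
  for (let page = 0; page < maxPages; page++) {
    const { items, next } = await fetchPage(cursor);
    all.push(...items);
    // Stop when the API reports no further page.
    if (!next) return all;
    cursor = next;
  }
  // The page cap guards against a cursor that never ends.
  throw new Error(`Stopped after ${maxPages} pages; check the pagination stop condition`);
}

// Usage with a fake two-page API.
const fakePages = { start: { items: [1, 2], next: "p2" }, p2: { items: [3], next: null } };
fetchAllPages(async (cursor) => fakePages[cursor ?? "start"]).then((all) => console.log(all)); // [ 1, 2, 3 ]
```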
Any workflow expected to run longer than five minutes is a memory leak waiting for permission. n8n keeps the full execution state — every item, every binary buffer, every intermediate node output — in memory for the duration of the run.
The longer the execution, the more garbage accumulates. If you are also holding binary data in memory or in the database, the crash is guaranteed; only the timing is uncertain.
The default configuration stores binary data in the database and in memory. For a workflow processing vendor PDFs or image exports, this is catastrophic. Switch to filesystem mode immediately:
```bash
N8N_DEFAULT_BINARY_DATA_MODE=filesystem
N8N_BINARY_DATA_STORAGE_PATH=/data/n8n-binary
# Maximum request payload size, in MiB
N8N_PAYLOAD_SIZE_MAX=256
# Cap the V8 old-generation heap at 4096 MB
NODE_OPTIONS=--max-old-space-size=4096
```
With filesystem mode, the HTTP Request node writes downloaded files directly to disk. The S3 node reads from disk to upload. The file never fully loads into the Node.js heap. I have seen this single change drop memory usage by an order of magnitude on file-heavy workflows.
Next, control execution data saving. For high-volume workflows — webhook handlers, scheduled syncs running every few minutes — I disable saving data for successful runs entirely:
```bash
EXECUTIONS_DATA_SAVE_ON_SUCCESS=none
EXECUTIONS_DATA_SAVE_ON_ERROR=all
EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=true
```
This prevents the executions table from growing by gigabytes per week. I also set aggressive pruning:
```bash
EXECUTIONS_DATA_PRUNE=true
# Maximum execution age in hours (168 = 7 days)
EXECUTIONS_DATA_MAX_AGE=168
EXECUTIONS_DATA_PRUNE_MAX_COUNT=5000
```
If you use Wait nodes for human-in-the-loop approvals, the execution data must survive until the workflow resumes. Set EXECUTIONS_DATA_MAX_AGE higher than your longest Wait timeout. If you prune after seven days but your approval window is ten days, you will strand executions in limbo.
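EXECUTIONS_DATA_MAX_AGE is measured in hours (168 is seven days), so the check is arithmetic across your Wait timeouts. This sketch assumes you list the timeouts by hand. `minSafeMaxAgeHours`, the workflow names, and the 24-hour margin are hypothetical choices, not n8n settings.

```javascript
// Hypothetical helper: smallest EXECUTIONS_DATA_MAX_AGE (hours) that outlives every
// Wait node timeout. waits: [{ workflow, amount, unit }] with unit "hours" or "days".
// marginHours is an assumed buffer on top of "higher than the longest Wait".
function minSafeMaxAgeHours(waits, marginHours = 24) {
  const toHours = { hours: 1, days: 24 };
  let longest = 0;
  for (const w of waits) {
    const factor = toHours[w.unit];
    if (factor === undefined) throw new Error(`Unknown unit: ${w.unit}`);
    longest = Math.max(longest, w.amount * factor);
  }
  return Math.ceil(longest + marginHours);
}

const waits = [
  { workflow: "invoice-approval", amount: 10, unit: "days" },
  { workflow: "slack-confirm", amount: 4, unit: "hours" },
];
const needed = minSafeMaxAgeHours(waits); // 264 (240 h wait + 24 h margin)
console.log(168 >= needed ? "ok" : `raise EXECUTIONS_DATA_MAX_AGE to at least ${needed}`);
// raise EXECUTIONS_DATA_MAX_AGE to at least 264
```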
For any long-running workflow, I also set a hard timeout in the workflow settings. I default to 120 seconds for webhook handlers and 600 seconds for batch jobs. If a workflow cannot finish in that window, it should be chunked into smaller units, not given more rope.
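Chunking to fit a timeout is also arithmetic: divide the time budget by the per-item cost. This sketch assumes a roughly constant time per item. `planChunks` and the 0.5 headroom factor are hypothetical.

```javascript
// Hypothetical sizing helper: split a job so each chunk finishes inside the workflow timeout.
// headroom is an assumed safety factor: 0.5 plans to use half the timeout,
// leaving slack for slow API responses. If one item alone exceeds the budget,
// itemsPerChunk bottoms out at 1 and chunking by count will not be enough.
function planChunks(totalItems, secondsPerItem, timeoutSeconds, headroom = 0.5) {
  const budgetSeconds = timeoutSeconds * headroom;
  const itemsPerChunk = Math.max(1, Math.floor(budgetSeconds / secondsPerItem));
  return { itemsPerChunk, chunks: Math.ceil(totalItems / itemsPerChunk) };
}

// 10,000 items at about 0.25 s each against the 600-second batch-job timeout:
console.log(planChunks(10000, 0.25, 600)); // { itemsPerChunk: 1200, chunks: 9 }
```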
Parallelism is a dial, not a switch. I increase it only when I know the downstream API can handle the load and my instance has the memory headroom.
The most common bottleneck I see is a loop making hundreds of sequential HTTP requests. The execution detail view will show a single HTTP Request node consuming ninety percent of the total execution time. The fix is rarely "faster API." It is parallel batching.
The HTTP Request node has batching options that control how many requests go out concurrently. If you have 248 items and the API supports it, raising the batch size from 1 to 10 drops the wall time from thirty-eight seconds to roughly four seconds. Common bottlenecks and their fixes:
| Bottleneck | Cause | Fix |
|---|---|---|
| Single node takes 90%+ of time | Sequential API calls | Set HTTP Request batch size, or use bulk endpoints |
| Many nodes each take a few seconds | Large payloads passed between nodes | Strip unused fields before passing downstream |
| Loop iterations are slow | Rate limits or heavy item payloads | Tune batch size, add strategic delays |
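The thirty-eight-to-four-second figure follows from a simple model: batches run one after another, and the requests inside a batch overlap. This back-of-the-envelope sketch assumes uniform latency and an API that does not slow down under parallel load, which a rate limit will eventually break. `estimateWallTimeMs` and the optional pause parameter are hypothetical.

```javascript
// Hypothetical estimate of wall time when requests run in concurrent batches.
// Assumes uniform latency and that the API does not slow down under parallel load.
function estimateWallTimeMs(itemCount, perRequestMs, itemsPerBatch, pauseBetweenBatchesMs = 0) {
  if (itemCount === 0) return 0;
  const rounds = Math.ceil(itemCount / itemsPerBatch); // batches run one after another
  // Requests inside a batch overlap, so each round costs about one request's latency.
  return rounds * perRequestMs + (rounds - 1) * pauseBetweenBatchesMs;
}

const perRequestMs = 38000 / 248; // about 153 ms, derived from the sequential run
console.log(Math.round(estimateWallTimeMs(248, perRequestMs, 1)));  // 38000
console.log(Math.round(estimateWallTimeMs(248, perRequestMs, 10))); // 3831 (25 rounds)
```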
If the API is fragile — prone to 500s or rate limits — I pair parallel execution with a circuit breaker. After three consecutive failures, I stop calling the API and alert the team:
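```javascript
// Code node: "Circuit Breaker Check"
// Input: one item holding the saved breaker state. failure_count is an assumed field
// that your failure-recording step must maintain alongside circuit_state and last_failure_time.
const state = $input.first().json;
const FAILURE_THRESHOLD = 3;        // consecutive failures before the circuit opens
const RECOVERY_TIMEOUT_MS = 300000; // 5-minute cool-down before a trial call
const now = Date.now();
const lastFailure = state.last_failure_time
  ? new Date(state.last_failure_time).getTime()
  : 0;

// Open if it was marked open or the failure count has reached the threshold.
const isOpen =
  state.circuit_state === 'open' || (state.failure_count ?? 0) >= FAILURE_THRESHOLD;

if (isOpen) {
  if (now - lastFailure > RECOVERY_TIMEOUT_MS) {
    // Cool-down elapsed: let one trial request through.
    return [{ json: { action: 'proceed', circuitState: 'half-open' } }];
  }
  // Still cooling down: skip the API call.
  return [{ json: { action: 'skip', circuitState: 'open' } }];
}

return [{ json: { action: 'proceed', circuitState: state.circuit_state || 'closed' } }];
```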
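The check only reads the breaker state; something has to write it after each call. This sketch of that companion step assumes the state is persisted between runs (in workflow static data or a small table) and uses the same `circuit_state` and `last_failure_time` fields plus a hypothetical `failure_count`. `recordResult` is a hypothetical name, and a real Code node would save the returned object rather than only return it.

```javascript
// Hypothetical companion to "Circuit Breaker Check": update breaker state after a call.
// failure_count is an assumed field; circuit_state and last_failure_time match the check.
const FAILURE_THRESHOLD = 3;

function recordResult(state, succeeded, now = new Date()) {
  if (succeeded) {
    // Any success, including a half-open trial call, closes the circuit again.
    return { circuit_state: "closed", failure_count: 0, last_failure_time: state.last_failure_time ?? null };
  }
  const failures = (state.failure_count ?? 0) + 1;
  const wasTripped = state.circuit_state === "open" || state.circuit_state === "half-open";
  return {
    // Open on the third consecutive failure, or again at once if a trial call fails.
    circuit_state: failures >= FAILURE_THRESHOLD || wasTripped ? "open" : "closed",
    failure_count: failures,
    last_failure_time: now.toISOString(), // restarts the cool-down window
  };
}

let state = { circuit_state: "closed", failure_count: 0 };
for (let i = 0; i < 3; i++) state = recordResult(state, false);
console.log(state.circuit_state, state.failure_count); // open 3
```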
For batch operations where some items will inevitably fail, I enable Continue on Fail on the processing node, then split successes and failures downstream. The failures route to a dead letter queue for manual review. This keeps one bad record from killing an entire batch of five hundred.
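The success/failure split can be expressed as a small classification function. This sketch assumes that, with Continue on Fail enabled, failed items arrive with an `error` property on their JSON (check the output of the node you use). `partitionResults` and the dead-letter record fields are hypothetical.

```javascript
// Hypothetical helper: split Continue-on-Fail output into successes and dead-letter records.
// Assumes failed items arrive as { json: { error } }; verify this shape for your node.
function partitionResults(items, { source = "unknown", failedAt = new Date() } = {}) {
  const succeeded = [];
  const deadLetters = [];
  items.forEach((item, index) => {
    const error = item.json ? item.json.error : undefined;
    if (error === undefined || error === null) {
      succeeded.push(item);
      return;
    }
    deadLetters.push({
      json: {
        source,
        index, // position in the batch, to trace the record back to its input
        error: typeof error === "string" ? error : (error.message ?? JSON.stringify(error)),
        failedAt: failedAt.toISOString(),
      },
    });
  });
  return { succeeded, deadLetters };
}

const batch = [
  { json: { id: 1, status: "created" } },
  { json: { error: "429 Too Many Requests" } },
  { json: { error: { message: "Contact already exists" } } },
];
const { succeeded, deadLetters } = partitionResults(batch, { source: "hubspot-import" });
console.log(succeeded.length, deadLetters.length); // 1 2
```

A Code node has a single output, so inside n8n you would tag each item and branch with an IF node; the function only shows the classification.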
Finally, guard the instance itself with concurrency limits:
```bash
N8N_CONCURRENCY_PRODUCTION_LIMIT=20
QUEUE_WORKER_CONCURRENCY=10
```
When the limit is hit, new executions wait in Redis instead of all starting at once. In main mode, they wait in the n8n process's own memory, which still risks an OOM under extreme load.
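A toy model shows why the in-memory case still risks an OOM. This is not n8n's implementation, just a minimal limiter with hypothetical names. At most `limit` tasks run at once, and everything else waits in an array on the heap. That array is the backlog that grows during a burst when Redis is not holding it.

```javascript
// Toy concurrency limiter (illustration only, not n8n internals).
// Up to `limit` tasks run at once; the rest wait in an in-memory array.
class ConcurrencyLimiter {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.waiting = []; // this backlog lives on the heap until a slot frees up
  }

  run(task) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.active < this.limit && this.waiting.length > 0) {
      const { task, resolve, reject } = this.waiting.shift();
      this.active++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.active--;
          this.drain(); // pull the next waiting task into the freed slot
        });
    }
  }
}

// A burst of five "executions" against a limit of two.
const limiter = new ConcurrencyLimiter(2);
let running = 0;
let peak = 0;
const execution = () =>
  new Promise((resolve) => {
    running++;
    peak = Math.max(peak, running);
    setTimeout(() => {
      running--;
      resolve();
    }, 10);
  });
const burst = Array.from({ length: 5 }, () => limiter.run(execution));
console.log(limiter.active, limiter.waiting.length); // 2 3
```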
The webhook node's response mode setting is the most underrated throughput control in n8n. Choose wrong, and you create a self-inflicted denial-of-service loop.
Set the response mode to Immediately on every production webhook that triggers a workflow lasting more than one second. The caller gets its 200 acknowledgment instantly, and n8n continues processing in the background.
I use When Last Node Finishes only for internal API endpoints where the caller actually needs the computed result.
The failure mode is subtle but devastating. Shopify requires a 200 response within five seconds. If your workflow takes thirty seconds to process an order and you are using "When Last Node Finishes," Shopify times out and retries. Now you have two executions. Then three. The retries stack until your instance is spending all its resources processing duplicate events for orders that already succeeded. It looks like a performance problem. It is actually a configuration problem.
| Scenario | Response Mode | HTTP Status | Reason |
|---|---|---|---|
| Shopify order webhook | Immediately | 200 | Caller needs fast ack only |
| Internal frontend API | When Last Node Finishes | 200 | Caller needs final result |
| GitHub CI trigger | Immediately | 202 | Accepted; results posted back later via status API |
If you need to return data early and continue, set the Webhook node's Respond option to Using 'Respond to Webhook' Node, then place the Respond to Webhook node on the true branch of an IF node, or directly after the trigger, and continue the rest of the flow. The webhook caller gets its response, and the execution proceeds uninterrupted.
Before you tune batches, concurrency, or memory limits, check your database. n8n defaults to SQLite, which is fine for development and light production. It falls apart under write concurrency.
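I migrate to PostgreSQL once an instance crosses roughly ten thousand executions per day, or when I need more than five concurrent workflows running reliably. SQLite's single-writer lock causes SQLITE_BUSY errors that manifest as random timeouts and, in some cases, data corruption. If you are moving to queue mode, migrate first regardless of volume: the main process and every worker need to reach the same database, which a local SQLite file cannot reliably provide.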
```bash
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=n8n_user
DB_POSTGRESDB_PASSWORD=your-secure-password
DB_POSTGRESDB_POOL_SIZE=20
```
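There is no one-step migration from SQLite to PostgreSQL. You export your workflows and credentials with the n8n CLI (export:workflow and export:credentials), stand up the new database, point n8n at it, and import them with the matching import commands. The credentials only decrypt on the new instance if it uses the same N8N_ENCRYPTION_KEY; otherwise you re-enter them. Plan it during a maintenance window, but do not skip it.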
No amount of queue tuning fixes a locked database file.
Performance tuning is not a one-time project. It is a maintenance habit.
Over 1,000 executions per day → plan a queue-mode migration. Over 10,000 → stop planning and start migrating.
Open every production webhook node. If it says "When Last Node Finishes" and the workflow takes longer than three seconds, switch it to "Immediately" or add a Respond to Webhook node.
Open the last twenty executions. If any single node consumes more than 80% of the total time, decide whether it needs parallel batching, a bulk API call, or a move to a sub-workflow.
Calculate the memory footprint of one item. If a batch exceeds roughly 5 MB of active JSON, drop the batch size.
Set N8N_DEFAULT_BINARY_DATA_MODE=filesystem and define a storage path with adequate disk space.
Set EXECUTIONS_DATA_SAVE_ON_SUCCESS=none globally, then override it to "Save" in the workflow settings only for workflows you are actively debugging.
If DB_TYPE is still sqlite, schedule the PostgreSQL migration.
Pick a production concurrency limit (N8N_CONCURRENCY_PRODUCTION_LIMIT, or per-worker concurrency in queue mode) appropriate for your RAM and stick to it.
Temporarily break a credential, trigger a failure, and confirm your error workflow and alerting still work. Untested error handling is broken error handling.
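The 80% check from the audit can be scripted against exported execution data. This sketch assumes n8n's run-data shape (`data.resultData.runData`, keyed by node name, with an `executionTime` in milliseconds on each run), as returned when you fetch an execution with its data included. Verify the shape against your n8n version. `findDominantNodes` is hypothetical, and it measures each node's share of summed node time rather than wall-clock time.

```javascript
// Hypothetical audit: flag nodes that dominate an execution's total node time.
// Assumes execution.data.resultData.runData = { [nodeName]: [{ executionTime }, ...] },
// with executionTime in milliseconds; loops produce several runs per node.
function findDominantNodes(execution, shareThreshold = 0.8) {
  const runData = execution?.data?.resultData?.runData ?? {};
  const perNode = Object.entries(runData).map(([node, runs]) => [
    node,
    runs.reduce((sum, run) => sum + (run.executionTime ?? 0), 0),
  ]);
  const total = perNode.reduce((sum, [, ms]) => sum + ms, 0);
  if (total === 0) return [];
  return perNode
    .map(([node, ms]) => ({ node, ms, share: ms / total }))
    .filter((entry) => entry.share > shareThreshold)
    .sort((a, b) => b.share - a.share);
}

const sample = {
  data: {
    resultData: {
      runData: {
        Webhook: [{ executionTime: 5 }],
        "HTTP Request": [{ executionTime: 900 }, { executionTime: 950 }],
        Set: [{ executionTime: 45 }],
      },
    },
  },
};
console.log(findDominantNodes(sample).map((e) => e.node)); // [ 'HTTP Request' ]
```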