The trap is seductive. The quick-start Docker command spins up a container in seconds, the editor loads on localhost:5678, and it feels like you're eighty percent done. You're five percent done. What the docs hand you is SQLite in an anonymous volume, an encryption key generated silently and stored who-knows-where, zero health checks, and a process that exits when you close the terminal. I've watched teams call this "self-hosted production" and then lose every credential because they rebuilt the container without persisting the .n8n directory.
That experience taught me to treat the first docker run as a liability, not a launch. Production doesn't mean "more." It means "recoverable."
The smallest set of infrastructure decisions that separates a toy from a production system is five items long: Postgres instead of SQLite, a pinned Docker image tag, an explicit encryption key backed up offline, health checks with automatic restarts, and log rotation. Everything else is an optimisation you add later.
If you have these five pieces in place, you can survive a reboot, a container recreation, and a disk that fills up faster than you expected. Skip any one of them and you're building on a deadline you don't control.
I start every self-hosted n8n project with a single docker-compose.yml and a .env file. The Compose file defines the infrastructure; the .env file holds the secrets. I never commit the .env to version control, and I never hardcode passwords into the Compose definition.
Here is the stack I run on day one:
services:
  postgres:
    image: postgres:16-alpine
    container_name: n8n-postgres
    restart: unless-stopped
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  n8n:
    image: n8nio/n8n:${N8N_VERSION}
    container_name: n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: ${POSTGRES_DB}
      DB_POSTGRESDB_USER: ${POSTGRES_USER}
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      N8N_HOST: ${N8N_HOST}
      N8N_PROTOCOL: https
      WEBHOOK_URL: https://${N8N_HOST}/
      N8N_EDITOR_BASE_URL: https://${N8N_HOST}/
      GENERIC_TIMEZONE: ${GENERIC_TIMEZONE}
      EXECUTIONS_DATA_PRUNE: "true"
      EXECUTIONS_DATA_MAX_AGE: 168
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
    volumes:
      - n8n_data:/home/node/.n8n
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:5678/healthz || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "3"
volumes:
  n8n_data:
  postgres_data:
Every choice here is deliberate:
- restart: unless-stopped brings the container back after a host reboot.
- depends_on with condition: service_healthy prevents the classic race condition where n8n starts before Postgres has finished initialising its data directory and then crashes in a boot loop.
- The image tag is pinned through N8N_VERSION rather than latest. Using latest means the next docker compose pull could introduce a breaking change or a schema migration you aren't ready for. Pinning makes rollbacks a one-line change.
- EXECUTIONS_DATA_PRUNE with EXECUTIONS_DATA_MAX_AGE: 168 caps execution history at seven days (168 hours). Without it, your Postgres database grows without bound.
The .env file lives next to the Compose file and looks like this:
N8N_VERSION=1.94.1
N8N_HOST=n8n.example.com
GENERIC_TIMEZONE=America/New_York
POSTGRES_DB=n8n_db
POSTGRES_USER=n8n_user
POSTGRES_PASSWORD=a-long-random-password
N8N_ENCRYPTION_KEY=your-generated-key-from-openssl
Generate the encryption key once with openssl rand -hex 32. Back it up to two places: a shared team
password manager and an offline location. n8n does not support key rotation natively. If
you lose this key, every credential in the database becomes undecryptable. There is no
recovery script. Treat it like a master password.
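The key handling above can be sketched as a few shell helpers. This is a minimal sketch, not an official n8n tool: the function names are hypothetical, and the 64-hex-character check only matches keys produced by the openssl command in this article (n8n itself accepts other key formats).

```shell
#!/bin/sh
# Sketch: generate an n8n encryption key and sanity-check a .env file.
# All function names here are hypothetical helpers, not n8n commands.

# Print a fresh key: 32 random bytes rendered as 64 hex characters.
generate_key() {
  openssl rand -hex 32
}

# Succeed only if the argument is exactly 64 lowercase hex characters.
is_valid_key() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

# Read the value of KEY=value from an env file without sourcing it.
env_value() {
  # $1 = variable name, $2 = path to the env file
  sed -n "s/^$1=//p" "$2" | head -n 1
}

# Report problems with the two values that are hardest to fix later.
check_env_file() {
  # $1 = path to the env file
  rc=0
  if ! is_valid_key "$(env_value N8N_ENCRYPTION_KEY "$1")"; then
    echo "N8N_ENCRYPTION_KEY is missing or not 64 hex characters" >&2
    rc=1
  fi
  case "$(env_value N8N_VERSION "$1")" in
    ''|latest|next)
      echo "N8N_VERSION must be pinned to an explicit version" >&2
      rc=1
      ;;
  esac
  return "$rc"
}
```

Running check_env_file .env before the first docker compose up catches a placeholder key or an unpinned tag while the mistake is still cheap to fix.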
I don't open postgresql.conf on a fresh n8n install. The defaults in Postgres 16 handle the connection counts and write patterns that n8n generates up to a surprisingly high ceiling. The only Postgres tuning that matters on day one is choosing Postgres instead of SQLite.
SQLite is the default, and it works until it doesn't. It's a single file locked by one process, and n8n's concurrency model eventually collides with it. The failure mode is ugly: workflows hang, the editor becomes sluggish, and you start seeing "database is locked" errors that clear only after a restart. There is no tuning knob that fixes SQLite's locking model. Switching to Postgres fixes it instantly because Postgres handles row-level locking and concurrent writes properly.
After the engine choice, the only database maintenance I worry about is backups. A daily pg_dump compressed with gzip and shipped to an S3-compatible bucket is the minimum. I keep thirty days of retention and I test a restore quarterly.
A backup that has never been restored is a hope, not a plan.
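A minimal sketch of the daily dump, assuming the Postgres container is named n8n-postgres as in the Compose file above. The BACKUP_DIR, RETENTION_DAYS, and PG_CONTAINER variables and the function names are assumptions for illustration, and shipping the archive to a bucket is left out.

```shell
#!/bin/sh
# Sketch of a daily backup: dump, compress, prune.
# Variable and function names are hypothetical.

BACKUP_DIR=${BACKUP_DIR:-./backups}
RETENTION_DAYS=${RETENTION_DAYS:-30}
PG_CONTAINER=${PG_CONTAINER:-n8n-postgres}

# Dump the database from the running container and gzip it.
# The dump goes to a temp file first so a failed pg_dump never leaves
# a plausible-looking but empty archive behind.
backup_database() {
  # $1 = database user, $2 = database name
  mkdir -p "$BACKUP_DIR" || return 1
  out="$BACKUP_DIR/n8n-$(date +%Y%m%d-%H%M%S).sql.gz"
  tmp="$out.partial"
  if docker exec "$PG_CONTAINER" pg_dump -U "$1" "$2" > "$tmp"; then
    gzip -c "$tmp" > "$out" && rm -f "$tmp" && echo "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Delete archives older than the retention window.
prune_backups() {
  find "$BACKUP_DIR" -name 'n8n-*.sql.gz' -type f \
    -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}

# Quick integrity check: the archive must decompress cleanly.
verify_backup() {
  gzip -t "$1"
}
```

From cron, the sequence would be backup_database n8n_user n8n_db followed by prune_backups. Note that verify_backup only proves the archive is readable; it is not a substitute for the quarterly restore into a scratch database.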
If you want to be thorough, add a health check to Postgres that uses pg_isready and a start period of at least thirty seconds. On first boot, Postgres spends time initialising its data directory. Without the start period, Docker marks the container unhealthy before it has even finished starting, and your orchestration logic gets confused.
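To see the start period at work, a deploy script can poll the health status Docker records for the container. The sketch below assumes the container defines a healthcheck; wait_for_healthy is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: block until a container's healthcheck reports "healthy".

wait_for_healthy() {
  # $1 = container name, $2 = max attempts, $3 = seconds between attempts
  attempt=1
  while [ "$attempt" -le "$2" ]; do
    # Docker reports "starting", "healthy", or "unhealthy" here.
    state=$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)
    if [ "$state" = "healthy" ]; then
      return 0
    fi
    echo "attempt $attempt: $1 is ${state:-unknown}" >&2
    attempt=$((attempt + 1))
    sleep "$3"
  done
  return 1
}
```

For example, wait_for_healthy n8n-postgres 12 5 gives Postgres up to a minute before the script gives up with a non-zero exit code.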
Queue mode splits n8n into a main instance that queues work and separate worker instances that execute it, backed by Redis. It is the right architecture for high-volume workloads. It is also the wrong architecture for low-volume workloads because it adds operational surface area — another network dependency, another container, another thing to debug at two in the morning.
Add Redis and switch EXECUTIONS_MODE to queue when either: (a) the instance handles more
than ~1,000 executions/day, or (b) it has long-running steps over five minutes that block
the main process. Below that threshold, regular mode is simpler, more reliable, and easier
to reason about.
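The rule of thumb above is simple enough to write down as a check. This sketch hard-codes the two thresholds from the text; recommend_mode is a hypothetical helper, and the inputs are numbers you would read off your own execution history.

```shell
#!/bin/sh
# Sketch: apply the queue-mode rule of thumb to a workload.

QUEUE_EXECS_PER_DAY=1000   # more than ~1,000 executions per day
QUEUE_STEP_SECONDS=300     # steps running longer than five minutes

# Print "queue" or "regular" for a given workload.
recommend_mode() {
  # $1 = executions per day, $2 = longest step duration in seconds
  if [ "$1" -gt "$QUEUE_EXECS_PER_DAY" ] || [ "$2" -gt "$QUEUE_STEP_SECONDS" ]; then
    echo queue
  else
    echo regular
  fi
}
```

With these thresholds, recommend_mode 50 20 prints regular, while recommend_mode 200 600 prints queue because of the ten-minute step.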
When you cross the threshold, the change to the Compose file is small:
services:
  redis:
    image: redis:7-alpine
    container_name: n8n-redis
    restart: unless-stopped
    volumes:
      - redis_data:/data
  n8n:
    image: n8nio/n8n:${N8N_VERSION}
    environment:
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      QUEUE_BULL_REDIS_PORT: 6379
    # ... other env vars and depends_on for postgres and redis ...
volumes:
  redis_data:  # declare alongside n8n_data and postgres_data
The n8n main instance now pushes jobs to Redis, and workers pick them up. If a worker crashes mid-execution, the job stays in the queue and another worker picks it up. That resilience is worth the complexity only if you're actually hitting the limits of regular mode. I've seen teams enable queue mode on workflows that run fifty times a day and then spend weeks debugging Redis memory limits that weren't worth solving.
The first rule of secrets is to keep them out of docker-compose.yml. That file gets copied into Git repositories, shared in Slack threads, and pasted into support tickets. A password in a Compose file is a password that has leaked.
My default pattern is a .env file with strict permissions:
chmod 600 .env
echo ".env" >> .gitignore
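A small check can confirm both commands actually took effect, which is useful in a deploy script or pre-commit hook. This is a sketch with hypothetical function names; it tries GNU stat first and falls back to the BSD form.

```shell
#!/bin/sh
# Sketch: verify that .env is owner-only and ignored by Git.

# Print the octal permission bits of a file (GNU stat, then BSD stat).
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Succeed only if the env file is mode 600 and listed in .gitignore.
check_env_hygiene() {
  # $1 = directory containing .env and .gitignore
  rc=0
  if [ "$(file_mode "$1/.env")" != "600" ]; then
    echo ".env is not mode 600" >&2
    rc=1
  fi
  if ! grep -qxF '.env' "$1/.gitignore" 2>/dev/null; then
    echo ".env is not listed in .gitignore" >&2
    rc=1
  fi
  return "$rc"
}
```

Calling check_env_hygiene . from the project directory exits non-zero and names the problem if either safeguard is missing.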
For teams already running Docker Swarm or a full secrets manager like HashiCorp Vault, Docker Secrets are the better choice. You store secrets in files under a secrets/ directory, mount them into containers, and reference them with the _FILE suffix:
secrets:
  db_password:
    file: ./secrets/db_password.txt
services:
  n8n:
    secrets:
      - db_password
    environment:
      DB_POSTGRESDB_PASSWORD_FILE: /run/secrets/db_password
But if you're on a single Docker host, don't over-engineer it. A .env file with proper permissions and a backup strategy is more than adequate. What kills teams is not the choice between .env and Docker Secrets; it's the failure to back up the N8N_ENCRYPTION_KEY.
In versions prior to 2.0, Code nodes have full access to process.env, which means any
workflow you import from the community can read your database password, your encryption
key, and your API tokens. Set N8N_BLOCK_ENV_ACCESS_IN_NODE=true. If a Code node
legitimately needs a secret, pass it through a credential or an upstream node, not through
the environment.
What you should absolutely not do is create duplicate credentials for every workflow that touches the same API. One Slack credential shared across thirty workflows means one point of rotation. Thirty duplicate credentials means thirty places to forget when the token changes.
You don't need a metrics dashboard with fifty graphs on day one. You need three things: proof the application is alive, proof the disk isn't full, and proof errors are reaching you.
- An external monitor hits /healthz every thirty seconds. I use Uptime Kuma for this because it is self-hosted, lightweight, and can alert via Slack or email within a minute of failure. The /healthz endpoint returns HTTP 200 when the n8n process is running. It does not check database connectivity, so I also set up a second monitor that triggers a test webhook against a lightweight health-check workflow every five minutes.
- The json-file logging options in the Compose file cap log size so a runaway execution trace cannot fill the disk.
- An error workflow notifies me when an execution fails.
If you find out about failures from your users, your monitoring has failed.
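If you want a fallback before a full monitoring tool is in place, the first two checks fit in a few lines of shell run from cron. This is a rough sketch, not a replacement for Uptime Kuma: N8N_URL, DISK_LIMIT_PERCENT, and the function names are assumptions, and the df parsing assumes a filesystem name without spaces.

```shell
#!/bin/sh
# Sketch: liveness and disk checks suitable for a cron job.

N8N_URL=${N8N_URL:-http://localhost:5678}
DISK_LIMIT_PERCENT=${DISK_LIMIT_PERCENT:-85}

# Succeed if /healthz answers with HTTP 200 within ten seconds.
n8n_is_alive() {
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$N8N_URL/healthz")
  [ "$code" = "200" ]
}

# Print the used-space percentage (digits only) for the filesystem
# that holds the given path.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if disk usage is below the configured limit.
disk_has_room() {
  [ "$(disk_used_percent "$1")" -lt "$DISK_LIMIT_PERCENT" ]
}
```

A cron entry could run n8n_is_alive and disk_has_room /var/lib/docker and send an alert when either returns non-zero. Like /healthz itself, this proves only that the process answers, not that the database is reachable.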
If the editor is exposed to the internet at all, I add fail2ban on the Nginx logs to block brute-force login attempts. n8n's login page has no built-in rate limiting. An attacker can guess passwords indefinitely unless you stop them at the edge.
Not everything that sounds like "production" is necessary on day one. I deliberately skip:
Premature complexity is how you end up debugging Redis connection pools instead of shipping workflows.
If you're running n8n on the quick-start command today, here's your Monday morning checklist.
1. Run openssl rand -hex 32 and write the output to two places: a team password manager and an offline location.
2. Create a docker-compose.yml with Postgres 16-alpine, a pinned n8n version, health checks on both services, and log rotation. Move your secrets into a .env file with chmod 600 and add it to .gitignore.
3. Set EXECUTIONS_DATA_PRUNE=true with seven-day retention so your disk doesn't drown in execution history.
4. Put a reverse proxy with TLS in front of the instance; Caddy is the fastest path. Point WEBHOOK_URL to the public address.
5. Set up one external monitor against /healthz plus one error workflow that screams at you when something breaks.
Do this, and you have a system that survives reboots, container recreations, and the first time your webhook goes viral. Everything else is just tuning.