How I'd Set Up n8n if I Were Doing It Today | The Workflow Engineer

The trap is seductive. The quick-start Docker command spins up a container in seconds, the editor loads on localhost:5678, and it feels like you're eighty percent done. You're five percent done. What the quick start hands you is SQLite on a single Docker volume, an encryption key generated silently and written to a config file inside that volume, zero health checks, and a process that exits when you close the terminal. I've watched teams call this "self-hosted production" and then lose every credential because they rebuilt the container without persisting the .n8n directory.

That experience taught me to treat the first docker run as a liability, not a launch. Production doesn't mean "more." It means "recoverable."

Framework · The Day-One Baseline

The smallest set of infrastructure decisions that separates a toy from a production system: Postgres instead of SQLite, a pinned Docker image tag, an explicit encryption key backed up offline, health checks paired with a restart policy, and log rotation. Everything else is an optimisation you add later.

If you have these five pieces in place, you can survive a reboot, a container recreation, and a disk that fills up faster than you expected. Skip any one of them and you're building on a deadline you don't control.

The Compose File I Actually Deploy

I start every self-hosted n8n project with a single docker-compose.yml and a .env file. The Compose file defines the infrastructure; the .env file holds the secrets. I never commit the .env to version control, and I never hardcode passwords into the Compose definition.

Here is the stack I run on day one:

services:
  postgres:
    image: postgres:16-alpine
    container_name: n8n-postgres
    restart: unless-stopped
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  n8n:
    image: n8nio/n8n:${N8N_VERSION}
    container_name: n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: ${POSTGRES_DB}
      DB_POSTGRESDB_USER: ${POSTGRES_USER}
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      N8N_HOST: ${N8N_HOST}
      N8N_PROTOCOL: https
      WEBHOOK_URL: https://${N8N_HOST}/
      N8N_EDITOR_BASE_URL: https://${N8N_HOST}/
      GENERIC_TIMEZONE: ${GENERIC_TIMEZONE}
      EXECUTIONS_DATA_PRUNE: "true"
      EXECUTIONS_DATA_MAX_AGE: 168
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
    volumes:
      - n8n_data:/home/node/.n8n
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:5678/healthz || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "3"

volumes:
  n8n_data:
  postgres_data:

Every choice here is deliberate:

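restart: unless-stopped brings the container back after a crash or a host reboot. It reacts to the process exiting, not to a failed health check: plain Docker only marks a container unhealthy, which is one more reason for the external monitor described below.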
depends_on with condition: service_healthy prevents the classic race condition where n8n starts before Postgres has finished initialising its data directory and then crashes in a boot loop.
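Pinned image tag instead of latest. Using latest means the next docker compose pull could introduce a breaking change or a schema migration you aren't ready for. Pinning makes upgrades a deliberate one-line change in .env; rolling back is the same one-line change, plus a database restore if the newer version has already migrated the schema.
EXECUTIONS_DATA_PRUNE and EXECUTIONS_DATA_MAX_AGE: 168 cap execution history at seven days (the value is in hours). Recent releases prune by default with a longer window, but setting both explicitly documents the policy and keeps Postgres growth bounded whatever defaults ship with the version you pin.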
Log rotation at the Docker level (50 MB × 3 files) so a single runaway execution can't fill the disk.
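The two caps in that list translate into concrete worst-case figures. Here is a small sketch of the arithmetic, assuming Docker's json-file driver reads k/m/g as binary (1024-based) units and that max-file counts the active log file plus the rotated ones; the helper names are hypothetical.

```python
# Hypothetical helpers that turn the Compose settings into worst-case numbers.
# Assumption: Docker's json-file driver reads k/m/g as binary (1024-based) units.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(size: str) -> int:
    """Convert a Docker-style size such as '50m' into bytes."""
    size = size.strip().lower()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)  # a bare number is already in bytes


def max_log_bytes(max_size: str, max_file: str) -> int:
    """Upper bound on json-file log storage for one container.

    max-file counts the active file plus the rotated ones.
    """
    return parse_size(max_size) * int(max_file)


def retention_days(max_age_hours: int) -> float:
    """EXECUTIONS_DATA_MAX_AGE is expressed in hours."""
    return max_age_hours / 24


if __name__ == "__main__":
    print(f"log cap per container: {max_log_bytes('50m', '3') / 1024**2:.0f} MiB")
    print(f"execution retention: {retention_days(168):.0f} days")
```

With the values from the Compose file, that works out to roughly 150 MiB of logs per container and seven days of execution history.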

The .env file lives next to the Compose file and looks like this:

N8N_VERSION=1.94.1
N8N_HOST=n8n.example.com
GENERIC_TIMEZONE=America/New_York
POSTGRES_DB=n8n_db
POSTGRES_USER=n8n_user
POSTGRES_PASSWORD=a-long-random-password
N8N_ENCRYPTION_KEY=your-generated-key-from-openssl
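A typo in either file fails quietly: Compose substitutes an empty string for an unset variable and only prints a warning. Below is a minimal pre-flight sketch, assuming simple KEY=value lines without quotes or multi-line values, that checks every ${VAR} the Compose file references is set in .env and that the image tag isn't latest. It ignores variables exported in your shell, which Compose would also pick up, and the function names are hypothetical.

```python
import re

# ${VAR} or ${VAR:-default}, but not the escaped $${VAR} form Compose leaves alone.
VAR_REF = re.compile(r"(?<!\$)\$\{([A-Za-z_][A-Za-z0-9_]*)")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(compose_text: str, env_text: str) -> list[str]:
    """Return a list of problems; an empty list means the pair looks consistent."""
    env = parse_env(env_text)
    problems = []
    for name in sorted(set(VAR_REF.findall(compose_text))):
        if not env.get(name):
            problems.append(f"{name} is referenced in the Compose file but missing or empty in .env")
    if env.get("N8N_VERSION", "").lower() == "latest":
        problems.append("N8N_VERSION is 'latest'; pin an explicit release")
    return problems
```

Run it against the contents of docker-compose.yml and .env before docker compose up. docker compose config is a useful second opinion, because it prints the fully interpolated file.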

The encryption key is the unrecoverable secret

Generate it once with openssl rand -hex 32. Back it up to two places: a shared team password manager and an offline location. n8n does not support key rotation natively. If you lose this key, every credential in the database becomes undecryptable. There is no recovery script. Treat it like a master password.
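A backup you can't verify is only marginally better than none. The sketch below produces a key in the same shape as openssl rand -hex 32 (64 lowercase hex characters) and computes a short SHA-256 fingerprint, so you can confirm that the password-manager copy, the offline copy, and the .env copy match without pasting the key anywhere. The function names are hypothetical, and the shape check only applies to keys generated this way; n8n itself doesn't require hex.

```python
import hashlib
import re
import secrets

# The shape `openssl rand -hex 32` produces: 32 random bytes as 64 lowercase hex characters.
HEX_KEY = re.compile(r"[0-9a-f]{64}")


def generate_key() -> str:
    """Python equivalent of `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def looks_like_generated_key(key: str) -> bool:
    """True if the value has the 64-hex-character shape produced above."""
    return HEX_KEY.fullmatch(key.strip()) is not None


def fingerprint(key: str) -> str:
    """Short SHA-256 fingerprint, so two copies can be compared without revealing the key."""
    return hashlib.sha256(key.strip().encode()).hexdigest()[:12]
```

Store the fingerprint next to each copy. If a restore ever fails to decrypt credentials, comparing fingerprints tells you within seconds whether you're holding the wrong key.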

Postgres: The Only Tuning That Matters

I don't open postgresql.conf on a fresh n8n install. The defaults in Postgres 16 handle the connection counts and write patterns that n8n generates up to a surprisingly high ceiling. The only Postgres tuning that matters on day one is choosing Postgres instead of SQLite.

SQLite is the default, and it works until it doesn't. It's a single file locked by one process, and n8n's concurrency model eventually collides with it. The failure mode is ugly: workflows hang, the editor becomes sluggish, and you start seeing "database is locked" errors that clear only after a restart. There is no tuning knob that fixes SQLite's locking model. Switching to Postgres fixes it instantly because Postgres handles row-level locking and concurrent writes properly.
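You can reproduce the failure mode with Python's built-in sqlite3 module. This toy is not n8n's code: one connection holds an open write transaction while a second connection tries to write, and the busy timeout is set to zero so the error surfaces immediately instead of after a wait. SQLite allows one writer per database file at a time. Postgres would let both inserts proceed concurrently.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock() -> str:
    """Hold a write transaction on one connection, then try to write from a second."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        # isolation_level=None puts both connections in autocommit mode,
        # so a transaction exists only when we issue BEGIN explicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        # timeout=0 makes the second connection fail at once instead of waiting.
        other = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer.execute("CREATE TABLE executions (id INTEGER PRIMARY KEY, status TEXT)")
            writer.execute("BEGIN IMMEDIATE")  # take SQLite's single write lock
            writer.execute("INSERT INTO executions (status) VALUES ('running')")
            try:
                other.execute("INSERT INTO executions (status) VALUES ('queued')")
                return "second write succeeded"
            except sqlite3.OperationalError as exc:
                return str(exc)
        finally:
            # Closing discards the writer's uncommitted transaction.
            writer.close()
            other.close()


if __name__ == "__main__":
    print(demonstrate_lock())
```

With a non-zero timeout the second write waits instead of failing, which is exactly the "workflows hang" symptom: the lock doesn't go away, it just takes longer to surface.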

After the engine choice, the only database maintenance I worry about is backups. A daily pg_dump compressed with gzip and shipped to an S3-compatible bucket is the minimum. I keep thirty days of retention and I test a restore quarterly.

A backup that has never been restored is a hope, not a plan.
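Retention is the part of the backup job that's easiest to get wrong. The sketch below covers only the pruning decision. It assumes a hypothetical naming scheme in which each daily pg_dump | gzip upload is called n8n-YYYY-MM-DD.sql.gz, and it leaves listing and deleting objects to whatever S3 client you already use. Anything that doesn't match the pattern is never selected for deletion.

```python
import re
from datetime import date, timedelta

# Hypothetical naming scheme for the daily dumps: n8n-YYYY-MM-DD.sql.gz
BACKUP_NAME = re.compile(r"n8n-(\d{4}-\d{2}-\d{2})\.sql\.gz")


def backup_name(day: date) -> str:
    """Object name for the dump taken on `day`."""
    return f"n8n-{day.isoformat()}.sql.gz"


def expired_backups(names: list[str], today: date, keep_days: int = 30) -> list[str]:
    """Return backup object names older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = BACKUP_NAME.fullmatch(name)
        if match is None:
            continue  # never delete objects we don't recognise
        if date.fromisoformat(match.group(1)) < cutoff:
            expired.append(name)
    return sorted(expired)
```

Keeping the decision in a pure function makes it easy to dry-run: print what would be deleted for a month before you let the script delete anything.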

The Postgres health check in the Compose file above pairs pg_isready with a thirty-second start period, and the start period matters. On first boot, Postgres spends time initialising its data directory. Failed probes during the start period don't count toward the retry limit. Without it, Docker can mark the container unhealthy before it has even finished starting, and the service_healthy condition then stops n8n from starting at all.

Queue Mode and the Threshold for Adding Redis

Queue mode splits n8n into a main instance that queues work and separate worker instances that execute it, backed by Redis. It is the right architecture for high-volume workloads. It is also the wrong architecture for low-volume workloads because it adds operational surface area — another network dependency, another container, another thing to debug at two in the morning.

Framework · The Queue Mode Threshold

Add Redis and switch EXECUTIONS_MODE to queue when either: (a) the instance handles more than ~1,000 executions/day, or (b) it has long-running steps over five minutes that block the main process. Below that threshold, regular mode is simpler, more reliable, and easier to reason about.
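The threshold is simple enough to encode and re-check every month. A sketch, assuming you can pull daily execution counts and the longest step duration from your own execution history; WorkloadStats and recommended_mode are hypothetical names, and the numbers mirror the rule of thumb above rather than any limit built into n8n.

```python
from dataclasses import dataclass

# Rule-of-thumb thresholds from the framework above; tune for your workload.
MAX_DAILY_EXECUTIONS = 1_000
MAX_STEP_MINUTES = 5


@dataclass
class WorkloadStats:
    executions_per_day: int
    longest_step_minutes: float


def recommended_mode(stats: WorkloadStats) -> str:
    """Return the EXECUTIONS_MODE value the threshold suggests."""
    if stats.executions_per_day > MAX_DAILY_EXECUTIONS:
        return "queue"
    if stats.longest_step_minutes > MAX_STEP_MINUTES:
        return "queue"
    return "regular"
```

Either condition alone is enough. A low-volume instance with one twenty-minute step still qualifies, because that step ties up the main process.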

When you cross the threshold, the change to the Compose file is small:

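services:
  redis:
    image: redis:7-alpine
    container_name: n8n-redis
    restart: unless-stopped
    volumes:
      - redis_data:/data

  n8n:
    image: n8nio/n8n:${N8N_VERSION}
    environment:
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      QUEUE_BULL_REDIS_PORT: 6379
    # ... other env vars and depends_on for postgres and redis ...

  n8n-worker:
    image: n8nio/n8n:${N8N_VERSION}
    command: worker
    restart: unless-stopped
    environment:
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      QUEUE_BULL_REDIS_PORT: 6379
      # plus the same DB_* variables and N8N_ENCRYPTION_KEY as the main instance
    # depends_on postgres and redis, as for the main instance

volumes:
  redis_data:  # declared alongside n8n_data and postgres_data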

The n8n main instance now pushes jobs to Redis, and workers pick them up. If a worker crashes mid-execution, the job stays in the queue and another worker picks it up. That resilience is worth the complexity only if you're actually hitting the limits of regular mode. I've seen teams enable queue mode on workflows that run fifty times a day and then spend weeks debugging Redis memory limits that weren't worth solving.
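The recovery behaviour is easier to see in a toy model. The class below is a simplified, hypothetical lease-based queue, not Bull's or n8n's implementation. A worker that takes a job holds a time-limited lease, and if it crashes without acknowledging, the lease lapses and the job returns to the waiting list. Real Bull queues detect stalled jobs through lock renewal and a periodic stalled check, and are usually configured to recover a given job only a limited number of times.

```python
import time
from collections import deque
from typing import Callable, Optional


class LeaseQueue:
    """Toy model of stalled-job recovery; not Bull's or n8n's implementation.

    A worker that takes a job holds a lease for `lease_seconds`. Acknowledging
    removes the job; if the worker dies first, the lease lapses and the job
    goes back to the waiting list for the next worker.
    """

    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.waiting: deque = deque()
        self.leased: dict[str, float] = {}  # job id -> lease expiry time

    def push(self, job_id: str) -> None:
        self.waiting.append(job_id)

    def _requeue_stalled(self) -> None:
        now = self.clock()
        for job_id, expiry in list(self.leased.items()):
            if expiry <= now:
                del self.leased[job_id]
                self.waiting.append(job_id)

    def take(self) -> Optional[str]:
        """Lease the next available job, or return None if there is none."""
        self._requeue_stalled()
        if not self.waiting:
            return None
        job_id = self.waiting.popleft()
        self.leased[job_id] = self.clock() + self.lease_seconds
        return job_id

    def ack(self, job_id: str) -> None:
        """Worker finished the job: drop its lease for good."""
        self.leased.pop(job_id, None)
```

The model also makes the trade-off visible: a job whose worker died halfway through runs again from the start, so workflows with side effects should tolerate being executed twice.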

Secret Management: What to Do and What to Avoid

The first rule of secrets is to keep them out of docker-compose.yml. That file gets copied into Git repositories, shared in Slack threads, and pasted into support tickets. A password in a Compose file is a password that has leaked.

My default pattern is a .env file with strict permissions:

chmod 600 .env
echo ".env" >> .gitignore
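Permissions drift: someone copies the project directory, restores it from a backup, or edits .env with a tool that rewrites the file. Below is a small audit sketch, assuming POSIX permissions and an exact ".env" line in .gitignore (it doesn't evaluate glob patterns); env_file_problems is a hypothetical name.

```python
import stat
from pathlib import Path


def env_file_problems(project_dir: str) -> list[str]:
    """Check that .env is owner-only and listed in .gitignore (POSIX permissions)."""
    root = Path(project_dir)
    env_path = root / ".env"
    if not env_path.exists():
        return [".env is missing"]
    problems = []
    mode = stat.S_IMODE(env_path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):  # any group or other permission bit
        problems.append(f".env mode is {oct(mode)}; expected 0o600")
    gitignore = root / ".gitignore"
    ignored = gitignore.exists() and ".env" in {
        line.strip() for line in gitignore.read_text().splitlines()
    }
    if not ignored:
        problems.append(".env is not listed in .gitignore")
    return problems
```

Wire it into whatever runs before deploys. An empty list means the two commands above are still in effect.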

For teams already running Docker Swarm or a full secrets manager like HashiCorp Vault, Docker Secrets are the better choice. You store secrets in files under a secrets/ directory, mount them into containers, and reference them with the _FILE suffix:

secrets:
  db_password:
    file: ./secrets/db_password.txt

services:
  n8n:
    secrets:
      - db_password
    environment:
      DB_POSTGRESDB_PASSWORD_FILE: /run/secrets/db_password
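The _FILE convention is a small indirection: if NAME_FILE is set, the value is read from that file, otherwise from NAME. This sketch illustrates the general pattern and is not n8n's source. Whether n8n strips a trailing newline from the file is an assumption you shouldn't rely on, so write secret files without one (printf '%s' rather than echo).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_setting(name: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve NAME from the file named by NAME_FILE if set, otherwise from NAME."""
    file_path = environ.get(f"{name}_FILE")
    if file_path:
        # echo and many editors add a trailing newline; strip it so it isn't
        # silently treated as part of the password.
        return Path(file_path).read_text().rstrip("\n")
    return environ.get(name)
```

The file takes precedence, so a stale plain-text variable left in .env can't quietly override the mounted secret.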

But if you're on a single Docker host, don't over-engineer it. A .env file with proper permissions and a backup strategy is more than adequate. What kills teams is not the choice between .env and Docker Secrets; it's the failure to back up the N8N_ENCRYPTION_KEY.

Block Code-node access to env

In versions prior to 2.0, Code nodes have full access to process.env, which means any workflow you import from the community can read your database password, your encryption key, and your API tokens. Set N8N_BLOCK_ENV_ACCESS_IN_NODE=true. If a Code node legitimately needs a secret, pass it through a credential or an upstream node, not through the environment.
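The setting protects a running instance, and it still pays to look before you import. A minimal sketch, assuming the standard exported workflow JSON layout (a top-level nodes array whose entries carry a name and a parameters object): it flags any node whose parameters mention process.env, $env, or os.environ. It is a structured grep, not a security boundary, so keep N8N_BLOCK_ENV_ACCESS_IN_NODE set regardless.

```python
import json
import re

# Ways a workflow can read environment variables: JS Code nodes, expressions,
# and Python code. Extend the pattern if you find others in your workflows.
ENV_PATTERNS = re.compile(r"process\.env|\$env\b|os\.environ")


def nodes_reading_env(workflow_json: str) -> list[str]:
    """Return the names of nodes whose parameters reference the environment."""
    workflow = json.loads(workflow_json)
    flagged = []
    for node in workflow.get("nodes", []):
        source = json.dumps(node.get("parameters", {}))
        if ENV_PATTERNS.search(source):
            flagged.append(node.get("name", "<unnamed>"))
    return flagged
```

Run it over every community workflow before import. Any hit is a prompt to read that node, not proof of malice.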

What you should absolutely not do is create duplicate credentials for every workflow that touches the same API. One Slack credential shared across thirty workflows means one point of rotation. Thirty duplicate credentials means thirty places to forget when the token changes.

Monitoring: The Minimal Viable Alerting Stack

You don't need a metrics dashboard with fifty graphs on day one. You need three things: proof the application is alive, proof the disk isn't full, and proof errors are reaching you.

An external monitor hitting /healthz every thirty seconds. I use Uptime Kuma for this because it is self-hosted, lightweight, and can alert via Slack or email within a minute of failure. The /healthz endpoint returns HTTP 200 when the n8n process is running. It does not check database connectivity, so I also set up a second monitor that triggers a test webhook against a lightweight health-check workflow every five minutes.
Disk space alerts. Execution data accumulates in Postgres, container logs accumulate under Docker's data directory, and binary data lands in the .n8n volume once filesystem binary mode is enabled. On a busy instance, that grows faster than you'd expect. A simple cron script that checks usage and alerts when the volume passes eighty percent has saved me more times than I can count. The json-file log caps in the Compose file keep a runaway execution trace from filling the disk on its own.
An error workflow inside n8n itself. Every production workflow should have error handling that routes to a dedicated "alert" workflow that sends a message to Slack with the execution URL, the error message, and the workflow name.
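If you'd rather start with a script than a monitoring product, the first two checks fit in a few lines of standard-library Python. A sketch under assumptions: you pass in the instance's base URL, the path you check sits on the same filesystem as the Docker volumes (for example /var/lib/docker), and sending the actual alert is left to your cron wrapper. The function names are hypothetical. Run the health probe from a different host than the one it watches, or it dies with the thing it monitors.

```python
import shutil
import urllib.request


def disk_usage_percent(path: str) -> float:
    """Used space as a percentage of the filesystem holding `path`.

    Based on used/total; df's Use% divides by used+available, so it can read
    slightly higher on filesystems with reserved blocks.
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def disk_alert(path: str, threshold: float = 80.0) -> bool:
    """True when usage has crossed the alert threshold."""
    return disk_usage_percent(path) >= threshold


def healthz_ok(base_url: str, timeout: float = 5.0) -> bool:
    """True if GET {base_url}/healthz answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False
```

A failed probe and a full disk are both binary signals, which is what makes them good alert triggers: no graphs to interpret at two in the morning.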

If you find out about failures from your users, your monitoring has failed.

If the editor is exposed to the internet at all, I add fail2ban on the reverse proxy's access logs to block brute-force login attempts. n8n's login page has no built-in rate limiting. An attacker can guess passwords indefinitely unless you stop them at the edge.
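fail2ban does this matching with its own filter and jail files. The sketch below only illustrates the logic such a filter encodes. It assumes an Nginx-style combined access log and assumes the editor posts credentials to /rest/login and gets HTTP 401 on a bad password; check both against your own logs before writing the real filter. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumptions: combined log format, login requests to POST /rest/login,
# and a 401 response for wrong credentials.
LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "POST /rest/login[^"]*" (?P<status>\d{3}) ')


def offenders(log_lines, max_failures: int = 5) -> list[str]:
    """IPs with more than `max_failures` failed logins in the given log window."""
    failures = Counter()
    for line in log_lines:
        match = LINE.match(line)
        if match and match.group("status") == "401":
            failures[match.group("ip")] += 1
    return sorted(ip for ip, count in failures.items() if count > max_failures)
```

Feeding it only the last ten minutes of log gives you the same "N failures within a window" rule fail2ban's findtime and maxretry settings express.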

What You Can Skip for Now

Not everything that sounds like "production" is necessary on day one. I deliberately skip:

Blue-green deployments until the system is critical enough that a two-minute upgrade window costs money.
Air-gapped telemetry settings unless you're in a regulated industry where outbound connections are banned.
Webhook IP allowlisting and HMAC verification for internal tools. Add these immediately for any public endpoint that triggers destructive actions.
Queue mode until the volume justifies it.
Docker Secrets unless the infrastructure team already has a secrets workflow.
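When a public webhook does trigger something destructive, HMAC verification means recomputing a signature over the raw request body with a shared secret and comparing it in constant time. A minimal sketch; the sha256=<hex> header format is an assumption borrowed from common senders (GitHub's X-Hub-Signature-256 uses that shape), so match whatever your sender actually emits.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, as the sender would compute it."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Constant-time comparison of the received signature with the expected one."""
    expected = sign(secret, body)
    received = signature_header.removeprefix("sha256=").strip()
    return hmac.compare_digest(expected.encode(), received.encode())
```

Sign the raw bytes, not re-serialised JSON: parsing and re-encoding the payload can change whitespace or key order and break every signature.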

Premature complexity is how you end up debugging Redis connection pools instead of shipping workflows.

What to Do Monday Morning

If you're running n8n on the quick-start command today, here's your Monday morning checklist.

Generate and back up an encryption key

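Run openssl rand -hex 32 and write the output to two places: a team password manager and an offline location. If the instance already holds credentials, don't generate a new key. Copy the existing encryptionKey value from /home/node/.n8n/config inside the container and back that up instead, or those credentials become undecryptable on the new stack.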

Stand up the production Compose stack

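Create a docker-compose.yml with postgres:16-alpine, a pinned n8n version, health checks on both services, and log rotation. Move your secrets into a .env file with chmod 600 and add it to .gitignore. The new Postgres database starts empty, so export workflows and credentials from the old instance first with n8n's export:workflow and export:credentials CLI commands, then import them into the new stack, which must use the same encryption key.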

Prune execution history

Set EXECUTIONS_DATA_PRUNE=true and EXECUTIONS_DATA_MAX_AGE=168 (seven days, expressed in hours) so your disk doesn't drown in execution history.

Add TLS and a public URL

Put a reverse proxy with TLS in front of the instance — Caddy is the fastest path. Point WEBHOOK_URL to the public address.

Set up basic monitoring

One external monitor against /healthz plus one error workflow that screams at you when something breaks.

Do this, and you have a system that survives reboots, container recreations, and the first time your webhook goes viral. Everything else is just tuning.