Part 2 | Building Blocks | Chapter 8

Data Transformation Without Losing Your Mind

Most workflow bugs aren't logic bugs. They're shape bugs — a field is null when you expected a string, or an array when you expected an object. The fix isn't more null checks; it's a different way of thinking about data.

Here's the trap I see most teams fall into: they build workflows as straight pipes. Data comes from API A, travels through a few nodes, and lands in API B. The shape of the payload is treated like a fact of nature instead of a contract. When API A changes a field name, or API B starts rejecting missing keys, the team doesn't notice until the execution log fills with red. Then they patch it. They add another IF node to check for undefined, another Set node to rename the new field, and another branch to handle the edge case.

I've seen production workflows with twenty nodes where nineteen of them are just defensive tape around a single misshapen payload.

The underlying problem is that the workflow owns nothing. It borrows its data shape from every vendor it touches. The right way to fix this is to give your workflow its own shape — a normal form that belongs to you — and translate everything else at the edges.

Own Your Shape

Framework · The normal form · N + M, not N × M

The single canonical shape your workflow operates on. The schema you own, independent of any vendor, webhook, or database. Every other system speaks its own dialect — but inside your workflow, everything is one language.

Consider a typical e-commerce pipeline. You receive order events from a Shopify webhook, enrich them from an internal API, and write them to a Postgres database, a HubSpot CRM, and a Slack channel. If you map Shopify's shape directly to each destination, you are maintaining three separate transformations. When Shopify adds a nested customer object or renames total_price to current_total_price, you now have three broken paths. The combinatorics get ugly fast.

Architecture | Sources | Destinations | Mappings to maintain
Point-to-point | 2 | 3 | 6 (2 × 3)
Point-to-point | 4 | 5 | 20 (4 × 5)
Normal form | 4 | 5 | 9 (4 + 5)

With a normal form, the math is always N + M, not N × M. You define one internal shape — say, an Order object with orderId, customerEmail, totalAmount, currency, status, and createdAt — and you enforce it at the boundary. The Shopify adapter converts customer.email to customerEmail and current_total_price to totalAmount. The CRM adapter converts your customerEmail back to HubSpot's email. The rest of your workflow never knows Shopify existed.
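
To make the boundary concrete, here is a minimal sketch of the Shopify input adapter as a plain JavaScript function. In n8n this mapping would normally live in an Edit Fields node; the code only spells out the contract. Only customer.email and current_total_price come from the description above. The other vendor field names (id, currency, financial_status, created_at) and the function name fromShopify are assumptions for illustration.

```javascript
// Sketch of an input adapter: vendor payload in, normal-form Order out.
// Vendor field names other than customer.email and current_total_price
// are assumptions; check them against the payloads you actually receive.
function fromShopify(payload) {
  const total = Number.parseFloat(payload?.current_total_price);
  return {
    orderId: payload?.id != null ? String(payload.id) : null,
    customerEmail: payload?.customer?.email ?? null,
    totalAmount: Number.isFinite(total) ? total : null,
    currency: payload?.currency ?? null,
    status: payload?.financial_status ?? null,
    createdAt: payload?.created_at ?? null,
  };
}

const order = fromShopify({
  id: 1001,
  customer: { email: "jane@example.com" },
  current_total_price: "19.99",
  currency: "USD",
  financial_status: "paid",
  created_at: "2024-01-15T10:00:00Z",
  note_attributes: [], // vendor-only field, dropped by the adapter
});
console.log(order);
```

Every key of the normal form is always present, even when the vendor omits a value, so downstream nodes can rely on the shape and only need to handle null.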

Key takeaway

The normal form is not the union of every field you've ever seen. It is the intersection of what your business logic actually needs. Keep the inside small and stable. Let the edges deal with everyone else's chaos.

Adapters at the Edges

Think of your workflow as a sandwich. The bread is adapters; the meat is business logic. Every external touchpoint gets an input adapter on the way in and an output adapter on the way out. Nothing else is allowed to speak vendor schema.

An input adapter has exactly one job: convert an external payload into your normal form, validate it, and drop everything else. In n8n, the Edit Fields node is the right tool for most of this work. If a legacy CRM sends contact data with uppercase field names and you need clean lowercase keys, you don't need JavaScript. You need a single Edit Fields node in Manual Mapping mode.

// Incoming from the CRM webhook:
{
  "FIRST_NAME": "Jane",
  "LAST_NAME": "Doe",
  "EMAIL_ADDR": "jane@example.com",
  "PH_NUMBER": "+1-555-0199"
}

Configure the node with these assignments:

Output Field | Expression
firstName | {{ $json.FIRST_NAME }}
lastName | {{ $json.LAST_NAME }}
email | {{ $json.EMAIL_ADDR }}
phone | {{ $json.PH_NUMBER }}

Enable Keep Only Set so the extraneous uppercase keys die at the gate. The label differs between n8n versions, so look for the option that controls whether other input fields are included in the output. The downstream workflow receives exactly the normal form, nothing more.

// Output to the rest of the workflow:
{
  "firstName": "Jane",
  "lastName": "Doe",
  "email": "jane@example.com",
  "phone": "+1-555-0199"
}

For nested objects, dot notation in Edit Fields expressions replaces whole Code nodes. If a Shopify order delivers the city inside shipping_address.city, an expression field mapped to city with the value {{ $json.shipping_address.city }} does the job. Combine it with optional chaining (in n8n versions whose expression engine supports it) and a fallback, so that neither a missing shipping_address nor a missing city raises an error or propagates undefined:

{{ $json.shipping_address?.city || "N/A" }}

Or use nullish coalescing when zero or empty string is a valid value:

{{ $json.lead_score ?? 0 }}
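
The difference between the two operators is easy to get wrong, so here is a small plain-JavaScript sketch of it. The lead object and the fallback value of 50 are made up for illustration.

```javascript
// || falls back on any falsy value; ?? falls back only on null or undefined.
const lead = { lead_score: 0, city: "" };

const scoreWithOr = lead.lead_score || 50;       // the valid 0 is replaced
const scoreWithNullish = lead.lead_score ?? 50;  // the valid 0 survives
const cityWithOr = lead.city || "N/A";           // the empty string is replaced
const missingWithNullish = lead.region ?? "N/A"; // region is undefined, so fallback

console.log({ scoreWithOr, scoreWithNullish, cityWithOr, missingWithNullish });
```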

Output adapters do the reverse. Right before the HTTP Request node that posts to HubSpot, drop another Edit Fields node that maps your normal form into HubSpot's expected shape. If the HubSpot API later changes its required fields, you have one place to update. Your business logic in the middle stays untouched.
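
As a rough sketch of the output side, the function below maps a normal-form order into a request body for the CRM. Only the email field comes from the text above. The properties envelope, the two custom property names, and the function name toHubSpotContact are assumptions for illustration, so check the destination's API reference before relying on them.

```javascript
// Sketch of an output adapter: normal-form Order in, vendor request body out.
// Everything vendor-specific lives here and nowhere else in the workflow.
function toHubSpotContact(order) {
  return {
    properties: {
      email: order.customerEmail,
      last_order_id: order.orderId,         // hypothetical custom property
      last_order_amount: order.totalAmount, // hypothetical custom property
    },
  };
}

const body = toHubSpotContact({
  orderId: "1001",
  customerEmail: "jane@example.com",
  totalAmount: 19.99,
  currency: "USD",
  status: "paid",
  createdAt: "2024-01-15T10:00:00Z",
});
console.log(JSON.stringify(body));
```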

This is the adapter pattern applied to workflows, and it is the difference between a system you can reason about and a pile of string.

Edit Fields vs. Code Node

The most common question I get is when to reach for a Code node instead of Edit Fields. My rule is simple: if the transformation is a rename, a simple expression, a type cast, or a nested field extraction, Edit Fields wins. If the transformation requires iteration, external libraries, conditional branching across items, or algorithmic logic, a Code node is the right call.

Task | Right Tool | Why
Rename fields, flatten dot paths | Edit Fields | Visible, auditable, no sandbox startup cost
Null fallbacks, date formatting with Luxon | Edit Fields | Expressions handle it natively
Currency conversion, simple arithmetic | Edit Fields | {{ $json.cents / 100 }} is clearer than code
Build a CSV from multiple items | Code Node | Needs Buffer, string manipulation, and binary prep
Parse XML to JSON | Code Node | Needs a parser library such as xml2js (n8n's built-in XML node covers straightforward conversions)
Validate payload against JSON Schema | Code Node | Needs ajv and complex error aggregation
Cross-item aggregation (sums, joins) | Code Node | Needs $input.all()
Loop with internal conditionals | Code Node | Edit Fields has no for loop
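
For the cross-item row, a minimal sketch of what such a Code node might contain. The items array stands in for what $input.all() returns in n8n, and the function name totalsByCurrency is hypothetical.

```javascript
// Sketch of a cross-item aggregation: sum totalAmount per currency.
// `items` mimics n8n's item format: [{ json: {...} }, ...].
function totalsByCurrency(items) {
  const totals = {};
  for (const { json } of items) {
    if (typeof json?.totalAmount !== "number" || typeof json?.currency !== "string") {
      continue; // skip items that are not in normal form
    }
    totals[json.currency] = (totals[json.currency] ?? 0) + json.totalAmount;
  }
  // Return n8n-style items, one per currency.
  return Object.entries(totals).map(([currency, totalAmount]) => ({
    json: { currency, totalAmount },
  }));
}

const result = totalsByCurrency([
  { json: { currency: "USD", totalAmount: 10 } },
  { json: { currency: "EUR", totalAmount: 5 } },
  { json: { currency: "USD", totalAmount: 2.5 } },
  { json: { currency: "USD" } }, // malformed, ignored
]);
console.log(result);
```

Inside n8n the last lines would become something like return totalsByCurrency($input.all()), with the node set to run once for all items.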

The mistake I see is packing three simple renames and a currency division into a Code node because "it's just faster to write." It isn't faster to maintain. A Code node spins up a sandbox, hides its logic from non-developers on the team, and is opaque on the canvas: you can't click a line inside a forty-line JavaScript block and inspect its output the way you can click an Edit Fields node. When something breaks at 2 a.m., the person on call should be able to read the node name and know what it does.

Framework · The three-field heuristic

Chaining six Edit Fields nodes to simulate a loop is a smell. If you need more than three computed fields that depend on each other in sequence, step back and ask whether a single Code node with clear variable names is cleaner. Often it is.

But I still split the work: one Edit Fields node for sanitization, one Code node for the algorithm, one Edit Fields node for output trimming. The boundary between them is visible in the canvas.
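
Here is a sketch of the kind of chained computation where a single Code node reads better than a row of Edit Fields nodes. The field names, the discount rule, and the tax rate are all hypothetical; the point is only that each value depends on the one before it.

```javascript
// Sketch: computed fields that depend on each other in sequence.
// Amounts are kept in integer cents to avoid floating-point drift.
function priceOrder(order, taxRate) {
  const subtotalCents = order.lines.reduce(
    (sum, line) => sum + line.unitCents * line.quantity,
    0
  );
  const discountCents = Math.round((subtotalCents * (order.discountPercent ?? 0)) / 100);
  const taxableCents = subtotalCents - discountCents;
  const taxCents = Math.round(taxableCents * taxRate);
  const totalCents = taxableCents + taxCents;
  return { subtotalCents, discountCents, taxCents, totalCents };
}

const priced = priceOrder(
  {
    lines: [
      { unitCents: 1000, quantity: 2 },
      { unitCents: 500, quantity: 1 },
    ],
    discountPercent: 10,
  },
  0.2
);
console.log(priced);
```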

When I do use a Code node for transformation, I write it defensively. I never assume a field exists. I validate types. And I return a predictable shape — my normal form — even when the input is malformed.

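// Take the first incoming item; optional chaining covers an empty input.
const input = $input.first()?.json;

// Fail fast: reject anything without a non-empty string email.
if (!input || typeof input.email !== 'string' || input.email.trim() === '') {
  return [{ json: { customerEmail: null, error: 'Invalid input', valid: false } }];
}

// Return the normal form with a normalized email.
return [{
  json: {
    customerEmail: input.email.trim().toLowerCase(),
    valid: true
  }
}];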

Fail fast at the adapter, return the error through a dedicated branch, and let the rest of the workflow assume it is receiving clean data.
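
When a node receives many items at once, the same idea can be applied to each item in turn. This is a sketch under that assumption: the items array stands in for $input.all(), and the function name normalizeEmails is hypothetical. Each result carries a valid flag so an IF node downstream can split clean items from rejects.

```javascript
// Sketch: normalize every incoming item, not just the first one.
function normalizeEmails(items) {
  return items.map((item, index) => {
    const email = item?.json?.email;
    if (typeof email !== "string" || email.trim() === "") {
      // Same keys as the success case, so the output shape stays predictable.
      return { json: { valid: false, customerEmail: null, error: "Invalid input", index } };
    }
    return { json: { valid: true, customerEmail: email.trim().toLowerCase() } };
  });
}

const out = normalizeEmails([
  { json: { email: "  Jane@Example.com " } },
  { json: { email: 42 } },
  {},
]);
console.log(out);
```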

The Shape Drift Problem

The normal form and adapter pattern protect you from known variation. They do not protect you from change.

Framework · Shape drift

The slow, silent mutation of a payload schema that breaks everything downstream. APIs change their schemas silently — fields move, types shift, a webhook that used to send a string starts sending an object.

Shape drift is worse than an outage because nothing visibly fails: it surfaces as quietly corrupted data. Everything appears green in the execution log until someone notices the business metrics are wrong.

The fix is not to read the API changelog every morning. The fix is contract testing at the boundary.

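Immediately after every webhook trigger and every HTTP Request node that fetches external data, I place a validation step. If the payload does not match the contract, the workflow rejects it before any business logic runs. In n8n, a Code node with ajv works well for this, provided your instance allows external modules in Code nodes (on self-hosted n8n this is controlled by the NODE_FUNCTION_ALLOW_EXTERNAL environment variable). The code below uses Ajv v8 naming (instancePath), and in v8 the email format comes from the separate ajv-formats package:

const Ajv = require('ajv');
const addFormats = require('ajv-formats');

const ajv = new Ajv({ allErrors: true });
addFormats(ajv); // registers formats such as 'email', which Ajv v8 does not bundle

// The contract: required fields, their types, and no unknown keys.
const schema = {
  type: 'object',
  required: ['orderId', 'customerEmail', 'totalAmount'],
  properties: {
    orderId: { type: 'string', minLength: 1 },
    customerEmail: { type: 'string', format: 'email' },
    totalAmount: { type: 'number', minimum: 0 },
    currency: { type: 'string', enum: ['USD', 'EUR', 'GBP'] }
  },
  additionalProperties: false
};

const validate = ajv.compile(schema);

// Webhook triggers nest the payload under body; other nodes return it directly.
const item = $input.first()?.json ?? {};
const payload = item.body || item;

if (!validate(payload)) {
  // Collect every violation as "path message" for the error response.
  const errors = validate.errors.map(e => `${e.instancePath} ${e.message}`);
  return [{ json: { valid: false, errors, statusCode: 400 } }];
}

return [{ json: { valid: true, data: payload, statusCode: 200 } }];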

Route the output through an IF node on the valid field: true continues to the adapter and business logic; false goes to a Respond to Webhook node that returns a 400, or to an internal alert channel. The sender gets immediate feedback, and your database never sees the bad shape.

But contract validation at runtime is only half the battle. You also need contract tests: scheduled workflows that probe your integrations and fail if the shape drifts. I run a daily workflow that calls each critical external API with a known test request and validates the response against my current normal form schema. If the API vendor added a required field, changed a type, or flattened a nested object, I know within twenty-four hours — not when the production pipeline starts writing garbage.
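
The comparison step of such a probe can be sketched in a few lines. To keep the example free of dependencies, a hand-rolled type checker stands in for ajv here. The contract object, the function name findDrift, and the simulated response are all hypothetical.

```javascript
// Sketch of the check a scheduled contract test might run on a probe response.
const orderContract = {
  orderId: "string",
  customerEmail: "string",
  totalAmount: "number",
};

// Returns human-readable violations; an empty list means no drift was seen.
function findDrift(sample, contract) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    const actualType = sample[field] === null ? "null" : typeof sample[field];
    if (actualType === "undefined") {
      violations.push(`${field}: missing`);
    } else if (actualType !== expectedType) {
      violations.push(`${field}: expected ${expectedType}, got ${actualType}`);
    }
  }
  for (const field of Object.keys(sample)) {
    if (!Object.prototype.hasOwnProperty.call(contract, field)) {
      violations.push(`${field}: unexpected new field`);
    }
  }
  return violations;
}

// Simulated probe response: totalAmount became a string and a field appeared.
const drift = findDrift(
  { orderId: "1001", customerEmail: "jane@example.com", totalAmount: "19.99", taxLines: [] },
  orderContract
);
console.log(drift);
```

A non-empty list is the signal to alert on. In a real workflow I would expect the sample to come from an HTTP Request node and the alert to go out through whatever channel the team already watches.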

For webhooks that you can't easily poll, log the schema fingerprint. A lightweight Code node can hash the set of top-level keys and a few nested paths, then write that hash to a database table on every hundredth execution. A second scheduled workflow compares today's hash to last week's. A mismatch means drift, even if every individual payload technically passed validation because the values happened to look correct.
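
A fingerprint like that might look as follows. This is a sketch that assumes the payload is a plain object and that Node's built-in crypto module is available to the Code node (on self-hosted n8n, built-in modules may need to be allowed through NODE_FUNCTION_ALLOW_BUILTIN). The function names and the depth limit of two levels are my own choices for illustration.

```javascript
const crypto = require("node:crypto");

// Describe a value by type only, so the fingerprint ignores actual data.
function typeOf(value) {
  if (value === null) return "null";
  return Array.isArray(value) ? "array" : typeof value;
}

// Collect "path:type" entries for top-level keys and one nested level.
function collectPaths(obj, prefix = "", depth = 2, out = []) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const type = typeOf(value);
    out.push(`${path}:${type}`);
    if (type === "object" && depth > 1) collectPaths(value, path, depth - 1, out);
  }
  return out;
}

// Sort the entries so key order does not change the hash.
function shapeFingerprint(payload) {
  const canonical = collectPaths(payload).sort().join("|");
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

const before = shapeFingerprint({ id: 1, customer: { email: "a@example.com" } });
const sameShape = shapeFingerprint({ customer: { email: "b@example.com" }, id: 2 });
const drifted = shapeFingerprint({ id: 1, customer: "a@example.com" });
console.log(before === sameShape, before === drifted);
```

Because only paths and types feed the hash, two payloads with different values but the same structure produce the same fingerprint, while a string turning into an object (or the reverse) changes it.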

What to Do Monday Morning

You do not need to rewrite every workflow this week. You need to start owning your data shape in the places where pain is already visible.

Pick the workflow that broke most recently

Open it and identify the first node after the trigger. Ask: what shape does the rest of this workflow actually need? Write down five to ten fields. That is your normal form for this domain.

Add an input adapter

Place an Edit Fields node right after the trigger. Map the incoming vendor fields into your normal form. Enable Keep Only Set so nothing else leaks through. Add a fallback expression for every field that might be missing.

Add output adapters at every destination

Find every destination API and add an output adapter — another Edit Fields node — right before the HTTP Request that calls it. The output adapter's job is to translate your normal form into whatever that vendor wants today.

Demote rename-only Code nodes

Any Code node that is only renaming fields, dividing by a hundred, or flattening dot notation should be deleted and replaced with Edit Fields. Your future self will thank you when debugging at midnight.

Set up one contract test

Choose your most volatile integration. Build a scheduled workflow that fetches a sample payload and validates it against your normal form schema. Send yourself a Slack message when it fails.

Your workflows will be thinner. Your failures will be louder and earlier. And your data will finally belong to you.