A 38-node n8n pipeline that predicts F1 race winners daily — fusing live data, eight driver metrics, RAG over race history, and GPT-4o — with confidence-gated Slack alerts.

The brief: a self-updating F1 analyst that only speaks up when it's confident.
The goal was a fully automated system that predicts the winner of each upcoming Formula 1 race — not with a gut feeling, but with a data-driven, statistically grounded forecast refreshed every single day. Crucially, it had to know when not to shout: only high-confidence predictions should trigger alerts, so the signal never drowns in noise. We delivered this as a 38-node n8n workflow combining live data ingestion, statistical feature engineering, retrieval-augmented AI reasoning, and a confidence gate.
Stage 1 — Multi-source data collection. A schedule trigger fires daily at 8 AM and a configuration node centralises every tunable parameter — the F1 API base URL, current season, news and weather endpoints, how many years of history to use, and the confidence threshold. The workflow then pulls a rich, multi-source dataset from the Ergast F1 API (no auth required): current season schedule, driver standings, constructor standings, historical race results, qualifying results, and circuit information. Fresh, layered data matters here — standings reveal momentum while historical circuit results expose recurring performance patterns.
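As a rough illustration of the configuration-node idea, the sketch below keeps the tunable parameters in one object and derives the request URLs from it. The field names, the season value, and the helper `build_endpoints` are hypothetical; the paths follow the Ergast URL layout as we understand it and should be checked against the API documentation. The news and weather endpoints are left out for brevity, and no network calls are made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # All names and defaults here are illustrative assumptions.
    f1_api_base: str = "https://ergast.com/api/f1"
    season: int = 2024                  # example value only
    history_years: int = 3              # how many past seasons to pull
    confidence_threshold: float = 0.75  # used later by the confidence gate


def build_endpoints(cfg: PipelineConfig):
    """Return (current-season URLs, historical results URLs) derived from cfg."""
    base = cfg.f1_api_base.rstrip("/")
    current = {
        "schedule": f"{base}/{cfg.season}.json",
        "driver_standings": f"{base}/{cfg.season}/driverStandings.json",
        "constructor_standings": f"{base}/{cfg.season}/constructorStandings.json",
        "qualifying": f"{base}/{cfg.season}/qualifying.json",
        "circuits": f"{base}/{cfg.season}/circuits.json",
    }
    # One results URL per previous season, oldest first.
    first_year = cfg.season - cfg.history_years
    history = [f"{base}/{year}/results.json" for year in range(first_year, cfg.season)]
    return current, history


current_urls, history_urls = build_endpoints(PipelineConfig())
```

Because every URL is derived from the config object, changing the season or the history depth in one place changes every downstream request.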
Stage 2 — Feature engineering. Raw standings are merged into a unified dataset, and a code node computes eight driver-performance metrics: podium rate, win rate, consistency score, recent form, DNF (reliability) rate, points per race, average qualifying position, and average finishing position. This is the heart of the analytical edge: these derived metrics surface patterns that a championship table alone cannot show.
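A minimal sketch of how such metrics could be computed for one driver is shown below. The input shape, the function name `driver_metrics`, and in particular the formulas for consistency (inverse of the spread in finishing positions) and recent form (mean points over the last few races) are assumptions made for illustration; the workflow's code node may define them differently.

```python
from statistics import mean, pstdev


def driver_metrics(races, recent_n=3):
    """Compute eight metrics from a chronological list of race dicts.

    Each dict is assumed to have 'quali' (int), 'finish' (int, or None
    for a DNF) and 'points' (number). These keys are hypothetical.
    """
    if not races:
        raise ValueError("need at least one race")
    n = len(races)
    finishes = [r["finish"] for r in races if r["finish"] is not None]
    recent = races[-recent_n:]
    return {
        "win_rate": sum(1 for p in finishes if p == 1) / n,
        "podium_rate": sum(1 for p in finishes if p <= 3) / n,
        "dnf_rate": (n - len(finishes)) / n,
        "points_per_race": sum(r["points"] for r in races) / n,
        "avg_quali": mean(r["quali"] for r in races),
        "avg_finish": mean(finishes) if finishes else None,
        # Assumed definition: 1.0 means identical finishes every race.
        "consistency": 1 / (1 + pstdev(finishes)) if finishes else 0.0,
        # Assumed definition: mean points over the last recent_n races.
        "recent_form": mean(r["points"] for r in recent),
    }


sample = [
    {"quali": 1, "finish": 1, "points": 25},
    {"quali": 2, "finish": 3, "points": 15},
    {"quali": 4, "finish": None, "points": 0},   # DNF
    {"quali": 3, "finish": 2, "points": 18},
]
metrics = driver_metrics(sample)
```

Rates are divided by races entered rather than races finished, so a DNF counts against win and podium rate; that is a design choice, not something the source specifies.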
Stage 3 — Historical vectorization (RAG). Three years of F1 historical data are transformed into a searchable knowledge base. A document loader ingests the raw history, a recursive text splitter chunks it into contextual units, OpenAI embeddings vectorize those chunks, and they're stored in an in-memory vector store. This lets the AI perform semantic search over the past — finding genuinely similar race scenarios (comparable circuits, conditions, or standings situations) rather than relying on surface-level lookups.
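The chunk, embed, store, and search flow can be sketched in plain Python as below. This is a stand-in, not the workflow's implementation: `toy_embed` is a bag-of-words counter used in place of OpenAI embeddings so the example runs offline, `recursive_split` is a simplified take on a recursive text splitter, and the race notes are synthetic placeholders.

```python
import math
import re
from collections import Counter


def recursive_split(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator present, recursing on oversize pieces."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    for sep in separators:
        if sep not in text:
            continue
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = f"{current}{sep}{part}" if current else part
            if len(candidate) <= max_chars:
                current = candidate          # keep packing this chunk
            else:
                if current:
                    chunks.append(current)
                current = part               # start a new chunk
        if current:
            chunks.append(current)
        return [c for chunk in chunks for c in recursive_split(chunk, max_chars, separators)]
    # No separator left: fall back to a hard cut.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def toy_embed(text):
    """Stand-in for a real embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    def __init__(self):
        self.items = []  # list of (chunk text, vector)

    def add(self, chunks):
        self.items += [(c, toy_embed(c)) for c in chunks]

    def search(self, query, k=2):
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


history = (
    "Circuit X 2023: wet race, Driver A won from pole.\n\n"
    "Circuit Y 2023: dry race, Driver B won after a late safety car.\n\n"
    "Circuit X 2022: dry race, Driver C won from third on the grid."
)
store = InMemoryStore()
store.add(recursive_split(history, max_chars=70))
best_match = store.search("wet race at Circuit X", k=1)
```

With a real embedding model the ranking would reflect meaning rather than shared words, which is what makes the "similar scenario" retrieval described above possible.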
Stage 4 — AI prediction. A LangChain agent powered by GPT-4o acts as an expert race analyst. It's equipped with four tools: the historical-data retrieval tool (RAG over the vector store), a real-time F1 news tool, a weather-forecast tool, and a statistical-analysis code tool. The agent analyses the current-season data and actively uses all four tools to gather context, then outputs a structured prediction: the predicted winner, a confidence score from 0 to 1, the top-three predicted finishers, a minimum of five key factors, risk factors, and a weather-impact assessment — with instructions to cite specific historical statistics rather than guess.
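One way to enforce the structured output described above is to validate the agent's reply before anything downstream consumes it. The sketch below assumes the agent returns JSON; the field names and the function `parse_prediction` are hypothetical, chosen to mirror the elements listed in the prose.

```python
import json

# Hypothetical field names mirroring the prediction elements described above.
REQUIRED_FIELDS = (
    "predicted_winner",
    "confidence",
    "top_three",
    "key_factors",
    "risk_factors",
    "weather_impact",
)


def parse_prediction(raw: str) -> dict:
    """Parse the agent's JSON reply and reject malformed predictions."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    conf = data["confidence"]
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0 <= conf <= 1:
        raise ValueError("confidence must be a number between 0 and 1")

    if len(data["top_three"]) != 3:
        raise ValueError("top_three must list exactly three drivers")
    if len(data["key_factors"]) < 5:
        raise ValueError("at least five key factors are required")
    return data
```

Rejecting a malformed reply here keeps a missing or out-of-range confidence score from ever reaching the gate in the next stage.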
Stage 5 — Confidence gating. Every prediction passes through an IF node that compares its confidence score against a configurable threshold (0.75 by default). Only predictions at or above the threshold proceed to alerting and storage. This single gate is what keeps the system trustworthy: it deliberately suppresses weak forecasts, which reduces alert fatigue and means only predictions the agent itself rates highly are acted upon.
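The gate itself is a single comparison. A minimal sketch, assuming predictions are dicts with a `confidence` key (a hypothetical shape) and using the 0.75 default from the text:

```python
DEFAULT_THRESHOLD = 0.75  # default stated in the text; configurable


def passes_gate(prediction: dict, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True when the prediction is at or above the threshold (inclusive)."""
    return prediction["confidence"] >= threshold


def route(predictions, threshold: float = DEFAULT_THRESHOLD):
    """Split predictions into those to alert on and those to suppress."""
    alerts = [p for p in predictions if passes_gate(p, threshold)]
    suppressed = [p for p in predictions if not passes_gate(p, threshold)]
    return alerts, suppressed


alerts, suppressed = route([
    {"predicted_winner": "Driver A", "confidence": 0.82},
    {"predicted_winner": "Driver B", "confidence": 0.61},
])
```

The comparison is inclusive, matching "at or above": a prediction at exactly 0.75 goes through.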
Stage 6 — Alerting and record-keeping. When a prediction clears the threshold, the system pushes a high-confidence alert to Slack and persists the full analysis in two places: a PostgreSQL table (with a defined schema covering prediction date, predicted winner, confidence score, source, data version, and the complete analysis) and a Google Sheets prediction tracker. The dual store gives both a robust queryable database and a human-friendly log for reviewing accuracy over time.
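To make the stored record concrete, here is a sketch of a table with the columns listed above. It uses Python's built-in `sqlite3` as a runnable stand-in for PostgreSQL, so the column types are SQLite's; the table name, column names, and the `save_prediction` helper are hypothetical, and a PostgreSQL version would likely use native date and JSON types instead of text.

```python
import json
import sqlite3
from datetime import date

# SQLite stand-in for the PostgreSQL table; all names are illustrative.
DDL = """
CREATE TABLE IF NOT EXISTS f1_predictions (
    id               INTEGER PRIMARY KEY,
    prediction_date  TEXT NOT NULL,
    predicted_winner TEXT NOT NULL,
    confidence_score REAL NOT NULL CHECK (confidence_score BETWEEN 0 AND 1),
    source           TEXT NOT NULL,
    data_version     TEXT NOT NULL,
    full_analysis    TEXT NOT NULL
)
"""


def save_prediction(conn, prediction, source, data_version, today=None):
    """Insert one prediction; the complete analysis is stored as JSON text."""
    conn.execute(
        "INSERT INTO f1_predictions "
        "(prediction_date, predicted_winner, confidence_score, "
        " source, data_version, full_analysis) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            (today or date.today()).isoformat(),
            prediction["predicted_winner"],
            prediction["confidence"],
            source,
            data_version,
            json.dumps(prediction),
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
save_prediction(conn, {"predicted_winner": "Driver A", "confidence": 0.82},
                source="n8n-workflow", data_version="v1")
```

Keeping the complete analysis alongside the headline fields lets accuracy reviews query by winner or confidence and still recover the reasoning behind each call.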
Why this design works. It mirrors how a serious analyst actually works — gather everything, engineer the metrics that matter, compare against history, reason over it, and only commit to a call when the evidence supports it. The configuration node makes the whole pipeline portable (swap the season, sources, history depth, or threshold without touching logic), the RAG layer grounds predictions in real precedent, and the confidence gate enforces discipline. It's also easily extensible: constructor predictions can be added by adjusting the agent prompt, and Slack can be swapped for Discord or Teams. The result is a hands-off engine well-suited to sports-analytics dashboards, fantasy F1 leagues, and F1 news or prediction sites — delivering a fresh, statistically grounded race forecast every day, but only raising its hand when it's genuinely confident.