AI Engineering | Jurgens Schoeman

Architecture

The 5 Layers of AI Engineering

AI engineering is not just "call an API and get a response." It's a layered discipline — from foundation models and inference, through orchestration and integration, to the end-user application. Understanding each layer is key to building robust, production-grade AI systems.

Mental Model

Think of AI engineering like a TCP/IP stack. Each layer has a clear responsibility and abstraction boundary. You build on top of lower layers without needing to understand their internals — until something breaks.

Layer 5 — Application

User-Facing AI Applications

Chat interfaces, AI copilots, autonomous agents, coding assistants. This is what users interact with: Blazor, Next.js, or React Native apps that call the orchestration layer.

Layer 4 — Integration

Tool Use & External Systems

MCP servers, function calling, webhooks. AI connects to databases, APIs, code editors, file systems. This is where n8n workflows and custom skills live.

Layer 3 — Orchestration

Prompt Pipelines & Agent Loops

Chain-of-thought, RAG pipelines, multi-agent coordination, memory management. Tools: LangChain, Semantic Kernel, custom orchestrators, n8n.

Layer 2 — Inference / API

Model Access & Context Management

Claude API, OpenAI API, Ollama (local inference). Manages context windows, token budgets, temperature, streaming. Also handles rate limiting and cost optimisation.
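As a rough illustration of token budgeting at this layer, the sketch below drops the oldest non-system messages until a conversation fits a budget. The `ChatMessage` shape, the function names, and the four-characters-per-token estimate are all assumptions for illustration; a real system would count tokens with the provider's tokenizer.

```typescript
// Hypothetical message shape; real SDKs define their own types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude estimate of about 4 characters per token (an assumption, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep every system message, then keep the newest remaining messages that fit.
function trimToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let used = 0;
  for (const m of system) {
    used += estimateTokens(m.content);
  }

  const kept: ChatMessage[] = [];
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const m of rest.slice().reverse()) {
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) {
      break;
    }
    used += cost;
    kept.unshift(m);
  }
  return system.concat(kept);
}
```

Dropping whole messages from the oldest end is only one possible policy; summarising older turns is a common alternative.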

Layer 1 — Foundation

Models & Training

Claude Sonnet/Opus, GPT-4o, Codex, Llama 3, Mistral, Phi-3. Pre-trained foundation models. Fine-tuning and RLHF happen here — mostly handled by the model providers.

My Toolkit

Current AI Workflow Stack

The tools I use daily as a full-stack engineer to move faster, write better code, and automate the repetitive parts of the job.

Claude Code

Anthropic's agentic coding tool. Runs in the terminal with full file-system access, executes commands, and iterates on complex multi-file refactors autonomously. My go-to for architecture-level coding sessions.

Agentic · Terminal

ChatGPT Codex

OpenAI's cloud-based coding agent. Excels at writing boilerplate, generating tests, and explaining complex algorithms. Works directly inside GitHub repos and VS Code via Copilot.

Code Gen · GitHub

Antigravity

AI-powered development acceleration platform. Integrates directly into the development workflow to provide context-aware suggestions, automated documentation, and intelligent code review at the PR level.

Workflow · PR Review

n8n

Fair-code (source-available) workflow automation. I use n8n to orchestrate AI pipelines, connecting LLMs to databases, webhooks, Slack, GitHub, Azure services, and custom HTTP endpoints. Visual workflow builder meets code flexibility.

Automation · Orchestration

Ollama

Run large language models locally: Llama 3, Mistral, Phi-3, Codestral and more. Zero API costs, complete data privacy, and no internet dependency once the models are downloaded. Essential for air-gapped environments and sensitive codebases.

Local AI · Privacy

MCP Servers

Model Context Protocol servers expose resources, tools, and prompts to AI models. I build and run custom MCP servers for GitHub, Azure APIs, databases, and internal business tools — making AI genuinely useful in real workflows.

MCP · Integration

Pro Tips

Prompt Engineering: 10 Layers Deep

Prompt engineering is a skill, not a trick. These layers build on each other — master them in order for compounding returns on AI output quality.

System Prompts are Architecture

Define the model's persona, constraints, and output format in the system prompt. Treat it as your API contract — version-control it alongside your code. A weak system prompt produces unpredictable outputs.
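One way to treat a system prompt as a versioned contract is to keep it as structured data in the repository and render it to a string at call time. The sketch below assumes a hypothetical `SystemPrompt` shape and example wording; none of it comes from a specific SDK.

```typescript
// Hypothetical structure for a system prompt kept under version control.
interface SystemPrompt {
  version: string;
  persona: string;
  constraints: string[];
  outputFormat: string;
}

// Example content only; adjust the wording to your own review rules.
const reviewPromptV1: SystemPrompt = {
  version: "1.0.0",
  persona: "You are a senior C# architect reviewing pull requests.",
  constraints: [
    "Flag security vulnerabilities first.",
    "If unsure, say so instead of guessing.",
  ],
  outputFormat: "Markdown with the headings Summary, Findings, Suggestions.",
};

// Render the structured definition into the single string an API expects.
function renderSystemPrompt(p: SystemPrompt): string {
  const rules = p.constraints.map((c) => "- " + c).join("\n");
  return [p.persona, "Constraints:", rules, "Output format: " + p.outputFormat].join("\n");
}
```

Because the prompt is plain data, a change to a constraint shows up as a reviewable diff, and the `version` field can be logged with each request.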

Role Priming Sets the Expert Frame

"You are a senior C# architect reviewing a pull request for security vulnerabilities..." immediately elevates response quality. The model adopts the context and expertise level you specify.

Few-Shot Examples Beat Long Instructions

Provide 3–5 input/output examples of exactly the format you want. The model learns the pattern faster from examples than from lengthy prose instructions. Especially powerful for code generation.
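A minimal sketch of how few-shot examples might be laid out as chat messages, with each example becoming a user/assistant pair ahead of the real query. The `ChatMessage` and `FewShotExample` shapes and the sample data are assumptions for illustration.

```typescript
// Hypothetical message shape; real SDKs define their own types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface FewShotExample {
  input: string;
  output: string;
}

// Turn each example into a user/assistant pair, then append the real query.
function buildFewShotMessages(
  instruction: string,
  examples: FewShotExample[],
  query: string
): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "system", content: instruction }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.input });
    messages.push({ role: "assistant", content: ex.output });
  }
  messages.push({ role: "user", content: query });
  return messages;
}

// Illustrative examples: method name in, unit-test name out.
const namingExamples: FewShotExample[] = [
  { input: "GetUserById", output: "GetUserById_WhenUserExists_ReturnsUser" },
  { input: "DeleteOrder", output: "DeleteOrder_WhenOrderMissing_ThrowsNotFound" },
  { input: "ParseInvoice", output: "ParseInvoice_WhenInputEmpty_ReturnsNull" },
];
```

Calling `buildFewShotMessages("Suggest a unit-test name.", namingExamples, "CancelSubscription")` yields one system message, six example messages, and the final user query.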

Chain-of-Thought for Complex Reasoning

Add "Think step by step before answering" or "Show your reasoning" for tasks requiring logic, debugging, or architecture decisions. Chain-of-thought (CoT) prompting often reduces errors on multi-step problems, at the cost of longer responses.

Output Format Contracts

Specify exact output structure: JSON schema, markdown headings, code blocks with language tags. Models that know the expected format produce far more consistent, parseable outputs — essential for automated pipelines.
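An output contract is only useful if the pipeline checks it. The sketch below validates a model reply against a hypothetical `ReviewFinding` shape and rejects anything that does not match; the field names and severity values are assumptions, not a standard schema.

```typescript
// Hypothetical contract for one code-review finding.
interface ReviewFinding {
  file: string;
  severity: "low" | "medium" | "high";
  message: string;
}

const ALLOWED_SEVERITIES = ["low", "medium", "high"];

// Parse the raw model output; return null if it is not a JSON array of valid findings.
function parseFindings(raw: string): ReviewFinding[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    return null; // not JSON at all
  }
  if (!Array.isArray(data)) {
    return null;
  }
  const findings: ReviewFinding[] = [];
  for (const item of data) {
    if (item === null || typeof item !== "object") {
      return null;
    }
    if (typeof item.file !== "string" || typeof item.message !== "string") {
      return null;
    }
    if (ALLOWED_SEVERITIES.indexOf(item.severity) === -1) {
      return null;
    }
    findings.push({ file: item.file, severity: item.severity, message: item.message });
  }
  return findings;
}
```

A `null` result is a natural point to retry the request or fall back to a human, rather than passing malformed data downstream.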

Negative Constraints Reduce Hallucination

Explicitly say what NOT to do: "Do not add comments unless they explain non-obvious logic." "Do not use deprecated APIs." "If unsure, say so — do not guess." Negative constraints prevent common failure modes.

RAG Grounds Responses in Reality

Retrieval Augmented Generation injects relevant context (docs, code, DB records) into the prompt at query time. Instead of asking the model to "know" your codebase, give it the relevant files — then ask your question.
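The retrieve-then-prompt flow can be sketched in a few lines. This example scores documents by simple keyword overlap, which is a stand-in for the embedding search a production RAG system would normally use; `SourceDoc` and the function names are hypothetical.

```typescript
// Hypothetical document shape for the retrieval step.
interface SourceDoc {
  id: string;
  text: string;
}

// Count how many query words appear in the document (a stand-in for vector similarity).
function scoreDoc(query: string, doc: SourceDoc): number {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 2);
  const haystack = doc.text.toLowerCase();
  let score = 0;
  for (const w of words) {
    if (haystack.indexOf(w) !== -1) {
      score++;
    }
  }
  return score;
}

// Return the topK documents with a non-zero score, best first.
function retrieve(query: string, docs: SourceDoc[], topK: number): SourceDoc[] {
  return docs
    .map((d) => ({ doc: d, score: scoreDoc(query, d) }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.doc);
}

// Inject the retrieved text into the prompt ahead of the question.
function buildRagPrompt(query: string, docs: SourceDoc[]): string {
  const context = docs.map((d) => "[" + d.id + "]\n" + d.text).join("\n\n");
  return "Answer using only the context below.\n\nContext:\n" + context + "\n\nQuestion: " + query;
}
```

Tagging each chunk with its `id` lets the model cite which file an answer came from.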

Temperature is a Quality Dial

Low temperature (0.1–0.3) gives more deterministic output for factual tasks such as SQL queries, code reviews, and data extraction. High temperature (0.7–0.9) suits creative writing, brainstorming, and generating diverse options. Always set it intentionally.
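Setting temperature intentionally can be as simple as a lookup keyed by task type, so no call falls back to a default by accident. The task names and exact values below are illustrative assumptions chosen from within the ranges above.

```typescript
// Hypothetical task categories for this sketch.
type TaskKind = "sql" | "code-review" | "data-extraction" | "brainstorming" | "creative-writing";

// Illustrative values: low for factual tasks, high for creative ones.
const TEMPERATURE_BY_TASK: Record<TaskKind, number> = {
  "sql": 0.2,
  "code-review": 0.2,
  "data-extraction": 0.1,
  "brainstorming": 0.8,
  "creative-writing": 0.9,
};

interface SamplingOptions {
  temperature: number;
}

// Every call site names its task, so the compiler rejects unknown task kinds.
function samplingFor(task: TaskKind): SamplingOptions {
  return { temperature: TEMPERATURE_BY_TASK[task] };
}
```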

Iterative Multi-Turn Refinement

Don't try to get the perfect answer in one giant prompt. Start broad, then refine: "Now add error handling." "Now make it idiomatic C#." "Now write the unit tests." Multi-turn produces better results than monolithic prompts.
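In code, multi-turn refinement amounts to appending each follow-up to the same conversation so the model sees its earlier answers. The sketch below takes the model call as an injected function; `CompleteFn` is a synchronous stand-in chosen to keep the example short, whereas a real SDK call would be asynchronous.

```typescript
// Hypothetical message shape; real SDKs define their own types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Stand-in for a model call (an assumption; real calls are asynchronous).
type CompleteFn = (messages: ChatMessage[]) => string;

// Start broad, then apply each follow-up against the growing conversation.
function refine(initialPrompt: string, followUps: string[], complete: CompleteFn): ChatMessage[] {
  const conversation: ChatMessage[] = [{ role: "user", content: initialPrompt }];
  conversation.push({ role: "assistant", content: complete(conversation) });
  for (const step of followUps) {
    conversation.push({ role: "user", content: step });
    conversation.push({ role: "assistant", content: complete(conversation) });
  }
  return conversation;
}
```

The last assistant message in the returned conversation is the refined result; the earlier turns remain available for auditing.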

Context Window is Prime Real Estate

Models tend to use information at the start and end of the context more reliably than material buried in the middle (primacy and recency effects), so put the most important information there. Trim irrelevant context aggressively. A focused 10K-token prompt beats a noisy 100K one.
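A small sketch of that layout: instructions first, the question last, and only sufficiently relevant context in between. The `relevance` score is assumed to come from some earlier ranking step, and all names here are hypothetical.

```typescript
// Hypothetical context chunk with a relevance score between 0 and 1.
interface ContextChunk {
  text: string;
  relevance: number;
}

// Drop weak chunks, order the rest best-first, and frame them with the
// instructions at the start and the question at the end.
function assembleContext(
  instructions: string,
  chunks: ContextChunk[],
  question: string,
  minRelevance: number
): string {
  const kept = chunks
    .filter((c) => c.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance);
  const middle = kept.map((c) => c.text).join("\n\n");
  return [instructions, middle, question].filter((part) => part.length > 0).join("\n\n");
}
```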

Protocol

MCP — Model Context Protocol

MCP is an open protocol developed by Anthropic that standardises how AI models connect to external tools, data sources, and services. Think of it as a USB-C for AI — one standard connector that works across different models and hosts.

Before MCP, every AI integration was bespoke — custom function definitions, vendor-specific APIs, brittle glue code. MCP defines a standard protocol where servers expose resources (data), tools (actions), and prompts (reusable prompt templates) that any MCP-compatible client can consume.

MCP Server

Exposes resources, tools, and prompt templates. Can be your codebase, a database, Azure APIs, GitHub, Jira, etc.

MCP Client

Claude Desktop, VS Code, custom apps. Discovers available tools from the server and invokes them on the model's behalf.
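Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages. The sketch below builds the two requests a client sends to discover tools (`tools/list`) and to invoke one (`tools/call`). The helper names and the `create_work_item` tool are hypothetical; in practice an MCP SDK would construct and transport these messages for you.

```typescript
// Minimal JSON-RPC 2.0 request shape used by this sketch.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { [key: string]: unknown };
}

// Ask the server which tools it exposes.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id: id, method: "tools/list" };
}

// Invoke a named tool with arguments on the model's behalf.
function callToolRequest(
  id: number,
  name: string,
  args: { [key: string]: unknown }
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}
```

For example, `callToolRequest(2, "create_work_item", { title: "Fix build" })` produces the message a client would send once the model decides to use that (hypothetical) tool.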

Impact

MCP & Event-Driven Architectures

MCP fundamentally changes how AI integrates with event-driven systems. An AI agent can now be a first-class participant in your event architecture — publishing and consuming events through MCP tool servers.

AI as an Event Consumer

An n8n workflow triggers on an Azure Service Bus message and invokes Claude via MCP to classify, enrich, or route the event, all without human intervention.
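The routing step of such a consumer might look like the sketch below, which maps a label returned by the model to a destination queue and sends anything unexpected to a fallback. The queue names, labels, and types are hypothetical, and the model call itself is left out.

```typescript
// Hypothetical shapes for an incoming event and its routing decision.
interface BusEvent {
  id: string;
  body: string;
}

interface RoutedEvent {
  eventId: string;
  label: string;
  queue: string;
}

// Hypothetical mapping from model labels to queue names.
const QUEUE_BY_LABEL: { [label: string]: string } = {
  billing: "billing-events",
  incident: "incident-alerts",
  support: "support-tickets",
};

const FALLBACK_QUEUE = "manual-triage";

// Never trust the model's label blindly: unknown labels go to a human queue.
function routeEvent(evt: BusEvent, modelLabel: string): RoutedEvent {
  const label = modelLabel.trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(QUEUE_BY_LABEL, label);
  const queue = known ? QUEUE_BY_LABEL[label] : undefined;
  return {
    eventId: evt.id,
    label: known ? label : "unknown",
    queue: typeof queue === "string" ? queue : FALLBACK_QUEUE,
  };
}
```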

AI as an Event Producer

Claude Code detects a failed deployment in a log file and autonomously publishes a structured incident event to your Service Bus, triggering downstream alerting workflows.

Custom Skills as MCP Prompts

Package reusable domain knowledge (e.g., "review this PR for OWASP Top 10 violations") as MCP prompt templates. Teams share and version these skills like they do code libraries.

AI in the CI/CD Feedback Loop

Custom MCP servers expose Azure DevOps APIs — AI agents can read pipeline failures, suggest fixes, create work items, and open PRs as part of an automated feedback loop.

Local AI

Running AI Locally with Ollama

Ollama makes running large language models locally simple. One command gives you local inference: no API keys, no data leaving your machine, no per-token costs.

When to Use Local AI

Sensitive codebases where IP must not leave the network · Air-gapped environments · High-volume tasks where API costs are prohibitive · Offline development · Privacy-first client requirements

bash

# Install Ollama (Linux install script)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3:70b
ollama pull codestral:22b
ollama pull phi3:mini

# Start the local API server (also exposes an OpenAI-compatible API under /v1)
ollama serve
# Available at http://localhost:11434

typescript

// Use with the OpenAI SDK by pointing it at the local server
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // the SDK requires a value; Ollama ignores it
});
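For callers that do not use an SDK, the same OpenAI-compatible endpoint accepts a plain JSON body. The sketch below only builds the request; the helper name and types are hypothetical, and sending it is left to whatever HTTP client you prefer.

```typescript
// Hypothetical message shape; real SDKs define their own types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OllamaChatRequest {
  url: string;
  body: {
    model: string;
    messages: ChatMessage[];
    temperature: number;
    stream: boolean;
  };
}

// Build a chat-completions request for a local Ollama server on the default port.
function buildOllamaChatRequest(model: string, prompt: string, temperature: number): OllamaChatRequest {
  return {
    url: "http://localhost:11434/v1/chat/completions",
    body: {
      model: model,
      messages: [{ role: "user", content: prompt }],
      temperature: temperature,
      stream: false,
    },
  };
}
```

POST `JSON.stringify(request.body)` to `request.url` with a `Content-Type: application/json` header, for example `buildOllamaChatRequest("codestral:22b", "Explain this function.", 0.2)`.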

Integration

n8n + Ollama: Local AI Pipelines

Combine n8n's workflow automation with Ollama's local inference to build powerful, private AI automation pipelines. No external API dependencies required.

Automated Code Review Pipeline

n8n triggers on GitHub PR webhook → extracts diff → sends to Ollama (codestral) → posts structured review comment back to GitHub PR.
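The two text-handling steps of this pipeline, turning a diff into a prompt and turning the model's reply into a PR comment, could be written as small pure functions (for instance inside a workflow code step). This is a sketch under assumptions: the prompt wording, the one-finding-per-line reply format, and the function names are all hypothetical.

```typescript
// Build the review prompt from a PR diff, capping its size to protect the context window.
function buildReviewPrompt(diff: string, maxChars: number): string {
  const clipped = diff.length > maxChars ? diff.slice(0, maxChars) + "\n[diff truncated]" : diff;
  return (
    "Review this diff for bugs and security issues. " +
    "Reply with one finding per line as `severity: message`.\n\n" +
    clipped
  );
}

// Turn the model's line-based reply into a markdown comment for the PR.
function formatReviewComment(modelReply: string): string {
  const lines = modelReply
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  if (lines.length === 0) {
    return "### AI review\n\nNo findings.";
  }
  return "### AI review\n\n" + lines.map((l) => "- " + l).join("\n");
}
```

Keeping these steps free of network calls makes them easy to unit-test separately from the webhook and the model.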

Documentation Generator

n8n monitors new .NET files in a repo → Ollama generates XML doc comments and README sections → creates PR with documentation additions.

Security Triage Assistant

n8n receives Snyk vulnerability webhooks → Ollama classifies severity and suggests fixes → creates Azure DevOps work items with remediation steps.