AI Engineering
How I use AI as a force multiplier in full-stack engineering — from the tools I run daily to how MCP and custom skills are reshaping event-driven architectures and modern software delivery.
The 5 Layers of AI Engineering
AI engineering is not just "call an API and get a response." It's a layered discipline — from foundation models and inference, through orchestration and integration, to the end-user application. Understanding each layer is key to building robust, production-grade AI systems.
Think of AI engineering like a TCP/IP stack. Each layer has a clear responsibility and abstraction boundary. You build on top of lower layers without needing to understand their internals — until something breaks.
User-Facing AI Applications
Chat interfaces, AI copilots, autonomous agents, coding assistants. This is what users interact with. Blazor, Next.js, React Native apps calling the orchestration layer.
Tool Use & External Systems
MCP servers, function calling, webhooks. AI connects to databases, APIs, code editors, file systems. This is where N8N workflows and custom skills live.
Prompt Pipelines & Agent Loops
Chain-of-thought, RAG pipelines, multi-agent coordination, memory management. Tools: LangChain, Semantic Kernel, custom orchestrators, N8N.
Model Access & Context Management
Claude API, OpenAI API, Ollama (local inference). This layer manages context windows, token budgets, temperature, and streaming, along with rate limiting and cost optimisation.
Models & Training
Claude Sonnet/Opus, GPT-4o, Codex, Llama 3, Mistral, Phi-3. Pre-trained foundation models. Fine-tuning and RLHF happen here — mostly handled by the model providers.
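To make the layer boundaries concrete, here is a minimal TypeScript sketch of layers 2 to 4 cooperating in an agent loop. The `ModelClient` interface, the `ModelReply` shape, the `get_build_status` tool, and the scripted stub model are all hypothetical stand-ins. A real model-access layer would call the Claude, OpenAI, or Ollama API instead of the stub.

```typescript
// Layer 2 (model access): a hypothetical, provider-agnostic client interface.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, string> };

interface ModelClient {
  complete(history: Message[]): ModelReply;
}

// Layer 4 (tools): named functions the agent is allowed to call.
type Tool = (args: Record<string, string>) => string;

// Layer 3 (orchestration): loop until the model answers in plain text
// or the step budget runs out.
function runAgent(model: ModelClient, tools: Map<string, Tool>, question: string, maxSteps = 5): string {
  const history: Message[] = [{ role: "user", content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model.complete(history);
    if (reply.kind === "text") return reply.text;
    const tool = tools.get(reply.tool);
    const result = tool ? tool(reply.args) : `Unknown tool: ${reply.tool}`;
    history.push({ role: "assistant", content: `call ${reply.tool}` });
    history.push({ role: "tool", content: result });
  }
  return "Stopped: step budget exhausted";
}

// Hypothetical stub model: requests a tool once, then answers using its result.
const stubModel: ModelClient = {
  complete(history) {
    const last = history[history.length - 1];
    if (last.role === "tool") return { kind: "text", text: `Build status: ${last.content}` };
    return { kind: "tool_call", tool: "get_build_status", args: { pipeline: "main" } };
  },
};

// Hypothetical tool returning canned data.
const tools = new Map<string, Tool>([
  ["get_build_status", (args: Record<string, string>) => `${args.pipeline} is green`],
]);

console.log(runAgent(stubModel, tools, "Is the main pipeline healthy?"));
```

The step budget matters in practice: without it, a model that keeps requesting tools would loop forever.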
Current AI Workflow Stack
The tools I use daily as a full-stack engineer to move faster, write better code, and automate the repetitive parts of the job.
Claude Code
Anthropic's agentic coding tool. Runs in the terminal, reads and edits files across your project, executes commands (subject to configurable permission prompts), and iterates on complex multi-file refactors autonomously. My go-to for architecture-level coding sessions.
ChatGPT Codex
OpenAI's cloud-based coding agent. Excels at writing boilerplate, generating tests, and explaining complex algorithms. Works directly with GitHub repositories and is also available in the terminal (Codex CLI) and in VS Code.
Antigravity
AI-powered development acceleration platform. Integrates directly into the development workflow to provide context-aware suggestions, automated documentation, and intelligent code review at the PR level.
N8N
Open-source workflow automation. I use N8N to orchestrate AI pipelines — connecting LLMs to databases, webhooks, Slack, GitHub, Azure services, and custom HTTP endpoints. Visual workflow builder meets code flexibility.
Ollama
Run large language models locally, including Llama 3, Mistral, Phi-3, Codestral and more. Zero API costs, complete data privacy, and no internet dependency once models are downloaded. Essential for air-gapped environments and sensitive codebases.
MCP Servers
Model Context Protocol servers expose resources, tools, and prompts to AI models. I build and run custom MCP servers for GitHub, Azure APIs, databases, and internal business tools — making AI genuinely useful in real workflows.
Prompt Engineering: 10 Layers Deep
Prompt engineering is a skill, not a trick. These layers build on each other — master them in order for compounding returns on AI output quality.
System Prompts are Architecture
Define the model's persona, constraints, and output format in the system prompt. Treat it as your API contract — version-control it alongside your code. A weak system prompt produces unpredictable outputs.
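A minimal sketch of treating the system prompt as a versioned contract kept in source control. The `SystemPrompt` shape, the version string, and the prompt wording are illustrative assumptions, not a fixed format.

```typescript
// A system prompt stored as code, with an explicit version so outputs can be traced.
interface SystemPrompt {
  version: string;
  text: string;
}

// Illustrative prompt; bump the version whenever the wording changes.
const CODE_REVIEW_PROMPT: SystemPrompt = {
  version: "2.1.0",
  text: [
    "You are a senior C# architect reviewing a pull request.",
    "Constraints: comment only on correctness, security, and performance.",
    "Output format: a markdown list, one finding per item, each starting with a severity tag.",
  ].join("\n"),
};

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Build a chat-style message list and carry the prompt version alongside it,
// so logged responses can be matched to the exact prompt that produced them.
function buildMessages(prompt: SystemPrompt, userInput: string): { messages: ChatMessage[]; promptVersion: string } {
  return {
    messages: [
      { role: "system", content: prompt.text },
      { role: "user", content: userInput },
    ],
    promptVersion: prompt.version,
  };
}

const request = buildMessages(CODE_REVIEW_PROMPT, "Please review the attached diff.");
console.log(request.promptVersion, request.messages.length);
```

Some APIs, such as Anthropic's Messages API, take the system prompt as a separate top-level `system` field rather than as a message, so the final request shape depends on the provider.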
Role Priming Sets the Expert Frame
"You are a senior C# architect reviewing a pull request for security vulnerabilities..." immediately elevates response quality. The model adopts the context and expertise level you specify.
Few-Shot Examples Beat Long Instructions
Provide 3–5 input/output examples of exactly the format you want. The model learns the pattern faster from examples than from lengthy prose instructions. Especially powerful for code generation.
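One common way to supply few-shot examples to a chat API is to replay them as earlier user/assistant turns before the real input. This sketch assumes a generic chat message shape; the snake_case task and the example pairs are illustrative.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
interface Example { input: string; output: string }

// Interleave worked examples as prior turns so the model sees the exact
// input/output pattern before it sees the real input.
function fewShotMessages(instruction: string, examples: Example[], input: string): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "system", content: instruction }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.input });
    messages.push({ role: "assistant", content: ex.output });
  }
  messages.push({ role: "user", content: input });
  return messages;
}

// Illustrative examples of the desired format.
const examples: Example[] = [
  { input: "GetUserById", output: "get_user_by_id" },
  { input: "ParseHttpResponse", output: "parse_http_response" },
  { input: "SaveOrder", output: "save_order" },
];

const msgs = fewShotMessages(
  "Convert C# method names to snake_case. Reply with the name only.",
  examples,
  "LoadCustomerInvoices",
);
console.log(msgs.length); // 8: 1 system + 3 example pairs + 1 real input
```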
Chain-of-Thought for Complex Reasoning
Add "Think step by step before answering" or "Show your reasoning" for tasks requiring logic, debugging, or architecture decisions. CoT often reduces reasoning errors noticeably on these kinds of tasks.
Output Format Contracts
Specify exact output structure: JSON schema, markdown headings, code blocks with language tags. Models that know the expected format produce far more consistent, parseable outputs — essential for automated pipelines.
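In an automated pipeline, the format contract is only useful if you enforce it on the way back. A minimal sketch, assuming the model was asked for a JSON array of review findings; the `ReviewFinding` schema is a hypothetical example, not a standard.

```typescript
// The contract the model is asked to follow (illustrative schema).
interface ReviewFinding {
  file: string;
  line: number;
  severity: "low" | "medium" | "high";
  message: string;
}

const SEVERITIES = new Set<string>(["low", "medium", "high"]);

// Runtime type guard: checks every field the contract promises.
function isFinding(value: unknown): value is ReviewFinding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.file === "string" &&
    typeof v.line === "number" && Number.isInteger(v.line) &&
    typeof v.severity === "string" && SEVERITIES.has(v.severity) &&
    typeof v.message === "string"
  );
}

// Parse raw model output and reject anything that breaks the contract,
// so the pipeline can retry or fail loudly instead of passing bad data on.
function parseFindings(raw: string): ReviewFinding[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (!Array.isArray(data) || !data.every(isFinding)) {
    throw new Error("Model output does not match the ReviewFinding[] contract");
  }
  return data;
}
```

A failed parse is a natural trigger for a retry that feeds the error message back to the model.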
Negative Constraints Reduce Hallucination
Explicitly say what NOT to do: "Do not add comments unless they explain non-obvious logic." "Do not use deprecated APIs." "If unsure, say so — do not guess." Negative constraints prevent common failure modes.
RAG Grounds Responses in Reality
Retrieval-Augmented Generation injects relevant context (docs, code, DB records) into the prompt at query time. Instead of asking the model to "know" your codebase, give it the relevant files, then ask your question.
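A deliberately small sketch of the retrieve-then-prompt flow. To keep it self-contained it swaps in plain keyword overlap for retrieval; production RAG pipelines typically use embedding similarity instead. The document set and prompt wording are hypothetical.

```typescript
interface Doc { id: string; text: string }

// Lowercase word tokens: a crude stand-in for an embedding model.
function tokens(s: string): Set<string> {
  return new Set<string>(s.toLowerCase().match(/[a-z0-9_]+/g) ?? []);
}

// Score each document by how many query words it shares, keep the top k.
function retrieve(query: string, docs: Doc[], k: number): Doc[] {
  const q = tokens(query);
  return docs
    .map((doc) => {
      let score = 0;
      tokens(doc.text).forEach((t) => {
        if (q.has(t)) score++;
      });
      return { doc, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.doc);
}

// Inject the retrieved context ahead of the question.
function buildRagPrompt(query: string, docs: Doc[]): string {
  const context = docs.map((d) => `[${d.id}]\n${d.text}`).join("\n\n");
  return `Answer using only the context below. If the answer is not there, say so.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}

// Hypothetical mini knowledge base.
const docs: Doc[] = [
  { id: "OrderService.cs", text: "OrderService validates the order and publishes an OrderCreated event" },
  { id: "README.md", text: "How to run the project locally" },
  { id: "InvoiceService.cs", text: "InvoiceService creates an invoice when an order is paid" },
];
const question = "Which service publishes the OrderCreated event?";
console.log(buildRagPrompt(question, retrieve(question, docs, 2)));
```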
Temperature is a Quality Dial
Low temperature (0.1–0.3): deterministic, factual tasks — SQL queries, code reviews, data extraction. High temperature (0.7–0.9): creative writing, brainstorming, generating diverse options. Always set it intentionally.
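Setting temperature intentionally is easiest when it lives in one config map rather than being scattered across call sites. The task names and exact values below are illustrative picks from the ranges above. Valid ranges differ by provider: Anthropic's API accepts 0 to 1, OpenAI's accepts 0 to 2.

```typescript
// Illustrative task categories mapped to the ranges described above.
type TaskKind = "sql" | "code_review" | "extraction" | "brainstorm" | "creative";

const TEMPERATURE: Record<TaskKind, number> = {
  sql: 0.1,
  code_review: 0.2,
  extraction: 0.1,
  brainstorm: 0.8,
  creative: 0.9,
};

// Every request gets an explicit temperature; the model name is a placeholder.
function requestOptions(task: TaskKind, model: string): { model: string; temperature: number } {
  return { model, temperature: TEMPERATURE[task] };
}

console.log(requestOptions("code_review", "your-model-name"));
```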
Iterative Multi-Turn Refinement
Don't try to get the perfect answer in one giant prompt. Start broad, then refine: "Now add error handling." "Now make it idiomatic C#." "Now write the unit tests." Multi-turn refinement usually produces better results than one monolithic prompt.
Context Window is Prime Real Estate
Models tend to use information at the start and end of a long context more reliably than information buried in the middle (the "lost in the middle" effect), so put the most important material there. Trim irrelevant context aggressively: a focused 10K-token prompt usually beats a noisy 100K one.
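A minimal sketch of both ideas together: keep the highest-priority chunks that fit a token budget, then order them so the most important sit at the edges, since models tend to attend to the start and end of a long context more reliably than the middle. The 4-characters-per-token estimate is a rough heuristic assumption; real budgets should use the provider's tokenizer. The `Chunk` shape and priority scores are hypothetical.

```typescript
interface Chunk { text: string; priority: number } // higher = more important

// Rough heuristic: about 4 characters per token for English text.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function packContext(chunks: Chunk[], budgetTokens: number): Chunk[] {
  // 1. Greedily keep the most important chunks that still fit the budget.
  //    `continue` (not `break`) lets smaller, lower-priority chunks fill leftover space.
  const sorted = [...chunks].sort((a, b) => b.priority - a.priority);
  const kept: Chunk[] = [];
  let used = 0;
  for (const c of sorted) {
    const cost = estimateTokens(c.text);
    if (used + cost > budgetTokens) continue;
    kept.push(c);
    used += cost;
  }
  // 2. Alternate placement: 1st to the front, 2nd to the back, 3rd to the front, ...
  //    so the least important chunks end up in the middle.
  const front: Chunk[] = [];
  const back: Chunk[] = [];
  kept.forEach((c, i) => {
    if (i % 2 === 0) front.push(c);
    else back.unshift(c);
  });
  return [...front, ...back];
}
```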
MCP — Model Context Protocol
MCP is an open protocol developed by Anthropic that standardises how AI models connect to external tools, data sources, and services. Think of it as a USB-C for AI — one standard connector that works across different models and hosts.
Before MCP, every AI integration was bespoke — custom function definitions, vendor-specific APIs, brittle glue code. MCP defines a standard protocol where servers expose resources (data), tools (actions), and prompts (reusable prompt templates) that any MCP-compatible client can consume.
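MCP messages are JSON-RPC 2.0, and tool discovery and invocation use the `tools/list` and `tools/call` methods. The sketch below is a simplified, in-memory view of the server side only: it leaves out the transport (stdio or HTTP), the `initialize` handshake, and capability negotiation, which an official MCP SDK would handle for you. The `count_open_prs` tool and its stub data are hypothetical.

```typescript
// Simplified JSON-RPC 2.0 shapes as used by MCP.
interface JsonRpcRequest { jsonrpc: "2.0"; id: number; method: string; params?: Record<string, unknown> }
interface JsonRpcResponse { jsonrpc: "2.0"; id: number; result?: unknown; error?: { code: number; message: string } }

interface ToolDef {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the tool's arguments
  handler: (args: Record<string, unknown>) => string;
}

// Hypothetical tool; a real one would call the GitHub API.
const tools: ToolDef[] = [
  {
    name: "count_open_prs",
    description: "Count open pull requests in a repository",
    inputSchema: { type: "object", properties: { repo: { type: "string" } }, required: ["repo"] },
    handler: (args) => `${String(args.repo)} has 3 open PRs (stub data)`,
  },
];

function handle(req: JsonRpcRequest): JsonRpcResponse {
  if (req.method === "tools/list") {
    // Advertise tools without exposing the handler functions.
    const list = tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema }));
    return { jsonrpc: "2.0", id: req.id, result: { tools: list } };
  }
  if (req.method === "tools/call") {
    const name = req.params?.name;
    const tool = tools.find((t) => t.name === name);
    if (!tool) return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: `Unknown tool: ${String(name)}` } };
    const args = (req.params?.arguments ?? {}) as Record<string, unknown>;
    return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text: tool.handler(args) }] } };
  }
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}

console.log(JSON.stringify(handle({ jsonrpc: "2.0", id: 1, method: "tools/list" })));
```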
MCP Server
Exposes resources, tools, and prompt templates. Can be your codebase, a database, Azure APIs, GitHub, Jira, etc.
MCP Client
Runs inside a host application such as Claude Desktop, VS Code, or a custom app. It connects to a server, discovers the tools it offers, and invokes them when the model requests a tool call.
MCP & Event-Driven Architectures
MCP fundamentally changes how AI integrates with event-driven systems. An AI agent can now be a first-class participant in your event architecture — publishing and consuming events through MCP tool servers.
AI as an Event Consumer
An N8N workflow triggers on an Azure Service Bus message and passes it to Claude, which classifies, enriches, or routes the event using tools exposed through MCP servers, all without human intervention.
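The core of the consumer step is small: classify the event, validate the label, and pick a destination. This sketch abstracts away both Azure Service Bus and the model call. The `IncomingEvent` shape, the category set, the queue names, and the keyword-based stub classifier are all hypothetical.

```typescript
// Simplified stand-in for a Service Bus message; field names are assumptions.
interface IncomingEvent { messageId: string; body: { subject: string; text: string } }

type Category = "billing" | "outage" | "general";

// Stand-in for the model call: in practice this would send the event text to
// Claude (or a local Ollama model) and ask for exactly one category label.
type Classifier = (text: string) => string;

// Hypothetical destination queues.
const ROUTES: Record<Category, string> = {
  billing: "billing-queue",
  outage: "incident-queue",
  general: "triage-queue",
};

function isCategory(s: string): s is Category {
  return s === "billing" || s === "outage" || s === "general";
}

// Never trust the raw label: anything outside the allowed set falls back to human triage.
function routeEvent(evt: IncomingEvent, classify: Classifier): { messageId: string; category: Category; queue: string } {
  const label = classify(`${evt.body.subject}\n${evt.body.text}`).trim().toLowerCase();
  const category: Category = isCategory(label) ? label : "general";
  return { messageId: evt.messageId, category, queue: ROUTES[category] };
}

// Hypothetical keyword stub standing in for the model.
const stubClassifier: Classifier = (text) => (/down|500|timeout/i.test(text) ? "Outage" : "unsure");

console.log(routeEvent({ messageId: "m1", body: { subject: "API down", text: "502s everywhere" } }, stubClassifier));
```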
AI as an Event Producer
Claude Code detects a failed deployment in a log file and, through an MCP tool server, publishes a structured incident event to your Service Bus, triggering downstream alerting workflows.
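What makes AI-produced events safe to consume downstream is a fixed, structured schema. A minimal sketch of shaping log evidence into an incident event; the `IncidentEvent` fields and the `deployment.failed` type name are hypothetical, and the actual publish step (an MCP tool sending the JSON to Service Bus) is left out.

```typescript
// Hypothetical incident event schema; field names are illustrative, not a standard.
interface IncidentEvent {
  type: "deployment.failed";
  service: string;
  environment: string;
  detectedAt: string; // ISO 8601 timestamp
  evidence: string[];
}

// Pull failure lines out of a deployment log and shape them into an event.
// Returns null when no failure is found, so nothing gets published.
function buildIncident(log: string, service: string, environment: string, now: Date): IncidentEvent | null {
  const evidence = log
    .split("\n")
    .filter((line) => /\b(ERROR|FAILED)\b/.test(line))
    .slice(0, 5); // cap the evidence so the event stays small
  if (evidence.length === 0) return null;
  return { type: "deployment.failed", service, environment, detectedAt: now.toISOString(), evidence };
}

// A hypothetical MCP tool (e.g. "publish_event") would send JSON.stringify(event)
// to the Service Bus topic; here we only print it.
const event = buildIncident("ERROR: health check failed after deploy", "orders-api", "production", new Date());
if (event) console.log(JSON.stringify(event));
```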
Custom Skills as MCP Resources
Package reusable domain knowledge (e.g., "review this PR for OWASP Top 10 violations") as MCP prompt templates. Teams share and version these skills like they do code libraries.
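In MCP, prompt templates are advertised through `prompts/list` (name, description, declared arguments) and rendered through `prompts/get`, which returns a list of messages. This is a simplified sketch of the server-side rendering for the OWASP example; the `owasp_pr_review` name, its arguments, and the prompt wording are hypothetical.

```typescript
// Simplified MCP prompt shapes.
interface PromptArgument { name: string; description: string; required: boolean }
interface PromptDef { name: string; description: string; arguments: PromptArgument[] }
interface PromptMessage { role: "user" | "assistant"; content: { type: "text"; text: string } }

// A hypothetical shared "skill", versioned in the same repo as the server code.
const owaspReview: PromptDef = {
  name: "owasp_pr_review",
  description: "Review a pull request diff for OWASP Top 10 issues",
  arguments: [
    { name: "diff", description: "Unified diff of the pull request", required: true },
    { name: "language", description: "Primary language of the change", required: false },
  ],
};

// Roughly what a server might return for prompts/get with the given arguments.
function getPrompt(def: PromptDef, args: Record<string, string>): { description: string; messages: PromptMessage[] } {
  for (const a of def.arguments) {
    if (a.required && !(a.name in args)) throw new Error(`Missing required argument: ${a.name}`);
  }
  const lang = args.language ?? "unspecified";
  const text =
    `Review this ${lang} diff for OWASP Top 10 vulnerabilities. ` +
    `For each finding, name the category, quote the line, and suggest a fix.\n\n${args.diff}`;
  return { description: def.description, messages: [{ role: "user", content: { type: "text", text } }] };
}
```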
AI in the CI/CD Feedback Loop
Custom MCP servers expose Azure DevOps APIs — AI agents can read pipeline failures, suggest fixes, create work items, and open PRs as part of an automated feedback loop.
Running AI Locally with Ollama
Ollama makes running large language models locally simple: pull a model with one command and get local inference with no API keys, no data leaving your machine, and no per-token costs.
Sensitive codebases where IP must not leave the network · Air-gapped environments · High-volume tasks where API costs are prohibitive · Offline development · Privacy-first client requirements
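```shell
# Install Ollama (Linux install script; macOS and Windows use the desktop installer)
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3:70b
ollama pull codestral:22b
ollama pull phi3:mini

# Start the local API server (skip if the installer already runs it as a service)
# Listens on http://localhost:11434; the OpenAI-compatible API is under /v1
ollama serve
```

```typescript
// Point the official OpenAI SDK at the local Ollama server
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required by the SDK, ignored by Ollama
});
```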
N8N + Ollama: Local AI Pipelines
Combine N8N's workflow automation with Ollama's local inference to build private AI automation pipelines. No external LLM API is required.
Automated Code Review Pipeline
N8N triggers on a GitHub PR webhook → extracts the diff → sends it to Ollama (Codestral) → posts a structured review comment back to the PR.
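The "sends to Ollama" step is a POST to Ollama's native `/api/chat` endpoint; with `stream: false`, the reply arrives as one JSON object whose `message.content` holds the text. This sketch builds that request and reads the reply. The character cap, the prompt wording, and the injected `post` function are assumptions; injecting `post` keeps the logic testable without a running server.

```typescript
interface OllamaChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  stream: false;
  options: { temperature: number };
}

const MAX_DIFF_CHARS = 12_000; // hypothetical cap to keep the prompt inside the context window

function buildReviewRequest(diff: string, model = "codestral:22b"): OllamaChatRequest {
  const trimmed = diff.length > MAX_DIFF_CHARS ? diff.slice(0, MAX_DIFF_CHARS) + "\n[diff truncated]" : diff;
  return {
    model,
    stream: false, // one JSON response instead of a stream of chunks
    options: { temperature: 0.2 }, // low temperature for review work
    messages: [
      { role: "system", content: "You are a code reviewer. Reply in markdown with sections: Bugs, Security, Style." },
      { role: "user", content: `Review this pull request diff:\n\n${trimmed}` },
    ],
  };
}

// Any function that POSTs JSON and returns the parsed response body.
// In Node 18+ this could wrap fetch:
//   (url, body) => fetch(url, { method: "POST", headers: { "Content-Type": "application/json" },
//                               body: JSON.stringify(body) }).then((r) => r.json())
type PostJson = (url: string, body: unknown) => Promise<unknown>;

async function reviewDiff(diff: string, post: PostJson): Promise<string> {
  const data = (await post("http://localhost:11434/api/chat", buildReviewRequest(diff))) as {
    message?: { content?: string };
  };
  const content = data.message?.content;
  if (typeof content !== "string") throw new Error("Unexpected response from Ollama");
  return content;
}
```

In N8N itself, the same request body can go in an HTTP Request node, with the returned text posted back to the PR in a later step.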
Documentation Generator
N8N monitors a repo for new .NET source files → Ollama generates XML doc comments and README sections → N8N opens a PR with the documentation additions.
Security Triage Assistant
N8N receives Snyk vulnerability webhooks → Ollama classifies severity and suggests fixes → N8N creates Azure DevOps work items with remediation steps.