Learning Paths
Structured learning journeys for AI engineers. Each path is designed to take you from foundational concepts to production-ready implementations. Follow the steps in order, or jump to where you need to be today.
Path 1: Foundations
Understand tokens, context limits, probability, and failure modes.
- 1 How LLMs Actually Work: Tokens, Context, and Probability
A production-minded explanation of what LLMs actually do under the hood—and why tokens, context windows, and probability matter for cost, latency, and reliability.
- 2 Prompting Is Not Magic: What Really Changes the Output
Prompting does not make models smarter or more truthful. This article explains what prompts actually change under the hood, why small edits cause big differences, and how engineers should think about prompting in production systems.
- 3 Why Models Hallucinate (And Why That's Expected)
Hallucination is not a bug in large language models but a predictable outcome of probabilistic text generation. This article explains why hallucinations happen, when they become more likely, and how engineers should design around them.
- 4 Choosing the Right Model for the Job
There is no universally best AI model. This article presents a production-minded approach to model selection, focusing on trade-offs, system requirements, and strategies for switching and fallback.
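As a taste of the cost math this path covers, here is a rough sketch of estimating per-request cost from token counts. The prices and the 4-characters-per-token heuristic are illustrative assumptions, not real vendor rates:

```python
# Rough per-request cost estimate from token counts.
# Prices below are illustrative placeholders, not real vendor rates.

def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float = 0.0005,
                  output_price_per_1k: float = 0.0015) -> float:
    """Return the estimated dollar cost of a single model call."""
    return ((prompt_tokens / 1000) * input_price_per_1k
            + (completion_tokens / 1000) * output_price_per_1k)

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)
```

Even a crude estimator like this makes context-window and budget trade-offs concrete before you ever call a real tokenizer.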
Path 2: Prompting for Production
Make prompts stable, testable, and safe to integrate with systems.
- 1 Prompt Structure Patterns for Production
Prompts used in production must behave like interfaces, not ad hoc text. This article introduces proven prompt structure patterns that improve reliability, debuggability, and long-term maintainability.
- 2 Output Control with JSON and Schemas
Free-form AI output is fragile in production. This article explains how to use JSON and schema validation to make LLM outputs safer, more predictable, and easier to integrate with deterministic systems.
- 3 Debugging Bad Prompts Systematically
When AI outputs fail, random prompt tweaking is not debugging. This article presents a systematic methodology for identifying, reproducing, and fixing prompt-related failures in production systems.
- 4 Prompt Anti-patterns Engineers Fall Into
Many prompt failures come from familiar engineering anti-patterns applied to natural language. This article identifies the most common prompt anti-patterns and explains why they break down in production.
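The output-control ideas in this path can be sketched with nothing but the standard library. The field names here are illustrative; a real system would typically use a schema library such as `jsonschema` or `pydantic`:

```python
import json

# Expected shape of the model's JSON output. Field names are
# illustrative, not a real schema.
EXPECTED_FIELDS = {"sentiment": str, "confidence": float}

def parse_model_output(raw: str) -> dict:
    """Parse and validate a JSON response; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for {field}")
    return data
```

Failing loudly at the parse boundary keeps malformed model output from leaking into deterministic downstream code.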
Path 3: RAG Systems
Build grounded AI with retrieval, ranking, and measurable quality.
- 1 Why RAG Exists (And When Not to Use It)
RAG is not a universal fix for AI correctness. This article explains the real problem RAG addresses, its hidden costs, and how to decide whether retrieval is justified for a given system.
- 2 Chunking Strategies That Actually Work
Effective chunking is an information architecture problem, not a text-splitting task. This article covers practical chunking strategies that improve retrieval accuracy in real-world RAG systems.
- 3 Retrieval Is the Hard Part
Most RAG failures stem from poor retrieval, not weak models. This article explains why retrieval is difficult, how to improve it, and how to debug retrieval failures systematically.
- 4 Evaluating RAG Quality: Precision, Recall, and Faithfulness
Without evaluation, RAG systems cannot improve reliably. This article introduces practical metrics and evaluation strategies for measuring retrieval accuracy, answer grounding, and regression over time.
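The simplest chunking baseline from this path, fixed-size windows with overlap, fits in a few lines. Character-based splitting is only a starting point; production systems usually chunk on sentence or section boundaries:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Overlap keeps context that straddles a boundary retrievable
    from at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```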
Path 4: Evaluation & Quality
Define launch gates and metrics so AI changes are testable, comparable, and safe to ship.
- 1 How to Evaluate an LLM Feature Before Launch (A Practical Pass/Fail Workflow)
A practical pre-launch workflow for evaluating LLM-powered features with pass/fail criteria, scoped test sets, and regression checks before rollout.
- 2 LLM Evaluation Metrics That Actually Matter (Task Success, Groundedness, Calibration)
Defines the core evaluation metrics that matter in production LLM systems and shows when each metric is useful, misleading, or incomplete.
- 3 Building a Test Set for LLM Features (Golden Cases, Edge Cases, Failure Buckets)
A practical guide to constructing reusable LLM test sets with golden cases, edge cases, and failure buckets that support regression testing.
- 4 Build an Eval Harness for Prompt and RAG Changes (Without Overengineering It)
Shows how to build a lightweight evaluation harness for prompt and RAG changes so teams can compare revisions without slowing down shipping.
- 5 Playbook: Building an Evaluation Pipeline for Prompt + RAG Changes
A step-by-step playbook for building an evaluation pipeline that catches regressions in prompt and RAG changes before production rollout.
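The core of an eval harness like the ones described above is a pass/fail loop over golden cases. Here `run_feature` is a stub standing in for whatever calls your model, and the substring check is the simplest possible grader:

```python
# Golden cases: inputs paired with a substring the output must contain.
GOLDEN_CASES = [
    {"input": "2 + 2", "must_contain": "4"},
    {"input": "capital of France", "must_contain": "Paris"},
]

def run_feature(prompt: str) -> str:
    # Placeholder for the real model call, so this sketch runs as-is.
    canned = {"2 + 2": "The answer is 4.",
              "capital of France": "Paris is the capital."}
    return canned.get(prompt, "")

def run_evals(cases) -> tuple[int, int]:
    """Return (passed, failed); a case passes if its required
    substring appears in the feature's output."""
    passed = failed = 0
    for case in cases:
        output = run_feature(case["input"])
        if case["must_contain"] in output:
            passed += 1
        else:
            failed += 1
    return passed, failed
```

Running this before and after every prompt or RAG change turns "it seems fine" into a comparable number.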
Path 5: LLMOps & Deployment
Instrument, roll out, and operate AI features with production-safe observability and deployment controls.
- 1 Observability for AI Systems: What to Log, Trace, and Alert On
A production-focused observability framework for AI systems covering logs, traces, metrics, alerts, and debugging workflows.
- 2 Production Rollouts for AI Features (Shadow Mode, Canary, Guardrails, Rollback)
A rollout strategy for AI features using shadow mode, canaries, guardrails, and rollback plans to reduce production risk.
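Canary routing as described above usually starts with deterministic user bucketing, a minimal sketch of which looks like this (the percentage-based bucketing scheme is an assumption, not a prescribed design):

```python
import hashlib

def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a user into the canary cohort.

    Hashing the user id keeps each user's assignment stable across
    requests, so the same user always sees the same model version.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent
```

Stable assignment matters more for AI features than for most flags: flapping between model versions mid-conversation produces confusing, hard-to-debug behavior.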
Path 6: Security & Safety
Harden AI systems against prompt injection, unsafe tool execution, and sensitive-data handling failures.
- 1 Prompt Injection and Retrieval Poisoning: Practical Defenses for Production Systems
Practical defenses against prompt injection and retrieval poisoning, with engineering patterns for containment, validation, and incident response.
- 2 PII and Sensitive Data in LLM Apps (Redaction, Storage Boundaries, Access Controls)
A practical guide to handling PII and sensitive data in LLM applications, including redaction strategies, storage boundaries, and access controls.
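A first cut at the redaction boundary this path describes can be regex-based. These two patterns are illustrative only; real systems need far broader coverage (names, addresses, national IDs) and usually a dedicated PII-detection service:

```python
import re

# Illustrative redaction patterns, not production-grade coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII spans with typed placeholders before the
    text is sent to a model or written to logs."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Redacting before the model call, rather than after, keeps sensitive values out of provider logs as well as your own.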
Path 7: Cost & Performance
Control unit economics and response speed without breaking reliability or product quality.
- 1 Cost Control Patterns for LLM Apps (Routing, Caching, Truncation, Fallbacks)
Proven cost-control patterns for LLM applications, including routing, caching, truncation, and fallback strategies that preserve quality.
- 2 Latency Budgeting for AI Features (Where the Time Goes and How to Cut It)
A latency budgeting framework for AI features that breaks down where time goes across model, retrieval, and orchestration layers.
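Caching is the cheapest of the cost-control patterns listed above. A minimal sketch keyed on a hash of model name plus prompt, assuming a process-local dict (real deployments would add TTLs and a shared store such as Redis):

```python
import hashlib

# Process-local response cache; a sketch, not a production design.
_cache: dict[str, str] = {}

def cached_call(prompt: str, model: str, call_fn) -> str:
    """Return a cached response if one exists, else call the model.

    `call_fn` stands in for whatever invokes your model provider.
    """
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(prompt)
    return _cache[key]
```

Exact-match caching only pays off for repeated prompts (FAQ-style traffic, retried jobs); semantic caching is a separate, fuzzier trade-off.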
Path 8: Agents & Workflows
Choose the right level of autonomy and constrain tool-using systems for production use.
- 1 Agents vs Workflows: A Decision Framework for Engineers (Use Cases, Failure Modes, Escalation Paths)
A decision framework for when to use agents vs deterministic workflows, with failure modes and escalation paths for production systems.
Custom Learning Path
Don't need to follow the full path? Jump directly to what you're building today:
- Building a chatbot? Start with Prompt Structure Patterns
- Need structured data? Go to Output Control with JSON
- Building a RAG system? Begin at Why RAG Exists
- Debugging prompts? See Debugging Bad Prompts