Learning Paths

Structured learning journeys for AI engineers. Each path is designed to take you from foundational concepts to production-ready implementations. Follow the steps in order, or jump to where you need to be today.

Path 1: Foundations

4 articles • Estimated time: 60 minutes

Understand tokens, context limits, probability, and failure modes.

  1. How LLMs Actually Work: Tokens, Context, and Probability

    A production-minded explanation of what LLMs actually do under the hood—and why tokens, context windows, and probability matter for cost, latency, and reliability.

  2. Prompting Is Not Magic: What Really Changes the Output

    Prompting does not make models smarter or more truthful. This article explains what prompts actually change under the hood, why small edits cause big differences, and how engineers should think about prompting in production systems.

  3. Why Models Hallucinate (And Why That's Expected)

    Hallucination is not a bug in large language models but a predictable outcome of probabilistic text generation. This article explains why hallucinations happen, when they become more likely, and how engineers should design around them.

  4. Choosing the Right Model for the Job

    There is no universally best AI model. This article presents a production-minded approach to model selection, focusing on trade-offs, system requirements, and strategies for switching and fallback.

Path 2: Prompting for Production

4 articles • Estimated time: 60 minutes

Make prompts stable, testable, and safe to integrate with systems.

  1. Prompt Structure Patterns for Production

    Prompts used in production must behave like interfaces, not ad hoc text. This article introduces proven prompt structure patterns that improve reliability, debuggability, and long-term maintainability.

  2. Output Control with JSON and Schemas

    Free-form AI output is fragile in production. This article explains how to use JSON and schema validation to make LLM outputs safer, more predictable, and easier to integrate with deterministic systems.

  3. Debugging Bad Prompts Systematically

    When AI outputs fail, random prompt tweaking is not debugging. This article presents a systematic methodology for identifying, reproducing, and fixing prompt-related failures in production systems.

  4. Prompt Anti-patterns Engineers Fall Into

    Many prompt failures come from familiar engineering anti-patterns applied to natural language. This article identifies the most common prompt anti-patterns and explains why they break down in production.
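The output-control idea in this path can be sketched in a few lines: parse the model's reply as JSON and reject it unless it matches an expected shape, so malformed output fails fast instead of leaking into downstream systems. This is a minimal illustration, not code from the articles; the field names (`summary`, `confidence`) and the required-fields check are illustrative assumptions.

```python
import json

# Hypothetical expected shape for a model reply; field names are illustrative.
REQUIRED_FIELDS = {"summary": str, "confidence": float}

def parse_model_output(raw: str) -> dict:
    """Parse a model reply and enforce a minimal schema before downstream use."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data

# A well-formed reply passes; free-form text fails fast instead of corrupting state.
ok = parse_model_output('{"summary": "refund approved", "confidence": 0.92}')
```

In production you would typically reach for a real schema validator rather than hand-rolled checks, but the contract is the same: treat the prompt-plus-parser pair as an interface with a defined output type.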

Path 3: RAG Systems

4 articles • Estimated time: 60 minutes

Build grounded AI with retrieval, ranking, and measurable quality.

  1. Why RAG Exists (And When Not to Use It)

    RAG is not a universal fix for AI correctness. This article explains the real problem RAG addresses, its hidden costs, and how to decide whether retrieval is justified for a given system.

  2. Chunking Strategies That Actually Work

    Effective chunking is an information architecture problem, not a text-splitting task. This article covers practical chunking strategies that improve retrieval accuracy in real-world RAG systems.

  3. Retrieval Is the Hard Part

    Most RAG failures stem from poor retrieval, not weak models. This article explains why retrieval is difficult, how to improve it, and how to debug retrieval failures systematically.

  4. Evaluating RAG Quality: Precision, Recall, and Faithfulness

    Without evaluation, RAG systems cannot improve reliably. This article introduces practical metrics and evaluation strategies for measuring retrieval accuracy, answer grounding, and regression over time.
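As a taste of the chunking topic above: the simplest baseline is fixed-size windows with overlap, so that a sentence cut at a boundary still appears whole in the neighboring chunk. This sketch uses illustrative sizes (200 characters, 50 overlap) that are assumptions, not recommendations; the articles cover why structure-aware strategies usually beat this baseline.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size overlapping windows (baseline chunker).

    Sizes are illustrative; real systems tune them per corpus and embed model.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # slide the window, keeping `overlap` chars shared
    return chunks

chunks = chunk_text("a" * 500, size=200, overlap=50)
```

The overlap is the key design choice: it trades some index size and duplicate retrieval for a lower chance that the answer-bearing span is split across two chunks.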

Path 4: Evaluation & Quality

5 articles • Estimated time: 75 minutes

Define launch gates and metrics so AI changes are testable, comparable, and safe to ship.

  1. How to Evaluate an LLM Feature Before Launch (A Practical Pass/Fail Workflow)

    A practical pre-launch workflow for evaluating LLM-powered features with pass/fail criteria, scoped test sets, and regression checks before rollout.

  2. LLM Evaluation Metrics That Actually Matter (Task Success, Groundedness, Calibration)

    Defines the core evaluation metrics that matter in production LLM systems and shows when each metric is useful, misleading, or incomplete.

  3. Building a Test Set for LLM Features (Golden Cases, Edge Cases, Failure Buckets)

    A practical guide to constructing reusable LLM test sets with golden cases, edge cases, and failure buckets that support regression testing.

  4. Build an Eval Harness for Prompt and RAG Changes (Without Overengineering It)

    Shows how to build a lightweight evaluation harness for prompt and RAG changes so teams can compare revisions without slowing down shipping.

  5. Playbook: Building an Evaluation Pipeline for Prompt + RAG Changes

    A step-by-step playbook for building an evaluation pipeline that catches regressions in prompt and RAG changes before production rollout.

Path 5: LLMOps & Deployment

2 articles • Estimated time: 30 minutes

Instrument, roll out, and operate AI features with production-safe observability and deployment controls.

  1. Observability for AI Systems: What to Log, Trace, and Alert On

    A production-focused observability framework for AI systems covering logs, traces, metrics, alerts, and debugging workflows.

  2. Production Rollouts for AI Features (Shadow Mode, Canary, Guardrails, Rollback)

    A rollout strategy for AI features using shadow mode, canaries, guardrails, and rollback plans to reduce production risk.

Path 6: Security & Safety

2 articles • Estimated time: 30 minutes

Harden AI systems against prompt injection, unsafe tool execution, and sensitive-data handling failures.

  1. Prompt Injection and Retrieval Poisoning: Practical Defenses for Production Systems

    Practical defenses against prompt injection and retrieval poisoning, with engineering patterns for containment, validation, and incident response.

  2. PII and Sensitive Data in LLM Apps (Redaction, Storage Boundaries, Access Controls)

    A practical guide to handling PII and sensitive data in LLM applications, including redaction strategies, storage boundaries, and access controls.

Path 7: Cost & Performance

2 articles • Estimated time: 30 minutes

Control unit economics and response speed without breaking reliability or product quality.

  1. Cost Control Patterns for LLM Apps (Routing, Caching, Truncation, Fallbacks)

    Proven cost-control patterns for LLM applications, including routing, caching, truncation, and fallback strategies that preserve quality.

  2. Latency Budgeting for AI Features (Where the Time Goes and How to Cut It)

    A latency budgeting framework for AI features that breaks down where time goes across model, retrieval, and orchestration layers.
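Of the cost-control patterns listed above, exact-match caching is the cheapest to sketch: key a cache on a hash of the prompt and skip the model call on repeats. This is a minimal illustration under assumed names (`CachedClient`, `complete`); the fake `lambda` model stands in for a real API call, and real systems add TTLs, cache invalidation, and semantic (near-match) caching.

```python
import hashlib

class CachedClient:
    """Wrap a model call with an exact-match cache keyed on the prompt hash.

    `call_model` is a stand-in for a real provider API call.
    """

    def __init__(self, call_model):
        self._call = call_model
        self._cache: dict[str, str] = {}
        self.misses = 0  # counts actual (billable) model calls

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._call(prompt)  # only pay on a cache miss
        return self._cache[key]

client = CachedClient(lambda p: p.upper())  # fake model for the sketch
first = client.complete("hello")
second = client.complete("hello")  # served from cache; no second model call
```

The trade-off is that exact-match caching only helps when identical prompts recur, which is why it pairs with truncation and canonicalization of the prompt before hashing.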

Path 8: Agents & Workflows

1 article • Estimated time: 15 minutes

Choose the right level of autonomy and constrain tool-using systems for production use.

  1. Agents vs Workflows: A Decision Framework for Engineers (Use Cases, Failure Modes, Escalation Paths)

    A decision framework for when to use agents vs deterministic workflows, with failure modes and escalation paths for production systems.

Custom Learning Path

Don't need to follow the full path? Jump directly to what you're building today: