Field Guide · v1.1
Sizing LLM Inference for Production
Inference is where AI infrastructure spend compounds with usage growth. Most production LLM fleets pay 2–3× more than they need to, not because of hardware limits but because of sizing decisions made without a disciplined framework. This guide covers the complete decision sequence: workload characterization, GPU memory sizing, parallelism strategy, KV cache optimization, and fleet sizing from the latency-throughput curve.
📄 107 pages · ◎ 9 chapters · 👤 Staff / Principal ML Engineers
Read the guide →
Field Guide · In Progress
Agentic Systems in Production
Production architecture for multi-agent LLM systems: orchestration patterns, tool reliability, memory and state management, latency budgets, failure modes, observability, and cost control. A guide that bridges the gap between research prototypes and production deployments.
📄 100+ pages · ◎ 10 chapters · 👤 Staff / Principal ML Engineers