LLM Inference · GPU Systems · Production AI

LLM Inference &
ML Infrastructure

Technical field guides written from production experience - each one a comprehensive decision framework you can apply to real systems.

About the author

Currently a Staff ML Engineer leading LLM inference optimization for one of the most consequential AI systems in the world, reaching hundreds of millions of users. Before that, spent nearly a decade at AWS building and scaling core inference infrastructure for SageMaker from its earliest days. Founded Vipas.AI, an AI inference marketplace that reached 25K daily visitors and received a VC term sheet. Earlier career spans building large-scale distributed systems and cloud infrastructure at Ericsson, Pegasystems, and other global enterprises. Holds a pending USPTO patent on dynamic hierarchical storage and GPU optimization for LLM serving.

Field Guides

Each guide is a complete treatment of a production ML systems topic - not a survey, not a tutorial, but a decision framework you can apply directly to real deployments.

Field Guide · v1.1
Sizing LLM Inference for Production
Inference is where AI infrastructure spend compounds indefinitely with usage growth. Most production LLM fleets are paying 2–3× what they need to - not from hardware limits, but from sizing decisions made without a disciplined framework. This guide covers the complete decision sequence: workload characterization, GPU memory sizing, parallelism strategy, KV cache optimization, and fleet sizing from the latency-throughput curve. A back-of-envelope sketch of the memory-sizing step follows the card.
📄 107 pages · 9 chapters · 👤 Staff / Principal ML Engineers
Read the guide →
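
To give a flavor of the memory-sizing step, here is a minimal back-of-envelope sketch. All numbers are illustrative assumptions (a Llama-3-8B-style config with grouped-query attention on a single 80 GiB accelerator), not figures from the guide:

```python
# Back-of-envelope GPU memory sizing for LLM serving.
# Every constant below is an assumption for illustration only.

GiB = 1024**3

# -- model assumptions (fp16 weights, GQA attention) --
n_params        = 8e9    # parameters
bytes_per_param = 2      # fp16
n_layers        = 32
n_kv_heads      = 8      # grouped-query attention
head_dim        = 128
kv_bytes        = 2      # fp16 KV cache

# -- hardware assumptions: one 80 GiB accelerator --
gpu_mem  = 80 * GiB
overhead = 6 * GiB       # activations, runtime, fragmentation (assumed)

weights = n_params * bytes_per_param

# KV cache per token: K and V tensors, per layer, per KV head.
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes

kv_budget  = gpu_mem - weights - overhead
max_tokens = int(kv_budget // kv_per_token)

ctx_len = 8192           # assumed average context length
print(f"weights:         {weights / GiB:.1f} GiB")
print(f"KV per token:    {kv_per_token / 1024:.0f} KiB")
print(f"KV token budget: {max_tokens:,} tokens")
print(f"~concurrent seqs @ {ctx_len} ctx: {max_tokens // ctx_len}")
```

Under these assumptions, ~15 GiB of weights leave roughly 59 GiB of KV budget, about 480K cacheable tokens, or on the order of 59 concurrent 8K-context sequences per GPU - the kind of arithmetic the guide develops into a full sizing framework.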
Field Guide · In Progress
Agentic Systems in Production
Production architecture for multi-agent LLM systems: orchestration patterns, tool reliability, memory and state management, latency budgets, failure modes, observability, and cost control. The guide that bridges research prototypes and production deployments. A sketch of the latency-budget pattern follows the card.
📄 100+ pages · 10 chapters · 👤 Staff / Principal ML Engineers · In Progress
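
As a taste of the latency-budget and tool-reliability material, here is a minimal sketch of one common pattern: retrying a flaky tool call while never exceeding an overall latency budget. The tool, its failure rate, and all timeout values are hypothetical stand-ins, not the guide's recommendations:

```python
import asyncio
import random

# Hypothetical flaky tool; stands in for any external API an agent calls.
async def search_tool(query):
    await asyncio.sleep(random.uniform(0.05, 0.4))  # simulated latency
    if random.random() < 0.3:                       # simulated failure rate
        raise RuntimeError("tool backend unavailable")
    return f"results for {query!r}"

async def call_with_budget(query, budget_s=1.0,
                           per_call_timeout_s=0.3, max_attempts=3):
    """Retry a tool call with bounded attempts, capped by a total budget."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + budget_s
    last_err = None
    for _ in range(max_attempts):
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # overall budget exhausted; stop retrying
        try:
            # Each attempt gets the smaller of its own timeout and
            # whatever remains of the overall budget.
            return await asyncio.wait_for(
                search_tool(query),
                timeout=min(per_call_timeout_s, remaining),
            )
        except (asyncio.TimeoutError, RuntimeError) as err:
            last_err = err
    raise TimeoutError(f"tool call failed within budget: {last_err}")

# Usage: asyncio.run(call_with_budget("gpu sizing"))
```

The key design choice is that retries consume the same budget as the first attempt, so a misbehaving tool degrades one agent step rather than blowing the end-to-end latency target.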