6. Training & Inference

Overview

Notes on efficient training and inference, covering batching, KV caching, quantization, distillation, and serving patterns.