
Hör so viel du willst – ohne Stundenlimit
Tauche ein in eine Welt mit über 600.000 Hörbüchern und E-Books – 2 Monate ohne Stundenbegrenzung für nur 1 € pro Monat. Ohne Bindung, jederzeit kündbar.
Jetzt Angebot aktivierenSachbuch
"Text Generation Inference (TGI): Deploying Transformers with Streaming and Batching"
This book is for engineers and platform practitioners who need to move transformer inference from demos into reliable, high-performance production systems. Rather than treating serving as a thin wrapper around model.generate(), it addresses the real tensions that emerge under live traffic: latency versus throughput, streaming responsiveness versus compute cost, and hardware efficiency versus operational simplicity. Readers building chat systems, internal AI platforms, or GPU-backed inference services will find a rigorous guide to what actually governs TGI behavior in production.
Across the book, you will build a working mental model of TGI’s runtime architecture, request flow, streaming semantics, and continuous batching scheduler. It explains prefill and decode execution, token-budget controls, time-to-first-token behavior, multi-GPU sharding, replication trade-offs, and inference optimizations such as attention and memory techniques. The result is practical decision-making skill: how to size deployments, tune throughput-latency trade-offs, expose stable APIs, integrate streaming clients, and diagnose bottlenecks with metrics and tracing.
The treatment assumes experience with transformers, GPU-based model serving, and modern infrastructure patterns such as HTTP APIs, proxies, and observability tooling. Its distinguishing strength is operational depth: the material is organized around deployment decisions, failure modes, compatibility boundaries, and performance trade-offs, making it es
© 2026 NobleTrex Press (E-Book): 6610001219109
Erscheinungsdatum
E-Book: 8. Mai 2026
Über 600.000 Titel
Lade Titel herunter mit dem Offline Modus
Exklusive Titel und Storytel Originals
Sicher für Kinder (Kindermodus)
Einfach jederzeit kündbar
Für alle, die gelegentlich hören und lesen.
8.90 € /Monat
Jederzeit kündbar
Abo-Upgrade jederzeit möglich
Für alle, die unbegrenzt hören und lesen möchten.
18.90 € /Monat
Jederzeit kündbar
Wechsel zu Basic jederzeit möglich