
Text Generation Inference (TGI): Deploying Transformers with Streaming and Batching

Language
English
Format
Category

Non-fiction

"Text Generation Inference (TGI): Deploying Transformers with Streaming and Batching"

This book is for engineers and platform practitioners who need to move transformer inference from demos into reliable, high-performance production systems. Rather than treating serving as a thin wrapper around model.generate(), it addresses the real tensions that emerge under live traffic: latency versus throughput, streaming responsiveness versus compute cost, and hardware efficiency versus operational simplicity. Readers building chat systems, internal AI platforms, or GPU-backed inference services will find a rigorous guide to what actually governs TGI behavior in production.

Across the book, you will build a working mental model of TGI’s runtime architecture, request flow, streaming semantics, and continuous batching scheduler. The book explains prefill and decode execution, token-budget controls, time-to-first-token behavior, multi-GPU sharding, replication trade-offs, and inference optimizations for attention and memory. The result is practical decision-making skill: how to size deployments, tune throughput-latency trade-offs, expose stable APIs, integrate streaming clients, and diagnose bottlenecks with metrics and tracing.

The treatment assumes experience with transformers, GPU-based model serving, and modern infrastructure patterns such as HTTP APIs, proxies, and observability tooling. Its distinguishing strength is operational depth: the material is organized around deployment decisions, failure modes, compatibility boundaries, and performance trade-offs, making it es

© 2026 NobleTrex Press (E-book): 6610001219109

Release date

E-book: 8 May 2026
