LitMy.ru - literature in one click

Quantization and Fast Inference: A practitioner’s guide to efficient AI (MEAP v1)

  • Added by: literator
  • Date: 16-05-2026, 15:28

Title: Quantization and Fast Inference: A practitioner’s guide to efficient AI (MEAP v1)
Author: Vivek Kalyanarangan
Publisher: Manning Publications
Year: 2026
Pages: 155
Language: English
Format: pdf (true), epub
Size: 13.1 MB

A practitioner’s guide to efficient AI.

Today's AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression. It teaches practical quantization techniques that systematically reduce numerical precision to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.

From quantization fundamentals to runtime packaging, the book covers the full quantization pipeline. It starts by deriving the quantization mapping from first principles, then builds your knowledge and skill through production-tested PTQ and QAT workflows and a fully compressed deployment. You'll learn to apply post-training quantization (PTQ) to production models, run quantization-aware training (QAT) using fake quantization and straight-through estimators, and handle subtle tradeoffs such as activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4.
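
For readers unfamiliar with the term, the "quantization mapping" mentioned above is commonly an affine map between real values and a small integer range. The following is a minimal sketch of that general idea in plain Python, not code from the book; the function names (quant_params, quantize, dequantize), the unsigned 8-bit range, and the sample weights are all assumptions chosen for illustration.

```python
def quant_params(x_min, x_max, num_bits=8):
    """Return (scale, zero_point) for an unsigned affine quantizer."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, 0  # degenerate range: every input is zero
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(xs, scale, zero_point, num_bits=8):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs, scale, zero_point):
    """Map integers back to approximate floats: x_hat = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qs]


# Hypothetical weights, used only to exercise the round trip.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zp = quant_params(min(weights), max(weights))
q = quantize(weights, scale, zp)
recovered = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

In this sketch the round-trip error stays within one quantization step (scale), which is the accuracy cost traded for storing each value in 8 bits instead of 32.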

what's inside
Applying post-training quantization to production models
Deploying efficiently on CPUs, edge devices, and mobile
Framework-agnostic techniques and real cross-framework parity testing
Flowcharts and checklists for efficient decision making

about the reader
For ML engineers and researchers experienced in Python.

about the author
Vivek Kalyanarangan is an AI/ML architect, researcher, and educator with over twelve years of experience designing and deploying large-scale machine learning systems.
