Title: Mastering Computer Vision with PyTorch and Machine Learning
Author: Caide Xiao
Publisher: IOP Publishing
Year: 2024
Pages: 365
Language: English
Format: PDF (true), EPUB
Size: 110.5 MB
This book, together with the accompanying Python code, provides a thorough guide to mastering advanced computer vision techniques for image processing with the open-source machine learning framework PyTorch. Known for its user-friendly interface and Pythonic programming style, PyTorch is accessible and is one of the most popular tools among researchers and practitioners in artificial intelligence.
Computer vision is a field of artificial intelligence and computer science that focuses on enabling computers to interpret and understand visual information from the world around them. Computer vision and machine learning are closely related: machine learning lets computers automatically find patterns and relationships in large datasets of images and videos. With a focus on practical applications, this book covers essential concepts such as Kullback-Leibler divergence, maximum likelihood, convolutional neural networks (CNN), generative adversarial networks (GAN), Wasserstein generative adversarial networks (WGAN), WGAN with gradient penalty (WGAN-GP), information maximizing generative adversarial networks (InfoGAN), variational autoencoders (VAE), and their applications to image classification and image generation. Readers will also learn how to use the latest computer vision techniques, such as YOLOv8 for object detection, stable diffusion models for image generation, vision transformers for zero-shot object detection, knowledge distillation for compressing neural networks, DINO for self-supervised learning, the Segment Anything Model (SAM), and NeRF and 3D Gaussian Splatting for 3D scene synthesis.

This book is a valuable resource for professionals, researchers, and students who want to expand their knowledge of advanced computer vision techniques using PyTorch. With clear explanations, practical examples, and real-world use cases, readers will learn how to apply computer vision techniques to image analysis tasks and develop the skills needed to build and train their own models for advanced image analysis. Whether you are a beginner or an experienced data scientist, this book will give you the knowledge and tools you need to succeed.
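Kullback-Leibler divergence and maximum likelihood are listed together among the book's core concepts, and the link between them fits in one identity: D_KL(P || Q) = H(P, Q) - H(P), where H(P, Q) is the cross-entropy and H(P) the entropy of the data distribution. Because H(P) does not depend on the model Q, minimizing the KL divergence over the model's parameters is the same as minimizing cross-entropy, which is the same as maximizing the expected log-likelihood of the data. The minimal sketch below checks this numerically in plain Python for discrete distributions; the two distributions are made-up illustrative values. (For tensors, PyTorch provides `torch.nn.functional.kl_div`; this sketch deliberately avoids PyTorch so it runs anywhere.)

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) = -sum_i p_i log q_i (assumes q_i > 0 wherever p_i > 0)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_data = [0.5, 0.5]     # illustrative "data" distribution
q_model = [0.25, 0.75]  # illustrative model distribution

print(round(kl_divergence(p_data, q_model), 4))  # -> 0.1438
# Cross-entropy = entropy + KL, so only the KL term depends on the model.
print(math.isclose(cross_entropy(p_data, q_model),
                   entropy(p_data) + kl_divergence(p_data, q_model)))  # -> True
```

The KL divergence is zero only when the two distributions match, which is why it works as a training objective for generative models such as VAEs.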
Prerequisites for readers: Computer vision projects use neural network models or vision transformers as mathematical functions that process images. The outputs of these functions can be integers for image classification, new images resembling those in the training dataset for image generation, or bounding boxes with class identities and probabilities for object detection. The neural networks in these models are mathematical operators arranged in specific sequences. Large-scale computer vision models may have billions of parameters to learn from training datasets. Training time depends on the size of the input data, the code, and the hardware that processes the data. This book is written for readers who know linear algebra (matrix calculations) and basic statistics. With this mathematical background, readers can write their own code even if they have never used Python, a simple yet powerful programming language with excellent functionality for machine learning. Kaggle and Google Colab provide free GPUs to anyone with a Google account. On these two free online platforms, powerful hardware and software are ready to use, and most of the code listed in the book runs on either of them, so readers do not need to buy expensive computers.
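To make the idea of "a model as a mathematical function" concrete for image classification, here is a minimal sketch of a hypothetical toy linear classifier: it flattens an image, computes one score per class, turns the scores into probabilities with softmax, and returns the integer class index with the highest probability. The function names, the 2x2 image, and all weights are made up for illustration; the book's actual models are PyTorch networks (for example CNNs), and plain Python is used here only so the sketch runs without PyTorch installed.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, weights, biases):
    """Hypothetical toy classifier: image (list of pixel rows) -> (class index, probability)."""
    pixels = [value for row in image for value in row]  # flatten the image
    logits = [sum(w * x for w, x in zip(w_row, pixels)) + b
              for w_row, b in zip(weights, biases)]      # one linear score per class
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)  # argmax -> integer class id
    return label, probs[label]

# A 2x2 "image" and made-up parameters for 3 classes (illustrative values only).
image = [[0.0, 1.0],
         [1.0, 0.0]]
weights = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 1.0, 0.0],
           [0.5, 0.5, 0.5, 0.5]]
biases = [0.0, 0.0, 0.0]

label, prob = classify(image, weights, biases)
print(label, round(prob, 3))  # -> 1 0.665
```

Training a real model means learning the weights and biases from data instead of writing them by hand; in large models there are billions of such parameters.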
Because Kaggle and Google Colab have some limitations, it is better to have your own computer with the following open-source software. The first is Microsoft Visual Studio Code (VS Code), a free code editor from Microsoft for Windows, Linux, and macOS. It offers many features and extensions for editing, debugging, and developing code in many programming languages, and it is popular with people of all skill levels because of its ease of use, customizability, and community-driven development. In VS Code we can add the Python and Jupyter extensions and, from a VS Code terminal, install packages such as PyTorch, Torchvision, NumPy, Pandas, Matplotlib, OpenCV, tqdm, and more. Because computer science develops very quickly, we need more than one virtual environment in VS Code for different versions of Python and other libraries. Most code in the book runs with Python 3.6, while much of the code in the last six chapters requires Python 3.9 to 3.11 for the latest computer vision techniques, such as Ultralytics YOLOv7 to YOLOv9, Hugging Face zero-shot object detection, and Facebook DINO and SAM.
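Since different chapters need different Python versions, a quick check at the top of a notebook can tell you whether the active interpreter fits. The helper below is a hypothetical sketch (its name and defaults are not from the book); the default 3.9 to 3.11 range simply mirrors the requirement stated above for the later chapters.

```python
import sys

def python_in_range(minimum=(3, 9), maximum=(3, 11), version=None):
    """Return True if the (major, minor) Python version lies within [minimum, maximum].

    `version` defaults to the running interpreter; pass a tuple to test other versions.
    """
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return minimum <= current <= maximum

if not python_in_range():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside 3.9-3.11; "
          "consider a separate virtual environment for the later chapters.")
```

One way to keep several interpreters side by side is Python's built-in `venv` module (`python -m venv <dir>`), run once with each Python version; VS Code can then select a different interpreter per workspace.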