19:55
Faster and Lighter Model Inference with ONNX Runtime from Cloud to Client
Aug 3, 2022
Microsoft
markdefalco
0:44
Quantization: What Everyone Gets Wrong (Accuracy Myths)
65 views
3 weeks ago
YouTube
Code & Capital
1:17
ArmVision Assist – Offline Action Agent for ARM Mobile
13 views
5 months ago
YouTube
Shyam Sharma
0:16
FP32 FP16 FP8 TENSOR INT #chatgpt #llm #google #tech #ytshorts #yt #youtube #youtubeshorts
61 views
1 month ago
YouTube
Amit_Chopra_assruc
0:41
Google magic bullet - TurboQuant #ai #gpu #google #chips #cuda #quantization
1.3K views
1 month ago
YouTube
Neural AI Flair
2:18
What is Quantization in AI? How you Run Models on Your Laptop| #ai #maths #shorts #yt #quantization
1 month ago
YouTube
Harsh Shukla
1:08:05
Tikhomirov M.M. - Training of large language models - 8. Inference, quantization
218 views
2 weeks ago
YouTube
teach-in
6:29
Inference Optimization: Making AI Faster & Cheaper (Latency, Throughput & GPUs)
56 views
1 month ago
YouTube
wecite
0:45
Quantization Explained: How LLMs Get Smaller and Faster
88 views
1 month ago
YouTube
Dev Alpha Lab
7:29
Model Quantization Explained 8 bit, 4 bit & Inference Optimization #genai #aigenerated
32 views
2 months ago
YouTube
SmartSkale
2:36
I added KV caching and INT8 KV quantization to our transformer inference, improving throughput by 35x. All of this was done from scratch in Rust + CUDA, on top of a homemade ML framework. On a 4-token prompt with 252 generated tokens: - Original: 0.76 tok/s - KV cache fp32: 27.21 tok/s - KV cache int8 (quantized): 27.29 tok/s. Try it out yourself here: https://t.co/kFS9Z0fs4h In practice: - KV caching gave us about a 35x end-to-end speedup - INT8 KV cache kept roughly the same speed as fp32 but cut KV cac
48.8K views
3 weeks ago
x.com
Reese Chong
0:23
This is the clearest explanation of how LLM quantization works: It lets engineers compress a model by 4x and run it 2x faster without quality loss. I stumbled on this piece by Sam Rose and honestly wish I had it when I first tried to understand quantization. He breaks it down from absolute zero - bits, floats, how weights are stored - all the way to actually benchmarking quantized models. With interactive demos you can play with. Here's the core idea in 60 seconds: → An LLM is billions of numbers (we
7K views
2 weeks ago
x.com
Sukh Sroay
Z-Image-Turbo INT8 — AI Playground & API - deAPI.ai
3 weeks ago
deapi.ai
Model Precision and Deployment Choices in Object Detection | Mohammad Zaid posted on the topic | LinkedIn
4 views
3 weeks ago
linkedin.com
Edge ML Development for i.MX Processors
Aug 21, 2024
nxp.com
41:59
Sampling Theorem Quantization and Binary Coding
7.1K views
Apr 11, 2021
YouTube
Engineering with Bingabr
9:58
SmoothQuant
4.4K views
Oct 25, 2023
YouTube
MIT HAN Lab
0:51
Quantization explained
504 views
3 months ago
YouTube
Chip Talks AI
9:57
What is LLM Quantization ?
3.2K views
Mar 19, 2025
YouTube
New Machina
2:11
NVIDIA Tesla T4 Introduction to Inference
3.7K views
Apr 18, 2019
YouTube
Boston Limited
1:16:40
Lecture 30: Quantized Training
3.3K views
Oct 7, 2024
YouTube
GPU MODE
12:10
Optimize Your AI - Quantization Explained
465.1K views
Dec 28, 2024
YouTube
Matt Williams
1:36
What Is Quantization? | Decoding LLM File Names
1.3K views
4 months ago
YouTube
Anaconda, Inc.
1:01
Towards Unified INT8 Training for Convolutional Neural Network
803 views
Jul 17, 2020
YouTube
ComputerVisionFoundation Videos
7:14
What Are Weights in AI Models
381 views
3 months ago
YouTube
CloudProInc
4:42
Optimize LLMs for faster AI inference
434 views
3 months ago
YouTube
Red Hat
18:58
From FP32 to INT8: Post-Training Quantization Explained in PyTorch
928 views
6 months ago
YouTube
MLWorks
57:10
Lec 30 | Quantization, Pruning & Distillation
6.8K views
Mar 23, 2025
YouTube
NPTEL IIT Delhi
6:59
Boosting Model Performance with Quantization Techniques
7 views
7 months ago
YouTube
NextGen AI Explorer
1:11:43
Lecture 05 - Quantization (Part I) | MIT 6.S965
19.2K views
Sep 22, 2022
YouTube
MIT HAN Lab