# Qwantize

Optimal quantization methods for block-scaled formats.

## Installation

```bash
pip install qwantize
```

Requires PyTorch (>=2.0) and Triton (>=3.0).

## Repository

- **GitHub**: [github.com/ayghri/qwantize](https://github.com/ayghri/qwantize)

## Formats

- **INT8** -- Symmetric INT8 with FP8 E4M3 scales (block sizes 32, 64, 128, 256)
- **NVFP4** -- FP4 E2M1 with FP8 E4M3 scales (block sizes 16, 32)
- **MXFP4** -- FP4 E2M1 with UE8M0 (power-of-2) scales (block sizes 16, 32)
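To make "block-scaled" concrete, here is a minimal plain-PyTorch sketch of absmax quantization onto the FP4 E2M1 grid with one scale per block. It is an illustration only, not how Qwantize computes its scales: the helper name `blockwise_fp4_reference` is hypothetical, and the scales are kept in float32 rather than being cast to FP8 E4M3 or UE8M0.

```python
import torch

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def blockwise_fp4_reference(w: torch.Tensor, block_size: int = 16):
    """Absmax block quantization onto the E2M1 grid (hypothetical helper)."""
    blocks = w.reshape(-1, block_size)
    # One scale per block maps the largest magnitude onto 6.0, the E2M1 maximum.
    scales = blocks.abs().amax(dim=-1, keepdim=True) / E2M1_GRID[-1]
    # Avoid dividing by zero for all-zero blocks.
    scales = torch.where(scales == 0, torch.ones_like(scales), scales)
    scaled = blocks / scales
    # Round each scaled magnitude to the nearest grid point.
    idx = (scaled.abs().unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)
    quants = torch.sign(scaled) * E2M1_GRID[idx]
    return scales, quants

w = torch.randn(4, 64)
scales, quants = blockwise_fp4_reference(w, block_size=16)
w_dq = (quants * scales).reshape(w.shape)  # dequantize: value = quant * scale
```

The absmax scale is only a baseline: it guarantees no clipping but is generally not the scale that minimizes reconstruction error, which is the gap the optimal methods in this package target.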

## Quick Start

```python
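import torch
from qwantize import nvfp4_naive, nvfp4_optimal, nvfp4_dequantize, compute_metrics

# Example inputs. The sizes, the shape of X, and the CUDA device are
# illustrative assumptions; substitute your own weights and activations.
M, K = 128, 256
W = torch.randn(M, K, device="cuda")   # weight matrix, K divisible by 32
X = torch.randn(64, K, device="cuda")  # activations passed to compute_metrics

# The quantizers expect blocked input of shape (..., block_size),
# where block_size is 16 or 32.
# dim specifies which dimension is the block dimension (default: -1).
W_blocked = W.reshape(M, K // 32, 32)

# Quantize: returns (scales, quants)
scales, quants = nvfp4_optimal(W_blocked, dim=-1)

# Dequantize separately
W_dq = nvfp4_dequantize(scales, quants, dim=-1)

# Or get the dequantized output directly
scales, quants, W_dq = nvfp4_optimal(W_blocked, dim=-1, return_dequant=True)

# Compare the original weights with the dequantized weights, reshaped back to (M, K)
metrics = compute_metrics(W, W_dq.reshape(M, K), X)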
```
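For a quick sanity check that does not depend on `compute_metrics`, the reconstruction error can also be measured in plain PyTorch. The sketch below is an assumption-laden illustration: `relative_error` is a hypothetical helper, the tensors are random stand-ins, and the metrics it produces are not necessarily the ones `compute_metrics` reports.

```python
import torch

def relative_error(ref: torch.Tensor, approx: torch.Tensor) -> float:
    """Frobenius-norm error of approx relative to ref (hypothetical helper)."""
    return (torch.linalg.norm(ref - approx) / torch.linalg.norm(ref)).item()

W = torch.randn(8, 64)
W_dq = W + 0.01 * torch.randn_like(W)  # stand-in for a dequantized tensor
X = torch.randn(32, 64)                # stand-in activations, shape (N, K)

# Error on the weights themselves.
weight_err = relative_error(W, W_dq)
# Error on the layer output, which weights each direction by the activations.
output_err = relative_error(X @ W.T, X @ W_dq.T)
```

Weight error and output error can rank quantizers differently, which is why an activation batch such as `X` is useful when comparing methods.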

```{toctree}
:maxdepth: 2
:caption: Methods

optimal_scale_search
hessian_scale_search
spgl1_compensation
scale_distance
triton_kernels
custom_codebook
```

```{toctree}
:maxdepth: 2
:caption: Results

results
```

```{toctree}
:maxdepth: 2
:caption: API Reference

api/int8
api/nvfp4
api/mxfp4
api/metrics
```