PyTorch for CUDA 12.6

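import torch
from torch.amp import autocast, GradScaler

# model, criterion, optimizer, input, and target are assumed to be defined already.
scaler = GradScaler("cuda")

optimizer.zero_grad()
with autocast(device_type="cuda", dtype=torch.float16):  # or torch.bfloat16 on Ampere and newer GPUs
    output = model(input)
    loss = criterion(output, target)

scaler.scale(loss).backward()  # scale the loss to avoid float16 gradient underflow
scaler.step(optimizer)         # unscales gradients, skips the step if they contain inf/NaN
scaler.update()                # adjust the scale factor for the next iteration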

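import torch

# Allocate both matrices directly on the GPU.
x = torch.randn(10000, 10000, device="cuda")
y = torch.randn(10000, 10000, device="cuda")
z = torch.matmul(x, y)
torch.cuda.synchronize()  # wait for the asynchronous kernel to finish

print(f"Matrix multiplication result shape: {z.shape}")
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")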

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
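After installing, it can be worth confirming which CUDA build pip actually picked up, since a CPU-only or mismatched wheel installs without complaint. The following is a minimal sketch of such a check. The helper name `cuda_build_report` is hypothetical and used only for illustration; it relies on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()`.

```python
import importlib.util


def cuda_build_report():
    """Describe the installed PyTorch build, or return None if torch is missing.

    Hypothetical helper for illustration; not part of PyTorch itself.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # torch is not installed in this environment

    import torch

    report = {
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
        "device_name": None,
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(cuda_build_report())
```

With a CUDA 12.6 wheel, `built_with_cuda` would be expected to read `12.6`. If `cuda_available` is `False` while `built_with_cuda` is set, the usual suspects are the NVIDIA driver or the environment rather than the wheel.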
