v1-5-pruned-emaonly-fp16
The model is an optimized version of the original Stable Diffusion v1.5 base model. It is designed specifically for inference (the process of generating images from text prompts) rather than for further training or fine-tuning.
Decoding the Name
v1-5: The underlying architecture used for text-to-image generation, known for its high compatibility with third-party additions like LoRAs and ControlNets.
emaonly: This part of the name often causes the most confusion. EMA stands for Exponential Moving Average. Training a neural network is like riding a rollercoaster: it has highs and lows. During training, the model keeps two versions of itself: the raw (non-EMA) weights, which are updated directly at every step and swing with those highs and lows, and the EMA weights, a running average of the raw weights that smooths the swings out.
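To make the averaging concrete, here is a minimal sketch of an exponential moving average applied to a single weight. The raw values and the decay of 0.9 are made up for illustration; real training runs typically use a decay much closer to 1, and nothing here is taken from the actual Stable Diffusion training code.

```python
def ema_update(ema, raw, decay):
    """Blend the newest raw value into the running average."""
    return decay * ema + (1.0 - decay) * raw

# Hypothetical raw values of one weight over six training steps (note the swings).
raw_history = [1.0, 3.0, 0.5, 2.5, 1.0, 3.0]

ema = raw_history[0]      # start the average at the first raw value
ema_history = [ema]
for raw in raw_history[1:]:
    ema = ema_update(ema, raw, decay=0.9)
    ema_history.append(ema)

# Compare how far each version travels between its lowest and highest value.
raw_swing = max(raw_history) - min(raw_history)
ema_swing = max(ema_history) - min(ema_history)
print(f"raw swing: {raw_swing:.2f}, EMA swing: {ema_swing:.2f}")
```

The averaged copy moves far less than the raw copy, which is why the EMA weights tend to be the steadier choice for generating images.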
The "emaonly" tag indicates that the non-EMA weights have been stripped out of the file entirely. By keeping only the EMA weights, the file size is roughly halved. While the filename implies it is "EMA only," this version is generally considered the "standard" inference model because it offers the best balance of quality and size.
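The stripping step can be sketched with a toy checkpoint represented as a plain dictionary. The "raw." and "ema." key prefixes and the helper names are hypothetical, chosen only for this example; real checkpoint files use their own naming scheme and store tensors rather than Python lists.

```python
# Toy stand-in for a checkpoint: tensor name -> list of weights.
# The "raw." and "ema." prefixes are hypothetical, not the real key names.
checkpoint = {
    "raw.layer1.weight": [0.12, -0.40, 0.33],
    "raw.layer2.weight": [0.05, 0.91],
    "ema.layer1.weight": [0.10, -0.38, 0.30],
    "ema.layer2.weight": [0.06, 0.88],
}

def keep_ema_only(state, ema_prefix="ema."):
    """Return a new dict holding only the EMA entries, with the prefix removed."""
    return {
        name[len(ema_prefix):]: values
        for name, values in state.items()
        if name.startswith(ema_prefix)
    }

def count_values(state):
    """Total number of stored weights, a rough proxy for file size."""
    return sum(len(values) for values in state.values())

ema_only = keep_ema_only(checkpoint)
print(sorted(ema_only))
print(count_values(checkpoint), "->", count_values(ema_only))
```

Because each raw tensor has an EMA twin of the same shape, dropping one of the two copies cuts the number of stored values in half, which matches the "roughly halved" file size.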
The last part of the name is the magic trick. Normally, the model stores numbers in fp32 (32-bit floating point), which is very precise, like measuring a hair’s width with a laser. But for image generation, you don’t need that level of precision. fp16 uses 16 bits per number: half the storage and half the memory bandwidth, at the cost of fewer significant digits and a narrower numeric range.
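The trade-off can be checked with Python's standard `struct` module, which can pack a number as a 32-bit float (`f`) or a 16-bit half-precision float (`e`). This is only a sketch: the sample weight and the round figure of one billion parameters are illustrative assumptions, not values read from the model file.

```python
import struct

def round_trip(x, fmt):
    """Store x in the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

weight = 0.1234567  # arbitrary sample value, not taken from the model

# Rounding error introduced by each format.
err32 = abs(round_trip(weight, "<f") - weight)
err16 = abs(round_trip(weight, "<e") - weight)
print(f"fp32 error: {err32:.2e}, fp16 error: {err16:.2e}")

# Bytes needed per number in each format.
BYTES_FP32 = struct.calcsize("<f")
BYTES_FP16 = struct.calcsize("<e")

# Illustrative round figure, not the exact parameter count of the model.
n_params = 1_000_000_000
size_fp32_gb = n_params * BYTES_FP32 / 1e9
size_fp16_gb = n_params * BYTES_FP16 / 1e9
print(f"fp32: {size_fp32_gb:.1f} GB, fp16: {size_fp16_gb:.1f} GB")
```

The fp16 value carries a larger rounding error than the fp32 one, but it is still small relative to the weight itself, while the storage cost per number drops from four bytes to two.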
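Look inside the checkpoint and you see a jungle of mathematical weights: over 1 billion parameters. Pruning is like trimming a bonsai tree, but in this filename it is generally understood to mean trimming the file rather than the network. Data that matters only for training, such as optimizer state, is cut away, while every weight the model needs at inference time stays in place. Individual connections are not removed, so the architecture and parameter count match the unpruned model.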