
Stable Diffusion [Floating Point, Performance in the Cloud]



Overview of Data Formats used in AI

fp32 is the default data format used for training, along with mixed-precision training that uses both fp32 and fp16. fp32 offers more than adequate dynamic range and precision to effectively train the most complex neural networks, but it also results in large models, both in terms of parameter size and computational cost.
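
For reference, fp32/fp16 mixed-precision training in PyTorch typically looks like the sketch below. The model, optimizer, and random data are hypothetical placeholders; the autocast/GradScaler pattern is the standard mixed-precision recipe, and a CUDA GPU is assumed.

```python
# Minimal sketch of fp32/fp16 mixed-precision training in PyTorch.
# The model, optimizer, and random data are hypothetical placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()   # scales the loss so small fp16 gradients do not underflow
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    x = torch.randn(64, 512, device="cuda")             # dummy batch
    y = torch.randint(0, 10, (64,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                      # matmuls run in fp16, reductions stay in fp32
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()                        # backward pass on the scaled loss
    scaler.step(optimizer)                               # unscale gradients, then fp32 weight update
    scaler.update()
```

The master copy of the weights stays in fp32; only selected operations run in fp16, which is what keeps training numerically stable.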

Modern hardware and software support the fp16 data format with good performance. For AI inference workloads, adopting fp16 instead of the mainstream fp32 offers a tremendous speed-up while reducing power consumption and memory footprint. This advantage comes with virtually no accuracy loss, and the switch to fp16 is largely seamless, requiring no major code changes or fine-tuning.
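
As an illustration of how small that switch is, the sketch below casts an existing fp32 PyTorch model to fp16 for inference. ResNet-50 is just an arbitrary example (the weights string assumes a recent torchvision), and a CUDA GPU is assumed.

```python
# Minimal sketch: run an existing fp32 model in fp16 for inference.
# ResNet-50 is an arbitrary example; any fp32 model can be cast the same way.
import torch
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V1").eval().cuda()
model_fp16 = model.half()                                # cast all parameters to fp16

x = torch.randn(1, 3, 224, 224, device="cuda").half()   # inputs must match the model dtype
with torch.no_grad():
    logits = model_fp16(x)
print(logits.dtype)                                      # torch.float16
```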

By switching inference to fp16, CPUs can improve their AI inference workload performance immediately.

fp32 can represent numbers from roughly 10⁻⁴⁵ up to about 3.4×10³⁸. In most cases, such a wide range is wasteful and does not bring additional precision. fp16 reduces this range to roughly 6×10⁻⁸ up to 65,504, cutting memory requirements in half while also accelerating training and inference. Care must be taken, however, to avoid underflow and overflow situations.
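
The difference in range is easy to see numerically; the short NumPy sketch below prints the limits of each format and shows how exceeding them produces overflow or underflow.

```python
# Illustration of the fp32 vs. fp16 ranges and of overflow/underflow in fp16.
import numpy as np

print(np.finfo(np.float32).max)    # ~3.4e38
print(np.finfo(np.float16).max)    # 65504.0
print(np.finfo(np.float16).tiny)   # ~6.1e-05 (smallest normal; subnormals reach ~6e-08)

x = np.float16(60000.0)
print(x * np.float16(2.0))         # inf -> overflow past 65,504
print(np.float16(1e-9))            # 0.0 -> underflow to zero
```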

Once training is completed, one of the most popular ways to improve performance is to quantize the network. A popular data format used in this process, mainly in edge applications, is int8, which yields up to a 4x reduction in size along with a notable performance improvement. However, quantization to int8 frequently leads to some accuracy loss. Sometimes the loss is limited to a fraction of a percent, but it often amounts to a few percent of degradation, and in many applications that degradation is unacceptable.
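
For reference, post-training int8 quantization can be as simple as the sketch below, which uses PyTorch's dynamic quantization API on a hypothetical small model and compares the serialized sizes.

```python
# Minimal sketch of post-training int8 quantization in PyTorch (dynamic quantization).
# The small model here is a hypothetical placeholder.
import os
import torch
import torch.nn as nn

model_fp32 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8           # quantize Linear weights to int8
)

def size_mb(m, path="tmp.pt"):
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

print(size_mb(model_fp32), size_mb(model_int8))          # roughly a 4x size reduction
```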

There are ways to limit accuracy loss by doing quantization-aware training, which consists of introducing the int8 data format selectively and/or progressively during training. It is also possible to quantize the weights while keeping activations at fp32 resolution. Though these methods help limit the accuracy loss, they do not eliminate it altogether. fp16 is a data format that can prevent accuracy loss while requiring minimal or no conversion effort: it has been observed in many benchmarks that the transition from fp32 to fp16 results in no noticeable accuracy loss without any re-training.
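
One way to sanity-check that claim on a given model is to compare fp32 and fp16 predictions on the same inputs. A minimal sketch (hypothetical model and random data; a CUDA GPU is assumed, since fp16 matmul on CPU requires a recent PyTorch build) could look like this:

```python
# Minimal sketch: compare fp32 and fp16 predictions of the same model.
# Hypothetical model and random inputs; a real check would use a validation set.
import copy
import torch
import torch.nn as nn

device = "cuda"
model_fp32 = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval().to(device)
model_fp16 = copy.deepcopy(model_fp32).half()            # same weights, cast to fp16

x = torch.randn(1024, 256, device=device)
with torch.no_grad():
    pred32 = model_fp32(x).argmax(dim=1)
    pred16 = model_fp16(x.half()).argmax(dim=1)

agreement = (pred32 == pred16).float().mean().item()
print(f"fp32/fp16 prediction agreement: {agreement:.4f}")  # typically ~1.0
```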

Conclusion

For NVIDIA GPUs and AI inference, deploy in fp16 to double inference speed while reducing the memory footprint and power consumption.

Note: If the original model was not trained using fp16, converting it to fp16 is extremely easy and does not require re-training or code changes. It has also been shown that the switch to fp16 leads to no visible accuracy loss in most cases.

Source: https://amperecomputing.com/
