Turbo-Softmax
Fast high-precision Softmax kernels in C for resource-constrained CPUs and MCUs.
Turbo-Softmax is a high-precision yet fast Softmax implementation in C. It targets MCU and embedded settings where hardware SIMD or FPU support may be limited, and where speedups must be achieved without spending memory on large lookup tables (LUTs).
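For reference, here is the standard Softmax definition together with the shift-invariant form that implementations commonly evaluate to avoid overflow. Subtracting the maximum m leaves the result unchanged because the factor e^{-m} cancels between numerator and denominator. Whether Turbo-Softmax subtracts the maximum in exactly this way is an assumption here.

```latex
\operatorname{softmax}(x)_j
  = \frac{e^{x_j}}{\sum_{k=1}^{n} e^{x_k}}
  = \frac{e^{x_j - m}}{\sum_{k=1}^{n} e^{x_k - m}},
  \qquad m = \max_{1 \le k \le n} x_k,
  \qquad x_j - m \le 0 .
```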
Highlights
- Range reduction with IEEE-754 bit-level construction of 2^i
- Fast exp(t) via a 5th-order polynomial approximation
- 4.0×–4.2× speedup over typical math.h implementations (dims 16–1024)
- Numerical stability: max error < 1e-6, negligible KL divergence
Build (GCC/MinGW)
gcc -O3 -std=c11 -Wall -Wextra -pedantic example.c qsoftmax.c -lm -o example.exe
./example.exe