Turbo-Softmax

Fast high-precision Softmax kernels in C for resource-constrained CPUs and MCUs.

Turbo-Softmax is a fast, high-precision Softmax implementation in C. It targets MCU and embedded settings where SIMD and FPU support may be limited and where speedups must come without spending memory on large lookup tables (LUTs).

Highlights

  • Range reduction with IEEE-754 bit-level construction of 2^i
  • Fast exp(t) via a 5th-order polynomial approximation
  • 4.0×–4.2× speedup over typical math.h implementations (dims 16–1024)
  • Accuracy: max error < 1e-6, negligible KL divergence
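
The first two highlights can be illustrated with a minimal sketch. This is not the code in qsoftmax.c: the names pow2_int, fast_exp, and softmax_sketch are hypothetical, and the polynomial uses plain Taylor coefficients as a stand-in, so its accuracy may be somewhat lower than the figure quoted above. The idea is to write exp(x) = 2^i * exp(t) with t = x - i*ln(2), so that |t| <= ln(2)/2, build 2^i by writing the biased exponent straight into the float's bit pattern, and evaluate exp(t) with a 5th-order polynomial.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build 2^i by placing the biased exponent (i + 127)
 * into bits 23..30 of an IEEE-754 binary32 value. Assumes float is binary32. */
static float pow2_int(int i)
{
    if (i < -126) return 0.0f;   /* below the normal range: flush to zero */
    if (i > 127) i = 127;        /* clamp to the largest finite exponent */
    uint32_t bits = (uint32_t)(i + 127) << 23;
    float r;
    memcpy(&r, &bits, sizeof r); /* well-defined type punning */
    return r;
}

/* Hypothetical fast exp: range reduction plus a 5th-order polynomial.
 * Taylor coefficients are an assumption; a tuned minimax fit would do better. */
static float fast_exp(float x)
{
    const float LOG2E = 1.44269504f;  /* 1 / ln(2) */
    const float LN2   = 0.693147181f; /* ln(2) */

    if (x < -87.0f) return 0.0f;      /* result would underflow */
    if (x > 88.0f) x = 88.0f;         /* keep the result finite */

    /* i = round(x / ln 2), so t = x - i*ln(2) lies in about [-0.35, 0.35] */
    float z = x * LOG2E;
    int i = (int)(z + (z >= 0.0f ? 0.5f : -0.5f));
    float t = x - (float)i * LN2;

    /* Horner evaluation of 1 + t + t^2/2 + t^3/6 + t^4/24 + t^5/120 */
    float p = 1.0f / 120.0f;
    p = p * t + 1.0f / 24.0f;
    p = p * t + 1.0f / 6.0f;
    p = p * t + 0.5f;
    p = p * t + 1.0f;
    p = p * t + 1.0f;

    return pow2_int(i) * p;
}

/* Hypothetical softmax: subtract the maximum first so every exponent
 * argument is <= 0 and the sum is at least 1. */
static void softmax_sketch(const float *in, float *out, size_t n)
{
    if (n == 0) return;

    float m = in[0];
    for (size_t k = 1; k < n; ++k)
        if (in[k] > m) m = in[k];

    float sum = 0.0f;
    for (size_t k = 0; k < n; ++k) {
        out[k] = fast_exp(in[k] - m);
        sum += out[k];
    }

    float inv = 1.0f / sum;
    for (size_t k = 0; k < n; ++k)
        out[k] *= inv;
}
```

The sketch uses no lookup table and no call into math.h. The only per-element work is a handful of multiply-adds and one integer shift, which is the kind of cost profile the highlights describe.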

Build (GCC/MinGW)

gcc -O3 -std=c11 -Wall -Wextra -pedantic example.c qsoftmax.c -lm -o example.exe
./example.exe