: Use T* and restrict in prototypes and __builtin_assume_aligned inside function bodies
: Replace _mm256_dp_pd with _mm256_hsum_pd
(JEM): Figure out how to expose msignal across the DLL interface
: Consider alias to __m128*
: Consider introducing w = distuv[0]
: Add other OSes
: Use alignas if present
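
A minimal sketch of how the restrict / __builtin_assume_aligned item and the alignas item could fit together, assuming a C11 compiler. The names SKETCH_ALIGN, SKETCH_ALIGNED, SKETCH_ASSUME_ALIGNED and sketch_dot are hypothetical and do not come from this code base; the 32-byte alignment is likewise an assumption, chosen to match a 256-bit AVX register.

```c
#include <stddef.h>

/* Hypothetical alignment for illustration: 32 bytes = one 256-bit AVX register. */
#define SKETCH_ALIGN 32

/* "Use alignas if present": prefer the standard specifier, fall back otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  include <stdalign.h>
#  define SKETCH_ALIGNED alignas(SKETCH_ALIGN)
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALIGNED __attribute__((aligned(SKETCH_ALIGN)))
#else
#  define SKETCH_ALIGNED /* no known alignment specifier on this compiler */
#endif

/* __builtin_assume_aligned is a GCC/Clang builtin; elsewhere pass the pointer through. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#  define SKETCH_ASSUME_ALIGNED(p, n) (p)
#endif

/* Prototype: plain T* plus restrict, with no alignment typedefs in the signature. */
double sketch_dot(const double *restrict x, const double *restrict y, size_t n);

/* Body: the alignment promise is stated here, where the optimizer can use it.
 * Callers must really pass SKETCH_ALIGN-aligned pointers; otherwise the
 * behavior is undefined on compilers that honor the builtin. */
double sketch_dot(const double *restrict x, const double *restrict y, size_t n)
{
    const double *xa = SKETCH_ASSUME_ALIGNED(x, SKETCH_ALIGN);
    const double *ya = SKETCH_ASSUME_ALIGNED(y, SKETCH_ALIGN);
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i)
        sum += xa[i] * ya[i];
    return sum;
}
```

A caller would declare its buffers with the same macro, for example SKETCH_ALIGNED double a[4];, so that the alignment assumed inside the function body actually holds.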