ARMv7 NEON Zip

VST1.16 {D0, D1}, [r0]! @ Store the 8 interleaved results (Q0+Q1 concept)

@ We need to process 8 elements at a time (4 from A, 4 from B)
@ Loop unrolling is implied for simplicity
LOOP:
    CMP r3, #4
    BLT END

// C Prototype: void interleave_u16(uint16_t *dst, uint16_t *srcA, uint16_t *srcB, int count);
// ARMv7 Assembly Implementation
interleave_u16:
    @ r0 = dst, r1 = srcA, r2 = srcB, r3 = count
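For checking the NEON routine's output, a plain scalar version of the same interleave can serve as a reference. The sketch below is not the article's implementation. The name interleave_u16_ref is hypothetical, and it assumes that count is the number of elements in each source array, so that dst holds 2 * count elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for the interleave; not the NEON routine.
 * Assumption: count is the number of elements in EACH source array,
 * so dst must have room for 2 * count elements. */
void interleave_u16_ref(uint16_t *dst, const uint16_t *srcA,
                        const uint16_t *srcB, int count)
{
    assert(dst != NULL && srcA != NULL && srcB != NULL);

    for (int i = 0; i < count; i++) {
        dst[2 * i]     = srcA[i];  /* even output slots come from A */
        dst[2 * i + 1] = srcB[i];  /* odd output slots come from B  */
    }
}
```

Under that assumption, srcA = {1, 2, 3, 4} and srcB = {10, 20, 30, 40} produce dst = {1, 10, 2, 20, 3, 30, 4, 40}, which is the layout one pass of the 4 + 4 element loop is expected to store.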
