CUDA Toolkit 12.6

Here's an example CUDA program that accelerates a simple matrix multiplication:
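A minimal sketch of such a program, assuming square, row-major `n x n` matrices of `float` and one GPU thread per output element. The names `matmulKernel`, `matmulGpu`, `matmulCpu`, and `CUDA_CHECK`, the 16 x 16 block size, and the test data are illustrative choices, not something the toolkit prescribes. It uses only the standard CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaFree`) and compares the GPU result against a plain CPU loop.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper macro).
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Each thread computes one element C[row][col] of C = A * B.
__global__ void matmulKernel(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {  // guard threads that fall outside the matrix
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) {
            sum += A[row * n + k] * B[k * n + col];
        }
        C[row * n + col] = sum;
    }
}

// CPU reference, used only to check the GPU result.
void matmulCpu(const float* A, const float* B, float* C, int n) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}

// Host wrapper: copy inputs to the device, launch the kernel, copy the result back.
void matmulGpu(const float* hA, const float* hB, float* hC, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_CHECK(cudaMalloc(&dA, bytes));
    CUDA_CHECK(cudaMalloc(&dB, bytes));
    CUDA_CHECK(cudaMalloc(&dC, bytes));
    CUDA_CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    dim3 block(16, 16);  // assumed block size: 256 threads per block
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matmulKernel<<<grid, block>>>(dA, dB, dC, n);
    CUDA_CHECK(cudaGetLastError());  // reports launch-configuration errors

    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
}

int main() {
    const int n = 256;  // illustrative matrix size
    std::vector<float> A(n * n), B(n * n), C(n * n), ref(n * n);
    for (int i = 0; i < n * n; ++i) {
        A[i] = static_cast<float>(i % 7) * 0.5f;
        B[i] = static_cast<float>(i % 5) - 2.0f;
    }

    matmulGpu(A.data(), B.data(), C.data(), n);
    matmulCpu(A.data(), B.data(), ref.data(), n);

    // Compare the GPU result with the CPU reference.
    float maxErr = 0.0f;
    for (int i = 0; i < n * n; ++i) {
        maxErr = std::fmax(maxErr, std::fabs(C[i] - ref[i]));
    }
    std::printf("max abs error: %g\n", maxErr);
    return maxErr < 1e-3f ? 0 : 1;
}
```

Saved as, say, `matmul.cu`, this should build with `nvcc -o matmul matmul.cu`. The kernel is deliberately naive: every thread reads its row of `A` and column of `B` straight from global memory. A tiled version using shared memory, or a library such as cuBLAS, would normally be much faster for large matrices.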