Added the launch_kernel() function, which launches a kernel and picks the
number of threads and blocks automatically rather than using the hard-coded
numbers I had in there. This makes some functions noticeably faster. Also
added a dot() function that is fully asynchronous.
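
For illustration only, here is a rough host-side sketch of how a launch shape might be chosen from the number of work items. It is not the code from this commit: pick_launch_config(), LaunchConfig, the warp size of 32, and the 512-thread and 4096-block caps are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, NOT the launch_kernel() added in this commit.
struct LaunchConfig {
    std::size_t blocks;
    std::size_t threads;
};

// Pick a launch shape for n independent work items.
// Assumptions: warp size of 32, at most max_threads threads per block,
// and at most max_blocks blocks in the grid.
inline LaunchConfig pick_launch_config(std::size_t n,
                                       std::size_t max_threads = 512,
                                       std::size_t max_blocks = 4096)
{
    const std::size_t warp = 32;
    if (n == 0)
        return {1, warp};  // smallest valid launch

    // Round n up to a whole number of warps, then cap the block size.
    const std::size_t threads =
        std::min(max_threads, ((n + warp - 1) / warp) * warp);

    // Enough blocks to cover n (ceiling division), capped at max_blocks.
    const std::size_t blocks =
        std::min(max_blocks, (n + threads - 1) / threads);

    return {blocks, threads};
}
```

Because the block count is capped, a kernel launched with this shape would need a grid-stride loop so that every item is still processed when n exceeds blocks * threads.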