Changed the trainer threading code to use dlib::thread_pool instead of
std::async(), since std::async() may create a new thread on each invocation, which in turn causes objects with thread_local storage duration to be reconstructed every time. This is problematic because the CUDA context objects for cublas and cudnn get reconstructed over and over, slowing things down and using more resources than necessary.
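For illustration, here is a minimal sketch of the idea (not the trainer code itself): submitting work to a long-lived dlib::thread_pool lets thread_local state survive across submissions, whereas repeated std::async() calls may run on fresh threads and reconstruct it each time. The `expensive_context` type is a hypothetical stand-in for the cublas/cudnn context handles.

```cpp
#include <dlib/threads.h>
#include <iostream>

// Hypothetical stand-in for the cublas/cudnn context objects: expensive to
// construct, so we want it built once per thread, not once per task.
struct expensive_context
{
    expensive_context() { std::cout << "context constructed\n"; }
};

void do_work()
{
    // Constructed the first time a given thread runs do_work(), then reused
    // on every later call made by that same thread.
    thread_local expensive_context ctx;
    // ... use ctx ...
}

int main()
{
    // A pool with a fixed set of long-lived threads.  Because the threads
    // persist, the thread_local context above is constructed at most once
    // per pool thread, no matter how many tasks are submitted.
    dlib::thread_pool pool(1);

    for (int i = 0; i < 100; ++i)
    {
        auto id = pool.add_task_by_value([]() { do_work(); });
        pool.wait_for_task(id);
    }
    // With std::async() in place of the pool, each call could spawn a fresh
    // thread and reconstruct the thread_local context 100 times.
}
```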