Commit e6614b50 authored by Davis King

Made the check in dnn_trainer for convergence more robust. Previously, if we
encountered a bad mini-batch that made the loss value suddenly jump up by a
larger than normal amount, it could make the trainer think we had converged.
Now the test is robust to recent spikes in the loss value.
parent 4d623597
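
For context, here is a minimal sketch of the difference between the two tests, assuming dlib's count_steps_without_decrease() and count_steps_without_decrease_robust() from <dlib/statistics/running_gradient.h>; the loss values below are made up for illustration:

    #include <dlib/statistics/running_gradient.h>
    #include <iostream>
    #include <vector>

    int main()
    {
        // A loss curve that decreases steadily until a bad mini-batch
        // makes it spike at the very end.
        std::vector<double> losses;
        for (int i = 0; i < 100; ++i)
            losses.push_back(2.0 - 0.01*i);
        losses.push_back(10.0);  // the bad mini-batch
        losses.push_back(9.5);

        // The plain test fits trend lines to suffixes of the series, so
        // the recent spike can drag its count way up even though the
        // overall trend is still downward.
        std::cout << dlib::count_steps_without_decrease(losses) << "\n";

        // The robust variant discards the top 10% largest values before
        // counting, so the spike is ignored and the count stays small.
        std::cout << dlib::count_steps_without_decrease_robust(losses) << "\n";
    }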
@@ -678,10 +678,25 @@ namespace dlib
     steps_without_progress = count_steps_without_decrease(previous_loss_values);
     if (steps_without_progress >= iter_without_progress_thresh)
     {
-        // optimization has flattened out, so drop the learning rate.
-        learning_rate = learning_rate_shrink*learning_rate;
-        steps_without_progress = 0;
-        previous_loss_values.clear();
+        // Double check that we aren't seeing a decrease. This second check
+        // discards the top 10% largest values and checks again. We do
+        // this because sometimes a mini-batch might be bad and cause the
+        // loss to suddenly jump up, making count_steps_without_decrease()
+        // return a large number. But if we discard the top 10% of the
+        // values in previous_loss_values then we are robust to that kind
+        // of noise. Another way of looking at it: if the only reason
+        // count_steps_without_decrease() returns a large value is that
+        // the most recent loss values have suddenly been large, then we
+        // shouldn't stop or lower the learning rate. We should keep
+        // going until whatever disturbance we hit is damped down.
+        steps_without_progress = count_steps_without_decrease_robust(previous_loss_values);
+        if (steps_without_progress >= iter_without_progress_thresh)
+        {
+            // optimization has flattened out, so drop the learning rate.
+            learning_rate = learning_rate_shrink*learning_rate;
+            steps_without_progress = 0;
+            previous_loss_values.clear();
+        }
     }
 }
 else if (lr_schedule.size() != 0) // or use the learning rate schedule if we have one.
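
The new control flow, restated outside the trainer as a hypothetical standalone helper (the names mirror the dnn_trainer members only for illustration; the threshold and shrink values are placeholders, not the trainer's actual defaults):

    #include <dlib/statistics/running_gradient.h>
    #include <vector>

    // Only shrink the learning rate when both the plain and the robust
    // counts agree that progress has stalled. If only the plain count is
    // large, the stall is likely just a recent spike from a bad
    // mini-batch, so keep training until the disturbance is damped down.
    void maybe_shrink_learning_rate (
        std::vector<double>& previous_loss_values,
        double& learning_rate,
        double learning_rate_shrink,                // placeholder, e.g. 0.1
        unsigned long iter_without_progress_thresh  // placeholder, e.g. 2000
    )
    {
        if (dlib::count_steps_without_decrease(previous_loss_values) >= iter_without_progress_thresh &&
            dlib::count_steps_without_decrease_robust(previous_loss_values) >= iter_without_progress_thresh)
        {
            learning_rate = learning_rate_shrink*learning_rate;
            previous_loss_values.clear();
        }
    }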