Sean Bell authored
Summary: This adds an exponential learning rate schedule, parameterized by `GAMMA`, which specifies the ratio of the final learning rate to the initial learning rate. This is a more natural parameterization than `A ^ t` or `exp(A * t)`, because the value of `GAMMA` does not have to be retuned when the number of training iterations changes. Since most jobs drop the learning rate by ~3 orders of magnitude over the course of training, `GAMMA = 1e-3` should work for many scenarios.

Reviewed By: ashwinb, rbgirshick

Differential Revision: D14654419

fbshipit-source-id: 8dabbc8df28d32e4fd748b61bc32efaf3935d244
e2aef3a7
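
For illustration, a minimal sketch of the schedule the summary describes, assuming the learning rate at iteration `t` of `T` is `base_lr * GAMMA ** (t / T)`. The commit text does not show the actual formula or code, so the function names `lr_exp` and `per_iter_factor` and their arguments are hypothetical, not the names used in the diff.

```python
import math


def lr_exp(base_lr: float, cur_iter: int, max_iter: int, gamma: float) -> float:
    """Decay exponentially from base_lr at iteration 0 to base_lr * gamma at max_iter.

    Assumed form (not taken from the diff): base_lr * gamma ** (cur_iter / max_iter).
    """
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return base_lr * gamma ** (cur_iter / max_iter)


def per_iter_factor(max_iter: int, gamma: float) -> float:
    """Return the factor A of the `A ^ t` form, chosen so that A ** max_iter == gamma."""
    return math.exp(math.log(gamma) / max_iter)


if __name__ == "__main__":
    # With gamma = 1e-3 the rate falls by three orders of magnitude overall.
    for it in (0, 250, 500, 750, 1000):
        print(it, lr_exp(0.1, it, 1000, 1e-3))
```

Under this assumption the two forms describe the same curve: `A ^ t` with `A = GAMMA ** (1 / T)` gives identical values, but `A` has to be recomputed whenever `T` changes, while `GAMMA` stays fixed.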