This follows the [scheduling rules from Detectron](https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14-L30).
Note that we have multiplied the number of iterations by 8x (and stretched the learning rate
schedule steps by the same factor), and we have divided the learning rate by 8x.
This is the linear scaling rule applied in reverse: on a single GPU the batch size is 8x smaller
than the 8-GPU default, so the learning rate shrinks by 8x and the schedule lengthens by 8x.
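As a concrete sketch, assuming this repository's `tools/train_net.py` entry point, a `SOLVER.*`/`TEST.*` config namespace, and a 1x schedule whose 8-GPU defaults are a learning rate of 0.02, 90000 iterations, and steps at (60000, 80000), the adjusted single-GPU command could look like:

```bash
# Hypothetical single-GPU run: batch size divided by 8, learning rate
# divided by 8, iterations and schedule steps multiplied by 8.
python tools/train_net.py --config-file "path/to/config_file.yaml" \
    SOLVER.IMS_PER_BATCH 2 \
    SOLVER.BASE_LR 0.0025 \
    SOLVER.MAX_ITER 720000 \
    SOLVER.STEPS "(480000, 640000)" \
    TEST.IMS_PER_BATCH 1
```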
We also changed the batch size during testing, but that is generally not necessary because testing
requires much less memory than training.
### Multi-GPU training
Internally, we use `torch.distributed.launch` to launch multi-GPU training: this PyTorch
utility spawns one Python process per GPU, and each process drives a single GPU.
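For example, again assuming the `tools/train_net.py` entry point above (`torch.distributed.launch` and its `--nproc_per_node` flag are standard PyTorch):

```bash
# Spawn one training process per GPU; each process uses a single GPU.
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS \
    tools/train_net.py --config-file "path/to/config_file.yaml"
```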